Thesis - Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations


■ 10.2017 - 10.2020 ■ Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations

○ NLP (Utility)
○ Doctoral Project: 3 years at the Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI)

○ Technical Keywords: Word Alignment, Variational Auto-Encoder, Subword Tokenization, Byte Pair Encoding, SentencePiece, Unigram, Expectation-Maximization, Hidden Markov Model, Convolutional Neural Network, Long Short-Term Memory Network, GIZA++, fast_align, SimAlign, eflomal, TensorFlow, PyTorch, Python, Slurm, GPU
GitHub
○ Word alignment - app: Word Alignment Evaluation Application
○ Word alignment statistics for the corpora used in the project: Application
○ Subword tokenization for word alignment: Application