Leveraging Loanword Constraints for Improving Machine Translation in Low-resource Settings
- Felermino Ali
Translating from high-resource languages into low-resource languages like Emakhuwa remains a challenge due to limited parallel data, orthographic variation, and frequent loanwords and code-switching. In this talk, Felermino will discuss how to apply lexicon-guided neural machine translation, which integrates bilingual dictionaries and loanword mappings into the training process, to address this challenge.
Our method uses over 8,000 dictionary entries and 12,000 loanword mappings to build sentence-specific glossaries, which are incorporated via input augmentation. Experiments on FLORES+ show improved lexical coverage, reduced inconsistencies, and more contextually accurate translations, suggesting a promising direction for low-resource MT: bridging data scarcity and vocabulary gaps through structured lexical integration.
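As a rough illustration of what building a sentence-specific glossary and incorporating it via input augmentation could look like, here is a minimal Python sketch. It is not the implementation presented in the talk: the function names `build_glossary` and `augment_input`, the `<gloss>` separator, the `source = target` hint format, and the placeholder lexicon entries (which are not real Emakhuwa words) are all assumptions made for this example.

```python
import re


def build_glossary(sentence, lexicon):
    """Collect (source term, target term) pairs for lexicon entries
    that occur in the sentence, in order of first appearance."""
    tokens = re.findall(r"\w+", sentence.lower())
    seen = set()
    glossary = []
    for token in tokens:
        if token in lexicon and token not in seen:
            seen.add(token)
            glossary.append((token, lexicon[token]))
    return glossary


def augment_input(sentence, glossary, sep=" <gloss> "):
    """Append glossary hints to the source sentence.

    The separator and the 'source = target' format are hypothetical;
    a real system would choose its own tagging scheme.
    """
    if not glossary:
        return sentence
    hints = " ; ".join(f"{src} = {tgt}" for src, tgt in glossary)
    return sentence + sep + hints


# Placeholder entries standing in for dictionary and loanword mappings.
# The target strings are dummies, not actual Emakhuwa translations.
lexicon = {"school": "TGT_SCHOOL", "hospital": "TGT_HOSPITAL"}

sentence = "The school is near the hospital."
augmented = augment_input(sentence, build_glossary(sentence, lexicon))
print(augmented)
```

In a setup like this, the augmented string would replace the plain source sentence on the input side of each training pair, so the model can learn to copy the suggested target terms when they fit the context. Sentences with no lexicon matches pass through unchanged.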
Felermino Ali
ML Researcher