Digital Archives @ UP Diliman

Item Details

Status : Verified

Personal Name	Visperas, Moses L.
Resource Title	On the Effects of Language Clustering for Low-Resource Multilingual Machine Translation Model for Select Philippine Languages
Date Issued	24 January 2024
Abstract	Language translation is a tedious written-communication task that requires mastery of both the source and the target language. Machine translation (MT) systems exist to reduce the work this task requires. Commercial MT systems are available online but support only a few Philippine languages. Meanwhile, recent studies on MT systems for other Philippine languages are either outdated or perform poorly when translating to or from Tagalog. Neural machine translation (NMT) has grown widely popular over the years, achieving high quality across multiple languages. However, NMT still underperforms when translating low-resource language pairs, a category that includes most Philippine languages. In this study, we investigated the effectiveness of language clustering in multilingual machine translation for low-resource Philippine languages. We analyzed two clusters: the Northern Philippine cluster (Ibanag, Ilocano, Pangasinan) and the Central Philippine cluster (Bicolano, Cebuano, Hiligaynon, Waray), comparing them to a baseline multilingual model covering all of the aforementioned languages. Our models, trained on the JW300 dataset using the T5-small architecture, were evaluated using n-gram (BLEU, NIST, METEOR) and distance-based (TER) metrics. We statistically validated our models with paired bootstrap resampling, in which we sampled and evaluated 300 sentence pairs for each translation direction from an original test set of 1000 sentence pairs, over 1000 iterations. All models produced good translations, with average BLEU scores of 30 and above. Initial analysis revealed that language clustering showed potential benefits, improving confidence intervals in translations from Tagalog to other languages. Specifically, it resulted in an average gain of 0.7813 and 0.8754 BLEU points for the lower and upper bounds of the confidence intervals, respectively. 
However, detailed statistical analysis using paired bootstrap resampling found
Degree Course	MS Electrical Engineering
Language	English
Keyword	machine translation; deep learning; multilingual; language clustering
Material Type	Thesis/Dissertation
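
The paired bootstrap resampling step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's code. It assumes sampling with replacement, uses the mean of per-sentence scores as a stand-in for a corpus-level metric such as BLEU, and reports a 95% percentile interval; the function name `paired_bootstrap` and all of its parameters are hypothetical.

```python
import random


def paired_bootstrap(scores_a, scores_b, sample_size=300, iterations=1000, seed=0):
    """Compare two systems scored on the same test sentences.

    Assumptions (not taken from the thesis): sampling is with replacement,
    and the mean of per-sentence scores stands in for a corpus-level metric.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same sentences")

    rng = random.Random(seed)
    n = len(scores_a)
    means_a, means_b = [], []
    wins_a = 0

    for _ in range(iterations):
        # Draw one set of indices and apply it to BOTH systems (the "paired" part).
        idx = [rng.randrange(n) for _ in range(sample_size)]
        mean_a = sum(scores_a[i] for i in idx) / sample_size
        mean_b = sum(scores_b[i] for i in idx) / sample_size
        means_a.append(mean_a)
        means_b.append(mean_b)
        if mean_a > mean_b:
            wins_a += 1

    def interval(values):
        # 95% percentile interval over the resampled scores.
        ordered = sorted(values)
        lower = ordered[int(0.025 * len(ordered))]
        upper = ordered[int(0.975 * len(ordered)) - 1]
        return lower, upper

    return {
        "win_rate_a": wins_a / iterations,  # fraction of resamples where A beats B
        "ci_a": interval(means_a),
        "ci_b": interval(means_b),
    }
```

Under these assumptions, a call such as `paired_bootstrap(cluster_scores, baseline_scores)` with two lists of 1000 per-sentence scores would mirror the 300-of-1000, 1000-iteration setup; comparing the lower and upper bounds of `ci_a` and `ci_b` corresponds to the confidence-interval gains the abstract reports.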

Preliminary Pages

586.10 Kb

Details

Category : P - Author wishes to publish the work personally.

Access Permission : Limited Access
