Published 13:45 IST, October 20th 2020
Facebook unveils AI translator for 100 languages without relying on English data
Facebook announced an MMT model that can directly translate “100×100 languages” in any direction without relying on English-centric data.
Facebook unveiled machine-learning software that can translate between any pair of 100 languages without relying on English. The first multilingual machine translation (MMT) model of its kind, it is open-source artificial intelligence software that trains directly on data from one language to another, without using English as an intermediary, which helps preserve meaning.
Facebook AI research assistant Angela Fan said in a blog post that existing multilingual systems can process multiple languages but compromise on accuracy by relying on English data to bridge the gap between the source and target languages. Fan announced an MMT model that can directly translate “100×100 languages” in any direction without relying on English-centric data.
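Facebook released the model and code as open source, and smaller checkpoints of the system, known as M2M-100, were later published through the Hugging Face transformers library. As a rough illustration of the direct, non-English-centric translation described here, a minimal sketch might look like the following (it assumes the publicly released facebook/m2m100_418M checkpoint, a smaller variant rather than the 15-billion-parameter model discussed in the blog post):

```python
# A minimal sketch using the Hugging Face transformers port of M2M-100.
# "facebook/m2m100_418M" is a smaller public checkpoint, assumed here for
# illustration; it is not the 15-billion-parameter model Fan describes.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "zh"  # source language: Chinese
encoded = tokenizer("生活就像一盒巧克力。", return_tensors="pt")

# Force the decoder to start in French, translating zh -> fr directly,
# with no English pivot in between.
generated = model.generate(
    **encoded, forced_bos_token_id=tokenizer.get_lang_id("fr")
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```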
“Our model directly trains on Chinese to French data to better preserve meaning. It outperforms English-centric systems by 10 points on the widely used BLEU metric for evaluating machine translations,” wrote Fan.
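The BLEU metric Fan cites scores machine translations by their n-gram overlap with human reference translations, so a 10-point gain is a substantial improvement. A small, self-contained sketch of how such a score is typically computed, using the standard sacrebleu Python package (the sentences below are made-up placeholders, not data from Facebook's evaluation):

```python
# Illustrative only: scoring hypothetical system outputs against
# reference translations with sacrebleu, a standard BLEU implementation.
import sacrebleu

hypotheses = [
    "Le chat est assis sur le tapis.",
    "Il fait beau aujourd'hui.",
]
# One reference stream: the i-th entry is the reference for the i-th hypothesis.
references = [[
    "Le chat est assis sur le tapis.",
    "Il fait très beau aujourd'hui.",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")  # higher is better
```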
The research assistant further explained that Facebook built the “many-to-many” data set with 7.5 billion sentences for 100 languages. She said the tech giant used several scaling techniques to build a universal model with 15 billion parameters, which captures information from related languages and reflects a more diverse set of languages, scripts, and morphology.
Bridge languages
Fan said that the team identified a small number of bridge languages, which are usually one to three major languages of each group, to connect the languages of different groups. Giving the example of Hindi, Bengali, and Tamil as bridge languages for Indo-Aryan languages, she said that the team mined parallel training data for all possible combinations of these bridge languages.
“Our training data set ended up with 7.5 billion parallel sentences of data, corresponding to 2,200 directions. Since the mined data can be used to train two directions of a given language pair...our mining strategy helps us effectively sparsely mine to best cover all 100×100 directions in one model,” she wrote.
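To make the mining strategy concrete, here is a toy sketch of how pairing every language within a group, then connecting groups only through their bridge languages, keeps the number of mined pairs far below the full 100×100 grid. The groupings and bridge choices below are illustrative stand-ins, not Facebook's actual selection:

```python
from itertools import combinations

# Toy language groups, each with one to three "bridge" languages.
# These groupings are hypothetical, for illustration only.
groups = {
    "indo_aryan": {"langs": ["hi", "bn", "ta", "mr", "ur"],
                   "bridges": ["hi", "bn", "ta"]},
    "romance":    {"langs": ["fr", "es", "pt", "it", "ro"],
                   "bridges": ["fr", "es"]},
}

def mined_pairs(groups):
    """All within-group pairs, plus cross-group pairs of bridge languages."""
    pairs = set()
    for g in groups.values():
        pairs.update(combinations(sorted(g["langs"]), 2))
    bridges = sorted({b for g in groups.values() for b in g["bridges"]})
    pairs.update(combinations(bridges, 2))
    return pairs

pairs = mined_pairs(groups)
# Each mined pair supplies training data for both translation directions,
# which is why 7.5 billion parallel sentences can cover 2,200 directions
# without mining every one of the 100x100 combinations.
print(f"{len(pairs)} mined pairs -> {2 * len(pairs)} directions")
```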