The Languages That Will Not Auto-Translate

In a steadily globalizing world that’s spinning faster every day, there’s little doubt that auto-translation is here to stay. Despite all the snickering at machine translation, language apps based on neural networks (layered computing systems loosely inspired by the way neurons in the brain are connected) are still unbeatable for speed (instant) and cost (free).

However, no number of algorithms can yet replace the human brain, much less professional translators’ sensitivity to context and idioms (such as “it’s raining cats and dogs”).

Fast, Free, and Flawed―but Maturing

Thanks to the influence of international organizations (like the United Nations) and multilingual institutions (like the European Parliament), huge databases of parallel data (the same texts translated into two or more languages) have been accumulating for over fifty years. However, it was only in the second decade of the 21st century, with the advent of deep neural networks (DNNs), that all this human-translated documentation could be put to more practical use.

Using these linguistic treasure troves, a number of major technology players have developed free translation platforms, such as:

  • Google Translate (133 languages): Accessed by more than 500 million daily users, with English, Spanish, Arabic, Russian, Portuguese, and Indonesian the most frequently used languages, while Bengali, Haitian Creole, and Tajik lag far behind;
  • Microsoft’s Bing Translator (103 languages): A cloud service that is part of Microsoft Cognitive Services, integrated into multiple products, including Bing, Microsoft Office, Edge, Skype, and Windows, as well as apps for Apple and Android devices;
  • DeepL (28 languages): This engine, now expanding beyond its European base to include languages from all over the world, is built on a gigantic corpus of human-translated sentences, idioms, and snippets from the Linguee online dictionary.

Broadening the Focus

This early dependence on digital goldmines of parallel data perhaps explains why automatic translation currently includes European languages like Finnish (five million speakers) while leaving out Oromo, spoken by 48 million Ethiopians. Other mother tongues still languishing in technological limbo include Bhojpuri (51 million speakers), Fula (24 million), Sylheti (11 million), and Kirundi (9 million).

However, the situation is changing as more languages are added to machine translation platforms. As noted by Carl Rubino, a program manager at IARPA (the Intelligence Advanced Research Projects Activity, the research arm of the US intelligence community), “Many of the challenges we face today, such as economic and political instability, the Covid-19 pandemic, and climate change, transcend our planet―and are thus multilingual in nature.”

Saving Lives through Languages

As these challenges often weigh heaviest on underprivileged communities, which are the least equipped to cope with them, communication that is both instant and accurate is rapidly becoming a matter of life and death. While human translators are limited by physical constraints, computers can run 24/7 at superhuman speeds, churning out non-stop streams of analyses, reports, and guidelines that might be grammatically imperfect but are fit for purpose in tight emergency timeframes.

This is the true value of extending the scope of automatic translation. By facilitating instant communication across linguistic and cultural barriers when lives and livelihoods are at stake, these faceless algorithms throw lifelines to low-tech communities battling to survive adverse conditions.

Low-Resource Languages

Many languages spoken by millions of people have rich oral traditions but only limited (and often monolingual) written resources. Because deep neural networks learn to translate from large volumes of parallel text, these low-resource languages, as they’re known in the industry, have been hard to tackle. Meanwhile, speakers of these tongues are busily uploading posts and blogs that may well help preserve their cultures, despite a tradition with few written records and books.

Historically, multilingual sources in some of these cultures were often limited to narrow datasets from faith-based literature, particularly widely translated holy books such as the Quran and the Bible. Today, print, audiovisual media, and social networks are building up solid inventories of single-language data that deep neural networks can analyze and translate.

Social Networks Building Community Safety Nets

Modern neural network models can now be pre-trained on spoken and written monolingual sources. The idea is that, during pre-training, the models learn general features and structures of human language, captured in their parameters, which can then be applied to translation tasks.

With users all over the world posting content in their mother tongues, often on fairly repetitive topics that cross cultural borders, neural models can now summarize texts for users. To do so, these apps seem to need very little bilingual training on parallel data: a few hundred thousand words (roughly half a dozen novels) may be enough.

Takeaway: With some 7,000 languages spoken worldwide (but only about 4,000 of them written), automatic translation apps have vast room for expansion. From healthcare to agriculture, bridging linguistic and cultural gaps through automatic translation is clearly the path to a better future for humankind, but always with a helping hand from professional translators, who are expertly versed not only in more than one language but also in more than one culture.

Image by Yatheesh Gowda from Pixabay