Does Machine Translation Hallucinate Electric Sheep?

Robots may not dream, but they do hallucinate. With artificial intelligence at the center of the media spotlight lately, the tendency of AI-driven systems to “hallucinate” content has become a primary concern about the technology, and machine translation is no exception.

In this post, we examine what machine translation hallucinations are, why they occur, and how to keep this serious issue from affecting your translation projects.

What are machine translation hallucinations?

Unlike human translation, machine translation uses computer software, usually artificial intelligence or machine learning solutions, to translate text from one language to another. While machine translation can help save time and money, these systems can unfortunately also produce hallucinations: translation outputs that are completely unrelated to the original input, and which are often bizarre.

For example, one experiment with machine translation found that the model hallucinated entirely new stories. The original sentence was a statement about an ongoing strike in Colombia; after being run through Google Translate into Marathi and then translated back into English, it became a wildly off-base statement about U.S. children serving in the Jehovah’s Witnesses. As this example illustrates, hallucinations happen especially often when translating out of English into less commonly used languages, for which the machine may not have as much reliable data.
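If you want a rough sense of how fragile a given sentence is, a simple round-trip check can help. The sketch below is illustrative only: it assumes the third-party deep-translator package (an unofficial wrapper around Google Translate) and uses a crude surface-similarity score, so a low number is merely a flag for human review, not proof of a hallucination.

```python
# A minimal round-trip check: English -> Marathi -> English, then compare.
# Assumes `pip install deep-translator`; how low a score must be before you
# escalate to a human reviewer is up to you.
from difflib import SequenceMatcher

from deep_translator import GoogleTranslator

def round_trip_score(text: str, pivot: str = "mr") -> float:
    """Translate English into a pivot language and back, then score overlap."""
    forward = GoogleTranslator(source="en", target=pivot).translate(text)
    back = GoogleTranslator(source=pivot, target="en").translate(forward)
    return SequenceMatcher(None, text.lower(), back.lower()).ratio()

if __name__ == "__main__":
    sentence = "Thousands of workers joined the ongoing strike in Colombia."
    print(f"Round-trip similarity: {round_trip_score(sentence):.2f}")
```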

Why do hallucinations occur?

Some neural machine translation (NMT) hallucinations are caused by “input perturbation,” an unexpected element in the input that ends up tainting the output. The input might contain a typo, a quirky style, an unusual word, or a word that simply isn’t accounted for in the model.
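One practical takeaway is to screen source text for those unexpected elements before it ever reaches the engine. The toy check below uses only Python’s standard library; the tiny vocabulary is purely illustrative, and a real workflow would rely on a full word list or spell checker instead.

```python
import re

# Toy pre-flight check: flag tokens the MT engine may not recognize.
# KNOWN_WORDS is an illustrative stand-in for a real vocabulary or spell checker.
KNOWN_WORDS = {
    "the", "committee", "approved", "new", "safety", "regulations", "yesterday",
}

def suspicious_tokens(text: str, vocab: set[str] = KNOWN_WORDS) -> list[str]:
    """Return words not found in the vocabulary (possible typos or rare terms)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in vocab]

flagged = suspicious_tokens("The comittee aproved the new safety regulatons yesterday.")
if flagged:
    print("Review before machine translation:", flagged)
```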

Other hallucinations stem from underlying issues with the data used to train the model. Researchers have found, for instance, that some models over-memorize certain phrases, repeating them in exactly the same way every time without regard for context or idiom. A high amount of “noise” in the training data, meaning too many erroneous or misaligned pairs of source and target sentences, can also result in hallucinations.
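To make the idea of “misaligned pairs” concrete, here is a minimal sketch of one common cleanup heuristic: discarding sentence pairs whose lengths differ wildly. The ratio limit and the sample pairs are assumptions for illustration; production corpus filtering uses many more signals than this.

```python
def looks_aligned(source: str, target: str, max_ratio: float = 2.0) -> bool:
    """Keep a sentence pair only if the word counts are roughly comparable."""
    src_len = max(len(source.split()), 1)
    tgt_len = max(len(target.split()), 1)
    return max(src_len, tgt_len) / min(src_len, tgt_len) <= max_ratio

pairs = [
    ("The strike continued for a third week.", "La huelga continuó por tercera semana."),
    ("The strike continued for a third week.", "Sí."),  # likely misaligned "noise"
]

clean_pairs = [(src, tgt) for src, tgt in pairs if looks_aligned(src, tgt)]
print(f"Kept {len(clean_pairs)} of {len(pairs)} pairs")
```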

And when it comes to systems built on large language models (LLMs), such as ChatGPT, the picture is equally troubling. In addition to off-target translations and outright translation glitches or failures, the training data can contain toxic material that sneaks into the translation output.

How to avoid translation hallucinations

As you can imagine, machine translation hallucinations can be disastrous for a business, eroding user confidence and raising grave security concerns. While there are ways to fine-tune the machines themselves, the only tried-and-true way to catch and fix hallucinations is the human touch: a real, professional human translator.

Using human translation doesn’t mean you can’t use machine translation as the first step in a major project. Instead, you can hire a human translator to clean up the machine-translated “first draft” in a process known as post-editing. Not only will post-editors review and correct your machine translations, polishing them to a high standard, but their corrections can also be incorporated into the engine’s translation memory, improving its “training” for future projects.
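As a simplified illustration of that feedback loop, the sketch below stores post-edited segments in a small translation memory so approved wording can be reused on the next project. Real translation memory tools and retraining pipelines are far more sophisticated; the class and example segments here are assumptions made for clarity.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class TranslationMemory:
    """A toy translation memory keyed on exact source segments."""
    entries: dict[str, str] = field(default_factory=dict)

    def add_post_edit(self, source: str, post_edited: str) -> None:
        """Store the human-approved translation for future reuse."""
        self.entries[source.strip()] = post_edited.strip()

    def lookup(self, source: str) -> str | None:
        """Return a previously approved translation, if one exists."""
        return self.entries.get(source.strip())

tm = TranslationMemory()
# The raw MT "first draft" read "La reunión es mañana a las diez.";
# the post-editor's corrected version is what gets stored and reused.
tm.add_post_edit("The meeting is tomorrow at 10:00.", "La reunión será mañana a las 10:00.")
print(tm.lookup("The meeting is tomorrow at 10:00."))
```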

For the best results, you should seek out professional linguists specifically trained in machine translation post-editing, such as our expert post-editors at Trusted Translations, who follow a rigorous process to ensure the quality of each translation.

Image by Enrique from Pixabay