Can Artificial Intelligence Decipher Lost Languages?

Ancient Writing

Celebrated in lore and legend, ancient civilizations and their forgotten languages have long fascinated historians, archaeologists, and linguists. Today, these researchers have a new instrument in their professional toolboxes: Artificial Intelligence, a technology that could help unravel the secrets of societies that vanished millennia ago.

Hieroglyphics and the Rosetta Stone

For centuries, academics and amateurs alike have pored over relics ranging from Sumerian cuneiform to pre-Roman Etruscan inscriptions. Among them was Jean-François Champollion (1790-1832), a young Frenchman who deciphered one of the best-known ancient writing systems: Egyptian hieroglyphics. To do so, he used the famous Rosetta Stone. This granodiorite stele was inscribed with versions of the same text in three scripts: 14 lines of hieroglyphics, 32 lines of Demotic, and 54 lines of Greek.

Cuneiform and Other Ancient Enigmas

Another intriguing example is Ugaritic. Discovered by French archaeologists in 1929 on a series of clay tablets in the Tell of Ugarit, it was written in a consonantal cuneiform alphabet. Scholars of the Hebrew Bible have since used this extinct Northwest Semitic language to help analyze Biblical Hebrew texts, revealing similarities between ancient Israel and Judah and their neighboring cultures.

Elsewhere in the world, there are plenty of mysterious texts still to be deciphered, like the Voynich Manuscript (Europe), the Cascajal Block (Central America), and Rongorongo (Rapa Nui/Easter Island).

Bronze Age Mysteries

Perhaps the best-known example in modern times is Linear B, initially found among Cretan ruins dating back to the Bronze Age. Although British architect Michael Ventris is usually credited with deciphering Linear B—now acknowledged as the earliest form of Greek orthography and developed about 1400 BC—his efforts were underpinned by classicist Alice Kober. She compiled a primitive analog ‘database’ in her New York home, storing about 180,000 slips of paper in cigarette boxes. Tragically, she died two years before this mysterious code was cracked by Ventris in 1952.

In all, it took more than half a century of painstaking effort to understand Linear B. The script is an offshoot of the Linear A syllabary, which was used by the mysterious Minoan civilization and records a language possibly unrelated to the Indo-European family. Today, only a century after the excavations of Knossos by British archaeologist Sir Arthur Evans (1851–1941), technology is speeding up the decipherment of such ancient languages.

Indus Valley Mystery

During the 1870s, bricks salvaged from a ruined town in Punjab were used as ballast underpinning almost a hundred miles of railway track between the towns of Multan and Lahore. Among what remained, Army engineer and archaeological surveyor Alexander Cunningham found a few shards of ancient pottery, as well as a tiny stone tablet about 1.5 inches square, inscribed with six unfamiliar characters and a one-horned bull or rhinoceros (or maybe even a unicorn).

Since then, some 4,000 other relics have been unearthed, most of them along the Indus River in Pakistan, with others in India and even Iraq. They carry up to 700 unique symbols, possibly read right to left, and it seems likely that the objects served as seals for taxation and trade control. Nobody knows what these pre-Vedic signs mean, despite over a hundred attempts at decipherment published during the past century, but AI is now giving the effort renewed momentum.

Deep Learning Limits

Experts like Indus script researcher Bahata Ansumali Mukhopadhyay are well aware of the limits of even the most powerful computers. Mukhopadhyay believes that many cognitive aspects of decipherment cannot be encoded into convenient frameworks, as current AI iterations are unable to cope with information that is not quantifiable in ways computers can process.

Even deep learning, currently the dominant AI technique, is essentially a matter of pattern recognition, with output improving in step with the amount of information fed into the system. This data-by-the-truckload approach falters with low-resource subjects like ancient languages, whose surviving inscriptions are few and often incomplete, chipped and eroded by time. Scholars (and computers) have no way of knowing whether a scratch endows a symbol with new meaning, or is simply random damage.

What Lies Ahead?

Even machine learning enthusiasts like MIT scientist Jiaming Luo are not expecting some “archeo-trans” app to churn out instant translations of lost languages. At best, he envisages comparing these linguistic fragments with contemporary languages in their surrounding regions, in search of linguistic links.

He feels that a hybrid approach is the most likely to produce positive outcomes. Initially, the brute force of massive computers could be used to examine artifacts and shortlist possible relationships with known languages. This might save the decades of effort that earlier decipherers needed, leaving today’s experts free to take over the more subtle aspects, where inspiration spurs blind leaps of faith.
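
To make the "shortlisting" step concrete, here is a minimal Python sketch. It is not Luo's method. It assumes that tentative transliterations of the unknown words already exist, which is itself a big assumption for an undeciphered script, and it ranks candidate languages by simple string similarity. The function names, the word lists, and the language labels are all invented for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two transliterated words."""
    return SequenceMatcher(None, a, b).ratio()


def rank_candidate_languages(unknown_words, lexicons):
    """Rank known languages by how closely their words resemble the unknown ones.

    For each unknown word, take its best match in a language's lexicon,
    then average those best scores across all unknown words.
    Assumes both the unknown word list and every lexicon are non-empty.
    """
    scores = {}
    for language, words in lexicons.items():
        best_matches = [
            max(similarity(unknown, known) for known in words)
            for unknown in unknown_words
        ]
        scores[language] = sum(best_matches) / len(best_matches)
    # Highest average similarity first: this is the machine's shortlist.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy data (hypothetical transliterations, not real words).
unknown_words = ["tarno", "belik", "sumra"]
lexicons = {
    "language_a": ["tarn", "belika", "sumr"],
    "language_b": ["opqi", "zzyx", "whex"],
}

shortlist = rank_candidate_languages(unknown_words, lexicons)
for language, score in shortlist:
    print(f"{language}: {score:.2f}")
```

A ranking like this only narrows the field. Deciding whether a high score reflects a real linguistic relationship or mere coincidence would still fall to human experts, which is the division of labor the hybrid approach describes.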

Takeaway: Alphabets and symbols are unique reflections of the civilizations that once used them. Thousands of years later, it’s quite clear that even the most sophisticated machines are still no match for the human brain, particularly for translating the thoughts and concepts that make each society unique.

Image by Peace,love,happiness from Pixabay