Researchers have developed an AI-enabled system to efficiently translate 2,500-year-old texts from Persia’s Achaemenid Empire.
Over 2,500 years ago, Persia’s Achaemenid Empire recorded its ‘paperwork’ on clay tablets. Archaeologists discovered tens of thousands of these tablets in Iran in 1933. Since then, researchers have painstakingly studied and translated these ancient documents by hand, but manual deciphering is difficult, slow and prone to errors.
Due to the three-dimensional nature of the tablets and the complexity of the cuneiform characters, computer-aided systems have so far been unable to transcribe or translate these documents. A breakthrough at the University of Chicago may finally automate the transcription of the tablets. The translation work to date has revealed rich information about Achaemenid history, society and language.
With a training set of more than 6,000 annotated images from the Persepolis Fortification Archive, the project, funded by the Center for Data and Computing, will build a model that can ‘read’ the unanalysed tablets in the collection.
Revolutionising archaeological and anthropological research
“If we could come up with a tool that is flexible and extensible, that can spread to different scripts and time periods, that would really be field-changing,” said Susanne Paulus, associate professor of Assyriology.
“From the computer vision perspective, it’s really interesting because these are the same challenges that we face. Computer vision over the last five years has improved so significantly; ten years ago, this would have been hand-wavy, we wouldn’t have gotten this far,” said Sanjay Krishnan of the Department of Computer Science.
“It’s a good machine learning problem, because the accuracy is objective here, we have a labelled training set and we understand the script pretty well and that helps us. It’s not a completely unknown problem.”
The training set is a result of more than 80 years of close study by the University of Chicago. Using this collection, researchers created a dictionary of the Elamite language inscribed on the tablets, and students learning how to decipher cuneiform built a database of more than 100,000 ‘hotspots,’ or identified individual signs.