How To Clean A TM And Not Die Trying

As we mentioned in a previous post, cleaning the translation memory (TM) of one of our largest and oldest clients involved three steps: running Xbench to detect inconsistencies, importing the resulting report into an Excel spreadsheet, and editing the TM in Olifant. The TM had thousands of segments, and Xbench reported more than sixteen thousand inconsistencies. The job took many hours, but we succeeded, and here we’ll tell you how.

With the Excel spreadsheet at hand and the TM open in Olifant, it was time to resolve the inconsistencies. The person who performed the task knows the client well, so in most cases he knew which option was best. Whenever doubts arose, he consulted the managers in charge of the project, who in turn consulted the client.

To apply changes to the memory, all that was needed was to run a “Find and Replace” (Ctrl + H) in Olifant, paste the “non-preferred” and “preferred” versions (just to give them a name, since there was not always an incorrect option) into the corresponding boxes, and make the appropriate replacements. Although there is an option to replace everything at once, we did not trust it in this case. Sometimes the segment that needs to be modified consists of two or three words that also appear elsewhere as part of a longer sentence, and replacing everything without analyzing each case can introduce errors.
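To illustrate why a blanket "replace all" is risky, here is a minimal Python sketch. It is not how Olifant works internally, and the function name `find_candidates` and the sample segments are made up for this example. It simply lists every segment that contains the phrase and flags whether the phrase is the whole segment or only part of a longer sentence, which is the kind of case we reviewed by hand.

```python
import re

def find_candidates(segments, phrase):
    """Return (index, segment, is_whole_segment) for every segment containing the phrase.

    Hypothetical helper for illustration only. Matching is case-sensitive
    and restricted to whole words.
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b")
    hits = []
    for index, segment in enumerate(segments):
        if pattern.search(segment):
            # True when the phrase is the entire segment, False when it is
            # embedded in a longer sentence and needs a closer look.
            hits.append((index, segment, segment.strip() == phrase))
    return hits

# Made-up sample segments.
segments = [
    "Hard disk",
    "Hard disk drives are sold separately.",
    "Memory card",
]

for index, segment, is_whole in find_candidates(segments, "Hard disk"):
    label = "whole segment" if is_whole else "part of a sentence, review first"
    print(index, repr(segment), label)
```

Only the first hit is a safe one-to-one replacement. The second would be changed too by "replace all", even though the preferred wording might not fit the longer sentence.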

Like most editing tools, Olifant also offers options to match case, match whole words and use regular expressions. This last option was really useful when reviewing figures. As the client works in the technology industry, many of its texts are full of figures: dimensions, quantities, capacities, etc.

At the same time, because of formatting, each figure is often enclosed in tags, which makes it difficult to tell which figures belong to the text and which are numbers that are part of a tag. To deal with this, we used regular expressions written with the help of our IT team.
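We can't reproduce the exact expressions here, but the general idea can be sketched in Python. This is an assumption-laden example: it supposes that tags are written in angle brackets, the tag names in the sample segment are invented, and `figures_outside_tags` is a hypothetical helper. The pattern matches either a complete tag or a number, and only the numbers found outside tags are kept.

```python
import re

# Either a complete tag (matched and then ignored) or a number, which is
# captured in group 1. Assumes tags are delimited by angle brackets.
TAG_OR_NUMBER = re.compile(r"<[^>]*>|(\d+(?:[.,]\d+)*)")

def figures_outside_tags(segment):
    """Return the figures in the text, skipping numbers inside tags."""
    return [
        match.group(1)
        for match in TAG_OR_NUMBER.finditer(segment)
        if match.group(1) is not None
    ]

# Invented sample: the ids 1 and 2 belong to tags, 500 and 7,200 to the text.
segment = '<g id="1">500</g> GB, <x id="2"/>7,200 rpm'
print(figures_outside_tags(segment))  # ['500', '7,200']
```

Because a tag is consumed as a whole before the number alternative is tried, the digits in `id="1"` and `id="2"` never show up in the result.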

And so, thanks to the collaboration between the PMs, the linguists, the IT department and the client, we managed to clean the TM, a task that, as we said before, should be done from time to time to keep obsolete segments from piling up.