Evaluating SymEval

SymEval is a Python tool for measuring the effort invested in the Post-Editing (PE) process. It reports the edit distance between segments of two different translations, computed with the Levenshtein algorithm.
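The edit distance in question can be sketched with the textbook dynamic-programming version of Levenshtein (this is the general algorithm, not SymEval's own code):

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

The more a Post-Editor changes the machine output, the larger this distance becomes for each segment pair.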

To test SymEval we first prepared four TMX files. The first contained the source segments together with the output of a machine translation engine.

The other three files contained Post-Editing jobs done by different applicants: one who failed, one who barely passed, and one who did well. All four files shared the same English source, and the target language used for this test was Spanish.

In terms of numeric results, SymEval provides an overall statistic called the Project Score, which can be read as the percentage of the machine translation that was kept. Two almost identical files will have a score near 100.
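SymEval derives this number from Levenshtein distances; as a rough, self-contained illustration of the idea (explicitly not SymEval's actual formula), a similarity ratio from the standard library scaled to 100 behaves the same way at the extremes:

```python
import difflib

def project_score(mt_target: str, pe_target: str) -> float:
    """Hypothetical reconstruction of a Project-Score-like number:
    100 means the post-edited target is identical to the MT output,
    0 means nothing of the MT output survived."""
    return 100 * difflib.SequenceMatcher(None, mt_target, pe_target).ratio()

print(project_score("Empleo General", "Empleo General"))  # 100.0
```

A heavily rewritten segment would score much lower, which is why near-identical files land near 100.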

The Project Scores for these three files, each of them compared to the machine translation, were as follows:

          result_bad:     60.07
          result_barely:  72.00
          result_good:    58.98

At first we expected the numbers to correlate with the quality of the work, that is, the scores increasing (or decreasing) with the quality of the Post-Editing. But this is not necessarily the case. Thinking a bit about the logic used by the tool, it becomes clear that it cannot help judge either the quality of the machine translation or that of the Post-Editing, since the measurements are purely quantitative.

Even this quantitative measure is not applied uniformly, which makes it somewhat unreliable. For example, two Post-Editors changed (correctly) the machine translation of General Employment from “General de Empleo” to “Empleo General”. In one file this was reported as a change of 0.712, while in the other it was counted as only 0.4.
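One speculative explanation (we have not confirmed this against SymEval's code) is that the reported change is an edit distance normalized by segment length, in which case the very same phrase correction yields different numbers depending on how long the segment around it is. The long-segment strings below are hypothetical examples invented for this sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook Levenshtein edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def change_ratio(mt: str, pe: str) -> float:
    # One plausible normalization: divide by the longer segment's length.
    return levenshtein(mt, pe) / max(len(mt), len(pe))

# The same correction, alone and embedded in a longer (made-up) segment:
short_mt, short_pe = "General de Empleo", "Empleo General"
long_mt = "Departamento General de Empleo del estado"
long_pe = "Departamento Empleo General del estado"

print(change_ratio(short_mt, short_pe))  # larger ratio
print(change_ratio(long_mt, long_pe))    # smaller ratio, same correction
```

Under such a scheme the identical edit is "worth" less in a longer segment, which could produce discrepancies like 0.712 versus 0.4.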

In terms of usability the tool has some problems. All the inputs have to be exactly the way the system expects them, otherwise one is greeted with error messages of the style “check your input”.

The help page says that the Eval tool only supports XLIFF files, but it actually supports TMX as well. When the Eval tool is used with TMX files it produces an XML file highlighting the differences between the two target versions.

Trying to use the Diff tool with the test_suit.txt files generated by Eval, we got the error message “Status: No match has been found”. Reading the description of what the Diff tool does, it appears it should be used only when the evaluation was done with monolingual files.

Overall, it is good to have a tool like this available in the public domain, but it needs a bit of technical skill and patience to set it up properly and take full advantage of it.