Custom Machine Translation Development Process

To launch a new Custom Machine Translation Engine, Trusted Translations requires an initial training and setup period.

The following is a typical implementation process for building a new MT engine.

Customizing New MT Engines for Each Language Pair

The training data needed to build a high-quality engine depends heavily on the content domain, so samples of existing content are extremely useful for building relevant training data and, consequently, a high-quality engine for each language pair. There are several ways to gather training data for a customized engine:

  • Existing Translated Material:

    The ideal starting point for any custom machine translation engine is previously translated material whose content closely resembles the content to be translated. The more such material is available, the faster and less expensive the process will be.

  • Existing Monolingual Data:

    If a sufficient amount of source-language content exists, we can extract clean monolingual sentences and have our experts translate them. The resulting parallel data for each language pair provides the material from which to build and train a custom engine.

  • Creating a Specialized Corpus from Other Sources:

    In addition to using monolingual data, we will search the Web for material that is as closely aligned as possible with the content that will run through the engine. Because our engines are statistical, we need both parallel data (to learn how phrases translate) and monolingual data (to model fluent output in the target language). Initially, we will build systems that use the client-provided data side by side with the supplementary data mined from the Web, in order to demonstrate the effectiveness of this approach.

    The parallel data found on the Web must be cleaned (spell-checked, alignments verified, duplicates removed, etc.) before it can be used as training data for an MT system. This scenario requires considerably more manual work than one in which the client can deliver sufficient amounts of good-quality aligned data from the outset. Building the new engine will take 4 to 6 weeks.
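As a rough illustration of the cleaning step described above, the following Python sketch applies a few common filters to Web-mined sentence pairs: it drops pairs with an empty side, pairs whose source and target are identical (often untranslated text), pairs whose length ratio suggests a misalignment, and duplicates. The function name `clean_parallel` and the `max_ratio` threshold of 3.0 are illustrative assumptions, not Trusted Translations' actual pipeline, and spell-checking is left out.

```python
def clean_parallel(pairs, max_ratio=3.0):
    """Filter (source, target) sentence pairs with simple heuristics.

    Hypothetical helper for illustration; max_ratio is an assumed threshold.
    """
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        # Collapse runs of whitespace and trim both sides.
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        if not src or not tgt:
            continue  # drop pairs with an empty side
        if src == tgt:
            continue  # identical sides usually mean untranslated text
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_ratio:
            continue  # a large length mismatch suggests a bad alignment
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue  # drop case-insensitive duplicates
        seen.add(key)
        cleaned.append((src, tgt))
    return cleaned


pairs = [
    ("The engine is ready.", "El motor está listo."),
    ("The  engine is ready.", "El motor está listo."),  # duplicate after normalization
    ("Hello", ""),                                       # empty target
    ("OK", "OK"),                                        # untranslated
    ("Yes", "Sí, y además esta frase es demasiado larga para ser una traducción."),
]
print(clean_parallel(pairs))
```

Only the first pair survives. In practice, length-ratio and duplicate filters like these only catch the obvious problems; human review is still needed, which is why this scenario involves more manual work.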

As more output is post-edited, the post-edited text can be converted into high-quality training data, so the quality of the system improves quickly over time.
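A minimal sketch of turning post-edited output into new training pairs, assuming each job record holds the source segment, the raw MT output, and the linguist's post-edited version. The `PostEditRecord` layout and the `to_training_pairs` function are hypothetical. The sketch also computes an average character-level similarity between MT output and post-edit using Python's standard `difflib.SequenceMatcher`; a rising value is one possible, assumed way to track how much editing the engine still needs.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class PostEditRecord(NamedTuple):
    """Hypothetical record for one segment of a post-editing job."""
    source: str
    mt_output: str
    post_edited: str


def to_training_pairs(records):
    """Return (source, post_edited) pairs and the average MT/post-edit similarity."""
    pairs = []
    similarities = []
    for r in records:
        if not r.source.strip() or not r.post_edited.strip():
            continue  # skip incomplete segments
        # The human-approved target becomes new training data.
        pairs.append((r.source, r.post_edited))
        # 1.0 means the linguist left the MT output unchanged.
        similarities.append(SequenceMatcher(None, r.mt_output, r.post_edited).ratio())
    avg = sum(similarities) / len(similarities) if similarities else 0.0
    return pairs, avg


records = [
    PostEditRecord("Good morning", "Buen mañana", "Buenos días"),
    PostEditRecord("Thank you", "Gracias", "Gracias"),
]
pairs, avg = to_training_pairs(records)
print(pairs)  # [('Good morning', 'Buenos días'), ('Thank you', 'Gracias')]
```

Both segments become training pairs. The corrected first segment lowers the average similarity below 1.0, and as the engine improves, fewer edits are needed, so the value should move toward 1.0.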

Custom MT Engines Improve with Human Post-Editing

There are various workflows involving a Custom Machine Translation Engine. One common configuration integrates a human post-editing process. In this workflow, one of our expert linguists edits the output of the Custom Machine Translation Engine, both to improve the quality of the current output and to retrain the engine for future translations. As the editor corrects the output, the engine learns from those corrections, and the more translations flow through it, the better it becomes. Over time, the quality gap between full human translation and this solution will narrow greatly, while turnaround times and costs will be significantly lower. In our opinion, these engines will become an asset and a market differentiator for any Client that has such a need.