In our last blog post, “Big Translation Obstacles”, we showed that a Machine Translation (MT) system with customized vocabulary can be more effective than a general system. But how do we produce such a customized system?
The main ingredient is a set of high-quality bilingual sentence pairs from the specific domain, such as pairs drawn from comparable patents. These can then be used to refine a generic MT system using readily available tools like AutoML.
Chilin offers high-quality, domain-specific, bilingual sentence pairs that are ideal for this purpose. Chilin’s sentence pairs are derived from global patent documents and offer much richer content than can be found by scraping the web. More information on Chilin’s data will soon be available on the Chilin blog.
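As a rough illustration of the preparation step, the sketch below cleans a list of bilingual sentence pairs and writes them as tab-separated lines, one pair per line, which is a common input layout for MT customization tools. It is only a sketch under stated assumptions: the function names `clean_pairs` and `write_tsv` and the sample sentences are hypothetical, this is not Chilin's pipeline, and the exact file format your tool expects should be checked against its documentation.

```python
import io


def clean_pairs(pairs):
    """Normalize whitespace and drop empty or duplicate sentence pairs.

    Collapsing whitespace also removes tabs and newlines, which would
    otherwise break a one-pair-per-line, tab-separated file.
    """
    seen = set()
    cleaned = []
    for source, target in pairs:
        source = " ".join(source.split())
        target = " ".join(target.split())
        if not source or not target:
            continue  # a pair with a missing side is useless for training
        if (source, target) in seen:
            continue  # exact duplicates add no new information
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned


def write_tsv(pairs, handle):
    """Write each pair as 'source<TAB>target' on its own line."""
    for source, target in pairs:
        handle.write(f"{source}\t{target}\n")


# Hypothetical sample data, for illustration only.
raw_pairs = [
    ("A  method\tfor preparing a compound", "一种制备化合物的方法"),
    ("", "空"),
    ("A method for preparing a compound", "一种制备化合物的方法"),
]

buffer = io.StringIO()
write_tsv(clean_pairs(raw_pairs), buffer)
tsv_text = buffer.getvalue()
```

In practice the `io.StringIO` buffer would be replaced by a file opened with `encoding="utf-8"`, and the resulting file uploaded to whichever customization tool is being used.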
Bilingual Technical Terms for Post-Editing
A good machine translation gets the translator off to a good start, but some editing will often be required. One common problem is that technical terms in English are often represented by multiple Chinese characters in Chinese. These are called Multi Word Expressions (MWEs). Another problem is that translations of technical terms are often not unique: different MWEs may be in common use for the same term. These alternative translations are called Multiple Renditions. Chilin has built an extensive corpus of bilingual technical terms including MWEs, Multiple Renditions, frequency of use data, and examples.
Here is an example taken from Chilin’s PatentLex post-editing tool. “phenylacetic acid” is an MWE with multiple renditions. The translator can see all of the multiple renditions and their frequency of use.
How does the translator select the correct option? When the translator clicks on the MWE, PatentLex shows examples of how the term is used in a related context.
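To make the idea of multiple renditions with frequency data concrete, here is a minimal sketch of how one term entry might be modeled and ranked. Everything in it is an assumption for illustration: the class names `Rendition` and `TermEntry`, the two Chinese renditions chosen for “phenylacetic acid”, and especially the counts, which are placeholders and not figures from Chilin’s corpus or from PatentLex.

```python
from dataclasses import dataclass, field


@dataclass
class Rendition:
    text: str   # one candidate translation of the term
    count: int  # how often this rendition was observed (placeholder values below)
    examples: list = field(default_factory=list)  # context sentences, left empty here


@dataclass
class TermEntry:
    source: str
    renditions: list

    def ranked(self):
        """Return (rendition, relative frequency) pairs, most frequent first."""
        total = sum(r.count for r in self.renditions)
        ordered = sorted(self.renditions, key=lambda r: r.count, reverse=True)
        return [(r.text, r.count / total if total else 0.0) for r in ordered]


# Illustrative entry; the counts are made up for the example.
entry = TermEntry(
    source="phenylacetic acid",
    renditions=[
        Rendition(text="苯基乙酸", count=25),
        Rendition(text="苯乙酸", count=75),
    ],
)

ranking = entry.ranked()
```

A post-editing tool could display such a ranking next to the source term and attach the `examples` list to each rendition, so the translator can compare usage in context before choosing.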
We will be discussing Chilin’s corpus and PatentLex product in future blog posts.
For cost-effective translation of Chinese and English technical documents, Chilin is your best answer.
Please contact us if you have any questions.