In a recent post, I included information on when to use the ever more prevalent free online translation engines. (Yes, we do sometimes tell our clients to use free translation. Our responsibility is to our clients, and some texts simply do not call for professional work. See “Don’t pay for something you can get for free!”, May 2015.)
Readers came back with questions about machine translation (MT): What is it, exactly? Is it something new? Why doesn’t everyone use it? What is an MT engine?
All good questions!
The ultimate goal of MT is to build software that can move the meaning of a text from one language to another, and people have been trying to do this since the mid-1900s. The idea may have originated with Descartes, but the computer age finally gave it legs.
One of the first approaches was to compile huge amounts of data from dictionaries and grammar references in both the source and target languages. The computer was taught to match words across languages like a bilingual dictionary: word by word. This is called rule-based MT (RBMT), and it can work fairly well if the two languages are related and tend to use the same type of sentence structure, such as subject-verb-object. When sentence structure varies, rules of syntax become vitally important, and RBMT works less well.
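To make the word-by-word idea concrete, here is a minimal sketch in Python. The tiny English-to-Spanish lexicon and the function name are my own illustrations, not part of any real MT engine. It shows dictionary lookup working when the word order happens to match, and going wrong when it does not.

```python
# Toy English-to-Spanish lexicon; the entries are illustrative assumptions only.
LEXICON = {
    "the": "el",
    "cat": "gato",
    "drinks": "bebe",
    "milk": "leche",
    "black": "negro",
}

def translate_word_by_word(sentence, lexicon):
    """Replace each source word with its dictionary entry, keeping word order."""
    words = sentence.lower().split()
    # Unknown words are passed through unchanged so the gaps stay visible.
    return " ".join(lexicon.get(word, word) for word in words)

# Word order matches in both languages, so the result is acceptable Spanish.
same_order = translate_word_by_word("The cat drinks milk", LEXICON)

# Spanish normally puts this adjective after the noun ("el gato negro"),
# so pure lookup gets the order wrong.
different_order = translate_word_by_word("The black cat drinks milk", LEXICON)

print(same_order)
print(different_order)
```

The second sentence is where a rule of syntax (here, swapping adjective and noun) would have to step in, and every language pair needs many such rules.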
Human language is an intricate, layered thing. When I studied computational linguistics years ago, we designed a back-end processor, and I am still astonished at the hidden complexities that came to light in the paragraph we were given to analyze.
As computers became more powerful, MT programs evolved as well, gaining the capacity to handle more and more information. Statistical machine translation (SMT) arose, which relies on trusted bilingual corpora gleaned from previous translations. This method can be used with a fair amount of success on texts that are similar to those already in the corpus. Of course, the original translations that form each corpus must be reviewed, corrected, and updated carefully to avoid propagating errors.
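As a rough illustration of learning from a bilingual corpus, the Python sketch below guesses word pairings from four made-up sentence pairs. It uses a simple co-occurrence measure (the Dice coefficient), which is my own stand-in for this example. Real SMT systems rely on far more sophisticated alignment and language models, and all names and data here are hypothetical.

```python
from collections import Counter

# Made-up parallel corpus of (English, Spanish) segments, for illustration only.
PARALLEL_CORPUS = [
    ("the house", "la casa"),
    ("the green house", "la casa verde"),
    ("the door", "la puerta"),
    ("green door", "puerta verde"),
]

def dice_scores(corpus):
    """Score each (source, target) word pair by how often they appear together."""
    source_freq, target_freq, pair_freq = Counter(), Counter(), Counter()
    for source, target in corpus:
        source_words = set(source.split())
        target_words = set(target.split())
        source_freq.update(source_words)
        target_freq.update(target_words)
        for s in source_words:
            for t in target_words:
                pair_freq[(s, t)] += 1
    # Dice coefficient: 2 * joint count / (source count + target count).
    return {
        (s, t): 2 * n / (source_freq[s] + target_freq[t])
        for (s, t), n in pair_freq.items()
    }

def best_match(word, scores):
    """Return the highest-scoring target word, or None if the word is unseen."""
    candidates = {t: score for (s, t), score in scores.items() if s == word}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

scores = dice_scores(PARALLEL_CORPUS)
for word in ("the", "house", "green", "door"):
    print(word, "->", best_match(word, scores))
```

Note that the program was never given a dictionary. It also shows why corpus quality matters: a single wrong segment in such a small corpus would shift the scores and could change the guesses.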
SMT has grown to include computer analysis of the syntactic structures of each language, resulting in pattern-matching. This helps with both the first step in translation—decoding the source text—and the second—recoding the meaning into the target text.
Today, many companies use a hybrid of RBMT and SMT, often using the results of the latter to clean up the output of the former. These MT engines can be especially valuable when a company’s bilingual corpora have grown to thousands of matching segments and its writers are taught to use specific grammatical constructions and vocabulary.
Most MT output is post-edited by human translators, to a degree that depends on the quality required for the target text. This stage can be a nightmare if the MT engines are not carefully constructed and maintained.
Even the best MT can fall victim to ambiguity. A sentence that is clear in the mind of the author may need explanation before any translation can be done. A very simple example is “Everyone saw her duck.” Did the woman bend down quickly, or did she have a mallard with her? Context would obviously fix this particular example, but it’s difficult to teach a software program to recognize and resolve ambiguities.
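The duck sentence can be sketched in a few lines of Python to show what the software is up against. The mini-lexicon and the two accepted patterns below are simplifying assumptions of mine (for instance, I ignore that “saw” can also be a noun). The point is only that the program finds two equally valid readings and has nothing in the sentence itself to choose between them.

```python
from itertools import product

# Hypothetical mini-lexicon: the parts of speech each word may take.
POS_OPTIONS = {
    "everyone": ["PRONOUN"],
    "saw": ["VERB"],
    "her": ["POSSESSIVE", "PRONOUN"],
    "duck": ["NOUN", "VERB"],
}

# Tag sequences accepted by this toy grammar (an assumption for illustration).
VALID_PATTERNS = {
    ("PRONOUN", "VERB", "POSSESSIVE", "NOUN"),  # the duck belongs to her
    ("PRONOUN", "VERB", "PRONOUN", "VERB"),     # she bent down quickly
}

def possible_readings(sentence):
    """List every grammatical tag sequence the sentence could have."""
    words = sentence.lower().rstrip(".").split()
    options = [POS_OPTIONS[word] for word in words]
    # Try every combination of tags and keep the ones the grammar accepts.
    return [tags for tags in product(*options) if tags in VALID_PATTERNS]

readings = possible_readings("Everyone saw her duck.")
for tags in readings:
    print(tags)
```

Each reading calls for a different translation, so a choice has to be made, and making it requires the surrounding context that a human reader uses without noticing.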
Machines have taken over many positions previously filled by humans and will keep doing so. Our job is to evaluate what is gained and what is lost—using the old saying as an illustration, what is the bathwater, and what is the baby? We all use a computer keyboard, but we haven’t thrown away our pens and pencils. They’re still the best tool for certain situations, and sometimes they may be the only option available. MT is another example. Yes, it should be used—why wouldn’t we take advantage of years of R&D? But we shouldn’t throw out the skill of professional linguists, whose expertise is crucial and can in some situations, such as biomedical translations, literally be a matter of life and death.
Contact me if you have other questions about MT or if you’d like to know when you should use it!