Does Google Translate Output Accord with Reality?—And Remarks on the Morphosyntax of Russian Nominalizations

Aug 17, 2015

I have written extensively about problems with Google Translate and its many bloopers, both in lexical choices and in producing grammatically cohesive output. It is thus ironic that Google Translate’s “translation” of the Russian sootvetstvuet dejstvitel’nosti ‘accords with reality’ is… “untrue” (see screenshot on the left). Still, it is surprising to find Google Translate output that is exactly the opposite of what it should be, considering that Google Translate looks for matches of a given string in a corpus of parallel texts. I cannot imagine a context in which sootvetstvuet dejstvitel’nosti ‘accords with reality’ means “untrue” (or would be translated that way by a human translator).


Somewhat more understandable are Google Translate’s errors where the grammatical structure is “translated” as exactly the opposite of what it should be. A case in point is an example from the paper I am currently working on, which I threw at Google Translate to see how it would manage. As you can see from the screenshot on the left, Google Translate rendered obožanie Putina narodom ‘adoration of Putin by the people’ (an example found in the National Corpus of Russian) as “Putin’s adoration of the people”: so ironic on many levels! (Exact phrase searches in Google bring up two hits for the Russian phrase, but unsurprisingly none for the English phrase that is Google Translate’s output.)

Had Google Translate been relying on an analysis of the grammatical structure, reversing who is doing it to whom might have been more understandable, because the morphosyntax of Russian nominalizations (i.e. noun phrases that describe events rather than a “person, place or thing”) is very complex. Considering only transitive nominalizations, that is, those that have both an Agent (the one who is doing it, corresponding to the subject in a sentence) and an Object (that to which the action is done, corresponding to the object in a sentence), we find that both of these elements can be marked by a variety of morphological cases.* Thus, the Object can be genitive, instrumental, or dative, and the Agent can be either genitive or instrumental. (Of the six traditionally distinguished Russian cases, nominative and accusative are reserved for clauses/sentences, and prepositional/locative for complements of prepositions.) Moreover, nothing prohibits the Agent from being in the same morphological case as the Object, which challenges the commonly held view that the purpose of case is to distinguish grammatical functions such as subject and object. Of the six possible combinations of two options for Agent case and three options for Object case, five are attested. The following table shows all six combinations, with examples of the attested ones (all examples in the table are either from a corpus or from elicitation tasks with native speakers):**

Agent          Object         Examples
Instrumental   Genitive       razrušenie goroda vragom ‘destruction city.GEN enemy.INSTR’
Instrumental   Genitive       kasanie igrokom setki ‘touching player.INSTR net.GEN’
Genitive       Genitive       bojazn’ ženščin krys ‘fear women.GEN rats.GEN’
Instrumental   Instrumental   upravlenie kuxarkoj gosudarstvom ‘managing cook.INSTR state.INSTR’
Genitive       Instrumental   torgovlja angličan opiumom ‘trade Englishmen.GEN opium.INSTR’
Instrumental   Dative         n/a
Genitive       Dative         udivlenie sportsmena uspexu ‘surprise athlete.GEN success.DAT’
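
For readers who like to see the arithmetic spelled out, here is a minimal sketch that enumerates the two-by-three grid of case combinations and picks out the gap. The names (AGENT_CASES, OBJECT_CASES, ATTESTED) and the case abbreviations are just illustrative labels of mine; the attested pairs are copied from the table above.

```python
from itertools import product

# Case options for each role, as listed in the text (labels are illustrative).
AGENT_CASES = ["INSTR", "GEN"]
OBJECT_CASES = ["GEN", "INSTR", "DAT"]

# Attested (Agent case, Object case) pairs, copied from the table above.
ATTESTED = {
    ("INSTR", "GEN"),
    ("GEN", "GEN"),
    ("INSTR", "INSTR"),
    ("GEN", "INSTR"),
    ("GEN", "DAT"),
}

# Every logically possible pairing of an Agent case with an Object case.
all_pairs = list(product(AGENT_CASES, OBJECT_CASES))

# Pairings for which the table gives no example.
unattested = [pair for pair in all_pairs if pair not in ATTESTED]

print(len(all_pairs), len(ATTESTED), unattested)
# prints: 6 5 [('INSTR', 'DAT')]
```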

To add to the morphosyntactic complexity of these phrases, the word order is relatively free (i.e. it is subject to considerations such as weight, nominal/pronominal nature of arguments, and old/new information). Specifically, the Agent can either precede or follow the Object:

(a)          razrušenie goroda vragom ‘destruction city.GEN enemy.INSTR’ — preferred if the city is “old information” and the enemy is “new information”

(b)          razrušenie vragom goroda ‘destruction enemy.INSTR city.GEN’ — preferred if the enemy is “old information” and the city is “new information”

Thus, in most cases neither the morphological case nor the word order tells us unequivocally which is the Agent and which is the Object. How can one tell then who did it to whom? In some instances, no ambiguity arises because of the nature of the entities involved. For instance, in the examples below it is clear that the professor solves the problem and not vice versa, or that the Cherepanovs invented a locomotive and not vice versa.

(c) rešenie   ètoj zadači          professorom Pupkinym
    solution  [this problem].GEN   [professor Pupkin].INSTR
    ‘Professor Pupkin’s solving of this problem’

(d) izobretenie  parovoza        otcom i synom Čerepanovymi
    inventing    locomotive.GEN  [father and son Cherepanov].INSTR
    ‘father and son Cherepanov’s inventing a locomotive’

Another clue comes from case, albeit in an indirect way: although the surface forms look exactly the same, not all genitives are the same and not all instrumentals are the same. Let’s start with the genitive. Remembering that the order of Agent and Object can be flipped and is thus not important, consider:

(e)          kasanie igrokom setki ‘touching player.INSTR net.GEN’

(f)           razrušenie goroda vragom ‘destruction city.GEN enemy.INSTR’

Both of these examples have an Object in the genitive and the Agent in the instrumental. But despite appearances, the two genitives are of different nature: in (e) the genitive is an instance of the so-called “lexical case” and in (f) it is an instance of “structural case”. If we turn the nominalizations into sentences, the two genitives behave differently: the lexical genitive in (e) is preserved, but the structural genitive in (f) is replaced by accusative:

(g)          Igrok kasaetsja setki ‘player.NOM is touching net.GEN’

(h)          Vrag razrušaet gorod ‘enemy.NOM is destroying city.ACC’

Lexical case is a property of a specific verb (or more precisely, of a verbal root): ‘touch’ in Russian takes a genitive Object, ‘manage’ takes an instrumental Object, and so on. Knowing what lexical case (if any) a given verb takes is part of knowing that verb. In contrast, structural cases are a kind of default: when the Object is not marked by lexical case, the Object becomes genitive and the Agent instrumental. Thus, in a nominalization with one instrumental and one genitive, either one of the cases may be lexical (see the ‘touching’ and ‘trade’ examples in the table). Alternatively, both cases can be structural, in which case the Object is always genitive and the Agent instrumental.
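
The lexical/structural split can be sketched as a lookup with a fallback. This is only an illustration under my own assumptions: the dictionary LEXICAL_OBJECT_CASE and the function object_case are hypothetical names, the English glosses stand in for the Russian verbs, and the tiny lexicon lists only verbs mentioned in this post.

```python
# Lexical Object case per verb, using the examples from this post.
# None means the verb assigns no lexical case (hypothetical mini-lexicon).
LEXICAL_OBJECT_CASE = {
    "touch": "GEN",
    "fear": "GEN",
    "manage": "INSTR",
    "trade": "INSTR",
    "destroy": None,
}

def object_case(verb, context):
    """Return the Object's case in a 'clause' or a 'nominalization'."""
    lexical = LEXICAL_OBJECT_CASE[verb]
    if lexical is not None:
        # Lexical case is preserved in both environments.
        return lexical
    # Structural case is the default for the environment:
    # accusative in a clause, genitive in a nominalization.
    return "ACC" if context == "clause" else "GEN"

print(object_case("touch", "nominalization"), object_case("touch", "clause"))
# prints: GEN GEN
print(object_case("destroy", "nominalization"), object_case("destroy", "clause"))
# prints: GEN ACC
```

The two genitives look identical in the nominalization and come apart only in the clause, which is exactly the diagnostic used in the examples above.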

Nominalizations with two genitives or two instrumentals arise when the underlying verbal stem takes an Object in the corresponding case (e.g. in the table above ‘fear’ takes a genitive Object and ‘manage’ takes an instrumental one). Unless our general knowledge allows us to disambiguate such structures, they can be genuinely ambiguous in isolation (though context typically disambiguates them): do women fear rats or do rats fear women? Do cooks manage the state or does the state manage the cooks? (The latter example is a paraphrase of a famous quote from Vladimir Lenin that “even a cook can manage a state”, so most speakers would jump to that interpretation, but other examples such as upravlenie kafedroj docentami ‘managing department.INSTR lecturers.INSTR’ can be understood both ways: as the lecturers directing the department or the department directing the lecturers.)

Going back to the original example that proved to be problematic for Google Translate, the verb obožat’ ‘adore’ does not take a lexically case-marked Object, so in the corresponding sentence the object is in the (default) accusative: Narod obožaet ètot fil’m ‘people.NOM adores this film.ACC’. Hence, the only pattern we can expect in the nominalization is the one with an instrumental Agent and a genitive Object. Thus, obožanie Putina narodom ‘adoration Putin.GEN people.INSTR’ can only be understood as the people’s adoration of Putin, and not the reverse, as Google Translate’s output suggests. (Last but not least, note that “adoration” can be translated into Russian in two different ways, poklonenie and obožanie; when speaking of the Magi in the Biblical context, for example, the former is the correct lexical choice.)
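
The case-based reasoning in this post can be sketched as a small role resolver. It is a simplification under stated assumptions, not a claim about how any translation system works: the function readings and the set ATTESTED are hypothetical names, the attested pairs are copied from the table above, and I assume that with a lexically case-marking verb the Agent may be genitive or instrumental wherever the table attests that pairing. World knowledge and word order are deliberately ignored.

```python
from itertools import permutations

# Attested (Agent case, Object case) pairs, copied from the table above.
ATTESTED = {
    ("INSTR", "GEN"),
    ("GEN", "GEN"),
    ("INSTR", "INSTR"),
    ("GEN", "INSTR"),
    ("GEN", "DAT"),
}

def readings(args, lexical_case=None):
    """Return all (agent, object) readings compatible with the case marking.

    args: two (word, case) pairs, in either order (word order is free).
    lexical_case: case the verb assigns to its Object, or None if it assigns none.
    """
    # Without lexical case, the Object takes the structural genitive.
    expected_object_case = lexical_case or "GEN"
    result = []
    for (agent, agent_case), (obj, obj_case) in permutations(args, 2):
        if obj_case != expected_object_case:
            continue  # this word cannot be the Object
        if (agent_case, obj_case) not in ATTESTED:
            continue  # unattested case combination
        if lexical_case is None and agent_case != "INSTR":
            continue  # with two structural cases, the Agent must be instrumental
        result.append((agent, obj))
    return result

# 'adore' assigns no lexical case: only one reading survives.
print(readings([("Putina", "GEN"), ("narodom", "INSTR")]))
# prints: [('narodom', 'Putina')]

# 'fear' assigns lexical genitive: both readings survive.
print(readings([("ženščin", "GEN"), ("krys", "GEN")], lexical_case="GEN"))
# prints: [('ženščin', 'krys'), ('krys', 'ženščin')]
```

A single reading means that case marking alone settles who did it to whom; two readings mean that only context or general knowledge can decide.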


*Linguists typically contrast Agents with Themes (both called “thematic roles”) and Subjects with Objects (both called “grammatical functions”). The opposition Agent-Object is commonly used in discussions of ergativity, however, and since the morphosyntax of Russian nominalizations resembles an ergative system (a point I am exploring in my current paper), I use this opposition as well.

** Some speakers do not accept some of these examples as grammatical; in other words, there is variation across speakers as to which of these patterns are possible. The exact nature of this variation is, as of yet, unknown. More research is needed…

