Shared Innovations Are More Important Than Shared Retentions

May 24, 2014


One linguistic phenomenon that can cause numerous errors in constructing a phylogenetic tree using lexicostatistical methods is borrowing across languages, as discussed in earlier posts. In this post, we focus on another factor that likewise leads to a misshapen tree: using shared retentions rather than shared innovations as testimony for intermediate nodes on the tree. As correctly pointed out by Jaakko Häkkinen in his response to Bouckaert et al.’s article, “such retentions cannot reliably testify for an intermediary proto-language, because the highest retention rate can sometimes be found in the opposite ends of a language family”. To put it differently, an innovation in one language can make its sister and its cousin on the true divergence tree look more similar and hence more closely related than they actually are, as schematized in the chart on the left.


Below, I will exemplify this phenomenon with four Slavic languages: Russian, Belarusian, Ukrainian, and Polish. The first three are typically said to form the East Slavic branch, while the last belongs to the West Slavic branch (the third branch, South Slavic, will not be discussed here). As I show below, Russian is in several respects the innovating language in the East Slavic group, which makes the other two East Slavic languages, Belarusian and Ukrainian, appear more closely related to Polish than they are to Russian, as schematized on the left. It should be remembered, however, that the labels “innovating” and “conservative” apply to these languages only with respect to the specific phenomena under discussion; in many other respects, Belarusian, Ukrainian, or Polish may be the “innovating” language, and Russian the more conservative tongue.


Let’s first consider the lexicon, as it is the domain that Bouckaert et al. rely on.* The table on the left lists the names of the twelve calendar months in the four languages. For ease of presentation, the cognates shared by Polish with Belarusian or Ukrainian (or both) are highlighted by different colors; the cognates shared by Polish with Russian (and in one instance, with Belarusian) are given in boldface. Based on these data alone, one would be fully justified in classifying Polish as most closely related to Belarusian and (somewhat less closely perhaps) to Ukrainian. Russian, by contrast, would be seen as a more distant cousin. As it so happens, this is exactly the tree that Bouckaert et al. provide in their paper (Supplementary Materials, Figure S1); the relevant detail of their tree is reproduced below:


However, this view results from an incorrect interpretation of the data. Rather than being testimony for a closer link of Belarusian/Ukrainian to Polish than to Russian, these data result from the fact that Russian adopted the month names of the Julian calendar, while the other three languages generally retained the original Slavic terms. (The Julian term for ‘May’ has intruded into the otherwise non-Julian system of Belarusian and Polish; and Polish marzec ‘March’ is also Julian in origin.) As discussed by Sussex and Cubberley (2006: 476), the earlier Slavic names for months “show etymologies … reflecting various aspects of flora, fauna, climate and activity”. For example, the term for ‘February’ derives from ‘bitter, fierce’ in reference to the typically cold weather of the month. The term for ‘July’ comes from ‘linden tree’; interestingly, Russian has the word lipa for ‘linden tree’ but it does not preserve the month name based on it. Likewise, ‘September’ is the ‘heather’-month, while ‘November’ is the ‘leaf-falling’ month. The other month names, those that are not shared among Belarusian, Ukrainian, and Polish, may come from different roots, but they too have weather- or activity-describing etymologies: for example, the name for ‘August’ in Ukrainian and Polish comes from the word for ‘sickle’ (cf. Russian serp ‘sickle’), while in Belarusian it derives from the root for ‘reaping’. Similarly, the names for ‘October’ in Belarusian and Polish derive from two different words for ‘flax’, while the Ukrainian term comes from the root for ‘yellow’. Crucially for our argument, the shared cognates across Belarusian, Ukrainian, and Polish are shared retentions, not shared innovations; lexicostatistical methods often (and mistakenly!) take this sort of data to be evidence of common descent.
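To make the arithmetic behind this pitfall concrete, here is a minimal sketch of the kind of cognate counting that similarity-based methods rest on. It covers only the seven months whose etymologies are spelled out above; the class labels are my own shorthand, the label "grass" for Ukrainian ‘May’ is an added assumption (the text says only that Ukrainian did not adopt the Julian term), and the function names are hypothetical. It is not Bouckaert et al.’s actual procedure.

```python
from itertools import combinations

LANGS = ("Russian", "Belarusian", "Ukrainian", "Polish")

# Etymological class of each month name, in LANGS order. Labels are shorthand
# for the roots named in the text; "grass" (Ukrainian 'May') is an assumption.
MONTHS = {
    "February":  ("julian", "fierce", "fierce", "fierce"),
    "May":       ("julian", "julian", "grass", "julian"),
    "July":      ("julian", "linden", "linden", "linden"),
    "August":    ("julian", "reap", "sickle", "sickle"),
    "September": ("julian", "heather", "heather", "heather"),
    "October":   ("julian", "flax-1", "yellow", "flax-2"),
    "November":  ("julian", "leaf-fall", "leaf-fall", "leaf-fall"),
}

def shared_cognates(lang_a, lang_b):
    """Count the months for which two languages fall in the same class."""
    i, j = LANGS.index(lang_a), LANGS.index(lang_b)
    return sum(1 for classes in MONTHS.values() if classes[i] == classes[j])

def similarity_table():
    """Shared-cognate counts for every pair of languages."""
    return {(a, b): shared_cognates(a, b) for a, b in combinations(LANGS, 2)}

if __name__ == "__main__":
    ranked = sorted(similarity_table().items(), key=lambda kv: -kv[1])
    for (a, b), n in ranked:
        print(f"{a:<10} {b:<10} {n}/{len(MONTHS)}")
```

On this subset, Polish shares five of seven classes with Belarusian and five with Ukrainian, while Russian shares at most one with any of the others. A method that reads raw similarity as relatedness would therefore split Russian off first, even though every match outside the "julian" class is a retention of an inherited Slavic term.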

Instead, the more typical (and much better supported) classification of the four languages in question groups Belarusian and Ukrainian together with Russian rather than with Polish. Numerous phonological, morphological, and even syntactic phenomena support this classification (an interested reader is referred to Sussex & Cubberley 2006 for a more detailed discussion); here, I will mention only two phonological phenomena that unify the three East Slavic languages (Russian, Belarusian, and Ukrainian) in contrast to the West Slavic languages such as Polish (and often in contrast to the South Slavic languages as well). The first such phenomenon is the so-called pleophony. As a result of a complex series of changes, East Slavic languages ended up with sequences -oro- and -olo- (in roots of words), whereas West Slavic languages have corresponding -ro- and -lo-. Compare, for example, the Russian korova ‘cow’ and zoloto ‘gold’ to Polish krowa and złoto. Importantly, Ukrainian and Belarusian follow the Russian pleophony pattern: for example, ‘cow’ in Ukrainian is korova and in Belarusian karova; ‘gold’ in Ukrainian is zoloto and in Belarusian zolata (generally, Ukrainian does not reduce vowels the same way Russian does, while Belarusian does reduce vowels and reflects the vowel reduction in spelling as well; as a result, Russian words are spelled like the Ukrainian ones, while their pronunciation is closer to their Belarusian counterparts).
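The pleophony contrast can be checked mechanically on the transliterated forms cited above. The sketch below is only an illustration: the regular expression is a crude orthographic heuristic of my own (consonant, then o or a, then r or l, then o or a, with a included to cover the Belarusian spellings), not a test that historical linguists actually use, and the function names are hypothetical.

```python
import re

# Consonant + o/a + r/l + o/a: a rough orthographic stand-in for the
# pleophonic sequences -oro- and -olo- (assumed heuristic, not a real test).
PLEOPHONY = re.compile(r"[^aeiouy][oa][rl][oa]")

# Forms exactly as cited in the text.
FORMS = {
    "cow": {"Russian": "korova", "Ukrainian": "korova",
            "Belarusian": "karova", "Polish": "krowa"},
    "gold": {"Russian": "zoloto", "Ukrainian": "zoloto",
             "Belarusian": "zolata", "Polish": "złoto"},
}

def has_pleophony(word):
    """True if the word contains a pleophony-like sequence."""
    return PLEOPHONY.search(word) is not None

def pleophonic_languages(forms):
    """Languages whose forms show the pattern in every listed word."""
    langs = next(iter(forms.values())).keys()
    return {lang for lang in langs
            if all(has_pleophony(by_lang[lang]) for by_lang in forms.values())}

if __name__ == "__main__":
    print(sorted(pleophonic_languages(FORMS)))
```

With these two words the three East Slavic languages come out as one group and Polish stands apart, which is the grouping that the month names obscure.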

Another phonological pattern that groups the three East Slavic languages in contrast to Polish (and other West Slavic languages) is the treatment of nasal vowels inherited from Proto-Slavic: in the East Slavic languages these vowels have lost their nasal qualities, whereas Polish has retained nasal vowels. The back nasal vowels, essentially the short and long nasal o-sounds, have been replaced in East Slavic by /u/, as in ruka ‘hand’ and zub ‘tooth’ (shared by all three East Slavic languages). In contrast, in Polish these have become the nasal e- and a-sounds, marked in Polish orthography by the hooks under the corresponding vowel letters, as in ręka (pronounced /renka/) and ząb (pronounced /zamp/). Similarly, the short and long nasal e-sounds have turned into /a/ in East Slavic, as in p’at’ ‘five’ and r’ad ‘row’ (subsequently, in Belarusian the “soft” r-sound has been turned “hard”, as in rad ‘row’). The corresponding forms in Polish feature nasal e- and a-sounds, as in pięć ‘five’ (pronounced /pjenč/) and rząd (pronounced /žand/). Once again, Belarusian and Ukrainian pattern with Russian rather than with Polish.

To return to vocabulary issues, the month names discussed above are not the only area where Russian deviates from Belarusian or Ukrainian. Generally speaking, Russian has borrowed more heavily than its East Slavic brethren from Finnic-speaking neighbors to the north, Turkic- and Iranian-speaking neighbors to the east and south, as well as from Western European languages. A significant proportion of its lexicon is also constituted by words borrowed from Old Church Slavonic (OCS). Despite the various movements in favor of the vernacular, these words, often originally belonging to the higher registers, remained in the language and can often be identified by their phonological characteristics, particularly where they exhibit combinations not found in modern Russian. Belarusian and Ukrainian, unlike Russian, have gone further towards adapting these words to native phonological patterns, to the extent that they have them at all. Such phonological traits that reveal the OCS origin of certain Russian words include the lack of pleophony (discussed above), as well as the sequence /ra/ instead of the more common East Slavic /ro/, both of which are exemplified by the Russian nagrada ‘reward’ vs. the Ukrainian nahoroda. Other OCS-isms in Russian are the appearance of /žd/ and /šč/, as in odežda ‘clothes’ and osveščenie ‘illumination’ vs. the corresponding Ukrainian odeža and osvičennja. Another notable Old Church Slavonic feature in Russian is the use of the verbal prefix {iz-} for {vy-}, as in the OCS borrowing izgonjat’ ‘exile’, contrasting both with the more colloquial, native Russian vygonjat’ ‘hoot away’ and with the Ukrainian vyhanjaty ‘exile’.

These examples of lexical innovation in Russian, whether concerning the borrowing of the Julian month names or the retained OCS lexis, exemplify another important drawback of lexicostatistical methods: the lexical level alone is not very reliable in determining language relatedness. As Häkkinen puts it, “a word could equally well be a later loanword than an inherited word”. The main reason for this unreliability of the lexical level is that “a sound change can be seen in numerous words, while words are single, separate units. A word appears, disappears or gets replaced independently from all other words, but sound change affects the whole vocabulary”. As scientists, we linguists look for systematic phenomena, hence our preference for grammatical (read, “phonological, morphological, or syntactic”) patterns over idiosyncratic words.



*The pattern schematized in the diagrams above arises not only with respect to lexical innovations: phonological, morphological, and syntactic changes too may make a sister of an innovating language appear more similar to its cousin. For example, the application of the First Germanic Sound Shift (also known as Grimm’s Law) in Proto-Germanic makes Latin and Irish appear more closely related to Russian, Lithuanian, and Sanskrit than they are to Germanic languages such as English, Dutch, and Icelandic. Thus, Latin, Irish, Russian, Lithuanian, and Sanskrit all retain the PIE /d/, as in the words for ‘ten’ decem (Latin), deich (Irish), desjat’ (Russian), dešimt (Lithuanian), and daśan (Sanskrit) vs. the innovation /t/ in English ten, Dutch tien, and Icelandic tíu. The correct phylogenetic tree would have Romance (Latin) and Celtic (Irish) languages grouped with Germanic rather than Balto-Slavic (Russian, Lithuanian) or Indo-Iranian (Sanskrit). Similarly, the complete loss of the nominative-accusative distinction in English (except with pronouns) makes Germanic languages retaining this morphological distinction with nouns or articles, such as Icelandic (hattur vs. hatt ‘hat’) and German (der Tisch vs. den Tisch ‘the table’), more similar to each other than either is to English. Again, the correct phylogenetic tree has English more closely related to German than to Icelandic.



Häkkinen, Jaakko (2012) “Problems in the method and interpretations of the computational phylogenetics based on linguistic data: An example of wishful thinking: Bouckaert et al. 2012”.

Sussex, Roland and Paul Cubberley (2006) The Slavic Languages. Cambridge University Press.


