Slavs and Their Languages—Reconciling Genetics and Linguistic Findings

Sep 8, 2015

[Thanks to Martin Lewis for helpful discussions, for crafting the passages from our book cited below, and for excellent editorial comments on earlier drafts of this post.]


A recent article by a team of scholars led by Oleg Balanovsky, published in PLOS One and titled “Genetic Heritage of the Balto-Slavic Speaking Populations: A Synthesis of Autosomal, Mitochondrial and Y‑Chromosomal Data”, sheds new light on the emergence and dispersal of the peoples currently speaking Slavic (and Baltic) languages. While I will not comment on the methodological points of the article, I would like to reflect briefly on the conclusions (simply assuming their validity) and their place in the investigation of the peopling of Eurasia. (This is also an opportunity to remind local readers about our upcoming course at Stanford’s Continuing Studies program, “The Science of Understanding the Deep Human Past: Linguistics, Genetics, and Archaeology”; registration is now open and classes start on September 21. For more information, check here.)

Their main conclusions are summarized by the authors of “Genetic Heritage…” as follows:

“The Slavic branch of the Balto-Slavic sub-family of Indo-European languages underwent rapid divergence as a result of the spatial expansion of its speakers from Central-East Europe, in early medieval times. This expansion–mainly to East Europe and the northern Balkans–resulted in the incorporation of genetic components from numerous autochthonous populations into the Slavic gene pool.”


These findings accord well with existing historical and linguistic accounts, as discussed in our recent book, The Indo-European Controversy: Facts and Fallacies in Historical Linguistics. Drawing heavily on the scholarship of historian Peter Heather, we emphasize the ubiquity and importance of two key processes in the peopling of today’s Slavic-speaking zone: rapid migration (as opposed to slower demic diffusion) and language shift. Although the various chapters in the history of Slavic-speaking groups differ in their details,

“even the more dispersed episodes, such as that taking Slavic speakers westward to the Elbe River, cannot reasonably be modeled as a “wave-of-advance” (demic diffusion), as far too much ground (900 kilometers) was covered too quickly (150 years). And instead of advancing randomly, Heather [(2010: 422)] argues, these people seem to have sought out “wooded uplands rather than the more open plains”, owing to their swidden (“slash-and-burn”) agricultural orientation. Such a subsistence system, moreover, pre-adapted these early Slavic-speaking peoples to migration, a process facilitated as well by their relatively simple yet highly functional material culture.” [Pereltsvaig & Lewis 2015: 129]

But migration alone is not sufficient to explain the “extraordinary scale of Slavic expansion in this period, which encompassed most of central and eastern Europe” (Pereltsvaig & Lewis 2015: 129). Language shift by pre-existing populations into Slavic is the second key piece of the puzzle. A lengthy passage from our book bearing on this issue is worth quoting here:

“Most of the lands of central Europe occupied by the Slavs at this time were politically chaotic and partially depopulated, owing to the exodus of large numbers of Germanic-speaking peoples, including political leaders and the most effective military units. A similar loss of order occurred in the Balkans after the Eastern Roman Empire’s abandonment of the Danube frontier in 614 CE and the transferal of the elites to coastal enclaves and the imperial core. The Slavic groups that subsequently moved into these areas were generally quite small in scale, but they were nonetheless militarily capable, owing in part to enforced tutelage under the Avar Khanate. Such groups seem to have subjugated the disorganized remnants of the pre-existing populations, but then accepted them into their own societies after “conversion” to their Slavic language and lifeways. Owing to its relatively egalitarian nature, Slavic culture at the time had its own draw.”


The exact composition of the indigenous populations before the coming of the Slavs differed from area to area. The future West Slavs (and to some extent East Slavs) encountered mostly Germanic-speaking groups. (Future East Slavs must have also come into contact with speakers of Baltic, Finnic, and Turkic languages.) In contrast, future South Slavs incorporated pre-existing populations speaking Romance languages, Greek, Albanian, Illyrian, and other languages indigenous to the Balkans, depending on the particular area of the South Slavic-speaking zone. (As can be seen from the image reproduced on the left, the different input of various pre-Slavic groups is also reflected by genetic admixture, which the “Genetic Heritage…” article confirmed as well.)

Is there linguistic evidence of these early patterns of contact between Slavic invaders and subjugated pre-existing populations? Numerous lexical loans from the early period (e.g. Germanic-derived luk ‘onion’ in Russian and similar forms in other Slavic languages) provide some support for the early Germanic-Slavic contact hypothesis. However, given the nature of these early contacts, involving Slavic invaders and subjugated non-Slavic speakers, one would also expect to find traces of grammatical substrate influence of the pre-existing languages on Slavic languages. Potential Finnic substrate influences in Russian have been discussed in the literature (cf. Grenoble 2010, McAnallen 2011), yet I am not aware of any convincing evidence for early Germanic or other substrate influences in any Slavic language. Such grammatical “borrowings”, however, can easily be masked by subsequent developments and can also be difficult to distinguish from later influences. For example, McAnallen 2011 discusses a case of such grammatical “borrowing” from German into Czech (presumably also applicable to Polish), but this particular influence most likely dates to the late Middle Ages, not to the earlier period of Slavic expansion. Similarly, certain grammatical phenomena in South Slavic languages (typically referred to as “Balkanization”, more on which below) can be ascribed either to substrate influences of pre-existing languages or to contact with surviving languages that continued to co-exist with Slavic in the Balkans; pinpointing precisely when these contact-induced phenomena were established in South Slavic is difficult. Thus, the lack of conspicuous grammatical influences of pre-existing languages on Slavic that can be unequivocally ascribed to the early period is not all that surprising; in this case, the absence of evidence does not equate to evidence of absence.

The “Genetic Heritage…” article leads me to stress another important point, which is unfortunately ignored in many discussions of parallels between genetic and linguistic history: whenever recent genetic findings do not accord well with the (earlier) linguistic narrative, the tendency among geneticists and journalists alike is to deem the linguistic narrative incorrect or incomplete. Instead, it should be remembered that the two disciplines, genetics and linguistics, shed light on the history of different entities: people and languages, respectively. When the two stories do not coincide, it is not necessarily because one of them is flawed, but simply because they describe two different aspects of the world. While languages often spread with the dispersal of the peoples who speak them, they do not always travel with genes, as can be seen clearly from the examples of English, Spanish, Portuguese, and Russian. In addition to the physical descendants of the Anglo-Saxon invaders, Roman soldiers stationed in Iberia, and East Slavs from the Kievan Rus’, these languages are spoken today by millions of genetically unrelated individuals (and entire indigenous groups) found in such regions as rural Alaska, the Andes, the Amazonian rainforest, Australia, the Caribbean, and Siberia. The populations of these regions adopted English, Spanish, Portuguese, and Russian more through acculturation than through migration.

As mentioned above, language shift played an important role in the history of the Slavic languages. Moreover, language-internal processes led them to diversify in a way that does not accord perfectly with the genetic picture. Thus, the “Genetic Heritage…” article stresses the division between West and East Slavs, on the one hand, and South Slavs, on the other hand. Linguistically speaking, there are some reasons to group West and East Slavic languages together, in opposition to South Slavic languages. However, as discussed by Sussex & Cubberley (2006: 42), there are also good linguistic reasons to divide the Slavic-speaking world along a longitudinal rather than a latitudinal line. In particular, Sussex & Cubberley point out that the South Slavic languages can be divided into two subgroups: western, comprising Serbian, Croatian, and Slovenian, and eastern, consisting of Bulgarian and Macedonian. Moreover, they mention “some significant similarities between dialects of Slovenian and Croatian, the northernmost South Slavic languages, and Slovak, the southernmost of the West Slavic languages, especially in Central Slovak dialects” (p. 502). Thus, one can talk about a dialect continuum (or at least shared isoglosses) that extends across the West/South Slavic divide. In the east, the picture is murkier: Sussex & Cubberley allude to “parallels between … Russian and Bulgarian” (p. 42), yet the only parallels discussed in a later chapter of their book pertain to Bulgarian (and in some cases Macedonian) and northern dialects of Russian. Among such parallels, Sussex & Cubberley (2006: 524) list the “contraction of ‘vowel + j + vowel’ sequences” in some verbal and adjectival forms, resulting in either a long or a single/short vowel: for example, forms such as [starəjə] ‘old.FEM.SG.NOM’ in Contemporary Standard Russian correspond to [staraa] or [stara] in northern Russian dialects; comparable stará and stára are found in Bulgarian and Macedonian, respectively. Another similarity they discuss is the development of a post-posed definite article: compare kníga-ta ‘the book’ (cf. kníga ‘book’) in northern Russian dialects and kniga-ta ‘the book’ in Bulgarian. The lack of geographical contiguity between the eastern South Slavic languages and the northern Russian (East Slavic) dialects that exhibit such similarities suggests that these are parallel rather than shared developments.

In short, while some of the phenomena that distinguish western and eastern subgroups of South Slavic languages can be explained by two separate migratory streams of the future South Slavs “via both the west and east of the Carpathian Mountains” (Sussex & Cubberley 2006: 42), other similarities may be due to independent developments or patterns of later contacts once the group arrived in the Balkans. For example, Sussex & Cubberley (2006: 44) further suggest that the greater degree of “Balkanization” of Bulgarian and Macedonian is due to “the closer contacts of the eastern group [of South Slavic languages] with the Turks (the Bulgarians and Macedonians were under the domination of the Ottoman Empire from 1396 to 1897).” Thus, some of the discrepancy between a more clear-cut north-south genetic divide and the more complex linguistic picture can be accounted for by the difference in how far back the divisions emerged (pre- or post-migration), but most of the inconsistencies between the genetic and linguistic narratives are due to their differing objects of study. Parallel developments in eastern South Slavic and northern Russian dialects, in particular, show that the linguistic picture is not completely reducible to migration and contact, which typically have genetic reflexes, but is also shaped by language-internal developments that have no parallels in genetics.



