Is “massive migration from the steppe … a source for Indo-European languages in Europe”?

Feb 18, 2015 by

[I am deeply grateful to Martin W. Lewis for the inspiring discussions of, and extensive collaboration on, the issues examined here, as well as for editing the draft of this post.]

The preprint server bioRxiv has recently posted an article titled “Massive migration from the steppe is a source for Indo-European languages in Europe”, which of course caught my attention. (The entire article can be read here.) The 39-member research team (first author: Wolfgang Haak) includes David Anthony, whose views on the Proto-Indo-European (PIE) homeland debate are quite well-known (see his 2007 book The Horse, the Wheel, and Language), so predictably the title of the article is an unequivocal assertion rather than a question. Yet the abstract, with its abrupt transition from genetic and archeological discussion to a linguistic conclusion, did not seem promising at first. The article itself, however, articulates the argument much better. While the abstract states that “these results provide support for the theory of a steppe origin of at least some of the Indo-European languages of Europe”, the article itself is less categorical in tone: “our results provide new data relevant to debates on the origin and expansion of Indo-European languages in Europe”. Here, I will not comment on the methodology or the conclusions concerning genetics (the reader is referred to blog posts by Razib Khan, see here and here). Instead, I will take the genetic results for granted and examine what conclusions regarding the PIE urheimat can be drawn from this study.

The authors of the study sensibly admit that “ancient DNA is silent on the question of the languages spoken by preliterate populations”. Therefore, no amount of DNA data or computational analysis, however sophisticated, can offer direct evidence of the PIE homeland. However, ancient (and to some degree, modern) DNA, coupled with the archeological record, can provide “evidence about processes of migration” of preliterate populations. As Martin W. Lewis and I discuss in detail in chapter 9 of our forthcoming book The Indo-European Controversy: Facts and Fallacies in Historical Linguistics, any theory of Indo-European language dispersal must be compatible with such migration history. Let’s now consider what that migration history may be and which of the two leading competitors, the Steppe or the Anatolian theory, accords better with it.

According to Haak et al. (2015), their study “document[…] a massive migration ~4,500 years ago associated with the Yamnaya and Corded Ware cultures”. Coming out of the southern Russian steppes, this population flow “replaced ~3/4 of the ancestry of central Europeans”. Thus, proponents of the Anatolian theory, which postulates that Indo-European languages arrived in Europe from present-day Turkey via the Balkans several millennia prior to the arrival of the Yamnaya migrants (cf. Bouckaert et al. 2012), must explain why such a massive influx of later migrants did not produce a language shift, or even affect the pre-existing Indo-European language(s) in any serious way. Although numbers alone do not decide the outcome of language encounters, as evidenced by the shift in favor of the language of a relatively small group of Magyar migrants in what is now Hungary, a language of a minority group may “win” over that of a majority group only if the minority has social prestige, military superiority, and/or political control. There are numerous cases where a relatively small group of invaders subjugates a large group of locals; the Anglo-Saxon and Norman invasions of the British Isles come to mind. But it is hard to imagine that a large wave of Yamnaya migrants was absorbed without the new arrivals exerting some such form of social dominance. That they brought with them powerful horse-riding technology makes it even likelier that the Yamnaya migrants had not only numbers but also social dominance on their side. It is therefore much more likely that Yamnaya migrants served as “vectors for the spread of Indo-European languages into Europe” than that they adopted the Indo-European language of the pre-Yamnaya indigenes.

The Yamnaya culture, however, postdates the Anatolian split separating Anatolian languages such as Hittite and Luvian from the rest of the Indo-European family, under both the Steppe and the Anatolian theories. Consequently, it cannot be associated with speakers of PIE itself, but rather with its descendant, so-called “Proto-Nuclear-Indo-European” (or PNIE), which is the ancestor of all Indo-European languages except the Anatolian ones. Haak et al. (2015) seem to acknowledge this point by stating that their “results provide support for the theory of a steppe origin of at least some of the Indo-European languages of Europe” (highlight mine). However, technically, if they are correct in associating PNIE with the Yamnaya culture, then all of the Indo-European languages of Europe descend from PNIE. Anatolian languages, which are the only ones that do not descend from PNIE, were spoken in the Asian part of present-day Turkey, not in Europe.

In principle, the results of Haak et al.’s study are compatible with the possibility that the Yamnaya culture is associated with a later descendant of PIE, such as the ancestor of all the surviving (or currently existing) branches of Indo-European, which Martin Lewis and I dubbed “Proto-Surviving-Indo-European” (PSIE). In other words, PSIE is the result of the second split in the Indo-European family, a split that separated Tocharian languages from the rest of the Indo-European “clan”. However, such an association of the Yamnaya culture with PSIE is unlikely in light of the probable link between the Yamnaya culture and its offshoot further east, the intrusive Afanasievo culture in the western Altai Mountains (Anthony 2011, 2013). According to Anthony (2013: 10), the Afanasievo culture is quite distinct from that of the “ceramic-making mountain foragers” who occupied the region earlier. Moreover, Afanasievo material culture exhibits traits characteristic of the Yamnaya culture: Yamnaya kurgan grave types, a typical Yamnaya burial pose, Yamnaya ceramic types and decoration, and sleeved axes and daggers of specific Yamnaya types are all found in Afanasievo sites in the western Altai (Kubarev 1988; Chernykh, Kuz’minykh, and Orlovskaya 2004). According to Mallory and Mair (2000), the Afanasievo culture is linked to the linguistic ancestors of the Tocharians.

Moreover, the hypothesis that the Afanasievo culture resulted from a Yamnaya migration, which crossed an astonishingly long distance of approximately 1,200 miles, accords very well with Don Ringe’s conclusion (see Ringe 2006, inter alia) that the Tocharian branch split off “cleanly”, meaning that once the division occurred, the speakers of Proto-Tocharian lost contact with the other Indo-Europeans. Consequently, their languages shared no common innovations with other Indo-European languages, nor did they borrow from or provide loanwords to them, as discussed by Ringe.

All in all, the migration history that emerges from genetic and archeological studies, including Haak et al., accords well with the Steppe hypothesis. An early migration took speakers of Proto-Anatolian from the Pontic Steppes to what is now Turkey, leaving its sister branch, PNIE, to be spoken in the steppe zone by the people archeologists associate with the Yamnaya culture. A second major migration took a subset of the Yamnaya folk on a long journey across Eurasia, creating the Afanasievo culture and a rift between the Tocharian and PSIE branches of PNIE. Further splits took some Yamnaya people to Europe, where they effectively overran the existing non-Indo-European-speaking populations (as documented by Haak et al.) and effected a language shift, while the rest of the Yamnaya PSIE speakers became the ancestors of Indo-Iranian and some other branches of Indo-European.

The double link between the Yamnaya people in the steppes and Europeans, on the one hand, and between the Yamnaya culture and that of the Afanasievo sites in western Altai, on the other hand, is much harder to account for under the Anatolian hypothesis. The only sensible scenario under the Anatolian theory is the one explored in Colin Renfrew’s later work (Renfrew 1999) and dubbed by Mallory (2013) “Anatolian Neolithic Plan B”. Under this scenario, Proto-Anatolian speakers remained within the Anatolian homeland while the speakers of PNIE relocated to the Pontic Steppe area, with the subsequent events proceeding as described above. But this scenario runs into its own problems: specifically, it is incompatible with the only feasible account of the relationship between Anatolian speakers and the indigenous non-Indo-European peoples of Asia Minor, such as the Hattians. Everything we know of these groups from the historical and archeological record points in the direction of the Hittite speakers arriving in the area inhabited by the Hattians, whom they eventually came to dominate, rather than a non-Indo-European-speaking Hattian peasantry “diffusing” into the Anatolian land from elsewhere. In other words, the “Anatolian Neolithic Plan B” scenario requires yet another instance of a mass settlement by a group that ends up constituting a demographic majority without social power, which, as pointed out above, is a virtually impossible turn of events.

The only solution to this problem within the Anatolian theory, proposed by Grigoriev (2002: 354-357, 412-415), is to assume that the speakers of Proto-Anatolian first migrated away from the Anatolian homeland, leaving at least some of the area vacant for the Hattians to move in, while speakers of what was to become “the rest of the Indo-European family” (i.e. PNIE) remained in (eastern) Anatolia, and that subsequently speakers of Anatolian languages returned to the region where they were attested. As Martin Lewis and I argue in our book, however, such a scenario is geographically ridiculous, unduly complicated, and fits poorly with the archeological record.

As Mark Baker once wrote: “The best theory is not the one that brings everything into line with its one favorite fact, but the one that finds the greatest degree of harmony and convergence among all the facts” (Baker 2001: 31). All in all, the Steppe theory rather than its Anatolian competitor increasingly appears to be that “best theory”. And Haak et al.’s chief contribution to the Indo-European debate is in bringing to the table yet another set of facts that the winning theory must be able to account for.





Anthony, David W. (2007) The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World. Princeton, NJ: Princeton University Press.

Anthony, David W. (2011) Horseback Riding and Bronze Age Pastoralism in the Eurasian Steppes. Lecture delivered at Secrets of the Silk Road Symposium at University of Pennsylvania Museum of Archeology and Anthropology, March 2011.

Baker, Mark (2001) Phrase structure as representation of “primitive” grammatical relations. In: William Davies and Stan Dubinsky (eds.) Objects and other subjects: Grammatical functions, functional categories, and configurationality. Dordrecht: Kluwer. Pp. 21-51.

Bouckaert, Remco; Philippe Lemey; Michael Dunn; Simon J. Greenhill; Alexander V. Alekseyenko; Alexei J. Drummond; Russell D. Gray; Marc A. Suchard; and Quentin D. Atkinson (2012) Mapping the Origins and Expansion of the Indo-European Language Family. Science 337: 957-960.

Chernykh, Evgenii, Evgenii V. Kuz’minykh, and L.B. Orlovskaya (2004) Ancient metallurgy in northeast Asia: from the Urals to the Saiano-Altai. In: Katheryn M. Linduff (ed.) Metallurgy in Ancient Eastern Eurasia from the Urals to the Yellow River. Chinese Studies 31. Lewiston: Edwin Mellen Press. Pp. 15-36.

Grigoriev, Stanislav A. (2002) Ancient Indo-Europeans. Chelyabinsk: Rifei.

Haak, Wolfgang et al. (2015) Massive migration from the steppe is a source for Indo-European languages in Europe. biorxiv online.

Kubarev, Vladimir D. (1988) Drevnie Rospisi Karakola. Novosibirsk: Nauka.

Mallory, James P. (2013) Twenty-first century clouds over Indo-European homelands. Journal of Language Relationship 9: 145-154.

Mallory, James P. and Victor H. Mair (2000) The Tarim Mummies: Ancient China and the Mystery of the Earliest Peoples From the West. London: Thames and Hudson.

Renfrew, Colin (1999) Time depth, convergence theory, and innovation in Proto-Indo-European: ‘Old Europe’ as a PIE linguistic area. Journal of Indo-European Studies 27: 257-293.

Ringe, Don (2006) Proto-Indo-European wheeled vehicle terminology. Unpublished Ms., University of Pennsylvania.



  • John Cowan

    As Anthony and Ringe 2015 point out, the Anatolian hypothesis makes no sense in its own terms: if the non-Anatolian IE speakers dispersed north, east, and west of Anatolia, how did they remain in touch with one another to form PNIE? Even Plan B doesn’t meet this objection.

  • KF Levin

    In your discussion of Grigoriev, you neglected to mention that the Yamnaya are half Armenian. Surely relevant.

    • In what sense is “the Yamnaya … half Armenian”? Linguistic? Genetic? Surely it’s relevant…

      • Kamran

        Genetic. Yamnaya are a mixture of a Middle Eastern herder/farmer population similar to Armenians (although modern Bedouins, who are ultra-Near Eastern genetically, also show the same signal when used as a proxy for that group) and the native foragers of far eastern Europe.

        The Y-chromosomes of the Yamnaya are almost entirely R1a/R1b and they seem to be descended paternally from the foragers. The mother’s side is something like 80% Near Eastern. So we could guess how that encounter went….

        So their language was the product of mixture, right? Mostly PIE comes from the forager men, but the Gamkrelidze theory of Proto-Kartvelian contacts is vindicated as well: It came with the moms!!

        So there is no need to triangulate a homeland by looking at contacts with Proto-Uralic and Proto-Kartvelian, if we assume the populations that formed the Yamnaya horizon brought their languages along with them.

        Btw, another interesting thing is that the foragers were very similar to a 24,000 year old boy from Mal’ta in south Siberia. Another recent study found out that this kid’s relatives were almost 50% of the ancestry in Native Americans, the other half being similar to East Asians. And again most Native American Y-DNA haplogroups are Q, which is the brother of R.

        These northern guys had huge mojo.

        • KF Levin

          No, their Y chromosomes are 100% R1b and of the kind that is common in Armenians.

        • Genetic studies, I am afraid, are a total mess. I am still to see a publication that shows this convincingly, even for a non-specialist like me. They use all sorts of “proxies” so inaccurately that the results are rather meaningless.

          As for Gamkrelidze’s theory and the influence of the “moms”, it’s an interesting idea for sure, although as you probably know, such influences do not cause language shift, simply changes to the language itself — check out this series for details:

          Gamkrelidze’s theory that PIE homeland was in the southern Caucasus has been all but debunked. For more details on this, check out our forthcoming book (by Martin Lewis and myself), chapter 9:

          • KF Levin

            As a non-linguist and non-geneticist, I have to give the advantage to the geneticists who, after all, present hard DNA data on carbon-dated bones, and not layer upon layer of reconstruction and interpretation based on languages attested many thousands of years after PIE. I hope you discuss the genetic evidence in future editions of your book.

          • When/if geneticists produce solid results that are of interest to linguists and to the extent that such findings are relevant to linguistic issues, we discuss them. But all in all, thinking that genetic results are more reliable than linguistic reconstructions is a rather ignorant view… Good luck learning about both fields!

          • Jonathan Gress

            It seems very hard to make non-linguists understand the precision with which historical linguists have reconstructed Proto-Indo-European and other proto-languages. The same could be said of any other technical field, like biology, but somehow biologists enjoy huge prestige and biology-denying creationists are rightly laughed at even by those who know little or nothing of how biologists determine prehistorical genetic relationships. Unfortunately, linguistics does not enjoy nearly the same level of prestige and we are constantly having to battle our own “creationists” everywhere.

          • Well-said, Jonathan! Except our own “creationists” come in so many guises, that it’s a huge and never-ending battle…

          • Jonathan Gress

            It might be good to write up something on how historical linguistics can profitably be connected to archeological and genetic evidence. In the Ringe and Anthony paper, they explain that the only way to connect linguistics and archeology is to take reconstructed words for artifacts with secure meanings, like “wheel”, and then find archeological evidence for a culture with the requisite artifacts in the right time and place, i.e. a culture with wheels in the places and times where we expect PIE to have originated and spread through the internal linguistic evidence.

            When it comes to genetics, it seems we can’t connect genetics to linguistics so directly, but only indirectly through the archeology, i.e. we get the DNA from the bones of people from that archeological culture, and then see if the genes spread in the same or similar directions as the languages. But by this point we’re working at a few removes from the linguistic evidence.

          • Not only that, but there is a difference between what genetic and linguistic findings indicate: the former are about movements of PEOPLE, the latter about movements of LANGUAGES. The two are related, for sure, but not the same, as looking at the DNA of native English speakers readily shows.

            Therefore, linguists have their own problems and their own solutions to them, that geneticists and archeologists might contribute to, but can’t completely replace. It’s too bad few people actually understand this point… (I don’t mean you, Jonathan!)

          • KF Levin

            Linguistics can not match the reliability of the hard sciences. A biologist or geneticist can reconstruct the ancestral form of two organisms or populations, and this can be compared with ancient DNA or fossils. A linguist can reconstruct *PIE, but they can’t compare *PIE with PIE, because PIE is not recorded. I’m not criticizing linguists who are doing great work, but it’s a weaker discipline for sure.

          • The only thing weak here is your understanding of how either genetics or linguistics works. Linguists can reconstruct ancestral languages that are recorded (which has been done), thus verifying and validating the method, which can then be applied to problems without the known answer such as PIE. Similarly, geneticists can reconstruct genealogical/species trees based on DNA and compare to ancient DNA, and if the method works, they can apply it to problems with no known answer, i.e. where no ancient DNA is available. By the way, ancient DNA is analyzable only so far back: after a certain time, it degrades beyond the possibility of analysis.

            As for the overall reliability, linguistics and not genetics currently has more reliable models and methods. If you follow the genetics debates, you’ll see that their results change every other week because new methods are being developed and perfected. So virtually nothing geneticists tell us now is likely to remain valid in the near future. Linguists, on the other hand, have already developed and perfected their methods so they can now be applied RELIABLY to new problems. No surprise, of course, as linguists got a significant head-start: they began developing their methods some 200 years before geneticists discovered the structure of DNA.

          • KF Levin

            There is no reason to be rude.

            Your argument that linguistics can reconstruct languages that are recorded is not a particularly good one, because of the issue of time depth. Being able to reconstruct languages of a couple of thousand years does not mean that they can reconstruct languages over a much longer time span (6,500 years to PIE).

            This is like saying that being able to predict tomorrow’s weather is the same as being able to predict next week’s weather. At least weather scientists can test how good their predictions are (because next week’s weather will eventually be recorded). Linguists simply can’t compare their predictions for prehistoric language against observation, because there are simply no observations of PIE 6,500 years ago.

          • I am not being rude, just stating the facts. By the same token, I could say that you were rude — and cut off the discussion by banning you from the site, which I didn’t do.

            To answer your point, linguists can reconstruct a language from 2,500 years ago and compare it to the attested form and then reconstruct the language from 5,000 years ago BASED ON THE EARLIER LANGUAGES (recorded or reconstructed). In fact, serious linguists (not the Gray and Atkinson types) do not throw together Italian and Hindi and reconstruct PIE on the basis of those comparisons, but reconstruct step by step backwards. If the method is valid for verifiable reconstructions, what grounds do we have to think that it won’t work for non-verifiable reconstructions?

            Moreover, if geneticists can only reconstruct DNA genealogy that can be verified directly, why bother to do such reconstructions? If their methods are only good for problems where the answer is already known anyway, what’s the point of such methods? Scientists don’t want to be able to solve problems that are already solved. Anyone who understands how science works would know that.

          • KF Levin

            I was never rude. Raising legitimate concerns about the efficacy of linguistics isn’t being rude, unless you take the honor of your discipline so personally.

            The two earliest attested Indo-European languages are Hittite and Greek, and their common ancestor is 3,000 years before the Bronze Age (if the 6,500 age is right). Indo-Europeanists don’t only use Hittite and Greek to reconstruct PIE (so they work with even greater time depths), but 3,000 years is still more than plenty.

            Which languages have linguists been able to reconstruct across 3,000 years and compared against an attested ancestral language? (I doubt that there are many, since writing is not that old).

            As for genetics, the entire point is that geneticists don’t have to reconstruct genomes, they get them off actual bones.

          • I was never rude either. I am just stating the facts.

            re: your question, you answered it yourself. Hittite is about midway in time between PIE and now (actually, closer to PIE than to now). Ancient Greek is a little later than Hittite, and Vedic Sanskrit is earlier than Hittite. Avestan is from about the same era as Hittite.

            re: genetics, they don’t reconstruct GENOMES per se, but genealogical trees based on genomes. And as I said before, they can sequence ancient DNA only going so far back. It degrades.

            So before you make stupid pronouncements, perhaps you should learn the facts, eh?

          • KF Levin

            Please, instead of calling me stupid, answer the question of which languages linguists have been able to reconstruct across 3,000 years. That is, a recorded language P at time T, two daughter languages A and B at time T+3000, and a reconstruction *P of the common ancestor of A and B that is shown to be very similar to P.

            Until you can show that, your faith that the linguistically reconstructed *PIE is very accurate is not based on any testing of the underlying method on empirical data.

            The fact that Hittite is halfway between PIE and now is not relevant to the discussion, as Hittite did not leave any long-living linguistic progeny that would allow linguists to compare a reconstructed *Hittite with the actual Hittite across 3,000 years.

            DNA degradation is a strawman, as geneticists can sequence ancient DNA from the relevant time periods for PIE very easily (as the new study shows). Linguists can’t look at the languages of 6,500 years ago directly, geneticists can look at genomes of people 6,500 years ago (and even much older) directly.

          • I did name a few languages, so maybe I did call you correctly? Vedic Sanskrit is one. Just read what I said above.

            As for the time depth, comparative reconstruction is only valid for certain time spans — 10,000 years is probably the figure most linguists would be comfortable with. Genetics can look farther back. But genetics gives less reliable results when we look at shorter time depths. Is a microscope less useful in principle than a telescope?

          • Jonathan Gress

            As a non-linguist and non-geneticist, you are equally unqualified to evaluate the state of either discipline.

      • KF Levin

        Half the Yamnaya ancestry is like Armenians. Their Y chromosomes are of the type common in Armenians.

  • Bernie Brightman

    But how can we understand the Tocharians wandering so far to the east? Why did they go so far? Were they running away from someone? Were they pursuing someone? How long did it take? What was the interaction with the neighboring proto-Indo-Iranians?

    • Great questions, Bernie! I don’t think we have solid answers to them though…

    • John Cowan

      My guess is that they were following an established trade route.

  • Aleksandr

    A short note about the well-known linguistic trees obtained with methods from population genetics (via genetic algorithms, GA).

    GA give good results in various applications, mainly in bioinformatics, economics, computational science, engineering, and some other fields. As is well known, GA belong to the large class of evolutionary algorithms, which look for solutions to optimization problems using mechanisms of natural evolution such as crossover, inheritance, mutation, and selection.

    Genetic algorithms have known limitations: the time needed to solve the problem, the need for repeated evaluation of the fitness function on complex problems, the ergodicity of mutation alone, poor scaling on complex problems (which require a large search space), etc.

    From a theoretical point of view, the biggest problem with GA is that they are heuristic: these algorithms have no rigorous justification, but they can nevertheless give an acceptable solution in some practically important cases.

    That is why genetic algorithms (i.e. heuristic algorithms) do not currently yield a unique solution to various linguistic problems. This does not mean, of course, that there is no need to develop the use of these algorithms in different areas of linguistics. More and more powerful and faster computers are being built, and both the algorithms and the methods of implementing them, such as parallel programming, keep developing.

    For now, though, I treat with some caution the conclusions of well-known “evolutionary biologists” on problems of historical linguistics.