The Mis-Mapping of the Indo-European Homeland

May 24, 2014

The animated map that accompanies the Science article of Bouckaert et al. contains numerous errors when it comes to the location and timing of language differentiation “events”. While such mistakes are problematic in and of themselves, they also lead to the incorrect mapping of the ultimate Indo-European differentiation event, that is, the highest-order split of Proto-Indo-European into its first branches. Locating this initial division, in turn, effectively specifies the location of the Indo-European homeland, the key issue that the article purports to resolve. In other words, mis-mapping permeates the whole project, from the incorrectly mapped individual languages (and in some cases dialects) to the incorrectly mapped intermediate proto-languages to the incorrectly mapped ultimate proto-language, Proto-Indo-European. This post focuses on two additional problems that skew the analysis toward locating the Indo-European urheimat in Anatolia as opposed to the steppe zone.

As we have seen, Bouckaert et al. assume a diffusion-only model of language spread, without considering non-random migration routes that are best modeled as advection. However, since they fine-tune their model by assuming some non-uniformity in the diffusion process (e.g. by distinguishing “land” from “water”), they avoid locating the origin of population spread in the geographical center of the selected sample of Indo-European tongues, as the authors stress in the Supplementary Materials. As they point out, the geographical centroid of the sampled languages, marked on their maps by a green star (p. 958), is located roughly in the northern Crimean peninsula, at the edge of the steppe zone. Bouckaert et al. furthermore avoid another potential simplification: selecting the center of mass of the language sample as the Indo-European homeland. Because of their oversampling of European languages versus those of Asia, their center of mass would most likely fall somewhere around the Balkans, not Anatolia. If anything, this simplistic approach to locating the urheimat would generate “evidence” in support of the Balkan-Carpathian theory of Indo-European origins, proposed by the eminent Russian linguist and historian Igor D’iakonov in his seminal 1985 paper “On the Original Home of the Speakers of Indo-European” and supported most recently by Alexei Kassian of the Russian Academy of Sciences.
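The difference between a center defined by geographical extent and a center of mass defined by the sample can be shown with a toy calculation. The sketch below is only an illustration: the positions are invented numbers along an imaginary west-east transect, not the Bouckaert et al. sample, and the function names are hypothetical. It shows how oversampling one end of the range pulls the center of mass toward that end while the midpoint of the range stays put.

```python
from statistics import mean

# Invented positions (arbitrary units along a west-east transect).
# Six "western" points and two "eastern" ones mimic an oversampled west.
west = [0, 500, 900, 1200, 1500, 1800]
east = [4500, 6000]
sample = west + east

def center_of_mass(points):
    """Plain average: every sampled language counts equally."""
    return mean(points)

def extent_midpoint(points):
    """Midpoint of the range: ignores how densely each part is sampled."""
    return (min(points) + max(points)) / 2

com = center_of_mass(sample)
mid = extent_midpoint(sample)
print(f"center of mass: {com}, midpoint of range: {mid}")
```

With these invented numbers the center of mass lands well to the west of the midpoint, simply because six of the eight points sit in the western third of the range.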

While Bouckaert et al. manage to avoid these two oversimplifications (selecting the geographical center as the origin of completely isotropic diffusion, or choosing the sample’s center of mass), two factors nonetheless appear to sway their solution towards Anatolia: the location of the highest-order split in the IE language family and the southern location of extant Indo-Iranian languages.

The first issue concerns the Anatolian branch of Indo-European, whose latest known location is in what is now the Asian part of Turkey. Given Bouckaert et al.’s assumptions, this means that the split into Anatolian and non-Anatolian (or Indo-European proper, IEP for short) branches necessarily happened in Anatolia. However, this hypothesis is problematic for two reasons. First, at the time that this highest-order differentiation happened, there was no reason to expect that Anatolian would end up as a relatively small cluster of languages which would all become extinct long before Bouckaert and his colleagues were born, while the IEP would grow into a vastly bushier family tree, many of whose descendants would survive into the third millennium CE. In other words, there is no reason to assume that the Anatolian daughter of PIE remained in situ, whereas the IEP moved on somewhere else. It could have been the other way around, with the IEP daughter staying where the PIE mother was spoken and the Anatolian daughter moving away. Such a process is exactly what the steppe theory hypothesizes. Second, the fact that all Anatolian languages were last attested in Asia Minor does not mean that they were always spoken there. For example, while Gothic survived the longest in Eastern Europe, its homeland is in what is now Germany or southern Scandinavia.


The same point can be illustrated with numerous additional examples from extant languages and families. In several cases, not a single language of a given family is still spoken in the original homeland. For example, much evidence indicates that the Austronesian language family originated in mainland coastal China, yet today no members of the family can be found on the mainland, where they were displaced long ago by advancing Sinitic languages. The closest location to this hypothesized urheimat where Austronesian languages are still spoken is in Taiwan, which also exhibits the highest degree of diversity for Austronesian languages among all the areas where these languages are currently found.


Similarly, the homeland of Turkic languages is generally thought to have been in Mongolia, an area that is now totally outside the family’s distribution. The Turkic family also provides clear evidence that the homeland need not be located in the area where the first language to have split off the family tree is currently attested: the most distinctive Turkic language is Chuvash, but it is spoken not in northern Mongolia, but rather in the distant Middle Volga region. Significantly, a diffusion-only model of the kind developed by Bouckaert et al. would never designate a linguistic homeland in an area outside of the range of currently spoken and historically attested extinct languages within the group, even though we know this to have been the case in regard to several major language families.

The same pattern obtains within the Indo-European family: for example, the Celtic languages originated in Central Europe (and are mapped as such by Bouckaert et al.), yet no Celtic language is spoken on the continent, with the exception of Breton, which backtracked there from the British Isles in historical times.
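Why a diffusion-only assumption tends to keep the inferred origin inside the attested range, and why directed migration (advection) breaks that expectation, can be seen in a toy one-dimensional random walk. The sketch below is not the model of Bouckaert et al.; the step sizes, walker counts, and function name are all assumptions chosen for illustration. Every walker starts at the true origin, x = 0. Without drift, the endpoints straddle the origin and their average sits close to it; with a steady drift, every endpoint ends up on one side, so the true origin lies outside the observed range altogether.

```python
import random
from statistics import mean

def endpoints(n_walkers, n_steps, drift, rng):
    """Final positions of walkers that all start at the true origin, x = 0.

    Each step is a random move of -1 or +1 (diffusion) plus a constant
    drift term (advection). drift = 0 gives a diffusion-only process.
    """
    finals = []
    for _walker in range(n_walkers):
        x = 0.0
        for _step in range(n_steps):
            x += drift + rng.choice((-1.0, 1.0))
        finals.append(x)
    return finals

rng = random.Random(0)  # fixed seed so the run is repeatable
diffusion = endpoints(2000, 100, drift=0.0, rng=rng)
advection = endpoints(2000, 100, drift=2.0, rng=rng)

for name, xs in (("diffusion only", diffusion), ("with advection", advection)):
    print(f"{name}: range [{min(xs)}, {max(xs)}], mean {mean(xs):.2f}")
```

In the drift case, an analyst who assumed pure diffusion and looked for the origin near the middle of the endpoints would be off by roughly the total distance drifted, which is the situation described above for Austronesian and Turkic.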


In general, whether Anatolia, the Ukrainian steppes, the Balkans, or some other location turns out to be the ultimate Indo-European homeland, it is almost certain that the language spoken there now has not developed in its current location throughout time. For example, Ukrainian did not evolve from its earliest ancestor wholly in Ukraine, even if we assume the steppe theory of PIE: wherever PIE originated, its relevant descendants are known to have migrated into Eastern Central Europe (Northwest Indo-European), then backtracked eastwards into Eastern Europe (Proto-Balto-Slavic), continuing into the area between the Vistula (Wisla) and Dnieper rivers (Proto-Slavic), to the Middle Dnieper region (East Slavic), and finally to present-day Ukraine. It is likely from the same Slavic homeland between the Vistula (Wisla) and the Dnieper that the South Slavic languages (Serbian, Bulgarian, Macedonian) moved to the Balkans, while Romanian ultimately traces back to the Romance homeland in Italy. In fact, no language currently attested in the Balkans appears to have developed entirely in the region, without sojourning somewhere else at some earlier time, even if we assume the Balkan theory of PIE. It is thus not clear why Anatolian languages, such as Hittite or Luwian, must be exceptional in this respect by never moving away from their postulated homeland.


The second problem concerns the relatively southern location of extant Indo-Iranian languages, as well as that of their intermediate proto-language as depicted in Bouckaert et al.’s animated map. Thus, the “movie” shows the Indo-Iranian population front advancing eastwards from present-day Turkey into Iran and onwards to northern India. Given their assumptions of “diffusion only” and an Anatolian homeland, this is the only way Indo-Iranian expansion can be mapped. However, it is well established that the Indo-Iranian proto-language was located much farther to the north, in the North Caspian steppe zone. Indo-Iranian speakers subsequently migrated eastward into Central Asia before splitting into Iranian and Indic branches. Therefore, either the “diffusion only” or the Anatolian homeland assumption, or both, must be abandoned.

The evidence for the steppe sojourn of Indo-Iranian speakers comes from the numerous loanwords from Indo-Iranian languages into Uralic, whose borrowing must have taken place over a prolonged period of time. For example, Early Proto-Uralic is said to have borrowed *juxi/jôwxi ‘to drink’ from the Early Proto-Indo-Iranian *gughew (which traces back to PIE *ghew-); the borrowing of the Late Proto-Uralic *śeta ‘100’ from Late Proto-Indo-Iranian *ćatam (tracing back to PIE *kmtóm) must have happened at a later period (see Häkkinen 2012, p. 8). Such a prolonged period of linguistic contact suggests a long-term co-existence of the two language families in neighboring areas. While the location of the Uralic homeland is itself a controversial issue, the consensus is to place it in the Volga-Ural region, although its ancestor probably came from Southern Siberia, north of the Sayan Mountains. Since the Indo-Iranian branch must have developed in the vicinity of Proto-Uralic, it can be placed in the North Caspian Steppes, but not on the Iranian Plateau. In other words, the earliest location of the Indo-Iranian branch was much farther to the north than its descendant languages, even the earliest ones attested in written records, which contradicts the idea of a direct diffusional route from Anatolia to the Iranian Plateau and thence to South Asia.

The fact that virtually all known migrations to, and invasions of, South Asia came via the northern steppe corridor further supports this theory. Agents of the British Raj were preoccupied with India’s northwestern frontier for good reason, as they understood this historical-geographical dynamic quite well. Historically attested mass movements into the Indian subcontinent through this route are numerous. One of the more interesting examples is that of the Indo-European speaking Yuezhi, who arrived roughly 2,000 years ago. Many scholars believe that the Yuezhi were descendants of the Tocharians of the Tarim Basin, whose language was an early IE offshoot. Although the Yuezhi did not spread their language in South Asia, they were instrumental in building the powerful Kushan Empire that long served as a bridge between India, Central Asia, and the Middle East. Subsequently, several waves of Turkic speakers descended from Central Asia to the plains of northern India; like the Yuezhi, their linguistic role was relatively minor, but their political and cultural impacts were hugely significant. Any model of population movement that rules out such instances of advection will miss these crucial patterns, and hence will result in fundamentally false accounts of human history.



D’iakonov, Igor M. (1985) “On the Original Home of the Speakers of Indo-European”. Journal of Indo-European Studies. Volume 13.

Häkkinen, Jaakko (2012) “Problems in the method and interpretations of the computational phylogenetics based on linguistic data. An example of wishful thinking: Bouckaert et al. 2012”, available online.







