Disentangling “The Tangled Roots of English”

Feb 28, 2015 by

[I am deeply grateful to Martin W. Lewis for the inspiring discussions of, and extensive collaboration on, the issues examined here.]

Several articles, written by historical linguists, geneticists and archeologists, have been published in recent weeks on the issue of Indo-European origins, and reports in the popular media have drawn renewed public attention to the topic (see here and here). Three of them deserve a special mention. The first is “The Indo-European Homeland from Linguistic and Archaeological Perspectives”, written by archeologist David W. Anthony and historical linguist Don Ringe and published in Annual Review of Linguistics. The second is “Ancestry-Constrained Phylogenetic Analysis Supports the Indo-European Steppe Hypothesis”, written by a team led by historical linguist Andrew Garrett; it is to be published in Language, the flagship journal of the Linguistic Society of America (the preprint is available online). The third is “Massive migration from the steppe is a source for Indo-European languages in Europe”, written by a large team of geneticists and archeologists (including David W. Anthony, one of the strongest advocates of the Steppe theory) and posted online on bioRxiv. Notably, all three articles support the Steppe theory, which links Proto-Indo-European (PIE), the ancestor of all Indo-European languages, to the pastoralist inhabitants of the Pontic Steppes in southern Russia some 5,500 years ago. This has not been a good month for the Steppe theory’s main competitor, the Anatolian theory, proposed by archeologist Colin Renfrew and most recently supported by Russell D. Gray, Quentin D. Atkinson and their colleagues (cf. Bouckaert et al. 2012). According to their view, PIE was spoken by Neolithic farmers in Anatolia (present-day Turkey) about 8,500 years ago.

As discussed in the forthcoming book by Martin W. Lewis and myself, The Indo-European Controversy: Facts and Fallacies in Historical Linguistics, historical linguists typically side with the Steppe theory (and, more generally, advocate a more recent and more northerly PIE homeland). But if you read the New York Times, particularly its publications by senior science journalist Nicholas Wade, you would not know that. Although journalists should be unbiased and evenhanded, as well as somewhat knowledgeable in the area they cover, Wade clearly picked a side in the Indo-European debate years ago and keeps providing coverage based on his own prejudices rather than facts. On August 23, 2012, his front-section article’s headline declared that “Family Tree of Languages Has Roots In Anatolia”, and the first sentence of the article itself proclaimed that “Biologists using tools developed for drawing evolutionary family trees … have solved a longstanding problem in archaeology: the origin of the Indo-European family of languages”. (Why the origin of a language family is dubbed a “problem in archeology” is as bewildering as the rest of the article.) Wade’s most recent piece, titled “The Tangled Roots of English” (hence the title of this post) and published in the NYT on February 23, 2015, states that “a surprisingly sudden resolution of this longstanding issue may be at hand”, in reference to the new pieces of evidence in support of the Steppe theory. Of course, there is nothing surprising or sudden about the Steppe theory, originally proposed by Marija Gimbutas in the 1950s. Only someone who is either completely ignorant of, or purposefully ignoring, the subsequent debate could be startled by the recent avalanche of evidence in favor of the Steppe theory.

Wade’s ignorance of even the most basic notions in historical linguistics and his favoritism are both evident in the latest article, where he pays only lip service to the advocates of the Steppe theory and rushes to support the Anatolian theory without either providing arguments in favor of it or challenging the mounting evidence against it. As one illustrative example of this, consider Wade’s treatment of the “wheel” argument, one of the strongest arguments against the Anatolian theory. Wade writes:

“Linguists objected that proto-Indo-European could not have fragmented so early because the wheel wasn’t invented 8,000 years ago, yet many Indo-European languages have related words for wheel that must be derived from a common parent. But Dr. Renfrew argued that, long after their dispersal, these languages could all have borrowed the word for wheel along with the invention itself.”

The subsequent paragraph of his article turns to an overview of Atkinson & Gray’s 2003 article in Nature; thus, Renfrew’s position is presented as unchallenged. However, Wade follows Renfrew in committing the fallacy of thinking that just because something might have happened, it indeed did happen. Words for wheels and wheeled vehicles could have been borrowed; more generally, words for various technological inventions—from ‘shoulder yoke’ to ‘iPad’—frequently are borrowed from one language into another, typically together with the spread of the invention itself. But the fact is that the words for ‘wheels’ in the Tocharian, Indo-Iranian, Greek, and Germanic branches of Indo-European are all inherited from the same ancestral root (cited as “kwekwlos” in Wade’s article), as shown in the image on the left, from Anthony & Ringe (2015: 204). If these words were borrowed, they would show some signs of “foreign sound signatures” from the source language rather than the sound changes that occurred within the target language itself. The detailed argument is presented in Ringe (2006), and I will not go into it here. Suffice it to say that Ringe’s argument is based on a close examination of the relevant linguistic forms, whose formation is unique, involving reduplication, a zero-grade root, and a thematic vowel. According to Ringe (2006: 4), “the probability that it could have been formed independently more than once is virtually nil”. Thus, the only sensible conclusion, which Wade omits from his presentation, is that the “wheeled” vocabulary is inherited from PIE, not borrowed by some of its much-later descendants from their “cousins” in other Indo-European branches. As I have concluded in my earlier post on the subject, “the ‘wheel’ vocabulary originated in PIE prior to its split into daughter languages, which thus must have happened some time after 4000 BCE” (i.e. the time when evidence of wheel use appears in the archeological record). 
In the 2012 article, Wade waves such difficulties away without rebuttal. In the most recent piece he does not mention them at all.

Besides ignoring any inconvenient facts, Wade gets “entangled” in the facts (and in their analysis) that he does discuss in the article, particularly with respect to the history of English. Here his statements are simply contradictory. To understand why this is the case, we need to take a closer look at Wade’s presentation of Chang et al.’s (2015) work. The main innovation of this paper and the key departure from the model used in Bouckaert et al. (2012) is that Chang et al. implement ancestry constraints which restrict the relationships between eight ancient and medieval languages and thirty-nine modern languages in their data set, where the later languages are well-known to be the descendants of the earlier languages. Thus, Chang et al. assume the following ancestor-descendant relationships: Vedic Sanskrit—Indo-Aryan languages, Ancient Greek—Modern Greek, Latin—Romance languages, Classical Armenian—Modern Armenian dialects, Old Irish—modern Irish and Scots Gaelic, Old West Norse—West Scandinavian (Faroese, Icelandic, and Norwegian), Old High German—modern High German varieties (German, Swiss German, and Luxembourgish), and Old English—modern English. As Chang et al. explain, a “logical possible alternative” to a model with ancestry constraints, such as theirs, is that the later languages are descended not from the putative ancestor, but from another language spoken at the same time; “for example, perhaps Modern Irish and Scots Gaelic are descended not from Old Irish, but from an undocumented variety that had already significantly diverged from it” (p. 205). Chang et al. further explain that this alternative is “the only interpretation of the phylogenetic trees given by Bouckaert and colleagues …, whose analyses include ancestral and descendant languages but do not constrain their relationship except that each ancestral language forms a clade with its descendants” (p. 206). 
In Bouckaert et al.’s tree, ancient and medieval languages are crucially not ancestral to the later languages, contrary to what most historical linguists believe (see Chang et al. for extensive references): Latin is not ancestral to Romance languages, Vedic Sanskrit is not ancestral to Indo-Aryan languages, Old Persian is not ancestral to modern Persian, and so on. By the same token, Old Norse is not ancestral to modern Icelandic and Faroese, let alone other Scandinavian languages. Likewise, Bouckaert et al.’s tree “shows Old Irish evolving for over 500 years after its common ancestor with the other Goidelic languages” (Chang et al. 2015: 206). As for English, Bouckaert et al.’s tree shows it to descend not from Old English but from another variety spoken at the same time.
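
For readers who like to see the difference between the two tree shapes spelled out, here is a minimal sketch in Python. It is only a toy illustration: the node names “West Germanic” and “unattested variety”, the helper `path_length`, and both tiny trees are my own assumptions made for the sake of the example, not the actual trees or methods of Chang et al. (2015) or Bouckaert et al. (2012). All it does is count the branches separating Old English from modern English under each scenario.

```python
# Toy illustration only: two hypothetical tree shapes, not the actual trees
# from Chang et al. (2015) or Bouckaert et al. (2012).

def path_length(parent, a, b):
    """Count the edges on the path between nodes a and b in a rooted tree.

    `parent` maps each node to its parent; the root maps to None.
    """
    def chain_to_root(node):
        chain = [node]
        while parent[node] is not None:
            node = parent[node]
            chain.append(node)
        return chain

    chain_a = chain_to_root(a)
    chain_b = chain_to_root(b)
    # Number of steps from a up to each of its ancestors (a itself is 0).
    steps_from_a = {node: i for i, node in enumerate(chain_a)}
    # Walk up from b until we meet a node that also lies above (or is) a.
    for steps_from_b, node in enumerate(chain_b):
        if node in steps_from_a:
            return steps_from_a[node] + steps_from_b
    raise ValueError("nodes are not in the same tree")

# With an ancestry constraint: Old English is the direct ancestor.
constrained = {
    "West Germanic": None,
    "Old English": "West Germanic",
    "Modern English": "Old English",
}

# Without it: both descend from a hypothetical unattested variety.
unconstrained = {
    "West Germanic": None,
    "unattested variety": "West Germanic",
    "Old English": "unattested variety",
    "Modern English": "unattested variety",
}

print(path_length(constrained, "Old English", "Modern English"))    # 1
print(path_length(unconstrained, "Old English", "Modern English"))  # 2
```

In the constrained toy tree a single branch links the two languages; in the unconstrained one the path runs up to the unattested variety and back down, so there are two branches over which change must be posited.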

On this issue, as on the Indo-European problem as a whole, Wade sides with Bouckaert et al.; he writes: “But the case is not yet closed. … Dr. Garrett’s correction of the Bouckaert tree … may not be as conclusive as [it] seem[s]”. Furthermore, he cites Atkinson as saying: “The Garrett and Chang model is overzealous in forcing ancient languages to be directly ancestral – the data don’t support this”. It is not clear what data that might be, as the historical linguistic evidence for the ancestral relationships assumed by Garrett and his team is very solid. When it comes to modern English, some scholars have proposed that it descends not from Old English but from Old Norse—a theory that I have challenged in previous posts (see here, here, and here). Note, however, that Bouckaert et al.’s tree does not show that modern English is a descendant of Old Norse, or even of Frisian for that matter. Thus, under Bouckaert et al.’s scenario it must derive from some undocumented West Germanic variety spoken at the same time as Old English (5th-10th century).

Naturally, one wonders where this alleged ancestor (the “true Old English”, as it were!) must have been spoken. Wade’s explanation is that this “true Old English” was spoken at the same time and in the same place as what we normally label as “Old English”, which existed as a parallel written variant: “living languages are likely to be descended from a spoken language that diverged from the written version”, he writes. While it is generally true that people tend to write in a somewhat different way from how they speak, it does not mean that they always write and speak parallel yet distinct languages. On this issue, Wade quotes Paul Heggarty, a historical linguist at the Max Planck Institute for Evolutionary Anthropology, as saying that “written languages tend to be fossilized”. But Wade fails to understand the significance of that quote: it means that written languages are likely to be more representative of the spoken language of an earlier period, but not of a different language altogether. Thus, in English we write knight, as in Old English, but pronounce [najt] as in (late) Middle English. Moreover, such fossilized elements are typically found in two areas of language, pronunciation and syntax, and neither is particularly relevant to Bouckaert et al.’s model. For example, the pronunciation of dog (with a rounded or unrounded vowel) is not important for them; what matters is the fact that it replaced the earlier hund as the “everyday equivalent” of the meaning ‘dog’. And since Bouckaert et al.’s entire approach is predicated on examining only vocabulary, the syntactic fossilization and stylization typical of ancient and medieval texts, especially of translations of religious texts, which often copied syntactic structures of the original, is also irrelevant.

To make matters worse for Wade, not only does he uncritically side with the view that ancestral relationships, well established in historical linguistics, do not hold, but he also contradicts his own conclusion elsewhere in the article. In the discussion of the “wheel argument”, Wade states that “hweohl in Old English [is] itself the ancestor of wheel in modern English”. Yet, under his favored scenario, namely that Old English is not the ancestor of modern English but rather its “great-aunt”, modern English could not possibly have inherited its word wheel (or any other word, for that matter) from Old English. Words are like hair color and not like a coin collection: you can only inherit them from direct ancestors, not from side branches of the family. Thus, Wade’s article is not internally consistent, in addition to being overly dramatic, highly biased, and rather lacking in understanding of the subject matter at hand. That is not at all what one would expect from a leading science journalist of a major news outlet. In subsequent posts, I will look at Wade’s other articles on language-related issues, and a bigger picture of peddling misinformation and perpetuating ignorance will emerge.



Anthony, David W. and Don Ringe (2015) The Indo-European Homeland from Linguistic and Archaeological Perspectives. Annual Review of Linguistics 1: 199-219.

Bouckaert, Remco; Philippe Lemey; Michael Dunn; Simon J. Greenhill; Alexander V. Alekseyenko; Alexei J. Drummond; Russell D. Gray; Marc A. Suchard; and Quentin D. Atkinson (2012) Mapping the Origins and Expansion of the Indo-European Language Family. Science 337: 957-960.

Chang, Will; Chundra Cathcart; David Hall; and Andrew Garrett (2015) Ancestry-Constrained Phylogenetic Analysis Supports the Indo-European Steppe Hypothesis. To appear in: Language.

Haak, Wolfgang et al. (2015) Massive migration from the steppe is a source for Indo-European languages in Europe. bioRxiv online.

Ringe, Don (2006) Proto-Indo-European wheeled vehicle terminology. Unpublished Ms., University of Pennsylvania.




  • Norvallus

    Nearly all Old English is Old West Saxon. Modern English derives from Old Mercian, or so the handbooks tell me.

    • Jonathan Gress

      That is true, but the point is are Old West Saxon and Old Mercian sufficiently distinct to matter for the purposes of ancestry constraints? For example, of those OWS words with cognates in modern English, the vast majority can be derived by regular sound change. OWS and OM don’t look like languages with five centuries of separate development between them, in other words, which is nevertheless the consequence of not using ancestry constraints in these models.

      • Thanks for jumping in, Jonathan — I was going to say exactly the same thing. Only you put it much better than I ever could!

    • John Cowan

      True enough, and that means that some of the supposed sound-changes between Old and Middle English may be illusory. In particular, the change > toe probably never actually happened; rather, there was a change in pre-Old-West-Saxon ō > ā which never took place in Old East Mercian.

  • Norvallus

    Certainly not. The trouble is that we know much less about the Old Mercian and Old Northumbrian lexicon than we do about Old West Saxon. How do you measure the effect of ancestry constraints, given that you don’t know what other factors might have played a role, such as status factors, geographical location, ease of travel, and so on? Norval Smith (Norvallus is my Disqus login, but I selected FB)

    • Jonathan Gress

      I think this is a good question, but my impression is that, on balance, not having the constraints gives a more historically misleading picture than having them.

      • Norvallus

        How do you assign a value to the constraints?

        • Jonathan Gress

          A numerical value? I couldn’t give you an answer: I had trouble following the details of the computational methods in the Chang/Garrett paper. All I could tell you is that not having the constraints causes the model to push the dates much further back than having the constraints, and there are many cogent linguistic arguments for insisting on ancestry between the ancient and modern languages in question. Without the constraints, it seems the model is compelled to hypothesize sibling relationships between the ancient and modern languages, and this necessitates positing more degrees of separation, e.g. instead of modern English being the daughter of Old English, which is one degree of separation, ME becomes the sister of OE, which is two degrees of separation.

          • As far as I can tell, this issue is more relevant for the models that rely entirely on the lexicon — when it comes to the grammatical side of things, I don’t think anyone in their right mind would postulate that these Old Mercian and Old Northumbrian and Old West Saxon were separate LANGUAGES (as opposed to DIALECTS).

    • As far as I know, Northumbrian got (lexically) quite distinctive in the late Old English/early Middle English period, due to the Vikings.

  • Norvallus

    Thanks, I’ll contact AG.

  • HC

    I have two questions.

    1) Who is this Wade fellow? Apparently, some newspaper science journalist who, if I’m not very much mistaken, doesn’t do any research himself but merely writes about it. Aren’t you giving him an awful lot of your time and attention, compared to the real McCoys? Surely a footnote would be enough, no?

    2) Is the Garrett et al. article another example of the adage “Garbage in, garbage out”, as it still is a matter of “using tools developed for drawing evolutionary family trees”, or would you, now that this kind of method supports the Steppe theory, (be willing to) reconsider the idea that using such tools is no good anywhere or anytime? The fact that you write that this “has not been a good month for the Steppe theory’s main competitor, the Anatolian theory” seems to suggest that you take all three articles you mention rather seriously.

    • 1) There are two reasons why I am dedicating such attention to Mr. Wade: first, compare the number of people who read his NYT articles to those who read the “real McCoys” — he’s doing a lot of harm because NYT is considered a serious newspaper and so many people rely on science journalists, Wade included, to give them the gist of what scientists are doing. Second, he’s a symptom of a bigger problem: the near-total ignorance of even the most basic linguistics among non-specialists (scholars in other disciplines and the general public alike). If it weren’t the case, his editor would say “what crap” and never publish it — same as should have been the reaction of the editors of Science, Nature and other journals that also publish junk when it comes to language-related matters, in large part because they don’t know any better.

      2) You answered your own question, I think. The adage says “garbage in”, not “garbage inside”. I am not saying that the computational models are wrong in and of themselves. Their application to language, as done by Gray, Atkinson & Co., is wrong. Garrett et al. put cleaner data in, so they get a better result. Longobardi and his team are applying the same computational models to an entirely different sort of linguistic data and get much cleaner results. We discuss this in detail in our book.

      P.S. I take all articles I discuss here on the blog seriously, whether positively or not.

      • Norvallus

        Dear Asya,
        I don’t think you quite answered HC’s second question as to the relevance of “using tools developed for drawing evolutionary family trees.” But maybe you discuss this in your book.

        • I never claimed that “using tools developed for drawing evolutionary family trees” is no good anywhere and anytime. Works perfectly well in genetics as far as I can tell. Works much better when what goes in is grammatical rather than lexical data. We do discuss this rather at length in the book — sorry if it means for you waiting for the answer, but I couldn’t possibly copy entire chapters here.

  • More on Nicholas Wade’s views of historical linguistics here: