Do “Ultraconserved Words” Reveal Linguistic Macro-Families?

Oct 4, 2014

[This post was originally published in May 2013]

Today’s post takes on a recently published article by Mark Pagel, Quentin Atkinson, Andreea Calude, and Andrew Meade entitled “Ultraconserved words point to deep language ancestry across Eurasia”, published in PNAS. 


Linguistic Critique

Can words remain recognizable across more than a dozen millennia, their meanings understandable by people speaking languages in diverse linguistic families? Mark Pagel, Quentin Atkinson and their co‑authors answer in the affirmative. Journalists and bloggers, moreover, have tended to interpret their study as indicating little change in the core vocabulary of a massive assemblage of languages, those found in the supposed “Eurasiatic super-family” tying together Indo-European, Uralic, Altaic, Kartvelian (Georgian), Inuit-Yupik (Eskimo) and Chukchee-Kamchatkan.* Mark Frauenfelder reported that “a research team led by Mark Pagel at the University of Reading in England has identified 23 ‘ultraconserved words’ that have remained largely unchanged for 15,000 years”. Tia Ghose, LiveScience Staff Writer, echoed with “The researchers could predict what 23 words, including ‘I,’ ‘ye,’ ‘mother,’ ‘male,’ ‘fire,’ ‘hand’ and ‘to hear’ might sound like in an ancestral language dating to 15,000 years ago”. Even venerated publications could not avoid sensationalism. In the Washington Post, David Brown claims that the following passage consists largely of such “ultraconserved words”: “You, hear me! Give this fire to that old man. Pull the black worm off the bark and give it to the mother. And no spitting in the ashes!” Brown further contends that:

“… if you went back 15,000 years and spoke these words to hunter-gatherers in Asia in any one of hundreds of modern languages, there is a chance they would understand at least some of what you were saying.”

Sorry, but there is no such chance. If you go back a mere two thousand years in any preserved language, the changes have been enormous. If you go back 15,000 years, all comprehension would vanish. Even within any one “Eurasiatic” family, which diversified much more recently, speakers of languages in one branch are seldom able to understand anything in Brown’s passage if it is expressed in a language of a different branch.

Consider, for example, a direct translation from Russian:

Vy, uslyshte menja! Dajte etot ogon’ tomu staromu muzhu. Potjanite chjornogo chervja s kory i dajte jego materi. I ne plevat’ v pepel!

Yet the authors claim that such ultraconserved words are found across all of the branches of the hypothesized “Eurasiatic” family. It goes without saying that the mutual comprehension of this passage would be nil between speakers of Georgian, Chukchi, Sakha, Tamil, and Udmurt, to name just a few languages in this hypothetical family.

Brown’s article in The Washington Post continues:

“That’s because all of the nouns, verbs, adjectives and adverbs in the four sentences are words that have descended largely unchanged from a language that died out as the glaciers retreated at the end of the last Ice Age. Those few words mean the same thing, and sound almost the same, as they did then.”

Just how wrong this claim is can be seen from the example of just one of the 23 “ultraconserved words”: man. First consider its history merely within English over the past millennium and a half. Today, it is pronounced /mæn/ and means one of two things: either ‘an adult male person’ or ‘a person of either gender’. The latter meaning, however, is considered sexist by many, and is thus falling out of use. Words such as chairman, fisherman, and policeman are thus being replaced by such gender-neutral forms as chairperson, fisher, and police officer, just as mankind is yielding to humankind. But the gender-neutral meaning of man is still evident in manslaughter and in the phrase no man’s land. As it turns out, the meaning of ‘an adult male’ is relatively new. In Old English (roughly, prior to the Norman invasion of 1066), this word (pronounced then with a vowel articulated further back in the mouth) did not mean a ‘male person’ but had only the gender-neutral sense of ‘a human being, person (male or female)’. The word acquired the sense of ‘adult male’ in Middle English. Prior to that time, an adult male was a wer, as distinguished from a wif, which then meant ‘woman (of any marital status)’, as it still does in idiomatic expressions like old wives’ tale and in the compound midwife, originally meaning ‘with woman (during labor)’. The word wer began to disappear in the late 13th century and was eventually replaced by man, which retained its old, more general meaning as it acquired the new, gender-specific one. (The term wer did survive, however, in such terms as “werewolf,” which makes one wonder whether a female lycanthrope should be referred to as a “wifwolf”.) Note also that the Old English man had additional meanings besides ‘person’, including ‘servant, vassal’, as in all the king’s horses and all the king’s men (we retain this meaning to this day).
Thus, clearly the meanings of even “ultraconserved words” show considerable change over much shorter periods than 15,000 years.

Pronunciations of such core terms change too, as I indicated above with the shift in vowel articulation in man through the history of English. Within the Germanic branch of the Indo-European family, the reflexes of the reconstructed ancestral Proto-Germanic form *manwaz include Old Norse maðr, Danish mand, Gothic manna. In other Indo-European branches we find Sanskrit (Indic) manuh, Avestan (Iranian) manu-, Old Church Slavonic (Slavic) mozi. The latter is related to the Russian form muzh, found in the Russian version of the odd “Stone Age” passage above. This plethora of phonological forms in related languages is a result of sound changes, different in each family.

The list of the “ultraconserved words” in the PNAS article itself contains quite a few surprises, even if we restrict ourselves to the 1,500-year-long history of English rather than the supposed 15,000 years of shared “Eurasiatic” history. Among those oddities are thou and ye, both of which changed their meaning (and form, in the case of ye), switching from informal to formal. Another surprise is not, a word of recent pedigree as a negative particle (Pagel et al. incorrectly call it an “adverb”). Not began its career in the mid-13th century as an unstressed variant of the emphatic noht/naht ‘in no way’, not unlike pas in the modern French two-part ne… pas negation. In fact, both English and French are undergoing the so-called “Jespersen’s Cycle” (named after the Danish linguist and Anglicist Otto Jespersen). In the first stage of this cycle, negation is expressed by a single preverbal element; in the second stage, a postverbal emphatic element is added and made obligatory; and in the third stage, this postverbal emphatic element replaces the preverbal element, making the latter optional or eliminating it altogether. Thus, in Old English negation was expressed by a preverbal ne, as in ic ne seah (literally ‘I not saw’). In Middle English, the same sentence was expressed as I ne saugh noht (literally ‘I not saw nothing’). Finally, in Early Modern English (around the time of Shakespeare), this sentence became I saw not (eventually, lexical verbs stopped inverting around negation and the so-called do-support was introduced to give us the modern I did not see). In a parallel development, Old French had only the preverbal negation, as in jeo ne dis (literally ‘I not say’). In Modern Standard French both a preverbal and a postverbal element are obligatory, as in je ne dis pas (literally, ‘I not say nothing’), while in colloquial French, which represents Stage 3 of Jespersen’s Cycle, the preverbal ne is optional, so that je dis pas is perfectly acceptable.

All of these subtleties escape the authors of the PNAS paper, who ignore grammatical patterns and changes as much as possible. Even their assignment of some of the 23 “ultraconserved words” to “parts of speech” is flawed. For instance, they call the demonstratives this and that “adjectives”, though these words exhibit neither adjectival morphology nor adjectival syntax. For example, demonstratives are in complementary distribution with articles, quantifiers, and possessors, resulting in the ungrammaticality of *the this book, *every this book, and *John’s this book; whereas adjectives are perfectly capable of co-occurring with these elements, as in the interesting book, every interesting book, and John’s interesting book. Note also that Pagel et al. give different labels to who and what: according to them, the former is a “pronoun”, while the latter is an “adverb”. Yet these two words clearly share the same syntactic properties (except for the demonstrative-like use of what, as in What book?). Both who and what must appear at the beginning of a question (e.g. Who did you see? and What did you see?). But only one of them can occur at the beginning if both are present, as in Who brought what to the potluck party? and What was cooked by who? (ignoring the who/whom distinction).

While these issues may seem trivial or irrelevant to the larger considerations of the PNAS paper, they underscore the central issue, something repeatedly missed or consciously ignored by these authors and their collaborators (cf. Gray and Atkinson 2003, Bouckaert et al. 2012, and elsewhere): to wit, language is not merely words. The interchangeability of “words” and “language” is a neat conjuring trick, evident in the first sentence of the article’s abstract (highlighting mine):

“The search for ever deeper relationships among the World’s languages is bedeviled by the fact that most words evolve too rapidly to preserve evidence of their ancestry beyond 5,000 to 9,000 y.”

Pagel and Atkinson’s search for family relationships among languages is set off course at the outset by looking in the wrong place. It has been understood at least since Antoine Meillet’s work a hundred years ago that grammatical properties are more reliable than words as indicators of familial relationships. As Meillet (1908: 126) noted, “Les coïncidences de vocabulaire n’ont en général qu’une très petite valeur probante” (“Coincidences of vocabulary are in general of very little probative value”). In recent years, the searchlight has been focused (by bona fide linguists, not evolutionary biologists) on abstract syntactic properties, establishing formal grammar as a population science; see, for example, the work of Giuseppe Longobardi and Cristina Guardiano (e.g. Longobardi & Guardiano 2009). Just as the biological classification of species, originally based on externally accessible characteristics, underwent a revolution on the grounds of progress in theoretical biology, namely the rise of molecular genetics, so too progress in the phylogenetic classification of languages must be based on progress in theoretical linguistics. In order to push the research frontier, we linguists need to identify the basic building blocks of language, its “atoms”, in Mark Baker’s memorable metaphor, and examine carefully how they play out in linguistic evolution. Looking for “words that survived since the last Ice Age”, in contrast, is a seductive but ultimately futile enterprise.



*Technically, the grouping proposed by Pagel et al. (2013) is different from the original extent of the Eurasiatic macro-family, as proposed by Joseph Greenberg, in that it does not include Nivkh, but does include the Kartvelian and Dravidian families. Nor is this grouping co-extensive with the Nostratic macro-family, as proposed by Vladislav Illič-Svityč and Aaron Dolgopolsky: Pagel et al.’s grouping includes Chukchee-Kamchatkan but does not include Afroasiatic. It is also noteworthy that most linguists find the Altaic family to be deeply problematic.

  • Belles Lettres

    There used to be excellent scholars like Jasanoff in America. What a pity. But I liked the megamath at the end of the article. Very funny.

    By the way, the etymology of ‘man’ as cited in your article is a lookalike hoax itself. I’d say, the etymology of the Germanic man-word is at best unclear among experts.

    The manwaz-reconstruction dates back to Hirt and was accepted ad hoc for a long time, because it seemed to fit smoothly to Vedic mánu- (and mánus-) which it actually doesn’t. It was rejected as soon as scholars took a deeper look at it (Streitberg, Hardarson, Bammesberger, Seebold and others). Actually, there is nothing in Germanic which could lead us to PG manwaz, if you don’t believe in the Germanic godfather Mannus.

    There is a much better explanation for the Germanic forms: PIE *dʰeģʰōm >> PG gumōn ‘earthling → person’ (as gumo and like Latin homō) >> gmon- > hysterokinetic 1. strong mon-ḗn-∅ > nom. manna (and all n-forms), 2. weak mon-n-éz > gen. mans (and all root forms).

    OCS. мѧжь ‘man (for Greek ársēn and Latin masculus), husband’ is rarely discussed, but it must come from a different inflection (*man-g-jō-s). I’d like to know if there is Lithuanian zmonés for real. Apparently, all Slavonic forms have been strictly male for a long time. The meaning ‘human person’ was reduced to ‘male person’ in Germanic several times, too, in relation to the habit to express female persons as women which explains why the etymon has split several times.

    Obviously, the authors of the article didn’t take into consideration that the man-etymon is the word for human being or man in Germanic only (and maybe Latin), but even in Vedic, manu- is not the standard expression for those two meanings. PIE-people wouldn’t have understood English ‘man’ or even Gothic ‘manna’, that’s for sure. But as far as I can see, the authors didn’t operate with real words anyway. They went straightforward to statistics.

    Where does this urge come from to go beyond PIE and create a Eurasian super family? There is nothing to gain out there for scientists. Many people have died trying for many years and no one ever found one single solid sound equation.