Are Words Just Like Genes?

[Thanks to Rory van Tuyl for drawing my attention to Morell’s article and to Martin Lewis for many enlightening discussions of these issues and for helpful editorial suggestions]

In an article published on September 19, 2014 in Science Magazine, journalist Virginia Morell sings the praises of Russell Gray, whose “key insight”, she claims, is that “words are just like genes, in that they resemble each other because of shared ancestry”. In this one sentence, Morell ascribes to Gray a claim that is as patently wrong as saying that the Earth is flat, while making it sound as if he claimed as his own an idea that was first advanced nearly 200 years before he was born. If I were Russell Gray, I would be writing a refutation to be published in the next issue of Science Magazine. As I am not him, I will address Ms. Morell’s claims in this post.

First, the idea that words may resemble each other across languages was originally developed in the writings of 18th-century British philologists, mostly well-educated amateurs whose main interests lay elsewhere, such as James Parsons (1705-1770) and Sir William Jones (1746-1794). In his 1767 book The Remains of Japhet, being historical enquiries into the affinity and origins of the European languages, Parsons noted that words for the numerals 1-10 across many languages of Europe and India have similar shapes: duo in Latin, two in English, dva in Russian, dvi in Bengali etc. Yet in other languages, these words sound completely different: iki in Turkish and shnayim in Hebrew. The same is true of other numerals in these languages: take ‘five’, which is quinque in Latin, five in English, pjat’ in Russian, and pac in Bengali, but bes in Turkish and xamisha in Hebrew. Although the words for ‘five’ in Latin, English, Russian, and Bengali are less obviously similar to each other than those for ‘two’, it can be shown that they derive from the same ancestral form via systematic sound changes that applied in the histories of these languages. The Turkish and Hebrew words, however, are not related to them. Parsons’ discovery of the Indo-European language family was recapitulated nicely in Jones’ renowned “Philologer passage”, cited in virtually every textbook on historical linguistics (highlights mine):

“The Sanskrit language, whatever be its antiquity, is of a wonderful structure; more perfect than the Greek, more copious than the Latin, and more exquisitely refined than either, yet bearing to both of them a stronger affinity, both in the roots of verbs and in the forms of grammar, than could possibly have been produced by accident; so strong, indeed, that no philologer could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists. There is a similar reason, though not quite so forcible, for supposing that both the Gothic and Celtic, though blended with a different idiom, had the same origin with the Sanskrit; and the old Persian might be added to the same family.”

By the late 19th century, the idea that words may resemble each other because they derive from the same ancestral form was so commonplace that Charles Darwin wrote in The Descent of Man: “If two languages were found to resemble each other in a multitude of words […], they would be universally recognized as having sprung from a common source”.*

While common descent is an important reason for words to be similar across languages, it is not the only one. It was also Parsons who first noticed that not all such lexical similarities derive from shared ancestry: for example, the Malay word for ‘two’, dua, is completely unrelated to those in the Indo-European languages mentioned above, despite the apparent similarity in sound (cf. Latin duo). Such accidental look-alikes are not as rare as one might think. My favorite example is Russian strannyj and Italian strano, both of which mean ‘strange, weird’. Their similarity is even more apparent if we omit language-specific inflectional morphology: Russian strann- and Italian stran-. Yet these two words are not etymologically related (and the presence of a second –n at the end of the Russian stem is indicative of that).

Besides common descent and pure accident, words may resemble each other because of borrowing from one language into another. Lexical borrowing is a common result of even the briefest and shallowest language contact. For example, in the period from the 1960s to the 1980s, English borrowed from Russian sputnik and perestroika, while Russian (in particular, Russian hippie slang) borrowed from English shuzy (‘shoes, typically Western-made runners’) and truzera (‘trousers, typically Western-made jeans’) along with many other words. Yet there was very little face-to-face contact between Russian and American (or British) citizens at that time. Russian hippies were the last people the Soviet authorities would allow to meet a foreigner, if they could help it. But languages seem to “inhale” words from each other just as easily as people inhale air.

Borrowed words, besides being extremely common, are not always easy to distinguish from words that are similar by virtue of common descent. While words denoting cultural concepts such as foods, religious terms, and technological innovations are most commonly borrowed from one language into another (e.g. Russian spagetti from Italian, evangelie from Greek, and kompjuter from English), even words for the most basic concepts can be transferred across languages. For example, the Yiddish kinship term almone ‘widow’ (an important concept in Jewish law, the halakha) was borrowed from Hebrew, whereas some other kinship terms, such as bube ‘grandma’ and zeyde ‘grandpa’, were derived from Slavic.


Crucially, lexical borrowing can conceal familial relationships among languages, as is clear from the following example, discussed in more detail in an earlier post. As can be seen from my map on the left, lexical borrowing can easily cross family boundaries. For example, the Latin root for ‘onion’, which gave us the Italian cipolla, was borrowed into some Germanic and Slavic languages (cf. German Zwiebel, Ukrainian tsibulja), as well as into such non-Indo-European languages as Basque (tipula) and Finnish (sipuli).** Similarly, the French oignon was borrowed into Celtic and some Germanic languages (Irish oinniún, English onion). The original Germanic root for ‘onion’, replaced in English and German by Romance imports, but retained in North Germanic languages (Swedish lök), was also borrowed into Slavic languages, giving rise to Russian luk. (This latter borrowing probably occurred through a now-extinct East Germanic language, Gothic.)

Lexical borrowing is also vital to the understanding of why words are not “just like genes”. While gene transfer between species (or even larger biological groupings) is not impossible, it is extremely rare. Languages, in contrast, can easily “soak up” words from many other languages, including unrelated ones, with even minimal contact among them. To be sure, some languages, including English, seem more eager than others to take on words of other languages. Yiddish is another example of a language that has taken numerous words from Hebrew, Slavic languages, and even historical Judeo-Romance dialects known collectively as Loez. A recent Languages Of The World post discussed Max Weinreich’s famous example sentence noxn benčn hot der zejdə gəkojft a sejfər ‘after the blessing grandfather bought a holy book’, which contains a Loez-origin root benč- (cf. Latin benedicere ‘bless’), a Slavic-origin zejdə (cf. Russian ded ‘grandfather’) and a Hebrew-origin sejfər (meaning ‘book, scroll’ in Hebrew but restricted to ‘Jewish holy book’ in Yiddish). If words were “just like genes”, we would expect mixed biological creatures of the type described in the children’s book Croc-gu-phant, part crocodile, part jaguar, and part elephant. Yet such creatures do not exist in the biological world.

Yiddish is not the only “croc-gu-phant” of the linguistic world. Another such shmeltsshprakh (“fusion language”), to use Max Weinreich’s term, is Papiamentu, a language of the ABC Islands in the Caribbean, which combines elements of Iberian Romance (Spanish and Portuguese) with those derived from Dutch and African languages. Typically, which elements of such a mixed language come from which “parent language” is determined in a systematic way. For example, Media Lengua combines Spanish vocabulary with bound morphemes from Quechua. Similarly, Mednyj Aleut, a nearly extinct language spoken on Bering Island, blends Aleut nouns and Russian verbs, each with the full inflectional complexity of the source languages. Perhaps the best-known example of a mixed language is Michif, spoken by small communities in Canada and in North Dakota, which combines nouns and nominal bound morphemes from French with verbs and verbal bound morphemes from Cree, an Algonquian language.


In addition to being easily transferable from language to language (and often subject to conscious choice), words are not subject to positive or negative selection, unlike genes (or more accurately, alleles of a gene). For example, a new allele that emerged in certain human populations allowed adult humans to digest milk and so was positively selected, as it conferred major nutritional advantages once cows were domesticated.*** No such positive selection ever happens to newly introduced words, which are subject to historical contingencies and random factors like fashion. Neither is there an analog of negative selection in language. This is because words provide no adaptive advantage to people(s) who have them. This is true of both the sounds and the meanings of words. As was noted by the “father of modern linguistics” Ferdinand de Saussure, the association of sound and meaning in a word is largely arbitrary: the sound of house is neither more appropriate to the concept nor better for the “survival of the fittest” than maison (French), dom (Russian), bayit (Hebrew), or iglu (Inuktitut). Similarly, having two separate words for ‘hand’ and ‘arm’ (or ‘foot’ and ‘leg’) is neither better nor worse, nor better adapted to any particular physical environment, than having a single umbrella term (cf. Russian ruka ‘hand/arm’, noga ‘leg/foot’), as can be seen from the WALS map reproduced on the left. Certainly, having an umbrella term, as in Russian, makes it easier and less cumbersome to describe a disease that afflicts one person’s foot and another person’s leg (cf. Russian U nix boljat nogi lit. ‘to them ache feet/legs’), but English speakers manage.

To recap, words are not “just like genes” in that they are easily borrowed from language to language, even across family boundaries, are subject to conscious choice, and are not subject to natural selection. Linguistic evolution, in other words, is only vaguely similar to biological evolution—and blindly transferring concepts, principles, and methods from biology into linguistics can lead to nothing but trouble.

These issues are discussed further in the forthcoming book co-authored by Martin W. Lewis and myself, The Indo-European Controversy: Facts and Fallacies in Historical Linguistics. Stay tuned!



*The missing part of the quote makes reference to grammatical similarities across languages, which too may be indicative of common descent. As this post is concerned with words, I am leaving this issue aside.

**The (Eastern) Yiddish word cibələ ‘onion’ goes back to the same Latin root, which was first borrowed into Slavic and from there into Germanic. On the derivation of German Zwiebel, see here.

***In fact, two very similar alleles of the same gene emerged as a result of two separate mutations, once in Northern Europe and independently in Eastern Africa.


