Absolute Dating and the Romance Problems on the Bouckaert/Atkinson Model

May 24, 2014

As noted at the end of the previous post, Bouckaert et al.’s dating of the Romani split off the rest of the Indic tree at 1500 BCE (3,500 years ago) is a gross miscalculation. Linguistic evidence involving the grammatical gender system of Romani shows that the split must have occurred some 2,500 years later. But there is a larger issue here concerning the dates for the various splits on the Indo-European family tree. The dating procedure employed by Bouckaert et al. is based on two essential assumptions: that the rate of loss (or gain) of cognates is steady, and that certain key splits—the ones they have chosen as data—have been incontestably dated through historical records. But both of these assumptions are blatantly wrong.


The idea that replacements in the core vocabulary of any language happen at a regular pace was first explored in the 1950s by Morris Swadesh. Swadesh thought that such temporal consistency would allow the dating of language divergence, and he called the resulting study glottochronology. Parallels are often drawn between glottochronology and an absolute dating technique used in evolutionary biology that is known as the “molecular clock”. However, the latter technique was not established until the 1960s. Instead, the idea of a constant rate of lexical change must have been inspired by earlier work on radioactive decay. Swadesh compiled a list of lexical items that he assumed to be resistant to borrowing; the original list had 200 words, but the version most widely used today has only 100 items. Bouckaert’s team, however, went back to the original 200-item list.

According to Swadesh, the rate of change in that most conservative part of the lexicon is about 14% per millennium. However, unlike sub-atomic changes or mutations in genes, which happen at random, changes in language are often precipitated by extra-linguistic, social factors. Lexical replacements in the core vocabulary, as elsewhere, can therefore happen in waves, and the rate of replacement may differ radically from one language to another. For example, Bergsland & Vogt (1962) convincingly demonstrated a “rate of change” in conservative Icelandic of around 4% per millennium, as compared to the 20% rate found in closely related Norwegian. Russian linguist Sergei Starostin, however, showed that if loanwords are eliminated from the calculations, the rate of change for Norwegian comes down to the expected rate of 5–6 “native” replacements per millennium. Yet even in the core vocabulary (the Swadesh list of 100 words assumed to be the least likely to be replaced), borrowing remains significant: it has been shown that English has borrowed 31 items in the 100-word Swadesh list, Turkish 22, French 27, and Albanian a whopping 41 (see McMahon 2010). While separating the husk of loanwords from the grain of “native” vocabulary is crucial to making glottochronology feasible, the procedure is never easy and is sometimes all but impossible, especially for languages whose history is known imperfectly (computational methods for identifying loanwords are discussed in an earlier post).
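To make the sensitivity to the assumed rate concrete, here is a minimal sketch in Python of the formula usually given in textbook presentations of glottochronology, t = ln(c) / (2 ln(r)), where c is the fraction of list items two languages still share and r is the fraction retained per millennium. The 70% figure for shared vocabulary is a made-up illustration, not a measurement, and the function name is hypothetical; the three loss rates are the 4%, 14%, and 20% figures mentioned above.

```python
import math

def divergence_time_millennia(shared_fraction, retention_per_millennium):
    """Textbook glottochronological estimate: t = ln(c) / (2 ln(r)).

    The factor of 2 reflects the assumption that both daughter languages
    replace vocabulary independently and at the same constant rate.
    """
    if not 0 < shared_fraction <= 1:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0 < retention_per_millennium < 1:
        raise ValueError("retention_per_millennium must be in (0, 1)")
    return math.log(shared_fraction) / (2 * math.log(retention_per_millennium))

# Hypothetical pair of languages sharing 70% of the list (illustrative only).
shared = 0.70
rates = [
    ("4% loss per millennium (Icelandic-like)", 0.04),
    ("14% loss per millennium (Swadesh's figure)", 0.14),
    ("20% loss per millennium (Norwegian-like)", 0.20),
]
for label, loss in rates:
    t = divergence_time_millennia(shared, 1 - loss)
    print(f"{label}: about {t:.1f} millennia since the split")
```

With the same 70% overlap, the estimated age of the split ranges from under one millennium to more than four, depending solely on which rate is assumed.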

Bouckaert et al. follow current lexicostatistical norms in attempting to alleviate the problems associated with inconstant lexical change by adopting a calibration technique based on known dates.* Instead of calculating the dates of all divergence points on a language tree based on some presupposed constant rate of change (e.g. 5% or 14% per millennium) from the present day backward in time, such studies peg some of the splits to clearly dated historical events. Having thus put a time scale on the family tree, they calculate the dates of other splits in relation to those already known.
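The logic of calibration can be sketched with a deliberately simplified, single-rate version of the same formula. This is not the Bayesian method Bouckaert et al. actually use; it only illustrates why a wrong calibration date propagates to other splits. All function names and the shared-vocabulary fractions below are hypothetical, and the 1.2-millennium alternative is an arbitrary stand-in for "a later separation".

```python
import math

def retention_from_calibration(shared_fraction, known_age_millennia):
    """Invert t = ln(c) / (2 ln(r)) to get r = c ** (1 / (2 t))."""
    return shared_fraction ** (1.0 / (2.0 * known_age_millennia))

def age_from_retention(shared_fraction, retention):
    """Date a split from shared vocabulary and a retention rate."""
    return math.log(shared_fraction) / (2.0 * math.log(retention))

def date_split(target_shared, calib_shared, calib_age_millennia):
    """Date one split using the rate implied by a calibrated split."""
    r = retention_from_calibration(calib_shared, calib_age_millennia)
    return age_from_retention(target_shared, r)

# Hypothetical overlaps: 60% at the calibrated split, 40% at the target split.
calib_shared, target_shared = 0.60, 0.40

# 1.74 millennia is roughly the time elapsed since 270 CE;
# 1.2 millennia stands in for a hypothetical later separation.
for calib_age in (1.74, 1.2):
    t = date_split(target_shared, calib_shared, calib_age)
    print(f"calibration at {calib_age} millennia -> target split at {t:.2f} millennia")
```

In this toy model the inferred age of the target split scales in direct proportion to the calibration age, so moving the calibration point by several centuries moves every date that depends on it.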

Establishing such calibration points, however, generally proves to be more complicated than it sounds. As it turns out, Bouckaert et al. provide erroneous dates for numerous divergence points. We have already pointed out the miscalculation of the date of the Romani split by some 2,500 years. Another wildly misplaced divergence point concerns the split of the Insular Celtic languages into the Goidelic and Brittonic branches (the former including Irish and Scottish Gaelic, and the latter including Welsh, Cornish, and Breton). The trees given in the authors’ Supplementary Materials (Fig. S1 and S2) indicate this split as occurring about 900 BCE (2,900 years ago); strangely enough, the map frame for 692 BCE from the animation on the authors’ website does not yet show this split, although the front of the Celtic advance is shown as already firmly established in the southern British Isles. Putting aside this inconsistency between the different multimedia presentations of the study’s results (one of many), this date is too early as well; according to the archeological record, the ancestors of the various Celtic groups were living at the time in Austria and Bavaria. It would have helped if the authors had included Continental Celtic varieties such as Gaulish in their sample. This omission throws off the dates of the various divergence points inside the Celtic cluster by centuries.

Some of the dating errors are apparently due to assuming incorrect calibration points. A prime example is the alleged separation of Romanian from the remaining Romance languages, placed by the authors in 270 CE, when Dacia ceased to be part of the Roman Empire. (Once again, the various presentations of their results do not coincide: the trees in the Supplementary Materials indicate that the split occurred about 2,000 years ago, and their animated map frame for 18 CE indicates the split via the thick green lines beginning to diverge in the area of Rome.) However, as pointed out by LanguageHat reader Etienne, the 270 CE date

“is not an established datum. The issue as to whether the Romanian language directly stems from the Latin of the Roman province of Dacia or from the Latin of a later group of migrants (whose point of origin must have been South of the Danube) has been fiercely debated over several generations. If the latter scenario (a later migration to Dacia) is true, then of course the date of separation between Romanian and its Romance sisters must be later, indeed perhaps much later, than 270 CE.”

One argument in favor of the latter scenario is that the Romans occupied Dacia for only about 170 years before abandoning it; this does not look like a long enough period for Latin to have taken hold so firmly. In this respect, Romania can be compared to Britain, which was occupied by Rome for much longer but did not ultimately become Romance-speaking. Because the first preserved Romanian text dates only from the 16th century, we do not have direct evidence as to its early stages. But indirect evidence in support of the later-arrival theory can be pieced together based on Romanian dialects. As Etienne further explains (in personal communication), “the core problem relates to Romanian dialects: they are much too homogeneous”, as compared, for example, to the dialects of northern France: across northern France mutual intelligibility between non-adjacent dialects was until recently practically nil, whereas in Romania mutual intelligibility across dialects is far greater. Given that northern France and Romania are about the same size, one would expect the dialect differentiation in the two areas to be roughly comparable (or perhaps one would anticipate finding even more homogeneity in northern France, which has no large mountain chain like the Carpathians). Of course, the surprising uniformity of Romanian dialects is easily explainable if the separation of Romanian postdates rather than predates the separation of French from the original Latin.


It is also significant that the Roman province of Dacia corresponded to less than half the territory of present-day Romania: for example, the Bucharest region, whose dialect is the basis of standard Romanian, never belonged to the Roman Empire. If Romanian goes back to the Latin of Roman Dacia, as the 270 CE date of the separation presupposes, it must have spread outside of the Empire into lands that were not part of Dacia. But if this is the case, we would expect to find greater dialectal differentiation in the “old Romanian territories” compared to the lands in which the language spread later. Sadly for Bouckaert et al., this expectation runs contrary to fact, as not only are Romanian dialects very homogeneous, they are equally homogeneous across all parts of the country. Such basic facts of Romanian dialectology, however, make perfect sense if Romanian spread over present-day Romania and Moldova long after the fall of the Roman Empire from some other place, most likely from south of the Danube. As is the case for the early split of Romani, the incorrect classification of Slavic languages, and the incorrect depiction of a closer tie of Frisian to Flemish and Dutch than to English (discussed in an earlier GeoCurrents post), Bouckaert et al.’s false assumption regarding Romanian probably derives from the fact that although it is clearly a Romance language, Romanian nonetheless has a heavy Slavic influence in the lexis, including items in the Swadesh 200-word list, such as trăi ‘to live’ (from the Slavic trajati ‘to last, continue’), lovi ‘to hit’ (from the Slavic loviti ‘to hunt, chase’), and zăpadă ‘snow’ (from the Slavic zapadati ‘to fall’).

Quentin D. Atkinson, one of the authors of Bouckaert et al., claimed in a podcast interview with Science’s Isabelle Boni that they “were able to find 14 different known divergence times on the tree, which we used to calibrate the rate of change”. Given that the calibration point for Romanian is unsound, the others must be regarded with skepticism as well; they will be examined in later posts.

More generally, Bouckaert et al. appear to make an assumption—false, as I argue below—that the split of the Romance branch into individual languages coincides with the fall of the Western Roman Empire. In the case of Romanian, this date is probably too early, but in the case of the split of the Italo-Western Romance branch (after Sardinian and Romanian split off) into individual languages, the date is probably too late. Vulgar Latin, the actual speech of the common people during the late Roman Empire, exhibited pronounced differences over both time and space. While mentioned by the ancients, Vulgar Latin was never transcribed or described in detail. Nonetheless, it is attested in Late Latin texts, especially those that condemn linguistic “errors” in spoken Latin, such as Appendix Probi, a work written in the third or fourth century CE. Moreover, some literary works written in a lower register of Latin—especially the dialogues of the comedies of Plautus and Terence, many of whose characters were slaves—provide a glimpse into the world of Vulgar Latin in the classical period.

Another important source of evidence about Vulgar Latin comes from inscriptions, such as the ones discovered in Pompeii, which were buried in volcanic ash in 79 CE, nearly 400 years before the fall of the Roman Empire. In those inscriptions, we find that the Latin ending -t, found on verbs in the third person singular, is often left out. It is assumed that its variable presence in spelling means that in the spoken Latin of Pompeii, the sound had already been dropped. As it happens, this change corresponds to the history of the Italian language: where Latin had venit for ‘he/she comes’, Italian today has viene. However, this change, well-attested though it is in Pompeii, did not affect all Romance languages simultaneously. To this day, some varieties of Sardinian still keep a final -t in these and other verbs. French, in some ways the most innovative Romance language, kept the -t in vient (as the spelling indicates), pronounced well into the 16th century. Indeed, it is still pronounced under some circumstances when a following word begins with a vowel (thus, the -t is pronounced in Vient-il? ‘Is he/she coming?’). Finally, in the earliest known texts in a Romance language from the Iberian Peninsula (short poems in Arabic script) we find a final -t (spelled with the Arabic letter dad) on third-person verbs. Thus, we have good evidence that the loss of the third-person-marking -t that took place in the Latin of Italy had not spread to the Latin of Sardinia, Gaul (today France) or the Iberian Peninsula (Spain and Portugal) before the fall of the Empire. This clearly indicates that the differentiation of Latin into the various Romance languages had begun before the fall of the Western Roman Empire.

One of the main forces behind the diversification of Vulgar Latin was probably the Roman army, as conquering soldiers were the first ones to bring Latin to the far corners of the growing Empire. Joseph B. Solodow in Latin Alive (p. 36) recounts a story from the historian Tacitus, describing

“a vivid scene during a military campaign. Two brothers belonging to a Germanic tribe stand on opposite banks of a river and spiritedly debate the proper stance to be taken towards the Romans… And when [Tacitus] mentions that the debate was conducted mostly in Latin and explains that the brothers had learned the language through military service with the Roman army, he indicates one way by which familiarity with Latin spread among native people.”

The Romans encouraged Latin-based education in the provinces, especially for the children of the elites; “Britain, Gaul, Spain, and north Africa were soon producing distinguished orators and writers, teachers and scholars” (Solodow, p. 37). Latin was viewed as the cement holding the empire together, “a peaceful bond” (Augustine, On the City of God, 19.7). As Latin gained ground throughout the empire, first in its cities and then in the countryside, a significant number of the conquered people became bilingual. But such massive learning of Latin as a second language resulted in non-native speakers introducing patterns and constructions from their native tongues. Children who grew up in such linguistically mixed communities incorporated some of the non-Latin patterns into their otherwise Latin speech. Such substratum influences are a well-known vehicle of language change, although individual instances are often difficult to prove (see McMahon 1994, pp. 221-222 for further discussion and examples). Changes were happening in variants of Vulgar Latin for other reasons as well, and gradually a plethora of dialects emerged.

It should also be noted that Bouckaert et al.’s graphic representation of language divergence through a family tree superimposed on the animated map is misleading. The authors state in the Supplementary Materials that their “phylogeographic model allows [them] to infer the location of ancestral language divergence events corresponding to the root and internal nodes of the Indo-European family tree” (p. 20). However, the resulting depictions are bizarre, as they show a given language group splitting in a particular place and then the daughter languages moving into new areas. For example, in the case of Romanian splitting off the Romance tree, the “location of ancestral language divergence events” is in the area of Rome, creating the impression that Romanian diverged from Latin in the center of the Empire, with speakers of (proto-) Romanian subsequently moving eastwards. Numerous other divergence events are mapped in equally unlikely locales.

Irish Dialects Today

In a typical situation, as a language expands geographically it simultaneously diverges linguistically, due both to drift and to contact with different neighboring languages. The diversification of English into British, Scottish, American, Australian, and other varieties worldwide (not always easily mutually intelligible) illustrates this process perfectly, as does the expansion of Indo-European languages in general from their relatively small homeland area. In other cases, however, languages (or dialects) can diversify without spatial expansion. A perfect example involves the diversity of Modern Irish dialects, which largely developed due to the shrinking of Irish-speaking communities and the lack, since the mid-17th century, of a formal authority (e.g. language academy) or a social body responsible for “managing the linguistic garden, killing the weeds of linguistic creativity” (as James McCloskey beautifully described it in his recent presentation on “The Dialect Geography of Irish Nonfinite Clauses” at UC-Berkeley). As a result, significant variation has developed over the past several centuries among dialects spoken from Donegal in the northwest to County Waterford in the south. The differences among dialects concern not only pronunciation but also grammatical structures; mutual intelligibility is now so low that McCloskey was accused of “speaking French” when he used his native Donegal dialect in southern Ireland!



*Unlike glottochronology, which assumes a constant rate of change for basic vocabulary items and attempts to estimate the dates of divergence from a common proto-language, lexicostatistics makes use of the comparative method (though without reconstructing a proto-language), has a wider range of applications, and need not rely on the assumption of a constant rate of change. A leading practitioner of lexicostatistics was Isidore Dyen—my former landlord, of all things!—who used these techniques to classify Austronesian and Indo-European languages.



Bergsland, Knut & Hans Vogt (1962) On the validity of glottochronology. Current Anthropology 3: 115–153.

McMahon, April (1994) Understanding Language Change. Cambridge University Press.

McMahon, April (2010) “Computational Models and Language Contact”. In Raymond Hickey (ed.) The Handbook of Language Contact. Pp. 128–147. Wiley-Blackwell.

Solodow, Joseph B. (2010) Latin Alive. The Survival of Latin in English and the Romance Languages. Cambridge University Press.

