Meeting Darwin’s Last Challenge: Matching Genes, Languages, and Geography (Part 3)

Oct 4, 2015 by

In my previous posts (see here and here), I analyzed Svetlana Burlak’s critique of Longobardi et al. (2015), posted on the Генофонд.рф website. The second critical response posted there is by geneticist Oleg Balanovsky (“Two similar studies arrived at opposite conclusions”). The gist of Balanovsky’s critique is that the results of Longobardi et al.’s study do not coincide with those in Novembre et al. (2008). The latter study shows “a close correspondence between genetic and geographic distances; indeed, a geographical map of Europe arises naturally as an efficient two-dimensional summary of genetic variation in Europeans”. In contrast, Longobardi et al. claim that genes correlate better with language (especially its syntactic component) than with geography. Note, however, that the two claims are not necessarily incompatible: it is possible that while genes are closely correlated with geography, they are even more closely correlated with language. In fact, Balanovsky himself concedes this point (translation mine):

“True, the Novembre article showed a surprisingly close connection between gene pool and geography, but a theoretical possibility remains that the connection with linguistics is even closer. Still, it is surprising and even worrying that two studies, both relying on similar data (variability in hundreds of thousands of autosomal markers in European populations), arrive at opposite conclusions.”

In principle, this is a valid point. Yet the seven years that separate those two articles, together with the wealth of new genetic data and the advances in genetic testing and computational data processing that came in that period, make me more willing to place my bets on Longobardi et al. than on the earlier article by Novembre et al.

I am particularly surprised to see this critique coming from Oleg Balanovsky, since Novembre et al.’s article, which he appears to trust more than that of Longobardi et al., contradicts his own findings. Consider the image above, reproduced from Novembre et al.’s article, and particularly the position of Slovaks on the two-dimensional diagram of genetic distances (the ochre circle in the bottom-right quadrant, marked “SK”). According to this study, Slovaks are genetically closest to Italians, Greeks, and Cypriots, in descending order. Yet these findings are not in accordance with a recent study by Balanovsky and his team, which he discusses in his response to A.A. Klyosov’s critique. Balanovsky’s study shows Slovaks to be genetically closest to Poles and Hungarians, and to a lesser extent to Ukrainians and Russians. In other words, according to Balanovsky, Slovaks’ genetic links extend north and east, not south. Moreover, according to Balanovsky, Italians, Greeks, and Cypriots are among “populations genetically drastically distinct from Slovaks” (translation mine); in fact, he finds these southern European populations to be no closer to Slovaks than the Irish, the Finns, or the Portuguese, whom Novembre et al. find to be the most genetically distinct from Slovaks. (Curiously, both Novembre et al.’s and Balanovsky’s studies show the Czechs to be genetically very distinct from Slovaks. Since neither Czech nor Slovak was included in Longobardi et al.’s study, it is impossible to tell whether the similarity of their languages, coupled with their distinct gene pools, would present a problem for the proposed correlation between languages and genes.)


Another reason I am surprised at Balanovsky’s critique of Longobardi et al.’s conclusion that genes correlate more closely with language than with geography is that the same language/gene correlation was found in Balanovsky et al.’s 2011 study of the Caucasus region (discussed briefly in my earlier post). As can be seen from the image on the left, reproduced from Oleg Balanovsky’s doctoral dissertation, each distinct language family in the North Caucasus (Adygo-Abkhazian, Ossetian, Nakh, and Dagestanian) correlates with a particular predominant Y-haplogroup. While Balanovsky mentions the conclusion of his Caucasus study in his critique of Longobardi et al., he suggests that Europe differs from the Caucasus in that in Europe geography rather than language is expected to correlate more closely with genetics. It is not clear to me, however, why that should necessarily be so.


The most notable group in Europe that is distinctive linguistically but not genetically is the Hungarians: their language belongs to the Finno-Ugric rather than the Indo-European family, but their genetic profile is very similar to that of their neighbors. For example, according to Novembre et al., Hungarians are genetically closest to (in descending order) Czechs, Slovenians, Croatians, and Austrians. Longobardi et al.’s study does include Hungarian, and indeed their syntactic and genetic trees reflect its differing positions in the two domains: Hungarian clusters with Finnish in the syntactic tree, yet Hungarians cluster with the Serbo-Croat-speaking and Romanian populations in the genetic tree. Hungarians are thus a striking exception to the overall pattern, though apparently not one sufficient to destroy the correlation between languages and genes. The other two European populations in Longobardi et al.’s study whose languages are distinctive are the Finns and the Basques, but as most genetic studies show, these populations are also genetically distinctive.

There are two other interesting populations in Europe whose languages and gene pools do not go together, both missing from Longobardi et al.’s study: Estonians and the Gagauz. The former speak a Finno-Ugric language, like the Finns, but it is not clear whether they are genetically closer to the Finns or to the other peoples in the region (primarily Latvians and Lithuanians). I am not aware of any genetic studies that answer that question conclusively (for example, Balanovsky’s recent study of Baltic and Slavic genes and languages, discussed in my earlier post, does not include Estonians). The Gagauz speak a Turkic language, more precisely one in the Oghuz branch of the Turkic family, whose closest relatives include Azerbaijani, Turkish, and Turkmen (note that the populations speaking those languages are neither geographically nor genetically close to the Gagauz). Yet genetic studies find the Gagauz to be more closely related to neighboring southeastern European groups than to linguistically related Turkic-speaking groups (cf. Nasidze et al. 2007, Varzari et al. 2009). Perhaps if enough such counterexamples were included in a study, the alleged correlation between languages and genes would appear weaker. (Including Hungarian, Estonian, and Gagauz in the lexical comparison would also allow researchers to test whether the lexicons of these languages are more permeable than their syntax to borrowings from surrounding languages.)


In conclusion, I’d like to point out a fascinating connection between the findings of Longobardi et al.’s study and the intriguing ideas proposed by Baker (2001, 2003). If, as Longobardi et al. claim, linguistic diversity correlates closely with genetic diversity among humans, the next obvious question is why. Baker conjectures that differences in language, including parametric differences (i.e. different values of syntactic parameters), are an overt and easily identifiable marker of genetic differences. After all, people typically acquire their mother tongue from those to whom they are genetically related. As a result, people tend to speak the language of “our people”, and such a distinction between “us” and “them” typically determines whom one fights, whom one marries, and with whom one shares information (hence Baker’s notion of “language as a cipher”). Thus, although Longobardi and Baker work with different notions of what syntactic parameters are and how many of them there are (for a detailed discussion, see Pereltsvaig 2015), Longobardi’s computational results and Baker’s thought experiments are in surprising harmony.





Baker, M. (2001) The Atoms of Language. Basic Books.

Baker, M. (2003) Linguistic differences and language design. TRENDS in Cognitive Sciences 7(8): 349-353.

Balanovsky, O.; et al. (2011) Parallel Evolution of Genes and Languages in the Caucasus Region. Molecular Biology and Evolution 28(10): 2905–2920.

Longobardi, G.; et al. (2015) Across Language Families: Genome Diversity Mirrors Linguistic Variation Within Europe. American Journal of Physical Anthropology.

Nasidze, I.; et al. (2007) The Gagauz, a Linguistic Enclave, are not a Genetic Isolate. Annals of Human Genetics 71(3): 379–389.

Novembre, J., Johnson, T., Bryc, K., Kutalik, Z., Boyko, A. R., Auton, A., … & Bustamante, C. D. (2008) Genes mirror geography within Europe. Nature 456(7218): 98-101.

Pereltsvaig, Asya (2015) The Functional Structure of the Nominal Domain. In: Antonio Fábregas, Mike Putnam & Jaume Mateu (eds.) Contemporary Linguistic Parameters. London: Bloomsbury. Pp. 303-331.

Varzari, A.; et al. (2009) Searching for the Origin of Gagauzes: Inferences from Y-Chromosome Analysis. American Journal of Human Biology 21(3): 326–336.
