Using Grammar to Identify Language Relatedness
[Note: This post is aimed mostly at my current students, but comments from other readers are welcome as well. It is in a way a “prequel” to the mini-series, starting here.]
Most textbooks that discuss language families and how such familial relationships are identified, as well as a significant proportion of research in the field, focus on lexical material. But can grammar also be used to examine language relatedness? The answer appears to be yes: a team of scholars from the UK and Italy led by Giuseppe Longobardi is exploring ways of doing just that. This post describes the basics of their approach and some of the project’s recent publications.
While many linguists realized long ago that grammar offers more reliable evidence of language relatedness than vocabulary does, finding just the right items to compare has proved elusive. Longobardi and his colleagues propose to use structural parameters, that is, binary points of difference between languages. For example, a given language may place objects before the verb or after the verb, but not both (particularly if only a restricted type of object, such as noun-phrase objects, is considered). Since all languages have verbs and objects, every language must make this choice one way or the other. Similarly, a language may place adverbs such as ‘often’ or ‘never’ before or after the verb. If we had a complete list of all parameters, we could describe each language as a list of settings of these parameters (e.g. Verb-Object, Adverb-Verb, etc. for English). Just as with vocabulary items, identical settings could then be counted between pairs of languages, or across larger sets of languages, and a phylogenetic (i.e. family) tree could be constructed. For example, consider the table in the image on the left, which includes three languages (Language A, Language B, and Language C) and four parameters: the order of the verb and the object, the order of the verb and the adverb, the possibility of subject-drop (i.e. pro-drop), and the order of the noun and its modifying adjective. Given these parameter settings (depicted here as ON/OFF switches, for presentation’s sake), Languages B and C can be deemed more closely related (3 identical parameter settings) than A and B (2 identical parameter settings) or A and C (1 identical parameter setting). In fact, the table was constructed with English as Language A, French as Language B, and Italian as Language C. Indeed, French and Italian (both Romance languages) are more closely related than either English and Italian or English and French (recall that English is a Germanic language).
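For readers who like to see the arithmetic spelled out, here is a minimal sketch of the counting procedure for the English/French/Italian table. It assumes a simple encoding of each ON/OFF switch as `True`/`False`; the parameter names and which value counts as "ON" are my own illustrative labels, not the project's actual coding scheme.

```python
from itertools import combinations

# Hypothetical labels for the four toy parameters; True/False stands in for ON/OFF.
# Which value is called "ON" is an arbitrary choice and does not affect the counts.
PARAMETERS = ("verb_before_object", "verb_before_adverb", "pro_drop", "noun_before_adjective")

SETTINGS = {
    "English": (True, False, False, False),  # VO, Adv-V, no pro-drop, Adj-N
    "French":  (True, True,  False, True),   # VO, V-Adv, no pro-drop, N-Adj
    "Italian": (True, True,  True,  True),   # VO, V-Adv, pro-drop, N-Adj
}

def identical_settings(a, b):
    """Count the parameters on which two languages have the same setting."""
    return sum(x == y for x, y in zip(a, b))

def pairwise_similarity(settings):
    """Map every pair of languages to its number of identical settings."""
    return {
        (l1, l2): identical_settings(settings[l1], settings[l2])
        for l1, l2 in combinations(settings, 2)
    }

for pair, shared in pairwise_similarity(SETTINGS).items():
    print(pair, shared)
```

Running it reproduces the counts from the table: English and French share 2 settings, English and Italian 1, and French and Italian 3.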
Unfortunately, we do not yet have a complete list of parameters. Therefore, in order to test the method in principle, Longobardi and his colleagues restrict themselves to a list of parameters pertaining to noun phrases, including the order of the noun and its modifying adjective, mentioned above. Their list includes 56 such parameters. Presumably, since these parameters operate on elements inside noun phrases, they are conceptually independent of parameters that pertain to elements in clauses (sentences), such as verbs, objects, adverbs, and the like. (This issue, as well as some examples of the 56 parameters used, is discussed in more detail in my earlier post.) Based on the set of 56 parameters, Longobardi and his team constructed a tree of 12 Indo-European languages, as shown on the right of the image reproduced here from their article. The tree on the left was constructed by comparing lexical items instead. While the two trees largely agree in their structure, one point is worth mentioning: the lexical tree shows Greek and Irish as outliers, not more closely related to any particular language in the set. The syntactic tree, in contrast, groups Greek with the Slavic languages in what can be understood as an “Eastern Indo-European” branch, while Irish is part of the “Western Indo-European” branch. This division of the surviving Indo-European languages into Eastern and Western groupings accords well with the classification adopted by most Indo-Europeanists.
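The post does not say which tree-building algorithm the team used, so the sketch below swaps in UPGMA, one of the simplest distance-based clustering methods, purely to illustrate how pairwise distances turn into a tree. It assumes that the distance between two languages is just the number of parameters on which they differ, reuses the toy three-language data from above, and ignores branch lengths; a real 56-parameter study would need a more careful distance measure and method.

```python
from itertools import combinations

# Same toy data as before: English, French, Italian on four binary parameters.
SETTINGS = {
    "English": (True, False, False, False),
    "French":  (True, True,  False, True),
    "Italian": (True, True,  True,  True),
}

def hamming(a, b):
    """Distance = number of parameters with different settings (an assumption)."""
    return sum(x != y for x, y in zip(a, b))

def upgma(settings):
    """Cluster languages with UPGMA and return the tree topology in Newick style."""
    # Each cluster is keyed by its Newick string and stores its size.
    clusters = {name: 1 for name in settings}
    dist = {frozenset((a, b)): hamming(settings[a], settings[b])
            for a, b in combinations(settings, 2)}
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        a, b = sorted(pair)
        na, nb = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        del dist[pair]
        for c in clusters:
            # Size-weighted average of the old distances to the merged cluster.
            dac = dist.pop(frozenset((a, c)))
            dbc = dist.pop(frozenset((b, c)))
            dist[frozenset((merged, c))] = (dac * na + dbc * nb) / (na + nb)
        clusters[merged] = na + nb
    (tree,) = clusters
    return tree

print(upgma(SETTINGS))  # → ((French,Italian),English)
```

The first merge joins French and Italian (distance 1), and English attaches afterwards, mirroring the Romance-versus-Germanic split discussed above.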
In addition to distances between languages on the basis of lexical items or grammatical parameters, the Longobardi team compared genetic and geographical distances between populations speaking these languages. Their conclusion, as stated in Longobardi et al. (2015), is this (boldface mine):
“We observed significant correlations between genomic and linguistic diversity, the latter inferred from data on both Indo-European and non-Indo-European languages. Contrary to previous observations, on the European scale, language proved a better predictor of genomic differences than geography. Inferred episodes of genetic admixture following the main population splits found convincing correlates also in the linguistic realm.”
The boldface part of the conclusion is important in light of earlier studies, such as Novembre et al. (2008), which have shown a close correlation between genetic and geographical distances among populations. In Longobardi et al.’s study, however, the correlation between linguistic distances inferred on the basis of grammatical parameters and genetic distances is 0.74, whereas the correlation between genetic and geographical distances is merely 0.275.
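The post reports correlation coefficients between distance matrices without describing how they were computed. A common way to compare two distance matrices is a Mantel test: correlate the off-diagonal entries, then shuffle the population labels of one matrix to see how often a correlation at least that large arises by chance. The sketch below assumes that kind of procedure rather than reproducing the study's exact method, and the two 4-population matrices are made-up illustrative numbers, not data from either paper.

```python
import math
import random
from itertools import combinations

def upper_triangle(matrix):
    """Off-diagonal entries above the diagonal, one per pair of populations."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, p): matrix correlation and a one-sided permutation p-value."""
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    rng = random.Random(seed)
    order = list(range(len(d2)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel populations in d2, keeping rows/columns together
        permuted = [[d2[i][j] for j in order] for i in order]
        if pearson(x, upper_triangle(permuted)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)

# Hypothetical distances among four populations (illustrative only).
linguistic = [[0, 2, 5, 6], [2, 0, 4, 5], [5, 4, 0, 3], [6, 5, 3, 0]]
genetic    = [[0, 1, 4, 4], [1, 0, 3, 4], [4, 3, 0, 2], [4, 4, 2, 0]]

r, p = mantel(linguistic, genetic)
print(f"r = {r:.3f}, p = {p:.3f}")
```

With only four populations there are just 24 possible relabelings, so the p-value here is coarse; the same function applies unchanged to larger matrices such as those compared in the study.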
How would one mount a challenge to Longobardi et al.? One way to do so would be to show that the researchers excluded groups who are genetically similar to their neighbors but whose languages are quite distinctive. The absence of any such groups could help explain why linguistic relationships appear to correlate so closely with genetic ones. However, Longobardi et al.’s study includes one such group: the Hungarians. Several earlier genetic studies have shown that Hungarians are fairly close to their Central European neighbors genetically, yet their language belongs to a different language family, the Finno-Ugric family. Given the small sample, only 15 languages in total, having just one such group does not seem problematic. However, a larger study should also include other European groups that speak non-Indo-European languages but are not very distinctive genetically, such as the Estonians (whose language is also Finno-Ugric) or the Gagauz (who speak a Turkic language).
Sources:
Longobardi, G., Ghirotto, S., Guardiano, C., Tassi, F., Benazzo, A., Ceolin, A., & Barbujani, G. (2015) Across language families: Genome diversity mirrors linguistic variation within Europe. American Journal of Physical Anthropology.
Novembre, J., Johnson, T., Bryc, K., Kutalik, Z., Boyko, A. R., Auton, A., … & Bustamante, C. D. (2008) Genes mirror geography within Europe. Nature 456(7218): 98-101.