Are there "universal truths" about language?

In several recent postings, I’ve discussed two studies recently conducted in New Zealand: one on the origins of human language by Quentin D. Atkinson and colleagues published in Science, and the other on the origins of the Indo-European language family by the same Quentin D. Atkinson and his colleague, Russell D. Gray, published in Nature. Today, I will look at yet another study coming out of New Zealand, this one conducted by the same Russell D. Gray and Michael Dunn, Simon J. Greenhill and Stephen C. Levinson and published in Nature as well.

This study is concerned with the question of linguistic universals, which is hotly debated by generative linguists, typologists and functionalists alike. The publication of an article by Evans & Levinson in 2009 denying the existence of linguistic universals raised a storm in the academic literature (see, e.g., a special volume of Lingua in 2010 dedicated to a discussion of Evans & Levinson’s article); it remains to be seen whether the discussion of linguistic universals will spill onto the pages of the popular press beyond the initial publicizing of Dunn et al.’s article.

Here’s the problem: languages vary widely but, it appears, not without limit. So the central goal of linguistics is to describe the diversity of human languages and explain the constraints on that diversity. Two major approaches to the study of linguistic diversity have been formulated: Noam Chomsky’s deductive (or theoretical) approach and Joseph Greenberg’s inductive (or taxonomic) approach. According to the former, one is to construct, on the basis of as representative a sample as possible, a general characterization of what a human language must be like and predict properties of other languages from this; in practice, concrete studies are conducted to construct ever deeper and more intricate analyses of a very few languages or even just one, often English. In contrast, Greenberg’s taxonomic approach is based on checking out all the known languages on earth (at least in theory; in practice, a representative sample of them), cataloguing their properties and producing a list of all the possibilities. In practice, this translates to concrete studies relying on amassing as much data as possible from as wide a range of unrelated languages as possible. A closer consideration of these two approaches will have to remain a subject for a future posting, but at the moment I will just point out that linguists working within both of these paradigms have been successful at amassing a great deal of data and developing a deeper and deeper understanding of human language.

Enter Dunn et al. They use computational phylogenetic methods to address the nature of constraints on linguistic diversity in an evolutionary framework. Their claims are as follows. First, contrary to the Chomskian theory of parametric variation, they claim to show that “the evolution of only a few word-order features of languages are strongly correlated”. Second, contrary to the Greenbergian generalizations, they claim to show that “most observed functional dependencies between traits are lineage-specific rather than universal tendencies”. Their conclusion is that “these findings support the view that — at least with respect to word order — cultural evolution is the primary factor that determines linguistic structure, with the current state of a linguistic system shaping and constraining future states”.

What I’d like to do here is to look a bit more closely at the eight word order features that they analyze in their study. The numbers I cite here are from the WALS website. Where more than two possible patterns are identified for a given feature, I will count only those languages that exhibit one of the two main patterns, disregarding languages with no dominant pattern, mixed patterns, morphological expression of one of the elements, etc.

So here are the eight word order features:

  • The order of the subject and the verb. In English the subject precedes the verb (e.g., The man came), whereas in Welsh the subject follows the verb (e.g., Daeth y dyn, literally ‘Came the man’). Note that the languages in the WALS sample do not distribute evenly across the two options: 1060 languages (86%) follow the English SV pattern, and only 179 languages (14%) follow the Welsh pattern.
  • The order of the object and the verb. In English the object follows the verb (e.g., Eat the fish!), whereas in Turkish the object precedes the verb (e.g., Mehmed-i gör-dü-m, literally ‘Mehmed I-saw’). Unlike with the subject-verb order feature, languages distribute evenly between the English VO pattern and the Turkish OV pattern: exactly 640 languages in the WALS sample fall into each category.
  • The order of adposition and noun. In English adpositions are prepositions (i.e., they precede the noun), as in to school, whereas in Lezgin adpositions are postpositions, that is they follow rather than precede the noun, creating the pattern ‘school-to’ (English has one exceptional postposition, ago, as in five years ago). The distribution of languages that exhibit the two patterns is fairly even, with 520 languages (53%) having postpositions and 467 languages (47%) having prepositions.
  • The order of genitive and noun. In English genitives precede the noun, as in the girl’s cat, whereas in Krongo (a language spoken in Sudan) genitives follow the noun, as in níimò má-Kùkkú, literally ‘mother Kukku’s’. Languages appear to distribute fairly evenly across the two patterns: 606 languages (59%) exhibit the English Gen-N pattern and 416 languages (41%) exhibit the Krongo N-Gen pattern.
  • The order of adjective and noun. In English an adjective precedes the noun it modifies, as in the small dog, whereas in Apatani, a Tibeto-Burman language spoken in northeast India, adjectives follow the noun they modify, as in aki atu, literally ‘dog small’. The distribution of languages across the two patterns is not even: only 341 languages (31%) exhibit the English Adj-N pattern, whereas the majority, 768 languages (69%), exhibit the Apatani N-Adj pattern.
  • The order of demonstrative and noun. In English a demonstrative precedes the noun, as in that man, whereas in Maba, a Nilo-Saharan language spoken in Chad, demonstratives follow nouns, as in mašuk wak, literally ‘man that’. The distribution of languages across the two patterns is nearly even: 494 languages (51%) exhibit the English Dem-N pattern and 481 languages (49%) exhibit the Maba N-Dem pattern.
  • The order of numeral and noun. In English numerals (like adjectives and demonstratives, above) precede the noun, as in three dogs, whereas in Pumi (a Tibeto-Burman language spoken in China) numerals follow nouns, as in qüa xüé, literally ‘pig eight’ (for ‘eight pigs’). The distribution of languages across the two patterns is fairly even, with 430 languages (46%) exhibiting the English Num-N pattern and 515 languages (54%) exhibiting the Pumi N-Num pattern.
  • The order of relative clause and noun. In English a relative clause follows the noun it modifies, as in the book [that I am reading], whereas in Alamblak (a Sepik language spoken in Papua New Guinea) relative clauses precede the noun they modify, as in [ni hik-r-fë] yima-r, literally ‘who would have followed you person’ (for ‘the man who would have followed you’). The distribution of languages across these two patterns is not even: 506 languages (81%) exhibit the English N-Rel pattern, whereas only 119 languages (19%) exhibit the Alamblak Rel-N pattern.
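The percentages quoted in the list above follow directly from the two counts given for each feature. Here is a minimal Python sketch that recomputes them. The names WALS_COUNTS and shares are illustrative ones of my own, and the counts are simply those cited above, not a fresh query of the WALS database.

```python
# Counts for the two main patterns of each feature, as cited in the list above.
# The names are illustrative; the numbers are copied from the text, not re-queried from WALS.
WALS_COUNTS = {
    "subject-verb": {"SV": 1060, "VS": 179},
    "object-verb": {"VO": 640, "OV": 640},
    "adposition-noun": {"Postp": 520, "Prep": 467},
    "genitive-noun": {"Gen-N": 606, "N-Gen": 416},
    "adjective-noun": {"Adj-N": 341, "N-Adj": 768},
    "demonstrative-noun": {"Dem-N": 494, "N-Dem": 481},
    "numeral-noun": {"Num-N": 430, "N-Num": 515},
    "relative-noun": {"N-Rel": 506, "Rel-N": 119},
}


def shares(counts):
    """Return each pattern's share of the feature total, as a percentage."""
    total = sum(counts.values())
    return {pattern: 100 * n / total for pattern, n in counts.items()}


for feature, counts in WALS_COUNTS.items():
    summary = ", ".join(f"{p}: {s:.1f}%" for p, s in shares(counts).items())
    print(f"{feature:20s} {summary}")
```

Laying the counts out this way makes it easy to see which features are skewed (subject-verb, adjective-noun, relative-noun) and which are close to an even split.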

Dunn et al. claim that pair-wise correlations between these features may hold in one language family but not in another. For instance, they find a strong correlation between the adjective-noun and genitive-noun orders in the Indo-European language family, but not in Bantu or Uto-Aztecan languages. Similarly, a strong correlation seems to exist between the subject-verb and object-verb orders in Uto-Aztecan languages but not in Indo-European, Austronesian or Bantu languages.

But these stronger correlations between certain properties in one language family but not in others may have to do not with “cultural evolution [as] the primary factor that determines linguistic structure”, as claimed by Dunn et al., but with the features themselves. Thus, instead of pointing to cultural factors, these non-universal correlations may be pointing (ironically!) to universal cognitive factors.

Take, for example, the non-correlation between the subject-verb and the object-verb orders. As mentioned above, cross-linguistically the SV pattern is preferred over the VS pattern, regardless of whether the language is also OV or VO. The preference is stronger for OV languages, of which 602 are SV and only 6 are VS, but it is observable for VO languages as well, of which 409 are SV and only 166 are VS. This points to a universal tendency for subjects to be cognitively and/or grammatically more prominent than objects. Another possible explanation for the more even distribution of OV/VO patterns (compared to the SV/VS patterns) has to do with a grammatical rule that, in some languages but not in others, routinely displaces the verb relative to the object but not relative to the subject (this happens for reasons involving verbal inflectional morphology rather than subjects or objects themselves).
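The counts in the preceding paragraph form a two-by-two table, and the claim that SV is preferred "regardless" amounts to saying that the SV share is above one half in both rows. The sketch below makes that check explicit. The function name row_shares is hypothetical, and the calculation treats every language as an independent data point, so it is only a raw tally and not a phylogenetically controlled test of the kind Dunn et al. set out to perform.

```python
# Cross-tabulation cited above: rows are object-verb order, columns are subject-verb order.
TABLE = {
    "OV": {"SV": 602, "VS": 6},
    "VO": {"SV": 409, "VS": 166},
}


def row_shares(table, column):
    """For each row, return the percentage of languages that show `column`."""
    return {
        row: 100 * cells[column] / sum(cells.values())
        for row, cells in table.items()
    }


sv_share = row_shares(TABLE, "SV")
for row, pct in sv_share.items():
    print(f"{row} languages with SV order: {pct:.1f}%")

# SV is the majority pattern in both rows, though more strongly so among OV languages.
assert all(pct > 50 for pct in sv_share.values())
```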

Consider now the lack of correlation between the genitive-noun order and either the numeral-noun order or the adjective-noun order. As it turns out, the Gen-N pattern is preferred over the N-Gen pattern regardless of whether a language is also N-Num or Num-N (of the Num-N languages, 216 have the Gen-N pattern and only 143 have the N-Gen pattern; of the N-Num languages, 231 have the Gen-N pattern and only 176 have the N-Gen pattern). Similarly, there is no correlation between the genitive-noun and the adjective-noun orders: as mentioned above, the N-Adj pattern is preferred over the Adj-N pattern, regardless of whether the language also exhibits the Gen-N or N-Gen pattern (of the Gen-N languages, 288 have the N-Adj pattern and only 212 have the Adj-N pattern; of the N-Gen languages, 306 have the N-Adj pattern and only 56 have the Adj-N pattern). The non-correlation between the genitive-noun order and other features of a language may have to do with the problematic nature of the genitives themselves, which can occupy different structural positions, as discovered by generative linguists. To put it informally, “there are genitives and there are genitives”; typological counts such as those presented by WALS may be adding up apples and oranges.

In other words, examining the content of various features, the specifics of what is ordered with respect to what, may be more profitable and illuminating than simply correlating any odd word order feature with any other such feature. This can also be seen by examining those correlations between word order features that hold across language families, such as the correlation between the order of adposition and noun and the order of object and verb, or the correlation between the orders of demonstratives, adjectives and numerals with respect to the noun, or the order of relative clauses and adjectives with respect to the noun.

In these instances we compare the ordering of similar types of entities. For instance, both the order of adposition with respect to the noun and the order of the object with respect to the verb concern the order of the head of a phrase (adposition or verb) in relation to its complement (noun or object). Similarly, the demonstrative-noun, adjective-noun and numeral-noun orders concern the ordering of three types of elements with respect to the noun in whose extended functional projection they appear (cf., e.g., Guglielmo Cinque’s recent work on this typology in the generative framework).

Finally, the correlation between the relative-noun and adjective-noun orders is not unexpected either, if we examine the elements that are being ordered. Relative clauses, like adjectives, modify a noun, so it is not surprising to find them on the same side of the noun, which is indeed what we find in the majority of cases: 447 languages in the WALS sample have the relative clause and the adjective on the same side of the noun vs. 103 languages that do not. Besides, the majority of those 103 languages (82 languages, 80%), including English, have the adjective preceding the noun but the relative clause following the noun, not a surprising pattern at all, given that structurally complex elements tend to occur later in the sentence than their structurally lighter counterparts. This applies not only to relative clauses, which being clausal are by definition structurally more complex than adjectives, but also to complex adjectival modifiers, which are likewise structurally more complex than single-word adjectives (cf. the English a proud man vs. a man proud of his children). Note, by the way, that despite this complication English is listed in WALS as an Adj-N language.
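The arithmetic behind these figures is simple enough to spell out. In the short sketch below the variable names are mine and the counts are the ones cited above; nothing is drawn from WALS beyond what the text already reports.

```python
# Counts cited above for relative-clause and adjective placement relative to the noun.
same_side = 447        # relative clause and adjective on the same side of the noun
different_side = 103   # relative clause and adjective on opposite sides of the noun
adj_n_and_n_rel = 82   # of the 103: adjective before the noun, relative clause after it

# Share of languages in which both modifiers fall on the same side of the noun.
same_side_pct = 100 * same_side / (same_side + different_side)

# Share of the mismatching languages that follow the English-type pattern.
english_type_pct = 100 * adj_n_and_n_rel / different_side

print(f"Same side of the noun: {same_side_pct:.1f}%")
print(f"English type among the mismatches: {english_type_pct:.1f}%")
```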

The tendency to place structurally complex elements later in the sentence also explains why clausal objects occur after the verb in Dutch, contrary to the OV pattern exhibited by more typical noun phrase objects. Hence, in Hij heeft gezegd [dat hij komt], literally ‘He has said that he comes’, the clausal object (marked by brackets) follows the verb gezegd, whereas in Hij heeft [dat] gezegd, literally ‘He has that said’, the noun phrase object (also marked by brackets) precedes the verb gezegd. Another phenomenon that falls under the same umbrella category of structurally complex elements coming at the end is the so-called Heavy-Noun-Phrase-Shift in English: compare I gave a book to John, where the direct object a book precedes the indirect object to John, with I gave to John [a book that was recommended to me by my librarian-friend], where the order of the direct and indirect objects is reversed because the direct object (in brackets) contains a relative clause and is thus structurally complex.

The take-home message: applying computational methods to linguistic features without considering the content of these features may mask the real properties of human languages and the reasons behind certain correlations or the lack thereof. Factors such as internal complexity, known vs. new information, or inflectional morphology can and do interact with the word order features examined by Dunn et al., so in the words of Mark Baker, one of the strongest proponents of the parametric theory of language and the author of The Atoms of Language:

“…it seems as if one cannot understand anything until one understands everything.”

