On statistical universals

May 17, 2011 by

In yesterday’s posting, we came to the conclusion that statistical universals (e.g., “Languages with both adjective-noun and noun-numeral orders are extremely rare”) are useless for a learner trying to navigate the minefield of her target language’s grammar: a learner cannot ignore the possibility that her target language is of the rare type. Similarly, such statistical universals are not directly relevant to a theoretician who asks “What’s a possible human language?”, since even languages with rare features are possible (and actual) languages. So are statistical universals useless?

Not according to Mark C. Baker: in his Atoms of Language, he suggests a way of looking at statistical universals that makes them interesting (to a theoretician, if not necessarily to our hypothetical learner). In particular, statistical universals are interesting if they can be reduced to combinations of factors. Here’s one example of how it can be done. (For the purposes of this example, we will assume the existence of the Head Directionality Parameter, even though it may ultimately prove to be wrong.)

The problem at hand is the order of three main clausal elements: subject (S), object (O) and verb (V). For three elements ordered freely, there are six logically possible orders (SVO, SOV, VSO, VOS, OSV and OVS), and all six happen to be attested in the world’s languages. However, these orders are not equally frequent. No matter what sample we take, study after study confirms that the SVO and SOV orders are the most common and are nearly equally common. Less common is the VSO order, but it too is much more frequent than VOS or either of the object-initial orders (OSV or OVS).

Here are some figures from the latest edition of WALS. Out of the total 1377 languages, 488 languages have the SVO order; these include English, Finnish (Finno-Ugric), Indonesian (Austronesian) and Zulu (Bantu). The SOV order is found in 565 languages, including Turkish (Turkic), Japanese, Quechua and Basque (the latter three languages are likely isolates). The VSO order is much less frequent, with only 95 languages in the sample, including Zapotec (Oto-Manguean), Welsh (Celtic; Indo-European) and Niuean (Austronesian). The VOS order is even less frequent, with only 25 languages in the sample, among them Tzotzil (Mayan) and Malagasy (Austronesian). OVS languages number 11 in this sample and include Hixkaryana (Carib; spoken in Brazil) and Mangarrayi (Gunwingguan; spoken in Australia). Finally, OSV languages are the rarest of all, with only 4 languages in the sample: Kxoe (Khoisan; spoken in Namibia), Nadëb (Maku; spoken in Brazil), Tobati (Austronesian; spoken in Indonesia) and Wik Ngathana (Pama-Nyungan; spoken in Australia). (An additional 189 languages do not have a dominant word order.) As you can see, the word orders do not correlate with either geography or language families (at least, not perfectly).

In percentages (of the 1188 languages in the sample that have a dominant word order), SOV order is found in 47.5% of languages, SVO in 41%, VSO in 8%, VOS in 2.1%, OVS in 0.9% and OSV in 0.3%. Why such an imbalance?
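For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the shares from the counts quoted above. It assumes that the percentages are taken over the languages with a dominant word order, i.e. that the 189 languages without one are left out of the denominator; the variable names are mine, not WALS’s. The results agree with the quoted percentages to within rounding.

```python
# Counts per dominant word order, as quoted above from the WALS sample.
counts = {"SOV": 565, "SVO": 488, "VSO": 95, "VOS": 25, "OVS": 11, "OSV": 4}
no_dominant_order = 189

# Assumption: the denominator excludes languages with no dominant order.
total_with_order = sum(counts.values())
total_sample = total_with_order + no_dominant_order

# Share of each order, in percent of languages with a dominant order.
percentages = {order: 100 * n / total_with_order for order, n in counts.items()}

for order, pct in percentages.items():
    print(f"{order}: {pct:.1f}%")
```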

Leaving aside the three rarest word orders and focusing for the moment on SOV, SVO and VSO orders, we can say that SOV languages are only slightly more common than SVO languages, but that both are much more common than VSO languages. Why is VSO rarer than the other two orders?

Let’s consider which settings of which word order parameters (as proposed by Mark Baker) will result in VSO order. To get this order, three parameters must be set in a particular way. The Head Directionality parameter must be set so that the head precedes its complement (VO). The Subject Placement parameter must be set so that the subject appears relatively low in the clause structure; together with the setting of the Verb Attraction parameter that places the verb before adverbs and the negation marker, this guarantees the VS order. If any of these three parameters is set differently, we’ll get a different word order.

If we assume that each parameter is a binary choice with a 50-50 chance of being set one way or the other (and that the parameters are set independently of one another), the probability that three parameters will be set in one particular way is 1 in 8. Therefore, we expect about 1 in 8 languages (12.5%) to be VSO, give or take. This prediction is not far from the observed frequencies: recall that in the most recent WALS sample, 8% of languages are VSO. In other studies this number ranges from 6.9% to 12.5%, depending on the sample.
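The 1-in-8 figure can be spelled out by listing every combination of settings. The Python sketch below does just that; the labels for the parameter values are hypothetical shorthand of mine rather than Baker’s terminology, and the calculation rests on the assumptions stated above (binary, equally likely, independent parameters).

```python
from itertools import product

# Hypothetical labels for the two settings of each binary parameter.
parameters = {
    "head_directionality": ("head-first", "head-last"),
    "subject_placement": ("low", "high"),
    "verb_attraction": ("verb-raised", "verb-in-place"),
}

# The one combination that yields VSO under the account described above.
vso_setting = ("head-first", "low", "verb-raised")

# Every possible combination of the three settings.
all_settings = list(product(*parameters.values()))

# With equally likely, independent settings, each combination has the same
# probability, so P(VSO) is just one combination out of the total.
p_vso = all_settings.count(vso_setting) / len(all_settings)

print(len(all_settings), p_vso)  # 8 combinations, probability 0.125
```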

Thus, statistical universals may not be completely hopeless after all. They may be apparent consequences of (absolute) universal principles and a certain setting of (variable) parameters. Viewed this way, statistical universals may serve as a guide for exploring more insightful universals formulated within the Principles & Parameters framework.
