More on “language as a vehicle for happiness”
A recent issue of PNAS Online contains an article by Peter Sheridan Dodds, Eric M. Clark and colleagues titled “Human language reveals a universal positivity bias”, which I have been asked to comment on. As I have already written on an earlier version of this study in September 2011, I will not repeat most of the criticisms that still apply here. Nor will I discuss the assumption of this study’s authors that language has “a shaping effect on thinking”, or their references to works such as that of Keith Chen on the presence of future tense in a language impeding future-oriented behavior of its speakers; I have written extensively on these subjects (e.g. see here). But is this new study a significant improvement on the preceding work by the same researchers? Is it indeed “a leap forward in terms of the scope of data-based approaches to language analysis”, as the authors have claimed?
I am afraid the answer is “no”. Although this study is more computationally heavy than its precursor, the adage “garbage in, garbage out” still applies: putting more garbage in does not change what comes out. This study still makes three key blunders that are the hallmarks of the quasi-linguistics I have been writing about. The first blunder, extremely common among the would-be linguists drawn to the study of language “like moths to a flame”, to use J.P. Mallory’s metaphor, is equating language with words. According to the study’s authors, they “focus on the words people most commonly use” (highlight mine). Yet words are merely a small (and to a linguist, the least interesting) part of language. This issue is discussed in more detail in my earlier post. The second blunder, as bad as the first, is equating language with writing, as witnessed by their listing of “Chinese (Simplified)” as one of the “languages” they have investigated. Yet this is a way to write rather than speak Chinese; the reader is left in the dark as to whether the study examined Mandarin, Cantonese, or some other Chinese (or “Sinitic”) language.
The third blunder is worth dwelling on at greater length. According to the authors, the main contribution of this new study is in including “10 languages diverse in origin and culture”. Of course, ten languages is better than one, but just how diverse and representative is their sample? According to the Ethnologue, there are 7,106 languages spoken in the world today. However, as shown in an important recent geography dissertation (Ford 2013), the Ethnologue often excessively divides single languages along political or ethnic boundaries, owing to overreliance on such factors as the use of certain writing systems and the language of literacy, which are distinct from linguistic criteria per se; thus, a more appropriate figure may be around 5,000. Even so, a 10-language sample constitutes only about 0.2% of the world’s languages. The sample is tiny, but is it at least representative? Are these ten languages as “diverse in origin and culture” as the authors claim? Let’s consider their genealogical classification (i.e. diversity “in origin”) first. As it turns out, only five language families are represented by the ten languages in the sample: Indo-European, Sino-Tibetan, Austronesian, Afro-Asiatic, and either Altaic (if one believes that this is a bona fide family and that Korean is a member) or Koreanic (if Korean is treated as an isolate). (The Ethnologue recognizes Altaic as a family but does not include Korean in it, treating Korean as an isolate instead.) The number of language families is as debatable as the number of individual languages, with the Ethnologue listing 147 families. Five out of 147 families means that only about 3.4% of the existing linguistic diversity at the family level is represented.
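For readers who want to check the arithmetic, here is a minimal sketch in Python. It uses only the figures quoted above (10 sampled languages, the Ethnologue's 7,106 languages and 147 families, and the rough 5,000-language estimate); the variable names and the `share` helper are my own, purely for illustration.

```python
# Figures quoted in the text; nothing here is new data.
sample_languages = 10
ethnologue_languages = 7106   # Ethnologue count of living languages
estimated_languages = 5000    # rough adjusted estimate discussed above
sample_families = 5
ethnologue_families = 147


def share(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100 * part / whole


# Share of the world's languages covered by the sample
language_share = share(sample_languages, estimated_languages)
language_share_ethnologue = share(sample_languages, ethnologue_languages)

# Share of the Ethnologue's language families covered by the sample
family_share = share(sample_families, ethnologue_families)

print(f"Languages (5,000 estimate):   {language_share:.2f}%")
print(f"Languages (Ethnologue count): {language_share_ethnologue:.2f}%")
print(f"Families (Ethnologue count):  {family_share:.1f}%")
```

Whichever language count one prefers, the sample covers a fraction of a percent of the world's languages and well under 5% of its families.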
Moreover, the sample is biased heavily in favor of the Indo-European family: six out of ten languages (English, Spanish, French, German, Brazilian Portuguese, and Russian) are from this family. Even within the Indo-European family, only three of the 10 benchmark groupings are represented: Germanic (English and German), Romance (Spanish, French, Brazilian Portuguese), and (Balto-)Slavic (Russian). No languages are included from the Indo-Iranian, Hellenic, Armenian, Celtic, or Albanian groupings. A more careful examination reveals that even within these benchmark groupings certain branches are overrepresented: both Germanic languages come from the West Germanic branch and neither is from the North Germanic branch. Similarly, within the Romance grouping, all three languages come from the Gallo-Iberian branch of the Western grouping of the Italo-Western subfamily. The point is that the languages selected are very closely related genealogically.
What of their typological properties? Of the ten languages, seven exhibit Subject-Verb-Object (SVO) order. The only exceptions are Korean, which is SOV; Arabic, which at least in the Modern Standard version is VSO; and German, which WALS treats as having no dominant order. Given that SOV is the most commonly found order cross-linguistically, this sample is not representative in this respect. Similar skewing is found with respect to virtually any typological feature one might care to check. Thus, all except three languages in the sample have sex-based grammatical gender systems; only Korean, Chinese, and Indonesian lack them. Yet 55% of the languages in the more carefully selected WALS sample lack such gender systems. Only two of the ten languages, Chinese and Indonesian, lack grammatical marking for past tense, yet this option is nearly as common cross-linguistically as the single-past-tense systems found in the other languages in this study (“single-past-tense” meaning that there are no remoteness distinctions). Likewise in phonology, all ten languages have either average or moderately large consonant systems, only Mandarin has a tone system, and so on. All in all, these are not typologically diverse languages.
The hardest to comment on is their “diversity in culture”, as there are no good ways to measure or assess such diversity. Yet, it is quite evident that none of the languages are spoken (primarily) by hunter-gatherers, nomadic pastoralists, or even primitive agriculturalists. Given that it is electronic corpora that were studied, it is not at all surprising that no truly “exotic” languages like Pirahã, Chukchi, or any of the clicking Khoisan languages are included.
All in all, the linguistic import of this study is virtually nil. Consider the following analogy. Assume a certain study, based on data from a particular GPS app, shows that silver Toyota Corollas and red Subaru Outbacks are driven to places that their drivers associate with positive emotions (such as shops and restaurants) more often than to places associated with negative emotions (say, cemeteries or dental offices). Given the source of the data, the results are automatically skewed because the sample is selected from a non-random group: only cars driven by people who use this app are included. Moreover, the lack of representativeness would make it hard if not impossible to derive more general conclusions about cars of other makes, models, or colors. Most importantly, whatever conclusions (however limited!) we are able to draw from such a study are about the psychology of the drivers, not the inner workings of cars. As a result, such a study might have a limited use, for example, for advertising purposes, yet it cannot be claimed to be “a leap forward in terms of the scope of data-based approaches to” automotive engineering! In the same vein, the study published in PNAS says no more about “language as a vehicle of happiness” than our hypothetical study would about “cars as vehicles of happiness”.
Sources:
Ford, O.T. (2013) Parallel Worlds: Empirical Region and Place. Unpublished Ph.D. dissertation, Department of Geography, University of California at Los Angeles.