Do Languages of Primitive Peoples Have Small Vocabularies?—And Some Thoughts about Epistemological Populism

Jul 28, 2015 by

In my previous post, I argued that languages of the so-called “primitive” peoples are not necessarily as simple as many lay people conceive them to be. While yesterday’s post focused on the grammars of such languages, illustrating some of the complexity of one Australian Aboriginal language, Dyirbal, I mentioned only in passing that languages of primitive peoples often have staggeringly rich vocabularies. This is particularly true when it comes to semantic fields that may be relatively poor in European languages, especially if one excludes technical (often Latin) terms. This fact is mentioned in the article titled “A loss for words” by Judith Thurman, published in the March 30, 2015 issue of The New Yorker (thanks to Leni Silberman for sharing the article with me). The focus of Thurman’s article is on the death of endangered languages, such as Kusunda and Hupa, and whether dying languages can be saved, an issue which I have discussed here, here, and here.

With respect to vocabularies of tribal languages, Thurman writes:

“In Samoa, Cox [the executive director of the Institute for Ethnomedicine in Jackson Hole, Wyoming] discovered that Polynesian herbal doctors had an extensive nomenclature for endemic diseases and a separate one for those introduced by Europeans. Their sophistication is not unique. The taxonomies of endangered languages often distinguish hundreds more types of flora and fauna than are known to Western science. The Haunóo [sic], a tribe of swidden farmers on Mindoro, an island in the Philippines, have forty expressions for types of soil.”

(The relatively poor inventory of basic color terms in Hanunóo is discussed in this earlier post.)

What is particularly amazing is how such mistaken beliefs about the world’s languages coexist in the popular consciousness with other beliefs that contradict them (sometimes, equally mistaken). For example, the very common view that “primitive peoples” have “primitive languages”, with small vocabularies and little or no grammar, is often held by the same people who also believe that “Eskimo has many words for snow”. A simple show of hands in my continuing studies classes shows that a large number of educated, intelligent people among those polled hold both of these views, despite a contradiction between them. But this logical incongruity is simply ignored. This is another example of what Martin Lewis and I called “epistemological populism”, complaining that our respective fields—geography and linguistics—are subject to more of this phenomenon of “equat[ing] truth with popularity”, than other scientific fields. Although both the “simplicity of tribal languages” and “Eskimo having many words for snow” turn out to be empirically wrong, such beliefs are hard to shake out of the popular consciousness, where they seem to be cemented in by general illiteracy on matters of language, generated in turn by inadequate school-level education, sensationalist and biased science journalism, and frequent publication by non-linguists of articles on language-related issues that would not pass as Linguistics 101 term papers. Of course, linguists do not do enough to educate broader audiences about the field, largely because such efforts do not pay enough (if at all) and do not count for much in terms of career advancement in the field. These are issues that professional organizations and collegial committees should give more thought and careful consideration to, if the “PR problem” of the field is to be solved. For the time being, I welcome informed readers’ opinions on these issues and ideas about overcoming such epistemological populism.


  • Sean Manning

    While I have no doubt that the languages of simple, homogeneous societies are usually full of nuance and more than adequate to describe the areas of life which are important to them, I would be very surprised if their vocabularies are usually as large as the languages of complex, heterogeneous societies. On one hand, such languages usually have many fewer areas of life to describe: probably desert animals or forest animals but not both, probably one set of terms for metalworking rather than separate vocabularies for farriers, autobody repairmen, silversmiths, and welders, intricate terms for ancestry and marriage connections but not for profession and social status and a whole world’s worth of ethnic groups, probably two or three areas of traditional learning but not hundreds of learned professions with distinctive jargons, etc. On the other hand, such languages are usually limited by what a relatively small community can keep in active use (and yes, trained memory can do amazing things, but a single bookshelf can hold more words than anyone can memorize, and it’s easy to point to texts which were preserved by being memorized and recited while the meaning of some words and phrases was lost). Just as English has a very elaborate vocabulary for snow if one speaks to mountaineers, hockey players, skiers, and meteorologists, I suspect that farmers and gardeners in many complex societies have a rich vocabulary for types of soil (and it’s easy to point to agrarian societies which lump together kinds of plant which botanists classify as separate species).
    Has anyone attempted such a comparison in a serious way? I know that defining what counts as “one word” is often tricky and so is collecting the full range of a society’s lexicon. A French professor at a university on the east coast of the US (perhaps Alain Touwaide?) has done some work on what happened to the vocabulary of Greek pharmacy in the early Middle Ages as it became impossible to obtain some materia medica or preserve the full depth of Hellenistic pharmaceutical lore.

    • Thanks for your comment, Sean!

      The point is that languages of small tribal groups have more extensive vocabularies than laypeople often think (e.g. the author of that article about Toki Pona, linked in the previous post, to which this one is a P.S.). It’s not like they have just a hundred or two hundred words. By the way, it’s not about specialized jargon, which is used by a small occupational group and often cuts across languages (e.g. Dutch and Russian sailors could understand each other pretty easily in terms of sailing jargon, but not generally).

      But more generally, there’s not much point in measuring vocabulary size, because what it means to be a “word” differs from language to language. Roots may be more useful to count, but even so the enterprise is bound to screech to a halt quickly: would we count “butter” as one root or two in English (noun and verb)? Those sorts of things get even more complicated if we do it for other languages, all of which have their own twists to the problem.

      More on this issue here:

      • Sean Manning

        You are welcome Asya (if I may). It is funny how many people fantasize about a state of nature but are not willing to see what is known about it. Coming from polyglot British Columbia and knowing a bit about the ancient Indo-European languages I have a hard time taking seriously the idea that people without cities or writing or a lot of professions speak simple languages.

        Learning German does make one think about why English speakers see “pigeon removal company” as a phrase but German speakers see Taubenabwehrfirma as a word. Even though there are problems with definition, I still think that some languages can be said to have a larger vocabulary than others. Aside from languages with a deliberately restricted vocabulary (Aviation English, some pidgins, this Toki Pona conlang) one might compare Classical Greek which enthusiastically coined new forms (and developed elaborate technical vocabularies) with Classical Latin which preferred to embrace ambiguity. But I am a historian who has had to learn more languages than I expected and dabbles in philology, not a proper linguist.

        Since almost everyone has some competence in several languages, and some exposure to formal language learning and theories about language, it is surprising that people do not immediately push back against some claims about language. Surely most people could see why it’s unreasonable to ask for a 1:1 translation of a complex text if they thought about it for a moment …

        • I think many people in this country are not quite as multilingual as you say and their exposure to *formal* language learning is quite limited (most language teachers have no formal linguistic education to speak of!). Or maybe it’s “thinking about it for a moment” — you’d be surprised how little analytical thinking happens when people opine about language…

          As for the issue of word counts, you are thinking of languages like Greek, Latin, and English, which are too similar to each other to be revealing. Many tribal languages are polysynthetic, while other languages have root-and-template morphology (Semitic) or other complications. Where some languages express some concepts lexically, others do it grammatically. How all this can be counted in some straightforward manner, I have no idea.

          • Sean Manning

            Isn’t the normal pattern that humans have at least one language to communicate with their friends and family and at least one other to communicate with a wider community (for trade, paying or collecting taxes, travel, …) or for specific areas of life (worship, scholarship, anime fandom…)? Speakers of a hegemonic language like English, Mandarin, or Russian can sometimes avoid the second as can people who are very isolated. (But most English speakers in North America know some French or Spanish or a language from South Asia or East Asia, and I would imagine that it’s not so uncommon in Russia to know a bit of one of the Turkic languages or another Slavic language). I would expect that most foragers know a bit of a trade tongue or pidgin like Chinook Jargon in nineteenth-century British Columbia.
            So far I have not seen anything in Akkadian or Sumerian as reconstructed which makes the problem of counting vocabulary any more or less hard than in English (although reconstructing them is a hard problem, and we only have access to the language of a few domains such as epic literature and scholarly medicine). Obviously choosing which fixed phrases and specialized meanings should be distinguished and counted is hard for any language: are all forms of “to run” one entry? Or should transitive and intransitive uses be separated? What about the difference between transitive “to run a cable” and “to run someone out of town on a rail”?

          • “Isn’t the normal pattern that humans have at least one language to communicate with their friends and family and at least one other to communicate with a wider community” — I don’t know if it’s a normal pattern, no… Especially if one thinks about a language and not just blocks of words (the latter, naturally, vary with the situation of use). Many people around the world are multilingual but whether language choice in each given communicative interaction correlates with friends/family vs. work/wider community, I don’t know. I don’t know if that’s been quantitatively studied…

            “I would imagine that it’s not so uncommon in Russia to know a bit of one of the Turkic languages or another Slavic language” — actually it is very uncommon, particularly among the ethnic Russians, who speak only Russian. When it comes to other ethnic groups, some maintain their indigenous languages, others not so much:


            Among virtually all groups, the knowledge of Russian (self-described, based on census data) is over 90%. Which means that in some groups (those that maintain their language), bilingualism is high, but in others (those that do not maintain their language), it’s mostly just Russian.

            But either way, non-Russians constitute about 15% of the population, so I wouldn’t call bilingualism in RF “common”…

          • Sean Manning

            Ok, I would have thought that proficiency in and domains of use of multiple languages would be the sort of thing which sociolinguists had worked on, but it’s obviously hard to do in the ancient and medieval languages I work on, so it’s not something I have studied.

          • Sociolinguistics focuses more on how the same language is used differently by people in different environments, backgrounds etc. As far as I know, there is no good quantitative data on a large scale that shows how many people are bilingual, in what languages etc.

          • As for my point about Semitic morphology making it hard to know what to count as “words”, I am not too familiar with the linguistic details of Akkadian or Sumerian (those are dead anyway), but let’s take Hebrew for example. (As Arabic has even more productive root-and-template patterns, the problem is only more acute there.) To take an example similar to your “run” example, are NAFAL ‘fall’ and HIPIL ‘make fall, drop’ the same “word” or different? They do share the same root (which is hard to see because it’s one of those exceptional verb patterns), and the causative pattern is productive and regular, and is thus no different from the English book/books, but we hardly feel those are two words! Similarly, KATAV ‘wrote’ and NIKHTAV ‘written’ don’t seem to us (from the English perspective!) to merit two-word status, but KATAV and HIKHTIV ‘dictated’ do; from the Hebrew perspective the two pairs are exactly the same.