Which language is the oldest?
I hear this question a lot, along with claims that this or that language is “the oldest”. It has come up again in connection with the research on the birthplace of human language, published recently in Science and discussed earlier in this blog (and also here). The logic of the argument is this: Khoisan languages are the oldest languages in the world and they have click sounds, so the first human language, the so-called Proto-Human, must have had click sounds as well.
This argument is flawed in several ways. Even if it were true that Khoisan languages are “the oldest” (a question to which I return immediately below), there is no way to prove that the first human language had the same properties (e.g., the same phoneme inventory) as modern-day Khoisan languages. While it is true that languages outside of Africa have not added click sounds to their inventories, it is not true that click sounds cannot in principle migrate from one language into another. The evidence comes from Bantu languages that have click sounds. In all likelihood, Proto-Bantu did not have click sounds. The only Bantu languages that have them today, such as Yeyi, Xhosa and Zulu, are those that have been in close proximity or in verifiable contact with Khoisan languages that have clicks. Thus, these Bantu languages must have borrowed click sounds from Khoisan languages. Another piece of evidence to support the borrowing theory comes from the fact that only dental, alveolar and lateral clicks are found in Bantu languages, whereas Khoisan languages may also have bilabial and palatal clicks.
But let’s get back to the question of the oldest language. If we give this question a bit more thought, we will see that it is really meaningless. Indeed, what would it mean to say that a given language is “old” or “ancient”? There is one sensible way to use the expression “ancient language” — for languages that were spoken in antiquity and are no longer spoken today. In this sense, the term “ancient language” applies to Scythian, Tocharian and Sumerian, to give just a few examples. But this is not the sense in which the expressions “the oldest language” or “the most ancient language” are commonly used. For example, Khoisan languages are not “ancient languages” in this sense: they are still spoken today by nearly 100,000 people. Furthermore, we know little to nothing about the history of these languages, what they were like in antiquity etc.
So why exactly is the statement that Khoisan languages are “the oldest” meaningless? Let’s think about this. Each person and each generation of people have parents. We can think of human history as a huge, immensely complicated family tree, going back to the first humans. And all these people spoke a language. Some language. Each group or generation of people anywhere on this planet speaks the language of its parents. Almost. Little, sometimes unnoticeable modifications in how words are pronounced, what they mean or how they are put together may be (and are) introduced from generation to generation but no generation makes up a completely new language, not related to that of their parents. If this were ever to happen, children and parents truly would not be able to communicate (rather than merely pretend that they do not understand each other). But these little changes accumulate over time so that the language of long-ago ancestors can become completely incomprehensible and unrecognizable to a later generation. But historical change is slow, very slow. It takes decades, hundreds if not thousands of years.
There are also historical situations when one language is replaced by another. When a group of invaders comes and subjugates the locals, it may impose its own language on the conquered people. It doesn’t happen in all instances of conquest, however: sometimes it is the conquered who give their language to the resulting mix of the invaders and the locals. For example, English derives from the language imposed by Anglo-Saxon invaders on the local Celtic population, but not from the language of the later Viking or Norman invaders (i.e., English is not a descendant of Old Norse or Norman French, and thus it is neither a Scandinavian nor a Romance language, despite the heavy influences of both). But even in the case of such language replacement, the switch is not instantaneous: it takes several generations for the new language to become the language of the land. So again, the language of the great-grandchildren is not the same as that of the great-grandparents, but contemporary generations speak in ways that differ little.
So new languages do not arise from nothing. Every new language is a development of some branch of the bushy family tree of existing languages. The key to understanding this is that languages constantly change. As I said above, languages change slowly, but the process of change is nonetheless relentless. But people who start out speaking the same language do not necessarily adopt the same modifications to their language. One group may change the pronunciation of one word, and another group — the meaning of another word. And before you know it, different groups, which may be distinguished geographically or socially, speak different dialects. These dialects are still mutually comprehensible (which is what dialects are, by definition), but they are not exactly the same. And the process of language change does not stop there. More and more changes in pronunciation, meaning and grammar accumulate and eventually dialects diverge to such a degree that they are no longer mutually comprehensible, at which point we tend to call them languages rather than mere dialects of the same language. In other words, new languages arise as variations on the theme of already existing languages.
English, for example, developed out of earlier stages that we label Proto-West-Germanic, Proto-Germanic or even Proto-Indo-European. In theory, we can trace an unbroken line of descent, with small modifications at each generation, from Proto-Human to present-day English.
And the same can be said for each and every language. In fact, each and every language traces its ancestry back to Proto-Human. Therefore, all currently spoken languages are equally old. The difference is just in the labels. We think of a given language as “old” if it has had the same label for a long period of time. For example, the name of the Farsi (or Persian) language, an Indo-Iranian language of Iran, has been used since the 6th century B.C.E., and this fact gives us the sense that Farsi is an ancient language.
In the case of Khoisan languages, the label itself is not very old, and these languages are perceived as old in a somewhat different sense. If we consider a genetic tree of human populations (such as in the picture below), Khoisan people are typically on the branch that splits off the common tree first.
Can we extrapolate from this that the present-day Khoisan languages are “the oldest”? Not really, and for two reasons. One reason is that population trees constructed by geneticists don’t always correlate with a linguistic tree. The fact that Khoisan people belong to the branch that splits off the population tree first does not mean that they speak a language that split off the human language tree first. The second reason is that even if it were true that the Khoisan languages split off the language tree first, it would not mean that they have not changed since. After all, remember that all languages change, all the time. The Khoisan languages as we know them today are not what the languages of Khoisan ancestors were centuries or millennia ago. In theory, Khoisan languages may have “invented” click sounds at some point in their history, or even borrowed them from some other languages (that left no descendants or have lost click sounds since then). There is no way to prove (or disprove) that the proto-Khoisan languages of yesteryear had click sounds, let alone that the first human language, Proto-Human, had them.