Indigenous Languages of Siberia: An Overview

[This post was originally published in March 2012]


On the map of the 6,909 living languages listed in the Ethnologue database (see map on the left), Siberia is mostly empty, with fewer dots than European Russia or the United States. Along with the neighboring region stretching from western China through Kazakhstan, it is the least linguistically diverse place in the world. Many of its 45 languages, moreover, are spoken by small and relentlessly declining numbers of people, and most are now considered endangered to some extent. Speakers of a typical Siberian language are scattered over vast and sparsely populated territory. As can be seen from the second Ethnologue map, these groups are typically settled along the major Siberian rivers – Ob’, Yenisey, Lena – and their tributaries. Few groups have penetrated deeper into the taiga or tundra, and many of the less-frigid areas in the headwaters of these rivers have been settled by the Russians, as well as Russified Ukrainians and Germans.

Although Siberia is a land of little linguistic diversity, the languages that it does possess present a number of interesting issues. Scholars continue to disagree on the basic classification of most Siberian tongues. One long-established family, Paleo-Siberian, is probably a catch-all category of small families and isolates, no more coherent than the “Papuan” family of New Guinea and environs. Other debates focus on hypothesized relations between Siberian languages and those found elsewhere. Many Siberian languages, moreover, have a number of fascinating qualities of their own. In John McWhorter’s recent What Language Is (And What It Isn’t and What It Could Be), one Siberian language, Ket, is used to exemplify the grammatical complexity that often emerges in small, relatively isolated linguistic groups.

In terms of familial classification, some of the indigenous Siberian languages belong to well-established language families, whereas others defy easy classification. In today’s post we will examine the languages of the smaller ethnic groups of central and northern Siberia, extending from the Ural Mountains in the west to the Pacific Ocean in the east.


In western Siberia, the main indigenous language family is Ugric, a branch of the Finno-Ugric language family (languages in the Finnic branch of this family are spoken in Northern European Russia, the circum-Baltic area, and northern Scandinavia). Two of the three Ugric languages are found in Western Siberia, presumably the birthplace of the family: Khanty, spoken chiefly along the Ob’ river, and Mansi, spoken further west, on the eastern slopes of the Ural Mountains. By far the most widely spoken Ugric language, Hungarian, is found in Central Europe. The original Hungarians, or Magyars, were presumably a forest-dwelling Ugric people who took up a pastoral way of life on the steppe under Khazar patronage and later settled in the Danubian Basin. The affinity of Khanty and Mansi, on the one hand, and Hungarian, on the other hand, can be seen from a list of cognates, such as the numerals in the picture on the left.


Another clear family of languages spoken mainly in Western Siberia is Samoyedic. It is a small family, consisting of only six languages, all spoken by few people. Quintessential reindeer pastoralists, the so-called Samoyeds stretched from the Kanin Peninsula to the Taymyr Peninsula. The most populous Samoyedic-speaking group is the Nenets, with some 31,300 speakers in the tundra area from the mouth of the Dvina River in northeast Europe to the delta of the Yenisey in Asia. The second largest Samoyedic language, Selkup, is spoken by only about 1,600 people, in Tomsk Oblast, Yamalo-Nenets Autonomous District, and Krasnoyarsk Krai. Next in size is Nganasan, the northernmost language of Siberia, with only about 500 speakers in small communities on the Taymyr Peninsula of the extreme north. Finally, Forest and Tundra Enets count only 30 speakers between them. Forest Enets is spoken along the Yenisey River’s lower course, mostly in the settlement of Potapovo, whereas Tundra Enets is spoken in the settlements of Vorontsovo and Karepovsk.



Some scholars combine Finno-Ugric and Samoyedic languages in a larger Uralic family (see the chart on the left), based on cognates like the Nenets tuu and the Finnish tuli ‘fire’, or the Nenets min- and the Finnish mene- ‘to go’. They hypothesize that the diverse Uralic languages descend from a common ancestral tongue that must have been spoken over a continuous part of northeastern Europe and northwestern Asia. This ancestral Proto-Uralic language is believed to have had vowel harmony, agglutinative morphology, Subject-Object-Verb order in sentences, two grammatical cases (accusative and genitive) and three locative cases (dative, locative, and ablative), and a fairly complex verbal morphology. Once Proto-Uralic diversified, speakers of Indo-European and Turkic languages displaced many of the original Uralic groups. As a result, the present-day Uralic languages are spoken in a geographically discontinuous area.


While Uralic itself has not been conclusively established, attempts have nonetheless been made to relate it to other language families. One such proposal connects Uralic to Yukaghir. Yukaghir consists of just two languages of the Kolyma and Indigirka regions in northeastern Siberia: Tundra, or Northern, Yukaghir, and Kolyma, or Southern, Yukaghir (despite the balmy implications of the word “southern”, this language is spoken in one of the coldest and least hospitable areas in Siberia). The two Yukaghir languages have 90 and 30 speakers, respectively. The link between Yukaghir and Uralic languages – sometimes referred to jointly as the Uralic-Yukaghir macro-family – is tentative; as a result, many linguists prefer treating the Yukaghir languages as a separate language family (for example, the Ethnologue, which generally takes a splitter’s approach, does so).

Other Siberian languages that some have regarded as falling within a larger “Uralic macro-family” include the Yeniseian languages, Nivkh, and Ainu. According to the little-accepted Uralo-Siberian hypothesis, the larger family extends all the way to the Eskimo-Aleut languages of North America. All these languages elude classification, and many scholars have long unified most of them under the Paleo-Siberian label. However, this category is more of a geographic term than a linguistic one.

The Yeniseian language family has only one living language, Ket, which is spoken in the Upper Yenisey Valley in western Siberia. All of Ket’s relatives known from historical records became extinct between the 18th and the 20th centuries. Some scholars have attempted to relate Ket to the Sino-Tibetan languages, but their evidence is far from convincing. Others have suggested that the language of the mighty Xiongnu Confederation, which dominated the Mongolian steppes from the third century BCE to the fifth century CE, belonged to the Yeniseian family. A few linguists have even argued that the Na-Dené languages of North America, which include the Athabascan tongues, are affiliated with the same family, but evidence for this theory is meager. If it is valid, Navaho would be the most widely spoken language in the broader grouping by a wide margin.

Ket is now spoken by a few hundred people, but even in its heyday, it probably had a few thousand speakers at the most. As can be expected from a language spoken by a small, isolated group, it is very “ingrown”, to use John McWhorter’s terminology: the grammatical complexity of Ket boggles the mind. One aspect of this complexity is the polysynthetic nature of the morphology: even “simple”, regular verbs in Ket can form a whole sentence. For instance, the complex meaning of ‘I went to the river and came right back a little later’ – which in English requires a whole sentence with two coordinate clauses – is expressed in Ket by a single word, the verb digdabatsaq.

Nivkh (also known as Gilyak), probably a language isolate, is spoken at the mouth of the Amur River and on the northern part of Sakhalin Island in Russia’s Far East. Though a written language was developed for Nivkh (primarily for its Amur dialect) in the 1930s, next to nothing has been written or printed in it. Like the languages of the Caucasus, Nivkh tolerates highly complex consonant clusters, whereas its vowel system is relatively simple. Among its most notable grammatical characteristics are a system of noun classifiers, not unlike those found in languages of eastern Asia; a relatively simple case system; a wide range of non-finite gerund forms; Subject-Object-Verb order in sentences; and, as would be expected from cross-linguistic typological correlations, postpositions rather than prepositions.

Finally, Ainu is now all but extinct: although there are around 15,000 ethnic Ainus, the language itself is spoken only as a second language by a very few elderly people. Although Ainu, which some regard as a small family of languages, is closely associated with Hokkaido in northern Japan, it originally extended from central Honshu, through Hokkaido to central Sakhalin Island, and up the Kuril Archipelago to the southern tip of Kamchatka. There is little similarity between Ainu and Japanese, and what similarity does exist can mostly be explained by borrowing and typological constraints. Attempts have been made to relate Ainu not only to Uralic languages, but also to Altaic languages and even to Indo-European languages. However, the evidence offered in support of such links has since been shown to be faulty, so most scholars prefer to treat Ainu as an isolate. The same applies to Ket and Nivkh.

Paleo-Siberian is often extended to include another small family, Chukotko-Kamchatkan, which includes Chukot (or Chukchi), Alutor, Kerek, Koryak, and Itelmen. These languages are spoken on the Chukotka and Kamchatka peninsulas in the far northeast. Chukot is the most widely spoken, with some 10,000 speakers (out of an ethnic population of 15,000). Of these, the so-called Maritime Chukchi constitute a quarter of the population, whereas the rest are called the Reindeer Chukchi. Despite the relatively high number of speakers, Chukot is endangered, as children seldom acquire the language, especially in settled communities. Although most people under 50 speak some Russian, nomadic, reindeer-herding groups resist the Russian language and culture. This may be in part a reaction to the derogatory perception of this group held by the Russians, who tell insulting “stupidity jokes” about the Chukchis, portraying them as unsophisticated if not simply retarded. (A separate post considers these jokes more closely.)

Other Chukotko-Kamchatkan languages have highly disparate numbers: Koryak has 3,500 speakers (out of an ethnic population of 8,743), Alutor 200, and Kerek only 2 (out of an ethnic population of 8 in 2002). The numbers for Itelmen speakers vary from source to source, but the reliable figures for native speakers, as opposed to people who know a little bit of the language, hover around 30. Language maintenance and revitalization programs are underway for Itelmen, with heavy involvement from Western scholars, such as Jonathan Bobaljik of the University of Connecticut. (A separate GeoCurrents post will be dedicated to the issue of Itelmen language revitalization.) It has been intriguingly suggested that the language of the Merkits, one of the most powerful “Mongol” tribes at the time of Genghis Khan, may have belonged to the same family.

Although these so-called Paleo-Siberian languages share certain typological features – for example, they tend to be agglutinative and mostly exhibit Subject-Object-Verb word order in sentences – there are also important differences among them. For instance, Ket is unlike the other Paleo-Siberian languages in using tones to encode meaning (like Mandarin Chinese and many other languages of eastern Asia). Also, Ket has discontinuous root morphemes and infixes, not unlike Semitic languages such as Arabic and Hebrew. Where Nivkh has a plethora of nonfinite, gerund-like verb forms, Ket has few if any. Similarly, Nivkh is unlike other Paleo-Siberian languages in having a well-developed system of noun classifiers, while Chukchi and Koryak are unlike the others in being ergative and having vowel harmony.





  • John Cowan

    I am not aware that Uralic is in any way a controversial grouping: I would say it is as well established as Indo-European. Although there are disputes about exactly how the nine branches are structured (except that the Samoyedic languages are the most divergent), the same is true of the ten IE branches.

    While it is certainly much too soon to call Dene-Yeniseian established, it rests on firmer evidence and is more widely accepted than any other hypothesis involving Yeniseian.

    • Maybe I didn’t explain it clearly: Uralic is not controversial in the sense of whether it actually exists as a family at all. What is highly controversial (if my understanding is correct) is the inclusion of certain languages (Yukaghir) and the grouping of others into major branches. Some scholars, I understand, think that Samoyedic languages may be more closely connected to some (but not all!) Finno-Ugric languages, so that rethinking of the internal geometry of the Uralic family is needed. I am no expert on these languages — but that’s what I gather from talking to the experts. In the case of IE, I don’t think similar arguments arise, for example, that half of Slavic should be in the same grouping with, say, Germanic, and the other half with Iranian, or something like that.

      As for Dene-Yeniseian, it’s certainly an interesting theory but I don’t think it’s been proven beyond reasonable doubt. Again, that’s the impression I get from scholars who are more expert in this than I am….