Apr 19, 2011

If a Martian scholar trying to understand the cultural geography of our planet had access solely to the Wikipedia, he’d get a very biased and possibly misleading picture of our human linguistic geography. As discussed in a recent GeoCurrents posting, the “meta-wiki” list of the 281 languages in which Wikipedia articles have been written does not provide a very good representation of how widely these languages are spoken. For instance, Chinese, with its 352,000 Wiki articles (all figures are rounded), comes in twelfth place, between Swedish and Catalan. It is clearly underrepresented in the Wikipedia, since it is the most widely spoken language in the world. Conversely, some smaller languages are overrepresented in the Wikipedia, including the above-mentioned Swedish and Catalan: Swedish has 393,000 Wiki articles and only around 10 million speakers, while Catalan has 338,000 Wiki articles and 11.5 million speakers.

But our imaginary Martian scholar would get an even worse sense of the future fate of various languages: which languages have a bright, promising future and which are stagnating and possibly dying. A quick comparison of the UNESCO list of endangered languages with the “meta-wiki” list reveals that many languages appear on both lists! That is, quite a few of the world’s languages that are endangered, and whose future is far from assured, are also overrepresented in the Wikipedia.

The UNESCO list uses a classification system to show just how ‘in trouble’ a language is. Before a language becomes officially “extinct” (i.e., no speakers of it are left), or even “critically endangered” (i.e., the youngest speakers are grandparents and older, and they speak the language partially and infrequently), it can go through several stages of decline. First, a language becomes “vulnerable”: most children still speak it, but its use is restricted to certain domains, such as the home. Next comes the “definitely endangered” stage, when children no longer learn the language as a ‘mother tongue’ in the home. After that, a language may enter the “severely endangered” category, when it is spoken by grandparents and older generations; while the parent generation may understand it, they do not speak it to children or among themselves.

Curiously, some of the languages listed by UNESCO in these three early endangerment categories (“vulnerable”, “definitely endangered” and “severely endangered”) appear in the list of languages with more than 10,000 Wiki articles. The highest-ranked of them, at #42, is Newar, a “definitely endangered” language with 825,000 speakers and nearly 70,000 Wiki articles.

Many other languages that appear on both lists are — unsurprisingly! — non-national European languages, some of which are often regarded as mere dialects. Among them are the “vulnerable” Basque (660,000 speakers and 69,000 Wiki articles), Welsh (750,000 speakers and 32,000 Wiki articles) and West Frisian (350,000 speakers and 20,000 Wiki articles). “Definitely endangered” languages overrepresented in the Wikipedia include Aromanian (500,000 speakers and 62,000 Wiki articles), Piedmontese (200,000 speakers and 38,000 Wiki articles) and Lombard (3,500,000 speakers and 20,000 Wiki articles). One of the “severely endangered” languages that may well live on through its Wiki articles is Breton, with 250,000 speakers and 37,000 Wiki articles.

In addition to the languages mentioned above, many other overrepresented languages struggling for survival are spoken in rich, heavily computerized Western European countries: Walloon, a “definitely endangered” language with 600,000 speakers and 12,000 Wiki articles, in Belgium (ranked #17 on the International Monetary Fund’s 2010 list of GDP per capita); Low Saxon, a “vulnerable” language with 4,800,000 speakers and 17,000 Wiki articles, in Germany (ranked #19); and Sicilian, another “vulnerable” language with 5,000,000 speakers and 17,000 Wiki articles, in Italy (ranked #23).

But perhaps more surprisingly, several endangered languages are overrepresented in the Wikipedia even in poorer countries: for instance, Belarusian, a “vulnerable” language with 4,000,000 speakers and nearly 60,000 Wiki articles, is spoken in Belarus, which is ranked #79 by GDP per capita.

