Languages and genes don’t always match — part 2
In the previous posting, I noted that a commonly held assumption doesn’t always hold: that if the languages of two groups are related, the peoples must be related as well. As you can see from the chart below, the genetic tree of populations (on the left) often does not match the linguistic tree (on the right). For example, while the inhabitants of northern and southern India are genetically close, they speak languages from two distinct language families: Indo-European and Dravidian, respectively. Conversely, Ethiopians and Berbers are not genetically close (in fact, they are separated by the highest-order split on the genetic side of the tree!), but they speak languages from the same family: Afro-Asiatic.
Two other particularly interesting cases of mismatch between genetic and linguistic classification come from Africa as well. The first one involves the Pygmies (i.e., Mbuti Pygmy at the top of the chart above). Genetically, they are quite distinct from other African peoples; their distinctive physical appearance suggests as much (see picture below). In fact, the Pygmies represent the most ancient divergence after that of the Khoisan peoples. The Pygmy gene pool includes a very high frequency of Y-DNA haplogroup B (which is localized to sub-Saharan Africa, especially to the tropical forests of West-Central Africa, where the Pygmies live) and of the mtDNA haplogroup L1 (i.e., the oldest divergent lineage).
Yet, despite their physical and genetic distinctiveness, linguistically the Pygmies belong to Bantu-speaking sub-Saharan Africa, just as the name Mbuti (with its prenasalized consonant) suggests. Most researchers agree that the Pygmies must originally have spoken a different language, by now completely lost; that’s why the chart above states “unknown” for their linguistic affiliation. Thus, the Pygmies are a good reminder to us all that a group’s linguistic affiliation says little about its genetic origin.
Another example of a mismatch between DNA and language involves the Hadza and the Sandawe, two Khoisan-speaking groups living in northern Tanzania, in an area where mostly Bantu languages are spoken (marked with orange on the map below). Their closest linguistic relatives, other Khoisan groups, live thousands of miles to the south.
Why did the Hadza and Sandawe end up living so far away from their linguistic brethren? And what about their genetics? Two explanations are possible for the geographic discontinuity between the Hadza and Sandawe, on the one hand, and other Khoisan speakers, on the other. The first theory takes the Hadza and Sandawe to be originally Bantu speakers who switched to Khoisan languages. The second holds that the small pockets of Hadza and Sandawe are remnants of an earlier Khoisan population that lived in East Africa before the Bantu expansion and is now surrounded by Bantu speakers as a result of that expansion.
At first glance, genetic studies seem to support the first theory: the Hadza have been shown to be genetically closer to the Pygmies of Central Africa than to Khoisan speakers, and the Sandawe are closer to the Bantu than to Khoisan speakers. Curiously, the Sandawe are not closely related to the Hadza, despite their geographical proximity. But the first theory leaves unexplained why these two groups would switch to a Khoisan language when the closest Khoisan speakers are so far away from them.
So at present, it is the second theory that has the most proponents. According to this theory, the Hadza and Sandawe were originally Khoisan and have maintained their languages, but their genes have changed over the centuries. How could their genes have changed? Well, note that the Hadza and Sandawe populations are relatively small: today, there are only 800 Hadza and 40,000 Sandawe. Thus, geneticists hypothesize that there has been a great deal of intermarriage between the Hadza and Sandawe and Bantu groups, which diluted their distinctive Khoisan gene pool. At the same time, as hunter-gatherers, the Hadza and Sandawe were separated from Bantu farmers by socioeconomic factors, so they managed to preserve their languages even though they did not prevent genetic exchange.
The take-home message: while language may tell us who does or doesn’t belong to “our tribe”, large-scale linguistic groupings often conceal rather than reveal the genetic groupings of various populations.