Does Basque Descend from Dogon?

Nov 28, 2014

I was recently asked to comment on media reports about the work of Basque linguist Jaime Martín Martín that purports to show that Basque descended from a West African language, Dogon. The Ethnologue classifies Dogon as 19 distinct yet closely related languages, all of which are spoken in central Mali, in the area shown in turquoise on the Ethnologue map reproduced on the left. There are some 503,000 speakers of Dogon, with the largest variety, Tomo Kan, spoken by 133,000 and the smallest, Ana Tinga, spoken by merely 500 people. The Dogon varieties are members of the Volta-Congo branch of the Atlantic-Congo sub-family of the Niger-Congo family, whose better-known members are the Bantu languages. If Martín is correct, it means that Basque is also a Niger-Congo language, a cousin of Swahili, Chichewa, and Xhosa. But as attempts to relate Basque to other languages are numerous yet virtually never successful (see for example the discussion of the alleged link between Basque and Georgian), it is worth looking at the sort of evidence that is presented to support Martín’s theory. It should be noted, however, that I have not been able to find Martín’s original paper, so unfortunately I have to rely on what has been reported by the popular media.

According to the reports, Martín compared both structural and lexical aspects of Basque and Dogon. From the structural point of view, it is claimed that the two languages are greatly alike, with the only difference being that Dogon “has no declensions or ergative subject”, which I can only interpret as saying that Dogon lacks case marking or ergative alignment. These are major differences, however, as they determine the overall structural “feel” of the language. The claim that Basque is “very similar” to Dogon is about as adequate as saying Basque is “very similar” to English or to Chinese. If anything, Dogon, lacking case marking or ergative alignment, is much more similar to Chinese—especially since both languages (or language families) are also tonal. Basque, on the other hand, is not tonal, which is another major difference between it and Dogon.

Another piece of evidence adduced in support of the Basque-Dogon link is that “three of the fourteen Dogon dialects showed exactly the same order of words in the sentence” as Basque. This is a piece of non-evidence if I ever saw one! Basque is a strict SOV language, but the SOV order is the most common cross-linguistically, more common than the SVO order, found in English. Nearly 45% of the world’s languages are SOV. Therefore, the chance that any randomly selected language has the same word order as Basque is nearly one in two—hardly a strong piece of evidence for relatedness.
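
To put a rough number on this, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that each of the fourteen Dogon varieties is an independent random draw from the world's languages with a 45% chance of being SOV. That assumption is mine, not Martín's, and it is unrealistic, since related dialects are not independent. The function name `prob_at_least` is hypothetical.

```python
from math import comb

# Share of SOV languages worldwide, using the approximate figure quoted above.
P_SOV = 0.45


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial tail: probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance that at least 3 of 14 independently drawn languages share
# Basque's basic SOV order (illustrative independence assumption).
chance = prob_at_least(3, 14, P_SOV)
print(f"P(at least 3 of 14 are SOV) = {chance:.3f}")
```

Under these simplified assumptions, finding three SOV varieties out of fourteen is close to a certainty even among unrelated languages, which is why a match of this kind carries so little weight.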

It should also be noted that the order of major clausal constituents is not a strong type of evidence for language relatedness to begin with: related languages can have different word orders. For example, within the Indo-European language family we find SVO languages such as English, French, and Russian; SOV languages such as Farsi and Hindi (and other Indo-Iranian languages), and VSO languages such as Irish and Welsh (and other Celtic languages). If we look beyond the basic main clause word order, other differences between even more closely related languages become apparent. For example, in embedded clauses Irish retains the same VSO order as in its main clauses, while Welsh reverts to the SVO order. Similarly, German and Yiddish exhibit nearly identical order in main clauses, as both languages have the Verb-Second phenomenon. (German and Yiddish differ, however, in the OV/VO order, which only becomes apparent in the presence of an auxiliary, that is when the lexical verb is not subject to the Verb-Second constraint.) But in the embedded clauses, German and Yiddish differ drastically: in German, the finite (i.e. tense-bearing) verb comes last, while in Yiddish it comes second:

Er sagt   daß [die Kinder   diesen Film gesehen     haben].                [German]

he says   that [the children this     film  seen          have]

‘He says that the children have seen this film.’

Er hat gesagt   az   [di kinder   haben gesen dem Film].               [Yiddish]

he has said   that [the children have seen this film]

‘He said that the children have seen this film.’

Note that English, which is also closely related to German and Yiddish (all three are members of the West Germanic branch of the Germanic family), does not in general have the Verb-Second phenomenon, with some minor exceptions, discussed in my earlier post.

Another example of closely related languages differing in their word order patterns involves Russian (an East Slavic language) and Polish (a West Slavic language). While both languages allow verb-initial orders in main clauses, only in Polish can such verb-initial structures be embedded:

Zastanawiam się czy       [pojedzie Mary do Hiszpanii tego lata].                 [Polish]

I-wonder REFL whether   goes Mary to Spain this summer

‘I wonder whether Mary is going to Spain this summer.’ [Santorini 1989: 173]

*V skazke govoritsja čto    [posadil  ded                      repku].                 [Russian]

in folktale says           that    planted grandpa.NOM turnip.ACC

intended: ‘The folktale says that the Grandpa planted a turnip.’

Numerous other examples of related languages differing in their word order patterns can be brought to bear on this issue, including the fact that most other Niger-Congo languages are SVO, whereas Dogon is SOV. And when it comes to syntactic differences between closely related languages, a whole school of research has emerged in recent years to deal with exactly those issues: micro-parametric syntax.

Similarly, macro-parameters, such as head/dependent marking (i.e. agreement vs. case marking) and ergative vs. nominative-accusative alignment, need not be a reliable argument in favor of language relatedness. For example, languages in the Caucasus all exhibit ergative alignment (or similar active alignment), but the three indigenous Caucasian language families differ as to whether they rely on case marking, agreement, or both. However, even languages within the same family can differ along those lines. For instance, English (like Dogon) lacks case marking, while Hindi and many other Indo-European languages (in particular, Indo-Iranian and Balto-Slavic languages) have case marking. Moreover, Hindi—much like Basque—exhibits ergative alignment, yet this trait is considerably rarer among Indo-European languages.

Finally, Martín’s claim of the relatedness between Dogon and Basque is said to be supported by lexical similarities. It is not clear, however, whether these are true cognates or mere look-alikes. This sort of evidence would therefore not pass muster with any serious historical linguist, and perhaps not even as an answer to a Linguistics 101 quiz. Crucially, it is not clear that Martín has been able to establish any regular sound correspondences between Basque and Dogon. The two pairs of “similar” words presented in the media reports, bede / bide ‘way’ and beri / bero ‘hot’, do not allow one to infer any such sound correspondences. If the first pair of words is supposed to illustrate the /e/-/i/ correspondence, why isn’t the second pair beri / biro?
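
The problem with the two word pairs can be made concrete with a small sketch. It lines up each pair letter by letter (a naive alignment that only works for forms of equal length, and that treats spelling as if it were sound) and records which Basque segment each Dogon segment is matched with. The function name `correspondences` and the alignment method are my own illustrative assumptions. This is not the comparative method proper, which also takes phonological environment into account and needs far more than two pairs.

```python
from collections import defaultdict

# (Dogon, Basque) pairs as quoted in the media reports.
pairs = [("bede", "bide"), ("beri", "bero")]


def correspondences(word_pairs):
    """Map each source segment to the set of target segments it is aligned with."""
    table = defaultdict(set)
    for source, target in word_pairs:
        if len(source) != len(target):
            continue  # naive position-by-position alignment needs equal lengths
        for s, t in zip(source, target):
            table[s].add(t)
    return table


table = correspondences(pairs)

# A segment matched with more than one target has no single regular
# correspondence in this (tiny) data set.
irregular = {s: sorted(ts) for s, ts in table.items() if len(ts) > 1}
print(irregular)
```

Even on this toy data, Dogon e lines up with Basque i in one position and with Basque e in others, so the two pairs on their own establish no regular correspondence.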

All in all, I am not at all impressed by the evidence presented in support of the Basque-Dogon link. And crucially, my concern is not just quantitative, but qualitative: it is not that there is not enough evidence, but that the “evidence” mentioned is of the wrong sort. There is nothing in these reports that would make me want to read the original paper or examine the evidence more closely.



  • Fedor Manin

    That Russian sentence is acceptable to me — it’s obviously not unmarked, but with the right sort of emphasis, it works. “В сказке (квитанции?) говорится, что посадил дед репку, а уж что потом выросла морковка, тут мы не виноваты.”

    • I agree. Whether verb-initial sentences can be embedded depends on their structure and intonation: this is okay as an embedded clause if the emphasis is on “repka” (morkovka), not on “ded”…

  • Przemyslaw Staniewski

    The Polish embedded clause is at least marked as well. But to be honest, it is hardly acceptable to me and I think any interpretation of that sentence in terms of emphasis will help here.

    • Przemyslaw Staniewski

      I mean, will not help here 🙂

      • Thank you, Przemyslaw! That’s an interesting point — I’ll have to check with more Polish speakers, of course, but the marked status of this sentence works perfectly for my theory! 😉

  • Ivan Derzhanski

    Well, to be fair, by `the same order of words’ he doesn’t just mean that both languages are SOV, but that both are SOV, GN, NA, ND and have postpositions. Not that this means much, of course: this is the third most common configuration, with 66 examples in WALS (after SVO/NG/NA/ND/Pr [118] and SOV/GN/AN/DN/Po [106], and before SOV/GN/NA/DN/Po [55]), shared among others by Tibetan and our friend Pirahã.

    • Ivan, good point! I am not sure exactly what “the same order of words” is supposed to mean, so maybe my comments are unfair — but so are the media reports… Thanks for crunching the numbers for other word order features and combinations. I am sure, however, that I can come up with other types of “word orders”, that is orders of other elements — and I am almost certain that we’d find some word order difference between Basque and Dogon, if we look hard enough… There almost always is some difference!

      • Ivan Derzhanski

        Oh, we needn’t look too hard: we only need to read on. Basque is Numeral-Noun, Degree-Adjective and Relative-Noun, whereas Dogon is Noun-Numeral, Adjective-Degree and has internally headed relative clauses. Disclaimer: this comes from WALS, which only knows about four Dogon languages (and doesn’t have word order data for all of them); it may be that the `three of the fourteen Dogon dialects’ (unnamed; wonder why) that Martín has in mind are more similar to Basque, but (1) I’m not betting anything on this and (2) as you say, it doesn’t prove anything, even if true.

      • You admit that your comments were unfair and yet you haven’t corrected the blog post, and the page has become the only authority in English on the subject due to your google rank. Other people are having a hard time finding the paper, too…but you’ve dismissed it without reading it and unfortunately have probably ended a lot of other peoples’ searches with false info. You are neither a geneticist nor an anthropologist, and therefore you would think it strange for Basques, Ainu, Eskimo, Inuit, and Dogon to have similarities. For someone who knows our prehistoric past, it would be far stranger if those peoples didn’t have some similarities in their language, because they all descend from the same group. It is funny that no one ever finds a correlation between Basques and Australian Aborigine or San or Mayan or any other language that they have no genetic ties to. They only find correlations with the peoples that Basques SHOULD have similarities with. It is improbable that the linguists who first saw these correlations already knew about the genetic ties between these groups, especially since some of the claims date from prior to the mapping of our Y chromosome….and because linguists don’t usually care much about genetics.

        • I don’t correct posts unless it’s a typo or some such. And I have no control over Google rank. Is there any specific “false info” in my post? Or is it just a vague and unproven accusation?

          As a linguist, I won’t find it odd if Basque, Dogon or any other language exhibit certain similarities. In fact, I would expect it to be the case. But, also as a historical linguist, I will find it extremely odd when people claim that these languages are historically (or “genetically”, in the linguistic sense) related. There is no evidence to that.

          • How could you possibly know if there is any evidence to that if you haven’t read the paper? You are “not sure exactly what ‘the same order of words’ is supposed to mean” and “might have been unfair.” If you were to read the paper, you might understand what it means and find that it is evidence.

          • How can I possibly know if there’s any evidence for Basque-Dogon relationship? Because I know *what type* of evidence would be needed. Word order, regardless of exactly how it is defined, is not the right type of evidence.

  • paul raicu

    Any two randomly chosen languages of the world will show similarities due to chance. If we go east we’ll find a “connection” between Basque and Ainu, but these small linguistic similarities are actually due to coincidence. E.g. Ainu ashke (hand) and Basque eskua and, with metathesis, Ugro-Finnic: Est. käsi, Finn. käsi, Hung. kéz and so on.
    Paradoxically, Rev. John Batchelor in his Ainu-English-Japanese Dictionary shows more similarities between Ainu and English. E.g. tu – two, re – three, chacha – papa, chip – ship, mat – female, pone – bone, pak – punishment, wakka – water. If tomorrow we find people on Mars, I’m sure they will have some words like those of Basque. The split of Basque from a common source happened too long ago, and unfortunately all its relatives have disappeared. All attempts to link it with any other language easily end in failure. But it is our obsession to dig again and again…

    • Exactly! Thank you for the Ainu examples.

      • What makes you so sure they are coincidence? Ainu, Basques, Dogon, and English should share words. The first three are remnants of a people who dominated the whole of the Northern Hemisphere during the Paleolithic, and the English are a mix of those people and agriculturalists. We know this because of the distribution of Y Haplogroup ED, and the late arrival of y haplogroup R1b. All of these groups also have high Neanderthal introgression and either have Neanderthal legends or practice neanderthal-like rituals. Any words shared by several of these groups might well have been borrowed from neanderthal.

        • I wonder if we’d find any of these coincidences in the language of Andaman Islanders, who are also remnants of Y Hap ED’d former glory.

    • Ivan Derzhanski

      He claims to have found not just some similarities, but many similarities (`ha comparado 2.274 palabras de ambas lenguas, con un resultado de 1.633 pares de semejanza, lo que representa un 70% del total’). He’s oblivious of the fact that a multitude of obviously similar words is not what you’d expect to find in distantly related languages; rather, you’d expect to find many cognate words whose relatedness is obscured by sound changes.

      • paul raicu

        If we want to find cognates in every language of the world, I’m sure we will definitely find something…

        • Ivan Derzhanski

          More on this subject, with maths an’ all:

        • Ah, but we’d find a lot of LOOK-ALIKES, not (necessarily) cognates!

          • paul raicu

            Unfortunately, linguistics doesn’t have strict rules like math, and in the process of giving etymologies scholars are often subjective. It is very easy to confuse a “cognate” with a “false friend”, especially for basic and old words.

            The border between language and dialect is also very elusive. Are Italian Venetian and Italian Sardinian dialects or separate and related languages? Are Daco-Rumanian and Aromanian dialects or separate and related languages? At this stage of the development of linguistics the answer is also subjective.

          • Etymologies are not exactly “subjective”. Rather, changes in meaning can happen for extra-linguistic reasons, making reconstruction of etymologies rather tricky. Confusing “cognates” and non-cognates (“false friends” actually means something different) is possible (and easy) if one doesn’t take into account sound changes that occurred in relevant languages. Borrowings (but not true cognates) will show the “sound signatures” of their source languages:


            As for your second point about dialects, I don’t see how it’s relevant to the issue at hand…

          • paul raicu

            In theory everything seems to be perfect. But in practice?…

            One single example: the etymology of the Daco-Rumanian word for ‘one hundred’.

            In the interwar period it was said to come from Sl. suto (Miklosich, Slaw. Elem.); after ’90 the Romanian Explanatory Dictionary (DEX) showed unknown etymology; in 2008 the scholar Mihai Vinereanu proposed in his dictionary a substratum route via the satem branch; and finally you are sure that Daco-Rumanian sută comes from Latin centum. How is that possible?

            In the first place, Bolshevik political influence interfered with linguistics; after the Romanian Revolution the scholars are still playing dice: to go with Latin centum or Avestan satəm? Mihai Vinereanu puts his stake on the satem branch and you on centum. What credibility does the word “scientific” have?

          • One would have to explore the evidence for this or that etymology: one of them must be correct and the other one(s) flawed; the question is which. But I have neither the inclination nor the time to investigate that minor (and irrelevant) issue.

            What I do know is that the field of Romanian dialectology and historical linguistics has been hijacked by political ideologies, which you seem to confirm in your comment. That some “scholars” do crappy work does not mean that the whole field is not scientific. Lysenko & Co did crappy biology, but that doesn’t mean that biology is not a science, does it?

          • paul raicu

            Political interests must not interfere in any scientific field! But was Franz Miklosich (1813–1891) politically influenced, or is his work simply obsolete today?

            I guess you are not biased, but how should we interpret the work of a contemporary linguist, Sorin Paliga, who says neither more nor less than that sută comes from the Geto-Dacian substrate and was borrowed by the Slavs?

            Where is scientific rigor when 1 + 1 gives a different result every time? In other words, why do your scientific results show a centum origin and others a satem one? May I suspect that some scholars use less scientific methods?

          • Ivan Derzhanski

            I think scientific rigour is in the same place where it is when zoologists argue over whether birds are a class of their own within the phylum of vertebrates or a subclass of reptiles. Pluto was considered a planet until recently, but has been demoted to a planetoid or something now. This doesn’t mean that everything is subjective in astronomy, just that concepts can evolve.

            Having looked at Sorin Paliga’s argument, I’d say it appears quite plausible. If Miklošič was incorrect, that’s because Thracology hadn’t been developed to any extent in his day (nor for a long time after), and the problems with PIE *_ḱm̥tóm_ > PSl *_sŭta_ > Ro _sută_ seemed negligible; but still, his guess wasn’t totally untenable, and I wouldn’t call it unscientific. Crackpot claims made up entirely to support some political, nationalist etc. agenda, with no regard for the facts or the scientific method, are a wholly different story.

          • Ivan Derzhanski


            I’m not sure why DEX says the etymology of _sută_ is unknown; there may be problems with the derivation from _sŭto_ that I’m missing, but I don’t see how it can have descended from _centum_: how often does Latin _c_ yield _s_?

          • paul raicu

            Again I suspect the interference of political interests. After the Romanian Revolution of ’90 ties with the Soviet world were cut. The changes to word etymologies, especially Slavonic ones, came as an act of liberation. In 1993 the Romanian Academy replaced the letter î (ɨ) with â in word-internal positions (e.g. încercînd “trying” > încercând), and the written forms sînt (I am), sîntem (we are), sînteţi (you are) became sunt, suntem, sunteți in a hilarious attempt to be closer to Latin.

            In the particular case of sută, as you were saying, it’s hard to believe in Latin descent, since we do not have a well-defined pattern for the transformation.

          • Lioba Werling

            Persian صد sad is ‘hundred’. I suppose that Persian is very much older than Latin. Latin centum (c pronounced as k) gave us French cent, in which the letter c is pronounced s; the letter c in Spanish cien ‘100’ is pronounced th; a hardened th gives us t in English ten. In century the letter c is pronounced s too. We see that in the different languages the Latin centum is shortened and the ending um omitted; Persian sad, supposedly from Avestan satem, dropped the ending em too. There is no problem deriving Romanian suta ‘100’ from Latin or Avestan; it’s just a question of which language is older, Latin or Avestan. I think Avestan is very much older. Avestan satem and Latin centum are the same word, just with an inserted n in Latin.

          • Old Persian is a little “older” (as in was spoken earlier) than Latin, and Avestan is earlier than Latin too. But I don’t see the relevance of that to the issue of inheritance: if your father is younger than your cousin’s father, it’s irrelevant to the question of who you inherited a certain genetic trait from. Clearly, you inherit from your father and not from your uncle (of course, your father and your uncle may share certain genetic traits, but even then you still inherit from your father, not your uncle). So Romanian may have inherited its word for ‘hundred’ from Latin, but not from Avestan.

            The issue of the endings is irrelevant too, as they were part of the language’s inflectional system. Those inflectional systems are inherited and changed on their own terms, separate from lexical roots. And the English “ten” is not related to the Latin “centum” (the English “hundred”, on the other hand, is).

          • paul raicu

            That is the researchers’ sin: to divide languages as important and unimportant. If we take Lat. aqua (water), it must become Rum. apă; but do Avestan ap or Skt. aapa come from Rumanian?

          • I am sorry: are you referring to me here? I certainly don’t “divide languages as important and unimportant”

          • paul raicu

            I don’t intend to offend anybody. In his gigantic work, Julius Pokorny doesn’t use, for example, the Rumanian or Sardinian languages. The name itself tells everything: Indogermanisches etymologisches Wörterbuch. This title reflects the spirit of his time, when languages were ranked politically by importance.

          • Oh that was just the (German) term at the time… And yes, of course, linguistics was twisted around to serve ideological goals, no question about it. More on this in our book:

          • paul raicu

            You used a coin to determine the older language? What does “science” really mean in this particular case of Rumanian sută?

        • Yes, and we’d find the most between peoples who share parental haplogroups and language genes. Those who have the YAP gene descend from a particular group of people and have a better propensity for tonal languages, and people who have the introgressed Microcephalin D gene have a better propensity for non-tonal languages. Those who have neither (San and Bushmen) have neither. Perhaps Neanderthal had a tonal language, the Microcephalin D hominid had a non-tonal language, and Homo sapiens originally had a click language.

      • Excellent point, Ivan!

  • Tomasz Wegrzanowski

    Polish example “Zastanawiam się czy [pojedzie Mary do Hiszpanii tego lata].” sounds just wrong. The only reasonable form is “Zastanawiam się czy [Mary pojedzie do Hiszpanii tego lata].”

    On the other hand gramatically similar sentence “Zastanawiam się czy [pojedzie pani do Hiszpanii tego lata].” is perfectly fine, as long as “pani” (=miss/mrs./etc.) refers to the person you’re speaking to. If it’s some other “pani” (=woman/lady), it doesn’t work.

    “Zastanawiam się czy [pani pojedzie do Hiszpanii tego lata].” could work with either “pani” referring to second or third person.

    This seems to work or not work with pretty much any other honorific like “pan”, “profesor” etc.

  • superbad2011

    This link is, it would seem, this man’s blog note (pub @2014), and publishes his findings, at least. It’s not the paper, though. One route to ‘finding’ the paper is to simply email him and ask for it through the contact form on that page.

  • Mark Saltos

    WHY do you and some other linguists INSIST that German and Yiddish are equal, nonlinear members of the same (West Germanic) family? This makes it sound like German and Yiddish are sibling languages which grew linearly out of the same root language or “Mother tongue”. This is nonsense. Yiddish emerged from High German as a dialect spoken by the Jews of middle and eastern Europe during the early middle ages, rather late in the development of Germanic languages. It was originally a dialect of German, NOT a sibling.
    The Relationship of Yiddish to German is as a daughter to a mother
    It is misleading to list or imply it as anything other. Is there some agenda here?

    • We insist on Yiddish and German being equal siblings rather than daughter and mother because they are, Mark! What is confusing perhaps is that their common ancestor is called “Middle High German”, which makes it sound as if it’s *German*. But it has the same relationship to German (Modern High German, if we want to be precise, and we should!) as Yiddish does. So from a linguistic perspective, we could call that ancestral language “Old Yiddish” or “proto-Yiddish” or “pre-Yiddish” or something along those lines. Calling it “something German” rather than “something Yiddish” comes purely from social and political agenda, not linguistic analysis. For a linguist, Yiddish is as much a dialect or descendant of German, as German is of Yiddish. So if there’s a hidden agenda somewhere, it’s with people who insist on Yiddish being a daughter of German.

      • mark saltos

        If you even THINK that German is a dialect of Yiddish, you are NO “Linguist”.

        This proposition is so inane, its opposition requires no further support and Yes, I believe you do have an Agenda, or you’d not cleave to such an inane position.

        • You’ve completely missed the point of my comment, Mark Saltos. Read it again. What I’m saying is that neither Yiddish is a dialect of German nor German is of Yiddish. Both are ridiculous propositions. But they are sister-languages descended from a common ancestor.