Why Some Languages Sound So Fast, or do they?

Sep 19, 2011 by

As a long-time reader of this blog would already know, I am typically skeptical of “linguistic” studies that calculate an average this and multiply it by an average that and supposedly produce results that are relevant for understanding the nature of human language, which is what linguistics is all about. Typically, such studies disregard so much that is already known in linguistics that it is not clear whether their results are to be taken seriously at all.

One such recent study, conducted by researchers from the Université de Lyon and reported in the journal Language, addressed the popular belief that some languages “sound faster” than others. According to Jeffrey Kluger, the author of the Time.Science article, “Spanish blows the doors off French; Japanese leaves German in the dust — or at least that’s how they sound”. Strangely enough, I’ve had the opposite impression of Spanish vis-a-vis French (I must mention that I know both languages to some extent, so they don’t sound completely foreign to me). And I never felt that Japanese is spoken fast, although admittedly, my exposure to natural-sounding Japanese has been limited mostly to bits and pieces of the original “Iron Chef” show.

But assuming that other people’s impressions about the speed of different languages are correct, the puzzle is of course that a text translated from one language to another does not typically take significantly longer or shorter to say than the original. In the words of the article,

“the dialogue in movies translated from English to Spanish doesn’t whiz by in half the original time after all, which is what it should if the same lines were being spoken at double time. Similarly, Spanish films don’t take four hours to unspool when they’re translated into French. Somewhere among all the languages must be a great equalizer that keeps us conveying information at the same rate even if the speed limits vary from tongue to tongue.”

To investigate this puzzle, researchers recorded 59 volunteers of both genders reading various texts in their native languages: English, French, German, Italian, Japanese, Mandarin, Spanish and Vietnamese. The investigators next counted all of the syllables in each of the recordings and further analyzed how much meaning was packed into each of those syllables. Once again, to cite the article:

“a single-syllable word like bliss, for example, is rich with meaning — signifying not ordinary happiness but a particularly serene and rapturous kind. The single-syllable word to is less information-dense. And a single syllable like the short i sound, as in the word jubilee, has no independent meaning at all.”

I must express my concerns here. First of all, syllables are not units of meaning; morphemes are. There are languages that allow monomorphemic words and others that don’t. For example, in Russian practically no lexical words (nouns, verbs, adjectives) are monomorphemic, with very few exceptions (one such exception is the word bezh ‘beige’, a loanword adjective). In contrast, in English many lexical words are monomorphemic, although of course not all. For example, bliss is not just a single syllable, it’s also a morpheme (a noun root) and a word. How much information is contained in a given syllable or in a given morpheme may vary greatly within a language too: according to the above description, a single-syllable, single-morpheme bliss contains more information than a three-syllable, two-morpheme happiness. Note also that syllables and morphemes do not necessarily coincide, so the first syllable in happiness does not carry any meaning by itself whereas the last syllable does: it carries the grammatical information “noun” as well as the meaning ‘the state of being X’. Just averaging this out seems odd to me. Moreover, what is “a single syllable like the short i sound, as in the word jubilee”? The three syllables of jubilee are ju.bi.lee!

Another concern with counting meaning per syllable, especially across languages, has to do with the fact that different languages allow different types of syllables. Of the languages studied by these researchers, four (Japanese, Mandarin, Spanish and Vietnamese) are listed by the World Atlas of Language Structures Online (WALS) as having a moderately complex syllable structure, meaning they allow syllables of the following shapes: V, CV, CVC and/or CCV. On the other hand, three of the eight languages (English, French and German) are listed as having complex syllable structure, that is, allowing syllables with more consonants at the beginning (i.e., in onsets) and/or at the end (i.e., in codas). English, for example, allows up to three consonants before the syllable’s nucleus, the vowel, and up to four after the nucleus. I’ve already written on the longest one-syllable word of English. Unfortunately, WALS does not provide information about the syllable structure of Italian.

Why does it matter? Obviously, if a given language allows more complex syllables, more information can be packed into any given syllable, where by “more information” I mean not just lexical meaning, which is rather subjective, but information carried by contrast with what else that syllable could be. To illustrate it with a simple example, imagine two languages: one which allows only CV syllables and one that allows also CVC syllables (with no limitations on what that final consonant can be). To make the example clearer, let’s imagine that our two hypothetical languages have exactly the same inventories of consonants and vowels. Clearly, the same CV syllable in a CVC-language carries more information than the same syllable does in a CV-language because it is contrasted with more alternative possible syllables. A syllable ma in a CVC-language is contrasted not only with mi and mu (for simplicity, let’s assume the simplest three-vowel inventory); it is also contrasted with mam, map, etc. In sum, the more different types of syllables a given language allows, the more information, strictly speaking, each syllable carries and the fewer syllables are needed to express the same amount of information. But this calculation is not taken into account by these researchers.
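The two hypothetical languages above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the three-consonant, three-vowel inventory is made up, and treating every possible syllable as equally likely is a simplification, so the resulting figure is an upper bound on information per syllable rather than a measurement of any real language.

```python
import math
from itertools import product

# Hypothetical toy inventories shared by both imaginary languages.
CONSONANTS = ["m", "p", "t"]
VOWELS = ["a", "i", "u"]
SLOTS = {"C": CONSONANTS, "V": VOWELS}

def possible_syllables(shapes):
    """Enumerate every syllable that fits one of the shapes, e.g. "CV" or "CVC"."""
    found = set()
    for shape in shapes:
        for combo in product(*(SLOTS[slot] for slot in shape)):
            found.add("".join(combo))
    return found

def max_bits_per_syllable(shapes):
    """Upper bound on bits per syllable, assuming all syllables are equally likely."""
    return math.log2(len(possible_syllables(shapes)))

cv_language = possible_syllables(["CV"])          # 3 * 3 = 9 syllables
cvc_language = possible_syllables(["CV", "CVC"])  # 9 + 27 = 36 syllables

print(len(cv_language), max_bits_per_syllable(["CV"]))
print(len(cvc_language), max_bits_per_syllable(["CV", "CVC"]))
```

In this toy setting the syllable ma exists in both languages, but in the CVC-language it is one choice out of 36 rather than one out of 9, which is where the extra information comes from.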

Curiously, some of their results can be accounted for simply by comparing the complexity of allowable syllables: English and French, which are described as “slow”, both have complex syllables, whereas Japanese and Spanish, which are both described as “fast”, have simpler syllables. The one language that doesn’t seem to fit the pattern so far described is Mandarin, which is described as slow despite having only relatively simple syllables. However, further syllabic complexity in Mandarin is added by the tone system: in a tonal language, a syllable ma with a low tone contrasts with ma with a high tone etc., in addition to contrasting with syllables with other vowels or consonants.

Of course, to make a truly meaningful calculation of syllabic information density, defined as contrast with all other possible syllables, the actual vowel and consonant inventories (and, for tonal languages, the inventories of tones) must be taken into account. Essentially, each language can be characterized by the number of distinct syllables it allows, based on the number of consonants, the number of vowels, the allowed syllable complexity and the number of tones. This information density figure may then be correlated with speed (i.e., number of syllables per second).
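As a rough sketch of what such a characterization might look like, the function below multiplies out the four ingredients just listed. Everything in it is an assumption made for illustration: the function names are invented, the inputs are toy numbers rather than real inventories, and the count ignores phonotactic restrictions (it treats any consonant sequence up to the maximum length as a legal onset or coda), so it overestimates what any real language allows.

```python
import math

def cluster_options(n_consonants, max_len):
    """Count consonant sequences of length 0..max_len, ignoring phonotactic limits."""
    return sum(n_consonants ** k for k in range(max_len + 1))

def syllable_count(n_consonants, n_vowels, max_onset, max_coda, n_tones=1):
    """Upper bound on distinct syllables: onset x vowel x coda x tone."""
    onsets = cluster_options(n_consonants, max_onset)
    codas = cluster_options(n_consonants, max_coda)
    return onsets * n_vowels * codas * n_tones

def max_bits_per_syllable(*args, **kwargs):
    """Bits per syllable if all counted syllables were equally likely."""
    return math.log2(syllable_count(*args, **kwargs))

# Toy comparison: three consonants, three vowels, optional single-consonant onset.
plain = syllable_count(3, 3, max_onset=1, max_coda=0)             # 4 * 3 * 1 = 12
tonal = syllable_count(3, 3, max_onset=1, max_coda=0, n_tones=4)  # 12 * 4 = 48
with_coda = syllable_count(3, 3, max_onset=1, max_coda=1)         # 4 * 3 * 4 = 48
```

In this toy case, adding four tones to a simple-syllable language yields as many distinct syllables as adding an optional coda consonant, which is the sense in which tone can stand in for syllabic complexity.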

The bottom line: what matters for the speed of speech may be not the lexical meaning expressed by syllables (a rather nonsensical concept since, as mentioned above, syllables by themselves express lexical or grammatical meaning only when they coincide with morphemes) but rather the information a syllable carries in contrast with other possible syllables.

My last critical comment: when making claims based on calculations, it’s great if the actual figures support the conclusions. However, the conclusion that “at the end of, say, a minute of speech, all of the languages would have conveyed more or less identical amounts of information” is not supported by the figures: Japanese apparently conveys 3.8416 units of information per second (7.84 syllables per second multiplied by the average information density of 0.49 per syllable), whereas English manages to convey nearly 1.5 times as much: 5.6329 units of information per second (6.19 syllables per second multiplied by the average information density of 0.91 per syllable).
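For anyone who wants to check the arithmetic, here is a minimal sketch. The syllable rates and densities are the ones quoted above; the variable names are mine, and multiplying rate by density is simply my reading of how the reported figures combine.

```python
# (syllables per second, average information density per syllable), as quoted above.
figures = {
    "Japanese": (7.84, 0.49),
    "English": (6.19, 0.91),
}

# Information conveyed per second = syllable rate * density per syllable.
info_rate = {
    language: round(rate * density, 4)
    for language, (rate, density) in figures.items()
}

# How much more information per second English conveys than Japanese.
ratio = info_rate["English"] / info_rate["Japanese"]

for language, value in info_rate.items():
    print(f"{language}: {value} units of information per second")
print(f"English / Japanese: {ratio:.2f}")
```

If all languages really conveyed the same amount of information per unit of time, that ratio would be close to 1.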


  • alfanje

    What you say makes a lot of sense. I am the Spanish-speaker in an office of mostly English-speaking people, and my colleagues, who understand only the odd Spanish word, think I speak my language very fast, which inspired me to compose this chart:


    In a more scientific mood, I thought that maybe languages with fewer sounds, and specifically with fewer vowel sounds (as Spanish has, compared with English), allow for more syllabic speed, whereas English can convey more meaning with monosyllabic bits.

    I always tell my colleagues that the equivalent of the verb "to spell" is used very seldom in Spanish, and I'm quite sure that the same thing happens with "say it again".

  • Asya Pereltsvaig

    @alfanje: Thank you for your comment and the link! Very interesting that in your experience Spanish speakers are characterized as speaking fast. One confound to keep in mind here is that any foreign language tends to sound faster than our own, simply because we don't understand it.

    As for your second comment, I agree: the more different vowels (or consonants, for that matter) a language allows, the more possible syllables and therefore the more information each syllable carries.

    As for using the verb "to spell", I think it has to do in part with the fact that the English spelling represents the pronunciation less faithfully than the Spanish (or Russian) spelling. Hence, sometimes a word needs to be spelled to be recognized.

  • Ezra

    I've spent quite a bit of time learning Spanish and French. I'm stronger in French though, having lived in France for a year. And I completely agree that French sounds faster. Of course it's completely subjective, but I think the impression comes from how French speakers tend to run words together (due to liaisons). So rather than being separated by pauses, those words merge into one.

  • Asya Pereltsvaig

    @Ezra: I am glad you share my impression that French is faster than Spanish. I too was thinking about liaisons and also the phrasal rather than word-level stress, both of which phenomena sort of "glue" words together.

  • Surly Teabag

    Thanks for analyzing this study. I had similar concerns about how rigorous it was.

    You might prefer how the information content of a word is measured in the recent paper "Word lengths are optimized for efficient communication":


    I found this paper interesting enough that I wrote a blog post about it:


  • Asya Pereltsvaig

    @Surly Teabag: Thank you for your comment and for the links — hopefully I can read both the article and your blog posting on it soon and perhaps even write up a response here…

  • Anonymous

    I don't think that liaisons play any significant role in the apparent speed of French speech (the use of liaisons is constantly decreasing anyway, they remain mandatory only after personal pronouns).

    IMHO the problem lies with the many schwas and mute 'e's that are omitted in everyday speech.

    An example off the top of my head: "Je vais te le redire" is pronounced as three syllables ("Jvètlørdir"), but anyone with a basic knowledge of French hears it as six syllables (or even seven) pronounced twice as fast.

  • Asya Pereltsvaig

    @Anonymous: Thank you for your comment. Not sure it matters though, the way things were counted in the study I refer to. It is my understanding that they counted how many syllables are actually pronounced, not how many are written out. But having schwa omissions like that affects how much information there is per syllable, so you are right in that sense.
