How many words do I need to know? The 95/5 rule in language learning, Part 2/2

Welcome to Part 2 of my post on “How many words do I need to know? The 95/5 rule in language learning”. If you haven’t done so already, read through Part 1 before continuing!

How many words are there in some of the world’s major languages? As I stated in Part 1 of this article, there really is no way to answer this question. Languages are continuously evolving and changing, and are subject to people’s own creativity and imagination. After all, it is said that Shakespeare himself invented 1,700 new words!

People continuously invent new words, alter some existing ones, or stop using others altogether. Plus, what about medical and scientific terms? Should they be counted as part of our “vocabulary”? And if we look at the English language word count, for example, what should we do about Latin words used in law, French words used in cooking, German words used in academic writing, Japanese words used in martial arts? Do you count Scots dialect? Teenage slang? Abbreviations? Should we count them as English words or not?

The most “objective” measure that we have available for counting the number of words in a given language, then, is the number of words contained in its largest dictionary (really, it’s not that objective, but it’s the only measure we have access to!). I thus began to research this question for some of the world’s major languages, but quite surprisingly, I couldn’t find any resource on the net actually listing languages and their associated number of words based on dictionary word count. So, having scoured the net for scattered answers, I’d love to share my findings with you.

So here’s a list for 11 of the most spoken languages around the world (sources given as hyperlinks):

| Language | Largest dictionary | Number of words |
|---|---|---|
| Chinese | 汉语大词典 (Hanyu Da Cidian, lit. “Comprehensive Chinese Word Dictionary”) | 370,000 words; 23,000 head Chinese character entries |
| English | The Second Edition of the 20-volume Oxford English Dictionary | 171,476 words in current use, and 47,156 obsolete words; 615,100 definitions |
| Dutch | Woordenboek der Nederlandsche Taal (Dictionary of the Dutch language) | 430,000 words |
| French | Le Grand Robert de la langue française | 100,000 words; 350,000 definitions |
| German | Der Duden | 135,000 words |
| Italian | Grande dizionario italiano dell’uso (Gradit) | 270,000 words |
| Japanese | 日本国語大辞典 (Nihon kokugo daijiten) | 500,000 words (this includes definitions and etymologies of foreign loan words (gairaigo, 外来語), highly recent words (gendai yōgo, 現代用語), archaic words (kogo, 古語), idiomatic compound phrases (jukugo, 熟語), words that can be written using more than one possible Chinese character to produce subtle differences in meaning (dōkun iji, 同訓異字), Chinese characters that are written differently but have the same pronunciation (iji dōkun, 異字同訓), some slang (ingo, 隠語), and words used only in regional dialects (hōgen, 方言)) |
| Korean | 표준국어대사전 (Korean Standard Unabridged Dictionary) | 500,000 words (this includes 190,000 technical words (전문어), 70,000 North Korean words (북한어), 20,000 regionalisms (방언), and 12,000 old sayings (옛말)) |
| Russian | Толко́вый слова́рь живо́го великору́сского языка́ (Explanatory Dictionary of the Living Great Russian Language; AKA Dahl’s Explanatory Dictionary) | 200,000 words |
| Spanish | Diccionario de la Real Academia Española | 100,000 words |
| Portuguese | Vocabulário Ortográfico da Língua Portuguesa | Nearly 390,000 words |

So… how many words are there in the English language after all?

The first thing that will probably jump out at you here is the apparently low word count for English. A quick Google search will show you that many claim that the English language has “the most number of words of any language” out there, with several hailing the “millionth word” milestone recently reached in the English language.

See the video from the Oxford Dictionaries YouTube channel, in which they discuss the subject.

So why only 171,476 words in current use? Well, again, when comparing the largest dictionaries out there, we have to keep in mind several important points: Which country has the best-developed dictionary industry? The best archives? Do you count obsolete words? Dialectal ones? How many scientific words are included?

In Korean, for example, the largest dictionary ever compiled was the result of 8 years of work, through the collaboration of over 500 scholars, for a total cost surpassing 11.2 billion Korean won (~$11.2M). The dictionary itself includes nearly 200,000 technical words, as well as thousands of old sayings no longer in use.

Specialized vocabulary used in the sciences is notably very large and constantly growing. The French “Dictionnaire de la chimie de Duval” (Duval Chemistry dictionary), which is far from exhaustive given that we already distinguish over 100,000 coloring matters, contained 26,400 entries in 1935, and more than 70,000 in 1977.

Therefore, the reason why English has 171,476 words in current use in its largest dictionary is partly that the dictionary excludes inflections, does not cover several technical and regional vocabularies, and obviously does not include words not yet added to the published dictionary. If distinct senses were counted, according to the Oxford Dictionary, the total word count would probably approach three quarters of a million.

Which language has the biggest vocabulary, then?

As you can see, the list I compiled does not necessarily tell us which language has the “biggest vocabulary”. It simply tells us which dictionary was made to include the most words.

In any case, if I had to give a short answer to this question, I’d say “Who cares?” Each and every language is amazingly rich and interesting in its own way. Each language has its own genius and its own personality. Arabic has apparently over fifty different words for “camel”. In Korean, there are over five different words for each color equivalent in English (i.e. red, blue, yellow, etc.) and several thousands of words have both a pure Korean and a Sino-Korean (한자어) equivalent.

I compiled a list of the number of words in the dictionaries of some of the world’s most widely spoken languages out of sheer curiosity, not to stir up a debate over which language has the most words. This question, once again, has no definite answer.

What does matter to you as a language learner, though, is to know the approximate number of words needed in order to reach conversational fluency in a language. Of course, you could very well learn a language without ever asking this question and, frankly, it wouldn’t matter in the least. But it’s still nice to know. And this number is the approximate number of words you will actually have to more or less “deliberately” memorize before reaching a point where you can learn almost entirely through context and good guesswork. For more on that, see Part 1 of this article.

How many words does a native speaker use in daily life?

“Green Eggs and Ham” is a book by Dr. Seuss (a pen name of Theodor Seuss Geisel) whose vocabulary famously consists of just fifty different words. It was the result of a bet between Seuss and his publisher, Bennett Cerf, that Seuss (after completing The Cat in the Hat using 236 words) could not complete an entire book using so few words.

Obviously, if one can write a book using as few as 50 words, there is no doubt that a vocabulary of 40,000 words is not necessary for communicating. For your information, though, according to Susie Dent, lexicographer and expert in dictionaries, the average active vocabulary of an adult English speaker is around 20,000 words, with a passive one of around 40,000 words.

What is the difference between an active and a passive vocabulary? Simply put, an active vocabulary is made up of words that you can recall and use in a sentence yourself. A passive vocabulary, on the other hand, is made up of words that you can recognize and whose definition you know, but that you are not able to use yourself.

Now, here’s where it gets interesting: although an average adult native English speaker has an active vocabulary of about 20,000 words, The Reading Teacher’s Book of Lists claims that the 25 most common words make up 33% of everyday writing, the 100 most common words make up 50% of adult and student writing, and the 1,000 most common words make up 89% of everyday writing! Of course, as we move to progressively higher percentages, the number of words required starts to increase dramatically (especially beyond 95% coverage), but it has been said that a vocabulary of just 3000 words provides coverage for around 95% of common texts (such as news items, blogs, etc.). Liu Na and Nation (1985) have shown that this is roughly the number of words necessary before we can efficiently learn from context with unsimplified text.
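To get a feel for how coverage figures like these are calculated, here is a minimal Python sketch. It counts how often each word form appears in a text and reports what share of the running words is accounted for by the N most frequent forms. The function name `coverage_of_top_n`, the naive tokenizer and the toy sample sentence are assumptions made purely for illustration; this is not the method used by The Reading Teacher’s Book of Lists or by Liu Na and Nation, and a real study would use a large corpus and group inflected forms together.

```python
from collections import Counter
import re


def coverage_of_top_n(text: str, n: int) -> float:
    """Share of running words covered by the n most frequent word forms."""
    # Deliberately naive tokenizer: lowercase, keep letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Add up the occurrences of the n most frequent word forms.
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)


# Toy sample (hypothetical); swap in any large text to see realistic figures.
sample = "the cat sat on the mat and the dog sat on the rug"
for n in (1, 3, 5):
    print(f"top {n} word forms cover {coverage_of_top_n(sample, n):.0%} of the sample")
```

Even in this tiny sample, a handful of very frequent words accounts for most of the text, which is the same pattern the figures above describe on a much larger scale.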

When it comes to Chinese, approximately 3,000 characters are required to read a Mainland newspaper. The PRC government defines literacy amongst workers as a knowledge of 2,000 characters, though this would be only functional literacy. Of course, given the nature of the Chinese language, 3000 characters correspond to many, many more words. Nevertheless, the highest level (VI) of the new Hànyǔ Shuǐpíng Kǎoshì (HSK), also known as the Chinese Proficiency Test, requires a vocabulary of 5000 words (2633 characters).

Finally, in French, the 600 most common words apparently account for 90% of words found in common texts, although I cannot verify this claim. But I think you can see from the numbers here that, in order to understand the biggest part of a language, it is really not necessary to know tens of thousands of words. Generally speaking, then, a vocabulary of about 3000 words (not counting inflections, plurals, etc.) would be the number necessary to efficiently learn from context with unsimplified text.

Do the Math

We have seen that the Oxford English Dictionary contains 171,476 words in current use, whereas a vocabulary of just 3000 words provides coverage for around 95% of common texts. If you do the math, that’s 1.75% of the total number of words in use! That’s right, by knowing 1.75% of the English dictionary, you’ll be able to understand 95% of what you read. And that’s just 7.5% of the average passive vocabulary of a native speaker (3000 vs. 40,000 words). Isn’t that great news?

Let’s repeat the math for Chinese. The Hanyu Da Cidian contains 370,000 words, whereas 2500 words (1710 characters) are necessary in order to “read Chinese newspapers and magazines and watch Chinese films”, according to the HSK test (level 5). That’s 0.68% of the total number of words contained in the Hanyu Da Cidian! Knowing 5000 words, the minimum number required to pass the highest HSK test (level 6), would mean knowing 1.35% of the total number of words contained in the Hanyu Da Cidian.
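If you’d like to check the arithmetic yourself, here is a short Python sketch that reproduces the percentages for English and Chinese. The helper name `share` and the labels are illustrative assumptions; the word counts are the ones quoted in this article.

```python
def share(known: int, total: int) -> float:
    """Percentage of `total` represented by `known`."""
    return 100 * known / total


# (label, words known, total words) using the figures quoted in the article.
figures = [
    ("3,000 words vs. OED words in current use", 3000, 171_476),
    ("3,000 words vs. native passive vocabulary", 3000, 40_000),
    ("HSK level 5 (2,500 words) vs. Hanyu Da Cidian", 2500, 370_000),
    ("HSK level 6 (5,000 words) vs. Hanyu Da Cidian", 5000, 370_000),
]

for label, known, total in figures:
    print(f"{label}: {share(known, total):.2f}%")
```

Run as written, the four lines come out at 1.75%, 7.50%, 0.68% and 1.35%, matching the figures quoted in the text.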

Pareto’s Law and Language Learning

Italian Economist Vilfredo Pareto

We will end this already lengthy article by once more taking a look at Pareto’s Law, also known as the 80-20 rule. In case you’ve forgotten, the law states that for many events, roughly 80% of the effects come from 20% of the causes. In other words, in the context of work or study, 20% of the effort brings in 80% of the results.

If we drop the unrealistic figures of the number of words in the largest dictionaries out there, and instead count the number of words an average educated native speaker knows, which is around 30 to 40 thousand for many languages, we find that Pareto’s Law works on steroids! In many cases, knowing just 5-7% of the total number of words that a native speaker knows will allow you to understand anywhere from 90 to 95% of the vocabulary found in common texts! That’s right, 5 to 7% of the effort brings you 95% of the results. That is great news for you, my friend.

So yes, languages contain fabulous numbers of words, and for many, learning a foreign language seems like an insurmountable barrier, something that takes dozens of years to accomplish. But the fact is, by learning words in context from the very beginning (I highly recommend the Assimil method), and by gradually building your vocabulary to around 2500-3000 words, it is possible to reach quite rapidly a level at which you will be able to read common texts in the language and understand anywhere from 90 to 95% of them. This is essentially the “golden” number, since this level of understanding is enough to keep reading in the language from being a frustrating experience. More importantly, though, this is roughly the number of words necessary before you’ll be able to efficiently learn from context.
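To put those coverage percentages in more concrete terms, the small Python sketch below converts a coverage level into how often, on average, you would run into a word you don’t know. The function name `words_between_unknowns` is an illustrative assumption, and the calculation treats unknown words as evenly spread through the text, which real texts only approximate.

```python
def words_between_unknowns(coverage: float) -> float:
    """Average number of running words per unknown word at a given coverage (0 to 1)."""
    if not 0 <= coverage < 1:
        raise ValueError("coverage must be at least 0 and below 1")
    return 1 / (1 - coverage)


for coverage in (0.90, 0.95):
    gap = words_between_unknowns(coverage)
    print(f"{coverage:.0%} coverage: about 1 unknown word in every {gap:.0f}")
```

In other words, 90% coverage means roughly one unknown word in every ten, and 95% coverage roughly one in every twenty.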


81 thoughts on “How many words do I need to know? The 95/5 rule in language learning, Part 2/2”

  1. That was interesting … But the numbers got me dizzy :s … I won’t learn a language using the calculator, it’s just less funny :p I quote as a conclusion : << … if I had to give a short answer to this question, I’d say “Who cares?” Each and every language is amazingly rich and interesting in its own way. Each language has its own genius and its own personality. << .

  2. First, congratulations for the high quality you put into your homepage articles.
    There are some issues that interest me particularly.

    Interesting to know that the Chinese Proficiency Test requires mastering around 5000 words.
    I read that a C2 certification assumes a working knowledge of 5000 words.
    I would like to understand whether 5000 words is meant to be the sum of active and passive words!

    You suggest acquiring a knowledge of 3000 words in order to understand 95% of texts written for adults. That seems far from the assessment offered in this video, which states:
    “Written comprehension only begins at 8000-9000” (words)
    http://www.youtube.com/watch?v=JbYMZZISPrU
    What do you think about it?

    1. Thanks for the kind words Red!

      According to the China Education Center, the vocabulary necessary to pass the highest HSK test is 5000 words. I have no doubt that they refer to a passive vocabulary (which, by the way, de facto includes your active vocab), since it’s mostly a reading and writing test (which, in contrast with speaking, does not only test active vocab).

      As for Dr. Arguelles’ video, I think he is right in what he says. I do not have a Ph.D in linguistics, and Dr. Arguelles is far more knowledgeable than I am. However, from my personal experience, I have found that in many cases the knowledge of words required for understanding a text well enough to start acquiring new vocabulary from context is approximately 95%.

      I should specify that this can of course depend on the language you are learning. If you are learning a language that is of the same language family as your native tongue, 95% understanding will be, in my opinion, enough (roughly 3000+ words). However, I have found that with a language such as Korean (or Japanese, for ex.), which is as far from English as any language can be, 95% comprehension is not always enough because grammatical complexity, coupled with totally different sounding words and turns of phrases, can hinder comprehension significantly. In this case, probably a comprehension around what Dr. Arguelles calls for (98%), will be necessary. This means that indeed 3000 words will not be enough. And again, from my personal experience, I have found that a vocabulary of 3000 words has not been enough to understand the news in Korean. Somewhere around 8000 words would probably be right.

      Hope this answers your question!

      1. I watched Korean sitcoms with NO problem with less than a 100 word vocabulary …

        News in Japan with only a two word vocabulary, and I understood more than my Expat friend who was completely conversant.

        Oh well.

        Wayne, Luvsiesous

  3. To say Dutch has more words than English is complete bullshit–absolutely preposterous. The author is simply being intellectually dishonest. Dutch academia, Dutch media, Dutch publishing etc., is microscopic compared to its UK and US counterparts. The number of Dutch speakers and writers is a tiny fraction of the number of English speakers and writers. We can simply and logically INFER that there should be more English words. And there are.

    1. Hi Daniel.

      You know, there is something commonly called “being civilized,” and that usually entails not insulting other people for their work, especially if you have no clue about what you are talking about.

      I’m not being intellectually dishonest at all. I have clearly stated in this article that what the numbers reflect is simply the number of words in a given dictionary. Go look at the sources I included if you are unsure of the figures. The Oxford Dictionary has a surprisingly small number of words compared to some of the other dictionaries, but as I stated very clearly this doesn’t mean that English has “less words,” simply that the publisher likely decided to leave out some definitions/inflections/loanwords/etc. It has been widely reported that English probably contains the most words in the world (over a million according to the source I quoted in this article), whatever this may mean.

      A quick Wikipedia search, by the way, will tell you that “[The Dictionary of the Dutch language] is said to be the largest monolingual dictionary in the world with over 430,000 entries of Dutch words from 1500 to 1921 and the paper edition consists of 43 volumes and close to 50,000 pages. The dictionary was almost 150 years in the making: the first fascicle (A-Aanhaling) was published in 1863 and the last (Zuid-Zythum) in 1998. Three supplements to the original dictionary text containing modern-day Dutch words were published in 2001.”

      1. Thanks for the accurate explanation. I am in the process of learning Dutch and Afrikaans for the simple ability to read newspapers. I looked up your article and found it very useful. Spanish is where I work to continue my growing application of a second language beyond my native language of English. I have a good friend who assists my application of verbalizing Spanish. Thinking quickly enough in conversation has been my problem but reading is a joy. I understand a bit over 1000 words, closing in on 1,500, which means, appropriately, I understand roughly 90%+ of what I read. Being a person of 57 years of age, it is hard work for me. I wish to learn two more languages (French and Portuguese) then immerse myself in my new enlightened linguistic education. Of course, in a manner of speaking, I am already immersed, just not at the depth I desire.

        One final note: rough language, in my mind, is not an appropriate use of criticism in any language. Failure to abide by proper etiquette suggests deficiencies from the unsolicited commenting party.

        Again, thank you for your work, Mr. Gendreau. I appreciate your published effort.

    2. Gryphon Flight

      I am afraid that inferences are not always correct, especially when they are not done carefully. Just because more things are written in English, that does not mean English has more words. It simply means that more people write in English. Would you say that a five-year-old child knows more than an adult just because the child speaks more? Not likely.
      Also, the writer of this article even said that the list basically just stated which languages had the longest dictionaries, which is not the same as having the largest vocabulary. It was very clearly stated and so if you did not get that, then perhaps you should reread the article. English does have a large vocabulary, in my and many others’ opinions; however, some of that vocabulary is dialectal, slang, technical, obsolete etc. and so would not make its way into the dictionary. Also, just because a word is in the dictionary, that does not necessarily mean that it is used that often, depending on the dictionary. Different dictionaries might include obsolete and technical terms or slang while others would not, so I think it is hard to judge them completely accurately.
      For your information, intellectually dishonest is used to describe someone who is purposely publishing something false, and even if the article was wrong, which I seriously doubt, it would only be intellectual dishonesty if he was trying to mislead us and not if he was just inaccurate. As I doubt that the writer is trying to mislead anyone and states the intrinsic issues with the data, I am forced to conclude that you are, yet again, false.
      Kudos to the author, by the way. I find this article very interesting and enlightening.

    3. Ксения Синяговская

      So do you think the more speakers the language has the more important it is? ) Well, THIS is a totally bullshit ) I’ll give maybe a strange example which isn’t connecting with languages. Finland has a 5,5 million people’s population and Russia more than 250 million (I don’t remember) but Finland has much more metal bands than Russia. So what’s your point?

  4. Thank you for your interesting article. Your statement that knowing 3,000 words is enough to figure out things from context needs qualification. First, they must be the right words. For example, a native English speaker can quickly learn thousands of words in Spanish by knowing a few rules that can produce about 20,000 cognates (based on NTC’s Dictionary of Spanish Cognates Thematically Organized). But if this were all the person knew, s/he would be missing most of the 100 most common words (all but en, no, haber?, por?, todo?, o?, otro??, me, mucho, mi?, año??, primero?, pasar, día?, hombre, parte, menos?, nuevo?, contrar?)*. Yes, s/he would often have an idea of what the text is about, but not what it’s getting at.

  5. Continued:

    But beyond vocabulary, there are the issues of grammar, collocations, idioms, and cultural conditions. I know over 2,000 Korean words (mostly in the “most-common” range) not counting loan words from English, but there is no way I can understand more than a very few even simple, every-day texts other than ones written for FL learners.

    1. Yes, in general this rule applies most aptly to languages that are not too distant. Of course, given that Korean is one of the hardest languages in the world for native English speakers to learn, it makes sense that 3000 words might not be enough (especially given the grammatical complexity of the language and the fact that most pure Korean words have a Sino-Korean equivalent, thus doubling the count).

      From my own experience learning Korean, I would give a very rough ballpark estimate of a knowledge of at least 5000 words in order to start acquiring vocab from context. But in most languages (especially European or Germanic languages, if you’re an English speaker), I think the rule I outlined in the article is quite appropriate.

      Thanks for commenting!

  6. Had to comment! Thank you for taking the time to post your research for us! It’s highly insightful and is a great confirmation of my beliefs. Couple that with the fact that, with a full day’s work, you could probably learn 40-50 words. That’s what, say roughly 4 weeks in a month: 50 × 7 = 350 a week, 350 × 4 = 1,400 a month, 1,400 × 3 = 4,200 in three months.
    It’s wishful thinking but if you could actually keep that up you’d be near enough fluent by the end of the year after the initial three months of hard core studying to allow more practice and time for your brain to strengthen the pathways of the words learnt. With the study method of divide and conquer (easier to learn 5 at a time than 50 in one go) and anki SRS(?)…man just getting pumped by the potential lol. Thanks again…it’s 2am as I’m posting this..must sound off my head haha

    1. Thanks for your comment Luther! Yes, theoretically it’s possible to learn 40-50 words in a single day, but that’s a tremendous amount of work and it requires the use of very good memory techniques (I’ve written a series of posts on this topic in early 2013). People who start like this usually are the first to quit because they burn out and lose motivation. It’s always useful to remind oneself of the “tortoise and the hare” story. Finding the right middle is, in my opinion, more important, and in the end consistency (brought by passion and motivation) always wins.

      Let me know how your learning goes! Which language(s) are you learning by the way?

      1. Hey there sorry for the late reply,
        have a snack as this is a long reply and I’m curious about your thoughts.
        I agree once more, it is usually those that start on a high workload that burn themselves out and give in. I am currently learning Japanese. It started as a side module in university, and after graduating my skill went down as I wasn’t practising consistently. Long story short, I’m back in the game with Anki helping me with memorizing vocab.
        My focus this time is memorizing kanji words, not singular kanji meanings, so that I will be able to read and get my practice via that, as I rarely have the opportunity to speak. I’ve already completed minna no nihongo book 1 and know pretty much all the kanji words in there, ( roughly a 1000 give or take). Due to me working weekends only I have the time to dedicate my weekdays to whatever i choose, this month I’m dedicating it to learning all the vocabulary in minna book 2, 25 lessons hence the learn 40-50 words a day belief, although in truth the max word count in each lesson varies, but it is pretty high.
        grammar is important too but im not looking for a speaking workability, just reading, not even writing.
        my system of learning is to learn 5 words a time, run my eye up and down the list until I know them, I also write them down just enough times that I can write it from memory fluid, this is short term though and it has the issue that I am learning the order and not the actual kanji. the real aim is recognition. after I learn the vocab for the lesson I open up Anki and run through my new cards as it randomises the order meaning I am learning the kanji word more effectively.

        I will learn to speak once I’m done, and writing…well I’ll deal with that when the need arises but with computers and phones changing the hiragana into kanji..you get my drift 🙂

        I will let you know how I get on by the end of this month.

        I’ve bookmarked your site as i’m sure there is a goldmine of info here 😀

      2. “People who start like this usually are the first to quit because they burn out and lose motivation.”
        Or they could just quit learning as many words a day and cut back. You don’t have to stop entirely, it’s not all or nothing.

  7. Use word frequency dictionaries to build your vocabulary.

    The Routledge publisher makes some nice ones which go up to 5000 words, and they can be found on Wikipedia for free if you search “wiki word frequency”.

    I’ve used the frequency word method for various languages, including non-Romance languages, and it’s amazing. For English speakers studying Romance languages, reading becomes much easier at around 1000 words.

    I would like to tackle Korean or Russian using the frequency dictionary method—just to try it out on a language which I am completely unfamiliar with/haven’t studied previously.

    1. Routledge is good. Erwin Tschirner is the author of a bunch of books built around a frequency list, too (I’ve been using the Russian one but others exist, published by Lextra/Cornelsen).

      The limit of that approach is that those methods usually do NOT contain the pronunciation (be it audio or IPA.) This is a big deal because learning from text is a recipe for disaster for one’s oral communication.

  8. I think that the creators of the HSK are just wrong when they assert that 2500 words is enough to “read Chinese newspapers and magazines and watch Chinese films.” I know 10,000 words (including all the ones on the HSK tests through level 5!) and I still find myself with significant vocabulary problems in reading almost all authentic Chinese content. Not just an unknown word here or there — it was hard for me until very recently to get any meaning from the text.

    1. In my article I referred to having to know about 3000 characters, which is different from 3000 words. Do you know the approximate number of characters that you can recognize, by any chance? Because so many Chinese words are composed of multiple characters that in themselves are words, knowledge of 3000 characters would mean many more words than that.

    2. i am a Chinese, and you even count how many words you learned lol you knew 10,000 OK people who is Chinese do not there is limit of words for Chinese language. Chinese have a long histories which for some high educated Chinese used without Chinese grammar, but now Chinese can speak Chinese without grammar. sometime if you translate to English and it won’t make any sense. that is how Chinese is better than other language.

  9. Maybe I have been learning less practical words, but I have about 3,000 notes in Anki and already had a decent sized vocabulary in my head prior (I used anki years ago, but somehow lost all my old decks.) I’m studying Japanese, and some days I read something and I feel like I really get it, then other days I am just reading a kid manga or a candy wrapper and I understand 50% at best. I just can’t believe that 7% of 40,000 words will give me 90-95% comprehension, cuz it hasn’t.

    1. Hannah, from my own experience, languages such as Korean and Japanese are much more complex than other languages and require a higher number of words before you can get 90-95% comprehension. So this would be totally normal. However, making an effort to learn a lot of common suffixes and prefixes, and to understand how many words are rooted in Chinese, and in which way, can help a lot. That being said, Korean and Japanese are reportedly some of the hardest languages to learn for native English speakers, so what you’re telling me doesn’t surprise me much. Just keep going and never give up!

      Sam

      1. What you say makes sense, and when I first came across this article a couple of months ago I did not think it would be possible for me to understand Japanese with that few of words, but I had hope! But now a few months have passed, I have the cards, and it has not happened. So I came to complain… I am mostly just disappointed in myself because if I had been consistently studying from when I started Japanese I would have well over 8,000 cards in anki based on the rate I’ve been going. It’s just been bumming me out. However, I don’t think just sticking words in anki makes me “learn” them either, it gives me only familiarity at best. But it’s something for me to cling to.

        1. Yes, I understand. In any case, I would advise against relying on Anki too much to acquire vocabulary. I think in the end the best thing is to really get exposed to a large amount of material in the native language, and through repetition and exposure you will more naturally acquire the language. It can also help to use memory techniques such as mnemonics to remember some words that just won’t stick.

          Don’t forget to remember why you’re learning Japanese and all the good that it has brought to your life, and don’t forget to make language learning fun! Then you won’t have any problems to keep going strong 🙂

  10. Excellent article. I was already familiar with these statistics, having previously encountered them in a few language books I stumbled upon. Great stats, and great news for language students all over the world, certainly. I take issue, however, with the claim that 3,000 words are “enough not to make reading the language a frustrating experience.” Without a doubt, understanding 95% of what you usually read is a remarkable achievement, and 3,000 words are a relatively easy goal. However, that still means that on average you will find one word you don’t know for every twenty words you read. And that, in my experience as a proficient speaker of English as a second language, can be very frustrating indeed! The level of strain at which one experiences frustration varies with each individual, but it was only a long time after I had acquired that core vocabulary of roughly 3,000 words that I found myself able at last to tackle novels and most books without feeling frustrated.

  11. Hi lingholic,

    Thanks for an interesting article, though I’m left wondering why you didn’t include Arabic amongst the languages whose words you counted, especially considering how widespread a language it is. Would have loved to see a word count for it, if you can.

    Thanks,
    Hashim

    1. Hi Hashim,

      Very good point. It’s been a long time since I wrote this article, but if my memory serves me well, I hadn’t been able to find reliable statistics on Arabic. I couldn’t find what was the largest dictionary in the language. If you have any clues, let me know!

      Thanks,

      Sam

      1. I’ve been searching this article and its comments to find just the right (English) words to express my incredulity at the audaciousness that it took to intimate that this was some kind of authoritative exposition on language. I thought I saw it right away when you stated ‘really not that objective’. In your defense, your preface has all of these disclaimers, but then you presume to draw conclusions from your ‘11 most spoken languages’. Finally, here you admit your lack of reliable statistics. But it’s too late! You’ve left out Arabic from the ‘11 most’. This alone would leave your entire analysis flawed.
        Here’s a couple of glaring omissions.
        You mention the time it took for several dictionaries to be compiled but fail to mention that the Oxford First edition took 65 years. In your table each dictionary mentioned lists ‘Words’ , yet under English you qualify ‘in current use’. Why the particular distinction? Because nowhere else do you make that distinction. Are a half a Million Korean words in current use? And ‘12,000 old sayings’ ? The Oxford Dictionary itself says as of 2005 there are 300,000 entries . Five different Korean words for each English color? Red? There are literally dozens of English words for red.
        I’m sure I might find other articles you have posted as interesting and diverting, but please don’t characterize your ‘findings’ as ‘research’.

      2. Dwane Blundell

        I enjoyed and appreciated your article. I am going to learn Mandarin pinyin. I will admit I probably won’t read your comments section anymore. Stupid people read your article and ignore statement after statement, then proceed to complain because they feel left out or insulted, when the reality is probably that they have big egos, plans for dominating the world by their cowardly hearts, and greedy demands for their rights over everyone else.

    2. Dwane Blundell

      How many words are there in some of the world’s major languages? As I
      stated in Part 1 of this article, there really is no way to answer
      this question. Languages are evolving and continuously changing, and
      subject to people’s own creativity and imagination. After all, it is
      said that Shakespeare himself invented 1,700 new words! The most “objective” measure that we have available for counting the
      number of words contained in a given language, then, is to calculate the
      number of words contained in its largest dictionary (really, it’s not
      that objective, but it’s the only measure we have access to!). I thus
      began to research answers to this question in regards to some of the
      world’s major languages, but quite surprisingly, I couldn’t find any
      resource on the net actually listing languages and their associated
      number of words based on dictionary word count. So after having scourged
      the net for scattered answers, I’d love to share with you my findings.

  12. Great article. However, 3000 words, at an average (hobby style) pace of 1 word/day if we account for spaced repetition and foreign characters, still accounts for 8.2 years. If we do it seriously we might get into 10 words/day, accounting for almost 1 year.
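The arithmetic in this comment can be checked with a few lines of Python. This is only a rough sketch: the function name `years_to_learn` is made up for illustration, and it assumes a constant daily pace with no forgetting and no days off.

```python
def years_to_learn(target_words: int, words_per_day: float) -> float:
    """Years needed to reach target_words at a constant daily pace (365-day year)."""
    return target_words / words_per_day / 365


# The two paces mentioned in the comment: hobby (1/day) and serious (10/day).
for pace in (1, 10):
    print(f"{pace} word(s)/day: {years_to_learn(3000, pace):.1f} years")
```

At 1 word per day this gives about 8.2 years; at 10 words per day it gives about 0.8 years (300 days), which matches the "almost 1 year" estimate.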

  13. A great article, and something that is after my own heart!

    After moving abroad with work and needing to become comfortable in my situation as quickly as possibly, I also realised that learning the most high priority words is absolutely crucial to making quick, efficient progress. This has really helped me at work to feel much more comfortable in meetings, reading documents, etc.

    I have written a tool that identifies to people which words are the highest priority for them specifically to learn, based on their own unique situation. Hopefully this can help people in the same situation as me to feel comfortable in their own unique work or social lives as quickly as possible. It is called Box Of Words. http://www.boxofwords.com/?ln=w .

  14. Well, Arabic is the biggest and largest language on earth in terms of the number of words, there are more than 12 to 13 million words in Arabic excluding those recently discovered such as cars, planes , computers , …etc. Why is Arabic ignored intentionally , where other less important languages such as German , Korean, Japanese are mentioned !! I feel that this article is biased and unfair.

    1. Hi Ahmed. I would love if you could point out an authoritative source that states that Arabic’s largest dictionary has 12 to 13 million words. I’d be happy to add that to the list in this article. Thanks for your contribution!

      1. Ahmed Al-Mahdawy

        Hi, I am arabic native and i believe arabic is the most hardest language in the world but people who have not even studied arabic say chinese is the hardest, I want to ask them Have you learned? they will answer What is arabic anyway? no-one even know it?
        anyway about your question even you didn’t ask me, but there is no source i can give actually now arabic is not spoken in real as other languages, natives speak it in dialects so you can never have source but verbs in arabic have 13 ways in english for example there is for the verb write “wrote”
        in arabic “kataba-katabat-katabaa-katabataa-katabna-katabuu-katabtu-katabnaa-katabta-katabti-katabtumaa-katabtum-katabtunna” and every one gives another meaning. so how do you think about all the languages words and arabic have dual which english does not have kitaab = one book
        kitaabaan = two books but alas people even do not know and judge with no knowledge.
        sorry for my english

        1. I mean, how on earth would you know whether your own mother tongue is a hard language? That is illogical and very stupid talk.

          And what does its being the language of the Quran have to do with the subject? You accuse the author of taking the matter personally, while you yourself are taking it more than personally, as if you were defending your family.

          Besides, it depends on the language the learner already speaks. For example, English speakers find Arabic very hard, while Persian speakers find it easy because it is close to their language, just as English is hard for us but easy for a Swede.

        2. Ahmed, the very fact that each of those definitions are pretty much identical in structure and only the end is changing makes it easier to learn.

          When you’re comparing to chinese and japanese, the exact same symbols used in writing can have completely different meanings based off context.

          Stating
          That is an apple.
          They’re apples.
          It is an apple
          Is all written exactly the same: りんごです。

          And to make it even worse, there are no spaces! You simply have to guess based on context where a word starts or ends.

          And there are many, many words that are pronounced the same way except that one vowel is longer than in the other word, which gives a totally new meaning. Japanese doesn’t even have plural words; that’s why it’s exceptionally difficult for someone to learn the language on both sides.

          1. It doesn’t always change at the end. Many forms change at the beginning, for example “baqara (cow)-abqar (3 or more cows)”; some change in the middle as well, like “baab (door)-abwaab (3 or more doors)”; and many change the whole word, like “imra’a (woman)-nisa’ (women)”. As for words pronounced the same way except that one vowel is different or longer, we have zillions of those. One word in Arabic can be pronounced the same way and have many meanings, or be pronounced differently and keep the same meaning. It is not only that one vowel might change or get longer; any vowel can change, since in Arabic the short vowels are cut and you do not write them. For example, the word “علم” is written like alm, but how do you pronounce it? If you pronounce it “alam” you’ll mean flag, but if you pronounce it “ilm” you’ll mean science, and if you pronounce it “ilma-ilmi-ilmu-etc…” you might mean science as well, or you might mean other things, depending on its place in a sentence or on how you say it in isolation, and grammar plays a role here. If you pronounce it “alama” you’ll mean “he knew”; if you pronounce it “allama” you’ll mean “he taught”; and the meaning might change depending on its place in the sentence, like night and knight in English, where both are pronounced nīt. That is before we even consider how the natives pronounce it, with all its meanings, in their endless dialects all over the Arab world. The word I gave is just an example; the list is endless. I grant you that when it comes to writing, Chinese and Japanese are harder than Arabic. Arabic after all is an alphabet: just some letters and you form the word. But since it’s an “Abjad” writing system, short vowels are cut and, like I said, one word might have MANY meanings and pronunciations, which makes it harder to learn and grasp. Even we Arabs often find it hard to grasp and pronounce the word wrongly (its place in a sentence, like I said, changes the meaning), while Chinese and Japanese use symbols. But when it comes to the language itself, not a single other language out there is harder than Arabic… or more eloquent than Arabic.

          2. And I forgot to mention this when I said it doesn’t always change at the end but also at the beginning and in the middle: when it comes to verbs in Arabic there are Active (past/present/imperative) and Passive (past/present) forms. He only gave the active past ones, and he didn’t give the full list; there is for example “takaataba” (which changes at the beginning and in the middle as well). Here are other examples for other verb forms, though these stand for “write” rather than “wrote”: “yaktub-yaktubu-yaktubun-yaktubn-yukaatib-yukaatibu-yukaatibun-yukaatibn-taktub-taktubu (also all the rest of what I wrote with Y, but change it to T)-aktubu-okaatibu-naktubu…etc”. Honestly (probably what he said and what I said is not even half of it xD), the list is really long xD

          3. Linguists have studied languages for decades and virtually all of them agree that if you are in a romance language, learning an eastern language such as Chinese or Japanese is harder than Arabic.

            That doesn’t mean Arabic isn’t hard to learn, it still takes roughly 2000+ class hours to obtain fluency. But in regards to Japanese, you literally need to memorize over 2000 symbols just to be proficient to a 9th grade level. Even if you memorized the symbols, you still have to learn what the words actually mean.

            Even if you say that not all words follow a pattern, based off your examples a lot of those words appear to have a pattern. Languages are a lot easier to learn if there is a pattern involved, adolescents are a lot better at picking up these patterns than adults.

            Also, someone in Asia for example would not need nearly as much time to learn Arabic as they already have a symbolic based language, this is specifically rating how hard it is for English speakers to learn a language, not what is the most difficult language in the world is.

          4. Put aside their writing systems or let’s say Arabic, Chinese, Japanese are written with the Latin alphabet then you’ll get what I meant by saying that Arabic is way more harder than those 2, of course those symbols aren’t easy to learn, wouldn’t be easy for me as well as an Arabic native speaker to learn them, I know they’re way harder to learn than any alphabet or any other writing system, but that was not what I meant.

    2. Dwane Blundell

      Very good point. It’s been a long time since I wrote this article, but
      if my memory serves me well, I hadn’t been able to find reliable
      statistics on Arabic. I couldn’t find what was the largest dictionary in
      the language. If you have any clues, let me know. Sounds like the opportunity to step up and prove a point was offered, but it just isn’t important enough to prove. Awwwh that is why, I am dealing with canadian indians this is this the responsibility of others to prove 12 to 13 million words, or wait ……this just in….. has an unknown number between 90 and 500 million words according to
      different sources due to various reasons such as diacritics system.

    3. Ксения Синяговская

      Do you think that language is the most important because of numerous speakers? And the languages with less of speakers are not so important? That’s totally gibberish! The importance of language doesn’t measure in native speakers’ number! I love rare languages like Moksha, Northern Sami and so on and for me they are even more important!

  16. I don’t believe this , 2500 to 3000 is childish !
    I personally know about 4400 words (I am sure for 3000, because I wrote them in an application called Anki, they are not families, or similar words, completely different words like “rib” and “flicker”, and additionaly I choose %99 of my input from English learners books and resources like Longman or Oxford and …) and still am an idiot in reading and understanding websites and books and watching movies or TV shows and etc. And I can tell you with a guarantee that more than %50 of what I face and deal with is unfamiliar for me, except websites that has a simple writing like your site, or books for intermediate english learners that I can understand most of them(about %80 to %90). Forget about American movies or newspapers or popular teen novels, even for 10 to 12 years old !
    Tell this to someone who doesn’t know anything about English. by knowing 3000 words you just suck a sweet blue sucker with little 2 year old kids in nursery school and frequently say “hi” or “how are you” and point to the balls and say their color, nothing else.

    1. Hi Ali. Your English seems to be pretty good for somebody who knows 4400 words, so I guess you have pretty much confirmed the results of the analysis contained in this article, which states that knowledge of 2500 to 3000 words is enough to get by in everyday life and start learning new words from context. Obviously, the number will differ from language to language and the analysis is not scientific, but if you could easily comprehend this article, you definitely seem to be on the right track.

      Of course, knowledge of 3000 words won’t be enough to magically start understanding movies (reading and listening skills are two different skills) or novels that use a lot of descriptive vocabulary. But at least, you have reached a point where, arguably, you could begin learning new vocabulary from context, which I would recommend you do by working with material that’s slightly more challenging than your current abilities would normally call for, but that still closely matches your current level of language skills.

      1. Hi (?),
        Thanks for your comment. But I barely can speak. And here is where I doubt 3000 words is enough for everyday life. Maybe you find my writing a little good for an ESL learner, but I won’t’ survive in U.S. for example, by this level of speaking that I have. Anyway, it seems that I’m doing right thing as you said. But I hope there will be a time to stop reading these silly materials wrote for ESL learners. I feel stupid and sometimes moron, when I start reading or listening to these resources( I learned “moron” in a native novel though! 😀 ).
        Thanks again.

      2. Jonatas Cabral

        I find this article to have good insights, but some premises that lead the reader in believing some bullshits that will prevent them to study properly. First of all, while 3.000 words can be enough to understand silly articles and bs conversation. It’s far from knowing a language enough to enjoy it at its potencial. I myself have a vocabulary’s size ranging from 8.000~8.500 words and can’t understand most of the musics i hear. I’m not fluent in the language and have a hard time putting my thoughts into phrases instantly in a way that shows ease while talking. I’m far from really mastering this language, though i’m making progress and am reaping the rewards. If one wants to learn a language, one should give all his sweat and blood to that goal. I agree about trying to understand a word by its context. The way to do this is using a deviation of “differential diagnosis” used in medicine. You try to find the right meaning of the word by eliminating the wrong meanings. BUT, after that process, you MUST check a good dictionary to confirm or deny your proposition.

  17. I modify 2 things in my previous post :
    1-teen novels, even for 10 to 12 years old (*kids) !
    2-by knowing 3000 words you (*can) just suck a sweet blue sucker

  18. Wow. I’m amazed sir. It’s funny how the Arabic language, apart from using that definition, had only one short sentence in your post. Believe it or not, you cannot speak about languages without considering Arabic language which is used in many countries.
    It has an unknown number between 90 and 500 million words according to different sources due to various reasons such as diacritics system. Hope that gives a short answer to your question of the biggest vocabulary language.

    1. Ксения Синяговская

      It’s impossible to cover all the languages in one article. There was no word about Finnish and Ukrainian, so what?

  19. Why you publish this bullshit?

    Looking how many words are in a dictionary cover and then
    producing this table? You don’t know Korean or Japanese so how you know what
    they mean buy a word count? This is so
    wrong!

    1. Hi there,

      Could you possibly elaborate on the logic behind your argument, if we can call it that? In what way is the article misleading or wrong?

      And for your information, I do know about Korean (and to a lesser extent, Japanese).

      Thanks for your interest in Lingholic.

  20. “Who cares?” is not a sensible answer to any question. By definition, the person asking the question cares enough to ask the question. In any case, why isn’t “I’m just interested” sufficient answer to the question on a site devoted to answering interesting questions?

  21. Chinese only use 3000 characters for 99% of their daily life, but in total there is more than 100,000 characters, and there are no Chinese can sure about how many words there are. also this Chinese dictionary is for uneducated people to used. at last those information about Chinese is not right, maybe it is from some kind English website or researcher, but still is wrong. the only right information about Chinese language is come from Chinese researcher not those people who don not deeply understand about the Chinese languages. this article will only make Chinese laugh about it!

  22. Arabic Language Fan

    So you avoided a language like Arabic altogether, I wonder why?
    Just in case someone out there is interested to know the real answer to the question, Arabic has over 90 millions different words (actually estimate is between 90 millions to 500 millions) with over 400 millions using it language as first language and an extension of a billion+ people using it as a spiritual language. Arabic is still oldest living language in the world with little change for the last 2500 years. No one certain when this language has started as it is old than any written history. Again, I am puzzled why such a language has been omitted from the list above (mind you, I am not an Arab)

  23. There has been some confusion around the Chinese math. In fact you are confusing words and characters: a word can contain one character or more, and what you need for HSK 6, for example, is 5,000 characters, which you can use to make a lot more than 1.35% of the Hanyu Da Cidian.

  24. You forgot RUSSIAN.

    1,000 most common words only get you 60% fluency.

    You could learn THREE other languages to 60% fluency just to get started in Russian.

    And then there is the Russian grammar. What a Russian BEAR their grammar is.

    1. Ксения Синяговская

      Yes, I agree with you here. Slavic and Finno-Ugric are very hard because of difficult grammar and that fact that you mentioned, 1,000 most common words only get you 60 % fluency at best…

  25. Ксения Синяговская

    I just want to add a pea to this soup ) As a guy below mentioned, a thousand words of Russian gives you just 60 % fluency at best. The same I can tell about Finnish.

  26. I think Arabic Language easier than Chinese and Japanese Languages to “LEARN” it.
    despite the fact that Arabic language is easy, it has lots of words that you can guess the word formation easily after while of learning.This for grammatical changes. However, it has lots of vocabulary for things, for example, lion has more than 100 different words.this clarify how much vocabularies in Arabic language. In addition, the grammar in Arabic language is huge and its a science that take a years to study it. it is the richest language in terms of vocabularies and grammars quantity.For example, some time you can notice that it is difficult to translate the real meaning of an arabic context to another language even though you have learned the other language and speak it natively. I don’t have any source for my information, but as a native arabic speaker and as people have said, I can say that no language is better than the arabic language in terms of expressing the real meaning to the other person.
