How many words do I need to know? The 95/5 rule in language learning, Part 1/2

A very common question that people ask when starting the study of a foreign language is “How many words do I need to know in order to be conversationally fluent for everyday talk in X language?” This is a very good question, and one that we will try to answer in part 2 of this post, but first of all, let me ask you this: Have you ever wondered how many words there are in your language? As it turns out, this is the wrong question, since it has no single sensible answer. Why is that? Simply put, it’s impossible to count the number of words in a language, because it’s so hard to decide what actually counts as a word.

For example, it is said that the word “set” in English has 464 definitions in the Oxford English Dictionary. Would we count a word with multiple definitions as one single word, or would we count each definition as an individual word? And what about phrasal verbs, such as “set up,” “set about,” “set apart,” and so on? Or what about so-called open compound words like “hot dog,” “ice cream,” and “real estate”? Lastly, once you add singular and plural forms, different verb conjugations, and all the different endings, prefixes and suffixes, you will quickly understand why counting the words in a language is so difficult.

So the question really should be: Do you know how many words there are in your language’s largest dictionary? I wanted a rough idea of the number of words in some of the world’s major languages, and to compare it with the number of words that make up 90 to 95% of what we read and hear in everyday life and in common news articles, so I spent quite a bit of time searching for answers to this question. And I’m sure you are curious too.

As I said before, many language learners wonder how many words they will have to learn before gaining intermediate or advanced fluency in a given foreign language, and I will answer that question later in this series. After quite a bit of research, I did manage to find the number of words in the largest dictionaries of several of the world’s major languages; you will find these numbers in part 2 of this post. But hey, don’t stop reading here, because I have some other important stuff to discuss!

The Pareto Principle and Language Learning

Italian economist Vilfredo Pareto

So what is the purpose of my “research”? Well, some of you might have heard about the Pareto Principle, also known as the 80-20 rule. If you’d like to learn more about this, I encourage you to check out my post that partly deals with this subject. In a nutshell, though, the Pareto Principle is as follows: after observing numerous phenomena ranging from land ownership to pea pods, Italian engineer, economist and philosopher Vilfredo Federico Damaso Pareto came up with what became known as Pareto’s Law: for many events, roughly 80% of the effects come from 20% of the causes. In other words, in the context of work or study, 20% of the effort brings in 80% of the results.

In the context of language learning, then, I wanted to find out roughly what share of a language’s vocabulary you would have to learn to understand 90 to 95% of the words used in everyday life. Why 90 to 95%? Simply put, this is roughly the level of comprehension needed to understand quite well what is being said in a language. Plus, by understanding this much of the vocabulary, you’ll be able to guess the remaining 5 to 10% of words that you do not know simply through context. The numbers are not exactly the same as the 80-20 rule, as you’ll see in my next post, but the principle is similar: a small fraction of your effort will bring in the biggest results.
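To make the idea concrete, here is a minimal Python sketch of how such a coverage figure could be measured on a piece of text: count how often each word occurs, rank the words from most to least frequent, and see how many of the top-ranked words it takes to account for 95% of all the words in the text. It rests on simplifying assumptions: the function name words_needed_for_coverage is hypothetical, the tokenizer is naive (so “set” and “sets” count as two different words, while all the senses of “set” count as one), and a meaningful estimate would need a large corpus rather than the toy sentence used here.

```python
import re
from collections import Counter


def words_needed_for_coverage(text: str, target: float = 0.95) -> int:
    """Hypothetical helper: how many of the most frequent word types are
    needed so that together they make up at least `target` of all tokens."""
    # Naive tokenization (an assumption): lowercase runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0
    counts = Counter(tokens)
    total = len(tokens)
    covered = 0
    # most_common() yields (word, count) pairs, most frequent first.
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return rank
    # Only reached if target > 1.0: every word type is needed.
    return len(counts)


# Toy example: "the" (3) and "and" (2) alone cover 5 of the 8 tokens.
sample = "the cat and the dog and the bird"
print(words_needed_for_coverage(sample, 0.5))   # 2
print(words_needed_for_coverage(sample, 0.95))  # 5
```

On real texts the answer depends heavily on those choices (what counts as a word, which texts are sampled), which echoes the counting problem described at the start of this post.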

This is very important, because once you have reached a high enough level of understanding in a language, I believe it’s time to drop the dictionary and truly start (or continue at an increasing speed) learning “inductively”, through context and good guesswork. You do that every day in your own language, since nobody knows the meaning of every single word in their language (wait until you see the number of words that the Oxford English Dictionary defines!); in fact, far from it. So why not do the same in a foreign language?

Developing Good Guessing Skills

A few weeks ago I read an article in The Telegraph entitled “Learning a foreign language: five most common mistakes.” It’s a short and rather informative article, so I encourage you to give it a quick read. One of the common mistakes the author lists is “Rigid Thinking”. The excerpt is worth quoting at length:

Linguists have found that students with a low tolerance of ambiguity tend to struggle with language learning.

Language learning involves a lot of uncertainty – students will encounter new vocabulary daily, and for each grammar rule there will be a dialectic exception or irregular verb. Until native-like fluency is achieved, there will always be some level of ambiguity.

The type of learner who sees a new word and reaches for the dictionary instead of guessing the meaning from the context may feel stressed and disoriented in an immersion class. Ultimately, they might quit their language studies out of sheer frustration. It’s a difficult mindset to break, but small exercises can help. Find a song or text in the target language and practice figuring out the gist, even if a few words are unknown.

[bold emphasis mine]

Rigid thinking is in fact extremely common among language learners, and extremely uncommon when it comes to your native language! After all, do you really reach for a dictionary often when reading in your native language? My guess is, not so often, even if, I am sure, you do not know the meaning of several words you come across (especially in novels, where the descriptive vocabulary is very literary and uncommon at times).

Yet good guessing skills are truly important when it comes to acquiring a foreign language, for the simple reason that it’s not possible (and even if it were, it would be highly impractical) to learn every single definition of even a single English word such as “set”. If you can’t learn every definition of a single word, why imagine that you need to learn the definition of every single word you come across?! What happens instead is that you eventually learn words through repeated exposure, in different contexts and at different places. This is called assimilation, and it is your aim when acquiring a foreign language. Check out my post entitled “Memory Tip #4: Learn From Context” for more info on how to use context to also help with memorization.

Let me give you this example sentence: “We put a tremendous amount of effort into finishing this project, and we finally succeeded.” Now, let’s say that you understand everything here except for the word “tremendous”. Chances are you can get a rough idea of its meaning from the context given here. You understand about 93% of this sentence (14 words out of 15), and the remaining 7% can be understood contextually. Keywords include “effort”, “project”, and “finally succeeded”, and through guesswork, it’s not that hard to come up with a meaning similar to what you would find in a dictionary. If you couldn’t guess the meaning of the word “tremendous,” by the way, it simply means “a lot”, “a great amount”.
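The arithmetic behind “14 words out of 15” can be checked with a few lines of Python. This is only a sketch: the function name known_word_coverage is hypothetical, and it assumes that words are separated by spaces and that punctuation at the edges of a word can simply be stripped.

```python
def known_word_coverage(sentence: str, unknown: set[str]) -> float:
    """Hypothetical helper: share of the words in `sentence` not in `unknown`
    (the words in `unknown` are expected in lowercase)."""
    # Split on whitespace and strip surrounding punctuation (a simplification).
    tokens = [w.strip('.,!?;:"').lower() for w in sentence.split()]
    tokens = [w for w in tokens if w]
    if not tokens:
        return 0.0
    known = sum(1 for w in tokens if w not in unknown)
    return known / len(tokens)


sentence = ("We put a tremendous amount of effort into finishing "
            "this project, and we finally succeeded.")
coverage = known_word_coverage(sentence, {"tremendous"})
print(f"{coverage:.1%}")  # 93.3% (14 of 15 words known)
```

With one unknown word in fifteen, the unknown share is 1/15, just under 7%, and that is the gap context has to fill.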

Assimilating the Language

So the point I’m trying to make here is that if you can achieve a 95% understanding of the most common words found in a given language, it will become possible to acquire the remaining unknown words contextually, through a process called assimilation (the Assimil method is built around a similar philosophy). Now, of course, simply knowing words does not equal a perfect understanding of what you hear or read, since grammar, idioms, figures of speech and so on are also involved, and these can be formidable barriers to understanding. You could very well know every single word in a sentence and still not understand what is being said because you are unfamiliar with these aspects of the language. Nevertheless, most of the time, if you know 90 to 95% of the words in a sentence and are given sufficient context, you should have very few problems understanding and communicating in the language, especially if you are learning a language from the same language family as your mother tongue.

So we’ll stop here for today’s post. In part 2, we take a look at the number of words contained in the major dictionaries of some of the most widely spoken languages. We will also try to answer the question “how many words should I ‘learn’ before gaining conversational competency in a given foreign language?”

Click here to read part 2 of this article.

By Lingholic

22 thoughts on “How many words do I need to know? The 95/5 rule in language learning, Part 1/2”

  1. Thanks for your article.
    I have enjoyed reading this article.
    I totally agree with what you wrote.

    However, I sometimes find it hard to guess the meaning of a word I don’t know in a sentence I am reading.
    I think it is better for me to find the key words so that I understand what I am reading.

    As you mentioned, I should not look up words that are not key words, like “tremendous”.
    Anyway, thanks again.
    I want to let you know I have really been enjoying reading your articles since I came to know you are in Korea.

    I am looking forward to part 2.

  2. Hi Sam,
    I loved the article. In fact, it describes what I’ve experienced hhh
    I used to look in the dictionary for every single word I came across. It took me hours to read and understand a short text!! I was trying to “absorb” the definitions … That was overworking and fruitless :<
    Then I gave that up and found a new method … read read read, listen listen listen … eventually, some words/idioms/expressions just stuck in my mind, and at that point I took the dictionary, tried to find the definitions, and matched them with what I understood/guessed from the context … and guess what, it’s always close to the definition given by the dictionary, and it stays in my mind forever; I even begin to use it spontaneously 😉
    I can't wait to read the second part 🙂

    1. Yes, your experience seems to be similar to what many language learners experience. Eventually, you have to give up the dictionary and just read read read, and listen listen listen.

      Thanks for the kind words 🙂

  3. hi Sam!!

    Great article as always 🙂
    It’s fantastic that you show us these things. By the way, I have a question: where can I find the words to learn? From the dictionary? I’ve come to Greece to work and I would like to learn the language 🙂


    1. Hi Alfredo! Usually, you don’t need to “look” for the words to learn. You will simply acquire them through exposure, by reading and listening and speaking as much as you can in the language. In the beginner stages this means working mostly with a textbook and, perhaps, with a tutor. But once you reach an intermediate level you’ll start getting exposed to native content, such as blogs, news stories, books, podcasts, and so on.

      In any case, you can check out the “Wiktionary Frequency List”, which is a wonderful resource to find the most common words in a given language. For Greek, follow this link to access the 5000 most used Greek (Ελληνικά) words based on contents of … However, I would not recommend that anyone learn words from this kind of list. It’s always much better to learn from context!

  4. Good point, but this just proves why languages like Chinese or Japanese that use characters that are unreadable are the hardest languages to learn. You can guess the meaning of the words you don’t know but you won’t really learn a new word if you don’t even learn the pronunciation of it.

    1. Hi Wai.

      Well, of course, this presupposes that the only way to learn a language is through reading. Plenty of people learn languages by listening and speaking them, in which case knowing characters is not a necessity.

      I’d also like to add that technology and the internet have made it considerably easier to read languages such as Chinese or Japanese, because you can simply hover over a word to see its pronunciation. Lots of apps are really useful in this respect.

        That doesn’t really change the fact that in order to be fluent in the language you’re learning, you need to learn the characters and some things by using the ‘hated dictionary’. Even natives *use* it, even if you claim that it may be such an ominous tool to use. I really mean it. They use it, be it by asking their parents for a word’s definition or by learning the definition of a word before a biology test.

        I am not saying that one should only use the dictionary as a way to learn a language. That would be inefficient. However, this ‘perfectionism’ is a good thing. If you don’t know something, go check what it may be. We humans should do that with many things, be it words or facts. I laughed when I saw you write that the above question (how many […]) can’t be answered. It can be answered, even if it may be hard to answer.

        You say that there are different meanings behind words. For example, the word staff can mean the personnel or the stick that magicians use. So these are two different words. I agree with that.
        That doesn’t make ‘estimating’ the number of words needed to speak fluently impossible. We need to create a clever system that would make it possible to answer this question.

        So, each meaning of a word together with the word itself is one unit.
        Staff as personnel is one unit, while staff as a stick is another. So if one knew both meanings, one would know 2 different units.

        However, this doesn’t solve the issue that some words are more or less important than others. For example, the above-mentioned staff as a stick is rarely used and much less important than staff as personnel. So the system has to account for this in order to be reliable, because to speak a language you don’t need to know hard, rarely used words, but the ones used in everyday conversation.

        The solution, again, is to assign each unit its own importance; let’s call it weight. Let’s say the word staff (as personnel) is used 10 times more often than the word staff (as a stick). Then staff (as personnel) should have a weight 10 times bigger than staff (as a stick).

        Naturally, a word like ‘I’ would be more important than staff (personnel), so ‘I’ could weigh 10 times more than staff.

        If we had all these units together with their weights, we could easily take the average set of units (words with their meanings) that an average native speaker uses over, let’s say, a year.

        Then we would sum their weights, and that sum would be the total weight one should know to speak fluently. I should also note that this would be a necessary condition, but not a sufficient one.

        This is just a model that could be applied to almost every language, and it would make it possible to estimate the number of words one would need to know. So, yes, it is possible to answer such a question. And yes, it’s a lot of work to do, because the model I proposed is purely theoretical.

    2. No, it’s not really the case. Chinese is a very powerful and simple-to-use language compared with English and other European languages. Not every new word requires a full understanding of each character, and you can guess the meaning of a word even without context, which is what English can’t do.

      The 90-95% rule doesn’t apply to Chinese: a smart person can grasp a lot of two-, three- or four-character words by simply understanding one or two of the characters in the word. If there’s a rule, I’d say it’s 50%-70% understanding, depending on your ability and talent to understand the language and context.

      And the most powerful feature of Chinese characters is that you can understand an unfamiliar character simply by understanding its structure. A character is split either vertically or horizontally, and each character has an auxiliary component that helps you understand the whole character’s meaning. You might encounter different characters but find them extremely similar, because every group of related characters shares the same auxiliary components. For example, most metal-related characters contain the left-hand component 钅; on its own this means gold or related materials, and any character that contains it is most likely related to metal or metal-like materials. There are tons of characters that contain this component; in practice, most element names in the periodic table carry it. And the pronunciation depends mostly on the right side, the primary component of the character. As you will see as you progress in learning Chinese, a lot of Chinese characters are composed of two or three recognizable characters. These are the source characters, the 50% whose meaning you have to learn; once you master them, lots of new characters and words are unlocked for you as you crawl through context, or just through verbalizations and flashbacks. This is what English can’t do, which is why it needs tons of new words (simple compounds or new creations) to support its vocabulary, whilst Chinese only needs to draw on its already rich pool of characters to form a new word.

  5. If you didn’t interrupt yourself so many times to plug your other articles, I would have finished reading before commenting. Seriously – let the piece speak for itself. If it’s good, readers will naturally look for your others. Just really annoyed me.

    Also. Those arrows on the sides (mobile) are driving me nuts. They are so sensitive, they are easy to hit when scrolling. I really wanted to read your article, but it’s like you’re going out of your way to repel the reader. I see you even plug yourself again at the end of the article! Please. Stop. Ugh. I’m going to go redo my Google “how many words / “learn language” search.

  6. Hey Sam,

    It’s a great article, I liked it a lot. In my view and from my own experience, it’s a waste of energy to always look up new words in the dictionary (sometimes not even the dictionary knows the word). I’ve been doing this for a long time with everything I read. Like you said, sometimes you know every word in a sentence and don’t know the meaning, haha, it’s really true, so I’ll try not to always open the dictionary to look up something I can understand from context.

    Thank you and take care.

    1. Herbertificus

      And for good reason — it IS the best language in the history of the human race. The mishmash of conquests, clashing histories and culturo-linguistic convolutions that created the English language is both a story AND a result that has no equal in all of human history.

      Let us all, on bended knee, praise heaven for the four major historical events that resulted in the creation and ascendance of Englysch . . .

      The invasion of the Angles and the Saxons. (Foundational language.)

      The invasion of the Vikings. (Vocabulary and grammar.)

      The invasion of Bill and his Frenschies. (Vocabulary. ) (Special shoutout to true badass king Harold Godwinson, who unknowingly sacrificed his life to allow the largest single extragenic force in the historical development of Englysch. Though your reign was short, your worthiness was equal to that of Bill, but Cosmic Linguistic Manifest Destiny won the day.)

      The Bubonic Plague. (Re-ascendance of Englysch.)

      The first and predominant settling of America by the Englysch rather than by any other nation. (Creation of the future globally dominant culture, economy, technological and industrial powerhouse . . . and therefore the globally dominant language.)

  7. Herbertificus

    Hay . . . I mean . . . HEY, Lingholic (too many dadgum homonyms in English), have you ever seen the documentary about the development of the English language called “The Adventure of English”? It was written and narrated by Melvyn Bragg, the chancellor of the University of Leeds. First aired around 2001 – 2003. It is fantastic. A definite Must See. I’ve watched it about four times through, and I’ve watched the first four episodes, covering the Anglo-Saxons through the Tudor period, about 10 more times. It really IS just stupendously interesting. My son is sick of hearing me repeat a passage from William of Nassyngton’s “Speculum Vitae,” which is quoted in the doc:

    Latyn can no one speak I trowe
    But those who it from school do know
    And somme know Frensche but not Latyn
    Who are used to court and dwellyn therein
    And somme know Latyn — though just in part
    Whose use of Frensche is . . . less than art
    And somme who understonde Englysch
    Neither Latyn know nor Frensche
    But unlettered or learned, olde or yonge
    Alle understonden the Englysch tongue.

    (circa 1325)

    I don’t know about you, but I laughed hysterically at the line, “less than art.” Isn’t that a trope called “litotes?” Understatement?

    Sorry. I had to pause and laugh hysterically some more. Actually, the “less than art” part is the work of a modern translator, possibly Bragg himself.

    I assume you’ve read “The Mother Tongue,” by Bill Bryson? Fantastically interesting and entertaining.

    Now I’m going to have to read all of your articles. Basically, discovering your website amounts to a homework assignment! There’s just too much to learn in life. Is it possible to make a Faustian deal with the devil to be able to read and learn for twenty years and not have it count against your lifespan?

  8. I’m trying to learn Spanish, so your advice is priceless. I’ll try not to become a copy reader while learning. After I’ve reached a certain level, all bets are off for that. (I rewrite my own work a lot.) “Now, of course simply knowing words does not equal to a perfect understanding of what you listen . . .” I believe instead of “equal” the word should be “equate.”
