
Sunday, March 01, 2009

Top 15K English words

There are plenty of lists of commonly used English words on the Web, but it turns out most of them stop at the 1,000 or 2,000 most-used words in whatever-corpus-was-sampled, and it's surprisingly hard to find a free list of, say, 10K or 20K words sorted by frequency of usage. I did manage to find one, though: a list of 15,000 words at AudienceDialog.net.
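For readers wondering what "sorted by frequency of usage" means mechanically, here is a minimal sketch of building such a list from raw text. It is not AudienceDialog's method. The `top_words` helper is hypothetical, and the tokenization rule (runs of ASCII letters with optional internal apostrophes, folded to lowercase) is an assumption made for illustration.

```python
import re
from collections import Counter


def top_words(text: str, n: int) -> list[tuple[str, int]]:
    """Return the n most frequent words in `text` as (word, count) pairs.

    Hypothetical helper. Assumed tokenization: lowercase the text, then treat
    runs of ASCII letters (with optional internal ASCII apostrophes) as words.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    # most_common sorts by descending count; equal counts keep first-seen order.
    return Counter(tokens).most_common(n)


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog sat too."
    print(top_words(sample, 3))
```

Running this over a large corpus with `n=15000` would give a list of the same general shape, though the exact ranking depends heavily on which corpus is sampled.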

The explanation behind the genesis of the list is interesting:

While writing our page on Global English we discovered that the best vocabulary for students of English to aim at is around 15,000 words. With a vocabulary of that size, you should have a sustainable knowledge of English. That means when you find a word you don't know, you can usually work out [what it means] from the context of the sentence. In a document of average difficulty, there will be less than 2 words in every 100 that you do not know.
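The "fewer than 2 words in every 100" figure is a statement about running text: at most 2% of the word tokens should fall outside your vocabulary. A minimal sketch of measuring that rate for a given text follows. The `unknown_rate` helper is hypothetical, and it assumes a simple lowercase tokenizer; a real study might group inflected forms (e.g. "cat" and "cats") into one entry, which this sketch does not do.

```python
import re


def unknown_rate(text: str, vocabulary: set[str]) -> float:
    """Fraction of word tokens in `text` that are not in `vocabulary`.

    Hypothetical helper. Assumes lowercase vocabulary entries and a simple
    tokenizer; inflected forms count as separate words here.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    if not tokens:
        return 0.0
    unknown = sum(1 for t in tokens if t not in vocabulary)
    return unknown / len(tokens)


if __name__ == "__main__":
    vocab = {"the", "cat", "sat", "on", "mat"}
    rate = unknown_rate("The cat sat on the mat near the door", vocab)
    # The quoted claim amounts to rate < 0.02 for a ~15,000-word vocabulary.
    print(f"{rate:.0%} unknown")
```

In the toy example, 2 of the 9 tokens ("near", "door") are unknown, far above the 2% threshold the quote describes.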

The technical approach behind compiling the list is explained here. It's always interesting to see what kind of methodology someone uses when compiling lists of this sort, because the results are dependent on so many factors. There's no one right way to determine "the most frequently occurring words in the English language" (the whole idea is a bit absurd if you think about it) and in fact no two lists of this kind are ever the same. But that doesn't at all limit the utility of such lists for the purposes for which they're usually used, fortunately.
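To make the point about methodology concrete, here is a small sketch showing how two tokenization choices alone change the counts for the same text. The `count_words` helper and both of its switches are hypothetical illustrations, not choices documented by AudienceDialog.

```python
import re
from collections import Counter


def count_words(text: str, fold_case: bool, keep_apostrophes: bool) -> Counter:
    """Count words under two hypothetical tokenization choices.

    fold_case: treat "It" and "it" as the same word.
    keep_apostrophes: keep "it's" whole instead of splitting it into "it" + "s".
    """
    if fold_case:
        text = text.lower()
    pattern = r"[A-Za-z]+(?:'[A-Za-z]+)*" if keep_apostrophes else r"[A-Za-z]+"
    return Counter(re.findall(pattern, text))


if __name__ == "__main__":
    sample = "It's late. It is what it is. Don't panic, it's fine."
    for fold, keep in [(True, True), (True, False), (False, False)]:
        print(fold, keep, count_words(sample, fold, keep).most_common(3))
```

With case folding and apostrophe splitting, "it" occurs 4 times; keeping contractions whole drops it to 2 and adds "it's"; disabling case folding splits it further into "It" and "it", and a fragment like "s" shows up as a "word". Multiply such choices across a large corpus and no two lists come out the same.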

I wonder if anyone keeps a list of the most frequently occurring syllables?

And by the way: if you want to see a truly great interactive long-tail graphic on this subject, I urge you to check out http://www.wordcount.org/main.php. It's astonishing.
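The long tail in word-frequency rankings is commonly approximated by Zipf's law: a word's frequency falls off roughly as a power of its rank. As a rough empirical fit (the exponent varies by corpus and is not a fixed constant of English), with $f(r)$ the frequency of the word at rank $r$:

```latex
f(r) \approx \frac{C}{r^{s}}, \qquad s \approx 1
\qquad\Longrightarrow\qquad
\log f(r) \approx \log C - s \log r
```

With $s = 1$, the rank-2 word appears about half as often as the rank-1 word and the rank-10 word about a tenth as often, and a log-log plot of frequency against rank is close to a straight line with slope $-s$. That slow decay is why a list has to run to thousands of entries before the remaining words become genuinely rare.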

6 comments:

  1. That was interesting.

    Just wondering if they have a parser somewhere to take a sizable sample of your text and help ascertain your "vocabulary count".

    Thanks for sharing.

    Cheers!
    M.
    http://blog.mindgap.in

  2. Hello!
    Wordcount is cool!

    Thank you.
    Chris
    www.semanticblog.eu

  3. Thank you. I was trying to find a list of decent size and found your blog post :)

  4. Frequency in a corpus and what they teach you (in books or language courses) don't always align.

    Case in point: one of the 1000 most frequent verbs in Italian newspaper text is uccidere ("kill"). In your average learn-a-language book, they tell you how to get to places and how to buy food, but not how to kill or get killed. Maybe they want to avoid the negative connotations, but the truth is that corpus usage is only weakly predictive of your daily needs.

  5. This is an interesting blog. Well, I was looking to learn English words. I hope I will find a quality source.

  6. With the recent reforms and revised pay commission suggestions, youth are more inclined towards govt jobs compared to the IT or other private engineering jobs.

