The Oxford English Corpus is such stuff as Oxford Dictionaries are made on. It contains texts collected mainly from innumerable Internet sites. Their total length so far is about two billion words. The texts represent different varieties of English, different genres, styles and registers; they all come from present-day English (from the year 2000 onwards) and are supposed to be representative of the current state of the language. Here are the 100 commonest words in that vast material:
 1 the    26 they    51 when      76 come
 2 be     27 we      52 make      77 its
 3 to     28 say     53 can       78 over
 4 of     29 her     54 like      79 think
 5 and    30 she     55 time      80 also
 6 a      31 or      56 no        81 back
 7 in     32 an      57 just      82 after
 8 that   33 will    58 him       83 use
 9 have   34 my      59 know      84 two
10 I      35 one     60 take      85 how
11 it     36 all     61 people    86 our
12 for    37 would   62 into      87 work
13 not    38 there   63 year      88 first
14 on     39 their   64 your      89 well
15 with   40 what    65 good      90 way
16 he     41 so      66 some      91 even
17 as     42 up      67 could     92 new
18 you    43 out     68 them      93 want
19 do     44 if      69 see       94 because
20 at     45 about   70 other     95 any
21 this   46 who     71 than      96 these
22 but    47 get     72 then      97 give
23 his    48 which   73 now       98 day
24 by     49 go      74 look      99 most
25 from   50 me      75 only     100 us
A few facts are worth noting.
Almost all of these words are “native” in the sense that they continue forms inherited from Old English. Most of them can be traced back further still, to Proto-Germanic, and quite a large number have their roots in Proto-Indo-European, the most distant reconstructible ancestor of English. Only four of them (just, people, use, and the second syllable of because) are Old French loanwords (first attested in 14th-century documents). A few are of Old Norse origin: notably the third-person plural pronouns they, their, and them, but also want and possibly take, while get and give owe at least their initial /ɡ/ to Old Norse influence (the closely related Old English verbs began with the palatal glide /j/). The Old Norse loans were borrowed from the Scandinavian settlers in the Danelaw area, presumably between 800 and 1200. The remaining items (ca. 90% of the list) have “always” been English.
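The "ca. 90%" figure can be checked with a quick tally of the words named above. The snippet below is only a back-of-the-envelope sketch: the grouping into three sets is my own reading of the paragraph, with "take" counted as Norse even though the text calls it only "possibly" so, and "get" and "give" kept apart because only their initial consonant is attributed to Norse influence.

```python
# Words the paragraph names as borrowed or Norse-influenced.
# The set names are illustrative, not an established classification.
old_french = {"just", "people", "use", "because"}
old_norse = {"they", "their", "them", "want", "take"}  # "take" is only "possibly" Norse
norse_influenced = {"get", "give"}  # native verbs with a Norse-shaped initial /g/

LIST_SIZE = 100

borrowed = len(old_french | old_norse)
# Strict count: treat the Norse-influenced verbs as non-native too.
strict_native = LIST_SIZE - borrowed - len(norse_influenced)
# Lenient count: treat them as native, since the verbs themselves are inherited.
lenient_native = LIST_SIZE - borrowed

print(f"native words: between {strict_native} and {lenient_native} out of {LIST_SIZE}")
```

Either way of counting lands within a word or two of 90.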
This illustrates the rule that the more common a word is, the less likely it is to undergo lexical replacement [see: Frequency of word-use predicts rates of lexical evolution throughout Indo-European history]. If we looked instead at the entire lexicon of present-day English, we would find that relatively recent borrowings from foreign languages, most often Latin or French, account for at least 80% or so of the vocabulary. That’s because rarely used words are much more likely to be replaced.
Many of the most common items are not content words that indicate things, ideas, actions, states, etc., but function words that mean little or nothing by themselves. They join content words to modify their meaning, express grammatical relationships, glue the sentence together, and facilitate discourse. They include articles, pronouns, conjunctions, prepositions, simple adverbs, auxiliary and modal verbs, quantifiers, and miscellaneous “particles”. Nearly all the words in the first two columns above are of this kind (the only exceptions being say, get, and go, whose meaning is not particularly specific either). The “Top 100” words are extremely successful replicators: practically every sentence must contain a few of them. Their occurrences make up about 50% of the total material in the Oxford English Corpus!
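For readers who want to try this on a text of their own, here is a minimal Python sketch of how such a coverage figure can be estimated. It rests on several assumptions: the function name `top_n_coverage` and the crude regular-expression tokenizer are mine, and the code counts surface word forms, whereas the list above appears to count lemmas (so that is, was, and been would all fall under be). The numbers it gives are therefore only roughly comparable to the Corpus figure.

```python
import re
from collections import Counter


def top_n_coverage(text: str, n: int) -> float:
    """Share of all word tokens taken up by the n most frequent word forms."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)


# Toy sample; a real test needs a long text, such as a whole book or a news archive.
sample = "the cat sat on the mat and the dog sat on the rug"
print(f"top 3 forms cover {top_n_coverage(sample, 3):.0%} of the sample")
```

On a text of any real length, calling `top_n_coverage(text, 100)` gives a rough feel for how much of the running text a hundred word forms account for.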