31 January 2013

Viral Stuff: The Top 100 Words We Use

The Oxford English Corpus is such stuff as Oxford Dictionaries are made on. It contains texts collected mainly from innumerable Internet sites. Their total length so far is about two billion words. The texts represent different varieties of English, different genres, styles and registers; they all come from present-day English (from the year 2000 onwards) and are supposed to be representative of the current state of the language. Here are the 100 commonest words in that vast material:

1     the
2     be
3     to
4     of
5     and
6     a
7     in
8     that
9     have
10    I
11    it
12    for
13    not
14    on
15    with
16    he
17    as
18    you
19    do
20    at
21    this
22    but
23    his
24    by
25    from
26    they
27    we
28    say
29    her
30    she
31    or
32    an
33    will
34    my
35    one
36    all
37    would
38    there
39    their
40    what
41    so
42    up
43    out
44    if
45    about
46    who
47    get
48    which
49    go
50    me
51    when
52    make
53    can
54    like
55    time
56    no
57    just
58    him
59    know
60    take
61    people
62    into
63    year
64    your
65    good
66    some
67    could
68    them
69    see
70    other
71    than
72    then
73    now
74    look
75    only
76    come
77    its
78    over
79    think
80    also
81    back
82    after
83    use
84    two
85    how
86    our
87    work
88    first
89    well
90    way
91    even
92    new
93    want
94    because
95    any
96    these
97    give
98    day
99    most
100   us

A few facts are worth noting.

Almost all these words are “native” in the sense that they continue forms inherited from Old English. Most of them can be traced back in time still further, to Proto-Germanic, and quite a large number have their roots in Proto-Indo-European, the most distant reconstructible ancestor of English. Only four of them (just, people, use, and the second syllable of because) are Old French loanwords (first attested in 14th-century documents). A few are of Old Norse origin: notably, the 3pl. personal pronouns they, their, and them, but also want, and possibly take, while get and give owe at least their initial /ɡ/ to Old Norse influence (the closely related Old English verbs began with the palatal glide /j/). The Old Norse loans were taken from the Scandinavian settlers in the Danelaw area, presumably between 800 and 1200. The remaining items (ca. 90% of the list) have “always” been English.

This illustrates the rule that the more common a word is, the less likely it is to undergo lexical replacement [see: Frequency of word-use predicts rates of lexical evolution throughout Indo-European history]. If we looked instead at the entire lexicon of present-day English, we would find that relatively recent borrowings from foreign languages, most often Latin or French, account for some 80% or more of the vocabulary. That’s because rarely used words are much more likely to be replaced.

Many of the most common items are not content words that indicate things, ideas, actions, states, etc., but function words that mean little or nothing by themselves. They join content words to modify their meaning, express grammatical relationships, glue the sentence together, and facilitate discourse. They include articles, pronouns, conjunctions, prepositions, simple adverbs, auxiliary and modal verbs, quantifiers, and miscellaneous “particles”. Nearly all the words in the top half of the list above (ranks 1 to 50) are of this kind (the only exceptions being say, get, and go, whose meaning is not particularly specific either). The “Top 100” words are extremely successful replicators: practically every sentence must contain a few of them. Their occurrences make up about 50% of the total material in the Oxford English Corpus!
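
For readers who would like to check a coverage figure of this kind on a text of their own, here is a minimal Python sketch. It rests on assumptions that are not part of the Oxford methodology: the function name top_n_coverage is made up for illustration, tokenisation is a crude regular expression, and the count is over surface word forms, whereas the list above counts lemmas (so that "be" stands for is, are, was and so on). On a small sample the result will therefore differ from the corpus figure.

```python
import re
from collections import Counter


def top_n_coverage(text: str, n: int) -> float:
    """Share of all word tokens accounted for by the n most frequent word forms."""
    # Crude tokenisation (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Add up the occurrences of the n most frequent forms.
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)


sample = "the cat sat on the mat and the dog sat on the rug"
# 13 tokens; the three commonest forms (the, sat, on) account for 8 of them.
print(round(top_n_coverage(sample, 3), 2))  # prints 0.62
```

Run on a long text with n set to 100, the same function gives a rough counterpart of the 50% figure, bearing in mind the difference between word forms and lemmas.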

