22 February 2013

Yesterday’s Words – Today’s Morphemes – Tomorrow’s Segments


The final /θ/ of filth no longer plays any useful morphological function. It has become fused with its derivational base into an indivisible whole. This is quite often the terminal stage in the life-cycles of linguistic replicators.  Old English -þ- was still a morpheme, but it had already lost most of its phonological substance. A few hundred years earlier, in Proto-Germanic, its ancestral form had been *-iþō, continuing still earlier (pre-Germanic) *-étā. A linguistic entity that used to be a suffix of some length has ended up as phonological raw material. It means nothing by itself and has degenerated into a speech sound which, together with three others, encodes a meaning (or rather a cluster of meanings) but is no different, as far as its status is concerned,  from the final /m/ of film.

Whole words may become reduced to the role of ‘bound’ (non-independent) morphological elements. Many derivational affixes used to be words which, through being frequently used in composition, survived in that function while their free-standing variant went extinct. Old English hād meant ‘person, social status’. When added to a noun it meant ‘the state or condition of being an X’. Hence, for example, OE ċild-hād ‘infancy, childhood’. The word hād > hǭd lingered on in Middle English, but seems to have become rare by the thirteenth century and eventually died out as an independent word. Curiously, in modern ‘gangsta’ slang hood (no connection with hood = ‘head covering’) is used as an abbreviation of neighbourhood. It has become a word again, though with a brand new meaning.

Words have fractal-like properties: the more closely you look at them, the more structure they reveal
When such reduction and fusion processes have operated for millennia, they may compact a whole string of morphemes into a short word without any visible internal structure. If you look at young /jʌŋ/ today, it’s short even for an English word. In the reconstructed remote ancestor of English, the Proto-Indo-European language, it looked roughly like this: *hju-hn̥-ḱó-s. The first element, *hju-, was the compositional variant of the noun *hóju ‘vitality, youthful vigour’; the second was a suffix (possibly derived from an independent word) meaning ‘having, loaded with’. Together they formed the noun *hjú-hon- meaning ‘energetic young man’ (literally: ‘having the strength of young age’, cf. Skt. yúvan-). The addition of the suffix *-ḱó- produced an adjective with the meaning ‘like a young man, juvenile’. We find its reflexes for example in Sanskrit (yuvaśá-), Latin (iuvencus), Welsh (ieuanc ~ ifanc), and of course in the Germanic languages (PGmc. *jungaz > OE ġeong ~ iung [juŋg] > young). In other words, the /jʌ/ part of young is what has remained of a once independent noun, and the /ŋ/ represents two concatenated morphemes compressed into a single segment. Incidentally, *hóju is a very interesting item in the Proto-Indo-European lexicon, and I hope to return to it soon. 

17 February 2013

All That Junk: The Afterlife of Broken Morphemes


The lexicon is full of elements which were once entirely functional morphemes but which for various reasons – most often the destructive effects of sound change – have become defective and eventually useless. They still get replicated and passed on from generation to generation simply because we acquire our mother tongue (or learn a foreign language) whole, without testing the functionality of every little detail and without repairing those that might seem to have been broken. Linguistic communication involves a lot of redundancy, so a small local loss is easy to tolerate: other elements will take over the function of the damaged one.

For example, the suffix -th, forming deadjectival nouns in early English, is now practically dead. We are still able to detect its presence in words like length, strength, warmth, breadth, width, etc., but the process that formed these words (in a very distant past) is no longer productive, as opposed, for example, to the formation of nouns in -ness. We can say awesomeness, weirdness, and coolness (and interpret such words correctly even in the unlikely case that we are encountering them for the first time), but not “awesometh”, “weirdth”, or “coolth”. We do not understand today why the vowel of length and strength should be /e/ (rather than /ɒ/, as in long and strong), why we say /wɪdθ/ rather than /waɪdθ/, or why, on the other hand, there is no need to modify the vowel of warm to get warmth. Is youth related to young? The spelling is similar and suggestive of a relationship, but how exactly they might be related is a mystery (unless you happen to be a linguistic expert paid for knowing such things).

In Old English the suffix was still quite productive. Also the process of i-umlaut, responsible for the fronting of the vowel of length, strength, and breadth, had not yet degenerated into an obscure fossil remnant but was still utilised to some extent as a morphological device. Consider OE fūl ‘dirty, polluted’. The suffix -þ, when added to it, caused the vowel to become front (though still pronounced with rounded lips, like German ü or French u): the OE spelling was fȳlþ (phonetically /fyːlθ/). Whence the fronting? If you look at the corresponding Old High German noun, fūlida, the reason becomes clear: the suffix once contained an *i, lost in the pre-literary history of English, but not before it had exerted its assimilatory effect on the preceding syllable. In Old English, the addition of the suffix and the accompanying vowel change functioned together as a complex marker indicating a noun derived from an adjective (note the redundancy of such double marking, given up in the case of warmth).

What happened later? Long /yː/ was regularly shortened when it was followed by a consonant cluster (this is also the reason why we have short vowels in depth, breadth and width), and by the end of Middle English the resulting /y/ had become unrounded, merging with short /i/. The modern outcome is filth /fɪlθ/. Meanwhile, the long vowel of fūl developed regularly, becoming a diphthong in the fifteenth century as a result of the Great Vowel Shift. The outcome is ModE foul /faʊl/. Both words are short, and whatever similarity has remained between them hardly compels one to believe that they must be related. In fact, any fully competent speaker of Modern English asked to form a noun from foul will likely suggest foulness – partly because filth, liberated from its original obligations, has shifted its meaning from simply ‘dirtiness’ to ‘disgusting stuff’. We no longer break a word like filth into meaningful smaller parts. Filth is now unanalysable, and the final -th is not recognised for what it used to be. It has become junk.
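
For readers who like to see such things mechanically, here is a minimal sketch in Python of the derivation just described. It is only a toy: the three regular expressions are my simplified stand-ins for pre-cluster shortening, the unrounding of short /y/, and the one Great Vowel Shift change relevant here (/uː/ to /aʊ/). The names RULES and derive are made up for this illustration, and real sound changes have conditions and intermediate stages that the sketch ignores.

```python
import re

# Toy inventory of vowel symbols used in the transcriptions below.
VOWELS = "iyueoaɪ"

# Simplified sound changes, listed in chronological order (an assumption
# of this sketch; the real changes were gradual and more finely conditioned).
RULES = [
    # A long vowel loses its length mark before two consonants.
    ("pre-cluster shortening", rf"([{VOWELS}])ː(?=[^{VOWELS}ː]{{2}})", r"\1"),
    # Short rounded /y/ is unrounded and merges with short /ɪ/.
    ("unrounding of short y", r"y(?!ː)", "ɪ"),
    # Only the /uː/ part of the Great Vowel Shift is modelled here.
    ("Great Vowel Shift (uː only)", r"uː", "aʊ"),
]

def derive(form):
    """Return the form after each rule, starting with the input itself."""
    stages = [form]
    for _name, pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
        stages.append(form)
    return stages

for old_english in ("fyːlθ", "fuːl"):
    print(" > ".join(derive(old_english)))
```

The same rule list turns /fyːlθ/ into /fɪlθ/ and /fuːl/ into /faʊl/: nothing in the rules knows that the two words are related, and nothing in the output shows it.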

Some of it will be recycled
Junk morphology may be compared to “junk DNA” – the sequences in the genome that have lost their original “meaning”. The genome is littered with the slowly decaying débris of once-functional sequences: former genes (now pseudogenes) damaged by a mutation that has rendered them incapable of coding for a protein, retroviruses that infected the ancestors of modern organisms over tens of millions of years and, after integrating their code with the host’s DNA, lost their ability to break free and go on infecting new cells... and all that junk. It survives because it accumulates faster than purifying selection is able to take effect. In other words, junk is relatively harmless, so it is not worthwhile to remove it quickly. It can even find a use again: occasionally a broken sequence can be co-opted for a novel function. As we shall see, the same is true of junk morphology. But this is something to be continued in future posts.

09 February 2013

The Family Life of One


[continued from here]

Let’s have another look at the Oxford English Corpus list of the most frequent words. As we realise now, one (#35) is not the only descendant of OE ān to be found there. We also find the following:

a (#6), an (#32)

Also the first syllable of only (#75) is none other than the regular reflex of ān. No fewer than four out of the 100 most frequent English word-forms have the same Old English ancestor. Jolly good show!

Actually, another relative of one is hiding in the list. It is any (#95), which goes back to Old English ǣniġ, related to German einig and descended from earlier *ain-iga-. Its ǣ instead of ā is due to the assimilatory influence of the vowel *i in the following syllable (umlaut). Latin has a word with the very same root and apparently the same suffix, ūnicus ‘sole, unique’ < *oiniko-. The meaning is different from that of OE ǣniġ, but related forms in other Germanic languages usually have the same meaning as the Latin one. Old English must therefore have innovated by converting an adjective meaning ‘only, sole’ into an indefinite pronoun (‘one of many, no matter which’).

Of course the Modern English offspring of ān is even more numerous. Suffice it to mention lone, which has resulted from the faulty analysis of alone as ‘a- + lone’ (on the analogy of pairs like live and alive). The actual etymology of alone is in fact ‘all + one’, but speakers can’t be expected to know the historical origin of the words they use. Once lone became a legitimate English adjective, it started producing its own derivatives, such as lonely and lonesome. The list could be made much longer. One easily enters into symbiotic relationships with other words, forming compounds like one-eyed, one-sided, one-way, etc. The structure of these three is fully transparent. Such compounds are a dime a dozen, but if they evolve for a sufficiently long time, interesting things happen to them.  Gradually, over the centuries, their internal structure suffers obscuration as a result of the erosive effect of sound changes. Thus, prehistoric *aina-liban- ‘11’ (literally something like ‘one left over [when you subtract ten]’) became OE endleofan, ændlefen or ellefne, eventually yielding eleven. An etymologist will inform you that the initial e- is a distant cousin of one, but to a speaker of Modern English the numeral is an atomic entity that cannot be broken down into smaller meaningful elements. The same, in all likelihood, would have been true of Old English speakers.

Here is a (partial) family tree of word-forms whose common ancestor is Old English ān. It is contained in a larger tree – that of words derived from Proto-Germanic *aina-. It’s a very successful family, both because it has many members and because some of those members have become indispensable to English speakers (as shown by their extremely high frequency of occurrence).

07 February 2013

The Secret Ways of Weak Forms: Here Comes a New ’Un


[continued from here]

You may find it surprising, but we have quite precise and insightful analyses of English pronunciation, complete with phonetic transcriptions, dating back to the sixteenth century. One exceptionally talented early phonetician, born centuries ahead of his time, was John Hart, the author of An Orthographie (1569), a treatise in which he advocated a spelling system that would reflect the pronunciation of the Tudor period faithfully by using as many symbols as there were contrastive speech-sounds, “and no more”. Hart left us numerous transcriptions of his variety of Early Modern English in the system he had devised. His testimony is highly reliable: we can be pretty confident that he pronounced whole, one and those with the same vowel, and that there was no initial /w/ in his pronunciation of one. Represented with the help of modern IPA symbols, his transcription of one  is /oːn/. Incidentally, he also had /ˈoːnlɪ/ for only, and in case you have not figured it out for yourselves, only = one + the suffix -ly, forming adjectives and adverbs.

We have to remember, however, that English has been (at least) as variable in the past as it is today. There is spelling evidence to show that some varieties of Middle English tended to insert a glide before any word-initial vowel, as if trying to force every word to begin with a consonant. The glide tended to match the features of the vowel, so we usually find the rounded back glide /w/ before rounded back vowels, including ME /ɔː/. Middle English scribes had no common standard to conform to, so they often used “ear-spellings”, revealing their pronunciation habits. We thus find occasional w-initial variants of words like ǭte ‘oat’, ǭth ‘oath’, ǭn ‘one’, ǭnes ‘once’, ǭnliche or ǭnlie ‘only’, ǭld ‘old’, ǭk ‘oak’, etc. The idealised “dictionary” spellings cited in the preceding sentence represent a Late Middle English “virtual norm” that didn’t really exist. The actual spelling was highly variable, and the documented variants include wote, woth, won, wonys, wonlyche, wolde, wooke, etc. None of these forms survived into mainstream Modern English, except, it seems, one and once (the modern spelling does not follow the mutated pronunciation).

What I think happened was that forms with an initial glide came to be stigmatised as regional or vulgar, and were avoided in “cultivated” speech. The exceptional pronunciation of one became fixed not in the full form employed when speaking slowly and distinctly, but rather in the reduced form used in unstressed positions when the word functioned as a pronoun – a colloquial pronunciation like /wʊn/ or /wən/, resulting from the phonetic weakening of /ˈwɔːn/ at an early date. There is also a /w/-less variant of such an informal weak pronunciation: Modern English /ən/, as in a good ’un (note, by the way, that both a and un have the same Old English ancestor!). A “sloppy” form hiding in unstressed positions apparently managed to escape the attention of purists, who naturally focussed on the elegance and clarity of formal speech. Words which had no weak forms (like ‘oak’ and the rest) lost the initial glide irreversibly. But stressed /ˈoːn/ (as recorded by Hart after the Great Vowel Shift) was paired with unstressed /wən/, in which the parasitic /w/ was able to survive.

Hat tip: The VoiceGuy
(and yet the weak often win)
One frequent scenario leading to the emergence of new irregular variants consists in the restressing of weak forms. When the Modern English indefinite article a/an is pronounced emphatically, it does not revert to its historical source but generates new stressed forms, such as /eɪ/, /æn/. Neither did /wən/ evolve back into /woːn/ when it was reinstated in stressed positions. It spawned new forms (possibly a variety of them) with a restored full vowel, such as /wʊn/, /wʌn/ or /wɒn/ (as in northern England). Those innovations began to compete with /oːn/ and with its descendant /oʊn/ – first as an alternative pronunciation of the pronoun under stress, then also in the numeral.  And weak forms can be formidable competitors. Their sheer frequency of use enables them to survive and repeatedly claim niches inhabited by their rarer if stronger counterparts. Eventually, the new pronunciation infected not only the numeral but also the related words once and none (the latter from ME nǭn, which continued OE nān = ne + ān ‘not one’); possibly also nothing (ME nǭn thing ‘not a/one thing’).

Less transparent derivatives and compounds containing OE ān were left intact and continued to develop regularly, which is why we still have /oʊ/ in only (OE ān-līċ ‘one-like’), alone (OE eall + āna ‘all solitary’), and atone (ME at-ǭnen, mirroring Latin ad-ūnō ‘unite’). When the replacement happened, they were already living their own independent lives, and their connection with one was not transparent enough to trigger analogical change.

The discussion would not be complete without mentioning the traditional Scots variants reflecting a ME form with unrounded /aː/ (OE /ɑː/ did not change into /ɔː/ in the Northern Middle English dialect ancestral to Scots and to some northern varieties of British English). Hence we have dialectal forms spelt ae or ane, and pronounced [eː], [en], or [jɪn], depending on the local accent. They are the more-or-less regular outcome either of Northern Middle English ān ‘one’ or of its by-form in which the final /n/ was dropped.

06 February 2013

A Strange Couple: One and Once


[continued from here]

In one of the earlier posts I said that the non-nominative case-forms of OE ān did not survive the collapse of the Old English declensional system. It is now time to qualify that statement. They did not survive in their original function, qua case-forms. But occasionally isolated inflected forms of Old English origin were utilised as something else, usually as adverbs. By specialising in a new function they moved into a secure niche which protected them from extinction. This is something worth keeping in mind. Such “living fossils” often appear in various languages. The loss of a grammatical category normally means that all forms marked for that category are discarded, but not if they manage to find employment in some other department.

Dæġes and nihtes
M. C. Escher, Day and Night (1938)
The Old English  genitive was often used adverbially, especially in expressions of time. Such is the origin of adverbs like nights ‘(regularly) at night’, as in “He works nights” (sometimes reformulated as “of a night”). The final -s continues the Old English gen.sg. ending -es (and so, from the historical point of view, has nothing to do with the plural -s of Modern English, but is closely related to the “Saxon genitive”). In Old English, gen.sg. forms like dæġes, nihtes meant, respectively, ‘by day’ and ‘by night’. The phrase þæs ġēares (the gen.sg. of þæt ġēar) meant ‘(in) that year’. In my native language, Polish, the same meaning is expressed as tego roku (also the genitive case of ‘that year’). We find a similar use of the ablative case in Latin (die ‘by day’, nocte ‘by night’ – the so-called ablativus temporis).

The gen.sg. of OE ān in the masculine or neuter gender was ānes /ˈɑːnəs/. Although it is not attested as an adverb (it seems the Anglo-Saxons preferred  the specialised “true” adverb ǣne ‘one time, on one occasion’, also related to ān), its Middle English descendant, ǭnes (the actual spelling was usually ones, oones, oonys, ons, or something similar) was so employed on the analogy of other adverbial genitives, such as nightes ‘at night, by night’, slowly outcompeting its older synonym, ME ę̄ne (from OE ǣne, see above). This is the source of Modern English once. Note that although the final -s is etymologically identical with the Saxon genitive suffix, its pronunciation is voiceless (/s/), unlike that found in the reformed modern genitive of pronominal one, namely one’s (with final /z/). In fact, the voiceless pronunciation is older. The voicing in the Saxon genitive is a late innovation. Once was left unaffected because it was no longer analysed as a genitive. The ending is similarly voiceless in twice and thrice, which reflect the Old English adverbs twiġa and þriġa. They developed into Middle English twīe, thrīe, which then became extended with the suffix -s on the analogy of ǭnes.

But if OE āc, āte, āþ (all with initial /ɑː/) yielded regularly ME ǭk, ǭte, ǭth, and eventually Mod.E oak, oat, oath (with /oʊ/, as expected), why are one and once pronounced as if they were spelt wun, wunce? The spelling is like that of bone and stone, while the pronunciation is completely crazy. No other instance of English o is pronounced /wʌ/. Why, then, does one sound like won, and not like own? This question will be tackled in the next post.

04 February 2013

Ex Uno Plures: One Gets Duplicated


[continued from here]

If you look at the WALS map showing the distribution of different types of indefinite articles in Europe, you get the impression that there are quite a few languages using something else than the cardinal ‘one’ in this function. Those languages include English, Dutch, Frisian, Danish, Breton, Albanian, and Hungarian. This impression is very misleading.

Zooming in on Europe
As a matter of fact, Hungarian egy serves both as an indefinite article (‘a, an’) and a numeral (‘one’). Albanian has a rich inventory of indefinite pronouns, but the one that functions as the indefinite article is një, identical with the numeral ‘one’. Breton uses un as the indefinite article, and unan as the numeral ‘one’. It takes no powerful linguistic insight to realise that the former is just a truncated variant of the latter (by the way: they are distantly related to French un/une, but not borrowed from French!). The same goes for Frisian, which has in ‘a(n)’ and ien ‘one’. In Danish, en (common gender) and et (neuter) are used in both functions. The difference between the indefinite article and the numeral is one of stress, hence the optional use of an acute accent (én/ét) in the numeral. We find the same orthographic device in Dutch (een ‘a(n)’ ~ één ‘one’), and in Norwegian (placed by the editors of the WALS in the “same form” basket). Of course stressed and unstressed forms tend to develop divergently, as the latter are affected by vowel reduction (e.g. Dutch /eːn/ versus /ən/). This is our old friend, word duplication, at work. That’s the reason why the WALS considers the difference as lexicalised, therefore significant, in some languages.

Some subtle contrasts may be found in languages not listed as having indefinite markers different from ‘one’. For example, the Romanian indefinite articles are un (masculine) and o (feminine), identical with the corresponding forms of the numeral, all derived from Latin ūnu-/ūna ‘one’. But the Romanian indefinite article has a nom./acc. plural form, niște, which is suppletive, i.e. co-opted from a different source (namely, from the Latin phrase nescio quid, roughly translatable as ‘whatever’). Compare French plural des, unrelated to singular un/une.

Yet, to sum up, the indefinite article, if it exists at all, either has the same form as the numeral ‘one’ at least in the singular or differs from it only minimally, showing clear evidence of a historical relationship. This is what we find throughout Europe. There are no real exceptions.

Whoa, wait a minute... What about English?

English is no exception either. Although it may seem that the basic form of the indefinite article is a /ə/ (a cat, a friend), and we add a final /n/ only before a vowel (an apple, an heir) to avoid hiatus, the historical sequence was the other way round. The oldest form of the article was an, and the final /n/ was deleted before word-initial consonants, first optionally and variably, then obligatorily. And what else is an if not a low-stress variant of Old English ān ‘one’? Just as in German or French, the numeral came to be employed as a marker of indefiniteness, and when used in that function (which increased its frequency of occurrence quite dramatically – by more than an order of magnitude) it suffered the usual consequences of being such a tremendous replicator: an increased tolerance of phonetic reduction, leading to the gradual evaporation of phonological substance. The vowel changed from /ɑː/ to /a/, and eventually to /ə/. The deletion of word-final /n/ in function words and in grammatical endings was widespread in Middle English. That’s why we have Modern English my (before a noun) for OE mīn (but mine otherwise, and compare obsolete mine eyes, when a vowel followed). In Chaucer’s language it was possible to use o or oo (pronounced /ɔː/) as a variant of the numeral oon (ǭn) before a consonant:
Noght oo word spak he moore than was neede. [The Canterbury Tales, General Prologue, the description of the Clerk]
In early Middle English an was common before words beginning with a consonant, and various inflected forms of the indefinite article (ane, anre, anes, etc., parallel to German eine, einer, eines...) were still preserved in more conservative dialects. By the end of the Middle English period the distribution of a and an already resembled that observed today (with some minor differences, like the use of an before a pronounced /h/). The functional duplication (and formal multiplication!) of OE ān was complete and fixed.

But the story is not finished yet, and will be continued in the next post.

03 February 2013

There Is Only Wun Such Number


Among the words whose Old English prototype contained the vowel /ɑː/, one is rather special. I mean, one is rather special – you know, the cardinal numeral referring to the lowest positive integer. It is so common that I used it in the first sentence of this post without premeditation. In Old English, its form was ān, rhyming with bān ‘bone’ and stān ‘stone’. To be sure, its nominative singular was ān, but the word behaved like a typical adjective, so it was inflected for gender (masculine, feminine, neuter), case (nominative, accusative, genitive, dative, instrumental), and number (singular, plural). Even better than that: like most Old English adjectives it had two types of declension, “strong” and “weak” (never mind the reason why, it isn’t important here). Most of those forms contained /ɑː/, like the nom.sg., but the strong acc.sg. masculine was ǣnne (with an umlauted vowel). However, the post-Old English collapse of the elaborate Germanic system of case and gender killed off all the inflected forms, leaving only ān, which became Middle English ǭn.

1 is special: Benford’s Law
(the relative frequency of first digits in real-life data listings)
‘One’ is exceptional among the numerals in that it easily develops new grammatical functions and shades of meaning. Apart from its use in counting (“one, two, three...”), it can mean ‘single, lone, not two or more’ (“one cup of coffee”, “one at a time”), ‘unique, only, distinct from others’ (“the one thing that I’m sure of”), ‘the same’ (“we are of one mind”), ‘whole, complete’ (“in one piece”), ‘this, as contrasted with the other’ (“on the one hand...”), ‘a certain, indefinite, some’ (“one Sunday morning”), ‘typical, representative of a class’ (“he was one such person”) or even, emphatically, ‘veritable’ (“one hell of a show”). It has also been co-opted as a pronoun, meaning ‘an indefinite person’ (“one never knows”), or replacing a noun in a noun phrase (“I need a bigger one”). In brief, it’s a hard-working word. No wonder it ranks #35 on the Oxford Corpus frequency list, squarely between my and all.

Some languages employ different words for the different senses of English one. This is the likely reason why there’s no single Indo-European root for the cardinal ‘one’. In the languages descended from Proto-Indo-European we have a family of words based on a root reconstructed (roughly) as *oi- (among them *oino-, the ancestor of Proto-Germanic *aina- and, consequently, OE ān), competing with the root *sem-. The original semantic distinction is hard to reconstruct, but it seems likely that *oino- etc. meant ‘single, isolated’, while *sem- combined such meanings as ‘taken together, united’ with ‘one of a series’.

On the other hand, in some languages the cardinal ‘one’ has still more functions than in English. In particular, ‘one’ commonly serves as an indefinite article (French un/une, German ein/eine). Indeed, it seems that in about half of the languages that have indefinite articles as a grammatical category, the numeral for ‘one’ doubles up in that capacity [see the World Atlas of Language Structures].

But I suppose this is enough for one blog message (one cannot absorb too much at one time, can one?). The tale will be continued tomorrow. And it’s a long and twisted one.