24 November 2015

Boontling “Deek”: A Rovin’ Gypsy Word?

The little town of Boonville (Mendocino County, California) was established in the early 1860s near a slightly older place called The Corners. A local general store was moved from The Corners to the present location of the town centre and then sold to Mr W. W. Boone, who modestly named the settlement after himself (it had briefly been called Kendall City in appreciation of another local businessman). The inhabitants of Boonville (now about 1000 people) refer to their town colloquially as Boont.

What makes Boonville special is its local ‘jargon’ which probably arose in the 1890s among children and young people (who then grew up without abandoning it). The community was quite isolated at the time, and kept no records to inform posterity why they chose to develop an extremely hermetic and highly inventive vocabulary of about 1500 words, known as Boontling (Btl.). Boontling was not originally meant to be written down, but a semi-formalised spelling was developed for it in the 1970s. One of the local words is to boont ‘to speak Boontling’. At present Boontling is dying out (Btl. pikin to the dusties) despite having been discovered by linguists and made known to the general public. Many Boontling words remain in circulation, but there are few fluent users left. Boontling has never been a fully fledged dialect: it has a distinct vocabulary incomprehensible to outsiders, but the accent is a rural variety of Northern California English (with historical affinities to the Midwestern and Border South dialects), and the Boontling syntax is in nearly all respects the same as that of mainstream US English.

The Old Machine Boys
Despite its recent origin, Boontling vocabulary is etymologically opaque to a surprising extent. Nevertheless, the vast majority of its words are coined from pre-existing elements rather than made up entirely from scratch. Often you have to know the history of the place and rely on anecdotes collected from elderly locals that “explain” the meaning of some words, especially those derived from personal names. (A professional etymologist has to verify their historicity, of course, and this is likely to be the toughest part of the job.) Some words reflect otherwise forgotten dialectal or slangy vocabulary. Some were coined using Humpty Dumpty’s technique of piecing together broken fragments of ordinary English words. Some hide behind strange pronunciations that appear to have been borrowed from Scottish or Ulster Scots speakers. Some came from Spanish (approximately half the population is of “Hispanic or Latino” descent), and a few from the Pomoan languages indigenous to California (there are a few Native Americans as well).

I’m intrigued by a few of them. For example, one of the most common and persistent Boontling words is deek ‘look, see, stare, notice’ (also used as a deverbal noun). I’m not aware of the use of deek anywhere else in North America. However, deek is a well-known colloquial Northernism in Britain. It’s stereotypically associated with Geordie (the dialect of Newcastle and the Tyneside area), but it actually occurs throughout Northern England (including Cumbria, Liverpool and Yorkshire) and much of Scotland. The word is a loan from Romani or rather Angloromani – the Romani-derived lexicon embedded in the varieties of English used by the British Romanies (see Yaron Matras, 2010, Romani in Britain: The Afterlife of a Language, Edinburgh University Press). The Angloromani verb (no longer inflected) is deek, dik, dikkai [diːk, dɪk, dɪkʰaɪ], reflecting European Romani dikh- ‘see’. There are, by the way, quite a few Romani loans in British dialects (some of them, such as pal ‘brother, friend’, no longer dialectal). The Dictionary of the Scots Language gives, among others, these recent examples of the use of deek:
  • Deek that gadgie. ‘Look at that guy.’ (Edinburgh, 1988)
  • The gaffer wis anither big rough-deeking gadgie... (Aberdeen, 1990)
Here, in addition to deek, gadgie ‘guy, bloke’ is also a Romani loan (Angloromani gadji, gawdjo, gawdja < European Romani gadžo ‘non-Gypsy’).

The root dikh- arrived with the ancestors of the modern Romani all the way from Northwestern India. It is cognate to Hindi dekh- and to Sanskrit dṛś-, dṛkṣ-, all of which continue a well-known Proto-Indo-European root, *derḱ- ‘watch, see’. Incidentally, the Hindi word became independently borrowed into British English via the army slang of British soldiers serving in India, hence have a dekko ‘have a look’.

The Germanic languages also inherited a few words derived from *derḱ-, but English has lost all of them. Old English still had torht ‘bright, splendid, illustrious’ from the PIE deverbal adjective *dr̥ḱ-tó- (cf. Skt. dṛṣṭá- ‘seen, visible’). It was used almost exclusively in poetry, but also served as an element forming personal names. For example, an Old English gadgie called Torhthelm (Totta for friends) owned a farm called Totta’s Homestead (Tottan-hām) in today’s north London. The To- part of Tottenham is about all that has survived of the root *derḱ- in Modern English via direct descent. A number of other reflexes, however, have reached English by horizontal transfer from other Indo-European languages, the most spectacular of them being dragon (ultimately from Greek drákōn ‘starer’ → ‘serpent with a deadly stare’). But I’m digressing.

I have no watertight proof that Btl. deek is the same word as Angloromani, Northern English, Scots and Scottish English deek, but I’d be very surprised if somebody proved that Btl. deek had a different origin. Still, I have no idea how the word could have reached an obscure valley in Northern California and become fixed in the local slang without leaving any other traces in American English. If anyone among my readers comes up with an idea how to explain its trajectory in time and space, I’ll be immensely grateful for sharing it.

21 November 2015

A Normally Weird Language

Every week, the digital magazine Aeon publishes several ambitious essays, by competent writers, on culture, philosophy, science, technology and other interesting subjects. One of last week’s authors is John McWhorter, professor of linguistics and American studies at Columbia University; the topic is the English language. The essay is entitled “English is not normal”. Professor McWhorter not only argues that English is genuinely “weird” (anyone who has followed his publications already knows it) but also makes the stronger claim that it “really is weirder than pretty much every other language”. Now that is a really weird thing to say, so let’s see how it is argued.

English is not normal
McWhorter begins by discussing English spelling and its caprices (with the reservation that writing is secondary with respect to speech). This is of course due to the conservative character of the spelling system, which has not undergone any major reform since Late Middle English. But English is by no means the only language with such a mismatch between its spoken and written form due to the reluctance of its orthography to catch up with sound change. French, for example, is just as weird. It has plenty of ambiguous spellings with more than one possible pronunciation and alternative spellings for one and the same phoneme in one and the same position. It easily beats English when it comes to mute consonants: vin, vins (verb and noun), vain, vains, vint, vaincs, vainc, vingt are all pronounced /væ̃/. Massive mergers of this kind would surely have caused any normal language to collapse, so French can’t be normal, can it? Irish spelling was even worse before its mid-20th-c. modernisation, and still remains a pretty complicated affair (regular, but you have to master quite a few rules to figure out how to pronounce bhfaighidh). Lhasa Tibetan has lost many consonants both in initial and final clusters, but has retained their spelling representation. And while we are in Asia, isn’t written Chinese even a little weird? Professor McWhorter says that “in countries where English isn’t spoken, there is no such thing as a ‘spelling bee’ competition”. To my knowledge, national spelling competitions are organised in many countries, including Poland. I have finished runner-up in one of them, and I can testify it was tough going. Is Polish a normal language?

The next claim is that English is not similar enough even to closely related languages to guarantee partial mutual comprehensibility. Well, this depends on what we regard as a “related language”. If, for example, we treat Scots as a close cousin rather than a variety of English, we have to agree that English and Scots are partly comprehensible to each other’s speakers (more so, I presume, than Standard Dutch and High German). English and Frisian are more closely related to each other than either is to the rest of Germanic, but they became separated geographically more than 1500 years ago and, unlike Dutch and German, or Spanish and Portuguese, have not remained in contact or been connected by a continuum of intermediate dialects. If that makes English weird, Greek, Albanian and Armenian are even weirder (not to mention such orphan languages as Japanese, Burushaski or Basque).

According to McWhorter, English is the only Indo-European language without grammatical gender. This sweeping statement is simply false. Let’s begin with the observation that the “classical” three-way distinction (masculine : feminine : neuter) probably did not exist in Proto-Indo-European itself, which only distinguished neuters from non-neuters (a state of affairs thought to be preserved by the extinct Anatolian languages such as Hittite). Once the three-gender system emerged in the rest of the family, it was reduced again in some branches. For example, although Latin had three genders, all the modern Romance languages descended from it have only two, having eliminated the neuter. Among the Scandinavian languages, Danish and Swedish have merged the feminine and masculine into one “common” (non-neuter) gender. English has gone one step further. Already at the Early Middle English historical stage all morphological markers of gender were abolished in nouns and adjectives. The only trace of the former three-way system is a “natural gender” distinction in the third person singular of personal pronouns (he : she : it). But even within the Germanic group we find the same development in Afrikaans. If anything is “weird” about gender in English and Afrikaans, it isn’t its loss in nouns, but rather the survival of natural gender in pronouns: having pronominal but no nominal gender is very rare cross-linguistically. As for the rest of the Indo-European family, there is no grammatical gender in modern Persian, Balochi, Ossetic, and several other (though not all) Iranian languages. Armenian (also Indo-European) has no gender either. Both the genderless Iranian languages and Armenian are more consistent than English in their elimination of gender: their personal pronouns are genderless too. Armenian na means ‘he/she/it’; literary Persian has u ‘he/she’ (used only of humans) contrasting with ân ‘it’ (non-human), but the latter has taken the place of the former in spoken Persian.
As we can see, English is by no means alone even in Indo-European. And since more than 50% of languages worldwide have no morphological gender or noun-class system, it is in good company.

The next feature is genuinely weird – here I completely agree. No other language known to McWhorter or to me marks the third person singular of present-tense verbs and leaves all the other forms unmarked (the sole exception is the present tense of to be). This is of course due to a historical accident caused by extralinguistic factors – the generalisation of the originally plural polite pronoun ye/you, which led to the disappearance of 2sg. thou/thee together with all the verb forms associated with it (art, wilt, dost, hast, drink(e)st). Nevertheless, it’s strange, though hardly strange enough to justify the claim that English is “deeply peculiar in the structural sense”.

Less convincing is the case for the weirdness of do-support in questions requiring inversion (does she smoke?), in negation (she doesn’t smoke), and in emphatic statements (she does smoke). Professor McWhorter has for a long time argued that the construction is due to Celtic influence and found exclusively in Brittonic Celtic and English. This is doubtful for several reasons. Constructions regarded as precursors of do-support occur sporadically in 14th-c. English, but fully assume their modern functions and begin to spread rapidly after ca. 1500. That’s 1000 years after the initial contact between the Anglo-Saxons and the Brittonic Celts. Why so late? Perhaps the construction existed in informal spoken English and didn’t make it into the written standard until the sixteenth century? Such an explanation could work for Old English, but hardly for the Middle period, from which we have a vast corpus of documents representing different genres, styles, and grammatical registers. There is, furthermore, no evidence of analogous constructions in Celtic pre-dating their début in English, so the direction of influence is uncertain (if it’s influence at all, rather than accidental convergence made likelier by the fact that inversion is used as a syntactic device in both cases). The fact that the Celtic analogue of do-support can also be found in Breton does not prove its great age. Contacts between the Celtic populations of Brittany and Cornwall were regular and intensive until the decline of an independent Duchy of Brittany in the 16th century. Anyway, even if we are dealing with a pattern borrowed from Celtic, English shares it with Welsh, Cornish and Breton, and so can’t be regarded as exceptionally weird in this respect. Again, the claim that such a construction does not occur anywhere else is exaggerated. Do-support analogues have been reported from some Lombard dialects of Northern Italy (the use of the auxiliary fa ‘do’ in questions), and even from Korean (in negation).
A related construction (with Old Norse gera ‘prepare, do’) was used in Old Icelandic negation. Even if the English-specific combination of functions is “special”, its components can be found here and there.

The rest of McWhorter’s essay is devoted to the “mongrel vocabulary” of English (with most of it being actually French, Latin or Scandinavian), the richness of synonymy resulting from layers of borrowing, and the impact of Latinate loans on the development of a complex stress system. Though remarkable, these features are hardly unique or even rare. Plenty of languages have been relexified with foreign elements to a comparable degree, and with equally dramatic consequences for their morphology and phonology.

Of course the essay is pop-linguistics, addressed to a general audience, so the author has every right to simplify things for didactic convenience. He justly debunks the all-too-popular idea of English as the “model” language, so ordinary that it can be regarded as a safe testing-ground for linguistic theories (“let’s consider any language – for example, English”). However, in doing so, he errs on the opposite side, trying to make English look more extraordinary than it really is. English does have its structural idiosyncrasies, but so does just about any other human language. Tsakhur (a Northeast Caucasian language) has ‘turquoise’ as a basic colour term (it’s also weird in having at least about 70 consonant phonemes); Czech is pretty much unique in having a fricative alveolar trill as a phoneme (a sound so rare that the International Phonetic Association has not yet come up with a convenient symbol to transcribe it); Hawaiian has [t] and [k] as variants of the same phoneme in its extremely small inventory of consonants; the West !Xóõ language (in Namibia) has 43-111 different clicks (depending on how you analyse the system) in addition to a few dozen other consonants; Winnebago (Siouan) places the main stress on the third mora in longer words, while Macedonian (Slavic) regularly stresses the antipenultimate syllable; in Imonda (in Papua New Guinea) singular and dual nouns are marked with special endings but plurals are expressed as bare stems; Hungarian has 18 noun cases and two basic colour terms for different kinds of ‘red’. Pirahã (in Amazonas, Brazil) has a dozen phonemes (at most), no numerals, and no basic colour terms; the jury is still out on whether it has embedded clauses. On the other hand, it has a rich verb morphology, with an unusually large number of aspects and several shades of evidentiality (expressing the source/reliability of information). There’s a lot of weirdness out there.

The fact is that the total weirdness of a language is not a quantifiable notion. It makes little sense to say that one language is generally weirder than another (as opposed to being weirder in some particular respect). Caprices of history have elevated English to the status of global lingua franca. It doesn’t owe its unique position to any structural features, although the fact that it has an enormous population of speakers is relevant for its current and future evolution. Yes, it has many eccentric features but hardly represents an extreme type of language. “English is not normal”, while a catchy title, is at best a trivial statement that could be true of any language (if you concentrate exclusively on a few selected oddities).

23 September 2015

Nucg Nucg, Winc Winc: The Anglo-Saxon Dairy Business

Those of my visitors who know something about Old English poetry may have realised that the link between the F-word and churning butter (see the previous post) is not just etymological – it’s a literary allusion.  Among the famous Anglo-Saxon riddles preserved in the Exeter Book we find the following one (Riddle 54):

Hyse cwom gangan,    þær he hie wisse
stondan in wincsele,    stop feorran to,
hror hægstealdmon,    hof his agen
hrægl hondum up,    <hrand> under gyrdels
hyre stondendre    stiþes nathwæt,
worhte his willan;    wagedan buta.
Þegn onnette,    wæs þragum nyt
tillic esne,    teorode hwæþre
æt stunda gehwam    strong ær þon <hio>,
werig þæs weorces.    Hyre weaxan ongon
under gyrdelse    þæt oft gode men
ferðþum freogað   ond mid feo bicgað.

An Anglo-Saxon churn lid, with the Freudian hole
The Exeter Book (written more than one thousand years ago) is the largest extant anthology of Old English poetry. It contains diverse stuff, from solemn religious and allegorical poems, saints’ lives, elegies and fragments of heroic legends to comic, somewhat naughty, light compositions, such as Riddle 54. There are as many as 96 Old English riddles in the manuscript (the genre is hardly documented in any other source). Many of them have very serious religious solutions, but certainly not this one. Good translations of the riddles are hard to come by. Much is lost in translation, and humour is usually the first victim. A specialist can always enjoy the original, but for the sake of those whose Old English is not very fluent I’m going to offer my own translation, for what it’s worth. At least it isn’t a horrible mistranslation (some others are) and it tries to capture the spirit of the original. I also hope it isn’t too stilted (for a piece of Old English verse).

Some things are practically untranslatable. For example, Old English had grammatical gender, and the use of feminine personal pronouns (corresponding to Modern English she and her) doesn’t mean that the pronoun indicates a female human being. It can refer to any object whose Old English name is a feminine noun (e.g. tunge ‘tongue’, bōc ‘book’,  duru ‘door’, etc.). It may suggest a woman, but since the alternative possibility is also probable, the suggestion is much weaker than in Modern English. This subtle ambiguity would be lost completely if she were replaced by it, so I let it stay. Just remember that in Modern English not only ships but also some tools and utensils can be conventionally personified by their users and referred to as “she”. It isn’t quite the same thing as Old English grammatical gender, but must suffice to justify my artistic licence.

Another problem is that Old English is a dead language and its written record is far from perfect. The words in angle brackets represent editorial emendations in places where the text seems to be corrupt. The first of the restored forms, <hrand>, actually reads rand in the manuscript, but this can’t be the word intended by the poet. The rules of Old English poetic alliteration demand something beginning with h in the first stressed position of the second half of the line. The most likely emendation is hrand. Unfortunately, such a word-form does not occur anywhere else in the entire Old English text corpus. The context requires a verb in the past tense here. A past tense like hrand presupposes the infinitive *hrindan, past tense plural *hrundon, past participle *hrunden, etc. But what might they mean? Not only is the verb otherwise unknown from Old English; it has left no Middle or Modern English descendants either. To use a technical Greek term, it’s a hapax legomenon, a word appearing only once.

There’s nothing wrong with being a hapax. It’s the inevitable consequence of the fact that words have wildly different frequencies of use (a common motif in my blog posts). In fact, in any large corpus of texts at least about 40% of the words (types, not tokens) occur only once. The same is true of Old English: more than half of the entries in any more-or-less complete Old English dictionary occur only once or twice in the surviving texts. So hrand is not anything unusual, just a little enigmatic.
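For readers who want to check the hapax figure against a corpus of their own, here is a minimal Python sketch. The function name hapax_share and the crude tokenisation (lower-cased runs of word characters) are my own assumptions for illustration; a serious count would need proper lemmatisation, especially for an inflected language like Old English.

```python
import re
from collections import Counter


def hapax_share(text: str) -> float:
    """Return the fraction of word types that occur exactly once in text."""
    # Crude tokenisation: lower-case the text and take runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    if not counts:
        return 0.0
    # Types are distinct words; hapaxes are the types with a count of one.
    hapaxes = [word for word, n in counts.items() if n == 1]
    return len(hapaxes) / len(counts)


# Hypothetical toy sample; real corpora are of course far larger.
sample = "the lad came walking to where the churn stood and the lad shoved"
print(hapax_share(sample))
```

In a sample this small the share is inflated, since almost every word is new; the figure only becomes meaningful for large corpora.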

What about possible cognates in other Germanic languages? We have Old Icelandic hrinda (past tense hratt < *hrant < *hrand) whose precise meaning is known: ‘push, hurl down’ and, figuratively, ‘launch’ or ‘expel, get rid of’ (the verb has survived in Modern Icelandic and Faroese). The literal meaning roughly fits the context of Riddle 54. Most Modern English translations use thrust; I prefer shove because of its greater semantic overlap with Scandinavian hrinda, and also for the sake of alliteration. Last but not least, shove is less dignified than push or thrust, and has the kind of colloquial vigour they lack, which is an advantage in this case. All right, I’ve never tried it before, so here goes!*)

A lad came walking    to where, as he knew,
she stood in a corner;    stepped in from afar,
a brisk bachelor,    tucked up his own
shirt with his hands,    shoved under the girdle
of the one standing    a stout thingumajig
and worked his will;    both rocked back and forth.
The servant quickened up:    at times he was of use,
a handy workman,    he grew weaker though
with every stroke,    strengthless too soon,
weary from work.    There began to form
under her girdle    that which good men often
dearly desire    and procure with money.

And the solution is – yes, yes, you’ve guessed correctly! – a butter churn, that is, OE ċyrn. By the way, this word occurs three times in Old English texts: once as cyrin (sg.), once as cyrne (pl.), and once as cirm (misspelt by the scribe). As you can see, even the citation forms that we use for convenience represent “Standard Old English” imposed by modern dictionary editors rather than the actual language of the manuscripts.

An early 20th-century postcard
Needless to point out, ĊYRN [wink wink, nudge nudge, say no more, say no more] is the “formal” solution of the riddle. The informal one is as obvious to us as it was to any Anglo-Saxon audience in the tenth century. Other ambiguous riddles in the Exeter Book exploit the same risqué ambiguity: the alternative interpretation is invariably bawdy. Their innuendo-laden humour may be crude, but it still appeals to the modern reader. For the survival of the whole collection we are indebted to Leofric, Bishop of Exeter, a well-educated bibliophile, who died in 1072, bequeathing his impressive manuscript collection to Exeter Cathedral. He apparently did not regard the riddles as subversive enough to be denied the shelter of the cathedral library. Riddle 54 helps us to understand why, back in 1290, a chap from Ipswich, presumably a local dairyman, was called Simon Fukkebotere. It offers us a glimpse into the secret world of naughty associations that existed in the minds of Anglo-Saxon scribes and their audience (and still exist in ours), so we are not making things up when we hypothesise that the original meaning of fuck was ‘strike repeatedly’. Who knows, perhaps the speakers of Old English could use the same word for churning and, with less innocent intent, for [know what I mean? nudge nudge] the other thing.

13 September 2015

The Middle English Dictionary Needs a Fucking Update

Sorry, but I have to comment on this topic.  The news has already spread across the Internet, arousing the interest of several bloggers:

here, here, here, etc.

Somewhere among the indictment rolls of the county court of Chester (1310/11), studied by Dr. Paul Booth of Keele University (Staffordshire), a man whose Christian name was Roger is mentioned three times. His less Christian byname is recorded as well, with minor orthographic variations. The repetition guarantees that what the name contains is not an artefact resulting from a spelling mistake but the real thing: to wit, the man’s full name was Roger Fuckebythenavele. Though Roger was finally outlawed by the court and never heard of again, his legacy will make a lasting impact on English word studies. Not only does his second name move back the earliest attestation of fuck in its modern sense by many decades; it also, for the first time, establishes it as a bona fide Middle English word. Inevitably, the question will be raised again whether fuck is a native English word (a view defended, among others, by Lass 1995) or a relatively late newcomer (as argued e.g. by Liberman 2007: 78-87).

Like dog (attested once about 1050 and then again some 150 years later) and shark (attested once in 1442 and then again in 1569), fuck has a “ghost lineage” – a long attestation gap during which it must have existed, although no record of its use has survived. We do see several occurrences of fucke in 13th-century bynames like Fuckebotere (= “Fuckbutter”, 1290) and Fuckebeggar (1286/87), but in these, the verb seems to mean, respectively, ‘churn, beat’, and ‘punch, hit’ rather than you-know-what. Semantic associations leading from such meanings to the rather obvious sexual connotations of Roger’s bizarre cognomen are pretty natural, though. The coexistence of both ca. 1300 suggests that the use of fuck for sexual intercourse is a semantic specialisation which took place a long time ago. We find it not only in English but also (perhaps independently) in a few other Germanic languages. For details, you may consult the etymological information in the beautifully updated entry in the OED.

Fucking, Austria (probably unconnected)
I side with those who believe that fuck is old and has a respectable Germanic pedigree. The stem *fukkō-, with its characteristic double consonant, is easy to explain as a Germanic iterative verb – one of a large family of similar forms. They originated as combinations of various Indo-European roots with *-nah₂-, a suffix indicating repeated action. The formation is not, strictly speaking, Proto-Indo-European; the suffix owes its existence to the reanalysis of an older morphological structure (reanalysis happens when people fail to analyse an inherited structure in the same way as their predecessors). Still, verbs of this kind are older than Proto-Germanic.

One particularly clear example is English lick from Old English liccian < PGmc. *likkō-. Numerous cognates in other Indo-European languages show unambiguously that the PIE root was *leiǵʰ- ‘lick’. The expected Germanic reflex of *ǵʰ is a voiced fricative or stop (*ɣ/*g) resulting from the operation of Grimm’s Law. A different development in this case was caused by the suffix *-nah₂-, attached to the root in pre-Germanic times to yield *liǵʰ-náh₂- ‘lick (repeatedly)’. The root occurred in the reduced grade since the suffix carried the accent. After an unaccented syllable, the sequence *-ǵʰn- changed into *-gg-, which, as Grimm’s Law completed its course, became Proto-Germanic *-kk- (if the preceding vowel was short).

Many historical linguists don’t accept this development, known as Kluge’s Law (discovered more than a century ago but neglected for many decades). In recent years, however, so much evidence has been collected to support it that it seems unfair to call it “controversial” (if not something worse) any longer. The outcome of Kluge’s Law is the same for originally voiceless, voiced and “aspirated” (breathy-voiced) Indo-European stops: all of them yielded a voiceless geminate (double consonant) in the environment in which the law applied. After a long vowel or diphthong, however, the geminate was simplified, leaving a single voiceless stop.

There was a Proto-Indo-European root usually reconstructed as *peug- (or possibly *peuǵ-), meaning ‘stab, hit’ (cf. Latin pungō ‘pierce’, pūgnus ‘fist’, pūgna ‘fight’, pugil ‘boxer’; Greek púgmē ‘fist, fist-fight’). In combination with the *-náh₂- suffix we would get *pug-náh₂- > PGmc. *fukkō- ‘strike repeatedly, beat’ (like, say, “dashing” the cream with a plunger in a traditional butter churn). Note also windfucker and fuckwind – old, obsolete words for ‘kestrel’.
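The outcome of Kluge’s Law, as summarised above, can be sketched as a toy lookup in Python. This is only an illustration under simplifying assumptions of my own: the names PLACE and kluge_outcome are made up, labiovelars are left out, palatovelars are treated as plain velars, and the function models only the final Proto-Germanic result for a stop followed by the accented *-n- suffix, not the intermediate stages.

```python
# Map each PIE stop to the voiceless stop of the same place of articulation.
# Assumption: palatovelars pattern with plain velars; labiovelars are omitted.
PLACE = {
    "p": "p", "b": "p", "bʰ": "p",
    "t": "t", "d": "t", "dʰ": "t",
    "k": "k", "g": "k", "gʰ": "k",
    "ḱ": "k", "ǵ": "k", "ǵʰ": "k",
}


def kluge_outcome(stop: str, after_long_nucleus: bool = False) -> str:
    """Return the Proto-Germanic reflex of a PIE stop + *n cluster.

    Voiceless, voiced and breathy-voiced stops all give a voiceless
    geminate; after a long vowel or diphthong it is simplified.
    """
    voiceless = PLACE[stop]
    return voiceless if after_long_nucleus else voiceless * 2


# *liǵʰ-náh₂- gives *likkō-, and *pug-náh₂- gives *fukkō-.
print(kluge_outcome("ǵʰ"))
print(kluge_outcome("g"))
```

The initial *f- of *fukkō- comes from the regular Grimm’s Law shift of *p, which this sketch does not model.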

A number of words in other Germanic languages may be related to fuck. One of them is Old Icelandic fjúka ‘to be tossed or driven by the wind’ < *feuka-; cf. also fjúk ‘drifting snowstorm’ (or, as one might put it in present-day English, a fucking blizzard). These words fit a recurrent morphological pattern observed by Kroonen (2012): Germanic iteratives with a voiceless geminate produced by Kluge’s Law often give rise to “de-iterativised” verbs in which the double stop is simplified if the full vocalism of the root (here, *eu rather than *u) is restored.

If the verb is really native (“Anglo-Saxon”), one would expect Old English *fuccian (3sg. *fuccaþ, pl. *fucciaþ, 1/3sg. preterite *fuccode, etc.). If these forms already had “impolite” connotations in Old English, their absence from the Old English literary corpus is understandable. We may be absolutely sure that *feortan (1/3 sg. pret. *feart, pret. pl. *furton, p.p. *forten) existed in Old English, since fart exists today (attested since about 1300, just like fuck) and has an impeccable Indo-European etymology, with cognates in several branches. Still, not a single one of these reconstructed Old English verb forms is actually documented (all we have is the scantily attested verbal noun feorting ‘fart(ing)’).

One has to remember that written records give us a strongly distorted picture of how people really spoke in the past. If you look at the frequency of fuck, fucking and fucker in written English over the last 200 years, you may get the impression that these words disappeared from English completely ca. 1820 and magically reappeared 140 years later. Even the first edition of the Oxford English Dictionary (whose ambition was to be exhaustive) pretended they didn’t exist. The volume that should have contained FUCK was published in 1900, and Queen Victoria was still alive.

Google books Ngram Viewer

Booth, Paul. 2015. Roger the incompetent copulator is outlawed, 28th September 1311. [Academia.edu].

Kroonen, Guus. 2012. Consonant gradation in the Germanic iterative verbs. In: Benedicte Nielsen Whitehead et al. (eds.), The sound of Indo-European: Phonetics, phonemics, and morphophonemics, Copenhagen: Museum Tusculanum Press, 263-290.

Lass, Roger. 1995. Four letters in search of an etymology. Diachronica 12(1): 99-111.

Liberman, Anatoly. 2007. An analytic dictionary of English etymology: An introduction. Minneapolis: University of Minnesota Press.

06 September 2015

A Jan’s Chance: The Fate of Innovations

Imagine that you start a linguistic innovation. One fine day you decide to replace the English word dog with a new, hitherto unused word — for example, jan. As of now, you will say, “I have to walk the jan”, “My jan’s name is Bruno”, and, “The jan is man’s best friend”. You will substitute jan for dog in set phrases such as “go to the jans” and “every jan has its day”. Jan would do its job neither better nor worse than dog. Both are arbitrary sound sequences (their pronunciation does not suggest what they mean); both are short and easily pronounceable. Dog has only one obvious advantage over jan: it is already an established, familiar, commonly used English word. There is no compelling reason why people should find it a good idea to abandon it just like that and learn to use a different word for the same concept. If you are really determined (and perhaps slightly nuts), you can try persuading your family and close friends to humour you and adopt your innovation when they are talking to you. You can bring up your children informing them that your family pet Bruno is a jan. But sooner or later they will find out that everybody else calls jans (including Bruno) dogs. Your experiment will almost certainly fail. Not because the word jan is useless, but because the function you’d like it to have is already carried out equally well by another word. It makes jan a “neutral” innovation — one that could play its role well enough but has no functional advantage over a preexisting competitor.

On the other hand, something similar to this thought-experiment really happened about one thousand years ago. The word docga (the Old English ancestor of dog), coined by an unknown innovator at an unknown date*), somehow became a widespread synonym of the established Old English word hund, and after a few centuries managed to replace it in the mental lexicon of every English-speaker of the time. Although its dethroned predecessor did not become completely obsolete, its frequency of use dropped by at least an order of magnitude, and it had to undergo narrow semantic specialisation in order to survive. Today, a hound is a special type of hunting dog, not just any dog in general. And if you look at other languages, you will occasionally see similar cases of lexical replacement. French chien and Italian cane go back to Latin canis, as expected, but Spanish perro is an innovation (about as mysterious as dog). It seems some new words for old things do catch on, albeit rarely. The chances are slim but apparently larger than zero.

A selfie with a jan (whose name is not Bruno)
A lexical innovation is more likely to succeed if it finds and conquers a functional niche not yet occupied by any other word. In this way it makes itself useful, which may give people a powerful incentive to adopt it. For example, the word selfie made its first recorded appearance in September 2002, in Australia (or rather in the Australian sector of cyberspace). Within the next few years it grew popular among (mostly young) English-speaking Internet users worldwide, slowly gaining the status of buzzword. Then it infected Facebook communities and its popularity soared to the zenith (as did the number of selfies published online). In 2013 Oxford Dictionaries declared it the Word of the Year.

How is it possible for an innovation to become “fixed” in a large speech community? How do the chances of fixation depend on the functional value of the innovation? What is that functional value? What happens to innovations that have enjoyed some success but haven’t yet reached fixation? This is what my next blog posts will be about.

*) Nobody knows for sure where Old English docga came from. My own modest etymological proposal can be found here.

06 November 2014

Second-Language Reciprocity

Here is a fascinating infographic presentation posted at Lovely Little Lexemes (hat tip to Mrs. B!). A curious (and probably unique) relationship can be observed between the United Kingdom and Poland: the most common second language in the UK is Polish, and the most common second language in Poland is English.


07 October 2014

Two Is Company, Four Is a Party

Neuter nouns with the suffix *-wr̥/*-w(e)n- are relatively rare in most branches of Indo-European. The only group where they can be found in great numbers is Anatolian. In Hittite, the suffix productively formed verbal nouns (names of actions), but there are also examples of nouns that had become independent lexical units, no longer bound to a particular verb paradigm. They had usually acquired a concrete meaning (referring to a thing or substance rather than an abstraction). One such noun is Hitt. pahhur/pahhuen ‘fire’, evidently an ancient word, preserved in many branches of the family and showing evidence of archaic vowel alternations and mobile stress: nom/acc.sg. *páh₂wr̥, gen.sg. *ph₂wéns, etc. It may be etymologically connected with the verb *pah₂- ‘guard, protect’, but it’s doubtful whether even the speakers of Hittite were still aware of any such connection: the semantic distance between the verb and its derivative was already too great.

Outside Anatolian, the suffix does not play any major role. The nouns that contain it are scattered remnants of a Proto-Indo-European pattern of word-formation. Their attestation is very uneven. They are quite well represented in Sanskrit and Greek, but only isolated examples are found elsewhere (the ‘fire’ word, which became part of Indo-European basic vocabulary sufficiently early, is exceptionally well attested). Here are a few typical *-wr̥/*-w(e)n- nouns evidently connected with known verb roots:

  1. *h₂árh₃-wr̥, gen. *h₂r̥h₃-wén-s ‘arable land’ (root *h₂arh₃- ‘till, plough’);
  2. *snéh₁-wr̥, gen. *sn̥h₁-wén-s ‘string, sinew’ (root *(s)neh₁- ‘spin, twist’);
  3. *séǵʰ-wr̥, gen. *sǵʰ-wén-s ‘steadfastness’ (root *seǵʰ- ‘conquer, take possession of; hold, own’);
  4. *h₁éd-wr̥, gen. *h₁d-wén-s ‘food’ (root *h₁ed- ‘eat’).

Their reflexes in the historically documented languages rarely display the whole range of vowel, consonant and stress variations, most of which were levelled out analogically in prehistoric times. Still, these alternations are reconstructible thanks to the fact that different fragments of the pattern have been preserved in different languages. They can be reassembled into a complete picture like the pieces of a jigsaw puzzle or the disarticulated skeleton of a fossil animal.

Got wheels?
A four-wheeled toy from the Cucuteni-Trypillian culture;
the early fourth millennium BC.
Neuters of this kind formed collectives by inserting a lengthened *ō into the suffix. The collective of a count noun denotes simply a set of objects (a collective plural), while the collective of a mass noun like ‘fire’ denotes a particular quantity or sample of the thing in question (‘a fire, a burning mass’). This became one of the derivational mechanisms by which Indo-European mass nouns could be transformed into count nouns. The accent was commonly shifted to the suffix in the process, causing the reduction of the root vowel: *páh₂wōr (collective) > *ph₂wṓr > *pwṓr (a countable neuter with its own case forms such as gen.sg. *p(h₂)un-és). Still later, the distinction between the original mass noun and its collective could be blurred and abandoned, the younger form ousting the older and serving in both functions (‘fire’ or ‘a fire’). The archaic Proto-Indo-European form *páh₂wr̥ is unambiguously preserved only in Anatolian, while the remaining Indo-European languages show reflexes of *pwṓr or its further modified descendants.

Now we can view the reconstruction *kʷét-wr̥ in this light. Supposing it was derived from our hypothetical verb root *kʷet- ‘group into pairs’, the original meaning of *kʷétwr̥ (as a nomen actionis) would be something like ‘pairing’, and its collective *kʷétwōr would mean ‘a particular result of pairing, a complete set organised into pairs’. In the Proto-Indo-European world, there were many “natural” sets of things conceptualised as consisting of two pairs: human hands and feet; fore and rear legs of animals; the wheels of a wagon; the four directions, whether cardinal (east and west, north and south) or relative (forward and backwards, left and right); paired organs of perception (two eyes and two ears). This could have provided sufficient motivation for treating ‘4’ as the prototypical case of an “even collective”. An interesting parallel can be seen in the “fraternal” numeral systems widespread in Amazonia. In the languages that employ them, the numeral ‘4’ is derived from an expression meaning ‘each has a brother/companion/spouse’. At a more primitive stage, preserved in the Dâw language, there are only three “exact” lexical numerals, ‘1’, ‘2’, and ‘3’. The values from 4 to 10 are described as ‘even’ (‘has a brother’) or ‘odd’ (‘has no brother’). The precise value can’t be expressed linguistically, but the words ‘even’ and ‘odd’ can be supplemented by clarifying hand gestures:
Dâw speakers indicate ‘four’ by holding the fingers of one hand separated into two blocks; for ‘five’, they add the thumb; for ‘six’, they place the second thumb against the first to make a third pair; and so on until for ‘ten’ all fingers are grouped into five pairs, the thumbs together.
[Epps 2006: 265]
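For readers who like to see such things spelled out, here is a rough sketch in Python of the Dâw-style tally as described above. It is only an illustration under simplifying assumptions: the function name daw_describe is made up, the English glosses ‘even’ and ‘odd’ stand in for the actual Dâw expressions (which are not quoted here), and the gesture is reduced to a count of finger pairs plus a possible unpaired finger.

```python
def daw_describe(n):
    """Describe n (1..10) the way the tally system above does.

    Returns (spoken, gesture):
      spoken  - an exact numeral for 1-3, otherwise just 'even' or 'odd'
      gesture - None for 1-3, otherwise (finger_pairs, unpaired_fingers)
    The glosses are English placeholders, not Dâw words.
    """
    if n in (1, 2, 3):
        # Only these three values have exact lexical numerals.
        return str(n), None
    if not 4 <= n <= 10:
        raise ValueError("the system described covers 1 to 10 only")
    pairs, unpaired = divmod(n, 2)
    # 'has a brother' = every finger is paired; 'has no brother' = one is left over.
    spoken = "even" if unpaired == 0 else "odd"
    return spoken, (pairs, unpaired)


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, daw_describe(n))
```

Note that 6 and 8 get the same spoken description; in this sketch, as in the account quoted above, only the gesture tells them apart.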
Once established as a concrete numeral (rather than part of an even-odd tally system), *kʷétwōr (or *kʷətwṓr) was interpreted as an ordinary neuter plural, and – like the numerals ‘1’, ‘2’, and ‘3’ – formally an adjective, inflected not only for case but also for gender. This resulted in the analogical creation of the animate plural in *-wor-es (and the periphrastic feminine ‘four females’, soon univerbated and phonetically mutilated in the process). Note that if the adjective had been formed directly from the verbal noun *kʷétwr̥/*kʷ(ə)twén-, its animate plural would probably have ended up as *kʷet-won-es. In addition to the Greek and Vedic words for ‘fat’, already discussed, compare Greek peîrar (gen. -atos) ‘boundary’ < *pér-wr̥/*pr̥-w(e)n- versus the Homeric adjective a-peírōn (animate) ‘boundless, endless’ < *n̥-per-wōn.

All this suggests that the word *kʷétwr̥ (coll. *kʷétwōr) was transparently derived from a verb root and adopted as a cardinal numeral at a rather late date, perhaps in “Core Indo-European” (the non-Anatolian part of the family) rather than in Proto-Indo-European proper. It is a well-known fact that Anatolian has a different word for ‘4’, *meju- (Hittite meu-/meyau-, Luwian māwa-). Since the jury is still out on whether Hittite kutruwa(n)- ‘witness’ has anything to do with the numeral ‘4’*), we should seriously consider the possibility that the familiar reconstruction *kʷetwores is not Proto-Indo-European at all but represents a “dialectal” innovation which replaced its older synonym in the common ancestor of Tocharian and the extant branches of the family.

If this were a journal article rather than a blog post, I would now be obliged to account for every puzzling irregularity in the branch-specific reflexes of *kʷetwores and its variants. I will spare my visitors such excruciating details, but if anyone is really interested in discussing them, welcome to the Comments section.

And now back to other matters – next time.

*) A witness in court could be denoted as ‘the fourth man’ (beside the two contracting parties and the judge).


Epps, Patience. 2006. “Growing a numeral system: The historical development of numerals in an Amazonian language family”. Diachronica 23(2): 259-288. [a preprint version is available here]