06 November 2014

Second-Language Reciprocity

Here is a fascinating infographic presentation posted at Lovely Little Lexemes (hat tip to Mrs. B!). A curious (and probably unique) relationship can be observed between the United Kingdom and Poland: the most common second language in the UK is Polish, and the most common second language in Poland is English.


07 October 2014

Two Is Company, Four Is a Party

Neuter nouns with the suffix *-wr̥/*-w(e)n- are relatively rare in most branches of Indo-European. The only group where they can be found in great numbers is Anatolian. In Hittite, the suffix productively formed verbal nouns (names of actions), but there are also examples of nouns that had become independent lexical units, no longer bound to a particular verb paradigm. They had usually acquired a concrete meaning (referring to a thing or substance rather than an abstraction). One such noun is Hitt. pahhur/pahhuen ‘fire’, evidently an ancient word, preserved in many branches of the family and showing evidence of archaic vowel alternations and mobile stress: nom/acc.sg. *páh₂wr̥, gen.sg. *ph₂wéns, etc. It may be etymologically connected with the verb *pah₂- ‘guard, protect’, but it is doubtful whether even the speakers of Hittite were still aware of any such connection: the semantic distance between the verb and its derivative was already too great.

Outside Anatolian, the suffix does not play any major role. The nouns that contain it are scattered remnants of a Proto-Indo-European pattern of word-formation. Their attestation is very uneven. They are quite well represented in Sanskrit and Greek, but only isolated examples are found elsewhere (the ‘fire’ word, which became part of Indo-European basic vocabulary sufficiently early, is exceptionally well attested). Here are a few typical *-wr̥/*-w(e)n- nouns evidently connected with known verb roots:

  1. *h₂árh₃-wr̥, gen. *h₂r̥h₃-wén-s ‘arable land’ (root *h₂arh₃- ‘till, plough’);
  2. *snéh₁-wr̥, gen. *sn̥h₁-wén-s ‘string, sinew’ (root *(s)neh₁- ‘spin, twist’);
  3. *séǵʰ-wr̥, gen. *sǵʰ-wén-s ‘steadfastness’ (root *seǵʰ- ‘conquer, take possession of; hold, own’);
  4. *h₁éd-wr̥, gen. *h₁d-wén-s ‘food’ (root *h₁ed- ‘eat’).
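The alternation in these four nouns is regular enough to be generated mechanically. Here is a toy Python sketch, assuming that each root is stored as the segments before and after its vowel slot, that the nom./acc.sg. has an accented full-grade root plus *-wr̥ while the gen.sg. has a zero-grade root plus accented *-wén-s, and that only *r, *l, *m, *n turn syllabic once the root vowel is gone. The helper names are hypothetical; the sketch reproduces the forms listed above but is not a general model of PIE phonology.

```python
RING = "\u0325"                   # combining ring below: marks a syllabic resonant
RESONANTS = {"r", "l", "m", "n"}

def zero_grade(before, after):
    """Join the root without its vowel; resonants left without a vowel become syllabic.

    Simplification: fine for these four roots, but a root with two resonants
    (e.g. *men-) would need a real syllabification rule.
    """
    segs = before + after
    return "".join(s + RING if s in RESONANTS else s for s in segs)

def heteroclitic(before, vowel, after):
    """Return (nom./acc.sg., gen.sg.) of a hypothetical *-wr̥/*-wén- neuter."""
    nom = "".join(before) + vowel + "".join(after) + "-wr" + RING
    gen = zero_grade(before, after) + "-wén-s"
    return nom, gen

ROOTS = {
    "arable land": (["h₂"], "á", ["r", "h₃"]),
    "string, sinew": (["s", "n"], "é", ["h₁"]),
    "steadfastness": (["s"], "é", ["ǵʰ"]),
    "food": (["h₁"], "é", ["d"]),
}

for gloss, root in ROOTS.items():
    nom, gen = heteroclitic(*root)
    print(f"*{nom}, gen. *{gen} '{gloss}'")
```

The point of storing the root as two halves around an empty slot is that both case forms come from the same object: the nominative fills the slot, the genitive leaves it empty and lets the syllabification step do the rest.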

Their reflexes in the historically documented languages rarely display the whole range of vowel, consonant and stress variations, most of which were levelled out analogically in prehistoric times. Still, these alternations are reconstructible thanks to the fact that different fragments of the pattern have been preserved in different languages. They can be reassembled into a complete picture like the pieces of a jigsaw puzzle or the disarticulated skeleton of a fossil animal.

Got wheels? A four-wheeled toy from the Cucuteni-Trypillian culture (early fourth millennium BC).

Neuters of this kind formed collectives by inserting a lengthened *ō into the suffix. The collective of a count noun denotes simply a set of objects (a collective plural), while the collective of a mass noun like ‘fire’ denotes a particular quantity or sample of the thing in question (‘a fire, a burning mass’). This became one of the derivational mechanisms by which Indo-European mass nouns could be transformed into count nouns. The accent was commonly shifted to the suffix in the process, causing the reduction of the root vowel: *páh₂wōr (collective) > *ph₂wṓr > *pwṓr (a countable neuter with its own case forms such as gen.sg. *p(h₂)un-és). Still later, the distinction between the original mass noun and its collective could be blurred and abandoned, the younger form ousting the older and serving in both functions (‘fire’ or ‘a fire’). The archaic Proto-Indo-European form *páh₂wr̥ is unambiguously preserved only in Anatolian, while the remaining Indo-European languages show reflexes of *pwṓr or its further modified descendants.

Now we can view the reconstruction *kʷét-wr̥ in this light. Supposing it was derived from our hypothetical verb root *kʷet- ‘group into pairs’, the original meaning of *kʷétwr̥ (as a nomen actionis) would be something like ‘pairing’, and its collective *kʷétwōr would mean ‘a particular result of pairing, a complete set organised into pairs’. In the Proto-Indo-European world, there were many “natural” sets of things conceptualised as consisting of two pairs: human hands and feet; fore and rear legs of animals; the wheels of a wagon; the four directions, whether cardinal (east and west, north and south) or relative (forward and backwards, left and right); paired organs of perception (two eyes and two ears). This could have provided sufficient motivation for treating ‘4’ as the prototypical case of an “even collective”. An interesting parallel can be seen in the “fraternal” numeral systems widespread in Amazonia. In the languages that employ them, the numeral ‘4’ is derived from an expression meaning ‘each has a brother/companion/spouse’. At a more primitive stage, preserved in the Dâw language, there are only three “exact” lexical numerals, ‘1’, ‘2’, and ‘3’. The values from 4 to 10 are described as ‘even’ (‘has a brother’) or ‘odd’ (‘has no brother’). The precise value can’t be expressed linguistically, but the words ‘even’ and ‘odd’ can be supplemented by clarifying hand gestures:
Dâw speakers indicate ‘four’ by holding the fingers of one hand separated into two blocks; for ‘five’, they add the thumb; for ‘six’, they place the second thumb against the first to make a third pair; and so on until for ‘ten’ all fingers are grouped into five pairs, the thumbs together.
[Epps 2006: 265]
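The tally principle described by Epps can be made concrete with a few lines of Python: a value from 4 to 10 splits into pairs plus at most one unpaired finger, and only that parity is named. This is an illustration built solely on the description quoted above; the function name and the English glosses are hypothetical, and the sketch does not model actual Dâw words or the exact hand shapes.

```python
def daw_style_label(n: int) -> str:
    """Classify n as in an even/odd tally with exact numerals only up to 3."""
    if 1 <= n <= 3:
        return f"exact numeral {n}"
    if 4 <= n <= 10:
        pairs, leftover = divmod(n, 2)   # e.g. 5 -> 2 pairs and 1 unpaired finger
        parity = "even ('has a brother')" if leftover == 0 else "odd ('has no brother')"
        suffix = " + 1" if leftover else ""
        # the word gives only the parity; the gesture supplies the number of pairs
        return f"{parity}; gesture shows {pairs} pair(s){suffix}"
    raise ValueError("the tally described covers only 1 to 10")

for n in range(1, 11):
    print(n, daw_style_label(n))
```

Note that the spoken label for 4, 6, 8 and 10 is identical in this model; only the hand gesture disambiguates, which is exactly what the quoted passage describes.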
Once established as a concrete numeral (rather than part of an even-odd tally system), *kʷétwōr (or *kʷətwṓr) was interpreted as an ordinary neuter plural, and – like the numerals ‘1’, ‘2’, and ‘3’ – formally an adjective, inflected not only for case but also for gender. This resulted in the analogical creation of the animate plural in *-wor-es (and the periphrastic feminine ‘four females’, soon univerbated and phonetically mutilated in the process). Note that if the adjective had been formed directly from the verbal noun *kʷétwr̥/*kʷ(ə)twén-, its animate plural would probably have ended up as *kʷet-won-es. In addition to the Greek and Vedic words for ‘fat’, already discussed, compare Greek peîrar (gen. -atos) ‘boundary’ < *pér-wr̥/*pr̥-w(e)n- versus the Homeric adjective a-peírōn (animate) ‘boundless, endless’ < *n̥-per-wōn.

All this suggests that the word *kʷétwr̥ (coll. *kʷétwōr) was transparently derived from a verb root and adopted as a cardinal numeral at a rather late date, perhaps in “Core Indo-European” (the non-Anatolian part of the family) rather than in Proto-Indo-European proper. It is a well-known fact that Anatolian has a different word for ‘4’, *meju- (Hittite meu-/meyau-, Luwian māwa-). Since the jury is still out on whether Hittite kutruwa(n)- ‘witness’ has anything to do with the numeral ‘4’*), we should seriously consider the possibility that the familiar reconstruction *kʷetwores is not Proto-Indo-European at all but represents a “dialectal” innovation which replaced its older synonym in the common ancestor of Tocharian and the extant branches of the family.

If this were a journal article rather than a blog post, I would now be obliged to account for every puzzling irregularity in the branch-specific reflexes of *kʷetwores and its variants. I will spare my visitors such excruciating details, but if anyone is really interested in discussing them, welcome to the Comments section.

And now back to other matters – next time.

*) A witness in court could be denoted as ‘the fourth man’ (beside the two contracting parties and the judge).


Patience Epps. 2006. “Growing a numeral system: The historical development of numerals in an Amazonian language family”. Diachronica 23(2): 259-288.

02 October 2014

Only Connect: The Strange Triangle

The Latin adjective triquetrus ‘triangular’ (neuter -um, feminine -a) is baffling. It’s obviously a compound, and it obviously contains the compositional form of the numeral ‘three’, *tri-. What else it contains is anything but obvious. Unfortunately, it’s the only specimen of its kind. The mysterious element -quetrus does not occur in any other Latin compound. It looks as if it could have something to do with quattuor ‘four’. When ‘four’ occurs as the first part of a compound, it has the shape quadru/i-. This form must somehow go back to *kʷətwr̥-, its metathetic variant *kʷətru-, or a hybrid combination of both, but the voicing of the *t is odd, not to say perverse, because its exact opposite, *dr > tr, was a regular change in the prehistory of Latin. The word ‘four’ is evidently such a fickle fellow that it just can’t resist breaking some established rules. For greater inconsistency, the adverbial numeral quater ‘four times’, which in other IE languages (and presumably in Latin as well) derives from *kʷ(e)twr̥-s ~ *kʷ(e)tru-s, shows no voicing. We see a voiced stop again, though, in the denominal verb quadrō ‘to square; put in order, arrange’ and a few related words such as quadra ‘square piece or slice, plinth, dining table, etc.’ and quadrātus ‘square (n. and adj.)’.

Some connections are impossible.
The second part of triquetrus doesn’t simply reflect *kʷetru- (or *kʷatru- < *kʷətru-), because the word is a second-declension o-stem, which means that its pre-form ended in *-tro- rather than *-tru-. The form *kʷetro- (or possibly *kʷatro-, since pre-Latin *a would have merged with *e in this position) does not otherwise occur as a variant of ‘4’ in Latin, but since we are dealing with a capricious word-family, it’s hard to rule out a connection. If it does mean ‘four’, however, why’s that? A triangle has three sides, it has three angles, but has it got three “fours”? It would not be strange if a word for the right angle had something to do with squares or rectangles, and therefore indirectly with the numeral ‘4’, but a triangle can have at most one right angle, certainly not as many as three (the Penrose tribar, shown on the right, would be an exception if it could exist in ordinary Euclidean space).

Can external cognates help? It’s tempting to compare triquetrus with Old English þrifeoþor (sometimes glossed as ‘triangular’ in reference books such as Bosworth and Toller’s Anglo-Saxon Dictionary). One of the commenters on this blog, Douglas G. Kilday, has suggested that the Old English word is a loan from (unattested) Gaulish *petros ‘corner’ (< *kʷetros), which became Germanic *feþra- after the operation of Grimm’s Law. This tantalising suggestion, however, can’t be correct. The word þrifeoþor appears in Old English glossaries (Corpus, Erfurt, and Épinal) three times (spelt ðrifeoðor, trifoedur, ðrifedor), and is translated into Latin as triquadrum. One might think that triquadrum is a distortion of triquetrum caused by “folk etymology” (the mistaken identification of the second part as the compositional form of ‘4’), but in fact it’s no such thing. Old English authors took the adjective triquadrus from Orosius, a Christian priest and scholar from the Roman province of Gallaecia (today’s Galicia, Spain). Orosius, active in the first decades of the 5th century, was the author of several enormously influential works, including Historiae Adversus Paganos, with a chapter on the geography of the world. Here is the relevant passage (Book 1, Chapter 2; emphasis added):
Maiores nostri orbem totius terrae, oceani limbo circumsaeptum, triquadrum statuere eiusque tres partes Asiam Europam et Africam uocauerunt, quamuis aliqui duas hoc est Asiam ac deinde Africam in Europam accipiendam putarint.
[Our elders made a threefold division of the world, which is surrounded on its periphery by the Ocean. Its three parts they named Asia, Europe, and Africa. Some authorities, however, have considered them to be two, that is, Asia, and Africa and Europe, grouping the last two as one continent.]
The epithet triquadrus refers to “the circle of all the earth” (orbis totius terrae = the world). Orosius certainly doesn’t mean that the Earth is a triangular circle, or that it has three corners. He means that the landmass of the world (as he knew it) is tripartite, divided by most ancient geographers into three continents (in this context, quadra means ‘part, division, area’, not literally a square). Anglo-Saxon translators coined a calque, mechanically replacing Latin quadr- with feoþor- < *kʷetwr̥-, the compositional form of Old English fēower ‘four’. Þrifeoþor was never intended to mean ‘triangular’. Its second member is the same feoþor- (= Late West Saxon fiþer-, fyþer-) that we find as the first element in numerous Old English compounds, e.g. fiþerfēte ‘four-footed’ (= Latin quadrupēs).

External support for *kʷetro- thus evaporates, but triquetrus still has to be explained somehow. I would suggest that its second element is a derivative of *kʷet- ‘join pairwise’ with the instrumental suffix *-tro-. When the suffix was added to a root ending in a dental stop, the last segment of the root was dropped already in Proto-Indo-European (this process is known as “the metron rule”). Thus we get *métrom (Greek métron ‘measure’) from *méd-trom (*med- ‘allot, mete out’), and *h₁étrom (Vedic átra- ‘nourishment’) from *h₁éd-trom (*h₁ed- ‘eat’). The noun *kʷétrom < *kʷet-trom would be ‘something that holds a pair of things together’, hence ‘joint, connection’ or the like. There were several Proto-Indo-European roots with similar meanings, and accordingly several nearly synonymous nouns for things like woodworking joints; joint itself comes (via French) from Latin iunctus ‘connected’ (the root here is *jeug-, as in yoke). Tri-quetrus (< *tri-kʷetro-) is built exactly like tri-angulus (a noun is used as the second member of a compound adjective without altering its stem class), and its etymological meaning is ‘having three connections (between pairs of sides)’.
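Stated this way, the metron rule is a simple string operation. Here is a minimal Python sketch, assuming that roots are given as plain transcription strings and that the only dental stops to check are t, d and dʰ (a simplification); the function name add_tro is hypothetical, and the sketch ignores accent, ablaut and any other sound change.

```python
# Root-final dental stops that this sketch deletes before *-tro- (a simplification).
DENTAL_STOPS = ("dʰ", "t", "d")   # test the two-character "dʰ" before plain "d"

def add_tro(root: str, suffix: str = "trom") -> str:
    """Attach the instrumental suffix, dropping a root-final dental stop (metron rule)."""
    for stop in DENTAL_STOPS:
        if root.endswith(stop):
            root = root[: -len(stop)]
            break
    return root + suffix

print(add_tro("méd"))    # métrom  (cf. Greek métron)
print(add_tro("h₁éd"))   # h₁étrom (cf. Vedic átra-)
print(add_tro("kʷét"))   # kʷétrom
```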

The next post, in which I shall return to the numeral ‘four’ itself, will be the last in this series.


29 September 2014

Forgotten Derivatives and Their Sexual Implications

What kind of noun is čët? What is its relationship to our hypothetical verb root? One cannot avoid asking such questions when proposing an etymology. A word is more than a root; it has a derivational history. If you add an affix to a word, you may alter both its lexical category and its meaning. We already know a good deal about morphological processes in the Indo-European languages, which means that we can distinguish plausible relationships between possibly related words from unlikely ones.

Let R be a root morpheme. In Proto-Indo-European (and in many of the languages descended from it), a root consists of a consonantal skeleton with a slot where a vowel can be inserted. For example, the verb root *{w_rǵ} ‘make, work’ is normally quoted in the form *werǵ-, called its e-grade, symbolised as R(e). Here, the slot is occupied by the vowel *e. The same root also forms an o-grade, R(o), realised as *worǵ-, and a zero grade, R(z), in which the vowel slot remains empty. In that case, the liquid *r, sandwiched between two other consonants, has to play the role of a syllable nucleus, and the root becomes phonetically *wr̥ǵ- (in the traditional Indo-Europeanist notation, a tiny subscript ring marks a syllabic consonant).
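The slot-filling model of the root can be sketched in a few lines of Python. This is a toy, assuming that a root is stored as two lists of segments on either side of the vowel slot, and that only *r, *l, *m, *n become syllabic (marked with the combining ring below, U+0325) when neither neighbour is a vowel; real PIE syllabification, including the behaviour of *w, *j and the laryngeals, is more complicated. The function name realise is hypothetical.

```python
RING_BELOW = "\u0325"              # combining ring below: r -> r̥
SYLLABIFIABLE = {"r", "l", "m", "n"}
VOWELS = {"a", "e", "o", "i", "u"}

def realise(before: list[str], after: list[str], grade: str) -> str:
    """Fill the vowel slot for grade 'e', 'o' or 'z' (zero) and syllabify."""
    vowel = {"e": ["e"], "o": ["o"], "z": []}[grade]
    segments = before + vowel + after
    out = []
    for i, seg in enumerate(segments):
        prev_is_vowel = i > 0 and segments[i - 1] in VOWELS
        next_is_vowel = i + 1 < len(segments) and segments[i + 1] in VOWELS
        if seg in SYLLABIFIABLE and not (prev_is_vowel or next_is_vowel):
            seg += RING_BELOW      # no vowel beside it: it becomes the nucleus
        out.append(seg)
    return "".join(out)

werg = (["w"], ["r", "ǵ"])         # *{w_rǵ} 'make, work'
for grade in ("e", "o", "z"):
    print(grade, realise(*werg, grade))
# prints: e werǵ / o worǵ / z wr̥ǵ (one per line)
```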

One of the largest and most productive classes of PIE nominals (nouns/adjectives) were the so-called thematic nouns (also known as o-stems). Their stem ended in the vowel *-o-, to which inflectional endings were attached. In the simplest case, the vowel was added directly to the root; in more complex cases it was part of a suffix (such as *-to-, *-no-, *-tero-, *-tlo-, etc.). Somewhat surprisingly, “simple thematic” nouns of the shape R(e)-o- were pretty rare in the protolanguage. The neuter action noun *wérǵ-o-m ‘work, activity’ is well supported by the agreement between Germanic *werka- (Old English weorc, German Werk) and Greek érgon; we also have Iranian (Avestan) varəza-, with the same stem (and meaning) but with masculine inflections. Very few such nouns, however, are truly old. More typically, the suffix *-o- was added to R(o), as in *wóiḱ-o- ‘house, dwelling’ (root *weiḱ- ‘enter, occupy’) and sometimes to R(z), as in *jug-ó- ‘yoke’ (root *jeug-, already mentioned in earlier posts).

Marc Greenberg (2001) doesn’t define the morphological status of his reconstruction *kʷet- (‘two’ > ‘pair, partner’). In some places in the article he treats it as if it were a root noun (with no suffixes), but the simplest form we actually find in Slavic is represented by Russ. čët (cf. dialectal Polish cot), which appears to reflect a thematic masculine noun *kʷet-o-s ‘even number’. How could it have originated? If *kʷet- was once a verb root (with the approximate meaning of ‘arrange in pairs, pair up’), *kʷet-o- makes sense as a kind of action noun that has acquired a resultative interpretation: by pairing objects together, you end up with an even number of them. (By the way, the verb root is not entirely conjectural: we can see it in Russian četáť ‘form pairs’.) The problem with *kʷet-o- is that it represents a rare type of stem, at least in terms of PIE morphology. Is it legitimate to posit it just like that?

On the other hand, *kʷet-o- needn’t go all the way back to PIE. The deverbal formation R(e)-o- has enjoyed increased productivity in Slavic. We even have doublets like R(o)-o- and R(e)-o-, where the o-grade variant is more conservative (and has more external cognates), while the e-grade seems to be a younger innovation (with a more restricted distribution). Thus, the root *tekʷ- ‘run, flow’ has produced Slavic *tekъ (as if from *tekʷ-o-s) ‘waterflow, leak, source’, which coexists with *tokъ (< *tokʷ-o-s) ‘stream, current, flux; (figuratively) course, sequence of events’. The former is an innovation directly connected with the Slavic verb *tekti ‘leak, flow’ (3sg. *tečetь < *tékʷ-e-ti), whereas the latter is a relict form which has drifted away from its etymological base, also semantically. Therefore, if *četъ is a relatively recent derivative of a Proto-Slavic verb, it wouldn’t be surprising if it had an o-grade cousin (possibly with a more “evolved” meaning).

As a matter of fact, Greenberg mentions *kotъ ‘offspring (of animals), litter’ and *kotiti (sę) ‘have young’ as possible members of the same word-family. A connection with the homophonous noun *kotъ ‘domestic cat’ (a European Wanderwort which spread with the introduction of cats) is folk-etymological: the verb may be used of cats, but also of mice, sheep, goats, roe deer, and a variety of other animals. It is used even in those Slavic languages that have a different word for ‘cat’ (e.g. Serbo-Croatian mačka). The verb *kotiti could be an “iterative/causative” built to the root *kʷet-. The structure of such secondary verbs is R(o)-éje/o- (the final vowel of the stem alternates depending on which conjugational ending is added). For example, the Slavic verb *gъnati (3sg. *ženetь) ‘drive on, drive away, rush’ has a corresponding o-grade iterative, *goniti (3sg. *gonitь) ‘chase, run after’. These forms ultimately reflect PIE *gʷʰén-/*gʷʰn- ‘slay, kill with blows’ (a root verb, somewhat restructured in Slavic) and its PIE iterative *gʷʰon-éje/o-. The verb *tekti (< *tékʷ-e/o-), mentioned above, forms a pair with the causative *točiti ‘cause to flow, (cause to) roll’ (< *tokʷ-éje/o-). Note also such English pairs as lie vs. lay, or sit vs. set, where the first member is a primary verb and the second is its causative (e.g. ‘lay’ = ‘cause to lie’).
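The template R(o)-éje/o- can likewise be mimicked as a string operation. A rough Python sketch, assuming the root is quoted in its e-grade and that replacing its first *e with *o is enough to obtain R(o); the helper name causative_stem is hypothetical, and the output is a bare PIE stem, not an inflected Slavic verb.

```python
def causative_stem(root_e_grade: str) -> str:
    """Turn an e-grade root such as 'tekʷ' into the o-grade stem 'tokʷ-éje/o-'."""
    if "e" not in root_e_grade:
        raise ValueError("expected a root quoted in the e-grade")
    o_grade = root_e_grade.replace("e", "o", 1)   # R(e) -> R(o): first vowel only
    return f"{o_grade}-éje/o-"

for root in ("gʷʰen", "tekʷ", "kʷet"):
    print(f"*{root}- -> *{causative_stem(root)}")
```

Run on the three roots discussed here, it yields *gʷʰon-éje/o-, *tokʷ-éje/o- and the hypothetical *kʷot-éje/o-.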

The consequences of forming a pair.
[source; © gerald reiner]
The stem *kʷot-éje/o-, originally with middle-voice inflections (whose function was taken over by the reflexive/reciprocal pronoun *sę in Slavic), would mean ‘form a couple (together)’, hence ‘mate, have sex’, and eventually ‘reproduce, have young’. If so, *kotъ ‘litter’ is not a senior synonym of *četъ (with a hard-to-explain change of meaning), but more likely a separate verbal noun back-formed from *kotiti sę (the consequence of mating), on the analogy of formally similar denominal verbs: *agniti sę ‘yean’, *teliti sę ‘calve’, *žerbiti sę ‘foal’.

The feminine *četa can hardly be a collective (at any rate in the meaning ‘pair’). Not only because it refers to just two things, but also because collectives in *-ah₂ to o-stem masculines are an archaic formation in Indo-European (as opposed to neuter collectives, co-opted as ordinary plurals of neuter nouns and adjectives), and *četъ is unlikely to be sufficiently ancient. But Indo-European *-(a)h₂ was not only a collective suffix and a marker of femininity; it was also employed to coin (formally feminine) abstracts, including action nouns. Quite a few deverbal masculines in Slavic (and more generally in Balto-Slavic) have feminine synonyms like *čarъ ~ *čara ‘sorcery, enchantment’ or *-tokъ ~ *-toka ‘flow, course’, *-sěkъ ~ *-sěka ‘cutting’ (in compounds). Note the familiar morphological formations represented by Greek tómos ‘slice’ (result of cutting) versus tomḗ ‘cut’ (an instance of cutting) – a nice parallel to *četъ (resultative) vs. *četa (an individual instance of pairing).

In the first post of this series I suggested that the stem *kʷet-w(o)r- was originally a deverbal neuter of a familiar type. Before I develop this idea, let me briefly suggest one other possible trace of the root *kʷet-: the second member of the Latin compound triquetrus ‘triangular’. The next post will be about it.


26 September 2014

Twos and Troops: Sifting the Evidence

Jakobson’s remark about a possible connection between Russian čët and četýre is discussed in Blažek (1999: 212-213) and especially in Greenberg (2001). Both authors mention earlier, more sketchy treatments of the problem, and they both add more Slavic material to the Russian words originally listed by Jakobson (which were čët, čëtka ‘even number’, četá ‘pair, union’, and čeť ‘quarter’). Blažek also notes an interesting potential cognate in Ossetian, an Indo-European language spoken in the north-central Caucasus (Ossetian is the only living descendant of the Northeast Iranian languages once spoken by the Scytho-Sarmatian inhabitants of the Eurasian steppe belt). The word in question is cæd ‘pair of oxen yoked together’, as if from Proto-Iranian *čatā (the Digor dialect of Ossetian has preserved a more conservative disyllabic form of the word, cædæ).

Blažek does not follow up Jakobson’s suggestion (presumably because he favours a different etymology of ‘four’, proposed by Schmid 1989; see pp. 213, 215, 331 in Blažek’s book). Greenberg, however, regards it as convincing and develops it further. Like Blažek, he considers the predominantly South Slavic *četa ‘troop, military unit’ (hence Serbo-Croatian Četnici ‘Chetniks’) to be part of the word-family of čët, and tries to explain the accentual difference between the end-stressed word četá (< *četa̍) in Russian and the root-stressed South Slavic forms – Bulgarian čéta, Serbian/Croatian čȅta, Slovene čẹ́ta (< *čèta) – in order to defend their common origin.

According to Greenberg, the word ‘four’ is derived from the root *kʷet- meaning ‘two’ extended with a multiplicative suffix, so that *kʷet-wor- means ‘(two) groups of two, twice two’. Greenberg also speculates that Proto-Indo-European *kʷotero- ‘which (of two)?’ (Greek póteros, English whether) contains the same root. This is hardly a good idea, since there is no compelling reason to question the straightforward standard analysis of *kʷo-tero- as the interrogative pronoun *kʷo- plus *-tero-, the IE suffix of binary contrast. The semantic gap between ‘two’ and ‘military unit’ is bridged by Greenberg as follows: Slavic *četa originated as the collective (in *-ah₂) of a word meaning ‘two, pair’, and ‘multitude of pairs’ evolved into ‘troop, group, band (of soldiers)’.

Arranged in pairs
There are serious problems with this derivation. First, (East/West) Slavic *četъ means ‘even number’, not ‘two’ or ‘pair’, while, on the contrary, the supposedly collective četá can mean ‘pair’ in Russian (beside some related meanings: ne četá, accompanied by a dative, means ‘not on a par with, superior to…’). What appears to be its exact cognate in Ossetian means ‘pair of oxen’, not, say, ‘herd of cattle’. Furthermore, while it’s true that the semantics of Russian četá covers not only ‘pair’ but also ‘troop’ (the latter attested already in Old Russian), we are probably dealing with a lexical merger between a native East Slavic word and a borrowing from Church Slavic (Czech četa ‘platoon’ is likewise a South Slavic loan, as are, ultimately, a number of similar “wandering words” in various neighbouring languages – Romanian, Hungarian, Albanian, and even Turkish). The non-attestation of intermediate meanings like ‘double column (of soldiers)’ makes it hard to justify the derivation of ‘troop’ from ‘pair’. Since the semantic difference is combined with a formal difference (conflicting accentuation), the etymology simply falls apart. It seems reasonable to conclude that the contrast between *četa̍ and *čèta is old and distinguishes two words of different origin (notwithstanding their merger in Russian). [See this comment, however.]

Jakobson’s final hypothetical relative of ‘four’, čeť ‘fourth part (of land), quarter’ (Old Russian četь ~ četъka), is in all likelihood a popular truncation of četverť (~ četvertka) < Proto-Slavic *četvьrtь ‘quarter’ < *kʷetwr̥-ti-, a noun corresponding to the widespread ordinal *kʷetwr̥-to- ‘fourth’. It is of course related to ‘four’, but in a rather trivial manner.

Etymological dictionaries often attempt to connect četa (in either sense) with the Slavic verb *čьtǫ (inf. *čisti) ‘count, reckon, read’, derived from PIE *kʷeit- ‘notice, recognise’. This verb has produced numerous derivatives in Slavic (e.g. *čislo ‘number’); some of them may be accidentally similar to members of the čët group both in form and in meaning, e.g. Old Czech čet ‘count, quantity’ (Modern Czech počet, with a prefix). Note, however, the gen.sg. čtu ~ čta. The disappearing root vowel reflects Proto-Slavic *ь (a reduced vowel continuing earlier short *i in the weak form of the root, *kʷit-). Despite their deceptive similarity, Russian četá (or čët) and Czech čet have different etymologies.

If we remove all the false or dubious cognates, we are left with just the initial material: *četъ ‘even number’, *četьnъ ‘even (of numbers)’ and *četa ‘pair’ – a word-family securely attested in East and West Slavic. We can safely add the Ossetian word (isolated in Iranian, as far as I know, but a perfect match for *četa, semantically and formally). There’s no evidence that the original meaning of the morpheme *čet- was ‘two’; nevertheless, it seems to have had something to do with arranging things in couples. Typologically, the Slavic “odd/even” terminology is parallel to what we have seen in Greek and Sanskrit, even if different lexical roots are involved. If so, one could expect *čet- to be semantically close to the familiar Indo-European roots *h₂ar- ‘fit together’ and *jeug- ‘yoke, connect’. I shall therefore tentatively assume that *čet- continues a verb root like *kʷet-, with the approximate meaning of ‘combine into pairs’. Let’s see if we can work from here – next time.


Václav Blažek. 1999. Numerals: Comparative–etymological analyses of numeral systems and their implications. Brno: Masarykova Univerzita v Brně.

Marc L. Greenberg. 2001. “Is Slavic četa an Indo-European archaism?”. International Journal of Slavic Linguistics and Poetics 43: 35-39.

21 September 2014

‘Four’: A Map

I didn’t plan it this way, but since the discussion of the etymology of ‘four’ has unfolded into a small saga in several acts, I have to organise it for convenience. Here is a map of the route:

  1. [Word of the Month: Proto-Indo-European ‘Four’]
  2. [Even and Odd]
  3. [The Name of the Game: Jakobson Reads Vasmer]
  4. [Twos and Troops: Sifting the Evidence]
  5. [Forgotten Derivatives and Their Sexual Implications]
  6. [Only Connect: The Strange Triangle]
  7. [Two Is Company, Four Is a Party]
The End

The Name of the Game: Jakobson Reads Vasmer

With the vast and reliable etymological material put into circulation by Vasmer, a number of new questions naturally arises. I should like to dwell on some particulars.
Roman Jakobson (1955) *) 

The Slavs played at “even and odd” too. In Polish the game used to be called cetno licho (or cetno i licho). The noun licho is still used as a mild euphemism for ‘devil’. Czego chcesz, do licha? means “What the heck do you want?” Polish also has the adjective lichy ‘poor, inferior, in bad shape’. Historically, licho is a neuter form of lichy, substantivised centuries ago, when the adjective had a wider range of meaning, including ‘mean, evil’; licho was therefore ‘something wicked’. The phrase cetno i licho survives on the fringes of literary Polish (people are at best vaguely aware that it refers to some old game of chance), but cetno no longer occurs on its own, and has no obvious relatives in the modern Polish lexicon.

The man who read Vasmer's dictionary
A few hundred years ago (most examples come from 16th-century texts) cetno and licho could mean, respectively, ‘even number’ and ‘odd number’. Though often contrasted with each other, they were not yet harnessed together into a fixed phrase. Cetnem (instr.sg.) or w cetnie (loc.sg.) meant ‘(occurring) in even numbers’; likewise lichem and w lichu ‘in odd numbers’. This usage has been completely forgotten.

Licho and lichy go back to Proto-Slavic *lixъ ‘strange, irregular, rogue’. In the modern Slavic languages it usually has pejorative connotations (‘bad, lacking, defective, lonely’, etc.); it can also mean ‘excessive, superfluous’. The meaning of Russian lixój, however, ranges – somewhat schizophrenically – from ‘bad, sinister, hard’ to ‘daring, valiant’ (the common ancestor was ‘extraordinary’, whether in a positive or a negative sense). Like semantically similar words in other languages (Greek perittós, English odd), *lixъ developed the arithmetical meaning of ‘odd’, which survives here and there in the Slavic branch. For example, in Czech liché číslo means ‘odd number’. As for its origin, *lixъ < *leikʷ-so-, from the widespread Proto-Indo-European root *leikʷ- ‘leave, abandon’.

So much for licho. Where does cetno come from? The Russian term for “even and odd” is čët i néčet. Čët means ‘even number’ (= čëtnoe čisló); néčet is its antonym. The adjective čëtnyj ‘even’ (of a number) is closely related to Polish cetno. Russian č normally corresponds to Polish cz, but some regional varieties of Polish have merged the affricate cz /tʂ/ with c /ts/ for centuries, and the standard language has borrowed a number of dialectal pronunciations of this kind.

On the combined evidence of Polish and East Slavic forms we can reconstruct Proto-Slavic *četъ (n.) and *četьnъ (adj.). Russian also has the noun četá ‘pair, couple’, which is formally and semantically close to them. There are several other Slavic words that might or might not be related to *četъ, but it’s wiser at this stage to exclude more difficult material so as to avoid the risk of contaminating a reliable set of cognates with spurious ones.

Back in the 1950s, as successive volumes of Max Vasmer’s monumental Russisches etymologisches Wörterbuch were published in Heidelberg, the great linguist Roman Jakobson (then at Harvard University) read the entire dictionary (I mean, actually read it like a novel, page by page), jotting down comments on entries that attracted his attention. Those marginalia were published as a journal article (see the reference below) and reprinted in Jakobson’s Selected Writings (Volume II: Word and Language). With regard to čët and its relatives, Jakobson remarked that they “seem to be archaic relics of the same word family as četýre” (the Russian reflex of the Indo-European numeral ‘four’). Having devoted one sentence to the matter, he moved on to the next entry that had caught his eye, čex ‘Czech’. The idea that čët and četýre are somehow related has been picked up by several other authors, but hitherto published attempts to analyse *kʷetwor- in this light have the usual flaws of “root etymologies”: too little attention to morphological details, and too much imaginative semantics. Nevertheless, I think Jakobson’s idea is worth salvaging, so I’ll review those previous attempts and try to see if I can do any better.

*) Roman Jakobson. 1955. “While reading Vasmer’s dictionary”. Word 11: 611-617.

[link to a digitalised Russian translation of Vasmer's dictionary]

[to be continued]

[back to the table of contents]

19 September 2014

Even and Odd

A brief interlude before we dissect *kʷetwor- for good:

This game is simple, and is played with marbles. One player holds in his hand a number of these toys, and demands of another whether that number is even or odd. If the guess is right, the guesser wins one; if wrong, he loses one.
Edgar Allan Poe, The Purloined Letter

Hellenistic ladies playing with astragaloi
(The British Museum)
This game is not only simple, but also as old as the hills. The Romans played it, and so did the Greeks and their gods. It was played with whatever could be concealed in one’s hand: astragaloi (“knucklebones”), nuts, coins, or pebbles. The game, in some ways ancestral to roulette, was called pār impār ‘equal-unequal’ in Latin. The Greeks called it artiasmós, or ártia ḕ perittà ‘even or odd’, or zugà ḕ ázuga ‘pairs or non-pairs’. It was so popular among the Greeks that a special verb, artíazō, was coined to mean ‘play at even and odd’.

Note that the Greek word for ‘even’ is ártios, meaning also ‘perfect, complete, exactly fitted’; it contains the highly productive Proto-Indo-European root *h₂ar- ‘fit together’, which has yielded, among many other Classical words of international currency, Greek harmonía ‘connection, framework’ (hence, figuratively, ‘agreement, order, harmony’) and Latin articulus ‘joint’. Similarly, Greek zugón ‘yoke’ (hence ‘pair’) < PIE *jugóm is derived from the root *jeug- ‘to yoke, connect’. The same root is the source of the Sanskrit words for ‘even’ (yugmán-) and ‘odd’ (a-yúj-, literally ‘having no yoke-fellow’). On the other hand, the core meaning of Greek perittós ~ perissós was ‘excessive, superfluous, extraordinary’. It seems that the notion of parity or “evenness” was understood as exhaustive divisibility into pairs rather than into two equal halves. To check if a number of things was even, you removed pair after pair until either nothing or a surplus of one was left. Such a remainder, or “odd man out”, was a kind of imperfection, marring the regularity of the number.

What has it got to do with the etymology of ‘four’? We shall see next time.


17 September 2014

Word of the Month: Proto-Indo-European ‘Four’

As promised in a comment to my previous blog post, I’m going to discuss an etymological question: the origin and structure of the numeral ‘4’ in the Indo-European languages.

The Proto-Indo-European numeral ‘four’ had several intriguing properties. It was the largest non-complex cardinal number that agreed grammatically with a noun it modified. Consequently, it was inflected for gender and case, like any ordinary adjective. It shared that property with the words for ‘one’, ‘two’ and ‘three’. For obvious semantic reasons, their declension was defective: ‘one’ was normally singular, ‘two’ was declined only in the dual number, and ‘three’ and ‘four’ only in the plural.

The fourth is for luck.
The basic forms of the numeral ‘4’ (as reconstructed in handbooks) were the animate “count plural” *kʷetwores and the inanimate (neuter) “collective plural” *kʷetwōr (from earlier *kʷetwor-h₂). There is some uncertainty about the accentuation of these forms: some reconstruct them with PIE stress on the first syllable, others on the second (the comparative evidence is not unambiguous).

Proto-Indo-European probably had no feminine gender as a formal category, but it had ways to express femininity in derivatives. Curiously, the numerals ‘three’ and ‘four’ seem to have had feminine forms, preserved only in Celtic and Indo-Iranian. They are reconstructed as *tisres ‘3’ and *kʷetesres ‘4’. The final *-es is the familiar nom.pl. ending of animate stems ending in a consonant, but the rest looks baffling. The suffix *-sr-, known also from the Anatolian languages, where it forms nouns denoting human females, probably reflects an archaic, almost completely abandoned word for ‘woman’ (*ser-), although the zero grade (absence of a vowel) in the nom.pl. is aberrant; the initial part (*ti-, *kʷete-) looks in either case like the badly mangled residue of an actual numeral stem. Given the normal rules of IE word-formation, we would expect something like *trí-sor-es and *kʷétwr̥-sor-es. The characteristic “defects” of the attested forms are nevertheless shared between Celtic and Indo-Iranian; they must therefore go back at least to their most recent common ancestor. Such distortions are not quite unexpected in compound words, which commonly lose their transparency through irregular simplification.

Let’s ask a stupid question: what is *kʷetwores/*kʷetwōr the plural of? I mean, if it’s really an adjective, perhaps it had an older “etymological” meaning before it became part of the numeral system? If we strip off the inflections, what remains is the stem *kʷetwor-/*kʷetwr- (the second vowel is lost in so-called “weak” case-forms like loc.pl. *kʷetwr̥sú). This “bare” stem also occurs as a compositional variant of ‘four’, sometimes with the final segments reversed (*kʷetwr̥- ~ *kʷetru-).

An Indo-European stem with four consonants and two vowel slots must have been morphologically complex at some point. The most likely division into morphemes would be *kʷet-w(o)r-. The *-w(o)r- part looks familiar. A suffix of this form is found in a number of Indo-European nouns, typically inanimate abstracts derived from verb roots. We find it, for example, in the PIE word for ‘fire’, *páh₂wr̥, which is not obviously deverbal (though a connection with *pah₂- ‘guard’ is conceivable). There is also at least one evidently archaic example of an adjective built in the same manner. Beside the inanimate noun *p(e)iH-wr̥ ‘fat’ (Greek pĩar) we find an adjective meaning ‘fat, fertile’ whose masculine form was *p(e)iH-won-; its neuter must originally have been identical with the noun, and a suffixed feminine *piH-wer-ih₂ was added to the paradigm as the IE gender system developed a three-way contrast (I use the cover symbol *H here for a laryngeal whose “index” is hard to determine). Note the consonant alternation in the suffix: it’s characteristic of an entire class of neuters, so-called r/n-stems. They show *-r in the nom./acc. singular and collective (e.g. *páh₂-wōr, the collective of the ‘fire’ word), but *-n- in the remaining cases (like the gen.sg. *ph₂-wén-s). The variant *-n- is also expected in related animate forms, with the strange exception of *-r- occurring before the femininising suffix *-ih₂, as illustrated by the preserved forms of the adjective ‘fat’. The striking agreement between Greek píōn (m.), píeira (f.) and Vedic pī́van- (m.), pī́varī (f.) shows that this unusual alternation is inherited.

To continue our Gedankenexperiment: so far we haven’t identified the underlying root *kʷet-. Still, if we tentatively assume that it was indeed a verb root, some predictions can be made: beside the hypothetical abstract noun *kʷét-wr̥, possible derivatives include an adjective of exactly the same form in the inanimate gender. Its expected animate form would be *kʷét-won- (nom.sg. *kʷétwō, nom.pl. *kʷétwones). The neuter noun/adjective would form the collective plural *kʷétwōr. Of these forms, two can be regarded as attested: *kʷétwōr is a possible reconstruction of the neuter numeral, and *kʷétwr̥ is its uninflected compositional variant. Conspicuous by their absence are any forms with *n instead of *r. Why, for example, is the animate (masculine) plural *kʷetwores rather than *kʷetwones? The most natural explanation is that this particular plural isn’t old enough to participate in the *-n/r- alternation.

Let’s imagine that *kʷétwr̥ was originally a neuter noun (without an accompanying adjective). Whatever its etymological meaning (let’s symbolise it ‘X’), the collective plural *kʷétwōr (meaning ‘a set of instances of X’) came to be employed as a cardinal number, at first uninflected (like ‘five’, ‘six’, etc.), but eventually attracted into the adjective system, presumably on the analogy of the already adjectival numerals ‘two’ and ‘three’. In the early history of Indo-European the accent was often shifted to the second syllable in such collectives; hence the by-form *kʷ(e)twṓr, in which the first vowel could be phonetically reduced (*kʷətwṓr) or lost altogether. Non-initial stress is reflected in Germanic (cf. Gothic fidwor, displaying the voicing effect of Verner’s Law), and vowel reduction accounts for Latin quattuor (with Lat. /a/ from *ə).

When *kʷétwōr ~ *kʷ(e)twṓr came to be interpreted (and declined) as a neuter plural adjective, an animate counterpart was analogically supplied by adding appropriate inflectional endings to the stem *kʷétwor- or *kʷ(e)twór-. Since its origin as an n/r-noun had been forgotten by that time, PIE-speakers had no reason to make their life more difficult by reviving an ancient alternation. The only case-forms requiring distinctly animate inflections (different from neuter ones) were the nom.pl. (*-es) and acc.pl. (*-n̥s from earlier *-m̥-s). The unsettled stress pattern (*kʷétwores ~ *kʷ(e)twóres) may well be an old feature of the numeral ‘four’.

Some details require more attention, but first I would like to address the question left unanswered above: what exactly was *kʷet-, the root supposedly underlying the derivation of the numeral ‘four’? I will try to suggest an answer in the next post (later this week, I hope), so please stay tuned.


11 August 2014

De-Extinction: The Mammoth Walks Again

A word has a definable function if speakers regularly select it to convey a certain meaning (or, more generally, to achieve a certain communicative effect). As long as they have a reason to do so, a word remains useful and there is a good chance that it will stay in circulation. A word which is used frequently will be transmitted to new users more reliably, especially if its function is easy to infer from the way it is used. Low-frequency words are prone both to semantic change and to lexical replacement: new speakers may quite accidentally fail to hear them used, or encounter them only occasionally in a context which doesn’t quite clarify their meaning. Word death is mostly due to accidental transmission breaks happening too often.

If historical linguists had any say in the matter, I’m sure that time-honoured words, priceless as evidence of language history, would enjoy special protection, and every care would be taken that they should be saved for posterity (no matter if we still need them for everyday communication). Alas, linguists have no such authority. It’s common usage plus quirks of fate that ultimately decide whether a word will die or survive.

A word already dead in spoken language may occasionally come back to life. Talking of fate and its quirks – here is one well-known case.

The descendant of the Old English noun wyrd ‘fate, destiny, fortune’ was practically extinct by the sixteenth century, ousted by its Latinate synonyms. It lingered on in Scotland long enough to be used by John Bellenden (in the 1530s) in his Scots translation of a Latin version of the story of King Duncan and Macbeth (published by Hector Boece a decade earlier). The three prophesying fairies which appear in the narrative, thought to be the supernatural “Fates” who control human destiny (comparable to the Greek Μοῖραι, the Roman Parcae or the Scandinavian Nornir), are called weird sisteris (literally = ‘the Fate Sisters’) by Bellenden. He didn’t invent the phrase; it can be found in earlier Scots sources referring to the three classical Fates.

The story told by Boece and translated by Bellenden was in turn adapted by the English chronicler Raphael Holinshed and his collaborators, and thus the weird sisters found their way into The Chronicles of England, Scotland, and Ireland. The second edition of that work, published in 1587, was Shakespeare’s source for the plot of Macbeth. Some confusion must have taken place in the process. Shakespeare turned Holinshed’s “goddesses of destinie, or else some nymphs or feiries” into repulsive old hags with “choppy” fingers, skinny lips, and even beards to boot. Shakespeare and the compositors of the First Folio (1623) were apparently puzzled by the unfamiliar word weird. The original phrase underwent deformation into weyward or weyard sisters; the first word was possibly taken for an adjective similar to wayward, and pronounced as two syllables (although exactly how Shakespeare understood it and whether he actually confused it with wayward are moot questions). Later editors “restored” the spelling used by Holinshed and his Scots source (but not by Shakespeare), bringing back the form of weird, but not its original function. Like an Egyptian mummy from old horror films, weird rose from its tomb and strutted about, half-resurrected but not sure what to do in the modern world.

Nineteenth-century readers and playgoers deduced the meaning of weird from what they saw on the stage. They were shown three “Weird Sisters” portrayed as grotesquely hideous witches, bizarre and unearthly. “Ah,” thought the audience, “so that’s what they mean by ‘weird’.” Before long, weird became a popular adjective to describe anything strange or uncanny. Crucially for its further spread, it managed to colonise the colloquial register of English, in which there is a constant demand for new emotionally coloured words to replace those that have become hackneyed. A function was apparently there, waiting for a suitable word to express it. What remains of Old English wyrd is just the form, like an empty shell, co-opted for completely new grammatical and semantic uses. Those who would like to clone the mammoth should draw a lesson from it.

The life restoration of a 17th-c. word.
De-extinction can happen in various ways. The word twat, gone obsolete for about a century, was excavated by Robert Browning and mistaken for something entirely innocent (the context was again not clear enough and could suggest a nun’s headgear; see the discussion at Language Log). Browning’s naive mistake was later exposed by the Wise Clerks of Oxenford, much to the delight of those who heard of it, and the seventeenth-century four-letter word came back to life, regaining even its high obscenity index. It’s probably far more frequent now (especially in British English) than it ever was in its former heyday. Please consider this cautionary example before you de-extinct the thylacine.

My personal favourites among the words that should have been saved (but were not) are old kinship terms. Proto-Indo-European had a large and complicated system of names for different kinds of family relations. Many of them were still used in Old English, but only a handful have survived till now – those referring to the closest biological relationships (mother, father, sister, brother, daughter, son, all of them with impeccable PIE pedigrees, even if sister was touched by Old Norse influence). A few have been replaced by terms borrowed from French (aunt, uncle, niece, nephew), also traceable back to PIE, but acquired second-hand. Note, by the way, that while Old English ēom, for example, referred specifically to a maternal uncle in the strictest sense (the brother of one’s mother), an uncle could be maternal or paternal already in Middle English. Furthermore, uncle may refer to the husband of one’s aunt (again maternal or paternal) – not even a blood relation. We are dealing here with a new system replacing an older one, not just a series of lexical replacements.

The boringly transparent “in-law” terms have replaced the Old English words for affinity relationships. Not a single one has survived. All that mattered in the late Middle Ages was the degree of affinity as defined by the Code of Canon Law (which prohibited sex and marriage between some people so related), and the “in-law” terminology made that explicit. Gone are such beautiful Old English relics as snoru ‘daughter-in-law’ (from PIE *snusós) and tācor ‘the brother of one’s husband’ (note that only a woman could have one) – one of the four kinds of brotherhood-in-law possible today. The latter word has relatives in Indo-Iranian, Balto-Slavic, Greek, Latin, and Armenian. The PIE stem is usually reconstructed as *dah₂iwér-, but the details of its development in some branches of the family (including PGmc. *taikuraz and its historical reflexes) are not quite clear, making it especially interesting.

Couldn’t we revive those forgotten kinship terms, just for fun? Well, I don’t think the two just mentioned would have much chance of success. Had snoru developed regularly, it would be *snore today, and I doubt if any woman would find such awkward homonymy acceptable. Tācor, in turn, would have become Modern English *toker. Unfortunately, such a form (orthographic and phonetic) is no longer up for grabs. We find it in the lyrics of “The Joker” (by the Steve Miller Band):
I’m a joker
I’m a smoker
I’m a midnight toker...
and it doesn’t mean an Anglo-Saxon brother-in-law.

05 August 2014

Le Mot Juste

In the title of the preceding post I mentioned the “word” zyzzyva. If you look it up, you will learn that it refers to a genus of weevil from South America. Zyzzyva is its official “Latin” name. Of course only professionals who study neotropical beetles for a living have any real reason to talk about zyzzyvas from time to time. The name doesn’t even seem to have a real etymology. The entomologist who named the tiny thing probably did so with jocular intent: he wanted to make sure that the name would be the last one on any alphabetical list of insects. One accidental side-effect of his practical joke was that it made Zyzzyva far more prominent than its entomological status would warrant. I’m sure I would never have heard of it if its name began with F, or R, or even Za… rather than Zy….

Zyzzyva belongs to Curculionidae, a family with some 50,000 species grouped into more than 4,600 genera. These numbers may seem large, but they refer only to taxonomic units described and named so far; the actual diversity of the family must be much, much greater. Non-specialists, however, have (at best) only the faintest idea of what a weevil is, what it may look like, and how it differs from other “bugs”; they wouldn’t be able to recognise a particular genus or species if their life depended on it. Very few people have ever seen a zyzzyva. I haven’t been able to find an image of one using Google. A photograph widely circulated online and purported to feature a zyzzyva (see below) in fact shows a different weevil, not even from the same family.

Do not trust Google blindly.

The case of zyzzyva is instructive because it shows how a word-like entity can spread quite virally and remain in circulation despite having practically no communicative function. The only reason why people might want to use it is its curiosity value. It looks improbable in an amusing way, and one may like the sound of it. The fact that it refers to a real animal is quite irrelevant.

My God ‒ Roget’s Thesaurus!
(by Ronald Searle)
The vocabulary we really need for effective communication is quite simple in comparison with the complexity of the world around us. We have words for a few hundred core concepts and for several thousand peripheral ones. A well-educated person’s active vocabulary contains fewer words than there are species of weevil. Nevertheless, we manage to make ourselves understood, and even to add stylistic nuances to plain communication (as when we decide, perhaps after consulting Roget’s Thesaurus, that something was “calamitous” rather than merely “disastrous”). In terms of brain organisation, memory is cheap, but not so cheap that we should want to have a separate word for every possible category of object, every imaginable shape or colour, and every kind of activity. It’s good to be precise, but not at any cost. After all, if there is no single word to convey exactly what we want to express, we can resort to combinations of words (“three hundred and seventy-one”) or circumlocutions (“those little things with the sort of raffia work base that has an attachment”). Different cultures have very different priorities when it comes to naming things. The appearance of a new kind of object may inspire lexical innovation (the coining of a new word) or semantic change (adding a new meaning to an already existing word), but words may also become forgotten when they are no longer needed to transmit culturally important meanings. I will discuss some characteristic examples next time.

01 August 2014

From Aardvark to Zyzzyva: Words and Their Functions

Sick of considering a hammer? All right, as I was saying...

Are there any linguistic units whose functionality is hard to doubt? I think everybody will agree that words are functional. Without attempting to formulate a precise definition, let’s assume that a “word” is a recurrent linguistic element which can be uttered on its own, and which is stored in a speaker’s memory as a bundle of phonetic, morphological, syntactic and semantic properties.

A word has a phonetic shape: it consists of a string of segments (“speech sounds”), which may be accompanied by a specific pattern of voice pitch and intensity (tone, accent or stress). Every language has a limited inventory of phonetic building-blocks which can be used to form words, and imposes certain constraints on their permissible combinations. In this way, “legal” word shapes are defined for a given language. Different words, with different histories and meanings, may accidentally acquire the same pronunciation. Quite frequently one word has two or more acceptable phonetic variants, or is pronounced differently in different varieties (“accents”) of the same language.

A word can be said to have internal morphological structure if speakers are aware that it consists of smaller meaningful units (morphemes). For example the abstract noun functionality is derived from the adjective functional by combining it with the noun-forming morpheme -ity (which does not occur in isolation). Since there are many parallel formations in English (personal + -ity, formal + -ity, cordial + -ity, etc.), speakers can figure out their structure without much difficulty despite the fact that the suffix has a distorting effect on the base to which it is added (for example, it forces primary stress to fall on the immediately preceding syllable). Functional itself happens to be internally complex: it can be decomposed into the noun function plus the adjective-forming suffix -al.

A word has syntactic properties which determine the manner in which it cooperates with other words to form longer and more complex structures. For example, horse is an English noun. It can be combined with other lexical items into a noun phrase (e.g. the big black horse), which in turn may play certain roles in a sentence, for example the role of its subject (The big black horse jumped over a hurdle). Like other countable nouns in English, horse can be inflected for number (sg. horse, pl. horses).

Finally, words are carriers of information. The vast majority of them have so-called lexical meaning: they point to something in the external world (classes of object, qualities, actions, abstract concepts). There is also a limited inventory of “function words” (prepositions, pronouns, articles, conjunctions and the like), which specialise in expressing syntactic relations within the utterance and don’t necessarily have any non-linguistic reference. Words are combined into longer utterances which convey complex messages. Their content is determined not only by the individual word meanings and the sentence structure, but also by the situational context in which members of a speech community talk to one another, their shared knowledge, presuppositions, etc. One word may have several core meanings plus a number of peripheral and figurative ones, each with numerous “shades of meaning”; quite often the same or similar meaning can be expressed by different words.

Oxford Dictionaries’ word of the year 2013

Let us identify the function of a word with the semantic and/or grammatical role it plays in communication. It’s easy to build a string of sounds which satisfies the well-formedness conditions of a given language, and give it a plausible-looking orthographic shape. For example, whasket, clenge, crive, borm and scrough (pronounced to rhyme with cow) are all possible English word forms. You could compose an imitation of an English sentence using them (plus a few function words): She scroughed and crove along the borm, clenging her whaskets (crove is of course the past tense of crive). Such pseudowords, however, don’t mean anything to anybody (or at any rate have no meaning agreed upon by any substantial speech community); therefore we don’t call them words.

A word is functional by definition: in principle, no word is completely useless – otherwise it wouldn’t be a real word. It’s clear, however, that some words are more useful than others. It would be difficult to communicate without such household words as time, out, money, stand, white or three, whereas fancier and more specialised items like oligarch, mulch, incurious or tailgate are far less indispensable to the general public (though useful in some special situations). You can live a long and happy life without ever rummaging through the English word-hoard to the very bottom for rare lexical gems such as coccineous, exantlate, spurrier, ululatory or xenophilia, even if you happen to be vaguely aware of their existence. Indeed, thousands of words qualified as obsolescent or obsolete haunt the pages of the Oxford English Dictionary, unheard of by anyone save professional word-hunters. Practically nobody has used them for decades (with the possible exception of Scrabble addicts). Do they still have a function or should they be considered linguistic junk – ex-words (’oose metabolic processes are now ’istory)? The functions of a given word may change in time; different words may compete for the same function. To what extent do we, the language users, control such processes? If words are tools of communication, do we shape them and adapt them to our needs? What role do our individual preferences play? What other forces affect the functionality of words? I’ll broach these questions in the posts to come.

21 July 2014

A Great Indo-European *Bʰlog

It is vacation time, so I finally have a little time to kill. I solemnly declare I will start posting again soon. Meanwhile, let me recommend a terrific new blog devoted to Indo-European linguistics, culture, and mythology. As you may know, there is some disagreement among historical linguists as to quite a few details of the Proto-Indo-European reconstruction, and consequently there are several different dialects of Proto-Indo-European. One of the reasons why I like The *Bʰlog is that I use practically the same dialect as its author, so there is nothing I can disagree with in matters of pronunciation and word inflection. I can read it without having to shake my head disapprovingly every now and then as I mentally correct the reconstructions. But of course there are lots of other reasons to like it, as you will find out for yourselves if you visit it:

The *Bʰlog