Language Evolution: Word of the Month: Proto-Indo-European ‘Four’

17 September 2014

As promised in a comment to my previous blog post, I’m going to discuss an etymological question: the origin and structure of the numeral ‘4’ in the Indo-European languages.

The Proto-Indo-European numeral ‘four’ had several intriguing properties. It was the largest non-complex cardinal number that agreed grammatically with a noun it modified. Consequently, it was inflected for gender and case, like any ordinary adjective. It shared that property with the words for ‘one’, ‘two’ and ‘three’. For obvious semantic reasons, their declension was defective: ‘one’ was normally singular, ‘two’ was declined only in the dual number, and ‘three’ and ‘four’ only in the plural.

The fourth is for luck.

The basic forms of the numeral ‘4’ (as reconstructed in handbooks) were the animate “count plural” *kʷetwores and the inanimate (neuter) “collective plural” *kʷetwōr (from earlier *kʷetwor-h₂). There is some uncertainty about the accentuation of these forms: some reconstruct them with PIE stress on the first syllable, others on the second (the comparative evidence is not unambiguous).

Proto-Indo-European probably had no feminine gender as a formal category, but it had ways to express femininity in derivatives. Curiously, the numerals ‘three’ and ‘four’ seem to have had feminine forms, preserved only in Celtic and Indo-Iranian. They are reconstructed as *tisres ‘3’ and *kʷetesres ‘4’. The final *-es is the familiar nom.pl. ending of animate stems ending in a consonant, but the rest looks baffling. The suffix *-sr-, known also from the Anatolian languages, where it forms nouns denoting human females, probably reflects an archaic, almost completely abandoned word for ‘woman’ (*ser-), although the zero grade (absence of a vowel) in the nom.pl. is aberrant; the initial part (*ti-, *kʷete-) looks in either case like the badly mangled residue of an actual numeral stem. Given the normal rules of IE word-formation, we would expect something like *trí-sor-es and *kʷétwr̥-sor-es. The characteristic “defects” of the attested forms are nevertheless shared between Celtic and Indo-Iranian; they must therefore go back at least to their most recent common ancestor. Such distortions are not quite unexpected in compound words, which commonly lose their transparency through irregular simplification.

Let’s ask a stupid question: what is *kʷetwores/*kʷetwōr the plural of? I mean, if it’s really an adjective, perhaps it had an older “etymological” meaning before it became part of the numeral system? If we strip off the inflections, what remains is the stem *kʷetwor-/*kʷetwr- (the second vowel is lost in so-called “weak” case-forms like loc.pl. *kʷetwr̥sú). This “bare” stem also occurs as a compositional variant of ‘four’, sometimes with the final segments reversed (*kʷetwr̥- ~ *kʷetru-).

An Indo-European stem with four consonants and two vowel slots must have been morphologically complex at some point. The most likely division into morphemes would be *kʷet-w(o)r-. The *-w(o)r- part looks familiar. A suffix of this form is found in a number of Indo-European nouns, typically inanimate abstracts derived from verb roots. We also find it e.g. in the PIE word for ‘fire’, *páh₂wr̥, which is not obviously deverbal (though a connection with *pah₂- ‘guard’ is conceivable). We also have at least one evidently archaic example of an adjective built in the same manner. Beside the inanimate noun *p(e)iH-wr̥ ‘fat’ (Greek pĩar) we find an adjective meaning ‘fat, fertile’ whose masculine form was *p(e)iH-won-; its neuter must have been originally identical with the noun, and a suffixed feminine *piH-wer-ih₂ was added to the paradigm as the IE gender system developed a three-way contrast (I use the cover symbol *H here for a laryngeal whose “index” is hard to determine). Note the consonant alternation in the suffix: it’s characteristic of an entire class of neuters, so-called r/n-stems. They show *-r in the nom./acc. singular and collective (e.g. *páh₂-wōr, the collective of the ‘fire’ word), but *-n- in the remaining cases (like the gen.sg. *ph₂-wén-s). The variant *-n- is also expected in related animate forms, with the strange exception of *-r- occurring before the femininising suffix *-ih₂, as illustrated by the preserved forms of the adjective ‘fat’. The striking agreement between Greek píōn (m.), píeira (f.) and Vedic pī́van- (m.), pī́varī (f.) shows that this unusual alternation is inherited.

To continue our Gedankenexperiment: so far we haven’t identified the underlying root *kʷet-. Still, if we tentatively assume that it was indeed a verb root, some predictions can be made: beside the hypothetical abstract noun *kʷét-wr̥, possible derivatives include an adjective of exactly the same form in the inanimate gender. Its expected animate form would be *kʷét-won- (nom.sg. *kʷétwō, nom.pl. *kʷétwones). The neuter noun/adjective would form the collective plural *kʷétwōr. Of these forms, two can be regarded as attested: *kʷétwōr is a possible reconstruction of the neuter numeral, and *kʷétwr̥ is its uninflected compositional variant. Conspicuous by their absence are any forms with *n instead of *r. Why, for example, is the animate (masculine) plural *kʷetwores rather than *kʷetwones? The most natural explanation is that this particular plural isn’t old enough to participate in the *-n/r- alternation.
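For readers who like to see such predictions laid out mechanically, here is a minimal sketch in Python. It only concatenates the hypothetical root *kʷét- with the suffix variants named in the paragraph above and marks which results the post treats as attested. The slot names and the helper `predicted_forms` are my own illustrative labels, not standard terminology, and nothing here models real ablaut or accent rules.

```python
# Hypothetical root assumed in the thought experiment above.
ROOT = "kʷét"

# Suffix variants of the r/n-stem pattern, exactly as cited in the text.
# The slot names are illustrative labels only.
SUFFIXES = {
    "neuter_sg": "wr̥",            # abstract noun / neuter adjective
    "neuter_collective": "wōr",    # collective plural
    "animate_stem": "won-",        # animate stem with *-n-
    "animate_nom_sg": "wō",
    "animate_nom_pl": "wones",
}

# Slots whose predicted form the post regards as attested in 'four'.
ATTESTED_SLOTS = {"neuter_sg", "neuter_collective"}


def predicted_forms(root):
    """Return the predicted form for every slot, with the reconstruction asterisk."""
    return {slot: f"*{root}{suffix}" for slot, suffix in SUFFIXES.items()}


forms = predicted_forms(ROOT)
unattested = sorted(slot for slot in forms if slot not in ATTESTED_SLOTS)

for slot, form in forms.items():
    status = "attested" if slot in ATTESTED_SLOTS else "not attested"
    print(f"{slot:18} {form:12} {status}")
```

The listing makes the gap easy to see: every slot left without an attested form belongs to the animate n-stem side of the paradigm.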

Let’s imagine that *kʷétwr̥ was originally a neuter noun (without an accompanying adjective). Whatever its etymological meaning (let’s symbolise it ‘X’), the collective plural *kʷétwōr (meaning ‘a set of instances of X’) came to be employed as a cardinal number, at first uninflected (like ‘five’, ‘six’, etc.), but eventually attracted into the adjective system, presumably on the analogy of the already adjectival numerals ‘two’ and ‘three’. In the early history of Indo-European the accent was often shifted to the second syllable in such collectives; hence the by-form *kʷ(e)twṓr, in which the first vowel could be phonetically reduced (*kʷətwṓr) or lost altogether. Non-initial stress is reflected in Germanic (cf. Gothic fidwor, displaying the voicing effect of Verner’s Law), and vowel reduction accounts for Latin quattuor (with Lat. /a/ from *ə).

When *kʷétwōr ~ *kʷ(e)twṓr came to be interpreted (and declined) as a neuter plural adjective, an animate counterpart was analogically supplied by adding appropriate inflectional endings to the stem *kʷétwor- or *kʷ(e)twór-. Since its origin as an n/r-noun had been forgotten by that time, PIE-speakers had no reason to make their life more difficult by reviving an ancient alternation. The only case-forms requiring distinctly animate inflections (different from neuter ones) were the nom.pl. (*-es) and acc.pl. (*-n̥s from earlier *-m̥-s). The unsettled stress pattern (*kʷétwores ~ *kʷ(e)twóres) may well be an old feature of the numeral ‘four’.

Some details require more attention, but first I would like to address the question left unanswered above: what exactly was *kʷet-, the root supposedly underlying the derivation of the numeral ‘four’? I will try to suggest an answer in the next post (later this week, I hope), so please stay tuned.

40 comments:

David Marjanović, 17 September 2014 at 03:22
(Greek pĩar, Vedic

I think you made a typo in an <i> tag...

and vowel reduction accounts for Latin quattuor (with Lat. /a/ from *ə).

Intriguing; Latin has a lot of /a/ that seems to come out of nowhere, to the point that it's been called "unreliable" for drawing conclusions about PIE vowels. Can the vowel reduction, which presumably came with some shortening, account for the otherwise unexpected /tː/ by compensatory lengthening?

Unknown, 19 September 2014 at 06:24
Even if reduced grade were acceptable, there would be no basis to expect it in Latin 'four'. Oscan _petora_ 'four' (Fest.) and _petiru/o-pert_ 'four times' (Tab. Bant. 14/15), with Umbrian _peturpursus_ 'for quadrupeds' (Tab. Iguv. 6B:11), show that P-Italic inherited /e/-grade in the first syllable, and it makes sense that Q-Italic did also.

The Latin /a/-vocalism must thus be due to contamination. The combining form _quadri/u-_ 'four-' also has -dr- which cannot be derived by verifiable soundlaws from any reasonable IE form of 'four-', so this cluster must also come from the contaminant. The adjective _quadrus_ 'square' could have been extracted from an obsolete noun *quadrum 'whetstone' which was interpreted as a substantivized adjective 'square (stone)', neuter after _saxum_, and later replaced by _co:s_.

An IE root *k^weh1d- 'to wear down, abrade, sharpen by abrasion' accounts for this in addition to a group of Germanic words. Normal grade appears in ON _hváta_ 'to break through', /o/-grade in Go. _hwo:ta_ 'threat', and zero grade in OE _hwæt_ 'swift, brave', _hwæss_ 'sharp', _hwettan_ 'to sharpen, whet, incite', as well as Lat. *quadrum < *k^wh1d-róm. (Pokorny, IEW 636, lists the Gmc. words but wrongly includes Lat. _triquetrus_ instead of _quadrus_.)

The inroad for contamination was probably 'forty'. The inherited tongue-twister *quetvora:ginta: was one syllable longer than _quadra:ginta:_, this originally a colloquial substitute 'square decades', i.e. 'decades on the points of a square'. Once this was established as 'four decades', a new combining form _quadri/u-_ was able to oust inherited *quetur- (= Umb. _petur-_, Skt. _catur-_), whose form was peculiar. New compounds along with _quadrus_ and _quadra:ginta:_ acquired enough collective strength to impose _qua_-anlaut on *queturs 'four times' (hence *quatur(s) > _quater_), *quetvo(:)r 'four', and even *quo:rtus 'fourth' (cf. Praenestine QVORTA, woman's name), turning it into _qua:rtus_.

Sihler's explanation of -tt- in _quattuor_ (NCG §185.4) is laced with misinformation. It obviously has nothing to do with the gemination in _acqua_, condemned by Probus, which did not spread beyond northern Italy. Spanish _agua_ has no underlying geminate, but _cuatro_ does, since Sp. _piedra_ continues Lat. _petra_. And Lat. _mortuus_ (along with _perspicuus_ and the like) shows generalization of the post-heavy Sievers variant *-uwo- for simple *-wo-; it continues OL *mortuvos, not *mortvos (cf. Venetic _murtuvoi_ 'to the dead (man)'). (Likewise disyllabic Lat. _-ius_ continues post-heavy *-ijo- for simple *-jo-.) Lat. _bat(t)vere_ (incorrectly written _bat(t)uere_) is not comparable to _quattuor_, for Friulian has _bataye_ 'battle' from _battva:lia_, but _kutuardis_ 'fourteen' from _quattuordecim_. That is, pretonic -ttv- and -ttu- were kept distinct.

Although Romance requires -tt-, QVATVOR does occur in inscriptions and manuscripts, as well as Medieval Latin (e.g. _quatuor socij_ in the Cuckoo Song instructions). This suggests an external source for the -tt-, such as crossing with Oscan _pettiur_, whose meaning has been disputed. Oscan -iu- for earlier post-dental *-u- is found in _tiurrí_, _eítiuvam_, etc., apparently the rising diphthong [ju] (Buck, OUG §56). And gemination occurred before [j], as in _Dekkieis_, gen. of _Dekis_ 'Decius' (ib. §162). Thus Osc. _pettiur_ could regularly continue earlier *petur, hypostatized from the combining form (Umb. _petur-_, Skt. _catur-_). Native speakers of Oscan (or closely related Sabine) who learned Latin might have had a hard time losing the geminate in 'four', thus making _quattuor_ out of _quatuor_. This is no more outlandish than the attested replacement of _poplicus_ by _publicus_, the latter with Sabine (or Oscan) phonetics, which left _populus_ unaffected.

Piotr Gąsiorowski, 19 September 2014 at 08:23
Even if reduced grade were acceptable, there would be no basis to expect it in Latin 'four'. Oscan _petora_ 'four' (Fest.) and _petiru/o-pert_ 'four times' (Tab. Bant. 14/15), with Umbrian _peturpursus_ 'for quadrupeds' (Tab. Iguv. 6B:11), show that P-Italic inherited /e/-grade in the first syllable, and it makes sense that Q-Italic did also.

We do not know which forms survived into Proto-Italic, but there's no reason why they should have had the same grade in the first syllable. If the numeral was still declinable, it may have had reflexes of such forms as the "mainstream" *kʷétwores with *e beside a restressed collective *kʷtwṓr; and either of them could have had weak cases of the amphikinetic type like *kʷtwr̥-bʰís, ending up with Italic *a as a remodelled zero-grade. Any levelling in favour of either vowel would have taken place independently in Latin and Sabellic.

Sihler's explanation of -tt- in _quattuor_ (NCG §185.4) is laced with misinformation. It obviously has nothing to do with the gemination in _acqua_, condemned by Probus, which did not spread beyond northern Italy. Spanish _agua_ has no underlying geminate, but _cuatro_ does

Sihler nowhere claims that they represent the same change (or that acqua underlies the modern Romance forms, or that the gemination in that word was regular or widespread). He merely adduces inscriptional examples of a tendency to lengthen obstruents in a similar context (which operated also independently in West Germanic and Sanskrit, among others).

Although Romance requires -tt-, QVATVOR does occur in inscriptions and manuscripts, as well as Medieval Latin (e.g. _quatuor socij_ in the Cuckoo Song instructions). This suggests an external source for the -tt-...

Whether we are dealing with an imperfect sound change leaving behind synchronic variation in Latin, or with external influence, the /tt/ is not particularly mysterious or isolated, which is the whole point here.

David Marjanović, 19 September 2014 at 13:53
Very interesting! :-) So there's a chance that modern Italian (pubblico, acqua...) continues an Oscan sound change?

Piotr Gąsiorowski, 8 February 2015 at 19:40
The Core IE numeral 'eight' is certainly a dual form, which makes it likely that the corresponding singular meant 'four' (at least approximately) at some point. It's a thematic dual, so the stem without the dual inflection is *Hoḱto-. Avestan ašti- means the breadth (not the length) of four fingers (= one "palm"). I-stem nouns parallel to o-stems of adjectival origin are common in Indo-European. The "index" of the initial laryngeal is hard to determine if the root vocalism is *o. One popular hypothesis is that the word is derived from *h₂aḱ- 'be sharp, sharpen' and somehow connected with *h₂oḱetah₂ 'harrow, rake', with reflexes in several branches. The semantics, however ('an implement with sharp tines' → 'the four fingers of a hand'), is a little convoluted.

Piotr Gąsiorowski, 9 February 2015 at 20:39
We don't know how old the "classical" decimal system of Core IE is. '5, 6, 7' are all a bit peculiar ('7' is the best candidate for a loanword). '8' looks like a dual. '9' is etymologically opaque (a connection with 'new' is hardly secure), '10', for a change, is a well-behaved stem, apparently an original neuter, *déḱm̥t-, with normally formed and regularly ablauting (dḱm̥t-/dḱomt-) dual and collective plural forms attested in the decadic numerals '20, 30, ... 90', and a thematic derivative serving as '100' ((d)ḱm̥t-ó-m). The pattern looks old because of its complex ablaut, but unfortunately the Anatolian evidence for anything above '4' is problematic (even the interpretation of šiptamiya- as 'sevenfold' is uncertain). We know that the Anatolians used a decimal system, but how much was inherited and how much borrowed from their Middle Eastern neighbours remains undecidable. We can only curse them for having used logographic writings so consistently.

Piotr Gąsiorowski, 9 February 2015 at 22:37
I can only speak for myself. One possibility is that the numerals above 4 were originally adverbs (rather than adjectives) -- formally, nom./acc. neuters like Lat. multum or facile (note that '10' and '100' are inflected like neuters when forming complex numeral expressions). This doesn't quite work for '8', which has an animate (nom./acc.) dual ending, but if '8' was derived from '4' (or a noun meaning 'four of [something]'), at least its exceptional behaviour is not inexplicable. I would speculate that there was a time when pre-PIE speakers had a very restricted counting system (1-4, if that), in which all the numerals were adjectives. As the system was extended, adverbs were co-opted as the higher numerals, as if modifying the degree to which a noun was plural. The odd-looking *pénkʷe could be an archaism predating the spread of *-m to thematic neuters (i.e., a bare stem, like the nom./acc. of athematic neuters).

David Marjanović, 10 February 2015 at 01:12
Somewhere I once read that the Moscow School had reconstructed a Proto-Caucasian* *fimkʼwe "5" which looks like a possible source for a loan into PIE. (Not in the other direction; then we'd expect *pʰ or *pʼ instead of *f.) I can't find the source now. :-(

* North Caucasian of course, excluding Kartvelian.

Piotr Gąsiorowski, 10 February 2015 at 11:21
I wonder what the evidence for such a form would be. Below, at any rate, is the relevant entry in the Tower of Babel database (the reconstruction is Sergei Starostin's). By the way, they reconstruct a large set of numerals (including '100') all the way back to "Proto-Sino-Caucasian".

Entry: PNC *f̠ɦä̆ 'five'

And here some of the competing reconstructions of "North Caucasian" numerals are compared:

Wikipedia: North Caucasian languages

David Marjanović, 3 April 2015 at 17:34
Oh, sorry, not "five", but "fist"! And only (North)East Caucasian, it hasn't been found in the West. Here's the Starling entry. "Notes: Reconstructed for the PEC level. Correspondences are regular (one of the roots with the relatively rare phoneme *f)."

Judging from a paper where I found it, it's entry number 428 in Starostin & Nikolayev's (1994) North Caucasian Etymological Dictionary, which I don't have (except for the preface which doesn't mention it). Starostin (1988) cited it on p. 119 as Proto-East Caucasian *X̄wink'wV (for which he listed plausible reflexes in 6 languages "and others"), where "X" probably means [χ].

North Caucasian languages are some of the least likely languages to loan material into PIE

Why do you think so? There are plenty of similarities that have been considered evidence for contact (or even, by a few people, the idea that IE and NWC are sister-groups). Geographically it makes sense; that there are similarities which require some explanation other than chance is not controversial as far as I know.

OsoDanes, 2 September 2019 at 13:43
The Semitic numerals are unlikely to have transferred directly from Proto-Semitic into Proto-Indo-European per se. I tentatively posit an intermediary language (or more!) on the Balkans, recognizing that much of early PIE contact with the Neolithic occurred north of the Black Sea. Alternative roads to the Semitic Middle East are the linguistically packed Caucasus region (but the numerals are similar in Kartvelian!) and the linguistic void that we ultimately know as BMAC east of the Caspian Sea. But considering the importance of agriculture in the Balkan region and the sustained border between PIE and the Cucuteni-Tripolye culture in modern Ukraine, my bets are on the western route. I'm still at a loss as to whether the words reached PIE in a linguistic vehicle related to Semitic, or through an unrelated intermediary.

Z4chst3r, 16 January 2020 at 14:36
One needs to look further back than PIE. There is an affinity between PIE *kʷetwor- and Afro-Asiatic, for example Shawiya d'rbu, and even with Pre-Dravidian languages, e.g. Meyu, which has yerrabula, going back to *kutyarra-pula. In the latter, *kutyarra actually means 'two' and -pula is a dual ending: 4 is 2 doubled/folded. It's possible that the PIE and Afro-Asiatic words for 4 go back to something similar; a possible survival of a word like -pula, meaning folding or double, is seen in the -bu part of d'rbu and, say, -ba in Hebrew arba. If 4 was originally something like *kʷetworepola, then *kʷetwore- would similarly have meant 2 originally, say derived from a conjunction *kʷe ("and") + comparative *twor- ("more"), and the word for 4 would have meant 2 doubled. But then the word for 2 was shortened to just *twor-, which allowed the word for 4 to lose the *-pola ending, with the result that just *kʷetwore- came to mean 4.