17 September 2014

Word of the Month: Proto-Indo-European ‘Four’

As promised in a comment to my previous blog post, I’m going to discuss an etymological question: the origin and structure of the numeral ‘4’ in the Indo-European languages.

The Proto-Indo-European numeral ‘four’ had several intriguing properties. It was the largest non-complex cardinal number that agreed grammatically with a noun it modified. Consequently, it was inflected for gender and case, like any ordinary adjective. It shared that property with the words for ‘one’, ‘two’ and ‘three’. For obvious semantic reasons, their declension was defective: ‘one’ was normally singular, ‘two’ was declined only in the dual number, and ‘three’ and ‘four’ only in the plural.

The fourth is for luck.
The basic forms of the numeral ‘4’ (as reconstructed in handbooks) were the animate “count plural” *kʷetwores and the inanimate (neuter) “collective plural” *kʷetwōr (from earlier *kʷetwor-h₂). There is some uncertainty about the accentuation of these forms: some reconstruct them with PIE stress on the first syllable, others on the second (the comparative evidence is not unambiguous).

Proto-Indo-European probably had no feminine gender as a formal category, but it had ways to express femininity in derivatives. Curiously, the numerals ‘three’ and ‘four’ seem to have had feminine forms, preserved only in Celtic and Indo-Iranian. They are reconstructed as *tisres ‘3’ and *kʷetesres ‘4’. The final *-es is the familiar nom.pl. ending of animate stems ending in a consonant, but the rest looks baffling. The suffix *-sr-, known also from the Anatolian languages, where it forms nouns denoting human females, probably reflects an archaic, almost completely abandoned word for ‘woman’ (*ser-), although the zero grade (absence of a vowel) in the nom.pl. is aberrant; the initial part (*ti-, *kʷete-) looks in either case like the badly mangled residue of an actual numeral stem. Given the normal rules of IE word-formation, we would expect something like *trí-sor-es and *kʷétwr̥-sor-es. The characteristic “defects” of the attested forms are nevertheless shared between Celtic and Indo-Iranian; they must therefore go back at least to their most recent common ancestor. Such distortions are not quite unexpected in compound words, which commonly lose their transparency through irregular simplification.

Let’s ask a stupid question: what is *kʷetwores/*kʷetwōr the plural of? I mean, if it’s really an adjective, perhaps it had an older “etymological” meaning before it became part of the numeral system? If we strip off the inflections, what remains is the stem *kʷetwor-/*kʷetwr- (the second vowel is lost in so-called “weak” case-forms like loc.pl. *kʷetwr̥sú). This “bare” stem also occurs as a compositional variant of ‘four’, sometimes with the final segments reversed (*kʷetwr̥- ~ *kʷetru-).

An Indo-European stem with four consonants and two vowel slots must have been morphologically complex at some point. The most likely division into morphemes would be *kʷet-w(o)r-. The *-w(o)r- part looks familiar. A suffix of this form is found in a number of Indo-European nouns, typically inanimate abstracts derived from verb roots. We also find it e.g. in the PIE word for ‘fire’, *páh₂wr̥, which is not obviously deverbal (though a connection with *pah₂- ‘guard’ is conceivable). We also have at least one evidently archaic example of an adjective built in the same manner. Beside the inanimate noun *p(e)iH-wr̥ ‘fat’ (Greek pĩar) we find an adjective meaning ‘fat, fertile’ whose masculine form was *p(e)iH-won-; its neuter must have been originally identical with the noun, and a suffixed feminine *piH-wer-ih₂ was added to the paradigm as the IE gender system developed a three-way contrast (I use the cover symbol *H here for a laryngeal whose “index” is hard to determine). Note the consonant alternation in the suffix: it’s characteristic of an entire class of neuters, so-called r/n-stems. They show *-r in the nom./acc. singular and collective (e.g. *páh₂-wōr, the collective of the ‘fire’ word), but *-n- in the remaining cases (like the gen.sg. *ph₂-wén-s). The variant *-n- is also expected in related animate forms, with the strange exception of *-r- occurring before the femininising suffix *-ih₂, as illustrated by the preserved forms of the adjective ‘fat’. The striking agreement between Greek píōn (m.), píeira (f.) and Vedic pī́van- (m.), pī́varī (f.) shows that this unusual alternation is inherited.

To continue our Gedankenexperiment: so far we haven’t identified the underlying root *kʷet-. Still, if we tentatively assume that it was indeed a verb root, some predictions can be made: beside the hypothetical abstract noun *kʷét-wr̥, possible derivatives include an adjective of exactly the same form in the inanimate gender. Its expected animate form would be *kʷét-won- (nom.sg. *kʷétwō, nom.pl. *kʷétwones). The neuter noun/adjective would form the collective plural *kʷétwōr. Of these forms, two can be regarded as attested: *kʷétwōr is a possible reconstruction of the neuter numeral, and *kʷétwr̥ is its uninflected compositional variant. Conspicuous by their absence are any forms with *n instead of *r. Why, for example, is the animate (masculine) plural *kʷetwores rather than *kʷetwones? The most natural explanation is that this particular plural isn’t old enough to participate in the *-n/r- alternation.

Let’s imagine that *kʷétwr̥ was originally a neuter noun (without an accompanying adjective). Whatever its etymological meaning (let’s symbolise it ‘X’), the collective plural *kʷétwōr (meaning ‘a set of instances of X’) came to be employed as a cardinal number, at first uninflected (like ‘five’, ‘six’, etc.), but eventually attracted into the adjective system, presumably on the analogy of the already adjectival numerals ‘two’ and ‘three’. In the early history of Indo-European the accent was often shifted to the second syllable in such collectives; hence the by-form *kʷ(e)twṓr, in which the first vowel could be phonetically reduced (*kʷətwṓr) or lost altogether. Non-initial stress is reflected in Germanic (cf. Gothic fidwor, displaying the voicing effect of Verner’s Law), and vowel reduction accounts for Latin quattuor (with Lat. /a/ from *ə).

When *kʷétwōr ~ *kʷ(e)twṓr came to be interpreted (and declined) as a neuter plural adjective, an animate counterpart was analogically supplied by adding appropriate inflectional endings to the stem *kʷétwor- or *kʷ(e)twór-. Since its origin as an n/r-noun had been forgotten by that time, PIE-speakers had no reason to make their life more difficult by reviving an ancient alternation. The only case-forms requiring distinctly animate inflections (different from neuter ones) were the nom.pl. (*-es) and acc.pl. (*-n̥s from earlier *-m̥-s). The unsettled stress pattern (*kʷétwores ~ *kʷ(e)twóres) may well be an old feature of the numeral ‘four’.

Some details require more attention, but first I would like to address the question left unanswered above: what exactly was *kʷet-, the root supposedly underlying the derivation of the numeral ‘four’? I will try to suggest an answer in the next post (later this week, I hope), so please stay tuned.


40 comments:

  1. (Greek pĩar, Vedic

    I think you made a typo in an <i> tag...

    and vowel reduction accounts for Latin quattuor (with Lat. /a/ from *ə).

    Intriguing; Latin has a lot of /a/ that seems to come out of nowhere, to the point that it's been called "unreliable" for drawing conclusions about PIE vowels. Can the vowel reduction, which presumably came with some shortening, account for the otherwise unexpected /tː/ by compensatory lengthening?

    Replies
    1. I think you made a typo in an [...] tag...

      Thanks, fixed.

      Intriguing; Latin has a lot of /a/ that seems to come out of nowhere, to the point that it's been called "unreliable" for drawing conclusions about PIE vowels.

      The PIE "schwa secundum" (which I transcribe here as *ə) was a prop vowel breaking up difficult clusters (as an alternative to consonant deletion). From the phonological point of view, it was an allophone of zero, a non-phonemic segment. Of course, once epenthesised, it could be reinterpreted as an ordinary vowel. Many instances of Latin /a/ are thought to have such an origin, e.g. pandō < *patnō < *pətnah₂- (a nasal-infix present of *peth₂- 'spread') = //pt-ne-h₂-//; cf. Gk. pítnēmi (with the usual Greek reflex of the epenthetic vowel, found also in some variants of 'four').

      Can the vowel reduction, which presumably came with some shortening, account for the otherwise unexpected /tː/ by compensatory lengthening?

      According to Sihler's New comparative grammar of Greek and Latin 185.4 (p. 181 in the 1995 edition), *-tw- regularly became Classical Latin *-ttu- and Proto-Romance *-tt- after a stressed vowel (of course, *kʷatwōr had acquired initial stress at an earlier date). The change is not unlike West Germanic gemination. In epigraphic Latin obstruents are often more generally geminated before liquids and glides (PVBBLICO, ACQVA, etc.).

    2. Self-correction: ignore the asterisk in "Classical Latin *-ttu-".

  2. Even if reduced grade were acceptable, there would be no basis to expect it in Latin 'four'. Oscan _petora_ 'four' (Fest.) and _petiru/o-pert_ 'four times' (Tab. Bant. 14/15), with Umbrian _peturpursus_ 'for quadrupeds' (Tab. Iguv. 6B:11), show that P-Italic inherited /e/-grade in the first syllable, and it makes sense that Q-Italic did also.

    The Latin /a/-vocalism must thus be due to contamination. The combining form _quadri/u-_ 'four-' also has -dr- which cannot be derived by verifiable soundlaws from any reasonable IE form of 'four-', so this cluster must also come from the contaminant. The adjective _quadrus_ 'square' could have been extracted from an obsolete noun *quadrum 'whetstone' which was interpreted as a substantivized adjective 'square (stone)', neuter after _saxum_, and later replaced by _co:s_.

    An IE root *k^weh1d- 'to wear down, abrade, sharpen by abrasion' accounts for this in addition to a group of Germanic words. Normal grade appears in ON _hváta_ 'to break through', /o/-grade in Go. _hwo:ta_ 'threat', and zero grade in OE _hwæt_ 'swift, brave', _hwæss_ 'sharp', _hwettan_ 'to sharpen, whet, incite', as well as Lat. *quadrum < *k^wh1d-róm. (Pokorny, IEW 636, lists the Gmc. words but wrongly includes Lat. _triquetrus_ instead of _quadrus_.)

    The inroad for contamination was probably 'forty'. The inherited tongue-twister *quetvora:ginta: was one syllable longer than _quadra:ginta:_, this originally a colloquial substitute 'square decades', i.e. 'decades on the points of a square'. Once this was established as 'four decades', a new combining form _quadri/u-_ was able to oust inherited *quetur- (= Umb. _petur-_, Skt. _catur-_), whose form was peculiar. New compounds along with _quadrus_ and _quadra:ginta:_ acquired enough collective strength to impose _qua_-anlaut on *queturs 'four times' (hence *quatur(s) > _quater_), *quetvo(:)r 'four', and even *quo:rtus 'fourth' (cf. Praenestine QVORTA, woman's name), turning it into _qua:rtus_.

    Sihler's explanation of -tt- in _quattuor_ (NCG §185.4) is laced with misinformation. It obviously has nothing to do with the gemination in _acqua_, condemned by Probus, which did not spread beyond northern Italy. Spanish _agua_ has no underlying geminate, but _cuatro_ does, since Sp. _piedra_ continues Lat. _petra_. And Lat. _mortuus_ (along with _perspicuus_ and the like) shows generalization of the post-heavy Sievers variant *-uwo- for simple *-wo-; it continues OL *mortuvos, not *mortvos (cf. Venetic _murtuvoi_ 'to the dead (man)'). (Likewise disyllabic Lat. _-ius_ continues post-heavy *-ijo- for simple *-jo-.) Lat. _bat(t)vere_ (incorrectly written _bat(t)uere_) is not comparable to _quattuor_, for Friulian has _bataye_ 'battle' from _battva:lia_, but _kutuardis_ 'fourteen' from _quattuordecim_. That is, pretonic -ttv- and -ttu- were kept distinct.

    Although Romance requires -tt-, QVATVOR does occur in inscriptions and manuscripts, as well as Medieval Latin (e.g. _quatuor socij_ in the Cuckoo Song instructions). This suggests an external source for the -tt-, such as crossing with Oscan _pettiur_, whose meaning has been disputed. Oscan -iu- for earlier post-dental *-u- is found in _tiurrí_, _eítiuvam_, etc., apparently the rising diphthong [ju] (Buck, OUG §56). And gemination occurred before [j], as in _Dekkieis_, gen. of _Dekis_ 'Decius' (ib. §162). Thus Osc. _pettiur_ could regularly continue earlier *petur, hypostatized from the combining form (Umb. _petur-_, Skt. _catur-_). Native speakers of Oscan (or closely related Sabine) who learned Latin might have had a hard time losing the geminate in 'four', thus making _quattuor_ out of _quatuor_. This is no more outlandish than the attested replacement of _poplicus_ by _publicus_, the latter with Sabine (or Oscan) phonetics, which left _populus_ unaffected.

  3. Even if reduced grade were acceptable, there would be no basis to expect it in Latin 'four'. Oscan _petora_ 'four' (Fest.) and _petiru/o-pert_ 'four times' (Tab. Bant. 14/15), with Umbrian _peturpursus_ 'for quadrupeds' (Tab. Iguv. 6B:11), show that P-Italic inherited /e/-grade in the first syllable, and it makes sense that Q-Italic did also.

    We do not know which forms survived into Proto-Italic, but there's no reason why they should have had the same grade in the first syllable. If the numeral was still declinable, it may have had reflexes of such forms as the "mainstream" *kʷétwores with *e beside a restressed collective *kʷtwṓr; and either of them could have had weak cases of the amphikinetic type like *kʷtwr̥-bʰís, ending up with Italic *a as a remodelled zero-grade. Any levelling in favour of either vowel would have taken place independently in Latin and Sabellic.

    Sihler's explanation of -tt- in _quattuor_ (NCG §185.4) is laced with misinformation. It obviously has nothing to do with the gemination in _acqua_, condemned by Probus, which did not spread beyond northern Italy. Spanish _agua_ has no underlying geminate, but _cuatro_ does

    Sihler nowhere claims that they represent the same change (or that acqua underlies the modern Romance forms, or that the gemination in that word was regular or widespread). He merely adduces inscriptional examples of a tendency to lengthen obstruents in a similar context (which operated also independently in West Germanic and Sanskrit, among others).

    Although Romance requires -tt-, QVATVOR does occur in inscriptions and manuscripts, as well as Medieval Latin (e.g. _quatuor socij_ in the Cuckoo Song instructions). This suggests an external source for the -tt-...

    Whether we are dealing with an imperfect sound change leaving behind synchronic variation in Latin, or with external influence, the /tt/ is not particularly mysterious or isolated, which is the whole point here.

    Replies
    1. Sorry for clumsy wording; I was typing in haste.

      it may have had reflexes...

      Read: "its paradigm may have included reflexes..."

  4. Very interesting! :-) So there's a chance that modern Italian (pubblico, acqua...) continues an Oscan sound change?

    Replies
    1. I did not intend to imply that Italian gemination in _acqua_ and _pubblico_ came from Oscan, and my gripe with Sihler is that gemination in _acqua_ and in _quattuor_ are two different phenomena, not to be carelessly lumped together. Posttonic gemination in It. _pubblico_ has nothing to do with the replacement of Latin _poplicus_ by Sabellized _publicus_ (probably due to heavy Sabine usage of the Ager Publicus for grazing), or the Sabellic origin of _populus_ itself centuries earlier. Gemination in It. _acqua_ is regular and parallel to that in _tacqui_ 'I was silent', in which Lat. _tacui:_ [tákuwi:] was first syncopated to *[tákwi:] in Vulgar Latin, then underwent gemination to *[tákkwi:] in Central Italian dialects of VL. According to Clara Hürlimann (Die Entwicklung des lateinischen _aqua_ in den romanischen Sprachen 9-12, Zürich 1903), _aqua_ with simple [k] is heard in the far North of Italy (Cremona, Padova, Torino, etc.), while _aqua_ and _acqua_ with [kk] coexist in the boundary zone (Venezia, Bologna, Piacenza, etc.). This gemination started early enough for the prescriptivist Probus to hear it and complain about it. By contrast the gemination in _quattuor_ is Pan-Romance and did not bother Probus.

      I neglected to explain how _quattuor_ became trisyllabic, as reflected in Friulian _kutuardis_ and Veglian _kuatuarko_ 'fourteen', which require -ttu- not -ttv-. Intervocalic glides were not represented in the usual Latin orthography, thus MORTVOS or MORTVVS [mórtuwus]. Ambiguity was tolerated in VOLVI 'I wished' or 'I rolled' ([wóluwi:], [wólwi:]) and PARVI 'I prepared' or 'small ones' ([páruwi:], [párwi:]). This problem did not arise in Roman Latin after non-liquids. Vergil scanned GENVA 'knees' as disyllabic, apparently a dialectal form exhibiting precocious syncope, [génwa] for RL [génuwa]. Evidently the post-heavy Sievers alternants *-uwo- and *-ijo- were generalized following all consonants. De Vaan (EDL 389-90, discussing _mortuus_) suggests that this was conditioned by /t/ in Proto-Italic, but that does not explain _perspicuus_ or _continuus_. With _arvus_ and _salvus_ we have secondary posttonic syncope of *-uwo- after liquids, undone in verbal forms like [wóluwi:] by analogy with [woluwísti:]. How late this syncope occurred is indicated by the Plautine trisyllabic scansion _larua_ [láruwa].

      At some point in the history of Latin, I presume that *kWetwo(:)r, whatever its accent at the time, thus became *kWetuwo(:)r. Even though 'four' did not contain the suffix *-(u)wo-, analogy turned a strictly morphological replacement into a phonetic one. Similarly *gWemjo: or *wemjo: 'I come' became *gWenjo: or *wenjo:, then *gWenijo: or *wenijo:, Latin _venio:_; *kWom-ja(:)m 'whereas' became *kWonja(:)m, then *kWonija(:)m, Lat. _quoniam_.

      Prehistoric Latin thus had *quetuor 'four' and *quetuora:ginta: 'forty'. The latter hexasyllabic monstrosity would be highly susceptible to replacement by the colloquial _quadra:ginta:_ as described earlier, which would save two syllables and a great deal of tongue-twisting with the awkward sequence of vowels. The /a/-vocalism would spread to compounds, then to numeral forms as described earlier, producing _quatuor_. Being trisyllabic, this was no more susceptible to gemination within classical Latin than _vidua_ [wíduwa] 'widow'. The -tt- had to come from outside Latin. Postclassically, the simplex [kWáttuwor] was syncopated to *[kWáttwor] and reduced to *[kWáttor], thus resembling the outcome of _batuo:_, [bátuwo:] > *[bátwo:] > *[báttwo:] > *[bátto:]. But Friul. _kutuardis_ < _quattuórdecim_ against _bataye_ < _battvá:lia_ (postclass.) retains the distinction.

    2. According to Clara Hürlimann (Die Entwicklung des lateinischen _aqua_ in den romanischen Sprachen 9-12, Zürich 1903), _aqua_ with simple [k] is heard in the far North of Italy (Cremona, Padova, Torino, etc.), while _aqua_ and _acqua_ with [kk] coexist in the boundary zone (Venezia, Bologna, Piacenza, etc.).

      Any connection to the fact that consonant length is lost across the board in northern Italy?


    3. To my knowledge, the degemination of _acqua_ in North Italian dialects reported by Hürlimann is indeed identical to the general degemination, and so her crooked Venezia-Bologna-Piacenza line has no special significance for the northern limit of Vulgar Latin _acqua_, which cannot be determined from modern dialects. The Venetian dialect has only ss and zz (= Tuscan cc ) in native words, mere graphic geminates indicating nonvoicing. Venetian _acquarèla_ and _acquisiòla_ 'quando l'altezza dell'acqua del mare è più della comune, ma non può dirsi alta' i.e. 'slightly higher sea level than normal' (cited by A.P. Ninni, Giunti e Correzioni al Dizionario del Dialetto Veneziano, Venezia 1890) are either half-Tuscanisms or written according to the Tuscan spelling _acqua_. Tuscanized spelling is used in _acqua_ and all its derivatives by G. Patriarchi (Vocabolario Veneziano e Padovano, 3rd ed., Padova 1821), with the only concession to local pronunciation the isolated entry "aqua: vedi acqua". The written form of Bolognese makes extensive use of geminates corresponding to those in Tuscan, but these are explicitly stated to be graphic by C.E. Ferraro (Vocabolario Bolognese-Italiano, 2nd ed., Bologna 1835, p. XVII): "Per regola generale le consonanti doppie non si pronunziano che per semplici nel linguaggio bolognese come nel francese." On the other hand no such statement regarding the consonants is made by L. Foresti (Vocabolario Piacentino-Italiano, Piacenza 1836), who does provide detailed information on the contrast between the vowel-systems of Piacentino and the standard language, and whose written Piacentino includes many examples of geminates not occurring in Toscano due to more extensive syncope in the former. However I find it hard to believe that Piacenza is an outlier of true original geminates. 

      I cannot say whether Foresti's written geminates are merely graphic or represent (at least in part) secondary gemination due to syncope and assimilation, and I welcome information from anyone familiar with this dialect.

  5. "The most natural explanation is that this particular plural isn’t old enough to participate in the *-n/r- alternation."

    Could this be evidence that the lemma replaced an older *h3ekt- "four", now only preserved in the dual *h3ekt-eh3/1 "eight" and Avestan asti "length of four fingers" (< PIE *h3ekt- + Indo-Iranian length stem *-ti)?

    Replies
    1. The Core IE numeral 'eight' is certainly a dual form, which makes it likely that the corresponding singular meant 'four' (at least approximately) at some point. It's a thematic dual, so the stem without the dual inflection is *Hoḱto-. Avestan ašti- means the breadth (not the length) of four fingers (= one "palm"). I-stem nouns parallel to o-stems of adjectival origin are common in Indo-European. The "index" of the initial laryngeal is hard to determine if the root vocalism is *o. One popular hypothesis is that the word is derived from *h₂aḱ- 'be sharp, sharpen' and somehow connected with *h₂oḱetah₂ 'harrow, rake', with reflexes in several branches. The semantics, however ('an implement with sharp tines' → 'the four fingers of a hand'), is a little convoluted.

    2. By length, I meant the distance of four-fingers-breadth. I used the word in that general "end-to-end" sense and apologize for the infelicitous wording.

      As for divining which H-initial, those connections always struck me as somewhat strained for the semantic reasons you already mentioned. Given that there are three competing concepts of the number four in various stages of Proto-Indo-European (*k^wetwores, *meju-, and *Hok`to-), does this imply that a pre-Proto-Indo-European may have lacked the numeral? Typologically, lacking numerals above three is nothing unusual. Historically, the numbers six, seven, and beyond have long been suspected of being loans from other languages. Areally, other language families near the PIE Urheimat frequently borrowed numbers four and beyond (e.g., Kartvelian languages).

      Sorry for the million replies I had to delete.

    3. We don't know how old the "classical" decimal system of Core IE is. '5, 6, 7' are all a bit peculiar ('7' is the best candidate for a loanword). '8' looks like a dual. '9' is etymologically opaque (a connection with 'new' is hardly secure). '10', for a change, is a well-behaved stem, apparently an original neuter, *déḱm̥t-, with normally formed and regularly ablauting (dḱm̥t-/dḱomt-) dual and collective plural forms attested in the decadic numerals '20, 30, ... 90', and a thematic derivative serving as '100' ((d)ḱm̥t-ó-m). The pattern looks old because of its complex ablaut, but unfortunately the Anatolian evidence for anything above '4' is problematic (even the interpretation of šiptamiya- as 'sevenfold' is uncertain). We know that the Anatolians used a decimal system, but how much was inherited and how much borrowed from their Middle Eastern neighbours remains undecidable. We can only curse them for having used logographic writings so consistently.

    4. On a related question, then, has there ever been a satisfactory explanation given as to why PIE inflected the numerals 1 - 4 for gender and case but did not for 5 and beyond? What sort of grammar scheme would give rise to the partitioned syntax?

    5. I can only speak for myself. One possibility is that the numerals above 4 were originally adverbs (rather than adjectives) -- formally, nom./acc. neuters like Lat. multum or facile (note that '10' and '100' are inflected like neuters when forming complex numeral expressions). This doesn't quite work for '8', which has an animate (nom./acc.) dual ending, but if '8' was derived from '4' (or a noun meaning 'four of [something]'), at least its exceptional behaviour is not inexplicable. I would speculate that there was a time when pre-PIE speakers had a very restricted counting system (1-4, if that), in which all the numerals were adjectives. As the system was extended, adverbs were co-opted as the higher numerals, as if modifying the degree to which a noun was plural. The odd-looking *pénkʷe could be an archaism predating the spread of *-m to thematic neuters (i.e., a bare stem, like the nom./acc. of athematic neuters).

    6. Somewhere I once read that the Moscow School had reconstructed a Proto-Caucasian* *fimkʼwe "5" which looks like a possible source for a loan into PIE. (Not in the other direction; then we'd expect *pʰ or *pʼ instead of *f.) I can't find the source now. :-(

      * North Caucasian of course, excluding Kartvelian.

    7. I wonder what the evidence for such a form would be. Below, at any rate, is the relevant entry in the Tower of Babel database (the reconstruction is Sergei Starostin's). By the way, they reconstruct a large set of numerals (including '100') all the way back to "Proto-Sino-Caucasian".

      Entry: PNC *f̠ɦä̆ 'five'

      And here some of the competing reconstructions of "North Caucasian" numerals are compared:

      Wikipedia: North Caucasian languages

    8. North Caucasian languages are some of the least likely languages to loan material into PIE, especially a numeral.

  6. Oh, sorry, not "five", but "fist"! And only (North)East Caucasian, it hasn't been found in the West. Here's the Starling entry. "Notes: Reconstructed for the PEC level. Correspondences are regular (one of the roots with the relatively rare phoneme *f)."

    Judging from a paper where I found it, it's entry number 428 in Starostin & Nikolayev's (1994) North Caucasian Etymological Dictionary, which I don't have (except for the preface which doesn't mention it). Starostin (1988) cited it on p. 119 as Proto-East Caucasian *X̄wink'wV (for which he listed plausible reflexes in 6 languages "and others"), where "X" probably means [χ].

    North Caucasian languages are some of the least likely languages to loan material into PIE

    Why do you think so? There are plenty of similarities that have been considered evidence for contact (or even, by a few people, the idea that IE and NWC are sister-groups). Geographically it makes sense; that there are similarities which require some explanation other than chance is not controversial as far as I know.

    Replies
    1. Yes, IE '5' can be straightforwardly derived from East Caucasian (Nakh-Daghestanian) 'fist'. Likewise, IE '2' is clearly related to the same numeral in West Caucasian (Abkhaz-Adyghe).

      There are other interesting lexical correspondences between EC and IE pointed out by Starostin in an old Russian article. My guess is that the language(s) spoken by the nomadic shepherds of the Pontic Steppes (which I call "Kurganic" and which make up a significant part of the IE core lexicon) was either part of East Caucasian or a neighbour of it.


    2. Since *f is such a rare phoneme in NEC, I wonder whether 'fist' there is simply a Gothic loanword. It is uncontroversial that Go. galga 'stake, cross, gallows' was borrowed into some of the Caucasian languages. One can envision the Gothic overlords using mailed fists to deal with minor troublemakers, while major ones got the gallows.

    3. This is out of the question. However, Nakh has borrowings from Ossetian (an Indo-Iranian language spoken in the Caucasus) and other external sources which Starostin mistook for native words. For example, Nakh *ford 'sea' (Chechen hord, Ingush hord) could indeed be a Gothic loanword with a semantic shift parallel to the one of Greek póntos.

    4. Actually, NEC *f should be replaced in reconstructions by *χʷ or even *ħʷ (if I'm not mistaken, Dolgopolsky used the "joker" X for this). This way, IE *p in the numeral '5' would be the product of the development of labial clusters in "Eurasiatic" which I mentioned before.

    5. I meant Dolgopolsky's X stands for χ ~ ħ. This way, Macro-Caucasian and "Borean" *Xʷ would give Eurasiatic *pʰ > IE *p.

    6. Since *f is such a rare phoneme in NEC, I wonder whether 'fist' there is simply a Gothic loanword. It is uncontroversial that Go. galga 'stake, cross, gallows' was borrowed into some of the Caucasian languages. One can envision the Gothic overlords using mailed fists to deal with minor troublemakers, while major ones got the gallows.

      I'm not aware of evidence that the Gothic kingdom extended that far southeast – and the timing is off by thousands of years! You'd need to postulate a Wanderwort long after */f/ was gone.

      The fact that */f/ was rare is easily explained by its origins: Starostin derived it from Proto-Sino-Caucasian clusters like */xw/ and */xŋw/.

      However, Nakh has borrowings from Ossetian (an Indo-Iranian language spoken in the Caucasus) and other external sources which Starostin mistook for native words.

      This, on the other hand, would have allowed Gothic words to penetrate into the Caucasus; Ossetic is descended straight from the language of the Alans, whose close association with the Goths is well known.

      Macro-Caucasian and "Borean" *Xʷ would give Eurasiatic *pʰ

      This is still the kind of shift I expect in loans, not in native vocabulary. There are plenty of languages that have borrowed foreign /f/ as /p/ or /pʰ/, but is that attested as a sound shift anywhere? The closest thing that comes to mind is the Samoyedic */s/ > */t/ shift, and I don't know how safe that reconstruction actually is.

    7. that have borrowed foreign /f/ as /p/ or /pʰ/

      Or foreign /x/ as /k/ or /kʰ/, for that matter.

    8. The "hardening" of continuants (glides, fricatives) into stops is not all that rare as a sound change, especially in positions that favour fortition (word-initially). We have /θ, ð/ > /t, d/ in most modern descendants of Proto-Germanic (including some accents of English), Latin /w-/ > /β-/ > [b-] in Spanish (word-initially), similar changes in modern Indo-Iranian languages, /w-/ > /g-/ in Armenian, /w, j/ > /kʷ, č/ in Klallam, etc.

      /f/ (or /ɸ/) > /p/ is rare enough in this direction to be considered "counternatural", but I wouldn't rule it out completely. I've seen Proto-Gbe *χʷ > *ɸ > /p/ proposed for Gen.

      There are also some bizarre special cases, like reverse lenition in Gaelic (a morphophonological process, so not strictly speaking a sound shift), thanks to which we have Scottish Gaelic piuthar /pjuəɾ/ 'sister' from *swésōr (OIr. siur → len. fiur).

    9. Or foreign /x/ as /k/ or /kʰ/, for that matter.
      That's right. Take for example NEC *χχHweje 'dog' ~ Uralic *koje 'man, person' and NEC *χχHwej-rV 'dog' (oblique stem) ~ Uralic *kojra 'male (dog, man)'.

    10. However, the "hardening" of fricatives doesn't necessarily indicate borrowing. Uralic, for example, lacks labial and velar fricatives, so they must have evolved into the corresponding stops.

    11. What if *H is [ʔ] or [ʡ]? Then the *χχ could be a red herring that was lost on the Uralic side, NEC turned, say, *q into *H while Uralic instead merged it into *k.

      ...if we accept the rather bizarre semantic shift and the geographic convolutions for the sake of the argument. You'll need a lot more examples of regular sound correspondences to convince me of those.

    12. This comment has been removed by the author.

    13. Actually, we've also got Tibeto-Burman *qhʷi:j 'dog', so NEC *χχ ~ TB *q ~ Uralic *k.

      As for the "semantic shift", I think the word originally meant 'male' and somewhere in Asia specialized as a Wanderwort 'dog' which spread to NEC, TB and even to Kartvelian (Svan xwir- '(male) dog'). Interestingly, the IE cognate would be *wi:ro- 'man, husband'.

    14. Incidentally, Sinitic *khʷi:n 'dog' would be a variant of the same Wanderwort with a different suffix borrowed into IE *k´(u)wo:n, whose direction of borrowing is often reversed by Indo-Europeanists.

    15. Interestingly, the IE cognate would be *wi:ro- 'man, husband'.

      Then why doesn't it start with *h₂?

      Also, why are you assuming a phoneme *i:? I've seen that word reconstructed with *h₁...

    16. IE-ists don't reconstruct an initial "laryngeal" because they don't regard it as necessary. However, on a macro-comparative basis we should reconstruct *(H)wi:ro-, as does Starostin (Jr.?) in his own database.

      On the other hand, I don't follow the orthodox model in attributing EVERY long vowel to *h₁ after a short vowel, nor other instances where "laryngeals" are introduced against Occam's Razor.

    17. It isn't just the long vowel that makes the laryngeal necessary. In Baltic, the word is affected by Hirt's Law (výras, with a retracted acute accent, cf. Ved. vīrá-). As for an initial laryngeal here, there's no trace of any such thing in Indo-European. In particular, Vedic compounds such as a-vīra- 'unmanly' show no laryngeal lengthening (as opposed to sūnára- < *h₁su-h₂ner-o- 'manly, brave').

    18. Another hypothesis is that the cardinal numbers 2 to 4 come from the names of the fingers, or from something symbolically linked to them. While counting, one would have pointed at the respective fingers, saying one, two and three, four and the hand ("and" being "-kwe"). Over a longer period of time, the initial system would no longer have been understood. The "-kwe" after three would have been felt as the beginning of four. And since the word "hand" would have become too closely linked to the idea of "five", each Indo-European branch would have developed a new word of its own to make the distinction clear.

    19. But this would only make sense within an isolationist framework which disregards interlinguistic contacts. In fact, some of the higher numerals ('6', '7') are Wanderwörter of Semitic origin.

    20. Are they really? I've read about the idea that *septḿ̩ is from the Semitic feminine *sabʕatum (if I remember that form correctly!), but why didn't that give **sebh₃otm̩ or something like that?
