22 February 2013

Yesterday’s Words – Today’s Morphemes – Tomorrow’s Segments

The final /θ/ of filth no longer serves any useful morphological function. It has become fused with its derivational base into an indivisible whole. This is quite often the terminal stage in the life-cycles of linguistic replicators. Old English -þ- was still a morpheme, but it had already lost most of its phonological substance. A few hundred years earlier, in Proto-Germanic, its ancestral form had been *-iþō, continuing still earlier (pre-Germanic) *-étā. A linguistic entity that used to be a suffix of some length has ended up as phonological raw material. It means nothing by itself and has degenerated into a speech sound which, together with three others, encodes a meaning (or rather a cluster of meanings) but is no different, as far as its status is concerned, from the final /m/ of film.

Whole words may become reduced to the role of ‘bound’ (non-independent) morphological elements. Many derivational affixes used to be words which, through being frequently used in composition, survived in that function while their free-standing variant went extinct. Old English hād meant ‘person, social status’. When added to a noun it meant ‘the state or condition of being an X’. Hence, for example, OE ċild-hād ‘infancy, childhood’. The word hād > hǭd lingered on in Middle English, but seems to have become rare by the thirteenth century and eventually died out as an independent word. Curiously, in modern ‘gangsta’ slang hood (no connection with hood = ‘head covering’) is used as an abbreviation of neighbourhood. It has become a word again, though with a brand new meaning.

Words have fractal-like properties: the more closely you look at them, the more structure they reveal.

When such reduction and fusion processes have operated for millennia, they may compact a whole string of morphemes into a short word without any visible internal structure. If you look at young /jʌŋ/ today, it’s short even for an English word. In the reconstructed remote ancestor of English, the Proto-Indo-European language, it looked roughly like this: *h₂ju-h₃n̥-ḱó-s. The first element, *h₂ju-, was the compositional variant of the noun *h₂óju ‘vitality, youthful vigour’; the second was a suffix (possibly derived from an independent word) meaning ‘having, loaded with’. Together they formed the noun *h₂jú-h₃on- meaning ‘energetic young man’ (literally: ‘having the strength of young age’, cf. Skt. yúvan-). The addition of the suffix *-ḱó- produced an adjective with the meaning ‘like a young man, juvenile’. We find its reflexes for example in Sanskrit (yuvaśá-), Latin (iuvencus), Welsh (ieuanc ~ ifanc), and of course in the Germanic languages (PGmc. *jungaz > OE ġeong ~ iung [juŋg] > young). In other words, the /jʌ/ part of young is what has remained of a once independent noun, and the /ŋ/ represents two concatenated morphemes compressed into a single segment. Incidentally, *h₂óju is a very interesting item in the Proto-Indo-European lexicon, and I hope to return to it soon.


  1. Has *-étā survived in Polish?

  2. I started to look for the answer and I have found:
    Pl. jesieć (dial.) ‘grain sieve’; osieć (E. dial.) ‘granary’; jesiótka (dial.) ‘grain sieve’; osiótka (W dial.) ‘granary’
    OHG egida f. ‘harrow’; OE eg(e)þe f. ‘harrow’;

  3. Slavic has a very similar formation in *-ota: *vysota 'height', *širota 'width', *dьlgota 'length', *glupota 'stupidity', etc. It must be the same thing. I'm not sure how to explain the o-grade in Slavic. It seems to me that the suffix originated as the combination of the thematic vowel *-e/o- with the actual abstract suffix *-tah2. Since *o was the generalised colour of the thematic vowel in adjectives before case endings, it may have affected their derivatives too: *glupo-ta, etc. Compare the adjective *vyso-kъ 'high', in which *-kъ is a secondary extension (absent from the comparative *vyše).

  4. Can we be sure that PGmc. *-iþō continues pre-Germanic *-étā and not a suffix *-ítā that had been abstracted from i-stems?

  5. *-tah2(t)- abstracts were normally derived from adjectival stems, so we get e.g. Gk. barú-tēs 'heaviness', Skt. vasú-tā 'wealthiness' (from u-stems). It's hard to say what happened to the thematic vowel in such derivatives, since we have conflicting or ambiguous evidence. For example, Germanic *-iþō points to *-i- or *-e-, forms like Gk. neó-tēt- 'youth' and Slavic *-o-ta point to *-o-, Sanskrit -á-tā(t)- is compatible with either *-e- or *-o-, and Latin -i-tāt- (-ie-tāt- from *-io- stems) is compatible with just about any short vowel. I don't think a transfer from i-stems is likely, because i-stem adjectives were vanishingly rare in PIE. But *i is also a frequent alternant of the thematic vowel e.g. in the complex suffix *-i-ko- and several other formations. The allomorphy of stem-final *-e/o/i- is still poorly understood. I prefer *-e- in the ancestor of Germanic for two reasons: (1) there is no positive evidence for a high vowel in other branches (Latin is ambiguous); (2) it seems to me that *-i- was more likely to replace thematic *-e/o- if it wasn't accented. I may be wrong, of course.

  6. The meaning of IE *h₂oju- can be more accurately described as 'vital force' > 'lifetime'. Notice also the combinatory form of this word is actually *h₂ju-h₃-, which gives an ablauting pattern *jeu- ~ *jou- in Baltic and Celtic, hence the traditional reconstruction *(h₂)jeu-. Thus your own reconstruction is rather innovative.

    I agree this is a very interesting item, and I'd link it (at a macro-comparative or supra-dialectal level) to *gʷje-h₃- 'to live'.

    1. LOL, you deleted and changed the comment while I was replying to it. Just a couple of additional points, then: Balto-Slavic *jeuHna- is a vriddhied thematic derivative of *h₂júh₃(o)n-, with the neo-full grade in the "wrong" place (as often in such cases). I would explain *gʷih₃w- quite differently:


  7. I agree that *h₂ was most likely a voiceless fricative in the uvular/pharyngeal range (velar or epiglottal values are also thinkable), but as the phonetic reconstruction is based on indirect evidence and hard to pinpoint, I prefer to err on the cautious side. When I write *h₂, every Indo-Europeanist knows at once what I mean without asking what my personal preferences are.

    I don't reconstruct the stem with a final *h₃. In *h₂jú-h₃on- we have the "Hoffmann suffix" *-h₃on- added to the zero-grade of the noun. *h₂jeu- 'young' does not exist as an adjective. It's a secondary full grade which isn't likely to be of PIE date (except possibly in a variant of the loc.sg.). The Sanskrit pattern ā́yu, gen. yóṣ, as if from *h₂ój-u/*h₂j-éu-s, looks impressively archaic but is in fact analogical (modelled on proterokinetic stems). The earliest reconstructible pattern was acrostatic, with *o/*e in the root (the latter coloured to *a by the laryngeal) -- something like nom.sg. *h₂óju, gen.sg. *h₂áju-s (→ *h₂áiw-os).

    And yes, it also developed meanings like 'lifetime, longevity, long time' etc. already at an early date.

  8. I'm not sure about what you mean by "indirect", as Anatolian (Hittite) does provide direct, although partial, evidence. Also, data from other families (which most IE-ists seem reluctant to use) help with the reconstruction. So in my opinion there's no excuse for not using a real phonemic value (albeit approximate) such as χ instead of the algebraic symbol H₂ (I prefer to use capitals for emphasis), which doesn't even represent a single phoneme but at least two different phonemes (not just allophones), depending on whether or not it is part of the syllable nucleus.

    Also, for *χáju- and *χjú-H₃en- I'd like to see the actual evidence for o as representative of the ablaut pattern in your reconstruction. And in the case of the "Hoffmann suffix", also for H₃.

    And although this doesn't seem to be the case, it's wrong to assume (as many IE-ists do) that every case of non-ablauting *a is due to vowel-coloring. This is patent in Paleo-European substrate lexicon such as 'apple' and several 'water' words, where a vowel could be reconstructed.

    Notice also I carefully avoid using the term "PIE", because in my opinion it doesn't represent an actual and well-defined entity but rather a convenient fiction or comparative tool. I regard "PIE" as a screen where those features found in IE languages are projected, but which at the same time hides the complexity (both diachronic and diatopic) behind it.

  9. The Spanish IE-ist Francisco Villar reconstructs a 4-vowel system i, e, ɑ, u for earlier stages of IE. Then a coming from *χe would have merged with *ɑ in some IE languages, while in others *ɑ would have been backed to *o, giving rise to a 5-vowel system. I find this a better explanation than the traditional hypothesis of the merger of o and a.

    This means *ɑ (later shifted either to a or o) can either be apophonic or the product of vowel-coloring from a labialized "laryngeal". In fact, the case of the 'apple' word would be the latter, with *ɑ- < *ʕa-. In fact, Uralic *omena 'apple' (from another variant of the same "Nostratic" root) has *o- instead.


  10. I have my own ideas about PIE ablaut, but it would be premature to reveal them here before I have presented them to my colleagues (to be criticised and perhaps demolished). As for symbols, an algebraic one is just as good as an IPA character as long as specialists agree on its meaning.

    As for *h₃ in the Hoffmann suffix, there are cases where it causes voicing when added to a voiceless stop. Since *h₃ seems to have been distinctively voiced (*pi-ph₃-e-ti > *pibeti) as opposed to the other two "laryngeals", I prefer the reconstruction *-h₃on- despite Hoffmann's own preference for the first laryngeal here. The full vocalism of the suffix would be *o posttonically even if the initial laryngeal had no colouring effect.

    1. As for symbols, an algebraic one is just as good as an IPA character as long as specialists agree on its meaning.
      The thing is H₂ is an ambiguous symbol because it can represent two different phonemes, either a consonant or a vowel (as e.g. in 'father').

      I also think IE-ists have overused the cluster -pH₃- when reconstructing words such as the aforementioned 'apple', so I'd prefer more direct evidence of H₃.

  11. A syllabic consonant is not necessarily a separate phoneme. Also, the jury is still out on whether interconsonantal laryngeals in words like *ph₂tēr were really vocalised in PIE as opposed to being "repaired" in various ways (including cluster simplification in some environments and prop-vowel insertion in others, sometimes already in the protolanguage, but more often in the daughter languages).

    Voicing before the Hoffmann suffix is quite well attested. Not only in *h₂ap-h₃on- > *abon- 'river', where it was first identified by Eric Hamp, but also e.g. in numerous Latin nouns in -g-on- derived from stems ending in /k/ (vertex : vertīgō, etc.).

  12. Celtic *abon- is a derivative of Paleo-European *ɑb- 'water', a lexeme also found in Latin amnis. So I'm afraid there's no **H₂ap- here.

  13. Lat. amnis is of course cognate, but how does it demonstrate an underlying *b? Any labial stop followed by *n gives Latin /mn/, cf. *swepno- > somnus. So amnis may simply reflect *h₂ap-ni-, related to *h₂ap-no- (cf. Palaic hāpna- 'river'). I see no reason to label the Italic and Celtic 'river' words "Paleo-European" and separate them artificially from the Indo-Iranian and Anatolian word-family based on the acrostatic root noun *h₂ōp-s/*h₂ap- 'flowing water'.

  14. Remember my former comment about substrate languages? I'm sure you're acquainted with Krahe's "Alteuropäische" aka Old European Hydronymy (OEH). Besides Celtic and Latin, *ɑb- can be found in German river names in -apa, -affa, as pointed out by Krahe himself.

    The word you mentioned is part of a family of 'water' words such as *ɑb-, *ɑkʷ-ā, *up-/*ub-, found in the OEH. For more information, I'd recommend Villar et al. (2001): Lenguas, genes y culturas en la prehistoria de Europa y Asia Suroccidental.

  15. I'm familiar with that branch of research, but it is rather far from my idea of historical linguistics as a discipline based on sound and rigorous methodology. A "family" which includes *ab- ~ *akʷ-ā ~ *up-/*ub-, practically in free variation with each other, could include just about anything else. Especially if a root is so short, making accidental similarity hard to rule out.

    See here, slides 9-10, for a cautionary example.

  16. "I'm familiar with that branch of research, but it is rather far from my idea of historical linguistics as a discipline based on sound and rigorous methodology."
    Your methodology "sees" languages as complete systems with lexicon, morphology, syntax and so on. This is applicable to well-documented languages with large sets of data, but not to fragmentary systems such as substrates and long-range relationships. However, this doesn't mean research on the latter couldn't be as rigorous as on the former, only that it's much more difficult to achieve satisfactory results.

    "A "family" which includes *ab- ~ *akʷ-ā ~ *up-/*ub-, practically in free variation with each other, could include just about anything else."
    Sorry, but I disagree. I'm afraid you threw the baby out with the bathwater.

    As regards "obscurum per obscurius", there's a funny joke. One night, a drunk man was searching in vain for his key under a solitary street lamp in a dark street. A passer-by saw this and asked him: -What are you doing? -I'm searching for my key. -Are you sure you lost it here? -No, but this is the only place where I could see it.

    This illustrates what many historical linguists do: trying to explain the unknown exclusively from what is known, as in Coates's etymology of the toponym London.

  17. Villar regards these 'water' words, as well as *ip-/*ib-, as descending from a common Paleo-IE ancestor language spoken in the Upper Paleolithic (Gravettian period), although I won't go that far. The ablaut pattern of these hydronyms points to a 3-vowel system *i, *ɑ, *u similar to the Semitic one, so a Neolithic chronology can be posited, on account of other lexical correspondences dating from that period.

    In particular, *ɑkʷ-ā would be cognate to the Hittite verb eku-/aku- 'to drink', which shows the standard IE ablaut. This reminds me of Iranian *dānu- 'river' and IE *dhen- 'to flow', which can be linked with Sino-Tibetan *dhɨ̄n/*dhɨ̄ŋ 'to drink, to swallow' and Basque e-dan 'to drink'.

  18. Sorry, Octavià, but I think that in these matters we can only agree to politely disagree. "Fragmentary systems" are not distinguishable from chance agreements and random noise, as far as I'm concerned. It's only wishful thinking that makes people see Palaeolithic substrates and long-range agreements behind them. In my opinion it's better to admit ignorance than to work with insufficient data. But of course it's my approach and I don't question your freedom to experiment with looser methodology.

  20. I'm no longer convinced by Kuryłowicz’ explanation of Ved. píbati etc. from *pí-ph₃e-ti. No laryngeal coloration occurs in Sicel πιβε ‘drink!’, Gaul. ibe ‘id.’, Old Ir. ibid ‘drinks’ < Celt. *fibeti, etc. If *ph₃ yielded *b, the thematic stem *pi-ph₃-e/o- should have been colored *pib-ø/o-. If there was pressure to remodel the colored thematic vowel after the usual uncolored *e, there would equally well have been pressure to remodel the consonantism after the usual reduplicated present type (as later in Lat. bibō). More plausibly *pib-e/o- is a thematization of earlier *pib-. This stem could have been extracted from a 2pl. middle imperative like those of the Vedic 3rd pres. class. Then corresponding to Ved. ju-hu-dhvá-m would have been *pi-ph₃-dʱwé ‘drink ye to one another!’, with indirect reciprocal force of the middle, referring to the drinkers’ tradition of toasting each other’s health. Hackstein’s Law *CH.CC > *C.CC (HS 115:1-22, 2002) would delete *h₃, whereupon *pdʱ would be assimilated to *bdʱ, and the new 2pl. impv. *pib-dʱwé could serve as the basis for a new athematic stem *pib-, later thematized as *pib-e/o- (which yielded the attested Ved. 2pl. mid. impv. piba-dhva-m, 3x RV, 1x AVP). Regeneration of a whole verbal paradigm from an imperative form is documented in Middle Indic. From dehi ‘give!’ (Ved. dehí, Av. dazdi, evidently corresponding to the Grk. aor. impv. δός + -θί, i.e. *dh₃es-dʱí), a new pres. stem de- was extracted, yielding deti ‘gives’ in Pāli and Prākrit. Thus *píbeti provides no basis for presuming that *h₃ or any laryngeal could spread its voicing or unvoicing to an adjacent stop. And so I see no advantage to positing *h₃ rather than *h₁ in the Hoffmann suffix. Hitt. ḫāpa- can have underlying *b, with Celt. *abon- continuing *h₂eb-h₁on-. Lat. vertīgō shouldn’t be regarded as a Hoffmann extension of vertex, -ĭcis, but as a replacement for a fem. *vortī-. For some reason vṛkī́ḥ-feminines became unacceptable in Italic and were repaired by conversion to i-stems (neptis) or jā-stems (avia), or by addition of *-k- (jūnīx, mātrīx, nūtrīx, etc.), *-nā- (gallīna), or *-gōn- (also in virāgō from *virā-, cf. Festus: feminas antiqui ... viras appellabant).

    1. Interesting.

      If there was pressure to remodel the colored thematic vowel after the usual uncolored *e, there would equally well have been pressure to remodel the consonantism after the usual reduplicated present type (as later in Lat. bibō).

      Well, analogy is often irregular...

      The rest of your point about *pib- is quite interesting. On the voicing power of *h₃, though, there's a paper by Robert Woodhouse that I may or may not have time to dig up this weekend.

    2. ...or the one after.

      Robert Woodhouse (2015): Two properties of PIE *h₃. Studia Etymologica Cracoviensia 20: 273–284.

      From p. 274:

      "Indeed I have managed to assemble a small number of examples in which (only) posttonic *h₃ changes an immediately preceding PIE tenuis into the corresponding PIE media (preglottalized voiced stop) in Vedic, Greek, Latin, Celtic and Slavic. The examples, aside from (1) the well known Ved. pres. píbati ‘drink’ : perf. participles papivā́ṃs-, pītá- (*ph₃i-) : Lat. bibō (with analogical initial), Gaul. ibeti-s, are (2) Gk. ὄγδοος ‘8th’ : ὀκτώ ‘8’ (perhaps *h₂ók₁th₃-uh₂o- : *h₂ok₁th₃-éh₁); (3) Gk. κύρβ(ε)ις ‘rotatable inscribed pyramid’ : καρπός ‘wrist’ : (*k₂órph₃is with *o > u by Cowgill’s law followed by delabialization : *k₂rph₃ós) : Lat. corbis ‘basket’, MIr. corb ‘car’, Russ. dial. korób ‘belly’ (with acute), whence Russ. koróbit’³ ‘bend, warp’, (*k₂erph₃- ‘turn’, cf. LIV2: 392), (4) Gk. κτύπος ‘loud noise’ : ἐρίγδουπος ‘loudsounding, thundering’ also, with originally accented augment, aor. ἐγδούπησαν (*-kth₃(e/o)up-)."

      Footnote 3 is strange:

      "A better derivation, as I now see, is directly from the PIE singular *kórph₃-ei having the same structure as *h₁/₃ók-ei (: *h₁/₃k-énti) deduced for the singular stem of Hitt. āk-i/akk- ‘die, be killed’ by Kloekhorst (2008: s.v.), a structure we shall meet again very soon in this paper (p. 253)."

      The notation *k₁, *k₂ is explained in footnote 2:

      "My PIE contains two series of tectals: prevelars = palatovelars k₁, g₁, g₁ʰ, subject to environmentally conditioned loss of the palatal feature, and backvelars k₂, g₂, g₂ʰ with environmentally conditioned labialization (Woodhouse 1998; 2005); a factual demonstration of this latter peculiarity will, I hope, shortly become available."

      I have not managed to get access to those papers.

      From p. 275:

      "[...] I think it is possible to obtain a fifth example of conditioned voicing by *h₃ by deriving Slavic *slě̑p- ‘blind’ and Sl. *slàb- ‘weak’ from the splitting of an ablauting paradigm *slṓph₃ ‘infirm’ with a similar pattern to the one Beekes (1995: 190) proposes for PIE *sṓm ‘one’ and on the assumption that the lengthened grade inhibits acuting by Winter’s law by eliminating the preglottalization just like any other laryngeal (see Kortlandt 1985: 115 on the loss of laryngeals in contact with a preceding lengthened grade vowel), thus:"

      I'm not reproducing the 2D scheme.

      The sixth example is presented in a way that is too long to present here (the whole paper is rather rambling in style); it is "αὐδή ‘voice, sound, speech’", "Ved. vádati, ppp. uditá- ‘speak, say, utter, tell, report’, OCS vaditi ‘accuse’", Latin votō > vetō and an etymon extracted from Hittite and Cuneiform Luvian that is itself too difficult to copy here.