19 July 2013

Consider a Hammer

Figure 1: Co-opting a natural object
According to Wikipedia, “A hammer is a tool meant to deliver an impact to an object. The most common uses for hammers are to drive nails, fit parts, forge metal and break apart objects. Hammers are often designed for a specific purpose, and vary in their shape and structure.” Hammers have been shaped by the functions they typically perform. A heavy metal head fixed on a light handle stores kinetic energy before the blow is delivered. The length, cross-section and shape of the handle are ergonomically adapted to human handgrip and typical working conditions. There are functionally motivated differences between, say, a light claw hammer used for driving and removing nails and a heavy-duty sledgehammer used for tearing down walls.

Figure 2: Putting a handle on it
The ancestors of all hammers were natural cobbles used as hammerstones by Palaeolithic humans (as well as earlier hominins). They carried out some of the same functions as modern hammers, albeit less efficiently. There was no handle (its role was played by the user’s arm), and hammerstones used for different purposes had the same general shape, differing mostly in size and weight. Small gradual improvements and occasional major inventions (a wooden handle, the use of bronze or steel instead of stone) transformed the primitive tool visible in Figure 1 into a more sophisticated version (Figure 2), and finally into a fully streamlined modern hammer (Figure 3).

Figure 3: Shaped by its functions
Of course a hammer can be used for many other purposes besides pounding nails into things or splitting hard objects. It can serve as a makeshift paperweight, a percussion instrument (as in Penderecki’s De Natura Sonoris No. 2), an improvised weapon, and even as a ritual or ceremonial object – for example, the emblem of a smithing god. Such accidental functions do not normally influence the evolution of hammers. If a type of hammer acquires a historically stable secondary function (e.g. removing nails), you can see the characteristic adaptations (a flattened and rounded claw) copied and perfected by new generations of hammer manufacturers. Ad hoc functions have no such consequences. Nobody modifies the shape of a hammer to make it a better paperweight. It’s only when a hammer is regularly recruited for a new task that adaptation begins to shape it in a new way. This may lead to the emergence of highly specialised hammers (such as the judge’s gavel or the doctor’s knee mallet).

The first hammers were not designed by anyone. Our distant ancestors learnt to select naturally formed stones. Then they learnt to improve their shape, fix them on a handle to optimise energy transmission, etc. The functional features of a hammer are those that have been consistently selected for in the past. It’s always possible to use a tool in an unconventional way, but such occasional applications don’t explain why the tool looks the way it does. Some features (for example, the colour of the handle) are free to vary. They are non-adaptive, devoid of functional importance.

I hope you can see how this hammer analogy can be applied to linguistic structures. That’s what will be done in the next post.

18 July 2013

Who Benefits from Language Change?

Since functionalism treats language as a tool designed and perfected by humans to serve their needs, it understands function as a purpose-oriented property of linguistic structures: it is a way of achieving a communicative aim by linguistic means. Language is fine-tuned to optimise communication, which means, among other things, that the natural conflict between the speaker’s needs (encoding and sending linguistic messages at a low cost) and the listener’s needs (receiving and decoding messages without unnecessary effort) must be resolved. Languages maintain a delicate balance between ease of production and ease of perception. For example, precise enunciation is expensive in terms of articulatory effort and neuromuscular control, but if the speaker tries to reduce this cost excessively by sacrificing precision, the result may be the listener’s failure to understand the message. Since having to say a sentence twice is usually costlier than saying it once with sufficient clarity, the speaker has to anticipate possible difficulties at the listener’s end, and the tendency to favour ease of articulation is mitigated by those anticipations.

To whose benefit?
Artist: Matthew Martin
Language change can make life minimally easier for the speaker or the listener. Sound changes are often classified into “lenitions” (weakenings) and “fortitions” (strengthenings). Weakenings consist in reducing articulatory effort (and the acoustic prominence of speech sounds), while strengthenings involve increased effort (and acoustic prominence). In this dualist interpretation, weakenings are speaker-oriented, while strengthenings are listener-oriented. Any change has a purpose, and therefore a functional significance – all that needs to be determined is its orientation: cui bono?

Note, however, that an explanatory statement like ‘/t/-glottaling occurs in some accents of English because it is a speaker-friendly articulatory weakening’ is hard to falsify. Whatever happens to the phonetic realisation of /t/, you can always “explain” it in a circular fashion as an attempt to improve either ease of production or ease of perception. A change can’t be functionally neutral simply because there’s no place for such a thing in the functionalist view of language. It would be nice if we could predict when change will be driven by the speaker’s or the listener’s needs (or when nothing happens). If instead we identify the motivating factor after the fact, depending on the outcome, it’s an “either way I win” kind of game, where you can explain everything but predict nothing. Of course there are some characteristic cross-linguistic “hotspots” of change: weakenings are more likely in unstressed environments or syllable-finally; strengthenings happen more often under stress and syllable-initially. This kind of conditioning, however, is sensitive to the segmental and prosodic context rather than the needs of language users.

Then, there are classificatory problems. In non-rhotic varieties of English, final or preconsonantal /r/ becomes vocalised. If preceded by a full vowel, it coalesces with it, causing the vowel to undergo lengthening and/or diphthongisation (e.g. /kard/ > /kɑːd/ ‘card’, /niːr/ > /nɪə/ ‘near’). Whose life is made easier by this change? Is it weakening, strengthening, or six of one and half a dozen of the other? Doesn’t the increased length/complexity of the syllable nucleus compensate for the consonant loss? What about the fact that the phonemic inventory of non-rhotic English may become larger and more complex as a result? If both the speaker and the listener lose something and gain something else at the same time, why bother changing anything? Why does this kind of change spread at all if there’s no clear net gain from it for anybody?

There are accents of American English where /æ/ is tensed, raised and diphthongised, becoming [eə]. This can be regarded as phonological reinforcement, and therefore a kind of strengthening. The vowel becomes more salient, which might benefit the listener. But in most varieties of American English the change is restricted to certain environments: some accents have it only before nasals, others before nasals and voiceless fricatives, and still others before nasals, voiceless fricatives, and voiced stops (often with lexical exceptions). Why is the presumed anticipation of the listener’s needs selective in this way?

Some “functions” are self-evident. It is obvious that the function of a word is to carry a lexical meaning and a syntactic role (sometimes more than one). There are no completely functionless words practically by definition. But what, for instance, is the function of the final /st/ in amongst (synonymous with among)? Whose convenience does it serve? If semantic change takes place, as when Old English cniht ‘boy’ developed into Middle English knyght ‘knight, nobleman’, how does one measure its impact on communication? If this particular shift was motivated by some functional pressure, I would like to hear the details.

In the next post I shall try to re-define function in such a way that it becomes less teleological and more distinguishable from accidental byproducts of linguistic evolution. Please be prepared to consider the possibility that language structure is not entirely rational, functional, or intelligently designed.

16 July 2013

Language as Clockwork

Proto-World was fun, wasn’t it? But there’s little I can add to the topic. If any readers of this blog would like to continue discussing mass comparison and global etymologies, they are welcome to do so in the comment boxes in that thread. Let’s change the perspective again and focus on linguistic microevolution. In the near future I would like to discuss the following things: the notion of “function” in linguistics, and two fundamental mechanisms of evolution: adaptive change and random drift.

Functional approaches to language emphasise the view of language as an instrument of human communication and social interaction. Therefore, functional factors such as people’s communicative needs (and in particular considerations of iconicity, economy, and ease of processing) are thought to exert influence on the course of language change: some changes are advantageous for effective communication and therefore encouraged by functional motivations, while others are deleterious and therefore discouraged. There is an understandable tendency among functionally inclined linguists to regard all elements of language as functional in some sense (like the interlocking parts of a carefully designed clockwork mechanism), and to insist that any explanation of language change should assume the form of a functionally motivated scenario (change happens for a “reason”). The idea that a language system can be to a large extent messy and basically functionless, and that much of language change is random and neutral (or as nearly neutral as matters) with respect to its users’ needs and goals, flies in the face of the tenets of functionalism, and so may seem provocative to many mainstream linguists. It will be defended here, but first I shall take a closer look at the fuzzy concept of “function” and the role it plays in linguistics. This is what the next post will be about.

05 July 2013

Global Water for the Last Time

I’m sorry for such a long break since the last post, but the end of the academic year is a busy time. Where were we? Ah, yes, the global etymon meaning ‘water’.

I analysed the Indo-European evidence in some detail to highlight the fact that, although Latin aqua has cognates here and there in Indo-European, its attestation is too weak to treat the word as reconstructible all the way back to Proto-Indo-European. It’s a regional word with uncertain affinities, and surely not the PIE ‘water’ word (there are better candidates for that status). Its story contains a moral: sheer similarity, even within an uncontroversial family, doesn’t mean anything by itself. There is an inherited verb root meaning ‘drink’ which looks tantalisingly similar to aqua (and was once regarded as related to it), but which has to be separated from it, given what we know today. Our improved understanding of some of the languages of the past (such as Hittite and the rest of the Anatolian clade) has forced us to abandon quite a few superficially promising etymologies. And it’s a good thing: it shows that etymologies are in principle falsifiable. All you need is a good model within which they can be evaluated.

Of course absence of evidence is not evidence of absence. It may conceivably happen that a word present in a protolanguage survives only in one language descended from it, or in a small cluster of related languages. In such cases, outgroup comparison may still enable us to recognise the word as inherited. We only need some secure external cognates and a consistent pattern of correspondences. We can’t, however, trust conclusions drawn only from the existence of vaguely similar words scattered across several families, especially if there is no pattern they could fit into because the researchers feel free to avoid real reconstructive work. If you look at Bengtson & Ruhlen (B&R)’s data, you will find many clear examples of “reaching down” (selecting isolated lookalikes and pretending they represent the families in question).

For example, words related to aqua are claimed to be present in Afro-Asiatic, while in fact all the proposed cognates come from two peripheral branches: Omotic (whose very membership in Afro-Asiatic is uncertain) and Cushitic (whose exact location in the AA family tree is anything but clear, but which is areally close to Omotic, so that borrowing between them is hard to rule out). The meaning of the suggested cognates is sometimes ‘water’ (but also ‘[to be] wet’, ‘drink’ or ‘drops of water’). But what about the Berber, Chadic, Egyptian and Semitic branches of Afro-Asiatic, where no such item occurs? What about alternative ‘water’ words which can be found in Cushitic and/or Omotic? (By the way, putative cognates of aqua occur only in North Omotic.) Afro-Asiatic is a big family, with about 300 extant members. With so many languages and “related meanings” to choose from, and with no formal controls, pseudo-cognates crop up inevitably. An Amerind Etymological Dictionary (Greenberg & Ruhlen 2007) lists no fewer than seventeen different etyma meaning ‘water’: *aqʷ’a/*uqʷ’a (of course), but also *man, *poi, *re, *si, *kʷati, *p’ak, *na, *ʔali, *pan, *tuna, *c’i, *kam ~ *kom, *to ~ *do, *kona, *xi, and *hobi (while we’re at it, there are also eight Amerind words for ‘dog’ and thirteen for ‘eye’). These forms are not real comparative reconstructions (their phonetic details are nowhere discussed or justified) and must be treated as approximate, which of course makes comparison as easy as pie, especially if semantics is given as much leeway as phonology.
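A back-of-the-envelope calculation shows why lookalikes are all but guaranteed. The Python sketch below is only a toy model. It assumes, purely for illustration, that each word in each language has a small fixed probability of resembling a loose template like AQ’WA by accident, and that these chances are independent. The counts (about 300 languages, four “related meanings”) come from the paragraph above; the 1 per cent probability is a made-up value, not a measurement.

```python
# Toy model of chance resemblances in unconstrained lexical comparison.
# ASSUMPTION: p_chance (probability that one word loosely matches the
# target template by accident) is an invented illustrative value.


def expected_lookalikes(n_languages: int, meanings: int, p_chance: float) -> float:
    """Expected number of accidental matches: one trial per language per meaning."""
    return n_languages * meanings * p_chance


def prob_at_least_one(n_languages: int, meanings: int, p_chance: float) -> float:
    """Probability of at least one accidental match, assuming independent trials."""
    trials = n_languages * meanings
    return 1 - (1 - p_chance) ** trials


# About 300 Afro-Asiatic languages; 'water', 'wet', 'drink', 'drops of water'.
n, m, p = 300, 4, 0.01
print(f"expected lookalikes: {expected_lookalikes(n, m, p):.1f}")
print(f"P(at least one):     {prob_at_least_one(n, m, p):.6f}")
```

Even with an assumed probability ten times smaller, the expected count stays above one. Without formal controls, a handful of “cognates” is simply what chance delivers.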

Lost in distillation
[Source: Wikimedia]
If you don’t reconstruct past sound changes, how can you decide whether, e.g., French eau (pronounced /o/) is related to Spanish agua, or whether both of them are related to Romanian apă? Note that these three modern Romance languages began to diverge less than two thousand years ago. Their modern ‘water’ words are already more different from the common ancestor (yes, Latin aqua) than the latter is from, say, some of the “Amerind” forms cited by B&R. Sound change may be rapid and dramatic. What, then, constitutes a “match” if you are comparing languages supposedly separated by 10,000 or 20,000 years of independent development, and if you can’t even be bothered to study systematic sound correspondences or morphological patterns? Ignorance helps you to see patterns that knowledge dispels at once. In Kove, one of the Austronesian languages of New Britain (in the Bismarck Archipelago), water is called eau. If we knew less than we do about the history of French (or Kove, for that matter), we might suspect a long-range connection, mightn’t we? Is Proto-Pama-Nyungan *nguku/i (which should replace B&R’s anachronistic “Proto-Australian” *gugu) related to Lat. aqua? Well, if I am shown a serious etymological proposal, with the relevant sound changes, morphological derivations and semantic shifts (if any) all spelt out, I’ll tell you what I think of it. Untestable guesswork hardly deserves to be discussed.

A “cognate” like “Proto-Central-Algonquian *akwā ‘from water’” may look impressive until one learns that the actual root, Proto-Algonquian *akw- (the form cited by B&R results from a wrong segmentation of an Algonquian compound), means ‘ashore, out of the water’ (indicating location or direction rather than the place of origin) and that the real Algonquian ‘water’ term is *nepyi (for details, as well as for a full review of other Algonquian data cited by B&R, see Marc Picard 1998). But of course there are so many “Amerind” ‘water’ words that *nepyi could even be decomposed into more than one of them (e.g. *na + *poi).

Impressionistic comparison without any regard for methodological rigour will invariably produce the same outcome: a haphazard collection of words from, say, a dozen families and a few dozen languages (out of the world’s several thousand) which look vaguely similar and have vaguely similar meanings. How should one formulate a relationship proposal based on such evidence, so that other people could evaluate it? Surely not by listing the putative cognates and saying “look!” in the hope that the raw unanalysed evidence will speak for itself. But “global etymologists” do just that. They promise that someone, sometime, will carry out the actual comparative work, but they also claim that their data stand even without it. That’s wishful thinking, pure and simple.

10 June 2013

A Water Word that Wasn’t There

The last item on Bengtson & Ruhlen’s list of “global etymologies” is ʔAQ’WA ‘water’. What can hardly escape anybody’s attention is its uncanny similarity to one of those Latin words which are the common currency of our civilisation: aqua, as in aquarium, aqueduct, or BonAqua. One knows such words even without the benefit of a good classical education. Is it possible that an ancient “global” word survived virtually unchanged in Latin? 

To be sure, Bengtson and Ruhlen don’t actually reconstruct their global proto-words. They claim that the glosses offered in the article “are intended merely to characterize the most general meaning and phonological shape of each root”. Nevertheless, the “phonological shape” looks pretty specific, complete with such fancy details as an initial glottal stop and a medial uvular ejective. Are those segments there because there is some solid evidence for them, or are they simply ornamental? Never mind. We shall look at the global data next time. Today let’s only examine the putative Indo-European reflex of ʔAQ’WA. We have already seen how the comparative method works, so let’s apply it again. 

Bengtson & Ruhlen cite the following forms to support the PIE reconstruction *akʷā-:
  • Anatolian: Hittite eku-, Luwian aku-, Palaic aḫ- ‘drink’ [somewhat sloppy and not quite correct, see below] 
  • Italic: Latin aqua ‘water’ 
  • Germanic: Gothic ahwa ‘river’ [found elsewhere in Germanic as well] 
  • Tocharian A yok- ‘drink’ [Toch. A and B, as a matter of fact] 
At first blush, the evidence looks impressive. The word (or at least its root) occurs in four branches of IE, including Anatolian and Tocharian. That should be enough to guarantee that we are dealing with a PIE lexical item. To be sure, the meaning ‘water’ occurs only in Latin; the Germanic cognate means ‘river’, and Anatolian and Tocharian only have a verb meaning ‘drink’. If the noun and the verb were related, it would be interesting to analyse the relationship and make sure that the meaning ‘water’ is indeed old and not derived within IE. That will not be necessary, however, because the words are not related in the first place. 

Hitt. 3sg. ekuzi ~ eukzi, 3pl. akuanzi ‘drink’ (+ Palaic ahu- and Cuneiform Luwian u-) may only reflect a root with a voiced consonant (a voiceless one would have been spelt -kk-, not -k-, in Hittite). We can connect them via regular sound correspondences with Latin ēbrius ‘drunk’ and Greek nḗpʰō ‘be sober’ (= ‘not-drink’, with the IE negative particle *n(e)-). The Anatolian verb forms might go back to a plain root present *h₁égʷʰ-ti, *h₁gʷʰ-énti, but Tocharian AB yok- and the Latin adjective require a long vowel; the jury is still out on whether we should posit a PIE lengthened-grade root *h₁ēgʷʰ- or a reduplicated stem, *h₁é-h₁gʷʰ- (or even something still more complex). A couple of things seem clear, though. The root-final consonant is *gʷʰ, not *kʷ, and the initial laryngeal is *h₁ (the one that doesn’t colour an adjacent short vowel). This is enough to exclude any connection with aqua or its Germanic cognates. One might add that apart from *h₁egʷʰ- we also find the widespread perfective verb *poh₃(i)- ‘drink’ (also in Anatolian, with the meaning ‘swallow, gulp down’). As reflexes of *h₁egʷʰ- clearly refer to drunkenness at least in Latin and Greek, perhaps its original meaning was ‘get drunk’ (on something more intoxicating than water) rather than simply ‘drink’. 
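The spelling argument can be made mechanical. The Python sketch below is a deliberately simplified version of the rule usually called Sturtevant’s law: between vowels, PIE voiceless stops tend to be written double in Hittite cuneiform, voiced (and voiced aspirated) stops single. The stop inventories and function names are illustrative choices, and real Hittite spellings have exceptions that this ignores.

```python
# Simplified Sturtevant's law: expected intervocalic spelling of a PIE stop in Hittite.
VOICELESS = {"p", "t", "k", "ḱ", "kʷ"}
VOICED = {"b", "d", "g", "ǵ", "gʷ", "bʰ", "dʰ", "gʰ", "ǵʰ", "gʷʰ"}


def expected_spelling(stop: str) -> str:
    """'double' for voiceless stops (e.g. -kk-, -pp-), 'single' for voiced ones."""
    if stop in VOICELESS:
        return "double"
    if stop in VOICED:
        return "single"
    raise ValueError(f"not a PIE stop in this sketch: {stop}")


def compatible(proposed_stop: str, attested_spelling: str) -> bool:
    """Is a proposed PIE stop consistent with the attested Hittite spelling?"""
    return expected_spelling(proposed_stop) == attested_spelling


# Hitt. ekuzi, akuanzi 'drink' are spelt with a single -k-.
print(compatible("kʷ", "single"))   # False: *akʷā- would require -kk-
print(compatible("gʷʰ", "single"))  # True: consistent with *h₁egʷʰ-
```

The same check rules out a plain *p behind a singly spelt -p-, which is why a voiced labial is posited for some Anatolian ‘river’ words.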

Not real water
We are left with Latin aqua ‘water’ and Germanic *axʷō ‘river’ (a perfect formal match combined with a difference in meaning). Possible traces of a Celtic word reconstructible as *akʷā are few and hardly substantial: they include several European river-names ending in -apa (which might or might not be a Gaulish cognate of aqua, not confirmed by any Gaulish text), and a single occurrence of -akua as part of a longer sequence in an unclear Celtiberian inscription, where the context doesn’t rule out the meaning ‘river’ (but neither does it demand such an interpretation). By contrast, Germanic *axʷō is abundantly attested (Goth. aƕa, Old High German and Old Saxon aha, Old Frisian ā ~ ē, Old English ēa, Old Norse á). All the reflexes mean ‘running water, stream, river’, which shows that PGmc. *axʷō was roughly synonymous with PIE *h₂ap-h₃on- and possibly replaced the latter term in the prehistory of Germanic. The word-family represented by English water, German Wasser and Gothic wato was not affected. In Latin, on the other hand, aqua completely ousted *wodr̥ ~ *udōr/*udn-, etc., and became the ordinary word for ‘water’ (including “tame” water for drinking or washing). 

Germanic also displays some interesting derivatives, such as *aujō ‘island; meadow-land’ from earlier *aɣʷjō < pre-Germanic *akʷjā́ (ON ey, OE īġ ~ īeġ). This word formed the first member of the OE compound īġ-lond > ModE island (which owes its mute s to false association with Old French isle, an unrelated but accidentally similar word derived from Latin īnsula). The compound, by the way, outcompeted the free-standing word: in Middle English the element ei ~ i ~ ie was common in placenames, but no longer in isolation. As regards its further derivatives, we have OE īġoþ ‘islet, small island’ (hence modern ait ~ eyot, used mostly with reference to the topography of the Thames). Finally, Germanic *ēɣ⁽ʷ⁾ijaz (cf. the ON ocean-giant Ægir, OE ǣġ(e) ‘island, sea, sea-coast’) may be related provided that the word is old enough to reflect some characteristic “special effects” of laryngeal colouring: Lat. a- and Gmc. *a- would together point to an initial *h₂a-, but *ē-, if cognate, would imply an old lengthened grade *h₂ē-, immune to the a-colouring effect of *h₂. All this is highly speculative, especially in the absence of any uncontroversial cognates of aqua outside Latin and Germanic. The IE reconstruction *h₂ákʷah₂ is often encountered in the linguistic literature. While not impossible, it is hardly warranted by the comparative evidence. Moreover, even if the word is genuinely old within IE, neither Latin nor Germanic can tell us if we should reconstruct an intervocalic *-kʷ- or *-ḱw-. If the latter, one might attempt to connect the ‘river/water’ word with the IE adjective meaning ‘swift, fast’ (traditionally reconstructed as *ōḱú-, with an initial *ō which conceals some puzzling combination of PIE vowels and laryngeals, not yet unravelled to everyone’s satisfaction). In that case, however, we must posit an evolutionary chain like ‘swift’ → ‘rapid current’ → ‘river’ → ‘water’ to account for the semantics. 
If there’s any truth in this suggestion, the meaning ‘water’ is highly derived, and there was originally nothing aquatic about the PIE root that produced the Latin and Germanic terms. 
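The laryngeal argument about *a- and *ē- can be sketched in a few lines of Python. This is a simplification under stated assumptions: it handles only a word-initial laryngeal before the root vowel, models colouring followed by loss of the laryngeal, and encodes as an assumption the claim made above that a lengthened-grade *ē resisted a-colouring (a rule often called Eichner’s law, which not all Indo-Europeanists accept).

```python
# Colouring of a following short *e by a word-initial PIE laryngeal, then laryngeal loss.
COLOUR = {"h₁": "e", "h₂": "a", "h₃": "o"}  # colour imposed on a following short *e


def initial_outcome(laryngeal: str, vowel: str) -> str:
    """Return the vowel that surfaces after colouring and loss of the laryngeal."""
    if vowel == "e":
        return COLOUR[laryngeal]  # short *e takes on the laryngeal's colour
    # ASSUMPTION: lengthened-grade *ē (and any *o) is not recoloured.
    return vowel


for lar, vow in [("h₂", "e"), ("h₂", "ē"), ("h₁", "e")]:
    print(f"*{lar}{vow}- > {initial_outcome(lar, vow)}-")
```

On these assumptions Lat. a- and Gmc. *a- are compatible with *h₂ plus a short vowel, while a Germanic *ē- could be reconciled with them only through a lengthened grade *h₂ē-.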

I have only touched upon the problems surrounding aqua and its kin. A full discussion would not change the bottom line: *akʷā (or any laryngeally revamped version thereof) is not a valid PIE reconstruction. The words we find in Germanic and Latin are regional, not common Indo-European. Their pedigree is uncertain; they may be loans from an unidentified pre-IE substrate (in which case their deeper history is unknowable for lack of data). If they are derived from an internal IE source, then in all likelihood the link with streams, rivers, and finally water as a substance is a late product of semantic evolution. The Anatolian and Tocharian words for ‘drinking’ belong to a totally different word-family despite their misleading resemblance. The famous Hittite phrase wātar⸗ma ekutteni ‘and you will drink water’ (part of the sentence that triggered Hrozný’s eureka experience) does contain a cognate of English water, but not one of Latin aqua.


05 June 2013

A Wiki-Wiki Interlude

This is not about water, but it is too good to miss.

Hawai‘ian phonology is simple, but its history is fascinating. Proto-Eastern Polynesian *k was shifted to a (phonemic) glottal stop /ʔ/ in Hawai‘ian (that is what the inverted comma in Hawai‘i stands for), which left the coronal stop *t with a lot of free space to expand into (there were no other stops or fricatives articulated with the involvement of any part of the tongue). As a result, most of the allophones of *t migrated away from their original point of articulation, towards the soft palate, until *t basically changed into /k/, reaching the position vacated by the old shifted velar. To be more precise, today [k] is the main phonetic realisation of /k/ (former *t), but in some positions the pronunciation may still be [t], and in fact just about any non-labial and non-glottal obstruent (stop, fricative or affricate) may be employed as an allophone of /k/.

Thanks to this highly unusual place-of-articulation shift the Central East Polynesian adjective *witi ‘quick, lively’ became Hawai‘ian wiki (mind you, it can still be pronounced ['witi] or ['viti], but the shifted pronunciation ['wiki] brings it phonetically closer to English quick and increases the odds of its being picked up by an English-speaker). Thus was born one of the most successful linguistic replicators of today. For centuries the virus was more or less confined to its insular homeland, but in the mid-1990s it infected the mind of an American computer programmer visiting the islands. Before long, all major language communities had their Wikis. There is of course a Hawai‘ian one as well!
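The order of the two changes matters, and a toy rewrite system shows why. In the Python sketch below each sound change is a plain string substitution applied to the whole word, which ignores allophony and every other detail of Hawai‘ian phonology. The input *kata is a made-up form, used only to show what happens to an old *k.

```python
# Ordered sound changes modelled as whole-word substitutions (a toy model).
def apply_rules(form: str, rules: list[tuple[str, str]]) -> str:
    """Apply each (old, new) substitution in order to the whole form."""
    for old, new in rules:
        form = form.replace(old, new)
    return form


CHAIN = [("k", "ʔ"), ("t", "k")]     # historical order: *k > ʔ first, then *t > k
REVERSED = [("t", "k"), ("k", "ʔ")]  # the opposite order, for comparison

print(apply_rules("witi", CHAIN))     # wiki
print(apply_rules("kata", CHAIN))     # ʔaka  (hypothetical input)
print(apply_rules("kata", REVERSED))  # ʔaʔa  (old *k and *t merge)
```

In the historical order *t moves into a slot that is already empty, so the contrast survives: a chain shift rather than a merger.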

I want to thank Lara Prescott for bringing this beautiful infographic presentation to my attention, and I hasten to share it wiki-wiki.

01 June 2013

Wild Waters

I apologise in advance if what you find below is technical and hard to follow, but I am still talking of the comparative method. If you prefer something easy, I recommend mass comparison.

Old Indic ap- ‘water’ is a curious word. It is a feminine root noun (its stem is a bare root morpheme with no suffix), and Indo-European root nouns are generally interesting. They are primitive formations, inherited rather than borrowed, often charmingly irregular and likely to reveal some little secrets on close examination. To begin with, the declension of ap- is somewhat defective. Some of its case forms in the singular are not attested at all, and those that are occur exclusively in the archaic Vedic dialect, while Classical Sanskrit knows only plural forms. The stem has two variants, strong āp- (nom.pl. ā́pas) and weak ap- (gen.sg. apás, loc.pl. apsú, etc.). A similar pattern can be seen in the Iranian languages, especially Avestan, where the nom.sg. āfš (< *āp-s) is preserved beside acc.sg. āpəm, nom.pl. āpō, contrasting with the weak stem of gen.sg. apō, gen.pl. apąm, etc. The pattern looks like a slightly reworked acrostatic paradigm, possibly *Hóp-/*Hép-, where *H is one of the PIE “laryngeals”. The original declension would have been like this:
  • nom.sg. *Hṓp-s
  • acc.sg. *Hóp-m̥
  • gen.sg. *Hép-s (→ *Hép-os → *Hep-ós, on the analogy of mobile stems)
  • nom.pl. *Hóp-es

One would expect the normal IE lengthening of the root vowel *o in the nom.sg.; in the acc.sg., voc.sg., and nom./voc. pl. the inherited *o would have occurred in an open syllable, a context in which it would have been affected by the Indo-Iranian lengthening known as Brugmann’s Law. In other case forms we presumably have something other than *o (so the laryngeal should be either the non-colouring *h₁ or the a-colouring *h₂). The presence of an initial laryngeal is demonstrated by vowel lengthening visible in compounds like Skt. dvīpá- ‘island’ < *dwi-Hp-ó- ‘with water on either side’. For reasons that will become clear in a moment, most specialists reconstruct the root as *{h₂ep-}, which, assuming an acrostatic paradigm, would have resulted in nom.sg. *h₂ṓps, gen.sg. *h₂áp(o)s, nom.pl. *h₂ópes. The Indo-Iranian word may mean not only ‘water’ (natural fresh water in lakes or rivers), but also the “celestial waters”, i.e. the sky, as well as “the Waters” personified as deities.
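The derivation of the Indo-Iranian forms from the acrostatic paradigm can be spelt out as an ordered sequence of rules. The Python sketch below is a simplification: words are lists of segments (so that *h₂ counts as one segment), accent is ignored, the genitive uses the analogical ending *-os, the loc.pl. input assumes the usual PIE ending *-su, and only four steps are modelled: a-colouring of *e by *h₂, Brugmann’s Law (*o > ā in an open syllable), the Indo-Iranian merger of *e, *o, *a as a (and of their long counterparts as ā), and loss of the laryngeal. The rule names are illustrative.

```python
# Ordered rules deriving Indo-Iranian forms from a PIE acrostatic root noun (simplified).
VOWELS = {"a", "e", "o", "u", "ā", "ē", "ō"}
LARYNGEALS = {"h₁", "h₂", "h₃"}
MERGER = {"e": "a", "o": "a", "ē": "ā", "ō": "ā"}


def a_colouring(segs):
    """*h₂ turns an immediately following short *e into *a."""
    out = list(segs)
    for i in range(len(out) - 1):
        if out[i] == "h₂" and out[i + 1] == "e":
            out[i + 1] = "a"
    return out


def brugmann(segs):
    """Short *o followed by one consonant and then a vowel (open syllable) > ā."""
    out = list(segs)
    for i in range(len(out) - 2):
        if out[i] == "o" and out[i + 1] not in VOWELS and out[i + 2] in VOWELS:
            out[i] = "ā"
    return out


def merge_vowels(segs):
    return [MERGER.get(s, s) for s in segs]


def lose_laryngeals(segs):
    return [s for s in segs if s not in LARYNGEALS]


def derive(segs):
    for rule in (a_colouring, brugmann, merge_vowels, lose_laryngeals):
        segs = rule(segs)
    return "".join(segs)


paradigm = {
    "nom.sg.": ["h₂", "ō", "p", "s"],       # cf. Av. āfš < *āp-s
    "gen.sg.": ["h₂", "e", "p", "o", "s"],  # analogical *h₂ep-ós, cf. Ved. apás
    "loc.pl.": ["h₂", "e", "p", "s", "u"],  # cf. Ved. apsú
    "nom.pl.": ["h₂", "o", "p", "e", "s"],  # cf. Ved. ā́pas
}
for case, form in paradigm.items():
    print(case, derive(form))
```

The outputs (āps, apas, apsu, āpas) lack accent marks, and the Avestan nom.sg. shows later Iranian developments not modelled here. Note that Brugmann’s Law must apply before the vowel merger, or there would be no *o left for it to lengthen.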

Outside of Indo-Iranian, we have a nice Tocharian cognate (Toch.A/B āp- f. ‘water, river’, with a vowel that could reflect *ō or *a), and a few more doubtful ones: Old Prussian ape ‘stream’, as if from *h₂ap-ijah₂, cf. Vedic ápya- ‘aquatic’ (similar words in Lithuanian and Latvian begin with u-, which makes comparison problematic). No forms with a reflex of *e are visible anywhere, which favours the reconstruction of *h₂ as the initial.

There are also a number of possibly related words in Italic, Celtic and Anatolian, which mean ‘river, stream’ and present some characteristic problems as a group. In Anatolian, we find Hittite hapas, Palaic hāpna-, Cuneiform Luwian hāpa/i- (all meaning ‘river’), and the Lycian verb χba(i)- ‘to water, irrigate’ (plus a cognate verb in Hittite, apparently borrowed from Luwian). Together, they would confirm the reconstruction of the initial laryngeal as *h₂ (*h₁ was not preserved in Anatolian, and word-initial *h₃ seems to have been lost in Lycian). Unfortunately, the medial stop in Anatolian cannot reflect *p, whose outcome would have been rendered as -pp-; a single spelling reflects a PIE voiced stop. That’s why the root underlying the Anatolian words is often reconstructed as *h₂abʰ-, not *h₂ap- (and not *h₂ab- either, since *b was vanishingly rare or even non-existent in PIE).

The wild waters of one of the British Avons (Devon)
[hat tip: Simon and Fiona]
Latin amnis ‘river’ could reflect *h₂ap-ni- (with a regular nasal assimilation), but if related to Palaic hāpna-, it would be better analysed as *h₂abʰ-ni- (which would have yielded the same Latin outcome). This seems to be confirmed by the Celtic nasal stem *abon- (Old Irish aub < *abū < *abō(n) ‘river’) and its synonymous derivative *abonā (Welsh afon), known from a number of tautological hydronyms in Britain (the River Avon is literally ‘the River River’). It would seem, therefore, that we actually have two “watery” roots, *h₂ap-, found in Indo-Iranian and Tocharian (with possible trace attestation elsewhere), and *h₂abʰ- (less likely *h₂ab-) in Anatolian, Latin, and Celtic. The distribution is puzzling and the roots are suspiciously similar, but *p and *b(ʰ) do not vary freely in the same morpheme in PIE. Are the roots different and their similarity accidental? Or is it some kind of aberrant dialectal variation in the protolanguage? Such variation is often taken for granted by etymological dictionaries, but it’s clearly a case of relaxing the sound standards of comparison. It would be much nicer to be able to unify the etymologies without special pleading.

A possible connection between the two variants was suggested by Eric Hamp in 1972. PIE had a quasi-possessive suffix first described by Karl Hoffmann back in 1955 and named after him. The shape of the Hoffmann suffix is *-Hon-/*-Hn-. Hoffmann himself supposed that the initial laryngeal was *h₁ (probably = IPA [h]), but some identify it as *h₃. There’s little evidence either way, to be sure, but it has long been known that *h₃ may be responsible for voicing a preceding obstruent (hence the idea that *h₃ was a voiced fricative, IPA [ɣ] or the like). The best example is the reduplicated present stem *pí-ph₃-e/o- > *píbe/o- ‘drink’ (from the root *{peh₃(i)-}). Hamp proposed that *abon- reflected *h₂abh₃on- ‘having/carrying water’, i.e. *h₂ap- extended with the Hoffmann suffix. The Latin and Palaic forms would be analysable as derivatives of the same word: *h₂ab(h₃)n-o- ~ *h₂ab(h₃)n-i-.
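Hamp’s proposal amounts to two ordered rules, sketched here in Python under the assumption (disputed, as noted) that the Hoffmann suffix began with *h₃ and that *h₃ voiced an immediately preceding voiceless stop. The same rules are run on the reduplicated *pí-ph₃-e/o- ‘drink’ as a check. Accent is ignored, and the function names are illustrative.

```python
# ASSUMPTION (Hamp's hypothesis): *h₃ voices an immediately preceding voiceless stop.
VOICING = {"p": "b", "t": "d", "k": "g"}
LARYNGEALS = {"h₁", "h₂", "h₃"}


def h3_voicing(segs):
    """Voice a voiceless stop that stands directly before *h₃."""
    out = list(segs)
    for i in range(len(out) - 1):
        if out[i + 1] == "h₃" and out[i] in VOICING:
            out[i] = VOICING[out[i]]
    return out


def lose_laryngeals(segs):
    """Later loss of the laryngeals (as in Celtic)."""
    return [s for s in segs if s not in LARYNGEALS]


river = ["h₂", "a", "p"] + ["h₃", "o", "n"]  # root *h₂ap- + Hoffmann suffix *-h₃on-
drink = ["p", "i", "p", "h₃", "e"]           # reduplicated present *pí-ph₃-e-

for word in (river, drink):
    voiced = h3_voicing(word)
    print("".join(word), ">", "".join(voiced), ">", "".join(lose_laryngeals(voiced)))
```

The first line ends in abon, matching the Celtic nasal stem *abon-; the second ends in pibe, the expected *píbe/o-. Voicing has to precede laryngeal loss, since the trigger disappears afterwards.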

But what about Hittite hapas, which does not seem to contain the Hoffmann suffix? Well, it may contain it after all. PIE *h₂abh₃on- would have become pre-Hittite *xaban- (*h₃ was lost word-medially in Anatolian). But there was a strong tendency in Hittite for animate n-stems to adopt a-stem inflections. The pivot of the change was the nom.sg., which lost its final *-n early (already in PIE) but acquired a secondary -s in Anatolian on the analogy of other types of animate stems; cf. *h₃ór-ō(n) ‘eagle’, acc.sg. *h₃ór-on-m̥ > Hitt. nom.sg. hāras, acc.sg. hāran-an (n-stem) → hāra-n (a-stem). Indeed, the Hittite ‘river’ word is attested several times with n-stem endings, which lends credence to the hypothesis that hapa- is an original n-stem (Proto-Anatolian *xábō(-s)/*xabn-), and is in fact an exact cognate of Old Irish aub.

Thus the reconstruction of the Hoffmann suffix as *-h₃on-/*-h₃n-, with a laryngeal that triggers voicing in a preceding voiceless segment, allows us to derive all the forms under discussion from one acrostatic root noun *h₂óp-/*h₂áp-. A slightly different alternative solution, also possible though more controversial, would be *h₂ā́p-/*h₂áp-, with an acrostatic *ā/a alternation (fundamental rather than due to laryngeal colouring; some Indo-Europeanists deny the existence of such a pattern). In either case the weak stem is *h₂ap-, and we really can’t know whether the Indo-Iranian long vowel in the strong cases reflects *o lengthened by Brugmann’s Law, or inherited *ā. I’ll tentatively accept the former possibility (without ruling out the latter). The root noun itself is attested securely but less widely than its most important derivative, *h₂ap-h₃on- > *h₂ab(h₃)on- ‘river’. On the whole, the analysis sketched above is weaker than the reconstruction of *wódr̥/*wédn-. Some linguists do not find the identification of the laryngeal in the Hoffmann suffix as *h₃ convincing, and are happy with the reconstruction of alternative roots (or root variants) for ‘water/river’. To my mind, Hamp’s solution is elegant and parsimonious (it prevents us from positing extra variants beyond necessity).

Note that the gender of *h₂ṓp-s/*h₂áp- is feminine in Indo-Iranian (animate in PIE terms), as opposed to the neuter (inanimate) gender of PIE *wódr̥/*wédn-. The distribution of both words and their derivatives (in both primary subfamilies of Indo-European, sometimes in one and the same branch, and without any geographical restrictions – from Ireland to India, Central Asia and Chinese Turkestan) guarantees protolanguage status for both of them. The gender difference, the mythological significance of Indo-Iranian *Hap- (not shared with *udan-), and the fact that *h₂ap- seems to have been preferentially used in other IE branches to derive words with the meaning ‘river, stream’, suggest that the words were not quite synonymous, and that the Indo-Europeans may have been like the modern Hopi Indians in having two separate concepts corresponding to English water: ‘tame water’ contained for human use (like Hopi kuuyi) versus ‘wild water’ as a natural force beyond human control (like Hopi paahu). It’s the latter kind that could be personified or even deified. Note the potential problem for long-range research: even “Swadesh” meanings are not necessarily as fundamental as we tend to imagine. If one wants to compare the IE ‘water’ terms with putative external cognates, the question arises which aspect of ‘water’ is more representative of H₂O. All right, then: which of the two do mass-comparatists mean when they talk of “the PIE word for water”? Surprisingly, neither, as we shall see next time.

30 May 2013

The Water Story

When during the First World War the Czech orientalist Bedřich Hrozný was copying cuneiform inscriptions from the Hittite royal archive, deposited at the Imperial Ottoman Museum in Constantinople, it suddenly dawned on him that the still enigmatic language was Indo-European. One of the first words that he was able to interpret was wa-a-tar (wātar) ‘water’. Hrozný already knew that the preceding clause meant something like ‘and you will eat bread...’, so ‘drink water’ certainly made sense as a continuation. Of course even the occurrence of a familiar-looking word in the right context doesn’t mean much by itself, but the newly excavated Hittite corpus was sizeable and Hrozný was soon able to understand large fragments of the texts and identify (not always correctly) more Indo-European material in them – from pronouns and sentence particles to verbs, nouns and adjectives.

The similarity of wātar to words for ‘water’ in other branches of IE is not accidental, and the word is inherited from a common ancestor rather than borrowed. We can say so with confidence, and not simply because the sound correspondences look fine. The ‘water’ word is declined in Hittite, with inflectional endings familiar from elsewhere. What’s more, the declension of wātar is irregular in an interesting way: the stem has the variant witen- in the oblique cases (such as the gen.sg. witenas), and its nom./acc.pl. is witār. Those Hittite alternations can be traced back to a reconstructed pattern like *wódr̥, *wedén-os, *wedṓr – with vowel substitutions, accent shifts, and a characteristic *r/n alternation in the suffix, found also in neuter stems in other morphologically conservative IE languages.

Hittite preserves a unique variety of stem variants in one paradigm. Other IE languages have levelled them out at least partly:
  • Greek has húdōr, gen.sg. húdatos, nom./acc.pl. húdata. The a of the suffix in the oblique cases and in the plural reflects a pre-Greek syllabic nasal (*ud-n̥-t-os, *ud-n̥-t-ah₂, with an extra -t- that is a Greek innovation), which means that the *r/n alternation is indirectly reflected there, but the root syllable has a fixed shape (the full vowel *e/o was deleted, leaving */wd-/ = *ud-); also the accent is fixed on the initial syllable.
  • In Vedic, only a few isolated case forms of the word survived (loc.sg. udán ~ udáni, gen.sg. udnás, nom./acc.pl. udā́), with alternations restricted to the suffix, as in Greek, but with the word accent anywhere but on the root, quite unlike Greek.
  • In Germanic, the root syllable has the same full vowel throughout (*wat-, reflecting older *wod-); the *r/n alternation is still visible, but the variants with *r and *n are segregated among different Germanic languages (cf. Old English wæter vs. ON vatn, both remodelled as vowel-final stems: *wat-r-a- vs. *wat-n-a-). Gothic, in which the stem remained consonantal, generalised the nasal variant at the expense of *r: nom.sg. wato, gen.sg. watins, dat.pl. watnam (as if from pre-Gmc. *wod-ōn, *wod-en-, *wod-n-).
  • In Baltic the suffix has a nasal, but there is also another nasal, curiously infixed in the root, presumably due to the generalisation of anticipated nasality: *wod-n-/*ud-n- > vand-/und-, as in Lithuanian vanduõ, acc.sg. vándenį, Latvian ûdens, Old Prussian wundan, unds.
  • Some of the other IE languages also preserve traces of the noun (Slavic *voda, Umbrian utur, abl.sg. une, etc.), and numerous words derived from the stem *w(V)d-(V)n/r- appear even in those languages in which the primary noun has been lost, cf. Latin unda ‘wave’ < *ud-n-ah₂ (with a metathesis common in Latin and convergent with what we see in the declension of the Baltic ‘water’ word).
Otter < OE oter < PGmc. *utraz < *ud-r-o-s
It’s a nice jumble of forms, not even quite compatible across related languages because of the independent fixation of different innovations along different branches of the family tree. It took the efforts and accumulated insight of several generations of Indo-Europeanists (culminating in the work of late 20th-century scholars such as Jochem Schindler) to explain their complicated evolution in detail. 

The hypothetical common starting point is an “acrostatic” neuter noun with an *o/e alternation (see here for a similar case): nom./acc. sg. *wód-r̥, oblique *wéd-n-, collective pl. *wéd-ōr (from a still earlier *wéd-or-h₂, where *h₂ was a collective ending, lost already in PIE after a stem-final liquid or nasal but causing the compensatory lengthening of the vowel of the stem-forming suffix). The *e in the root syllable was the “weak” counterpart of the “strong” grade *o. But PIE *e was ambiguous, because it could also represent the strong grade of some roots, whose weak variants lacked the vowel. On the analogy of such roots, new weak stems were created, with the *e deleted and the accent shifted to another syllable: collective *udṓr, oblique *udén- (especially in the loc.sg.) or *udn- (followed by an accented inflectional ending). The collective plural (‘waters’ = ‘a vast quantity of water’) was occasionally reinterpreted as a singular mass noun. Its declension was then remade as follows: nom./acc.sg. *wédōr or *udṓr, gen.sg. *udn-és.

Such analogical remodelling must have taken place already in the common ancestor of all the IE languages, and was continued after the breakup of Indo-European unity. The state of affairs visible in Hittite is archaic, but only in a relative sense. In the ancestor of Hittite the noun became accentually mobile – for example, the old gen.sg. *wéd-n̥-s was replaced by *wed-én-(o)s on the analogy of nouns with a shifting accent – but no new weak grade was generated in the process.

[Table: the emergence of variant paradigms (acrostatic → mobile; collective → singular; rows for nom./acc. and coll.)]

The emergence of new variant paradigms is schematically shown in the table above. The forms on the left are the oldest ones; those in the last column illustrate some post-PIE developments (as reflected e.g. in Germanic). It is important to realise that there must have been considerable variation (rather than a single paradigm) already in the most recent common ancestor of the known IE languages. That variation supplied the raw material for later developments, which could be compared to independent attempts to assemble a new vase from the scattered fragments of several broken ones. Alternative PIE paradigms, each of them too complex to survive in the long run, were mixed up, reorganised, and independently simplified in the daughter languages. We understand the process rather well because the ‘water’ word fits into a more general pattern together with other words of a similar structure, and their evolution is part of a still grander model of inflection and phonological alternation in PIE nouns. Despite its complexity it’s not an arbitrary just-so story but a coherent and well-constrained theory explaining a large segment of PIE grammar. We don’t know everything about its prehistory. For example, we are still very much in the dark about the origin of the *o/e alternation in acrostatic roots. We take the left-hand column of the table as the point of departure because it represents the earliest stage we can safely reach given our current understanding of Proto-Indo-European.

To sum up, the fact that Hittite wātar is similar to English water is interesting but not particularly impressive as an isolated observation. Similarities can be found between any languages chosen at random. It’s far more significant that the inflectional pattern visible in Hittite helps us to understand the origin of the diversity displayed by cognate ‘water’ words elsewhere in the IE family and is part of the evidence used in the reconstruction of the PIE morphological system. It’s those pervasive shared patterns that demonstrate the membership of Hittite in the IE family.

But wait a minute... I promised to discuss the global etymon ʔAQ’WA, right? Why am I talking of PIE *wódr̥ instead? Well, because it’s the best-attested IE word for ‘water’, supported by a wide array of comparative evidence. Anyone trying to establish a genetic relationship between IE and other language families had better keep this in mind. But surely there are other ‘water’ words in IE that are possible candidates for PIE status and could be of interest to long-rangers? Perhaps, but they’ll be discussed in a separate post. We are taking a roundabout route to ʔAQ’WA, but we’ll eventually get there.


26 May 2013

Water, Water Everywhere: Back to Global Etymologies

The Eurasiatic interlude was longer than I had originally planned. It’s time to return to Proto-World and “global etymologies”. Few things are more instructive than a nicely dissected example, so I shall compare different approaches to analysing genetic relationships and illustrate them with real data.

No matter how severely we criticise the long-range reconstructions of Nostratic/Eurasiatic, they are proposed by scholars who respect the standard comparative method and appreciate its importance for separating signal from noise. According to the mainstream approach, it is not enough to observe that numerous pairs of words across two languages are similar in form and meaning. One ought to analyse the similarities carefully in order to decide whether they are more likely the consequence of common ancestry than of non-genetic factors such as horizontal diffusion (borrowing), functional convergence (onomatopoeia, etc.), or blind chance. Attempts to meet the accepted standards in inter-family comparison may fail, but at least there are people courageous enough to accept the challenge.

M. C. Escher, Rippled surface (1950)
But there is also a different approach, called multilateral comparison (a.k.a. mass comparison), according to which genetic relationships can be (and indeed have always been) established without assembling regular sound correspondences and reconstructions. To classify a set of languages (the larger the better) one only needs a collection of tabulated data (a list of basic vocabulary and grammatical morphemes for each language will suffice), a good eye for spotting patterns, and some general linguistic training (as opposed to the expert knowledge of some of the languages being compared). It doesn’t really matter if the evidence is partly corrupt or incomplete: as long as there’s plenty of it, its cumulative weight makes errors cancel out. Finding lexical matches across a large number of languages requires no analytic skills or painstaking detective work: enough evidence leaps out at you from the printed page as you eyeball it. Classificatory conclusions can be drawn simply from inspecting the data, with a confidence approaching certainty.

The best-known advocate of multilateral comparison was Joseph H. Greenberg (1915–2001), who used it famously to classify all the languages of Africa into four genetic stocks, and then to hypothesise that all the native languages of the New World with the exception of the Eskimo-Aleut and Na-Dene families formed one vast macrofamily, dubbed “Amerind”. He was also the original proponent of “Eurasiatic” – a hypothetical genetic grouping similar to the older concept of “Nostratic”, though not identical with it. Greenberg’s successors have boldly extended his methodology to the study of the world’s languages, not only grouping them into one global phylogeny, but also arriving at twenty-seven examples of “global etymologies” labelled with approximate reconstructions (Bengtson & Ruhlen 1998). This is quite surprising, since according to their own principles comparative reconstruction is a separate technical task, not required for a correct classification. Nevertheless, mass-comparatists often propose impressionistic reconstructions, and even compile etymological dictionaries where hundreds of such reconstructions are offered (cf. Greenberg & Ruhlen 2007). They may be marked with an asterisk just like the legitimate products of the comparative method – a practice bound to confuse a non-specialist by creating the impression that some actual reconstructive work has been done.

In the posts to follow I shall focus on Bengtson & Ruhlen’s Global Etymology #27, ʔAQ’WA ‘water’. I intend to show, first, how Indo-European words meaning ‘water’ are analysed with the help of the standard comparative method; then, how Nostratic linguists handle data extracted from several families (including IE) to reconstruct a putative common proto-word at the macrofamily level; and finally, how mass-comparatists identify a global etymology (and restore the form of the corresponding word).

Bengtson, John D. & Merritt Ruhlen. 1998. “Global etymologies”. In Merritt Ruhlen, On the Origin of Languages: Studies in Linguistic Taxonomy. Stanford, CA: Stanford University Press.
Greenberg, Joseph H. & Merritt Ruhlen. 2007. An Amerind Etymological Dictionary. Stanford, CA: Stanford University Press.

20 May 2013

A Special Question on Quechua

One of the Quechua words cited in Table 1 in my Inca Connection post is genuinely related to something Indo-European (though not to what the Eurasiatic pseudo-cognate would imply). Which one? Please post your suggestions as comments below. I shall discuss the answers (if any) about this time tomorrow.
Oops, please ignore this challenge. I thought kuchuy 'cut' was somehow back-formed from kuchillu 'knife' (Spanish cuchillo < Lat. cultellus), but it seems that I was wrong: the verb is native and the similarity is deceptive, as in the case of English cook vs. cookie.

18 May 2013

The Inca Connection: A Quechua Word Game

Gather round and I’ll show you a magic trick. Watch my hands, but first look at Table 1 below. It is based on a 200-word Swadesh list for Southern Quechua and the Tower of Babel “Eurasiatic” etymologies:

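Table 1: Southern Quechua words matched with “Eurasiatic” etymologies (22 numbered pairs). Surviving entries include Quechua mana... chu ‘not (negation)’ and ama... chu ‘not (prohibition)’ (cf. Eurasiatic *ma, *ʔVnV), and matches glossed ‘what (interr.)’, ‘bark (of a tree)’, ‘bark, skin’, ‘far, next’, ‘thick, dense’, ‘tongue, speak’, ‘feather, tail’, ‘(a kind of) fish’, and ‘thick, swell’.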

There are only twenty-two matches because I got bored too soon, but it’s an easy game. One can even formulate some preliminary “regular correspondences” (supported by a few cognate pairs each!). For example, Eurasiatic liquids (laterals and rhotics) generally merge in Quechua, yielding /r/ (8, 11, 12, 14, 20, 22), but before certain consonants (laryngeals and semivowels) liquids are reflected as palatal /ʎ/, spelt ll (13, 17). Eurasiatic affricates are generally preserved as such, yielding Quechua ch /tʃ/ (4, 16, 17), but we also have one example of a velar stop palatalised and affricated before a front vowel (20) and possibly one more (1) if chu is related to PIE *kʷe (but I can’t say at this stage why the *e is reflected as /u/). Before the low vowel /a/ Eurasiatic dorsals become uvular /q/ in Quechua (6, 7, 8, 13). There are sporadic exceptions (2, 11) and one occurrence of a uvular before /u/ (22), but come on, folks, you can’t expect me to solve all problems in one fell swoop with so little material.

No comment (Aaarrrrrgh!)
I think I have already demonstrated beyond reasonable doubt that the Quechua people are a lost Nostratic tribe. Note that the semantic matches are impeccable and the similarity of the words is quite obvious to any open-minded observer. Indeed, the matches are much better than many of those in the LWED. The quality of examples 1, 2, 3, 4, 5, 6, and 9, in particular, is guaranteed by the fact that they represent statistically certified ultraconserved Eurasiatic vocabulary (Pagel et al. 2013). The famous items ‘mother’, ‘bark’, and ‘worm’ are among them. In many Eurasiatic languages the words for ‘bark’ and ‘skin’ are the same or look related (6, 7). This seems to be true of Quechua as well, but just in order to probe every possibility, I can offer an alternative etymology of qara ‘skin’ (8, from a different Eurasiatic root), in which case its homophony with qara ‘bark’ must be accidental. A nice match either way.

But there is more to Quechua than just its Eurasiatic affinities. It seems to be particularly close to Proto-Indo-European. Compare the Quechua numerals pichqa ‘5’ and suqta ‘6’ = PIE *penkʷe, *sweḱs, clearly a common Indo-Quechuan innovation not shared with any other Eurasiatic group. I can’t reveal too much at present, but mark my words: you’ll read about it in Nature one day – or Science, perhaps, or PNAS.
