05 July 2013

Global Water for the Last Time

I’m sorry for such a long break since the last post, but the end of the academic year is a busy time. Where were we? Ah, yes, the global etymon meaning ‘water’.

I analysed the Indo-European evidence in some detail to highlight the fact that, although Latin aqua has cognates here and there in Indo-European, its attestation is too weak to treat the word as reconstructible all the way back to Proto-Indo-European. It’s a regional word with uncertain affinities, and surely not the PIE ‘water’ word (there are better candidates for that status). Its story contains a moral: sheer similarity, even within an uncontroversial family, doesn’t mean anything by itself. There is an inherited verb root meaning ‘drink’ which looks tantalisingly similar to aqua (and was once regarded as related to it), but which has to be separated from it, given what we know today. Our improved understanding of some of the languages of the past (such as Hittite and the rest of the Anatolian clade) has forced us to abandon quite a few superficially promising etymologies. And it’s a good thing: it shows that etymologies are in principle falsifiable. All you need is a good model within which they can be evaluated.

Of course absence of evidence is not evidence of absence. It may conceivably happen that a word present in a protolanguage survives only in one language descended from it, or in a small cluster of related languages. In such cases, outgroup comparison may still enable us to recognise the word as inherited. We only need some secure external cognates and a consistent pattern of correspondences. We can’t, however, trust conclusions drawn only from the existence of vaguely similar words scattered across several families, especially if there is no pattern they could fit into because the researchers feel free to avoid real reconstructive work. If you look at Bengtson & Ruhlen (B&R)’s data, you will find many clear examples of “reaching down” (selecting isolated lookalikes and pretending they represent the families in question).

For example, words related to aqua are claimed to be present in Afro-Asiatic, while in fact all the proposed cognates come from two peripheral branches: Omotic (whose very membership in Afro-Asiatic is uncertain) and Cushitic (whose exact location in the AA family tree is anything but clear, but which is areally close to Omotic, so that borrowing between them is hard to rule out). The meaning of the suggested cognates is sometimes ‘water’ (but also ‘[to be] wet’, ‘drink’ or ‘drops of water’). But what about the Berber, Chadic, Egyptian and Semitic branches of Afro-Asiatic, where no such item occurs? What about alternative ‘water’ words which can be found in Cushitic and/or Omotic? (By the way, putative cognates of aqua occur only in North Omotic.) Afro-Asiatic is a big family, with about 300 extant members. With so many languages and “related meanings” to choose from, and with no formal controls, pseudo-cognates crop up inevitably. An Amerind Etymological Dictionary (Greenberg & Ruhlen 2007) lists no fewer than seventeen different etyma meaning ‘water’: *aqʷ’a/*uqʷ’a (of course), but also *man, *poi, *re, *si, *kʷati, *p’ak, *na, *ʔali, *pan, *tuna, *c’i, *kam ~ *kom, *to ~ *do, *kona, *xi, and *hobi (while we’re at it, there are also eight Amerind words for ‘dog’ and thirteen for ‘eye’). These forms are not real comparative reconstructions (their phonetic details are nowhere discussed or justified) and must be treated as approximate, which of course makes comparison as easy as pie, especially if semantics is given as much leeway as phonology.
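The claim that pseudo-cognates "crop up inevitably" can be made concrete with a back-of-the-envelope calculation. The sketch below is only an illustration under loudly stated assumptions: the per-word probability of an accidental lookalike (`p_single = 0.01`) is a made-up figure, and the function name is hypothetical. The 300 languages and the four admissible meanings (‘water’, ‘wet’, ‘drink’, ‘drops of water’) are the figures mentioned above. It treats every candidate word as an independent trial, which real vocabularies are not, so the numbers show the trend rather than an estimate.

```python
def chance_of_spurious_match(p_single, n_languages, n_meanings):
    """Probability that at least one candidate word resembles the target
    purely by accident, treating every (language, meaning) pair as an
    independent trial with the same chance p_single of a lookalike."""
    trials = n_languages * n_meanings
    return 1.0 - (1.0 - p_single) ** trials


if __name__ == "__main__":
    p_single = 0.01  # ASSUMPTION: illustrative value, not an empirical estimate
    for n_languages in (1, 30, 300):
        for n_meanings in (1, 4):
            prob = chance_of_spurious_match(p_single, n_languages, n_meanings)
            print(f"{n_languages:>3} languages x {n_meanings} meanings: "
                  f"P(at least one lookalike) = {prob:.3f}")
```

Even with a small per-word chance, the probability of finding at least one "match" approaches certainty once hundreds of languages and several meanings are admitted, which is why a hit or two in a large family proves nothing without regular correspondences.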

Lost in distillation
[Source: Wikimedia]
If you don’t reconstruct past sound changes, how can you decide whether, e.g., French eau (pronounced /o/) is related to Spanish agua, or whether both of them are related to Romanian apă? Note that these three modern Romance languages began to diverge less than two thousand years ago. Their modern ‘water’ words are already more different from the common ancestor (yes, Latin aqua) than the latter is from, say, some of the “Amerind” forms cited by B&R. Sound change may be rapid and dramatic. What, then, constitutes a “match” if you are comparing languages supposedly separated by 10,000 or 20,000 years of independent development, and if you can’t even be bothered to study systematic sound correspondences or morphological patterns? Ignorance helps you to see patterns that knowledge dispels at once. In Kove, one of the Austronesian languages of New Britain (in the Bismarck Archipelago), water is called eau. If we knew less than we do about the history of French (or Kove, for that matter), we might suspect a long-range connection, mightn’t we? Is Proto-Pama-Nyungan *nguku/i (which should replace B&R’s anachronistic “Proto-Australian” *gugu) related to Lat. aqua? Well, if I am shown a serious etymological proposal, with the relevant sound changes, morphological derivations and semantic shifts (if any) all spelt out, I’ll tell you what I think of it. Untestable guesswork hardly deserves to be discussed.
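To see how badly raw similarity performs, one can measure it. The following is a minimal sketch, not a method anyone in the literature is claimed to use: it computes plain Levenshtein (edit) distance between rough segment strings. The transcriptions are simplifying assumptions made for illustration (Latin aqua as `akwa`, Spanish agua as `aɣwa`, Romanian apă as `apə`, French eau as `o`, and the "Amerind" *aqʷ’a stripped down to `aqwa`), and the names `levenshtein`, `LATIN` and `FORMS` are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-segment insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


# ASSUMPTION: crude one-character-per-segment transcriptions, for illustration only.
LATIN = "akwa"
FORMS = {
    "Spanish agua": "aɣwa",
    "Romanian apă": "apə",
    "French eau": "o",
    "'Amerind' *aqʷ’a": "aqwa",
}

if __name__ == "__main__":
    for label, form in FORMS.items():
        print(f"{label:<18} distance from Latin aqua: {levenshtein(LATIN, form)}")
```

On these assumed transcriptions the unrelated "Amerind" form comes out at distance 1 from Latin, the same as Spanish, while the genuine cognates in Romanian and French come out at 3 and 4. A surface metric ranks a lookalike above real descendants, which is exactly why sound correspondences, not resemblance, have to carry the argument.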

A “cognate” like “Proto-Central-Algonquian *akwā ‘from water’” may look impressive until one learns that the actual root, Proto-Algonquian *akw- (the *-ā came from the wrong segmentation of an Algonquian compound) means ‘ashore, out of the water’ (indicating location or direction rather than the place of origin) and that the real Algonquian ‘water’ term is *nepyi (for details, as well as for a full review of other Algonquian data cited by B&R, see Marc Picard 1998). But of course there are so many “Amerind” ‘water’ words that *nepyi could even be decomposed into more than one of them (e.g. *na + *poi).

Impressionistic comparison without any regard for methodological rigour will invariably produce the same outcome: a haphazard collection of words from, say, a dozen families and a few dozen languages (out of the world’s several thousand) which look vaguely similar and have vaguely similar meanings. How should one formulate a relationship proposal based on such evidence, so that other people could evaluate it? Surely not by listing the putative cognates and saying “look!” in the hope that the raw unanalysed evidence will speak for itself. But “global etymologists” do just that. They promise that someone, sometime, will carry out the actual comparative work, but they also claim that their data stand even without it. That’s wishful thinking, pure and simple.


  1. Now, Amerind *poi is certainly cognate with PIE "drink". How can it not be? ;-)

  2. Absolutely. In my own mother tongue the word for 'a drink' is napój, which must be a close cognate of Algonquian 'water'. ;)

  3. I tracked down that Amerind etymological dictionary ( http://www.scribd.com/doc/29833923/An-Amerind-Etymological-Dictionary ), and... did they just presuppose a phonological system and map every word they came across to it?

    It's just so... disjointed. Reaching.

    "This book is dedicated to the Amerind people, the first Americans."

    My god. Forgive me but they are no more a people than Eurasians. They're many peoples. Many many many varied peoples, of distinct and fundamentally very different cultural backgrounds.

    Never mind that long before this was published we've known that the Americas were peopled in waves divided by eons. Never mind that even a single wave doesn't imply that the wave was monoglottic. Never mind the vast vast range of "adaptedness" differing cultures demonstrate (signs of recent intrusion etc). Never mind that outside the Americas similar cultures are often very deeply unrelated and dissimilar cultures are often quite closely related. They're all the same to the authors, so they have to be fit into a presupposed Amerind hypothesis, even if you have to resort to division signs to make your arbitrary trail of sound laws work.

    Looking through this "dictionary"... I don't know if it's the people I've known or what but this is just... so patronizingly simple minded.

  4. "A “cognate” like “Proto-Central-Algonquian *akwā ‘from water’” may look impressive until one learns that the actual root, Proto-Algonquian *akw- (the *-ā came from the wrong segmentation of an Algonquian compound)..."

    This is a persistent problem for these claims. If you don't know enough about a language to figure out what is a root and what isn't, because you have too light a grip on the morphological processes or whatever, then you are going to make this kind of mistake over and over again.

    Another kind of mistake comes from not knowing or not caring to know the history of an etymon, as you mention in the article, so that you find all kinds of false cognates based on misidentifications of etyma. This is a problem especially in languages like Mandarin where there has been a lot of attrition of consonants, so much that tonal distinctions cannot possibly disambiguate. Here's an example: the syllable 'bi'.

    First tone: http://www.mdbg.net/chindict/chindict.php?page=worddict&wdrst=0&wdqb=bi1
    Second tone: http://www.mdbg.net/chindict/chindict.php?page=worddict&wdrst=0&wdqb=bi2
    Third tone: http://www.mdbg.net/chindict/chindict.php?page=worddict&wdrst=0&wdqb=bi3
    Fourth tone: http://www.mdbg.net/chindict/chindict.php?page=worddict&wdrst=0&wdqb=bi4

    These hundred or so separate etyma are by no means all the words pronounced 'bi' in one tone or the other.

    And of course some of these change meaning - sometimes actually etymologically related, oftentimes not at all - if the tone changes. I am talking about the same graph. Each one of these has to be run down, one by one, to get to what is actually going on with that etymon. And as I said, sometimes you will find that you are dealing with one or more etyma in one graph.

    And if you are half-slick you will pay attention to the phonetic portion of the graphs, because that *often* gives you solid information about the phonetic shape of the word a couple of millennia or so ago. But to be fully slick you have to remember that this is not always going to be the case, and never reliably; people made a lot of uninformed substitutions over the centuries, depending on the pronunciation of their particular dialect at the time. Which itself is a form of information, but about a different question.

  5. The Middle and Old Chinese columns in Baxter and Sagart's table (which features many of those "bi" items) give you some idea how all those mergers happened (uncertain as the OC reconstruction is).

  6. I love that chart. It makes OC look like a real language. You can really hear in your mind what people were saying in the Classics.

    It also looks a lot like a lot of proto-TB reconstructions.

    1. That's because the system was developed using all sorts of evidence, including external (Sino-Tibetan) comparison.

    2. External comparison was explicitly not used. Pretty much every other conceivable source was.

    3. You are right. In section 2.7 of Old Chinese: A new reconstruction ("Tibeto-Burman") the authors explicitly argue that Old Chinese phonology must be reconstructed based on inner Chinese comparative evidence. This evidence includes all the outlying Chinese languages not derivable from Middle Chinese: the Min dialects (and Norman's new reconstruction of Proto-Min), Hakka and Waxiang, as well as early loans from Chinese into other languages.

    4. By the way, the link above no longer works. Here is an updated one.

  7. Another thing - it looks like OC had about the same level of homophony that modern English has - a lot but with just enough convenient distance in the distribution of homophones to keep the system from collapsing into a need for tones.