16 May 2013

Eurasiatic: A Wild Pursuit (2)

The only content word cited by Pagel et al. (2013) with a putative cognate class size of more than four is ‘give’. The proposed Eurasiatic reconstruction is *dwV[H]V, with an optional “laryngeal” and two wildcard vowels. The reconstruction of a labial-velar glide is not justified explicitly (one can only guess that without it the Altaic initial would be different). PIE *deh₃- ‘give’ (phonetically *doh₃-, with laryngeal colouring) is a widely attested root aorist. Proto-Uralic *toɣe ‘bring’ is a very nice match and indeed is frequently quoted as possible evidence of “Indo-Uralic” kinship (alternatively, it could be an old loan from IE). Ironically, if Pagel et al. had really observed their “identical meaning” requirement as strictly as declared, they would have been forced to disqualify this match as not quite exact: the Uralic meaning is primarily ‘bring’, not ‘give’. The other cognates cannot be taken seriously: the Altaic ones (apart from formal problems) have to do more with feasting than with giving, the Eskimo one means ‘sell’ rather than ‘give’, and the Dravidian one means ‘bring’ in most languages of that family. In both Eskimo and Dravidian, unlike Uralic, the only potential sound correspondence involves the initial (a dental stop, one of the most common classes of sound cross-linguistically). To claim that we are dealing with an item represented in five families out of seven is a massive overstatement. It’s really more like two families (and even there some semantic leeway must be excused).
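To make the arithmetic concrete, here is a minimal sketch of how the apparent class size for ‘give’ depends on how much semantic leeway is allowed. The glosses are simply the ones characterised in the paragraph above, reduced to one word per family; they are not taken from the database, and the function name `class_size` is a hypothetical helper invented for illustration. The sketch looks at meaning only and ignores the formal (sound-correspondence) problems entirely.

```python
# Glosses as characterised in the paragraph above (one word per family);
# an illustrative simplification, not data drawn from the database itself.
GLOSSES = {
    "Indo-European": "give",
    "Uralic": "bring",
    "Altaic": "feast",
    "Eskimo": "sell",
    "Dravidian": "bring",
}


def class_size(glosses, accepted):
    """Count the families whose gloss falls within the accepted meanings."""
    return sum(1 for gloss in glosses.values() if gloss in accepted)


# Strict "identical meaning": only 'give' counts.
strict = class_size(GLOSSES, {"give"})

# Some leeway: 'bring' is tolerated as well.
lenient = class_size(GLOSSES, {"give", "bring"})

# Anything vaguely transactional is tolerated.
relaxed = class_size(GLOSSES, {"give", "bring", "sell", "feast"})

print(strict, lenient, relaxed)
```

On meaning alone, the count moves from one family to five depending on nothing but the tolerance chosen. The Dravidian ‘bring’ passes the lenient semantic filter but still fails on the formal side, which is why two families is the more defensible figure.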

All the remaining content words in the list are said to have cognate class sizes of four (whatever their estimated universal frequency of use). Even they, however, present various problems. For example, the ‘mother’ word (presumably *ʔVmV ‘mother, woman’ in the database, with a glottal stop that is nowhere attested, and vowels that may be present or absent at will) is a fancy cover reconstruction for a collection of obvious nursery noises coopted as kinship terms. ‘Mother’ words like /mama/, /ma/, /eme/, /ana/, /aɲa/, /aja/ etc. are likely to be re-created independently in different languages. They are common on all inhabited continents for reasons discussed already by Roman Jakobson in 1959 and by innumerable linguists since. Though babbling sequences involving /m/ preferentially refer to ‘mother’ (note the almost self-explanatory fact that Latin mamma means ‘nipple’), they may also occasionally be applied to other members of the family, like Manchu ama or Georgian mama, both meaning ‘father’. Linguists routinely disqualify nursery words as comparative evidence (unless a root that might be of such an origin is extended with non-iconic suffixes elevating it to the status of a “regular” word).

Thors goin wormin
The meaning ‘worm’ is too imprecise to be useful in inter-family comparison. English worm (like bug) may be applied to innumerable invertebrates. It was still vaguer in the early Germanic languages, where it could describe anything that crept or crawled, including legendary dragons and serpents (such as the mighty Midgarðsormr of the outer Ocean, shown on the left). The comparative material under Eurasiatic *ḲorV- ‘worm’ in the database is aswarm with miscellaneous vermin: houseflies, gadflies, maggots and other larvae, fleas, crickets, wasps, spiders, leeches, and even eels – words referring mostly to particular taxa rather than “worms” in general. Despite this semantic latitude the formal correspondences are far from impressive. One is reminded of the apocryphal definition of etymology as une science où les voyelles ne font rien et les consonnes fort peu de chose.

There would be little point in analysing the remaining items one by one. If anyone is interested in discussing them (or any other sins against linguistics committed in the article), it can easily be done in the Comments section. The examples above were chosen more or less at random, and are not necessarily the most problematic ones; they merely illustrate some of the obvious problems. What is quite evident is that the database does not allow one to define well-bounded cognate sets, and that the size of those sets is very easy to inflate by relaxing formal or semantic criteria even minimally in a way not controlled in the study. The “data” so concocted are simply unusable. If a statistical method applied to them seems to confirm the researchers’ expectations, it’s probably because the expectations are already encoded in the reconstructions used as data (which is part of the reason why reconstructions are not supposed to be so used!). The authors of the article consider such a possibility but do not really take any precautions against it. With such input, the only signal a statistical analysis will detect comes from confirmation bias.

  1. "the Uralic meaning is primarily ‘bring’, not ‘give’. The other cognates cannot be taken seriously: the Altaic ones (apart from formal problems) have to do more with feasting than with giving,"

    Actually this one is plausible, if it is supposed to refer to the action of bringing something to a feast for gift exchange. It's kind of all related. But it's also kind of loose and there's no way to confirm it without other information. And even if you do have other information and forms, it gets pretty circular pretty fast.

    And this piece:
    "Proto-Uralic *toɣe ‘bring’ is a very nice match and indeed is frequently quoted as possible evidence of “Indo-Uralic” kinship (alternatively, it could be an old loan from IE)."

    caught my eye because I thought you were going to say people try to relate it to that TVK word family - tow, tug, take. And whoever wrote the Uralo-Siberian wiki relates it to a reconstruction in Proto-Eskimo-Aleut. Still kind of thin.

    The other problem with this is that the sememe "give" is so vague and unbounded as to just melt off in all directions. The most common Chinese equivalent, gei3, also means to benefit someone and is used as a co-verb in that sense. Here it resembles equivalent words in some California languages where the object of "give" is the person receiving the gift, not the gift. And that is a goodly semantic distance from "bring".

  2. That reminds me of one of the silly remarks in the article -- that frequently used words tend to have more precise meanings. "Two" is given as an example, as opposed to "burn". But actually the most common English content words (leaving aside grammatical particles) are say, one, get, make, like, time, know, take, people..., all of them with extremely fuzzy meanings. Well, the next one is year, and that's relatively concrete.

    Some languages mix up the semes 'give' and 'get', the way 'teach' and 'learn' or 'lend' and 'borrow' are confusible.

  3. semes > sememes, excuse the haplology.

  4. Looking closely at the LWED, it seems S. Starostin et al. invoke semantic shift on an ad hoc basis. Or at least they do implicitly. Which is a pity - from my non-expert position, they appear at least to have a hypothesis that's viable, even if their methodology is frankly atrocious.

    Semes themselves I find fuzzy. Is there an authoritative list of semes and good documentation of the methodology?

    Again, I don't know much. Wikipedia isn't so useful on this matter, I find.

    If one could create some good lists of semes (I would assume through comparing how use lines up with as many other languages as possible, somewhat like Patrick Dinneen's Foclóir Gaedhilge agus Béarla but also with Cantonese, Hawaiian, Zulu, etc.), then is there any truly in-depth study of semantic change using this?

    What I mean is examining historically the "shuffling" of semes across morphemes in, say, English as it acquired (amongst other things) a Romance register, or similar parallels with the influence of Chinese on CJKV or of Arabic on Urdu, Farsi, and Turkish, or better, examining a modern language in the midst of a great period of transition, and using that to establish laws of semantic change in similar fashion to phonetics.

    It seems that could really remedy their methodological issues, a great deal at least. Not being aware of the issues in reconstructing "to give" when the best understood daughters are full of examples of that very word changing meaning and being replaced seems incurable to me, though.

    1. "Is there an authoritative list of semes "

      I doubt that one is possible. Sememes are not unitary, they are composed of smaller sememes which are composed of other sememes, one of which may well be the sememe you started out to analyze. It's circular and a good example of dependent origination.

      For example, would you say that "carry", "bring" or "take" are sememes? Well, that doesn't fit the evidence in Chinese or Navajo. In Chinese you have a cluster of verbs that describe different ways of holding something - on the back, in one hand, in both hands, by a belt or strap, on the head. To construct those other three verbs, you build a verb chain with those verbs of holding and then directional complements. From what I understand this is how it works in Navajo too.

      Archimedes said if you gave him a spot to stand on he could move the world. Well, there is no spot to stand, nowhere to start, and he never moved the world.

  5. The Polish-Australian linguist Anna Wierzbicka has done some interesting work on semantic primes, but I think the whole area is still at a pre-theoretic stage and we are far from understanding patterns of semantic variation and semantic shifts (apart from being able to describe and classify them).

    The correct term is "sememe", by the way, but my little typo seems to have produced a viable replicator ;)

  6. I actually thought semes were more primal than sememes; that sememes were like atoms, and semes were like leptons or quarks (as per the current understanding). Or another way to think of it: sememes as operations with semes as operands. Evidently not. But that's what I get for trusting my education to Wikipedia.

    With that, thanks. Next time I'm at a good library I'll look her up.