25 April 2013

One Big Family? Evidence, Please


Question 2: Are all the recorded languages ultimately related?
We normally say that two languages are related if they go back to a common ancestor. But we have already seen that “common ancestry” is a tricky notion in the case of entities with easily permeable fuzzy boundaries. To say that, for example, Latin and Sanskrit had a common ancestor is shorthand for saying that the most conservative core of their lexicons consists of linguistic replicators whose genealogies can be traced back to “the same” speech community, delimited in space and time. The core is quite thick, to be sure – several hundred lexical items with cognates in the other language. They show evidence of having undergone the characteristic sound changes that have affected each lineage during its separate history, they display similar inflectional patterns, etc. They are homological structures, not mere lookalikes. But the fact remains that both Latin and Sanskrit also contain thousands of lexical units whose history is more complicated. Some have been borrowed from one branch of Indo-European into another, others come from outside the family. For example, Latin has loanwords borrowed from Etruscan, a non-Indo-European language of ancient Italy (in addition to inner-IE loans from Greek, Gaulish, the extinct Italic languages, etc.); Sanskrit in turn has numerous words apparently imported from extinct and otherwise lost ancient languages – the enigmatic linguistic substrates of Central Asia and the Indian subcontinent.
The “core” vocabulary tends to erode away with the passage of time.  The branches of the family trees we reconstruct do not represent complete languages but only their most durable cores, which get thinner and thinner as we run our reconstruction back in time. Proto-Indo-European is still a solid construct because Indo-European is a vast family with some excellently documented representatives providing first-class historical evidence, but many language families are defined on a much shakier basis. Uralic is not bad at all, but its lexical core shared between the primary branches of the family amounts to something like 200 reconstructible roots. Afroasiatic, with just a few dozen uncontroversial proto-morphemes, is already a borderline case. “Relatedness” understood as shared ancestry makes sense as long as we can support it with a large number of words and morphemes showing systematic phonological correspondences. If all we can parade as evidence is a handful of imperfectly matching lexical roots and some similar-looking inflectional endings, “relatedness” evaporates and cognacy becomes indistinguishable from accidental similarity.
It is quite possible that high-frequency units for which we can predict the lowest rate of lexical replacement and the longest survival time – for example personal pronouns – may be retained via vertical inheritance long enough to suggest remote relationship between otherwise distinct language families. This might be the source of some curious cross-family correspondences like the M-T phenomenon in several language families of northern Eurasia (where the nasal /m/ tends to occur in first-person pronouns and a coronal obstruent in second-person pronouns). But such evidence, no matter how tantalising, is hardly sufficient to demonstrate a “superfamily” relationship if not backed up by a substantial amount of data to which the comparative method could be applied to rule out chance agreement.
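A rough way to see why a handful of matches proves little: if each comparison has some probability p of matching by accident, the chance of getting at least k matches out of n comparisons is a binomial tail. The sketch below is only an illustration under stated assumptions. The function name `chance_of_at_least` and all the numbers are hypothetical, and the calculation treats comparisons as independent with a known p, which real lexical data do not guarantee.

```python
from math import comb


def chance_of_at_least(k: int, n: int, p: float) -> float:
    """Binomial upper tail: probability that at least k of n independent
    comparisons match by accident, each with chance probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical figures, chosen only for illustration (not measured values).
# A handful of lookalikes: at least 3 matches among 10 compared items.
few_matches = chance_of_at_least(3, 10, 0.1)

# A thick core: at least 30 matches among 100 compared items, same p.
many_matches = chance_of_at_least(30, 100, 0.1)

print(f"3 of 10 by chance:   {few_matches:.4f}")
print(f"30 of 100 by chance: {many_matches:.2e}")
```

With the same per-item probability, the small set of matches is not at all unlikely to arise by accident, while the large set is vanishingly improbable. In practice the hard part is estimating p honestly, since loose phonological and semantic matching criteria inflate it.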
If the applicability of the family tree model is limited in this way, perhaps we should focus on individual linguistic replicators – the stuff of which languages are made – rather than languages themselves. It could be argued that despite horizontal diffusion the genealogies of related replicators will still converge at some point in the past. Their family trees will not be isomorphic with language phylogenies, but a borrowed morpheme also has its deeper history in the language it came from. Even if the notion of “language relatedness” can’t be extended ad infinitum, it is imaginable that most replicators, whether transmitted vertically between generations of speakers or horizontally between different speech communities, eventually coalesce with their relatives in one and the same ancestral speech community somewhere in the deep prehistory of language. I see no easy way to disprove such a possibility, but I see no way to prove it either.
The kind of thing I do not trust at all
Relatedness can be tested for items with reconstructible histories, because we know what regular changes they can be expected to have undergone along the way, and what correspondences they should exhibit. Without that knowledge, anything could be related to anything else. A long sequence of phonological changes can distort a word beyond recognition; semantic shifts can change its meaning. With a little bit of imagination it’s easy to invent an arbitrary scenario relating a word in Basque to a word in Georgian, Hungarian, Sumerian, or any language of one’s choice. It’s a popular sport among amateur long-range comparatists, but it is not the way sound historical linguistics should be practised. It is wiser to admit our ignorance than to use dubious methods to get untestable results. So my well-considered answer to Question 2 is, “I have no clue”. The null hypothesis in such cases is always that A is not related to B unless there is sufficient evidence to conclude otherwise. I apologise if this attitude sounds unromantic.


16 comments:

  1. I noticed yesterday when I looked in at Ethnologue that they have decided Penutian is not a real family, as well as Na-Dene (by taking Tlingit out of it).

    They are probably wrong, but the thing is they are acting on principle. And when you have a bunch of languages or groups like the ones that make/made up Penutian, that have been in contact for millennia, it just becomes impossible to determine if the similarities are genetic or contact phenomena. It's like a criminal case where the evidence is hopelessly tainted.

    They broke up Hokan too, even the northern group of groups that looked pretty solid.

    But this abundance of caution is not new with American languages. Even though Caddoan, Iroquoian and Siouan show all kinds of tantalizingly common features, people have refused to give in to eagerness. The same goes for linking Salishan and Algonkian, even though they are both about 4,000 years old, both have geographically fairly close points of diffusion and all that (Algic doesn't count, it includes the obviously unrelated Wakashan).

    As you say, there are problems even with Afro-asiatic. Apparently Chadic shares all the diagnostic features except any shared vocabulary. Oops.

  2. The Omotic languages are even worse. The proportion of their Swadesh lexicon shared with the rest of AA is practically at the level of random noise, and the inflectional correspondences are like those between IE and Uralic -- possibly phantoms.

    It's worthy keeping in mind that the null hypothesis (no relationship) is always provisional and open to falsification. That's why it's better to adopt it when in serious doubt. You can always abandon it if new evidence comes to light.

  3. worthy -> worth. Sorry for my messy typing.

  4. "It's worth keeping in mind that the null hypothesis (no relationship) is always provisional and open to falsification. That's why it's better to adopt it when in serious doubt. You can always abandon it if new evidence comes to light."

    Hear, hear! Romance is great, but it has no place in serious scholarship.

    Replies
    1. Except Romance in the technical sense (the most recent common ancestor of Sardinian, Romanian and Portuguese, and all its descendants).

    2. Somebody open a window!

  5. Since you mentioned the subject, serious long-range comparativism doesn't work as you described it. In fact, short-range comparison within language families, which leads to the reconstruction of their proto-languages, is a prerequisite to long-range comparativism.

    Also, academic credentials don't qualify one's work. Actually, there are many so-called "professionals" whose work is of an amateur level and vice versa (although less frequently).

    Replies
    1. I haven't dealt with serious long-rangers yet. And of course I agree that there are people with academic credentials whose methods don't qualify as serious.

    2. Well, I'm a "serious long-ranger" myself, although I focus on individual etymologies rather than a whole bunch of them. Hence Swadesh lists and lexico-statistical methods don't appeal to me.

  6. In my opinion, the limitations of the classical tree model would also apply to the IE family, which I don't consider to be precisely a case of short-range comparativism, although it is certainly shorter-range than proposed macro-families such as Afrasian, Nostratic, etc., for which the tree model is even more inadequate.

    The PIE reconstructed by IE-ists contains more than 2,000 individual lexical items found in IE languages. By contrast, its morphology is much more compact. As pointed out by IE-ists such as Villar, this would indicate that an extensive language replacement took place in the Chalcolithic-Bronze Age (Kurgan theory), by which the morphology came from the superstrate (replacing language) while much of the lexicon of the substrates (replaced languages) was retained.

    I've already alluded to some of this substrate lexicon, which contrary to what Vennemann thinks isn't Basque-like at all. In fact, Basque doesn't seem to be a representative of the "autochthonous" European languages but rather from the east, possibly from the same steppes where some placed the mythical PIE speakers.

    Replies
    1. Errata: I mean "but rather an invader from the east".

    2. "By contrast, its morphology is much more compact."

      "Compact" in what sense? Reconstructible PIE has a very rich and complex system of inflection, word-formation and compounding, very different from "contact languages" such as creoles.

    3. I mean some of the IE morphology is quite uniform across different groups, while on the other hand there are some interesting dialectal isoglosses regarding morphology. For example, the distribution of the relative pronouns *jo- and *kWe-, the passive verbal forms in *-r, the genitive suffixes -oso-, -osjo- and so on. In my opinion, they're a reflex of the language replacement process(es) I mentioned before.

      I think PIE isn't a single monolithic proto-language, as commonly thought, but rather a projection of several linguistic layers, stratified in time and space. I hope you'll understand my point.

  7. FWIW: the number of "about 200 Uralic roots" can quite likely be expanded to a ballpark of 300-400, and including less secure comparisons extends this yet further to 600-ish. The older estimates rely on requiring Samoyedic cognates, after an old but unproven assumption of a deep division between Finno-Ugric and Samoyedic. (Recent research suggests instead a grouping of Samoyedic and Ugric.)

    If anything makes the case for Uralic weaker than Indo-European (in the same irrelevant technical sense by which ten trillion is larger than one trillion), it's the absence of original morphological root alternations. For PIE, case forms of words can frequently be reconstructed individually, expanding the data available far beyond the bare root count.
