18 May 2013

The Inca Connection: A Quechua Word Game


Gather round and I’ll show you a magic trick. Watch my hands, but first look at Table 1 below. It is based on a 200-word Swadesh list for Southern Quechua and the Tower of Babel “Eurasiatic” etymologies:

Table 1


| # | Proto-Eurasiatic | Meaning | Quechua | Meaning |
|---|---|---|---|---|
| 1 | *ma, *ʔVnV | not | mana... chu; ama... chu | not (negation); not (prohibition) |
| 2 | *ḳV | this | kay | this |
| 3 | *mV | what (interr.) | ma | what (interr.) |
| 4 | *mV[c]V | old | machu | old |
| 5 | *ʔVmV | mother | mama | mother |
| 6 | *ḳerV | bark (of a tree) | qara | bark (of a tree) |
| 7 | *gwVrV | bark, skin | qara | bark, skin |
| 8 | *ḲaĺV | skin | qara | skin |
| 9 | *ḲorV | worm | kuru | worm |
| 10 | *ḳurV | short | kuru | short |
| 11 | *ḳUlV | far, next | karu | far |
| 12 | *lVKV | thick, dense | raku | thick |
| 13 | *Ḳa[lH]ä | tongue, speak | qallu | tongue |
| 14 | *ṗVĺV | feather, tail | puru | feather |
| 15 | *ʕeḳu | water | yaku | water |
| 16 | *ḳuc`u | cut | kuchuy | cut |
| 17 | *[č]orwV | (a kind of) fish | challwa | fish |
| 18 | *bongä | thick, swell | punkiy | swell |
| 19 | *ṗuɣV | blow | phukuy | blow |
| 20 | *külV | cold | chiri | cold |
| 21 | *w[e]ṭV | year | wata | year |
| 22 | *ḳVlV | grass | qura | grass |

There are only twenty-two matches because I got bored too soon, but it’s an easy game. One can even formulate some preliminary “regular correspondences” (supported by a few cognate pairs each!). For example, Eurasiatic liquids (laterals and rhotics) generally merge in Quechua, yielding /r/ (8, 11, 12, 14, 20, 22), but before certain consonants (laryngeals and semivowels) liquids are reflected as palatal /ʎ/, spelt ll (13, 17). Eurasiatic affricates are generally preserved as such, yielding Quechua ch /tʃ/ (4, 16, 17), but we also have one example of a velar stop palatalised and affricated before a front vowel (20) and possibly one more (1) if chu is related to PIE *kʷe (but I can’t say at this stage why the *e is reflected as /u/). Before the low vowel /a/ Eurasiatic dorsals become uvular /q/ in Quechua (6, 7, 8, 13). There are sporadic exceptions (2, 11) and one occurrence of a uvular before /u/ (22), but come on, folks, you can’t expect me to solve all problems in one fell swoop with so little material.

No comment (Aaarrrrrgh!)
I think I have already demonstrated beyond reasonable doubt that the Quechua people are a lost Nostratic tribe. Note that the semantic matches are impeccable and the similarity of the words is quite obvious to any open-minded observer. Indeed, the matches are much better than many of those in the LWED. The quality of examples 1, 2, 3, 4, 5, 6, and 9, in particular, is guaranteed by the fact that they represent statistically certified ultraconserved Eurasiatic vocabulary (Pagel et al. 2013). The famous items ‘mother’, ‘bark’, and ‘worm’ are among them. In many Eurasiatic languages the words for ‘bark’ and ‘skin’ are the same or look related (6, 7). This seems to be true of Quechua as well, but just in order to probe every possibility, I can offer an alternative etymology of qara ‘skin’ (8, from a different Eurasiatic root), in which case its homophony with qara ‘bark’ must be accidental. A nice match either way.

But there is more to Quechua than just its Eurasiatic affinities. It seems to be particularly close to Proto-Indo-European. Compare the Quechua numerals pichqa ‘5’ and suqta ‘6’ = PIE *penkʷe, *sweḱs, clearly a common Indo-Quechuan innovation not shared with any other Eurasiatic group. I can’t reveal too much at present, but mark my words: you’ll read about it in Nature one day – or Science, perhaps, or PNAS.


32 comments:

  1. You should totally turn this into a real paper, like Sokal's 1996 paper "Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity".

    Replies
    1. It's indeed my response to what could be called "postmodern historical linguistics". But Sokal deliberately produced nonsensical word salad, while I'm at least partly serious about the comparative analysis. My "long-range comparison" picked up so many false positives not because I was cheating but because I was comparing real-language data with highly unconstrained reconstructions (*******!). Many long-rangers do the same thing in earnest.

      There is, by the way, nothing inherently impossible about Quechua-Eurasiatic connections. If the "Eurasiatic" languages began to diverge as early as 15 ky BP, that would have left their speakers some 1,500 years to discover the Beringian passage and catch up with the Clovis expansion. But let's face it, any real historical relationship would have produced a very different pattern -- not this near-identity of linguistic forms. "Ultraconserved" doesn't mean almost unchanged.

    2. Probably it also helped that you compared a large sound system to a small one. You had to "assume" several unconditional mergers, and that's always easy. What would happen with a Quechua variety that distinguishes plain, aspirated and ejective plosives?

      (Of course the next question is whether the Proto-Eurasiatic sound system should really be reconstructed as so large.)

      Anyway, from the OP:

      and possibly one more (1) if chu is related to PIE *kʷe (but I can’t say at this stage why the *e is reflected as /u/)

      The Nostraticists have actually taken care of that one: they derive PIE *ʷe from PEA *o and *u. It's the ch that doesn't fit. :-)

    3. "Probably it also helped that you compared a large sound system to a small one. You had to 'assume' several unconditional mergers, and that's always easy. What would happen with a Quechua variety that distinguishes plain, aspirated and ejective plosives?"

      I used a Swadesh list for Cuzco Quechua, which has these contrasts. Fortunately, plain stops are the most frequent type there ;-). But even if they weren't, there are enough wildcards and cover symbols in Eurasiatic reconstruction to make comparison with any other system relatively easy by offering alternative possibilities.

  2. Well, this can be merged with Mark Rosenfelder's "Deriving Proto-World with tools you probably have at home", which satirizes mass comparison by showing that Quechua-speakers are also a lost part of the Sinitic world. Add to this a certain person's totally non-satirical claims about the Indo-European nature of Tsimshianic....

  3. I admit I was struck by the apparently topical reference to "Di's divorce". Jeez that is an old page!

  4. It's easier if you compare a language of your choice (or even a computer-generated list of artificial words for real meanings) with something reconstructed, like Starostin's Eurasiatic/Nostratic. Optional segments, cover symbols and numerous synonyms help a lot. I found perfect matches (100% semantic agreement + more or less regular sound correspondences) for more than 10% of the longer Swadesh list in less than half an hour. Imagine what could be done with a large Quechua dictionary and a lot of time to spare.

  5. BTW, the paper by Pagel et al. has received criticism from Moscow, too.

  6. Thanks for the link, Sergei, a very competent discussion, highlighting all the methodological faults.

  7. "I think I have already demonstrated beyond reasonable doubt that the Quechua people are a lost Nostratic tribe."

    Oh well, this is very exciting, isn't it? This shows how Uto-Aztecan really is descended from Biblical Hebrew and that modern Utes and Shoshone are descended from the Lamanite tribes. Quick, send word to Salt Lake City! /s

    And you thought you were joking. There really are people poised to swallow this stuff.

  8. Broadly speaking, when comparing distantly related languages, it's far easier to pick out either chance resemblances or Wanderwörter than genuine cognates, as the latter tend to have undergone semantic shifts, all the more so at the enormous time depths involved, which are much greater than the ones proposed by long-rangers.

    An example (among many others) of a chance resemblance mistaken for a true cognate in The Tower of Babel is Kartvelian *q´el- 'neck' vs. IE *kol-s-o- id. The latter is a Latin-Germanic isogloss derived from IE *kºel- 'to turn', reflecting the physical analogy between 'neck' and 'pole'.

    Replies
    1. Of course, alongside semantic shifts there were also lexical innovations, e.g. the word for 'neck' in Germanic and Italic.

      On the other hand, I don't think Swadesh lists are a useful tool in long-range comparisons, mainly for the reason stated above. In my opinion, true distant relationships must be much older (at least 2-3 times) than just 15,000 yBP.

  9. Hello Piotr! I am glad to have found your excellent blog!

    Let me revive this somewhat old subject, if you please. It seems to me that your "magic trick" contains plenty of sarcasm and perhaps even mockery towards so-called long-range linguistics. I do not share your criticism, but let's leave that aside for a while and simply look for honesty.

    I am asking myself a simple question: how high is the probability that the Quechua word/root for, say, skin matches the Nostratic/Eurasiatic reconstruction? To calculate it, we should first ask how many possible Quechua roots (possible = permissible by the word-building rules that operate in that language) look similar to the reconstructed ones. Secondly, we should compute how many different roots are possible in Quechua at all. Finally, let's divide the first value by the second. What result do you expect? Close to 0.01? 0.0001? Note that this value is the probability that we are dealing with a real word game, i.e. the probability that the observed similarity is indeed caused by chance.

    Then let's take another root from your chart, say, for tongue, and repeat the operation. To obtain the correct value of the probability that both the "skin" and "tongue" roots look like those reconstructed for Nostratic by pure chance, we must multiply the two results.

    Let's assume now that the reconstructions are very inaccurate, so that numerous potential Quechua roots are similar to them. I will even assume that as many as one possible Quechua root in ten is similar enough to the Nostratic reconstruction. Then the probability of pure chance here turns out to be quite high, namely 0.1 (or 10%). But then the probability that both the "skin" and "tongue" roots are similar to those in Eurasiatic by chance is only 0.01 (or 1%). Is it still plausible?

    I realize that Nostratic reconstructions are inaccurate, and sometimes even inadequate. This is why I have assumed an implausibly high percentage (10% for each analysed item), just to leave a really large margin against possible criticism.

    Let's go further along these lines. Drop the "mum" word (as it looks similar in so many languages), and perhaps some others from your chart... Even if only 10 possible cognates remain, and if we maintain for each of them that one in ten possible Quechua roots looks similar to the respective Eurasiatic reconstruction, we can compute that the total probability of chance is 1 * 10^(-10). In other words, there is only one chance in ten billion that your chart is indeed nothing more than a word game.

    Replies
    1. Hi, Grzegorz,

      Thanks for visiting my blog. Your probabilistic reasoning is flawed, but I can't refute it in two or three sentences. I'm working on a review at the moment, so please have a little patience. I'll try to respond at length tomorrow.

    2. Grzegorz, I agree that the probability of getting a "Eurasiatic" match for a given Quechua word is of the order of 0.1. It would be much less than that if we we comparing two Swadesh lists for two real languages, but comparison with the Eurasiatic part of the ToB database offers a lot of leeway thanks to their use of optional segments and cover symbols, and the fact that for many items there are synonyms to choose from. Let's then accept 0.1 as a plausible guesstimate.

      You are right that for any given set of n Quechua words the individual probabilities of "success" (finding a Eurasiatic match in the ToB database) can be treated as independent and should be multiplied. Therefore, the probability of getting exactly 22 successes (as in my little exercise) equals 10^(-22) (one chance in ten sextillion). Wow! We have demonstrated the validity of the Quechua-Eurasiatic theory beyond reasonable doubt!

      What is wrong with this calculation? Note the word "given" above. In my game the set is not given in advance. It consists of 22 words I have picked out of 200. There are C(200,22) possible ways of selecting 22 distinct objects out of 200, where C(n,k) is the binomial coefficient:

      C(200,22) = 22!×178!/200! = 112532031446554154468618348400

      So I chose just one set among the more than 10^29 (one hundred octillion) possible ones and got 22 successes. Note that the words were hand-picked (because I first made sure they matched something in the database), not selected at random. To estimate the actual probability of 22 successes out of 200 trials you have to take into account all the remaining possible selections.

      From the formal point of view, my game is a typical series of Bernoulli trials with a binomial distribution of successes and failures. There are n trials yielding success with probability p (and failure with probability 1-p). The probability of getting exactly k successes (and n-k failures) is C(n,k)×p^k×(1-p)^(n-k).

      Assuming n=200, k=22 and p=0.1, the probability of getting exactly 22 successes in 200 trials is 0.0806. However, I could also claim a win if I got more than 22 successes, so we have to add up the probabilities of 22, 23, 24, ... 200 successes. The total probability of getting at least 22 wins out of 200 is 0.3516. Let's suppose that we have underestimated p just a little, and that the actual value is, say, 0.12. In that case, the probability of getting at least 22 successes is 0.6999. If we have overestimated it, and the actual value is 0.08, the probability of 22 or more successes is 0.0804. So if p=0.1(±20%), our successes are not statistically significant.
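The figures in this comment can be re-derived with a few lines of Python. What follows is only a minimal sketch under the comment's own assumptions (n = 200 list items, independent trials, and a guessed per-word chance-match probability p). The helper names `binom_pmf` and `binom_tail` are made up for this illustration and do not come from the original discussion.

```python
from math import comb


def binom_pmf(n: int, k: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(n: int, k: int, p: float) -> float:
    """Probability of at least k successes in n independent trials."""
    return sum(binom_pmf(n, j, p) for j in range(k, n + 1))


if __name__ == "__main__":
    n, k = 200, 22  # list length and number of hand-picked matches
    print(f"ways to choose {k} words out of {n}: {comb(n, k):.3e}")
    print(f"P(exactly {k} matches | p=0.10) = {binom_pmf(n, k, 0.10):.4f}")
    # p is a guesstimate, so try values 20% below and above it as well
    for p in (0.08, 0.10, 0.12):
        print(f"P(at least {k} matches | p={p:.2f}) = {binom_tail(n, k, p):.4f}")
```

Summing the tail rather than reading off a single term is the point of the exercise: a claim of "22 matches" would have been just as impressive with 23 or 30, so every outcome at least that good has to be counted.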

    3. Erratum (first paragraph): for "if we we" read "if we were".

    4. Oops! Sorry for posting in haste. Another correction:

      C(200,22) = 200!/(22!×178!) = 112532031446554154468618348400

    5. Let me add, in case it isn't quite clear, that p≈0.1 is the estimated probability of a chance match between a word on the Swadesh list for Quechua and a potential cognate in the database. So when I say that "our successes are not statistically significant", I mean they are not sufficient to refute the hypothesis that the matches are due to chance -- not by a long shot.

      I shouldn't have posted that lengthy comment before I finished my morning coffee. No "Edit" button for comments is one of those deficiencies of Blogger that drive me nuts.

    6. Fortunately, there's a remedy for this, and it's called Disqus. :-)

    7. Nooooo... Disqus comes with its own problems, like expecting that you don't even want to see all comments. I recommend Wordpress.

    8. Piotr, thank you for your kind response. Sorry, I was not able to respond at once.

      You are right of course, and my posts were a little provocative. Many people try to prove genetic relationship just by showing 10, 20 or even 50 pairs of similar-looking words. I would tell them the same as you did, though in a slightly different, less mathematical way. Namely, p = 0.1 means that, among 200 words, 20 are expected to match just by chance, and their presence proves nothing. It means that if we compare two completely unrelated (or rather: really distantly related) languages with the help of their Swadesh-200 lists, the most probable outcome is 20 matches. In other words, for n = 200 and p = 0.1, Binomial[n, k]*p^k*(1 - p)^(n - k) reaches its maximum value (0.0936363) at k = 20.

      Maths is useful only when we interpret the results properly. The probability of 0.08062 (i.e. 8.062%) tells us that there is a little less than 1 chance in 12 that we will get exactly 22 matches if the compared languages are not related (i.e. so distantly related that Swadesh's method does not lead to any significant results). Once again, if we assume (!) that two compared languages are not related, there is a ca. 8% chance that we will get exactly 22 matches from the list of 200 words.

      Any further conclusions are over-interpretation. In particular, we can by no means say that the two languages under investigation are not related. Nor can we test the hypothesis that they are related, because kinship is not a binary relation. At most we can say how close the kinship is.

    9. Btw., there is no point in considering at least 22 matching pairs, because the number of non-matching pairs also matters for our calculations, and that number is known (or at least should be known). If we found 30 matching pairs (out of 200), the probability would be only 0.0068, almost 12 times lower than with 22 matching pairs.

      Besides, the correct estimation of p is even more important. I proposed p = 0.1 on purpose, only because I knew the result in advance. But that value means that the Nostratic reconstructions are very inaccurate. If we assumed p = 0.05 (half as large), i.e. reconstructions twice as accurate, the probability of obtaining exactly 22 matching pairs out of 200 just by chance would be only 0.000290682 (more than 277 times lower!).

      The probability p = 0.05 means, for example, that we compare two CVC roots, totally neglecting the vowel and distinguishing only 4-5 different types of consonants (this is the case, for example, when all of *kor, *qur, *khil, *gel are treated as matching each other). I think such accuracy should not be excluded, all the more so since "one can even formulate some preliminary 'regular correspondences' (supported by a few cognate pairs each!)". As a result, there is still no reason to treat the Quechua-Nostratic similarity with sarcasm.
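How strongly these exact-count probabilities depend on p and k can be checked directly. The following is a rough sketch, not part of the original comment. `exact_matches` and `cvc_match_probability` are hypothetical helpers, and the 1/c² estimate rests on an assumption of my own: that the two consonants of a CVC root are each drawn independently and uniformly from c classes, with the vowel ignored.

```python
from math import comb


def exact_matches(n: int, k: int, p: float) -> float:
    """Probability of exactly k chance matches among n compared words."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def cvc_match_probability(consonant_classes: int) -> float:
    """Chance that both consonants of a random CVC root land in the
    required class (assumes uniform, independent classes; vowel ignored)."""
    return 1 / consonant_classes**2


if __name__ == "__main__":
    n = 200
    # (k, p) pairs discussed in the thread
    for k, p in [(20, 0.10), (22, 0.10), (30, 0.10), (22, 0.05)]:
        print(f"P(exactly {k} of {n} | p={p:.2f}) = {exact_matches(n, k, p):.6f}")
    # per-root chance-match estimate for 4 and 5 consonant classes
    for c in (4, 5):
        print(f"{c} consonant classes -> p = {cvc_match_probability(c):.4f}")
```

Under these assumptions 4 to 5 consonant classes give a per-root p between 0.04 and 0.0625, which brackets the 0.05 used above. The sketch does not model the extra leeway from optional segments and alternative synonyms in the database.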

    10. "Similarity" is a tricky notion. The more distant the genetic relationship, the less likely real cognates are likely to be. Any discernible similarity between, say, Greek and Malay, such as duo : dua '2' and mati : mata 'eye' can only be accidental (or perhaps, in some cases, due to horizontal transfer: Malay has Indo-Aryan loans which may be vaguely similar to related Greek words, e.g. polis : pura 'city'). Collecting lookalikes is not a constructive approach to long-range comparison.

    11. ... the less similar real cognates are likely to be.

  10. Still convinced that the thesis of a distant relationship between Quechua and Nostratic is so ridiculous?

    If yes, please explain why there is only one chance in ten billion that your chart is the result of blind chance. And why do you treat such an extremely improbable hypothesis as established fact, to the point of ridiculing the contrary view?

    Btw. (@ John Cowan) "a certain person" who makes the "totally non-satirical claims about the Indo-European nature of Tsimshianic" is John Dunn, a professional linguist and retired professor of linguistics at the University of Oklahoma in Norman, who bases them on some tens or even some hundreds of words. I have acquainted myself with the data he collected (now impossible to find online). Based on it, and on my (very limited) knowledge of Tsimshianic grammar, I personally still see no real reason for his claiming that Tsimshianic is Indo-European. But those tons of similar-looking words are really impressive, and I would not object to the idea that most of them are loanwords from an Indo-European language, possibly close to Tocharian.

    I would rather suggest trying to explain the similarity instead of just treating it with ridicule.

    If similarity of many roots between a certain language and Nostratic, Indo-European etc. had been claimed for a language used somewhere in Eurasia, would we have been more likely to accept that the similarity may be the result of mutual influence in the past or even of a genetic relation? If yes, and despite this we are not willing to claim such a relationship between Eurasiatic and, say, Quechua, maybe the reason is our misconception about the past?

    I mean: the Nostratic proto-language may really have existed once, say, 10 thousand years ago, and the linguistic ancestors of today's Quechua speakers may have lived somewhere in the vicinity, or their language may even have been closely related genetically to Nostratic. So we cannot just say that the observed (Nostratic-Quechua) similarity is so implausible as to be ridiculous, because there is not a single fact that would assure us that the scenario outlined above is nonsense (or at least I do not know of one).

    Anyway, why should we believe instead in a hypothesis whose probability is as low as one in ten billion?

    Replies
    1. "Still convinced that the thesis of a distant relationship between Quechua and Nostratic is so ridiculous?"

      Personally, I don't find it ridiculous at all. Still, I agree with Piotr that a simple comparison of lookalikes cannot demonstrate it; it can only be right for the wrong reason.

      As a biologist, I even think the question "is Quechua related to Eurasiatic at all" is simply beside the point. Of course they're related. While the monogenesis of language isn't as obvious as the single origin of all Life As We Know It, there is simply no reason to suppose that language evolved more than once or that some population lost it and then redeveloped it from scratch. The interesting question is the shape of the tree: what is the closest relative of Quechua, and what is the closest relative of Indo-European?

      "Based on it, and on my (very limited) knowledge of Tsimshianic grammar, I personally still see no real reason for his claiming that Tsimshianic is Indo-European. But those tons of similar-looking words are really impressive"

      Not if you look at the details. Dunn proposes several really strange sound correspondences and many really strange shifts in meanings. Let's reconstruct Proto-Penutian first, and then let's compare that to a much improved Proto-Eurasiatic.

      "If similarity of many roots between a certain language and Nostratic, Indo-European etc. had been claimed for a language used somewhere in Eurasia, would we have been more likely to accept that the similarity may be the result of mutual influence in the past or even of a genetic relation?"

      This depends on what the similarities are like: where in the vocabulary are they found, what are the regular sound correspondences, do they contain elements that make no sense in one language but are grammatical affixes in another, and so on. It's difficult; above all, it's a lot of work.

    2. "As a biologist, I even think the question 'is Quechua related to Eurasiatic at all' is simply beside the point."

      I am also a biologist, and it is nice to meet a colleague. I agree with you: all languages, and especially all extra-African languages, come from one proto-language.

      "The interesting question is the shape of the tree"

      Yes, indeed. For example, a closer relationship between Quechua and Nostratic than between Quechua and, say, Australian languages is to be expected. It would be exciting if the reverse turned out to be true (because it might then suggest that the New World was not peopled in accordance with Hrdlicka's paradigm).

      Among all Native American languages, Greenberg distinguished only 3 large groups: Eskimo-Aleut, Na-Dene and a third one which he called Amerindian. We may try to reconstruct the tree of Amerindian, but what if we could demonstrate a much closer relationship between, say, Quechua and Nostratic than between another Amerindian language and Nostratic?

      "Dunn proposes several really strange sound correspondences and many really strange shifts in meanings."

      Not stranger than, for instance, shifts postulated for Armenian.

      "Let's reconstruct Proto-Penutian first, and then let's compare that to a much improved Proto-Eurasiatic."

      Provided that Penutian is a real grouping... Besides, in my opinion we are talking about loanwords. If so, it is enough to show that we are dealing with Tsimshianic words without cognates in other languages thought to be related. A Proto-Penutian reconstruction is not needed for this.

      To tell the truth, we are talking about Nostratic, not Eurasiatic. Greenberg's Eurasiatic does not include Dravidian, Kartvelian or Afro-Asiatic, while Nostratic does. And the ToB reconstruction is based on Nostratic, not Eurasiatic. Pagel, Atkinson (the same one who “proved” the location of the IE cradle in Anatolia) and others have simply confused these terms.

    3. "Not stranger than, for instance, shifts postulated for Armenian."

      Not true.

      "Greenberg's Eurasiatic"

      Oh, sorry, I'm not talking about Greenberg's but about the Moscow School version, which forms Nostratic together with Afro-Asiatic. Many recent Moscow School works have used the term "Nostratic" for this branch in order to avoid confusion with Greenberg's version, thus causing confusion with their own earlier usage...

  11. "all languages, and especially all extra-African languages, come from one proto-language"

    While monogenesis is plausible, there is no actual evidence for it, and not likely to be any either.

    "we are talking about loanwords"

    IE loanwords in Tsimshianic alone don't make much sense, unless you suppose that Tsimshianic had already split from the rest of Penutian in the Old World, which there is no reason to believe.

    "really strange sound shifts"

    Unnaturalness of sound shifts is no big deal: see Austronesian.

  12. If the Out of Africa theory is right (= all modern humans stem from a single group of Homo sapiens who came from Africa), does it not require that we suppose all human languages to have evolved from a single language?

  13. You are completely right. I see you have been met with the humdrum cynical attitudes which are so rampant in the western world of historical linguistics...
    Oh, it doesn't stop at Quechua and Indo-European either -- check out Burushaski numerals. Cognates are also rampant throughout Tibetic languages. One commenter was on the money about Tocharian, that's got fascinating matches as well.
    Indeed, it is difficult to imagine a scenario in which the languages of the Americas DIDN'T have contact with ancient IE, Uralic, Sino-Tibetan...
    Of course, there is always the possibility of other paths of migration than the Bering Strait. To me the linguistic complexity of the Americas points toward several waves of arrival over thousands of years. Imagine all the sort of borrowing and hybridization that can go on in that time. And yet still these cognates ring clear as bells!
    Oh, this wonderful mystery of language.

  14. I am definitely not a supporter of Amerind, by the way, although there is certainly more interconnectivity going on than linguists are currently willing to admit/care to address.
