Gather round and I’ll show you a magic trick. Watch my hands, but first look at Table 1 below. It is based on a 200-word Swadesh list for Southern Quechua and the Tower
of Babel “Eurasiatic” etymologies:
Table 1
No. | Proto-Eurasiatic | Meaning | Quechua | Meaning
1 | *ma, *ʔVnV | not | mana... chu; ama... chu | not (negation); not (prohibition)
2 | *ḳV | this | kay | this
3 | *mV | what (interr.) | ma | what (interr.)
4 | *mV[c]V | old | machu | old
5 | *ʔVmV | mother | mama | mother
6 | *ḳerV | bark (of a tree) | qara | bark (of a tree)
7 | *gwVrV | bark, skin | qara | bark, skin
8 | *ḲaĺV | skin | qara | skin
9 | *ḲorV | worm | kuru | worm
10 | *ḳurV | short | kuru | short
11 | *ḳUlV | far, next | karu | far
12 | *lVKV | thick, dense | raku | thick
13 | *Ḳa[lH]ä | tongue, speak | qallu | tongue
14 | *ṗVĺV | feather, tail | puru | feather
15 | *ʕeḳu | water | yaku | water
16 | *ḳuc`u | cut | kuchuy | cut
17 | *[č]orwV | (a kind of) fish | challwa | fish
18 | *bongä | thick, swell | punkiy | swell
19 | *ṗuɣV | blow | phukuy | blow
20 | *külV | cold | chiri | cold
21 | *w[e]ṭV | year | wata | year
22 | *ḳVlV | grass | qura | grass

There are
only twenty-two matches because I got bored too soon, but it’s an easy game. One
can even formulate some preliminary “regular correspondences” (supported by a
few cognate pairs each!). For example, Eurasiatic liquids (laterals and rhotics)
generally merge in Quechua, yielding /r/ (8, 11, 12, 14, 20, 22), but before certain consonants (laryngeals and semivowels) liquids are reflected as palatal /ʎ/, spelt ll (13, 17). Eurasiatic
affricates are generally preserved as such, yielding Quechua ch /tʃ/ (4, 16,
17), but we also have one example of a velar stop palatalised and affricated before a front vowel (20)
and possibly one more (1) if chu is related to PIE *kʷe (but I can’t say at this stage
why the *e is reflected as /u/). Before the low vowel /a/ Eurasiatic
dorsals become uvular /q/ in Quechua (6, 7, 8, 13). There are sporadic exceptions (2, 11) and one occurrence of a uvular
before /u/ (22), but come on, folks, you can’t expect me to solve all problems in one fell
swoop with so little material.
No comment (Aaarrrrrgh!) 
But there
is more to Quechua than just its Eurasiatic affinities. It seems to be particularly
close to Proto-Indo-European. Compare the Quechua numerals pichqa ‘5’ and suqta
‘6’ = PIE *penkʷe, *sweḱs, clearly a common Indo-Quechuan innovation not shared
with any other Eurasiatic group. I can’t reveal too much at present, but mark
my words: you’ll read about it in Nature one day – or Science, perhaps, or
PNAS.
[► Back to the beginning of the ProtoWorld thread]
You should totally turn this into a real paper, like Sokal's 1996 paper "Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity".
It's indeed my response to what could be called "postmodern historical linguistics". But Sokal deliberately produced nonsensical word salad, while I'm at least partly serious about the comparative analysis. My "long-range comparison" picked up so many false positives not because I was cheating but because I was comparing real-language data with highly unconstrained reconstructions (*******!). Many long-rangers do the same thing in earnest.
There is, by the way, nothing inherently impossible about Quechua-Eurasiatic connections. If the "Eurasiatic" languages began to diverge as early as 15 ky BP, that would have left their speakers some 1,500 years to discover the Beringian passage and catch up with the Clovis expansion. But let's face it, any real historical relationship would have produced a very different pattern, not this near-identity of linguistic forms. "Ultraconserved" doesn't mean almost unchanged.
Probably it also helped that you compared a large sound system to a small one. You had to "assume" several unconditional mergers, and that's always easy. What would happen with a Quechua variety that distinguishes plain, aspirated and ejective plosives?
(Of course the next question is whether the Proto-Eurasiatic sound system should really be reconstructed as so large.)
Anyway, from the OP:
and possibly one more (1) if chu is related to PIE *kʷe (but I can’t say at this stage why the *e is reflected as /u/)
The Nostraticists have actually taken care of that one: they derive PIE *ʷe from PEA *o and *u. It's the ch that doesn't fit. :)
Probably it also helped that you compared a large sound system to a small one. You had to "assume" several unconditional mergers, and that's always easy. What would happen with a Quechua variety that distinguishes plain, aspirated and ejective plosives?
I used a Swadesh list for Cuzco Quechua, which has these contrasts. Fortunately, plain stops are the most frequent type there ;). But even if they weren't, there are enough wildcards and cover symbols in Eurasiatic reconstruction to make comparison with any other system relatively easy by offering alternative possibilities.
Well, this can be merged with Mark Rosenfelder's "Deriving Proto-World with tools you probably have at home", which satirizes mass comparison by showing that Quechua-speakers are also a lost part of the Sinitic world. Add to this a certain person's totally non-satirical claims about the Indo-European nature of Tsimshianic....
I admit I was struck by the apparently topical reference to "Di's divorce". Jeez that is an old page!
It's easier if you compare a language of your choice (or even a computer-generated list of artificial words for real meanings) with something reconstructed, like Starostin's Eurasiatic/Nostratic. Optional segments, cover symbols and numerous synonyms help a lot. I found perfect matches (100% semantic agreement + more or less regular sound correspondences) for more than 10% of the longer Swadesh list in less than half an hour. Imagine what could be done with a large Quechua dictionary and a lot of time to spare.
BTW, the paper by Pagel et al. has received criticism from Moscow, too.
Thanks for the link, Sergei, a very competent discussion, highlighting all the methodological faults.
"I think I have already demonstrated beyond reasonable doubt that the Quechua people are a lost Nostratic tribe."
Oh well, this is very exciting, isn't it? This shows how Uto-Aztecan really is descended from Biblical Hebrew and that modern Utes and Shoshone are descended from the Lamanite tribes. Quick, send word to Salt Lake City! /s
And you thought you were joking. There really are people poised to swallow this stuff.
Broadly speaking, when comparing distantly related languages, it's far easier to pick either chance resemblances or Wanderwörter than genuine cognates, as the latter tend to have undergone semantic shifts, more so at the enormous time depths involved, which are much greater than the ones proposed by long-rangers.
An example (among many others) of a chance resemblance mistaken for a true cognate in The Tower of Babel is Kartvelian *q´el 'neck' vs. IE *kolso id. The latter is a Latin-Germanic isogloss derived from IE *kºel 'to turn', reflecting the physical analogy between 'neck' and 'pole'.
Of course, alongside semantic shifts there were also lexical innovations, e.g. the word for 'neck' in Germanic and Italic.
On the other hand, I don't think Swadesh lists are a useful tool in long-range comparisons, mainly for the reason stated above. In my opinion, true distant relationships must be much older (at least 2-3 times) than just 15,000 yBP.
Hello Piotr! It is nice to me to have found your excellent blog!
Let me refresh this somewhat old subject, if you please. It seems to me that in your "magic trick" there is plenty of sarcasm and perhaps even mockery towards so-called long-range linguistics. I do not share your criticism, but let's leave it for a while, and let's look for just honesty.
I am asking myself a simple question: how big is the probability that the Quechua word/root for, say, skin matches the Nostratic/Eurasiatic reconstruction. To count it, firstly we should ask how many possible Quechua roots (possible = permissible by the word-building rules that operate in that language) look similar to the reconstructed ones. Secondly, we should compute how many different roots are possible in Quechua at all. Finally, let's divide the two values. What result do you expect? Close to 0.01? 0.0001? Note that the value means the probability that we are dealing with a real word game = the probability that the observed similarity is indeed caused by chance.
Then let's take another root from your chart, say, for tongue, and repeat the operation. To obtain the correct value of the probability that both "skin" and "tongue" roots look like those reconstructed for Nostratic by a simple chance, we must multiply both results.
Let's assume now that the reconstructions are very inaccurate, which causes numerous potential Quechua roots to be similar to them. I will even assume that as many as one possible Quechua root in ten is similar enough to the Nostratic reconstruction. Then the probability of pure chance here is quite high and equals 0.1 (or 10%). But then the probability that both the "skin" and "tongue" roots are similar to those in Eurasiatic by chance is only 0.01 (or 1%). Is it still plausible?
I realize that Nostratic reconstructions are inaccurate, and sometimes even inadequate. This is why I have assumed an implausibly high value of the percentage (10% for each analysed item). Just to have a really large margin for the case of possible critique.
Let's go further this way. Give up the "mum" word (as it looks similar in so many languages), and perhaps some others from your chart... Even if there remain only 10 possible cognates, and if we maintain for each of them that one in ten possible Quechua roots looks similar to the respective Eurasiatic reconstruction, we will be able to compute that the total probability of chance is 1 × 10^(−10). In other words, there is only one chance in ten billion that your chart is indeed nothing more than a word game.
Hi, Grzegorz,
Thanks for visiting my blog. Your probabilistic reasoning is flawed, but I can't refute it in two or three sentences. I'm working on a review at the moment, so please have a little patience. I'll try to respond at length tomorrow.
Grzegorz, I agree that the probability of getting a "Eurasiatic" match for a given Quechua word is of the order of 0.1. It would be much less than that if we we comparing two Swadesh lists for two real languages, but comparison with the Eurasiatic part of the ToB database offers a lot of leeway thanks to their use of optional segments and cover symbols, and the fact that for many items there are synonyms to choose from. Let's then accept 0.1 as a plausible guesstimate.
You are right that for any given set of n Quechua words the individual probabilities of "success" (finding a Eurasiatic match in the ToB database) can be treated as independent and should be multiplied. Therefore, the probability of getting exactly 22 successes (as in my little exercise) equals 10^(−22) (one chance in ten sextillion). Wow! We have demonstrated the validity of the Quechua-Eurasiatic theory beyond reasonable doubt!
What is wrong with this calculation? Note the word given above. In my game the set is not given in advance. It consists of 22 words I have picked out of 200. There are C(200,22) possible ways of selecting 22 distinct objects out of 200, where C(n,k) is the binomial coefficient:
C(200,22) = 22!×178!/200! = 112532031446554154468618348400
So I chose just one set among the more than 10^29 (one hundred octillion) possible ones and got 22 successes. Note that the words were handpicked (because I first made sure they matched something in the database), not selected at random. To estimate the actual probability of 22 successes out of 200 trials you have to take into account all the remaining possible selections.
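The effect of hand-picking can be illustrated with a small simulation. This is only a sketch under assumed conditions: each of the 200 words independently finds a chance match with probability 0.1 (the guesstimate above), and the function name `simulate_matches`, the number of runs and the seed are made up for the illustration.

```python
import random

def simulate_matches(n_words=200, p_match=0.1, n_runs=20_000, seed=1):
    """Return the number of chance matches found in each simulated word list."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_runs):
        # Every word is an independent trial: it either finds a lookalike or not.
        counts.append(sum(rng.random() < p_match for _ in range(n_words)))
    return counts

counts = simulate_matches()
mean_matches = sum(counts) / len(counts)
share_22_plus = sum(c >= 22 for c in counts) / len(counts)

print(f"mean number of matches per list: {mean_matches:.1f}")
print(f"share of lists with at least 22 matches: {share_22_plus:.2f}")
```

The point of the sketch is that whoever keeps only the matching words will report nothing but successes, however many failures were discarded along the way.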
From the formal point of view, my game is a typical Bernoulli trial with a binomial distribution of successes and failures. There are n trials yielding success with probability p (and failure with probability 1−p). The probability of getting exactly k successes (and n−k failures) is C(n,k)×p^k×(1−p)^(n−k).
Assuming n=200, k=22 and p=0.1, the probability of getting exactly 22 successes in 200 trials is 0.0806. However, I could also claim a win if I got more than 22 successes, so we have to add up the probabilities of 22, 23, 24, ... 200 successes. The total probability of getting at least 22 wins out of 200 is 0.3516. Let's suppose that we have underestimated p just a little, and that the actual value is, say, 0.12. In that case, the probability of getting at least 22 successes is 0.6999. If we have overestimated it, and the actual value is 0.08, the probability of 22 or more successes is 0.0804. So if p=0.1(±20%), our successes are not statistically significant.
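For anyone who wants to check these figures, here is a minimal sketch that uses only Python's standard library. The helper names `binom_pmf` and `binom_tail` are mine, and the three values of p are the ones discussed above.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_tail(k, n, p):
    """Probability of at least k successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

n, k = 200, 22
exactly_22 = binom_pmf(k, n, 0.1)
at_least_22 = {p: binom_tail(k, n, p) for p in (0.08, 0.10, 0.12)}

print(f"P(exactly 22 | p=0.10) = {exactly_22:.4f}")
for p, tail in at_least_22.items():
    print(f"P(at least 22 | p={p:.2f}) = {tail:.4f}")
```

The tail probability depends strongly on p, which is why a rough guesstimate of p cannot support a claim of significance.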
Erratum (first paragraph): for "if we we" read "if we were".
Oops! Sorry for posting in haste. Another correction:
C(200,22) = 200!/(22!×178!) = 112532031446554154468618348400
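The corrected formula is easy to check mechanically. A short sketch with Python's exact integer arithmetic (the variable names are arbitrary):

```python
from math import comb, factorial

n, k = 200, 22

# Built-in binomial coefficient.
by_comb = comb(n, k)
# The same value from the factorial formula n! / (k! × (n−k)!).
by_factorials = factorial(n) // (factorial(k) * factorial(n - k))

print(by_comb)
print("formulas agree:", by_comb == by_factorials)
print("number of digits:", len(str(by_comb)))
```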
Let me add, in case it isn't quite clear, that p≈0.1 is the estimated probability of a chance match between a word on the Swadesh list for Quechua and a potential cognate in the database. So when I say that "our successes are not statistically significant", I mean they are not sufficient to refute the hypothesis that the matches are due to chance, not by a long shot.
I shouldn't have posted that lengthy comment before I finished my morning coffee. No "Edit" button for comments is one of those deficiencies of Blogger that drive me nuts.
Fortunately, there's a remedy for this, and it's called Disqus. :)
Nooooo... Disqus comes with its own problems, like expecting that you don't even want to see all comments. I recommend Wordpress.
Piotr, thank you for your kind response. Sorry, I have not been able to respond at once.
You are right of course, and my posts were a little provocative. Many people try to prove genetic relation just by showing 10, 20 or even 50 pairs of similar looking words. I would tell them the same as you said, in a slightly different, not too mathematical way, however. Namely, p = 0.1 means that, among 200 words, 20 are expected to match just by chance, and their presence proves nothing. It means that if we compare two completely unrelated (or rather: really distantly related) languages with the help of their Swadesh-200 lists, the most probable outcome is that we will get 20 matches. In yet other words, for n = 200 and p = 0.1, Binomial[n, k]*p^k*(1 − p)^(n − k) gets its maximum value (of 0.0936363) when k = 20.
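The same maximum can be found without Mathematica. Here is a sketch in plain Python that tabulates the whole distribution for n = 200 and p = 0.1; the variable names are chosen for the illustration only.

```python
from math import comb

n, p = 200, 0.1

# Probability of exactly k chance matches, for every k from 0 to n.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# The most probable number of matches is the index of the largest entry.
mode = max(range(n + 1), key=pmf.__getitem__)
expected = n * p

print(f"expected number of matches: {expected:.0f}")
print(f"most probable number of matches: {mode}")
print(f"probability of exactly {mode} matches: {pmf[mode]:.7f}")
```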
Maths is useful only when we can do the proper interpretation of the results. The probability of 0.08062 (which means 8.062%) tells us that there is a little less than 1 chance in 12 that we will get exactly 22 matches if the compared languages are not related (i.e. so distantly related that the Swadesh's method does not lead to any significant results). Once again, if we assume (!) that two compared languages are not related, there is ca. 8% chance that we will get exactly 22 matches from the list of 200 words.
Any further conclusions are overinterpretation. In particular, in any case we cannot say that these two investigated languages are not related. We cannot also test the hypothesis that they are related, because kinship is not a binary relation. We can only say (at most) how much the kinship is.
Btw., there is no aim in considerations on [b]at least[/b] 22 matching pairs, because the number of not matching pairs is also important for our calculations, and the number is known (or at least, should be known). If we found 30 matching pairs (of 200), the probability would be only 0.0068, so almost 12 times less than with 22 matching pairs.
Besides, the correct estimation of p is yet more important. I proposed p = 0.1 on purpose, only because I just knew the result in advance. But the value means that the Nostratic reconstructions are [b]very[/b] inaccurate. If we assumed p = 0.05 (twice less), so doubled accuracy of reconstructions, the probability of obtaining exactly 22 matching pairs out of 200 just by chance would be only 0.000290682 (more than 277 times less!).
The probability p = 0.05 means for example that we compare two CVC roots, totally neglecting the vowel, and distinguishing only 4-5 different types of consonants (it is the case, for example, when treating all of *kor, *qur, *khil, *gel like matching each other). I think such an accuracy should not be excluded, the more that [i]one can even formulate some preliminary “regular correspondences” (supported by a few cognate pairs each!)[/i]. As a result, there is still no reason for treating the Quechua-Nostratic similarity with sarcasm.
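Both numerical claims can be sketched in a few lines of Python. The consonant-class model below is an assumption made only for the illustration: every consonant falls into one of c equally likely classes, the vowel is ignored, and a CVC root matches when both consonant classes agree, so p = 1/c². The helper name `binom_pmf` is likewise made up.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k chance matches among n compared words."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Assumed model: two consonant slots, c equally likely classes each.
class_model = {c: 1 / c**2 for c in (4, 5)}
for c, p in class_model.items():
    print(f"{c} consonant classes -> p = {p:.4f}")

# Exactly 22 matches out of 200 for the two values of p under discussion.
at_p_010 = binom_pmf(22, 200, 0.10)
at_p_005 = binom_pmf(22, 200, 0.05)
ratio = at_p_010 / at_p_005

print(f"p = 0.10: {at_p_010:.6f}")
print(f"p = 0.05: {at_p_005:.9f}")
print(f"ratio: {ratio:.0f}")
```

Under this assumed model p = 0.05 falls between the values for five and four consonant classes, so the whole result rests on how coarse the matching criteria are taken to be.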
"Similarity" is a tricky notion. The more distant the genetic relationship, the less likely real cognates are likely to be. Any discernible similarity between, say, Greek and Malay, such as duo : dua '2' and mati : mata 'eye' can only be accidental (or perhaps, in some cases, due to horizontal transfer: Malay has IndoAryan loans which may be vaguely similar to related Greek words, e.g. polis : pura 'city'). Collecting lookalikes is not a constructive approach to longrange comparison.
... the less similar real cognates are likely to be.
Still convinced that the thesis of a distant relationship between Quechua and Nostratic is so ridiculous?
If yes, please explain why there is only one possibility per ten billion that your chart is the result of a blind fate. And why do you take so little-little-little probable a hypothesis as a real fact to such a degree that you ridicule the contrary view.
Btw. (@ John Cowan) "a certain person" who makes the "totally non-satirical claims about the Indo-European nature of Tsimshianic" is John Dunn, a professional linguist, a retired professor of linguistics at the University of Oklahoma in Norman, on a base of some tens or even some hundreds of words. I have acquainted myself with the data collected by him (now impossible to find online). Basing on it, and on my (very limited) knowledge of Tsimshianic grammar, I personally still see no real reason for his claiming that Tsimshianic is Indo-European. But those tons of similar-looking words are really impressive, and I would not object to the idea that most of them are loanwords from an Indo-European language, possibly close to Tocharian.
I would rather suggest trying to explain the similarity instead of just treating it with ridicule.
If similarity of many roots between a certain language and Nostratic, Indo-European etc. had been claimed for a language used somewhere in Eurasia, would we have been more likely to accept that the similarity may be the result of mutual influence in the past or even of a genetic relation? If yes, and despite this we are not willing to claim such a relationship between Eurasiatic and, say, Quechua, maybe the reason is our misconception about the past?
I mean: the Nostratic protolanguage may really have existed once, say, 10 thousand years ago, and the language predecessors of today's Quechua may have lived somewhere in the proximity, or even their language may have been genetically closely related to Nostratic. So, we cannot just say that the observed (Nostratic-Quechua) similarity is still so implausible that even ridiculous. It is so, because there is not a single fact for which we could be sure that the above-outlined scenario is a nonsense (or I do not know one).
Anyway, why should we believe instead in a hypothesis which is as little probable as one to ten billion?
Still convinced that the thesis of a distant relationship between Quechua and Nostratic is so ridiculous?
Personally, I don't find it ridiculous at all. Still, I agree with Piotr that a simple comparison of lookalikes cannot demonstrate it; it can only be right for the wrong reason.
As a biologist, I even think the question "is Quechua related to Eurasiatic at all" is simply beside the point. Of course they're related. While the monogenesis of language isn't as obvious as the single origin of all Life As We Know It, there is simply no reason to suppose that language evolved more than once or that some population lost it and then redeveloped it from scratch. The interesting question is the shape of the tree: what is the closest relative of Quechua, and what is the closest relative of Indo-European?
Basing on it, and on my (very limited) knowledge of Tsimshianic grammar, I personally still see no real reason for his claiming that Tsimshianic is Indo-European. But those tons of similar-looking words are really impressive
Not if you look at the details. Dunn proposes several really strange sound correspondences and many really strange shifts in meanings. Let's reconstruct Proto-Penutian first, and then let's compare that to a much improved Proto-Eurasiatic.
If similarity of many roots between a certain language and Nostratic, Indo-European etc. had been claimed for a language used somewhere in Eurasia, would we have been more likely to accept that the similarity may be the result of mutual influence in the past or even of a genetic relation?
This depends on what the similarities are like: where in the vocabulary are they found, what are the regular sound correspondences, do they contain elements that make no sense in one language but are grammatical affixes in another, and so on. It's difficult; above all, it's a lot of work.
As a biologist, I even think the question "is Quechua related to Eurasiatic at all" is simply beside the point.
I am also a biologist, and it is nice to me to meet a colleague. I agree with you: all languages, and especially all extra-African languages, come from one protolanguage.
The interesting question is the shape of the tree
Yes, indeed. For example, closer relationship of Quechua and Nostratic than Quechua and, say, Australian languages is something expected. It would be exciting if it appeared that things go inversely (because it might suggest then that the New World has been peopled in no accordance with the Hrdlicka's paradigm).
Among all Native American languages, Greenberg distinguished only 3 large groups, Eskimo-Aleut, Na-Dene and the third one which he called Amerindian. We may try to reconstruct the tree of Amerindian, but what if we could demonstrate much closer relationship between, say, Quechua and Nostratic, than between another Amerindian language and Nostratic?
Dunn proposes several really strange sound correspondences and many really strange shifts in meanings.
Not stranger than, for instance, shifts postulated for Armenian.
Let's reconstruct Proto-Penutian first, and then let's compare that to a much improved Proto-Eurasiatic.
Under the condition that Penutian is a real division... Besides, in my opinion we are talking about loanwords. If yes, it is enough to show that we deal with Tsimshianic words without cognates in other languages thought to be related. A Proto-Penutian reconstruction is not needed for this.
Telling the truth, we are talking about Nostratic, not Eurasiatic. Greenberg's Eurasiatic does not include Dravidian, Kartvelian or Afro-Asiatic, while Nostratic does. And the ToB reconstruction is based on Nostratic, not Eurasiatic. Pagel, Atkinson (the same who “proved” the location of the IE cradle in Anatolia) and others have simply got confused with these terms.
Not stranger than, for instance, shifts postulated for Armenian.
Not true.
Greenberg's Eurasiatic
Oh, sorry, I'm not talking about Greenberg's but about the Moscow School version, which forms Nostratic together with Afro-Asiatic. Many recent Moscow School works have used the term "Nostratic" for this branch in order to avoid confusion with Greenberg's version, thus causing confusion with their own earlier usage...
all languages, and especially all extra-African languages, come from one protolanguage
While monogenesis is plausible, there is no actual evidence for it, and not likely to be any either.
we are talking about loanwords
IE loanwords in Tsimshianic only doesn't make much sense, unless you suppose that Tsimshianic split from the rest of Penutian already in the Old World, which there is no reason to believe.
really strange sound shifts
Unnaturalness of sound shifts is no big deal: see Austronesian.