There are many analogies between the history of languages and biological evolution, but the differences between the two domains are also important and deserve to be highlighted. One of them is the degree to which evolving units can be expected to remain recognisable over long periods of time (let’s assume that “long” means millions of years in biology and thousands of years in linguistics). The contrast is particularly striking if we consider biological evolution at the molecular level. How do we decide whether two fragments of DNA in the genomes of two different species share a common ancestry? The problem is by no means trivial, but it helps if the fragments in question are sufficiently long. If we align them and find a high percentage of identical base pairs, it becomes likely that the partial identity is due to common origin rather than to convergent evolution or pure chance. Of course, care must be taken to rule out other possible explanations. For example, a lengthy periodic sequence in which a simple motif is repeated numerous times (like CACACACACACA...) may have been generated many times independently by some kind of replication slippage. Whether the similarity of two such sequences means anything depends on the context in which they occur (e.g. whether they occupy the same locus in their respective genomes).
Note that the “alphabet” of the genetic code is very simple and (almost) the same in all domains of life, and the same is true of the interpretation of the codons (the translation of nucleotide triplets into amino acid sequences). Guanine is guanine, and cytosine is cytosine, regardless of whether they form a GC pair in the DNA of a blue whale, an amoeba, or a forget-me-not. The unity of the genetic code leaves no interpretative leeway: two nucleotides are either identical or different, not “similar” or “almost the same”.
By contrast, morphemes and words are encoded as strings of phonological segments whose inventories differ widely from language to language. There may be universal constraints on what is permitted to function as a speech segment, but individual languages enjoy a lot of freedom in this respect. Even varieties of one and the same language (say, Scottish English, Received Pronunciation, and General American) may have quite different phonological systems. The same speech sound may play a different role in different inventories. For example, the contrast between plain [t] and aspirated [tʰ] is not employed to differentiate meaning in English (the occurrence of the two sounds is predictable from the phonetic context), but in many languages (Mandarin, Armenian, Ancient Greek, Hindi, etc.) /t/ and /tʰ/ are distinct phonemes – contrastive units perceived as different “letters” of the mental code we use to represent words in the lexicon. Because of the operation of sound change, the pronunciation of a word (and its phonological encoding) can morph into something pretty well unrecognisable in the course of several generations.
A typical gene may contain about 30,000 base pairs, whereas a typical word contains just a few phonemes. This difference in length matters a great deal. If there is a 50% match between the base-pair sequences of two aligned genes, the probability of this agreement being accidental is practically zero (even if they encode rather different proteins and so have different “meanings”). If two words with “the same” meaning have “the same” form in two languages (remember that “sameness” is a tricky notion in linguistics, hence the cautionary quotation marks), their identity means nothing by itself: it may well be accidental.
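To put rough numbers on this contrast, here is a minimal Python sketch. It assumes, as a deliberate simplification, that aligned positions agree by chance independently of one another, with a probability of 1/4 per position for DNA (four equiprobable bases) and 1/25 for words (a hypothetical inventory of 25 equiprobable phonemes). It then computes the binomial probability of reaching at least a given number of agreements. The function name `log10_binomial_tail` is illustrative only, and the model ignores gaps, uneven base composition and real phoneme frequencies.

```python
from math import exp, lgamma, log


def log10_binomial_tail(n: int, k_min: int, p: float) -> float:
    """Return log10 P(X >= k_min) for X ~ Binomial(n, p).

    Works in log space (log-gamma plus the log-sum-exp trick), because the
    probabilities involved for gene-length sequences underflow ordinary floats.
    """
    log_terms = [
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log(1 - p)
        for k in range(k_min, n + 1)
    ]
    peak = max(log_terms)
    return (peak + log(sum(exp(t - peak) for t in log_terms))) / log(10)


# Hypothetical gene: 30,000 aligned positions, chance agreement 1/4 per position.
gene_length = 30_000
print(f"gene, >= 50% identity: 10^{log10_binomial_tail(gene_length, gene_length // 2, 0.25):.0f}")

# Hypothetical three-phoneme word, 25 equiprobable phonemes: full identity.
print(f"word, full identity:   10^{log10_binomial_tail(3, 3, 1 / 25):.1f}")
```

Under these assumptions, an exact match between two three-phoneme words has a chance probability of 1 in 15,625 (about 10^-4.2), so a few accidental lookalikes are to be expected once enough words are compared. The gene-level figure, by contrast, is smaller than 10^-1000.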
The most powerful device in the toolkit of historical linguists, the comparative method, is concerned not with lookalikes but with words displaying recurring systematic correspondences – patterns resulting from historical sequences of regular sound changes. In order to undergo such changes (and to acquire their characteristic imprint, like a certificate of origin) a word must spend enough time in the company of other words in the same linguistic lineage. It may be difficult to believe that e.g. the English word daughter (RP /dɔːtə/, Gen.Am. /dɔtɚ ~ dɑtɚ/) is related to Polish córa /ʦura/ (with the same meaning), but the relationship is more or less obvious to anyone familiar with the regular sound changes in Germanic and Slavic, and with the correspondences they have left behind. Remember: we recognise the two words as related not because they are similar (they aren’t) but because the relationship between the most conservative lexical strata of Polish and English conforms to a well-defined pattern of formal correspondences, described in detail in the linguistic literature.
Greek theós and Latin deus not only have the same meaning (‘god’) but also look very similar: /tʰ/ and /d/ are both dental stops, /o/ and /u/ are both back rounded vowels (and we know that Greek -os often corresponds to Latin -us); the remaining segments are simply identical, and there is a similar hiatus (absence of a consonant) between the vowels in each case. There is, however, no plausible common source for the initial sounds, despite their similarity. No PIE word-initial consonant (or consonant cluster) can yield /d/ in Latin and /tʰ/ in Ancient Greek. The pair violates all known patterns of correspondence between the two languages. Moreover, a careful analysis shows that there are practically certain relatives of deus in Greek (e.g. the adjective dĩos ‘divine’ and the name of Zeús, the chief Olympian god), and plausible relatives of theós in Latin (e.g. fās ‘religious law’, fēstus ‘festive’, and fānum ‘temple’). They are hardly similar to theós, but they fit nicely into the known pattern of correspondences. One could entertain other possibilities, e.g. that theós might be a loanword from an unknown IE language in which PIE *d became *tʰ; such an assumption would account for the deviation from the expected pattern. But – apart from the fact that other features of theós would remain unexplained – ad hoc recourse to otherwise unheard-of languages with arbitrary characteristics falls foul of Ockham’s Razor. The hypothesis that deus and theós are unrelated, and that theós is instead related to fās etc., is more parsimonious, and on the whole more compelling. The respective prototypes of deus and theós are therefore reconstructed as *deiwós versus *dʰh₁sós, derived from different PIE roots. It follows that their similarity is deceptive and there is no genetic link between them.
The likelihood that two words from different languages will display both a “matching” meaning and a “matching” form quite by chance is much higher than most people would imagine, and increases dramatically every time we relax the criteria of what constitutes a match. Without any formal controls, such as those that allow us to recognise spurious cognates in lexicons with reconstructable histories, matches are a dime a dozen and have no intrinsic value as evidence. If two languages are distantly related, real cognates are often as dissimilar as daughter and córa, and we can’t identify them just by eyeballing the material.
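The effect of relaxing the criteria can be illustrated with a back-of-the-envelope Python sketch. Everything in it is a hypothetical assumption rather than data: both languages have CVC words built from 20 equiprobable consonants and 5 equiprobable vowels, segments are independent, a 200-item list of meanings is compared, and “relaxing” means lumping consonants into 6 broad classes, ignoring vowels, or letting each meaning match any of 3 “related” meanings. The function name `expected_chance_matches` is illustrative.

```python
def expected_chance_matches(list_size: int, c_classes: int, v_classes: int,
                            semantic_leeway: int = 1) -> float:
    """Expected number of chance lookalikes between two unrelated CVC word lists.

    Each meaning in one list is compared with `semantic_leeway` words in the
    other; a pair "matches" if both consonants and the vowel fall into the same
    class. Segments are assumed equiprobable and independent, so by linearity
    of expectation the result is simply (number of comparisons) * P(match).
    """
    p_match = (1 / c_classes) ** 2 * (1 / v_classes)
    return list_size * semantic_leeway * p_match


# Hypothetical criteria, from strict to lax (all numbers are illustrative).
criteria = [
    ("20 consonants, 5 vowels, same meaning", 20, 5, 1),
    ("6 consonant classes, 5 vowels", 6, 5, 1),
    ("6 consonant classes, any vowel", 6, 1, 1),
    ("as above, 3 'related' meanings each", 6, 1, 3),
]
for label, c, v, leeway in criteria:
    print(f"{label:40} {expected_chance_matches(200, c, v, leeway):6.2f}")
```

Under these assumptions the expected number of spurious matches grows from about 0.1 under the strictest criterion to about 17 under the laxest, with no historical connection between the two languages at all.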
|Pieces should form a pattern|
To be fair, the oldest words in the IE languages have often retained enough similarity to be “visibly related” even to a layperson. After all, it was the bare resemblance of some Sanskrit words to those in Greek, Latin, etc. (famously observed by Sir William Jones in 1786) that led to the discovery of the IE family. But it was the careful application of the comparative method and the reconstruction of the common “protolanguage”, as well as the unravelling of the changes transforming it into the historically known languages, that allowed linguists to progress from impressionistic speculation to something resembling a scientific model. Many of the correspondences that seemed evident to linguists in the early 19th century have turned out to be misleading, and lots of non-evident ones have been discovered. The latter, despite being regular, could not have been spotted by a naive observer.
[To be continued in the next post.]
[► Back to the beginning of the Proto-World thread]