26 March 2013

Chaque Réplicateur A Son Histoire


Let us define linguistic replicators, informally, as recurrent fragments of language structure (such as words, morphemes, speech sounds, and constructions) which can be transmitted from one generation of speakers to the next. A replicator has an evolutionary relevance only if it can be “internalised” by new speakers as a permanent information structure residing in their brains. Although the crucial part of the process takes place during first-language acquisition, our mother tongue continues to be developed and modified in our later years. We can also learn one or more foreign languages. Our brain is actually the place where language contact takes place and where replicators can penetrate language barriers.

It should be clear by now that replicators live lives of their own. It’s much easier for them to spread inside a speech community (especially if it’s predominantly monolingual) than infect a different language; that’s why their genealogies lie mostly inside the branches of the family-tree of languages. But their lateral transmission (from language to language) is not only possible but common. Linguistic replicators and the languages that contain them co-evolve to a significant degree but are not doomed to each other’s company forever.

[Image: How many species? Reticulate evolution. ©2011 Australian Institute of Marine Science, Coral Reef Research]
We can also reverse the perspective: a language (rather than being a fundamental unit) can be viewed as a stream of replicators, most of which have co-evolved for a long time, so they have adapted to each other’s presence as well as to the host community of their human users. If we were able to plot the historical lineages of all replicators, thousands of lines would cluster together into somewhat fuzzy bundles that in turn would form a denser tree-like pattern in which we could make out “separate languages”. But of course there would also be some merging back of closely related branches, and a lot of horizontal transfer going on between branches (no matter how distant their relationship), sometimes taking the form of massive lateral influx (think of French loans in English, Arabic loans in Persian, Spanish loans and grammatical constructions in Nahuatl, etc.). Inside a branch, the pattern of linguistic microevolution may be network-like (“reticulate”) rather than tree-like: replicators often form tight symbiotic complexes, losing their separate identities in the process. The relationship between closely related languages or dialects of the same language (two situations that may be difficult to distinguish in practice) is also typically reticulate and dominated by the lateral diffusion of linguistic elements. Unsurprisingly, the same patterns are familiar to evolutionary biologists (cf. the image on the left).

26 comments:

  1. Hi Piotr,

    I'm not sure I understand the following sentence:

    "But of course there would also be some merging back of closely related branches, and a lot of horizontal transfer going on between branches (no matter how distant their relationship), sometimes taking the form of massive lateral influx ..."

    You began this paragraph by proposing an alternative to the model of languages as "fundamental units". But, doesn't the idea of "horizontal transfer between branches" just take us back to this model? If we're not concerned with languages as a whole, but only the components thereof (= replicators, if I understood your definition correctly), then the idea of "transfer" no longer makes any sense, because from this perspective, there are no larger groupings to which a replicator can "belong" (except maybe the speech community in which the replicator is used). Instead, individual replicators simply propagate themselves through time, sometimes branching out into multiple forms and sometimes not.

    The same confusion comes up for me with the phrase "distant relationship [between branches]" and the term "lateral". Both of these seem to presuppose a model of language relationship that transcends the relationships between the individual replicators within each language.

    Did I misunderstand something about the model you're using in this paragraph?

  2. I'm not saying "individual languages" aren't real, my claim is only that they are not particularly well-bounded. Biologists have a very similar problem with the notion of "species", which is notoriously hard to define. Still, they speak both of different species (usually divided by a reproductive barrier, in the case of sexually reproducing organisms) and of gene flow between species, and don't regard such usage as self-contradictory.

    There would be no speech communities if they did not share enough linguistic structure for mutual comprehensibility to be close to 100%. If it's close to 0%, we can be sure we are dealing with completely separate languages (and clearly discrete speech communities). But in real life one often finds intermediate cases which can't be unambiguously classified, as well as the linguistic analogue of what biologists call a "ring species": A is mutually intelligible with B, B with C, C with D, and D with E, but E is not mutually intelligible with A.

    We like to think in terms of binary contrasts and clearcut relations, but reality can't be shoehorned into such simple schemes. In the case of changeable, evolving systems relations like "A is the same language as B" are not strictly transitive. There's always a tacit understanding that we really mean "almost the same", since no two people ever use exactly the same language. The same goes for different historical stages: between Old English and present-day English each of the forty or so generations of English-users has spoken "the same" language as their parents and their children. And yet, despite all that "sameness", English has changed beyond recognition. For example, most of the original vocabulary of Old English has been discarded, and most of the words used today have been borrowed from sources such as French and Latin (not to mention a few dozen other languages).
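The ring-species point and the failure of transitivity can be checked on a toy chain. The sketch below is only an illustration: the five varieties, the intelligibility scores (dropping by 0.25 per step along the chain) and the 0.7 threshold are all made-up assumptions, not measurements of any real dialect continuum.

```python
# Hypothetical dialect chain: A - B - C - D - E.
VARIETIES = ["A", "B", "C", "D", "E"]

# Assumed cut-off above which two varieties count as "the same language".
THRESHOLD = 0.7


def intelligibility(x, y):
    """Made-up score: 1.0 for identical varieties, minus 0.25 per chain step."""
    steps = abs(VARIETIES.index(x) - VARIETIES.index(y))
    return max(0.0, 1.0 - 0.25 * steps)


def same_language(x, y):
    """'Almost the same' relation: intelligibility at or above the threshold."""
    return intelligibility(x, y) >= THRESHOLD


def is_transitive(items, relation):
    """Return False if some x~y and y~z hold while x~z fails."""
    for x in items:
        for y in items:
            for z in items:
                if relation(x, y) and relation(y, z) and not relation(x, z):
                    return False
    return True


if __name__ == "__main__":
    for left, right in zip(VARIETIES, VARIETIES[1:]):
        print(left, right, same_language(left, right))
    print("A", "E", same_language("A", "E"))
    print("transitive:", is_transitive(VARIETIES, same_language))
```

Every neighbouring pair passes the threshold, yet the two ends of the chain do not, so "same language" under any such cut-off is a similarity relation rather than an equivalence relation.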

    The family tree is a valid model of linguistic relationships, but with quite a few caveats. We normally concentrate on a highly durable core (like the "basic vocabulary" and its morphology) and ignore the rest (in fact, most of the structure of an actual language) as a messy complication. But look at the first sentence of this paragraph: the words family, valid, model, linguistic, relationships, quite, and caveats are not etymologically Germanic (though some of them contain elements traceable back to PIE via Italic). So it's possible to be English without being Germanic though from the taxonomic point of view English is a Germanic language. Labels like "Germanic" or "Indo-European" refer to the historical derivation of the most stable part of the "stream of replicators" without assuming that there is some kind of "historical essence" which will never evaporate.

    Replies
    1. P.G.:

      'The same goes for different historical stages: between Old English and present-day English each of the forty or so generations of English-users has spoken "the same" language as their parents and their children. And yet, despite all that "sameness", English has changed beyond recognition.'


      Hi Piotr, I don't know if you address this point somewhere (your postings have become too dense for me to read thoroughly, or if not to read then to understand) but as we know from Mr. Caxton, 'ovr language now (like 1450-60-ish) varyeth ferre from that whych was vsid & spokyn whan I was borne. For we Inglyssh are bourn undir the domynacyoun of the mone, never faste & fyrme but ever waxynge and wanynge etc. etc.' I am, in other words, particularly interested in the phases where the language evolution dramatically accelerates and adjacent generations no longer have the illusion of speaking the same language... I have the impression this is now happening with Polish, probably because we Poles read next to nothing and have lost a great deal of the 'sense' of a language standard, as a result. (For instance, my Polish, a sexagenarian's, is hardly comprehensible to twenty-somethings, Polish.)

  3. "I'm not saying "individual languages" aren't real, my claim is only that they are not particularly well-bounded."

    I'm actually not sure that "languages" (in the sense of self-contained systems) are real, except in a sociological sense. Outside of sociology, it seems more empirically precise to speak of what you call replicators (individual words, morphemes etc.).

    "There would be no speech communities if they did not share enough linguistic structure for mutual comprehensibility to be close to 100%."

    I mentioned speech communities because they are the only factor (that I can think of right now) that allows one to determine when a set of replicators becomes a language: if these replicators are used within a speech community, then they are the language of that community.

    "So it's possible to be English without being Germanic though from the taxonomic point of view English is a Germanic language. Labels like "Germanic" or "Indo-European" refer to the historical derivation of the most stable part of the "stream of replicators" without assuming that there is some kind of "historical essence" which will never evaporate."

    I'm not sure what you mean by "... the most stable part". Stable across what process?

  4. "I'm actually not sure that "languages" (in the sense of self-contained systems) are real, except in a sociological sense. Outside of sociology, it seems more empirically precise to speak of what you call replicators (individual words, morphemes etc.)."

    I'd put it slightly differently. Discrete or semi-discrete languages are emergent entities produced by linguistic evolution (or, to be more precise, by macroevolution). Unlike the actual replicators, they are not the primary units of evolution. You might call them empirically "less real" because of that, but on the other hand replicators hosted by one and the same community form patterns of mutual dependency and tangled interactions (roughly, the thing we call "a grammar"), which are real enough and sort of keep them together in one bundle.

    I'd also say that language family trees are real but less real than the histories of individual replicators, since the former emerge as you superpose a vast number of the latter.

    "I'm not sure what you mean by "... the most stable part". Stable across what process?"

    Just across history. Words like that, me, two, three, four, ten, sit, eat, lie, father, mother, brother, son, daughter, foot, tooth, heart, wind, sun, snow, water, fire, grain, wolf, birch, new, red, as well as a number of others, have "travelled together" in the same chain of communities for several millennia. Not that there is anything mystical about them. Most of them just happen to be frequently used words for commonly mentioned concepts. They have extremely long expected lifetimes because their transmission pattern is particularly reliable. Sooner or later anything can undergo lexical replacement, but this kind of vocabulary remains stable for thousands of years (rather than hundreds, if that, for rarer words). Their existence is what makes the comparative method possible.

  5. "You might call them empirically "less real" because of that, but on the other hand replicators hosted by one and the same community form patterns of mutual dependency and tangled interactions (roughly, the thing we call "a grammar"), which are real enough and sort of keep them together in one bundle."

    I'm not sure, though, that these interactions and dependencies are self-contained enough (i.e., shielded from sociological, pragmatic and other factors in a person's cognition) to speak of "a grammar" as a unitary object.

    "Just across history. Words like that, me, two, three, four, ten, sit, eat, lie, father, mother, brother, son, daughter, foot, tooth, heart, wind, sun, snow, water, fire, grain, wolf, birch, new, red, as well as a number of others, have "travelled together" in the same chain of communities for several millennia."

    In that case, why don't we replace statements such as "English is Germanic" with statements such as, "the Swadesh vocabulary of English is (mostly) Germanic"?

    The first statement seems false or inaccurate unless we can define certain features as the "essence" of English and other features as merely accidental.

  6. A (language-specific) grammar is not so much a unitary object as a large bundle of objects characterised by a lot of coherence. But then all macroscopic phenomena are like that. We find it convenient to treat a river descriptively as an object, though it is "really" zillions of water molecules that happen to move in the same general direction, and only a tiny fraction of them cover the whole distance from the springs to the sea. But a river has macroscopic emergent properties that are not reducible to properties of molecules. It has well-defined physical dimensions, you need a bridge or a ferry to cross it, you can drown in it, it provides a habitat for fish and other life forms, etc.

    As for replacing statement 1 with statement 2, I wouldn't do so for several reasons. First, it's perfectly legitimate to use time-honoured standard shorthand terminology if you make its meaning clear. Second, the alternative statement is not very accurate either. Some of the Swadesh vocabulary in English is not Germanic, some of the non-Swadesh vocabulary is Germanic, the Swadesh sets themselves are pretty arbitrary, and "Germanic" is as fuzzy as any other label for a language. Then, what about words whose etymological trajectories within Germanic are not straightforward? For example, egg is an Old Norse loan in English, and ablaut is a technical term borrowed from Modern German. That makes them both "Germanic", but not in the same way as father, fish, five, fart, full, etc.

    We are not very good at handling evolving and emergent objects, but we do have to handle them somehow. Pedantic precision is not always possible, let alone desirable.

  7. "As for replacing statement 1 with statement 2, I wouldn't do so for several reasons. First, it's perfectly legitimate to use time-honoured standard shorthand terminology if you make its meaning clear."

    I don't have a problem with that in principle, though I'm still not sure I completely understand what statements such as "English is Germanic" are a shorthand for.

    "Some of the Swadesh vocabulary in English is not Germanic,"

    That's why I wrote "mostly" in parentheses.

    "some of the non-Swadesh vocabulary is Germanic,"

    True, there are other aspects of English than Swadesh terms that can be designated Germanic. My point is that I don't understand the reason (outside of shorthand considerations) for labeling the entire English language as "genetically Germanic", and thereby eliding the many, many aspects of English that aren't genetically Germanic.

    "the Swadesh sets themselves are pretty arbitrary,"

    Maybe I shouldn't have used the specific term "Swadesh", then, but you seemed to be vouching for basic vocabulary lists as a criterion for classification when you said,

    "The family tree is a valid model of linguistic relationships, but with quite a few caveats. We normally concentrate on a highly durable core (like the "basic vocabulary" and its morphology)"

    "Pedantic precision is not always possible, let alone desirable."

    With all due respect, I don't see how the precision I'm talking about is pedantic. The idea that languages can only belong to one genetic grouping at a time ("English is Germanic, and not Romance", etc.) is an artifact of the history of linguistics. Why should we persist in using the same labels/terminology as early comparative linguists, even when these labels are clearly inadequate?

  8. By "the basic vocabulary" I mean just words for high-frequency meanings that are relatively culture-free. Of course that's what the Swadesh list is about, but when I say that it is arbitrary, I mean two things: (1) the original Swadesh lists were compiled on the basis of pretty impressionistic criteria rather than anything quantifiable; (2) all Swadesh lists have an artificial cut-off point (100, 120, 200 or the like). What we actually have in real languages is a fast-dropping cline of frequency (as roughly described by Zipf's "law"), and there is a measurable correlation between the expected survival time of lexical items and their frequency of use. There are several hundred English lexical items of "Germanic" origin, but their frequency distribution is highly -- very highly -- skewed.
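The "fast-dropping cline of frequency" can be made concrete with a small calculation. This is a rough sketch under an idealised assumption, namely that the word of rank r has frequency exactly proportional to 1/r; the vocabulary size of 50,000 and the cut-off points are arbitrary illustrative figures, not data about English.

```python
def harmonic(n):
    """Sum of 1/r for r = 1..n (the normalising constant of an ideal Zipf curve)."""
    return sum(1.0 / r for r in range(1, n + 1))


def token_share(top_k, vocab_size):
    """Share of running text covered by the top_k most frequent word types,
    assuming frequency is exactly proportional to 1/rank."""
    if not 0 < top_k <= vocab_size:
        raise ValueError("top_k must be between 1 and vocab_size")
    return harmonic(top_k) / harmonic(vocab_size)


if __name__ == "__main__":
    vocab_size = 50_000  # hypothetical number of word types
    for cutoff in (100, 200, 1000):
        share = token_share(cutoff, vocab_size)
        print(f"top {cutoff:>4} types: {share:.1%} of tokens")
```

Under this idealisation a list of a hundred or two hundred items already covers a large share of running text, and moving the cut-off changes that share smoothly, which is why any particular list length is a matter of convention.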

    As for the last paragraph, sorry, I meant no offence. It simply strikes me as impractical to try and squeeze all the necessary caveats in every statement I make rather than explain them once beforehand and then continue to use standard terminology with all the caveats in mind. I hope it's obvious by now that when I say "English is a Germanic language" I am aware of all the complications. Crucially, when I use such terminology as a historical linguist, I mean the part of English to which the classical comparative method is applicable. For example, all the words that have been part of the "stream of replicators" we conventionally regard as the lineage of English for at least ca. 2500 years show the characteristically Germanic phonetic developments like Grimm's Law, Verner's Law, etc., characteristically Germanic morphological properties, and so on. The metaphor of the family tree is applicable only inasmuch as such evidence can be selected. I think it's quite likely that it ultimately erodes away over periods significantly greater than the age of the major uncontroversial families, and so languages can't be classified in this way indefinitely. Not because we have too little information but because the very basis for genetic classification becomes "washed out" in the course of time.

  9. "Crucially, when I use such terminology as a historical linguist, I mean the part of English to which the classical comparative method is applicable."

    If you were to compare English and (e.g.) Finnish, you would find patterns of sound correspondence that hold true at least for some sets of words: e.g., pairs such as Finnish realiteetti and Eng. reality show that in some words, word-final [e:t:i] in Finnish corresponds to word-final [i] in English.

    On the other hand, if you compared English and a language that is generally considered to be related to it, not all word pairs would necessarily point toward the same pattern of correspondence: the comparison between English drag and Icelandic draga points to one pattern, whereas the comparison between E. draw and Icel. draga points to another.

    None of these correspondence patterns depends (as far as I can see) on any prior knowledge of how English, Icelandic and Finnish have historically been classified: the comparative method can be applied to any two (sufficiently large) groups of linguistic data points.

    I hope that the above doesn't sound pedantic, but I genuinely don't understand what you mean when you say that the comparative method is only applicable to the part of English that's been affected by Grimm/Verner/etc. I don't doubt that categorizations such as "Germanic" are meaningful, but I still don't see a reason (other than shorthand, and continuity with earlier terminology) for labeling an entire language "Germanic", etc.

  10. "If you were to compare English and (e.g.) Finnish, you would find patterns of sound correspondence that hold true at least for some sets of words: e.g., pairs such as Finnish realiteetti and Eng. reality show that in some words, word-final [e:t:i] in Finnish corresponds to word-final [i] in English."

    Exactly. And in order to explain this curious correspondence you have to study the genealogy of this item. Finnish -eetti and English -y eventually coalesce outside both languages, as Latin -āt-, with multiple borrowing events (and the transformation from Latin to Old French) as parts of the full story. It's a different story from that of the common Finno-Ugric vocabulary, or the Germanic component of English, or for that matter the Germanic component of Baltic Finnic. Every language has hybrid origins and in fact that's the intended message of my next planned postings, but we do need to identify the "core" lineages (and the characteristic innovations associated with them) as a frame of reference. That's the part to which we first apply the comparative method to identify changes such as Grimm's Law in Germanic or Lat. -ātem > French -é. When we compare secondary correspondence patterns due to borrowing, we normally need not define any new changes: we refer to those already established for the languages involved in the contact situation. It's methodologically sound. I can't see what we could gain by doing it the other way round.

    Replies
    1. This is very similar to what we see in the Chinese-based learned vocabulary in Sinosphere languages. We see several borrowing events, often from different varieties of Chinese and at different periods, and we see back borrowing.

      Sometimes these events were distinct enough that whole groups of loanwords have discrete names. In Japanese there are two groups of Chinese loans, kan-on and go-on words, and very often they are two pronunciations, sometimes with different meanings, for the same Chinese etymon.

      Vietnamese seems to have borrowed quite a number of words from the ancestor of Cantonese, but somehow still reflects a lot of northern features.

      Then there was a lot of borrowing of Japanese compounds of Chinese etyma back into Chinese when Chinese students went to Japan to study at the end of the 19th century, mostly technical terms.

    2. Such doublets sometimes even reach Europe, like Portuguese chá (from Cantonese) or Russian čaj (from Mandarin via Persian) vs. English tea (from Xiamen).

    3. That's a good one.

      It's an extreme example; usually the variations are in the vowels rather than the initials of words, but the distance between Mandarin and Minnan is so great that that kind of thing should not be a surprise.

      And here's one from two Algonquian languages at about the same distance as Mandarin and Minnan. The words "skunk" and "Chicago" come from the same etymon, 'to stink'.


      Here's one for you. This deals with isoglosses in Muskogean.
      http://www.peopleofonefire.com/proto-musk/reconstructing_protomuskogean_environments.htm

      Look at the short discussion of terms for 'corn' (maize). It's an interesting problem in historical linguistics.

    4. Thanks for the link, Jim! I've never come across that site before, and it's full of fascinating stuff.

  11. "Every language has hybrid origins and in fact that's the intended message of my next planned postings, but we do need to identify the "core" lineages (and the characteristic innovations associated with them) as a frame of reference."

    I respectfully disagree. I don't need to know anything about the purported core lineages of English and Finnish in order to determine (or at least hypothesize) that Finn. word-final [e:t:i] corresponds to English word-final [i] in a certain set of words: there are enough relevant data points in English and Finnish, showing enough semantic and phonetic similarity (outside their suffixes) that I can apply the comparative method here without knowing anything else about the lexicon of either language.

    The same applies to cases where relationships are less immediately obvious: I can establish a relationship between certain words/morphemes in English and Icelandic (such as draw and draga) by comparing these words and morphemes in isolation, as long as the set of comparanda is large enough. Where would the criterion of "core" or "peripheral" lineages be necessary in this method?

  12. If you do as you suggest in the first paragraph, you'll end up reinventing the wheel -- I mean getting the same results at a higher cost. It can be done as an exercise, but please show me a single fact you can discover in this way that has not been accounted for by the traditional approach (analyse older lexical strata before younger ones). We know that words like water are part of the "core" in English while those like reality aren't; not because a vision in our dreams told us so or because we assume anything about them in advance, but because the painstaking job of stratifying the lexicon historically has already been done. You know, don't you, that realiteetti is a Latinate loan, and its very shape suggests a likely route of borrowing. What would be the point of ignoring that knowledge?

  13. "You know, don't you, that realiteetti is a Latinate loan, and its very shape suggests a likely route of borrowing. What would be the point of ignoring that knowledge?"

    I don't think there would be any point in ignoring that knowledge, if one already had it. However, you seemed to be saying (perhaps I misunderstood you) that it was necessary to establish a language's "core" lineage(s) before one could apply etymological methods to the rest of that language.

    There is a lot of knowledge we can take for granted about certain languages, because of the work people have already done in etymologizing them. If we were trying to etymologize the vocabulary of a previously-unstudied language, it's not obvious to me that the concept of "core lineage" would be useful/important. It might be useful to gather Swadesh-type lists for this language (because the words in such lists might tend to have a common origin), but such lists wouldn't necessarily tell us much about the non-Swadesh vocabulary of this language, especially if the language had as many sources as (e.g.) modern-day English does.

  14. No, they wouldn't. That's exactly why I emphasise the methodological usefulness of dealing first with the part of the lexicon which can be expected to be reasonably conservative. "The core" is not given in advance -- it has to be identified. And that's what people do in practice when studying poorly known languages and families. Far be it from me to suggest that historical linguists should be interested in the core to the neglect of the periphery.

    I'd better stop here for now. Thank you for your incisive comments. I'll be happy to continue this discussion after one or two follow-up posts.

  15. Hi Piotr - I just found your blog, and wanted to say it's terrific. Super readable on complex topics, and you moderate the talkbacks like a champ. I'll definitely be bookmarking you.

    Since you've already drawn parallels between biology and linguistics, I wonder if the way biologists apply computational phylogenetics might not be helpful when answering people who want a precise definition of languages, families, etc. In much the same way as you say language trees really emerge from the superposition of a vast number of replicators, biologists calculate "most parsimonious" evolutionary trees by selecting and codifying hundreds of traits found across individual specimens, then running algorithms to work out how the myriad different combinations found in real animals are most likely related. There's no way to be sure whether two animals share a given trait because of shared ancestors or convergent evolution, but you get a nice, hard stat saying how likely a given evolutionary tree actually is.

    From a phylogenetic perspective, "English is Germanic" could be true simply by definition: Say, if "Germanic" were defined as a crown group that includes "English reference text 1 and Gothic reference text 2 and all their common ancestors." (Biologists actually publish definitions like this for reference, like mammals consist of "the most recent common ancestor of living monotremes and therians (marsupials and placentals) and all descendants of that ancestor.") The particular content of that "Germanic" group might change quite a bit depending on the inputs for the tree you're using, and lord knows that phylogenists love to argue over which trees are better-supported, which traits you should use as inputs, etc. But it really gives you rigorous terms for the debate.

    End rant! Thanks again!

  16. Hi, Nathan! There are of course a number of linguists experimenting with phylogenetic/cladistic software. The results are interesting, and depending on the choice of characters one can indeed get different trees. For example, languages like Albanian or English, which have been strongly relexified with "foreign" elements (Latin and French, respectively), may end up in odd places in the tree if you concentrate on lexical characters. Analyses based on phonological and morphological characters usually just confirm what everyone knows anyway, at least as regards the familiar robust clades we call "the branches". There is more room for surprises (but also less robust evidence) for larger clades. Within IE, Germanic is notoriously capricious, sometimes clustering with Italic and Celtic, sometimes as a sister clade of Balto-Slavic (again little wonder, given its geographical position and the ease with which linguistic innovations have always spread across linguistic boundaries in the Great European Plain).

    As in biology, modern definitions of linguistic taxa tend to be cladistic, and paraphyletic groupings are avoided, though people rarely go to such lengths as, say, defining IE formally as the most recent common ancestor of (say) Hittite and English plus all its descendants, or Slavic as the set of languages more closely related to Bulgarian than to Latvian. Which is a pity, because tree-based thinking has a strong impact on the correctness and accuracy of our reconstructions. For example, people often use criteria like "an item counts as PIE if it is attested in four different branches, not all of them in the West or in the East". Such a principle ignores the known structure of the family tree. PIE as reconstructed in the 19th century, on the basis of about a dozen branches, is very different from today's PIE, which includes two branches unknown to 19th-century linguists (Anatolian and Tocharian, both of them outside the crown group, not unlike all those Triassic and Jurassic stem mammals). As a consequence, the term "PIE" is somewhat ambiguous, often in practice referring to the ancestor of the crown group. They should be distinguished and some people do so by using such terms as "Core IE" (the sister group of Anatolian); I personally use "Neo-IE" for the crown group (the sister of Tocharian).
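A node-based definition of the kind mentioned above ("the most recent common ancestor of X and Y plus all its descendants") is easy to state as a procedure. The sketch below is a toy under loud assumptions: the tree is hand-written to mirror only the branching described in this comment (Anatolian splitting off first, then Tocharian), the inner structure of the crown group is flattened for brevity, and the function names are invented for the illustration.

```python
# Toy family tree: parent -> children. A simplified assumption, not a result.
TREE = {
    "PIE": ["Anatolian", "Core IE"],
    "Anatolian": ["Hittite"],
    "Core IE": ["Tocharian", "Neo-IE"],
    "Neo-IE": ["Germanic", "Italic", "Balto-Slavic"],
    "Germanic": ["English", "Gothic"],
    "Italic": ["Latin"],
    "Balto-Slavic": ["Bulgarian", "Latvian"],
}

# Reverse lookup: child -> parent.
PARENT = {child: parent for parent, kids in TREE.items() for child in kids}


def path_to_root(node):
    """Return the node followed by its ancestors, ending at the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mrca(a, b):
    """Most recent common ancestor of two nodes in the toy tree."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes are not in the same tree")


def clade(node):
    """The node together with all of its descendants."""
    members = {node}
    for child in TREE.get(node, []):
        members |= clade(child)
    return members


if __name__ == "__main__":
    print(mrca("Hittite", "English"))   # node-based definition of IE
    print(mrca("English", "Latin"))     # the crown group in this toy tree
    print(sorted(clade(mrca("English", "Gothic"))))
```

In this toy tree, the clade anchored on Hittite and English contains everything, while the one anchored on English and Latin excludes both Hittite and Tocharian, which is the distinction the labels "Core IE" and "Neo-IE" are meant to capture.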

    Replies
    1. "For example, languages like Albanian or English, which have been strongly relexified with "foreign" elements (Latin and French, respectively), may end up in odd places in the tree if you concentrate on lexical characters."

      The trick is not to treat surface forms as lexical characters, but to apply the comparative method first, and identify only lexical replacements as new characters. That is what Ringe & Co. do, and it eliminates bonehead errors like grouping English and French together because of all the loanwords.

    2. Thanks again, Piotr. Tocharian as a Jurassic stem-mammal - perfect.

    3. How is the possibility of multiple ancestries (for example, English < West Germanic, North Germanic, Romance, etc. etc.) normally incorporated into cladistic models of language origin?

      (I know essentially nothing about cladistics, so that might be a pretty elementary question.)

    4. A cladistic analysis presupposes a family tree, so it's applicable to those parts of a language whose relationship to the homologous parts of other languages can be so represented. This has the unfortunate consequence that many people, including historical linguists, regard the oldest lexical/morphological layers as somehow privileged, and vertical inheritance as, say, more "genetic" than horizontal transfer. The family tree is then highlighted as the "true history" of a group of languages, while borrowing and other contact phenomena are downplayed as "minor complications". Needless to say, such a view of linguistic evolution is biassed and far from realistic.

  17. That's why they get predictable results ;).
