05 May 2013

Morphemes Are Forever

Where do words come from? We may form them by combining preexisting words or roots with derivational morphemes, but where do those preexisting components come from?

The fundamental grammatical units (morphemes and words) are sometimes created from scratch. Imitative and sound-symbolic words (meow, cuckoo, swoosh, bang) are obvious examples. Of course, as soon as they become fully lexicalised, they may develop in parallel with other words and undergo the same sound changes, which may result in the loss of their iconic character. For example, Latin had the imitative verb pīpiō ‘pip, cheep (like a chick)’. The fact that we also have Ancient Greek pippízō or, for that matter, Finnish piipittää, all with the same meaning, tells us more about young birds (they made noises like [piː piː] in the past just as they do today) than about the languages that have “borrowed” the bird call in question. Such imitative word-coining must have happened independently many times. But Latin also had several words derived from the verb, among them post-Classical pīpiō (acc. pīpiōnem) ‘nestling, young bird’. That derived noun gradually developed a more specialised meaning, ‘squab, young pigeon (or its meat)’, and its form evolved according to regular sound changes. In the passage from colloquial Latin to French, the medial -pi- developed into *-βj- > *-βʤ- > -ʤ-, eventually yielding Old French pijon (~ pigon, pichon, pigeon) /piʤõn/, which was borrowed into fourteenth-century English (and spelt variously peion, pyion, pichon, pygeon, etc.). Modern French /piʒɔ̃/ and Modern English /'pɪʤən/ are not onomatopoeic any more. They have become “normal” words, for which the sound-meaning pairing looks entirely arbitrary. After all, they normally refer to adult pigeons today, and everybody knows that adult pigeons go “coo, coo” rather than “pee, pee”.

It is imaginable that many lexical roots began their existence in a similar manner, as iconic sound combinations mangled beyond recognition by the operation of historical sound change; note, however, that “imaginable” does not mean “demonstrable”. Most of the morphemes whose genealogy we can follow back to a remote past do not seem to have been onomatopoeic during their recoverable history. What is their ultimate source? We simply don’t know. If we say that an English word or lexical root is “of Proto-Indo-European origin”, we mean that at the present state of our knowledge we can trace it back to PIE and no further. If we say that it is “of Latin origin”, we point to Latin as the donor of a loanword and usually stop there; but of course the Latin source itself may have a deeper history in Italic, or it may be a Classical Latin derivative of something older, or it may be a loanword from a still different source (Greek, Etruscan, Gaulish, etc.), in which case the whole cycle repeats itself. Etymological dictionaries declare some words to be “of unknown origin” or “without etymology”, which means that at some point we run out of information as to their deeper origin. But to be honest, we always have to stop somewhere, even if “somewhere” refers to the oldest available reconstructed stage. The genealogies of words (or their parts) lead us deeper and deeper into the past, often forcing us to follow a complex trajectory of borrowing from language to language, or even between language families. In the end we either reach a point beyond which reconstruction is impossible, or lose track of the word in the uncharted wilderness of poorly documented languages. With relatively few exceptions (words known to be coined ab novo), the lexical items we study, or at least their morphological components, are indefinitely old.

Is there a way to refine our methods so that tracing words back to their source would not depend on our ability to reconstruct “protolanguages”? Words and morphemes are in principle immortal. Thanks to horizontal transfer, they can survive the death of the languages they came from. For example, quite a few English words (ebony and ivory among them) are certainly or possibly of Ancient Egyptian origin in the sense that we can follow their trail back to Ancient Egyptian via intermediaries such as French (or Spanish, or Arabic), Classical Latin, Ancient Greek, or Old Persian. Ancient Egyptian is as far as we go simply for lack of further clues. Perhaps there is a way to recover at least a handful of “global etymologies”, i.e. words whose reflexes occur in the language families of all the inhabited continents – not necessarily by virtue of being inherited from a common protolanguage but because they have managed to colonise the world thanks to a combination of inheritance and prehistoric borrowing?

I don’t think we have been introduced.
[Albrecht Dürer. Source: Wikimedia Commons]
To use a biological analogy, we know that the mitochondrial DNA of our species has a single source – an anonymous Paleolithic woman we dub “Mitochondrial Eve”. There was also a common male source of our Y chromosomes, a certain “Y-Chromosomal Adam”, who lived at a different time and place from Mitochondrial Eve. But that is not the end of the story: in the human genome there are a few hundred thousand regions for which family trees could be constructed, and each of them may be rooted in a different prehistoric individual, some “Adam”, “Eve”, “Jack”, “Jill” or “Algernon” who never met any of the others. Those trees, viewed separately, do not represent the ancestry of the whole species, or of identifiable subpopulations – just of selected genomic loci. Perhaps we can get round the “unreconstructible Proto-World” problem by concentrating on globally shared lexical units and leaving aside the question of which particular languages they have passed through?

The discussion will be continued in the following posts.



  1. The fundamental grammatical units (morphemes and words) are sometimes created from scratch. Imitative and sound-symbolic words (meow, cuckoo, swoosh, bang) are obvious examples. Of course, as soon as they become fully lexicalised, they may develop in parallel with other words and undergo the same sound changes, which may result in the loss of their iconic character.

    Do you think that affixes often originate in a similar way (i.e., as onomatopoetic or otherwise expressive formations)?

    Quite a lot of adjectival suffixes, for example, seem plausibly, or probably, of diminutive origin: e.g., the Germanic adjectival suffix *-sk- (as in the -sh of English, Polish etc.) seems to be connected to the Greek diminutive suffix -iskos. There are also many IE adjective suffixes that contain -k- or -l-, both of which I think are also widely used to form diminutives.

  2. It's also related to Balto-Slavic *-iska- with the same adjective-forming function (as in Polish angielski, polski).

    I think some affixes may well be of expressive origin, and indeed there are whole classes of expressive words following the same pattern (e.g. English giggle, bubble, babble, cackle, a productive process, cf. Lewis Carroll's burble, and the recurrence of the -l- suffixoid in whiffle, chortle). I'm not sure about the IE "dorsal" suffixes, since at least some of them are etymologisable (e.g. the zero-grade *-h3kʷ- of the root underlying PIE 'eye' was also employed as a suffix). It's quite common for second members of compounds and for cliticised postpositions to evolve into suffixes, and it's always a possibility worth investigating.

    1. "It's quite common for second members of compounds and for cliticised postpositions to evolve into suffixes, "

      Or for them to go the other way and become independent words. This happened to the "jack" in "hijack" after the term "skyjack" was coined in the 60s. Now it is a slang term that just means to steal. So you have "jack" evolving from a personal name in the original expression "Hi, Jack" in reference to highway robbery, to an element in an unanalyzable compound, and then re-analyzed out by analogy as some semantically quite distant new word.

      Phonesthemes are another interesting example of morpheme generation, or maybe just reassignment and semantic distortion. I ran across a site one day that documented the semantic history of a lot of words in English that had started out semantically quite distinct but had started to slide semantically towards each other based purely on a chance resemblance that then grew into a phonesthemic group or family.

    2. In rare cases an inflectional suffix can regain its freedom and become a postposition. This has happened to the "Saxon genitive" in English. It is no longer a real inflection but a clitic placed at the end of a noun phrase:

      [The Queen of England]'s birthday
      [The man in the street]'s opinion

    3. "Regain its freedom" - yes indeed. These affixes were cliticized onto words from some kind of independent forms deep in the past.

      This process is a cycle. Matisoff said the same thing about tonogenesis in Sino-Tibetan, where you could see an arc of Old Chinese - toneless - to Middle Chinese - tones - to Shanghai for example, with the tone system functionally reduced to two. And in some ways Mandarin is not far behind. And Matisoff said this was part of the cycle of prefixing and suffixing in ST, with affixes reduced to consonants that then faded away but "cheshirized" into tones, and then when the tone system broke down there would be a phase of compounding in the language to disambiguate all the homophones that resulted (and this is the case in Mandarin) and that would turn the affixation wheel another spin.

    4. Does such a cycle occur frequently elsewhere, as a broader linguistic phenomenon?

      For example, noun classes. Is there a cycle of "middle/null state" <> classifiers <> noun classes <> gender <> null ?

      (<> indicates direction-less evolution)

      I believe I'm fine saying languages can gain genders/noun classes. I don't know if a Polish linguist would agree, but most if not all the grammars I've read on the language declare there being 5 genders, the feminine and neuter with the masculine "broken" into three based on humanness and animacy. Similarly, PIE itself has been analyzed to lack a feminine until after Hittite split from the core.

      I believe I'm equally fine in stating that a language can lose grammatical gender, to the point of it leaving behind unanalyzable (phonetic) segments, e.g. the neuter *d>t in that and it, or even, in a written context, the silent e's of French deriving from -a in Latin.

      Additionally, in Yoruba, some have proposed that the now unanalyzable prefixes ta- and ki- in tani, kini (who/what) are the remnants of the ancient Niger-Congo noun class system, with ki- even being related to Meinhof's 7th class (for "things") in Bantoid contexts.

      So if that's that, then what can be said about the kiki/bouba phenomena?

      If they affect a word, and that word attaches itself to another semantically related word, only to erode into unanalyzable segments, then don't we see words becoming "painted" by exterior properties? Especially if the word gets further eroded: if the inner segments erode, we're left with a word that looks a certain way and has a high probability of sounding that way, which could easily provide a false cognate.

      Even skipping all of my nonsense; just the kiki/bouba phenomenon itself - doesn't that imply that the probability of linguistic convergence is much higher than biological convergence?

      In which case, wouldn't analysis of single words be possibly even more error prone than long-range comparison, which from my understanding attempts to examine things more systematically, despite its own problems?

    5. Thanks for your comment, Jacob! Yes, I agree that phonaesthetically marked elements may be absorbed into lexical units in the way you suggest, increasing the likelihood of accidental matches between expressive words (of course the notion "expressive" is itself fuzzy and hard to define). I also think that some morphemes may accidentally acquire a phonaesthetic value. For example, laterals are very salient phonetically and often occur in expressive words, so a suffix like *-elo-, common in IE languages, may have secondarily acquired productive "intensifier" functions (hence its use in diminutive nouns, iterative verbs, etc.).

      This looks like stabilising selection in favour of maintaining the phonosymbolic or iconic value of some parts of the vocabulary. Note what happens to onomatopoeic words if sound change affects them. They are often "reset" to their original state as if by cancelling the effects of sound change. Middle English cuccu /kʊ'ku:/ (as in "Sumer is icumen in, lhude sing cuccu") would have developed into something like Modern English */'kʌkəʊ/ if it had been allowed to follow the mainstream developments (the related word cuckold did follow them, since it became divorced semantically and derivationally from its etymological source early enough).

  3. Egyptian ab for elephant must have been inherited from some proto-Egyptian source or borrowed from a neighbouring language via contact or trade. If the primary sense was "ivory", could we suppose that they knew ivory before they knew the animal (so, ivory > ivory-beast > elephant)? In that case an African origin is plausible, rather than an Afro-Asiatic/Nostratic one. Sanskrit ibha is related, so was the original "elephant" in this etymology the African Loxodonta (formerly present in North Africa, now extinct) or the Asian Elephas (formerly present in Mesopotamia, Elephas maximus asurus, extinct since 100 B.C.)?

  4. Sanskrit íbha- is unlikely to be related to the other 'elephant/ivory' words. The Vedic meaning is 'servants, household of a chief', and the meaning 'elephant' is only attested in later Sanskrit literature (Manusmṛti).

    1. I agree with Simões that 'elephant' is secondary from 'ivory', but in Sanskrit we're apparently dealing with two homonymous words.

  5. Even though the same words are used throughout history, it does not mean that they keep the same sense (meaning/concept/category) and the same reference (object/phenomenon). In other words, cognates in different languages don't have the same sense and reference; diachronically, the words have changed sense and reference.

    Can you address the above concern if it is relevant?

  6. Well, they are not "the same" words but words with shared ancestry. The form-meaning pairing is of course as evolvable as other aspects of language. There are relatively stable meanings -- for example, a cardinal numeral meaning "seven" is unlikely to shift its meaning to "seven and a half" and then to "eight". But since many words have plenty of secondary senses, they easily move like amoebas in the semantic space. A word with the basic meaning "woman" may come to mean "woman/wife" (extension of meaning), and then only "wife" (specialisation), while some other word takes over its old semantic niche. Similarly, "moon" > "moon/month" > "month", or (with just gradual specialisation) "food" > "food, especially meat" > "meat". All these examples are real. One could use Polish, Latin, and English, respectively, to illustrate these semantic shifts.