Language Evolution: March 2013

26 March 2013

Chaque Réplicateur A Son Histoire

Let us define linguistic replicators, informally, as recurrent fragments of language structure (such as words, morphemes, speech sounds, and constructions) which can be transmitted from one generation of speakers to the next. A replicator has an evolutionary relevance only if it can be “internalised” by new speakers as a permanent information structure residing in their brains. Although the crucial part of the process takes place during first-language acquisition, our mother tongue continues to be developed and modified in our later years. We can also learn one or more foreign languages. Our brain is actually the place where language contact takes place and where replicators can penetrate language barriers.

It should be clear by now that replicators live lives of their own. It’s much easier for them to spread inside a speech community (especially if it’s predominantly monolingual) than infect a different language; that’s why their genealogies lie mostly inside the branches of the family-tree of languages. But their lateral transmission (from language to language) is not only possible but common. Linguistic replicators and the languages that contain them co-evolve to a significant degree but are not doomed to each other’s company forever.

How many species? Reticulate evolution
©2011 Australian Institute of Marine Science, Coral Reef Research

We can also reverse the perspective: a language (rather than being a fundamental unit) can be viewed as a stream of replicators, most of which have co-evolved for a long time, so they have adapted to each other’s presence as well as to the host community of their human users. If we were able to plot the historical lineages of all replicators, thousands of lines would cluster together into somewhat fuzzy bundles that in turn would form a denser tree-like pattern in which we could make out “separate languages”. But of course there would also be some merging back of closely related branches, and a lot of horizontal transfer going on between branches (no matter how distant their relationship), sometimes taking the form of massive lateral influx (think of French loans in English, Arabic loans in Persian, Spanish loans and grammatical constructions in Nahuatl, etc.). Inside a branch, the pattern of linguistic microevolution may be network-like (“reticulate”) rather than tree-like: replicators often form tight symbiotic complexes, losing their separate identities in the process. The relationship between closely related languages or dialects of the same language (two situations that may be difficult to distinguish in practice) is also typically reticulate and dominated by the lateral diffusion of linguistic elements. Unsurprisingly, the same patterns are familiar to evolutionary biologists (cf. the image on the left).

24 March 2013

Replicators at a Fork in the Road

In the figures below, time flows from left to right. In each case there is a language split (the division of a speech community into two communities separated by a communication barrier). A black diamond stands for the emergence of a pair of cognates caused by such a split. A solid black dot stands for a duplication event (the emergence of a variant form in the same language). A white dot stands for the loss of a linguistic entity, and a linked pair of pentagons stands for an event of linguistic borrowing (apex down indicates the source language, apex up indicates the target language). The figures describe some commonly occurring “etymological scenarios” reconstructed by historical linguists. They will soon be followed by a series of posts illustrating those scenarios with real-life examples.

Fig. 1. Simple coalescence

Fig. 2. Lateral intrusion

Fig. 3. Deep coalescence

Fig. 4. Wanderwort invasion

23 March 2013

The Etymon: Where Genealogies Coalesce

How do we trace words and morphemes to their source? It’s simple. Take a set of forms that seem to be related (or cognate, in the jargon of historical linguistics), travel back in time and watch out for coalescence effects. The farther backwards you go, the more similar the candidate cognates should grow, until at some point they coalesce completely. If instead they seem to become less and less similar, and their genealogies seem to drift apart rather than converge, you are probably dealing with spurious cognates – words whose similarity is accidental, and not due to common descent.

If you have no TARDIS to actually travel in time, do what scientists do. Take a reliable model of historical transformations (in this case, of phonetic and morphological changes: linguists have built them for many of the known language families), and run it backwards. It is not always easy to reverse reconstructed language changes, because they are frequently accompanied by an irrevocable loss of information. For example, if a language has been affected by a change consisting in the merger of the vowels *e, *o and *a (let’s imagine that they all become /a/), you may be unable to recover the original quality of the vowel in a given modern word containing /a/. Fortunately, information lost in the genealogy of one particular language may be preserved in related languages. In fact, a merger just like the hypothetical one described above occurred in the prehistory of the Indo-Iranian languages, but data from languages such as Greek, where the original vowel qualities are preserved more faithfully, allow us to disambiguate the Indo-Iranian evidence. Furthermore, even where the three vowels have merged, traces of their old pronunciation may be visible as different phonetic impact on the development of the neighbouring sounds (with information being re-encoded rather than lost). If in a distant past old *e regularly palatalised a preceding *k, which became č [ʧ] as a result, and if this č can only have this kind of origin, you know that a modern sequence like ča must reflect *ke. Modern ka is still ambiguous (*ko or *ka), but at least you can rule out *ke. And guess what? This is also what we really find in Indo-Iranian.
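The idea of running a model backwards can be made concrete with a minimal Python sketch. It covers only the hypothetical scenario described above (palatalisation of *k before *e, followed by the merger of *e, *o and *a as /a/). The function names and the toy two-segment "words" are illustrative assumptions, not part of any real reconstruction toolkit.

```python
from itertools import product

def forward(proto: str) -> str:
    """Apply the two hypothetical changes in chronological order."""
    palatalised = proto.replace("ke", "če")                   # *k > č before *e
    return palatalised.replace("e", "a").replace("o", "a")    # *e, *o, *a merge as a

# What each modern sound might go back to, before any filtering.
# Assumption taken from the text: č can only come from *k.
SOURCES = {"a": "eoa", "č": "k"}

def backward(modern: str) -> list[str]:
    """List every proto-form that the forward model maps onto `modern`."""
    options = [SOURCES.get(ch, ch) for ch in modern]
    candidates = ("".join(p) for p in product(*options))
    return [c for c in candidates if forward(c) == modern]

print(backward("ča"))  # ['ke']        the palatal re-encodes the lost vowel quality
print(backward("ka"))  # ['ko', 'ka']  still ambiguous, but *ke is ruled out
```

The backward step is nothing more than a brute-force search through candidate proto-forms, filtered by the forward model. The merger destroys information, so more than one candidate may survive; the palatalisation re-encodes some of it, so fewer survive than one might fear.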

Note that the genealogy of a set of cognates does not have to follow the structure of family relations among the languages that contain those cognates. If, for example, a Germanic word has a cognate in Slavic and another one in Greek, we usually expect the genealogies of the cognates to meet in the common ancestor of Germanic, Slavic, and Greek. But words are often exchanged “horizontally” between languages. They have their own unique histories; they may even survive the death of their original host language and live on somewhere else. It may turn out that their genealogies, traced backwards in time, leave the Indo-European family before they finally coalesce.

It is also possible for a set of related forms to converge only partly by the time we reach the common ancestor of the languages that contain them. For example, OE fȳr, Gothic fon (gen.sg. funins), and Old Norse funi beside fýr and fúrr ‘fire’ are all related and represent the scattered and partly reworked remains of a single ancient paradigm. That rather complicated paradigm was quite likely felt to be somewhat irregular already in Proto-Indo-European. We reconstruct it like this: nom.sg. *páh₂wr̥, gen.sg. *p(h₂)wén-s (the bracketed segment was probably left unpronounced). In Germanic, the *r/n alternation in the stem of the word was no longer conditioned grammatically. We have different Proto-Germanic byforms of the ‘fire’ word, with either *r or *n generalised across the board in the historically known languages. To “unify” them, we have to go beyond Proto-Germanic, but even as we reach Proto-Indo-European, the coalescence is not complete. The stem has different variants (“allomorphs”) in different case-forms. We can treat them as “internal” cognates traceable back to a still more remote common ancestor. Such an approach is called internal reconstruction, in this case within PIE. The stem looks as if it had originally consisted of two morphemes, with the word stress alternating between them, roughly along the following lines:

“strong” cases: *péh₂- + *-weN-
“weak” cases: *peh₂- + *-wéN-

The symbol *N stands for a hypothetical Pre-PIE segment that became PIE *r in the nom./acc. (where it was word-final) and *n in other case-forms. The transcription used above is provisional and has no claim to strict accuracy: internal reconstruction is inherently insecure, and the comparative method can’t be applied to PIE itself (unless one day we discover hitherto unrecognised cognates lurking in other language families). Can we identify the two morphemes? The *-wr̥/-wén- part is at least a recurrent derivational suffix, even if its function isn’t entirely clear. We find it in several other neuter nouns, usually related to a verb root. In Hittite, where it is particularly frequent, it productively forms verbal substantives. Perhaps, then, the *pah₂- part is the same as the verb root *{peh₂-} ‘protect, guard’ (note the regular change of underlying *e to *a triggered by an adjacent *h₂), and this ‘fire’ word originally meant ‘stuff to be protected’ (like the sacred fire of Vesta)? Possible but speculative. Nevertheless, at least from the formal point of view, we have approached the point where the genealogies of fire and fon coalesce not only with each other but also with those of Greek pũr, Classical Armenian howr, Hittite pahhur (gen.sg. pahhuenas), etc. – and I mean not only the quotation form of each word, but also all its case-forms. In order to reach that point we have to travel back in time beyond Proto-Germanic and even beyond Proto-Indo-European, into the mostly uncharted territory of internally reconstructed Pre-Proto-IE.

There will be more about different types of coalescence later.

13 March 2013

To Be or to Be: Aspects of Existence (Act IV)

There is a subtle difference between the is-forms and the biþ-forms in Old English. The reflexes of *h₁es-/*h₁s- typically refer to ongoing temporary states, which is not surprising, given the imperfective value of the PIE root verb. The b-forms, on the other hand, usually refer to future states or to general, timeless truths:

Iċ bēo ġearo sōna ‘I shall be ready soon’

On twelf mōnðum bið þrēo hund daga & V & syxtiġ daga ‘In twelve months there are 365 days’

This contrast, unique to Old English among the Germanic languages, shows that the root *bi- retained something of the perfective value of *bʰuH-. Although it can’t reflect the unmodified root aorist, it seems reasonable to look for its origin among formations genetically connected with the aorist, and preserving an “aoristic” lexical aspect – verbs that refer to actions or states having a terminal point, evolving towards completion, or treated as a unitary whole without internal temporal structure. As mentioned in a previous post, “present” stems with a zero grade of the root and an accented *-e/o- suffix, derived from root aorists (e.g. *gʷr̥h₃-é/ó- ‘devour’ from *gʷérh₃-/*gʷr̥h₃- ‘swallow up’), fit the bill. Let us therefore suppose that *bʰuH- also generated a present stem of that type, namely *bʰuH-é/ó-, formally imperfective but with the approximate meaning ‘is going to be/become’ or ‘is (at any time, by definition)’. Such a verb is indeed attested outside Germanic, cf. Greek pʰúomai ‘grow, spring up, become’. The expected development of such a stem from PIE to PGmc. would be as follows:

1sg. *bʰuH-ṓ > *buō

2sg. *bʰuH-é-si > *buisi

3sg. *bʰuH-é-ti > *buiþi

3pl. *bʰuH-ó-nti > *buanþi

Let us suppose that the unaccented short high vowel in hiatus (resulting from the loss of the laryngeal) became a glide in pre-Germanic times, and that the resulting exotic cluster *bw- was simplified to *b already in Proto-Germanic. The predicted outcome would then be something like this:

1sg. *bō > OE *bū (or *bō?)

2sg. *bisi > OE bis(t)

3sg. *biþi > OE biþ

3pl. *banþi > OE *bœ̄þ

The first and last forms do not resemble the attested ones. But the Proto-Germanic pattern would have been so unlike anything else in the verb system that speakers could be expected to favour any “mutation” reducing its irregularity. For example, the reanalysis of *bisi and *biþi as *bi-si and *bi-þi (rather than *b-isi, *b-iþi with an aberrant asyllabic root) would have motivated the introduction of the neo-root *bi- also in the first person and the plural:

1sg. *b-ō → *bí-ō > OE bēo

3pl. *b-anþi → *bí-anþi > OE bēoþ

This restructuring could have happened at any time between late Common Germanic and archaic Old English, yielding the same outcome.
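For readers who like to see the steps laid out mechanically, here is a rough Python sketch of the scenario proposed above: glide formation in hiatus, simplification of *bw- to *b, and the extension of the neo-root *bi- to the whole paradigm. The function names and the simple string-based representation are assumptions made for illustration; the accent marks are left out, and the sketch stops at the reanalysed Proto-Germanic forms rather than deriving the Old English ones.

```python
import re

VOWELS = "aeiouāēīōū"

# Expected PGmc. outcomes of *bʰuH-é/ó- after the loss of the laryngeal.
PARADIGM = {"1sg.": "buō", "2sg.": "buisi", "3sg.": "buiþi", "3pl.": "buanþi"}

def glide_formation(form: str) -> str:
    """A short high vowel in hiatus becomes a glide: u > w before a vowel."""
    return re.sub(f"u(?=[{VOWELS}])", "w", form)

def simplify_cluster(form: str) -> str:
    """The exotic initial cluster *bw- is simplified to *b-."""
    return re.sub(r"^bw", "b", form)

def reanalyse(form: str) -> str:
    """Segment bisi, biþi as bi-si, bi-þi; elsewhere replace bare b- with bi-."""
    ending = form[2:] if form.startswith("bi") else form[1:]
    return "bi-" + ending

def derive(form: str) -> str:
    return reanalyse(simplify_cluster(glide_formation(form)))

for label, form in PARADIGM.items():
    print(label, "*" + form, ">", "*" + simplify_cluster(glide_formation(form)),
          "→", "*" + derive(form))
```

The regular sound changes are applied blindly to every form, whereas the last step is an analogical levelling that affects only the forms whose root had shrunk to a bare *b-.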

This little essay cannot exhaust the secrets of ‘to be’ in Old English. One vexing problem is the OE 2sg. eart (also Anglian (e)arþ) and the alternative plural found in the northern dialects (Mercian earun, Northumbrian aron, hence Modern English are). Their vocalism (*a, as if from pre-Gmc *o rather than *e) is puzzling and makes comparison with Old Norse ert, eru problematic. They may represent the lingering traces of another verb which has contributed to the paradigm of ‘to be’, possibly PIE *h₁er- ‘arrive’. PGmc. 2sg. *arþ(a) and 3pl. *arun(d) would make sense as reflexes of the IE perfect built to that root, retaining their stative value and interpreted as present-tense forms (‘to have arrived’ → ‘to be there’).

Another crazy quilt (Hat tip: The Dotty One)

I have not discussed the past tense (1/3sg. wæs, 2sg. wǣre, pl. wǣron < PGmc. *was-/*wēz-) because its origin is not controversial. Since both *h₁es- and *bʰuH- seem to have been reluctant to form perfects of their own, and since the PIE perfect was the foundation on which the new past tense of basic (non-derived) verbs was built, the missing past-tense forms were co-opted from the conjugation of the near-synonymous stem *wis-i-/*wes-a- (Gothic wisan; Old English wesan, an alternative infinitive beside bēon ‘to be’), continuing the PIE root *h₂wes- ‘stay, remain, spend the night’. The perfect of that verb became utilised as the past of ‘to be’, while the present forms just withered away (they occur sporadically in OE texts, but are extremely rare). This symbiotic relationship goes back to Proto-Germanic times, and the suppletive use of the ex-perfect stem *was-/*wēz- as ‘was/were’ occurs throughout Germanic. In Old English, we also find forms of wesan as alternative imperatives (wes/bēo ‘be’) and present participles (wesende/bēonde ‘being’). To sum up, the English paradigm of ‘to be’ is a Frankenstein monster sewn together from pieces of defective paradigms and involving perhaps as many as four different PIE roots (*h₁es- [am, is], *bʰuH- [be, being, been], *h₂wes- [was, were], and possibly *h₁er- [art, are]). A really impressive case of lexical symbiosis!

10 March 2013

To Be or to Be: Aspects of Existence (Act III)

Here is the present-tense paradigm of ‘to be’ in five Germanic languages. The figure below is a cladogram showing how those languages are related to one another.

	Gothic	Old Norse	OHG	Old Frisian	Old English	Old English (b-forms)
1sg.	im	em	bim	bem	eom	bēo
2sg.	is	ert	bist	bist	eart	bist
3sg.	ist	er	ist	is	is	biþ
1pl.	sijum	erum	birum	send	sind	bēoþ
2pl.	sijuþ	eruð	birut	send	sind	bēoþ
3pl.	sind	eru	sint	send	sind	bēoþ

The Gothic and Old Norse forms are all derivable (albeit with some complications) from the inherited paradigm of PIE *h₁es-/*h₁s-. The Old Norse root is er- (rather than es-) probably because the Germanic forms of ‘to be’ for the most part reflect unstressed, enclitic variants of the verb. If Proto-Germanic *s was intervocalic and preceded by an unstressed vowel, it became *z by so-called Verner’s Law. In North and West Germanic languages this *z became /r/ (there was no such change in Gothic). The third-person forms er and eru were innovative, created analogically to match the rest of the paradigm.

More curious is the occurrence of an initial /b/ in some of the West Germanic forms (the forms beginning with b- in the table). In Old High German, we find verb forms which look like the expected reflexes of *h₁es-/*h₁s-, except that a /b/ is prefixed to the verb in the first and second persons, singular as well as plural. Old Frisian exhibits the same oddity, but only in the singular, because the Anglo-Frisian languages had given up distinguishing grammatical persons in the plural and generalised the 3pl. form. Note that there are no “b-forms” whatsoever in Gothic and Old Norse, so it looks as if the use of /b/ in the first and second persons were a distinctly West Germanic innovation.

And yet, most strangely of all, English does not follow the same pattern. This would be less surprising if English were a “basal” West Germanic language loosely related to the rest of the group. But no, English is nested rather deep inside one of the subgroupings of West Germanic. Still, even today, English has am, are (older art), is as opposed to Modern German bin, bist, ist. Instead of isolated appearances of an initial /b/, Old English had a full set of b-forms as an alternative present-tense paradigm beside the rather faithfully retained reflexes of *h₁es-/*h₁s-. In no way can the Old English pattern be explained as a local innovation. It must be an archaism: OE preserved two “alleles” of ‘to be’ for each person, whereas both its sister language, Old Frisian, and its slightly more distant cousin, Old High German, had eliminated one of the variants and fixed the other. The fact that in both languages the b-forms show a similar distribution is therefore the result of convergence, not a shared West Germanic feature. Also the total absence of b-forms in Gothic and Old Norse must be due to the parallel elimination of a pattern present already in Proto-Germanic. The strange root *bi-, which seems to underlie the Old English b-forms, is unique to that language; Old Frisian and OHG have forms contaminated with it rather than its direct reflexes. Nevertheless, this contamination is indirect evidence of the presence of *bi- in Proto-West Germanic. Since outside Germanic PIE *bʰuH- has provided verb forms with the meaning of ‘to be’ in several branches (Slavic *byti, Lat. fuī ‘I was’, etc.), the archaic status of the alternative b-paradigm in Old English is quite clear. Note the importance of the outgroup data, overriding the negative evidence of Gothic and Old Norse. Apparently the insular isolation of English since the Anglo-Saxon conquest of Britain in the fifth century helped it to preserve a curious lexical relict.

If we were to reconstruct the early Germanic prototypes of the OE b-forms, they would have to look more or less like this:

1sg. *bi-ō, 2sg. *bi-si, 3sg. *bi-þi, 3pl. *bi-anþi

Such a reconstruction is hardly satisfactory, since the endings look partly like those of the thematic conjugation (1sg.), partly athematic (2-3sg.), and partly ambiguous (3pl.). The pattern must have been reworked in some way, but how? Let us note first that the root *bʰuH- occurs in Germanic outside of the paradigm of ‘to be’. One old imperfective stem derived from the root aorist, *bʰuH-jé/ó-, yielded Germanic *bū-i/a- (OE būan, ON búa ‘dwell, inhabit’), but it doesn’t match the OE b-forms either formally or semantically. Secondly, although the residue of the b-conjugation was completely integrated with the *h₁es-/*h₁s- pattern in continental West Germanic, Old English had two semi-independent sets of forms, preserving subtle semantic differences which may offer a clue to understanding the origin of *bi-.

But I suppose this is more than enough for one blog post. The investigation will be continued in the next.

09 March 2013

Tense Wars, and the Survival of the Perfect

Before I address the fate of *bʰuH- in Germanic, a brief interlude is in order to provide my guests with some necessary background knowledge. The last common ancestor of the Germanic languages was spoken not much more than 2,000 years ago; the time-depth of the Indo-European family is likely to be about 6,000 years (with a wide margin of uncertainty). This means that perhaps as many as 4,000 years elapsed between the breakup of PIE and the stage we can reach by carrying out comparative reconstruction within Germanic. The ancestor of the Anatolian languages was the first to split from the rest of the family, followed by the ancestor of Tocharian. The remaining branch (let’s call it Neo-Indo-European) was ancestral to all the modern IE groups. It underwent further splits, diversifying into a number of distinct lineages, Proto-Germanic among them. After diverging from its closest relatives, Proto-Germanic continued evolving on its own, and developed a number of unique innovations inherited by its descendants but not shared with the rest of Indo-European.

De bello temporum

By the time Proto-Germanic began to split up, its grammar had been affected by thorough upheavals. In the verb system, aspect had lost its importance as a grammatical category. Another basic opposition – that of tense, present versus past – remained. The inherited forms of the stative aspect (the “IE perfect”) had acquired a new interpretation. The original perfect had referred to a current state brought about by a past action; now the focus had shifted to the action itself, and the perfect began to encroach on the territory of the old past tenses (the aorist and the imperfect), threatening to make them redundant. Only in a few cases did the perfect retain its stative meaning. For example, the perfect of the verb ‘to see’ (PIE *woid-/*wid- > Germanic *wait-/*wit-) survived as a present-tense verb meaning ‘to know’ (= ‘to have seen’) – a phenomenon found also in other IE languages (Sanskrit véda, Gk. oîde ‘(s)he knows’). The form of the perfect had been modified too. For example, most perfect stems had lost one of the typical traits of that category – the partial doubling (reduplication) of the root syllable. But the perfect kept some of its special inflectional endings, as well as the characteristic vowel alternations that distinguished it from the related present stems. They are still visible in sing vs. sang or drive vs. drove.

The battle of the tenses ended with the crushing victory of the perfect. The aorist and the imperfect died out almost completely. One solitary survivor was the imperfective past tense of the verb ‘to do’ (from PIE *dʰeh₁-). Its reduplicated imperfective (PIE 3sg. *dʰi-dʰéh₁-t, 3pl. *dʰé-dʰh₁-n̥t) still survives as the past tense did. It was also employed as an auxiliary verb that formed a periphrastic past tense in combination with a past participle. In this way, a host of secondary (derived) verbs without an inherited perfect of their own could acquire a semantically equivalent past tense. It was a useful function and it helped to keep the last imperfect alive. However, the auxiliary suffered the common fate of grammatical words – phonetic attrition. No longer a free-standing word, it degenerated into a clitic and then a suffix. Its last visible trace is the past-tense ending of “regular” verbs (English -ed), and even that actually reflects the suffix of the participle fused with the ex-auxiliary (that’s why loved is today both a participle and a past tense).

The aorist fared no better. It did not survive at all as a past tense in Germanic. To be sure, some derivatives of old aorists lived on as present-tense forms, but most of those had long been divorced from their historical source. For example, Proto-Germanic had no past tense corresponding to the root aorist *gʷem-/*gʷm- ‘come, go’, but it did have a present stem (*kʷim-i-/*kʷem-a- > Goth. qiman, OE cuman ‘come’) continuing the pre-Germanic “simple thematic present” *gʷém-e/o- (3sg. *gʷém-e-ti, 3pl. *gʷém-o-nti). It probably started out as the subjunctive mood of the root aorist. Such subjunctives began to evolve functionally into present-tense forms soon after the separation of Anatolian (cf. Vedic jám-a-ti ‘he comes’), and this new type of present became very productive in the Neo-IE languages. But there was also another type of present related to root aorists, with the zero-grade of the root and an accented suffix. For example, the root aorist *gʷerh₃-/*gʷr̥h₃- ‘devour’ produced the present *gʷr̥h₃-é/ó-, found in some branches of Neo-IE. Such presents retained much of the perfective force of the aorist: they referred to telic (goal-oriented) actions. They can also be found in Germanic, e.g. *wig-i/a- < *wik-é/ó- ‘conquer, fight’ (attested also in Celtic).

The discussion of ‘to be’ can be continued now, but it will have to wait till the next post.

08 March 2013

To Be or to Be: Aspects of Existence (Act II)

Proto-Indo-European verb roots had an obligatory aspectual value: they were either perfective (punctual, discrete) or imperfective (durative, continuous). Their inherent aspect could be switched by adding appropriate suffixes, but it was preserved in simple root verbs (in which inflections were added to the bare root). Inflectional endings could contain information about the tense of the verb (to be precise, they could carry present-tense markers), but perfective stems were always unmarked for tense and were normally interpreted as a perfective past tense referring to a punctual or completed action. By contrast, imperfective stems could form both a past tense and a present, referring to an action in progress or to repeated/habitual actions. The traditional terminology of IE studies uses the following, somewhat confusing terms for combinations of aspect and tense:

present (= imperfective, present)
imperfect (= imperfective, non-present)
aorist (= perfective, non-present)

The confusion is made worse by the fact that there was a third aspect with a stative (non-eventive) value (e.g. referring to a state resulting from a previous action), called the “perfect” (not to be confused with “perfective”). The perfect was not the inherent lexical aspect of any root; it was derived (from both perfective and imperfective roots) by means of certain morphological operations, and had its own special personal endings.

The formal aspect of a PIE verb can be quite surprising. The root *gʷʰen- ‘kill, strike’ was imperfective, though most people would regard killing as a punctual, non-durative activity. Remember, however, that the meanings we attach to PIE roots and words are only approximate. Perhaps it would be more correct to gloss *gʷʰen- as ‘to deal blows’ (a prolonged or repeated action).

Sometimes we meet pairs of distinct but nearly synonymous roots, the main semantic difference between them being one of aspect (how the action is viewed in relation to the flow of time). For example, *h₁ei- means ‘walk, be on the move’ (imperfective), while *gʷem- means ‘go, come, advance a step’ (perfective). But, beside a root verb, *gʷem- could also form derived stems with imperfective semantics: *gʷm̥-je/o- (3sg. *gʷm̥-jé-ti, 3pl. *gʷm̥-jó-nti) and *gʷm̥-sḱe/o-, with the approximate meaning ‘to be going/coming’. Note that those derivatives, unlike root verbs, had a fixed accent on the stem-forming suffix.

Non-present eventive verbs (imperfects and aorists) had so-called secondary personal endings. In the third person, they differed from presents by lacking the final *-i, which seems to have functioned as a marker of here-and-now (and therefore of the present tense). Thus, *gʷem- had the following forms: 3sg. *gʷém-t, 3pl. *gʷm-ént. In some branches (for example Greek and Indo-Iranian) the past tense was explicitly characterised by the accented prefix *é- (called the “augment”), so that ‘(s)he came’ was expressed as *é-gʷem-t, and ‘(s)he was walking’ as *é-h₁ei-t (as opposed to the present *h₁éi-ti). Without the augment, the non-present forms could be interpreted as “timeless” (neutral with respect to tense).

The imperfective verb *h₁es- ‘be’ also had a perfective counterpart, the root aorist *bʰuH-. The use of a capital *H as a cover symbol for any of the three laryngeals means that the evidence is insufficient to make the reconstruction more specific. Unlike most other verb roots *bʰuH- is usually cited without a full vowel in the basic form, because we do not even know for sure if it was originally *bʰeuH- or *bʰweH- (each reconstruction has its fans, but the evidence is inconclusive). The reduced “zero grade” of either root shape would have been *bʰuH-, and that’s the only form widely attested across the branches of the IE family. Quite possibly the weak form was generalised already in the common ancestor of the Indo-European languages. If so, the ancestral aorist displayed no vowel alternation even if its accent was mobile: 3sg. *bʰúH-t, 3pl. *bʰuH-ént.

Not exactly an action

If *bʰuH- was at the same time perfective and semantically related to *h₁es-, it must have referred to some transitional aspects of existence like ‘come into being, arise, appear, happen’, or ‘get, grow, become’, i.e. entering a state rather than remaining in it. We find copious evidence of the root having those meanings, as well as many similar ones, in various IE languages. But, as we have seen, aorists could be transformed into presents by adding imperfective suffixes. For example, the derivative *bʰuH-jé-ti would have meant ‘is becoming’, ‘is happening’, etc., converging semantically with ‘being there’. Let’s suppose that *h₁es-, because of its rather special existential function (‘to be’ is hardly an action or an event), was defective in some respects. Judging from the comparative evidence, it did not form any stative (“perfect”) or perfective (“aorist”) stems; no verbal adjective analogous to English past participles was derived from it. If any descendant of PIE evolved grammatically in a way which made those gaps inconvenient, existing forms of other verbs with a similar meaning could be co-opted to make the paradigm complete. In the next post I will try to show how this process operated in Germanic.

06 March 2013

To Be or to Be: Aspects of Existence (Act I)

The conjugation of ‘to be’ in English is an instructive example of a conjugational paradigm formed by a coalition of verb forms with different histories. We normally expect the “grammatical forms” of a verb to consist of one and the same root with different inflectional endings. Thus, for example, Latin laudāre ‘to praise’ has the following present-tense forms in the active voice:

1sg. laudō, 2sg. laudās, 3sg. laudat

1pl. laudāmus, 2pl. laudātis, 3pl. laudant

and the corresponding passives:

laudor, laudāris, laudātur

laudāmur, laudāminī, laudantur

and, for example, pluperfect subjunctives:

laudāvissem, laudāvissēs, laudāvisset

laudāvissēmus, laudāvissētis, laudāvissent

... and numerous others, all of them containing the root laud-, extended with the characteristic suffix of the “1st conjugation” (-ā-) to form the stem laudā-, to which inflectional endings are attached. A paradigm like this is genetically uniform (and highly regular as well).

But some paradigms look less regular. Compare the present-tense conjugation of ‘to be’ in several Indo-European languages (OCS = Old Church Slavonic):

Language	1sg.	2sg.	3sg.	1pl.	2pl.	3pl.
Latin	sum	es	est	sumus	estis	sunt
Greek	eimí	eî	estí	esmén	esté	eisí
Gothic	im	is	ist	sijum	sijuþ	sind
OCS	jesmĭ	jesi	jestŭ	jesmŭ	jeste	sǫtŭ
Vedic	ásmi	ási	ásti	smás	sthá	sánti
Hittite	ēsmi	ēssi	ēszi	ēsweni	ēsteni	asanzi

The overall similarity is quite striking; and it is also clear that the IE languages share some characteristic irregularities in the conjugation of this verb. For example, if we compare the 3sg. and 3pl. forms, and separate the ending from the root, we see that the original conjugation must have been something like this:

[1] 3sg. *es-ti, 3pl. *s-Vnti

where *V is a vowel whose precise quality is hard to determine. Some of the attested forms point to PIE *e, others to *o. (The asterisks preceding these forms mean: “Attention! This is a reconstruction, not a directly attested word or sound!”)

The Vedic evidence suggests that the root has the shape es- in the singular and s- in the plural because of a shifting stress pattern. Indo-Europeanists traditionally prefer to use the word “accent” because it is thought that syllable prominence was marked by pitch variations rather than greater loudness in the common ancestor of the IE family. However, there must have been a stage in the deep prehistory of IE when prominence was correlated with articulatory effort and the “accent” was in fact some sort of dynamic stress similar to that found in Modern English. The absence of stress often caused the phonetic reduction or complete loss of vowels. In this case, the vowel of the root *es was reduced to zero, leaving only *s, if the stress was on the inflectional ending and the root was unstressed.

To tell the truth, the reconstruction in [1] counted as “standard PIE” at the end of the 19th century. More recent decades have brought some progress in our understanding of the PIE grammatical system, and we now use a more sophisticated reconstruction (with the root rewritten as *h₁es- and the stress shifting between the root and the 3pl. ending *-enti):

[2] 3sg. *h₁és-ti, 3pl. *h₁s-énti
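The accent-driven alternation in [2] is regular enough to be stated as a tiny rule. The Python sketch below is only an illustration: the function name and the convention of marking the accent with an acute on the ending are assumptions, and it handles nothing beyond roots with a single *e.

```python
def third_person(root: str, ending: str) -> str:
    """Combine a root containing one *e with a third-person ending.

    If the ending carries the accent, the unstressed root loses its vowel
    (zero grade); otherwise the root keeps its vowel and bears the accent.
    """
    if "é" in ending:
        stem = root.replace("e", "", 1)    # zero grade: *h₁es- > *h₁s-
    else:
        stem = root.replace("e", "é", 1)   # accented full grade: *h₁és-
    return f"*{stem}-{ending}"

print(third_person("h₁es", "ti"))     # *h₁és-ti
print(third_person("h₁es", "énti"))   # *h₁s-énti
```

The same rule yields the older notation in [1] if the root is written simply as es, apart from the accent marks, which [1] does not show.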

A Hittite ritual text featuring some laryngeal fossils

*h₁ is one of the so-called “PIE laryngeals” – consonants lost in most languages of the family, though not quite without trace. The existence of those consonants was hypothesised back in the 1870s and famously demonstrated several decades later (an interesting case, showing that historical linguistics is capable of making testable predictions). Three such aitch-like phonemes are usually reconstructed for PIE; they are symbolised *h₁, *h₂, and *h₃ to avoid commitment as to their precise phonetic value, but many Indo-Europeanists believe that *h₁ was glottal (like English /h/) whereas *h₂ and *h₃ had some sort of back articulation (velar, uvular or pharyngeal), and *h₃ was likely voiced. In some languages (notably in Greek) an initial laryngeal before a consonant is reflected as a vowel, but in most IE languages it simply disappears. That’s why Vedic has 3pl. sánti while in Greek the proto-form *h₁s-énti became *esenti, then *ehensi (the probable pronunciation in the archaic Mycenaean dialect, in whose rather clumsy writing system using the Linear B syllabic script it was rendered e-e-si), and finally Classical Greek eisí.

There were also other verbs of this kind:

[3] 3sg. *gʷʰén-ti, 3pl. *gʷʰn-énti ‘strike, kill’

They don’t look particularly irregular when you look at the PIE reconstruction. However, since the presence or absence of a full vowel often made a difference in their later development, the contrast between the 3sg. and 3pl. forms increased in some of the daughter languages. For example, the Vedic reflexes look like this:

[4] 3sg. hánti, 3pl. gʰnánti

where the h of the singular (pronounced as a semi-voiced glottal fricative = IPA [ɦ]) represents the Vedic outcome of an aspirated velar stop palatalised before a front vowel. The full development was something like this:

[5] *gʷʰén-ti > *gʰénti > *ǰʰénti > *ǰʰánti > hánti.

Note that although *e became a low central vowel (*a) in the Indic languages, it first affected the pronunciation of the preceding consonant in a way which betrays its original front quality.
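The chain in [5] shows why the order of the changes matters: the palatalisation has to apply while the vowel is still *e. A minimal Python sketch of this, restricted to the one consonant discussed here, might look as follows. The rule labels and the helper names are assumptions made for the example, and real sound laws of course affect whole classes of sounds rather than a single segment.

```python
import re
import unicodedata

def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

# Ordered (label, pattern, replacement) triples; the labels are informal.
# Working on decomposed text lets the rules ignore the accent mark on a vowel.
RULES = [
    ("loss of labialisation", "gʷʰ", "gʰ"),
    ("palatalisation before a front vowel", "gʰ(?=e)", "ǰʰ"),
    ("*e > *a", "e", "a"),
    ("ǰʰ > h", "ǰʰ", "h"),
]

def derive(form: str) -> list[str]:
    """Return the input form followed by the output of each successive rule."""
    stages = [nfc(form)]
    current = nfd(form)
    for _label, pattern, replacement in RULES:
        current = re.sub(nfd(pattern), nfd(replacement), current)
        stages.append(nfc(current))
    return stages

print(" > ".join(derive("gʷʰénti")))    # ends in hánti
print(" > ".join(derive("gʷʰnénti")))   # ends in gʰnánti
```

In the plural the stop is followed by n, so the palatalisation rule finds nothing to work on and the outcome keeps its velar, exactly as in [4]. Moving the third rule above the second would wipe out the conditioning environment and give the wrong result for the singular.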

So, although pairs like Sanskrit ásti and sánti, Latin est and sunt, Old English is and sind, etc., look rather different, they are, at least from the historical point of view, forms of “the same” word. But what about forms like English be and been? In Old English, the verb ‘to be’ even had an alternative present tense in which all the verb forms had an initial /b/:

[6] 1sg. bēo, 2sg. bist, 3sg. biþ; pl. bēoþ

They do not seem to be connected with the root *h₁es- in any way. Where did they come from, then? This will be the topic of the next posting.
