06 March 2013

To Be or to Be: Aspects of Existence (Act I)

The conjugation of ‘to be’ in English is an instructive example of a conjugational paradigm formed by a coalition of verb forms with different histories. We normally expect the “grammatical forms” of a verb to consist of one and the same root with different inflectional endings. Thus, for example, Latin laudāre ‘to praise’ has the following present-tense forms in the active voice:
1sg. laudō, 2sg. laudās, 3sg. laudat 
1pl. laudāmus, 2pl. laudātis, 3pl. laudant
and the corresponding passives:
laudor, laudāris, laudātur
laudāmur, laudāminī, laudantur
and, for example, pluperfect subjunctives:
laudāvissem, laudāvissēs, laudāvisset
laudāvissēmus, laudāvissētis, laudāvissent
... and numerous others, all of them containing the root laud-, extended with the characteristic suffix of the “1st conjugation” (-ā-) to form the stem laudā-, to which inflectional endings are attached. A paradigm like this is genetically uniform (and highly regular as well).

But some paradigms look less regular. Compare the present-tense conjugation of ‘to be’ in several Indo-European languages (OCS = Old Church Slavonic):


The overall similarity is quite striking; and it is also clear that the IE languages share some characteristic irregularities in the conjugation of this verb. For example, if we compare the 3sg. and 3pl. forms, and separate the ending from the root, we see that the original conjugation must have been something like this:

[1] 3sg. *es-ti, 3pl. *s-Vnti
where *V is a vowel whose precise quality is hard to determine. Some of the attested forms point to PIE *e, others to *o. (The asterisks preceding these forms mean: “Attention! This is a reconstruction, not a directly attested word or sound!”)

The Vedic evidence suggests that the root has the shape es- in the singular and s- in the plural because of a shifting stress pattern. Indo-Europeanists traditionally prefer the word “accent”, because syllable prominence in the common ancestor of the IE family is thought to have been marked by pitch variation rather than by greater loudness. However, there must have been a stage in the deep prehistory of IE when prominence was correlated with articulatory effort and the “accent” was in fact some sort of dynamic stress similar to that found in Modern English. The absence of stress often caused vowels to be phonetically reduced or lost entirely. In this case, when the stress fell on the inflectional ending, the unstressed root *es- lost its vowel, leaving only *s-.
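The accent-conditioned alternation can be sketched as a toy rule in Python. This is only an illustration under simplifying assumptions: the function name `inflect`, the use of plain strings, and the single rule "drop the root *e when it is unaccented" are hypothetical simplifications, not a model of PIE ablaut as a whole. The accent marks follow the notation of the later reconstruction *h₁és-ti / *h₁s-énti, minus the laryngeal.

```python
def inflect(root: str, ending: str, stress_on_root: bool) -> str:
    """Attach an ending to a root, applying a toy accent-conditioned vowel loss.

    Hypothetical simplification: only the first 'e' of the accented
    morpheme receives the accent, and an unaccented root loses its first 'e'.
    """
    if stress_on_root:
        # Full grade: the root keeps its vowel and carries the accent.
        root_form = root.replace("e", "é", 1)
        ending_form = ending
    else:
        # Zero grade: the unstressed root loses its *e; the ending is accented.
        root_form = root.replace("e", "", 1)
        ending_form = ending.replace("e", "é", 1)
    return f"*{root_form}-{ending_form}"


print(inflect("es", "ti", stress_on_root=True))     # *és-ti
print(inflect("es", "enti", stress_on_root=False))  # *s-énti
```

The same rule applied to a root like gʷʰen- yields the alternation of the ‘strike, kill’ verb, since the labiovelar contains no *e that the rule could touch.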

To tell the truth, the reconstruction in [1] counted as “standard PIE” at the end of the 19th century. More recent decades have brought some progress in our understanding of the PIE grammatical system, and we now use a more sophisticated reconstruction (with the root rewritten as *h₁es- and the stress shifting between the root and the 3pl. ending *-enti):
[2] 3sg. *h₁és-ti, 3pl. *h₁s-énti
(Illustration: A Hittite ritual text featuring some laryngeal fossils.)
*h₁ is one of the so-called “PIE laryngeals” – consonants lost in most languages of the family, though not quite without trace. The existence of those consonants was hypothesised back in the 1870s and famously demonstrated several decades later (an interesting case showing that historical linguistics is capable of making testable predictions). Three such aitch-like phonemes are usually reconstructed for PIE; they are symbolised *h₁, *h₂, and *h₃ to avoid commitment as to their precise phonetic value, but many Indo-Europeanists believe that *h₁ was glottal (like English /h/) whereas *h₂ and *h₃ had some sort of back articulation (velar, uvular or pharyngeal), and *h₃ was likely voiced. In some languages (notably Greek) an initial laryngeal before a consonant is reflected as a vowel, but in most IE languages it simply disappears. That’s why Vedic has 3pl. sánti while in Greek the proto-form *h₁s-énti became *esenti, then *ehensi (the probable pronunciation in the archaic Mycenaean dialect, whose rather clumsy Linear B syllabic script rendered it as e-e-si), and finally Classical Greek eisí.

There were also other verbs of this kind:
[3] 3sg. *gʷʰén-ti, 3pl. *gʷʰn-énti ‘strike, kill’
Such verbs don’t look particularly irregular in the PIE reconstruction. However, since the presence or absence of a full vowel often made a difference in their later development, the contrast between the 3sg. and 3pl. forms increased in some of the daughter languages. For example, the Vedic reflexes look like this:
[4] 3sg. hánti, 3pl. gʰnánti
where the h of the singular (pronounced as a breathy-voiced glottal fricative, IPA [ɦ]) represents the Vedic outcome of an aspirated velar stop palatalised before a front vowel. The full development was something like this:
[5] *gʷʰén-ti > *gʰénti > *ǰʰénti > *ǰʰánti > hánti.
Note that although *e became a low central vowel (*a) in the Indic languages, it first affected the pronunciation of the preceding consonant in a way which betrays its original front quality.
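The point about ordering can be checked mechanically. The following Python sketch applies the four changes listed in [5] as ordered string rewrites. The rule list `RULES`, the helper `derive`, and the crude regex conditioning are hypothetical simplifications that handle only these two words (asterisks and morpheme boundaries are omitted). Run in the order given, the rules produce hánti and gʰnánti; if the *e > *a merger is applied before palatalisation, the front vowel that conditioned the change is already gone and the derivation wrongly ends in gʰánti.

```python
import re

# Ordered (label, rewrite) pairs; a simplification valid only for these examples.
RULES = [
    ("delabialisation *gʷʰ > *gʰ", lambda w: w.replace("gʷʰ", "gʰ")),
    ("palatalisation *gʰ > *ǰʰ before a front vowel",
     lambda w: re.sub(r"gʰ(?=[eé])", "ǰʰ", w)),
    ("merger *e > *a", lambda w: w.translate(str.maketrans("eé", "aá"))),
    ("*ǰʰ > h", lambda w: w.replace("ǰʰ", "h")),
]


def derive(form, rules):
    """Apply the sound changes in order, recording each stage that differs."""
    stages = [form]
    for _label, rule in rules:
        form = rule(form)
        if form != stages[-1]:
            stages.append(form)
    return stages


print(" > ".join(derive("gʷʰénti", RULES)))   # gʷʰénti > gʰénti > ǰʰénti > ǰʰánti > hánti
print(" > ".join(derive("gʷʰnénti", RULES)))  # gʷʰnénti > gʰnénti > gʰnánti

# Applying the merger before palatalisation removes the conditioning front vowel.
swapped = [RULES[0], RULES[2], RULES[1], RULES[3]]
print(derive("gʷʰénti", swapped)[-1])          # gʰánti, not hánti
```

In the 3pl. the aspirated stop is followed by *n rather than a vowel, so palatalisation never applies and the velar survives, which is exactly the asymmetry seen in [4].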

So, although pairs like Sanskrit ásti and sánti, Latin est and sunt, Old English is and sind, etc., look rather different, they are, at least from the historical point of view, forms of “the same” word. But what about forms like English be and been? In Old English, the verb ‘to be’ even had an alternative present tense in which all the verb forms had an initial /b/:
[6] 1sg. bēo, 2sg. bist, 3sg. biþ; pl. bēoþ
They do not seem to be connected with the root *h₁es- in any way. Where did they come from, then? This will be the topic of the next posting.


  1. It is remarkable that the 3pl. form preserves the zero grade in many languages, whereas the other plural forms seem to have undergone analogical levelling. OCS is a very clear case in point, but so is Latin, where analogy seems to have worked between 1sg. and 1pl. and between 2sg. and 2pl. Why don't we have analogical levelling in the 3pl.? Sturtevant must have been right about the irregular working of analogy.

  2. It is by no means certain that the original stress in the 1pl. and 2pl. was on the inflectional ending. The 3pl. form, which is used far more frequently, may have been unique in having a stressable (underlyingly accented) vowel. If so, it's the Sanskrit-type forms that are secondary (with the 3pl. as the source of analogical influence). The relative rarity of 1pl. and 2pl. forms (not to mention duals) is the reason why I'm not using them to illustrate the stress shift.

  3. PS. In Latin, sumus was likely influenced by sum, which is itself due to enclitic reduction: *esmi > *esm̥ > esom (attested!) > som > sum.

  4. So why is the Hittite 3pl. asanzi and not sanzi? Was the *h₁ still present, and is the a- a real or orthographic epenthetic vowel? Is Kloekhorst on to something when he reads e-es-zi as ʔeszi instead of ēszi?

    1. There's no independent evidence that *h₁ was vocalised in PIE or that it was reflected as a glottal stop in Anatolian. The a- of asanzi 'they are', adanzi 'they eat', appanzi 'they take', akuanzi 'they drink' must be a secondary "weak grade": a schwa-type epenthetic vowel was inserted to break up some word-initial obstruent clusters in the passage from PIE to Proto-Anatolian. We have the same alternation in ses-/sas- 'sleep', and also in the ablauting noun tēkan, gen. taknās (PIE *dʰǵʰm-').

    2. So the cluster that was broken up was *h₁s-, meaning that *h₁ didn't completely disappear without a trace in Anatolian, even though it didn't remain as a phoneme, right?

    3. There are no direct reflexes of *h₁ in the historically attested Anatolian languages, but there are reasons to believe that it survived into Proto-Anatolian. For example, the assimilation rule *-VRHV- > *-VRRV- operates also when *H = *h₁.

    4. The case of asanzi etc. doesn't sound qualitatively different from the Greek reflex of *HC-, where presumably an epenthetic schwa was inserted and then colored by the laryngeal. Yet we say that Greek has a reflex of initial preconsonantal *h₁ and Anatolian doesn't. Is that simply because in Anatolian the epenthesis pattern applies more broadly than just to *HC-?

    5. I wouldn't say that the Greek "prothetic" vowel is a direct reflex of the laryngeal (precisely because it reflects a post-PIE epenthetic schwa, not the laryngeal itself), but this may be just a terminological preference. In Anatolian, however, we can distinguish consonantal reflexes of (some of) the laryngeals from such indirect traces ("laryngeal effects") as the initial vowel of asanzi.

    6. Is this the latest word on the subject?