29 January 2013

The Little Lambs Who Lost Their Way: Lexical Exceptions

Consider the following Old English words: gān ‘gone’, clāþ ‘cloth’, brād ‘broad’. They belonged to the same lexical set as OE gāt ‘goat’, and we would expect them to have evolved like the rest of the GOAT set, since they do not share any characteristic subregularities with any recognised “minority flock”. Even the spelling of gone and broad (similar to that used in stone and goat, respectively) suggests that they were still members of the GOAT set at the time when the modern orthographic conventions were becoming fixed. And yet they have parted company with other words containing OE ā. Broad has joined the CAUGHT set (with Modern English /ɔː/, as in cause), while the other two vary between CAUGHT and LOT (Modern English /ɒ/ or its unrounded counterpart /ɑ/ as in American dialects). Note also that while OE sc(e)ān ‘shone’ yields the expected outcome /ʃoʊn/ in America, the normative British pronunciation is /ʃɒn/, with a shortened vowel.

Such cases are truly irregular and call for individual explanation. We know that the shortening of the vowel of clāþ cannot date back to Old English (OE “claþ” would have become Modern English “clath”). OE ā produced a mid-low rounded vowel /ɔː/ (conventionally spelt ǭ to distinguish it from other O spellings) after the Norman Conquest, during the Middle English period. Indeed, the word was very often spelt clothe, clooth or cloothe in Middle English, apparently indicating a long-vowel pronunciation. Note that the OE plural clāþas has normally developed into Mod.E clothes, with /oʊ/ (the th may be mute, but that is another story). Today, however, clothes is no longer regarded as the plural of cloth, but rather as an independent collective noun (a case of word duplication!). The distribution of the modern pronunciations of cloth points to an early shortening of Middle English /ɔː/, as a result of which the word joined the LOT set. Then, in some (but not all) mainstream accents of Modern English, the short vowel was affected by the lengthening heard in moss, cost, lost, frost, moth, often, off, cough, etc., induced by the following voiceless fricative.

The development of broad must have been different, since the word does not show a short vowel in any major accent, and the final consonant is not a voiceless fricative. When the Great Vowel Shift of the 15th century transformed ME /ɔː/ into Early Modern English /oː/ (diphthongised to /oʊ/ in most contemporary varieties of English), one stray sheep left the flock as its vowel underwent an irregular lowering (for reasons that elude us). That lowered pronunciation merged with the new /ɔː/ that resulted from the smoothing of the diphthong /aʊ/ after the Great Vowel Shift (in such words as daughter, caught, law, cause, and drawn).

Gonna be gone
Perhaps there was another sheep of the same contrary disposition, since the long vowel of gone in the accents that rhyme it with drawn is best explained in the same way. Why do we find /gɒn/ ~ /gɑn/ as well? It’s hard to say at which historical stage the shortened variant originated. It could have appeared before the Great Vowel Shift, immediately after it, or still later, with the same result. It is quite possible that it has arisen many times. It is worth observing that high-frequency verbs often display irregular phonetic simplification, possibly because sloppy pronunciations are easier to tolerate in words more or less predictable from the context. Note the similarly unexpected short vowel of says and said, does and done, as well as been (pronounced like bin in American English). Been, said, does, done, says, and gone (in that order) are all among the 500 most frequently occurring English word-forms.

I will return to this interesting correlation between frequency of use and erratic behaviour (which usually consists in some kind of phonological erosion – the shortening, reduction or loss of speech segments).


  1. Re frequency: a great champion thereof has been (still is?) the Polish linguist Witold Mańczak, whose 'cela est dû à la fréquence' (he wrote mostly in French) was a 'caeterum censeo'. He was also an admirer of Zipf. Are you familiar with his work?

  2. I am. I also have the honour to have met Prof. Mańczak on several occasions. I disagree with him on many things. As regards sound change, he places an almost exclusive emphasis on frequency effects, and makes his position unnecessarily dogmatic. But there is a sound core in his views, and I'm convinced the significance of the Zipf distribution is often insufficiently appreciated.

    1. Slightly OT... Regarding Mańczak. Can any of his theories be construed as implying that Polish=Proto-Slavic=PIE?

    2. No, they are only biassed towards "proving" that the Slavic and PIE homelands overlapped modern Poland.

  3. 'It is worth observing that high-frequency verbs often display irregular phonetic simplification, possibly because sloppy pronunciations are easier to tolerate in words more or less predictable from the context'.

    I know at least one verb for which the above does not work (for me): can/can't as pronounced by Americans: I tend to hear both as something like 'cayn' or 'cairn'. I often have to ask: pardon, you are saying you can or you can't (cahn't)? Maybe the contexts are not always sufficiently predictable?

    Frequency often issues in simplification or 'erosion' but sometimes in complication, too, such as with the initial w- in 'one' (wun), I suppose. Often, too, complex forms are remembered and used precisely "dû à la fréquence", as Mańczak would have said; for instance, we say 'am, are, is', not 'be, bees'.

  4. Being a frequently occurring form is a mixed blessing. On the one hand, it protects a word from lexical replacement. On the other, it produces more opportunity for "mutations", especially those giving rise to weak forms. In a nutshell: live longer, change faster.

    By the way, the conjugation of "to be" is so complex because it is suppletive. No fewer than four originally different Indo-European verb roots have contributed to the creation of this Frankenstein monster. In Old English, there were still two competing infinitives with two competing present-tense paradigms:

    eom, eart, is; sind(on)


    bēo, bist, biþ; bēoþ

  5. Conjugation of 'be' suppletive.

    Yessir, it very much is. So in Polish, by-, es-, s-, in Latin fu- (the same as by- in Pol.), es-, s-.

    Frequency helps suppletive verbs to survive in speakers' memory, e.g. 'am' etc., but it did not help in Afrikaans (just 'is' for all persons, as compared with ben, zijt, is, zijn, zijt, zijn of older Dutch) or in Modern Polish, where jeśm (am) and jeś (art) got replaced with analogical formations from jest (jestem, jesteś), not unlike in Lithuanian, where 'regular' forms esu, esi replaced the old ones esmi, essi. Something similar in Modern Persian, hastam, hasti, hast.

  6. The Modern Polish present-tense pattern is not so much suppletive as partly "diploid": except in the 3sg./pl., the finite forms of 'to be' consist of jest- (treated as if it were a root) plus personal endings which are, historically, enclitic forms of 'to be'!

    jest + jeśm' 'I am' → jest-em 'I am'

    The model for this must have been the periphrastic past tense (= the "Slavic perfect"), which consisted of the participle był plus the enclitic auxiliary 'to be':

    był + jeśm' → był-em 'I was'

    Since in the third person the auxiliary came to be dropped, the bare participle był/byli was reinterpreted as a finite verb. The proportional relation:

    był : był-em :: jest : X

    produced a new form of the present tense: X = jest-em. The only forms retained from pre-Polish times are those of the third person: sg. jest, pl. są. Of course they occur far more frequently than others, which may explain their survival.

  7. I was saying: the Modern Polish 'be' _is no longer_ suppletive _despite_ frequency.

    But how come the pattern---you describe above---applicable till then and till now only to past-tense forms has been used with, or rather for, a present-tense form? Also, not perfectly, not consistently, for we do not say *jestam or *jestom when we are feminine or gender-neutral beings, like we say byłam or byłom for 'I was'. It's a conundrum to me.

    In Greek, a rather archaic language, they say, it's 'eisi', an analogical formation from es-, for the usual s- in the 3rd pers. plural (są, sont, sind(on), sunt, what not...). This does not put me off, coz es- is on the whole far more frequent than s-. But why did in so many other languages the s- stem survive, in the 3rd person plural?
    Even if this is all 'dû à la fréquence', we must ask ourselves: fréquence de quoi? Which forms, being more frequent (than which?), have prevailed here?

  8. The -a/o- in the past tense is the gender suffix of the participle (był/była/było). Likewise in the plural: byli/były. The -e- of the masculine, however, was originally the vowel of the auxiliary (retained when the auxiliary was added to a consonant-final word-form):

    'you were'
    był-eś (masculine)
    była-ś (feminine)

    In the present tense, the neo-root jest- was genderless, so the prop vowel of the suffix is invariably -e-

    'you are'
    jest-eś (not marked for gender)

    What is a little surprising is that jest was used in the plural despite the fact that the 3pl. form was są. But we have ample textual evidence of forms like sąście ~ -ście są in older Polish. It's clear that jest- and są- competed in the 1/2pl., and jest- eventually won.

    1. What I still do find surprising is that _despite_ the frequency of 'jeśm' etc.---or perhaps it was not all that frequent?--it was replaced by those analogical formations from 'jest', and also, to begin with, for this sole and unique verb the analogy was extended to the present tense, starting from a 'root' that does not at all look like your typical active past participle in -ł(a/o). Mind boggles once you start to think about it.

      Where do, too, such constructions as 'jam-ci jest'? Is 'jam' not 'ja+jeśm'? Where's that 'jest' from, then?

      In Czech, and afaik in other Slavonic tongues, the counterparts of 'jeśm' have survived, jsem etc. (In Russian it's just jest', anyway). Why this Polish _Sonderweg_, do you happen to know?

      Re Greek, thank you I did not know that. But I'd have to examine the Laryngaltheorie anyway.

      'What is a little surprising is that jest was used in the plural despite the fact that the 3pl. form was są'

      Well, here with Mańczak I'd exclaim 'dû à la fréquence'. Jest is far more frequent than są, isn't it?

    2. Where do, too, such constructions as 'jam-ci jest'? Is 'jam' not 'ja+jeśm'? Where's that 'jest' from, then?

      As I said, they are analogical, modelled on the preterite ja-m ci był. The moment był came to be regarded as a bona fide finite verb form, the whole construction was reanalysed as a verb + a mobile (detachable) personal ending. Polish has generalised this pattern to replace the exceptionally irregular present tense of 'to be' and make it more manageable. Some other Slavic languages are happy with the old present forms, others (like Russian) have lost them almost completely. There is no universal way of dealing with strange archaic patterns.

    3. and the reason why the present-tense forms were modelled on the preterite forms _only_ in the case of this verb być=to be is that it was exceptionally irregular and little manageable, is that what you're implying? I understand the business with the -ł participle as a bona fide verb and the rest, but I am still puzzled that the reanalysis you're referring to was applied to the present tense _just_ for the verb 'być', 'to be'.

  9. P.S. Greek eisi is not analogical. It's the normal development of PIE *h1s-énti > *ehensi (cf. Myc. e-e-si).

    The PIE verb *h1es- had a root present which, like other typical root presents, had a shift of accent from the root in the singular to the ending in the plural:

    3.sg. *h1és-ti, 3pl. *h1s-énti

    In most groups the initial *h1 was simply dropped, but Greek regularly vocalised it as a "prothetic vowel".

    1. >>P.S. Greek eisi is not analogical.

      Neither is Lithuanian esì, BTW (quoted by Wojciech). Double consonants are phonotactically prohibited in the language even if they are sometimes written across morpheme boundaries as a purely orthographic convention (ìššūkis 'challenge').

    2. A good point. The simplification of *h1es-si to *h1esi happened already in PIE.

    3. OK, so what is 'esi'? Coz 'esu' _is_ analogical, ain't it? Old Lithuanian it was 'esmi', if I remember well---was it not 'essi' (give or take consonant gemination) in the 2nd pers. sing.? So by analogy I thought 'esi' would be an analogical form too. What about 'esame', 'esate', then?

      Your cousins, the Lettonians, say '(es) esmu', which is analogical too, though one analogical levelling less than 'esu'. Or am I quite mistaken?

    4. esu is esmi restructured on the analogy of the more productive thematic type. So are esame etc., where -a- is the "thematic" vowel.

      But esi is inherited (cf. Skt. asi etc.).

    5. OK, put this way: if the 2nd pers. sing. of buti had been (as it is not) analogical, moulded on, say, dirbti, to work, what would it have been? Not 'esi', likewise? Given that the rest of the paradigm (save 'yra') is analogical, it is hard to believe that 'esi' be inherited, _pace_ Skt. For all of its archaicity, the verbal system of Lith. strikes one as full of analogical formations, does it not?

    6. >>Given that the rest of the paradigm (save 'yra') is analogical

      Also ẽsti 'is/are typically, happens' (~ Polish bywa), which still may be used as mere 'is/are' in the high register and in dialects.

      >>it is hard to believe that 'esi' be inherited

      What if it's exactly the inherited esì (reanalysed as thematic es- + thematic -i) which contributed to the thematization of 'to be' in Lithuanian?

    7. Yes, I know esti, though I did not know it was still used in Lithuanian in any register. All the better!

      BTW, what is the 'received theory' of the origin of the ending -i, 2nd pers. sing.? I'd have nothing against your hypothesis above. I only must first find it believable, and that not just because it is beautiful (which it is).

      BTW, in Homeric Greek there was the form 'essi', militating against the ungemination of PIE *es-si in Proto-PIE already --- _pace_ Piotr and his Skt.

    8. It was easy to restore es-si at any time (the same happened in Armenian and Latin, for example; in early Latin (Plautus) es always scans heavy (= /ess/)). But Gk. must go back to pre-Greek *esi, otherwise the fricative would not have disappeared. And there is independent evidence of *ss being routinely degeminated already in PIE.

    9. Helmut Rix takes a position similar to yours in his standard 'Historische Grammatik des Griechischen', 1st ed. 1975:

      §274 2. Sg. athem.: bei einsilb. Primärstämmen Endung gr. -si (nur hom. έσσί) > -i (§ 89), z. T. mit Sek.-End. -s (§ 268) zu 4s verdeutlicht;
      -si = Prim.-End. idg. *-si in ai. -si -si aw. -hi -§i heth. lit. -si lat. got. -s:
      *H1ei-si: gr. εΐ (§ 59.89.267) ai. esi lat. is heth. päisi
      *H1esi (< *H1es-si): gr. εί (ion. εί-ς) ai. ási aw. ahi altlit. esi abg.
      jesi; restituiertes *H1es-si in hom. dor. έσσί lat. ess > es heth.
      essi »bist«

      OK, what interests me most is, when you-guys write '*H1esi (< *H1es-si): gr. εί', how can you be so sure that the degemination happened in PIE already. There is independent evidence for it, you say, OK, but certainly not very ample and rather circumstantial, isn't it? If PIE was like Lithuanian in non-tolerating geminated consonants at all (see Sergejo contribution above) then OK, no problem, but it probably wasn't quite like Lithuanian, was it?... Could the two forms, essi and esi, not have existed in various forms of (latish) PIE alongside each other? Must the Homeric form needs be a restoration (based on esti)? And finally---the starting point of our discussion---how can we know that the Lithuanian 'esi' is not an analogous formation but an inherited form? I'd be most happy if it was the latter, but I still need to be convinced. (Hopefully, Lith. esi is not part of that 'independent evidence'.)

      Maybe I should explain that though I be interested in such stuff independently, I am also interested in the meta-scientifical (I am a philosopher) issue of the 'style of argumentation' used in various (respectable) branches of science, such as hist. linguistics. It's about 'what counts as prevailing evidence? a knock-down argument?' and such-like. That is why I am teasing you a bit.

      In this context I'd mention Mieczysław Jerzy Künstler, a Sinologist, don't know if you ever heard of him, I think he must have been a 'big shot' in Polish linguistics but I may be wrong; I have learnt a bit from him in the respect just-mentioned in the context of his polemics against the Sino-Tibetan hypothesis (against the linguistic establishment, he did not believe in Chinese's affinity with the Tibeto-Burmese, for methodological reasons).

    10. There is independent evidence for it, you say, OK, but certainly not very ample and rather circumstantial, isn't it? If PIE was like Lithuanian in non-tolerating geminated consonants at all (see Sergejo contribution above) then OK, no problem, but it probably wasn't quite like Lithuanian, was it?...

      The evidence is not merely circumstantial. It mainly involves the behaviour of s-stems in which the suffix is added to a root ending in *s (*-s-(e/o)s-). The geminate gets simplified if the suffix is in the zero grade. It seems that PIE did not tolerate any morphological geminates, hence such rules as *-t-t- > *-tst- or *-t-tr- > *-tr- and dissimilations like Skt. adbhis for ap- 'water' + -bhis (inst.pl.).

      I haven't met Mieczysław Jerzy Künstler, but I have his book Języki chińskie.

    11. That's interesting... So Sergejas may be right after all that 'esi' is inherited... Good for him. But why is it 'eimi' rather than *esmi for 'am' in Ancient Greek? (we) are is still 'esmen' if I remember well... Did the 1st person get its stem from the 2nd person? That would no longer be 'dû à la fréquence', would it?

      The book by MJK you mention contains interesting if a bit embittered polemics against the Sino-Tibetan orthodoxy.

    12. But why is it 'eimi' rather than *esmi for 'am' in Ancient Greek? (we) are is still 'esmen' if I remember well...

      *-sm- normally developed into *-hm-, yielding either a geminated nasal (-mm-) or a single nasal with compensatory lengthening of the preceding vowel. So Att. eimí (+ Aeol. émmi, Cret. ēmí) are phonologically regular, while esmén has an analogically restored /s/ (from 2pl. esté). But dialectal 1pl. forms like eimén, eimés, ēmén are also attested.

      I'm familiar with MJK's views. He isn't alone in questioning the Sino-Tibetan unity. But there are also other departures from orthodoxy: some specialists prefer to nest Sinitic (Chinese) deep within Tibeto-Burman rather than make it a sister group of TB and a primary branch of "Sino-Tibetan". The current classifications of all those languages are very tentative. Most of them have very little derivational morphology, which makes the systematic application of the comparative method difficult; many are poorly documented and barely studied. All that makes Proto-Sino-Tibetan reconstructions sketchy and controversial.

    13. MJK mentions first of all a lack of systematic sound correspondences (in which IE languages abound) between Chinese and TB. Re little derivational morphology: perhaps basic lexicon (this is an idea by W. Mańczak) would help. English, too, has little derivational morphology yet shares a lot of words with, say, Ancient Greek, e.g. am-eimi, water-hudor, sit-kathedzomai ... or even Skt. or Hittite. Chinese does not (says MJK), or not ascertainably so (given that it has been written in a non-phonetic script for millennia, and reconstructions of the old pronunciations diverge wildly).

      I have surely heard of Greenberg and Co.'s attempts to lump together Sino-Tibetan with a lot of other _Sprachfamilien_, like even Na-Dene, Algonquian and what not --- yet, as Kant would say, here is reason reaching the limits of its legitimate use, methinks.

    14. English has a lot of obscured derivational morphology inherited from PIE and PGmc., usually no longer productive and not analysable synchronically, but recoverable via comparison with other languages. In Chinese and TB, there's too little to compare. We are lucky to have numerous old and archaic IE languages to work with -- most other families are nowhere near so well documented.

    15. But hasn't English lost most of its morphology? I remember raising this point with you on John Wells' blog. And yet we do not have to trace English back to 'the olde dayes of that Kynge Arthoure' -- We, Gardena in geardagum, þeodcyninga þrym gefrunon hu... etc. -- to find out that it is a relative of, say, Albanian or even modern Hindi; basic vocabulary does the job. With Chinese the problem is, claims MJK at least, that it does not display many (regularly phonetically divergent) vocabulary similarities with TB. But again, we have no idea, _pace_ Karlgren et al., what it sounded like in th'olde dayes of their Han-Emperors and their predecessors.

      The Semitic family meseems at least as well documented. Not to say 'unsurpassably documented'... Have a good weekend you Piotr and whoever else may be reading this exchange.

    16. OK, I know what my next entry will be about: fossil morphology.

  10. 'Note also that while OE sc(e)ān ‘shone’ yields the expected outcome /ʃoʊn/ in America, the normative British pronunciation is /ʃɒn/, with a shortened vowel.'

    I have always thought that 'shown' for 'shone' in the US is a spelling pronunciation, like so many American peculiarities... The word looks like it ought to be read 'shown'.... . Is there any evidence to the contrary? Frequency must have something to do with it, for how often have we an opportunity to say that something or other shone? I personally -- not very often, alas...

    1. It's known to have varied in the past. The American pronunciation is in fact the expected regular outcome of OE sċān. I wonder if the British pronunciation has been analogically influenced by gone.

  11. 'The American pronunciation is in fact the expected regular outcome of OE sċān.'

    I know. But has it in fact come out of it, or is this a case of a happy coincidence of how the word 'must have sounded' by its origin and how it 'ought to be pronounced' by its spelling? For 'gone' I'd say: dû à la fréquence...

    1. John Walker wrote this more than 200 years ago, describing the British pronunciation of the time:

      This word is often pronounced so as to rhyme with tone; but the short sound of it is by far the most usual among those who may be styled polite speakers

  12. In Australian English, gone has unique phonetics: it is usually /gɒːn/, with a lengthened /ɒ/ vowel not found in any other word.