2.18 The Functions of Structure Words: The shortest words of any language are its logical, numerical or grammatical words. These are words like 'or' 'of' 'is' 'a' 'if' 'one' 'two' 'the' and endings like '-s' '-ed' '-ing' and '-hood' in English. In Loglan short morphs of this kind include all the grammatical particles (tense words, articles, etc.), the connectives (logical and causal), all the prepositions that are not predicates, the case-markers, the pronouns, the other variables, and all the number- and letter-words.
Clearly, structure words are the words that shape the structure of an utterance, and into which the less frequently-used and typically longer content words, the predicates and names, are fitted as pictures into frames. Thus, Le _____ pa _____ is a sentence-frame. Its nature is completely determined by two structure words. But any two predicates we care to choose can complete the picture so-framed. For example, let us use mrenu and fumna. If we drop these two predicates into the empty places in the frame in both possible orders, we will make two sentences out of them: Le mrenu pa fumna = 'The man was a woman' and Le fumna pa mrenu = 'The woman was a man'. If there are 10,000 predicates in the language, then there are 100,000,000 ways of filling this one frame. Each structural frame is therefore a set of possible sentences.
This is the function of structure words: to build the structural frames which the content words then fill out, thus creating the utterances of the language.
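The frame-filling idea can be made concrete with a small sketch. It is only an illustration: the helper names fill_frame and frame_capacity are invented here, and the underscore is assumed to mark an empty place in a frame.

```python
from itertools import product


def fill_frame(frame, predicates):
    """Return every sentence made by dropping predicates into the blanks.

    Each "_" in the frame is one empty place; repeats are allowed, so
    two predicates fill a two-place frame in 2 * 2 = 4 ways.
    """
    slots = frame.count("_")
    template = frame.replace("_", "{}")
    return [template.format(*choice)
            for choice in product(predicates, repeat=slots)]


def frame_capacity(n_predicates, slots):
    """Number of ways n_predicates can fill a frame with that many blanks."""
    return n_predicates ** slots


sentences = fill_frame("Le _ pa _", ["mrenu", "fumna"])
print(sentences)
print(frame_capacity(10_000, 2))
```

With two predicates the frame yields four sentences, among them the two discussed above; with 10,000 predicates the same arithmetic gives the 100,000,000 fillings of the text.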
2.19 The Four Little-Word Forms: All simple structure words in Loglan are short; the longest have only three letters. Since the shortest predicates are four letters long, there is a complete visual and audible separation of these two kinds of regular words in Loglan. Therefore we will occasionally call the simple structure words little words, for in Loglan they are genuinely little. It is the compound little words described in the next section that attain substantial lengths.
The four forms of little words are .V VV CV and CVV. Using a typical member to represent each form-class, we can call them the A-form words, the Ia-form words, the Da-form words, and the Tai-form words.
The complete set of A-form words is a e i o u. In ordinary contexts, these are the five simple logical connectives 'or' 'and' 'if and only if' and 'whether' etc., although they are also used as letter words in spelling contexts; see Sec. 2.25. [MRH: the use of a e i o u in spelling has been eliminated] These five tiny words, as well as any compounds which commence with them, must always be preceded by pauses in speech and commas in text. For example, in Da, a de pa kamla /DA.aDEpaKAMla/ ('X or Y came') the pause-comma is obligatory.
There are 25 Ia-form words and all of them are attitude indicators. Examples are ia and io ('Certainly' and 'Probably'). No pause need accompany these Ia-form words or their compounds: Da ia, e de io pa kamla /daIA.edeIOpaKAMla/ (daa-YAA . eh-deh-YOH-paa-KAAM-laa) = 'X certainly, and Y probably came'.
Next are the 85 Da-form words formed of the 5 primary vowels combined with the 17 regular consonants. All possible Da-form words have currently assigned meanings. Indeed, the CV form is perhaps the hardest-working morphological form in the language, in the sense that more CV morphs occur in Loglan utterances than morphs of any other form. The best examples of Da-words are da itself and its four companions, de di do du. These are often well-translated into a sort of "mathematized" English by the gender-less, number-less, case-less mathematical variables 'X' 'Y' 'Z' 'W' 'Q'. For example, Da ketpi de di do du means 'X is a ticket to Y from Z on carrier W for price Q'.
Finally, there are the more numerous but less frequently used Tai-form words. There is morphological space for more than 400 of these CVV-form morphs, and only about half of them have been assigned; so we still have plenty of room for growth.
[It is a widely believed myth among loglanists that we are "running out of CVV space." This is not true. The last 12 years of active lexical development have added only a few dozen Tai-form words to the 148 words of this form that had meanings in 1975. More than a few dozen empty places, namely 85, have been added morphologically to the CVV word-space since that time. Five new phonemes /h y x q w/ were added to the language—even though four of them, /y x q w/, added only letter-words to the CVV space, and then promptly occupied it—and the previously unused vowel-pairs /aa ee oo/ have been discovered to have uses and so have augmented the rows of the CVV table. Since the number of Tai-form words that have actually been added to the language, including the new letter-words, is far less than 85, we have even more open CVV-space now than we had in 1975.]
The best examples of Tai-words are the letter-words Bai Cai Dai etc. [B C D etc.] of which Tai [T] itself is one; but many of the prepositions and adverbs of the language are also Tai-form. For example, dio is a "case tag" that means 'to/toward' and sui is a "discursive" that means 'also'.
As we shall see presently, the 3 irregular sounds /q w x/ and /y/ are fully represented morphologically only among the letter-words. In fact, CVV is the only little-word form in which /q w x y/ may occur. All other little words are made from the 5 primary vowels and the 17 regular consonants. [MRH: out of date]
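The sizes of the four little-word spaces follow directly from the counts of vowels and consonants, as the sketch below shows. The list of 17 regular consonants is an assumption (the Latin alphabet less the seven vowel letters and /q x/), and the function names are invented for the illustration. The irregular sounds /q w x y/ are deliberately left out.

```python
VOWELS = "aeiou"                  # the 5 primary vowels
CONSONANTS = "bcdfghjklmnprstvz"  # assumed: the 17 regular consonants

FORM_NAMES = {"V": "A-form", "VV": "Ia-form", "CV": "Da-form", "CVV": "Tai-form"}


def shape(word):
    """Map a word to its C/V pattern; '?' marks any other character."""
    return "".join(
        "V" if ch in VOWELS else "C" if ch in CONSONANTS else "?"
        for ch in word.lower()
    )


def form_of(word):
    """Name the little-word form of a word, or None if it is not one."""
    return FORM_NAMES.get(shape(word))


def form_space(pattern):
    """Count the morphs that a C/V pattern such as 'CVV' allows."""
    size = 1
    for symbol in pattern:
        size *= len(VOWELS) if symbol == "V" else len(CONSONANTS)
    return size


for pattern in FORM_NAMES:
    print(pattern, form_space(pattern))
```

On these assumptions the CVV space holds 17 x 25 = 425 morphs, which agrees with the "more than 400" mentioned above.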
MRH: it is worth noting, though we have as yet no need to exploit it, that we have a further space of Cvv-V words if we get desperate.
2.20 Compound Little Words: A compound little word is a string of simple little words concatenated pauselessly in speech and printed without spaces in text. Thus, pacenoina is a compound little word, or simply a compound. It is composed of pa + ce + noi + na. Most combinations of little words are permitted in compounds. One exception is that VV-form words may be compounded only with each other. For example, uaui expresses a satisfied kind of happiness and uiua expresses a happy kind of satisfaction, and both are compound attitudinals. Compound attitudinals may be of any length; but they are the only kind of compound in which VV words may occur.
There is one more restriction. While V, CV and CVV words may be mixed together in a compound, two orders are proscribed: V words may not follow other V words (the result would look like a VV word), and they may not follow CV words (which result would simulate a CVV word). So V words may in fact only follow CVV words, which they do only rarely and then only when the Tai-word is of Cvv-form, that is, only when it contains a monosyllabic vv-pair like Tai itself. When V words do follow Cvv words in a compound, they lose their leading pause. That pause is retained when a .V word is used initially in a compound. Thus .V + CVV puts the obligatory pause in front of the compound where it belongs. For example, anoi = 'if' is such a word, and the leading pause is preserved in use: Da, anoi de /DA.anoiDE/ = 'X if Y'. *Cvv + .V, in contrast, would put the pause in the middle of the word if the pause were retained. So it is not retained. (Words may have no medial pauses, of course; see Sec. 2.28.) An example of a Cvv + V compound is MaiA (MIGH-aa), the acronymic word for [MA], the US Postal Service's acronym for 'Massachusetts'. Note that /MAIa/ (MIGH-aa) is audibly distinct from /MAia/ (MAA-yah). This phonological distinction will be employed in the resolution of little words; Sec. 2.33.
There are also some "false compounds" that we must look at. Derivations of the CV + V kind do exist semantically even though morphologically they are proscribed. For example, the CVV word noa has a meaning which is derivable from no followed by a ('not and/or' or 'only if'). But the resulting noa is not a compound. It is a simple CVV word chosen with that semantic derivation in mind. This kind of derivation applies to the Tai-form letter-words as well. Thus Tai itself is semantically derived from the letteral [T] plus the suffix [ai]. But Tai is nevertheless a simple CVV word, and so is morphologically not subject to further resolution.
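The two compounding restrictions of this section can be sketched as a simple check on the forms of the parts. This is a reading of the rules as stated, not the resolver itself: the function names are invented, the consonant list is an assumption, and the finer condition that a V word may follow only a monosyllabic Cvv word is not modelled.

```python
VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnprstvz"  # assumed: the 17 regular consonants


def shape(word):
    """Map a word to its C/V pattern; '?' marks any other character."""
    return "".join(
        "V" if ch in VOWELS else "C" if ch in CONSONANTS else "?"
        for ch in word.lower()
    )


def compound_allowed(parts):
    """Check a list of simple little words against the compounding rules."""
    shapes = [shape(part) for part in parts]
    if any(s not in {"V", "VV", "CV", "CVV"} for s in shapes):
        return False  # not a simple little word at all
    if "VV" in shapes:
        # VV words may be compounded only with each other.
        return all(s == "VV" for s in shapes)
    for previous, current in zip(shapes, shapes[1:]):
        # A V word may not follow a V word or a CV word.
        if current == "V" and previous in {"V", "CV"}:
            return False
    return True


print(compound_allowed(["pa", "ce", "noi", "na"]))
print(compound_allowed(["no", "a"]))
```

Note that no + a is rejected as a compound, which is why noa has to be a simple CVV word in its own right.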
MRH: see later for extensive comments on “compound little words”.
2.21 Letter-Words: These are a kind of structure word which have a very special morphology in Loglan. By a letter-word is meant a word like English 'em', 'eff', or 'dee' by which letters are spoken or read aloud from text in a given language. That they are words in the spoken forms of languages which have written forms, and therefore have characters of some kind—and of course, not all languages do—is clear. We can say 'There are nineteen effs on this page' in spoken English. But words like 'eff' seldom appear in English text. Instead, in writing such a sentence we would probably use the letteral [f] and type [There are nineteen f's on this page.]. In Loglan, letter-words appear as frequently in the written language as in the spoken one. There are also a great many Loglan letter-words, since there are separate words for the upper- and lower-case versions of each letter in the Greek and Latin alphabets. That makes 100 letter-words in all, as there are 26 letters in the Latin alphabet and 24 in the Greek one.
Each Loglan letter-word is formed by combining the Loglan phoneme associated with that letter with a suffix. If the letter is Latin, the association is automatic. The phoneme associated with a given letter is the sound that letter is given in reading Loglan text aloud. If the letter is Greek, however, some of the associations between characters and sounds are obvious and some are arbitrary. So the entire list of associations between Greek characters and Loglan sounds will be given presently.
2.22 Suffixes for the 52 Latin Letter-Words: Some of these words are Tai-form; some of them are Ama-form. The four suffixes required to generate all 52 Latin letter-words are as follows:-
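For the 7 lower case Latin vowels, add -si; thus asi, esi,…, ysi.
For the 7 upper case Latin vowels, add -ma; thus Ama, etc.
For the 19 lower case Latin consonants, add -ei; thus bei, etc.
For the 19 upper case Latin consonants, add -ai; thus Bai, etc.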
In addition, there are the 7 single-letter abbreviations of the vowel letter-words provided by the 7 vowels a e i o u w y used as one-letter words. These, too, are letter-words. But unlike the Ama- and asi-form vowel-words, to which they are alternatives (allomorphs), the single-letter letter-words are ambiguous with respect to both language and case. As we shall see later, the single phoneme /a/ may stand for the upper case Latin letteral A, for the lower case Latin a, for the upper case Greek alpha (we cannot display the Greek letterals on our font, so will be content to name them), or for the lower case Greek alpha. Which letteral the vowel /a/ is representing in any given case will depend entirely on the context in which we find it; see Sec. 2.24.
The reason there is a single-letter abbreviation for each of the Latin vowel-words is that, in many contexts in which the Latin letter-words are used, the preferred vowel-word is the vowel itself. Thus in spelling the Loglan word ba we wish to name the characters; so we say bei, a (bay . aa). We could say bei, asi (bay . AA-see) if we chose, thus specifying the lower case Latin [a]; but that much univocality is not required in the spelling context and is seldom used. See Sec. 2.25 for more on spelling practice. In fact, the 7 single-letter vowel-words, a, e, i, etc., are used either in or as letter-words wherever the loss of case and language information is unimportant; for example, in making acronymic words; see Sec. 2.29.
Consonants cannot, of course, be spoken alone. Therefore they always require a vocalic suffix.
MRH: asi for lower case latin is now deprecated, to be replaced by zia
Ama for upper case latin is now deprecated, to be replaced by ziama
Vowels in acronyms collapse to –za- but never to –a-
2.23 Suffixes for the 48 Greek Letter-Words: These words are also of either Tai-form or Ama-form, but of course four different suffixes are used. The suffixes required to generate the 48 Greek letter-words are as follows:-
For the 6 lower case Greek vowels, add -fi; thus afi, etc.
For the 6 upper case Greek vowels, add -mo; thus Amo, etc.
For the 18 lower case Greek consonants, add -eo; thus beo, etc.
For the 18 upper case Greek consonants, add -ao; thus Bao, etc.
Two phonemes in the Loglan phoneme set have no corresponding letters in the Greek alphabet. These are /c/ and /w/, (sh) and (eu). The remaining 24 Loglan phonemes have been tentatively assigned to the 24 letters of the Greek alphabet as follows. There are four with arbitrary associations and these are marked with a pound-sign [#]:
As I say, these assignments are tentative. The Institute would be pleased to consider any proposal based on a better understanding of Greek phonemics than this one displays.
To use these tables to build a Greek letter-word, proceed as follows.
(1) Suppose we want the word for lower case Greek gamma.
(2) The suffix for l.c. Greek consonants is -eo.
(3) Gamma is associated with the Loglan phoneme /g/.
(4) So the required letter-word is g + eo = geo, a Tai-form word.
MRH: only the lower case Greek consonant construction remains in use at this time.
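The four-step procedure for building a Greek letter-word can be sketched as follows, using the four suffixes of this section as originally given. The function name is invented, and the Loglan phoneme associated with the Greek letter is taken as an input rather than looked up. The set of six vowel phonemes is an assumption, namely the seven vowels less /w/, which has no Greek letter.

```python
GREEK_VOWEL_PHONEMES = set("aeiouy")  # assumed: the 7 vowels less /w/
NO_GREEK_LETTER = set("cw")           # /c/ and /w/ have no Greek letter

# (is_vowel, is_upper_case) -> suffix, as listed in this section
SUFFIXES = {
    (True, False): "fi",
    (True, True): "mo",
    (False, False): "eo",
    (False, True): "ao",
}


def greek_letter_word(phoneme, upper=False):
    """Build the letter-word for the Greek letter tied to a Loglan phoneme."""
    phoneme = phoneme.lower()
    if len(phoneme) != 1 or not phoneme.isalpha():
        raise ValueError("expected a single phoneme letter")
    if phoneme in NO_GREEK_LETTER:
        raise ValueError("/%s/ has no corresponding Greek letter" % phoneme)
    is_vowel = phoneme in GREEK_VOWEL_PHONEMES
    word = phoneme + SUFFIXES[(is_vowel, upper)]
    return word.capitalize() if upper else word


print(greek_letter_word("g"))  # lower case gamma, associated with /g/
```

For lower case gamma this reproduces g + eo = geo, the one construction that the note above reports as still in use.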
2.24 Uses of Letter-Words: Letter-words are currently being used in five contexts: (1) In spelling, see next section. (2) In making acronymic words like CaiIzA (shigh-EEZ-aa) for 'CIA' (see-igh-EIGH), see Sec. 2.29. (3) In forming dimensioned numbers like nenimei (neh-NEE-may) [10m] for '10 meters', see Sec. 2.31. (4) As letter-variables both in mathematics (toXai [2X]) and in ordinary discourse (Bai groda Cai = 'B is bigger than C', which is often abbreviated in text to [B groda C]). (5) To form scientific predicates, for example, geoykreni (geigh-oh-uh-KREH-nee) for 'gamma-ray', which is made from the letter-word for lower case gamma, geo (GEIGH-oh), the hyphen /y/ (uh) (see Secs. 2.48 and 2.55), and the predicate kreni (KREH-nee), which means 'ray'.
MRH: considerable changes have been made in dimensions, acronyms, and in general all uses of letters other than as pronouns.
2.25 Spelling Aloud: To spell a word aloud in Loglan, one uses Tai-words for the consonants and either A-words or Ama-type words for the vowels at the speller's option. Normally da will use A-words for the vowel-letters. But if capitalization is to be reported, or there is any other source of confusion in the context, da may choose to use Ama words for greater explicitness. Thus the string of utterances (for so the grammar will perceive it) Tai A I /tai.a.i/ (tigh . aa . ee) will be taken by the Loglan auditor to spell the word [Tai]. In English, we would say 'Capital tee. Eigh. Eye.' More explicitly da might wish to say Liu Tai nu leasri li, Tai A I lu = 'The word 'Tai' is spelled (character-written) 'Tai. A. I.'.' A guide for this string is (lee-oo-TIGH-noo-leigh-AAS-ree-lee . tigh . aa . ee). There is more on spelling in the grammar.
2.26 Little Word Predicates: There are three series of words in Loglan that considered morphologically are compound little words but which are treated by the grammar as predicates. These are (1) mathematical predicates, (2) acronymic predicates, and (3) identity predicates. Identity predicates are bi and its analogs and compounds (see Lexeme BI in the Lexicon) and require no special morphological treatment. The other two series of little word predicates do require special treatment and will be discussed in the next section and in Sec. 2.29.
2.27 Mathematical Predicates: There are two series of these words: the cardinals and the ordinals. They are generated by attaching either the cardinal suffix /-ra/ or the ordinal suffix /-ri/ to any number word or other quantifier; see Lexeme PREDA for the complete list of non-numerical quantifiers. Examples are tora = 'is a dyad/a twosome' and rari = 'is the "all-th" or final member of sequence…'. Like other predicates, mathematical ones are stressed penultimately. So they must be separated from any preceding number-word or quantifier by a pause in speech or a comma in text. (Such commas are not strictly necessary in text, but it is considered good writing style to use them because they contribute to the isomorphism of the language.) Thus Kambei leva fe, fefera galno veslo mi /KAMbeilevaFE.feFEraGALnoVESlomi/ = 'Bring those five fifty-five gallon containers to me' must be partitioned into (at least) two breathgroups at the juncture between the quantifier and the cardinal predicate to prevent /KAMbeilevafefeFEraGALnoVESlomi/ from being heard. The pauseless production would mean 'Bring that five-hundred-and-fifty-five gallon container to me'.
[This stress rule is new since 1983. It was decided to bring the stress in mathematical predicates—once initially stressed—into line with that of all other predicates. The cost of this rectifying move is phonologically a modest one: the occasional use of a quite naturally-occurring pause. Morphologically it eliminates an exception. It allows us to say that all predicates are stressed penultimately.]
MRH: it is worth noting that my parser does not yet enforce this stress rule.
MRH: I am just seeing this pause rule, which I believe I already enforce for different reasons.
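The formation of mathematical predicates, and the penultimate stress they now carry, can be sketched in a few lines. The function names are invented, and the stress marker assumes that the word consists of simple syllables each ending in a single vowel, which holds for number-words such as to, fe and ra but not for Loglan words in general.

```python
import re


def cardinal(quantifier):
    """Attach the cardinal suffix -ra to a number-word or quantifier."""
    return quantifier + "ra"


def ordinal(quantifier):
    """Attach the ordinal suffix -ri to a number-word or quantifier."""
    return quantifier + "ri"


def stress_penultimate(word):
    """Capitalize the next-to-last syllable, as in the /feFEra/ notation.

    Assumes every syllable is zero or more consonants plus one vowel.
    """
    syllables = re.findall(r"[^aeiou]*[aeiou]", word)
    if "".join(syllables) != word or len(syllables) < 2:
        raise ValueError("word does not fit the simple syllable pattern")
    syllables[-2] = syllables[-2].upper()
    return "".join(syllables)


print(cardinal("to"), ordinal("ra"))
print(stress_penultimate(cardinal("fefe")))
```

Because the stress in fefera falls on the second fe, a preceding fe must be set off by a pause if it is not to be absorbed into the predicate.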
2.28 The "No Pausing Inside Words" Rule: The preceding resolution demonstrates the necessity of not pausing inside a word…especially not inside a compound one, which will often have some other resolution if the speaker does inadvertently pause. This is because no Loglan word legitimately contains a pause and so the resolver makes good use of whatever pauses it hears. (This means that the tiny stops that do occur acoustically in some vowel disyllables, e.g. in /a,o/, will be measured by the machine—and by the human auditor, for that matter, who is usually unaware of them—as "too brief to be a pause".) New loglanists frequently object to this rule: 'But sometimes I don't know the word, and I have to hesitate until I remember how it goes!' That is true; and human auditors—your teacher, for example—will understand this…and probably you. But the machine will not understand you until your Loglan speech becomes "fluent". That, in fact, is what any increase in fluency largely is. It is the elimination of just these morphologically unnecessary (and sometimes downright misleading) pauses from the hesitant speech with which you and every other learner will inevitably begin.
MRH: this is important. It does allow the deliberate introduction of pauses to break compound words when they are not wanted.
2.29 Acronymic Predicates: Morphologically, acronymic words are compound little words; but grammatically they are predicates. They are one of several classes of little word predicates (see Sec. 2.26) which have this slightly misleading morphology.
But what are "acronymic words"? Just as a letteral, let's say [T], is a visual abbreviation of its letter-word Tai, so a Loglan acronym is a visual abbreviation of its acronymic word. Thus wherever [CCC] occurs in Loglan text it is simply an optional, written abbreviation of the word CaiCaiCai; and both are pronounced (shigh-SHIGH-shigh), for, like all predicates, acronyms are stressed penultimately. This is not quite like the handling of acronyms in English. The English acronym [CCC], for example, is a representation in writing of the "spelling-form" 'See-see-see'; but this form never occurs in written English. So a closer parallel with Loglan acronyms is the use of compound numerals in both languages. Thus, [123] is a shorthand written notation for the spoken expression 'one-hundred-twenty-three' in English just as [123] is shorthand for netote in Loglan. Both "longhand" forms may occur in writing as well as in speech. Just so, [CCC] is shorthand for CaiCaiCai; and CaiCaiCai is not only the "spelling aloud" of [CCC] but it is a written word as well.
Conventionally, the acronyms of Loglan are restricted to Latin letterals, both upper and lower case. The variety of internationally-used acronyms that can be "spelled out" as acronymic words in Loglan includes not only common sequences of Latin upper-case letterals, like [USA UN DNA], but also alphanumeric sequences like [U234] and chemical formulas like [H2SO4]. Chemical symbols containing both lower- and upper-case letterals, such as [Fe] (Iron) and [As] (Arsenic), are also uniquely spellable as Loglan acronymic words once certain decoding conventions are taken into account. But first let us look at the rules for turning such acronyms into Loglan words. These "spelling-out" rules are as follows:-
(1) All consonant-letterals in an acronym are represented in the acronymic word by their full 3-letter words. Thus peicei is the word of which [pc] is the written abbreviation; and both expressions are pronounced /PEIcei/ (PAY-shay). [MRH: all contracted vowels are now -za-, never -a-]
(2) With a very few exceptions to be noted later, each Latin vowel-letteral in an acronym is represented by that single vowel phoneme in the acronymic word. Thus DaiNaiA = /dai + nai + a/ is the reading-aloud, or spelling out, of DNA; and both expressions are pronounced /daiNAIa/ (digh-NIGH-aa).
(3) If a pair of Latin vowel-letterals are adjacent in an acronym, the vowel phonemes by which they are read aloud are hyphenated with /z/ in the acronymic word. Thus CaiIzA = /cai + i + z + a/ is [CIA] read aloud. Both are pronounced /caiIZa/ (shigh-EEZ-aa). Similarly, AzAzA (aa-ZAA-zaa) is [AAA] read aloud.
(4) If there is a 2-letter element symbol in a chemical acronym which, like [Ca] (Calcium), is composed of an upper-case consonant-letteral plus a lower-case Latin vowel-letteral, the two are hyphenated with /z/ in the acronymic word. Thus, [Ca] itself is read as Caiza (SHIGH-zaa) while [CaCO3] (calcium carbonate) is read as CaizaCaiOte (shigh-zaa-shigh-OH-teh). Note that [O] is not hyphenated to the second [C]. Thus [O] is not a lower-case appendage of that [C].
(5) To read an acronym containing Greek vowel-letterals as a Loglan word, those vowel letters must be read aloud by their full three-letter words, e.g., /AMo/ or /AFi/. Any Greek consonant-letter in an acronym must be hyphenated with /z/ to any immediately following vowel of either nationality. [MRH: No Greek vowels; Greek lower case consonants obey same rules as others]
(6) All non-initial numerals in an acronym are pronounced as number-words in the acronymic word. Thus Utotefo = /u + to + te + fo/ is the word of which [U234] is the abbreviation. Both are pronounced /utoTEfo/ (oo-toh-TEH-foh).
(7) Acronyms with initial numerals (rare forms at best) are not allowed to be transformed into Loglan acronymic predicates. If they were, they would be taken for dimensioned numbers, e.g., temei = [3m] (TEH-may) 'three meters'.
(8) Dimensioned numbers may have acronyms as their right-hand parts. Thus [100USD] might be the written form for 'one hundred US dollars', in which [USD] is an acronym. The whole expression, then, would be pronounced /nema,uSAIdai/ (neh-maa-oo-SIGH-digh) in which the resolver would detect the /u/ as part of the acronym [USD] and therefore not pair it with the preceding /a/. Written out, the compound word for [100USD] is nemaUSaiDai. [MRH: initial marker mue added to dimensions]
(9) Acronymic words whose acronyms would imitate existing or even possible Loglan words are not permitted. Thus *peia is not permitted because its acronym is [pa] which imitates pa; see Sec. 2.32 for the acronym recovery rules. In speech /PEIa/ and /pa/ would be quite distinct, but in written text the acronym [pa] would be indistinguishable from the word [pa]. [MRH: no such rule enforced]
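The spelling-out rules for Latin acronyms can be sketched as a single pass over the acronym. This is an illustration of the rules as originally stated here, not of the later revisions noted by MRH. The function name is invented, only the digit words attested in this chapter (ni ne to te fo fe) are included, and Greek letterals, dimensioned numbers, the ban on word-imitating acronyms and the vowel-initial element symbols [Au] and [Eu] are not handled.

```python
VOWELS = set("aeiouwy")  # the 7 vowel letters
DIGIT_WORDS = {"0": "ni", "1": "ne", "2": "to", "3": "te", "4": "fo", "5": "fe"}


def acronymic_word(acronym):
    """Spell out a Latin acronym such as 'DNA' or 'CaCO3' as a Loglan word."""
    if not acronym or acronym[0].isdigit():
        raise ValueError("acronyms with initial numerals are not allowed")
    parts = []
    previous = ""
    for ch in acronym:
        if ch.isdigit():
            if ch not in DIGIT_WORDS:
                raise ValueError("no digit word recorded for %r" % ch)
            parts.append(DIGIT_WORDS[ch])  # numerals become number-words
        elif ch.lower() in VOWELS:
            after_vowel = previous.lower() in VOWELS
            after_upper_consonant = (
                previous.isalpha() and previous.isupper() and not after_vowel
            )
            # Hyphenate with z after a vowel, or when a lower-case vowel
            # hangs on an upper-case consonant as in [Ca].
            if after_vowel or (after_upper_consonant and ch.islower()):
                parts.append("z")
            parts.append(ch)
        elif ch.isalpha():
            # Consonants take their full words: -ai upper, -ei lower.
            parts.append(ch + ("ai" if ch.isupper() else "ei"))
        else:
            raise ValueError("unexpected character %r" % ch)
        previous = ch
    return "".join(parts)


for sample in ["DNA", "CIA", "AAA", "pc", "Ca", "CaCO3", "U234"]:
    print(sample, acronymic_word(sample))
```

The samples are the ones used in the rules above, so the output can be compared with them directly.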
2.30 Pause and Stress Around Acronymic Words and Letter-Words: Stress is always penultimate in acronymic predicates, that is, stress falls on the syllable that is second from the last, as indeed it does in all predicate words.
Given the penultimate stress rule, Rule 6 in the preceding section (the one on non-initial numerals) requires that the juncture between single-vowel connectives and a quantifier, as in U totefo le mrenu ('Whether 234 of the men'), be protected by either a pause or by final stress on the quantifier: thus either /u.totefoleMREnu/ or /utoteFOleMREnu/ would be morphologically distinct from the production with the penultimately stressed acronymic word, namely /utoTEfoleMREnu/ ('Be U234 (to/at/about) the men(?!)' whatever that would mean). My prediction is that usage will follow the second, more economical route. But let us see.
The fact that acronymic words, being predicates, are always penultimately stressed may also be used to force the resolution of a string of separate letter-words (which otherwise might imitate an acronym) by stressing their final member. Thus, in /VEDmabaicaiDAI/ the sequence /baicaiDAI/ cannot resolve as the acronym BaiCaiDai [BCD], for that would have to be penultimately stressed: /baiCAIdai/. Thus the resolver can write Vedma Bai Cai Dai = 'Sell B to C for D', or even [Vedma B C D], from the production /VEDmabaicaiDAI/ without benefit of pauses. Pauses may of course be used to separate acronymic words from each other and from the number- and/or letter-words they might otherwise absorb, or optionally (never obligatorily) to separate letter-words from one another. Thus Bai, DaiNaiA Cai = 'B is the DNA of C (that is, part of C's genome)' may be univocally produced with only that one pause: /BAI.daiNAIacai/ or, just as effectively, /BAI.daiNAIaCAI/. However, if we drop the pause, the resulting production /BAIdaiNAIacai/ resolves as BaiDai NaiA Cai ('Be a BD type of NA to/of/about, etc. C'), and if at the same time the stress is dropped from /BAI/, then the resulting production, /baidaiNAIacai/ resolves as BaiDaiNaiA Cai = 'Be a BDNA to/of/about, etc. C'. So stress is an important feature of the speechstream in the neighborhood of letter-words.
As we've just seen, /VEDmabaicaiDAI/ will resolve as Vedma Bai Cai Dai without benefit of pauses. But if we put the stress on the middle term in such a set, we get something that appears at first sight (that is, before we rule on it) to be resolutionally ambiguous: /VEDmabaiCAIdai/. There's an acronym here, alright, but it might be either BaiCaiDai (/baiCAIdai/), or CaiDai (/CAIdai/) with the letter-word Bai coming before it. Clearly we must rule on this case. The best morphological ruling is to let the resolver take such a pauseless, penultimately stressed string of letter-words as the longest acronym it can be; and then use pauses to mark off other cases. Under this rule, the sense of 'Sell B to CD' is given by Vedma Bai, CaiDai; and this will be uniquely resolved from either /VEDmabai.CAIdai/ or /VEDmaBAI.CAIdai/; but /VEDmabaiCAIdai/ uniquely resolves as Vedma BaiCaiDai and means 'Sell BCD'. Similarly, 'This is a ticket from B to C on DF for (price) G' is Ti ketpi Bai Cai, DaiFai Gai, which is uniquely given by /tiKETpibaiCAI.DAIfaigai/ (and some other variations). But again, only a single pause is obligatory.
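The longest-acronym ruling can be sketched as a small grouping routine. It is one reading of the ruling, not the resolver itself. The input is assumed to be already divided into letter-word syllables, with a stressed syllable written in capitals and a pause written as ".". The function name is invented. A stressed syllable that has a successor in the same breathgroup closes an acronym that reaches back as far as it can; a stressed syllable at the end of a breathgroup leaves everything before it as separate letter-words.

```python
def resolve_letter_words(production):
    """Group letter-word syllables into letter-words and acronyms.

    production is a list such as ["bai", "CAI", "dai"]; upper case marks
    stress and "." marks a pause.
    """
    words = []

    def flush(run):
        start = 0
        i = 0
        while i < len(run):
            if run[i].isupper():
                if i + 1 < len(run):
                    # Penultimate stress: take the longest acronym that
                    # ends one syllable after the stressed one.
                    words.append("".join(t.capitalize() for t in run[start:i + 2]))
                    i += 2
                else:
                    # Final stress: these are separate letter-words.
                    words.extend(t.capitalize() for t in run[start:i + 1])
                    i += 1
                start = i
            else:
                i += 1
        # Unstressed leftovers are separate letter-words too.
        words.extend(t.capitalize() for t in run[start:])

    run = []
    for token in list(production) + ["."]:
        if token == ".":
            flush(run)
            run = []
        else:
            run.append(token)
    return words


print(resolve_letter_words(["bai", "cai", "DAI"]))
print(resolve_letter_words(["bai", "CAI", "dai"]))
print(resolve_letter_words(["bai", ".", "CAI", "dai"]))
```

The three calls correspond to the letter-word portions of /VEDmabaicaiDAI/, /VEDmabaiCAIdai/ and /VEDmabai.CAIdai/ discussed above.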
2.31 Pause and Stress Around Dimensioned Numbers: There is a final problem involving acronyms, and that is how to protect the junctures around the acronym-bearing dimensioned numbers of Rule 8, Sec. 2.29. Our example was [100USD]. This is spelled out in text as [nemaUSaiDai], and partially spelled out as [100USaiDai]; but in any case it is produced in speech as /nema,uSAIdai/, the close-comma indicating that the /a,u/ is disyllabic. The problem is how to prevent these objects from being misheard as "indefinite descriptions", that is, with the same grammar as Ne mrenu = 'One man'. Like mrenu, USaiDai is a predicate. How is it that the resolver does not hear Nema USaiDai, which would mean 'Some one hundred instances of U.S. dollars (i.e., things worth one U.S. dollar, e.g., the bills or coins themselves)', in this production? Again, the default rule is to let the resolver hear the longest dimensioned number it can hear, and use pauses to mark off other cases. So the resolver is instructed not to resolve pauseless productions like /nema,uSAIdai/ as two words when it can resolve them as one. Indeed, the two-word phrase Nema USaiDai would be parsed as an indefinite description. So when indefinite descriptions involving acronymic predicates are intended, the speaker must pause between the quantifier and that predicate. Thus Nema USaiDai /NEma.uSAIdai/ gets the now-intended meaning of 'Some one-hundred U.S. dollars' exactly. Even consonant-initial acronyms require this protection. Thus, Nema NaiZaiDai = 'Some 100 New Zealand dollars' must also be pause-bearing (/NEma.naiZAIdai/) in order to distinguish it from the dimensioned number nemaNaiZaiDai /nemanaiZAIdai/. This last expression is a single word, a quantifier, and might be the measure of some quantity, say '100 New Zealand dollars-worth of wool' = NemaNaiZaiDai lunli. No pause is needed in this indefinite description. The two stressed syllables in the pauseless production /nemanaiZAIdaiLUNli/ will effect the desired resolution.
Notice that /NEma.uSAIdai/ differs from /nema,uSAIdai/ in two respects: there is a pause and a stressed /NE/ in the first production but not in the second. The extra stress in the first production does help the human auditor resolve these two productions correctly; but it may not be relied on by the machine's resolver. For example, if the quantifier is monosyllabic, the natural tendency to stress one syllable relative to another vanishes. So /ne.uSAIdai/ requires the pause to distinguish it from /ne,uSAIdai/. Between this last pair of productions the phonological difference is now minimal; but it works.
MRH: this seems to me to have nothing in particular to do with dimensioned numbers. The need to break in the middle of a number without a dimension occurs in other contexts.
One final note about stress in the neighborhood of dimensioned numbers. When such a number is of minimal length, consisting of only one letter-word, say, with a default ne assumed, and is being used in a numerical description, as in Ti langa ta lio mei = 'This is longer than that by (one) meter', then the descriptor lio ('the number…') must be pronounced disyllabically and its second syllable must be stressed. This pattern will always give the desired resolution. Thus /liOmei/ will resolve as lio mei, while the ambiguous production */lioMEI/ is in danger of picking up any following letter-word as the final syllable of the number: /lioMEImei/ => lio meimei = 'the number (one) millimeter'. True; such a sequel is rare. But pronouncing lio disyllabically and stressing its final syllable offers complete protection against this accident; for then the one-syllable number is in effect the ultimate syllable of a quasi-compound, the phrase lio mei, which is in effect being stressed penultimately. This determines the right juncture of mei and prevents it from being heard as part of anything else.
MRH: I would be more inclined to require pauses or the explicit numeral breaking word juu.
2.32 Acronym Recovery Rules: The word-formation rules of Sec. 2.29 are sufficient to transform any (allowable) acronym into an acronymic word. But they are not sufficient to allow the recovery of every (allowable) acronym from its sound in speech. The consonant-words will decode uniquely, of course; but the single vowel-phonemes to which the vowel-words have been (nearly always) reduced will not. So to enable unique decoding of the single vowel phonemes in spoken acronymic words, certain already widely-used international conventions have been adopted. These are:-
"Nationality": The default assumption is that all the vowel-letterals in an acronym are Latin. If Greek vowels are used, they must be spelled out. If all the letter-words are single vowels, it is assumed that the entire acronym is Latin and upper case. Thus /aZAza/ is [AAA]; and the written word is AzAzA.
Case: It can be inferred from the word-formation rules that every single letter vowel-word which is joined by a hyphen to a preceding upper-case consonant-word represents a lower-case Latin letteral. Thus /FAIze/ is [Fe] and the acronymic word is Faize. But let's go further. Let's also assume that every single-letter vowel-word that is not hyphenated to a preceding Latin consonant-word has the same case as that word, or as any following consonant-word if the vowel-word in question happens to be initial. Then /CAIo/ will decode as [CO] and /Unai/ as [UN]; and the two words will be spelled out as CaiO and UNai, respectively. (It is quite a different matter whether [UN] will regularly refer to the United Nations in Loglan, or to the (impossible) diatomic compound of Uranium and Nitrogen; we trust the former…although discussion of the latter is not of course impossible.) /FAIe/ of course will then be [FE]; /FAIze/ would still be [Fe]; and /FEIze/ (FAY-zeh) might well be taken to be [fE] if such a curious acronym is ever needed. (At the rate at which acronyms are proliferating in the modern world, it may soon be.) But /FEIe/ (FAY-eh) would always have to be written out as feie, for its acronym *[fe] would imitate the number-word fe.
The case convention will not allow the symbols for the two V-initial element words Au (Gold) and Eu (Europium) to be read aloud in the usual way. The problem is that both the letterals in just these two chemical symbols are vowels. So the above conventions would apply the default rule wrongly and decode the spoken words /AZu/ and /EZu/ as [AU] and [EU]. (One of these, [AU], as it happens, predicates another impossible diatomic compound, this one involving Argon. The other, [EU], is not a possible chemical acronym, for [E] is not an element symbol.) So these two exceptional chemical acronyms will have to have their lower-case second vowels "spelled out" when spoken aloud, namely as /aZUsi/ and /eZUsi/, respectively. They can then be written [Au] and [Eu] as required. Elements whose symbols are single letterals are, of course, referenceable by the corresponding letter-words: thus Nitrogen by Nai and [N] and Oxygen by Oma and [O].
Interestingly enough, the symbols composed of single letter-words have the grammatical status of "arguments", that is, they function as designations; while acronymic words composed of two or more symbols, not necessarily all alphabetic, have, as we have noted, the grammatical status of predicates. Thus Ta U235 /ta.utoTEfe/ means 'That's U-235'; while it takes "predification" (with Lexeme ME) to turn the argument Uma into a similar predicate: Ta meUma = 'That's U (in the sense of Uranium)'. The reason for this lack of parallelism between letter-words and acronyms is explained under Lexeme TAI in the Lexicon.
MRH: here is more evidence for my understanding that there should be no multiletter pronouns.
2.33 Resolving Structure Words: All structure words, whether simple little words or compounds, are resolved in a two-pass operation. On the first pass, the resolver reduces some part of some production that contains only little words to a string of simple little words, preserving whatever pauses and stresses it may find among them for use on the second pass. On the second pass, the resolver acts as a compounder. It places junctures (word-boundaries) around certain substrings of the string of little words which the reducer has identified (in this way, it "compounds" them), thus isolating those which are left as simple little words. It then turns both kinds of "lexes" (words) over to the lexer for classification as "lexemes". (The terms 'lex' and 'lexeme' are defined in the next chapter; for the moment you can think of a lex as a word, and its lexeme as the part of speech to which it belongs.) Well-formed text has had all this morphological work done for it by the writer; so text is turned over to the lexer directly.
[The "compounder" that will perform the second pass has not yet been written…although the lexer that will use its output has been. The latter is part of the preparser of the machine grammar which has, so far, been tested on textual input only. There is no doubt in my mind that a compounding algorithm to work with acoustic productions can be written, even if it might require a somewhat more elaborate array of pauses and stresses than are now thought to be sufficient. It is even possible that, to make the compounder work, some additional usage restraints on grammatical productions may also be found necessary; but these will almost certainly be minor adjustments in the usage rules which will have no large effects on the grammar.]
The reduction pass works as follows. For unconditional resolution the little-word resolver requires that it be given a segment of a production that is known by other resolutions of the resolver to contain only little words. Such a segment or segments could be (a) all of a V-final breathgroup which contains no CC's and hence no predicates; (b) those parts of a V-final breathgroup which does contain one or more CC's but in which the region or regions occupied by predicates have been marked (this will have been done for it by the predicate resolver in Sec. 2.60); or (c) a C-final breathgroup in which the regions occupied by predicates, if any, and by the resident name have all been marked, and in which there are one or more residual segments known to contain only little words. Thus in unconditional resolution, the little-word resolver is the last of the three resolvers to go to work on a breathgroup, namely on those parts of it which have been set aside for it by the other resolvers.
But there is a complication. The little word resolver is also required to perform conditional resolutions. The reader may recall that before the name-resolver of Sec. 2.17 could locate the left edge of the name that it knows is always resident in a C-final breathgroup, it had to have the "regular word resolver" (evidently a joint effort of the little word and the predicate resolvers) attempt to perform a conditional resolution of the prequel of any apparent name-marker that it found. This resolution was conditional because it could fail. Only when it found a name-marker whose prequel did resolve could the name-resolver mark off the region of the C-final breathgroup that was not occupied by the resident name. But not all prequels of apparent name-markers resolve. (The few that don't are, as we will see, easily identified.)
This seems circular but is not. The segment of a production which the name resolver calls a "prequel", and which is the string of phonemes that is given to the little word and predicate resolvers for conditional resolution, is definitely marked. In particular, it is the string that lies between the copy of a name-marker whose prequel it is and the left edge of that breathgroup. In fact the only difference between these prequel strings that the resolver is asked to resolve conditionally, and those more confidently marked portions of a breathgroup which are known to contain only regular words, is that the attempt to resolve the former may fail. The latter will resolve…if the utterance from which it is taken is well-formed. But the answer to the conditional resolution question may occasionally be no. The resolver must be able to provide that sort of information as well as give the results of a successful resolution which it knows beforehand will succeed.
Let us commence with unconditional resolution. We will assume that the name- and predicate-resolvers have done their work, and that we now have a pauseless segment of a breathgroup that is known to consist entirely of little words. Such segments may be initial or non-initial in their breathgroups. We recall that strings of little words will consist only of V VV CV CVV elements; that V elements can only be initial in breathgroups or follow CVV elements; and that if a V element does follow a CVV element pauselessly the latter must be of Cvv form (as for example in the acronymic word MaiA); see Sec. 2.29. This detail is used in Step (3c) of the resolution procedure given below.
We proceed as follows:
1) We ask first if the segment is breathgroup-initial, that is, immediately preceded by a pause. If it is, continue. If it is not, go to Step (4).
2) If the first sound in the pause-preceded segment is V, the first word is either a V or a VV. To find out which, count the consecutive V's. If the number is odd, the first V is a word, and the remainder of the string of V's, if any, is composed of one or more VV words and is so resolved. If the number is even, the entire string of V's is composed of VV words and is so resolved. In this way, we arrive at the first C, if one exists in the segment, or at the end of the segment.
3) If the first sound (or the sound now being examined) in the segment is C, the next sound is a V; for, by construction, there are no CC's in this segment. So examine the next two sounds. Of the three possibilities, namely CVCV, CVVC, CVVV—CVCC is again impossible by construction—the first two resolve immediately:
3a) If CVCV, the first CV is a CV-word, and it is followed by the unresolved sequence CV… This returns us to (3).
3b) If CVVC, it can only resolve as CVV plus sequence C… (Recall that *CV + V is proscribed.) This too returns us to (3).
3c) If it is CVVV, we must listen to the first two Vs and proceed as follows.
3c1) If they form a monosyllabic pair, giving CvvV, then Cvv is resolved as a word and we go with a V-initial sequence to (4), which handles V-initial segments that are not initial in their breathgroups.
3c2) If they are disyllabic, giving CV,VV, then this sequence can resolve in two ways. First, as CV + VV; second as a disyllabic CV,V + a V-initial sequence. To discover which, count the consecutive V's if any exist beyond CV,VV. If the number is zero or even, then the sequence resolves as a CV word followed by one or more VV words. If the number is odd, it resolves as a disyllabic CV,V word which is also followed by one or more VV words. (There can be no V-form words in the string of consecutive V's.) This brings us to the end of the segment or to the next C, which returns us to (3).
4) We know that the segment is not initial in its breathgroup. Therefore if it is V-initial, then it can only commence with a string of one or more VV elements; and these are so resolved. This takes us either to the end of the segment or to the first C; and the latter returns us to (3). If the non-initial segment is C-initial, we also return to (3).
In this way, all little words in well-formed utterances are unconditionally resolved.
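The steps above can be sketched in Python. This is an illustrative sketch only, not the resolver of the machine grammar, and several things in it are assumptions: the function name resolve_little_words, the treatment of a e i o u as the V's and of every other sound as a C, and the small default set of vowel pairs taken to be monosyllabic (a real resolver would "listen" for this). Step (4) as written speaks only of VV elements after a Cvv word; relying on the MaiA remark above, the sketch further assumes that an odd run of V's following a Cvv word begins with a single V word.

```python
VOWELS = frozenset("aeiou")  # assumption: every other sound counts as a C
# Assumed default set of monosyllabic vowel pairs, for illustration only.
MONOSYLLABIC = frozenset({"ai", "ei", "oi", "ao"})


def _pairs(run):
    """Split an even-length run of V's into VV words."""
    return [run[k:k + 2] for k in range(0, len(run), 2)]


def resolve_little_words(segment, breathgroup_initial, monosyllabic=MONOSYLLABIC):
    """Resolve a pauseless, little-word-only segment into its words."""
    s = segment.lower()
    words = []

    def run_end(start):
        # Index just past the run of consecutive V's beginning at `start`.
        j = start
        while j < len(s) and s[j] in VOWELS:
            j += 1
        return j

    i = 0
    # Steps (1), (2) and (4): a V-initial segment.
    if s and s[0] in VOWELS:
        i = run_end(0)
        run = s[:i]
        if len(run) % 2 == 1:
            if not breathgroup_initial:
                # Step (4): a medial segment may begin only with VV elements.
                raise ValueError("medial segment begins with an odd run of V's")
            words.append(run[0])  # Step (2): odd count, so the first V is a word
            run = run[1:]
        words += _pairs(run)

    # Step (3): every remaining word begins at a C.
    while i < len(s):
        c = s[i]
        j = run_end(i + 1)
        run = s[i + 1:j]
        if not run:
            raise ValueError("CC or C-final: not a little-word segment")
        if len(run) <= 2:
            words.append(c + run)            # (3a) CV, or (3b) CVV
        elif run[:2] in monosyllabic:        # (3c1) Cvv, then the V-initial rest
            words.append(c + run[:2])
            rest = run[2:]
            if len(rest) % 2 == 1:           # assumed: single V word after Cvv
                words.append(rest[0])
                rest = rest[1:]
            words += _pairs(rest)
        elif len(run) % 2 == 1:              # (3c2) zero or even beyond CV,VV
            words.append(c + run[0])
            words += _pairs(run[1:])
        else:                                # (3c2) odd number beyond CV,VV
            words.append(c + run[:2])
            words += _pairs(run[2:])
        i = j
    return words
```

Called as resolve_little_words("lepa", True), the sketch returns the two CV words le and pa; called on "maia", it returns mai followed by a, as in the acronymic word MaiA. The other strings one might try need not be actual Loglan words; only their C-V pattern matters to the procedure.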
We now consider the conditional resolution of a segment which is the prequel of a possible name-marker.
We ask first if the segment is C-final. If it is, it will not resolve into regular words and is so reported. If the segment is V-final, continue.
Is it breathgroup-initial? That is, is the segment preceded by a pause? Then any number of V's may lie between its C's, or before the first C, or after the last C. So resolve it unconditionally; for it will resolve into regular words.
Is it C-initial? If it is, every possible C-V pattern will resolve, so resolve it unconditionally. If it is V-initial, count the V's before the first C, if there is one, or in the segment if it contains no C. If there is an even number of V's, it will resolve unconditionally. It either is, or its head is, a string of VV-form elements. But if the number of V's is odd, it will not resolve and is so reported.
In sum, only if a segment is (i) C-final, or (ii) breathgroup-medial and commences with an odd number of V's will it fail to resolve. All other prequels of apparent name-markers will resolve unconditionally.
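The summary test can be written as a short Python predicate. This is a sketch under stated assumptions, not part of the machine grammar: the name prequel_resolves is hypothetical, a e i o u are taken as the V's and every other sound as a C, and an empty prequel is assumed to resolve trivially (it has zero V's, an even number).

```python
VOWELS = frozenset("aeiou")  # assumption: every other sound counts as a C


def prequel_resolves(prequel, breathgroup_initial):
    """Report whether the prequel of an apparent name-marker will resolve."""
    s = prequel.lower()
    # (i) A C-final segment never resolves into regular words.
    if s and s[-1] not in VOWELS:
        return False
    # A pause-preceded, V-final segment always resolves.
    if breathgroup_initial:
        return True
    # (ii) A breathgroup-medial segment fails only if it commences with an
    # odd number of V's. A C-initial segment has zero leading V's, which is
    # even, so it passes.
    leading = 0
    while leading < len(s) and s[leading] in VOWELS:
        leading += 1
    return leading % 2 == 0
```

Only the C-V pattern of the prequel is consulted, which is all the summary requires; a prequel that passes this test would then be handed on for unconditional resolution.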