How should one write a language?

By dkl9, written 2024-029, revised 2024-029 (0 revisions)


Suppose you are to pick an orthography (writing system) for a language. This mostly happens for one of these reasons:

If you want the orthography to be precise and understood by any competent linguist, you could write the language phonetically with the International Phonetic Alphabet (IPA). The results of that are usually overcomplicated and silly, so people instead often make semi-precise alphabetic orthographies based on the Latin alphabet, sith the Latin alphabet is familiar to most and so easy to learn. Usually, better options exist.

Ceteris paribus, a restrictive orthography — one with a written form for exactly those vocables with meaning in the language — is better. With a restrictive orthography, the details we distinguish in writing all correspond to distinctions that matter in the spoken language. A crude, permissive orthography offers a wide variety of written forms, all of which must be kept distinct, even while relatively few are used in the language.
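One crude way to put a number on restrictiveness is the fraction of writable forms that carry meaning. The Python sketch below is only an illustration under loud assumptions: the lexicon size of 3000, the 26-letter alphabet, and the cap of 5 letters per word are all made up for the example, and the function names are hypothetical.

```python
def writable_forms(alphabet_size, max_length):
    """Count every string of 1..max_length symbols over the alphabet."""
    return sum(alphabet_size ** k for k in range(1, max_length + 1))


def restrictiveness(meaningful_forms, possible_forms):
    """Fraction of writable forms that carry meaning (1.0 is fully restrictive)."""
    return meaningful_forms / possible_forms


# Assumed figures, for illustration only.
lexicon_size = 3000

# Permissive: any string of up to 5 letters from a 26-letter alphabet is writable.
alphabetic = restrictiveness(lexicon_size, writable_forms(26, 5))

# Restrictive: exactly one written form per meaningful word.
logographic = restrictiveness(lexicon_size, lexicon_size)

print(f"alphabetic:  {alphabetic:.6f}")
print(f"logographic: {logographic:.6f}")
```

Under these assumptions the alphabetic scheme uses well under a thousandth of its writable forms, while the word-level scheme uses all of them.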

Most languages have at most a few thousand atomic words. Any other words are usually formed by compounding or inflection. Compounds often use ordinary atomic words in their original form. Inflection gets more complicated, often using affixes peculiar to inflection, and sometimes changing the root word. If the language is fully analytic — it uses the same form of a word for any role in the sentence, expressing grammar with word order — then its words will mostly be atomic or compounds. Thus we can easily write an analytic language, like Chinese or toki pona, with a symbol for each atomic word. Such orthographies (logographies) can be written quickly and compactly, only needing to distinguish pronunciations which are actually meaningful in their respective languages. They are harder to learn than phonetic orthographies, but that's only a problem for those learning a logography anew. Most writing is done by those who already know the system well.

It would be annoyingly complicated to make a word-level restrictive orthography for a language that inflects words. Japanese gets partway there, writing many atomic words with word-level kanji, while writing inflections, particles, and some other words with phonetic kana. Most word-inflecting languages should, and do, make things easier and more uniform by having everything written phonetically.

The strictest phonetic writing systems are syllabaries: one symbol per valid syllable, each symbol unique. Those are usually too tedious for practical use, due to how many syllables any given language uses. Many languages distinguish over a thousand syllables. Japanese has only on the order of a hundred syllables, so it goes well with a syllabary (kana), tho even kana has fewer base symbols than syllables, via patterns such as dakuten, a diacritic that marks the voiced counterpart of a base symbol's consonant.
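To see how syllable counts get out of hand, here is a rough Python sketch that multiplies the choices for each part of a syllable (onset, nucleus, coda). Both inventories are toy assumptions invented for the example, not the phonology of any real language, and the product is only an upper bound, sith real phonotactics forbid some combinations.

```python
def syllable_count(onsets, nuclei, codas):
    """Upper bound on distinct syllables: every onset + nucleus + coda combination."""
    return len(onsets) * len(nuclei) * len(codas)


# Toy inventories (assumptions for illustration). "" means "no consonant here".
simple_count = syllable_count(
    onsets=["", "k", "s", "t", "n", "h", "m", "y", "r", "w"],
    nuclei=["a", "i", "u", "e", "o"],
    codas=["", "n"],
)

complex_count = syllable_count(
    onsets=["", "p", "t", "k", "s", "l", "r",
            "sp", "st", "sk", "sl", "pl", "pr", "tr", "kl", "kr"],
    nuclei=["a", "e", "i", "o", "u", "ai", "au", "ei"],
    codas=["", "p", "t", "k", "s", "n", "m", "l", "r",
           "ng", "st", "nt", "mp", "ps", "ts", "ks"],
)

print(simple_count)   # 10 * 5 * 2  = 100
print(complex_count)  # 16 * 8 * 16 = 2048
```

Allowing consonant clusters and a few diphthongs is enough to push the toy inventory from a hundred syllables to a couple thousand, which is the gap between a workable syllabary and a tedious one.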

For languages that at least have simple syllables, abjads are the next best choice, if they work. Under an abjad, you write out the consonants — all of which the language uses somehow — and the reader fills in vowels and syllable breaks according to the language's phonotactics. This works better if the language has few vowels, as in Arabic.

If the language has relatively many vowels, or the vowels are used more informatively, a sequence of consonants might leave many words ambiguous. Then you use an abugida, worse than an abjad, sith it may introduce syllables absent in the actual language.
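Whether an abjad leaves too much ambiguous can be checked mechanically: strip the vowels from each word and see which words collapse to the same consonant sequence. A minimal Python sketch follows, assuming a made-up lexicon of English-like words and treating the letters a, e, i, o, u as the vowels; the function names are hypothetical.

```python
from collections import defaultdict


def skeleton(word, vowels="aeiou"):
    """Drop the vowels, keeping only the consonants an abjad would write."""
    return "".join(ch for ch in word if ch not in vowels)


def ambiguous_groups(words, vowels="aeiou"):
    """Group words by consonant skeleton, keeping only skeletons shared by 2+ words."""
    groups = defaultdict(list)
    for word in words:
        groups[skeleton(word, vowels)].append(word)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Toy lexicon, an assumption for illustration only.
lexicon = ["bat", "bit", "but", "bet", "dog", "dig", "cat", "sun"]
collisions = ambiguous_groups(lexicon)

for key, group in collisions.items():
    print(key, "->", group)
```

In this toy lexicon, "bt" could be read four ways and "dg" two ways. The more words share a skeleton, the more the reader must lean on context, and the stronger the case for marking vowels.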

Use an alphabet as a last resort for inflection-heavy languages with complex syllables, like English with its words like "slurping".