On plaintext: what, why, and how

By dkl9, written 2021-365, revised 2023-171 (2 revisions)

§ What?

We may define "plaintext" (or "plain text") as text-representing data for which:

This definition may appear strange. The choice becomes easier to justify when one considers formats which aren't plaintext.
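As a minimal illustration, here is how a short piece of text maps to the numbers a plaintext file actually stores, and back again. This assumes the common UTF-8 encoding. The helper names `char_numbers` and `from_numbers` are made up for this sketch. In UTF-8, each ASCII character is one number (byte), while other characters take several.

```python
# Hypothetical helpers, assuming text is stored as UTF-8.

def char_numbers(text: str) -> list[int]:
    """Return the byte values a UTF-8 plaintext file would store for `text`."""
    return list(text.encode("utf-8"))

def from_numbers(numbers: list[int]) -> str:
    """Turn stored byte values back into text."""
    return bytes(numbers).decode("utf-8")

print(char_numbers("Hi!"))          # [72, 105, 33]
print(from_numbers([72, 105, 33]))  # Hi!
print(char_numbers("é"))            # [195, 169]: one character, two bytes
```

Every number here stands for (part of) a character, and nothing else. There is no font, size, or placement hidden among the numbers.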

§ Images

Digital images can (and often do) contain text. However, the text in an image does not qualify as plaintext. Image text consists of pixels (as does any other content in an image). The pixels, and not the text, correspond to numbers.

The image may present the text in a variety of fonts, sizes, and placements, and one might have edited the image to distort the text in many ways. Even if two images present some text identically, most image formats use compression, which may unpredictably change how pixels correspond to numbers.

Text in images does not consistently or directly correspond to pixels, so image text certainly does not count as plaintext.

§ Formatted documents (MS Word, Google Docs, PDF, etc.)

Though closer to plaintext than images, most formatted documents still don't fit this definition. They do contain text directly, presumably represented as a sequence of character-numbers (as in plaintext). But they also contain formatting details, which get encoded in the same numbers alongside the text, so the numbers don't correspond only to text.

§ Why?

Restricting one's documents to plaintext rules out images and formatting, so it may seem limiting without benefit. But plaintext brings several benefits:

§ Portability

By virtue of its simplicity, plaintext has universal support on all platforms (except one, and it can go suck an egg for all I care). You can send plaintext files to, or receive them from, any device, and as long as it can handle files at all and recognises the file as plaintext (if it doesn't, change the extension to .txt), you can read and edit it.

§ Options

By virtue of its simplicity, one can write programs to work with plaintext relatively easily.

People have written literally hundreds of programs just to edit text, each of which handles the task differently. If you don't like a text editor, you can just use a different one, and still access the same files. If you don't like MS Word, you must keep using it anyway, sith it uses a custom, non-plaintext format that few other programs (all of which you might dislike) can handle.

Besides text editors, people have written text processing utilities to manipulate text in other ways. All these programs work on plaintext, sith plaintext, in its simplicity, spares them the mess of details, like formatting, present in other document formats. You can apply any of these programs to any plaintext file.
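As a rough sketch of how little code such a utility needs, here is a toy line-searching tool in the spirit of grep. The function name `grep` is hypothetical. This version only matches a literal substring, not the regular expressions the real grep supports.

```python
def grep(pattern: str, lines):
    """Yield (line_number, line) for every line containing `pattern`.

    A toy, literal-substring version of a grep-like tool.
    """
    for number, line in enumerate(lines, start=1):
        if pattern in line:
            yield number, line.rstrip("\n")

# A list of lines stands in for a file here.
sample = ["plain text\n", "rich text\n", "images\n"]
for number, line in grep("text", sample):
    print(f"{number}:{line}")
```

Python file objects iterate line by line, so passing `open(path, encoding="utf-8")` instead of `sample` should run the same search over any plaintext file, whatever program wrote it.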

§ Formatting as text

Sith, in true plaintext, all the data corresponds to text, any formatting one seeks to emulate in plaintext must itself appear as text. Sith the text and the marks indicating its formatting are represented in the same way, programs for manipulating text can manipulate plaintext formatting just as easily. You could search for text based on its formatting, globally replace one style with another, etc.
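To make that concrete, here is a small sketch assuming the convention that a phrase wrapped in single asterisks is italic. It uses one regular expression both to list italicized phrases (search by formatting) and to switch them to underscore style (replace one style with another). The helper names are hypothetical, and the simple pattern does not handle escaped asterisks or double-asterisk bold.

```python
import re

# Assumed convention: *phrase* means italic.
# Does not handle escaped asterisks or **double-asterisk** bold.
ITALIC = re.compile(r"\*([^*\n]+)\*")

def italic_phrases(text: str) -> list[str]:
    """Search by formatting: list every italicized phrase."""
    return ITALIC.findall(text)

def asterisks_to_underscores(text: str) -> str:
    """Globally replace one italic style with another."""
    return ITALIC.sub(r"_\1_", text)

doc = "Use *plain* text, not *rich* text."
print(italic_phrases(doc))            # ['plain', 'rich']
print(asterisks_to_underscores(doc))  # Use _plain_ text, not _rich_ text.
```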

§ Necessity

Some tasks, primarily programming, usually require that one write and store relevant documents (in the case of programming, the source code) as plaintext. Trying to use Google Docs or the like to store and run your program will (with perhaps rare exceptions) fail. Source code almost always comes as plaintext, sith the task does not need formatting and often does require access to text-processing utilities.

§ Appearance

Most text editors (sith they often use atypical colour schemes and monospaced text) make text editing — even mundane editing of documents that one could have written with a word processor — look, to the domain-naive observer, like fancy "hacking".

§ Efficiency

Being more complex, MS Word documents, PDFs, and other non-plaintext formats take more storage space, data transfer time, and computing power to work with. Thus plaintext will almost always be the fastest to use. (The difference may be negligible on contemporary computers.)

§ How?

To work with plaintext, you only need a text editor, which you can find built-in on every platform (with rare exceptions). Programs specifically for manipulating formatted documents, or "word processors", usually don't edit plaintext and can't replace text editors.

Proper built-in text editors include Notepad (Windows), the aptly-named TextEdit (macOS), and gedit (some Linuxes), but you can use any of many other editors (such as Geany, Vim, and Notepad++) after installing them.

If you use Notepad, know that older versions are incompatible with files from other systems: they only recognise CRLF as a newline, not a lone LF, so files written elsewhere may show up as one long line. Consider using a different editor. If you use TextEdit, remember to configure it to use plaintext; by default, it acts as a word processor. If you use Chrome OS, you'll need to install an editor app from the Chrome Web Store; I use and would recommend Caret. (Caret may be discontinued, but there are others.)
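If you do need to share files with an older Notepad, converting line endings is itself a small text-processing job. The following is a minimal sketch, and the function names are hypothetical. Opening files with `newline=""` stops Python from translating line endings on its own, so the only conversion is the one the functions perform.

```python
def to_lf(text: str) -> str:
    """Convert CRLF (Windows) and lone CR (old Mac) line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

def to_crlf(text: str) -> str:
    """Convert any line endings to CRLF, which older Notepad expects."""
    return to_lf(text).replace("\n", "\r\n")

def convert_file(path: str, crlf: bool) -> None:
    """Rewrite a UTF-8 text file in place with the chosen line endings."""
    # newline="" keeps line endings exactly as stored when reading and writing.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    fixed = to_crlf(text) if crlf else to_lf(text)
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(fixed)
```

Normalising to LF first, then expanding, means a file with mixed endings comes out consistent rather than with doubled carriage returns.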

§ Markup and Markdown

If you intend to write "natural language" documents (content just for humans to read, not computers), you may want to bring back formatting. One usually does this in plaintext by adding text-based indications of formatting (such as *asterisks for italics* or [brackets for links]( https://example.org/ )), in one of many conventions categorised as markup.

A particular markup convention, known as Markdown, has an effective design, along with many users and implementations.
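To show how markup stays ordinary text until some program chooses to interpret it, here is a toy converter from the two conventions mentioned above (asterisks for italics, brackets for links) to HTML. It is not a Markdown implementation. Markdown has many more rules and edge cases, so for real documents an existing implementation is the better choice. The function name `tiny_markup_to_html` is hypothetical.

```python
import html
import re

# Only the two conventions described above; real Markdown has many more rules.
LINK = re.compile(r"\[([^\]]+)\]\(\s*([^)\s]+)\s*\)")  # [text]( url )
ITALIC = re.compile(r"\*([^*\n]+)\*")                  # *text*

def tiny_markup_to_html(text: str) -> str:
    """Toy converter: escape HTML, then turn links and italics into tags."""
    escaped = html.escape(text)  # so stray <, >, & stay literal text
    with_links = LINK.sub(r'<a href="\2">\1</a>', escaped)
    return ITALIC.sub(r"<em>\1</em>", with_links)

print(tiny_markup_to_html("See *this* [site]( https://example.org/ )."))
# See <em>this</em> <a href="https://example.org/">site</a>.
```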