In defense of Bayesian probability and uncertain prediction

By dkl9, written 2023-123, revised 2023-171 (1 revision)

Read the next paragraph iff you don't know about Bayesian vs frequentist probability.

The philosophy of probability distinguishes two main perspectives on what a probability means. One is "frequentist" (named for the importance of frequency): the probability of an outcome is the proportion of trials with that outcome, in the (potentially theoretical) long run (or "population"). The other is "Bayesian" (named for Thomas Bayes): the probability of an outcome is a number, following a few rules (such as being from 0 to 1), decided by the observer and updated by evidence according to Bayes' theorem. The two views each have their own set of statistical methods, and are often debated, a trend to which I, as you might expect from the title, will contribute.
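
To make "updated by evidence according to Bayes' theorem" concrete, here is a minimal sketch in Python. The function name `bayes_update` and all the numbers are illustrative assumptions: you start 50% sure that a coin is biased (landing heads 80% of the time) rather than fair, then watch one flip land heads.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(H | E) from P(H), P(E | H) and P(E | not H), by Bayes' theorem."""
    joint_true = prior * p_evidence_if_true          # P(H) * P(E | H)
    joint_false = (1 - prior) * p_evidence_if_false  # P(not H) * P(E | not H)
    return joint_true / (joint_true + joint_false)

# Hypothetical numbers: H = "the coin is biased, with P(heads) = 0.8",
# not H = "the coin is fair", and the evidence is a single flip landing heads.
posterior = bayes_update(prior=0.5, p_evidence_if_true=0.8, p_evidence_if_false=0.5)
print(round(posterior, 3))  # about 0.615
```

One head moves the observer from 50% to roughly 62% on "biased". Feeding that posterior back in as the next prior handles further flips the same way.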

By assigning each single event an objective probability of 0 or 1, the frequentists prohibit themselves from directly considering probabilities of one-time events. The (frequentist) probability of a coin flipping to heads is 50%, sith you can flip the coin many times. The frequentist probability of any particular politician being impeached on a particular occasion is either 0 or 1, with no way to express your uncertainty more precisely, sith it only happens once. So frequentism doesn't work well for one-time events.

You might think you don't care about one-time events. But all of statistics, all of knowledge, ultimately serves to plan for one-time events. The coin lands on heads half the time, and you want to know that sith you want to set up your responses to the next flip. A treatment is effective on 70% of patients, and you want to know that to consider its use on one new patient.

Conclusions reused to guide large-scale actions are just accumulated evaluations of one-time events. To act based on population parameters, without using everything you know about the individual cases, is necessarily a suboptimal approximation.

If you have a fair coin, the frequentists would say the probability that the next flip is heads is 0.5. Once you flip it, and don't look, the frequentist probability becomes either 0 or 1, though you don't know which. That is, the probability changes over time as events unfold. But events unfold from other events; the outcome of that next flip was already determined before you flipped it. It's just that no one knew. A superintelligent and closely-observant outsider could have predicted the result minutes before the flip. (Unless something something quantum nondeterminism, but we can contrive this a bit so that doesn't apply.) Nothing fundamentally changes at the event. It just becomes more measurable to certain observers.

If we allow probability to change at the points of relatively arbitrary events, why not have it change, more usefully, at the points when you get new information? After all, your brain is part of the physical world, so your taking in new information is an event like any other.

Perhaps you reject the Bayesians' subjective probability sith subjectivity, you think, eliminates any forced basis in reality, and makes an attempted description of the world entirely dependent on opinion. This is an understandable position at which to arrive if you're used to humans' abysmal use of subjective probabilities. Most humans don't correctly apply Bayes' theorem to update probabilities, which would enforce a connection to reality, once you get past the priors. Most humans don't correctly calibrate their stated probabilities, often stating an unjustified complete certainty (which, theoretically, could never be revised by evidence), or asserting 99% confidence when they could maybe make 19 out of 20 correct statements of equal quality. (Yes, that way of calibrating probabilities is quite frequentist. The proper, but less intuitive, way to calibrate, fully avoiding frequentism, is by maximising the expected logarithm of the probability given to the correct hypothesis.) And I'm not about to assert that I always do either of those correctly. But the flaws of human practice do not invalidate the properly-implemented theory.
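
The logarithmic rule can be checked numerically. The sketch below is only an illustration, with made-up names (`expected_log_score`, `credence`) and an assumed honest credence of 0.95 (the "19 out of 20" case). It searches a grid of probabilities you might state and finds which one maximises the expected logarithm of the probability given to the correct answer.

```python
import math

def expected_log_score(credence, stated_p):
    """Expected log of the probability given to the correct answer, when you
    privately believe the claim with probability `credence` but state `stated_p`."""
    return credence * math.log(stated_p) + (1 - credence) * math.log(1 - stated_p)

credence = 0.95  # illustrative: right about 19 times out of 20
candidates = [i / 100 for i in range(1, 100)]  # stated probabilities 0.01 .. 0.99
best = max(candidates, key=lambda q: expected_log_score(credence, q))

print(best)                                # the honest 0.95 wins
print(expected_log_score(credence, 0.95))  # about -0.199
print(expected_log_score(credence, 0.99))  # about -0.240, worse
```

Under these assumptions, stating 99% when you only warrant 95% lowers your expected score. Stating 100% would risk a score of negative infinity, which is why the grid stops short of 0 and 1.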

Bayesians rely on priors, which are more vulnerable to subjectivity. (There are ways to objectively base your priors, such as Solomonoff induction, but they tend to be incomputable.) Frequentists rely on the similarly-fraught (but imprecise!) decision to accept or reject hypotheses based on a probability that doesn't answer their original question (P(E|H₀) rather than P(H₀|E)).
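
A small numeric sketch may show how far apart the probability of the evidence given H₀ and the probability of H₀ given the evidence can be. Everything here is an assumption for illustration: H₀ is "the coin is fair", the single alternative H₁ is "the coin lands heads 80% of the time", and the evidence E is 8 heads in 10 flips. The P(E|H₀) computed is the plain likelihood of exactly that result, not a tail-area p-value.

```python
from math import comb

def binomial_likelihood(heads, flips, p_heads):
    """P(exactly `heads` heads in `flips` flips | coin lands heads with p_heads)."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** (flips - heads)

def posterior_h0(prior_h0, like_h0, like_h1):
    """P(H0 | E) by Bayes' theorem, with H1 as the only alternative."""
    return prior_h0 * like_h0 / (prior_h0 * like_h0 + (1 - prior_h0) * like_h1)

like_h0 = binomial_likelihood(8, 10, 0.5)  # P(E | H0), about 0.044
like_h1 = binomial_likelihood(8, 10, 0.8)  # P(E | H1), about 0.302

post_even = posterior_h0(0.5, like_h0, like_h1)     # prior 50% on fair: about 0.127
post_sceptic = posterior_h0(0.9, like_h0, like_h1)  # prior 90% on fair: about 0.567

print(round(like_h0, 3), round(post_even, 3), round(post_sceptic, 3))
```

With these made-up numbers, P(E|H₀) sits below the conventional 0.05 in both cases, yet the probability that the coin is fair ranges from about 13% to about 57% depending on the prior. The prior is exactly what a bare accept-or-reject decision leaves out.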

People (some people, at least) seem rather averse to uncertain predictions. Yet uncertain predictions may be the best you can do, sith they express your knowledge in more detail than asserting a single result. We have ways to precisely reward uncertain predictions. People can, based on those methods, try to maximise their expected score. In doing so, they must take into account all their knowledge to produce subjective probabilities, as produced most effectively by Bayes' theorem.
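
As one example of rewarding uncertain predictions, here is a sketch using the Brier score (mean squared error between stated probabilities and what happened, lower being better). The choice of the Brier score, the made-up outcomes, and the two hypothetical forecasters are my assumptions for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between stated probabilities and 0/1 outcomes.
    Lower is better; 0 means certain and always right."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

# Made-up record: the predicted thing happened in 8 of 10 cases.
outcomes = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

hedged = [0.8] * len(outcomes)   # forecaster who says "80%" every time
certain = [1.0] * len(outcomes)  # forecaster who asserts it will happen, every time

hedged_score = brier_score(hedged, outcomes)
certain_score = brier_score(certain, outcomes)
print(round(hedged_score, 2), round(certain_score, 2))  # 0.16 0.2
```

On this record the hedged forecaster scores better than the one who asserts a single result, even though the confident one was right 8 times out of 10.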