By dkl9, written 2023-288, revised 2023-288 (0 revisions)
The entropy of a message (H) increases with the uniformity of its probability distribution (P(x) for each possible x), reaching its maximum when the distribution is uniform. If the message has just two possible values, H is greater exactly insofar as the split between their probabilities is close to 50/50.
| P(A) | P(B) | H (bits) |
|------|------|----------|
| 0.5  | 0.5  | 1        |
| 0.7  | 0.3  | 0.88     |
| 0.8  | 0.2  | 0.72     |
| 0.9  | 0.1  | 0.47     |
| 0.95 | 0.05 | 0.29     |
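For concreteness, here is a small sketch (in Python, not part of the original argument) of the standard formula behind that table, H = -(P(A) log2 P(A) + P(B) log2 P(B)):

```python
from math import log2

def binary_entropy(p):
    """Entropy, in bits, of a two-valued message with P(A) = p and P(B) = 1 - p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -(p * log2(p) + (1 - p) * log2(1 - p))

# Reproduce the table above.
for p in (0.5, 0.7, 0.8, 0.9, 0.95):
    print(f"P(A) = {p:.2f}  P(B) = {1 - p:.2f}  H = {binary_entropy(p):.2f} bits")
```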
The entropy of a message is, intuitively, its expected information content: the average number of bits you learn by receiving it. Thus you can learn more efficiently by seeking messages generated in higher-entropy ways.
Assuming that people ask questions to get information, and that questions are strictly yes-or-no, or otherwise have two main answers (as in "which direction — east or west — leads to our destination?"), the best questions are those which separate options of roughly equal prior probability to the questioner.
The prior probability does not necessarily match the intuitive (but kinda meaningless) "objective probability". E.g. with no specific information for the scenario, the "which direction" question is a 50/50 split, but if you're near the east coast of an island containing an otherwise-unknown destination, your priors should be biased in favour of "west".
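To see why even splits make better questions, here is a hedged sketch: the expected number of bits a yes/no answer gives you is just the entropy of your prior over the two answers, so among candidate questions you would pick the one with the most even prior. The question texts and the 0.8/0.5 priors below are hypothetical, chosen only to echo the island example above.

```python
from math import log2

def expected_information(p_yes):
    """Expected bits gained from a yes/no answer, given your prior P(yes) = p_yes.
    This equals the binary entropy of the split, maximal at 50/50."""
    if p_yes in (0.0, 1.0):
        return 0.0
    return -(p_yes * log2(p_yes) + (1 - p_yes) * log2(1 - p_yes))

def best_question(candidates):
    """Among candidate questions (mapping text -> your prior P(yes)),
    pick the one whose answer is expected to teach you the most."""
    return max(candidates, key=lambda q: expected_information(candidates[q]))

# Hypothetical priors: near the east coast, "west" is already the likely answer,
# so that question is worth less than one with an even prior.
questions = {
    "Is the destination to the west?": 0.8,    # ~0.72 expected bits
    "Is the destination north of here?": 0.5,  # 1 expected bit
}
print(best_question(questions))  # -> "Is the destination north of here?"
```

The code just makes the criterion explicit: "maximise the entropy of the answer, under your own priors".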
There are at least two uses of this maxim. You can use it yourself to guide your choice of questions. You can assume others follow it and, when they seem to violate it by asking weird binary questions, conclude that at least one of the maxim's assumptions is false:
- they aren't really asking to get information;
- the question isn't really binary, i.e. it has more than two live answers;
- their prior probabilities differ from yours, so the split looks closer to 50/50 to them than it does to you.
Alas, I don't (yet) know the relative frequency of those listed confusion modes.