But why is sorting O(n log n)?

By dkl9, written 2023-230, revised 2023-230 (0 revisions)


Prerequisites: algebra, concept of computational complexity (but not necessarily detailed knowledge), concept of big-O (but not necessarily detailed knowledge)

Logarithms, here, are base 2 if notated with "log" and base e (natural) if notated with "ln".

If you study computer science, you probably encounter sorting algorithms and their time complexities:

  bubble sort: O(n²)
  insertion sort: O(n²)
  merge sort: O(n log n)
  heapsort: O(n log n)
  quicksort: O(n log n) on average

The list goes on. But — with certain weird exceptions, which I address at the end — you never find one faster than O(n log n), in the average case for large inputs.

The list you sort only has n elements, so only O(n) is obviously needed. Why, exactly, must it be more? Where does a logarithm come from, not just in particular algorithms, but in such a universal limit?

§ Sorting as searching

What does sorting do, really?

The obvious answer — the unhelpful answer — is that it takes a list of n elements and gives you a list of n elements. But the sorted list has the same elements as the input list, just in a different order. So this description of sorting's input and output is misleading. You don't get a list. What do you get?

Sorting, in its essence, is not list-to-list, but list-to-permutation. A sorting algorithm searches in the space of permutations for the correct one — the sorted one.

A permutation is a description of how to order the elements of a list. For a list of n elements, there are n! (n factorial) permutations.
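
To make that concrete, here's a minimal Python sketch (the function name is mine, nothing standard) that returns the sorted permutation as a list of indices rather than a sorted list:

    import math

    def sorting_permutation(items):
        # Position i of the result holds the index, in the original list,
        # of the i-th smallest item: the permutation that sorting "finds".
        return sorted(range(len(items)), key=lambda i: items[i])

    xs = [3.1, 0.5, 2.7, 1.9]
    perm = sorting_permutation(xs)              # [1, 3, 2, 0]
    assert [xs[i] for i in perm] == sorted(xs)

    # For a list of n elements, there are n! candidate permutations.
    print(math.factorial(len(xs)))              # 24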

§ Information theory

Take one of the classic sorting algorithms. Doesn't matter which.

What happens in each step that actually gives you information by which to search for the sorted permutation?

The core step of any of those algorithms is comparison. Comparing two items tells you either that the first is greater than the second, or that the second is greater than the first. (We may collapse equality into one of those cases, for the purposes of sorting.) Thus a comparison has two possible results, and, as a binary value, gives you one bit of information.

That's how much information sorting gets. How much does it need?

How much information does it take to specify a permutation?

There are n! permutations, regarded as equally likely, so each permutation has an information content of log(n!) bits.
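
As a rough empirical check, here's a Python sketch that counts the comparisons an ordinary library sort actually performs (CPython's sort orders elements with <, so a counting wrapper around __lt__ suffices) next to the log(n!) floor:

    import math, random

    class Counted:
        # Wraps a value and counts every < comparison made on it.
        count = 0
        def __init__(self, value):
            self.value = value
        def __lt__(self, other):
            Counted.count += 1
            return self.value < other.value

    n = 1000
    data = [Counted(random.random()) for _ in range(n)]
    sorted(data)

    print(Counted.count)                    # comparisons the sort actually made
    print(math.log2(math.factorial(n)))     # ≈ 8530 bits: the information floor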

§ Factorial approximations

O(log(n!)) isn't very intuitive, nor is it the result claimed at the beginning.

Recall the definition of factorial: n! = 1·2·3·...·(n-1)·n. What, then, is the definition of log(n!)? Simplify it as much as makes sense.

log(n!) = log(1·2·3·...·(n-1)·n) = log 1 + log 2 + log 3 + ... + log(n-1) + log n

There are some straightforward bounds we can assert for this: each of the n terms is at most log n, and each of the n - 1 terms after log 1 is at least log 2 = 1.

n log 1 = 0 ≤ (n - 1) log 2 = n - 1 ≤ log 1 + log 2 + ... + log n ≤ n log n

That sum in the middle, log(n!), will be a lower bound for the number of comparisons needed to sort an n-item list. So far, we see that's between O(n) and O(n log n), inclusive.

We can finish this route to an answer with calculus. A sum is approximately equal to a corresponding definite integral: log(n!) = log 1 + ... + log n ≈ ∫₁ⁿ log x dx. Integration by parts equates that to (n ln n - n + 1)/(ln 2) = n log n - (n - 1)/(ln 2) = O(n log n).
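
If that approximation feels too loose, a quick numeric check (a Python sketch) shows how the exact log(n!) and the integral's closed form track n log n:

    import math

    for n in (10, 100, 1000, 10000):
        exact = math.log2(math.factorial(n))                   # log(n!)
        integral = (n * math.log(n) - n + 1) / math.log(2)     # ∫₁ⁿ log x dx
        print(n, round(exact), round(integral), round(n * math.log2(n)))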

§ Sorting as searching, in the other sense

Those approximations and integrals at the end might look recklessly nonrigorous or opaque to you. So here's an entirely different approach.

Consider a semi-magical (but still comparison-focused) sorting algorithm that sorts each element into its place independently of the others. That is:

  1. Make a new list of n items.
  2. For each item in the input list:
    1. Find its sorted position in the new list.
    2. Place it in that position.

This looks to me — and I expect you'd agree — like the most efficient sorting algorithm we could ever get. There's a loop running thru n items, so, assuming we're using a single thread of computation, the algorithm is at least O(n).

Within the loop is a search step. What time complexity do we get from searching?

The most efficient way to search an ordered list is binary search. In each step (which costs one comparison), we halve the amount of list left to search, so a binary search takes O(log n) steps.

Thus this optimal sorting procedure takes O(n log n) time.
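
Here's a non-magical Python rendering of that procedure: binary insertion sort, built on the standard bisect module. The k-th insertion's binary search costs about log k comparisons, summing to about log(n!); the insert step shifts elements, which is where the magic would have to go, but shifting isn't a comparison.

    import bisect

    def binary_insertion_sort(items):
        out = []
        for x in items:
            pos = bisect.bisect_left(out, x)  # binary search: O(log n) comparisons
            out.insert(pos, x)                # placement; shifting costs no comparisons
        return out

    print(binary_insertion_sort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]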

Theoretically, at least, each time we add an item from the input list to the new list, we search not thru the whole space of the new list, but the remaining empty space, which diminishes as we progress. So the log n cost isn't really the same for every search. The total time would be that same sum as before: log 1 + log 2 + log 3 + ... + log n.

But at least in this case, we get the intuition that the searches only get slightly faster for most of this procedure, so that sum-of-logarithms doesn't subceed n log n by much.

§ What, did you want a real proof?

Assume n is even. If n is odd, use n - 1 instead, and we get a trivial difference.

log 1 + log 2 + log 3 + ... + log n > log(n/2 + 1) + log(n/2 + 2) + ... + log(n - 1) + log n > (n/2) log(n/2) = (n/2) (log n - 1) = (n/2) log n - n/2

That is, the logarithmic search-costs for the first half of the searches subceed the full log n cost by at most 1, so the total search time must be at least (n/2) log n - n/2. That's not exactly n log n, but it's O(n log n), for it subceeds n log n only by a constant factor and an asymptotically lesser term (n/2).
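
A small Python check of that chain of inequalities, for one even n:

    import math

    n = 1000  # even, per the assumption above
    full_sum = sum(math.log2(k) for k in range(1, n + 1))           # log(n!)
    half_sum = sum(math.log2(k) for k in range(n // 2 + 1, n + 1))  # last n/2 terms
    bound = (n / 2) * math.log2(n) - n / 2                          # (n/2) log n - n/2

    assert full_sum > half_sum > bound
    print(round(full_sum), round(half_sum), round(bound))  # three decreasing totals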

§ Magical exceptions

A correct result can only be violated if it relies on assumptions that are sometimes false. There are sorting algorithms out there faster than O(n log n):

Odd-even sort is fast sith it bypasses the single-thread assumption: it has, in its full form, a thread for each item in the list.
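
Here's a sequential Python sketch of odd-even (transposition) sort. Run on one thread it's O(n²); the point is that each phase's compare-swaps touch disjoint pairs, so with a thread per pair, each of the n phases takes constant time.

    def odd_even_sort(items):
        a = list(items)
        n = len(a)
        for phase in range(n):
            # All pairs in one phase are disjoint, so they could run in parallel.
            for i in range(phase % 2, n - 1, 2):
                if a[i] > a[i + 1]:
                    a[i], a[i + 1] = a[i + 1], a[i]
        return a

    print(odd_even_sort([5, 2, 9, 1, 6, 3]))  # [1, 2, 3, 5, 6, 9]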

Radix sort is fast sith it bypasses the comparison assumption: it gets information not by two-way comparisons but by looking at digits, each of which gives log b bits of information, where b is the numeric base you use.
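
And a minimal LSD radix sort sketch in Python (base 10, non-negative integers only); note that it never compares one element against another, so the log(n!) argument doesn't constrain it.

    def radix_sort(nums, base=10):
        # LSD radix sort: bucket by successive digits, least significant first.
        # The only use of max() is to know when to stop; no element-vs-element
        # comparison drives the ordering.
        nums = list(nums)
        if not nums:
            return nums
        place = 1
        while place <= max(nums):
            buckets = [[] for _ in range(base)]
            for x in nums:
                buckets[(x // place) % base].append(x)
            nums = [x for bucket in buckets for x in bucket]
            place *= base
        return nums

    print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))  # [2, 24, 45, 66, 75, 90, 170, 802]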

Bead sort appears to be fast by violating the single-thread assumption, but very differently from odd-even sort.