*Prerequisites: algebra, concept of computational complexity (but not necessarily detailed knowledge), concept of big-O (but not necessarily detailed knowledge)*
*Logarithms, here, are base 2 if notated with "log" and base e (natural) if notated with "ln".*
If you study computer science, you probably encounter sorting algorithms and their time complexities:
- bubble sort, O(n²)
Run thru each item in the list, compare it to the next one, and swap if they're in the wrong order.
Repeat with more runs thru the list until it's sorted.
On average, each element (n) gets "bubbled" to its place in a run thru the entire list (n).
- selection sort, O(n²)
Run thru the list, keep track of the smallest item (by comparing each new item to the currently smallest), and move that smallest item to a new list at the end.
Repeat with more runs thru the original list until all items are moved to the new list.
Each of the n elements has to be selected for its place, which takes, on average, n/2 comparisons.
- insertion sort, O(n²)
Take the first out-of-order item (less than the previous item) in the list, and shift it into its sorted place in the correctly-ordered section at the start, by repeatedly comparing it to its neighbours as it is shifted in.
Repeat with the remaining items until the entire list is sorted.
On average, half the elements (n/2) must be shifted into place, each of which takes, on average, n/2 steps.
- merge sort, O(n log n)
Break the list into a list of pairs, and swap within each pair for which the second item is less than the first.
Then merge adjacent pairs into sorted quartets by repeatedly taking whichever of the two pairs' front items is lesser.
Repeat with larger merges until the entire list is merged into a sorted form.
Every time, each layer of merging (log n) involves action on each element (n).
There are k = log n layers, sith the number of sublists halves from one layer to the next, so 2^{k} = n.
- quicksort, O(n log n)
Take the first item of the list and compare it to the rest of the list.
Put other items less than the first before that first item, in a new list, and the greater ones, after it.
Quicksort those two sections of the new list, stopping when there are no more non-empty lists to sort.
On average, this involves log n layers of partitioning (for each layer roughly doubles the number of sublists), each of which acts on most of the elements (n).
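To make these complexities concrete, here's a small comparison-counting sketch (the helper names are my own, not from any library), pitting an O(n²) sort against an O(n log n) one on the same shuffled input:

```python
import random

def bubble_sort(items):
    """Bubble sort; returns the number of comparisons made."""
    items = list(items)
    comparisons = 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(items) - 1):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
    return comparisons

def merge_sort(items):
    """Merge sort; returns (sorted list, number of comparisons)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, c_left = merge_sort(items[:mid])
    right, c_right = merge_sort(items[mid:])
    merged, comparisons = [], c_left + c_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons

random.seed(0)
data = random.sample(range(1000), 1000)
print(bubble_sort(data))    # on the order of n^2 / 2 comparisons
print(merge_sort(data)[1])  # on the order of n log n comparisons
```

For n = 1000, the bubble-sort count lands in the hundreds of thousands while the merge-sort count stays around ten thousand — the gap this article is about.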
The list goes on.
But — with certain weird exceptions, which I address at the end — you never find one faster than O(n log n), in the average case for large inputs.
The list you sort only has n elements, so only O(n) is *obviously* needed.
Why, exactly, must it be more?
Where does a logarithm come from, not just in particular algorithms, but in such a universal limit?
## Sorting as searching
What does sorting do, really?
The obvious answer — the unhelpful answer — is that it takes a list of n elements and gives you a list of n elements.
But the sorted list has the same elements as the input list, just in a different order.
So this description of sorting's input and output is misleading.
You don't get a list.
What do you get?
Sorting, in its essence, is not list-to-list, but list-to-permutation.
A sorting algorithm searches in the space of permutations for the correct one — the sorted one.
A permutation is a description of how to order the elements of a list.
For a list of n elements, there are n! (n factorial) permutations.
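To get a feel for how fast that search space grows, a quick check with the standard library:

```python
import math

# The space of permutations grows far faster than the list itself.
for n in [4, 8, 16, 32]:
    print(n, math.factorial(n))
# A 4-item list has only 24 permutations; a 32-item list already has
# about 2.6 * 10^35 of them.
```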
## Information theory
Take one of the classic sorting algorithms.
Doesn't matter which.
What happens in each step that actually gives you information by which to search for the sorted permutation?
The core step of any of those algorithms is comparison.
Comparing two items tells you either that the first is greater than the second, or that the second is greater than the first.
(We may collapse equality into one of those cases, for the purposes of sorting.)
Thus a comparison has two possible results, and, as a binary value, gives you one bit of information.
That's how much information sorting gets.
How much does it need?
How much information does it take to specify a permutation?
There are n! permutations, regarded as equally likely, so each permutation has an information content of log(n!) bits.
Sith each comparison yields one bit, a comparison sort needs at least log(n!) comparisons.
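Numerically, that information content looks like this (the function name is my own, purely illustrative):

```python
import math

# Bits needed to single out one of the n! permutations; since a comparison
# yields at most one bit, this is also a floor on the number of comparisons.
def bits_to_specify_permutation(n):
    return math.log2(math.factorial(n))

for n in [4, 16, 64]:
    print(n, round(bits_to_specify_permutation(n), 1))
```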
## Factorial approximations
O(log(n!)) isn't very intuitive, nor is it the result claimed at the beginning.
Recall the definition of factorial: n! = 1·2·3·...·(n-1)·n.
What, then, is the definition of log(n!)?
Simplify it as much as makes sense.
log(n!) = log(1·2·3·...·(n-1)·n) = log 1 + log 2 + log 3 + ... + log(n-1) + log n
There are some straightforward bounds we can assert for this.
n log 1 = 0 < (n - 1) log 2 = n - 1 < log 1 + ... + log n < n log n
That expression in the middle will be a lower bound for the number of comparisons needed to sort an n-item list.
So far we see that's between O(n) and O(n log n), inclusive.
We can finish this route to an answer with calculus.
A sum is approximately equal to a corresponding definite integral:
log(n!) = log 1 + ... + log n ≈ ∫_{1}^{n} log x dx.
Integration by parts equates that to (n ln n - n + 1)/(ln 2) = O(n log n).
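If the approximation feels too loose, a numeric spot check (helper names are mine) shows how close the integral tracks the exact sum:

```python
import math

# Exact sum log2(1) + ... + log2(n), i.e. log2(n!).
def log2_factorial(n):
    return sum(math.log2(k) for k in range(1, n + 1))

# The integral estimate (n ln n - n + 1) / ln 2.
def integral_estimate(n):
    return (n * math.log(n) - n + 1) / math.log(2)

for n in [10, 100, 1000]:
    print(n, round(log2_factorial(n), 1), round(integral_estimate(n), 1))
```

The relative error shrinks as n grows, which is all the big-O claim needs.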
## Sorting as searching, in the other sense
Those approximations and integrals at the end might look recklessly nonrigorous or opaque to you.
So here's an entirely different approach.
Consider a semi-magical (but still comparison-focused) sorting algorithm that sorts each element into its place independently of the others.
That is:
1. Make a new list of n items.
2. For each item in the input list:
1. Find its sorted position in the new list.
2. Place it in that position.
This looks to me — and I expect you'd agree — like the most efficient sorting algorithm we could ever get.
There's a loop running thru n items, so, assuming we're using a single thread of computation, the algorithm is at least O(n).
Within the loop is a search step.
What time complexity do we get from searching?
The most efficient way to search a sorted list by comparisons is binary search.
In each step (which takes a comparison), we halve the amount of list to search, so a binary search takes O(log n) steps.
Thus this optimal search procedure takes O(n log n) time.
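The semi-magical procedure can be sketched with Python's `bisect` module (the function name is my own; note that in a real array the placement step itself costs O(n) shifts, which is why this is a lower-bound thought experiment rather than a practical sort):

```python
import bisect

def binary_insertion_sort(items):
    """For each input item, binary-search its sorted position in the
    output list, then place it there."""
    out = []
    for item in items:
        position = bisect.bisect_left(out, item)  # O(log n) comparisons
        out.insert(position, item)                # placement step
    return out

print(binary_insertion_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```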
Theoretically, at least, each time we add an item from the input list to the new list, we search not thru the whole space of the new list, but the remaining empty space, which diminishes as we progress.
So the log n factor isn't really a constant factor.
The total time would be that same sum as before:
log 1 + log 2 + log 3 + ... + log n.
But at least in this case, we get the intuition that the search only gets slightly faster for most of this procedure, so that sum of logarithms doesn't subceed n log n by much.
## What, did you want a *real* proof?
Assume n is even.
If n is odd, use n - 1 instead; the difference is trivial.
log 1 + log 2 + log 3 + ... + log n > log(n/2 + 1) + log(n/2 + 2) + ... + log(n - 1) + log n > (n/2) log(n/2) = (n/2) (log n - 1) = (n/2) log n - n/2
That is, the logarithmic search-costs for the first half of the searches each subceed the full log n search by at most 1, so the total search time must be at least (n/2) log n - n/2.
That's not exactly n log n, but it's O(n log n), for it subceeds n log n by a constant factor and an asymptotically lesser term (n/2).
## Magical exceptions
A correct result can only be violated if it relies on assumptions that are sometimes false.
There are sorting algorithms out there faster than O(n log n):
- odd-even sort, O(n)
- radix sort, O(wn) with w small
- bead sort, O(n)
Odd-even sort is fast sith it bypasses the single-thread assumption: it has, in its full form, a thread for each item in the list.
Radix sort is fast sith it bypasses the comparison assumption: it gets information not by two-way comparisons but by looking at digits, each of which yields log b bits at once in base b, rather than a single bit.
Bead sort appears to be fast by violating the single-thread assumption, but very differently from odd-even sort.
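Of the three, radix sort is the easiest to sketch in ordinary single-threaded code. Here is a minimal least-significant-digit version for non-negative integers (the function name and default base are my own choices):

```python
def radix_sort(numbers, base=10):
    """LSD radix sort: no comparisons between elements, only digit
    inspection, with one stable bucket pass per digit."""
    if not numbers:
        return []
    digits = 1
    while base ** digits <= max(numbers):
        digits += 1
    for d in range(digits):
        buckets = [[] for _ in range(base)]
        for number in numbers:
            buckets[(number // base ** d) % base].append(number)
        numbers = [number for bucket in buckets for number in bucket]
    return numbers

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

The running time is O(wn), where w is the number of digit passes — fast exactly when the keys are short relative to the list.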