It's well know that K-selection is a problem that can be solved in O(N logN) average time by using an approach similar to the one of QuickSort . Anyway, the worst-case is quadratic even when you adopt a randomization step (here you have the C++ code for Quickselect).
The above paper introduces a beautiful variant of classical selection. The key idea is to partition the input data array A in groups of five elements. Then:
- Within each group G[i], find the median(G[i]) in constant time by direct comparison of the 5 elements; This step is at most linear;
- Take the median of each group and find M, the median of medians (median(median(G, .... G[floor(n/5)+1]) by a recursive call; This steps takes T(n/5);
- Use M to partition the input by calling the quickselect algorithm recursively; This step takes at most T(7n/10) -- see below;
- Among the n/5 medians, n/10 are larger then M, being M the median;
- For each median, at least two other elements in A are larger than the median by construction;
- Therefore, GTM is at least 3n/10;
- For symmetry, LTM is at least 3n/10;
The total cost is therefore T(n) <= Θ(N) + T(n/5) + T(7n/10)
This recursion equation is linear (it can be proven by substitution method). The intuition is that 1/5 + 7/10 = 9/10 and therefore the geometric series is converging.