The Art Gallery Guardian

Selection in a sorted matrix


A matrix is sorted if every row and column are non-increasing.

A common interview problem asks for the kth smallest number in a sorted matrix, and usually people give an O(k\log k) algorithm. Wlog we can assume all the numbers in the matrix are distinct, as we can always break the ties by factoring in the position of the number in the matrix.

There is a O(k) time solution. In fact, O(\min(k, m)) if it's a n\times m matrix and n\leq m.

I'm frustrated that there isn't a good description of such an algorithm. The most common reference people provide is [1]. However there are a few downsides of that paper. It only works when n=m and it is still a bit complicated. So here I will give a short description to a modified version of the algorithm.

One can see the idea closely resemble what happens in fractional cascading, we basically squeeze the unknown values between known values, so we don't have to look at most of the unknown values.

First, we assume the matrix we care about is a n\times m matrix A. Both n and m are power of 2 and n\leq m.

Let A_o be the matrix we get by removing all even index columns from A, and add the last column.

In [1], they used A_{o,o}, which is defined as only retain positions with odd coordinates, but I found it nicer to consider things by only stripping one coordinate. In particular, it gave a much nicer proof for the following result.

Definition1

r(a,A) is the number of elements in matrix A smaller or equal to a.

Theorem2

Let A be a sorted n\times m matrix, then 2(r(a,A_o) - n) \leq r(a,A)\leq 2r(a,A_o).

Proof

For any fixed i, let f(i) be the largest j, such that A_{i,j}\geq a. r(a,A)=\sum_{i=1}^n f(i), r(a,A_o)=\sum_{i=1}^n \lceil f(i)/2 \rceil \leq r(a,A)/2 +n. On the other hand \sum_{i=1}^n f(i)/2 \leq r(a,A_o).

This means if we want to find an element of rank k in A, we can first find element of rank k/2+n and k/2 in A_o, and we know the solution would be in between. The remaining operation takes O(m) time:

  1. Use a flood fill starting from the position of k/2th element in A_o and find at most 2n positions where the element of rank k could reside. Namely it select elements in A that is in between the k/2th element and k/2+nth in A_o. We can do this because all these elements are connected in a component.
  2. While doing the flood fill, it can also find the rank of the k/2th element in A_o in the matrix A for no extra cost.
  3. A linear time selection algorithm on all the elements resulted from the flood fill.

This would give us a recursive algorithm if instead of just finding kth number, it finds the k_1th and k_2th number at the same time. As long as |k_1-k_2|=O(m), we can find them both only with a O(m) extra time. k_1 and k_2 will be the upper and lower bounds respectively. Some basic algebra shows |k_1-k_2|=O(m) inside each recursive step if we start with k_1=k_2=k.

Let T(n,m) be the time used when the matrix is of size n\times m, and n\leq m. Certainly T(1,m)=1, and the rest follow the recursive relation T(n,m) = cm + T(n,m/2) for some constant c.

If n>m, we can rotate the matrix implicitly to get the same running time.

Solving it gives us the desired running time O(m). This is also a O(k) time algorithm because we only need to consider the k\times k submatrix in the up-left position.

A Haskell implementation here, it requires a linear time rank selection algorithm selectRank.

Note a slightly faster algorithm also exists when n and m are very different. It was shown selecting the kth element can be done in O(n\log m/n) time [2].

References

[1] A. Mirzaian, E. Arjomandi, Selection in x + y and matrices with sorted rows and columns, Information Processing Letters. 20 (1985) 13–17.

[2] G. Frederickson, D. Johnson, Generalized selection and ranking: Sorted matrices, SIAM Journal on Computing. 13 (1984) 14–30 10.1137/0213002.

Posted by Chao Xu on 2014-04-02.
Tags: .