Search in a sorted matrix with an oracle

Consider a infinite matrix M. Another way to think about it is a function f:\N\to \N\to X. A matrix M is sorted if M_{i,j}\leq M_{i,j+1} and M_{i,j}\leq M_{i+1,j}.

Problem1

Given a sorted matrix M, and an oracle that takes \lambda returns if a value is \lambda <\lambda^* or \lambda \geq \lambda^*. Find the largest value no larger than \lambda^*.

Assuming there are at most k elements no larger than \lambda^*, and we know the smallest n and m such that M_{i,j}>\lambda^* if i>n or j>m. Also, let t be the smallest number such that M_{t,t}>\lambda^*. One can see that t\leq \min(n,m) and k=O(\max(n,m)^2).

Let's first consider the case when n and m is known and n\leq m. It is Leetcode 240. Search a 2D Matrix II. However, our problem is more general, because comparison with \lambda^* can only be done through the oracle. Craig Gidney wrote about an optimal algorithm with O(n\log \frac{m}{n}) running time, matrix access algorithm. However, the oracle access is too large. There are times where the oracle access is slow. For example, when using it as a subroutine for finding a bottleneck k-link path. There is an algorithm with optimal running time and O(\log(nm)) oracle access.

Let's consider a special case, where n=m=2^k for some k. This case was shown in [1]. For each submatrix, the two vertices on the opposite diagonal indicates the largest and smallest element in the submatrix. Hence each matrix can be represented by two numbers, indicate the maximum and minimum. These numbers are called the representative of the matrix. The idea is we keep two numbers \lambda_1 and \lambda_2, such that we know \lambda^*\in [\lambda_1,\lambda_2]. The algorithm keep partition the matrix into small matrices, updating \lambda_1 and \lambda_2, and discard matrices outside the range. We apply the following algorithm. Let R consists of the multiset of representatives of the matrix, and R' be the representatives that lies inside [\lambda_1,\lambda_2]. We find \lambda, the median of R'. Test if \lambda<\lambda^*. If so, then \lambda_1 updates to \lambda, otherwise \lambda_2 updates to \lambda. This step is done twice. Now, we split the matrices with more than one element into 4 equally sized matrices, and repeat the algorithm. Recall at all times, the matrices does not contain any element in [\lambda_1,\lambda_2] are discarded.

There is at most O(\log n) iterations before range shrinks to a single element, hence at most O(\log n) oracle calls. The difficulty is to show that the overall time is only O(n). Intuitively, in each iteration we quadruple the number of matrices, but we half it by two calls to the oracle. Therefore in \log n steps we obtain roughly 2^{\log n}=O(n) matrices. However, at this point, the matrices are all singletons, and no more matrix can be created. We will only decrease the number of matrices by each oracle call. Careful reader can trace the whole argument in Lemma 2.1 of [1].

For the more general case, one can find the proof in [2]. Note the proof is for selection, but one can easily modify it to work for search.

Now, n and m is not known, but we can quickly using exponential search to find it. Indeed, we just have to apply exponential search in the first row and first column using the oracle. This gives us an extra O(\log n + \log m)=O(\log nm) oracle calls.

Actually, better running time is possible. Use exponential search until we find the first i such that M_{2^i,2^i}>\lambda^*. So we can upper bound t. Then one can solve the problem with 2 matrices. One t\times k matrix and a k\times t matrix. The total running time is therefore O(\log k+t\log k/t)=O(t\log k). In fact, we get O(\log k) oracle calls and O(t\log k) running time. Note t will never approach k. The running time is O(t\log k).

1 Remark

There is an alternative algorithm which can be found in [3]. The alternative algorithm is quite close to a post about selection in a sorted matrix.

The careful reader might observe the known search algorithms follow the exact same structure as algorithms for selection. Indeed, we are doing selection but we do not know the rank of the element. Intuitively, many selection algorithm, the rank is only used to remove the correct set of candidates. Hence this suggest one can modify the algorithm to use the oracle call in place of the rank.

References

[1] G.N. Frederickson, S. Zhou, Optimal parametric search for path and tree partitioning, CoRR. abs/1711.00599 (2017).

[2] G. Frederickson, D. Johnson, Generalized selection and ranking: Sorted matrices, SIAM Journal on Computing. 13 (1984) 14–30 10.1137/0213002.

[3] R. Jacob, Binary search on two-dimensional data, Technische Universität München, 2008.

Posted by Chao Xu on 2019-01-30.
Tags: algorithm, data structure.