The Art Gallery Guardian

Maximum sum \(k\)-disjoint subarrays


The maximum subarray problem is a classic: given an array of numbers, find the contiguous subarray with the maximum sum.

There are many generalizations, for example to higher dimensions: in 2D, for an \(n\times n\) matrix, a common solution takes \(O(n^3)\) time. It can also be generalized to trees, where the nodes have weights, and even further to the maximum weight connected subgraph problem, which is NP-hard even for planar graphs.

We consider a similar problem: find \(k\) disjoint subarrays such that their total sum is maximized. In fact, this is the Maximum Subarray III problem on LintCode.

There is an obvious \(O(nk)\) algorithm, obtained by extending the dynamic programming algorithm for the \(k=1\) case.
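As a reminder, here is a minimal Python sketch of the \(k=1\) dynamic program (Kadane's algorithm):

```python
def max_subarray(A):
    """Kadane's algorithm: maximum sum of a non-empty subarray, O(n)."""
    best = cur = A[0]
    for x in A[1:]:
        # either extend the best subarray ending at the previous element,
        # or start a new subarray at x
        cur = max(cur + x, x)
        best = max(best, cur)
    return best
```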

Let \(f(i,k)\) be the maximum value obtainable by \(k\) disjoint subarrays of \(A[1..i]\), and let \(g(i,k)\) be the same quantity under the additional requirement that the last subarray contains the \(i\)th element. The recurrences \(f(i,k) = \max(f(i,k-1),f(i-1,k),g(i,k))\) and \(g(i,k) = \max(g(i-1,k)+A[i],f(i-1,k-1)+A[i])\) solve the problem.
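Here is a minimal Python sketch of this dynamic program; the tabulation and the "at most \(k\) subarrays" reading of \(f\) (suggested by the \(f(i,k-1)\) term) are my own choices:

```python
def max_k_subarrays(A, k):
    """O(nk) dynamic program for the recurrences above.

    f[i][j]: maximum sum of at most j disjoint subarrays of A[1..i].
    g[i][j]: the same, but the last subarray must end at index i.
    """
    n = len(A)
    NEG = float('-inf')
    f = [[NEG] * (k + 1) for _ in range(n + 1)]
    g = [[NEG] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        f[i][0] = 0  # zero subarrays contribute nothing
    for i in range(1, n + 1):
        x = A[i - 1]  # the Python list is 0-indexed
        for j in range(1, k + 1):
            # extend the subarray ending at i-1, or start a new one at i
            g[i][j] = max(g[i - 1][j] + x, f[i - 1][j - 1] + x)
            f[i][j] = max(f[i][j - 1], f[i - 1][j], g[i][j])
    return f[n][k]
```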

This kind of solution would probably be acceptable in an interview. But can we do better? It is in fact possible to get \(O(n\log n)\) with some care.

Without loss of generality, assume the array is alternating: the values at odd indices are positive and the values at even indices are negative. If we have the solution for the \(k\) case, we can get a solution for the \(k-1\) case by either discarding one of the subarrays or "merging" two adjacent subarrays by taking the negative piece between them.
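This alternating form is easy to obtain. Here is a hedged sketch (my own code): collapse each run of same-signed numbers into its sum, drop zeros, and trim leading and trailing negatives, none of which can improve any subarray. Splitting a collapsed positive run back into several subarrays never changes the total, so the optimum is preserved as long as \(k\) does not exceed the number of positive runs.

```python
def make_alternating(A):
    """Reduce A to alternating form: collapse same-sign runs,
    drop zeros, and trim leading/trailing negatives."""
    runs = []
    for x in A:
        if x == 0:
            continue  # zeros never change any sum
        if runs and (runs[-1] > 0) == (x > 0):
            runs[-1] += x  # same sign as the previous run: collapse
        else:
            runs.append(x)
    # an optimal subarray never starts or ends with a negative value
    while runs and runs[0] < 0:
        runs.pop(0)
    while runs and runs[-1] < 0:
        runs.pop()
    return runs
```

For example, make_alternating([1, 2, -1, 3, -4, -1, 6, -7]) returns [3, -1, 3, -5, 6].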

This shows that once we have the solution for the \(k\) case, we can "contract" an entire subarray into a single value. Csűrös showed that a single merge operation suffices [1]: find an array element with minimum absolute value, say \(A[i]\), replace it by \(A[i-1]+A[i]+A[i+1]\), and remove \(A[i-1]\) and \(A[i+1]\) from the array. (For the boundary cases, assume \(A[0]=A[n+1]=0\).) The idea is that a merge can "discard" a value, and a merge is also equivalent to taking a negative piece and then contracting. This operation is repeated until exactly \(k\) positive numbers remain, at which point the best solution is to take all \(k\) of them.

Thus we get an \(O(n\log n)\) greedy algorithm: keep merging, and keep track of the item with minimum absolute value using a heap. Interestingly, this algorithm was also suggested by students in CS 473. Hsien-Chih and I convinced ourselves it is correct by failing to find counterexamples to the greedy approach.
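Here is a hedged Python sketch of this greedy (the lazy-deletion heap and the array-based doubly linked list are my own implementation choices; the sentinels play the role of \(A[0]=A[n+1]=0\)):

```python
import heapq

def greedy_k_disjoint_sum(B, k):
    """O(n log n) greedy merge. Assumes 1 <= k and that B (say, the
    output of make_alternating) has more than k positive entries;
    otherwise the answer is simply the sum of all positive entries."""
    val = [0] + list(B) + [0]        # sentinels stand in for A[0] = A[n+1] = 0
    n = len(val)
    left = list(range(-1, n - 1))    # array-based doubly linked list
    right = list(range(1, n + 1))
    alive = [True] * n
    positives = sum(1 for v in B if v > 0)
    heap = [(abs(v), i) for i, v in enumerate(val) if 0 < i < n - 1]
    heapq.heapify(heap)

    while positives > k:
        a, i = heapq.heappop(heap)
        if not alive[i] or a != abs(val[i]):
            continue                 # stale heap entry, skip
        l, r = left[i], right[i]
        merged = val[i]
        if val[i] > 0:
            positives -= 1
        new_l, new_r = l, r
        if l != 0:                   # absorb the left neighbor (not a sentinel)
            merged += val[l]
            if val[l] > 0:
                positives -= 1
            alive[l] = False
            new_l = left[l]
        if r != n - 1:               # absorb the right neighbor (not a sentinel)
            merged += val[r]
            if val[r] > 0:
                positives -= 1
            alive[r] = False
            new_r = right[r]
        val[i] = merged              # reuse slot i for the merged element
        left[i], right[i] = new_l, new_r
        right[new_l], left[new_r] = i, i
        if merged > 0:
            positives += 1
        heapq.heappush(heap, (abs(merged), i))

    # the k surviving positive elements are the k subarray sums
    return sum(val[i] for i in range(1, n - 1) if alive[i] and val[i] > 0)
```

Continuing the earlier example, greedy_k_disjoint_sum([3, -1, 3, -5, 6], 2) merges the \(-1\) (the smallest absolute value) into \(3-1+3=5\) and returns \(5+6=11\).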

One can see that the smallest absolute value never decreases throughout the algorithm. So instead of repeatedly finding and merging the item with the smallest absolute value, what if we merge every item with absolute value smaller than some threshold \(t\)? There are three possibilities: we picked \(t\) so nicely that after all the merges exactly \(k\) positive elements are left; we picked \(t\) too large and fewer than \(k\) positive elements remain; or we picked \(t\) too small and more than \(k\) positive elements remain.
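To make the thresholded variant concrete, here is a toy quadratic sketch of the test (my own illustration, not the efficient machinery of [2]):

```python
def positives_after_merges_below(B, t):
    """Merge while the minimum absolute value is below t, then count
    the surviving positive elements."""
    A = [0] + list(B) + [0]  # sentinel zeros at both ends
    while len(A) > 3:        # a lone element merging with zeros is a no-op
        i = min(range(1, len(A) - 1), key=lambda j: abs(A[j]))
        if abs(A[i]) >= t:
            break
        lo, hi = max(i - 1, 1), min(i + 1, len(A) - 2)
        A[lo:hi + 1] = [A[i - 1] + A[i] + A[i + 1]]  # keep the sentinels
    return sum(1 for x in A[1:-1] if x > 0)
```

One could then search over \(t\) for the value that leaves exactly \(k\) positive elements; the challenge is doing this search, and the merges themselves, efficiently.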

Bengtsson and Chen used this idea [2]. They showed that \(t\) can be guessed in a way such that some measure of the problem size decreases by at least \(2/3\), and also showed how to keep track of the merges so the whole algorithm takes \(O(n\alpha(n))\) time. Later, they removed the need for the union-find data structure and improved the time bound to \(O(n)\) [3].

Open problem: can we find a linear-time algorithm for trees (with weights on either the edges or the vertices)?

References

[1] M. Csűrös, Maximum-scoring segment sets, IEEE/ACM Trans. Comput. Biology Bioinform. 1 (2004) 139–150. doi:10.1109/TCBB.2004.43.

[2] F. Bengtsson, J. Chen, Computing maximum-scoring segments in almost linear time, in: Computing and Combinatorics, 12th Annual International Conference, COCOON 2006, Taipei, Taiwan, August 15–18, 2006, Proceedings, 2006: pp. 255–264. doi:10.1007/11809678_28.

[3] F. Bengtsson, J. Chen, Computing maximum-scoring segments optimally, Luleå University of Technology, 2007.

Posted by Chao Xu on 2014-10-13.
Tags: classical, algorithm.