Subset sum through balancing

This is a note for Pisinger’s balancing algorithm for subset sum [1]. Let $S$ be the set of all subset sums of $S$ . The subset sum problem, the input is $S \subset N$ , and we are interested in checking if $t \in S$ .

We define a variation of the subset sum problem. The balanced subset sum problem. In this problem, we are given a vector $v$ of integers(does not have to be positive). We let $M = ∥ v ∥_{\infty}$ . We are interested in find a subset that sums to $t \in [M]$ .

Theorem1

Each subset sum problem on $n$ elements can be reduced to a balanced subset sum problem in $n$ elements in $O (n)$ time.

Proof

Consider the input to the subset sum problem $S$ and $t$ . Greedily find a subset of elements $S^{'}$ , such that adding any other element will exceed $t$ . Let $∥ S^{'} ∥_{1} = t^{'}$ . Now, we negate all the elements in $S^{'}$ , and ask for balanced subset sum with input set $- S^{'} \cup (S ∖ S^{'})$ and target number $t - t^{'}$ .

We partition $S$ into $A = [- M ..0] \cap S$ and $B = S ∖ A$ . We also define $A_{i} = {a_{1}, \dots, a_{i}}$ and $B_{i} = {b_{1}, \dots, b_{i}}$ .

A set is balanced by the following recursive definition. Let $S$ be a set.

$S = \emptyset$ is balanced.
$∥ S ∥_{1} > t$ , then $S \cup {a}$ is balanced, where $a \in A$ .
$∥ S ∥_{1} \leq t$ , then $S \cup {b}$ is balanced, where $b \in B$ .

Consider a set $T$ , such that $(i, j, k) \in T$ if and only if $k$ is a subset sum of $A_{i} \cup B_{j}$ . Certainly, we are interested if $(∣ A ∣, ∣ B ∣, t)$ is in $T$ . However, the state space is already $O (n^{2} M)$ , which is no better than the standard dynamic programming algorithm.

There is a nice dominance relation. If $(i, j, k) \in T$ , then for $(i^{'}, j^{'}) \geq (i, j)$ , we have $(i^{'}, j^{'}, k) \in T$ . We can ask for each $k$ , what are all the minimal $(i, j)$ pairs where $(i, j, k) \in T$ . Such value will be $g (j, k)$ . Formally, $g (j, k) = min {i ∣ (i, j, k) \in T}$ , one can see that $g (j, k) \geq g (j + 1, k)$ . Also, we know the solution corresponding to $g (j, k)$ must contain $a_{g (j, k)}$ as an element.

One can get a recurrence relation for $g$ as below.

$g (j, k) = min ⎩ ⎨ ⎧ g (j - 1, k) g (j - 1, k - b_{j}) i if k - b_{j} \leq t if k - a_{i} > t and i > g (j, k - a_{i})$

Fix a $k$ and $j$ , let $i$ to be as small as possible, such that there is $A_{i}^{'} \subset A_{i}$ and $B_{j}^{'} \subset B_{j}$ such that $A_{i}^{'} \cup B_{j}^{'}$ is balanced and sums to $k$ . Note that $a_{i} \in A_{i}^{'}$ .

We obtained $A_{i}^{'} \cup B_{j}^{'}$ by inserting an element in $B$ or $A$ to another balanced set. If the inserted element is in $B$ , but not $b_{j}$ , then we know $i = g (j - 1, k)$ . If it is $b_{j}$ , then $i = g (j - 1, k - b_{j})$ . If the last inserted is $a_{i}$ , then $g (j, k) = i$ . Note we observe in this case, $g (j, k - a_{i}) < i$ . A direct dynamic programming implementation seems to imply a $O (n^{2} M)$ time algorithm, since there does not seem to be a quick way to obtain $i$ .

On the other hand, if we think bottom up instead of top down, we can obtain a better result. Below is the algorithm.

The value $D [j, k]$ eventually equals $g (j, k)$ . Note we can argue the running time is $O (n M)$ , since for each fixed $k$ , the final for loop can ran at most $n$ times. It is frustrating that the DP algorithm cannot be inferred directly from the recurrence relation. Indeed, we mainly obtained this through the fact that we can prune the search space if we start bottom up, which is unclear from the recurrence relation.

References

[1]

D. Pisinger, Linear time algorithms for knapsack problems with bounded weights, Journal of Algorithms. 33 (1999) 1–14 10.1006/jagm.1999.1034.

Posted by Chao Xu on 2018-12-18.

Tags: algorithms, subset sum.