# List the smallest $$k$$ subset sums

Problem1

Given a set of positive reals $$\set{x_1,\ldots,x_n}$$ where $$x_1<x_2<\ldots<x_n$$, find the smallest $$k$$ subset sums.

We can assume $$n\leq k$$, because we do not have to read $$x_j$$ if $$j>k$$.

Torsten Gross and Nils Blüthgen posted a $$O(k^2)$$ time solution on arXiv.

We show a $$O(k\log k)$$ time algorithm, which is optimal if we want to output the numbers in order.

We list the sums one by one by maintaining a priority queue of sums. We start with the empty set. Assume that we added the sum induced by $$I\subset [n]$$ (that is, $$\sum_{i\in I} x_i$$) into the output, let $$j=1+\max I$$. Now we can consider two possibilities by extending the current solution: the sum induced by $$I\cup \set{j}$$ or the sum induced by $$I\cup \set{k}$$ where $$k>j$$. We will add both possibilities to the queue so that one can inspect them later. We can avoid storing the sets, only the values are required.

Here is a python implementation.

def first_k_subset_sums(x,k):
n = len(x)
h = []
output = [0] # need to account for the empty set
heapq.heappush(h,(x[0],0))
while h and len(output)<k:
(u,b) = heapq.heappop(h)
output.append(u)
if b+1<n:
heapq.heappush(h,(u+x[b+1],b+1))
heapq.heappush(h,((u-x[b])+x[b+1],b+1))
return output

If we want to output the sets themselves, not just the values, does the running time change? If a set $$I$$ is in the output, then all subsets of $$I$$ must also be in the output. Hence the largest set we can ever output has size $$O(\log k)$$. Therefore the total output length is at most $$O(k\log k)$$.

This is also a lower bound. Consider when $$x_i=2^i$$, then we will output all subsets of $$\set{x_1,\ldots,x_{\log k}}$$, and we know that $$\sum_{i=1}^{\log k} i{\log k\choose i} = \Omega(\log k)$$.

If we don't have to list the smallest $$k$$ subset sum values in order, then $$O(k)$$ is possible, see this mathoverflow answer by David Eppstein.

If we are interested in the smallest $$k$$ distinct subset sum. I don't know of any algorithm that performs better than $$O(nk)$$, even if we know that $$n=\Omega(k)$$.

# Acknowledgements

I would like to thank Tana Wattanawaroon for helpful discussions and taking an interest in this problem.

Posted by Chao Xu on 2017-04-20.
Tags: algorithm.