 # Sum over products of weighted subset of certain size

Consider a commutative semiring $(R,+,\cdot)$. $\mathbb{0}$ is the identity for $(R,+)$, and $\mathbb{1}$ is the identity for $(R,\cdot)$. Let $f,g:V\to R$, $w:V\to \mathbb{N}$ and $Z\subset \mathbb{N}$. It is common that we are interested in computing expressions of the following form.

$\sum_{S\subset V, \sum_{x\in S} w(x) \in Z} \prod_{x\in S} f(x) \prod_{x\in V\backslash S} g(x)$

Examples:

1. If $w(x)=1$ for all $x$, and $f(x)$ be the probability that event $x$ occurs, $g=1-f$, we find the probability that the number of event occurs $t$ times, where $t\in Z$. In probability, this is computing the Poisson distribution.

2. If $(R,+,\cdot) = (\N,+,\cdot)$, $f=g=1$, for all $x$ and $w(x)=x$ and $V\subset \N$ and $Z=\{t\}$, then we find the number of subsets that have element sum $t$.

3. If $(R,+,\cdot) = (\N,\max,+)$, $V\subset \N$, $g=0$ and $Z=\{0,\ldots,W\}$, then this solves the knapsack problem with knapsack size $W$, value $f$ and cost $w$.

4. An actual application inspired this post: An automated test suite that runs $n$ subtests, and it is allowed to rerun a subtest if it fails the first time. A subtest passes if first run passes or the rerun passes. The test is successful if all the subtests passes and the number of total reruns is at most $k$. Assume probability of passing is independent for each subtest. One want to estimate the probability of a successful test given the probability a run passes for a specific subtest.

Let $\max Z = k$ and $|V| = n$. The naive algorithm runs in $O(n2^n)$ time (assuming semiring operation takes $O(1)$ time). There is a common transformation that turns this problem that sum over all subsets to a problem that sums over $Z$. So it runs in $O(nk)$ time.

Let $V=\{v_1,\ldots,v_n\}$ and $V_j = \{v_1,\ldots,v_j\}$. Define $D(i,j) = \sum_{S\subset V_j, \sum_{x\in S} w(x) = i} \prod_{x\in S} f(x) \prod_{x\in V\backslash S} g(x)$.

Certainly, $\sum_{S\subset V, \sum_{x\in S} w(x) \in Z} \prod_{x\in S} f(x) \prod_{x\in V\backslash S} g(x) = \sum_{i\in Z} D(i,n)$

We only incur a $O(k)$ number of semiring operations once we compute all $D(i,n)$ for $0\leq i\leq k$.

Let $[P]$ be the Iverson bracket notation, namely

$[P] = \begin{cases} \mathbb{1} & \text{if } P \text{ is true;}\\ \mathbb{0} & \text{otherwise.} \end{cases}$

Theorem1
1. $D(i,0) = [i \neq 0]$
2. For $j\geq 1$, $D(i,j) = [i\geq w(v_j)] f(v_j)D(i-w(v_j),j-1) + g(v_j) D(i,j-1)$.
Proof

The base case can be verified easily, we show part of a inductive step.

\begin{aligned} f(v_j)D(i-w(v_j),j-1) + g(v_j)D(i,j-1) &= f(v_j) \sum_{S\subset V_{j-1}, \sum_{x\in S} w(x) = i-w(v_j)} \prod_{x\in S} f(x) \prod_{x\in V\backslash S} g(x) + g(v_j) \sum_{S\subset V_{j-1}, \sum_{x\in S} w(x) = i)} \prod_{x\in S} f(x) \prod_{x\in V\backslash S} g(x) \\ &= \sum_{v_j\in S\subset V_j, \sum_{x\in S} w(x) = i} \prod_{x\in S} f(x) \prod_{x\in V\backslash S} g(x) + \sum_{v_j\not\in S\subset V_j, \sum_{x\in S} w(x) = i} \prod_{x\in S} f(x) \prod_{x\in V\backslash S} g(x)\\ &= \sum_{S\subset V_j, \sum_{x\in S} w(x) = i} \prod_{x\in S} f(x) \prod_{x\in V\backslash S} g(x) \end{aligned}

Posted by Chao Xu on .
Tags: algorithm.