Two problems related to sequences of sets

Problem 1

Given a sequence of sets $S_{1}, \dots, S_{n}$ with a total of $m$ elements, partition $[n]$ such that $i$ and $j$ are in the same partition class if and only if $S_{i} = S_{j}$.

One way to solve the problem is to build a trie, inserting each set as the string of its elements in sorted order. Since the alphabet has size $n$, this has running time $O(m \log n)$. One can get a better running time using integer data structures, say $O(m \log \log n)$ using a van Emde Boas tree.

$O(m)$ time is actually possible. For each $k$, we build the set $H_{k} = \{j \mid k \in S_{j}\}$ (as a list). We define the equivalence relation $\equiv_{k}$ by $i \equiv_{k} j$ if $S_{i} \cap [k] = S_{j} \cap [k]$. Given the equivalence classes of $\equiv_{k}$, we can obtain the equivalence classes of $\equiv_{k + 1}$ in $O(|H_{k + 1}|)$ time: only the indices in $H_{k + 1}$ have to move, each into a new class determined by its old class. Since $\sum_{k} |H_{k}| = m$, the total running time is $O(m)$.
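The refinement step can be sketched in Python as follows. This is only an illustration under some assumptions: the function name `partition_equal_sets` is made up here, the sets are given as Python iterables, and hash tables are used, so the linear bound holds in expectation rather than in the worst case. The elements are processed in dictionary order rather than as $k = 1, 2, \dots$, which does not affect the final partition.

```python
from collections import defaultdict

def partition_equal_sets(sets):
    """Group indices i by equality of sets[i] using partition refinement."""
    n = len(sets)
    # H[k] lists the indices j with k in S_j.
    H = defaultdict(list)
    for j, s in enumerate(sets):
        for k in set(s):  # set() guards against repeated elements
            H[k].append(j)

    cls = [0] * n  # current class id of every index
    next_id = 1
    for members in H.values():
        # Indices containing this element leave their class and move
        # to a fresh class that depends only on the class they left.
        fresh = {}
        for j in members:
            old = cls[j]
            if old not in fresh:
                fresh[old] = next_id
                next_id += 1
            cls[j] = fresh[old]

    groups = defaultdict(list)
    for j, c in enumerate(cls):
        groups[c].append(j)
    return sorted(groups.values())
```

Each element costs time proportional to the length of its list in `H`, which matches the $O(|H_{k}|)$ cost per refinement step.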

Problem 2

Given a sequence of sets $S_{1}, \dots, S_{n}$ containing a total of $m$ integers, and an integer $k$, decide if there exist $i$ and $j$ such that $i \neq j$ and $|S_{i} \cap S_{j}| \geq k$.

We assume the elements in the sets are in $[m]$. Let $S = \bigcup_{i = 1}^{n} S_{i}$.

For $k = 0, 1$, we can solve it in $O(m)$ time. The case $k = 0$ is trivial, and for $k = 1$ we decide if any element appears in more than one set.
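For $k = 1$, a minimal sketch of the check might look like this. The function name `some_pair_shares_element` is hypothetical, and a hash set stands in for an array indexed by $[m]$, so the $O(m)$ bound is in expectation.

```python
def some_pair_shares_element(sets):
    """Return True if some element occurs in two different sets."""
    seen = set()  # elements of all sets processed so far
    for s in sets:
        for x in set(s):  # each element is considered once per set
            if x in seen:
                return True
            seen.add(x)
    return False
```

Within a single set every element is distinct, so a hit in `seen` can only come from an earlier set.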

For larger $k$, we shall compute $|S_{i} \cap S_{j}|$ for every pair $i$ and $j$. To do this, we start with an all-zero $n \times n$ matrix $C$. At the end of the algorithm, $C_{i, j} = |S_{i} \cap S_{j}|$ for all $i, j \in [n]$. For each element $x$, we find $E_{x} = \{i \mid x \in S_{i}\}$. This takes $O(m)$ time in total. We then increment $C_{i, j}$ for all $i, j \in E_{x}$.

We claim this algorithm has running time $O(nm)$. Indeed, for each $x$, we spend $|E_{x}|^{2}$ time incrementing $C_{i, j}$ for $i, j \in E_{x}$. Hence the running time is bounded by $\sum_{x \in S} |E_{x}|^{2}$. We know $\sum_{x \in S} |E_{x}| = m$ and $|E_{x}| \leq n$, so $\sum_{x \in S} |E_{x}|^{2} \leq n \sum_{x \in S} |E_{x}| = nm$. The worst case is when $|E_{x}| = n$ for every $x$ and $|S| = m / n$. In that case, the running time is $O(\sum_{x \in S} n^{2}) = O(mn)$.

Since we just want to find a pair $\{i, j\}$ with $|S_{i} \cap S_{j}| \geq k$, we can stop the algorithm as soon as $C_{i, j} \geq k$ for some $i \neq j$. This means we perform at most $(k - 1) n^{2}$ increments before stopping.

Together, the running time becomes $O(\min(nm, k n^{2} + m))$.
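The counting algorithm with early stopping can be sketched as below. The function name `find_pair_with_large_intersection` is made up for illustration, and the sketch deviates from the description in two labeled ways: it keeps the counts in a dictionary keyed by pairs instead of an $n \times n$ matrix, and it only visits unordered pairs $i < j$.

```python
from collections import defaultdict
from itertools import combinations

def find_pair_with_large_intersection(sets, k):
    """Return (i, j), i < j, with |S_i & S_j| >= k, or None if no pair exists."""
    if k <= 0:
        return (0, 1) if len(sets) >= 2 else None

    # E[x] lists the indices i with x in S_i, in increasing order.
    E = defaultdict(list)
    for i, s in enumerate(sets):
        for x in set(s):
            E[x].append(i)

    count = defaultdict(int)  # sparse stand-in for the matrix C
    for indices in E.values():
        for i, j in combinations(indices, 2):
            count[i, j] += 1
            if count[i, j] >= k:  # early stop
                return (i, j)
    return None
```

For example, `find_pair_with_large_intersection([{1, 2, 3}, {2, 3, 4}, {5}], 2)` returns `(0, 1)`, since the first two sets share the elements 2 and 3.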

For $k = 2$, one can improve the running time when $n$ is large by reducing it to a problem similar to finding rectangles, or finding a $C_{4}$ in the incidence graph. Letting $n'$ be $|\bigcup_{i} S_{i}|$, we can obtain a more refined bound. Together, the final running time for $k = 2$ is $O(\min(m^{4/3}, d m, n^{2} + m))$. Here $d$ is the degeneracy of the incidence graph of the sets and the elements, which is bounded above by the maximum degree.

Recently, I obtained a result for $k = 3$, where the running time improves to $O(m^{28/15})$.

Posted by Chao Xu on 2015-02-08.

Tags: algorithm.