The Art Gallery Guardian

Two problem related to sequence of sets


Problem1

Given a sequence of sets S1,,SnS_1,\ldots,S_n with a total of mm elements. Partition [n][n], such that if i,ji,j is in the same partition class, then Si=SjS_i = S_j.

Solve the problem by building a trie over the lexicographic ordering of the elements in the set. Since the alphabet has size nn, it has running time O(mlogn)O(m\log n). One can get better running time using integer data structures, say O(mloglogn)O(m\log \log n) using van Emde Boas tree.

O(m)O(m) time is actually possible. For each kk, we build the set Hk={jkSj}H_k = \set{j | k\in S_j} (as a list). We define equivalent relation k\equiv_k as ikji\equiv_k j if Si[k]=Sj[k]S_i\cap [k] =S_j\cap [k]. If we have equivalent class of k\equiv_k, we can obtain the equivalent class of k+1\equiv_{k+1} in O(Hk)O(|H_k|) time. Hence together the running time is O(m)O(m).

Problem2

Given a sequence of sets S1,,SnS_1,\ldots,S_n containing a total of mm integers, and a integer kk. Decide if there exists ii and jj such that iji\neq j and SiSjk|S_i\cap S_j|\geq k.

We assume the elements in the sets are in [m][m]. Let S=i=1nSiS=\bigcup_{i=1}^n S_i.

For k=0,1k=0,1, we can solve it in O(m)O(m) time: Decide if any element appears more than once in the sets.

For larger kk, we shall compute SiSj|S_i\cap S_j| for every pair ii and jj. To do this, we start with an all zero n×nn\times n matrix CC. At the end of the algorithm, Ci,j=SiSjC_{i,j} = |S_i\cap S_j| for all i,j[n]i,j\in [n]. For each element xx, we find Ex={ixSi}E_x = \set{i|x\in S_i}. This takes O(m)O(m) time. We increment Ci,jC_{i,j} for all i,jExi,j\in E_x. We claim this algorithm have running time O(nm)O(nm). Indeed, for each xx, we spend Ex|E_x| time in incrementing Ci,jC_{i,j} where i,jExi,j\in E_x. Hence the running time is bounded by xSEx2\sum_{x\in S} |E_x|^2. We know xSEx=m\sum_{x\in S} |E_x|=m and Exn|E_x|\leq n. We see the worst case is when Ex=n|E_x|=n and S=m/n|S|=m/n. In that case, we have running time O(xSn2)=O(mn)O(\sum_{x\in S} n^2)=O(mn).

Since we just want to find a pair {i,j}\set{i,j} where SiSjk|S_i\cap S_j|\geq k. We can stop the algorithm as soon as Ci,jkC_{i,j}\geq k for some ii and jj. This means we can increment at most (k1)n2(k-1)n^2 times.

Together, the running time become O(min(nm,kn2+m))O(\min(nm,k n^2+m)).

For k=2k=2. One can improve the running time when nn is large by reduce it to a problem similar to finding rectangles or finding a C4C_4 in the incident graph. Let nn' be iSi|\bigcup_i S_i|, we can obtain a more refined bound. Together, the final running time for k=2k=2 is O(min(m4/3,dm,n2+m))O(\min(m^{4/3}, dm, n^2+m)). Here dd is the degeneracy of the incident graph of the sets and the elements, which is bounded above by the maximum degree.

Recently, I had some result for k=3k=3, where the running time improves to O(m28/15)O(m^{28/15}).

Posted by Chao Xu on .
Tags: algorithm.