Applications of finger trees

1 Finger trees

Consider a data structure $S$ that maintains a sequence of monoid elements from $(M, \cdot)$ . Say, the sequence is $a_{1}, \dots, a_{n}$ . It has four operations. All time are amortized and not worst case.

$s pl i t (S, p)$ , $p : M \to {0, 1}$ is a monotonic predicate on $M$ , i.e. $p (ab) \geq p (a)$ . It split the sequence at position $t$ , such that $p (\prod_{i = 1}^{j} a_{i}) = 0$ for all $j < t$ , and $p (\prod_{i = 1}^{j} a_{i}) = 1$ for all $j \geq t$ . This takes $O (lo g min (n, n - t))$ time.
$co n c a t (S, T)$ , concatenate sequence represented by $S$ and the sequence represented by $T$ . If the length of the sequences are $n$ and $m$ , respectively, then this operation takes $O (lo g min (n, m))$ time.
$p ro d u c t (S)$ , returns $\prod_{i = 1}^{n} a_{i}$ in $O (1)$ time.
$ma p (S, f)$ , apply $f$ to every entry in the sequence. So $S$ would represent $f (a_{1}), \dots, f (a_{n})$ . This operation takes $O (1)$ time.

There is one operation $E m pt y ()$ that produce an empty sequence.

We have to first make a few assumptions: All monoid operations take $O (1)$ time. All map we consider can be evaluated in $O (1)$ time AND we can compute $h = f \circ g$ in $O (1)$ time. Note this means we compute (a representation) of the function $h$ itself, and we can use this representation to compute $h (x)$ in $O (1)$ time.

Otherwise, all “ $O (T)$ time” should be replaced with “ $O (T)$ monoid operation, function evaluation, function composition and constant time operations.”.

Such data structure exists. The first $3$ operation are supported by finger tree [1]. It is not hard to add the last operation, the idea is to tag nodes with an function $f$ (in the beginning $f$ is just the identity), and it means “apply $f$ to everything below!”. It propagate only when the lower level nodes need to be accessed, therefore the cost would be charged into the other operations. Many other binary search trees probably can implement these operations too. For example, if the dynamic optimality conjecture for splay tree is true, it be best to use splay tree for implementation. However, that’s beside the point. The point is to have an abstract data structure over a monoid sequence. The abstraction deals away with all the mess hidden by the data structure.

2 Extend its power

Finger tree can be extended to query elements by giving indices of the element: take the Cartesian product of the monoid and $(N, +)$ , and we get a new monoid. So we can just assume our data structure has the following extensions:

$s pl i t A t (S, i)$ , split the sequence to two sequence $A$ and $B$ at the $i$ th index in $O (lo g min (n, n - i))$ time.
$in ser t A t (S, i, x)$ , insert element to the $i$ th position.
$d e l e t e (S, i)$ , delete the element at $i$ th position.

Another interesting extension is allowing one to $re v erse (S)$ the sequence in $O (lo g n)$ time. This become much more complicated.

3 Application

There are some common application of finger trees.

Stack, queue, dequeue with product operation, all amortized constant time. As a special case, it would solve the min stack problem.
Random access sequence, which is related to the rope data structure.
Ordered sequence(sorted list).
Priority queues.
Interval trees, segment trees.

Most of the above has been described in [1], but we will talk about two specifics that usually not mentioned by others.

3.1 Merge sorted lists in optimal time bound

Theorem1

If $S$ and $T$ are two ordered sequences of length $n$ and $m$ , respectively. $n \leq m$ . Both ordered sequences are represented by finger trees. Compute the finger tree representation of $S \cup T$ takes $O (n lo g \frac{m}{n})$ time.

Split $T$ into $n$ pieces one by one by split along the $i$ th element of $S$ to the second part of the $i$ th produced piece for all $i$ , then concatenate all of them. The splitting takes $\sum_{i = 1}^{n} lo g t_{i}$ time, where $t_{i}$ is the size of the $i$ th piece. By concavity, we have the time for split is $\sum_{i = 1}^{n} lo g t_{i} \leq n lo g \frac{m}{n}$ , and concatenation time is similar.

3.2 Solve the Klee’s measure problem

This section shows how the map operation is quite crucial because we can make range updates.

The motivation came from the following question. Can finger tree substitute for the popular competitive programming data structure segment tree, a special case of the real segment tree? This is not possible, and it’s not because the large hidden constants. The abstract definition in this article goes cannot do the following common operation on an sequence of integers: increment the $i$ th number by $i$ . This operation make little sense if we allow insert and deletions, but many use of segment tree do not consider insert and deletes.

Fortunately, Klee’s measure problem, the problem that caused the invention of segment tree can be solved with a finger tree. We show how to solve it in 1D. It’s well known how to use it for solving the $n$ D version.

We produce another data structure, such that each operation takes $O (lo g n)$ amortized time. Where $n$ is the number of times we called insert.

We maintain a collection of intervals with data structure $D$ , there are 3 operations.

$in ser t (D, a, b)$ , insert interval $(a, b)$ into the collection.
$d e l e t e (D, a, b)$ . Delete a interval $(a, b)$ from the collection, we assume it’s always a interval inserted before.
$m e a s u re (D)$ . Find the measure of the union of all intervals in the collection.

Remark

It can be sharpened such that $n$ is the number of intervals inside the data structure. We just do a global rebuild if the number of interval doubled or reduced to half since the last global update. Also, $m e a s u re ()$ actually only take constant time.

The finger tree stores a sequence of elementary intervals that partitions the space, and how many copies of that interval exists. Thus we can represent it as $((l, r), c)$ , where $(l, r)$ is the elementary interval with left and right boundary, and $c$ is the number of copies.

To insert a interval, consider it’s left point $l$ . The data structure find a elementary interval contains $l$ , and split it into two elementary intervals. Similarly, we do the same with the right endpoint. This takes $O (lo g n)$ time, and we increase the number of elementary intervals by at most $2$ . $in cre m e n t$ is an automorphism defined as $in cre m e n t (((a, b), c)) = ((a, b), c + 1)$ . For each insertion of interval $(a, b)$ , $in cre m e n t$ gets applied to all elementary interval in the range. For deletion, we apply $d ecre m e n t = in cre m e n t^{- 1}$ instead.

The monoid product is simply $\prod_{i = 1}^{n} ((l_{i}, r_{i}), c) = \sum_{i = 1}^{n} min (c, 1) (r_{i} - l_{i})$ , and this is exactly what $m e a s u re (D)$ should output.

3.3 The local ranking sequence

The ranking sequence of a sequence of distinct numbers $a_{1}, \dots, a_{n}$ is defined as $b_{1}, \dots, b_{n}$ , where $a_{i}$ is the $b_{i}$ th smallest element. Given a sequence of unique integers $a_{1}, \dots, a_{n}$ , we want to design a data structure to query $Q (i, j)$ , which returns the ranking sequence of $a_{i}, \dots, a_{j}$ .

This can be solved by storing sorted subsequences. We just have to return sorted sequence of $a_{i}, \dots, a_{j}$ during the query, and then it become simple to figure out the ranking sequence. This will make sure the running time for $Q (i, j)$ to be $O (j - i)$ . There are a few other variants, but seems quite hard to adopt a finger tree for [2].

References

[1]

R. Hinze, R. Paterson, Finger trees: A simple general-purpose data structure, J. Funct. Program. 16 (2006) 197–217 10.1017/S0956796805005769.

[2]

C.-J. Chang, K.-M. Chao, Efficient algorithms for local ranking, Information Processing Letters. 112 (2012) 517–522 10.1016/j.ipl.2012.03.011.

Posted by Chao Xu on 2014-11-22.

Tags: Data Structure, Algorithm.