The Art Gallery Guardian

Computing the weighted h-index

A common algorithm problem is that given a sequence of numbers, find a h-index. Where h-index is the largest integer hh such there are at least hh integers in the sequence is at least as large as hh.

Formally, we have the following problem.


Given a1,,ana_1,\ldots,a_n, find the largest hh, such that {iaih}h|\set{i \mid a_i\geq h}|\geq h.

The h-index problem is featured in leetcode.

If we the numbers are sorted, then a trivial O(n)O(n) time algorithm exists. If it is not sorted, then note that we can solve the problem on min(a1,n),,min(an,n)\min(a_1,n),\ldots,\min(a_n,n). In this case, the input numbers is at most nn, therefore can be sorted in O(n)O(n) time. Hence the total running time is O(n)O(n).

Consider a weighted version of the problem where the above algorithm does not work.


Given a sequence of pairs of non-negative positive reals (w1,a1),,(wn,an)(w_1,a_1),\ldots,(w_n,a_n). Find the largest hRh\in \R, such that i:aihwih\sum_{i:a_i\geq h} w_i \geq h.

An O(n)O(n) time algorithm still exists. For simplicity, we assume all aia_i’s are distinct, so the input is a set. The case where aia_i’s are not distinct is left as an exercise to the reader.

Define f(t)=i:aitwif(t) = \sum_{i:a_i\geq t} w_i. We want to find the largest tt such that f(t)tf(t)\geq t. First, we can find the median of a1,,ana_1,\ldots,a_n, say tt. If f(t)<tf(t) < t, then we recurse on {(wif(t),ai)ai<t}\set{(w_i-f(t),a_i) \mid a_i< t}. Assume the optimum in the recursed solution is tt', we return t+f(t)t'+f(t) as the solution. If f(t)tf(t)\geq t, then we recurse and output the solution with input {(wi,ai)ait}\set{(w_i,a_i) \mid a_i\geq t}. The running time satisfies T(n)=T(n/2)+O(n)T(n)=T(n/2)+O(n), which is O(n)O(n).

Posted by Chao Xu on .
Tags: algorithms.