# Computing the weighted h-index

A common algorithm problem is that given a sequence of numbers, find a h-index. Where h-index is the largest integer h such there are at least h integers in the sequence is at least as large as h.

Formally, we have the following problem.

Given a_1,\ldots,a_n, find the largest h, such that |\set{i \mid a_i\geq h}|\geq h.

The h-index problem is featured in leetcode.

If we the numbers are sorted, then a trivial O(n) time algorithm exists. If it is not sorted, then note that we can solve the problem on \min(a_1,n),\ldots,\min(a_n,n). In this case, the input numbers is at most n, therefore can be sorted in O(n) time. Hence the total running time is O(n).

Consider a weighted version of the problem where the above algorithm does not work.

Given a sequence of pairs of non-negative positive reals (w_1,a_1),\ldots,(w_n,a_n). Find the largest h\in \R, such that \sum_{i:a_i\geq h} w_i \geq h.

An O(n) time algorithm still exists. For simplicity, we assume all a_i's are distinct, so the input is a set. The case where a_i's are not distinct is left as an exercise to the reader.

Define f(t) = \sum_{i:a_i\geq t} w_i. We want to find the largest t such that f(t)\geq t. First, we can find the median of a_1,\ldots,a_n, say t. If f(t) < t, then we recurse on \set{(w_i-f(t),a_i) \mid a_i< t}. Assume the optimum in the recursed solution is t', we return t'+f(t) as the solution. If f(t)\geq t, then we recurse and output the solution with input \set{(w_i,a_i) \mid a_i\geq t}. The running time satisfies T(n)=T(n/2)+O(n), which is O(n).