# Computing the weighted h-index

A common algorithm problem is the following: given a sequence of numbers, find its h-index. The h-index is the largest integer \(h\) such that at least \(h\) integers in the sequence are at least as large as \(h\).

Formally, we have the following problem.

Given \(a_1,\ldots,a_n\), find the largest \(h\), such that \(|\set{i \mid a_i\geq h}|\geq h\).

The h-index problem is featured on LeetCode.

If the numbers are sorted, then a trivial \(O(n)\) time algorithm exists. If they are not sorted, note that the h-index is at most \(n\), so we can solve the problem on \(\min(a_1,n),\ldots,\min(a_n,n)\) instead. In this case, the input numbers are at most \(n\), and therefore (assuming they are non-negative integers) can be sorted in \(O(n)\) time with counting sort. Hence the total running time is \(O(n)\).
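The clamping idea can be sketched as follows. This is a minimal illustration, assuming the input is a list of non-negative integers; the function name `h_index` is chosen here for illustration. Instead of materializing the sorted sequence, it keeps only the counting-sort buckets and scans them from the top.

```python
def h_index(values):
    """Largest integer h such that at least h of the values are >= h.

    Assumes every value is a non-negative integer.
    """
    n = len(values)
    # Clamp each value to n: a value larger than n cannot push h above n.
    counts = [0] * (n + 1)
    for v in values:
        counts[min(v, n)] += 1
    # Scan h downward while tracking how many values are >= h.
    # h = 0 always passes the test, so the loop always returns.
    at_least = 0
    for h in range(n, -1, -1):
        at_least += counts[h]
        if at_least >= h:
            return h


print(h_index([3, 0, 6, 1, 5]))  # 3
```

Both loops run over at most \(n+1\) entries, which matches the \(O(n)\) bound.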

Consider a weighted version of the problem where the above algorithm does not work.

Given a sequence of pairs of non-negative reals \((w_1,a_1),\ldots,(w_n,a_n)\), find the largest \(h\in \R\) such that \(\sum_{i:a_i\geq h} w_i \geq h\).
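As a baseline to compare against, the weighted problem can be solved in \(O(n\log n)\) time by sorting. The sketch below is not a linear-time method, and the name `weighted_h_index_sorted` is hypothetical. It relies on the following observation: if the pairs are sorted by \(a_i\) in decreasing order and \(S_k\) is the total weight of the first \(k\) pairs, then the optimum is the largest value of \(\min(a_{(k)}, S_k)\) over all \(k\), or \(0\) if the input is empty.

```python
def weighted_h_index_sorted(pairs):
    """Largest real h whose pairs with a >= h have total weight >= h.

    pairs is an iterable of (w, a) tuples of non-negative numbers.
    """
    best = 0.0   # h = 0 is always feasible
    total = 0.0  # weight of the pairs seen so far (the largest a's)
    for w, a in sorted(pairs, key=lambda p: p[1], reverse=True):
        total += w
        # Every h <= a sees at least `total` weight,
        # so min(a, total) is a feasible value of h.
        best = max(best, min(a, total))
    return best


print(weighted_h_index_sorted([(2.5, 10.0), (0.25, 3.0)]))  # 2.75
```

With all weights equal to \(1\), this returns the same value as the unweighted h-index.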

An \(O(n)\) time algorithm still exists. For simplicity, we assume all \(a_i\)'s are distinct, so the input is a set. The case where \(a_i\)'s are not distinct is left as an exercise to the reader.

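Define \(f(t) = \sum_{i:a_i\geq t} w_i\). We want to find the largest \(t\) such that \(f(t)\geq t\). First, we find the median of \(a_1,\ldots,a_n\), say \(t\). If \(f(t) < t\), then the optimum is smaller than \(t\), and every pair with \(a_i\geq t\) contributes its weight to every remaining candidate. So we recurse on \(\set{(w_i,a_i-f(t)) \mid a_i< t}\), where pairs whose shifted value is negative can be dropped and an empty input has optimum \(0\). If the optimum of the recursive call is \(t'\), we return \(t'+f(t)\) as the solution. If \(f(t)\geq t\), then the optimum is at least \(t\), so we recurse on \(\set{(w_i,a_i) \mid a_i\geq t}\) and output its solution. A single pair \((w_1,a_1)\) is solved directly, with optimum \(\min(a_1,w_1)\). Finding the median and evaluating \(f(t)\) both take \(O(n)\) time, so the running time satisfies \(T(n)=T(n/2)+O(n)\), which is \(O(n)\).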
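The recursion can be sketched iteratively as below. Two things in this sketch are assumptions rather than part of the description above. First, it picks a random pivot instead of the exact median, so it runs in expected \(O(n)\) time rather than worst-case \(O(n)\); a linear-time selection routine would restore the worst-case bound. Second, instead of rewriting the pairs, it carries the weight already accounted for in an accumulator named `above`. The function name `weighted_h_index` is hypothetical. The sketch also tolerates repeated \(a_i\) values.

```python
import random


def weighted_h_index(pairs):
    """Largest real h whose pairs with a >= h have total weight >= h."""
    items = list(pairs)  # (w, a) tuples of non-negative numbers
    above = 0.0  # weight known to sit above every remaining a
    best = 0.0   # best feasible h found so far; h = 0 is always feasible
    while items:
        # Assumption: a random pivot stands in for the exact median.
        t = random.choice(items)[1]
        f_t = above + sum(w for w, a in items if a >= t)
        if f_t < t:
            # The answer lies below t. Pairs with a >= t count toward
            # every remaining candidate, so fold them into `above`.
            best = max(best, f_t)
            above = f_t
            items = [(w, a) for w, a in items if a < t]
        else:
            # t itself is feasible, so the answer is at least t.
            best = max(best, t)
            items = [(w, a) for w, a in items if a > t]
    return best


print(weighted_h_index([(2.5, 10.0), (0.25, 3.0)]))  # 2.75
```

Each round discards at least the pivot, so the loop terminates, and `best` only ever takes feasible values: \(f(t)\) when \(f(t) < t\), and \(t\) otherwise.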