A Reviewer Assignment Problem

Consider there are some reviewers and some papers. Each reviewer can review exactly one paper, and each reviewer is qualified to review some subset of papers. We are interested in maximizing the number of papers reviewed by at least $k$ reviewer, then under that constraint, maximize the paper reviewed by $k + 1$ reviewer, etc. This make sure we are being fair in evaluating papers. It would try to avoid the case where most paper getting small number of reviews and a few papers getting unreasonable number of reviews.

Formally, we are given a bipartite graph $G = (A, B, E)$ of $n$ vertices and $m$ edges. A subset of edges $M$ is called a semi-matching, if $de g_{M} (v) = 1$ for all $v \in A$ . For a subset of edges $M$ , let $g_{M} (i)$ to be the number of vertices in $B$ with degree at least $i$ . We want to find a semi-matching $M$ , such that $(g_{M} (k), g_{M} (k + 1), \dots, g_{M} (n))$ is lexicographically maximum.

When $k = 1$ , if $M$ minimizes the sum of the function $\sum_{v \in B} f (de g_{M} (v))$ for any strictly convex increasing function $f$ , then $(g_{M} (1), g_{M} (2), \dots, g_{M} (n))$ is lexicographically maximum. The problem therefore can be reduced to min-cost flow can be applied here directly, and obtain a polynomial time algorithm [1].

When $k = 3$ , the problem is NP-hard, since it would imply we have to maximized $g_{M} (3)$ , and this is already NP-hard because exact cover by $3$ -sets. That is, given a collection of sets of size $3$ each. Decide if there exists a subcollection that forms a partition of the universe.

The only unresolved case is $k = 2$ . Interestingly, we can show it is also polynomial time solvable. First, one can see that maximizing $(g_{M} (2), g_{M} (3), \dots, g_{M} (n))$ is equivalent to minimize $\sum_{v, d e g_{M} (v) \geq 2} f (de g_{M} (v))$ for some strictly convex increasing function $v$ , and $M$ range through all semi-matchings so each vertex in $B$ has degree exactly $2$ (Assuming it exists).

Apollonio and Sebő shown the following problem is polynomial time solvable [2].

Problem1

Given a graph $G = (V, E)$ , a integer $k$ , convex functions $f_{v} : N \to R$ for each $v \in V$ , and an edge cost function $c : E \to R$ . One can find the following in polynomial time. $min {v \in V \sum f_{v} (de g_{M} (v)) + e \in M \sum c (e) M \subseteq E, ∣ M ∣ = k}$

It’s not hard to generalize it a bit further by requiring $M$ to respect some upper and lower bound on the vertices. Indeed, we can let $f_{v} : N \to R \cup {\infty}$ , and set $f_{v} (x) = \infty$ if $x$ is not between the upper and lower bounds.

Now, we are going to reduce the problem. The reduction is similar to the one for $2$ -or- $0$ matching.

For each vertex $v \in B$ , split into two vertices $v_{1}$ and $v_{2}$ . Define $B_{i} = {v_{1} ∣ v \in B}$ . The new input graph consists of vertices $B_{1} \cup B_{2} \cup A$ . $v_{1}$ and $v_{2}$ connects to the same vertices as $v$ . We add an edge between $v_{1}$ and $v_{2}$ , with very large cost $C$ . Say $C = m n^{2} + 1$ . $v_{1}$ has both an upper and lower bound of $1$ . $v_{2}$ has a lower bound of $1$ . For each vertex in $A$ , add an upper and lower bound of $1$ . We have a strict convex function $f_{v_{2}} (x) = x^{2}$ on each vertex $v_{2}$ .

Let $r = ∣ A ∣$ , $p = ∣ B ∣$ . We solve Problem 1 repeatedly for each $k$ from $r$ to $r + p$ .

Say there exists an optimal solution to the original problem with exactly $t$ vertices in $B$ with degree smaller than $2$ . Find the optimal solution to the new problem with $k = r + t$ . Let it be $M^{'}$ . We obtain $M$ from $M^{'}$ by identify pairs of vertices $v_{1}$ and $v_{2}$ . $M$ would be the solution to the original problem.

1 Extensions

I first heard of the problem from [3], where they focused on $k = 1$ case, but the reviewer can review more than one paper. This case can be handled easily. We can add degree upper and lower bounds to all vertices, and only look at subgraphs in $M$ that satisfies the upper and lower bounds. That is, we can also make sure no reviewers review too many papers too. Under that constraint, find $(g_{M} (k), \dots, g_{M} (n))$ lexicographically. This is possible but more tricky, as we have to do some reduction from capacitated $b$ -matching to $b$ -matching.

There is a little more generalization. Assume for each paper, we have a lower bound of reviews $d_{v}$ . That is, it has to be reviewed by at least $d_{v}$ person to be useful. So translating to the graph case, we can impose the constraint that $de g_{M} (v) = 0$ or $de g_{M} (v) \geq d_{v}$ . One can see maximizing $(g_{M} (2), g_{M} (3), \dots, g_{M} (n))$ is equivalent to maximizing $(g_{M} (1), g_{M} (2), \dots, g_{M} (n))$ where $d_{v} = 2$ for all vertices. Again, one can modify the reduction to handle the case when $d_{v}$ is either $1$ or $2$ .

References

[1]

N.J.A. Harvey, R.E. Ladner, L. Lovász, T. Tamir, Semi-matchings for bipartite graphs and load balancing, Journal of Algorithms. 59 (2006) 53–78 10.1016/j.jalgor.2005.01.003.

[2]

N. Apollonio, A. Sebő, Minconvex factors of prescribed size in graphs, SIAM Journal on Discrete Mathematics. 23 (2009) 1297–1310 10.1137/060651136.

[3]

A. Yeşilçimen, E.A. Yildirim, An alternative polynomial-sized formulation and an optimization based heuristic for the reviewer assignment problem, European Journal of Operational Research. (2019) 10.1016/j.ejor.2019.01.035.

Posted by Chao Xu on 2019-01-29.

Tags: algorithm, matching.