Title
Limits of PageRank-based ranking methods in sports data
Authors
Abstract
While PageRank has been used extensively to rank participants in sports tournaments (teams or individuals), its superiority over simpler ranking methods has never been clearly demonstrated. We use sports results from 18 major leagues to calibrate a state-of-the-art model for synthetic sports results. Model data are then used to assess the ranking performance of PageRank in a controlled setting. We find that PageRank outperforms the benchmark, ranking by the number of wins, only when a small fraction of all games has been played. Increased randomness in the data, such as intrinsic randomness of outcomes or home-team advantage, further reduces the range in which PageRank is superior. We propose a new PageRank variant which outperforms PageRank in all evaluated settings, yet shares its sensitivity to increased randomness in the data. Our main findings are confirmed by evaluating the ranking algorithms on real data. Our work demonstrates the danger of using novel metrics and algorithms without considering their limits of applicability.
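The two rankings compared in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state how the game network is built, so the sketch assumes the common convention that each loser links to the team that beat it, a damping factor of 0.85, and uniform redistribution of score from undefeated teams. The function names `rank_by_wins` and `pagerank_scores` and the `(winner, loser)` input format are hypothetical. The new PageRank variant proposed in the paper is not described in the abstract and is not reproduced here.

```python
from collections import defaultdict


def rank_by_wins(games):
    """Benchmark ranking: count wins per team.

    games: list of (winner, loser) pairs (assumed input format).
    """
    wins = defaultdict(int)
    for winner, loser in games:
        wins[winner] += 1
        wins[loser] += 0  # ensure winless teams also appear in the result
    return dict(wins)


def pagerank_scores(games, damping=0.85, iterations=100):
    """PageRank on a loser -> winner network (assumed edge direction)."""
    teams = sorted({team for game in games for team in game})
    n = len(teams)

    # out_links[loser][winner] = number of times loser lost to winner
    out_links = {team: defaultdict(int) for team in teams}
    for winner, loser in games:
        out_links[loser][winner] += 1
    losses = {team: sum(out_links[team].values()) for team in teams}

    score = {team: 1.0 / n for team in teams}
    for _ in range(iterations):
        # Undefeated teams have no out-links; spread their score uniformly.
        dangling = sum(score[t] for t in teams if losses[t] == 0)
        new_score = {t: (1 - damping) / n + damping * dangling / n for t in teams}
        for loser in teams:
            for winner, count in out_links[loser].items():
                new_score[winner] += damping * score[loser] * count / losses[loser]
        score = new_score
    return score


if __name__ == "__main__":
    games = [("A", "B"), ("B", "C"), ("A", "C")]
    print(rank_by_wins(games))
    print(pagerank_scores(games))
```

Under these assumptions, a win against a highly ranked opponent raises a team's PageRank score more than a win against a weak one, whereas the win count treats every win equally.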