Paper Title
GitEvolve: Predicting the Evolution of GitHub Repositories
Paper Authors
Paper Abstract
Software development is becoming increasingly open and collaborative with the advent of platforms such as GitHub. Given its crucial role, there is a need to better understand and model the dynamics of GitHub as a social platform. Previous work has mostly considered the dynamics of traditional social networking sites such as Twitter and Facebook. We propose GitEvolve, a system that predicts the evolution of GitHub repositories and the different ways in which users interact with them. To this end, we develop an end-to-end multi-task sequential deep neural network that, given some seed events, simultaneously predicts which user group will interact with a given repository next, what type of interaction it will be, and when it will happen. To facilitate learning, we use graph-based representation learning to encode relationships between repositories. We map users to groups by modelling common interests, which helps the model better predict popularity and generalize to previously unseen users at inference time. We introduce an artificial event type to better model the varying levels of activity of repositories in the dataset. The proposed multi-task architecture is generic and can be extended to model information diffusion in other social networks. In a series of experiments, we demonstrate the effectiveness of the proposed model using multiple metrics and baselines. A qualitative analysis of the model's ability to predict popularity and forecast trends demonstrates its applicability.
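The abstract does not state how the three predictions (which user group, which event type, and when) are trained jointly. As a minimal sketch, assuming each prediction has its own output head and the heads are optimized with a weighted sum of per-task losses, the objective for a single event could look like the code below. The function names, the log-space regression of the inter-event time, and the task weights are all hypothetical illustrations, not the authors' formulation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class under softmax(logits)."""
    return -math.log(softmax(logits)[target])


def multitask_loss(
    group_logits: Sequence[float],  # "who": one logit per user group
    type_logits: Sequence[float],   # "what": one logit per event type
    pred_log_dt: float,             # "when": predicted log(1 + seconds to next event)
    true_group: int,
    true_type: int,
    true_dt: float,                 # observed seconds until the next event
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # hypothetical task weights
) -> float:
    """Weighted sum of the three per-event losses (an assumed combination).

    The two classification heads use cross-entropy; the time head is
    regressed in log space with squared error, which is an assumption
    made here only for illustration.
    """
    w_group, w_type, w_time = weights
    loss_group = cross_entropy(group_logits, true_group)
    loss_type = cross_entropy(type_logits, true_type)
    loss_time = (pred_log_dt - math.log1p(true_dt)) ** 2
    return w_group * loss_group + w_type * loss_type + w_time * loss_time


if __name__ == "__main__":
    # Three user groups, two event types, next event one hour later.
    loss = multitask_loss([2.0, 0.5, -1.0], [0.1, 1.5], math.log1p(3600.0), 0, 1, 3600.0)
    print(f"combined loss: {loss:.4f}")
```

Predicting a distribution over user groups rather than over individual users keeps the size of the first head fixed, which is consistent with the abstract's point that grouping users helps the model generalize to users it has not seen before.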