Paper title
Towards International Relations Data Science: Mining the CIA World Factbook
Paper authors
Paper abstract
This paper presents a work in three components. The first sets the overall theoretical context: the argument that the increasing complexity of the world has made it more difficult for International Relations (IR) to succeed in both theory and practice. The information era and the events of the 21st century have moved IR theory and practice away from real policy making (Walt, 2016) and have left the field entrenched in opinions and political theories that are difficult to prove. At the same time, the rise of the "Fourth Paradigm: Data-Intensive Scientific Discovery" (Hey et al., 2009) and the strengthening of data science offer an alternative: "Computational International Relations" (Unver, 2018). Using traditional and contemporary data-centered tools can help update the field of IR by making it more relevant to reality (Koutsoupias and Mikelis, 2020). The "wedding" between Data Science and IR is no panacea, though: changes are required in both perceptions and practices. Above all, for Data Science to enter IR, the relevant data must exist. This is where the second component comes into play. I mine the CIA World Factbook, which provides cross-domain data covering all countries of the world. I then carry out various data preprocessing tasks, culminating in simple machine learning that imputes missing values and yields a more complete dataset. Lastly, the third component presents various projects that use the resulting dataset to illustrate the relevance of Data Science to IR through practical examples. Ideas for the future development of the project are then discussed, with a view to optimizing it and ensuring its continuity. Overall, I hope to contribute to the "fourth paradigm" discussion in IR by providing practical examples and, at the same time, fuel for future research.