Hydra -- A Federated Data Repository over NDN

Paper Title

Hydra -- A Federated Data Repository over NDN

Authors

Presley, Justin; Wang, Xi; Brandel, Tym; Ai, Xusheng; Podder, Proyash; Yu, Tianyuan; Patil, Varun; Zhang, Lixia; Afanasyev, Alex; Feltus, F. Alex; Shannigrahi, Susmit

Abstract

Today's big data science communities manage their data publication and replication at the application layer. These communities use myriad mechanisms to publish, discover, and retrieve datasets; the result is an ecosystem of either centralized repositories or a collection of ad-hoc data repositories. Publishing datasets to centralized repositories can be process-intensive, and those repositories do not accept all datasets. Ad-hoc repositories are difficult to find and use because of differences in data names, metadata standards, and access methods. To address these problems of scientific data publication and storage, we have designed Hydra, a secure, distributed, and decentralized data repository made of a loose federation of storage servers (nodes) provided by user communities. Hydra runs over Named Data Networking (NDN) and uses the State Vector Sync (SVS) protocol, which lets individual nodes maintain a "global view" of the system. Hydra provides a scalable and resilient data retrieval service: data distribution scales through NDN's built-in data anycast and in-network caching, and resiliency against individual server failures comes from automated failure detection and from maintaining a specific degree of replication. Hydra uses "Favor", a locally calculated numerical value, to decide which nodes will replicate a file. Finally, Hydra applies data-centric security to data publication and node authentication. It uses a Network Operation Center (NOC) to bootstrap trust in Hydra nodes and data publishers; the NOC distributes user and node certificates and performs the proof-of-possession challenges. This technical report serves as the reference for Hydra. It outlines the design decisions, the rationale behind them, the functional modules, and the protocol specifications.
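The abstract states only that Favor is a locally calculated number used to pick replica holders and that Hydra maintains a specific degree of replication after failures. The Python sketch below illustrates one way such a scheme could work. It assumes, without confirmation from the report, that every node learns each peer's Favor through the SVS-synchronized global view, that the k nodes with the highest Favor store a file, and that ties are broken by node name. `GlobalView`, `select_replicas`, `insert_file`, `handle_node_failure`, and the node names are hypothetical. Because the abstract does not say how Hydra computes Favor, the values are supplied directly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GlobalView:
    """Hypothetical local copy of the federation state held by one node.

    Assumption: in Hydra this state would be kept in sync via SVS; here it
    is a plain in-memory structure for illustration only.
    """
    favor: dict[str, float] = field(default_factory=dict)       # node -> advertised Favor
    replicas: dict[str, set[str]] = field(default_factory=dict)  # file -> nodes holding it


def select_replicas(view: GlobalView, candidates, k: int) -> list[str]:
    """Return up to k candidates with the highest Favor (assumed policy).

    Ties are broken by node name, so every node evaluating the same view
    reaches the same decision without extra coordination.
    """
    ranked = sorted(candidates, key=lambda node: (-view.favor[node], node))
    return ranked[:max(k, 0)]


def insert_file(view: GlobalView, file_name: str, k: int) -> list[str]:
    """Choose k replica holders for a newly published file and record them."""
    chosen = select_replicas(view, view.favor, k)
    view.replicas[file_name] = set(chosen)
    return chosen


def handle_node_failure(view: GlobalView, failed: str, k: int) -> dict[str, list[str]]:
    """Remove a node detected as failed and restore replication degree k.

    Returns, per affected file, the nodes newly selected to fetch a copy.
    """
    view.favor.pop(failed, None)
    repairs: dict[str, list[str]] = {}
    for file_name, holders in view.replicas.items():
        if failed not in holders:
            continue
        holders.discard(failed)
        spare = [node for node in view.favor if node not in holders]
        new_holders = select_replicas(view, spare, k - len(holders))
        holders.update(new_holders)
        repairs[file_name] = new_holders
    return repairs


if __name__ == "__main__":
    # Hypothetical NDN-style node names and Favor values.
    view = GlobalView(favor={
        "/hydra/node/a": 0.9,
        "/hydra/node/b": 0.7,
        "/hydra/node/c": 0.7,
        "/hydra/node/d": 0.2,
    })
    print(insert_file(view, "/hydra/file/sample", k=3))
    print(handle_node_failure(view, "/hydra/node/b", k=3))
```

Running the script prints `['/hydra/node/a', '/hydra/node/b', '/hydra/node/c']` for the initial holders and then `{'/hydra/file/sample': ['/hydra/node/d']}` once `/hydra/node/b` is treated as failed. In this sketch the deterministic tie-break is what lets every node, ranking candidates from the same synchronized view, agree on the replica set without an additional coordination round.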
