Title
fplyr: the split-apply-combine strategy for big data in R
Authors
Abstract
We present fplyr, a new package for the R language for working with big files. It allows users to easily implement the split-apply-combine strategy for files that are too big to fit into the available memory, without relying on databases or introducing non-native R classes. A custom function can be applied independently to each group of observations, and the results may be either returned or written directly to one or more output files.
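To make the idea concrete, the following is a minimal base R sketch of chunked split-apply-combine. It is not fplyr's implementation and does not use its API. The helper name `apply_by_group`, its `chunk_lines` parameter, and the input layout are all assumptions made for illustration: a tab-separated file with a header row, whose first column is the group key and whose rows are already sorted so that each group is contiguous.

```r
# Hypothetical helper (not part of fplyr): apply `fun` to each group of a
# tab-separated file, reading only `chunk_lines` lines at a time.
# Assumption: rows are sorted so that each group key is contiguous.
apply_by_group <- function(path, fun, chunk_lines = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- strsplit(readLines(con, n = 1L), "\t", fixed = TRUE)[[1]]
  carry <- NULL      # rows of a group that may continue in the next chunk
  results <- list()
  repeat {
    lines <- readLines(con, n = chunk_lines)
    done <- length(lines) == 0L
    if (!done) {
      chunk <- read.table(text = lines, sep = "\t", col.names = header,
                          quote = "", comment.char = "",
                          stringsAsFactors = FALSE)
      carry <- rbind(carry, chunk)
    }
    if (is.null(carry)) break
    keys <- carry[[1]]
    last_key <- keys[length(keys)]
    # At end of file every group is complete; otherwise hold back the last
    # group, because more of its rows may arrive with the next chunk.
    complete <- if (done) rep(TRUE, length(keys)) else keys != last_key
    for (piece in split(carry[complete, , drop = FALSE], keys[complete])) {
      results[[as.character(piece[[1]][1])]] <- fun(piece)
    }
    if (done) break
    carry <- carry[!complete, , drop = FALSE]
  }
  results
}

# Small demonstration on a temporary file, with a deliberately tiny chunk size.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tvalue", "a\t1", "a\t2", "b\t10", "c\t5", "c\t7"), tmp)
sums <- apply_by_group(tmp, function(d) sum(d$value), chunk_lines = 2L)
```

At most one chunk plus the carried-over group is held in memory at a time, so the largest single group, rather than the file size, bounds memory use. The sketch returns results as a list; writing each result to an output file instead would only require `fun` to write its result out rather than return it.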