Paper title
Oracle-MNIST: a Dataset of Oracle Characters for Benchmarking Machine Learning Algorithms
Paper authors
Paper abstract
We introduce the Oracle-MNIST dataset, comprising $28\times 28$ grayscale images of 30,222 ancient characters from 10 categories, for benchmarking pattern classification, with particular challenges in image noise and distortion. The training set consists of 27,222 images in total, and the test set contains 300 images per class. Oracle-MNIST shares the same data format as the original MNIST dataset, allowing direct compatibility with all existing classifiers and systems, but it constitutes a more challenging classification task than MNIST. The images of ancient characters suffer from 1) extremely severe and unique noise caused by three thousand years of burial and aging and 2) the dramatically varied writing styles of the ancient Chinese, both of which make them realistic for machine learning research. The dataset is freely available at https://github.com/wm-bupt/oracle-mnist.
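Because the abstract states that Oracle-MNIST uses the same data format as the original MNIST, a reader for the MNIST IDX layout should in principle work unchanged. The following is a minimal sketch under that assumption, not the authors' loader: the function names `parse_idx_images`, `parse_idx_labels`, and `load_split` are hypothetical, and the gzip-compressed file paths passed to `load_split` are placeholders, since the abstract does not name the files. The magic numbers 2051 (images) and 2049 (labels) are those of the original MNIST files.

```python
import gzip
import struct


def parse_idx_images(raw: bytes):
    """Parse an MNIST-style IDX3 buffer.

    Returns (images, rows, cols), where each image is a bytes object of
    rows * cols grayscale pixels in row-major order.
    """
    # Header: four big-endian unsigned 32-bit integers.
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError(f"unexpected image magic number: {magic}")
    size = rows * cols
    pixels = raw[16:]
    if len(pixels) != count * size:
        raise ValueError("pixel data length does not match the header")
    images = [pixels[i * size:(i + 1) * size] for i in range(count)]
    return images, rows, cols


def parse_idx_labels(raw: bytes):
    """Parse an MNIST-style IDX1 buffer into a list of integer labels."""
    # Header: two big-endian unsigned 32-bit integers.
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:
        raise ValueError(f"unexpected label magic number: {magic}")
    labels = list(raw[8:])
    if len(labels) != count:
        raise ValueError("label data length does not match the header")
    return labels


def load_split(images_path: str, labels_path: str):
    """Load one split from two gzip-compressed IDX files (paths are placeholders)."""
    with gzip.open(images_path, "rb") as f:
        images, rows, cols = parse_idx_images(f.read())
    with gzip.open(labels_path, "rb") as f:
        labels = parse_idx_labels(f.read())
    if len(images) != len(labels):
        raise ValueError("image and label counts differ")
    return images, labels, (rows, cols)
```

If the files match the figures in the abstract, the training split should yield 27,222 images of shape (28, 28) and the test split 10 × 300 = 3,000 images, which together account for the 30,222 characters mentioned above.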