Title
Condensed Representation of Machine Learning Data
Authors
Abstract
Training a Machine Learning model requires sufficient data. Sufficiency is not always a matter of quantity; it is a matter of relevance and reduced redundancy. Data-generating processes create massive amounts of data, and using such big data in raw form consumes substantial computational resources. A suitable Condensed Representation can be used in place of the raw data. This work introduces a novel Condensed Representation method for Machine Learning applications that combines K-means, a well-known clustering method, with correction and refinement facilities. To present the method meaningfully and visually, synthetically generated data is used. It is shown that models of acceptable accuracy can be trained on the condensed representation instead of the raw data.
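The core idea of replacing raw data with K-means centroids can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not describe the correction and refinement facilities, so they are omitted, and only plain K-means condensation is shown. Assumptions, all hypothetical: each class is condensed separately into `k_per_class` centroids that serve as labelled prototypes; the synthetic data are two 2-D Gaussian blobs (`make_blobs`); and a 1-nearest-neighbour classifier stands in for "the model," since its training cost grows directly with training-set size.

```python
import math
import random

# Minimal sketch only: plain K-means condensation. The paper's "correction and
# refinement facilities" are not described in the abstract and are NOT implemented.
# All function names here are hypothetical, chosen for illustration.


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's K-means on a list of 2-D tuples; returns the k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k randomly chosen points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (Euclidean distance).
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # Move each centroid to the mean of its members; empty clusters stay put.
        new = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new == centroids:  # assignments have stabilised
            break
        centroids = new
    return centroids


def condense(X, y, k_per_class, seed=0):
    """Replace each class's raw points with K-means centroids (labelled prototypes)."""
    protos, labels = [], []
    for label in sorted(set(y)):
        pts = [x for x, t in zip(X, y) if t == label]
        cents = kmeans(pts, min(k_per_class, len(pts)), seed=seed)
        protos.extend(cents)
        labels.extend([label] * len(cents))
    return protos, labels


def make_blobs(n_per_class, seed=1):
    """Synthetic data: two Gaussian blobs (unit variance) centred at (0, 0) and (4, 4)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (4.0, 4.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y


def predict_1nn(train_X, train_y, x):
    """Label of the nearest training point (1-nearest-neighbour)."""
    j = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[j]


def accuracy(train_X, train_y, test_X, test_y):
    hits = sum(predict_1nn(train_X, train_y, x) == t for x, t in zip(test_X, test_y))
    return hits / len(test_y)


if __name__ == "__main__":
    X_train, y_train = make_blobs(500, seed=1)
    X_test, y_test = make_blobs(200, seed=2)
    P, L = condense(X_train, y_train, k_per_class=10)
    print(len(X_train), "raw points ->", len(P), "prototypes")
    print("raw accuracy:      ", accuracy(X_train, y_train, X_test, y_test))
    print("condensed accuracy:", accuracy(P, L, X_test, y_test))
```

Here 1,000 raw training points are reduced to 20 prototypes, and the demo prints both accuracies for comparison. The printed values depend on the chosen seeds and blob layout; they are not results from the paper.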