A Hybrid Swarm and Gravitation based feature selection algorithm for Handwritten Indic Script Classification problem

Paper Title

A Hybrid Swarm and Gravitation based feature selection algorithm for Handwritten Indic Script Classification problem

Authors

Ritam Guha, Manosij Ghosh, Pawan Kumar Singh, Ram Sarkar, Mita Nasipuri

Abstract

In any multi-script environment, handwritten script classification is of paramount importance before document images are fed to their respective Optical Character Recognition (OCR) engines. Over the years, researchers have tackled this complex pattern classification problem by proposing various feature vectors, most of them high-dimensional, which increases the computational complexity of the whole classification model. Feature Selection (FS) can serve as an intermediate step that reduces the size of the feature vectors by restricting them to the essential and relevant features. In our paper, we address this issue by introducing a new FS algorithm called Hybrid Swarm and Gravitation based FS (HSGFS). The algorithm is run on three feature vectors recently introduced in the literature: Distance-Hough Transform (DHT), Histogram of Oriented Gradients (HOG) and Modified log-Gabor (MLG) filter Transform. Three state-of-the-art classifiers, namely Multi-Layer Perceptron (MLP), K-Nearest Neighbour (KNN) and Support Vector Machine (SVM), are used for the handwritten script classification. Handwritten datasets prepared at block, text-line and word level, covering the 12 officially recognized Indic scripts, are used to evaluate our method. On all three datasets, classification accuracy improves by 2-5% on average while using only about 75-80% of the original feature vectors. The proposed methodology also performs better than some popularly used FS models.
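To make the idea of FS as an intermediate step concrete, the following is a minimal sketch of how a candidate feature subset might be scored in a wrapper-style setup. It is not the HSGFS algorithm: the swarm and gravitation search itself is not shown, a leave-one-out 1-nearest-neighbour classifier stands in for the KNN classifier named in the abstract, and the function names, the toy data and the weight `alpha` are all assumptions made for illustration.

```python
import random


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-NN classifier using only features with mask[j] == 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave the query sample out
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if d < best_dist:
                best_dist, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, mask, alpha=0.9):
    """Weighted sum of accuracy and fraction of features dropped (alpha is an assumed weight)."""
    acc = loo_1nn_accuracy(X, y, mask)
    dropped = 1 - sum(mask) / len(mask)
    return alpha * acc + (1 - alpha) * dropped


def make_toy_data(n_per_class=20, seed=0):
    """Hypothetical two-class data: feature 0 is informative, feature 1 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = label * 10.0 + rng.uniform(-1, 1)
            noise = rng.uniform(-50, 50)
            X.append([informative, noise])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for mask in ([1, 1], [1, 0], [0, 1]):
        print(mask, round(fitness(X, y, mask), 3))
```

A search procedure such as the one proposed in the paper would explore many such binary masks and keep those with the highest score. On this toy data, the mask that keeps only the informative feature scores higher than the full feature set, which mirrors the abstract's claim that a reduced feature vector can match or improve accuracy.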
