MINING VERY LARGE DATASETS WITH SVM AND VISUALIZATION

Thanh-Nghi Do, François Poulet

2005

Abstract

We present a new support vector machine (SVM) algorithm and graphical methods for mining very large datasets. We develop an active selection of training data points that can significantly reduce the size of the training set in SVM classification. We summarize the massive datasets into interval data and adapt the RBF kernel used by the SVM algorithm to handle this interval data. We keep only the data points corresponding to support vectors, together with representative data points of the non-support vectors, and the SVM algorithm uses this subset to construct the non-linear model. We also use interactive graphical methods to help explain the SVM results. The graphical representation of IF-THEN rules extracted from the SVM models can be easily interpreted by humans, which helps the user understand how the SVM models behave on the data. Numerical test results are obtained on real and artificial datasets.
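As a rough illustration of what an RBF kernel over interval data could look like, the sketch below replaces the Euclidean distance in the standard RBF kernel with a per-dimension Hausdorff distance between intervals. The abstract does not state which distance is used, so this choice is an assumption, and the function names, the (low, high) pair representation, and the gamma parameter are all hypothetical.

```python
import math


def interval_sq_distance(a, b):
    """Squared distance between two interval-valued vectors.

    Each vector is a sequence of (low, high) pairs, one per dimension.
    Assumption: per dimension we use the Hausdorff distance between two
    closed intervals, max(|a_low - b_low|, |a_high - b_high|), and sum
    its squares over dimensions.
    """
    if len(a) != len(b):
        raise ValueError("interval vectors must have the same dimension")
    total = 0.0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b):
        d = max(abs(a_lo - b_lo), abs(a_hi - b_hi))
        total += d * d
    return total


def interval_rbf_kernel(a, b, gamma=0.5):
    """RBF kernel exp(-gamma * d^2) with the interval distance above."""
    return math.exp(-gamma * interval_sq_distance(a, b))


if __name__ == "__main__":
    x = [(0.0, 1.0), (2.0, 3.0)]
    y = [(0.5, 1.0), (2.0, 4.0)]
    print(interval_rbf_kernel(x, y))
```

When every interval collapses to a single point (low equals high), this distance reduces to the ordinary squared Euclidean distance, so the sketch falls back to the usual RBF kernel on point data.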



Paper Citation


in Harvard Style

Do T. and Poulet F. (2005). MINING VERY LARGE DATASETS WITH SVM AND VISUALIZATION. In Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 972-8865-19-8, pages 127-134. DOI: 10.5220/0002548601270134

in Bibtex Style

@conference{iceis05,
author={Thanh-Nghi Do and François Poulet},
title={MINING VERY LARGE DATASETS WITH SVM AND VISUALIZATION},
booktitle={Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2005},
pages={127-134},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002548601270134},
isbn={972-8865-19-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - MINING VERY LARGE DATASETS WITH SVM AND VISUALIZATION
SN - 972-8865-19-8
AU - Do T.
AU - Poulet F.
PY - 2005
SP - 127
EP - 134
DO - 10.5220/0002548601270134