CONSTRUCTION OF DECISION TREES USING DATA CUBE

Lixin Fu

2005

Abstract

Data classification is an important problem in data mining. Traditional classification algorithms based on decision trees have been widely used because they construct models quickly and the resulting models are easy to understand. However, existing decision tree algorithms need to recursively partition the dataset into subsets according to some splitting criteria; that is, they still have to repeatedly compute the records belonging to a node (called F-sets) and then compute the splits for the node. For large data sets, this requires multiple passes over the original dataset and is therefore often infeasible. In this paper we present a new approach to constructing decision trees using a pre-computed data cube. We use statistics trees to compute the data cube and then build a decision tree on top of it. Mining the aggregated data stored in a data cube is much more efficient than directly mining flat data files or relational databases. Since a data cube server is usually a required component of an analytical system for answering OLAP queries, we essentially provide “free” classification by eliminating the dominant I/O overhead of scanning the massive original data set. Our new algorithm generates trees with the same prediction accuracy as existing decision tree algorithms such as SPRINT and RainForest but improves performance significantly. We also give a system architecture that integrates DBMS, OLAP, and data mining seamlessly.
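To illustrate the idea of choosing splits from aggregated counts instead of rescanning records, here is a minimal Python sketch. It is not the paper's algorithm: a plain dictionary of cell counts stands in for the statistics tree, the Gini index is assumed as the splitting criterion (the abstract does not name one), and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_count_cube(records):
    """Single pass over the data: count how often each full cell
    (attribute values..., class label) occurs. Assumption: this flat
    Counter stands in for the paper's statistics-tree data cube."""
    return Counter(tuple(r) for r in records)


def class_counts_by_value(cube, attr_index, class_index):
    """Aggregate the cube down to (attribute value -> class counts)."""
    table = defaultdict(Counter)
    for cell, count in cube.items():
        table[cell[attr_index]][cell[class_index]] += count
    return table


def gini(class_counts):
    """Gini impurity of one partition, given its class counts."""
    total = sum(class_counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_counts.values())


def split_gini(cube, attr_index, class_index):
    """Weighted Gini impurity of a multiway split on one attribute,
    computed from aggregated counts only (no access to raw records)."""
    table = class_counts_by_value(cube, attr_index, class_index)
    total = sum(cube.values())
    return sum(
        sum(counts.values()) / total * gini(counts)
        for counts in table.values()
    )


def best_split(cube, attr_indices, class_index):
    """Pick the attribute whose split has the lowest weighted Gini."""
    return min(attr_indices, key=lambda a: split_gini(cube, a, class_index))
```

Once the counts are built, every candidate split is evaluated against the aggregated cells alone, so the original records are read only once. This is the property the abstract relies on to avoid repeated passes over the data.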


Paper Citation


in Harvard Style

Fu L. (2005). CONSTRUCTION OF DECISION TREES USING DATA CUBE. In Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 972-8865-19-8, pages 119-126. DOI: 10.5220/0002509801190126

in Bibtex Style

@conference{iceis05,
author={Lixin Fu},
title={CONSTRUCTION OF DECISION TREES USING DATA CUBE},
booktitle={Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2005},
pages={119-126},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002509801190126},
isbn={972-8865-19-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - CONSTRUCTION OF DECISION TREES USING DATA CUBE
SN - 972-8865-19-8
AU - Fu L.
PY - 2005
SP - 119
EP - 126
DO - 10.5220/0002509801190126