CLUX - Clustering XML Sub-trees

Stefan Böttcher, Rita Hartel, Christoph Krislin

2010

Abstract

XML has become the de facto standard for data exchange in enterprise information systems. But whenever XML data is stored or processed, e.g. in the form of a DOM tree representation, the XML markup causes a huge blow-up in memory consumption compared to the data, i.e., the text and attribute values, contained in the XML document. In this paper, we present CluX, an XML compression approach based on clustering XML sub-trees. CluX uses a grammar for sharing similar substructures within the XML tree structure and a cluster-based heuristic for greedily selecting the best compression options in the grammar. Thereby, CluX allows for storing and exchanging XML data in a space-efficient and still queryable way. We evaluate different strategies for XML structure sharing, and we show that CluX often compresses better than XMill, Gzip, and Bzip2, which makes CluX a promising technique for XML data exchange whenever the exchanged data volume is a bottleneck in enterprise information systems.
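The idea of structure sharing mentioned in the abstract can be illustrated with a small sketch. The code below is not the CluX algorithm: it only shares exactly identical sub-trees (a simpler, DAG-style technique), whereas CluX clusters similar sub-trees and selects compression options greedily. The function name `share_subtrees`, the rule representation, and the sample document are assumptions made for illustration, and text and attribute values are deliberately ignored so that only the markup structure is considered.

```python
import xml.etree.ElementTree as ET


def share_subtrees(root):
    """Assign one rule to each distinct sub-tree shape (tags only).

    Hypothetical helper, not taken from the paper. Returns (rules, root_id),
    where rules[i] is a pair (tag, tuple of child rule ids).
    """
    ids = {}    # (tag, child rule ids) -> rule id
    rules = []  # rule id -> (tag, child rule ids)

    def visit(node):
        # Children are processed first, so identical sub-trees
        # always produce the same key and reuse the same rule.
        key = (node.tag, tuple(visit(child) for child in node))
        if key not in ids:
            ids[key] = len(rules)
            rules.append(key)
        return ids[key]

    return rules, visit(root)


# Sample document (an assumption): two structurally identical <book>
# elements and one that differs.
doc = ET.fromstring(
    "<lib>"
    "<book><title/><author/></book>"
    "<book><title/><author/></book>"
    "<book><title/></book>"
    "</lib>"
)

rules, root_id = share_subtrees(doc)
node_count = sum(1 for _ in doc.iter())

print(f"{node_count} element nodes, {len(rules)} shared rules")
for rule_id, (tag, children) in enumerate(rules):
    print(rule_id, tag, children)
```

In this sketch the two identical `book` sub-trees map to the same rule, so the structure is stored once and referenced twice. Sharing merely similar sub-trees, as the abstract describes, would additionally require parameterised rules and a strategy for choosing among competing sharing options.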

Paper Citation


in Harvard Style

Böttcher S., Hartel R. and Krislin C. (2010). CLUX - Clustering XML Sub-trees. In Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8425-04-1, pages 142-150. DOI: 10.5220/0002877901420150

in Bibtex Style

@conference{iceis10,
author={Stefan Böttcher and Rita Hartel and Christoph Krislin},
title={CLUX - Clustering XML Sub-trees},
booktitle={Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2010},
pages={142-150},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002877901420150},
isbn={978-989-8425-04-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - CLUX - Clustering XML Sub-trees
SN - 978-989-8425-04-1
AU - Böttcher S.
AU - Hartel R.
AU - Krislin C.
PY - 2010
SP - 142
EP - 150
DO - 10.5220/0002877901420150