CLUX - Clustering XML Sub-trees
Stefan Böttcher, Rita Hartel, Christoph Krislin
2010
Abstract
XML has become the de facto standard for data exchange in enterprise information systems. But whenever XML data is stored or processed, e.g., in the form of a DOM tree representation, the XML markup causes a huge blow-up in memory consumption compared to the data, i.e., the text and attribute values, contained in the XML document. In this paper, we present CluX, an XML compression approach based on clustering XML sub-trees. CluX uses a grammar for sharing similar substructures within the XML tree structure and a cluster-based heuristic for greedily selecting the best compression options in the grammar. Thereby, CluX allows for storing and exchanging XML data in a space-efficient and still queryable way. We evaluate different strategies for XML structure sharing, and we show that CluX often compresses better than XMill, Gzip, and Bzip2, which makes CluX a promising technique for XML data exchange whenever the exchanged data volume is a bottleneck in enterprise information systems.
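To make the idea of structure sharing concrete, the following is a minimal sketch and not the CluX algorithm itself. It assumes the simplest case, in which only identical sub-trees are shared, and it ignores text and attribute values. CluX, as described above, goes further by sharing similar sub-trees and by choosing among compression options with a cluster-based heuristic. The names share_subtrees and count_nodes are hypothetical and do not come from the paper.

```python
import xml.etree.ElementTree as ET


def share_subtrees(root):
    """Assign one rule to every distinct sub-tree shape (tags only).

    Returns (rules, start), where rules[i] is (tag, child_rule_ids)
    and start is the rule id of the document root.
    Assumption: text and attribute values are ignored; only the
    element structure is considered.
    """
    ids = {}    # (tag, child rule ids) -> rule id
    rules = []  # rule id -> (tag, child rule ids)

    def visit(node):
        # Children are processed first, so identical sub-trees
        # end up with identical keys and share a single rule.
        key = (node.tag, tuple(visit(child) for child in node))
        if key not in ids:
            ids[key] = len(rules)
            rules.append(key)
        return ids[key]

    start = visit(root)
    return rules, start


def count_nodes(root):
    """Number of element nodes in the uncompressed tree."""
    return sum(1 for _ in root.iter())


doc = ET.fromstring(
    "<orders>"
    "<order><id/><item/><item/></order>"
    "<order><id/><item/><item/></order>"
    "<order><id/><item/></order>"
    "</orders>"
)
rules, start = share_subtrees(doc)

print(count_nodes(doc), "element nodes,", len(rules), "rules")
for rule_id, (tag, children) in enumerate(rules):
    print(rule_id, tag, children)
```

In this toy document the two identical order sub-trees collapse into a single rule, so the 12 element nodes are described by 5 rules. The structure stays navigable because each rule still records its tag and the rules of its children in document order.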
Paper Citation
in Harvard Style
Böttcher S., Hartel R. and Krislin C. (2010). CLUX - Clustering XML Sub-trees. In Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8425-04-1, pages 142-150. DOI: 10.5220/0002877901420150
in Bibtex Style
@conference{iceis10,
author={Stefan Böttcher and Rita Hartel and Christoph Krislin},
title={CLUX - Clustering XML Sub-trees},
booktitle={Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2010},
pages={142-150},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002877901420150},
isbn={978-989-8425-04-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - CLUX - Clustering XML Sub-trees
SN - 978-989-8425-04-1
AU - Böttcher S.
AU - Hartel R.
AU - Krislin C.
PY - 2010
SP - 142
EP - 150
DO - 10.5220/0002877901420150