Mining Japanese Collocation by Statistical Indicators

Takumi Sonoda, Takao Miura

2013

Abstract

In this investigation, we discuss a computational approach to extracting collocations based on both data mining and statistical techniques. We extend n-grams consisting of independent words and count their frequencies after filtering on colligation. We then apply statistical filters to the candidates and compare these feature selection methods from statistical learning with each other. Five methods are evaluated: Term Frequency (TF), Pairwise Mutual Information (PMI), Dice Coefficient (DC), T-Score (TS) and Pairwise Log-Likelihood ratio (PLL). We found PMI, MC and TS the most effective in our experiments. Using these, we achieved 88 percent accuracy in extracting collocations.
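
As a rough illustration of the statistical indicators named above, the following sketch scores a single word pair from raw counts. It uses common textbook definitions (pointwise mutual information, the Dice coefficient, the t-score, and Dunning's log-likelihood ratio G2 over a 2x2 contingency table); the abstract does not give the authors' exact formulas, so these may differ from the paper's. The function name `association_scores` and its parameters are hypothetical.

```python
import math


def association_scores(f_xy, f_x, f_y, n):
    """Score a candidate word pair (x, y) from raw counts.

    f_xy: co-occurrence count of the pair (must be > 0)
    f_x, f_y: counts of x and y on their own
    n: total number of pairs observed in the corpus
    All formulas are textbook definitions, assumed here for illustration.
    """
    expected = f_x * f_y / n  # pair count expected if x and y were independent

    pmi = math.log2(f_xy / expected)
    dice = 2 * f_xy / (f_x + f_y)
    t_score = (f_xy - expected) / math.sqrt(f_xy)

    # Dunning's G2: compare observed and expected cells of the 2x2 table.
    rows = (f_x, n - f_x)
    cols = (f_y, n - f_y)
    observed = (
        (f_xy, f_x - f_xy),
        (f_y - f_xy, n - f_x - f_y + f_xy),
    )
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                g2 += o * math.log(o / (rows[i] * cols[j] / n))
    g2 *= 2

    return {"tf": f_xy, "pmi": pmi, "dice": dice, "t_score": t_score, "g2": g2}


scores = association_scores(f_xy=10, f_x=20, f_y=50, n=1000)
```

Candidates could then be ranked or thresholded on any of these values. The counts in the usage line are made up for illustration and are not taken from the paper's experiments.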

Paper Citation


in Harvard Style

Sonoda T. and Miura T. (2013). Mining Japanese Collocation by Statistical Indicators. In Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8565-59-4, pages 381-388. DOI: 10.5220/0004397503810388

in Bibtex Style

@conference{iceis13,
author={Takumi Sonoda and Takao Miura},
title={Mining Japanese Collocation by Statistical Indicators},
booktitle={Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2013},
pages={381-388},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004397503810388},
isbn={978-989-8565-59-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Mining Japanese Collocation by Statistical Indicators
SN - 978-989-8565-59-4
AU - Sonoda T.
AU - Miura T.
PY - 2013
SP - 381
EP - 388
DO - 10.5220/0004397503810388