Classification of the Top-cited Literature by Fusing Linguistic and Citation Information with the Transformer Model

Masanao Ochi, Masanori Shiro, Jun’ichiro Mori, Ichiro Sakata

2022

Abstract

The scientific literature contains a wide variety of data, including text, citations, and images of figures and tables. The Transformer model, introduced in 2017, was initially used in natural language processing but has since been applied in many other fields, including image processing and network science. Many Transformer models pretrained on large datasets are available, and they can be adapted to a focused task with a small amount of new data. However, classification and regression studies on scholarly data have mostly processed each type of data separately and then combined the extracted features, giving insufficient consideration to the interactions among the data types. In this paper, we propose an end-to-end method that fuses the linguistic and citation information in scholarly literature data using the Transformer model. The proposed method shows the potential to efficiently improve the accuracy of various classification and prediction tasks through end-to-end fusion of the different data in the scholarly literature. Using a dataset from the Web of Science, we classified papers whose citation counts three years after publication fall in the top 20%. The results show that the proposed method improves the F-value by 2.65 to 6.08 percentage points compared with using only one type of information.
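The evaluation setup described in the abstract (labeling the top 20% of papers by citation count, then scoring a classifier with the F-value) can be illustrated with a minimal sketch. This is not the authors' code: the function names `top_fraction_labels` and `f1_score`, the toy citation counts, the toy predictions, and the tie-handling rule are all assumptions made for illustration, and the F-value is assumed here to be the usual F1 score (harmonic mean of precision and recall).

```python
from typing import List


def top_fraction_labels(citations: List[int], fraction: float = 0.2) -> List[int]:
    """Return 1 for papers in the top `fraction` by citation count, else 0.

    Assumption: ties at the threshold are all labeled 1, so the positive
    class can be slightly larger than `fraction` of the papers.
    """
    if not citations:
        return []
    k = max(1, int(len(citations) * fraction))  # number of top papers
    threshold = sorted(citations, reverse=True)[k - 1]
    return [1 if c >= threshold else 0 for c in citations]


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): citation counts three years after publication.
citations = [50, 3, 12, 0, 7, 31, 2, 9, 1, 4]
labels = top_fraction_labels(citations)       # top 20% of 10 papers = 2 papers
predictions = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # hypothetical classifier output
score = f1_score(labels, predictions)
print(labels, score)
```

A difference in "percentage points" between two methods is then simply the difference of two such scores multiplied by 100.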


Paper Citation


in Harvard Style

Ochi M., Shiro M., Mori J. and Sakata I. (2022). Classification of the Top-cited Literature by Fusing Linguistic and Citation Information with the Transformer Model. In Proceedings of the 18th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-613-2, pages 286-293. DOI: 10.5220/0011542200003318


in Bibtex Style

@conference{webist22,
author={Masanao Ochi and Masanori Shiro and Jun’ichiro Mori and Ichiro Sakata},
title={Classification of the Top-cited Literature by Fusing Linguistic and Citation Information with the Transformer Model},
booktitle={Proceedings of the 18th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2022},
pages={286-293},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011542200003318},
isbn={978-989-758-613-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 18th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Classification of the Top-cited Literature by Fusing Linguistic and Citation Information with the Transformer Model
SN - 978-989-758-613-2
AU - Ochi M.
AU - Shiro M.
AU - Mori J.
AU - Sakata I.
PY - 2022
SP - 286
EP - 293
DO - 10.5220/0011542200003318