versions. PatternRank_NP selects simple noun phrases as candidate keyphrases, and PatternRank_PoS selects word tokens whose PoS tags match the pattern defined in Section 3.1. While PatternRank_PoS produced better results in the majority of cases, PatternRank_NP still performed very well in all benchmarks. We therefore conclude that the PatternRank_PoS approach works particularly well in the evaluated scholarly domain. Furthermore, since using noun phrases as candidate keyphrases is a more general and domain-independent approach, we propose PatternRank_NP as a simple but effective keyphrase extraction method for arbitrary domains. Future work may investigate how the PLM and the PoS pattern used in this approach can be adapted to different domains or languages.
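The sketch below illustrates PoS-based candidate selection in Python. It rests on several assumptions. The pattern from Section 3.1 is taken to be "zero or more adjectives followed by one or more nouns" over Penn Treebank tags. The input is assumed to be already PoS-tagged, and the tags in the example are assigned by hand. The helper names `tag_class` and `select_candidates` are hypothetical. The sketch is not the authors' implementation, and it omits the PLM-based ranking step entirely.

```python
import re
from typing import List, Tuple

# Assumption: the candidate pattern is "zero or more adjectives followed by one
# or more nouns" (J*N+), applied to Penn Treebank tags (JJ/JJR/JJS, NN/NNS/NNP/NNPS).
CANDIDATE_PATTERN = re.compile(r"J*N+")


def tag_class(tag: str) -> str:
    """Collapse a Penn Treebank tag into one symbol: J (adjective), N (noun), O (other)."""
    if tag.startswith("JJ"):
        return "J"
    if tag.startswith("NN"):
        return "N"
    return "O"


def select_candidates(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return lowercased token spans whose tag sequence matches J*N+ (hypothetical helper)."""
    # One symbol per token, so character offsets in `symbols` equal token indices.
    symbols = "".join(tag_class(tag) for _, tag in tagged)
    candidates: List[str] = []
    for match in CANDIDATE_PATTERN.finditer(symbols):
        words = [word for word, _ in tagged[match.start():match.end()]]
        phrase = " ".join(words).lower()
        if phrase not in candidates:  # keep the first occurrence, drop duplicates
            candidates.append(phrase)
    return candidates


# Hand-tagged example sentence (tags chosen for illustration, not produced by a tagger).
sentence = [
    ("We", "PRP"), ("propose", "VBP"), ("an", "DT"),
    ("unsupervised", "JJ"), ("keyphrase", "NN"), ("extraction", "NN"),
    ("method", "NN"), ("based", "VBN"), ("on", "IN"),
    ("pretrained", "JJ"), ("language", "NN"), ("models", "NNS"), (".", "."),
]
print(select_candidates(sentence))
# → ['unsupervised keyphrase extraction method', 'pretrained language models']
```

Collapsing each tag to a single character lets Python's `re` module perform the pattern matching while keeping string offsets aligned with token positions. A noun-phrase variant in the spirit of PatternRank_NP would take its candidates from a syntactic chunker instead of a fixed tag pattern.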
PatternRank: Leveraging Pretrained Language Models and Part of Speech for Unsupervised Keyphrase Extraction