A Methodology for Detecting Programming Languages in Stack Overflow Questions

Aman Swaraj, Sandeep Kumar

2022

Abstract

Stack Overflow (SO) is the pre-eminent source for knowledge sharing among developers. The Question-Answer (Q-A) site sees heavy traffic, with around 5000 questions posted every day. Given this volume, users are now required to provide at least one tag for each question so that it reaches the right audience. However, novice developers often tag their questions incorrectly, which leads to downvoting of the post and eventual loss of information. An automatic tag generation mechanism is therefore needed to associate posts with their respective programming languages. In this work, we present a rule-based approach for detecting programming languages in question titles. The rules match specific patterns in question titles and generate programming language tags. We then compare the tags generated by our proposed model with the pre-existing tags provided by Stack Overflow in the dataset. Our model predicts languages with an accuracy of 87%. Additionally, our model can detect multiple programming languages in a post and identify different versions of a language, such as Python 2.7 and Python 3. We further record interesting observations with respect to existing approaches.
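As a rough illustration of what rule-based matching on question titles could look like, the following minimal Python sketch maps a few hand-written regular expressions to tags and compares the result with a question's existing tags. The rule set, the tag names, the function names `detect_tags` and `accuracy`, and the hit criterion (at least one predicted tag appears among the existing tags) are all assumptions made for illustration; they are not taken from the paper and do not reproduce the authors' rules or evaluation.

```python
import re

# Hypothetical rules: each tag is paired with a pattern searched in the title.
# Version-specific rules come first so that versions are reported explicitly.
RULES = {
    "python-2.7": re.compile(r"\bpython\s*2\.7\b", re.IGNORECASE),
    "python-3.x": re.compile(r"\bpython\s*3(?:\.\d+)?\b", re.IGNORECASE),
    "python": re.compile(r"\bpython\b", re.IGNORECASE),
    "java": re.compile(r"\bjava\b", re.IGNORECASE),
    "javascript": re.compile(r"\b(?:javascript|js)\b", re.IGNORECASE),
    # \b does not work after '+' or '#', so use explicit lookarounds instead.
    "c++": re.compile(r"(?<!\w)c\+\+(?!\w)", re.IGNORECASE),
    "c#": re.compile(r"(?<!\w)c#(?!\w)", re.IGNORECASE),
}


def detect_tags(title):
    """Return every tag whose pattern matches the title, in rule order."""
    return [tag for tag, pattern in RULES.items() if pattern.search(title)]


def accuracy(samples):
    """Fraction of (title, existing_tags) pairs with at least one shared tag.

    Assumption: a prediction counts as correct when any generated tag
    appears among the question's existing tags.
    """
    if not samples:
        return 0.0
    hits = sum(1 for title, existing in samples
               if set(detect_tags(title)) & set(existing))
    return hits / len(samples)


if __name__ == "__main__":
    print(detect_tags("How to convert Python 2.7 code to Python 3?"))
    print(detect_tags("Calling C++ from Java via JNI"))
```

Because every rule is checked independently, a single title can yield several tags, which is one simple way to obtain the multi-language and version-aware behaviour the abstract describes.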


Paper Citation


in Harvard Style

Swaraj A. and Kumar S. (2022). A Methodology for Detecting Programming Languages in Stack Overflow Questions. In Proceedings of the 17th International Conference on Software Technologies - Volume 1: ICSOFT, ISBN 978-989-758-588-3, pages 478-483. DOI: 10.5220/0011310400003266


in Bibtex Style

@conference{icsoft22,
author={Aman Swaraj and Sandeep Kumar},
title={A Methodology for Detecting Programming Languages in Stack Overflow Questions},
booktitle={Proceedings of the 17th International Conference on Software Technologies - Volume 1: ICSOFT},
year={2022},
pages={478-483},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011310400003266},
isbn={978-989-758-588-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Conference on Software Technologies - Volume 1: ICSOFT
TI - A Methodology for Detecting Programming Languages in Stack Overflow Questions
SN - 978-989-758-588-3
AU - Swaraj A.
AU - Kumar S.
PY - 2022
SP - 478
EP - 483
DO - 10.5220/0011310400003266