An Empirical Study on Neophytes of Stack Overflow: How Welcoming
the Community is towards Them
Abdullah Al Jobair, Suzad Mohammad, Zahin Raidah Maisha, Md. Jubair Ibna Mostafa
and Md. Nazmul Haque
Department of Computer Science & Engineering,
Islamic University of Technology, Boardbazar, Gazipur, Bangladesh
Keywords:
Neophyte, Stack Overflow (SO), New User, Hostile Environment.
Abstract:
Stack Overflow (SO) is the most popular question and answer (Q&A) platform for programmers, with a rapidly expanding community of new users. However, its unwelcoming environment towards new users has been under discussion for several years and is a major obstacle to building a skilled community. In this work, we study a specific group of users who either registered within the last 45 days or have a reputation of at most 50, and we term them “neophytes”. We investigate whether neophytes actually face hurdles while collaborating on Stack Overflow and, if so, identify the reasons behind this phenomenon through qualitative and quantitative analysis. Our study finds that neophytes do face hurdles while collaborating on the platform. The reasons include harsh moderation of posts, neglect of posts, deletion or closing of posts, and downvoting without any proper explanation. Our findings can provide guidelines for creating a more user-friendly Stack Overflow community. Furthermore, this study can guide researchers in observing how neophytes react to adverse situations, and it recommends steps the community can take to improve the Stack Overflow environment.
1 INTRODUCTION
The exponential growth of the software development industry has led to communities in which developers help one another with their knowledge and experience. Q&A platforms arose from this need and eventually established communities for sharing knowledge, in which users share skills and techniques to solve different problems. Among all online software development Q&A platforms, Stack Overflow is the largest and the most renowned (May et al., 2019). Since its inception, a total of 16.5 million users have registered on the site, with an average of 3,370 new users registering and around 11,203 posts being made every day¹ (based on a query run in August 2021). Today's massive repository of 21 million questions and 31 million answers on Stack Overflow (Moutidis and Williams, 2021) is the result of the community's gradual growth since 2008.
¹ https://data.stackexchange.com/stackoverflow/query/1541382

The accessibility of this extensive dataset has given rise to a large body of research on the platform², covering the evolution of the community, posts and code snippets, as well as user behavior, user participation, mining SO and associated technologies, among other topics (Ahmed and Srivastava, 2017; Adaji and Vassileva, 2016). However, only a limited number of studies focus on the community's environment for new users.
With the community's swift expansion (Mamykina et al., 2011), Stack Overflow's environment has drawn significant attention. Any hostility in the community may dampen users' eagerness to participate, which undermines the lively atmosphere of the platform. Related studies indeed suggest that the environment is unwelcoming, especially to new users: less experienced users become frustrated by the opaque way in which questions are closed (Tóth et al., 2020).
Such experiences ultimately make the community feel hostile and unsupportive, mostly to new users. Abbas identified unanswered questions, negative feedback and deleted questions as the root of a massive discouraging effect on users (Abbas, 2019). According to Slag et al. (2015), 47% of users post only once and then disappear from the community. They also found that new users' posts are removed more often and go unanswered at a higher rate. Along with researchers, the SO community itself is concerned about this pressing issue³. The community's yearly site satisfaction survey³ identifies the unwelcoming environment as the most frustrating and unappealing factor for SO users. The following quote from the survey results reflects this situation:
“The toxic nature of the community ... Scares people from even signing up let alone asking questions”

² https://stackoverflow.blog/2009/06/04/stack-overflow-creative-commons-data-dump/

Al Jobair, A., Mohammad, S., Maisha, Z., Mostafa, M. and Haque, M.
An Empirical Study on Neophytes of Stack Overflow: How Welcoming the Community is towards Them.
DOI: 10.5220/0011081100003176
In Proceedings of the 17th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2022), pages 197-208
ISBN: 978-989-758-568-5; ISSN: 2184-4895
Copyright © 2022 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
In this study, we address this issue by verifying whether the unwelcoming nature of SO is a reality and, if so, investigating the probable reasons why new users encounter such a hostile environment. To achieve this goal, we distinguish these new users from the total user base and term them “neophytes”. To validate our conjectures and properly understand the neophytes' situation on SO, this study addresses the following research questions:
1. RQ-1: Do Neophytes Face Hurdles while Collaborating in Stack Overflow?
The allegation that the Stack Overflow environment is unwelcoming and hostile, especially to neophytes, is a persistent problem for the community. Our aim is to verify whether the problem actually exists. The affirmative answer to this research question led us to investigate the second research question.
2. RQ-2: What are the Potential Reasons for Neophytes Facing Hurdles while Collaborating in Stack Overflow?
There could be several reasons why neophytes face hurdles. Identifying them will help to understand the unwelcoming nature of the platform and provide insight into solving the problem.
Our study confirms that the SO environment is unwelcoming, especially to neophytes. In addition, we identify a number of reasons for this hostile environment, including posts being deleted or closed, duplicate questions or answers, and violations of community rules.
The rest of the paper is structured as follows. Section 2 discusses the motivation behind our work. Section 3 defines the term “neophyte” and describes their characteristics. Section 4 discusses related work on SO, specifically on its new users. Section 5 presents the methodology of this study, including data extraction and the qualitative and quantitative analysis procedures. Section 6 presents the results and analysis, answering RQ-1 and RQ-2, and then recommends some steps for the Stack Overflow community. Section 7 discusses the validity of the study, and Section 8 concludes the paper by outlining future work.

³ https://stackoverflow.blog/2020/01/22/the-loop-2-understanding-site-satisfaction-summer-2019/
2 MOTIVATION
The unwelcoming community has been a much-discussed issue from the very beginning. The ambiguous closure of posts (Tóth et al., 2020), negative feedback (Cheng et al., 2014) and offensive language (Cheriyan et al., 2020) make the platform hostile, particularly for new users. A steady stream of posts on well-known blog sites, official surveys, posts on Meta Stack Exchange⁴ and the official blogs of the Stack Overflow community itself all voice the persistence of the issue. A blog post on The Exception Catcher⁵ describes Stack Overflow as a difficult community to participate in, based on the frequent tendency to downvote posts.
Meta Stack Exchange is a Q&A site where users discuss the workings of SO. Each question there is assigned to one or more specific tags. The most upvoted post⁶ under the “new-user” tag urges the community to be supportive of new users. It has been viewed 45 thousand times and has received 1,830 upvotes (as of August 2021), which indicates broad agreement that the community is not welcoming enough for new users.
Jay Hanlon, the former Executive Vice-President (EVP) of Culture and Experience at SO, has also been vocal about the issue and called for prompt change⁷:
“Stack Overflow isn't very welcoming. It's time for that to change.”
Furthermore, a qualitative compilation of evidence by Slegers⁸ delivers a verdict on the hostile nature of SO. Its central point is that “Stack Overflow hates new users”.
The hostility of SO is not a recent problem; the situation has prevailed for a long time, and the community's developer surveys of 2019⁹ and 2020¹⁰ reflect no improvement. The 2019 survey (Figure 1) shows no progress in how welcoming the community is, as 73% of developers voted that the environment had remained the same as in the previous year, 2018. In the 2020 survey (Figure 2), the corresponding share was 70.6%.

⁴ https://meta.stackexchange.com/
⁵ https://theexceptioncatcher.com/blog/2012/09/stackoverflow-is-a-difficult-community-to-participate-in/
⁶ https://meta.stackexchange.com/questions/9953/
⁷ https://tinyurl.com/424h7w4j
⁸ https://hackernoon.com/the-decline-of-stack-overflow-7cb69faa575d
⁹ https://insights.stackoverflow.com/survey/2019
Figure 1: Developer Survey 2019.
Figure 2: Developer Survey 2020.
All of these studies, surveys, blogs and meta discussions substantiate the claim of a hostile environment, especially for new users. Moreover, the issue has persisted for years, yet little research has been documented on it, which encouraged us to study this concern.
3 DEFINING NEOPHYTE
Users at every level, from professionals to novices, contribute to Stack Overflow and make it a lively, dynamic site and the most used question-answer platform (May et al., 2019). To evaluate how new users contribute to the community and to analyze how welcoming the environment is towards them, our research concentrates on a fixed group of users whom we term “neophytes”.

¹⁰ https://insights.stackoverflow.com/survey/2020
According to our definition, neophytes are users who either registered on Stack Overflow within the last 45 days or have a reputation of less than or equal to 50. Users registered on SO within the last 45 days are included because Stack Overflow defines them as “new users”¹¹. Although the registration date is a reasonable indicator of users who have newly joined the platform, it says nothing about their contribution to the site. Thus, a reputation constraint is added to capture neophytes' contribution to the platform.
After careful analysis, we narrowed the candidate reputation boundaries down to 38 and 50. Slag et al. (2015) worked with a boundary of 38 because they found it to be the average reputation of medium-active users. However, a user with 38 reputation lacks the privilege of commenting, which is a vital feature¹². In contrast, Stack Overflow allows almost all basic operations, such as asking questions, answering, commenting and upvoting (downvoting is reserved for users with higher reputation), once a user reaches 50 reputation¹¹. Therefore, to ensure an impactful presence of users on SO, we chose 50 reputation over 38.
Consequently, a user who satisfies either of the two conditions is considered a neophyte. All other SO users are referred to as “regular users” throughout the paper.
Algorithm 1: Algorithm to find neophytes from the registered user pool.
1: procedure FindingNeophytes(reg_users)
2:   neophytes = []
3:   for each user in reg_users do
4:     if (user.reputation ≤ 50) or (user.registration_day ≤ 45) then
5:       neophytes.add(user)
6:     end if
7:   end for
8:   return neophytes
9: end procedure
Algorithm 1 separates neophytes from the pool of all registered users of Stack Overflow. It receives reg_users, the registered users of SO, as a parameter, and its output is the list of neophytes among them. In line 2, an empty list of neophytes is initialized. For each registered user, line 4 checks whether the user satisfies the 50-reputation boundary or registered within the last 45 days; a user fulfilling either constraint is added to the neophytes list.

¹¹ https://meta.stackexchange.com/questions/310881/
¹² https://stackoverflow.com/help/privileges
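Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the `User` record and its field names are hypothetical, and “registered within the last 45 days” is read as a creation date no earlier than 45 days before a chosen reference date, mirroring the `getdate() - 45` condition used in the SQL queries of this paper.

```python
from dataclasses import dataclass
from datetime import date, timedelta

REPUTATION_LIMIT = 50   # neophyte reputation bound (inclusive)
NEW_USER_DAYS = 45      # Stack Overflow's "new user" window

@dataclass
class User:
    # Hypothetical record; field names are illustrative only.
    user_id: int
    reputation: int
    creation_date: date

def find_neophytes(reg_users, today):
    """Return users with reputation <= 50 or registered within the last 45 days."""
    cutoff = today - timedelta(days=NEW_USER_DAYS)
    neophytes = []
    for user in reg_users:
        # Either condition alone is enough (the "or" in line 4 of Algorithm 1).
        if user.reputation <= REPUTATION_LIMIT or user.creation_date >= cutoff:
            neophytes.append(user)
    return neophytes

users = [
    User(1, 12, date(2015, 3, 1)),    # low reputation -> neophyte
    User(2, 900, date(2021, 8, 1)),   # registered recently -> neophyte
    User(3, 900, date(2015, 3, 1)),   # neither condition -> regular user
]
print([u.user_id for u in find_neophytes(users, today=date(2021, 8, 31))])  # [1, 2]
```

Passing the reference date explicitly, instead of reading the system clock, keeps the classification reproducible for a fixed snapshot of the data.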
According to our definition, 89.9%¹³ (14,897,718) of all users in the 2020 Stack Overflow data dump are neophytes. In this study, we focus on this specific group of users to investigate the attitude of the SO community towards them.
4 RELATED WORK
Many studies have been conducted since the SO dataset was made public. This research is diverse and spans numerous domains.
“Post” analysis is one of the richest domains, with research studies dating from the very beginning of Stack Overflow. According to Abric et al. (2019), duplicate posts are mostly posted by inexperienced users. As technology advances, the number of obsolete answers is increasing; Zhang et al. (2021a) addressed this issue and observed that some tags are prone to obsolete answers. The community is also indifferent to specific tags, leaving questions under those tags unanswered (Saha et al., 2013). A study on new users by Hart and Sarma (2014) rejected the claim that new users rely only on intrinsic factors (e.g., the answerer's reputation or the presentation of the answer) to identify answer quality. The reasons for closing a post can be categorized into five groups (Tóth et al., 2020). However, the ambiguous closure of posts is a concerning issue that frustrates and hurts users, especially newcomers, and makes the environment feel hostile to them. These studies provide vital information, such as the impact of question closure and new users' perspective on judging quality, but they do not examine how new users' posts are received by the community.
Analysis of “Comments” is another vital aspect of understanding the environment and culture of a platform. Studies that categorize comments (Sengupta and Haythornthwaite, 2020) indicate how comments help users learn and improve their skills. A recent study investigated how SO manages comments (Zhang et al., 2021b). Analyzing comments can also provide insights into how hospitable Stack Overflow is across genders (Brooke, 2019). A study on norm violations in SO shows that its comments can be offensive and unwelcoming, presenting a taxonomy of the norms that are violated (Cheriyan et al., 2020). Unfortunately, the domain still requires research on comments, especially addressing the situation of new users.

¹³ https://data.stackexchange.com/stackoverflow/query/1384160/
Various studies on user badges, reputation and participation have enriched the SO “User” domain since the site's establishment. Yanovsky et al. (2021) discussed the association of user contribution and behavior with the achievement of badges. A much-needed contribution for new users is the research of Bosu et al. (2013), which provides guidance to new users on increasing their reputation quickly. A study relating the habits of users with high and low reputation (Movshovitz-Attias et al., 2013) shows that users with extremely high reputation are the dominant source of answers, particularly high-quality ones, whereas users with low reputation ask the bulk of the questions on the site without answering any. Another study investigated users' reputation and contribution against the completeness of their profiles (Adaji and Vassileva, 2016), observing that users with complete profiles have relatively high reputation and post high-quality content.
A number of studies express concern about the environment of Stack Overflow. A study on detecting and classifying offensive language characterizes SO as unwelcoming because of the offensive language used on it (Cheriyan et al., 2021). In an earlier study, Slag et al. (2015) investigated a group of users labelled “one-day flies”, i.e., users who never returned after posting once, and examined why one-day flies do not contribute to the site more than once. Although they rejected the allegations that new users (i) frequently post duplicate questions, (ii) post on uncommon tags and (iii) receive fewer views, they found that new users' posts are frequently removed and remain unanswered. A subsequent study on one-day flies (Abbas, 2019) discusses several factors that contribute to the major issue of inactive users on SO, using a comprehensive literature review. An investigation of “Slashdot” (a news and discussion site) finds that it has established a distributed moderation mechanism to provide feedback on the merit of its posts (Lampe and Johnston, 2005). This research examines three theories of how new users learn to participate in a digital community: transfer of learning from past experiences, observation of other members, and feedback from other members. Another investigation of four large comment-based news communities shows that negative feedback causes major behavioral changes that are harmful to the community (Cheng et al., 2014).
While all of the studies mentioned above address significant aspects of user reputation, badges, participation and the community environment on Stack Overflow, to the best of our knowledge, few studies are dedicated to the environment neophytes face and how they feel about the platform. In our empirical study, we investigate the platform's environment for neophytes. We hope it will be a stepping stone towards building a friendly, skilled community and enhancing the quality of the huge knowledge base that Stack Overflow aims to be.
5 METHODOLOGY
An overview of our research methodology is presented in Figure 3. First, we extract the neophytes' and regular users' data from the official SO data dump, and then we perform qualitative and quantitative analyses to answer RQ-1 and RQ-2. The qualitative analysis considers posts, blogs and surveys, along with a manual analysis, and is followed by a query-based statistical (quantitative) analysis. The following subsections describe our method in further detail.
Data Extraction
For the analysis, we initially used the June 2020 database of Stack Overflow provided by the Stack Exchange Archive¹⁴. However, we had to choose an alternative data source because some tables crucial to our analysis were missing from the data dump (e.g., CloseAsOffTopicReasonTypes, CloseReasonTypes, FlagTypes, PendingFlags, SuggestedEdits and PostFeedback). These tables are concrete indicators of the difficulty neophytes face in the community, and without them the study would lose much of its weight.

As an alternative, we used the “Stack Exchange Data Explorer”, an online platform that allows SQL queries to be executed on public data from the Stack Exchange Network¹⁵. We accessed the data in August 2021, at the time of our analysis. An overview of our dataset is presented in Table 1.
¹⁴ https://archive.org/download/stackexchange
¹⁵ https://data.stackexchange.com/stackoverflow/query/new

Table 1: An overview of the dataset.
Dataset
Total posts          4,456,062
Neophytes' posts     1,161,701
Data duration        1st January - 31st December, 2020

To capture the status quo, a timeframe from 1st January 2020 to 31st December 2020 was selected for the queries. Because the 2021 data were not complete at the time of the analysis, they were omitted to avoid anomalies. The one-year timeframe provided a total of 4,456,062 posts, of which 1,161,701 were posted by 619,171 neophytes, i.e., 1.88 posts per neophyte on average. The remaining 3,294,361 posts were posted by 458,745 regular users, i.e., 7.18 posts per regular user on average. As most related works used data spanning 6 months to 1 year, we consider this amount of data sufficient. Python was used for our statistical analysis, while the data themselves were queried with SQL Server, since Stack Overflow stores its data dumps in SQL Server databases.
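The per-user averages and the neophytes' share of posts follow directly from the counts in this paragraph. The short Python check below (variable names are ours) reproduces them and confirms that the neophytes' and regular users' posts add up to the total; it is plain arithmetic, not part of the original analysis pipeline.

```python
# Post and user counts for 2020, as reported in this section.
total_posts = 4_456_062
neophyte_posts, neophytes = 1_161_701, 619_171
regular_posts, regular_users = 3_294_361, 458_745

# The two user groups partition all posts.
assert neophyte_posts + regular_posts == total_posts

avg_neophyte = neophyte_posts / neophytes       # posts per neophyte
avg_regular = regular_posts / regular_users     # posts per regular user
share = 100 * neophyte_posts / total_posts      # neophytes' share of all posts
print(f"{avg_neophyte:.2f} {avg_regular:.2f} {share:.2f}%")  # 1.88 7.18 26.07%
```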
These data are used to answer the research questions introduced in the “Introduction” section. In the following parts, we explain separately the analyses and approaches used to answer each research question.
RQ-1: Do Neophytes Face Hurdles while Collaborating in Stack Overflow?
The first step of our analysis is to confirm the conjecture that neophytes face hurdles while collaborating on SO, and that this issue concerns a significant portion of SO users rather than just a few of them. Both qualitative and quantitative analyses are performed to answer this research question.
Qualitative Analysis
A qualitative analysis has been conducted to examine whether neophytes actually face hurdles on Stack Overflow. Meta Stack Exchange posts under the “new user” tag, blogs and surveys by the Stack Overflow community⁷,³ and well-known blog sites⁵,⁸ contain evidence of, and insights into, the unwelcoming environment of Stack Overflow. As part of the investigation, we analyze these documents to understand some of the factors that contribute to the hostile environment. In addition, a manual analysis of neophytes' posts was conducted to strengthen our qualitative study. The outcome of the manual analysis, presented in “Result of RQ-1”, gives us an initial impression of the problem and validates the research process. Our intuition, informed by the resources mentioned above, indicates that a real problem exists between regular users and neophytes.
Figure 3: An overview of methodology.

To prevent bias, 300 neophytes who registered in 2020 were randomly selected for our manual analysis, which was performed on their 968 posts. The intention is to find out how frequently neophytes face unwelcoming situations while collaborating, in order to examine the claim that they face hurdles.

First, the neophyte reputation range (up to 50 points) was divided into 5 classes of 10 reputation points each, based on their lower and upper bounds (0-10, 11-20, 21-30, 31-40 and 41-50 reputation). Then, to avoid bias, we randomly picked 60 users from each class, which resulted in a set of 300 neophytes. The manual analysis provides convincing statistics on neophytes facing hurdles on SO, which are presented in the results for RQ-1.
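The stratified sampling step described above can be sketched as follows. This is an illustration under assumptions, not the authors' script: the candidate pool is a hypothetical list of (user_id, reputation) pairs, and Python's `random.sample` stands in for whatever random selection procedure was actually used.

```python
import random

# Reputation classes used for stratification (inclusive bounds).
CLASSES = [(0, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
PER_CLASS = 60

def stratified_sample(candidates, per_class=PER_CLASS, seed=None):
    """Pick `per_class` users uniformly at random from each reputation class.

    `candidates` is a list of (user_id, reputation) pairs (hypothetical format).
    """
    rng = random.Random(seed)  # seeded generator for a reproducible draw
    sample = []
    for low, high in CLASSES:
        stratum = [c for c in candidates if low <= c[1] <= high]
        if len(stratum) < per_class:
            raise ValueError(f"class {low}-{high} has only {len(stratum)} users")
        sample.extend(rng.sample(stratum, per_class))  # without replacement
    return sample

# Synthetic pool: 200 users for every reputation value 0..50.
pool = [(uid, uid % 51) for uid in range(51 * 200)]
chosen = stratified_sample(pool, seed=42)
print(len(chosen))  # 300
```

Sampling a fixed number per class, rather than from the whole pool at once, keeps the low-reputation classes (which dominate the neophyte population) from crowding out users closer to the 50-point boundary.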
Quantitative Analysis
To answer RQ-1, a query-based quantitative analysis is performed on the “Stack Exchange Data Explorer”. We compare the number of neophytes with the total number of users, and the neophytes' posts with the total number of posts. We also investigate how many of the neophytes' posts are downvoted, and whether neophytes already face hurdles with their first posts, i.e., whether their first posts are among the downvoted ones.
RQ-2: What are the Potential Reasons for Neophytes Facing Hurdles while Collaborating in Stack Overflow?
To answer this research question, a qualitative analysis is performed, followed by a quantitative analysis. The qualitative analysis identifies probable reasons, and the quantitative analysis validates them.
Qualitative Analysis
To investigate the reasons, a qualitative analysis is performed first, on the same 968 posts of the 300 neophytes collected earlier. This time, in addition to further analysis of the posts, the neophytes' profiles are also considered. The profile-based analysis helps us understand how their activity evolves over time and recognize the reasons for their hurdles. For each neophyte, we investigate the total number of posts, the number of days between their first and last post, and the number of days between their most downvoted post and the post immediately after it. The goal is to inspect the neophytes' activity after they face unwelcoming situations. We also investigate their badges and overall progress, noting our observations on both their posts and their profiles. From this qualitative analysis, several potential reasons for neophytes facing hurdles on Stack Overflow are identified.
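The profile-level measures listed above can be computed from a user's post history. The sketch below is a hypothetical helper, not the authors' code: it assumes each neophyte's history is a list of (creation_date, score) pairs, and it approximates the “most downvoted” post by the lowest-scored one, which is our simplifying assumption.

```python
from datetime import date

def profile_metrics(posts):
    """Summarize one user's activity from (creation_date, score) pairs.

    Returns the post count, the days between the first and last post, and the
    days from the lowest-scored post to the next post (None if none followed).
    """
    if not posts:
        raise ValueError("user has no posts")
    posts = sorted(posts)  # chronological order
    dates = [d for d, _ in posts]
    span_days = (dates[-1] - dates[0]).days
    # Assumption: "most downvoted" approximated by the lowest score.
    worst = min(range(len(posts)), key=lambda i: posts[i][1])
    if worst + 1 < len(posts):
        gap_after_worst = (dates[worst + 1] - dates[worst]).days
    else:
        gap_after_worst = None  # the user never posted again
    return len(posts), span_days, gap_after_worst

history = [(date(2020, 2, 1), 1), (date(2020, 3, 10), -4), (date(2020, 6, 18), 0)]
print(profile_metrics(history))  # (3, 138, 100)
```

A `None` gap marks users who stopped posting after their worst-received post, which is the pattern the profile analysis looks for.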
Quantitative Analysis
To investigate and verify the obtained list of reasons in detail, a query-based quantitative analysis is conducted. As mentioned in the “Data Extraction” section, Stack Overflow data from 2020 are used in this quantitative analysis. We formulated numerous questions as queries and executed them on the Stack Exchange Data Explorer, Stack Overflow's online query site. The query results confirmed the findings of our manual analysis.
6 RESULT ANALYSIS
The answers to these research questions give a clear picture of the issue and can prompt the community to consider effective, long-term steps towards a resolution.
Result of RQ-1
According to the qualitative analysis, of the 968 neophyte posts, 254 are negatively scored, and 123 of these have no explanation or apparent cause for the negative score. In addition, 47 posts are duplicates, 64 have been closed and, surprisingly, 110 posts received no response at all: they have no comments, no answers and a score of 0. The outcome of the analysis is depicted in Figure 4.
Figure 4: Post-based manual analysis.
The neophytes' posts that face difficulties (posts being closed, posts marked as duplicate, posts with no response and negatively scored posts) amount to about 49% of the analyzed posts. Because the 123 posts with no explanation or apparent cause for their negative score are already included in the 254 negatively scored posts, they are not counted again in this 49%. Almost half of the posts of our randomly selected neophytes thus indicate that neophytes are being neglected, which makes the community unwelcoming towards them.
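The 49% figure can be checked with a few lines of Python. The sketch adds the four difficulty categories directly, as the text does, and leaves out the 123 unexplained negative scores because they are a subset of the 254 negatively scored posts.

```python
# Manual-analysis counts for the 968 sampled neophyte posts.
analysed = 968
negative, closed, duplicate, no_response = 254, 64, 47, 110
unexplained_negative = 123  # subset of `negative`, so not added again

assert unexplained_negative <= negative
troubled = negative + closed + duplicate + no_response
print(troubled, f"{100 * troubled / analysed:.2f}%")  # 475 49.07%
```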
According to our quantitative analysis, neophytes made 26.07% of all posts on the platform in 2020, a significant portion of the total. Of these 1,161,701 posts, 108,568 received a negative score, i.e., 9.35% of the neophytes' posts. This percentage may seem small and unremarkable, but the same ratio for regular users is only 3%, which clearly shows the difference between neophytes and regular users in how often their posts are negatively scored.
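The regular-user ratio is not listed as a separate count; it can be derived by subtracting the neophytes' figures from the 2020 totals (4,456,062 posts in Table 1, of which 207,508 are negatively scored according to Table 2). A small Python sketch, with a helper name of our choosing:

```python
def rate(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole

total_posts, total_negative = 4_456_062, 207_508
neo_posts, neo_negative = 1_161_701, 108_568

# Regular users' figures are the totals minus the neophytes' share.
reg_posts = total_posts - neo_posts
reg_negative = total_negative - neo_negative

print(f"neophytes: {rate(neo_negative, neo_posts):.2f}%")  # neophytes: 9.35%
print(f"regular:   {rate(reg_negative, reg_posts):.2f}%")  # regular:   3.00%
```

Put differently, a neophyte's post in 2020 was roughly three times as likely to end up with a negative score as a regular user's post.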
The statistics gathered from the qualitative and quantitative analyses lead us to conclude that neophytes face hurdles on SO.
Key Findings of RQ-1:
According to the qualitative analysis, about 49% of the 968 collected neophyte posts face difficulties such as being marked as duplicate, closed or negatively scored. The quantitative analysis shows that 9.35% of all neophytes' posts are negatively scored, compared with only 3% for regular users. The analysis clearly shows that neophytes face hurdles while collaborating on Stack Overflow.
Result of RQ-2
In the manual analysis of the 300 neophytes' profiles, depicted in Figure 5, a total of 77 neophytes obtained the “Informed” badge, which means that only 25.67% of them read the entire tour page to learn how Stack Overflow works.
Figure 5: Profile based manual analysis.
A total of 41 neophytes, i.e., 13.67% of our collected dataset, did not post again after their posts were negatively scored. Moreover, 62 neophytes (20.67% of the 300) posted only once, which makes them “one-day flies” (Slag et al., 2015). Of those 62 neophytes, 44 received a score less than or equal to 0 on their post.
From the qualitative analysis, 9 potential reasons have been identified, each of which contributes to neophytes facing hurdles on Stack Overflow. The reasons are:
Posts being closed
Posts marked as duplicate
Not mentioning any reason for posts being nega-
tively scored
No response to posts
Unaware of Stack Overflow rules and culture
Deletion of posts
Moderation without proper reasoning
Rude comments
Steep learning curve
The statistics from our query-based quantitative analysis of the 2020 Stack Overflow data clearly show the presence of these reasons. Among the reasons introduced above, four (posts being closed, posts marked as duplicate, no response to posts, and not mentioning any reason for posts being negatively scored) have been validated by the quantitative study.
As mentioned in the subsection “RQ-2: What are the Potential Reasons for Neophytes Facing Hurdles while Collaborating in Stack Overflow?” of the “Methodology” section, a number of queries were formulated and executed for the quantitative analysis of this study. The queries in Listing 1 and Listing 2 are two of them.
select count(p.Id)
from Posts p
inner join Users u
  on p.OwnerUserId = u.Id
inner join PostLinks pl
  on pl.PostId = p.Id
where (u.Reputation <= 50 or u.CreationDate >= getdate() - 45)
  and (p.CreationDate between datefromparts(2020, 01, 01) and datefromparts(2020, 12, 31))
  and (pl.LinkTypeId = 3)
Listing 1: Query to find posts marked as duplicate.
select count(p.Id)
from Posts p
inner join Users u
  on u.Id = p.OwnerUserId
left outer join PendingFlags pf
  on pf.PostId = p.Id
left outer join SuggestedEdits se
  on se.PostId = p.Id
where (u.Reputation <= 50 or u.CreationDate >= getdate() - 45)
  and (p.CreationDate between datefromparts(2020, 01, 01) and datefromparts(2020, 12, 31))
  and (p.CommentCount = 0 and p.Score < 0 and p.ClosedDate is null
       and pf.PostId is null and se.PostId is null)
Listing 2: Query to find posts with no reason for being negatively scored.
The SQL in Listing 1 is a query that finds the number of neophytes’ posts marked as duplicate. If a post has a PostLinks entry with LinkTypeId equal to 3, it is counted as a duplicate of an earlier, very similar post. The SQL in Listing 2 is a query that finds the number of posts that received a negative score without any explanation or visible cause. A post with a negative score that has no comment, no pending flag, and no suggested edit, and that has not been closed, is considered negatively scored without reasoning. Because every closed post has a closing reason associated with it, the query also excludes closed posts through the ClosedDate constraint.
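To illustrate how Listing 2 isolates unexplained downvotes, the following minimal Python sketch replays its logic on a toy in-memory SQLite database using the standard `sqlite3` module. The schema is a hypothetical reduction of the Stack Exchange Data Explorer tables to the columns the query touches, the rows are invented, and the T-SQL functions `GETDATE()` and `DATEFROMPARTS()` are replaced by an explicit `as_of` date and ISO date strings, since SQLite does not provide them. The key device is the anti-join: a `LEFT OUTER JOIN` followed by `IS NULL` keeps only posts for which no flag or suggested-edit row exists.

```python
import sqlite3

# Hypothetical, reduced schema: only the columns that Listing 2 touches.
SCHEMA_AND_TOY_ROWS = """
CREATE TABLE Users (Id INTEGER PRIMARY KEY, Reputation INTEGER, CreationDate TEXT);
CREATE TABLE Posts (Id INTEGER PRIMARY KEY, OwnerUserId INTEGER, CreationDate TEXT,
                    Score INTEGER, CommentCount INTEGER, ClosedDate TEXT);
CREATE TABLE PendingFlags (Id INTEGER PRIMARY KEY, PostId INTEGER);
CREATE TABLE SuggestedEdits (Id INTEGER PRIMARY KEY, PostId INTEGER);

INSERT INTO Users VALUES (1, 10, '2019-03-01'),    -- neophyte (reputation <= 50)
                         (2, 5000, '2015-01-01');  -- regular user
INSERT INTO Posts VALUES
    (100, 1, '2020-05-01', -2, 0, NULL),           -- counted: no visible cause
    (101, 1, '2020-06-01', -1, 3, NULL),           -- excluded: has comments
    (102, 1, '2020-07-01', -4, 0, '2020-07-02'),   -- excluded: closed
    (103, 1, '2020-08-01', -1, 0, NULL),           -- excluded: pending flag
    (104, 1, '2020-09-01', -3, 0, NULL),           -- excluded: suggested edit
    (105, 2, '2020-10-01', -5, 0, NULL),           -- excluded: owner not a neophyte
    (106, 1, '2021-01-05', -2, 0, NULL);           -- excluded: outside 2020
INSERT INTO PendingFlags VALUES (1, 103);
INSERT INTO SuggestedEdits VALUES (1, 104);
"""

# Listing 2 translated to SQLite: GETDATE() - 45 becomes date(?, '-45 days'),
# and DATEFROMPARTS(...) becomes ISO date strings.
UNEXPLAINED_DOWNVOTES = """
SELECT COUNT(p.Id)
FROM Posts p
INNER JOIN Users u ON u.Id = p.OwnerUserId
LEFT OUTER JOIN PendingFlags pf ON pf.PostId = p.Id
LEFT OUTER JOIN SuggestedEdits se ON se.PostId = p.Id
WHERE (u.Reputation <= 50 OR u.CreationDate >= date(?, '-45 days'))
  AND p.CreationDate BETWEEN '2020-01-01' AND '2020-12-31'
  AND p.CommentCount = 0 AND p.Score < 0 AND p.ClosedDate IS NULL
  AND pf.PostId IS NULL AND se.PostId IS NULL
"""


def build_toy_db() -> sqlite3.Connection:
    """Create the toy database in memory and load the invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_TOY_ROWS)
    return conn


def count_unexplained_downvotes(conn: sqlite3.Connection, as_of: str) -> int:
    """Count neophyte posts from 2020 that are downvoted with no visible cause."""
    # A post survives the anti-join only if no PendingFlags/SuggestedEdits row matched.
    return conn.execute(UNEXPLAINED_DOWNVOTES, (as_of,)).fetchone()[0]


if __name__ == "__main__":
    print(count_unexplained_downvotes(build_toy_db(), "2021-06-01"))  # 1
```

Only post 100 passes every filter in this toy data; each other row is excluded by exactly one condition, as noted in its comment.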
However, some reasons (Deletion of posts, Moderation without proper reasoning, Rude comments, Steep learning curve) could not be analyzed with our queries because the necessary data are lacking: Stack Overflow does not make these data publicly available. This inaccessibility prevents us from validating these reasons quantitatively.
Table 2: Comparison of total posts and neophytes’ posts.
Unwelcoming reason | Total posts | Neophytes’ posts
Posts being closed | 104,461 | 52,761 (50.5%)
Posts marked as duplicate | 78,652 | 38,508 (48.96%)
Negatively scored posts | 207,508 | 108,568 (52.32%)
No reason mentioned for posts being negatively scored | 56,717 | 25,421 (44.8%)
Posts that got no response at all | 892,557 | 212,457 (18.29%*)
* Relative to all 1,161,701 neophyte posts in 2020; the other percentages are relative to the total posts in the same row.
ENASE 2022 - 17th International Conference on Evaluation of Novel Approaches to Software Engineering
Table 2 compares the total number of posts with the number of neophytes’ posts for each reason, while Figure 6 depicts the statistics of several reasons from the quantitative analysis.
Figure 6: Quantitative analysis.
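Every percentage in Table 2 and Figure 6 is a ratio of reported counts, so the figures can be cross-checked directly. The Python sketch below recomputes them from the counts stated in this section (using the Table 2 value 25,421 for the unexplained-downvote row). The helper names are ours, and `REGULAR_CLOSED_RATE` is the 1.57% reported for regular users’ posts.

```python
# Counts reported in Table 2 for the 2020 data dump: (total posts, neophytes' posts).
TABLE_2 = {
    "closed": (104_461, 52_761),
    "duplicate": (78_652, 38_508),
    "negative_score": (207_508, 108_568),
    "negative_no_reason": (56_717, 25_421),
    "no_response": (892_557, 212_457),
}
NEOPHYTE_POSTS_2020 = 1_161_701  # all neophyte posts in 2020, as stated in the text
REGULAR_CLOSED_RATE = 1.57       # % of regular users' posts that are closed (Figure 6)


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


def neophyte_share(reason: str) -> float:
    """Share of all affected posts that belong to neophytes (Table 2 style)."""
    total, neophyte = TABLE_2[reason]
    return pct(neophyte, total)


def neophyte_rate(reason: str) -> float:
    """Share of all neophyte posts affected by `reason` (Figure 6 style)."""
    _, neophyte = TABLE_2[reason]
    return pct(neophyte, NEOPHYTE_POSTS_2020)


if __name__ == "__main__":
    for reason in TABLE_2:
        print(f"{reason:>20}: share {neophyte_share(reason):6.2f}%  "
              f"rate {neophyte_rate(reason):5.2f}%")
    # How many times more often neophyte posts are closed than regular users' posts.
    print(round(neophyte_rate("closed") / REGULAR_CLOSED_RATE, 2))  # 2.89
```

The two helpers use different denominators. For the no-response row, the neophyte share of all unanswered posts is about 23.8%, while 18.29% is the fraction of all neophyte posts that got no response. The last line shows that neophytes’ posts are closed about 2.9 times as often as regular users’ posts.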
Posts Being Closed
Queries on the 2020 data dump show that a total of 104,461 posts were closed in 2020. Of these closed posts, 52,761 (50.5%), roughly half, belong to neophytes, as presented in Table 2. Figure 6 shows that 4.54% of all neophyte posts are closed, compared with only 1.57% of regular users’ posts. Although a closure rate of a few percent may seem modest given the huge number of posts, neophytes’ posts are closed almost three times as often as those of regular users. Such closures discourage neophytes, who as a result lose their enthusiasm and interest in contributing further to the site.
Posts Marked as Duplicate
Table 2 shows that, in 2020, a total of 78,652 posts were marked as duplicate, of which 38,508 (48.96%) belong to neophytes. According to Figure 6, 3.31% of all neophyte posts are marked as duplicate, whereas the percentage declines to 1.3% for regular users.
Duplicate posts generally receive negative feedback from the community. However, Abric et al. (2019) show that duplicate questions and their answers can contain unique information that benefits the asker. Even when a question is marked as a duplicate, the original question may not serve the asker’s purpose. This frustrates the neophyte, who gets no help and faces harsh moderation on top of it.
Not Mentioning Any Reason for Posts Being Negatively Scored
Table 2 shows that, in the 2020 data dump, 56,717 posts received a negative score with no visible reason (comment, suggested edit, or flag) to justify the downvote. Of these, 25,412 posts were posted by neophytes, which is 44.8% of the 56,717 posts. Relative to the total number of neophyte posts in 2020 (1,161,701 posts), this amounts to 2.19%, as depicted in Figure 6. Although this share seems small, such behavior strongly demotivates neophytes from contributing further to the site. Downvoting is certainly one of the mechanisms that help maintain the quality of the platform, but if it is done without explaining what went wrong with the post, it fails to serve that purpose.
Posts Got No Response at All
18.29% of neophytes’ posts (212,457 of the 1,161,701 neophyte posts in 2020) remained completely without response, as presented in Figure 6 and Table 2. These posts were neither closed nor answered, and they did not receive any comment, suggested edit, or flag. Of the 209,025 unique neophytes whose posts got no response, 112,486 (53.81%) did not post further. This alarming percentage hints at how this culture affects neophytes.
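The retention figure above is a per-user comparison: of the 209,025 neophytes with an unanswered post, 112,486 have no later post, and 112,486 / 209,025 ≈ 53.81%. The Python sketch below shows one way to compute such a share from post records. The `Post` record and its `unanswered` flag are hypothetical simplifications (in practice the flag would be derived from the answer, comment, suggested-edit, flag, and closure data described above), and reading “did not post further” as having no post after the user’s first unanswered post is our assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    owner_id: int
    created: datetime
    unanswered: bool  # hypothetical flag: no answer, comment, edit, flag or closure


def dropout_after_no_response(posts: list[Post]) -> tuple[int, int, float]:
    """Return (users with an unanswered post, users who never posted again, percentage)."""
    first_unanswered: dict[int, datetime] = {}
    last_post: dict[int, datetime] = {}
    for post in posts:
        # Track each user's latest post overall.
        if post.owner_id not in last_post or post.created > last_post[post.owner_id]:
            last_post[post.owner_id] = post.created
        # Track each user's earliest unanswered post.
        if post.unanswered:
            current = first_unanswered.get(post.owner_id)
            if current is None or post.created < current:
                first_unanswered[post.owner_id] = post.created
    affected = len(first_unanswered)
    # A user "did not post further" if their latest post is the first unanswered one.
    silent = sum(1 for uid, t in first_unanswered.items() if last_post[uid] == t)
    share = round(100 * silent / affected, 2) if affected else 0.0
    return affected, silent, share


if __name__ == "__main__":
    posts = [
        Post(1, datetime(2020, 3, 1), True),   # user 1: unanswered, then silent
        Post(2, datetime(2020, 4, 1), True),   # user 2: unanswered ...
        Post(2, datetime(2020, 5, 1), False),  # ... but posted again later
        Post(3, datetime(2020, 6, 1), False),  # user 3: never unanswered
    ]
    print(dropout_after_no_response(posts))  # (2, 1, 50.0)
```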
Unaware of Stack Overflow Rules and Culture
Neophytes often post irrelevant answers, solutions with security vulnerabilities, and opinion-based questions, ask others to debug their code, and otherwise violate Stack Overflow rules. These problems stem from being unaware of SO rules and culture. Neophytes are often not familiar with the conventions of Stack Overflow, which leads to miscommunication between neophytes and regular users. A significant number of neophytes, 2,174,619 (15.15%), do not go through the SO tour page and therefore lack the “Informed” badge. From the regular users’ perspective, this hampers the integrity of SO, as the site gets flooded with repetitive and unnecessary posts. However, regular users’ responses to such posts often discourage neophytes from engaging in any further discussions.
Deletion of Posts
According to Slag et al. (2015), posts by one-day flies account for 15.4% of all post deletions. The study also discusses how the post deletion system can contribute to reduced participation of one-day flies. Abbas (2019) discussed “Deleted Questions” as one of the significant factors for people not participating in SO.
As Stack Overflow keeps all information related to the deletion of posts private¹⁶, a quantitative analysis of deleted posts is practically impossible. However, an idea of post deletion can be gained by counting the neophytes who earned the “Peer Pressure” badge, which is awarded when users delete their own post with a score of -3 or lower. Our quantitative analysis shows that a total of 153,515 neophytes earned the “Peer Pressure” badge in 2020.
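Counting badge holders, as done for the “Peer Pressure” proxy, reduces to collecting distinct user IDs. The Python sketch below assumes a hypothetical, simplified form of the public Badges data as (user ID, badge name, award date) tuples plus a precomputed set of neophyte IDs. Because the badge is awarded only for self-deleting a post scored -3 or lower, such a count captures only part of all deletions.

```python
from datetime import date


def count_badge_holders(badges, neophyte_ids, badge_name="Peer Pressure", year=2020):
    """Count distinct neophytes who were awarded `badge_name` during `year`.

    `badges` is an iterable of (user_id, badge_name, award_date) tuples, a
    hypothetical simplification of the Badges table; `neophyte_ids` is a set.
    """
    holders = {
        user_id
        for user_id, name, awarded in badges
        if name == badge_name and awarded.year == year and user_id in neophyte_ids
    }
    # A set counts each user once, even if several matching rows exist.
    return len(holders)


if __name__ == "__main__":
    badges = [
        (1, "Peer Pressure", date(2020, 2, 3)),
        (1, "Peer Pressure", date(2020, 9, 9)),    # same user: counted once
        (2, "Peer Pressure", date(2019, 12, 30)),  # awarded outside 2020
        (3, "Informed", date(2020, 1, 1)),         # a different badge
        (4, "Peer Pressure", date(2020, 5, 5)),    # user 4 is not a neophyte
    ]
    print(count_badge_holders(badges, neophyte_ids={1, 2, 3}))  # 1
```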
Moderation without Proper Reasoning
On SO, users get responses within a very short period of time, typically within 21 minutes¹⁷. Moderation on Stack Overflow is equally fast: questions can face negative responses, closure, or deletion in less than ten minutes¹⁷. Such rapid negative moderation can easily frustrate users, which makes it one of the vital factors that hinder communication between regular users and neophytes.
Rude Comments
Rude comments are flagged and deleted quickly, but even then, users often end up reading the rude comments directed at them. This leaves neophytes, who are not yet accustomed to the culture of Stack Overflow, feeling frustrated and unwelcome. During our analysis of individual users’ profiles, we found several cases indicating that a neophyte stopped posting after receiving negative responses to their posts. Rude comments dissatisfy neophytes and can lead them to leave the community.
Steep Learning Curve
Stack Overflow differs from most question and answer platforms because it aims to create an effective knowledge base for developers. To maintain this effectiveness, participating in SO involves a steep learning curve: understanding the purpose of SO and participating properly in the community takes time. In the meantime, neophytes are flooded with downvotes, closures, deletions, and many other forms of negative response.
16 https://stackoverflow.com/questions/56770820/
17 https://meta.stackexchange.com/questions/61301/
Key Findings of RQ-2:
Potential reasons for neophytes facing hurdles on Stack Overflow are: posts being closed, posts marked as duplicate, no reason mentioned for posts being negatively scored, no response to posts, unawareness of Stack Overflow rules and culture, deletion of posts, moderation without proper reasoning, rude comments, and a steep learning curve.
Recommendation
The qualitative and quantitative analyses make it evident that collaboration and initiatives from both neophytes and Stack Overflow are necessary to improve the environment of SO. We recommend the following steps for the Stack Overflow community.
For closed posts, we recommend that SO use an automated pre-submission prediction tool that predicts whether a post will be closed before it is published. The tool would also predict the likely closing reasons, offer corresponding suggestions, and notify the user. As a result, users can recognize the flaws in their posts and act on the suggestions, which should also reduce the number of closed posts on SO.
For posts downvoted without any stated reason, SO should require moderators and privileged users to give proper reasons for downvotes. Such reasons can help users identify and rectify the flaws in their posts.
Rude comments need to be detected before they are published, that is, comments should pass through SO moderation before being posted. This moderation can be performed proactively by an automated tool. It would keep rude language out of users’ sight and ultimately reduce the level of hostility.
For posts that get no response at all, SO should take steps to assess post quality and encourage privileged users to review them. Moreover, an automated tool could be designed to route such posts to more suitable users. Post owners would then be notified of the reviews so that these can guide them positively.
Neophytes should be more careful with their posts. They should follow the rules and regulations of Stack Overflow and accustom themselves to its culture. An assessment should ensure that users are acquainted with the rules.
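The routing tool suggested above could be prototyped in many ways. As one purely hypothetical option, the Python sketch below ranks candidate answerers by the Jaccard similarity between a post’s tags and the tags each user has previously answered. The user profiles, the `top_k` parameter, and the scoring rule are our assumptions and do not describe an existing Stack Overflow feature.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def route_post(post_tags: set[str], answerer_tags: dict[str, set[str]],
               top_k: int = 2) -> list[str]:
    """Return up to `top_k` users whose answered tags best match the post's tags."""
    scored = [
        (jaccard(post_tags, tags), user)
        for user, tags in answerer_tags.items()
        if post_tags & tags  # skip users with no tag overlap at all
    ]
    # Highest similarity first; ties broken alphabetically by user name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for _, user in scored[:top_k]]


if __name__ == "__main__":
    answerers = {
        "alice": {"python", "pandas", "numpy"},  # hypothetical users
        "bob": {"java", "spring"},
        "carol": {"python", "django"},
    }
    print(route_post({"python", "pandas"}, answerers))  # ['alice', 'carol']
```

Jaccard similarity is used here only for simplicity; it favors users whose answered tags closely match the post, but it ignores answer quality and user availability, which a real routing tool would need to consider.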
7 THREATS TO VALIDITY
Internal Validity
Stack Overflow does not disclose any data regarding deleted posts. The only way to obtain this information is to import earlier data and compare it with the present data, which does not yield reliable, complete results either. Owing to the absence of these data, our research lacks a quantitative investigation of this reason.
Stack Overflow does not provide vital information such as closed posts, flags, and suggested edits in the Stack Exchange Archive (offline database). This led us to work with the online version of the SO data dump (Stack Exchange Data Explorer). Because the online data dumps are updated rapidly, we had to bind the analysis to a particular time frame to avoid possible anomalies in our data.
External Validity
To maintain consistency, we limited our study to Stack Overflow. Hence, the research outcome may not reflect the conditions on other Q&A sites such as Reddit and Quora. An analysis of these sites is also required to understand the overall situation of new users and the environment they face.
Only the 2020 database was considered in our analysis of the environment neophytes face on Stack Overflow. A database from the pre-pandemic period (before 2019) could be compared with one from the pandemic period to indicate whether the COVID-19 pandemic had any effect on neophytes’ characteristics and on the environment of Stack Overflow.
8 FUTURE WORK & CONCLUSION
Unwelcoming behavior towards neophytes has been under discussion for many years, yet few steps have been taken to address it. This study sheds light on the issue by confirming that it exists and identifying significant reasons behind it, supported by data and statistics. The findings can help build a welcoming environment that unites users of all experience levels and encourage new users to become actively involved in this knowledge base.
In this study, a user with a reputation of at most 50, or one registered within the last 45 days, is considered a neophyte. Further clustering of this group based on activity would give detailed insights into their characteristics and guide future work in distinguishing active from inactive users. In addition, it would indicate the proportion of neophytes facing hurdles while participating in the platform.
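As a starting point for such clustering, the Python sketch below encodes this study’s neophyte definition and splits neophytes into active and inactive groups by post count. The `User` record and the `min_posts` threshold are hypothetical. Unlike the queries in Listing 1 and Listing 2, which measure the 45-day window from the query execution time via `GETDATE()`, the sketch takes an explicit `as_of` date, which keeps results reproducible.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class User:
    user_id: int
    reputation: int
    created: date
    post_count: int


def is_neophyte(user: User, as_of: date) -> bool:
    """Reputation <= 50, or account created within the 45 days before `as_of`."""
    return user.reputation <= 50 or user.created >= as_of - timedelta(days=45)


def split_by_activity(users: list[User], as_of: date,
                      min_posts: int = 3) -> dict[str, list[int]]:
    """Partition neophytes into 'active' (>= min_posts posts) and 'inactive' groups."""
    groups: dict[str, list[int]] = {"active": [], "inactive": []}
    for user in users:
        if not is_neophyte(user, as_of):
            continue  # regular users are outside the scope of this split
        key = "active" if user.post_count >= min_posts else "inactive"
        groups[key].append(user.user_id)
    return groups


if __name__ == "__main__":
    users = [
        User(1, 10, date(2018, 1, 1), 5),    # low reputation, active
        User(2, 10, date(2018, 1, 1), 1),    # low reputation, inactive
        User(3, 900, date(2020, 12, 1), 2),  # new account (< 45 days), inactive
        User(4, 900, date(2015, 1, 1), 40),  # regular user, excluded
    ]
    print(split_by_activity(users, date(2020, 12, 31)))
    # {'active': [1], 'inactive': [2, 3]}
```

A single threshold is the simplest possible grouping; a fuller study might cluster on several activity features instead.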
Sentiment analysis of neophytes, along with an investigation of the impact of comments on neophytes’ posts, would also be a valuable study. Such extensive studies could lead to the most appropriate suggestions for Stack Overflow to resolve this problem.
REFERENCES
Abbas, A. E. (2019). Investigating ‘one-day flies’ users in the StackOverflow: Why do and don’t people participate? In 2019 International Conference on ICT for Smart Society (ICISS), volume 7, pages 1–5.
Abric, D., Clark, O. E., Caminiti, M., Gallaba, K., and McIntosh, S. (2019). Can duplicate questions on Stack Overflow benefit the software development community? In 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pages 230–234.
Adaji, I. and Vassileva, J. (2016). Towards understanding user participation in Stack Overflow using profile data. In International Conference on Social Informatics, pages 3–13. Springer.
Ahmed, T. and Srivastava, A. (2017). Understanding and evaluating the behavior of technical users. A study of developer interaction at StackOverflow. Human-centric Computing and Information Sciences, 7(1):1–18.
Bosu, A., Corley, C. S., Heaton, D., Chatterji, D., Carver, J. C., and Kraft, N. A. (2013). Building reputation in StackOverflow: An empirical investigation. In 2013 10th Working Conference on Mining Software Repositories (MSR), pages 89–92.
Brooke, S. (2019). “Condescending, rude, assholes”: Framing gender and hostility on Stack Overflow. In Proceedings of the Third Workshop on Abusive Language Online, pages 172–180.
Cheng, J., Danescu-Niculescu-Mizil, C., and Leskovec, J. (2014). How community feedback shapes user behavior. In Eighth International AAAI Conference on Weblogs and Social Media.
Cheriyan, J., Savarimuthu, B. T. R., and Cranefield, S. (2020). Norm violation in online communities – a study of Stack Overflow comments. arXiv preprint arXiv:2004.05589.
Cheriyan, J., Savarimuthu, B. T. R., and Cranefield, S. (2021). Towards offensive language detection and reduction in four software engineering communities. In Evaluation and Assessment in Software Engineering, pages 254–259.
Hart, K. and Sarma, A. (2014). Perceptions of answer quality in an online technical question and answer forum. In Proceedings of the 7th International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE 2014, pages 103–106, New York, NY, USA. Association for Computing Machinery.
Lampe, C. and Johnston, E. (2005). Follow the (slash) dot: Effects of feedback on new members in an online community. In Proceedings of the 2005 International ACM SIGGROUP Conference on Supporting Group Work, GROUP ’05, pages 11–20, New York, NY, USA. Association for Computing Machinery.
Mamykina, L., Manoim, B., Mittal, M., Hripcsak, G., and Hartmann, B. (2011). Design lessons from the fastest Q&A site in the west. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’11, pages 2857–2866, New York, NY, USA. Association for Computing Machinery.
May, A., Wachs, J., and Hannák, A. (2019). Gender differences in participation and reward on Stack Overflow. Empirical Software Engineering, 24(4):1997–2019.
Moutidis, I. and Williams, H. T. (2021). Community evolution on Stack Overflow. PLOS ONE, 16(6):e0253010.
Movshovitz-Attias, D., Movshovitz-Attias, Y., Steenkiste, P., and Faloutsos, C. (2013). Analysis of the reputation system and user contributions on a question answering website: StackOverflow. In 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), pages 886–893.
Saha, R. K., Saha, A. K., and Perry, D. E. (2013). Toward understanding the causes of unanswered questions in software information sites: A case study of Stack Overflow. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pages 663–666, New York, NY, USA. Association for Computing Machinery.
Sengupta, S. and Haythornthwaite, C. (2020). Learning with comments: An analysis of comments and community on Stack Overflow. In Proceedings of the 53rd Hawaii International Conference on System Sciences.
Slag, R., de Waard, M., and Bacchelli, A. (2015). One-day flies on StackOverflow - why the vast majority of StackOverflow users only posts once. In 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pages 458–461.
Tóth, L., Nagy, B., Gyimóthy, T., and Vidács, L. (2020). Why will my question be closed? NLP-based pre-submission predictions of question closing reasons on Stack Overflow. In 2020 IEEE/ACM 42nd International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), pages 45–48.
Yanovsky, S., Hoernle, N., Lev, O., and Gal, K. (2021). One size does not fit all: A study of badge behavior in Stack Overflow. Journal of the Association for Information Science and Technology, 72(3):331–345.
Zhang, H., Wang, S., Chen, T.-H., Zou, Y., and Hassan, A. E. (2021a). An empirical study of obsolete answers on Stack Overflow. IEEE Transactions on Software Engineering, 47(4):850–862.
Zhang, H., Wang, S., Chen, T.-H. P., and Hassan, A. E. (2021b). Are comments on Stack Overflow well organized for easy retrieval by developers? ACM Trans. Softw. Eng. Methodol., 30(2).