Table 5: The AUC score for each model using the SRN Validation Dataset (SRN) and our User Labeled Test Dataset.

Model                  SRN Validation   User Labeled Test
Logistic Regression    0.962            0.729
SRN                    0.955            0.634
HAN                    0.932            0.666
HAVAN (HAN w/ VAT)     0.939            0.707
tention could be presented to a user. It also seems ad-
vantageous to leverage the vast number of unlabeled
examples with semi-supervised learning to classify
issues as security-related. Still, for our approach, it is
important to note that the number of labeled relevant
security examples is small in comparison to the full
unlabeled dataset. More labeled examples are needed
to enable the use of more aggressive SSL methods.
An annotation policy was established to make the
annotation process more efficient and to favor
repeatability and reproducibility. All data in the User
Labeled Test Dataset was annotated by one of the
authors with knowledge in the field of cybersecurity,
which is required to adequately label data as relating
to cybersecurity. Some data was annotated by multiple
parties, and mismatching annotations were compared
to ensure that the labels were consistent.
Many issues were ambiguous and unclear, making
it important to create a policy. The annotation
guideline was used to establish a unified labeling method.
It was updated during the annotation phase whenever
a new kind of case arose. The categories do not
discriminate between questions, warnings, or other
discussions about a certain topic. Each text is
annotated with the most severe category that accurately
describes it. Priority runs from Vuln (highest) to
Safe (lowest).
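The "most severe category wins" rule can be illustrated with a minimal sketch. It assumes the annotator has already decided which categories accurately describe a text. The function name `most_severe` and the `PRIORITY` tuple are hypothetical, and only the two endpoints named above (Vuln and Safe) are listed; any intermediate categories from the guideline would be placed between them.

```python
from typing import Iterable, Sequence

# Hypothetical priority order, highest severity first. Only "Vuln" (highest)
# and "Safe" (lowest) are taken from the text; intermediate categories from
# the annotation guideline would be inserted between them.
PRIORITY: Sequence[str] = ("Vuln", "Safe")


def most_severe(applicable: Iterable[str],
                priority: Sequence[str] = PRIORITY) -> str:
    """Return the highest-priority category among those that describe a text."""
    # Lower rank index means higher severity.
    rank = {label: i for i, label in enumerate(priority)}
    candidates = list(applicable)
    if not candidates:
        raise ValueError("at least one category must apply")
    unknown = [c for c in candidates if c not in rank]
    if unknown:
        raise ValueError(f"unknown categories: {unknown}")
    return min(candidates, key=lambda c: rank[c])


# A text that could be described as both Safe and Vuln is labeled Vuln.
label = most_severe(["Safe", "Vuln"])
```

Passing the order as a parameter keeps the rule independent of the exact category set, which matters here because the guideline was extended as new kinds of cases arose.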
Vuln: Presence of known exploits, user-reported vul-
