objective is to accomplish this using “off the shelf” machine learning models. We are in an era in which a custom, new model is built for nearly every task. We argue that it is just as important to understand when existing models will suffice. Therefore, in this work we compare a set of classic machine learning models, ensembles of some of those models that integrate knowledge from dense and sparse features, and a classic neural model that incorporates a pre-trained language model to reduce the impact of feature sparsity. Ultimately, we hope to answer the following questions. Are classic machine learning models sufficient for this task, or is a neural model necessary? Does incorporating dense features into the classic models improve the overall performance of the classifiers? Is an “off the shelf” model reasonable, and what properties of our data make it reasonable?
Finally, we are interested in seeing how discussions of experiences shared on Twitter relate to different salient events of the day. To investigate this, we build a timeline of events and see how mentions of experiences correlate with different types of events. In other words, we can determine the types of events that encourage the public to discuss experiences of harassment and assault.
In summary, the contributions of this paper are as follows: (1) we conduct an extensive empirical evaluation (including a sensitivity analysis) of different machine learning methods and ensembles to understand the strengths and weaknesses of different models on these short, noisy tweets, (2) we present an analysis of dense features, (3) we create a ground truth data set for this task that we share with the computer science and linguistics communities to continue to improve models for predicting experiences, (4) we analyze the volume and temporal structure of experience tweets during the first year of the #MeToo Twitter movement by determining the correlation between experience tweets and salient events, and (5) we release our labeled data to support future research in this area.
The remainder of the paper is organized as follows. Section 2 presents related literature. In Section 3, we outline the overall methodology and present the models we test. Our empirical evaluation is presented in Section 4, followed by a discussion of the results. Section 5 uses the best model to better understand the first year of the #MeToo movement. Finally, conclusions and areas for future work are presented in Section 6.
2 RELATED LITERATURE
We divide our related work into two parts: an overview of other inference tasks using Twitter data that have some similarity to the new task we investigate in this paper, and a brief introduction to online Twitter movements.
2.1 Inference Tasks using Twitter
Twitter has been used for a wide range of inference
tasks. Numerous studies that infer different types
of demographic information about Twitter users have
been conducted over the past decade (Modrek and Chakalov, 2019; de Mello Araújo and Ebbelaar, 2018; Fang et al., 2016; Cresci et al., 2018; Ahmad et al., 2019; Khatua et al., 2018; Devlin et al., 2019; Liu et al., 2021; Liu and Singh, 2021; Graney-Ward et al., 2022; Pericherla and Ilavarasan, 2020). Here we
highlight a few that use linguistic characteristics as
features.
Several studies use classic machine learning approaches. For example, Modrek and Chakalov (Modrek and Chakalov, 2019) use least absolute shrinkage and selection operator (LASSO) regression and support vector machine (SVM) models to categorize English #MeToo tweets along two dimensions: (1) whether the tweet describes an experience of sexual assault and abuse, and (2) whether the event happened in early life. Their SVM model achieves 87% accuracy on the former task and 79% on the latter.
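For illustration only, and not the exact setup of any of the cited studies, the following minimal scikit-learn sketch shows the kind of off-the-shelf sparse-feature classifier that such classic approaches rely on; the tweets, labels, and hyperparameters are hypothetical placeholders.

# Minimal sketch of an off-the-shelf sparse-feature tweet classifier (illustration only).
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy data: 1 = first-person experience, 0 = other.
train_texts = [
    "this happened to me years ago and I never told anyone",
    "sharing my own story today",
    "retweet to show your support for the movement",
    "read this news article about the hearings",
]
train_labels = [1, 1, 0, 0]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # sparse lexical n-gram features
    ("svm", LinearSVC()),                            # off-the-shelf linear SVM
])
clf.fit(train_texts, train_labels)
print(clf.predict(["someone did this to me when I was young"]))

In practice, such a pipeline would be trained on a labeled corpus and evaluated with cross-validation; the point is that no custom architecture is required.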
Witness identification, the task of identifying eyewitnesses to an event, presents a similar challenge to experience classification: secondhand accounts and noise often vastly outnumber target data points. Fang and colleagues (Fang et al., 2016) use a variety of classic methods to identify witnesses to emergency situations. Similarly, Cresci and colleagues (Cresci et al., 2018) use quadratic SVMs to identify witnesses to cultural, music, and technology events.
Recently, researchers have begun to consider language models for classification tasks on Twitter data. Ahmad and colleagues (Ahmad et al., 2019) present a combined long short-term memory (LSTM) and convolutional neural network (CNN) model to classify tweets as extremist or non-extremist. The combined classifier outperforms classic approaches and standalone LSTM and CNN models. Khatua and colleagues (Khatua et al., 2018) use multilayer perceptron, CNN, LSTM, and bidirectional LSTM models to classify a tweet about assault as occurring at (1) the workplace by colleagues, (2) school by teachers or classmates, (3) public places by strangers, (4) home by a family member, or (5)