is the most probable cause, but since the dataset is large, we believe this is not the case and that the inference drawn holds.
In this work, we show that a model for personality recognition benefits from additional modalities and more input data. We propose a new handcrafted behaviour encoding in which each element is the probability of a low-level action relevant to the task. Through ablation studies, we show that each input in the data contributes to performance, and we give our interpretation of the trends these studies reveal. Owing to the interdisciplinary nature of the project, numerous additions could further improve performance, and intuitively some of them are likely to yield larger gains than others.
One direction is to use stronger backbones for feature extraction: we use the same ones as our chosen baseline, but there are existing models with better performance on similar tasks that could be used instead.
Transformers have been shown to outperform LSTMs on many sequence-modelling tasks. In the future, we will try to increase the temporal scale of attention in the transformer instead of using a separate module to combine information across chunks. This might address the problem observed with neuroticism, discussed in Section 4.3. One of the major drawbacks of multimodal data is that preprocessing is time-consuming. It would therefore be interesting to explore knowledge distillation, so that a model using only one modality or a subset of modalities can achieve similar performance with fewer inputs. We would also like to test our approach on other large-scale multimodal datasets as they become available. This area of work has many applications in healthcare, which we are exploring; we hope that this work leads to advances in the area and motivates others to work on this interesting problem.
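Knowledge distillation is proposed here only as future work, so the following is a minimal sketch under stated assumptions, not the authors' method. It assumes (hypothetically) that the model regresses five personality-trait scores per clip, that predictions from a trained multimodal teacher are available, and that the student sees only one modality or a subset. For regression outputs, one common formulation blends the supervised error with the error against the teacher's predictions; the names `distillation_loss` and `alpha` are illustrative.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length score vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student_pred: Sequence[float],
    teacher_pred: Sequence[float],
    target: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Hypothetical regression distillation loss.

    student_pred: trait scores from a model using a subset of modalities.
    teacher_pred: trait scores from a (hypothetical) multimodal teacher.
    target:       ground-truth trait scores.
    alpha weights the ground-truth term; (1 - alpha) weights the term that
    pulls the student towards the teacher's predictions.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    supervised = mse(student_pred, target)
    imitation = mse(student_pred, teacher_pred)
    return alpha * supervised + (1.0 - alpha) * imitation


if __name__ == "__main__":
    # Illustrative numbers only, not results from this work.
    student = [0.5, 0.5, 0.5, 0.5, 0.5]
    teacher = [0.7, 0.7, 0.7, 0.7, 0.7]
    target = [0.5, 0.5, 0.5, 0.5, 0.5]
    print(distillation_loss(student, teacher, target, alpha=0.5))
```

Setting `alpha` to 1 recovers ordinary supervised training on the reduced inputs; lower values place more weight on reproducing the multimodal teacher's predictions.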
