Furthermore, this paper introduced an auto-encoder
as a tool for dimensionality reduction. The pre-trained
auto-encoder can compress a large-scale matrix into a
two-dimensional matrix. This is of vital importance
when training machine learning algorithms on large
sparse matrices. Our results show a good fit on the
training embeddings, retaining as much information
from the original data as possible.
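
The compression step described above can be illustrated with a minimal sketch. It is not the model used in this paper: the toy matrix, the single tanh hidden layer, the two-unit bottleneck, the learning rate and the names `train_autoencoder` and `encode` are all assumptions chosen for illustration, and plain NumPy gradient descent stands in for whatever framework was actually used.

```python
import numpy as np


def train_autoencoder(X, n_hidden=2, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer auto-encoder with plain gradient descent.

    X has shape (n_samples, n_features). Returns (params, losses), where
    losses holds the mean squared reconstruction error per epoch.
    All hyperparameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W1 = rng.normal(0.0, 0.1, size=(n_features, n_hidden))  # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, n_features))  # decoder weights
    b2 = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)   # encode into the bottleneck
        X_hat = H @ W2 + b2        # linear decoder reconstructs the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error.
        g_out = 2.0 * err / err.size
        g_H = g_out @ W2.T
        g_pre = g_H * (1.0 - H ** 2)  # derivative of tanh
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)
    return (W1, b1, W2, b2), losses


def encode(X, params):
    """Map each row of X to its low-dimensional code."""
    W1, b1, _, _ = params
    return np.tanh(X @ W1 + b1)


# Toy sparse binary matrix (assumed layout: 30 concepts x 40 documents).
rng = np.random.default_rng(42)
X = (rng.random((30, 40)) < 0.1).astype(float)

params, losses = train_autoencoder(X)
Z = encode(X, params)  # one two-dimensional code per concept
print(Z.shape, losses[0], losses[-1])
```

In this sketch each row of `Z` is the compressed representation of one concept, and the drop from `losses[0]` to `losses[-1]` gives a rough indication of how much of the original matrix the bottleneck retains.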
5 CONCLUSIONS
In summary, this paper described an effort to use
knowledge representation, natural language
processing and machine learning for compiling a suite
of resources for facilitating automated ACE
identification from free-text data, particularly in
low-resource environments (i.e., where labelled data
is scarce).
In particular, based on the previous ACE terms, this
paper identified extra ontological terms from UMLS
and developed comprehensive mappings to their
narrower concepts. We also proposed practical
approaches to self-supervision by utilising the
concept co-occurrence embeddings. This will enable
us to automatically label the textual data, leading to
minimal manual annotations for training supervised
learning models such as deep learning NLP models.
Furthermore, to create a compact representation of
reusable concept embeddings, we trained an auto-
encoder model to reduce the dimension of the sparse
concept-document matrix. The model can compress
useful information into a matrix with far fewer
dimensions, enabling efficient computation when
utilising the semantics of concept embeddings. In
addition, this paper identified ACEs from Reddit,
leading to publicly available resources for
benchmarking.
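
As a rough illustration of how concept co-occurrence could drive automatic labelling, the sketch below derives co-occurrence counts from a toy concept-document matrix and uses them to assign weak labels. The concept names, the seed set, the `min_cooccurrence` threshold and the labelling rule itself are hypothetical assumptions for illustration and are not the procedure used in this paper.

```python
import numpy as np

# Toy binary concept-document matrix: rows are concepts, columns are
# documents. The concept names are hypothetical placeholders.
concepts = ["physical_abuse", "neglect", "parental_divorce", "headache"]
X = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Entry (i, j) counts the documents that mention both concept i and j.
cooc = X @ X.T
np.fill_diagonal(cooc, 0.0)  # ignore a concept co-occurring with itself

# Assumed rule: start from seed concepts and add any concept that
# co-occurs with a seed in at least `min_cooccurrence` documents.
seed_idx = [concepts.index("physical_abuse")]
min_cooccurrence = 2
related = cooc[seed_idx].max(axis=0) >= min_cooccurrence
selected = related.copy()
selected[seed_idx] = True

# A document receives a weak positive label if it mentions any
# selected concept.
weak_labels = (X[selected].sum(axis=0) > 0)
print([c for c, keep in zip(concepts, selected) if keep])
print(weak_labels.tolist())
```

Labels produced this way are noisy by construction, so in practice they would serve as a starting point for training rather than as a replacement for expert annotation.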
This work is very much a work in progress.
There are many areas that need further development
and improvement. Specifically, future steps would
involve 1) further exploration with EHR datasets and
combining the concept embeddings from social media
and EHRs; 2) enhancing the current concept
representation with more attention on the same
concept; and 3) creating a publicly accessible
benchmark on Reddit datasets and other publicly
available datasets (such as tweets) for ACE
identification with gold-standard annotations by
domain experts.
ACKNOWLEDGEMENTS
This research is supported by the UK's National
Institute for Health Research (grant number:
NIHR202639) and the UK's Medical Research
Council (grant number: MR/S004149/2).