recommendation process to improve rating predic-
tion.
(Terzi et al., 2014) proposes a text-based user-
kNN algorithm that uses text-based reviews instead
of numerical ratings to compute the users’ similar-
ity. The idea is to determine the similarity between
two users by computing the similarity between re-
views’ words for every item reviewed by both users.
The text-based user-kNN is compared with several
approaches using numerical ratings in the rating pre-
diction step. For the numerical experiments, two data
sets are used: Rotten Tomatoes and an Audio CD data set
from Amazon Product Reviews. Slightly better results are
obtained for root mean square error (RMSE) between
the actual and the predicted ratings in the text-based
user-kNN approach over the ratings-based ones.
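The idea of comparing users through the words of their reviews can be sketched as follows. This is only an illustration: the source does not state which word-level similarity measure is used, so the Jaccard overlap, the whitespace tokenization, and the names `jaccard` and `text_user_similarity` are assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard overlap between two word sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def text_user_similarity(reviews_u, reviews_v):
    """Average review-word similarity over the items reviewed by both users.

    reviews_u, reviews_v: dict mapping item id -> review text.
    Returns 0.0 when the two users share no reviewed item.
    """
    common = reviews_u.keys() & reviews_v.keys()
    if not common:
        return 0.0
    total = 0.0
    for item in common:
        # Naive tokenization: lower-case and split on whitespace.
        words_u = set(reviews_u[item].lower().split())
        words_v = set(reviews_v[item].lower().split())
        total += jaccard(words_u, words_v)
    return total / len(common)


# Hypothetical toy data: only item "m1" is reviewed by both users.
alice = {"m1": "great plot great cast", "m2": "boring and slow"}
bob = {"m1": "great cast weak plot", "m3": "loved it"}
sim = text_user_similarity(alice, bob)
```

In this toy case the two reviews of "m1" share three of four distinct words, so the resulting similarity is 0.75.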
(Poirier et al., 2010) determines sentiment scores
from text-based reviews using a Naïve Bayes model.
As a first step, the text-based reviews are analyzed
and text mining techniques are applied in order to
build the user-item-rating matrix. Reviews are classified
into two sentiment classes, positive and negative,
using the KHIOPS tool (Boullé, 2007). Then,
the item-based collaborative filtering algorithm is applied
to generate the recommendations. Experiments
are conducted using Flixster, Netflix, and IMDB data
sets. RMSE is used as an evaluation measure.
(Ma et al., 2017) designed an original user-
preference-based collaborative filtering (UPCF) ap-
proach to exploit free-text online reviews to retrieve
users’ preferences. First, aspect-level opinion mining
techniques were applied to transform the free-text
reviews into structured aspect opinions. Next, the user
preferences were determined from two quantities: the
aspect importance and the aspect need. The aspect
importance reflects that opinions on important aspects
influence the overall ratings more than opinions on
other aspects; it is measured by the similarity between
the opinions on one aspect and the overall ratings.
The aspect need is calculated as the difference between
a user’s opinions on an aspect and those of other users,
which indicates the user’s differentiated level of need
for this aspect. Based on this, a user-based collaborative
filtering approach is designed so that the users’ aspect
preferences are integrated to calculate the similarities
between users.
(Musto et al., 2017) implemented user-based and
item-based collaborative filtering approaches that include
aspect opinion data. In both cases, aspect-based
user/item distances are calculated using the sentiment
ratings extracted from the reviews’ aspects. The
similarity between users/items is determined based on
the inverse of the user/item distances, and rating
predictions are computed using
the collaborative filtering algorithm.
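A minimal sketch of turning an aspect-based distance into a similarity is given below. The source only states that the similarity is based on the inverse of the distance; the mean absolute difference used as the distance, the `1 / (1 + d)` form (chosen here to avoid division by zero), and the function names are assumptions for illustration.

```python
def aspect_distance(a, b):
    """Mean absolute difference of sentiment ratings over shared aspects.

    a, b: dict mapping aspect name -> sentiment rating.
    Returns None when no aspect is shared (assumed convention).
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return sum(abs(a[k] - b[k]) for k in shared) / len(shared)


def aspect_similarity(a, b):
    """Inverse-distance similarity; 1 / (1 + d) is an assumed form."""
    d = aspect_distance(a, b)
    if d is None:
        return 0.0
    return 1.0 / (1.0 + d)


# Hypothetical aspect-level sentiment ratings of two users.
u1 = {"food": 0.8, "service": 0.2}
u2 = {"food": 0.4, "service": 0.6, "price": 0.1}
s = aspect_similarity(u1, u2)
```

The same functions apply unchanged to items if the dictionaries hold item-level aspect ratings instead of user-level ones.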
3 SYSTEM ARCHITECTURE
As shown in Section 2, text-based item descriptions
reveal more valuable information for the recommendation
process than plain numerical ratings. The proposed
approach uses only the textual information when
building the recommendation system and disregards
the numerical ratings. The textual input is exploited
using a lexicon-based technique to determine
the polarity score of a review. The resulting scores
are the sentiment ratings used by
the user-based kNN collaborative filtering algorithm.
After the data collection phase, the text-based item
reviews serve as input for a sentiment lexicon that
determines a sentiment rating for an item. The data set
enhanced with the computed sentiment rating is fur-
ther passed to a recommendation system.
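The last stage of this pipeline, where sentiment ratings replace numerical ratings in a user-based kNN, can be sketched as follows. The source does not specify the similarity measure or the aggregation rule at this point, so the cosine similarity over co-rated items, the similarity-weighted average, the default `k`, and all names are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity over the items rated by both users (assumed measure)."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings, user, item, k=2):
    """Predict a sentiment rating from the k most similar users who rated the item.

    ratings: dict mapping user -> {item -> sentiment rating in [-1, 1]}.
    Returns None when no positively similar neighbour rated the item.
    """
    neighbours = []
    for other, other_ratings in ratings.items():
        if other != user and item in other_ratings:
            sim = cosine(ratings[user], other_ratings)
            if sim > 0:
                neighbours.append((sim, other_ratings[item]))
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbours[:k]
    if not top:
        return None
    return sum(sim * r for sim, r in top) / sum(sim for sim, _ in top)


# Hypothetical sentiment ratings produced by the lexicon step.
ratings = {
    "u1": {"a": 0.8, "b": -0.4},
    "u2": {"a": 0.8, "b": -0.4, "c": 0.6},
    "u3": {"a": -0.5, "b": 0.5, "c": -0.9},
}
pred = predict(ratings, "u1", "c")
```

Here "u3" has a negative similarity to "u1" and is ignored, so the prediction for item "c" comes from "u2" alone.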
3.1 Data Pre-processing
The proposed recommendation system handles textual
information; therefore, a data cleansing process
was applied to the input data sets before they were
passed to the sentiment lexicon. The following techniques
were applied:
• Removal of punctuation and stop words;
• Lower-casing;
• Removal of URLs;
• Stemming.
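The cleansing steps listed above can be sketched with the Python standard library alone. The stop-word list is a tiny illustrative sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter’s; neither is taken from the source, and the order of the steps is an assumption (URLs are removed first so that punctuation removal does not break them into stray tokens).

```python
import re
import string

# Tiny illustrative stop-word list (assumption, not the list used in the paper).
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "of", "was"}
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Apply URL removal, lower-casing, punctuation and stop-word removal, stemming."""
    text = URL_RE.sub(" ", text)          # remove URLs before touching punctuation
    text = text.lower()                   # lower-casing
    text = text.translate(PUNCT_TABLE)    # delete punctuation characters
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


tokens = preprocess("This movie was AMAZING, see https://example.com/x !")
```

For the sample sentence the result is `['movie', 'amaz', 'see']`; a real stemmer would generally produce different stems than this simplified stand-in.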
3.2 Sentiment Lexicon
For the sentiment analysis task, the proposed approach
uses a sentiment lexicon selected on the basis of the
thorough comparison presented in (Hutto and Gilbert,
2014). There, the Vader Sentiment Lexicon was compared
to several others from the literature (Linguistic Inquiry
Word Count, General Inquirer, Affective Norms for
English Words, SentiWordNet, SenticNet, Word-Sense
Disambiguation) and produced, in most cases, the best
results.
Vader (Valence Aware Dictionary and Sentiment
Reasoner) lexicon (Hutto and Gilbert, 2014) is a rule-
based sentiment analysis tool based on a dictionary
that maps words to positive, neutral, or negative sen-
timent scores. The sum of all these scores defines a
compound score which is normalized between -1 and
ICAART 2022 - 14th International Conference on Agents and Artificial Intelligence