During the experiments, repeated recommendations
were observed among the topics. They occur when a
student who receives a recommendation for a content
item chooses a different item instead of the
recommended one. The system then recommends the
previous item again because it remains the most
appropriate.
6 DISCUSSION
The systematic literature review was useful because
it enabled us to find out the main existing approaches,
and which of them use similar techniques. The
approaches found and selected apply the LDA
technique using student texts or individual
characteristics to find topics of interest to them.
Our approach uses the LDA technique with data
previously obtained about the student's profile, their
preferences, and the learning paths they went through.
Based on this information, the LDA algorithm
generates groups by similarity, and the contents are
recommended, considering, in addition to the profile
attributes, the knowledge of the paths taken by the
student and by other students in the group. Thus, the
knowledge gained from the paths of other similar
students in the group can be used to benefit the
recommendation.
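The recommendation step described above can be illustrated with a minimal Python sketch. It assumes the LDA grouping has already been computed, so the learning paths of the student's group peers are given as input. The scoring rule (tag overlap plus a weighted count of peers who went through the content) and the names `recommend` and `peer_weight` are hypothetical choices made for illustration, not the scoring used in the experiments.

```python
from collections import Counter


def recommend(student_interests, visited, group_paths, contents, peer_weight=0.5):
    """Return the id of the best unvisited content, or None if none is left.

    student_interests: set of tags from the student's profile
    visited: set of content ids the student has already gone through
    group_paths: learning paths (lists of content ids) of peers in the group
    contents: dict mapping content id -> set of tags
    peer_weight: hypothetical weight given to each peer visit
    """
    # Count how many peers in the group went through each content.
    peer_visits = Counter(cid for path in group_paths for cid in set(path))

    best_id, best_score = None, float("-inf")
    for cid in sorted(contents):  # sorted for deterministic tie-breaking
        if cid in visited:
            continue
        overlap = len(student_interests & contents[cid])
        score = overlap + peer_weight * peer_visits[cid]
        if score > best_score:
            best_id, best_score = cid, score
    return best_id


contents = {
    "c1": {"python", "loops"},
    "c2": {"python", "functions"},
    "c3": {"history"},
}
group_paths = [["c1", "c2"], ["c2"], ["c2", "c3"]]
best = recommend({"python"}, visited={"c1"}, group_paths=group_paths, contents=contents)
```

Under these assumptions, a content that the student declines stays unvisited and keeps the highest score, so it is returned again on the next call. This is consistent with the repeated recommendations observed in the experiments.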
When analysing the results, the student's
adherence to the group to which he or she was
allocated by the LDA algorithm was first verified. On
average, the group obtained about 27% relevance for
the student considering the 10 most common
interests. This means that, on average, at least one of
a student's interests is common in the group.
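One possible reading of this relevance measure is sketched below in Python: the share of a student's interests that appear among the k most common interests of the group. The function name `group_relevance`, the tie-breaking rule, and the sample data are assumptions for illustration; the paper does not give the exact formula.

```python
from collections import Counter


def group_relevance(student_interests, group_profiles, top_k=10):
    """Share of the student's interests found among the group's top_k interests."""
    interests = set(student_interests)
    if not interests:
        return 0.0
    # Count in how many group profiles each tag appears.
    counts = Counter(tag for profile in group_profiles for tag in set(profile))
    # Rank by frequency, then alphabetically so that ties are deterministic.
    ranked = sorted(counts, key=lambda tag: (-counts[tag], tag))
    top = set(ranked[:top_k])
    return len(interests & top) / len(interests)


group = [
    {"marvel", "superheroes", "python"},
    {"marvel", "chess"},
    {"superheroes", "marvel"},
]
relevance = group_relevance({"marvel", "cooking", "chess", "music"}, group, top_k=2)
```

In this toy group the two most common interests are "marvel" and "superheroes", and only one of the student's four interests is among them.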
A direct search in the database for students with
the highest number of tags similar to those of the target
student may be better than using LDA for clustering.
However, for the definition of groups, the algorithm
also considers data such as age and education, among
others. For large-scale use, a direct search considering
these values would be much more complex and
laborious, and less effective than using LDA.
Next, the relevance of the recommendations was
verified, determining the average adherence of the
best recommended content to the student's interests
(Fig. 3). The high relevance ratings obtained from the
recommendations are intuitive; it is not difficult to
recommend content within the topics of interest to the
group. The main information that this graph presents
is the difference between the tests. From 100 to 200
contents there was a significant increase in adherence.
However, from 200 to 500, adherence did not
significantly increase. This suggests that the algorithm
reaches a plateau at around 200 contents, beyond
which additional contents do not make a significant
difference in adherence to the student's interests.
Finally, a comparison was performed between the
best recommended content and the best content
searched directly in the database (Fig. 4). In all cases
(100/200/500 contents), the content obtained
manually from the database matched the most
student interests. However, with 200 or more
contents, the similarity between the two was
consistently about 80%. This similarity is very high,
which means that the contents recommended by the
system are close to the best possible. This case also
shows how increasing the amount of data increased
content adherence. From 100 to 200 contents, the
adherence of recommended contents improved
significantly, by almost 20%. From 200 to 500 there
was an improvement in both, reaching close to 90%.
With more than 200 contents, the improvement is
smaller but gradual. In a way, this confirms what had
already been seen in the other indicator: beyond 200
contents there is no significant improvement in the
recommendation.
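The comparison with the direct search can be sketched in Python as follows. The sketch assumes that adherence is the number of student interests covered by a content's tags and that similarity is the ratio between the adherence of the recommended content and that of the best content found by an exhaustive scan. Both definitions and all function names are illustrative assumptions, not the exact metrics behind Fig. 4.

```python
def adherence(student_interests, content_tags):
    """Number of the student's interests covered by a content's tags."""
    return len(set(student_interests) & set(content_tags))


def best_by_direct_search(student_interests, contents):
    """Scan every content and return the id with the highest adherence."""
    return max(
        sorted(contents),  # sorted so that ties resolve to the smallest id
        key=lambda cid: adherence(student_interests, contents[cid]),
    )


def similarity_to_ideal(student_interests, recommended_id, contents):
    """Adherence of the recommended content relative to the ideal content."""
    ideal_id = best_by_direct_search(student_interests, contents)
    ideal = adherence(student_interests, contents[ideal_id])
    if ideal == 0:
        return 0.0
    return adherence(student_interests, contents[recommended_id]) / ideal


interests = {"a", "b", "c", "d", "e"}
contents = {
    "c1": {"a", "b", "c", "d", "e"},  # ideal: covers all five interests
    "c2": {"a", "b", "c", "d"},       # recommended: covers four of them
    "c3": {"a"},
}
ratio = similarity_to_ideal(interests, "c2", contents)
```

The exhaustive scan serves as the reference point here: by construction the recommended content cannot exceed it, so the ratio stays between 0 and 1.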
The experiments showed that 100 contents were not
enough to match the students' interests well. Between
100 and 200 contents, there was a progressive
improvement, and from 200 contents onwards, less
significant improvements were observed in the
recommendation.
This shows that around 200 contents are needed for
the algorithm to be able to generate recommendations
that are close to those considered ideal.
The information obtained from the experiments
also demonstrates how the recommendation by LDA
can be very similar to the ideal search. It is reasonable
to assume that in a real scenario, with several
students, some preferences may tend to appear
together in groups of students, making the groupings
more strongly related.
There is a strong tendency, in real situations, for
adherence rates to improve still further. For example,
a student who enjoys Marvel is likely to also enjoy
superheroes, so many students may appear with these
two preferences on their profiles. Such correlations
do not arise systematically among randomly
generated students.
7 CONCLUSIONS
This research analyses the recommendations made by
the LDA algorithm with different volumes of content,
for a growing number of students. The experiments
were carried out using randomly generated content