would be expected if they were statistically
independent. If A and C are independent, the
lift will be equal to 1.
\[
\text{Lift}(A \rightarrow C) = \frac{\text{Confidence}(A \rightarrow C)}{\text{Support}(C)} \tag{3}
\]
Leverage: Difference between the frequency
of A and C occurring together and the
frequency that would be expected if A and C
were independent. A value of 0 indicates
independence between the two itemsets.
\[
\text{Leverage}(A \rightarrow C) = \text{Support}(A \cup C) - \text{Support}(A) \times \text{Support}(C) \tag{4}
\]
Conviction: A high value means that C is
strongly dependent on A. For example, if the
confidence is equal to 1, the denominator will
be 0, so the conviction will be
∞. Like lift, if
the items are independent, the conviction is
equal to 1.
\[
\text{Conviction}(A \rightarrow C) = \frac{1 - \text{Support}(C)}{1 - \text{Confidence}(A \rightarrow C)} \tag{5}
\]
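To make the definitions concrete, the three metrics can be computed directly from the support values of A, C and A∪C. The Python sketch below is purely illustrative (the support values are hypothetical and the code is not part of the implementations evaluated in this work):

```python
# Illustrative sketch: computing lift, leverage and conviction for a rule
# A -> C from support values. All numbers below are hypothetical.

def confidence(support_ac, support_a):
    # Confidence(A -> C) = Support(A u C) / Support(A).
    return support_ac / support_a

def lift(support_ac, support_a, support_c):
    # Equation (3): equals 1 when A and C are independent.
    return confidence(support_ac, support_a) / support_c

def leverage(support_ac, support_a, support_c):
    # Equation (4): observed minus expected co-occurrence; 0 means independence.
    return support_ac - support_a * support_c

def conviction(support_ac, support_a, support_c):
    # Equation (5): tends to infinity as the confidence approaches 1.
    conf = confidence(support_ac, support_a)
    if conf == 1.0:
        return float("inf")
    return (1.0 - support_c) / (1.0 - conf)

# Hypothetical supports: Support(A)=0.4, Support(C)=0.5, Support(A u C)=0.3.
print(lift(0.3, 0.4, 0.5))        # 1.5 -> positive association
print(leverage(0.3, 0.4, 0.5))    # 0.1
print(conviction(0.3, 0.4, 0.5))  # 2.0
```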
In this work, three ARL algorithms were
implemented: Apriori, Equivalence CLAss
Transformation (ECLAT) and Frequent Pattern
Growth (FPGrowth), following (Agrawal, 1994),
(Zaki, 2000) and (Han, 2004), respectively.
2.2.1 Apriori
This algorithm has to access the dataset several times
to obtain the frequent itemsets. In the first access, the
support is counted for each item individually (level 1
itemsets). Then, given a minimum support value, S,
rare items are excluded, i.e., those whose support is
lower than S.
In later accesses, higher-level itemsets containing
rare items are no longer considered, because if an
item's support, S_i, is lower than S, then every itemset
that contains that item will have a support equal to or
lower than S_i and, thus, lower than S (Apriori
property).
This process is repeated until there are no more
frequent itemsets to evaluate. The final list of frequent
itemsets is the union of the lists created at each level,
including the support values calculated for each
frequent itemset.
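A minimal sketch of this level-wise procedure is given below. It is illustrative Python, not the implementation evaluated in this work, and all names are hypothetical:

```python
# Minimal Apriori sketch: grow frequent itemsets level by level and prune
# candidates that contain an infrequent subset (Apriori property).
from itertools import combinations

def apriori(transactions, min_support):
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: count the support of each individual item.
    items = {i for t in transactions for i in t}
    frequent = {frozenset([i]): s
                for i in items
                if (s := support(frozenset([i]))) >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets to generate level-k candidates.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Apriori property: every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        frequent = {c: s for c in candidates
                    if (s := support(c)) >= min_support}
        result.update(frequent)
        k += 1
    return result

# Toy dataset: the frequent itemsets with minimum support 0.5.
print(apriori([{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}], 0.5))
```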
2.2.2 ECLAT
ECLAT is an improved version of the Apriori
algorithm. While Apriori uses a horizontal dataset
representation, ECLAT transforms it into a vertical
representation where, instead of listing the items that
belong to each transaction, it lists the transactions in
which each item occurs.
Transaction lists for higher-level itemsets are
created recursively, by intersecting the transaction
lists (from the previous level) of each item. If the
intersection is empty, the itemset is removed from the
list. The process ends when, at a given level, all
intersections are empty.
If the minimum support is set to the same value,
the final list of frequent itemsets will be identical to
that of the Apriori algorithm. However, ECLAT uses
less memory throughout the process, manages to be
faster thanks to the vertical approach, in which the
intersections can be computed in parallel, and
performs fewer accesses to the dataset, because the
support value of an itemset at any level can be
calculated directly from its transaction list.
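The vertical approach can be illustrated with the following Python sketch (again illustrative rather than the evaluated implementation; here the minimum support is given as an absolute transaction count):

```python
# Minimal ECLAT sketch: map each item to its tid-list (the ids of the
# transactions in which it occurs) and build higher-level itemsets by
# intersecting tid-lists recursively, with no further dataset scans.

def eclat(transactions, min_count):
    # Vertical representation: item -> set of transaction ids.
    tidlists = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidlists.setdefault(item, set()).add(tid)

    result = {}  # itemset (as a tuple) -> occurrence count

    def recurse(prefix, items):
        for i, (item, tids) in enumerate(items):
            itemset = prefix + (item,)
            result[itemset] = len(tids)
            # Extend the current itemset by intersecting tid-lists.
            extensions = [(other, tids & other_tids)
                          for other, other_tids in items[i + 1:]
                          if len(tids & other_tids) >= min_count]
            if extensions:
                recurse(itemset, extensions)

    frequent_items = sorted(
        ((item, tids) for item, tids in tidlists.items()
         if len(tids) >= min_count),
        key=lambda pair: pair[0])
    recurse((), frequent_items)
    return result

# Same toy dataset as before; a count of 2 out of 4 equals support 0.5.
print(eclat([{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}], 2))
```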
2.2.3 FPGrowth
The FPGrowth algorithm is built around the
Frequent Pattern tree (FP-tree), a tree that stores the
prefixes of the transactions. Each tree path represents
a set of transactions that share the same prefix, where
each node corresponds to a single item. Furthermore,
all nodes referring to the same item are linked
together, so that all transactions containing a certain
item can be easily found and accounted for when
traversing the tree.
The main operation that the FPGrowth algorithm
has to perform is building the FP-tree of a projected
dataset, i.e., a dataset containing only the transactions
that include a certain item, with that item removed.
This projected dataset is processed recursively,
keeping in mind that the frequent itemsets found there
all share the same prefix, namely the item that was
removed.
After building the FP-trees for all the necessary
dataset projections, some of the nodes associated with
rare items are eliminated in order to simplify the trees
and speed up the mining.
Thanks to an efficient implementation of FP-trees,
the FPGrowth algorithm largely outperforms the
previously presented algorithms (Apriori and
ECLAT), both in execution time and in memory
required, since storing the dataset in a tree
representation is more compact than keeping the full
list of transactions.
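For reference, readily available implementations of this algorithm exist; the sketch below shows one possible way to mine frequent itemsets and rules with mlxtend's fpgrowth and association_rules functions on a toy dataset. It is an illustrative usage example, not the implementation evaluated in this work:

```python
# Illustrative use of mlxtend's FP-Growth implementation on a toy dataset.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]

# One-hot encode the transactions into the boolean DataFrame mlxtend expects.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

# Mine frequent itemsets with FP-Growth, then derive association rules
# together with the metrics defined above.
itemsets = fpgrowth(df, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.5)
print(rules[["antecedents", "consequents", "confidence",
             "lift", "leverage", "conviction"]])
```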
2.3 Sequential Pattern Mining
Sequential Pattern Mining (SPM), unlike ARL, takes
into consideration the order of the items in each
sequence, allowing the discovery of frequent
sequential patterns in a dataset, which may prove
useful or interesting to explore.