UF-Merge algorithm.
Input: An mUF-tree T, a pair of paths PB = <N
b1
N
b2
…N
bk
> and
PC = <N
c1
N
c2
…N
cm
>, and n, the number of nodes to be merged
in each path.
Output: A merged mUF-tree T.
Procedure UF-Merge(T, PB, PC, n)
(1)
(2)
(3)
(4)
(5)
for i = 1 to n do {
N
q
= the new node formed by merging N
bi
and N
ci
and
updating its item, lowerbound, upperbound,
frequency, node-link, parent and children;
remove N
bi
and N
ci
from T;
update header table and node-links within T;
}
return T;
Figure 15: UF-Merge.
In continuing the example, UF-Evolve calls UF-
Shuffle to shuffle mUF-tree(right) in Figure 4(b),
and UF-Shuffle generates mUF-tree(right)
4
, as
shown in Figure 7.
UF-Shuffle calls UF-Merge and recommends the
pair of paths to be merged (i.e. <N
15
N
16
> and
<N
18
N
19
N
17
>). UF-Merge merges the pair of paths,
and returns the generated mUF-tree(right)
6
, as
shown in Figure 9.
UF-Evolve calls UF-Mine, and UF-Mine returns
a new set of uncertain frequent patterns, which is
{(d:[0.4-0.5]):3, (a:[0.1-0.1])(d:[0.4-0.5]):3, (f:[1.0-
1.0])(d:[0.4-0.5]):3, (f:[1.0-1.0])(a:[0.1-0.1])(d:[0.4-
0.5]):3, (b:[0.2-0.2]):4, (a:[0.1-0.1]):4, (f:[1.0-
1.0])(a:[0.1-0.1]):4, (f:[1.0-1.0]):5}.
UF-Evolve combines the original set of uncertain
frequent patterns with the new set. Now, the set of
uncertain frequent patterns becomes {(b:[0.2-0.2]):4,
(a:[0.1-0.1]):4, (f:[1.0-1.0])(a:[0.1-0.1]):4, (f:[1.0-
1.0]):5, (d:[0.4-0.5]):3, (a:[0.1-0.1])(d:[0.4-0.5]):3,
(f:[1.0-1.0])(d:[0.4-0.5]):3, (f:[1.0-1.0])(a:[0.1-
0.1])(d:[0.4-0.5]):3}.
The algorithms continue to attempt building new
mUF-trees until not successful. At the end, the
patterns are stabilized.
6 PERFORMANCE STUDY
6.1 Experimental Environment
and Data Preparation
In this section, we present some performance
comparison of UF-Evolve with FP-growth. All the
experiments are performed on a 2.83 GHz Xeon
server with 3.00 GB of RAM, running Microsoft
Windows Server 2003. The programs are written in
Java. Runtime here means the total execution time,
i.e. the period between input and output, instead of
CPU time. Also, all the runtime measurements of
UF-Evolve/FP-growth included the time of
constructing mUF-trees/FP-trees from the original
databases.
The experiments are done on a synthetic
database (T10.I4.D3K), which is generated by using
the methods in (IBM). In this database, the average
record length is 10, the average length of pattern is
4, and the number of records is 3K. Besides, we set
the number of items as 1000. All the probabilities
for the items are randomly generated.
For UF-Evolve, each record in the database
consists of multiple item:probability pairs. However,
for FP-growth, each record in the database consists
of multiple items. These two representations have
different semantic meanings. In order to carry out a
consistent comparison of UF-Evolve with FP-
growth, we make the following arrangements.
• First we generate UDB for UF-Evolve. Each
record in UDB consists of multiple (a
i
:p
i
).
• Based on UDB, we generate a special database
UDB’ for FP-growth. Each record in UDB’ consists
of multiple (a
j
:[l
j
-u
j
]), where a
j
= a
i
, and l
j
= u
j
= p
i
.
FP-growth treats each (a
j
:[l
j
-u
j
]) as an item.
• While constructing an mUF-tree from UDB, we
keep the Record IDs in the corresponding nodes.
When UF-Evolve generates iterative versions of the
mUF-tree, we record the changes of (lowerbound-
upperbound) in the nodes together with their Record
IDs.
• We use the Record IDs to trace back to the
records in UDB’ and change the corresponding l
j
and u
j
. Then there will be iterative versions of UDB’
for discovering new frequent patterns by FP-growth.
With these arrangements, we will have a consistent
comparison for the runtime and number of mined
frequent patterns from the two algorithms.
6.2 Experiments
In the first experiment, we measured the runtime,
number of shuffles and number of mined frequent
patterns with different numbers of records for UF-
Evolve and FP-growth. The number of records
varies from 0.1K to 3K, the minimum support
threshold is 3%, and the maximum merging
threshold is 0.3. The runtime of UF-Evolve and FP-
growth are shown in Figure 16. UF-Evolve is faster
and more scalable than FP-growth. The number of
shuffles of UF-Evolve is shown in Figure 17. Since
when the number of records increases, UF-Evolve
shuffles a bigger mUF-tree. The numbers of mined
ICEIS 2011 - 13th International Conference on Enterprise Information Systems
82