ing locality of reference. Figure 4 shows an
overview of the control flow in the cp program,
and the symbols A, B and C in Figure 4
correspond to the steps explained below.
A. One open() system call opens a file to read and
the other open() system call opens another file
to write, preparing a file descriptor for each
file.
B. The read() system call reads up to a designated
number of bytes from the file descriptor into a
buffer, and then the write() system call writes up
to a designated number of bytes from the buffer to
the file referenced by the file descriptor.
C. The close() system calls close these files.
Thus, a cp program uses six system calls per
transaction. We call the cp program that executes
these six system calls in the order shown above,
that is, open(), open(), read(), write(), close()
and close(), the NORMAL version. We have to open()
a file before executing read() or write(), and we
have to pass the file descriptor returned by the
open() system call to read(), write() and close().
Therefore, we cannot simply wrap all six system
calls together. We have to wrap the two open()
system calls and the other four system calls
separately. Because of the additional overhead of
the WSC mechanism shown in Figure 3, we cannot
expect a benefit from applying the WSC mechanism to
just one cp transaction. In fact, a single cp in
the WSC version took about twice as many clock
cycles as in the NORMAL version. Therefore, we
consider performing multiple cp transactions in one
program.
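As a point of reference, one transaction of the NORMAL version can be sketched in C as follows. This is a minimal illustration, not the authors' program: the function name cp_normal, the buffer size BUF_SIZE and the file mode 0644 are assumptions, and the sketch issues exactly one read() and one write(), so it copies at most BUF_SIZE bytes.

```c
#include <fcntl.h>
#include <unistd.h>

#define BUF_SIZE 4096 /* hypothetical buffer size; the paper does not state one */

/* One cp transaction in the NORMAL order:
 * open(), open(), read(), write(), close(), close().
 * Returns the number of bytes written, or -1 on error. */
ssize_t cp_normal(const char *src, const char *dst)
{
    char buf[BUF_SIZE];
    ssize_t n, w;
    int in, out;

    /* A: open the source for reading and the destination for writing. */
    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    /* B: read() then write(); both depend on the descriptors from A. */
    n = read(in, buf, sizeof buf);
    w = (n >= 0) ? write(out, buf, (size_t)n) : -1;

    /* C: close both files. */
    close(in);
    close(out);
    return w;
}
```

The data dependency is visible in the code: read(), write() and close() all take a descriptor produced by an earlier open(), which is why the six calls cannot be issued as a single wrapped group.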
We use four other versions of the cp program, and
measure 11 portions of these five programs to
observe:
I. whether the WSC mechanism is effective for the
cp program as a whole,
II. the difference between the effect of wrapping
a single type of system call and that of wrapping
various types of system calls, and
III. the effect of wrapping system calls that have
the same code but refer to different data.
Figure 5 shows these five programs and 11 portions.
“N” in Figure 5 is the number of cp transactions.
We first explain each program and portion below.
Then we explain why we chose these portions to
examine the points of interest listed above.
In Program 2 in Figure 5, we wrap each of the six
kinds of system calls. We call this the
WSC+COLLECT version.
As a counterpart of WSC+COLLECT, we also collect
system calls of the same type in a block but
execute the block with the normal system call
convention. We call this program the
NORMAL+COLLECT version (Program 3 in Figure 5).
In addition, we implement the WSC+RW and
NORMAL+RW versions (Programs 4 and 5 in Figure 5),
which change the order of read() and write()
in the WSC+COLLECT and NORMAL+COLLECT versions.
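The collected ordering can be sketched in C as follows, using only the normal system call convention (so it corresponds to the NORMAL+COLLECT and NORMAL+RW side, not to the WSC mechanism itself). Figure 5 is not reproduced here, so several things are assumptions: the names cp_batch, MAX_N and BUF_SIZE are hypothetical, and the interleave_rw branch reflects one plausible reading of the "+RW" variants, namely that read() and write() alternate inside a single block instead of forming two separate blocks.

```c
#include <fcntl.h>
#include <unistd.h>

#define MAX_N    128  /* hypothetical upper bound on N */
#define BUF_SIZE 4096 /* hypothetical per-transaction buffer size */

/* Run n cp transactions with system calls of the same kind collected
 * into blocks. If interleave_rw is nonzero, read() and write()
 * alternate in one block (assumed "+RW" ordering); otherwise all
 * read() calls finish before the first write() ("+COLLECT").
 * Returns the number of transactions whose write() completed in
 * full, or -1 if n is out of range. */
int cp_batch(const char *src[], const char *dst[], int n, int interleave_rw)
{
    static int in[MAX_N], out[MAX_N];
    static ssize_t len[MAX_N];
    static char buf[MAX_N][BUF_SIZE];
    int i, copied = 0;

    if (n < 0 || n > MAX_N)
        return -1;

    for (i = 0; i < n; i++)          /* block of open() for reading */
        in[i] = open(src[i], O_RDONLY);
    for (i = 0; i < n; i++)          /* block of open() for writing */
        out[i] = open(dst[i], O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (interleave_rw) {
        for (i = 0; i < n; i++) {    /* read(), write(), read(), ... */
            len[i] = read(in[i], buf[i], BUF_SIZE);
            if (len[i] >= 0 &&
                write(out[i], buf[i], (size_t)len[i]) == len[i])
                copied++;
        }
    } else {
        for (i = 0; i < n; i++)      /* block of read() */
            len[i] = read(in[i], buf[i], BUF_SIZE);
        for (i = 0; i < n; i++)      /* block of write() */
            if (len[i] >= 0 &&
                write(out[i], buf[i], (size_t)len[i]) == len[i])
                copied++;
    }

    for (i = 0; i < n; i++)          /* block of close() for sources */
        close(in[i]);
    for (i = 0; i < n; i++)          /* block of close() for destinations */
        close(out[i]);
    return copied;
}
```

Each transaction keeps its own descriptors and buffer, so every block runs the same system call code N times on different data. A failed open() leaves a descriptor of -1, which makes the later read() or write() fail and the transaction is simply not counted.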
The 11 portions we measure are as follows.
1. from open() to close() of NORMAL.
2. from open() to close() of NORMAL+COLLECT.
3. from open() to close() of WSC+COLLECT.
4. from open() to close() of NORMAL+RW.
5. from open() to close() of WSC+RW.
6. read() and write() part of NORMAL+COLLECT.
7. read() and write() part of WSC+COLLECT.
8. read() and write() part of NORMAL+RW.
9. read() and write() part of WSC+RW.
10. only write() of NORMAL+COLLECT.
11. only write() of WSC+COLLECT.
We measured only write() in portions 10 and 11 to
observe the effect of wrapping system calls that
refer to different data. While the read() system
call includes disk access time, the write() system
call buffers its access to the disk, which lets us
observe the effect of the WSC mechanism excluding
disk access time. Also, we implemented NORMAL+RW
and WSC+RW and measured portions 6, 7, 8 and 9 to
observe the effect of wrapping two types of system
calls together. Then, we measured the whole cp in
portions 1, 2 and 3 to examine whether the WSC
mechanism is effective in total. Finally, we
measured portions 4 and 5 to examine the influence
of wrapping the read() and write() system calls on
the total cp.
We measure clock cycles and the number of events
such as L1 cache misses in every portion. From these
results, we investigate how the WSC mechanism
affects locality of reference from viewpoints I, II
and III above.
4.2 Performance Evaluation
Table 3 shows the results for the cp programs. In
this case, the WSC threshold is 8 and we perform
100 cp transactions, which means N in Figure 5 is
100. The numbers in the row “portion” correspond to
the portion numbers explained in subsection 4.1.
The row “#clocks” shows clock cycles, the row “L2$”
shows L2 cache miss counts, and the rows “ITLB” and
“DTLB” show the walk counts for the ITLB and DTLB,
respectively. We measured these events with the
performance monitoring tool perfctr (Petterson, n.d.).