computing the node set of a given instance XML
document, our XSchema-XPath evaluator computes
a set of schema paths to the possible nodes specified
by a given XPath query when the XPath query is
evaluated by a common XPath evaluator on XML
instance documents. If an XPath query cannot be
evaluated completely, the schema paths for the
XPath query are computed to an empty set of
schema paths, i.e. the XPath query is unsatisfiable
according to the schema.
Definition 4 (Schema paths): A schema path is a
sequence of pointers to either the records <XP’, N, z, lp,
f>
or the records <o, {f, …, f}>, where
• XP’ is an XPath expression,
• N is a node in an XML Schema definition,
• z is a set of pointers
• lp
is a set of schema paths,
• f is a schema path list, or a predicate
expression without location steps, and
• o is a keyword.
XP’ is the part of a given XPath expression, which
has been evaluated;
N is a resultant node of a schema
whenever XP’ is evaluated by our XSchema-XPath
evaluator on the schema definition; z is a set of
pointers to the records in which the schema node is
the parent of the schema node of the current record.
Note whenever a record is the first record of a loop,
the record has more than one possible parent record.
lp represents loop schema paths; f represents either a
schema path list computed from a predicate
q that
test the node N, or the predicate expression q itself
from which no schema paths can be computed like
true() or false(), but also including self::node()=C. o
represents operators like or, and and not.
4.1 Computing Schema Paths
We use the technique of the denotational semantics
(Schmidt, 1994) to describe our XSchema-XPath
evaluator, and define the following notations. Let
z
be a pointer in a schema path and
d is a field of a
schema record, we write
z.d to refer to the field d of
the record to which the pointer
z points. We use the
letter S to represent the size of a schema path p, thus
p(S) to represent the last pointer, p(S-1) the pre-last
pointer, and so on.
Figure 3 defines the denotational semantic
L of the
XSchema-XPath evaluator. The function
L takes an
XPath expression and a schema path as the
arguments and yields a set of new schema paths, and
is defined recursively on the structure of XPath
expressions. For evaluating each location step of an
XPath expression, our XSchema-XPath evaluator
first computes the axis and the node test of the
location step by iteratively taking the schema node
p(S).N from each schema path p in the path set as the
context node. The path set is computed from the part
xp’’ of the XPath query, which has been evaluated by
the XSchema-XPath evaluator. For each resultant
node r selected by the current location step
xp
f
, a new
schema path is generated based on the old path
p.
The auxiliary function ϑ(r, g) generates a new schema
path record e=<xp’, r, g, -, ->, adds a pointer to e at the
end of the given schema path
p and returns a new
schema path, where xp’=xp’’/xp
f
and g is a set of
pointers.
In the case of recursive schemas, it may occur
that the XSchema-XPath evaluator revisits a node
N
of the XML Schema definition without any progress
in the processing of the query. We call this a loop. A
loop might occur when an XPath query contains the
axis
desc, ances, preceding or following, which are boiled
down to the recursive evaluation of the axis child or
parent respectively. We detect loops in the following
way: let
r be a visited schema node when evaluating
the part
xp’ of an XPath expression with p(S).N as the
context node. If there exists a record
p(i) in p, such
that
p(i).N=r, and p(i).XP’=xp’, a loop is detected and the
loop path segment is
lp = (p(i), …,p(S)). lp will be
attached to the schema node
p(i).N where the loop
occurs. For computing
L⎡desc::n⎦(p), we first compute
p
i
| p
i
∈L⎡child::*⎦(p
i-1
) where p
1
=L⎡child::*⎦(p). If no loop is
detected in the path p
i
, i.e. ∀i∈{1, ..., S-1}: p
i
(i).N≠p
i
(S).N
∧ p
i
(i).XP’≠p
i
(S).XP’, L’⎡self::n⎦(p
i
) is then computed in
order to construct a possible new path from
p
i
. If a
loop is detected in the path
p
i
, i.e. ∃k∈{1,.., S-1}:
p
i
(k).N=p
i
(S).N ∧ p
i
(k).XP’=p
i
(S).XP’, a loop path segment,
i.e.
{p
i
(k), …, p
i
(S-1)} is identified. The function X
modifies the record, which is the head of the loop,
by adding the loop path into the record, i.e.
X(p
i
(k),
(p
i
(k),..,p(S-1))), and returns true. Furthermore,
although the schema nodes in two records are the
same, i.e.
p
i
(k).N=p
i
(S).N, these two nodes have
different parents, i.e,
p
i
(k).z ≠ p
i
(S).z. Therefore, the
new parent
p
i
(S).z has to be recorded and this is done
by the function
Z, which adds a parent pointer to the
record
p
i
(k), i.e. Z(p
i
(k), p
i
(S).z), and returns true.
The schema paths of a predicate are attached to
the context node of the predicate. The function
A(F, i,
p) writes F into the field p(i).f and returns the modified
schema path
p. The parameter F = {f
1
,..,f
k
} is computed
from a set of predicates
q
1
,..q
k
. f
i
is either a schema
path list computed from a predicate
q
i
, or is the
predicate expression
q
i
itself when q
i
does not contain
location steps. The node
p(i).N is the context node of
these predicates.
q
i
is evaluated to false if q
i
is
computed to the empty schema paths with the
exception of
not(q), which is computed to true. For
instance,
L⎡e[q
1
and q
2
]⎦(p) is computed to empty paths
if
q
1
or q
2
are evaluated to false. When computing the
schema paths of a predicate, the XSchema-XPath
ICEIS 2006 - DATABASES AND INFORMATION SYSTEMS INTEGRATION
160