(5) /^(?=.*\bbikes?\b)(?=.*\bvehicles?\b)(?=.*\btwo\b)(?=.*\bwheels?\b).+/i
The regular expression in (5) is enclosed in
slashes and followed by the i flag, which makes the
search case-insensitive. The caret (^) at the
beginning is an anchor that denotes the start of the
string (it could be omitted here without changing
which single-line responses are matched). The main
part of the regex consists of four positive lookahead
assertions (the parenthesised groups beginning with
?=), each corresponding to one of the keywords in (4).
These are not capturing groups: a lookahead only
checks that its pattern can be matched from the
current position and consumes no characters. Inside
each lookahead, the keyword is preceded by the dot (.)
metacharacter and the asterisk (*) quantifier, which
together match any character other than a line break
zero or more times, so that the response may contain
other words in addition to the keywords and the
keywords may occur in any order. Moreover, each
keyword is placed between word boundaries (indicated
by \b). Three of the keywords additionally end in s
followed by a question mark, which makes that s
optional. Finally, the last lookahead is followed by a
dot and a plus sign (+), which together match any
character one or more times. In effect, the dot and
plus match the entire response, on condition that all
four lookahead assertions succeed.
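The structure described above can be sketched in Python. This is only an illustration under stated assumptions: the helper name build_keyword_regex is hypothetical, and Python's re.IGNORECASE flag stands in for the trailing i flag in (5). Each keyword fragment is wrapped in its own lookahead, which is why the order of the keywords in the response does not matter.

```python
import re

def build_keyword_regex(fragments):
    """Wrap each keyword fragment in a positive lookahead, as in (5).

    `fragments` are regex snippets such as r"bikes?". The helper name and
    signature are hypothetical, chosen only for this illustration.
    """
    lookaheads = "".join(rf"(?=.*\b{frag}\b)" for frag in fragments)
    # ^ anchors at the start; .+ then consumes the whole single-line response.
    return re.compile("^" + lookaheads + ".+", re.IGNORECASE)

pattern = build_keyword_regex([r"bikes?", r"vehicles?", r"two", r"wheels?"])

# The assembled pattern is the body of the regex in (5).
print(pattern.pattern)

# Lookaheads are assertions, so the pattern has no capturing groups.
print(pattern.groups)

# Keyword order and letter case do not matter (sentence adapted from (7)).
print(pattern.search("EVERY VEHICLE WITH TWO WHEELS IS A BIKE.") is not None)
```

Building the pattern from a list makes it easier to see that adding or removing a keyword only adds or removes one lookahead; the rest of the expression stays the same.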
The regex in (5) is capable of matching each of
the sentences given in (3a) and (3b). Since an
exact-text matching approach would require all of
these sentences to be listed in the key (i.e. in an
array of alternative answers that deserve full
credit), the keyword approach based on regular
expressions is a neat solution. It is far from
perfect, however, as the regex in (5) would also
match sentences which are too vague to be accepted
as correct, for example:
(6) Vehicles including bikes have two wheels.
Bikes denoting vehicles have two wheels.
To make matters worse, it would match sentences
which are definitely incorrect, for example:
(7) Bikes are not vehicles with two wheels.
Bikes are vehicles with two or five wheels.
Bikes are vehicles with more than two wheels.
Bikes are vehicles with twenty two wheels.
Every vehicle with two wheels is a bike.
The opposite situation is also possible: some
sentences which correctly define bikes would not be
matched by the regex. Examples are given below:
(8) Bikes have a pair of wheels.
A bike is a vehicle with a pair of wheels.
A bike has one front and one rear wheel.
Bike – a two-wheeled vehicle.
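The behaviour claimed for the examples in (6), (7) and (8) can be checked with a short script. The sketch below uses Python's re module purely for illustration (re.IGNORECASE plays the role of the i flag); the dictionary labels and the helper name matched are assumptions introduced here, not part of the original test setup.

```python
import re

# The regex in (5); re.IGNORECASE corresponds to the trailing i flag.
REGEX_5 = re.compile(
    r"^(?=.*\bbikes?\b)(?=.*\bvehicles?\b)(?=.*\btwo\b)(?=.*\bwheels?\b).+",
    re.IGNORECASE,
)

# Example sentences copied from (6), (7) and (8); the labels are illustrative.
examples = {
    "(6) too vague": [
        "Vehicles including bikes have two wheels.",
        "Bikes denoting vehicles have two wheels.",
    ],
    "(7) incorrect": [
        "Bikes are not vehicles with two wheels.",
        "Bikes are vehicles with two or five wheels.",
        "Bikes are vehicles with more than two wheels.",
        "Bikes are vehicles with twenty two wheels.",
        "Every vehicle with two wheels is a bike.",
    ],
    "(8) correct": [
        "Bikes have a pair of wheels.",
        "A bike is a vehicle with a pair of wheels.",
        "A bike has one front and one rear wheel.",
        "Bike – a two-wheeled vehicle.",
    ],
}

def matched(sentence):
    """Return True if the regex in (5) accepts the sentence."""
    return REGEX_5.search(sentence) is not None

for label, sentences in examples.items():
    for sentence in sentences:
        print(label, matched(sentence), sentence, sep=" | ")
```

Every sentence in (6) and (7) is accepted and every sentence in (8) is rejected. The last sentence in (8) fails only on the wheels? lookahead: in "two-wheeled" there is no word boundary directly after "wheel", so the keyword is not found as a whole word.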
It should be clear that the regex in (5) requires
modifications as it is not capable of matching every
possible correct response. Amongst other necessary
changes, the keyword vehicle would need to be made
optional. More importantly, however, even if the
regex is successfully adjusted to match all of the
examples presented above, it may be difficult to rule
out the possibility of someone producing yet another
alternative (and acceptable) response. One situation
in which the use of regular expressions may be
particularly challenging is when the keywords
include a word for which the number of acceptable
synonyms can be very large. The adjective good is a
case in point.
2.2 Regular Expressions Versus Exact-text Matching
The question that arises in this context is whether the
use of keywords and regular expressions is a better
solution than exact-text matching. On the one hand, it
must be admitted that keyword matching is very
likely to result in fewer errors than exact-text
matching. On the other hand, even if the keyword
method results in only one mismatch (and the exact-
match method generates dozens of mismatches), there
remains the problem of identifying that single
mismatch in a set of responses submitted by the test
takers. And even if there are actually no mismatches,
there should be a way of making sure that this is
indeed the case. In all probability, the only solution is
some kind of human verification of the automated
scoring. Otherwise, some students may end up with
inaccurate scores (and we might not even be aware of
this).
However, if we accept that human verification is
a necessity, then exact-text matching is actually
superior to the keyword technique. The reason for this
is that the keyword approach can potentially make
two kinds of errors. As shown above, it can give full
credit to incorrect responses, which might be called
false positive mismatches, as in (7), or it can give no
credit to correct responses, which could be termed
false negative mismatches, as in (8) above. The exact-
match approach, by contrast, can only make one type
of error, namely false negative mismatches. This is
because, in the exact-match approach, it is impossible
for an answer to be included in the key and actually
deserve no credit (unless the answer is there by
mistake).
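The asymmetry between the two approaches can be illustrated with a small sketch. Apart from the regex in (5) and the sentences taken from (7) and (8), everything in it is an assumption: the one-item key is a hypothetical stand-in for the keyed answers, the correctness labels simply follow the judgements given for (7) and (8), and exact matching is taken to mean string identity after trimming whitespace and ignoring case.

```python
import re

REGEX_5 = re.compile(
    r"^(?=.*\bbikes?\b)(?=.*\bvehicles?\b)(?=.*\btwo\b)(?=.*\bwheels?\b).+",
    re.IGNORECASE,
)

# Hypothetical key standing in for the keyed answers (stored case-folded).
KEY = {"a bike is a vehicle with two wheels."}

# (response, deserves_credit) pairs; labels follow the judgements in (7) and (8).
RESPONSES = [
    ("Bikes are not vehicles with two wheels.", False),
    ("Bikes are vehicles with twenty two wheels.", False),
    ("Bikes have a pair of wheels.", True),
    ("A bike has one front and one rear wheel.", True),
]

def exact_match(response):
    """Award credit only if the normalised response is listed in the key."""
    return response.strip().casefold() in KEY

def keyword_match(response):
    """Award credit if the regex in (5) accepts the response."""
    return REGEX_5.search(response) is not None

def error_types(scorer):
    """Collect the kinds of mismatch a scorer produces on RESPONSES."""
    errors = set()
    for response, deserves_credit in RESPONSES:
        awarded = scorer(response)
        if awarded and not deserves_credit:
            errors.add("false positive")
        elif deserves_credit and not awarded:
            errors.add("false negative")
    return errors

print("exact match:", sorted(error_types(exact_match)))
print("keyword:", sorted(error_types(keyword_match)))
```

On this small sample the exact-match scorer produces only false negatives, whereas the keyword scorer produces both kinds of mismatch, which is the point made in the paragraph above.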
Suppose, for instance, that in the exact-match
approach we have an item with four different keyed
answers. Now, if test takers provide ten different