This facility searches for records containing strings
which either exactly or approximately match a pattern.
Approximate matching finds records that contain the pattern
with a limited number of errors, where each error is a single-character
substitution, insertion, or deletion. For example, Massechusets
matches Massachusetts with two errors (one substitution
and one insertion).
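This section does not say how matches with errors are computed. As an illustration only, here is a Python sketch of the classic dynamic-programming formulation of approximate substring matching (Sellers' algorithm). It is not necessarily the method this search uses, and the function name `approx_contains` is hypothetical. The sketch counts one error for each substituted character, each extra character in the text (insertion), and each pattern character missing from the text (deletion).

```python
def approx_contains(pattern: str, text: str, k: int) -> bool:
    """Return True if some substring of `text` is within k errors
    (substitutions, insertions, deletions) of `pattern`.

    Illustrative sketch (Sellers' dynamic programming), not this tool's code.
    """
    m = len(pattern)
    # col[i] = fewest errors matching pattern[:i] against a substring of the
    # text that ends at the current position. Before any text is read, the
    # only option is to delete all i pattern characters.
    col = list(range(m + 1))
    best = col[m]
    for ch in text:
        prev_diag = col[0]  # col[i-1] as it was at the previous text position
        col[0] = 0          # a match may start anywhere, so this costs nothing
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(
                prev_diag + (pattern[i - 1] != ch),  # match or substitution
                old + 1,                             # extra text char (insertion)
                col[i - 1] + 1,                      # missing pattern char (deletion)
            )
            prev_diag = old
        best = min(best, col[m])
    return best <= k
```

With the example above, `approx_contains("Massechusets", "Massachusetts", 2)` is true, and it becomes false when only one error is allowed.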
PATTERNS
Our search supports a large variety of patterns, including simple strings,
strings with classes of characters, sets of strings, arbitrary wild cards,
and, in general, regular expressions.
Strings
Any sequence of characters, including the special symbols
‘^’ for beginning of line and ‘$’
for end of line. The special characters (‘$’,
‘^’, ‘*’, ‘[’, ‘|’, ‘(’,
‘)’, ‘!’, and ‘\’) must
be preceded by ‘\’ if they are to be matched as regular
characters. For example, \^abc\\ corresponds to the string
^abc\, whereas ^abc corresponds to the string abc
at the beginning of a line.
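To search for a literal string that contains special characters, each of them has to be escaped. The following is a minimal sketch. It assumes the set of special characters is exactly $ ^ * [ | ( ) ! \. The helper name `escape_pattern` is hypothetical.

```python
# Assumed set of special characters: $ ^ * [ | ( ) ! and backslash.
SPECIAL = frozenset("$^*[|()!\\")


def escape_pattern(literal: str) -> str:
    """Hypothetical helper: prefix every special character with a backslash
    so that the whole string is matched as regular characters."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in literal)
```

For instance, `escape_pattern("^abc\\")` (the five characters ^abc\) yields \^abc\\, which is the pattern from the example above.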
Classes of characters
A list of characters inside [] (in order) corresponds to
any character from the list. For example, [a-ho-z] is any
character between a and h or between o and
z. The symbol ‘^’ inside [] complements
the list. For example, [^i-n] denotes any character
except the characters ‘i’ through ‘n’.
The symbol ‘^’ thus has two meanings, but this is consistent
with other searches. The symbol ‘.’ (don't care) stands for
any character except the newline character.
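The following Python sketch shows how a class such as [a-ho-z] (ranges written with ‘-’) or its complement [^i-n] is evaluated against a single character. The function `class_matches` is hypothetical. It also ignores escapes and a literal ‘]’ inside the brackets.

```python
def class_matches(cls: str, ch: str) -> bool:
    """Hypothetical sketch: does the bracketed class `cls` match character `ch`?

    Supports single characters, ranges like a-h, and a leading '^'
    that complements the list. Escapes are not handled.
    """
    body = cls[1:-1]                 # strip the surrounding [ ]
    negate = body.startswith("^")    # '^' right after '[' complements the list
    if negate:
        body = body[1:]
    hit = False
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            hit = hit or body[i] <= ch <= body[i + 2]   # range lo-hi
            i += 3
        else:
            hit = hit or body[i] == ch                  # single character
            i += 1
    return hit != negate
```

Under this sketch, `class_matches("[a-ho-z]", "c")` is true and `class_matches("[a-ho-z]", "m")` is false. `class_matches("[^i-n]", "k")` is false.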
Boolean operations
The search supports an ‘and’ operation ‘;’
and an ‘or’ operation ‘,’, but the two cannot be
combined in one query. For example, ‘fast;network’ searches
for all records containing both words, and ‘fast,network’
for all records containing either word.
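The following is a simplified Python sketch of these semantics. Each term is treated as a plain substring, so escaping and approximate matching are ignored. The function name `boolean_match` is hypothetical.

```python
def boolean_match(query: str, record: str) -> bool:
    """Hypothetical sketch: ';' means and, ',' means or; mixing is rejected.

    Terms are matched as plain substrings (no escapes, no errors allowed).
    """
    has_and, has_or = ";" in query, "," in query
    if has_and and has_or:
        raise ValueError("';' (and) and ',' (or) cannot be combined in one query")
    if has_and:
        return all(term in record for term in query.split(";"))
    # ',' queries, and single terms, succeed if any term occurs
    return any(term in record for term in query.split(","))
```

A query such as `fast;network,slow` raises an error here. This mirrors the rule that ‘and’ and ‘or’ cannot be combined.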
Wild cards
The symbol ‘#’ is used to denote a wild card:
# matches any number (including zero) of arbitrary characters.
For example, ex#e matches example. The symbol #
is equivalent to .* as used in other searches. In fact, .*
will work too, because it is a valid regular expression (see below), but
unless it is part of an actual regular expression, #
will work faster.
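The equivalence between # and .* can be illustrated with a Python sketch that translates a wild-card pattern into a Python regular expression. The helper `wildcard_to_regex` is hypothetical. For simplicity, it treats every character other than # literally.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Hypothetical helper: replace each '#' with '.*' and escape everything
    else, so the rest of the pattern is matched literally."""
    return ".*".join(re.escape(part) for part in pattern.split("#"))
```

`wildcard_to_regex("ex#e")` gives `ex.*e`, and `re.search` finds it in both "example" and "exe". In the second case, # matches zero characters. By default, Python's `.` does not match a newline, which agrees with the meaning of ‘.’ given above.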
Combination of exact and approximate matching
Any part of a pattern inside angle brackets <> must match the
text exactly, even when the match as a whole is allowed errors. For example,
<mathemat>ics matches mathematical with
one error (replacing the last s with an a), but
mathe<matics> does not match mathematical no matter
how many errors we allow, because mathematical does not contain matics.
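One way to model this rule is the following Python sketch. It extends the classic dynamic-programming approach to approximate substring matching by locking the characters inside <>. Locked characters cannot be substituted or deleted, and no text character can be inserted between two of them. This is an assumed model, not necessarily how this search implements the rule. The names `split_exact` and `approx_contains_exact` are hypothetical. Angle brackets are assumed to be balanced, unescaped, and not nested.

```python
def split_exact(pattern: str):
    """Strip '<' and '>' and return (bare pattern, per-character 'exact' flags)."""
    chars, exact, inside = [], [], False
    for ch in pattern:
        if ch == "<":
            inside = True
        elif ch == ">":
            inside = False
        else:
            chars.append(ch)
            exact.append(inside)
    return "".join(chars), exact


def approx_contains_exact(pattern: str, text: str, k: int) -> bool:
    """Approximate substring match with at most k errors, where characters
    inside <...> must appear in the text exactly and contiguously."""
    p, exact = split_exact(pattern)
    m, INF = len(p), float("inf")
    prev = [0] * (len(text) + 1)  # row 0: a match may start anywhere
    for i in range(1, m + 1):
        pc, locked = p[i - 1], exact[i - 1]
        # inserting a text character after p[i-1] would split two locked chars
        no_insert = locked and i < m and exact[i]
        cur = [INF if locked else prev[0] + 1]  # empty text prefix: deletions only
        for j, tc in enumerate(text, start=1):
            if pc == tc:
                diag = prev[j - 1]                          # exact match
            else:
                diag = INF if locked else prev[j - 1] + 1   # substitution
            delete = INF if locked else prev[j] + 1         # pattern char missing
            insert = INF if no_insert else cur[j - 1] + 1   # extra text char
            cur.append(min(diag, delete, insert))
        prev = cur
    return min(prev) <= k
```

With the examples above, `approx_contains_exact("<mathemat>ics", "mathematical", 1)` is true. `approx_contains_exact("mathe<matics>", "mathematical", 5)` is false, and it stays false for any number of errors.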
Regular expressions
The syntax of regular expressions in our search is in general the
same as that for other searches. The union operation ‘|’,
Kleene closure ‘*’, and parentheses () are all
supported. Currently ‘+’ is not supported. Regular
expressions are currently limited to approximately 30 characters
(generally excluding meta characters).
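Because ‘+’ is not supported, an expression X+ (one or more X) can be written as XX* instead, which uses only concatenation and Kleene closure. The following Python sketch does this rewrite mechanically. The function `expand_plus` is hypothetical. It treats a single character, an escaped character, a [...] class, or a parenthesized group as the atom before ‘+’. It also assumes that brackets and parentheses are balanced and that no ‘]’ or ‘)’ appears escaped inside them.

```python
def expand_plus(regex: str) -> str:
    """Hypothetical sketch: rewrite every X+ as XX*, where X is the atom
    immediately before '+'."""
    atoms = []  # the expression split into atoms and operators
    i = 0
    while i < len(regex):
        ch = regex[i]
        if ch == "\\" and i + 1 < len(regex):       # escaped character
            atoms.append(regex[i:i + 2])
            i += 2
        elif ch == "[":                              # character class
            j = regex.index("]", i + 1)
            atoms.append(regex[i:j + 1])
            i = j + 1
        elif ch == "(":                              # group: find matching ')'
            depth, j = 1, i + 1
            while depth:
                if regex[j] == "(":
                    depth += 1
                elif regex[j] == ")":
                    depth -= 1
                j += 1
            # j is one past the matching ')'; rewrite the group body as well
            atoms.append("(" + expand_plus(regex[i + 1:j - 1]) + ")")
            i = j
        elif ch == "+":
            if not atoms:
                raise ValueError("'+' has nothing to repeat")
            atoms.append(atoms[-1] + "*")            # X+  ->  X X*
            i += 1
        else:
            atoms.append(ch)
            i += 1
    return "".join(atoms)
```

For example, (ab)+c becomes (ab)(ab)*c. The repeated atom makes the expression longer. This may count against the length limit mentioned above.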
