A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions.
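There are several different versions of regular expression syntax. This document distinguishes Basic, Extended, and PERL regular expressions, as well as Shell expressions.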
The most basic building blocks of regular expressions are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Metacharacters, such as . [ ] ^ $ and *, have special meanings and let a pattern describe more than one fixed string.
A list of characters enclosed by [ and ] matches any single character in that list; if the first character in the list is the caret ^ then the regular expression matches any character NOT in the list.
Example: the regular expression [0123456789] matches any single digit, while the regular expression [^0123456789] matches any character BUT a digit.
You can also specify a lexicographic range of characters as found in the current character set (usually UTF-8 or ASCII) by separating the first and last characters in the range with a hyphen. For example, [a-d] is equivalent to [abcd].
To include a literal ] place it first in the list. To include a literal ^ place it anywhere but first in the list. Finally, to include a literal -, simply place it last.
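The bracket rules above can be tried out with a short sketch. It assumes Python's re module, which is not mentioned in the original text but follows the same conventions for lists, negation, ranges, and literal ], ^ and - characters. The variable names are illustrative only.

```python
import re

# [0123456789] matches any single digit; [^0123456789] matches any non-digit.
digit = re.compile(r"[0123456789]")
non_digit = re.compile(r"[^0123456789]")

# A range: [a-f] is shorthand for [abcdef].
hex_letter = re.compile(r"[a-f]")

# Literal ] placed first, literal ^ placed anywhere but first, literal - placed last.
specials = re.compile(r"[]^-]")

found_digits = digit.findall("a1b22")        # ['1', '2', '2']
found_others = non_digit.findall("a1b22")    # ['a', 'b']
found_hex = hex_letter.findall("cafe")       # ['c', 'a', 'f', 'e']
found_specials = specials.findall("x]y^z-")  # [']', '^', '-']
```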
The period . will match any single character. Most regular expression libraries provide shortcuts to specify certain kinds of characters:
Additional shortcuts are available, but these are the most commonly used.
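As an illustration only, here is how the period and a few typical shortcuts behave in Python's re module. The shortcuts \d, \w and \s are an assumption about what such a list usually contains; other libraries may spell them differently or not provide them at all.

```python
import re

text = "Room 42, floor 3"

# The period matches any single character (in Python, any character except a newline).
any_char = re.findall(r"R..m", text)   # ['Room']

# Assumed shortcuts, as provided by Python's re module:
digits = re.findall(r"\d", text)       # \d: a digit
words = re.findall(r"\w+", text)       # \w: a letter, digit, or underscore
spaces = re.findall(r"\s", text)       # \s: a whitespace character
```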
Normally, a regular expression is searched for anywhere in the input. Often, however, you know that the pattern must appear at the beginning or end of the text.
This is called anchoring, and you can anchor the pattern using ^ to match the beginning of text or $ to match the end of text.
It is important to note that ^ and $ act as anchors only when ^ appears at the beginning of the pattern and $ at the end. The anchor \< matches the empty string at the beginning of a word; the anchor \> matches the empty string at the end of a word. PERL also supports \b, which matches the empty string at either edge of a word (beginning or end), and \B, which matches the empty string anywhere that is NOT the edge of a word. PERL has no separate anchor for the end of a word; \e and \E are not word anchors there.
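A minimal sketch of anchoring, assuming Python's re module. Python does not support \< and \>, so the word-edge part of the sketch uses \b and \B instead. The sample string is made up for illustration.

```python
import re

line = "cat concatenate cat"

# ^ anchors the match at the beginning of the text, $ at the end.
at_start = re.search(r"^cat", line)          # matches at position 0
at_end = re.search(r"cat$", line)            # matches the final "cat"
no_match = re.search(r"^concatenate", line)  # None: the text does not start with it

# \b matches at the edge of a word; \B matches where there is no word edge.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", line)]   # [0, 16]
inside_words = [m.start() for m in re.finditer(r"\Bcat\B", line)]  # [7]
```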
It can be tedious to repeat a complicated regular expression. Sometimes the string being searched for has an unknown length. In such cases the repetition operators are used.
The repetition operators appear AFTER the regular expression they apply to; for example, a* matches zero or more occurrences of a.
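The following sketch shows the usual repetition operators (?, *, + and the brace forms) as they behave in Python's re module. Python is an assumption here; the operators themselves are the ones named elsewhere in this document.

```python
import re

# ? : the preceding item is optional (zero or one time).
optional = re.fullmatch(r"colou?r", "color")

# * : the preceding item is matched zero or more times.
star = re.fullmatch(r"ab*", "a")

# + : the preceding item is matched one or more times, so a lone "a" fails.
plus = re.fullmatch(r"ab+", "a")

# {n} : exactly n times; {n,m} : at least n and at most m times.
counted = re.fullmatch(r"[0-9]{3}", "123")
ranged = re.fullmatch(r"[0-9]{2,4}", "12345")   # five digits is too many
```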
Regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substrings that respectively match the concatenated subexpressions.
Regular expressions may be joined by the infix operator |; the resulting regular expression matches any string matching either subexpression.
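Concatenation and alternation can be sketched as follows, again assuming Python's re module. The sample words are arbitrary.

```python
import re

# Concatenation: the pattern "ab" followed by the pattern "cd" matches "abcd".
concat = re.fullmatch(r"ab" + r"cd", "abcd")

# Alternation: the joined pattern matches a string matching either subexpression.
either = [bool(re.fullmatch(r"cat|dog", w)) for w in ("cat", "dog", "cow")]
```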
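Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A subexpression may be enclosed in parentheses to override these precedence rules. Separately, the repetition operators are greedy: they match the largest possible string that still allows the whole pattern to match.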
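A short sketch of operator precedence and of longest-match behavior, assuming Python's re module. The non-greedy form *? in the last line is a PERL-style extension that Python happens to support; it is shown only for contrast.

```python
import re

# Repetition binds tighter than concatenation: ab* means a(b*), not (ab)*.
tight = re.fullmatch(r"ab*", "abbb")         # matches
not_grouped = re.fullmatch(r"ab*", "abab")   # None
grouped = re.fullmatch(r"(ab)*", "abab")     # matches once parentheses are added

# Concatenation binds tighter than alternation: ab|cd means (ab)|(cd).
alt_plain = re.fullmatch(r"ab|cd", "abd")      # None
alt_grouped = re.fullmatch(r"a(b|c)d", "abd")  # matches

# Repetition takes as much text as it can while still letting the pattern match.
greedy = re.search(r"<.*>", "<a><b>").group()    # '<a><b>'
lazy = re.search(r"<.*?>", "<a><b>").group()     # '<a>'
```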
In Basic regular expressions, the metacharacters ?, +, {, }, |, (, and ) lose their special meaning; they must first be prefixed with a backslash (\?, \+, \{, \}, \|, \(, and \)) to work as described in this document. The metacharacters . and * keep their special meaning.
In Extended regular expressions, and PERL regular expressions, the metacharacters above will lose their special meaning when prefixed with a backslash. This is the opposite of Basic regular expressions.
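Python's re module follows the Extended/PERL convention, so it can illustrate how a backslash removes a metacharacter's special meaning. Basic regular expressions cannot be shown this way, because Python does not implement them. The helper re.escape is part of Python's standard library, not something described in the original text.

```python
import re

# Unescaped, the period matches any character; escaped, only a literal period.
any_dot = re.findall(r"3.14", "3.14 3x14")       # ['3.14', '3x14']
literal_dot = re.findall(r"3\.14", "3.14 3x14")  # ['3.14']

# re.escape adds the backslashes needed to match a string literally.
escaped = re.escape("1+1=2?")
literal_match = re.fullmatch(escaped, "1+1=2?")
```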
In Shell expressions, the wildcards * and ? are recognized, but they have different meanings. The ? matches any single character (like regex .), while the * matches any sequence of zero or more characters (like regex .*).
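Shell-style matching can be approximated with Python's standard fnmatch module, used here as a stand-in for a real shell. The file names are hypothetical.

```python
import fnmatch
import re

# * stands for any run of characters, including none.
star_match = fnmatch.fnmatchcase("report.txt", "*.txt")

# ? stands for exactly one character.
one_char = fnmatch.fnmatchcase("a.txt", "?.txt")
two_chars = fnmatch.fnmatchcase("ab.txt", "?.txt")

# fnmatch.translate converts a shell pattern into an equivalent regular expression.
as_regex = fnmatch.translate("*.txt")
regex_match = re.match(as_regex, "report.txt")
```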