Basic Characters
Below are the characters used in a basic regular expression (RE), as found in shell scripts, Perl, etc.
* (asterisk) : matches any number of repeats of the character or RE preceding it, including zero instances
. (dot) : matches any one character, except a newline
^ (caret) : matches the beginning of a line; as the first character inside brackets, it instead negates the set of characters that follows
$ (dollar sign) : at the end of an RE matches the end of a line
^$ : matches blank lines
[...] (brackets) : enclose a set of characters to match in a single RE
"[xyz]" matches any one of the characters x, y, or z.
"[c-n]" matches any one of the characters in the range c to n.
"[B-Pk-y]" matches any one of the characters in the ranges B to P and k to y.
"[a-z0-9]" matches any single lowercase letter or any digit.
"[^b-d]" matches any character except those in the range b to d. This is an instance of ^ negating or inverting the meaning of the following RE (taking on a role similar to ! in a different context).
\ (backslash) : escapes a special character, which means that character gets interpreted literally (and is therefore no longer special)
\<...\> (escaped angle brackets) : mark word boundaries
"\<the\>" matches the word "the," but not the words "them," "there," "other," etc.
Extended Characters
Below are the extended RE metacharacters, which are added to the basic set and used in egrep, awk, and Perl.
? (question mark) : matches zero or one of the previous RE. It is generally used for matching single characters
+ (plus) : matches one or more of the previous RE. It serves a role similar to the *, but does not match zero occurrences
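\{ \} (escaped curly brackets) : indicate the number of occurrences of a preceding RE to match. The brackets are escaped in basic REs (grep, sed); in egrep and Perl they are written unescaped, as { }. For example, "[0-9]\{5\}" matches exactly five digits
() (parentheses) : enclose a group of REs. They are useful with the following "|" operator and in substring extraction using expr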
| (or) : RE operator that matches any one of a set of alternative REs
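The extended characters above can be tried out with grep -E (the modern spelling of egrep). This is only a sketch: the word list and variable names are invented for the example. The last line uses expr, which works with basic REs, so its parentheses are escaped.

```shell
# Sample words (invented for this illustration).
words=$(printf '%s\n' 'color' 'colour' 'colouur' 'red' 'read' 'reed' '12345' '123')

# ? : the "u" may appear zero or one time, so "color" and "colour" match.
optional_count=$(printf '%s\n' "$words" | grep -E -c '^colou?r$')

# + : the "u" must appear at least once, so "colour" and "colouur" match.
plus_count=$(printf '%s\n' "$words" | grep -E -c '^colou+r$')

# { } : exactly five digits. Unescaped with grep -E, escaped with plain grep.
ere_interval_count=$(printf '%s\n' "$words" | grep -E -c '^[0-9]{5}$')
bre_interval_count=$(printf '%s\n' "$words" | grep -c '^[0-9]\{5\}$')

# ( ) and | : group two alternatives inside a larger RE.
alternation_hits=$(printf '%s\n' "$words" | grep -E '^re(a|e)d$')

# Substring extraction with expr: the part matched inside \( \) is printed.
digits=$(expr "abc123" : '[a-z]*\([0-9]*\)')

echo "colou?r matches:   $optional_count"
echo "colou+r matches:   $plus_count"
echo "five digits (ERE): $ere_interval_count"
echo "five digits (BRE): $bre_interval_count"
echo "re(a|e)d lines:    $alternation_hits"
echo "expr substring:    $digits"
```

Anchoring each pattern with ^ and $ keeps the counts tied to whole lines, which makes the difference between ? and + easier to see.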