Replace regex files with multibyte-aware version from FreeBSD.
* Makefile.in (install-headers): Remove extra command to install regex.h. (uninstall-headers): Remove extra command to uninstall regex.h. * nlsfuncs.cc (collate_lcid): Make externally available to allow access to collation internals from regex functions. (collate_charset): Ditto. * wchar.h: Add __cplusplus guards to make C-clean. * include/regex.h: New file, replacing regex/regex.h. Remove UCB advertising clause. * regex/COPYRIGHT: Accommodate BSD license. Remove UCB advertising clause. * regex/cclass.h: Remove. * regex/cname.h: New file from FreeBSD. * regex/engine.c: Ditto. (NONCHAR): Tweak for Cygwin. * regex/engine.ih: Remove. * regex/mkh: Remove. * regex/regcomp.c: New file from FreeBSD. Tweak slightly for Cygwin. Import required collate internals from nlsfunc.cc. (p_ere_exp): Add GNU-specific \< and \> handling for word boundaries. (p_simp_re): Ditto. (__collate_range_cmp): Define. (p_b_term): Use Cygwin-specific collate internals. (findmust): Ditto. * regex/regcomp.ih: Remove. * regex/regerror.c: New file from FreeBSD. Fix a few compiler warnings. * regex/regerror.ih: Remove. * regex/regex.7: New file from FreeBSD. Remove UCB advertising clause. * regex/regex.h: Remove. Replaced by include/regex.h. * regex/regexec.c: New file from FreeBSD. Fix a few compiler warnings. * regex/regfree.c: New file from FreeBSD. * regex/tests: Remove. * regex/utils.h: New file from FreeBSD.
This commit is contained in:
@ -1,151 +1,317 @@
|
||||
.TH REGEX 7 "25 Oct 1995"
|
||||
.BY "Henry Spencer"
|
||||
.SH NAME
|
||||
regex \- POSIX 1003.2 regular expressions
|
||||
.SH DESCRIPTION
|
||||
Regular expressions (``RE''s),
|
||||
as defined in POSIX 1003.2, come in two forms:
|
||||
.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
|
||||
.\" Copyright (c) 1992, 1993, 1994
|
||||
.\" The Regents of the University of California. All rights reserved.
|
||||
.\"
|
||||
.\" This code is derived from software contributed to Berkeley by
|
||||
.\" Henry Spencer.
|
||||
.\"
|
||||
.\" Redistribution and use in source and binary forms, with or without
|
||||
.\" modification, are permitted provided that the following conditions
|
||||
.\" are met:
|
||||
.\" 1. Redistributions of source code must retain the above copyright
|
||||
.\" notice, this list of conditions and the following disclaimer.
|
||||
.\" 2. Redistributions in binary form must reproduce the above copyright
|
||||
.\" notice, this list of conditions and the following disclaimer in the
|
||||
.\" documentation and/or other materials provided with the distribution.
|
||||
.\" 4. Neither the name of the University nor the names of its contributors
|
||||
.\" may be used to endorse or promote products derived from this software
|
||||
.\" without specific prior written permission.
|
||||
.\"
|
||||
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
||||
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
||||
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
||||
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
||||
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
||||
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
||||
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
.\" SUCH DAMAGE.
|
||||
.\"
|
||||
.\" @(#)re_format.7 8.3 (Berkeley) 3/20/94
|
||||
.\" $FreeBSD: src/lib/libc/regex/re_format.7,v 1.12 2008/09/05 17:41:20 keramida Exp $
|
||||
.\"
|
||||
.Dd March 20, 1994
|
||||
.Dt RE_FORMAT 7
|
||||
.Os
|
||||
.Sh NAME
|
||||
.Nm re_format
|
||||
.Nd POSIX 1003.2 regular expressions
|
||||
.Sh DESCRIPTION
|
||||
Regular expressions
|
||||
.Pq Dq RE Ns s ,
|
||||
as defined in
|
||||
.St -p1003.2 ,
|
||||
come in two forms:
|
||||
modern REs (roughly those of
|
||||
.IR egrep ;
|
||||
1003.2 calls these ``extended'' REs)
|
||||
.Xr egrep 1 ;
|
||||
1003.2 calls these
|
||||
.Dq extended
|
||||
REs)
|
||||
and obsolete REs (roughly those of
|
||||
.IR ed ;
|
||||
1003.2 ``basic'' REs).
|
||||
.Xr ed 1 ;
|
||||
1003.2
|
||||
.Dq basic
|
||||
REs).
|
||||
Obsolete REs mostly exist for backward compatibility in some old programs;
|
||||
they will be discussed at the end.
|
||||
1003.2 leaves some aspects of RE syntax and semantics open;
|
||||
`\(dg' marks decisions on these aspects that
|
||||
may not be fully portable to other 1003.2 implementations.
|
||||
.PP
|
||||
A (modern) RE is one\(dg or more non-empty\(dg \fIbranches\fR,
|
||||
separated by `|'.
|
||||
.St -p1003.2
|
||||
leaves some aspects of RE syntax and semantics open;
|
||||
`\(dd' marks decisions on these aspects that
|
||||
may not be fully portable to other
|
||||
.St -p1003.2
|
||||
implementations.
|
||||
.Pp
|
||||
A (modern) RE is one\(dd or more non-empty\(dd
|
||||
.Em branches ,
|
||||
separated by
|
||||
.Ql \&| .
|
||||
It matches anything that matches one of the branches.
|
||||
.PP
|
||||
A branch is one\(dg or more \fIpieces\fR, concatenated.
|
||||
.Pp
|
||||
A branch is one\(dd or more
|
||||
.Em pieces ,
|
||||
concatenated.
|
||||
It matches a match for the first, followed by a match for the second, etc.
|
||||
.PP
|
||||
A piece is an \fIatom\fR possibly followed
|
||||
by a single\(dg `*', `+', `?', or \fIbound\fR.
|
||||
An atom followed by `*' matches a sequence of 0 or more matches of the atom.
|
||||
An atom followed by `+' matches a sequence of 1 or more matches of the atom.
|
||||
An atom followed by `?' matches a sequence of 0 or 1 matches of the atom.
|
||||
.PP
|
||||
A \fIbound\fR is `{' followed by an unsigned decimal integer,
|
||||
possibly followed by `,'
|
||||
.Pp
|
||||
A piece is an
|
||||
.Em atom
|
||||
possibly followed
|
||||
by a single\(dd
|
||||
.Ql \&* ,
|
||||
.Ql \&+ ,
|
||||
.Ql \&? ,
|
||||
or
|
||||
.Em bound .
|
||||
An atom followed by
|
||||
.Ql \&*
|
||||
matches a sequence of 0 or more matches of the atom.
|
||||
An atom followed by
|
||||
.Ql \&+
|
||||
matches a sequence of 1 or more matches of the atom.
|
||||
An atom followed by
|
||||
.Ql ?\&
|
||||
matches a sequence of 0 or 1 matches of the atom.
|
||||
.Pp
|
||||
A
|
||||
.Em bound
|
||||
is
|
||||
.Ql \&{
|
||||
followed by an unsigned decimal integer,
|
||||
possibly followed by
|
||||
.Ql \&,
|
||||
possibly followed by another unsigned decimal integer,
|
||||
always followed by `}'.
|
||||
The integers must lie between 0 and RE_DUP_MAX (255\(dg) inclusive,
|
||||
always followed by
|
||||
.Ql \&} .
|
||||
The integers must lie between 0 and
|
||||
.Dv RE_DUP_MAX
|
||||
(255\(dd) inclusive,
|
||||
and if there are two of them, the first may not exceed the second.
|
||||
An atom followed by a bound containing one integer \fIi\fR
|
||||
An atom followed by a bound containing one integer
|
||||
.Em i
|
||||
and no comma matches
|
||||
a sequence of exactly \fIi\fR matches of the atom.
|
||||
a sequence of exactly
|
||||
.Em i
|
||||
matches of the atom.
|
||||
An atom followed by a bound
|
||||
containing one integer \fIi\fR and a comma matches
|
||||
a sequence of \fIi\fR or more matches of the atom.
|
||||
containing one integer
|
||||
.Em i
|
||||
and a comma matches
|
||||
a sequence of
|
||||
.Em i
|
||||
or more matches of the atom.
|
||||
An atom followed by a bound
|
||||
containing two integers \fIi\fR and \fIj\fR matches
|
||||
a sequence of \fIi\fR through \fIj\fR (inclusive) matches of the atom.
|
||||
.PP
|
||||
An atom is a regular expression enclosed in `()' (matching a match for the
|
||||
containing two integers
|
||||
.Em i
|
||||
and
|
||||
.Em j
|
||||
matches
|
||||
a sequence of
|
||||
.Em i
|
||||
through
|
||||
.Em j
|
||||
(inclusive) matches of the atom.
|
||||
.Pp
|
||||
An atom is a regular expression enclosed in
|
||||
.Ql ()
|
||||
(matching a match for the
|
||||
regular expression),
|
||||
an empty set of `()' (matching the null string)\(dg,
|
||||
a \fIbracket expression\fR (see below), `.'
|
||||
(matching any single character), `^' (matching the null string at the
|
||||
beginning of a line), `$' (matching the null string at the
|
||||
end of a line), a `\e' followed by one of the characters
|
||||
`^.[$()|*+?{\e'
|
||||
an empty set of
|
||||
.Ql ()
|
||||
(matching the null string)\(dd,
|
||||
a
|
||||
.Em bracket expression
|
||||
(see below),
|
||||
.Ql .\&
|
||||
(matching any single character),
|
||||
.Ql \&^
|
||||
(matching the null string at the beginning of a line),
|
||||
.Ql \&$
|
||||
(matching the null string at the end of a line), a
|
||||
.Ql \e
|
||||
followed by one of the characters
|
||||
.Ql ^.[$()|*+?{\e
|
||||
(matching that character taken as an ordinary character),
|
||||
a `\e' followed by any other character\(dg
|
||||
a
|
||||
.Ql \e
|
||||
followed by any other character\(dd
|
||||
(matching that character taken as an ordinary character,
|
||||
as if the `\e' had not been present\(dg),
|
||||
as if the
|
||||
.Ql \e
|
||||
had not been present\(dd),
|
||||
or a single character with no other significance (matching that character).
|
||||
A `{' followed by a character other than a digit is an ordinary
|
||||
character, not the beginning of a bound\(dg.
|
||||
It is illegal to end an RE with `\e'.
|
||||
.PP
|
||||
A \fIbracket expression\fR is a list of characters enclosed in `[]'.
|
||||
A
|
||||
.Ql \&{
|
||||
followed by a character other than a digit is an ordinary
|
||||
character, not the beginning of a bound\(dd.
|
||||
It is illegal to end an RE with
|
||||
.Ql \e .
|
||||
.Pp
|
||||
A
|
||||
.Em bracket expression
|
||||
is a list of characters enclosed in
|
||||
.Ql [] .
|
||||
It normally matches any single character from the list (but see below).
|
||||
If the list begins with `^',
|
||||
If the list begins with
|
||||
.Ql \&^ ,
|
||||
it matches any single character
|
||||
(but see below) \fInot\fR from the rest of the list.
|
||||
If two characters in the list are separated by `\-', this is shorthand
|
||||
for the full \fIrange\fR of characters between those two (inclusive) in the
|
||||
(but see below)
|
||||
.Em not
|
||||
from the rest of the list.
|
||||
If two characters in the list are separated by
|
||||
.Ql \&- ,
|
||||
this is shorthand
|
||||
for the full
|
||||
.Em range
|
||||
of characters between those two (inclusive) in the
|
||||
collating sequence,
|
||||
e.g. `[0\-9]' in ASCII matches any decimal digit.
|
||||
It is illegal\(dg for two ranges to share an
|
||||
endpoint, e.g. `a\-c\-e'.
|
||||
.No e.g. Ql [0-9]
|
||||
in ASCII matches any decimal digit.
|
||||
It is illegal\(dd for two ranges to share an
|
||||
endpoint,
|
||||
.No e.g. Ql a-c-e .
|
||||
Ranges are very collating-sequence-dependent,
|
||||
and portable programs should avoid relying on them.
|
||||
.PP
|
||||
To include a literal `]' in the list, make it the first character
|
||||
(following a possible `^').
|
||||
To include a literal `\-', make it the first or last character,
|
||||
.Pp
|
||||
To include a literal
|
||||
.Ql \&]
|
||||
in the list, make it the first character
|
||||
(following a possible
|
||||
.Ql \&^ ) .
|
||||
To include a literal
|
||||
.Ql \&- ,
|
||||
make it the first or last character,
|
||||
or the second endpoint of a range.
|
||||
To use a literal `\-' as the first endpoint of a range,
|
||||
enclose it in `[.' and `.]' to make it a collating element (see below).
|
||||
With the exception of these and some combinations using `[' (see next
|
||||
paragraphs), all other special characters, including `\e', lose their
|
||||
special significance within a bracket expression.
|
||||
.PP
|
||||
To use a literal
|
||||
.Ql \&-
|
||||
as the first endpoint of a range,
|
||||
enclose it in
|
||||
.Ql [.\&
|
||||
and
|
||||
.Ql .]\&
|
||||
to make it a collating element (see below).
|
||||
With the exception of these and some combinations using
|
||||
.Ql \&[
|
||||
(see next paragraphs), all other special characters, including
|
||||
.Ql \e ,
|
||||
lose their special significance within a bracket expression.
|
||||
.Pp
|
||||
Within a bracket expression, a collating element (a character,
|
||||
a multi-character sequence that collates as if it were a single character,
|
||||
or a collating-sequence name for either)
|
||||
enclosed in `[.' and `.]' stands for the
|
||||
enclosed in
|
||||
.Ql [.\&
|
||||
and
|
||||
.Ql .]\&
|
||||
stands for the
|
||||
sequence of characters of that collating element.
|
||||
The sequence is a single element of the bracket expression's list.
|
||||
A bracket expression containing a multi-character collating element
|
||||
can thus match more than one character,
|
||||
e.g. if the collating sequence includes a `ch' collating element,
|
||||
then the RE `[[.ch.]]*c' matches the first five characters
|
||||
of `chchcc'.
|
||||
.PP
|
||||
Within a bracket expression, a collating element enclosed in `[=' and
|
||||
`=]' is an equivalence class, standing for the sequences of characters
|
||||
e.g.\& if the collating sequence includes a
|
||||
.Ql ch
|
||||
collating element,
|
||||
then the RE
|
||||
.Ql [[.ch.]]*c
|
||||
matches the first five characters
|
||||
of
|
||||
.Ql chchcc .
|
||||
.Pp
|
||||
Within a bracket expression, a collating element enclosed in
|
||||
.Ql [=
|
||||
and
|
||||
.Ql =]
|
||||
is an equivalence class, standing for the sequences of characters
|
||||
of all collating elements equivalent to that one, including itself.
|
||||
(If there are no other equivalent collating elements,
|
||||
the treatment is as if the enclosing delimiters were `[.' and `.]'.)
|
||||
For example, if o and \o'o^' are the members of an equivalence class,
|
||||
then `[[=o=]]', `[[=\o'o^'=]]', and `[o\o'o^']' are all synonymous.
|
||||
An equivalence class may not\(dg be an endpoint
|
||||
the treatment is as if the enclosing delimiters were
|
||||
.Ql [.\&
|
||||
and
|
||||
.Ql .] . )
|
||||
For example, if
|
||||
.Ql x
|
||||
and
|
||||
.Ql y
|
||||
are the members of an equivalence class,
|
||||
then
|
||||
.Ql [[=x=]] ,
|
||||
.Ql [[=y=]] ,
|
||||
and
|
||||
.Ql [xy]
|
||||
are all synonymous.
|
||||
An equivalence class may not\(dd be an endpoint
|
||||
of a range.
|
||||
.PP
|
||||
Within a bracket expression, the name of a \fIcharacter class\fR enclosed
|
||||
in `[:' and `:]' stands for the list of all characters belonging to that
|
||||
.Pp
|
||||
Within a bracket expression, the name of a
|
||||
.Em character class
|
||||
enclosed in
|
||||
.Ql [:
|
||||
and
|
||||
.Ql :]
|
||||
stands for the list of all characters belonging to that
|
||||
class.
|
||||
Standard character class names are:
|
||||
.PP
|
||||
.RS
|
||||
.nf
|
||||
.ta 3c 6c 9c
|
||||
alnum digit punct
|
||||
alpha graph space
|
||||
blank lower upper
|
||||
cntrl print xdigit
|
||||
.fi
|
||||
.RE
|
||||
.PP
|
||||
.Pp
|
||||
.Bl -column "alnum" "digit" "xdigit" -offset indent
|
||||
.It Em "alnum digit punct"
|
||||
.It Em "alpha graph space"
|
||||
.It Em "blank lower upper"
|
||||
.It Em "cntrl print xdigit"
|
||||
.El
|
||||
.Pp
|
||||
These stand for the character classes defined in
|
||||
.IR ctype (3).
|
||||
.Xr ctype 3 .
|
||||
A locale may provide others.
|
||||
A character class may not be used as an endpoint of a range.
|
||||
.PP
|
||||
There are two special cases\(dg of bracket expressions:
|
||||
the bracket expressions `[[:<:]]' and `[[:>:]]' match the null string at
|
||||
the beginning and end of a word respectively.
|
||||
A word is defined as a sequence of
|
||||
word characters
|
||||
.Pp
|
||||
A bracketed expression like
|
||||
.Ql [[:class:]]
|
||||
can be used to match a single character that belongs to a character
|
||||
class.
|
||||
The reverse, matching any character that does not belong to a specific
|
||||
class, the negation operator of bracket expressions may be used:
|
||||
.Ql [^[:class:]] .
|
||||
.Pp
|
||||
There are two special cases\(dd of bracket expressions:
|
||||
the bracket expressions
|
||||
.Ql [[:<:]]
|
||||
and
|
||||
.Ql [[:>:]]
|
||||
match the null string at the beginning and end of a word respectively.
|
||||
A word is defined as a sequence of word characters
|
||||
which is neither preceded nor followed by
|
||||
word characters.
|
||||
A word character is an
|
||||
.I alnum
|
||||
.Em alnum
|
||||
character (as defined by
|
||||
.IR ctype (3))
|
||||
.Xr ctype 3 )
|
||||
or an underscore.
|
||||
This is an extension,
|
||||
compatible with but not specified by POSIX 1003.2,
|
||||
compatible with but not specified by
|
||||
.St -p1003.2 ,
|
||||
and should be used with
|
||||
caution in software intended to be portable to other systems.
|
||||
.PP
|
||||
.Pp
|
||||
In the event that an RE could match more than one substring of a given
|
||||
string,
|
||||
the RE matches the one starting earliest in the string.
|
||||
@ -157,79 +323,158 @@ with subexpressions starting earlier in the RE taking priority over
|
||||
ones starting later.
|
||||
Note that higher-level subexpressions thus take priority over
|
||||
their lower-level component subexpressions.
|
||||
.PP
|
||||
.Pp
|
||||
Match lengths are measured in characters, not collating elements.
|
||||
A null string is considered longer than no match at all.
|
||||
For example,
|
||||
`bb*' matches the three middle characters of `abbbc',
|
||||
`(wee|week)(knights|nights)' matches all ten characters of `weeknights',
|
||||
when `(.*).*' is matched against `abc' the parenthesized subexpression
|
||||
.Ql bb*
|
||||
matches the three middle characters of
|
||||
.Ql abbbc ,
|
||||
.Ql (wee|week)(knights|nights)
|
||||
matches all ten characters of
|
||||
.Ql weeknights ,
|
||||
when
|
||||
.Ql (.*).*\&
|
||||
is matched against
|
||||
.Ql abc
|
||||
the parenthesized subexpression
|
||||
matches all three characters, and
|
||||
when `(a*)*' is matched against `bc' both the whole RE and the parenthesized
|
||||
when
|
||||
.Ql (a*)*
|
||||
is matched against
|
||||
.Ql bc
|
||||
both the whole RE and the parenthesized
|
||||
subexpression match the null string.
|
||||
.PP
|
||||
.Pp
|
||||
If case-independent matching is specified,
|
||||
the effect is much as if all case distinctions had vanished from the
|
||||
alphabet.
|
||||
When an alphabetic that exists in multiple cases appears as an
|
||||
ordinary character outside a bracket expression, it is effectively
|
||||
transformed into a bracket expression containing both cases,
|
||||
e.g. `x' becomes `[xX]'.
|
||||
.No e.g. Ql x
|
||||
becomes
|
||||
.Ql [xX] .
|
||||
When it appears inside a bracket expression, all case counterparts
|
||||
of it are added to the bracket expression, so that (e.g.) `[x]'
|
||||
becomes `[xX]' and `[^x]' becomes `[^xX]'.
|
||||
.PP
|
||||
No particular limit is imposed on the length of REs\(dg.
|
||||
of it are added to the bracket expression, so that (e.g.)
|
||||
.Ql [x]
|
||||
becomes
|
||||
.Ql [xX]
|
||||
and
|
||||
.Ql [^x]
|
||||
becomes
|
||||
.Ql [^xX] .
|
||||
.Pp
|
||||
No particular limit is imposed on the length of REs\(dd.
|
||||
Programs intended to be portable should not employ REs longer
|
||||
than 256 bytes,
|
||||
as an implementation can refuse to accept such REs and remain
|
||||
POSIX-compliant.
|
||||
.PP
|
||||
Obsolete (``basic'') regular expressions differ in several respects.
|
||||
`|', `+', and `?' are ordinary characters and there is no equivalent
|
||||
for their functionality.
|
||||
The delimiters for bounds are `\e{' and `\e}',
|
||||
with `{' and `}' by themselves ordinary characters.
|
||||
The parentheses for nested subexpressions are `\e(' and `\e)',
|
||||
with `(' and `)' by themselves ordinary characters.
|
||||
`^' is an ordinary character except at the beginning of the
|
||||
RE or\(dg the beginning of a parenthesized subexpression,
|
||||
`$' is an ordinary character except at the end of the
|
||||
RE or\(dg the end of a parenthesized subexpression,
|
||||
and `*' is an ordinary character if it appears at the beginning of the
|
||||
.Pp
|
||||
Obsolete
|
||||
.Pq Dq basic
|
||||
regular expressions differ in several respects.
|
||||
.Ql \&|
|
||||
is an ordinary character and there is no equivalent
|
||||
for its functionality.
|
||||
.Ql \&+
|
||||
and
|
||||
.Ql ?\&
|
||||
are ordinary characters, and their functionality
|
||||
can be expressed using bounds
|
||||
.No ( Ql {1,}
|
||||
or
|
||||
.Ql {0,1}
|
||||
respectively).
|
||||
Also note that
|
||||
.Ql x+
|
||||
in modern REs is equivalent to
|
||||
.Ql xx* .
|
||||
The delimiters for bounds are
|
||||
.Ql \e{
|
||||
and
|
||||
.Ql \e} ,
|
||||
with
|
||||
.Ql \&{
|
||||
and
|
||||
.Ql \&}
|
||||
by themselves ordinary characters.
|
||||
The parentheses for nested subexpressions are
|
||||
.Ql \e(
|
||||
and
|
||||
.Ql \e) ,
|
||||
with
|
||||
.Ql \&(
|
||||
and
|
||||
.Ql \&)
|
||||
by themselves ordinary characters.
|
||||
.Ql \&^
|
||||
is an ordinary character except at the beginning of the
|
||||
RE or\(dd the beginning of a parenthesized subexpression,
|
||||
.Ql \&$
|
||||
is an ordinary character except at the end of the
|
||||
RE or\(dd the end of a parenthesized subexpression,
|
||||
and
|
||||
.Ql \&*
|
||||
is an ordinary character if it appears at the beginning of the
|
||||
RE or the beginning of a parenthesized subexpression
|
||||
(after a possible leading `^').
|
||||
Finally, there is one new type of atom, a \fIback reference\fR:
|
||||
`\e' followed by a non-zero decimal digit \fId\fR
|
||||
(after a possible leading
|
||||
.Ql \&^ ) .
|
||||
Finally, there is one new type of atom, a
|
||||
.Em back reference :
|
||||
.Ql \e
|
||||
followed by a non-zero decimal digit
|
||||
.Em d
|
||||
matches the same sequence of characters
|
||||
matched by the \fId\fRth parenthesized subexpression
|
||||
matched by the
|
||||
.Em d Ns th
|
||||
parenthesized subexpression
|
||||
(numbering subexpressions by the positions of their opening parentheses,
|
||||
left to right),
|
||||
so that (e.g.) `\e([bc]\e)\e1' matches `bb' or `cc' but not `bc'.
|
||||
.SH SEE ALSO
|
||||
regex(3)
|
||||
.PP
|
||||
POSIX 1003.2, section 2.8 (Regular Expression Notation).
|
||||
.SH HISTORY
|
||||
Written by Henry Spencer, based on the 1003.2 spec.
|
||||
.SH BUGS
|
||||
so that (e.g.)
|
||||
.Ql \e([bc]\e)\e1
|
||||
matches
|
||||
.Ql bb
|
||||
or
|
||||
.Ql cc
|
||||
but not
|
||||
.Ql bc .
|
||||
.Sh SEE ALSO
|
||||
.Xr regex 3
|
||||
.Rs
|
||||
.%T Regular Expression Notation
|
||||
.%R IEEE Std
|
||||
.%N 1003.2
|
||||
.%P section 2.8
|
||||
.Re
|
||||
.Sh BUGS
|
||||
Having two kinds of REs is a botch.
|
||||
.PP
|
||||
The current 1003.2 spec says that `)' is an ordinary character in
|
||||
the absence of an unmatched `(';
|
||||
.Pp
|
||||
The current
|
||||
.St -p1003.2
|
||||
spec says that
|
||||
.Ql \&)
|
||||
is an ordinary character in
|
||||
the absence of an unmatched
|
||||
.Ql \&( ;
|
||||
this was an unintentional result of a wording error,
|
||||
and change is likely.
|
||||
Avoid relying on it.
|
||||
.PP
|
||||
.Pp
|
||||
Back references are a dreadful botch,
|
||||
posing major problems for efficient implementations.
|
||||
They are also somewhat vaguely defined
|
||||
(does
|
||||
`a\e(\e(b\e)*\e2\e)*d' match `abbbd'?).
|
||||
.Ql a\e(\e(b\e)*\e2\e)*d
|
||||
match
|
||||
.Ql abbbd ? ) .
|
||||
Avoid using them.
|
||||
.PP
|
||||
1003.2's specification of case-independent matching is vague.
|
||||
The ``one case implies all cases'' definition given above
|
||||
.Pp
|
||||
.St -p1003.2
|
||||
specification of case-independent matching is vague.
|
||||
The
|
||||
.Dq one case implies all cases
|
||||
definition given above
|
||||
is current consensus among implementors as to the right interpretation.
|
||||
.PP
|
||||
.Pp
|
||||
The syntax for word boundaries is incredibly ugly.
|
||||
|
Reference in New Issue
Block a user