[Emacs-diffs] master f81ec28 2/2: Merge from origin/emacs-26


From: Paul Eggert
Subject: [Emacs-diffs] master f81ec28 2/2: Merge from origin/emacs-26
Date: Tue, 2 Apr 2019 02:51:32 -0400 (EDT)

branch: master
commit f81ec28f4fc122658e59c0ec99ca4d92a1fe439f
Merge: f5d3449 0924b27
Author: Paul Eggert <address@hidden>
Commit: Paul Eggert <address@hidden>

    Merge from origin/emacs-26
    
    0924b27bca Say which regexp ranges should be avoided
    
    # Conflicts:
    #   doc/lispref/searching.texi
---
 doc/lispref/searching.texi | 52 ++++++++++++++++++++++++++++++----------------
 1 file changed, 34 insertions(+), 18 deletions(-)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index e3f31fd..748ab58 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -391,18 +391,11 @@ writing the starting and ending characters with a @samp{-} between them.
 Thus, @samp{[a-z]} matches any lower-case @acronym{ASCII} letter.
 Ranges may be intermixed freely with individual characters, as in
 @samp{[a-z$%.]}, which matches any lower case @acronym{ASCII} letter
-or @samp{$}, @samp{%} or period.
+or @samp{$}, @samp{%} or period.  However, the ending character of one
+range should not be the starting point of another one; for example,
+@samp{[a-m-z]} should be avoided.
 
-If @code{case-fold-search} is non-@code{nil}, @samp{[a-z]} also
-matches upper-case letters.  Note that a range like @samp{[a-z]} is
-not affected by the locale's collation sequence, it always represents
-a sequence in @acronym{ASCII} order.
-@c This wasn't obvious to me, since, e.g., the grep manual "Character
-@c Classes and Bracket Expressions" specifically notes the opposite
-@c behavior.  But by experiment Emacs seems unaffected by LC_COLLATE
-@c in this regard.
-
-Note also that the usual regexp special characters are not special inside a
+The usual regexp special characters are not special inside a
 character alternative.  A completely different set of characters is
 special inside character alternatives: @samp{]}, @samp{-} and @samp{^}.
 
@@ -417,13 +410,34 @@ special there.)
 To include @samp{^} in a character alternative, put it anywhere but at
 the beginning.
 
-@c What if it starts with a multibyte and ends with a unibyte?
-@c That doesn't seem to match anything...?
-If a range starts with a unibyte character @var{c} and ends with a
-multibyte character @var{c2}, the range is divided into two parts: one
-spans the unibyte characters @samp{@var{c}..?\377}, the other the
-multibyte characters @samp{@var{c1}..@var{c2}}, where @var{c1} is the
-first character of the charset to which @var{c2} belongs.
+The following aspects of ranges are specific to Emacs, in that POSIX
+allows but does not require this behavior and programs other than
+Emacs may behave differently:
+
+@enumerate
+@item
+If @code{case-fold-search} is non-@code{nil}, @samp{[a-z]} also
+matches upper-case letters.
+
+@item
+A range is not affected by the locale's collation sequence: it always
+represents the set of characters with codepoints ranging between those
+of its bounds, so that @samp{[a-z]} matches only ASCII letters, even
+outside the C or POSIX locale.
+
+@item
+As a special case, if either bound of a range is a raw 8-bit byte, the
+other bound should be a unibyte character, and the range matches only
+unibyte characters.
+
+@item
+If the lower bound of a range is greater than its upper bound, the
+range is empty and represents no characters.  Thus, @samp{[b-a]}
+always fails to match, and @samp{[^b-a]} matches any character,
+including newline.  However, the lower bound should be at most one
+greater than the upper bound; for example, @samp{[c-a]} should be
+avoided.
+@end enumerate
 
 A character alternative can also specify named character classes
 (@pxref{Char Classes}).  This is a POSIX feature.  For example,
@@ -431,6 +445,8 @@ A character alternative can also specify named character classes
 Using a character class is equivalent to mentioning each of the
 characters in that class; but the latter is not feasible in practice,
 since some classes include thousands of different characters.
+A character class should not appear as the lower or upper bound
+of a range.
 
 @item @samp{[^ @dots{} ]}
 @cindex @samp{^} in regexp
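
The patch notes that POSIX allows but does not require the range behavior
described above, and that programs other than Emacs may behave differently.
As a rough illustration only, here is a minimal sketch using Python's
standard `re` module (not Emacs). The helper names `matches` and `compiles`
are hypothetical and exist only for this example. As far as this sketch goes,
Python agrees with Emacs that ranges follow codepoint order and that case
folding widens `[a-z]`, but it rejects a reversed range such as `[b-a]`
outright instead of treating it as empty.

```python
import re


def matches(pattern, text, flags=0):
    """Return True if the whole of TEXT matches PATTERN."""
    return re.fullmatch(pattern, text, flags) is not None


def compiles(pattern):
    """Return True if PATTERN is accepted by Python's regexp compiler."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# A range covers the codepoints between its bounds, so an accented
# letter falls outside [a-z] regardless of the locale.
print(matches("[a-z]", "q"))
print(matches("[a-z]", "é"))

# With case folding enabled, [a-z] also matches upper-case letters.
print(matches("[a-z]", "Q", re.IGNORECASE))

# A reversed range is a compile-time error in Python, whereas the
# patched manual says Emacs treats [b-a] as an empty range.
print(compiles("[b-a]"))
```

This is one example of why the manual now labels these points as
Emacs-specific: a pattern that quietly matches nothing in Emacs can be a
hard error in another engine.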


