bug-gnu-emacs

bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.


From: Alan Mackenzie
Subject: bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
Date: Sun, 22 Nov 2020 13:12:31 +0000

Hello, Stefan.

On Thu, Nov 19, 2020 at 17:47:40 -0500, Stefan Monnier wrote:
> >> So, yeah, you can add yet-another-hack on top of the other syntax.c
> >> hacks if you want, but there's a good chance it will only ever be used
> >> by CC-mode.  It will take a lot more code changes in syntax.c than
> >> a quick tweak to your Elisp code to search for "\*/".
> [...]
> > OK, here's the patch.

> I think the patch agrees with my assessment above (even though it's
> still missing a etc/NEWS entry, adjustment to the docstring of
> modify-syntax-entry and to the .texi manual).

Here are these things:



diff --git a/doc/lispref/syntax.texi b/doc/lispref/syntax.texi
index b99b5de0b3..4e9e9207c3 100644
--- a/doc/lispref/syntax.texi
+++ b/doc/lispref/syntax.texi
@@ -287,21 +287,21 @@ Syntax Flags
 @cindex syntax flags
 
   In addition to the classes, entries for characters in a syntax table
-can specify flags.  There are eight possible flags, represented by the
+can specify flags.  There are nine possible flags, represented by the
 characters @samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{b}, @samp{c},
-@samp{n}, and @samp{p}.
+@samp{e}, @samp{n}, and @samp{p}.
 
   All the flags except @samp{p} are used to describe comment
 delimiters.  The digit flags are used for comment delimiters made up
 of 2 characters.  They indicate that a character can @emph{also} be
 part of a comment sequence, in addition to the syntactic properties
 associated with its character class.  The flags are independent of the
-class and each other for the sake of characters such as @samp{*} in
-C mode, which is a punctuation character, @emph{and} the second
+class and each other for the sake of characters such as @samp{*} in C
+mode, which is a punctuation character, @emph{and} the second
 character of a start-of-comment sequence (@samp{/*}), @emph{and} the
 first character of an end-of-comment sequence (@samp{*/}).  The flags
-@samp{b}, @samp{c}, and @samp{n} are used to qualify the corresponding
-comment delimiter.
+@samp{b}, @samp{c}, @samp{e}, and @samp{n} are used to qualify the
+corresponding comment delimiter.
 
   Here is a table of the possible flags for a character @var{c},
 and what they mean:
@@ -332,6 +332,13 @@ Syntax Flags
 alternative ``c'' comment style.  For a two-character comment
 delimiter, @samp{c} on either character makes it of style ``c''.
 
+@item
+@samp{e} means that when @var{c}, a comment ender or first character
+of a two-character ender, is directly preceded by one or more escape
+characters, @var{c} does not act as a comment ender.  Contrast this
+with the effect of variable @code{comment-end-can-be-escaped}
+(@pxref{Control Parsing}).
+
 @item
 @samp{n} on a comment delimiter character specifies that this kind of
 comment can be nested.  Inside such a comment, only comments of the
@@ -357,7 +364,7 @@ Syntax Flags
 @item @samp{*}
 @samp{23b}
 @item newline
-@samp{>}
+@samp{> e}
 @end table
 
 This defines four comment-delimiting sequences:
@@ -377,7 +384,9 @@ Syntax Flags
 
 @item newline
 This is a comment-end sequence for ``a'' style, because the newline
-character does not have the @samp{b} flag.
+character does not have the @samp{b} flag.  It can be escaped by one
+or more @samp{\} characters, so that an ``a'' style comment can
+continue onto the next line.
 @end table
 
 @item
@@ -962,9 +971,14 @@ Control Parsing
 @defvar comment-end-can-be-escaped
 If this buffer local variable is non-@code{nil}, a single character
 which usually terminates a comment doesn't do so when that character
-is escaped.  This is used in C and C++ Modes, where line comments
-starting with @samp{//} can be continued onto the next line by
-escaping the newline with @samp{\}.
+is escaped.  This used to be used in C and C++ Modes, where line
+comments starting with @samp{//} can be continued onto the next line
+by escaping the newline with @samp{\}.
+
+Contrast this variable with the @samp{e} syntax flag (@pxref{Syntax
+Flags}), where two consecutive escape characters escape the comment
+ender.  @code{comment-end-can-be-escaped} should not be used together
+with the @samp{e} syntax flag.
 @end defvar
 
 You can use @code{forward-comment} to move forward or backward over
@@ -1037,6 +1051,8 @@ Syntax Table Internals
 @samp{3} @tab @code{(ash 1 18)} @tab @samp{n} @tab @code{(ash 1 22)}
 @item
 @samp{4} @tab @code{(ash 1 19)} @tab @samp{c} @tab @code{(ash 1 23)}
+@item
+@tab@tab @samp{e} @tab @code{(ash 1 24)}
 @end multitable
 
 @defun string-to-syntax desc
diff --git a/etc/NEWS b/etc/NEWS
index a0e72bc673..3b292e8f41 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1758,6 +1758,12 @@ ledit.el, lmenu.el, lucid.el and old-whitespace.el.
 
 * Lisp Changes in Emacs 28.1
 
++++
+** New syntax flag 'e'.
+This indicates that one or two (or more) escape characters escape a
+comment ender with this flag, causing the comment to be continued past
+that comment ender (typically onto the next line).
+
 +++
 ** 'set-window-configuration' now takes an optional 'dont-set-frame'
 parameter which, when non-nil, instructs the function not to select
diff --git a/src/syntax.c b/src/syntax.c
index df07809aaa..7bdbd114ba 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -1224,7 +1270,7 @@ Two-character sequences are represented as described below.
 The second character of NEWENTRY is the matching parenthesis,
  used only if the first character is `(' or `)'.
 Any additional characters are flags.
-Defined flags are the characters 1, 2, 3, 4, b, p, and n.
+Defined flags are the characters 1, 2, 3, 4, b, c, e, n, and p.
  1 means CHAR is the start of a two-char comment start sequence.
  2 means CHAR is the second character of such a sequence.
  3 means CHAR is the start of a two-char comment end sequence.
@@ -1239,6 +1285,11 @@ c (on any of its chars) using this flag:
  c means CHAR is part of comment sequence c.
  n means CHAR is part of a nestable comment sequence.
 
+ e means CHAR, when a comment ender or first char of a two character
+   comment ender, can be escaped by (any number of consecutive)
+   characters with escape syntax.  C and C++ use this facility.
+   Compare and contrast with the variable `comment-end-can-be-escaped'.
+
  p means CHAR is a prefix character for `backward-prefix-chars';
    such characters are treated as whitespace when they occur
    between expressions.
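
For reference, the flag bits in the "Syntax Table Internals" hunk above can be
sketched in C as follows.  This is only an illustration, not code from
syntax.c: the names flag_bits and syntax_flag_bits are hypothetical, the bit
positions for 1, 2, 3, 4, p, b, n and c are taken from the existing manual
table, and bit 24 for `e' is the position this patch proposes.

```c
#include <assert.h>
#include <stddef.h>

/* Flag character -> bit position in a raw syntax descriptor.
   Bits 16..23 follow the existing "Syntax Table Internals" table;
   bit 24 for 'e' is the value proposed in the patch above.  */
struct flag_bit { char flag; int bit; };

static const struct flag_bit flag_bits[] = {
    {'1', 16}, {'2', 17}, {'3', 18}, {'4', 19},
    {'p', 20}, {'b', 21}, {'n', 22}, {'c', 23},
    {'e', 24}
};

/* Hypothetical helper: OR together the bits for every flag character
   in FLAGS.  Characters that are not flags (e.g. a space) are ignored.  */
unsigned long syntax_flag_bits(const char *flags)
{
    assert(flags != NULL);
    unsigned long bits = 0;
    for (const char *p = flags; *p != '\0'; p++)
        for (size_t k = 0; k < sizeof flag_bits / sizeof flag_bits[0]; k++)
            if (flag_bits[k].flag == *p)
                bits |= 1UL << flag_bits[k].bit;
    return bits;
}
```

With this sketch, the flag part of the newline entry `> e' maps to the single
bit (ash 1 24), and the `23b' entry for `*' maps to bits 17, 18 and 21.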



> I really can't understand why you resist so much the use of
> a `syntax-table` property on those rare \\\n sequences.

Because syntax-table text properties are already used for so many
different things in CC Mode (I think the count is five in C++ Mode).
Adding another one would mean having to scan for this rare construct at
every buffer change, and this would slow things down, possibly a lot.

There is no slowdown (beyond a possible microscopic one) in the
modification to syntax.c and, as a bonus, I have written around 200 test
cases for syntax.c's comment features.

>         Stefan


> PS: Also, I just noticed that `gcc -Wall` warns about the use of such
> multiline comments, so it doesn't seem to be a very popular feature.

It is more of a mistake that people occasionally might make than a
feature.  In my opinion, having escaped newlines inside line comments is
a bug in the C/C++ language standards.  Anybody might "end" a line
comment accidentally with "\" or "\\".

> PPS: For reference, I just tried to add support for it in sm-c-mode
> and this is the resulting code:

Just to emphasize Stefan Kangas's point, it is a newline preceded by a
"\" which continues the comment, not an escaped NL in the ordinary
sense.  In particular two "\"s followed by NL still continue the
comment.
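
The distinction can be sketched in a few lines of C.  This is only an
illustration of the two rules, not the code in syntax.c: the names
escape_rule and line_comment_end are hypothetical, and ESCAPE_ODD_RUN is my
assumption of what "escaped in the ordinary sense" means (a backslash can
itself be escaped, so only an odd run escapes the newline).

```c
#include <assert.h>
#include <stddef.h>

enum escape_rule {
    ESCAPE_ANY_RUN, /* one or more backslashes before NL continue the comment */
    ESCAPE_ODD_RUN  /* only an odd number of backslashes escapes the NL */
};

/* Hypothetical helper: return the index of the newline that ends the
   line comment beginning at TEXT[START], or the length of TEXT if the
   comment runs to the end of the string.  */
size_t line_comment_end(const char *text, size_t start, enum escape_rule rule)
{
    assert(text != NULL);
    size_t i = start;
    while (text[i] != '\0') {
        if (text[i] == '\n') {
            /* Count the backslashes immediately before this newline,
               without looking to the left of START.  */
            size_t n = 0;
            while (i - n > start && text[i - n - 1] == '\\')
                n++;
            int escaped = (rule == ESCAPE_ANY_RUN) ? (n >= 1) : (n % 2 == 1);
            if (!escaped)
                return i;
        }
        i++;
    }
    return i;
}
```

For a comment ending in two backslashes followed by a newline, ESCAPE_ANY_RUN
carries on to the next line and ESCAPE_ODD_RUN stops at the first newline.
With a single backslash the two rules agree.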

> @@ -312,7 +315,15 @@ E.g. a #define nested within 2 #ifs will be turned into 
> \"#  define\"."
>                                 'syntax-table (string-to-syntax "|"))
>              (put-text-property (match-beginning 2) (match-end 2)
>                                 'syntax-table (string-to-syntax "|")))
> -          (sm-c--cpp-syntax-propertize end)))))
> +          (sm-c--cpp-syntax-propertize end))))
> +    ("\\\\\\(\n\\)"
> +     (1 (let ((ppss (save-excursion (syntax-ppss (match-beginning 0)))))
> +          (when (and (nth 4 ppss)        ;Within a comment
> +                     (null (nth 7 ppss)) ;Within a // comment
> +                     (save-excursion     ;The \ is not itself escaped
> +                       (goto-char (match-beginning 0))
> +                       (zerop (mod (skip-chars-backward "\\\\") 2))))
> +            (string-to-syntax "."))))))
>     (point) end))
>  
>  (defun sm-c-syntactic-face-function (ppss)

Yes, something like this would be possible.  But all these calls to
syntax-ppss would be slow, at least somewhat, as discussed above.

-- 
Alan Mackenzie (Nuremberg, Germany).




