Removing no-back-reference restriction from syntax-propertize-rules


From: Tassilo Horn
Subject: Removing no-back-reference restriction from syntax-propertize-rules
Date: Sat, 16 May 2020 10:39:54 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

Hi all,

right now, the docstring of `syntax-propertize-rules' states that
back-references aren't supported (which is true).  I don't see why that
has to be the case.  It already shifts numbered groups as needed, so why
can't it simply shift back-references, too?

The following patch does that:

--8<---------------cut here---------------start------------->8---
modified   lisp/emacs-lisp/syntax.el
@@ -139,14 +139,16 @@ syntax-propertize-multiline
                  (point-max))))
   (cons beg end))
 
-(defun syntax-propertize--shift-groups (re n)
-  (replace-regexp-in-string
-   "\\\\(\\?\\([0-9]+\\):"
-   (lambda (s)
-     (replace-match
-      (number-to-string (+ n (string-to-number (match-string 1 s))))
-      t t s 1))
-   re t t))
+(defun syntax-propertize--shift-groups-and-backrefs (re n)
+  (let ((incr (lambda (s)
+                (replace-match
+                 (number-to-string
+                  (+ n (string-to-number (match-string 1 s))))
+                 t t s 1))))
+    (replace-regexp-in-string
+     "[^\\]\\\\\\([0-9]+\\)" incr
+     (replace-regexp-in-string "\\\\(\\?\\([0-9]+\\):" incr re t t)
+     t t)))
 
 (defmacro syntax-propertize-precompile-rules (&rest rules)
   "Return a precompiled form of RULES to pass to `syntax-propertize-rules'.
@@ -188,9 +190,7 @@ syntax-propertize-rules
 The SYNTAX expression is responsible to save the `match-data' if needed
 for subsequent HIGHLIGHTs.
 Also SYNTAX is free to move point, in which case RULES may not be applied to
-some parts of the text or may be applied several times to other parts.
-
-Note: back-references in REGEXPs do not work."
+some parts of the text or may be applied several times to other parts."
   (declare (debug (&rest &or symbolp    ;FIXME: edebug this eval step.
                          (form &rest
                                (numberp
@@ -219,7 +219,7 @@ syntax-propertize-rules
                  ;; tell when *this* match 0 has succeeded.
                  (cl-incf offset)
                  (setq re (concat "\\(" re "\\)")))
-               (setq re (syntax-propertize--shift-groups re offset))
+               (setq re (syntax-propertize--shift-groups-and-backrefs re offset))
                (let ((code '())
                      (condition
                       (cond
--8<---------------cut here---------------end--------------->8---

I've tested it with some simple rules, e.g.,

--8<---------------cut here---------------start------------->8---
(defun test-syntax-propertize-with-backrefs ()
  (interactive)
  (setq-local syntax-propertize-function
              (syntax-propertize-rules
               ("\\(one\\)\\(two\\)\\(\\1\\)" (1 "|") (2 "_") (3 "|"))
               ("\\(three\\)\\(four\\)\\(\\1\\)" (1 "|") (2 "_") (3 "|"))))
  (setq-local syntax-propertize--done -1)
  (syntax-propertize (point-max)))
--8<---------------cut here---------------end--------------->8---

and the properties are applied correctly.  The code of the generated
function also looks correct: the back-reference in the second rule is
rewritten to \\4, which refers to the right group, \\(three\\), in the
combined regexp.
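
To make the need for the shift concrete, here is a minimal sketch that
compares the combined regexp of the two rules with and without the
rewritten back-reference.  The function name `demo-backref-shift-check'
is hypothetical and exists only for this illustration; the regexps are
the ones quoted in this message.

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-backref-shift-check ()
  "Return match positions for the unshifted and the shifted regexp."
  (let ((text "threefourthree")
        ;; Second rule keeps \\1, which now refers to \\(one\\).
        (unshifted
         "\\(one\\)\\(two\\)\\(\\1\\)\\|\\(three\\)\\(four\\)\\(\\1\\)")
        ;; Second rule shifted by the 3 groups of the first rule: \\1 -> \\4.
        (shifted
         "\\(one\\)\\(two\\)\\(\\1\\)\\|\\(three\\)\\(four\\)\\(\\4\\)"))
    (list (string-match unshifted text)
          (string-match shifted text))))

(demo-backref-shift-check)  ; => (nil 0)
```

In the unshifted regexp, group 1 never matches on this text, so the
back-reference fails; after the shift, \\4 refers to \\(three\\) and the
whole text matches at position 0.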

Am I being too naive here?  Is there something I'm missing?

Well, I also found a non-working case:

--8<---------------cut here---------------start------------->8---
(defun test-syntax-propertize-with-backrefs ()
  (interactive)
  (setq-local syntax-propertize-function
              (syntax-propertize-rules
               ("\\(one\\)\\(two\\)\\(\\1\\)" (1 "|") (2 "_") (3 "|"))
               ("\\(three\\)\\(four\\)\\(\\1\\)" (1 "|") (2 "_") (3 "|"))
               ("\\(?10:five\\)\\(six\\)\\(\\10\\)" (10 "|") (2 "_") (3 "|"))))
  (setq-local syntax-propertize--done -1)
  (syntax-propertize (point-max)))
--8<---------------cut here---------------end--------------->8---

Syntactically, this seems to do the right thing.  The numbered group
becomes \\(?16:five\\) with back-reference \\(\\16\\).  However, it will
never match.  With a buffer with contents

--8<---------------cut here---------------start------------->8---
onetwoone test bla bla threefourthree bla quux fivesixfive threefourthree.
--8<---------------cut here---------------end--------------->8---

firing up re-builder with the constructed regexp

  "\\(one\\)\\(two\\)\\(\\1\\)\\|\\(three\\)\\(four\\)\\(\\4\\)\\|\\(?16:five\\)\\(six\\)\\(\\16\\)"

will not highlight fivesixfive, and re-search-forward doesn't stop at
it.  So is it true that back-references to explicitly numbered groups
don't work at all?
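
One way to narrow this down is to compare a two-digit and a one-digit
explicitly numbered group.  The sketch below rests on an assumption: the
Elisp manual describes back-references as `\DIGIT', i.e. a single digit,
so \\16 would be read as \\1 followed by a literal "6".  The function
name `demo-backref-digit-check' is hypothetical.

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-backref-digit-check ()
  "Return match positions for a two-digit and a one-digit back-reference."
  (let ((text "fivesixfive"))
    (list
     ;; Two-digit case from this message; assumed to parse as \\1 then "6".
     (string-match "\\(?16:five\\)\\(six\\)\\(\\16\\)" text)
     ;; Same shape, but with a group number a single digit can refer to.
     (string-match "\\(?6:five\\)\\(six\\)\\(\\6\\)" text))))

(demo-backref-digit-check)  ; => (nil 0)
```

If that reading is right, back-references to explicitly numbered groups
do work, but only for group numbers 1 through 9, which would limit how
far the shifted numbers can go across all rules.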

Bye,
Tassilo


