[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#53680: Endless loop in peculiar case of string-match and string-matc
From: |
Mattias Engdegård |
Subject: |
bug#53680: Endless loop in peculiar case of string-match and string-match-p 27.02 and 28.0.50 |
Date: |
Tue, 1 Feb 2022 13:56:10 +0100 |
> (string-match·"[\r\t·]*implements[\r\t·]+\\([\r\t·]*[\\a-zA-Z_0-9_]+,?\\)+[\r\t·]*{$"·"ariable·implements·\\Magento\\Framework\\Event\\OberserverInterface\r{\r····public·function·__construct()\r·")
>
The diagnostics by Lars and Andreas is correct. Let's look at it more closely,
first translating the regexp to rx for ease of reasoning, and see if we can
make it work:
(rx (* (in "\t\r "))
"implements"
(+ (in "\t\r "))
(+ (group
(* (in "\t\r "))
(+ (in "0-9A-Za-z" "\\_"))
(? ",")))
(* (in "\t\r "))
"{" eol)
The first line is meaningless since it can match the empty string, but you
probably want to anchor the start of "implements" so that it doesn't match
"house_implements". Let's also drop the capture group, and we get:
(rx symbol-start "implements"
(+ (in "\t\r "))
(+ (* (in "\t\r "))
(+ (in "0-9A-Za-z" "\\_"))
(? ","))
(* (in "\t\r "))
"{" eol)
You clearly want to match a non-empty sequence of 'words' separated with
whitespace and/or commas, but the pattern is ambiguous -- all inter-word
separators are optional. Let's make them mandatory:
(rx symbol-start "implements"
;; mandatory whitespace
(+ (in "\t\r "))
;; then a word
(+ (in "0-9A-Za-z" "\\_"))
;; then maybe more words, each prefixed by spaces or comma
(* (+ (in "\t\r ,")) ; fast and loose
(+ (in "0-9A-Za-z" "\\_")))
;; finally whitespace before the curly bracket
(* (in "\t\r "))
"{" eol)
which is reasonably efficient, since all ambiguity is now gone: the regexp can
(almost) only match in one way.
Note the "fast and loose" pattern where we accept any number of spaces or
commas. Here it depends on your grammar but if you want exactly one comma
separating each word, that subexpression would be something like
(* (in "\t\r "))
","
(* (in "\t\r "))
instead.