emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Scan of regexp mistakes


From: Paul Eggert
Subject: Scan of regexp mistakes
Date: Fri, 8 Mar 2019 09:13:04 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.1

On 3/5/19 7:06 AM, Mattias Engdegård wrote:
>
> I can run it periodically but would surely forget. Should I put the trawler 
> in the Emacs source tree (if so, where?), in ELPA, or elsewhere?

Stefan mentioned one possibility. Though even then I daresay it'd be
helpful if you ran it periodically, just as I periodically run
admin/merge-gnulib. (If you don't run it, it's likely nobody else will....)


> -      (re-search-forward "^\\(\s+=+\s?+\\)+\n")
> +      (re-search-forward "^\\(\s+=+\s+\\)+\n")
>                                     ^^^
> Are you sure this shouldn't be `\s*'

No, good point. I'll change it to that. See attached patch.


> -  "[a-z0-9$%(*-=?[_][^<>\")!;:,{}\n\t @]*"
> +  "[a-z$%(*-=?[_][^<>\")!;:,{}\n\t @]*"
>
> You kept the rather odd range `*-=' which comprises `*+,-./0123456789:;<='. 
> Is it supposed to be that way?

Goodness knows what is intended here, as this is some ad hoc variant of
RFC 5322/2822/822 and I don't know which variant. Might as well spell
out the range in a more-conventional way, though. The attached patch
replaces *-= with *+./= (as ,:;< aren't allowed unquoted in RFC 5322
atoms) and puts - and 0-9 elsewhere; this should be closer to what was
wanted and should be clearer anyway. I was unable to track down whatever
suggestion was made by Felix Wiemann long ago, and so removed that
comment (since the regexp no longer matches his suggestion anyway).


> -    (while (re-search-forward "\\ce[»\\.\\?]\\|«\\ce" nil t)
> +    (while (re-search-forward "\\ce[»\\.?]\\|«\\ce" nil t)
>
> Should `\' really be kept in the set of characters? It looks like it was only 
> included as an attempt to escape `.' and `?'.

Yes, probably. Fixed in the attached.


> searching for A-z uncovers more suspect regexps, some of which aren't found 
> by the trawler.

I wonder where those all came from? I attempted to fix them in the attached.


> Here is another one in the same file (line 33), but that wasn't found by the 
> trawler:
>
>        (replace-regexp-in-string "[\000-\032\177<>#%\"{}|\\^[]`%?;]"
>
> That \032 doesn't look right (number base confusion?), and it looks like it's 
> meant as a single character alternative but it isn't, given the misplaced `]'.

The regexp has other troubles. It doesn't include !$'()*+,/:@&= (all of
which are reserved characters according to RFC 3986), and it has
duplicate %. The attached patch fixes the % and puts in a FIXME about
the other chars.


> diff --git a/lisp/org/org-mobile.el b/lisp/org/org-mobile.el
> index 1ff6358403..83dcc7b0d1 100644
> --- a/lisp/org/org-mobile.el
> +++ b/lisp/org/org-mobile.el
> @@ -845,11 +845,11 @@ If BEG and END are given, only do this in that region."
>           (cl-incf cnt-error)
>           (throw 'next t))
>         (move-marker bos-marker (point))
> -       (if (re-search-forward "^** Old value[ \t]*$" eos t)
> +       (if (re-search-forward "^\\*\\* Old value[ \t]*$" eos t)
>
> Shouldn't this start with "^\\**", or does it have to be exactly two 
> asterisks?
>
>             (setq old (buffer-substring
>                        (1+ (match-end 0))
>                        (progn (outline-next-heading) (point)))))
> -       (if (re-search-forward "^** New value[ \t]*$" eos t)
> +       (if (re-search-forward "^\\*\\* New value[ \t]*$" eos t)
>
> Idem.

"\\**" would be safer, yes. Fixed in the attached.


> -   ((string-match "^//\\(.?*\\)/\\(<.*>\\)$" path)
> +   ((string-match "^//\\(.*\\)/\\(<.*>\\)$" path)
>
> Another repetition-of-repetition. Sure it shouldn't be `*?' instead? It looks 
> likely, since there is a `/' following that would be eaten by the `.*' given 
> half a chance.

The comment on the next line says "Planner has the id after the final
slash", which implies that the first .* should indeed be greedy.


>
> diff --git a/lisp/progmodes/fortran.el b/lisp/progmodes/fortran.el
> index be272c0922..c1a267f4c5 100644
> --- a/lisp/progmodes/fortran.el
> +++ b/lisp/progmodes/fortran.el
> @@ -2052,7 +2052,7 @@ If ALL is nil, only match comments that start in column 
> > 0."
>                  (when (<= (point) bos)
>                    (move-to-column (1+ fill-column))
>                    ;; What is this doing???
> -                  (or (re-search-forward "[\t\n,'+-/*)=]" eol t)
> +                  (or (re-search-forward "[-\t\n,'+./*)=]" eol t)
>
> Where did the . come from? Don't you think that `+-/*' were meant to include 
> those four symbols only?

I couldn't figure out what the code was doing (note the comment...) so
decided to preserve the semantics of the old regexp. But you're right,
"." is likely not intended there. I removed it in the attached.


>               ;; Regexp bug in XEmacs disallows ][ inside [], and wants + last
> -             
> "\\s-*\\.\\(\\(address@hidden|---]\\|[][]\\|\\\\[()|]\\)+\\)\\s-*(\\(.*\\))\\s-*\\(,\\|)\\s-*;\\)")
> +             
> "\\s-*\\.\\(\\(address@hidden|[][]\\|\\\\[()|]\\)+\\)\\s-*(\\(.*\\))\\s-*\\(,\\|)\\s-*;\\)")
>              (setq rep (match-string-no-properties 3))
>              (goto-char (match-end 0))
>              (setq tpl-wild-list
>
> Are you sure that | shouldn't be there too? Or is this some kind of XEmacs 
> idiom?
>
You're right. Wilson Snyder later stepped in and fixed that string. Nice
to have a real expert in the house.

Attachment: 0001-More-regexp-corrections-and-tweaks.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]