bug-gnu-emacs
bug#35202: 27.0.50; Info-quoted false positives and false negatives


From: Mauro Aranda
Subject: bug#35202: 27.0.50; Info-quoted false positives and false negatives
Date: Wed, 10 Apr 2019 21:19:54 -0300

Hello.

I'll briefly explain how I implemented the test:
I created a list of the info files that get built when building Emacs, and
for each one, I called (info info-filename) to visit it [1].  Then I searched
the whole file for the old regexp, storing the matches in a list, each
element being a list: '((match-beginning 1) (1- (match-end 1))).  Then I
did the same for the new regexp.  For the change of regexp to take
effect, I added a function to Info-mode-hook that basically does this:

(setcar (car Info-mode-font-lock-keywords) current-re)
(setq-local font-lock-defaults '(Info-mode-font-lock-keywords t t))
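
The collection step described above can be sketched roughly as follows.  This is only an illustration, not the code used for the test: the function name my-collect-quote-matches is hypothetical, and it assumes that group 1 of the regexp takes part in every match and that the regexp never matches the empty string.

```elisp
;; Hypothetical helper, not part of info.el or of the attached patch.
(defun my-collect-quote-matches (regexp)
  "Return a list of (START END) pairs for group 1 of REGEXP.
The whole current buffer is searched, ignoring any narrowing.
END is (1- (match-end 1)), as in the lists described above.
Assumes group 1 participates in every match of REGEXP."
  (save-excursion
    (save-restriction
      (widen)
      (goto-char (point-min))
      (let (matches)
        (while (re-search-forward regexp nil t)
          (push (list (match-beginning 1) (1- (match-end 1))) matches))
        ;; Matches were pushed in reverse order; restore buffer order.
        (nreverse matches)))))
```

Calling it once with the old regexp and once with the new one in the same Info buffer would give the two lists that are compared afterwards.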

Finally, I compared both lists (namely old-matches and new-matches) with
cl-set-exclusive-or, sorted the result (for easier comparison) and wrote
it to a file .mismatches-filename, one for each info file.
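
A minimal sketch of the comparison step, for illustration only.  The name my-match-mismatches is hypothetical, and passing :test #'equal is an assumption: the elements are lists, so the default eql test of cl-set-exclusive-or would treat every pair as different.

```elisp
(require 'cl-lib)

;; Hypothetical helper; OLD-MATCHES and NEW-MATCHES are lists of
;; (START END) pairs as described above.
(defun my-match-mismatches (old-matches new-matches)
  "Return the pairs found in only one of OLD-MATCHES and NEW-MATCHES.
The result is sorted by START, then by END."
  (sort
   ;; Copy first: `sort' is destructive and the result of
   ;; `cl-set-exclusive-or' may share structure with its arguments.
   (copy-sequence
    (cl-set-exclusive-or old-matches new-matches :test #'equal))
   (lambda (a b)
     (or (< (car a) (car b))
         (and (= (car a) (car b))
              (< (cadr a) (cadr b)))))))
```

With this ordering, an old match and the new match that replaces it usually end up next to each other, which is how the pairs are presented in the results.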


Now for the results:
The files that presented mismatches are the following:
emacs (as expected, hence the bug report), calc, idlwave, mh-e, org,
sc.

To navigate to the points, for examination, I recommend widening the
info buffer and then goto-char.
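
As a convenience, the text covered by a recorded pair can also be extracted directly.  The sketch below is illustrative; my-match-text is a hypothetical name.  It relies on the second element of each pair being (1- (match-end 1)), so one is added back to get the end of the region.

```elisp
;; Hypothetical helper for inspecting a recorded (START END) pair.
(defun my-match-text (match)
  "Return the text of the current buffer covered by MATCH.
MATCH is a list (START END) where END is (1- (match-end 1)).
Narrowing is ignored while reading the text and restored afterwards."
  (save-restriction
    (widen)
    (buffer-substring-no-properties (car match) (1+ (cadr match)))))
```

For example, evaluating (my-match-text '(92536 92541)) in the buffer visiting the emacs info file would return the text of the new match from item 1), provided the file is the same build as the one tested.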

* Emacs:
1)
Old match: (92506 92541)
New match: (92536 92541)

2)
Old match: (92823 92860)
New match: (92856 92860)

This is correct, and achieved by the second option of the regexp I
proposed.

3)
Old match: (183951 183977)
New match: (183952 183977)

This is a little odd, since it is a quote inside a quote.  The new
regexp matches the inner quote, while the old regexp also quotes the ‘
starting the inner quote.  There's no big difference, IMO.

4)
Old match: (313527 313526)
New match: (313527 313527)

5)
Old match: (313905 313904)
New match: (313905 313905)

4) and 5) are the same.  This was part of the original bug report, so as
expected, the new regexp handles this case just right.

6)
Old match: (652524 652542)
New match: (652536 652542)

Similar to 1) and 2).

7)
Old match: (767119 767124)
New match: (767123 767124)

This one is tricky.  It is a quote that contains ‘ and ’, but it is not
a nested quote.  Tweaking the regexp to match nested quotes would do the
right thing, but by sheer luck.

8)
Old match: (768216 768225)
New match: (768219 768225)

See 1) and 2).

* Calc:
9)
Old match: (493087 493098)
New match: (493088 493098)

This is odd, and might be the calc.texi file that is wrong (I'm not
sure, but the "`" in calc.texi looks suspicious).  Still, the new
behavior doesn't break display with this one, IMO.

10) Something extra I noticed in Appendix E, Calc Summary.
Both regexps fail at (1386635 1404639).  I found this a hard one, and I
can't think of a way to solve it.

* Idlwave:
11)
Old match: (93451 93514)
New match: (93496 93514)

Both regexps are wrong in this table.  This is similar to 10).  Not an
easy one to solve, but the new regexp at least behaves a little better
in the line with (‘idlwave-find-module’), IMO.

* MH-E:
12) This one is a group of similar mismatches:
Old ones:
(168432 168456)
(168585 168611)
(168755 168774)

New ones:
(168456 168456)
(168611 168611)
(168774 168774)

The old regexp quotes inconsistently, while the proposed one quotes only
the ‘+’, ‘-’ and the ‘r’.  I think it could be solved by tweaking the
proposed regexp, to match the outer quote of a nested quote.

* Org:
13) Go to the table at 685320.  The problem with the mismatches is
similar to 11), and both regexps get it wrong.

* SC:
14)
Old matches:
(9549 9550)
(9768 9769)

New matches:
(9550 9550)
(9769 9769)

I'm not sure if the double quoting of > (as in ‘‘>’’) is intended.  I
don't think so, but I can't be sure.  Still, the new regexp behaves
better, by quoting only the >, while the current one is inconsistent and
looks odd.


To sum it up:
* Not sure if the tables in Idlwave and Org could be changed.  If yes,
then the problems with these files will go away.

* If 9) and 14) can be solved by modifying the .texi file, then either
regexp will do.

* The regexp could be tweaked to match outer quotes, when quotes are
nested.  This is necessary to do the right thing in the mh-e file, for
example.

* Overall, I think it is an improvement.  It doesn't break display, it is more
accurate, and wherever it fails, the current regexp fails too. But of course,
I'm biased, since I'm the one proposing it.
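
To illustrate what matching the outer quote of a nested quote means, here is a small sketch that finds the outermost ‘...’ spans in a string.  Note that it uses a depth counter instead of a regexp, so it is a different technique from the proposed patch, and my-outer-quote-spans is a hypothetical name.  As noted in 7), it would handle a quote that merely contains ‘ and ’ only by luck.

```elisp
;; Hypothetical illustration; not the regexp from the patch.
(defun my-outer-quote-spans (string)
  "Return (START . END) index pairs of the outermost quotes in STRING.
Indices are zero-based; END is exclusive and includes the closing ’."
  (let ((depth 0) start spans)
    (dotimes (i (length string))
      (let ((ch (aref string i)))
        (cond
         ((eq ch ?‘)
          ;; Remember where the outermost quote opens.
          (when (= depth 0) (setq start i))
          (setq depth (1+ depth)))
         ;; Ignore a closing quote that has no matching opener.
         ((and (eq ch ?’) (> depth 0))
          (setq depth (1- depth))
          (when (= depth 0)
            (push (cons start (1+ i)) spans))))))
    (nreverse spans)))
```

On a string such as "‘‘>’’" this reports a single span covering the whole doubled quote, which is the behavior the mh-e and sc cases would need.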

[1] Files I checked:
ada-mode, auth, autotype, bovine, calc, ccmode, cl, dbus, dired-x,
ebrowse, ede, ediff, edt, efaq, efaq-w32, eieio, elisp, eintr, emacs,
emacs-gnutls, emacs-mime, epa, erc, ert, eshell, eudc, eww, flymake,
forms, gnus, htmlfontify, idlwave, ido, info, mairix-el, message, mh-e,
newsticker, nxml-mode, octave-mode, org, pcl-cvs, pgg, rcirc, reftex,
remember, sasl, sc, semantic, ses, sieve, smtpmail, speedbar, srecode,
todo-mode, tramp, url, vhdl-mode, vip, viper, widget, wisent, woman.

For extra points, I checked some external files I happen to have
installed:
libc, bison, wget.

Attachment: 0001-Avoid-false-positives-and-false-negatives-of-Info-qu.patch
Description: Text Data

