bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#63622: lisp/progmodes/python.el: performance regression introduced b


From: kobarity
Subject: bug#63622: lisp/progmodes/python.el: performance regression introduced by multiline font-lock
Date: Wed, 24 May 2023 00:45:27 +0900
User-agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (Gojō) APEL-LB/10.8 EasyPG/1.0.0 Emacs/30.0.50 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)

Stefan Monnier wrote:
> > @@ -6708,8 +6710,8 @@ python-mode
> >                  nil nil nil nil
> >                  (font-lock-syntactic-face-function
> >                   . python-font-lock-syntactic-face-function)
> > -                (font-lock-extend-after-change-region-function
> > -                 . python-font-lock-extend-region)))
> > +                (font-lock-extend-region-functions
> > +                 . (python-font-lock-extend-region))))
> >    (setq-local syntax-propertize-function
> >                python-syntax-propertize-function)
> >    (setq-local imenu-create-index-function
> 
> This is bound to break in some cases.  Please use `add-hook` instead.

Thanks.  I also revised python-font-lock-extend-region.  The attached
0001-Use-font-lock-extend-region-functions-in-python-mode.patch is the
revised patch.

As for the performance degradation, I am almost certain that it is a
bug in python-info-docstring-p and/or python-rx string-delimiter.
python-info-docstring-p uses the following regex as one of the
conditions to determine if it is a docstring.

          (re (concat "[uU]?[rR]?"
                      (python-rx string-delimiter))))

One of the problems is that string-delimiter matches even if there is
a single character preceding the string.  For example,
string-delimiter matches both x" and 1".  It might be reasonable to
match r" or b", for example.  Because they are correct string starter
in Python.  However, there is no reason to match x" or 1".  Besides,
python-info-docstring-p explicitly states "[uU]?[rR]?" as above.  So
(python-rx string-delimiter) in the above code should only match
string delimiter without prefixes.  Current python-info-docstring-p
incorrectly considers a code like [""] or {""} to be a docstring.

The performance problem in the example shown by Tom can be resolved by
modifying the above code as follows:

          (re (concat "[uU]?[rR]?"
                      (rx (or "\"\"\"" "\"" "'''" "'")))))

It breaks some ERTs, but I think we should fix the ERTs.  However,
there seems to be another problem in python-info-docstring-p.  It
intentionally considers contiguous strings as docstring as in the ERT
python-info-docstring-p-1:

'''
Module Docstring Django style.
'''
u'''Additional module docstring.'''
'''Not a module docstring.'''

However, as far as I have tried with Python 3 and Python 2.7, this is
not correct.  Therefore, I feel it is better to mark the ERTs as
expected failures than to modify it at this stage.  The attached
0001-Fix-python-info-docstring-p.patch is the patch to do this.

Attachment: 0001-Use-font-lock-extend-region-functions-in-python-mode.patch
Description: Binary data

Attachment: 0001-Fix-python-info-docstring-p.patch
Description: Binary data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]