bug-gnu-emacs

bug#61436: Emacs Freezing With Java Files


From: Alan Mackenzie
Subject: bug#61436: Emacs Freezing With Java Files
Date: Wed, 11 Oct 2023 22:03:05 +0000

Hello, Jens.

On Wed, Oct 11, 2023 at 21:38:26 +0200, Jens Schmidt wrote:
> Hi Alan,

> could you please have a look as well?  This seems to be related to
> cc-mode/java-mode.  New, complete reproducer at the very bottom of this
> mail.

> Thanks!

> Hi Robert & Mats,

> Robert Weiner <rsw@gnu.org> writes:

> > Jens wrote:

> >> That always freezes Emacs (29 and master) even before it has a chance to
> >> display P1.java.  The freeze happens in function
> >> `c-get-fallback-scan-pos', where the while loop inf-loops, BUT:

> >> If you uncomment the line setting `hkey-init' to nil in init.el and
> >> repeat: No freeze.

> > As you note above, the infinite loop is coming from a Lisp function in
> > Emacs core, not from Hyperbole.  A Hyperbole setting may help you to
> > see a state reached in that function that you otherwise would not, but
> > it is not a Hyperbole bug; it is an unhandled state outside of
> > Hyperbole.

> Well, yes and no.  The next closest culprit seems to be this hook
> addition from function `hui-select-initialize':

>   ;; These hooks let you select C++ and Java methods and classes by
>   ;; double-clicking on the first character of a definition or on its
>   ;; opening or closing brace.  This is all necessary since some
>   ;; programmers don't put their function braces in the first column.
>   (var:add-and-run-hook
>    'java-mode-hook
>    (lambda ()
>      (setq defun-prompt-regexp
>          "^[ 
> \t]*\\(\\(\\(public\\|protected\\|private\\|const\\|abstract\\|synchronized\\|final\\|static\\|threadsafe\\|transient\\|native\\|volatile\\)\\s-+\\)*\\(\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*[][_$.a-zA-Z0-9]+\\|[[a-zA-Z]\\)\\s-*\\)\\s-+\\)\\)?\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*\\s-+\\)\\s-*\\)?\\([_a-zA-Z][^][
>  
> \t:;.,{}()=]*\\|\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)\\)\\s-*\\(([^);{}]*)\\)?\\([]
>  \t]*\\)\\(\\s-*\\<throws\\>\\s-*\\(\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)[, 
> \t\n\r\f]*\\)+\\)?\\s-*")))

> I (very generally) think that Emacs does not have to grok every regexp
> in every context, but I leave that concrete case for Alan and/or others
> to decide.

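I think that that regexp might be the source of the hang.  It is
ill-conditioned: when it fails to match, it can force the backtracking
regexp engine to try a huge number of ways of matching before giving
up.  (I've elided all of the keywords between "public" and "volatile" to
make it more readable):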

"^[ 
\t]*\\(\\(\\(public\\|volatile\\)\\s-+\\)*\\(\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*[][_$.a-zA-Z0-9]+\\|[[a-zA-Z]\\)\\s-*\\)\\s-+\\)\\)?
 \\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*\\s-+\\)\\s-*\\)?\\([_a-zA-Z][^][ 
\t:;.,{}()^?=]*\\|\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)\\)\\s-*\\(([^);{}]*)\\)?\\([] 
\t]*\\)\\(\\s-*\\<throws\\>\\s-*\\(\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)[, 
\t\n\r\f]*\\)+\\)?\\s-*"

The first problem seems to be just after "volatile\\)\\s-+\\)*", where you've 
got:

[[a-zA-Z][][_$.a-zA-Z0-9]*[][_$.a-zA-Z0-9]+\\|[[a-zA-Z]
                         ^                ^

, in other words [...]*[...]+, where the two ...s match exactly the same
characters.  In the event of a failure to match, the Emacs regexp engine
will try every possible combination of these.  This isn't all that bad,
but in a string of N matching characters inside a global mismatch, it
will try out every way of splitting up the string between those two
regexp fragments (about N of them), and for each split the + part
backtracks again over its own length, so the work grows roughly as N
squared.  In fact, here, the [...]* is entirely redundant (as well as
being harmful) and could be removed, since [S]*[S]+ matches exactly the
same strings as [S]+.
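A quick way to check the redundancy, as a minimal sketch: the `my-` names are hypothetical, and only the type-name fragment is tested, not the whole regexp.  Anchoring both versions of the fragment at the start of a string should always give the same match end.

```elisp
(require 'cl-lib)

;; The type-name fragment as it appears in the regexp above: a first
;; character, then [...]* immediately followed by [...]+ over the same set.
(defconst my-type-frag-orig
  "\\(?:[[a-zA-Z][][_$.a-zA-Z0-9]*[][_$.a-zA-Z0-9]+\\|[[a-zA-Z]\\)")

;; The same fragment with the redundant [...]* dropped.
(defconst my-type-frag-simple
  "\\(?:[[a-zA-Z][][_$.a-zA-Z0-9]+\\|[[a-zA-Z]\\)")

(defun my-frag-match-end (frag string)
  "Return where FRAG, anchored at the start of STRING, stops matching.
Return nil if FRAG does not match there at all."
  (when (string-match (concat "\\`" frag) string)
    (match-end 0)))

;; Both fragments agree on typical and atypical inputs.
(dolist (s '("a" "String" "int[]" "java.util.List" ";"))
  (cl-assert (equal (my-frag-match-end my-type-frag-orig s)
                    (my-frag-match-end my-type-frag-simple s))))
```

Since the two fragments accept the same strings, and both try longer matches before shorter ones, dropping the [...]* should not change what the full regexp matches; it only removes the redundant ways of splitting a run of characters.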

Another problem is right near the end of the regexp where there is:

\\(\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)[, \t\n\r\f]*\\)+

, or rewriting it in an easier to read fashion on several lines:

\\(                                            \\)+  
   \\(                         \\)[, \t\n\r\f]*
      [_$a-zA-Z][_$.a-zA-Z0-9]*
      1111111111111111111111111   2222222222222


.  Here, if you have a sequence of identifier characters inside a global
mismatch, they can all be matched by a single 1.  However, they can also
be cut into several shorter pieces, each matched by its own 1, with a
zero-length string matching 2 between them.  A run of N letters can be
cut up like that in 2^(N-1) ways, and the regexp engine will try out all
of them before giving up; for a 30-letter identifier that is over 500
million attempts.  This might be one of the places where the regexp
hangs.  It might well be that the second * in that expression (the one
after [, \t\n\r\f]) should be a +.
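To put a number on this, here is a minimal sketch (the `my-` names are hypothetical) that counts how many ways a run of characters can be cut into consecutive non-empty identifiers, which is how many ways the outer + loop can divide it between iterations when every separator matches the empty string.  It counts the search space with a small dynamic program rather than instrumenting the regexp engine.  It also defines the fragment with the second * changed to +, which forces at least one separator after each identifier.

```elisp
(require 'cl-lib)

;; Part 1 of the throws-clause fragment, anchored at both ends so that it
;; has to match a whole piece.
(defconst my-java-ident-whole "\\`[_$a-zA-Z][_$.a-zA-Z0-9]*\\'")

(defun my-count-ident-splits (s)
  "Count the ways S can be cut into consecutive non-empty identifiers."
  (let* ((n (length s))
         ;; Element I holds the number of ways to cut the first I chars.
         (ways (make-vector (1+ n) 0)))
    (aset ways 0 1)
    (dotimes (i n)
      (let ((end (1+ i)))
        ;; Every START whose piece START..END is an identifier extends
        ;; each of the ways of cutting the first START characters.
        (dotimes (start end)
          (when (string-match-p my-java-ident-whole (substring s start end))
            (aset ways end (+ (aref ways end) (aref ways start)))))))
    (aref ways n)))

;; The fragment with [, \t\n\r\f]+ instead of [, \t\n\r\f]*: an iteration
;; can now only end at a real separator.
(defconst my-throws-list-plus
  "\\(\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)[, \t\n\r\f]+\\)+")

(my-count-ident-splits "IOException")   ; 11 letters, so 2^10 ways
```

The count doubles with each extra letter (2^(N-1) for N letters), which is why a long identifier inside a mismatch can look like a hang.  One side effect of the + version worth checking: an identifier followed directly by `{`, with no space, no longer matches the group at all.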

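Earlier on in the regexp, I can see \\s-*\\)\\s-+, a possibly zero-length
sequence of space-syntax characters followed by a non-empty sequence of
them.  That is the same shape as the [...]*[...]+ above: a run of K
whitespace characters can be shared out between the two in K different
ways.  I haven't analysed this in detail, but it smells like trouble.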

It may well be that persevering with this regexp is a lost cause, and
you'd do better to construct a new regexp from scratch using more
structured methods (perhaps something similar to what's in cc-awk.el).
In fact the regexp looks horribly like one in the CC Mode manual which
was explicitly designated unsupported.  ;-(
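As a minimal sketch of the "more structured" approach: this builds a Java `defun-prompt-regexp` from small named pieces with the `rx` macro and `rx-define` (Emacs 27 or later), rather than the concatenated string constants that cc-awk.el roughly uses.  All the `my-` names are hypothetical, and the pattern only covers simple method headers (no generics, annotations, or parentheses inside parameter lists).  Each repeated group has to consume at least one character, and identifiers are separated by a mandatory dot or comma, so a stretch of text can be split between quantifiers in essentially one way.

```elisp
(require 'rx)
(require 'cl-lib)

;; A Java identifier (hypothetical name, as are all the my- names here).
(rx-define my-java-ident
  (seq (any "_$a-zA-Z") (* (any "_$a-zA-Z0-9"))))

;; A dotted name: the dots are mandatory separators, so a name such as
;; "java.util.List" can only be divided at its dots.
(rx-define my-java-qname
  (seq my-java-ident (* "." my-java-ident)))

;; A type: a dotted name followed by any number of "[]" pairs.
(rx-define my-java-type
  (seq my-java-qname (* (* blank) "[]")))

;; Whitespace that may include line breaks (used in the throws clause).
(rx-define my-java-ws
  (any " \t\n\r\f"))

(defconst my-java-defun-prompt-regexp
  (rx bol (* blank)
      ;; Modifiers: every keyword must be followed by whitespace.
      (* (or "public" "protected" "private" "abstract" "synchronized"
             "final" "static" "transient" "native" "volatile")
         (+ blank))
      ;; Optional return type (constructors have none), then the name.
      (opt my-java-type (+ blank))
      my-java-ident (* blank)
      ;; Parameter list: no parens, braces or semicolons inside.
      "(" (* (not (any "();{}"))) ")"
      ;; Optional throws clause: dotted names separated by commas, so
      ;; every iteration of the loop has to consume a comma.
      (opt (+ my-java-ws) "throws" (+ my-java-ws)
           my-java-qname
           (* (* my-java-ws) "," (* my-java-ws) my-java-qname))
      (* my-java-ws))
  "Sketch of a structured `defun-prompt-regexp' for Java (hypothetical).")

;; A method header as it would appear just before its opening brace.
(cl-assert (eql 0 (string-match (concat my-java-defun-prompt-regexp "{")
                                "  void run() {")))
```

One way to try it would be to set it buffer-locally from the mode hook, e.g. (add-hook 'java-mode-hook (lambda () (setq-local defun-prompt-regexp my-java-defun-prompt-regexp))).  It is only an outline of the method: besides ignoring generics and annotations, it also accepts control statements such as `if (x) {`, so it is not a drop-in replacement.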

Just as a matter of interest, I wrote a tool quite a few years ago to
diagnose and rewrite ill-conditioned regexps, but never got it to release
quality.  I tried out this tool on the regexp, but its output regexp hung
in Java Mode just as much as the original.  But this tool did help me
spot some of the solecisms which I analysed above.


> > On Wed, Oct 11, 2023 at 3:29 AM Mats Lidell <mats.lidell@lidells.se> wrote:
> >
> >  Thanks for the report.

> Actually, not mine.  I'm just the messenger who did some root-cause
> analysis.

> >  Note: I don't know what P1.java means here. I have picked a java file
> >  at random that I had on my machine that is large. Is P1.java a
> >  specific file that has been shared earlier?

> The OP has provided that, see below.

> >  Hyperbole has its own tracker.
> >
> >  https://debbugs.gnu.org/cgi/pkgreport.cgi?package=hyperbole

> Ok, thanks.  As soon as we know whose bug this is we could forward or
> not.


> Now for the next reproducer (Hyperbole no longer required, but still
> present through its regexp :-):

> - Save the following to ~/tmp/init.el:

> ------------------------- snip -------------------------
> (add-hook
>  'java-mode-hook
>  (lambda ()
>    (setq defun-prompt-regexp
>        "^[ 
> \t]*\\(\\(\\(public\\|protected\\|private\\|const\\|abstract\\|synchronized\\|final\\|static\\|threadsafe\\|transient\\|native\\|volatile\\)\\s-+\\)*\\(\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*[][_$.a-zA-Z0-9]+\\|[[a-zA-Z]\\)\\s-*\\)\\s-+\\)\\)?\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*\\s-+\\)\\s-*\\)?\\([_a-zA-Z][^][
>  
> \t:;.,{}()=]*\\|\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)\\)\\s-*\\(([^);{}]*)\\)?\\([]
>  \t]*\\)\\(\\s-*\\<throws\\>\\s-*\\(\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)[, 
> \t\n\r\f]*\\)+\\)?\\s-*")))
> ------------------------- snip -------------------------

> - Save attachment P1.java from the initial message

>
>   https://yhetil.org/emacs-bugs/ZPOcahP9yPJ-kLcgipM3-l0jatXJSQWKPfObrlOkIB3dagud85x2DGXGhPpQn1QNqNksVmPIRc1intyW_Cx1Z9ou2vBZ5QLDpLTi_VFVYyg=@protonmail.com/

>   to ~/tmp/P1.java.

> - Start Emacs as

>   ./src/emacs -Q -l ~/tmp/init.el +181 ~/tmp/P1.java

> That always freezes Emacs (29 and master) even before it has a chance to
> display P1.java.  The freeze happens in function
> `c-get-fallback-scan-pos', where the while loop inf-loops.

c-get-fallback-scan-pos tries to move to the beginning of a function.
This probably involves defun-prompt-regexp when it is non-nil.  :-(

-- 
Alan Mackenzie (Nuremberg, Germany).




