unofficial mirror of bug-gnu-emacs@gnu.org 
From: Alan Mackenzie <acm@muc.de>
To: Jens Schmidt <jschmidt4gnu@vodafonemail.de>
Cc: Robert Weiner <rsw@gnu.org>,
	Hank Greenburg <hank.greenburg@protonmail.com>,
	Mats Lidell <mats.lidell@lidells.se>,
	Eli Zaretskii <eliz@gnu.org>,
	rswgnu@gmail.com, 61436@debbugs.gnu.org
Subject: bug#61436: Emacs Freezing With Java Files
Date: Wed, 11 Oct 2023 22:03:05 +0000
Message-ID: <ZScbmT8PV6ucBGwA@ACM>
In-Reply-To: <87r0m1t0el.fsf@sappc2.fritz.box>

Hello, Jens.

On Wed, Oct 11, 2023 at 21:38:26 +0200, Jens Schmidt wrote:
> Hi Alan,

> could you please have a look as well?  This seems to be related to
> cc-mode/java-mode.  New, complete reproducer at the very bottom of this
> mail.

> Thanks!

> Hi Robert & Mats,

> Robert Weiner <rsw@gnu.org> writes:

> > Jens wrote:

> >> That always freezes Emacs (29 and master) even before it has a chance to
> >> display P1.java.  The freeze happens in function
> >> `c-get-fallback-scan-pos', where the while loop inf-loops, BUT:

> >> If you uncomment the line setting `hkey-init' to nil in init.el and
> >> repeat: No freeze.

> > As you note above, the infinite loop is coming from a Lisp function in
> > Emacs core, not from Hyperbole.  A Hyperbole setting may help you to
> > see a state reached in that function that you otherwise would not, but
> > it is not a Hyperbole bug; it is an unhandled state outside of
> > Hyperbole.

> Well, yes and no.  The next closest culprit seems to be this hook
> addition from function `hui-select-initialize':

>   ;; These hooks let you select C++ and Java methods and classes by
>   ;; double-clicking on the first character of a definition or on its
>   ;; opening or closing brace.  This is all necessary since some
>   ;; programmers don't put their function braces in the first column.
>   (var:add-and-run-hook
>    'java-mode-hook
>    (lambda ()
>      (setq defun-prompt-regexp
> 	   "^[ \t]*\\(\\(\\(public\\|protected\\|private\\|const\\|abstract\\|synchronized\\|final\\|static\\|threadsafe\\|transient\\|native\\|volatile\\)\\s-+\\)*\\(\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*[][_$.a-zA-Z0-9]+\\|[[a-zA-Z]\\)\\s-*\\)\\s-+\\)\\)?\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*\\s-+\\)\\s-*\\)?\\([_a-zA-Z][^][ \t:;.,{}()\x7f=]*\\|\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)\\)\\s-*\\(([^);{}]*)\\)?\\([] \t]*\\)\\(\\s-*\\<throws\\>\\s-*\\(\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)[, \t\n\r\f]*\\)+\\)?\\s-*")))

> I (very generally) think that Emacs does not have to grok every regexp
> in every context, but I leave that concrete case for Alan and/or others
> to decide.

I think that that regexp might be the source of the hang.  It is
ill-conditioned.  (I've elided all of the keywords between "public" and
"volatile" to try and make it more readable):

"^[ \t]*\\(\\(\\(public\\|volatile\\)\\s-+\\)*\\(\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*[][_$.a-zA-Z0-9]+\\|[[a-zA-Z]\\)\\s-*\\)\\s-+\\)\\)?\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*\\s-+\\)\\s-*\\)?\\([_a-zA-Z][^][ \t:;.,{}()\x7f=]*\\|\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)\\)\\s-*\\(([^);{}]*)\\)?\\([] \t]*\\)\\(\\s-*\\<throws\\>\\s-*\\(\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)[, \t\n\r\f]*\\)+\\)?\\s-*"

The first problem seems to be just after "volatile\\)\\s-+\\)*", where you've got:

[[a-zA-Z][][_$.a-zA-Z0-9]*[][_$.a-zA-Z0-9]+\\|[[a-zA-Z]
                         ^                ^

, in other words [...]*[...]+, where the ...s match largely the same
characters.  In the event of a failure to match, the Emacs regexp engine
will try every possible combination of these.  This isn't all that bad,
but in a string of N matching characters inside a global mismatch, it
will try out all N-1 ways of splitting up the string between those two
regexp fragments.  In fact, here, the [...]* is entirely redundant (as
well as being harmful) and could be removed.
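
The redundancy can be checked on a few sample strings.  What follows is
only a minimal sketch: the names my-type-orig, my-type-short and
my-same-verdict-p are made up for illustration and appear nowhere in
Hyperbole or CC Mode, and the two fragments are anchored with \\` and \\'
so that each has to match the whole test string.

```elisp
;; Illustrative only: the names below are hypothetical.
(defconst my-type-orig
  "\\`[[a-zA-Z][][_$.a-zA-Z0-9]*[][_$.a-zA-Z0-9]+\\'"
  "The fragment as written, with the redundant [...]* in place.")

(defconst my-type-short
  "\\`[[a-zA-Z][][_$.a-zA-Z0-9]+\\'"
  "The same fragment with the [...]* removed.")

(defun my-same-verdict-p (string)
  "Return non-nil if both fragments agree on whether STRING matches."
  (let ((case-fold-search nil))
    (eq (null (string-match-p my-type-orig string))
        (null (string-match-p my-type-short string)))))

;; Every element of the result should be t.
(mapcar #'my-same-verdict-p
        '("int" "String[]" "java.util.List" "x" "9lives" ""))
```

Both fragments accept and reject the same strings, but the shorter one
leaves the engine only one way to consume a run of matching characters.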

Another problem is right near the end of the regexp where there is:

\\(\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)[, \t\n\r\f]*\\)+

, or rewriting it in an easier to read fashion on several lines:

\\(                                            \\)+  
   \\(                         \\)[, \t\n\r\f]*
      [_$a-zA-Z][_$.a-zA-Z0-9]*
      1111111111111111111111111   2222222222222


Here, if you have a sequence of identifier characters which are inside
a global mismatch, they can all be matched by a single iteration of 1.
However, since 2 can match the empty string, they can also be split
between several iterations of 1, each followed by a zero-length match of
2.  In this case, the regexp engine will try out all the ways of
splitting them, a number which grows exponentially with the length of
the sequence, before giving up.  This might be one of the places in the
regexp which is hanging.  It might well be that the second * in that
expression should be a +.
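
One way to remove this ambiguity, sketched here under assumptions rather
than taken from Hyperbole or CC Mode, is to require a comma between
consecutive identifiers, so that a run of identifier characters cannot
be divided between iterations of the repeated group.  The names
my-java-id, my-throws-unambiguous and my-throws-clause are hypothetical.
Note that this changes behaviour slightly: unlike the original fragment
it does not accept identifiers separated only by whitespace, nor a
trailing comma.

```elisp
;; Illustrative only: these names are hypothetical.
(defconst my-java-id "[_$a-zA-Z][_$.a-zA-Z0-9]*"
  "Identifier fragment copied from the regexp under discussion.")

(defconst my-throws-unambiguous
  (concat "\\<throws\\>[ \t\n\r\f]+"
          my-java-id
          ;; Each further identifier must be introduced by a comma.
          "\\(?:[ \t\n\r\f]*,[ \t\n\r\f]*" my-java-id "\\)*")
  "A throws clause whose identifiers cannot be split between iterations.")

(defun my-throws-clause (string)
  "Return the throws clause found in STRING, or nil if there is none."
  (let ((case-fold-search nil))
    (when (string-match my-throws-unambiguous string)
      (match-string 0 string))))

(my-throws-clause "void f() throws IOException, java.sql.SQLException {")
```

Whether such a fragment is acceptable in the full regexp would need
checking against real Java sources.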

Earlier on in the regexp, I can see \\s-*\\)\\s-+, a possibly zero-length
sequence of space-syntax characters, followed by a non-empty sequence of
them.  I haven't analysed this in detail, but it smells like trouble.

It may well be that persevering with this regexp is a lost cause, and
you'd do better to construct a new regexp from scratch using more
structured methods (perhaps something similar to what's in cc-awk.el).
In fact the regexp looks horribly like one in the CC Mode manual which
was explicitly designated unsupported.  ;-(

Just as a matter of interest, I wrote a tool quite a few years ago to
diagnose and rewrite ill-conditioned regexps, but never got it to release
quality.  I tried out this tool on the regexp, but its output regexp hung
in Java Mode just as much as the original.  But this tool did help me
spot some of the solecisms which I analysed above.


> > On Wed, Oct 11, 2023 at 3:29 AM Mats Lidell <mats.lidell@lidells.se> wrote:
> >
> >  Thanks for the report.

> Actually, not mine.  I'm just the messenger who did some root-cause
> analysis.

> >  Note: I don't know what P1.java means here. I have picked a java file
> >  at random that I had on my machine that is large. Is P1.java a
> >  specific file that has been shared earlier?

> The OP has provided that, see below.

> >  Hyperbole has its own tracker.
> >
> >  https://debbugs.gnu.org/cgi/pkgreport.cgi?package=hyperbole

> Ok, thanks.  As soon as we know whose bug this is we could forward or
> not.


> Now for the next reproducer (Hyperbole no longer required, but still
> present through its regexp :-):

> - Save the following to ~/tmp/init.el:

> ------------------------- snip -------------------------
> (add-hook
>  'java-mode-hook
>  (lambda ()
>    (setq defun-prompt-regexp
> 	 "^[ \t]*\\(\\(\\(public\\|protected\\|private\\|const\\|abstract\\|synchronized\\|final\\|static\\|threadsafe\\|transient\\|native\\|volatile\\)\\s-+\\)*\\(\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*[][_$.a-zA-Z0-9]+\\|[[a-zA-Z]\\)\\s-*\\)\\s-+\\)\\)?\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*\\s-+\\)\\s-*\\)?\\([_a-zA-Z][^][ \t:;.,{}()\x7f=]*\\|\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)\\)\\s-*\\(([^);{}]*)\\)?\\([] \t]*\\)\\(\\s-*\\<throws\\>\\s-*\\(\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)[, \t\n\r\f]*\\)+\\)?\\s-*")))
> ------------------------- snip -------------------------

> - Save attachment P1.java from the initial message

>   https://yhetil.org/emacs-bugs/ZPOcahP9yPJ-kLcgipM3-l0jatXJSQWKPfObrlOkIB3dagud85x2DGXGhPpQn1QNqNksVmPIRc1intyW_Cx1Z9ou2vBZ5QLDpLTi_VFVYyg=@protonmail.com/

>   to ~/tmp/P1.java.

> - Start Emacs as

>   ./src/emacs -Q -l ~/tmp/init.el +181 ~/tmp/P1.java

> That always freezes Emacs (29 and master) even before it has a chance to
> display P1.java.  The freeze happens in function
> `c-get-fallback-scan-pos', where the while loop inf-loops.

c-get-fallback-scan-pos tries to move to the beginning of a function.
This probably involves defun-prompt-regexp when it is non-nil.  :-(
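
If that is right, a possible stopgap is to reset defun-prompt-regexp in
Java buffers until the regexp is fixed.  This is an untested sketch,
not a confirmed fix: the function name is made up, and whether it runs
after the hook function which sets the regexp depends on how that
function was added.

```elisp
;; Illustrative only: the function name is hypothetical.
(defun my-java-clear-defun-prompt-regexp ()
  "Reset `defun-prompt-regexp' to nil in the current buffer."
  (setq-local defun-prompt-regexp nil))

;; Depth 90 asks for this function to be placed late in the hook,
;; after functions added with the default depth.
(add-hook 'java-mode-hook #'my-java-clear-defun-prompt-regexp 90)
```

With defun-prompt-regexp nil, the double-click selection which the
regexp was meant to support is lost in those buffers.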

-- 
Alan Mackenzie (Nuremberg, Germany).




