From: Alan Mackenzie
Newsgroups: gmane.emacs.bugs
Subject: bug#61436: Emacs Freezing With Java Files
Date: Wed, 11 Oct 2023 22:03:05 +0000
To: Jens Schmidt
Cc: Robert Weiner, Hank Greenburg, Mats Lidell, Eli Zaretskii, rswgnu@gmail.com, 61436@debbugs.gnu.org

Hello, Jens.

On Wed, Oct 11, 2023 at 21:38:26 +0200, Jens Schmidt wrote:
> Hi Alan,

> could you please have a look as well? This seems to be related to
> cc-mode/java-mode. New, complete reproducer at the very bottom of this
> mail.

> Thanks!

> Hi Robert & Mats,

> Robert Weiner writes:

> > Jens wrote:

> >> That always freezes Emacs (29 and master) even before it has a chance to
> >> display P1.java. The freeze happens in function
> >> `c-get-fallback-scan-pos', where the while loop inf-loops, BUT:

> >> If you uncomment the line setting `hkey-init' to nil in init.el and
> >> repeat: No freeze.

> > As you note above, the infinite loop is coming from a Lisp function in
> > Emacs core, not from Hyperbole. A Hyperbole setting may help you to
> > see a state reached in that function that you otherwise would not, but
> > it is not a Hyperbole bug; it is an unhandled state outside of
> > Hyperbole.

> Well, yes and no. The next closest culprit seems to be this hook
> addition from function `hui-select-initialize':

>   ;; These hooks let you select C++ and Java methods and classes by
>   ;; double-clicking on the first character of a definition or on its
>   ;; opening or closing brace. This is all necessary since some
>   ;; programmers don't put their function braces in the first column.
>   (var:add-and-run-hook
>    'java-mode-hook
>    (lambda ()
>      (setq defun-prompt-regexp
>            "^[ \t]*\\(\\(\\(public\\|protected\\|private\\|const\\|abstract\\|synchronized\\|final\\|static\\|threadsafe\\|transient\\|native\\|volatile\\)\\s-+\\)*\\(\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*[][_$.a-zA-Z0-9]+\\|[[a-zA-Z]\\)\\s-*\\)\\s-+\\)\\)?\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*\\s-+\\)\\s-*\\)?\\([_a-zA-Z][^][ \t:;.,{}()^?=]*\\|\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)\\)\\s-*\\(([^);{}]*)\\)?\\([] \t]*\\)\\(\\s-*\\\\s-*\\(\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)[, \t\n\r\f]*\\)+\\)?\\s-*")))

> I (very generally) think that Emacs does not have to grok every regexp
> in every context, but I leave that concrete case for Alan and/or others
> to decide.

I think that regexp may well be the source of the hang. It is
ill-conditioned. (I've elided all of the keywords between "public" and
"volatile" to make it more readable.)

    "^[ \t]*\\(\\(\\(public\\|volatile\\)\\s-+\\)*\\(\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*[][_$.a-zA-Z0-9]+\\|[[a-zA-Z]\\)\\s-*\\)\\s-+\\)\\)?
    \\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*\\s-+\\)\\s-*\\)?\\([_a-zA-Z][^][ \t:;.,{}()^?=]*\\|\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)\\)\\s-*\\(([^);{}]*)\\)?\\([] \t]*\\)\\(\\s-*\\\\s-*\\(\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)[, \t\n\r\f]*\\)+\\)?\\s-*"

The first problem seems to be just after "volatile\\)\\s-+\\)*", where
you've got:

    [[a-zA-Z][][_$.a-zA-Z0-9]*[][_$.a-zA-Z0-9]+\\|[[a-zA-Z]
                             ^                ^

, in other words [...]*[...]+, where both ...s are the same character
class. When the overall match fails, the Emacs regexp engine will try
every possible combination of these. On its own this isn't all that bad:
for a string of N matching characters inside a global mismatch, it tries
all N-1 ways of splitting the string between those two regexp fragments.
But here the [...]* is entirely redundant (as well as being harmful),
since [...]*[...]+ matches exactly the same strings as [...]+, and it
could simply be removed.
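A small sketch can make the cost of the [...]*[...]+ fragment concrete. It uses Python's re module as a stand-in for the Emacs regexp engine. This is an assumption: both are backtracking matchers, but they are not identical. The `[` and `]` are escaped inside the character classes so that Python does not read them as nested sets. The names HEAD, BODY, OLD, NEW, same_strings and star_plus_splits are hypothetical. The sketch checks, on a few sample strings, that dropping the star leaves the accepted strings unchanged. It also lists the N-1 ways the original fragment can divide a run of N characters.

```python
import re

# Hypothetical Python transcription of the fragment discussed above.
# Assumption: Python's `re`, like the Emacs engine, backtracks; `[` and `]`
# are escaped inside the classes so Python does not treat them as nested sets.
HEAD = r"[\[a-zA-Z]"
BODY = r"[\]\[_$.a-zA-Z0-9]"
OLD = HEAD + BODY + "*" + BODY + "+"   # [...]*[...]+  (as in the regexp)
NEW = HEAD + BODY + "+"                # [...]+        (star removed)


def same_strings(samples):
    """True if OLD and NEW accept exactly the same whole strings."""
    return all(
        (re.fullmatch(OLD, s) is None) == (re.fullmatch(NEW, s) is None)
        for s in samples
    )


def star_plus_splits(s):
    """List every (head, star_part, plus_part) way OLD can match all of s.

    On a global mismatch a backtracking engine tries each of these in
    turn; for a string of N matching characters there are N - 1 of them.
    """
    if re.fullmatch(NEW, s) is None:
        return []
    rest = s[1:]
    # plus_part must be non-empty, so star_part stops one short of the end.
    return [(s[0], rest[:k], rest[k:]) for k in range(len(rest))]


if __name__ == "__main__":
    print(same_strings(["a", "ab", "a.b", "x1", "foo[]", "1a", "", "a b"]))
    print(len(star_plus_splits("abcdefghij")))
```

With the star removed, each possible match length can be matched in only one way, instead of in up to N-1 ways.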
Another problem is right near the end of the regexp, where there is:

    \\(\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)[, \t\n\r\f]*\\)+

, or, rewritten in an easier-to-read fashion over several lines:

    \\(                                             \\)+
      \\(                          \\)[, \t\n\r\f]*
         [_$a-zA-Z][_$.a-zA-Z0-9]*
         1111111111111111111111111    2222222222222

. Here, a sequence of identifier characters inside a global mismatch can
all be matched by a single 1. However, it can just as well be chopped
into several pieces, each matched by 1 in its own iteration of the outer
\\(...\\)+, with 2 matching the empty string between them. A run of N
letters can be chopped up in 2^(N-1) ways, and the regexp engine will
try out every one of them before giving up, which for any realistic N
takes practically forever. This may well be one of the places where the
regexp hangs. It might well be that the second * in that expression
should be a +, so that consecutive identifiers have to be separated by
at least one of those characters.

Earlier on in the regexp, I can see \\s-*\\)\\s-+, a possibly
zero-length sequence of whitespace-syntax characters followed by a
non-empty sequence of them. I haven't analysed this in detail, but it
smells like trouble.

It may well be that persevering with this regexp is a lost cause, and
you'd do better to construct a new regexp from scratch using more
structured methods (perhaps something similar to what's in cc-awk.el).
In fact, the regexp looks horribly like one in the CC Mode manual which
was explicitly designated as unsupported. ;-(

Just as a matter of interest, I wrote a tool quite a few years ago to
diagnose and rewrite ill-conditioned regexps, but never got it to
release quality. I tried this tool out on the regexp, but its output
regexp hung in Java Mode just as badly as the original. It did, however,
help me spot some of the solecisms I analysed above.

> > On Wed, Oct 11, 2023 at 3:29 AM Mats Lidell wrote:
> >
> > Thanks for the report.

> Actually, not mine. I'm just the messenger who did some root-cause
> analysis.

> > Note: I don't know what P1.java means here. I have picked a java file
> > at random that I had on my machine that is large. Is P1.java a
> > specific file that has been shared earlier?

> The OP has provided that, see below.

> > Hyperbole has its own tracker.
> >
> > https://debbugs.gnu.org/cgi/pkgreport.cgi?package=hyperbole

> Ok, thanks. As soon as we know whose bug this is we could forward or
> not.

> Now for the next reproducer (Hyperbole no longer required, but still
> present through its regexp :-):

> - Save the following to ~/tmp/init.el:

> ------------------------- snip -------------------------
> (add-hook
>  'java-mode-hook
>  (lambda ()
>    (setq defun-prompt-regexp
>          "^[ \t]*\\(\\(\\(public\\|protected\\|private\\|const\\|abstract\\|synchronized\\|final\\|static\\|threadsafe\\|transient\\|native\\|volatile\\)\\s-+\\)*\\(\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*[][_$.a-zA-Z0-9]+\\|[[a-zA-Z]\\)\\s-*\\)\\s-+\\)\\)?\\(\\([[a-zA-Z][][_$.a-zA-Z0-9]*\\s-+\\)\\s-*\\)?\\([_a-zA-Z][^][ \t:;.,{}()^?=]*\\|\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)\\)\\s-*\\(([^);{}]*)\\)?\\([] \t]*\\)\\(\\s-*\\\\s-*\\(\\([_$a-zA-Z][_$.a-zA-Z0-9]*\\)[, \t\n\r\f]*\\)+\\)?\\s-*")))
> ------------------------- snip -------------------------

> - Save attachment P1.java from the initial message

>   https://yhetil.org/emacs-bugs/ZPOcahP9yPJ-kLcgipM3-l0jatXJSQWKPfObrlOkIB3dagud85x2DGXGhPpQn1QNqNksVmPIRc1intyW_Cx1Z9ou2vBZ5QLDpLTi_VFVYyg=@protonmail.com/

>   to ~/tmp/P1.java.

> - Start Emacs as

>     ./src/emacs -Q -l ~/tmp/init.el +181 ~/tmp/P1.java

> That always freezes Emacs (29 and master) even before it has a chance to
> display P1.java. The freeze happens in function
> `c-get-fallback-scan-pos', where the while loop inf-loops.

c-get-fallback-scan-pos tries to move to the beginning of a function,
which probably involves defun-prompt-regexp when that is non-nil. :-(

-- 
Alan Mackenzie (Nuremberg, Germany).