From: Gregory Heytings
To: Eli Zaretskii
Cc: mah@everybody.org, 61514@debbugs.gnu.org, Stefan Monnier
Newsgroups: gmane.emacs.bugs
Subject: bug#61514: 30.0.50; sadistically long xml line hangs emacs
Date: Sun, 19 Feb 2023 23:48:57 +0000
Message-ID: <886c06e50e707ab83560@heytings.org>
In-Reply-To: <83h6vixik1.fsf@gnu.org>
References: <87lel0c65v.fsf@everybody.org> <838rgvymcd.fsf@gnu.org> <886c06e50e9cfacb7954@heytings.org> <83h6vixik1.fsf@gnu.org>

>
> I'm not surprised.  There's something weird going on there.  Do you
> understand the logic in this snippet near the end of
> re_match_2_internal:
>
>     /* We goto here if a matching operation fails.  */
>     fail:
>       maybe_quit ();
>       if (!FAIL_STACK_EMPTY ())
>         {
>           [...]
>         }
>       else
>         break;   /* Matching at this starting point really fails.  */
>     } /* for (;;) */
>
>   if (best_regs_set)
>     goto restore_best_regs;
>
>   unbind_to (count, Qnil);
>   SAFE_FREE ();
>
>   if (max_redisplay_ticks > 0 && nchars > 0)
>     update_redisplay_ticks (nchars / 50 + 1, NULL);
>
>   return -1;   /* Failure to match.  */
>
> What is the mechanism to empty the failure stack, which eventually
> causes us to report a failure?  What I see is that the stack is either
> not being emptied, or being emptied very slowly.  Do the "magic" numbers
> you came up with explain that?
>

As Stefan just said, it's POP_FAILURE_POINT which reduces the failure
stack and restarts the search (if appropriate).

After more investigation (and trying to make sense of the magic
numbers), my conclusion is that there is most probably no bug in the
regexp engine, and that the sole culprit here is the regexp in nXML.

I truncated the file to only 10k characters: it opens after a few
seconds.  Then I added 10k characters at a time; each time, opening the
file took longer, but it eventually succeeded.  I stopped at 50k
characters, where opening the file took something like two minutes.  By
extrapolation, opening the file truncated to 250k characters should take
a year or so ;-)

Lowering emacs_re_max_failures just makes the regexp engine fail
earlier, because there is not enough room in the failure stack.  In a
sense it is better to fail earlier, but to do that in all cases we would
have to lower emacs_re_max_failures to, say, 10000, which I guess
wouldn't be good, because the engine would then fail too often.
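
To make the mechanism above concrete, here is a minimal sketch of a
backtracking matcher with an explicit, bounded failure stack.  It is not
the code in regex-emacs.c: the pattern (a|aa)*b is only a stand-in for
the nXML regexp, and toy_match, push_failure_point, struct failure_point,
the TOY_* codes and the max_failures parameter (a rough analogue of
emacs_re_max_failures) are all hypothetical names made up for this
illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes for this sketch; not Emacs's values.  */
enum { TOY_MATCH = 0, TOY_NO_MATCH = -1, TOY_STACK_OVERFLOW = -2 };

/* A saved alternative: resume at POS, starting with alternative BRANCH.  */
struct failure_point { size_t pos; int branch; };

/* Push a failure point; return 0 if the stack is already full.  */
static int push_failure_point(struct failure_point *stack, size_t *depth,
                              size_t max_failures, size_t pos, int branch)
{
    if (*depth == max_failures)
        return 0;
    stack[*depth].pos = pos;
    stack[*depth].branch = branch;
    ++*depth;
    return 1;
}

/* Match the toy pattern (a|aa)*b at the start of TEXT, using at most
   MAX_FAILURES failure points.  *STEPS receives the number of loop
   iterations, as a rough measure of the work done.  */
int toy_match(const char *text, size_t max_failures, unsigned long *steps)
{
    size_t len, depth = 0, pos = 0;
    int branch = 0, result;
    struct failure_point *stack;

    assert(text != NULL && steps != NULL);
    len = strlen(text);
    *steps = 0;
    stack = malloc((max_failures ? max_failures : 1) * sizeof *stack);
    if (stack == NULL)
        return TOY_STACK_OVERFLOW;

    for (;;) {
        ++*steps;
        /* Alternative 0: consume one 'a'; alternative 1 stays pending.  */
        if (branch == 0 && pos < len && text[pos] == 'a') {
            if (!push_failure_point(stack, &depth, max_failures, pos, 1)) {
                result = TOY_STACK_OVERFLOW;
                break;
            }
            pos += 1;
            branch = 0;
            continue;
        }
        /* Alternative 1: consume "aa"; leaving the loop stays pending.  */
        if (branch <= 1 && pos + 1 < len
            && text[pos] == 'a' && text[pos + 1] == 'a') {
            if (!push_failure_point(stack, &depth, max_failures, pos, 2)) {
                result = TOY_STACK_OVERFLOW;
                break;
            }
            pos += 2;
            branch = 0;
            continue;
        }
        /* Alternative 2: leave the loop; the next character must be 'b'.  */
        if (pos < len && text[pos] == 'b') {
            result = TOY_MATCH;
            break;
        }
        /* fail: pop a failure point and resume there, or give up.  */
        if (depth == 0) {
            result = TOY_NO_MATCH;   /* Matching really fails.  */
            break;
        }
        --depth;
        pos = stack[depth].pos;
        branch = stack[depth].branch;
    }
    free(stack);
    return result;
}
```

On a run of 'a' characters with no 'b', every failure pops one saved
alternative, but the resumed alternative immediately pushes new ones, so
the stack empties only after exponentially many steps.  A small
max_failures does not make the search cheaper; it only makes the matcher
give up (TOY_STACK_OVERFLOW here) on the first deep descent.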