From: Gregory Heytings
To: Stefan Monnier
Cc: Eli Zaretskii, 61514@debbugs.gnu.org, mah@everybody.org
Newsgroups: gmane.emacs.bugs
Subject: bug#61514: 30.0.50; sadistically long xml line hangs emacs
Date: Mon, 20 Feb 2023 20:13:38 +0000
References: <87lel0c65v.fsf@everybody.org> <838rgvymcd.fsf@gnu.org> <831qmkwmux.fsf@gnu.org> <83cz64v3v7.fsf@gnu.org>

>
> We probably still have an O(N²) behavior which can bite with a line like
>
>
>
> My patch should significantly improve the constant factor, but with a
> long enough "N_N_N_N_N..." I suspect it can still end up painful.
>

I just tried that, with such a line of 4 MB, and indeed the result is
painful, but nowhere near as painful as this bug: opening that file takes
"only" about 4 minutes, after which it can be edited normally.

>
> Maybe we should reduce the scope of the search for the fallback case
> (the case where we add the "[^...]+\\<" prefix) since AFAICT its only
> purpose is to try and guess a helpful error message when the XML is
> ill-formed.
>

That's an idea, yes.  With the following patch even your "n_n_..." example
opens almost instantaneously:

diff --git a/lisp/nxml/xmltok.el b/lisp/nxml/xmltok.el
index c36d225c7c9..61783ea4dec 100644
--- a/lisp/nxml/xmltok.el
+++ b/lisp/nxml/xmltok.el
@@ -734,7 +734,7 @@ xmltok-scan-attributes
         (atts-needing-normalization nil))
     (while (cond ((or (looking-at (xmltok-attribute regexp))
                       ;; use non-greedy group
-                      (when (looking-at (concat "[^<>\n]+?"
+                      (when (looking-at (concat "[^<>\n]\\{1,1000\\}?\\<"
                                                 (xmltok-attribute regexp)))
                         (unless recovering
                           (xmltok-add-error "Malformed attribute"

>>> I don't think we want that for `emacs-29`, but unless there's some
>>> objection I'll push this to `master`,
>>
>> I'd say it fixes an important bug in the regexp engine, but I cannot
>> judge whether it's important enough for emacs-29.
>
> It's a missing optimization that's been with us for many many years, so
> I don't see any urgency to fix it.
>

It's not urgent, indeed.  But it doesn't look risky either, especially
given that you've been using that patch for years.  Anyway, I don't have a
strong preference.
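
For illustration only, here is a rough Python analogue of what bounding the fallback prefix trades away. It is a sketch under assumptions, not the nxml code: Python's `re` engine differs from Emacs's regexp engine, `\b` stands in for `\<`, the literal `x=` is a hypothetical stand-in for the attribute regexp, and `fallback_end` is a made-up helper name. It only shows that a prefix capped at 1000 characters stops looking once the next attribute is further away than that, which is what keeps the search from scanning the whole line.

```python
import re

# Hypothetical stand-in for the attribute regexp: a literal `x=`.
ATTR = r'x='

# Unbounded lazy prefix: may skip any number of characters other than
# `<`, `>` and newline before the attribute.
unbounded = re.compile(r'[^<>\n]+?\b' + ATTR)

# Bounded lazy prefix: skips at most 1000 such characters, then gives up.
bounded = re.compile(r'[^<>\n]{1,1000}?\b' + ATTR)


def fallback_end(pattern, text):
    """Return the end offset of a match anchored at the start of text, or None."""
    m = pattern.match(text)
    return m.end() if m else None


# Junk followed by an attribute 20 characters in: both patterns recover.
near = "n " * 10 + 'x="1"'

# Junk followed by an attribute 5000 characters in: only the unbounded
# pattern reaches it; the bounded one stops after 1000 characters.
far = "n " * 2500 + 'x="1"'

print(fallback_end(unbounded, near), fallback_end(bounded, near))
print(fallback_end(unbounded, far), fallback_end(bounded, far))
```

The cost of the cap is that recovery (and hence the "Malformed attribute" guess) is only attempted within the bounded window; past it, the fallback simply does not match.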