From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#61514: 30.0.50; sadistically long xml line hangs emacs
Date: Sat, 18 Feb 2023 18:22:58 +0200
Message-ID: <838rgvymcd.fsf@gnu.org>
References: <87lel0c65v.fsf@everybody.org>
Cc: 61514@debbugs.gnu.org
To: "Mark A. Hershberger"

> Date: Tue, 14 Feb 2023 16:02:04 -0500
> From: "Mark A. Hershberger" via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors"
>
>
> There seems to be a regression between 28 and 30 with how emacs handles
> long lines.

No, there's no regression with long lines. There's an existing bug in
our regexp routines and/or nxml. See below.

> Bottom line: Emacs 30 is handling files with long lines worse than Emacs
> 28.

This conclusion is incorrect, or at least inaccurate. Emacs 28.2 has
the same problem as Emacs 30. Take that a.xml file, truncate it after
250000 characters, then visit it with Emacs 28.2 -- you will see that
Emacs 28.2 freezes exactly like Emacs 30 does.

The problem is in the combination of nxml-mode and some subtle
bug/misfeature in our regexp routines. Specifically, when we overflow
the fail stack, we fail to recover in this case, and seem to infloop
inside re_match_2_internal, or maybe recover very inefficiently (I
waited for almost 1 hour before giving up).

The call which causes the loop is in xmltok.el, in the indicated line:

  (defun xmltok-scan-attributes ()
    (let ((recovering nil)
          (atts-needing-normalization nil))
      (while (cond ((or (looking-at (xmltok-attribute regexp))
                        ;; use non-greedy group
                        (when (looking-at (concat "[^<>\n]+?"  <<<<<<<<<<<<<<<<<
                                                  (xmltok-attribute regexp)))
                          (unless recovering
                            (xmltok-add-error "Malformed attribute"
                                              (point)
                                              (save-excursion
                                                (goto-char (xmltok-attribute start
                                                                             name))
                                                (skip-chars-backward "\r\n\t ")
                                                (point))))
                          t))

The regexp that causes this is as follows:

  "[^<>\n]+?\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"
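
For anyone who wants to try the truncation test described above, here
is a minimal sketch of how the shortened file could be produced from
within Emacs. The function name repro-truncate-file and the output
name a-truncated.xml are made up for illustration; only a.xml comes
from the bug report. The sketch counts bytes rather than characters,
which is the same thing only if the file is plain ASCII.

```elisp
;; Hypothetical helper, not part of Emacs: copy the first NBYTES bytes
;; of SRC into DST.  The data is read and written literally, so no
;; decoding, encoding or mode setup happens (nxml-mode is never run).
(defun repro-truncate-file (src dst nbytes)
  "Write the first NBYTES bytes of file SRC to file DST."
  (let ((coding-system-for-write 'no-conversion))
    (with-temp-buffer
      ;; Work on raw bytes so buffer positions correspond to bytes.
      (set-buffer-multibyte nil)
      ;; BEG and END of `insert-file-contents-literally' are byte offsets.
      (insert-file-contents-literally src nil 0 nbytes)
      (write-region (point-min) (point-max) dst nil 'silent))))

;; Assumed usage, with the file from the report in the current directory:
;; (repro-truncate-file "a.xml" "a-truncated.xml" 250000)
```

The resulting file can then be visited in a fresh session, e.g. with
"emacs -Q a-truncated.xml", under each Emacs version being compared.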