From mboxrd@z Thu Jan 1 00:00:00 1970
From: Noam Postavsky
Newsgroups: gmane.emacs.bugs
Subject: bug#33887: 26.1; Emacs hangs for several seconds when going to the
 end of an XML file in nXML mode
Date: Mon, 20 May 2019 16:47:11 -0400
Message-ID: <87o93wam5s.fsf@gmail.com>
References: <87ftujuvkd.fsf@zira.vinc17.org> <878sv7ff6j.fsf@gmail.com>
 <87pnoiegsh.fsf@gmail.com> <20190517213602.GA11777@zira.vinc17.org>
 <875zq8e6tw.fsf@gmail.com> <20190518144756.GA21327@zira.vinc17.org>
 <87r28vd2d5.fsf@gmail.com> <20190519001704.GA5467@zira.vinc17.org>
 <87k1embaqx.fsf@gmail.com> <87h89qb722.fsf@gmail.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)
Cc: Vincent Lefevre, 33887@debbugs.gnu.org
To: Stefan Monnier
In-Reply-To: (Stefan Monnier's message of "Sun, 19 May 2019 15:24:22 -0400")
X-GNU-PR-Message: followup 33887
X-GNU-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:159592

--=-=-=
Content-Type: text/plain

> There's an issue with the following XML file, which does not have
> any special character, except a single quote in the middle of the
> text.
>
>
> 12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
>
>
>
> Note that the newline character before the is important.

Right, this is due to chunking by syntax-propertize.
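To make the chunking problem concrete, here is a minimal sketch (not part of the patch) that runs the rule's regexp from sgml-syntax-propertize-rules, before and after the patch below, over a single line standing in for one syntax-propertize chunk. The demo-* names and the sample string are made up for illustration; the real rule is applied by syntax-propertize on buffer text, not by string-match.

```elisp
;; Illustration only: the two regexps are copied from the patch below;
;; every `demo-' name is hypothetical and does not exist in sgml-mode.el.
(defconst demo-old-re "\\([\"']\\)[^<>\"']*[<>\"']"
  "Rule regexp before the patch: a quote must be followed by <, >, or a quote.")

(defconst demo-new-re "\\([\"']\\)[^<>\"']*\\([<>\"']\\|$\\)"
  "Rule regexp after the patch: end of line also terminates the match.")

(defun demo-rule-match-pos (re chunk)
  "Return the index in CHUNK where RE matches, or nil if it does not."
  (string-match re chunk))

;; A lone quote with no angle bracket or second quote on its line:
(list (demo-rule-match-pos demo-old-re "aaaa'aaaa")   ; nil, rule never fires
      (demo-rule-match-pos demo-new-re "aaaa'aaaa"))  ; 4, rule sees the quote
```

With the old regexp the lone quote is never examined when the nearest angle bracket lies outside the chunk, so it is left with string syntax; the new regexp lets the rule see it as long as the chunk covers the whole line.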
Here's the fix:

--=-=-=
Content-Type: text/plain
Content-Disposition: attachment;
 filename=0001-Handle-lone-quote-500-characters-away-from-XML-tag-B.patch
Content-Description: patch

>From 2025fa25f76fd8a2df46fca8807ca386372757d5 Mon Sep 17 00:00:00 2001
From: Noam Postavsky
Date: Mon, 20 May 2019 16:04:24 -0400
Subject: [PATCH 1/2] Handle lone quote 500+ characters away from XML tag
 (Bug#33887)

Because syntax-propertize works in small buffer chunks, the rule for
finding quotes which don't contain angle brackets failed to trigger
when the angle bracket was outside of the current chunk.

* lisp/textmodes/sgml-mode.el (sgml-syntax-propertize-rules): Match
quotes on lines with no other angle bracket or quote too (the
syntax-propertize chunk is extended to cover whole lines).
* test/lisp/nxml/nxml-mode-tests.el (nxml-mode-quote-in-long-text):
New test.
---
 lisp/textmodes/sgml-mode.el       |  9 +++++++--
 test/lisp/nxml/nxml-mode-tests.el | 22 ++++++++++++++++++++++
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/lisp/textmodes/sgml-mode.el b/lisp/textmodes/sgml-mode.el
index 137745fbc1..b555db7b76 100644
--- a/lisp/textmodes/sgml-mode.el
+++ b/lisp/textmodes/sgml-mode.el
@@ -353,8 +353,13 @@ sgml-font-lock-keywords
      ;; the resulting number of calls to syntax-ppss made it too slow
      ;; (bug#33887), so we're now careful to leave alone any pair
      ;; of quotes that doesn't hold a < or > char, which is the vast majority.
-     ("\\([\"']\\)[^<>\"']*[<>\"']"
-      (1 (unless (eq (char-after (match-beginning 1)) (char-before))
+     ;; We also check quotes which are unpaired to end of line,
+     ;; otherwise we miss the case where the quote might "contain" an
+     ;; angle bracket outside of the current syntax-propertize chunk
+     ;; (this relies on `syntax-propertize-wholelines' being enabled).
+     ("\\([\"']\\)[^<>\"']*\\([<>\"']\\|$\\)"
+      (1 (unless (eq (char-after (match-beginning 1))
+                     (char-after (match-beginning 2)))
           ;; Be careful to call `syntax-ppss' on a position before the one
           ;; we're going to change, so as not to need to flush the data we
           ;; just computed.
diff --git a/test/lisp/nxml/nxml-mode-tests.el b/test/lisp/nxml/nxml-mode-tests.el
index 2bbf92bc96..0916a1e652 100644
--- a/test/lisp/nxml/nxml-mode-tests.el
+++ b/test/lisp/nxml/nxml-mode-tests.el
@@ -86,5 +86,27 @@ nxml-mode-tests-correctly-indented-string
     (should (= 1 (car (syntax-ppss (1- (point-max))))))
     (should (= 0 (car (syntax-ppss (point-max)))))))
 
+(ert-deftest nxml-mode-quote-in-long-text ()
+  (with-temp-buffer
+    (nxml-mode)
+    (insert ""
+            ;; `syntax-propertize-wholelines' extends chunk size based
+            ;; on line length, so newlines are significant!
+            (make-string syntax-propertize-chunk-size ?a) "\n"
+            "'"
+            (make-string syntax-propertize-chunk-size ?a) "\n"
+            "")
+    ;; If we just check (syntax-ppss (point-max)) immediately, then
+    ;; we'll end up propertizing the whole buffer in one chunk (so the
+    ;; test is useless). Simulate something more like what happens
+    ;; when the buffer is viewed normally.
+    (cl-loop for pos from (point-min) to (point-max)
+             by syntax-propertize-chunk-size
+             do (syntax-ppss pos))
+    (syntax-ppss (point-max))
+    ;; Check that last tag is parsed as a tag.
+    (should (= 1 (- (car (syntax-ppss (1- (point-max))))
+                    (car (syntax-ppss (point-max))))))))
+
 (provide 'nxml-mode-tests)
 ;;; nxml-mode-tests.el ends here
-- 
2.11.0


--=-=-=
Content-Type: text/plain

Note that you have to be sure to recompile sgml-mode.el AND nxml-mode.el
after applying these patches, 'make' isn't smart enough to do it
automatically (yes, I figured this out the hard way).

>> >1
>>
>> (syntax-ppss) on the location of "1" in the above, gives (-1 ...). And
>> then (syntax-ppss) on the "/" will give (0 ...).
>> So the syntax propertize rule for quote use of (zerop (car (syntax-ppss)))
>> no longer works correctly to see whether it's inside or outside a tag.
>>
>> ">" outside of tags should be set to syntax ".", but I would assume that
>> adding a syntax-propertize rule which calls syntax-ppss for every ">"
>> (to check whether it's inside a tag or not) will be very slow, just like
>> calling it for every quote was.

Oh, I figured it out, we can just look at (nth 9 ppss), because the list
of open parens is still okay, regardless of unmatched close parens.

--=-=-=
Content-Type: text/plain
Content-Disposition: attachment;
 filename=0002-Handle-outside-SGML-XML-tags-Bug-33887.patch
Content-Description: patch

>From d1520ab5b94d0f130955800ea11222a3702a5519 Mon Sep 17 00:00:00 2001
From: Noam Postavsky
Date: Mon, 20 May 2019 16:29:04 -0400
Subject: [PATCH 2/2] Handle ">" outside SGML/XML tags (Bug#33887)

* lisp/textmodes/sgml-mode.el (sgml-syntax-propertize-rules): Check
the list of open parens rather than current depth, the latter is not
reliable.
* test/lisp/textmodes/sgml-mode-tests.el (sgml-tests--quotes-syntax):
Extend test for this case.
---
 lisp/textmodes/sgml-mode.el            | 4 +++-
 test/lisp/textmodes/sgml-mode-tests.el | 9 ++++++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/lisp/textmodes/sgml-mode.el b/lisp/textmodes/sgml-mode.el
index b555db7b76..052201e5ee 100644
--- a/lisp/textmodes/sgml-mode.el
+++ b/lisp/textmodes/sgml-mode.el
@@ -364,7 +364,9 @@ sgml-font-lock-keywords
           ;; we're going to change, so as not to need to flush the data we
           ;; just computed.
           (let ((ppss (syntax-ppss (match-beginning 0))))
-            (if (prog1 (zerop (car ppss)) ; Outside tag.
+            ;; Can't rely on depth (nth 0 ppss), because we don't
+            ;; mark ">" outside of tags.
+            (if (prog1 (null (nth 9 ppss)) ; Outside tag.
                   (goto-char (1- (match-end 0)))
                   ;; If we're in a comment, don't skip over comment
                   ;; ender.
diff --git a/test/lisp/textmodes/sgml-mode-tests.el b/test/lisp/textmodes/sgml-mode-tests.el
index 09941fe6f1..d6913863d6 100644
--- a/test/lisp/textmodes/sgml-mode-tests.el
+++ b/test/lisp/textmodes/sgml-mode-tests.el
@@ -138,13 +138,16 @@ sgml-with-content
                  "\"a'\""
                  "'a\"'"
                  ""
-                 ""))
+                 ""
+                 ;; Yes, ">" is technically valid outside tags!
+                 ">'"
+                 ))
     (ert-info (str :prefix "Test string: ")
       (sgml-with-content
        str
        ;; Check that last tag is parsed as a tag.
-       (should (= 1 (car (syntax-ppss (1- (point-max))))))
-       (should (= 0 (car (syntax-ppss (point-max)))))))))
+       (should (= 1 (- (car (syntax-ppss (1- (point-max))))
+                       (car (syntax-ppss (point-max))))))))))
 
 (provide 'sgml-mode-tests)
 ;;; sgml-mode-tests.el ends here
-- 
2.11.0


--=-=-=--