unofficial mirror of bug-gnu-emacs@gnu.org 
From: Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: mah@everybody.org, 61514@debbugs.gnu.org
Subject: bug#61514: 30.0.50; sadistically long xml line hangs emacs
Date: Mon, 20 Feb 2023 09:59:30 -0500
Message-ID: <jwv1qmko0yg.fsf-monnier+emacs@gnu.org>
In-Reply-To: <83cz64v3v7.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 20 Feb 2023 15:54:52 +0200")

Eli Zaretskii [2023-02-20 15:54:52] wrote:

>> From: Stefan Monnier <monnier@iro.umontreal.ca>
>> Cc: mah@everybody.org,  61514@debbugs.gnu.org
>> Date: Mon, 20 Feb 2023 08:19:26 -0500
>> 
>> >   "\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"
>> >
>> > As you can see, the prepended "[^<>\n]+?" in the regexp which "hangs"
>> > makes all the difference.  So the looking-at which fails reasonably
>> > quickly is the first call to looking-at above, whereas the one the
>> > "hangs" is the second one.
>> 
>> Yes, it makes a lot of sense now.
>> 
>> > Maybe this points out a way out of this misery?
>> 
>> I think it does.  E.g. there's a chance that using "[^<>\n]+?\\<"
>> instead of "[^<>\n]+?"  avoids the hang
>
> It does, thanks.
>
>> (not sure if it's the right thing to do for all the regexp that can
>> be returned by `xmltok-attribute`, tho).
>
> How would we go about finding out?  Because other than that, changing
> the regexp solves this nasty problem, and all the tests in
> test/lisp/nxml/ still pass.

I did find out: we'll always get the same regexp here, so it's OK.
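
For anyone who wants to compare the two prefixes locally, here is a rough
timing sketch.  The helper `my-time-looking-at` is hypothetical, the input
is made up, and the tail is a much simplified stand-in for the real
attribute regexp, so the numbers only hint at the difference in behavior:

```elisp
(require 'benchmark)

;; Hypothetical helper, not part of nxml or xmltok.
(defun my-time-looking-at (regexp text)
  "Return seconds taken by one `looking-at' of REGEXP at the start of TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (car (benchmark-run 1 (looking-at regexp)))))

;; Assumed input: one long run of word characters with no `=' anywhere,
;; so both regexps must fail.  TAIL is a simplified attribute name
;; followed by optional whitespace and `='.
(let ((text (make-string 2000 ?x))
      (tail "[_[:alpha:]][-._[:alnum:]]*[ \r\t\n]*="))
  (list (my-time-looking-at (concat "[^<>\n]+?" tail) text)
        (my-time-looking-at (concat "[^<>\n]+?\\<" tail) text)))
```

The result is a list of two elapsed times in seconds, the first for the
plain prefix and the second for the one ending in "\\<".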

It turns out that (xmltok-attribute regexp) doesn't mean "return the
something of `regexp`" but rather "return the regexp named
`xmltok-attribute`".

`xmltok-attribute` is a funny macro built by `xmltok-defregexp`.
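
To illustrate the idea, here is a minimal sketch of a macro-defining macro
of that kind.  The names `my-defregexp` and `my-attribute` are hypothetical
and this is not the actual definition of `xmltok-defregexp`, which does
more; it only shows how a call like (NAME regexp) can expand to a constant
string:

```elisp
;; Hypothetical, simplified analogue of a regexp-defining macro.
(defmacro my-defregexp (name regexp)
  "Define NAME as a macro such that (NAME regexp) expands to REGEXP."
  `(defmacro ,name (action)
     ;; ACTION is the unevaluated symbol written at the call site.
     (if (eq action 'regexp)
         ,regexp
       (error "Unsupported action: %S" action))))

(my-defregexp my-attribute "[_[:alpha:]][-._[:alnum:]]*")

;; Here `regexp' is a keyword-like symbol, not a variable reference;
;; the form expands to the string given to `my-defregexp'.
(my-attribute regexp)
```

So the argument selects which piece of information about the named regexp
the macro call expands to.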

>> And for the stack overflow I haven't yet found its origin.
>
> Not sure what is the mystery here.  AFAIU, we look for the closing
> ">", don't find it, and then start looking for fewer and fewer non-'>'
> characters followed by '>'.  Isn't that what happens here?

Right, but the stack overflows always come from repetitions where
our `mutually_exclusive_p` test fails.  Let's see:

    \\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=

The first two `*` should be non-backtracking because they repeat
[-._[:alnum:]] which is mutually-exclusive with what follows (either `:`
or whitespace, or `=`).  Similarly the third `*` should be
non-backtracking because its body can't match the `=` that must follow.
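
As a sanity check of the disjointness claim, here is a small sketch.  The
function `my-classes-disjoint-p` is hypothetical and only tests ASCII
characters by brute force; the real `mutually_exclusive_p` works on the
compiled pattern in the C code and is not reproduced here:

```elisp
;; Hypothetical helper: brute-force check over ASCII only.
(defun my-classes-disjoint-p (class-a class-b)
  "Return non-nil if no ASCII char matches both CLASS-A and CLASS-B.
CLASS-A and CLASS-B are regexps that each match a single character."
  (let ((disjoint t))
    (dotimes (c 128)
      (let ((s (char-to-string c)))
        (when (and (string-match-p (concat "\\`" class-a "\\'") s)
                   (string-match-p (concat "\\`" class-b "\\'") s))
          (setq disjoint nil))))
    disjoint))

;; The repeated class versus the characters that may follow it:
;; `:', `=', or whitespace.
(my-classes-disjoint-p "[-._[:alnum:]]" "[:= \r\t\n]")
```

If the two classes share no character, the matcher never needs to give back
characters consumed by the `*` in order to match what follows.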

    \\(?:[\s\r\t\n]*

The input doesn't contain enough whitespace, so even if this can backtrack
it shouldn't be the source of our current problems.

    \\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'

Neither `*` here should backtrack.

    \\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)

Same here.

    \\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"

And here we're back to only repeating whitespace.

What am I missing?


        Stefan






