From: Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: mah@everybody.org, 61514@debbugs.gnu.org
Subject: bug#61514: 30.0.50; sadistically long xml line hangs emacs
Date: Mon, 20 Feb 2023 09:59:30 -0500
Message-ID: <jwv1qmko0yg.fsf-monnier+emacs@gnu.org>
In-Reply-To: <83cz64v3v7.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 20 Feb 2023 15:54:52 +0200")
Eli Zaretskii [2023-02-20 15:54:52] wrote:
>> From: Stefan Monnier <monnier@iro.umontreal.ca>
>> Cc: mah@everybody.org, 61514@debbugs.gnu.org
>> Date: Mon, 20 Feb 2023 08:19:26 -0500
>>
>> > "\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"
>> >
>> > As you can see, the prepended "[^<>\n]+?" in the regexp which "hangs"
>> > makes all the difference. So the looking-at which fails reasonably
>> > quickly is the first call to looking-at above, whereas the one the
>> > "hangs" is the second one.
>>
>> Yes, it makes a lot of sense now.
>>
>> > Maybe this points out a way out of this misery?
>>
>> I think it does. E.g. there's a chance that using "[^<>\n]+?\\<"
>> instead of "[^<>\n]+?" avoids the hang
>
> It does, thanks.
>
>> (not sure if it's the right thing to do for all the regexp that can
>> be returned by `xmltok-attribute`, tho).
>
> How would we go about finding out? Because other than that, changing
> the regexp solves this nasty problem, and all the tests in
> test/lisp/nxml/ still pass.

I did find out: we'll always get the same regexp here, so it's OK.
It turns out that (xmltok-attribute regexp) doesn't mean to return "the
something of `regexp`" but to return "the regexp named
`xmltok-attribute`".

`xmltok-attribute` is a funny macro built by `xmltok-defregexp`.

>> And for the stack overflow I haven't yet found its origin.
>
> Not sure what is the mystery here. AFAIU, we look for the closing
> ">", don't find it, and then start looking for fewer and fewer non-'>'
> characters followed by '>'. Isn't that what happens here?

Right, but the stack overflows always come from repetitions where
our `mutually_exclusive_p` test fails.  Let's see:

    \\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=

The first two `*` should be non-backtracking because they repeat
[-._[:alnum:]] which is mutually-exclusive with what follows (either `:`
or whitespace, or `=`).  Similarly the third `*` should be
non-backtracking because its body can't match the `=` that must follow.

    \\(?:[\s\r\t\n]*

There aren't enough whitespace characters, so even if this can backtrack
it shouldn't be the source of our current problems.

    \\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'

Neither `*` here should backtrack.

    \\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)

Same here.

    \\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"

And here we're back to only repeating whitespace.
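
The reasoning above can be checked mechanically with a rough model: treat
each repeated character class as a set and ask whether it shares any
character with what may follow the loop.  The Python sketch below does only
that.  It is not the `mutually_exclusive_p` code from the regexp engine;
the names (`mutually_exclusive`, `LOOPS`, `RESULTS`) are made up for
illustration, and ASCII is assumed to stand in for the full character set.

```python
import string

# Assumption: ASCII stands in for the full character set; Emacs's
# [:alpha:] and [:alnum:] also cover non-ASCII characters.
ASCII = {chr(c) for c in range(128)}
ALPHA = set(string.ascii_letters)
NAME_START = ALPHA | {"_"}
NAME_CHAR = ALPHA | set(string.digits) | set("-._")
SPACE = set(" \r\t\n")


def mutually_exclusive(body, follow):
    """Return True if no character can both extend the loop and start what follows."""
    return body.isdisjoint(follow)


# (label, chars accepted by the loop body, chars that can start the continuation)
LOOPS = [
    ("name  [-._[:alnum:]]*", NAME_CHAR, {":", "="} | SPACE),
    ("local [-._[:alnum:]]*", NAME_CHAR, {"="} | SPACE),
    ("space before =", SPACE, {"="}),
    ("'...' first *", ASCII - set("<'&\r\n\t"), set("&\r\n\t'")),
    ("'...' second *", ASCII - set("<'"), {"'"}),
    ('"..." first *', ASCII - set('<"&\r\n\t'), set('&\r\n\t"')),
    ('"..." second *', ASCII - set('<"'), {'"'}),
    # The prepended lazy prefix from the quoted discussion, for contrast.
    ("prefix [^<>\\n]+?", ASCII - set("<>\n"), NAME_START),
]

RESULTS = {label: mutually_exclusive(body, follow) for label, body, follow in LOOPS}

if __name__ == "__main__":
    for label, ok in RESULTS.items():
        print(f"{label:20} {'exclusive' if ok else 'overlaps'}")
```

Under this model every loop inside the attribute regexp comes out
exclusive, and only the prepended "[^<>\n]+?" overlaps with what follows it
(it accepts letters, which can also start an attribute name).  Whether that
overlap accounts for the stack overflow is not something this sketch
establishes.
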
What am I missing?
Stefan