>
> We probably still have an O(N²) behavior which can bite with a line like
>
>
> My patch should significantly improve the constant factor, but with a
> long enough "N_N_N_N_N..." I suspect it can still end up painful.
>

I just tried that, with such a 4 MB line, and indeed the result is painful, but nowhere near as painful as this bug: opening that file takes "only" about 4 minutes, after which it can be edited normally.

>
> Maybe we should reduce the scope of the search for the fallback case
> (the case where we add the "[^...]+\\<" prefix) since AFAICT its only
> purpose is to try and guess a helpful error message when the XML is
> ill-formed.
>

That's an idea, yes. With the following patch even your "n_n_..." example opens almost instantaneously:

diff --git a/lisp/nxml/xmltok.el b/lisp/nxml/xmltok.el
index c36d225c7c9..61783ea4dec 100644
--- a/lisp/nxml/xmltok.el
+++ b/lisp/nxml/xmltok.el
@@ -734,7 +734,7 @@ xmltok-scan-attributes
         (atts-needing-normalization nil))
     (while (cond ((or (looking-at (xmltok-attribute regexp))
                       ;; use non-greedy group
-                      (when (looking-at (concat "[^<>\n]+?"
+                      (when (looking-at (concat "[^<>\n]\\{1,1000\\}?\\<"
                                                 (xmltok-attribute regexp)))
                         (unless recovering
                           (xmltok-add-error "Malformed attribute"

>>> I don't think we want that for `emacs-29`, but unless there's some
>>> objection I'll push this to `master`,
>>
>> I'd say it fixes an important bug in the regexp engine, but I cannot
>> judge whether it's important enough for emacs-29.
>
> It's a missing optimization that's been with us for many many years, so
> I don't see any urgency to fix it.
>

It's not urgent, indeed. But it doesn't look risky either, especially given that you've been using that patch for years. Anyway, I don't have a strong preference.
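For a rough idea of why bounding the prefix helps, here is a back-of-the-envelope cost model in Emacs Lisp. It is only a sketch under two assumptions of mine: that the fallback `looking-at` ends up being attempted at every position of the line, and that each attempt lets the lazy `[^<>\n]` prefix run until the end of the line (worst case: no `<`, `>` or newline) or until the bound. It ignores the cost of `\\<` and of the attribute regexp itself. `my-xmltok-fallback-cost` is a made-up name, not something in xmltok.el.

```elisp
;; Hypothetical cost model, NOT part of xmltok.el.  It assumes the
;; fallback match is attempted at every position of a line of N
;; characters containing no <, > or newline, and counts how many
;; characters the lazy prefix may examine in total.
(defun my-xmltok-fallback-cost (n &optional bound)
  "Return the worst-case total prefix scan cost on a line of N chars.
If BOUND is nil, each attempt may scan to the end of the line (the
unbounded prefix); otherwise each attempt scans at most BOUND chars."
  (let ((total 0))
    (dotimes (pos n total)
      (let ((rest (- n pos)))           ; characters left from POS
        (setq total (+ total (if bound (min bound rest) rest)))))))

;; Unbounded: n(n+1)/2, i.e. quadratic in the line length.
(my-xmltok-fallback-cost 4000000)       ; => 8000002000000
;; Bounded to 1000 chars: 1000n - 499500, i.e. linear in n.
(my-xmltok-fallback-cost 4000000 1000)  ; => 3999500500
```

With a 4 MB line, the model gives a ratio of about 2000 between the two, which is the kind of gap between "about 4 minutes" and "almost instantaneously", although the real regexp engine obviously has different constant factors.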