unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#61514: 30.0.50; sadistically long xml line hangs emacs
@ 2023-02-14 21:02 Mark A. Hershberger via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-14 22:05 ` Gregory Heytings
                   ` (2 more replies)
  0 siblings, 3 replies; 75+ messages in thread
From: Mark A. Hershberger via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-14 21:02 UTC (permalink / raw)
  To: 61514


There seems to be a regression between 28 and 30 with how emacs handles
long lines.

Reading a comparison of editors, I came across this test case:

    It's interesting how some Linux editors handle huge lines. Tested
    several editors on Ubuntu 19.10 on Intel i3 CPU. With XML file with
    a single line of length 4M. XML file contains line like <id
    name="nnnnnnnnnnnnnnn"> with the huge "name" value of 4M.

    Python script to generate test file:

    #!/usr/bin/python3
    f = open("a.xml", "w")
    f.write('<id name="')
    for n in range(1, 4096):
        f.write("n" * 1024)
    f.write('">\n')

From
https://wiki.lazarus.freepascal.org/CudaText_VS_other_editors#Performance_on_huge_lines
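
As a sanity check on the recipe, here is a minimal sketch (not part of
the wiki page) that writes the same file and measures the resulting
line. The function names and parameters are made up for illustration.
Note that range(1, 4096) in the script above runs 4095 times, so the
"name" value is 4095 * 1024 characters, slightly under 4M.

```python
#!/usr/bin/python3
"""Write the one-line XML test file and measure its longest line."""


def write_long_line_xml(path="a.xml", chunks=4095, chunk_size=1024):
    # chunks=4095 mirrors range(1, 4096) in the script above,
    # which iterates 4095 times, not 4096.
    with open(path, "w") as f:
        f.write('<id name="')
        for _ in range(chunks):
            f.write("n" * chunk_size)
        f.write('">\n')
    return path


def longest_line_length(path):
    # Length in characters of the longest line, newline excluded.
    longest = 0
    with open(path) as f:
        for line in f:
            longest = max(longest, len(line.rstrip("\n")))
    return longest
```

With the default arguments, longest_line_length(write_long_line_xml())
gives 4193292: 10 characters for '<id name="', 4095 * 1024 for the
value, and 2 for '">'.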

I know Emacs has problems with long lines, but the examples on this page
referred to Emacs 26, so I thought I would see how things have changed.

Opening the file (a.xml) produced by the script above from a dired
buffer in Emacs 30.0.50 shows the following in the message window:

    RNG NXML error: (error "Stack overflow in regexp matcher")

After this, Emacs appears to hang and nothing else is displayed.  The
mouse cursor does not change to indicate that any processing is
happening.  It changes to an arrow over clickable areas (e.g. the menu
bar) and a vertical bar over the dired buffer. Hitting C-g does
nothing. Resizing the window does not properly redraw it. Attempting to
close the window does nothing.

Build config for Emacs 30.0.50 below.

For comparison, Emacs 28.2 (from Debian repo) opens the file, but the
*Messages* buffer contains:

    Error: (error "Stack overflow in regexp matcher")
    Error during redisplay: (jit-lock-function 1) signaled (error "Stack overflow in regexp matcher")
    Error during redisplay: (jit-lock-function 1501) signaled (error "Stack overflow in regexp matcher")
    Internal error in rng-validate-mode triggered at buffer position 5. Stack overflow in regexp matcher

Moving point in the buffer displayed seems to work somewhat normally,
but hitting C-e to go to the end of the line takes a bit and then
keyboard navigation seems problematic while the *Messages* buffer fills
with “Error during redisplay” messages.

Bottom line: Emacs 30 is handling files with long lines worse than Emacs
28.

Output from report-emacs-bug continues:

In GNU Emacs 30.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version
 3.24.36, cairo version 1.16.0) of 2023-02-10 built on gabriel
Repository revision: ea29622e928f50522e424ee59b0f24bbb5a42eca
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12201007
System Description: Debian GNU/Linux bookworm/sid

Configured using:
 'configure --with-gif=ifavailable --with-tree-sitter=ifavailable
 --with-cairo --with-imagemagick --with-json --with-native-compilation
 --with-xinput2 --with-xwidgets --with-x-toolkit=gtk3 --with-gconf
 --with-xwidgets --with-imagemagick --with-modules'

Configured features:
ACL CAIRO DBUS FREETYPE GCONF GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ
IMAGEMAGICK JPEG JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2
M17N_FLT MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER PNG RSVG SECCOMP
SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS TREE_SITTER WEBP X11 XDBE
XIM XINPUT2 XPM XWIDGETS GTK3 ZLIB

Important settings:
  value of $LC_MONETARY: en_US.UTF-8
  value of $LC_NUMERIC: en_US.UTF-8
  value of $LC_TIME: en_US.UTF-8
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  helm--remap-mouse-mode: t
  async-bytecomp-package-mode: t
  global-emojify-mode: t
  emojify-mode: t
  which-key-mode: t
  global-page-break-lines-mode: t
  page-break-lines-mode: t
  buffer-face-mode: t
  direnv-mode: t
  flx-ido-mode: t
  auto-compile-on-load-mode: t
  auto-compile-on-save-mode: t
  yas-global-mode: t
  yas-minor-mode: t
  gcmh-mode: t
  global-flycheck-mode: t
  flycheck-mode: t
  override-global-mode: t
  shell-dirtrack-mode: t
  server-mode: t
  ido-everywhere: t
  windmove-mode: t
  display-time-mode: t
  straight-use-package-mode: t
  straight-package-neutering-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  prettify-symbols-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  abbrev-mode: t
  hs-minor-mode: t

Load-path shadows:
/home/mah/.emacs.d/straight/build/dpkg-dev-el/debian-autoloads hides /home/mah/.emacs.d/straight/build/debian-el/debian-autoloads
/home/mah/.emacs.d/straight/build/transient/transient hides /home/mah/work/code/emacs-master/lisp/transient
/home/mah/.emacs.d/straight/build/use-package/use-package hides /home/mah/work/code/emacs-master/lisp/use-package/use-package
/home/mah/.emacs.d/straight/build/use-package/use-package-bind-key hides /home/mah/work/code/emacs-master/lisp/use-package/use-package-bind-key
/home/mah/.emacs.d/straight/build/use-package/use-package-core hides /home/mah/work/code/emacs-master/lisp/use-package/use-package-core
/home/mah/.emacs.d/straight/build/use-package/use-package-delight hides /home/mah/work/code/emacs-master/lisp/use-package/use-package-delight
/home/mah/.emacs.d/straight/build/use-package/use-package-jump hides /home/mah/work/code/emacs-master/lisp/use-package/use-package-jump
/home/mah/.emacs.d/straight/build/use-package/use-package-ensure hides /home/mah/work/code/emacs-master/lisp/use-package/use-package-ensure
/home/mah/.emacs.d/straight/build/use-package/use-package-diminish hides /home/mah/work/code/emacs-master/lisp/use-package/use-package-diminish
/home/mah/.emacs.d/straight/build/use-package/use-package-lint hides /home/mah/work/code/emacs-master/lisp/use-package/use-package-lint
/home/mah/.emacs.d/straight/build/bind-key/bind-key hides /home/mah/work/code/emacs-master/lisp/use-package/bind-key
/home/mah/.emacs.d/straight/build/xref/xref hides /home/mah/work/code/emacs-master/lisp/progmodes/xref
/home/mah/.emacs.d/straight/build/project/project hides /home/mah/work/code/emacs-master/lisp/progmodes/project
/home/mah/.emacs.d/straight/build/org/org-fold hides /home/mah/work/code/emacs-master/lisp/org/org-fold
/home/mah/.emacs.d/straight/build/org/ob-tangle hides /home/mah/work/code/emacs-master/lisp/org/ob-tangle
/home/mah/.emacs.d/straight/build/org/org-datetree hides /home/mah/work/code/emacs-master/lisp/org/org-datetree
/home/mah/.emacs.d/straight/build/org/ob-makefile hides /home/mah/work/code/emacs-master/lisp/org/ob-makefile
/home/mah/.emacs.d/straight/build/org/org-goto hides /home/mah/work/code/emacs-master/lisp/org/org-goto
/home/mah/.emacs.d/straight/build/org/org-timer hides /home/mah/work/code/emacs-master/lisp/org/org-timer
/home/mah/.emacs.d/straight/build/org/ob-julia hides /home/mah/work/code/emacs-master/lisp/org/ob-julia
/home/mah/.emacs.d/straight/build/org/ob-eshell hides /home/mah/work/code/emacs-master/lisp/org/ob-eshell
/home/mah/.emacs.d/straight/build/org/org-macro hides /home/mah/work/code/emacs-master/lisp/org/org-macro
/home/mah/.emacs.d/straight/build/org/ol-eshell hides /home/mah/work/code/emacs-master/lisp/org/ol-eshell
/home/mah/.emacs.d/straight/build/org/ob-emacs-lisp hides /home/mah/work/code/emacs-master/lisp/org/ob-emacs-lisp
/home/mah/.emacs.d/straight/build/org/ob-fortran hides /home/mah/work/code/emacs-master/lisp/org/ob-fortran
/home/mah/.emacs.d/straight/build/org/ol-eww hides /home/mah/work/code/emacs-master/lisp/org/ol-eww
/home/mah/.emacs.d/straight/build/org/ol-mhe hides /home/mah/work/code/emacs-master/lisp/org/ol-mhe
/home/mah/.emacs.d/straight/build/org/ol-irc hides /home/mah/work/code/emacs-master/lisp/org/ol-irc
/home/mah/.emacs.d/straight/build/org/ox-org hides /home/mah/work/code/emacs-master/lisp/org/ox-org
/home/mah/.emacs.d/straight/build/org/org-lint hides /home/mah/work/code/emacs-master/lisp/org/org-lint
/home/mah/.emacs.d/straight/build/org/ob-core hides /home/mah/work/code/emacs-master/lisp/org/ob-core
/home/mah/.emacs.d/straight/build/org/org-list hides /home/mah/work/code/emacs-master/lisp/org/org-list
/home/mah/.emacs.d/straight/build/org/org-compat hides /home/mah/work/code/emacs-master/lisp/org/org-compat
/home/mah/.emacs.d/straight/build/org/ox-man hides /home/mah/work/code/emacs-master/lisp/org/ox-man
/home/mah/.emacs.d/straight/build/org/org-persist hides /home/mah/work/code/emacs-master/lisp/org/org-persist
/home/mah/.emacs.d/straight/build/org/ob-org hides /home/mah/work/code/emacs-master/lisp/org/ob-org
/home/mah/.emacs.d/straight/build/org/ob-table hides /home/mah/work/code/emacs-master/lisp/org/ob-table
/home/mah/.emacs.d/straight/build/org/ol-bibtex hides /home/mah/work/code/emacs-master/lisp/org/ol-bibtex
/home/mah/.emacs.d/straight/build/org/org-element hides /home/mah/work/code/emacs-master/lisp/org/org-element
/home/mah/.emacs.d/straight/build/org/oc-natbib hides /home/mah/work/code/emacs-master/lisp/org/oc-natbib
/home/mah/.emacs.d/straight/build/org/ob-ocaml hides /home/mah/work/code/emacs-master/lisp/org/ob-ocaml
/home/mah/.emacs.d/straight/build/org/org-agenda hides /home/mah/work/code/emacs-master/lisp/org/org-agenda
/home/mah/.emacs.d/straight/build/org/ob-sqlite hides /home/mah/work/code/emacs-master/lisp/org/ob-sqlite
/home/mah/.emacs.d/straight/build/org/ol-bbdb hides /home/mah/work/code/emacs-master/lisp/org/ol-bbdb
/home/mah/.emacs.d/straight/build/org/ob-ref hides /home/mah/work/code/emacs-master/lisp/org/ob-ref
/home/mah/.emacs.d/straight/build/org/ox-latex hides /home/mah/work/code/emacs-master/lisp/org/ox-latex
/home/mah/.emacs.d/straight/build/org/org-loaddefs hides /home/mah/work/code/emacs-master/lisp/org/org-loaddefs
/home/mah/.emacs.d/straight/build/org/org-fold-core hides /home/mah/work/code/emacs-master/lisp/org/org-fold-core
/home/mah/.emacs.d/straight/build/org/ob-ditaa hides /home/mah/work/code/emacs-master/lisp/org/ob-ditaa
/home/mah/.emacs.d/straight/build/org/ox-beamer hides /home/mah/work/code/emacs-master/lisp/org/ox-beamer
/home/mah/.emacs.d/straight/build/org/ob-clojure hides /home/mah/work/code/emacs-master/lisp/org/ob-clojure
/home/mah/.emacs.d/straight/build/org/ob-haskell hides /home/mah/work/code/emacs-master/lisp/org/ob-haskell
/home/mah/.emacs.d/straight/build/org/ob-sql hides /home/mah/work/code/emacs-master/lisp/org/ob-sql
/home/mah/.emacs.d/straight/build/org/ob-matlab hides /home/mah/work/code/emacs-master/lisp/org/ob-matlab
/home/mah/.emacs.d/straight/build/org/org-num hides /home/mah/work/code/emacs-master/lisp/org/org-num
/home/mah/.emacs.d/straight/build/org/ob-R hides /home/mah/work/code/emacs-master/lisp/org/ob-R
/home/mah/.emacs.d/straight/build/org/ob-js hides /home/mah/work/code/emacs-master/lisp/org/ob-js
/home/mah/.emacs.d/straight/build/org/ox-ascii hides /home/mah/work/code/emacs-master/lisp/org/ox-ascii
/home/mah/.emacs.d/straight/build/org/org-entities hides /home/mah/work/code/emacs-master/lisp/org/org-entities
/home/mah/.emacs.d/straight/build/org/org-plot hides /home/mah/work/code/emacs-master/lisp/org/org-plot
/home/mah/.emacs.d/straight/build/org/ob-shell hides /home/mah/work/code/emacs-master/lisp/org/ob-shell
/home/mah/.emacs.d/straight/build/org/oc hides /home/mah/work/code/emacs-master/lisp/org/oc
/home/mah/.emacs.d/straight/build/org/oc-biblatex hides /home/mah/work/code/emacs-master/lisp/org/oc-biblatex
/home/mah/.emacs.d/straight/build/org/org-ctags hides /home/mah/work/code/emacs-master/lisp/org/org-ctags
/home/mah/.emacs.d/straight/build/org/org-habit hides /home/mah/work/code/emacs-master/lisp/org/org-habit
/home/mah/.emacs.d/straight/build/org/ob-perl hides /home/mah/work/code/emacs-master/lisp/org/ob-perl
/home/mah/.emacs.d/straight/build/org/org-table hides /home/mah/work/code/emacs-master/lisp/org/org-table
/home/mah/.emacs.d/straight/build/org/ob-calc hides /home/mah/work/code/emacs-master/lisp/org/ob-calc
/home/mah/.emacs.d/straight/build/org/oc-bibtex hides /home/mah/work/code/emacs-master/lisp/org/oc-bibtex
/home/mah/.emacs.d/straight/build/org/ob-octave hides /home/mah/work/code/emacs-master/lisp/org/ob-octave
/home/mah/.emacs.d/straight/build/org/ob-maxima hides /home/mah/work/code/emacs-master/lisp/org/ob-maxima
/home/mah/.emacs.d/straight/build/org/ol hides /home/mah/work/code/emacs-master/lisp/org/ol
/home/mah/.emacs.d/straight/build/org/org-inlinetask hides /home/mah/work/code/emacs-master/lisp/org/org-inlinetask
/home/mah/.emacs.d/straight/build/org/ox-koma-letter hides /home/mah/work/code/emacs-master/lisp/org/ox-koma-letter
/home/mah/.emacs.d/straight/build/org/org-cycle hides /home/mah/work/code/emacs-master/lisp/org/org-cycle
/home/mah/.emacs.d/straight/build/org/ob-latex hides /home/mah/work/code/emacs-master/lisp/org/ob-latex
/home/mah/.emacs.d/straight/build/org/org-indent hides /home/mah/work/code/emacs-master/lisp/org/org-indent
/home/mah/.emacs.d/straight/build/org/ol-gnus hides /home/mah/work/code/emacs-master/lisp/org/ol-gnus
/home/mah/.emacs.d/straight/build/org/org-refile hides /home/mah/work/code/emacs-master/lisp/org/org-refile
/home/mah/.emacs.d/straight/build/org/ob-sed hides /home/mah/work/code/emacs-master/lisp/org/ob-sed
/home/mah/.emacs.d/straight/build/org/org-attach-git hides /home/mah/work/code/emacs-master/lisp/org/org-attach-git
/home/mah/.emacs.d/straight/build/org/org-colview hides /home/mah/work/code/emacs-master/lisp/org/org-colview
/home/mah/.emacs.d/straight/build/org/ob-groovy hides /home/mah/work/code/emacs-master/lisp/org/ob-groovy
/home/mah/.emacs.d/straight/build/org/ob-lisp hides /home/mah/work/code/emacs-master/lisp/org/ob-lisp
/home/mah/.emacs.d/straight/build/org/org-protocol hides /home/mah/work/code/emacs-master/lisp/org/org-protocol
/home/mah/.emacs.d/straight/build/org/ol-doi hides /home/mah/work/code/emacs-master/lisp/org/ol-doi
/home/mah/.emacs.d/straight/build/org/ob-ruby hides /home/mah/work/code/emacs-master/lisp/org/ob-ruby
/home/mah/.emacs.d/straight/build/org/ox-texinfo hides /home/mah/work/code/emacs-master/lisp/org/ox-texinfo
/home/mah/.emacs.d/straight/build/org/ob-eval hides /home/mah/work/code/emacs-master/lisp/org/ob-eval
/home/mah/.emacs.d/straight/build/org/ob-dot hides /home/mah/work/code/emacs-master/lisp/org/ob-dot
/home/mah/.emacs.d/straight/build/org/org-feed hides /home/mah/work/code/emacs-master/lisp/org/org-feed
/home/mah/.emacs.d/straight/build/org/ox-odt hides /home/mah/work/code/emacs-master/lisp/org/ox-odt
/home/mah/.emacs.d/straight/build/org/ob-plantuml hides /home/mah/work/code/emacs-master/lisp/org/ob-plantuml
/home/mah/.emacs.d/straight/build/org/ol-docview hides /home/mah/work/code/emacs-master/lisp/org/ol-docview
/home/mah/.emacs.d/straight/build/org/ob-lob hides /home/mah/work/code/emacs-master/lisp/org/ob-lob
/home/mah/.emacs.d/straight/build/org/ob-awk hides /home/mah/work/code/emacs-master/lisp/org/ob-awk
/home/mah/.emacs.d/straight/build/org/ox-publish hides /home/mah/work/code/emacs-master/lisp/org/ox-publish
/home/mah/.emacs.d/straight/build/org/ox-html hides /home/mah/work/code/emacs-master/lisp/org/ox-html
/home/mah/.emacs.d/straight/build/org/org hides /home/mah/work/code/emacs-master/lisp/org/org
/home/mah/.emacs.d/straight/build/org/org-src hides /home/mah/work/code/emacs-master/lisp/org/org-src
/home/mah/.emacs.d/straight/build/org/ol-w3m hides /home/mah/work/code/emacs-master/lisp/org/ol-w3m
/home/mah/.emacs.d/straight/build/org/ox hides /home/mah/work/code/emacs-master/lisp/org/ox
/home/mah/.emacs.d/straight/build/org/ob-C hides /home/mah/work/code/emacs-master/lisp/org/ob-C
/home/mah/.emacs.d/straight/build/org/oc-basic hides /home/mah/work/code/emacs-master/lisp/org/oc-basic
/home/mah/.emacs.d/straight/build/org/ob-screen hides /home/mah/work/code/emacs-master/lisp/org/ob-screen
/home/mah/.emacs.d/straight/build/org/ob-processing hides /home/mah/work/code/emacs-master/lisp/org/ob-processing
/home/mah/.emacs.d/straight/build/org/ob-sass hides /home/mah/work/code/emacs-master/lisp/org/ob-sass
/home/mah/.emacs.d/straight/build/org/ol-man hides /home/mah/work/code/emacs-master/lisp/org/ol-man
/home/mah/.emacs.d/straight/build/org/org-version hides /home/mah/work/code/emacs-master/lisp/org/org-version
/home/mah/.emacs.d/straight/build/org/org-keys hides /home/mah/work/code/emacs-master/lisp/org/org-keys
/home/mah/.emacs.d/straight/build/org/ox-md hides /home/mah/work/code/emacs-master/lisp/org/ox-md
/home/mah/.emacs.d/straight/build/org/org-capture hides /home/mah/work/code/emacs-master/lisp/org/org-capture
/home/mah/.emacs.d/straight/build/org/ob-lua hides /home/mah/work/code/emacs-master/lisp/org/ob-lua
/home/mah/.emacs.d/straight/build/org/org-duration hides /home/mah/work/code/emacs-master/lisp/org/org-duration
/home/mah/.emacs.d/straight/build/org/org-footnote hides /home/mah/work/code/emacs-master/lisp/org/org-footnote
/home/mah/.emacs.d/straight/build/org/org-macs hides /home/mah/work/code/emacs-master/lisp/org/org-macs
/home/mah/.emacs.d/straight/build/org/org-tempo hides /home/mah/work/code/emacs-master/lisp/org/org-tempo
/home/mah/.emacs.d/straight/build/org/ob-lilypond hides /home/mah/work/code/emacs-master/lisp/org/ob-lilypond
/home/mah/.emacs.d/straight/build/org/ob-exp hides /home/mah/work/code/emacs-master/lisp/org/ob-exp
/home/mah/.emacs.d/straight/build/org/ob-python hides /home/mah/work/code/emacs-master/lisp/org/ob-python
/home/mah/.emacs.d/straight/build/org/ol-info hides /home/mah/work/code/emacs-master/lisp/org/ol-info
/home/mah/.emacs.d/straight/build/org/org-pcomplete hides /home/mah/work/code/emacs-master/lisp/org/org-pcomplete
/home/mah/.emacs.d/straight/build/org/org-attach hides /home/mah/work/code/emacs-master/lisp/org/org-attach
/home/mah/.emacs.d/straight/build/org/org-archive hides /home/mah/work/code/emacs-master/lisp/org/org-archive
/home/mah/.emacs.d/straight/build/org/ol-rmail hides /home/mah/work/code/emacs-master/lisp/org/ol-rmail
/home/mah/.emacs.d/straight/build/org/org-id hides /home/mah/work/code/emacs-master/lisp/org/org-id
/home/mah/.emacs.d/straight/build/org/org-crypt hides /home/mah/work/code/emacs-master/lisp/org/org-crypt
/home/mah/.emacs.d/straight/build/org/ob-java hides /home/mah/work/code/emacs-master/lisp/org/ob-java
/home/mah/.emacs.d/straight/build/org/ob-css hides /home/mah/work/code/emacs-master/lisp/org/ob-css
/home/mah/.emacs.d/straight/build/org/ob-scheme hides /home/mah/work/code/emacs-master/lisp/org/ob-scheme
/home/mah/.emacs.d/straight/build/org/org-faces hides /home/mah/work/code/emacs-master/lisp/org/org-faces
/home/mah/.emacs.d/straight/build/org/ob hides /home/mah/work/code/emacs-master/lisp/org/ob
/home/mah/.emacs.d/straight/build/org/ob-comint hides /home/mah/work/code/emacs-master/lisp/org/ob-comint
/home/mah/.emacs.d/straight/build/org/org-mobile hides /home/mah/work/code/emacs-master/lisp/org/org-mobile
/home/mah/.emacs.d/straight/build/org/ob-forth hides /home/mah/work/code/emacs-master/lisp/org/ob-forth
/home/mah/.emacs.d/straight/build/org/org-clock hides /home/mah/work/code/emacs-master/lisp/org/org-clock
/home/mah/.emacs.d/straight/build/org/ox-icalendar hides /home/mah/work/code/emacs-master/lisp/org/ox-icalendar
/home/mah/.emacs.d/straight/build/org/oc-csl hides /home/mah/work/code/emacs-master/lisp/org/oc-csl
/home/mah/.emacs.d/straight/build/org/org-mouse hides /home/mah/work/code/emacs-master/lisp/org/org-mouse
/home/mah/.emacs.d/straight/build/org/ob-gnuplot hides /home/mah/work/code/emacs-master/lisp/org/ob-gnuplot
/home/mah/.emacs.d/straight/build/let-alist/let-alist hides /home/mah/work/code/emacs-master/lisp/emacs-lisp/let-alist

Features:
(shadow sort mail-extr emacsbug winner ffap tramp-archive tramp-gvfs
tramp-cache zeroconf helm-command helm-elisp helm-eval edebug debug
backtrace helm-info helm-mode helm-misc helm-git-grep helm-files
image-dired image-dired-tags image-dired-external image-dired-util
image-mode exif helm-buffers helm-occur helm-tags helm-locate helm-grep
helm-regexp helm-utils helm-help helm-types cl helm helm-global-bindings
helm-easymenu helm-core async-bytecomp helm-source helm-multi-match
helm-lib async hideshow emojify tar-mode arc-mode archive-mode init
cal-china lunar solar cal-dst cal-hebrew cal-julian holidays
holiday-loaddefs terraform-mode hcl-mode terraform-mode-autoloads
hcl-mode-autoloads terraform-doc terraform-doc-autoloads html-fold
html-fold-autoloads danneskjold-theme danneskjold-theme-autoloads
dpkg-dev-el-autoloads dpkg-dev-el debian-el-autoloads debian-el
which-key which-key-autoloads prettier-js-autoloads impatient-mode
htmlize simple-httpd impatient-mode-autoloads simple-httpd-autoloads
web-mode-autoloads whattf-dt html5-langs whattf-dt-autoloads
rustic-autoloads xterm-color-autoloads spinner-autoloads
project-autoloads xref-autoloads rust-mode-autoloads flycheck-rust
flycheck-rust-autoloads feature-mode cucumber-mode etags fileloop
feature-mode-autoloads markdown-xwidget-autoloads mustache-autoloads
phpcbf phpcbf-autoloads dockerfile-mode-autoloads nov-autoloads
esxml-autoloads kv-autoloads go-errcheck-autoloads go-mode-autoloads
blamer-autoloads git-timemachine vc-git vc-dispatcher
git-timemachine-autoloads treemacs-autoloads cfrs-autoloads
posframe-autoloads pfuture-autoloads ace-window-autoloads lice
lice-autoloads gnus-alias gnus-alias-autoloads lorem-ipsum-autoloads
ox-moderncv org-cv-utils ox-moderncv-autoloads magit-tramp-autoloads
magit-gitflow-autoloads magit-popup-autoloads orgit-forge-autoloads
orgit-autoloads web time-stamp web-autoloads ghub+ apiwrap apropos
ghub+-autoloads apiwrap-autoloads ox-mediawiki-autoloads
org-download-autoloads org-ref org-ref-core org-ref-glossary
org-ref-bibtex avy doi-utils org-ref-utils org-ref-export citeproc
citeproc-itemgetters citeproc-biblatex citeproc-bibtex ol-bibtex
citeproc-cite citeproc-subbibs citeproc-sort citeproc-name
citeproc-formatters citeproc-number rst citeproc-proc citeproc-disamb
citeproc-itemdata citeproc-generic-elements citeproc-macro
citeproc-choose citeproc-date citeproc-context citeproc-prange
citeproc-style citeproc-locale citeproc-term citeproc-rt citeproc-lib
citeproc-s thingatpt queue org-ref-misc-links org-ref-label-link
org-ref-ref-links org-ref-citation-links xref project
org-ref-bibliography-links hydra lv bibtex-completion filenotify biblio
biblio-download biblio-dissemin biblio-ieee biblio-hal biblio-dblp
biblio-crossref biblio-arxiv timezone biblio-doi biblio-core url-queue
hl-line parsebib bibtex org-ref-autoloads citeproc-autoloads
queue-autoloads bibtex-completion-autoloads biblio-autoloads
biblio-core-autoloads parsebib-autoloads avy-autoloads
org2blog-autoloads writegood-mode-autoloads hydra-autoloads lv-autoloads
htmlize-autoloads metaweblog metaweblog-autoloads xml-rpc
xml-rpc-autoloads mediawiki-autoloads json-mode js c-ts-common treesit
imenu cc-mode cc-fonts cc-guess cc-menus cc-cmds cc-styles cc-align
cc-engine cc-vars cc-defs json-mode-autoloads json-snatcher
json-snatcher-autoloads password-store auth-source-pass
password-store-autoloads page-break-lines page-break-lines-autoloads
vterm bookmark tramp tramp-loaddefs trampver tramp-integration cus-edit
pp files-x tramp-compat ls-lisp face-remap compile term disp-table ehelp
vterm-module term/xterm xterm vterm-autoloads org-journal-autoloads
deft-autoloads yaml-mode yaml-mode-autoloads emojify-autoloads
spaceline-all-the-icons spaceline-all-the-icons-separators
spaceline-all-the-icons-segments all-the-icons all-the-icons-faces
data-material data-weathericons data-octicons data-fileicons
data-faicons data-alltheicons memoize spaceline-all-the-icons-autoloads
memoize-autoloads all-the-icons-autoloads spaceline powerline
powerline-separators color powerline-themes spaceline-autoloads
powerline-autoloads multiple-cursors-autoloads helm-git-grep-autoloads
helm-autoloads popup-autoloads helm-core-autoloads async-autoloads
ivy-autoloads python-mode-autoloads org-bullets-autoloads direnv
diff-mode direnv-autoloads alert-autoloads log4e-autoloads
gntp-autoloads flx-ido flx flx-ido-autoloads flx-autoloads
xmlunicode-autoloads auto-compile auto-compile-autoloads
js2-mode-autoloads string-inflection-autoloads org-mime
org-mime-autoloads bbdb-autoloads loccur loccur-autoloads
phpunit-autoloads yasnippet-snippets-autoloads yasnippet-snippets
yasnippet yasnippet-autoloads company-autoloads php-mode-autoloads
ghub-graphql gsexp ghub url-http url-gw nsm url-auth let-alist graphql
graphql-autoloads treepy with-editor comp comp-cstr warnings transient
edmacro kmacro gcmh gcmh-autoloads forge-autoloads yaml-autoloads
markdown-mode-autoloads ghub-autoloads treepy-autoloads
emacsql-sqlite-autoloads emacsql-autoloads closql-autoloads
magit-autoloads magit-section-autoloads git-commit-autoloads
with-editor-autoloads transient-autoloads sqlite3 sqlite3-api
sqlite3-autoloads firestarter firestarter-autoloads editorconfig
editorconfig-core editorconfig-core-handle editorconfig-fnmatch pcase
editorconfig-autoloads f f-shortdoc s f-autoloads s-autoloads geben dbgp
tree-widget geben-autoloads envrc inheritenv envrc-autoloads
inheritenv-autoloads flycheck flycheck-autoloads let-alist-autoloads
pkg-info-autoloads epl-autoloads spacemacs-dark-theme spacemacs-common
spacemacs-common-autoloads compat compat-autoloads finder-inf ox-pandoc
ht dash ox-org ox-odt rng-loc rng-uri rng-parse rng-match rng-dt
rng-util rng-pttrn nxml-parse nxml-ns nxml-enc xmltok nxml-util ox-latex
ox-icalendar ox-html table ox-ascii ox-publish ox ox-pandoc-autoloads
ht-autoloads dash-autoloads org-crypt bind-key my-firestarter ob-ditaa
ob-shell shell ob-dot whiteboard-theme server ido help-at-pt allout
cus-load define org-duration org-clock advice windmove easy-mmode time
org-agenda org-element org-persist xdg org-id avl-tree generator tabify
appt gnus-icalendar org-capture org-refile org ob ob-tangle ob-ref
ob-lob ob-table ob-exp org-macro org-src ob-comint org-pcomplete
pcomplete comint ansi-osc ansi-color ring org-list org-footnote
org-faces org-entities noutline outline icons ob-emacs-lisp ob-core
ob-eval org-cycle org-table ol rx org-fold org-fold-core org-keys oc
org-loaddefs find-func org-version org-compat org-macs format-spec
gnus-art mm-uu mml2015 mm-view mml-smime smime gnutls dig gnus-sum shr
pixel-fill kinsoku url-file svg dom browse-url url url-proxy url-privacy
url-expand url-methods url-history url-cookie generate-lisp-file
url-domsuf url-util url-parse auth-source json map url-vars gnus-group
gnus-undo gnus-start gnus-dbus dbus xml gnus-cloud nnimap nnmail
mail-source utf7 nnoo parse-time iso8601 gnus-spec gnus-int gnus-range
message sendmail mailcap yank-media puny dired dired-loaddefs rfc822 mml
mml-sec password-cache epa derived epg rfc6068 epg-config mailabbrev
mailheader gnus-win gnus nnheader gnus-util text-property-search
time-date mail-utils range wid-edit mm-decode mm-bodies mm-encode
mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums mail-prsvr
gmm-utils eieio byte-opt eieio-core icalendar diary-lib diary-loaddefs
cal-menu calendar cal-loaddefs use-package-autoloads bind-key-autoloads
info straight-autoloads cl-seq cl-extra help-mode straight subr-x
cl-macs gv cl-loaddefs cl-lib bytecomp byte-compile rmc iso-transl
tooltip cconv eldoc paren electric uniquify ediff-hook vc-hooks
lisp-float-type elisp-mode mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer nadvice seq
simple cl-generic indonesian philippine cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite emoji-zwj charscript charprop case-table
epa-hook jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button
loaddefs theme-loaddefs faces cus-face macroexp files window
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget keymap hashtable-print-readable backquote threads
xwidget-internal dbusbind inotify lcms2 dynamic-setting
system-font-setting font-render-setting cairo move-toolbar gtk x-toolkit
xinput2 x multi-tty make-network-process native-compile emacs)

Memory information:
((conses 16 802052 442929)
 (symbols 48 54570 10)
 (strings 32 265826 103942)
 (string-bytes 1 8283120)
 (vectors 16 131683)
 (vector-slots 8 4350011 2882978)
 (floats 8 2108 2629)
 (intervals 56 1888 1441)
 (buffers 984 17))

-- 
http://hexmode.com/

I cannot remember the books I've read any more than the meals I have eaten;
even so, they have made me.
            -- Ralph Waldo Emerson






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-14 21:02 bug#61514: 30.0.50; sadistically long xml line hangs emacs Mark A. Hershberger via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-14 22:05 ` Gregory Heytings
  2023-02-15  1:04   ` Mark A. Hershberger
  2023-02-18 16:22 ` Eli Zaretskii
  2023-02-19 23:38 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-14 22:05 UTC (permalink / raw)
  To: Mark A. Hershberger; +Cc: 61514


Thanks for your bug report, and for the detailed recipe!

>
> Bottom line: Emacs 30 is handling files with long lines worse than Emacs 
> 28.
>

You jump to a conclusion a bit too fast.  It's not a bug in Emacs in 
general, it's a bug in nXML, and specifically in its fontification 
routines.  Try the same recipe, but with fontification turned off:

emacs -Q
M-x global-font-lock-mode
C-x C-f a.xml

and you'll see that Emacs opens that file just fine, and that you can edit 
it normally.
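
For scripted runs, the recipe above can probably be reduced to a single
command line. The following is only a sketch, in the same language as
the test-file script: the helper names are hypothetical, and it assumes
that evaluating (global-font-lock-mode -1) before the file is visited
has the same effect as the interactive M-x toggle in a fresh "emacs -Q"
session.

```python
import shlex


def emacs_no_font_lock_command(xml_file="a.xml", emacs="emacs"):
    # -Q skips the init files.  --eval comes before the file name so
    # that font-lock is already disabled when the file is visited.
    return [emacs, "-Q", "--eval", "(global-font-lock-mode -1)", xml_file]


def as_shell_line(argv):
    # Quote each argument so the line can be pasted into a POSIX shell.
    return shlex.join(argv)
```

The list form can be passed to subprocess.run() directly, while
as_shell_line() is meant for pasting into a terminal.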







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-14 22:05 ` Gregory Heytings
@ 2023-02-15  1:04   ` Mark A. Hershberger
  2023-02-15  8:39     ` Gregory Heytings
  0 siblings, 1 reply; 75+ messages in thread
From: Mark A. Hershberger @ 2023-02-15  1:04 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 61514

On Tue, 2023-02-14 at 22:05 +0000, Gregory Heytings wrote:
> You jump to a conclusion a bit too fast.  It's not a bug in Emacs in 
> general, it's a bug in nXML, and specifically in its fontification 
> routines.  Try the same recipe, but with fontification turned off:
> 
> emacs -Q
> M-x global-font-lock-mode
> C-x C-f a.xml

The bug is not (only) in nXML.

If I run =emacs30 -Q= but alter the load path to use emacs28's nxml, it
still hangs when I try to load the file with the long line. (Yes, this
is a silly way to test if the bug is in nxml itself, but it was the
quickest I could come up with.)

In =emacs28 -Q=, when I try to load the file, I see various errors, but
it still loads the file and I can edit it.

Perhaps the bug is in fontification.

Perhaps the bug is in how emacs handles whatever errors it is
encountering when loading the file.

Whether it is a bug in emacs "in general", in nXML, or in glibc, the
point is that there is a regression in how emacs behaves when loading
this file with long lines.

It "worked" better in Emacs28.






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-15  1:04   ` Mark A. Hershberger
@ 2023-02-15  8:39     ` Gregory Heytings
  2023-02-15 10:24       ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-15 12:20       ` Dmitry Gutov
  0 siblings, 2 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-02-15  8:39 UTC (permalink / raw)
  To: Mark A. Hershberger; +Cc: 61514


>
> In =emacs28 -Q=, when I try to load the file, I see various errors, but it
> still loads the file and I can edit it.
>

Did you actually try to edit it?  Not just typing "abc" at the beginning 
of the line, but (as is described on the web page you pointed to) moving 
to the end of the line and performing editing operations there.

>
> The point is not whether it is a bug in emacs "in general", in nXML, or
> in glibc. The point is that there is a regression in how emacs behaves
> when loading this file with long lines.
>

Nobody is saying that there is no bug here.  I was only pointing out that 
the bug is not "in Emacs", it is in the fontification routines of a 
specific mode.  If you turn fontification off, you will see that Emacs 30 
behaves much better than Emacs 28.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-15  8:39     ` Gregory Heytings
@ 2023-02-15 10:24       ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-15 10:41         ` Gregory Heytings
  2023-02-15 12:20       ` Dmitry Gutov
  1 sibling, 1 reply; 75+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-15 10:24 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: Mark A. Hershberger, 61514

Gregory Heytings <gregory@heytings.org> writes:

> Nobody is saying that there is no bug here.  I was only pointing out
> that the bug is not "in Emacs", it is in the fontification routines of
> a specific mode.  If you turn fontification off, you will see that
> Emacs 30 behaves much better than Emacs 28.

And when that mode is part of Emacs, the bug is in Emacs.






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-15 10:24       ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-15 10:41         ` Gregory Heytings
  2023-02-15 10:52           ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-15 10:41 UTC (permalink / raw)
  To: Po Lu; +Cc: Mark A. Hershberger, 61514


>> Nobody is saying that there is no bug here.  I was only pointing out 
>> that the bug is not "in Emacs", it is in the fontification routines of 
>> a specific mode.  If you turn fontification off, you will see that 
>> Emacs 30 behaves much better than Emacs 28.
>
> And when that mode is part of Emacs, the bug is in Emacs.
>

That remark misses the context and the point.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-15 10:41         ` Gregory Heytings
@ 2023-02-15 10:52           ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-15 10:59             ` Gregory Heytings
  0 siblings, 1 reply; 75+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-15 10:52 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: Mark A. Hershberger, 61514

Gregory Heytings <gregory@heytings.org> writes:

>>> Nobody is saying that there is no bug here.  I was only pointing
>>> out that the bug is not "in Emacs", it is in the fontification
>>> routines of a specific mode.  If you turn fontification off, you
>>> will see that Emacs 30 behaves much better than Emacs 28.
>>
>> And when that mode is part of Emacs, the bug is in Emacs.
>>
>
> That remark misses the context and the point.

The context is that you dismissed a bug in nxml because it cannot be
reproduced in fundamental-mode.

At least, that's what Mark will feel.






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-15 10:52           ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-15 10:59             ` Gregory Heytings
  2023-02-15 11:52               ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-15 10:59 UTC (permalink / raw)
  To: Po Lu; +Cc: Mark A. Hershberger, 61514


>>>> Nobody is saying that there is no bug here.  I was only pointing out 
>>>> that the bug is not "in Emacs", it is in the fontification routines 
>>>> of a specific mode.  If you turn fontification off, you will see that 
>>>> Emacs 30 behaves much better than Emacs 28.
>>>
>>> And when that mode is part of Emacs, the bug is in Emacs.
>>
>> That remark misses the context and the point.
>
> The context is that you dismissed a bug in nxml because it cannot be 
> reproduced in fundamental-mode.
>

That's not the context, no.  Nor did I dismiss a bug, read the first 
sentence above: "Nobody is saying that there is no bug here."  Nor did 
fundamental-mode play any role whatsoever in this discussion.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-15 10:59             ` Gregory Heytings
@ 2023-02-15 11:52               ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-15 12:11                 ` Gregory Heytings
  2023-02-15 13:56                 ` Eli Zaretskii
  0 siblings, 2 replies; 75+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-15 11:52 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: Mark A. Hershberger, 61514

Gregory Heytings <gregory@heytings.org> writes:

>>>>> Nobody is saying that there is no bug here.  I was only pointing
>>>>> out that the bug is not "in Emacs", it is in the fontification
>>>>> routines of a specific mode.  If you turn fontification off, you
>>>>> will see that Emacs 30 behaves much better than Emacs 28.
>>>>
>>>> And when that mode is part of Emacs, the bug is in Emacs.
>>>
>>> That remark misses the context and the point.
>>
>> The context is that you dismissed a bug in nxml because it cannot be
>> reproduced in fundamental-mode.
>>
>
> That's not the context, no.  Nor did I dismiss a bug, read the first
> sentence above: "Nobody is saying that there is no bug here."  Nor did
> fundamental-mode play any role whatsoever in this discussion.

Here is what you said:

  Nobody is saying that there is no bug here. I was only pointing out that
  the bug is not "in Emacs", it is in the fontification routines of a
  specific mode. If you turn fontification off, you will see that Emacs 30
  behaves much better than Emacs 28.

or, simplified,

  "Nobody is saying that there is no bug here.  There is no bug in
  Emacs."

Now, people will think we will not fix bugs in nxml.






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-15 11:52               ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-15 12:11                 ` Gregory Heytings
  2023-02-15 12:54                   ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-15 13:56                 ` Eli Zaretskii
  1 sibling, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-15 12:11 UTC (permalink / raw)
  To: Po Lu; +Cc: Mark A. Hershberger, 61514


>
> Here is what you said:
>
> Nobody is saying that there is no bug here. I was only pointing out that 
> the bug is not "in Emacs", it is in the fontification routines of a 
> specific mode. If you turn fontification off, you will see that Emacs 30 
> behaves much better than Emacs 28.
>
> or, simplified,
>
> "Nobody is saying that there is no bug here.  There is no bug in Emacs."
>

That is your (very) personal interpretation and "simplification".

>
> Now, people will think we will not fix bugs in nxml.
>

Was this bug perhaps marked as "not a bug" or "won't fix"?  If not, why 
would anyone think that "we will not fix bugs in nXML"?

Now, if you have something to contribute to diagnose or fix this bug, 
please do.  If you don't, I don't see what the point of your posts is.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-15  8:39     ` Gregory Heytings
  2023-02-15 10:24       ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-15 12:20       ` Dmitry Gutov
  2023-02-15 13:58         ` Gregory Heytings
  1 sibling, 1 reply; 75+ messages in thread
From: Dmitry Gutov @ 2023-02-15 12:20 UTC (permalink / raw)
  To: Gregory Heytings, Mark A. Hershberger; +Cc: 61514

On 15/02/2023 10:39, Gregory Heytings wrote:
>> The point is not whether it is a bug in emacs "in general", in nXML,
>> or in glibc. The point is that there is a regression in how emacs
>> behaves when loading this file with long lines.
>>
> 
> Nobody is saying that there is no bug here.  I was only pointing out 
> that the bug is not "in Emacs", it is in the fontification routines of a 
> specific mode.  If you turn fontification off, you will see that Emacs 
> 30 behaves much better than Emacs 28.

It sounds like the bug lies somewhere in the intersection of nXML and 
the new long line fontification handling in Emacs 29 (with narrowing, 
perhaps)?

Which part is ultimately "at fault", is a matter of perspective.

In practice, I guess it will depend on which of them is easier to fix.






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-15 12:11                 ` Gregory Heytings
@ 2023-02-15 12:54                   ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-15 13:31                     ` Gregory Heytings
  0 siblings, 1 reply; 75+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-15 12:54 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: Mark A. Hershberger, 61514

Gregory Heytings <gregory@heytings.org> writes:

> That is your (very) personal interpretation and "simplification".

Evidently, no, judging by Mark's reaction.  I quoted, verbatim, you
saying that it was not a bug in Emacs.

> Was this bug perhaps marked as "not a bug" or "won't fix"?  If not,
> why would anyone think that "we will not fix bugs in nXML"?

Because that's what you said.  ``It's a bug in [NXML], not in Emacs''.

> Now, if you have something to contribute to diagnose or fix this bug,
> please do.  If you don't, I don't see what the point of your posts is.

I was trying to reassure Mark that we are not going to dismiss this bug.






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-15 12:54                   ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-15 13:31                     ` Gregory Heytings
  0 siblings, 0 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-02-15 13:31 UTC (permalink / raw)
  To: Po Lu; +Cc: Mark A. Hershberger, 61514


Sorry, I won't continue this "discussion" with you, in which you feel free,
as usual, to attribute whatever meaning you want to what others say.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-15 11:52               ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-15 12:11                 ` Gregory Heytings
@ 2023-02-15 13:56                 ` Eli Zaretskii
  1 sibling, 0 replies; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-15 13:56 UTC (permalink / raw)
  To: Po Lu; +Cc: gregory, 61514, mah

> Cc: "Mark A. Hershberger" <mah@everybody.org>, 61514@debbugs.gnu.org
> Date: Wed, 15 Feb 2023 19:52:09 +0800
> From:  Po Lu via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> Now, people will think we will not fix bugs in nxml.

Please cool down.  No one said this is not a bug we'd like to fix.






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-15 12:20       ` Dmitry Gutov
@ 2023-02-15 13:58         ` Gregory Heytings
  2023-02-15 14:17           ` Eli Zaretskii
  0 siblings, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-15 13:58 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Mark A. Hershberger, 61514


>
> It sounds like the bug lies somewhere in the intersection of nXML and 
> the new long line fontification handling in Emacs 29 (with narrowing, 
> perhaps)?
>

Yes, it's an example in which the cure of narrowing around 
fontification-functions could be considered worse than the disease.  In 
this particular case however, the fontification routines already failed to 
do their job in Emacs 28 (and 24, 25, 26 and 27): after opening this file, 
errors are displayed in the echo area, and the buffer remains unfontified.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-15 13:58         ` Gregory Heytings
@ 2023-02-15 14:17           ` Eli Zaretskii
  2023-02-15 14:34             ` Gregory Heytings
  0 siblings, 1 reply; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-15 14:17 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: mah, 61514, dgutov

> Cc: "Mark A. Hershberger" <mah@everybody.org>, 61514@debbugs.gnu.org
> Date: Wed, 15 Feb 2023 13:58:37 +0000
> From: Gregory Heytings <gregory@heytings.org>
> 
> 
> >
> > It sounds like the bug lies somewhere in the intersection of nXML and 
> > the new long line fontification handling in Emacs 29 (with narrowing, 
> > perhaps)?
> 
> Yes, it's an example in which the cure of narrowing around 
> fontification-functions could be considered worse than the disease.  In 
> this particular case however, the fontification routines already failed to 
> do their job in Emacs 28 (and 24, 25, 26 and 27): after opening this file, 
> errors are displayed in the echo area, and the buffer remains unfontified.

If fontification-functions failed regardless of the restriction, then
perhaps fixing them so that they don't fail will also solve the
greater problem?

Btw, how does the narrowing make matters worse, exactly?  What is
the mechanism of worsening the situation in this case?






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-15 14:17           ` Eli Zaretskii
@ 2023-02-15 14:34             ` Gregory Heytings
  0 siblings, 0 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-02-15 14:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mah, 61514, dgutov


>>> It sounds like the bug lies somewhere in the intersection of nXML and 
>>> the new long line fontification handling in Emacs 29 (with narrowing, 
>>> perhaps)?
>>
>> Yes, it's an example in which the cure of narrowing around 
>> fontification-functions could be considered worse than the disease. 
>> In this particular case however, the fontification routines already 
>> failed to do their job in Emacs 28 (and 24, 25, 26 and 27): after 
>> opening this file, errors are displayed in the echo area, and the 
>> buffer remains unfontified.
>
> If fontification-functions failed regardless of the restriction, then 
> perhaps fixing them so that they don't fail will also solve the greater 
> problem?
>

That's what I hope, indeed.

>
> Btw, how does the narrowing make matters worse, exactly?  What is the
> mechanism of worsening the situation in this case?
>

I guess (but did not yet have time to investigate the issue in more 
detail) that it causes an infinite loop somewhere.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-14 21:02 bug#61514: 30.0.50; sadistically long xml line hangs emacs Mark A. Hershberger via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-14 22:05 ` Gregory Heytings
@ 2023-02-18 16:22 ` Eli Zaretskii
  2023-02-18 17:06   ` Mark A. Hershberger
                     ` (2 more replies)
  2023-02-19 23:38 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 3 replies; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-18 16:22 UTC (permalink / raw)
  To: Mark A. Hershberger; +Cc: 61514

> Date: Tue, 14 Feb 2023 16:02:04 -0500
> From:  "Mark A. Hershberger" via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> 
> There seems to be a regression between 28 and 30 with how emacs handles
> long lines.

No, there's no regression with long lines.  There's an existing bug in
our regexp routines and/or nxml.  See below.

> Bottom line: Emacs 30 is handling files with long lines worse than Emacs
> 28.

This conclusion is incorrect, or at least inaccurate.  Emacs 28.2 has
the same problem as Emacs 30.  Take that a.xml file, truncate it after
250000 characters, then visit it with Emacs 28.2 -- you will see that
Emacs 28.2 freezes exactly like Emacs 30 does.
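
For anyone who wants to reproduce the truncated case, here is a minimal
Python sketch of the truncation step.  The function name and the output
file name are hypothetical; it assumes a.xml is the pure-ASCII file
produced by the generator script in the original report, so that bytes
and characters coincide.

```python
def truncate_file(src, dst, nchars):
    """Copy the first `nchars` bytes of `src` into `dst`.

    Returns the number of bytes actually written, which is smaller
    than `nchars` only if `src` is shorter than that.
    """
    with open(src, "rb") as f:
        data = f.read(nchars)
    with open(dst, "wb") as f:
        f.write(data)
    return len(data)
```

Calling truncate_file("a.xml", "a-250k.xml", 250000) would then produce
the file to visit with Emacs 28.2.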

The problem is in the combination of nxml-mode and some subtle
bug/misfeature in our regexp routines.  Specifically, when we overflow
the fail stack, we fail to recover in this case, and seem to infloop
inside re_match_2_internal, or maybe recover very inefficiently (I
waited for almost 1 hour before giving up).  The call which causes the
loop is in xmltok.el, in the indicated line:

(defun xmltok-scan-attributes ()
  (let ((recovering nil)
	(atts-needing-normalization nil))
    (while (cond ((or (looking-at (xmltok-attribute regexp))
		      ;; use non-greedy group
		      (when (looking-at (concat "[^<>\n]+?"  <<<<<<<<<<<<<<<<<
						(xmltok-attribute regexp)))
			(unless recovering
			  (xmltok-add-error "Malformed attribute"
					    (point)
					    (save-excursion
					      (goto-char (xmltok-attribute start
									   name))
					      (skip-chars-backward "\r\n\t ")
					      (point))))
			t))

The regexp that causes this is as follows:

  "[^<>\n]+?\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-18 16:22 ` Eli Zaretskii
@ 2023-02-18 17:06   ` Mark A. Hershberger
  2023-02-18 17:58     ` Eli Zaretskii
  2023-02-18 23:06   ` Gregory Heytings
  2023-02-19 23:48   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 1 reply; 75+ messages in thread
From: Mark A. Hershberger @ 2023-02-18 17:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61514

Eli writes:
> The problem is in the combination of nxml-mode and some subtle
> bug/misfeature in our regexp routines.

Thank you for your work in tracking this down.

That regex looks pretty awful.






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-18 17:06   ` Mark A. Hershberger
@ 2023-02-18 17:58     ` Eli Zaretskii
  0 siblings, 0 replies; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-18 17:58 UTC (permalink / raw)
  To: Mark A. Hershberger; +Cc: 61514

> Date: Sat, 18 Feb 2023 09:06:25 -0800 (PST)
> From: "Mark A. Hershberger" <mah@nichework.com>
> Cc: 61514@debbugs.gnu.org
> 
> Eli writes:
> > The problem is in the combination of nxml-mode and some subtle
> > bug/misfeature in our regexp routines.
> 
> Thank you for your work in tracking this down.
> 
> That regex looks pretty awful.

Yep.  And coupled with quadratic (or worse?) behavior of our
backtracking regexp engine, it's a killer.
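
To give a feeling for what "quadratic" means here, the following Python
sketch counts the work done by a deliberately naive backtracking search.
It is a toy model, not the Emacs regexp engine: it assumes a pattern of
the shape "one or more filler characters (non-greedy), then a name, then
=" applied to a line of n name characters that contains no "=" at all,
and the function name is made up for illustration.

```python
def count_steps(n):
    """Count character tests in a toy backtracking search.

    The lazy prefix is extended by one character per retry.  For each
    prefix length, the name greedily consumes the rest of the line, and
    the matcher then backs off one character at a time looking for '='.
    """
    steps = 0
    for start in range(1, n):      # length of the lazy prefix tried so far
        name_len = n - start       # the name swallows everything that is left
        steps += name_len          # tests made while consuming the name
        steps += name_len          # one failed '=' test per backtracked length
    return steps


for n in (1000, 2000, 4000):
    print(n, count_steps(n))
```

In this model, doubling the line length roughly quadruples the number of
steps, so a line that is a hundred times longer costs about ten thousand
times more.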






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-18 16:22 ` Eli Zaretskii
  2023-02-18 17:06   ` Mark A. Hershberger
@ 2023-02-18 23:06   ` Gregory Heytings
  2023-02-19  0:46     ` Gregory Heytings
  2023-02-19 23:48   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-18 23:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Mark A. Hershberger, 61514


Interestingly, the following simple patch fixes both the original and the 
truncated cases:

diff --git a/src/regex-emacs.c b/src/regex-emacs.c
index 2dca0d16ad9..eb943df46f0 100644
--- a/src/regex-emacs.c
+++ b/src/regex-emacs.c
@@ -877,7 +877,7 @@ #define INIT_FAILURE_ALLOC 20
     whose default stack limit is 2mb.  In order for a larger
     value to work reliably, you have to try to make it accord
     with the process stack limit.  */
-ptrdiff_t emacs_re_max_failures = 40000;
+ptrdiff_t emacs_re_max_failures = 37499;

  union fail_stack_elt
  {

I obtained the magical 37499 value by bisecting.  Both cases fail with 
37500 (or higher), and work as expected (i.e. they fail with "Stack 
overflow in regexp matcher") with 37499.  I don't know why exactly, but I 
note that:

37499 * 8 = 299992 and 37500 * 8 = 300000 (where 8 is sizeof (fail_stack_elt_t))

37499 * 20 * 8 = 5999840 and 37500 * 20 * 8 = 6000000 (where 20 is TYPICAL_FAILURE_SIZE)

so it seems that there is a kind of limit at exactly 6000000 bytes?
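
The arithmetic above is easy to redo for other candidate limits with a
few lines of Python.  This is only a sketch: the two constants are the
values quoted in this message, and the helper name is hypothetical, not
something that exists in regex-emacs.c.

```python
SIZEOF_FAIL_STACK_ELT = 8    # sizeof (fail_stack_elt_t), as quoted above
TYPICAL_FAILURE_SIZE = 20    # as quoted above


def fail_stack_bytes(max_failures):
    """Bytes implied by a given emacs_re_max_failures value."""
    return max_failures * TYPICAL_FAILURE_SIZE * SIZEOF_FAIL_STACK_ELT


for limit in (37499, 37500, 40000):
    print(limit, fail_stack_bytes(limit))
```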







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-18 23:06   ` Gregory Heytings
@ 2023-02-19  0:46     ` Gregory Heytings
  2023-02-19  6:42       ` Eli Zaretskii
  0 siblings, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-19  0:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Mark A. Hershberger, 61514


>
> Interestingly, the following simple patch fixes both the original and 
> the truncated cases:
>
> diff --git a/src/regex-emacs.c b/src/regex-emacs.c
> index 2dca0d16ad9..eb943df46f0 100644
> --- a/src/regex-emacs.c
> +++ b/src/regex-emacs.c
> @@ -877,7 +877,7 @@ #define INIT_FAILURE_ALLOC 20
>    whose default stack limit is 2mb.  In order for a larger
>    value to work reliably, you have to try to make it accord
>    with the process stack limit.  */
> -ptrdiff_t emacs_re_max_failures = 40000;
> +ptrdiff_t emacs_re_max_failures = 37499;
>
> union fail_stack_elt
> {
>

After some further investigation, that's probably not TRT to do here. 
With a file truncated to 100000 characters, the same bug happens with 
emacs_re_max_failures >= 15000, and disappears with emacs_re_max_failures 
<= 14999.  Hmmm...
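
Putting the two data points side by side may help.  The Python sketch
below only restates the numbers from this thread; TYPICAL_FAILURE_SIZE =
20 is taken from the previous message, and reading the result as
"fail-stack elements per character" is an assumption, not something
verified against regex-emacs.c.

```python
TYPICAL_FAILURE_SIZE = 20  # value quoted in the previous message

# (file length in characters, smallest emacs_re_max_failures that hangs)
observations = [
    (250000, 37500),
    (100000, 15000),
]


def elements_per_char(file_chars, threshold):
    """Fail-stack elements available per input character at the threshold."""
    return threshold * TYPICAL_FAILURE_SIZE / file_chars


for file_chars, threshold in observations:
    print(file_chars, threshold,
          threshold / file_chars,
          elements_per_char(file_chars, threshold))
```

In both cases the threshold is the same fraction of the file length,
which would suggest that it tracks the length of the input rather than a
fixed byte limit.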







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-19  0:46     ` Gregory Heytings
@ 2023-02-19  6:42       ` Eli Zaretskii
  2023-02-19 23:12         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-19 23:48         ` Gregory Heytings
  0 siblings, 2 replies; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-19  6:42 UTC (permalink / raw)
  To: Gregory Heytings, Stefan Monnier; +Cc: mah, 61514

> Date: Sun, 19 Feb 2023 00:46:05 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: "Mark A. Hershberger" <mah@everybody.org>, 61514@debbugs.gnu.org
> 
> > Interestingly, the following simple patch fixes both the original and 
> > the truncated cases:
> >
> > diff --git a/src/regex-emacs.c b/src/regex-emacs.c
> > index 2dca0d16ad9..eb943df46f0 100644
> > --- a/src/regex-emacs.c
> > +++ b/src/regex-emacs.c
> > @@ -877,7 +877,7 @@ #define INIT_FAILURE_ALLOC 20
> >    whose default stack limit is 2mb.  In order for a larger
> >    value to work reliably, you have to try to make it accord
> >    with the process stack limit.  */
> > -ptrdiff_t emacs_re_max_failures = 40000;
> > +ptrdiff_t emacs_re_max_failures = 37499;
> >
> > union fail_stack_elt
> > {
> >
> 
> After some further investigation, that's probably not TRT to do here. 
> With a file truncated to 100000 characters, the same bug happens with 
> emacs_re_max_failures >= 15000, and disappears with emacs_re_max_failures 
> <= 14999.  Hmmm...

I'm not surprised.  There's something weird going on there.  Do you
understand the logic in this snippet near the end of
re_match_2_internal:

    /* We goto here if a matching operation fails. */
    fail:
      maybe_quit ();
      if (!FAIL_STACK_EMPTY ())
	{
	  [...]
	}
      else
	break;   /* Matching at this starting point really fails.  */
    } /* for (;;) */

  if (best_regs_set)
    goto restore_best_regs;

  unbind_to (count, Qnil);
  SAFE_FREE ();

  if (max_redisplay_ticks > 0 && nchars > 0)
    update_redisplay_ticks (nchars / 50 + 1, NULL);

  return -1;				/* Failure to match.  */

What is the mechanism to empty the failure stack, which eventually
causes us to report a failure?  What I see is that the stack is either
not being emptied, or being emptied very slowly.  Do the "magic"
numbers you came up with explain that?

Maybe we should devise some mechanism whereby re_match_2_internal
forcibly returns a failure after too much backtracking (if that is what
happens here), when called from redisplay?

Stefan, any ideas?






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-19  6:42       ` Eli Zaretskii
@ 2023-02-19 23:12         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-19 23:48         ` Gregory Heytings
  1 sibling, 0 replies; 75+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-19 23:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Gregory Heytings, 61514, mah

> I'm not surprised.  There's something weird going on there.  Do you
> understand the logic in this snippet near the end of
> re_match_2_internal:

I should understand it, because I think I wrote (or at least
significantly changed) this part (20 years ago, maybe?).

>     /* We goto here if a matching operation fails. */
>     fail:
>       maybe_quit ();
>       if (!FAIL_STACK_EMPTY ())
> 	{
> 	  [...]
> 	}
>       else
> 	break;   /* Matching at this starting point really fails.  */
>     } /* for (;;) */
>
>   if (best_regs_set)
>     goto restore_best_regs;
>
>   unbind_to (count, Qnil);
>   SAFE_FREE ();
>
>   if (max_redisplay_ticks > 0 && nchars > 0)
>     update_redisplay_ticks (nchars / 50 + 1, NULL);
>
>   return -1;				/* Failure to match.  */
>
> What is the mechanism to empty the failure stack, which eventually
> causes us to report a failure?

It's `POP_FAILURE_POINT` done soon after testing `FAIL_STACK_EMPTY`.

> Maybe we should devise some mechanism whereby re_match_2_internal
> forcibly returns a failure after too much backtracking (if that is what
> happens here), when called from redisplay?
>
> Stefan, any ideas?

I don't understand the problem well enough yet, sorry.


        Stefan







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-14 21:02 bug#61514: 30.0.50; sadistically long xml line hangs emacs Mark A. Hershberger via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-14 22:05 ` Gregory Heytings
  2023-02-18 16:22 ` Eli Zaretskii
@ 2023-02-19 23:38 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 12:41   ` Eli Zaretskii
  2 siblings, 1 reply; 75+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-19 23:38 UTC (permalink / raw)
  To: Mark A. Hershberger; +Cc: 61514

> Opening the file (a.xml) produced by the script above from a dired
> buffer in Emacs 30.0.50 shows the following in the message window:
>
>     RNG NXML error: (error "Stack overflow in regexp matcher")

That's "good": much better than a freeze.

It points to the use of a regexp pattern somewhere which doesn't fall
into the small subset which our regexp engine handles efficiently, in
which case we get typically one stack element pushed per character, so
if the text is long enough we inevitably bump into the limit of our
regexp-stack depth.
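
As an illustration of "one stack element pushed per character", here is
a toy matcher in Python with an explicit failure stack and a depth
limit.  It is a sketch under stated assumptions, not a description of
regex-emacs.c: it handles only patterns of the shape `[allowed]*`
followed by a literal terminator, and all names in it are hypothetical.

```python
class StackOverflowInMatcher(Exception):
    """Raised when the toy failure stack exceeds its depth limit."""


def match_star_then(text, allowed, terminator, max_depth):
    """Match `[allowed]*terminator` at the start of `text`.

    Returns the end position of the match, or None if there is none.
    Every character consumed by the greedy loop pushes one entry on
    the failure stack, so the depth needed grows with the input.
    """
    fail_stack = []
    pos = 0
    # Greedy loop: remember where to resume before consuming each character.
    while pos < len(text) and text[pos] in allowed:
        if len(fail_stack) >= max_depth:
            raise StackOverflowInMatcher("Stack overflow in regexp matcher")
        fail_stack.append(pos)
        pos += 1
    # Try the terminator; on failure, pop and retry one character earlier.
    while True:
        if text.startswith(terminator, pos):
            return pos + len(terminator)
        if not fail_stack:
            return None
        pos = fail_stack.pop()
```

With a limit of 10, match_star_then("aaab", "a", "b", 10) succeeds, while
the same call on a run of "a" longer than the limit raises the overflow
error before the terminator is even tried.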

We should look at the regex and try to rewrite it in a way that fits
better within the limits of our regexp matcher.

> After this, Emacs appears to hang and nothing else is displayed.

That's a second and separate bug (tho probably triggered by the first).
These tend to be nastier to diagnose.
It may also come from a poor regexp (except one where the problem is
not just the backtracking depth but the resulting algorithmic
complexity which can be up to exponential :-( ), but not necessarily.

> Bottom line: Emacs 30 is handling files with long lines worse than Emacs 28.

:-)

As you may have seen by now, this just triggers defensive reactions.


        Stefan







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-18 16:22 ` Eli Zaretskii
  2023-02-18 17:06   ` Mark A. Hershberger
  2023-02-18 23:06   ` Gregory Heytings
@ 2023-02-19 23:48   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 12:19     ` Eli Zaretskii
  2 siblings, 1 reply; 75+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-19 23:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Mark A. Hershberger, 61514

> The problem is in the combination of nxml-mode and some subtle
> bug/misfeature in our regexp routines.  Specifically, when we overflow
> the fail stack, we fail to recover in this case, and seem to infloop
> inside re_match_2_internal, or maybe recover very inefficiently (I
> waited for almost 1 hour before giving up).  The call which causes the
> loop is in xmltok.el, in the indicated line:
>
> (defun xmltok-scan-attributes ()
>   (let ((recovering nil)
> 	(atts-needing-normalization nil))
>     (while (cond ((or (looking-at (xmltok-attribute regexp))
> 		      ;; use non-greedy group
> 		      (when (looking-at (concat "[^<>\n]+?"  <<<<<<<<<<<<<<<<<
> 						(xmltok-attribute regexp)))
> 			(unless recovering
> 			  (xmltok-add-error "Malformed attribute"
> 					    (point)
> 					    (save-excursion
> 					      (goto-char (xmltok-attribute start
> 									   name))
> 					      (skip-chars-backward "\r\n\t ")
> 					      (point))))
> 			t))
>
> The regexp that causes this is as follows:
>
>   "[^<>\n]+?\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"

IIUC the above describes the code where we're stuck inf-looping inside
`looking-at`?

Is it the same place where the regexp-stack overflow happens (and with
the same regexp)?


        Stefan







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-19  6:42       ` Eli Zaretskii
  2023-02-19 23:12         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-19 23:48         ` Gregory Heytings
  2023-02-19 23:58           ` Gregory Heytings
  2023-02-20  0:14           ` Gregory Heytings
  1 sibling, 2 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-02-19 23:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mah, 61514, Stefan Monnier


>
> I'm not surprised.  There's something weird going on there.  Do you 
> understand the logic in this snippet near the end of 
> re_match_2_internal:
>
>    /* We goto here if a matching operation fails. */
>    fail:
>      maybe_quit ();
>      if (!FAIL_STACK_EMPTY ())
> 	{
> 	  [...]
> 	}
>      else
> 	break;   /* Matching at this starting point really fails.  */
>    } /* for (;;) */
>
>  if (best_regs_set)
>    goto restore_best_regs;
>
>  unbind_to (count, Qnil);
>  SAFE_FREE ();
>
>  if (max_redisplay_ticks > 0 && nchars > 0)
>    update_redisplay_ticks (nchars / 50 + 1, NULL);
>
>  return -1;				/* Failure to match.  */
>
> What is the mechanism to empty the failure stack, which eventually 
> causes us to report a failure?  What I see is that the stack is either 
> not being emptied, or being emptied very slowly.  Do the "magic" numbers 
> you came up with explain that?
>

As Stefan just said, it's POP_FAILURE_POINT which reduces the failure 
stack and restarts the search (if appropriate).

After more investigation (and trying to make sense of the magical 
numbers), my conclusion is that there is most probably no bug in the 
regexp engine, and that the sole culprit here is the regexp in nXML.  I 
truncated the file to only 10k characters: it opens after a few seconds. 
Then I added 10k characters at a time, and opening the file took more and 
more time, but eventually succeeded.  I stopped at 50k characters, where 
opening the file took something like two minutes.  By extrapolation, 
opening the file truncated to 250k characters should take a year or so ;-)
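
For anyone who wants to repeat this kind of measurement, here is a
minimal sketch that adapts the generator script from the original report
to write truncated test files.  The file names, the list of sizes and the
`closed` switch are assumptions made for illustration; the thread does not
say how the file was truncated, nor whether the closing `">` was kept.

```python
#!/usr/bin/python3
"""Write test files like a.xml, truncated to a chosen attribute length."""


def make_xml_line(name_len, closed=True):
    """Return '<id name="nnn..."' with NAME_LEN characters in the value.

    CLOSED is a hypothetical switch: when false, the closing quote and
    '>' are left out, as a plain byte-wise truncation would do.
    """
    line = '<id name="' + "n" * name_len
    if closed:
        line += '">\n'
    return line


def write_test_files(sizes=(10_000, 20_000, 30_000, 40_000, 50_000)):
    """Write one file per size and return the list of file names."""
    paths = []
    for size in sizes:
        path = f"a-{size}.xml"  # hypothetical naming scheme
        with open(path, "w") as f:
            f.write(make_xml_line(size))
        paths.append(path)
    return paths
```

Calling write_test_files() produces a-10000.xml through a-50000.xml, which
can then be opened one by one while timing how long nXML takes.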

Lowering emacs_re_max_failures just makes the regexp engine fail earlier, 
because there is not enough room in the failure stack.  In a sense it is 
better to fail earlier, but to do that in all cases, we would have to 
lower emacs_re_max_failures say to 10000, which I guess wouldn't be good 
because then it would fail too often.






^ permalink raw reply	[flat|nested] 75+ messages in thread

* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-19 23:48         ` Gregory Heytings
@ 2023-02-19 23:58           ` Gregory Heytings
  2023-02-20  2:05             ` Gregory Heytings
  2023-02-20 12:31             ` Eli Zaretskii
  2023-02-20  0:14           ` Gregory Heytings
  1 sibling, 2 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-02-19 23:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mah, 61514, Stefan Monnier


>
> Lowering emacs_re_max_failures just makes the regexp engine fail 
> earlier, because there is not enough room in the failure stack.  In a 
> sense it is better to fail earlier, but to do that in all cases, we 
> would have to lower emacs_re_max_failures say to 10000, which I guess 
> wouldn't be good because then it would fail too often.
>

BTW, this makes me wonder why emacs_re_max_failures is not accessible from 
Elisp.  I think it would be very useful, if only for debugging purposes. 
And perhaps let-binding it to a lower value around some potentially (or 
actually) problematic regexps would be a good way to prevent or fix bugs 
such as the current one.






^ permalink raw reply	[flat|nested] 75+ messages in thread

* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-19 23:48         ` Gregory Heytings
  2023-02-19 23:58           ` Gregory Heytings
@ 2023-02-20  0:14           ` Gregory Heytings
  2023-02-20 12:32             ` Eli Zaretskii
  1 sibling, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-20  0:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mah, 61514, Stefan Monnier


>
> In a sense it is better to fail earlier, but to do that in all cases, we 
> would have to lower emacs_re_max_failures say to 10000, which I guess 
> wouldn't be good because then it would fail too often.
>

Out of curiosity, I just bootstrapped Emacs with emacs_re_max_failures = 
10000.  make and make check succeed, except one test: regex-repeat-limit.

With emacs_re_max_failures = 19661 or higher that test succeeds.  I don't 
know how important it is to allow x\\{65535\\}.






^ permalink raw reply	[flat|nested] 75+ messages in thread

* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-19 23:58           ` Gregory Heytings
@ 2023-02-20  2:05             ` Gregory Heytings
  2023-02-20  4:24               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 12:33               ` Eli Zaretskii
  2023-02-20 12:31             ` Eli Zaretskii
  1 sibling, 2 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-02-20  2:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mah, 61514, Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 1392 bytes --]


>> Lowering emacs_re_max_failures just makes the regexp engine fail 
>> earlier, because there is not enough room in the failure stack.  In a 
>> sense it is better to fail earlier, but to do that in all cases, we 
>> would have to lower emacs_re_max_failures say to 10000, which I guess 
>> wouldn't be good because then it would fail too often.
>
> BTW, this makes me wonder why emacs_re_max_failures is not accessible 
> from Elisp.  I think it would be very useful, if only for debugging 
> purposes. And perhaps let-binding it to a lower value around some 
> potentially (or actually) problematic regexps would be a good way to 
> prevent or fix bugs such as the current one.
>

Looking at the history of that variable, which is in fact a compile-time 
constant, I see that it was initially (May 1995) set to 200000.  A few 
months later (Nov 1995) it was set to 20000, and reduced again (apparently 
because of bug reports) to 8000 and to 4000 (both in Jun 1996).  Two 
months later it was again set to 20000 (Aug 1996), and a year later to 
40000 (Dec 1997).  It kept that value since then.  As these changes (and 
this bug report) demonstrate, it is not possible to give that variable a 
"one size fits all" value.

Here's a patch that makes it modifiable, and "fixes" (in the sense of 
failing with a "Stack overflow in regexp matcher" instead of inflooping) 
the current bug.

WDYT?

[-- Attachment #2: Make-the-number-of-failure-points-in-regexp-searches.patch --]
[-- Type: text/x-diff, Size: 6256 bytes --]

From 0a58969670637cb4f065ae619531f78a11ea9bdb Mon Sep 17 00:00:00 2001
From: Gregory Heytings <gregory@heytings.org>
Date: Mon, 20 Feb 2023 01:47:28 +0000
Subject: [PATCH] Make the number of failure points in regexp searches
 modifiable

* src/search.c (syms_of_search) <regexp-max-failures>: New
variable, replacing the constant variable 'emacs_re_max_failures'.
Initialize it with the constant 'max_regexp_max_failures'.

* src/regex-emacs.h: Replace the external definition of
'emacs_re_max_failures' with the constant
'max_regexp_max_failures'.

* src/regex-emacs.c (GROW_FAIL_STACK): Use the new variable
instead of the constant.  Reset it to its maximum value if
necessary.

* src/emacs.c (main): Use the new constant
'max_regexp_max_failures'.

* lisp/nxml/xmltok.el (xmltok-scan-attributes): Bind
'regexp-max-failures' to a small value.  Fixes bug#61514.
---
 lisp/nxml/xmltok.el |  3 ++-
 src/emacs.c         |  4 ++--
 src/regex-emacs.c   | 23 ++++++++---------------
 src/regex-emacs.h   |  4 ++--
 src/search.c        |  4 ++++
 5 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/lisp/nxml/xmltok.el b/lisp/nxml/xmltok.el
index c36d225c7c9..32d6dfd39c1 100644
--- a/lisp/nxml/xmltok.el
+++ b/lisp/nxml/xmltok.el
@@ -731,7 +731,8 @@ xmltok-scan-after-comment-open
 
 (defun xmltok-scan-attributes ()
   (let ((recovering nil)
-	(atts-needing-normalization nil))
+	(atts-needing-normalization nil)
+	(regexp-max-failures 1000))
     (while (cond ((or (looking-at (xmltok-attribute regexp))
 		      ;; use non-greedy group
 		      (when (looking-at (concat "[^<>\n]+?"
diff --git a/src/emacs.c b/src/emacs.c
index 214e2e2a296..eb978c9de23 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -1499,7 +1499,7 @@ main (int argc, char **argv)
       rlim_t lim = rlim.rlim_cur;
 
       /* Approximate the amount regex-emacs.c needs per unit of
-	 emacs_re_max_failures, then add 33% to cover the size of the
+	 max_regexp_max_failures, then add 33% to cover the size of the
 	 smaller stacks that regex-emacs.c successively allocates and
 	 discards on its way to the maximum.  */
       int min_ratio = 20 * sizeof (char *);
@@ -1514,7 +1514,7 @@ main (int argc, char **argv)
 
       if (try_to_grow_stack)
 	{
-	  rlim_t newlim = emacs_re_max_failures * ratio + extra;
+	  rlim_t newlim = max_regexp_max_failures * ratio + extra;
 
 	  /* Round the new limit to a page boundary; this is needed
 	     for Darwin kernel 15.4.0 (see Bug#23622) and perhaps
diff --git a/src/regex-emacs.c b/src/regex-emacs.c
index 2dca0d16ad9..87d4d5cd434 100644
--- a/src/regex-emacs.c
+++ b/src/regex-emacs.c
@@ -868,17 +868,6 @@ print_double_string (re_char *where, re_char *string1, ptrdiff_t size1,
    space, so it is not a hard limit.  */
 #define INIT_FAILURE_ALLOC 20
 
-/* Roughly the maximum number of failure points on the stack.  Would be
-   exactly that if failure always used TYPICAL_FAILURE_SIZE items.
-   This is a variable only so users of regex can assign to it; we never
-   change it ourselves.  We always multiply it by TYPICAL_FAILURE_SIZE
-   before using it, so it should probably be a byte-count instead.  */
-/* Note that 4400 was enough to cause a crash on Alpha OSF/1,
-   whose default stack limit is 2mb.  In order for a larger
-   value to work reliably, you have to try to make it accord
-   with the process stack limit.  */
-ptrdiff_t emacs_re_max_failures = 40000;
-
 union fail_stack_elt
 {
   re_char *pointer;
@@ -912,7 +901,7 @@ #define INIT_FAIL_STACK()						\
 
 
 /* Double the size of FAIL_STACK, up to a limit
-   which allows approximately 'emacs_re_max_failures' items.
+   which allows approximately 'Vregexp_max_failures' items.
 
    Return 1 if succeeds, and 0 if either ran out of memory
    allocating space for it or it was already too large.
@@ -926,16 +915,20 @@ #define INIT_FAIL_STACK()						\
 #define FAIL_STACK_GROWTH_FACTOR 4
 
 #define GROW_FAIL_STACK(fail_stack)					\
-  (((fail_stack).size >= emacs_re_max_failures * TYPICAL_FAILURE_SIZE)        \
+  ((Vregexp_max_failures =						\
+    Vregexp_max_failures < 0						\
+    || Vregexp_max_failures > max_regexp_max_failures ?			\
+    max_regexp_max_failures : Vregexp_max_failures),			\
+   ((fail_stack).size >= Vregexp_max_failures * TYPICAL_FAILURE_SIZE)   \
    ? 0									\
    : ((fail_stack).stack						\
       = REGEX_REALLOCATE ((fail_stack).stack,				\
 	  (fail_stack).avail * sizeof (fail_stack_elt_t),		\
-          min (emacs_re_max_failures * TYPICAL_FAILURE_SIZE,                  \
+          min (Vregexp_max_failures * TYPICAL_FAILURE_SIZE,             \
                ((fail_stack).size * FAIL_STACK_GROWTH_FACTOR))          \
           * sizeof (fail_stack_elt_t)),                                 \
       ((fail_stack).size						\
-       = (min (emacs_re_max_failures * TYPICAL_FAILURE_SIZE,		\
+       = (min (Vregexp_max_failures * TYPICAL_FAILURE_SIZE,		\
 	       ((fail_stack).size * FAIL_STACK_GROWTH_FACTOR)))),	\
       1))
 
diff --git a/src/regex-emacs.h b/src/regex-emacs.h
index 1bc973363e9..16bd7da094b 100644
--- a/src/regex-emacs.h
+++ b/src/regex-emacs.h
@@ -49,8 +49,8 @@ #define EMACS_REGEX_H 1
    TODO: turn into an actual function parameter.  */
 extern Lisp_Object re_match_object;
 
-/* Roughly the maximum number of failure points on the stack.  */
-extern ptrdiff_t emacs_re_max_failures;
+/* Maximum value for Vregexp_max_failures.  */
+#define max_regexp_max_failures 40000
 
 /* Amount of memory that we can safely stack allocate.  */
 extern ptrdiff_t emacs_re_safe_alloca;
diff --git a/src/search.c b/src/search.c
index 0bb52c03eef..e3360eb3a86 100644
--- a/src/search.c
+++ b/src/search.c
@@ -3431,6 +3431,10 @@ syms_of_search (void)
 is to bind it with `let' around a small expression.  */);
   Vinhibit_changing_match_data = Qnil;
 
+  DEFVAR_INT ("regexp-max-failures", Vregexp_max_failures,
+	      doc: /* Maximum number of failures points in a regexp search.  */);
+  Vregexp_max_failures = max_regexp_max_failures;
+
   defsubr (&Slooking_at);
   defsubr (&Sposix_looking_at);
   defsubr (&Sstring_match);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20  2:05             ` Gregory Heytings
@ 2023-02-20  4:24               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 11:28                 ` Gregory Heytings
  2023-02-20 12:33               ` Eli Zaretskii
  1 sibling, 1 reply; 75+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-20  4:24 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: Eli Zaretskii, 61514, mah

> Looking at the history of that variable, which is in fact a compile-time
> constant, I see that it was initially (May 1995) set to 200000.  A few
> months later (Nov 1995) it was set to 20000, and reduced again (apparently
> because of bug reports) to 8000 and to 4000 (both in Jun 1996).  Two months
> later it was again set to 20000 (Aug 1996), and a year later to 40000 (Dec
> 1997).  It kept that value since then.  As these changes (and this bug
> report) demonstrate, it is not possible to give that variable a "one size
> fits all" value.

Note that the stack is allocated with `SAFE_ALLOCA` and used to be
allocated with just `alloca`.  So the constant was probably reduced
(back in the 90s) in response to reports of segfaults due to
C stack overflows.

Nowadays we should be hopefully(?) safe from such segfaults since
`SAFE_ALLOCA` only uses `alloca` for smallish allocations.

> @@ -731,7 +731,8 @@ xmltok-scan-after-comment-open
>  
>  (defun xmltok-scan-attributes ()
>    (let ((recovering nil)
> -	(atts-needing-normalization nil))
> +	(atts-needing-normalization nil)
> +	(regexp-max-failures 1000))
>      (while (cond ((or (looking-at (xmltok-attribute regexp))
>  		      ;; use non-greedy group
>  		      (when (looking-at (concat "[^<>\n]+?"

This really needs a comment (at least one referring to this bug report).
I think the idea is that we hope the regexp will need at most one stack
entry per character, so the above means that we're willing to limit the
regexp search to about 1kB of text, which sounds fair given it's
supposed to match just a single XML attribute.

> +  DEFVAR_INT ("regexp-max-failures", Vregexp_max_failures,
> +	      doc: /* Maximum number of failures points in a regexp search.  */);
> +  Vregexp_max_failures = max_regexp_max_failures;

This name is misleading.  It suggests it's talking about how many times
we fail, whereas the reality is that it's about the number of pending
branches in the search space (which the source code calls "failure
points" because it's info to be used in case the current branch fails
to match).  It could also be described as the number of "pending
continuations" or "stacked failure continuations" or some wording
like that.

But for the var name itself, how 'bout `regexp-max-backtracking-depth`?


        Stefan






^ permalink raw reply	[flat|nested] 75+ messages in thread

* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20  4:24               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-20 11:28                 ` Gregory Heytings
  0 siblings, 0 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-02-20 11:28 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 61514, mah

[-- Attachment #1: Type: text/plain, Size: 2385 bytes --]


>> Looking at the history of that variable, which is in fact a 
>> compile-time constant, I see that it was initially (May 1995) set to 
>> 200000.  A few months later (Nov 1995) it was set to 20000, and reduced 
>> again (apparently because of bug reports) to 8000 and to 4000 (both in 
>> Jun 1996).  Two months later it was again set to 20000 (Aug 1996), and 
>> a year later to 40000 (Dec 1997).  It kept that value since then.  As 
>> these changes (and this bug report) demonstrate, it is not possible to 
>> give that variable a "one size fits all" value.
>
> Note that the stack is allocated with `SAFE_ALLOCA` and used to be 
> allocated with just `alloca`.  So the constant was probably reduced 
> (back in the 90s) in response to reports of segfaults due to C stack 
> overflows.
>

Indeed.  But now that we use SAFE_ALLOCA, we fall back to malloc when there
is not enough room for an alloca, so the constant seems even more
arbitrary.

>
> Nowadays we should be hopefully(?) safe from such segfaults since 
> `SAFE_ALLOCA` only uses `alloca` for smallish allocations.
>

That's not the case in regex-emacs.c: REGEX_USE_SAFE_ALLOCA sets sa_avail 
to emacs_re_safe_alloca (~6 MiB) instead of its default MAX_ALLOCA value 
(16 KiB).

>
> This really needs a comment (at least one referring to this bug report). 
> I think the idea is that we hope the regexp will need at most one stack 
> entry per character, so the above means that we're willing to limit the 
> regexp search to about 1kB of text, which sounds fair given it's 
> supposed to match just a single XML attribute.
>

Indeed, thanks!

>> +  DEFVAR_INT ("regexp-max-failures", Vregexp_max_failures,
>> +	      doc: /* Maximum number of failures points in a regexp search.  */);
>> +  Vregexp_max_failures = max_regexp_max_failures;
>
> This name is misleading.  It suggests it's talking about how many times 
> we fail, whereas the reality is that it's about the number of pending 
> branches in the search space (which the source code calls "failure 
> points" because it's info to be used in case the current branch fails to 
> match).  It could also be described as the number of "pending 
> continuations" or "stacked failure continuations" or some wording like 
> that.
>
> But for the var name itself, how 'bout `regexp-max-backtracking-depth`?
>

Indeed again, and thanks again!

Updated patch attached.
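
As a reading aid, the clamping and growth policy of the patched
GROW_FAIL_STACK macro can be modelled in a few lines of Python.  This is
only a sketch of the arithmetic, not the C code itself: the function names
are invented for illustration, and TYPICAL_FAILURE_SIZE = 20 is an
assumption (the macro's definition is not shown in the diff); the other
constants are taken from the diff.

```python
# Constants as they appear in the diff, except TYPICAL_FAILURE_SIZE,
# whose value is assumed here.
TYPICAL_FAILURE_SIZE = 20      # assumed: stack elements per failure point
INIT_FAILURE_ALLOC = 20        # initial stack size, in elements
FAIL_STACK_GROWTH_FACTOR = 4
MAX_DEPTH = 40000              # max_regexp_max_backtracking_depth


def effective_depth(requested):
    """Reset non-positive or too-large values to the maximum."""
    if requested <= 0 or requested > MAX_DEPTH:
        return MAX_DEPTH
    return requested


def grow_fail_stack(size, requested):
    """Return the new stack size, or None if the cap is already reached."""
    cap = effective_depth(requested) * TYPICAL_FAILURE_SIZE
    if size >= cap:
        return None            # the macro evaluates to 0 in this case
    return min(cap, size * FAIL_STACK_GROWTH_FACTOR)


def growth_sequence(requested):
    """List every stack size visited before growth is refused."""
    sizes = [INIT_FAILURE_ALLOC]
    while (new_size := grow_fail_stack(sizes[-1], requested)) is not None:
        sizes.append(new_size)
    return sizes
```

Under these assumptions, growth_sequence(1000) stops at 20000 elements
after five growth steps, whereas growth_sequence(40000) continues up to
800000 elements.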

[-- Attachment #2: Make-the-backtracking-depth-of-regexp-searches-modif.patch --]
[-- Type: text/x-diff; name=Make-the-backtracking-depth-of-regexp-searches-modif.patch, Size: 7625 bytes --]

From 4523387ac645d8d6daab07114e29d9386a02450a Mon Sep 17 00:00:00 2001
From: Gregory Heytings <gregory@heytings.org>
Date: Mon, 20 Feb 2023 11:18:30 +0000
Subject: [PATCH] Make the backtracking depth of regexp searches modifiable

* src/search.c (syms_of_search) <regexp-max-backtracking-depth>:
New variable, replacing the constant variable
'emacs_re_max_failures'.  Initialize it with the constant
'max_regexp_max_backtracking_depth'.

* src/regex-emacs.h: Replace the external definition of
'emacs_re_max_failures' with the constant
'max_regexp_max_backtracking_depth'.

* src/regex-emacs.c (GROW_FAIL_STACK): Use the new variable
instead of the constant.  Reset it to its maximum value when
necessary.

* src/emacs.c (main): Use the new constant
'max_regexp_max_backtracking_depth' in the calculations.

* lisp/nxml/xmltok.el (xmltok-scan-attributes): Bind
'regexp-max-backtracking-depth' to a small value, and add a
comment.  Fixes bug#61514.
---
 lisp/nxml/xmltok.el |  7 ++++++-
 src/emacs.c         |  8 ++++----
 src/regex-emacs.c   | 26 +++++++++++---------------
 src/regex-emacs.h   |  7 +++++--
 src/search.c        | 13 +++++++++++++
 5 files changed, 39 insertions(+), 22 deletions(-)

diff --git a/lisp/nxml/xmltok.el b/lisp/nxml/xmltok.el
index c36d225c7c9..8201b773e0f 100644
--- a/lisp/nxml/xmltok.el
+++ b/lisp/nxml/xmltok.el
@@ -731,7 +731,12 @@ xmltok-scan-after-comment-open
 
 (defun xmltok-scan-attributes ()
   (let ((recovering nil)
-	(atts-needing-normalization nil))
+	(atts-needing-normalization nil)
+        ;; Limit the backtracking depth of regexp searches, to fail
+        ;; with a "Stack overflow in regexp matcher" error instead of
+        ;; inflooping in looking-at.  This assumes that XML attributes
+        ;; are not longer than about 1000 characters.  See bug#61514.
+	(regexp-max-backtracking-depth 1000))
     (while (cond ((or (looking-at (xmltok-attribute regexp))
 		      ;; use non-greedy group
 		      (when (looking-at (concat "[^<>\n]+?"
diff --git a/src/emacs.c b/src/emacs.c
index 214e2e2a296..d0dca3f03ec 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -1499,9 +1499,9 @@ main (int argc, char **argv)
       rlim_t lim = rlim.rlim_cur;
 
       /* Approximate the amount regex-emacs.c needs per unit of
-	 emacs_re_max_failures, then add 33% to cover the size of the
-	 smaller stacks that regex-emacs.c successively allocates and
-	 discards on its way to the maximum.  */
+	 max_regexp_max_backtracking_depth, then add 33% to cover the
+	 size of the smaller stacks that regex-emacs.c successively
+	 allocates and discards on its way to the maximum.  */
       int min_ratio = 20 * sizeof (char *);
       int ratio = min_ratio + min_ratio / 3;
 
@@ -1514,7 +1514,7 @@ main (int argc, char **argv)
 
       if (try_to_grow_stack)
 	{
-	  rlim_t newlim = emacs_re_max_failures * ratio + extra;
+	  rlim_t newlim = max_regexp_max_backtracking_depth * ratio + extra;
 
 	  /* Round the new limit to a page boundary; this is needed
 	     for Darwin kernel 15.4.0 (see Bug#23622) and perhaps
diff --git a/src/regex-emacs.c b/src/regex-emacs.c
index 2dca0d16ad9..931db980e39 100644
--- a/src/regex-emacs.c
+++ b/src/regex-emacs.c
@@ -868,17 +868,6 @@ print_double_string (re_char *where, re_char *string1, ptrdiff_t size1,
    space, so it is not a hard limit.  */
 #define INIT_FAILURE_ALLOC 20
 
-/* Roughly the maximum number of failure points on the stack.  Would be
-   exactly that if failure always used TYPICAL_FAILURE_SIZE items.
-   This is a variable only so users of regex can assign to it; we never
-   change it ourselves.  We always multiply it by TYPICAL_FAILURE_SIZE
-   before using it, so it should probably be a byte-count instead.  */
-/* Note that 4400 was enough to cause a crash on Alpha OSF/1,
-   whose default stack limit is 2mb.  In order for a larger
-   value to work reliably, you have to try to make it accord
-   with the process stack limit.  */
-ptrdiff_t emacs_re_max_failures = 40000;
-
 union fail_stack_elt
 {
   re_char *pointer;
@@ -912,7 +901,7 @@ #define INIT_FAIL_STACK()						\
 
 
 /* Double the size of FAIL_STACK, up to a limit
-   which allows approximately 'emacs_re_max_failures' items.
+   which allows approximately 'Vregexp_max_backtracking_depth' items.
 
    Return 1 if succeeds, and 0 if either ran out of memory
    allocating space for it or it was already too large.
@@ -926,16 +915,23 @@ #define INIT_FAIL_STACK()						\
 #define FAIL_STACK_GROWTH_FACTOR 4
 
 #define GROW_FAIL_STACK(fail_stack)					\
-  (((fail_stack).size >= emacs_re_max_failures * TYPICAL_FAILURE_SIZE)        \
+  ((Vregexp_max_backtracking_depth =					\
+    Vregexp_max_backtracking_depth <= 0					\
+    || Vregexp_max_backtracking_depth					\
+       > max_regexp_max_backtracking_depth				\
+    ? max_regexp_max_backtracking_depth					\
+    : Vregexp_max_backtracking_depth),					\
+   ((fail_stack).size							\
+    >= Vregexp_max_backtracking_depth * TYPICAL_FAILURE_SIZE)		\
    ? 0									\
    : ((fail_stack).stack						\
       = REGEX_REALLOCATE ((fail_stack).stack,				\
 	  (fail_stack).avail * sizeof (fail_stack_elt_t),		\
-          min (emacs_re_max_failures * TYPICAL_FAILURE_SIZE,                  \
+          min (Vregexp_max_backtracking_depth * TYPICAL_FAILURE_SIZE,   \
                ((fail_stack).size * FAIL_STACK_GROWTH_FACTOR))          \
           * sizeof (fail_stack_elt_t)),                                 \
       ((fail_stack).size						\
-       = (min (emacs_re_max_failures * TYPICAL_FAILURE_SIZE,		\
+       = (min (Vregexp_max_backtracking_depth * TYPICAL_FAILURE_SIZE,	\
 	       ((fail_stack).size * FAIL_STACK_GROWTH_FACTOR)))),	\
       1))
 
diff --git a/src/regex-emacs.h b/src/regex-emacs.h
index 1bc973363e9..9ccc4177487 100644
--- a/src/regex-emacs.h
+++ b/src/regex-emacs.h
@@ -49,8 +49,11 @@ #define EMACS_REGEX_H 1
    TODO: turn into an actual function parameter.  */
 extern Lisp_Object re_match_object;
 
-/* Roughly the maximum number of failure points on the stack.  */
-extern ptrdiff_t emacs_re_max_failures;
+/* Maximum value for Vregexp_max_backtracking_depth.  This is roughly
+   the maximum allowed number of failure points on the stack.  It
+   would be exactly that if failure always used TYPICAL_FAILURE_SIZE
+   items.  */
+#define max_regexp_max_backtracking_depth 40000
 
 /* Amount of memory that we can safely stack allocate.  */
 extern ptrdiff_t emacs_re_safe_alloca;
diff --git a/src/search.c b/src/search.c
index 0bb52c03eef..fc5d7c2b8e2 100644
--- a/src/search.c
+++ b/src/search.c
@@ -3431,6 +3431,19 @@ syms_of_search (void)
 is to bind it with `let' around a small expression.  */);
   Vinhibit_changing_match_data = Qnil;
 
+  DEFVAR_INT ("regexp-max-backtracking-depth", Vregexp_max_backtracking_depth,
+	      doc: /* Maximum backtracking depth in a regexp search.
+
+When the number of pending branches in the search space reaches that
+threshold, a regexp search fails with a "Stack overflow in regexp
+matcher".  Roughly speaking, this is the number of characters to which
+a regexp search is limited, with a complex enough regexp.
+
+Note that this variable will be reset to its default value if it is
+set to a non-positive value, or to a higher value than its default
+value.  */);
+  Vregexp_max_backtracking_depth = max_regexp_max_backtracking_depth;
+
   defsubr (&Slooking_at);
   defsubr (&Sposix_looking_at);
   defsubr (&Sstring_match);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-19 23:48   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-20 12:19     ` Eli Zaretskii
  2023-02-20 13:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-20 12:19 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: mah, 61514

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: "Mark A. Hershberger" <mah@everybody.org>,  61514@debbugs.gnu.org
> Date: Sun, 19 Feb 2023 18:48:43 -0500
> 
> > The problem is in the combination of nxml-mode and some subtle
> > bug/misfeature in our regexp routines.  Specifically, when we overflow
> > the fail stack, we fail to recover in this case, and seem to infloop
> > inside re_match_2_internal, or maybe recover very inefficiently (I
> > waited for almost 1 hour before giving up).  The call which causes the
> > loop is in xmltok.el, in the indicated line:
> >
> > (defun xmltok-scan-attributes ()
> >   (let ((recovering nil)
> > 	(atts-needing-normalization nil))
> >     (while (cond ((or (looking-at (xmltok-attribute regexp))
> > 		      ;; use non-greedy group
> > 		      (when (looking-at (concat "[^<>\n]+?"  <<<<<<<<<<<<<<<<<
> > 						(xmltok-attribute regexp)))
> > 			(unless recovering
> > 			  (xmltok-add-error "Malformed attribute"
> > 					    (point)
> > 					    (save-excursion
> > 					      (goto-char (xmltok-attribute start
> > 									   name))
> > 					      (skip-chars-backward "\r\n\t ")
> > 					      (point))))
> > 			t))
> >
> > The regexp that causes this is as follows:
> >
> >   "[^<>\n]+?\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"
> 
> IIUC the above describes the code where we're stuck inf-looping inside
> `looking-at`?

Not inflooping, but very slowly backtracking, or so it seems.

> Is it the same place where the regexp-stack overflow happens (and with
> the same regexp)?

It's (almost) the same place, but not the same regexp.  The regexp
which causes the stack-overflow message (which is emitted from
set-auto-mode, before entering redisplay) is this:

  "\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"

As you can see, the prepended "[^<>\n]+?" in the regexp which "hangs"
makes all the difference.  So the looking-at which fails reasonably
quickly is the first call to looking-at above, whereas the one that
"hangs" is the second one.  Maybe this points to a way out of this
misery?
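
To illustrate why a lazy prefix in front of a failing pattern can be so
costly, here is a toy backtracking model in Python.  It is not the
algorithm in regex-emacs.c, and it has not been established that this is
exactly what happens in Emacs; the pattern is reduced to NAME followed by
"=", and all function names are invented for the sketch.  It only shows
that retrying a linear-time failure at every offset gives quadratic work.

```python
def match_name_eq(text, start):
    """Try to match NAME '=' at START in a naive backtracking style.

    Return (matched, steps, max_depth), where max_depth counts the
    pending alternatives left behind by the greedy NAME loop.
    """
    steps = 0
    end = start
    # Greedy NAME: every consumed character leaves one pending branch.
    while end < len(text) and text[end].isalnum():
        end += 1
        steps += 1
    max_depth = end - start
    # Backtrack: look for '=' after each shorter NAME in turn.
    for stop in range(end, start, -1):
        steps += 1
        if stop < len(text) and text[stop] == "=":
            return True, steps, max_depth
    return False, steps, max_depth


def match_with_lazy_prefix(text):
    """Model a lazy '.+?' prefix: retry NAME '=' at every later offset."""
    total = 0
    for skip in range(1, len(text)):  # the prefix consumes SKIP characters
        matched, steps, _ = match_name_eq(text, skip)
        total += steps
        if matched:
            return True, total
    return False, total
```

In this model a run of 100 name characters without "=" costs 200 steps
without the prefix and 9900 steps with it; doubling the input to 200
characters gives 39800 steps, about four times as many.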





^ permalink raw reply	[flat|nested] 75+ messages in thread

* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-19 23:58           ` Gregory Heytings
  2023-02-20  2:05             ` Gregory Heytings
@ 2023-02-20 12:31             ` Eli Zaretskii
  2023-02-20 12:40               ` Gregory Heytings
  1 sibling, 1 reply; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-20 12:31 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: mah, 61514, monnier

> Date: Sun, 19 Feb 2023 23:58:41 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: mah@everybody.org, 61514@debbugs.gnu.org, 
>     Stefan Monnier <monnier@iro.umontreal.ca>
> 
> BTW, this makes me wonder why emacs_re_max_failures is not accessible from 
> Elisp.  I think it would be very useful, if only for debugging purposes. 
> And perhaps let-binding it to a lower value around some potentially (or 
> actually) problematic regexps would be a good way to prevent or fix bugs 
> such as the current one.

If we know which regexps cause problems, shouldn't we instead fix
those regexps, or change how we use them?

For debugging purposes, you can set the value in the debugger after
starting Emacs, or with a breakpoint just before calling the
problematic code.

As you have seen from the history of this value, it's problematic to
calculate, and the meaning of the value is not obvious.  So exposing
this to Lisp would be a rope that's too long to give our users and
programmers.





^ permalink raw reply	[flat|nested] 75+ messages in thread

* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20  0:14           ` Gregory Heytings
@ 2023-02-20 12:32             ` Eli Zaretskii
  0 siblings, 0 replies; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-20 12:32 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: mah, 61514, monnier

> Date: Mon, 20 Feb 2023 00:14:34 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: mah@everybody.org, 61514@debbugs.gnu.org, 
>     Stefan Monnier <monnier@iro.umontreal.ca>
> 
> Out of curiosity, I just bootstrapped Emacs with emacs_re_max_failures = 
> 10000.  make and make check succeed, except one test: regex-repeat-limit.
> 
> With emacs_re_max_failures = 19661 or higher that test succeeds.  I don't 
> know how important it is to allow x\\{65535\\}.

It makes little sense to me to limit legitimate uses of regexps
because of a single pathological use case.  It's the tail wagging the
dog in my book.





^ permalink raw reply	[flat|nested] 75+ messages in thread

* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20  2:05             ` Gregory Heytings
  2023-02-20  4:24               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-20 12:33               ` Eli Zaretskii
  1 sibling, 0 replies; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-20 12:33 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: mah, 61514, monnier

> Date: Mon, 20 Feb 2023 02:05:59 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: mah@everybody.org, 61514@debbugs.gnu.org, 
>     Stefan Monnier <monnier@iro.umontreal.ca>
> 
> Here's a patch that makes it modifiable, and "fixes" (in the sense of 
> failing with a "Stack overflow in regexp matcher" instead of inflooping) 
> the current bug.
> 
> WDYT?

I'm against it.  I'd rather try to fix what nXML does and/or the
regexp it uses.






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 12:31             ` Eli Zaretskii
@ 2023-02-20 12:40               ` Gregory Heytings
  2023-02-20 13:14                 ` Eli Zaretskii
  0 siblings, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-20 12:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mah, 61514, monnier


>> BTW, this makes me wonder why emacs_re_max_failures is not accessible 
>> from Elisp.  I think it would be very useful, if only for debugging 
>> purposes. And perhaps let-binding it to a lower value around some 
>> potentially (or actually) problematic regexps would be a good way to 
>> prevent or fix bugs such as the current one.
>
> If we know which regexps cause problems, shouldn't we instead fix those 
> regexps, or change how we use them?
>

If we know how and where to fix them, that's better of course.  If we 
don't (and frankly when I look at that regexp I have no idea how it could 
be fixed), limiting the backtracking depth to a more reasonable value is 
better than not fixing the bug.

>
> For debugging purposes, you can set the value in the debugger after 
> starting Emacs, or with a breakpoint just before calling the problematic 
> code.
>

That's only true for the (very) few of us who are comfortable building 
Emacs and running it under GDB (and even for them it's much easier to just 
change the value with a setq).  If regexp-max-backtracking-depth had been 
present, everyone could easily have tried to set it to some lower value.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-19 23:38 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-20 12:41   ` Eli Zaretskii
  0 siblings, 0 replies; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-20 12:41 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: mah, 61514

> Cc: 61514@debbugs.gnu.org
> Date: Sun, 19 Feb 2023 18:38:52 -0500
> From:  Stefan Monnier via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> > After this, Emacs appears to hang and nothing else is displayed.
> 
> That's a second and separate bug (tho probably triggered by the first).
> These tend to be nastier to diagnose.

What happens here that causes the "hang" is clear: the problematic
regexp is used in a looking-at call that is called via
fontification-functions, so its being pathologically slow wedges
redisplay.  The backtrace is below:

  #2  0x0121ccc0 in re_match_2_internal (bufp=0x19c5a50 <searchbufs+5744>,
      string1=0x83207d0 "<id name=\"", 'n' <repeats 4193280 times>, "\">\n",
      size1=0,
      string2=0x83207d0 "<id name=\"", 'n' <repeats 4193280 times>, "\">\n",
      size2=250000, pos=10, regs=0x1870074 <main_thread+116>, stop=250000)
      at regex-emacs.c:4581
  #3  0x0121b22e in rpl_re_match_2 (bufp=0x19c5a50 <searchbufs+5744>,
      string1=0x83207d0 "<id name=\"", 'n' <repeats 4193280 times>, "\">\n",
      size1=0,
      string2=0x83207d0 "<id name=\"", 'n' <repeats 4193280 times>, "\">\n",
      size2=250000, pos=10, regs=0x1870074 <main_thread+116>, stop=250000)
      at regex-emacs.c:3861
  #4  0x01208bba in looking_at_1 (string=XIL(0x8000000007a39ed8), posix=false,
      modify_data=true) at search.c:314
  #5  0x01208db4 in Flooking_at (regexp=XIL(0x8000000007a39ed8),
      inhibit_modify=XIL(0)) at search.c:350
  #6  0x01279aad in funcall_subr (subr=0x187b8c0 <Slooking_at>, numargs=1,
      args=0x6c40448) at eval.c:3036
  #7  0x012ec2be in exec_byte_code (fun=XIL(0xa000000006c10520),
      args_template=769, nargs=2, args=0x6c40458) at bytecode.c:809
  #8  0x0127a046 in fetch_and_exec_byte_code (fun=XIL(0xa000000007ea04f8),
      args_template=257, nargs=1, args=0x6c401c8) at eval.c:3081
  #9  0x0127a5a5 in funcall_lambda (fun=XIL(0xa000000007ea04f8), nargs=1,
      arg_vector=0x6c401c8) at eval.c:3153
  #10 0x01279512 in funcall_general (fun=XIL(0xa000000007ea04f8), numargs=1,
      args=0x6c401c8) at eval.c:2945
  #11 0x01279897 in Ffuncall (nargs=2, args=0x6c401c0) at eval.c:2995
  #12 0x0127895a in run_hook_wrapped_funcall (nargs=2, args=0x6c401c0)
      at eval.c:2773
  #13 0x01278e11 in run_hook_with_args (nargs=2, args=0x6c401c0,
      funcall=0x1278912 <run_hook_wrapped_funcall>) at eval.c:2854
  #14 0x012789a9 in Frun_hook_wrapped (nargs=2, args=0x6c401c0) at eval.c:2788
  #15 0x01279ef7 in funcall_subr (subr=0x187ff80 <Srun_hook_wrapped>,
      numargs=2, args=0x6c401c0) at eval.c:3059
  #16 0x012ec2be in exec_byte_code (fun=XIL(0xa00000000615096c),
      args_template=514, nargs=2, args=0x6c400f8) at bytecode.c:809
  #17 0x0127a046 in fetch_and_exec_byte_code (fun=XIL(0xa00000000615043c),
      args_template=257, nargs=1, args=0x82ac08) at eval.c:3081
  #18 0x0127a5a5 in funcall_lambda (fun=XIL(0xa00000000615043c), nargs=1,
      arg_vector=0x82ac08) at eval.c:3153
  #19 0x01279512 in funcall_general (fun=XIL(0xa00000000615043c), numargs=1,
      args=0x82ac08) at eval.c:2945
  #20 0x01279897 in Ffuncall (nargs=2, args=0x82ac00) at eval.c:2995
  #21 0x0127394d in internal_condition_case_n (bfun=0x127974b <Ffuncall>,
      nargs=2, args=0x82ac00, handlers=XIL(0x30),
      hfun=0x10428ae <safe_eval_handler>) at eval.c:1558
  #22 0x01042ae1 in safe__call (inhibit_quit=false, nargs=2,
      func=XIL(0x477df6c), ap=0x82acc4 "") at xdisp.c:3024
  #23 0x01042b5a in safe_call (nargs=2, func=XIL(0x477df6c)) at xdisp.c:3039
  #24 0x01042bae in safe_call1 (fn=XIL(0x477df6c), arg=make_fixnum(1))
      at xdisp.c:3050
  #25 0x01046b6a in handle_fontified_prop (it=0x82af58) at xdisp.c:4445
  #26 0x0104554e in handle_stop (it=0x82af58) at xdisp.c:3978
  #27 0x01052103 in reseat (it=0x82af58, pos=..., force_p=true) at xdisp.c:7509
  #28 0x010444d5 in init_iterator (it=0x82af58, w=0x7967568, charpos=1,
      bytepos=1, row=0x7adeaf0, base_face_id=DEFAULT_FACE_ID) at xdisp.c:3488
  #29 0x0104484d in start_display (it=0x82af58, w=0x7967568, pos=...)
      at xdisp.c:3595
  #30 0x0107ccaa in try_window (window=XIL(0xa000000007967568), pos=...,
      flags=1) at xdisp.c:20568
  #31 0x01079885 in redisplay_window (window=XIL(0xa000000007967568),
      just_this_one_p=false) at xdisp.c:19960
  #32 0x01070924 in redisplay_window_0 (window=XIL(0xa000000007967568))
      at xdisp.c:17446
  #33 0x0127373a in internal_condition_case_1 (
      bfun=0x10708cc <redisplay_window_0>, arg=XIL(0xa000000007967568),
      handlers=XIL(0xc00000000648471c), hfun=0x107058e <redisplay_window_error>)
      at eval.c:1498
  #34 0x01070550 in redisplay_windows (window=XIL(0xa000000007967568))
      at xdisp.c:17416
  #35 0x0106ed14 in redisplay_internal () at xdisp.c:16866
  #36 0x0106c42e in redisplay () at xdisp.c:16049
  #37 0x0117570b in read_char (commandflag=1, map=XIL(0xc000000007e4f0f0),
      prev_event=XIL(0), used_mouse_menu=0x82f41f, end_time=0x0)
      at keyboard.c:2627
  #38 0x0118f671 in read_key_sequence (keybuf=0x82f6f8, prompt=XIL(0),
      dont_downcase_last=false, can_return_switch_frame=true,
      fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:10074
  #39 0x01170cf9 in command_loop_1 () at keyboard.c:1375
  #40 0x01273650 in internal_condition_case (bfun=0x1170698 <command_loop_1>,
      handlers=XIL(0x90), hfun=0x116f666 <cmd_error>) at eval.c:1474
  #41 0x01170105 in command_loop_2 (handlers=XIL(0x90)) at keyboard.c:1124
  #42 0x012724d7 in internal_catch (tag=XIL(0x10380),
      func=0x11700ce <command_loop_2>, arg=XIL(0x90)) at eval.c:1197
  #43 0x01170070 in command_loop () at keyboard.c:1102
  #44 0x0116f0c6 in recursive_edit_1 () at keyboard.c:711
  #45 0x0116f364 in Frecursive_edit () at keyboard.c:794
  #46 0x0116a10e in main (argc=2, argv=0xa428e0) at emacs.c:2529

  Lisp Backtrace:
  "looking-at" (0x6c40448)
  "xmltok-scan-attributes" (0x6c403f0)
  "xmltok-scan-after-lt" (0x6c403b8)
  "xmltok-forward" (0x6c40388)
  "nxml-tokenize-forward" (0x6c40350)
  "nxml-extend-region" (0x6c40308)
  "font-lock-default-fontify-region" (0x6c40298)
  "font-lock-fontify-region" (0x6c40230)
  0x7ea04f8 PVEC_COMPILED
  "run-hook-wrapped" (0x6c401c0)
  "jit-lock--run-functions" (0x6c400e8)
  "jit-lock-fontify-now" (0x6c40058)
  "jit-lock-function" (0x82ac08)
  "redisplay_internal (C function)" (0x0)






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 12:40               ` Gregory Heytings
@ 2023-02-20 13:14                 ` Eli Zaretskii
  2023-02-20 14:17                   ` Gregory Heytings
  0 siblings, 1 reply; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-20 13:14 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: mah, 61514, monnier

> Date: Mon, 20 Feb 2023 12:40:54 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: mah@everybody.org, 61514@debbugs.gnu.org, monnier@iro.umontreal.ca
> 
> >> BTW, this makes me wonder why emacs_re_max_failures is not accessible 
> >> from Elisp.  I think it would be very useful, if only for debugging 
> >> purposes. And perhaps let-binding it to a lower value around some 
> >> potentially (or actually) problematic regexps would be a good way to 
> >> prevent or fix bugs such as the current one.
> >
> > If we know which regexps cause problems, shouldn't we instead fix those 
> > regexps, or change how we use them?
> >
> 
> If we know how and where to fix them, that's better of course.  If we 
> don't (and frankly when I look at that regexp I have no idea how it could 
> be fixed), limiting the backtracking depth to a more reasonable value is 
> better than not fixing the bug.

So let's try fixing the issue that way first, and only fall back to
"limiting failures" if we decide we failed with that.

> > For debugging purposes, you can set the value in the debugger after 
> > starting Emacs, or with a breakpoint just before calling the problematic 
> > code.
> 
> That's only true for the (very) few of us who are comfortable building 
> Emacs and running it under GDB (and even for them it's much easier to just 
> change the value with a setq).  If regexp-max-backtracking-depth had been 
> present, everyone could easily have tried to set it to some lower value.

I don't trust people who don't build Emacs and run it under GDB to use
such a variable judiciously.






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 12:19     ` Eli Zaretskii
@ 2023-02-20 13:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 13:54         ` Eli Zaretskii
  2023-02-20 14:06         ` Gregory Heytings
  0 siblings, 2 replies; 75+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-20 13:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mah, 61514

>> IIUC the above describes the code where we're stuck inf-looping inside
>> `looking-at`?
> Not inflooping, but very slowly backtracking, or so it seems.

Duh, right.  I meant "hang".  Sorry for being a bit mushy-brained for a moment.

>> Is it the same place where the regexp-stack overflow happens (and with
>> the same regexp)?
>
> It's (almost) the same place, but not the same regexp.  The regexp
> which causes the stack-overflow message (which is emitted from
> set-auto-mode, before entering redisplay) is this:
>
>   "\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"
>
> As you can see, the prepended "[^<>\n]+?" in the regexp which "hangs"
> makes all the difference.  So the looking-at which fails reasonably
> quickly is the first call to looking-at above, whereas the one that
> "hangs" is the second one.

Yes, it makes a lot of sense now.

> Maybe this points out a way out of this misery?

I think it does.  E.g. there's a chance that using "[^<>\n]+?\\<"
instead of "[^<>\n]+?"  avoids the hang (not sure if it's the right
thing to do for all the regexp that can be returned by
`xmltok-attribute`, tho).
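
To illustrate why anchoring the lazy prefix at a word start should help,
here is a rough Python sketch (not Emacs code).  It assumes that Python's
`\b(?=\w)` is a close enough stand-in for Emacs's "\\<", which really
depends on the buffer's syntax table, and the helper names are made up
for this illustration.  It only counts the positions at which the rest of
the regexp would be retried after the prefix.

```python
import re

def stops_unanchored(text):
    """Offsets where a lazy [^<>\\n]+? prefix may stop and hand over
    to the rest of the regexp (one per consumed character)."""
    stops = []
    for i, ch in enumerate(text):
        if ch in "<>\n":
            break                 # the prefix cannot cross these characters
        stops.append(i + 1)       # prefix has consumed text[:i + 1]
    return stops

def stops_at_word_start(text):
    """Same, but keeping only offsets that are also word starts,
    approximating the effect of appending a word-start anchor."""
    word_starts = {m.start() for m in re.finditer(r"\b(?=\w)", text)}
    return [i for i in stops_unanchored(text) if i in word_starts]

# Text as seen just after the opening '<' of the test file, shortened.
sample = 'id name="' + "n" * 1000
print(len(stops_unanchored(sample)), len(stops_at_word_start(sample)))
```

In this toy model the number of retry positions no longer grows with the
length of the attribute value, which is consistent with the hang going
away.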

And for the stack overflow I haven't yet found its origin.


        Stefan







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 13:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-20 13:54         ` Eli Zaretskii
  2023-02-20 14:59           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 14:06         ` Gregory Heytings
  1 sibling, 1 reply; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-20 13:54 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: mah, 61514

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: mah@everybody.org,  61514@debbugs.gnu.org
> Date: Mon, 20 Feb 2023 08:19:26 -0500
> 
> >   "\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"
> >
> > As you can see, the prepended "[^<>\n]+?" in the regexp which "hangs"
> > makes all the difference.  So the looking-at which fails reasonably
> > quickly is the first call to looking-at above, whereas the one that
> > "hangs" is the second one.
> 
> Yes, it makes a lot of sense now.
> 
> > Maybe this points out a way out of this misery?
> 
> I think it does.  E.g. there's a chance that using "[^<>\n]+?\\<"
> instead of "[^<>\n]+?"  avoids the hang

It does, thanks.

> (not sure if it's the right thing to do for all the regexp that can
> be returned by `xmltok-attribute`, tho).

How would we go about finding out?  Because other than that, changing
the regexp solves this nasty problem, and all the tests in
test/lisp/nxml/ still pass.

> And for the stack overflow I haven't yet found its origin.

Not sure what is the mystery here.  AFAIU, we look for the closing
">", don't find it, and then start looking for fewer and fewer non-'>'
characters followed by '>'.  Isn't that what happens here?






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 13:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 13:54         ` Eli Zaretskii
@ 2023-02-20 14:06         ` Gregory Heytings
  2023-02-20 14:16           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-20 14:06 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 61514, mah


>
> I think it does.  E.g. there's a chance that using "[^<>\n]+?\\<" 
> instead of "[^<>\n]+?" avoids the hang (not sure if it's the right thing 
> to do for all the regexp that can be returned by `xmltok-attribute`, 
> tho).
>

That does work, indeed.  Using e.g. "[^<>\n]\\{1,100\\}?" also works (but 
is not as efficient).  Perhaps Mark (who added xmltok.el to Emacs in 2007) 
can help here to determine what the right thing is?

>
> And for the stack overflow I haven't yet found its origin.
>

There is no stack overflow here, AFAIU.  It's simply that the prepended 
regexp matches one or more (without any upper bound) characters except 
"<>\n", which means that we backtrack _a lot_ when the line is long.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 14:06         ` Gregory Heytings
@ 2023-02-20 14:16           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 14:24             ` Gregory Heytings
  0 siblings, 1 reply; 75+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-20 14:16 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: Eli Zaretskii, 61514, mah

>> And for the stack overflow I haven't yet found its origin.
> There is no stack overflow here, AFAIU.  It's simply that the prepended
> regexp matches one or more (without any upper bound) characters except
> "<>\n", which means that we backtrack _a lot_ when the line is long.

There is clearly a stack overflow since the OP showed stack overflow
errors in *Messages*.

And the stack overflow is in the rest of the regexp: the `+?` repetition
uses only ever 1 stack slot no matter how long a match we consider
(contrary to the `+` and `*` repetitions which use N stack slots for the
N repetitions of the longest match).
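
A toy Python model of that difference (a sketch only, not how
regex-emacs.c is actually written; the function names and the explicit
list used as a failure stack are assumptions made for illustration):

```python
def max_stack_greedy(text, ch, follow):
    """Match  ch* follow  greedily, pushing one failure point per iteration.
    Returns (matched, deepest_stack)."""
    stack, pos, deepest = [], 0, 0
    while pos < len(text) and text[pos] == ch:
        stack.append(pos)                 # alternative: leave the loop here
        deepest = max(deepest, len(stack))
        pos += 1
    while True:
        if text.startswith(follow, pos):
            return True, deepest
        if not stack:
            return False, deepest
        pos = stack.pop()                 # backtrack to a shorter repetition

def max_stack_lazy(text, ch, follow):
    """Match  ch*? follow  lazily: a single failure point is reused."""
    stack, pos, deepest = [], 0, 0
    while True:
        stack.append(pos)                 # alternative: consume one more ch
        deepest = max(deepest, len(stack))
        if text.startswith(follow, pos):
            return True, deepest
        pos = stack.pop()
        if pos < len(text) and text[pos] == ch:
            pos += 1
        else:
            return False, deepest

print(max_stack_greedy("x" * 1000, "x", "="))
print(max_stack_lazy("x" * 1000, "x", "="))
```

In this model the greedy loop needs a stack as deep as the run of
matching characters, while the lazy loop never holds more than one entry.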


        Stefan







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 13:14                 ` Eli Zaretskii
@ 2023-02-20 14:17                   ` Gregory Heytings
  0 siblings, 0 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-02-20 14:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mah, 61514, monnier


>>> For debugging purposes, you can set the value in the debugger after 
>>> starting Emacs, or with a breakpoint just before calling the 
>>> problematic code.
>>
>> That's only true for the (very) few of us who are comfortable building 
>> Emacs and running it under GDB (and even for them it's much easier to 
>> just change the value with a setq).  If regexp-max-backtracking-depth 
>> had been present, everyone could easily have tried to set it to some 
>> lower value.
>
> I don't trust people who don't build Emacs and run it under GDB to use 
> such a variable judiciously.
>

In the current patch it is automatically capped to a maximum value.  It 
could also be automatically reset to a minimum value (say 1000 or 500 or 
100).  I just tried to set it to 500 in my configuration during a few 
minutes, and did not see any errors, so I don't see what could go 
fundamentally wrong if we give users control over that threshold.

It would have been much easier to debug this bug by asking Mark "could you 
please try to temporarily set regexp-max-backtracking-depth to 1000 and 
see if that fixes the bug?".  This bug report was easy to reproduce, so 
that wasn't necessary, but it would be for bug reports from users with 
more complex setups.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 14:16           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-20 14:24             ` Gregory Heytings
  2023-02-20 15:02               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-20 14:24 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 61514, mah


>>> And for the stack overflow I haven't yet found its origin.
>>
>> There is no stack overflow here, AFAIU.  It's simply that the prepended 
>> regexp matches one or more (without any upper bound) characters except 
>> "<>\n", which means that we backtrack _a lot_ when the line is long.
>
> There is clearly a stack overflow since the OP showed stack overflow 
> errors in *Messages*.
>

Ah yes, I misunderstood what you meant.  I thought you were talking about 
a stack overflow bug in the regexp engine.

>
> And the stack overflow is in the rest of the regexp: the `+?` repetition 
> uses only ever 1 stack slot no matter how long a match we consider 
> (contrary to the `+` and `*` repetitions which use N stack slots for the 
> N repetitions of the longest match).
>

Indeed.  That's the bug in the bug.  But it's the '+?' repetition which 
causes the "infloop", right?







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 13:54         ` Eli Zaretskii
@ 2023-02-20 14:59           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 15:56             ` Gregory Heytings
  0 siblings, 1 reply; 75+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-20 14:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mah, 61514

Eli Zaretskii [2023-02-20 15:54:52] wrote:

>> From: Stefan Monnier <monnier@iro.umontreal.ca>
>> Cc: mah@everybody.org,  61514@debbugs.gnu.org
>> Date: Mon, 20 Feb 2023 08:19:26 -0500
>> 
>> >   "\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"
>> >
>> > As you can see, the prepended "[^<>\n]+?" in the regexp which "hangs"
>> > makes all the difference.  So the looking-at which fails reasonably
>> > quickly is the first call to looking-at above, whereas the one that
>> > "hangs" is the second one.
>> 
>> Yes, it makes a lot of sense now.
>> 
>> > Maybe this points out a way out of this misery?
>> 
>> I think it does.  E.g. there's a chance that using "[^<>\n]+?\\<"
>> instead of "[^<>\n]+?"  avoids the hang
>
> It does, thanks.
>
>> (not sure if it's the right thing to do for all the regexp that can
>> be returned by `xmltok-attribute`, tho).
>
> How would we go about finding out?  Because other than that, changing
> the regexp solves this nasty problem, and all the tests in
> test/lisp/nxml/ still pass.

I did find out: we'll always get the same regexp here, so it's OK.

It turns out that (xmltok-attribute regexp) doesn't mean to return "the
something of `regexp`" but to return "the regexp named
`xmltok-attribute`".

`xmltok-attribute` is a funny macro built by `xmltok-defregexp`.

>> And for the stack overflow I haven't yet found its origin.
>
> Not sure what is the mystery here.  AFAIU, we look for the closing
> ">", don't find it, and then start looking for fewer and fewer non-'>'
> characters followed by '>'.  Isn't that what happens here?

Right, but the stack overflows always come from repetitions where
our `mutually_exclusive_p` test fails.  Let's see:

    \\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=

The first two `*` should be non-backtracking because they repeat
[-._[:alnum:]] which is mutually-exclusive with what follows (either `:`
or whitespace, or `=`).  Similarly the third `*` should be
non-backtracking because its body can't match the `=` that must follow.

    \\(?:[\s\r\t\n]*

there aren't enough whitespaces so even if this can backtrack it
shouldn't be the source of our current problems.

    \\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'

Neither `*` here should backtrack.

    \\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)

Same here.

    \\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"

And here we're back to only repeating whitespace.

What am I missing?


        Stefan







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 14:24             ` Gregory Heytings
@ 2023-02-20 15:02               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 75+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-20 15:02 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: Eli Zaretskii, 61514, mah

> Indeed.  That's the bug in the bug.  But it's the '+?' repetition which
> causes the "infloop", right?

It's the `+?` which causes the N repetitions of the O(N) time match,
resulting in an O(N²) complexity, I think, yes.
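
A small Python count of that effect (a toy model, not the Emacs matcher:
it assumes the part after the lazy prefix is just "a run of alphanumerics
followed by `=`", and the function name is made up):

```python
def steps_for_lazy_prefix(text):
    """Count character tests for a toy pattern  .+?[[:alnum:]]*=  on `text`.
    The lazy prefix grows by one character per retry, and each retry
    rescans the rest of the alphanumeric run before failing on '='."""
    steps = 0
    for start in range(1, len(text) + 1):     # prefix consumed text[:start]
        pos = start
        while pos < len(text) and text[pos].isalnum():
            pos += 1
            steps += 1
        if pos < len(text) and text[pos] == "=":
            return steps                       # found a match
    return steps

for n in (100, 200, 400):
    print(n, steps_for_lazy_prefix("n" * n))
```

For a run of N letters with no `=` the count is N(N-1)/2, so doubling the
line length roughly quadruples the work.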


        Stefan







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 14:59           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-20 15:56             ` Gregory Heytings
  2023-02-20 16:47               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 17:04               ` Gregory Heytings
  0 siblings, 2 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-02-20 15:56 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 61514, mah


>
> What am I missing?
>

I don't know... but I observe that this alone:

(with-current-buffer (get-buffer-create "*bug*")
   (insert "<id name=\"")
   (insert (make-string 250000 ?n))
   (goto-char 5)
   (looking-at "[^<>\n]+?\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"))

doesn't fail, so I don't think it's this regexp which causes the overflow.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 15:56             ` Gregory Heytings
@ 2023-02-20 16:47               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 17:14                 ` Gregory Heytings
  2023-02-20 18:49                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 17:04               ` Gregory Heytings
  1 sibling, 2 replies; 75+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-20 16:47 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: Eli Zaretskii, 61514, mah

> I don't know... but I observe that this alone:
>
> (with-current-buffer (get-buffer-create "*bug*")
>   (insert "<id name=\"")
>   (insert (make-string 250000 ?n))
>   (goto-char 5)
>   (looking-at
> "[^<>\n]+?\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"))
>
> doesn't fail, so I don't think it's this regexp which causes the overflow.

Indeed, there's still something unclear about how the overflow occurs,
but at least it seems my analysis doesn't match regex-emacs.c's because
I can get a stack overflow using the first part of the regexp:

    (with-current-buffer (get-buffer-create "*bug*")
      (erase-buffer)
      (insert "<id name=\"")
      (insert (make-string 2500000 ?n))
      (goto-char (+ (point-min) 10))
      (looking-at
"\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*="))

where I can even reduce the regexp down to "[-._[:alnum:]]*\t*=".
Looks like we're missing a case in our backtracking-elimination code.
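
The idea behind that elimination can be sketched in Python as follows.
This is only a toy model under loud assumptions: the (charset, optional)
list describing what follows a loop is a hypothetical representation, it
ignores the case where everything after the loop is optional, and it does
not claim to mirror what `mutually_exclusive_p` really does.

```python
import string

def first_chars(following):
    """Characters that can begin a match of `following`, a list of
    (charset, optional) items; optional items may be skipped."""
    first = set()
    for chars, optional in following:
        first |= set(chars)
        if not optional:
            break                  # nothing past a mandatory item can start
    return first

def loop_needs_failure_points(loop_chars, following):
    """A greedy loop can skip pushing failure points when no character it
    repeats could also start whatever comes after the loop."""
    return not set(loop_chars).isdisjoint(first_chars(following))

NAME_CHARS = set("-._" + string.ascii_letters + string.digits)

# Toy version of the reduced regexp: a name-character loop, then an
# optional tab loop, then a mandatory '='.
print(loop_needs_failure_points(NAME_CHARS, [("\t", True), ("=", False)]))
```

In this model, looking through the optional tab loop to the `=` behind it
is what lets the name-character loop run without growing the stack.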


        Stefan







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 15:56             ` Gregory Heytings
  2023-02-20 16:47               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-20 17:04               ` Gregory Heytings
  1 sibling, 0 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-02-20 17:04 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 61514, mah


>
> I don't think it's this regexp which causes the overflow.
>

... but with a larger buffer it does, so apparently the regexp is 
problematic after all:

(with-current-buffer (get-buffer-create "*bug*")
   (let ((regexp "\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"))
     (erase-buffer)
     (insert "<id name=\"")
     (insert (make-string  266659 ?n))
     (goto-char 5)
     (looking-at regexp)))

Here 266658 does not overflow, and 266659 does.  If '=' is removed from 
the regexp it doesn't overflow anymore, even with a much larger string.

Trying to simplify the regexp gradually, I finally obtained the following 
minimal test case:

(with-current-buffer (get-buffer-create "*bug*")
   (let ((regexp "[[:alpha:]]*=\".*&.*\""))
     (erase-buffer)
     (insert "<id name=\"")
     (insert (make-string  266666 ?n))
     (goto-char 5)
     (looking-at regexp)))

Here it fails with 266666, and doesn't with 266665.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 16:47               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-20 17:14                 ` Gregory Heytings
  2023-02-20 17:34                   ` Gregory Heytings
  2023-02-20 18:49                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-20 17:14 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 61514, mah


>
> where I can even reduce the regexp down to "[-._[:alnum:]]*\t*=".
>
> Looks like we're missing a case in our backtracking-elimination code.
>

Apparently we're doing the same thing at the same moment ;-)

I simplified it down to:

(with-current-buffer (get-buffer-create "*bug*")
   (erase-buffer)
   (insert (make-string 266666 ?n))
   (goto-char (point-min))
   (looking-at "[[:alpha:]]*=*"))

This fails with 266666, and succeeds with 266665.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 17:14                 ` Gregory Heytings
@ 2023-02-20 17:34                   ` Gregory Heytings
  0 siblings, 0 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-02-20 17:34 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 61514, mah


>
> I simplified it down to:
>
> (with-current-buffer (get-buffer-create "*bug*")
>  (erase-buffer)
>  (insert (make-string 266666 ?n))
>  (goto-char (point-min))
>  (looking-at "[[:alpha:]]*=*"))
>
> This fails with 266666, and succeeds with 266665.
>

Even shorter:

(with-current-buffer (get-buffer-create "*bug*")
   (erase-buffer)
   (insert (make-string 266666 ?x))
   (goto-char (point-min))
   (looking-at "x*=*"))

Again this fails with 266666, and succeeds with 266665.

Likewise:

(with-current-buffer (get-buffer-create "*bug*")
   (erase-buffer)
   (insert (make-string 266666 ?x))
   (insert "=")
   (goto-char (point-min))
   (looking-at "x*=*"))

fails with 266666 and succeeds with 266665.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 16:47               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 17:14                 ` Gregory Heytings
@ 2023-02-20 18:49                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 19:11                   ` Gregory Heytings
  2023-02-20 20:01                   ` Eli Zaretskii
  1 sibling, 2 replies; 75+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-20 18:49 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: Eli Zaretskii, 61514, mah

> where I can even reduce the regexp down to "[-._[:alnum:]]*\t*=".
> Looks like we're missing a case in our backtracking-elimination code.

The patch below fixes the stack overflow.
[ And thanks Gregory for the yet simpler test cases.  ]

I don't think we want that for `emacs-29`, but unless there's some
objection I'll push this to `master`,


        Stefan


diff --git a/src/regex-emacs.c b/src/regex-emacs.c
index 2dca0d16ad9..2571812cb39 100644
--- a/src/regex-emacs.c
+++ b/src/regex-emacs.c
@@ -3653,6 +3653,7 @@ mutually_exclusive_p (struct re_pattern_buffer *bufp, re_char *p1,
   re_opcode_t op2;
   bool multibyte = RE_MULTIBYTE_P (bufp);
   unsigned char *pend = bufp->buffer + bufp->used;
+  re_char *p2_orig = p2;
 
   eassert (p1 >= bufp->buffer && p1 < pend
 	   && p2 >= bufp->buffer && p2 <= pend);
@@ -3822,6 +3823,23 @@ mutually_exclusive_p (struct re_pattern_buffer *bufp, re_char *p1,
     case notcategoryspec:
       return ((re_opcode_t) *p1 == categoryspec && p1[1] == p2[1]);
 
+    case on_failure_jump_nastyloop:
+    case on_failure_jump_smart:
+    case on_failure_jump_loop:
+    case on_failure_keep_string_jump:
+    case on_failure_jump:
+      {
+        int mcnt;
+	p2++;
+	EXTRACT_NUMBER_AND_INCR (mcnt, p2);
+	/* Don't just test `mcnt > 0` because non-greedy loops have
+	   their test at the end with an unconditional jump at the start.  */
+	if (p2 + mcnt > p2_orig) /* Ensure forward progress.  */
+	  return (mutually_exclusive_p (bufp, p1, p2)
+		  && mutually_exclusive_p (bufp, p1, p2 + mcnt));
+	break;
+      }
+
     default:
       ;
     }
diff --git a/test/src/regex-emacs-tests.el b/test/src/regex-emacs-tests.el
index 34fa35e32ff..52d43775b8e 100644
--- a/test/src/regex-emacs-tests.el
+++ b/test/src/regex-emacs-tests.el
@@ -872,4 +872,15 @@ regexp-atomic-failure
   (should (equal (string-match "\\`\\(?:ab\\)*\\'" "a") nil))
   (should (equal (string-match "\\`a\\{2\\}*\\'" "a") nil)))
 
+(ert-deftest regexp-tests-backtrack-optimization () ;bug#61514
+  ;; Make sure we don't use up the regexp stack needlessly.
+  (with-current-buffer (get-buffer-create "*bug*")
+    (erase-buffer)
+    (insert (make-string 1000000 ?x) "=")
+    (goto-char (point-min))
+    (should (looking-at "x*=*"))
+    (should (looking-at "x*\\(=\\|:\\)"))
+    (should (looking-at "x*\\(=\\|:\\)*"))
+    (should (looking-at "x*=*?"))))
+
 ;;; regex-emacs-tests.el ends here
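
As a rough illustration of what the new case in mutually_exclusive_p
checks, here is a toy Python model.  It is not the regex-emacs.c code:
the node tuples and the function name are made up for this sketch.  It
treats a failure jump as a fork and requires the loop's characters to be
excluded by both branches.  It assumes that reaching the end of the
pattern counts as exclusive, and since the toy tree has no backward
jumps it needs no forward-progress test.

```python
def mutually_exclusive(loop_chars, node):
    """Toy check: can no character in LOOP_CHARS start what NODE matches?

    NODE is a hypothetical tuple: ("end",), ("char", c), or
    ("fork", branch_a, branch_b), the latter standing in for an
    on_failure_jump with its two possible continuations.
    """
    kind = node[0]
    if kind == "end":
        # Assumption of this sketch: end of pattern never conflicts.
        return True
    if kind == "char":
        return node[1] not in loop_chars
    if kind == "fork":
        # Both continuations must be exclusive, as in the patch.
        return all(mutually_exclusive(loop_chars, b) for b in node[1:])
    # Unknown node: be conservative and keep the backtracking point.
    return False


# What follows "x*" in "x*=*": either one more "=" or the end of the pattern.
AFTER_X_STAR = ("fork", ("char", "="), ("end",))
# What follows "x*" in "x*\(=\|:\)": either "=" or ":".
AFTER_X_ALT = ("fork", ("char", "="), ("char", ":"))
```

With loop characters {"x"} both continuations are exclusive, so the loop
would not need to push a failure point per iteration; with a loop set
that also contains "=" the check fails and backtracking is kept.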







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 18:49                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-20 19:11                   ` Gregory Heytings
  2023-02-20 19:29                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 19:37                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 20:01                   ` Eli Zaretskii
  1 sibling, 2 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-02-20 19:11 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 61514, mah


>
> The patch below fixes the stack overflow.
>

Together with the "\\<" in xmltok.el, this fixes this bug indeed in all 
cases (truncated and non-truncated ones).  Congrats!

>
> I don't think we want that for `emacs-29`, but unless there's some 
> objection I'll push this to `master`,
>

I'd say it fixes an important bug in the regexp engine, but I cannot judge 
whether it's important enough for emacs-29.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 19:11                   ` Gregory Heytings
@ 2023-02-20 19:29                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 19:37                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 75+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-20 19:29 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: Eli Zaretskii, 61514, mah

> I'd say it fixes an important bug in the regexp engine, but I cannot judge
> whether it's important enough for emacs-29.

FWIW, I've been using a slight variant of this code in my local Emacs
hacks for the last probably 10-15 years (I can't remember when I wrote
it, and the Git history only goes back to the Bzr->Git switch).

I had completely forgotten about it, but while doing the tests, I saw
that some cases were working better (in my local Emacs) than what
I expected while reading the code on `master` :-)


        Stefan







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 19:11                   ` Gregory Heytings
  2023-02-20 19:29                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-20 19:37                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 20:13                       ` Gregory Heytings
  1 sibling, 1 reply; 75+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-20 19:37 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: Eli Zaretskii, 61514, mah

>> The patch below fixes the stack overflow.
> Together with the "\\<" in xmltok.el, this fixes this bug indeed in all
> cases (truncated and non-truncated ones).  Congrats!

We probably still have an O(N²) behavior which can bite with a line like

   <id * name="N_N_N_N_N_N_N_N_.....">

My patch should significantly improve the constant factor, but with
a long enough "N_N_N_N_N..." I suspect it can still end up painful.

Maybe we should reduce the scope of the search for the fallback case
(the case where we add the "[^...]+\\<" prefix) since AFAICT its only
purpose is to try and guess a helpful error message when the XML is
ill-formed.

>> I don't think we want that for `emacs-29`, but unless there's some
>> objection I'll push this to `master`,
> I'd say it fixes an important bug in the regexp engine, but I cannot judge
> whether it's important enough for emacs-29.

It's a missing optimization that's been with us for many many years, so
I don't see any urgency to fix it.


        Stefan







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 18:49                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-20 19:11                   ` Gregory Heytings
@ 2023-02-20 20:01                   ` Eli Zaretskii
  2023-02-21  2:23                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-20 20:01 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: gregory, 61514, mah

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz@gnu.org>,  mah@everybody.org,  61514@debbugs.gnu.org
> Date: Mon, 20 Feb 2023 13:49:49 -0500
> 
> > where I can even reduce the regexp down to "[-._[:alnum:]]*\t*=".
> > Looks like we're missing a case in our backtracking-elimination code.
> 
> The patch below fixes the stack overflow.
> [ And thanks Gregory for the yet simpler test cases.  ]
> 
> I don't think we want that for `emacs-29`, but unless there's some
> objection I'll push this to `master`,

Assuming all the regex-emacs-tests and search-tests pass after this
change, please install on emacs-29, and thanks.






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 19:37                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-20 20:13                       ` Gregory Heytings
  2023-02-21 12:05                         ` Eli Zaretskii
  0 siblings, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-20 20:13 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 61514, mah



>
> We probably still have an O(N²) behavior which can bite with a line like
>
>   <id name="N_N_N_N_N_N_N_N_.....">
>
> My patch should significantly improve the constant factor, but with a 
> long enough "N_N_N_N_N..." I suspect it can still end up painful.
>

I just tried that, with a 4 MB such line, and indeed the result is 
painful, but nowhere near as painful as this bug: opening that file takes
"only" about 4 minutes, after which it can be edited normally.
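
For anyone who wants to reproduce this, a file of that shape can be
generated along the lines of the script in the original report.  This is
only a sketch: the exact file used for the timing above is not shown in
the thread, and the function names, the default size and the "n_n.xml"
file name are assumptions.

```python
def long_attribute_line(total_bytes=4 * 1024 * 1024, unit="N_"):
    """Build one '<id name="N_N_...">' line of roughly TOTAL_BYTES bytes."""
    prefix, suffix = '<id name="', '">\n'
    repeats = (total_bytes - len(prefix) - len(suffix)) // len(unit)
    return prefix + unit * repeats + suffix


def write_test_file(path="n_n.xml", total_bytes=4 * 1024 * 1024):
    """Write the long line to PATH (hypothetical default file name)."""
    with open(path, "w") as f:
        f.write(long_attribute_line(total_bytes))
```

Calling write_test_file() and then visiting the resulting file in nXML
mode should exercise the fallback regexp discussed here.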

>
> Maybe we should reduce the scope of the search for the fallback case 
> (the case where we add the "[^...]+\\<" prefix) since AFAICT its only 
> purpose is to try and guess a helpful error message when the XML is 
> ill-formed.
>

That's an idea, yes.  With the following patch even your "n_n_..." example 
opens almost instantaneously:

diff --git a/lisp/nxml/xmltok.el b/lisp/nxml/xmltok.el
index c36d225c7c9..61783ea4dec 100644
--- a/lisp/nxml/xmltok.el
+++ b/lisp/nxml/xmltok.el
@@ -734,7 +734,7 @@ xmltok-scan-attributes
         (atts-needing-normalization nil))
      (while (cond ((or (looking-at (xmltok-attribute regexp))
                       ;; use non-greedy group
-                     (when (looking-at (concat "[^<>\n]+?"
+                     (when (looking-at (concat "[^<>\n]\\{1,1000\\}?\\<"
                                                 (xmltok-attribute regexp)))
                         (unless recovering
                           (xmltok-add-error "Malformed attribute"

>>> I don't think we want that for `emacs-29`, but unless there's some 
>>> objection I'll push this to `master`,
>>
>> I'd say it fixes an important bug in the regexp engine, but I cannot 
>> judge whether it's important enough for emacs-29.
>
> It's a missing optimization that's been with us for many many years, so 
> I don't see any urgency to fix it.
>

It's not urgent, indeed.  But it doesn't look risky either, especially 
given that you've been using that patch for years.  Anyway, I don't have a 
strong preference.


* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 20:01                   ` Eli Zaretskii
@ 2023-02-21  2:23                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-21  9:39                       ` Gregory Heytings
  0 siblings, 1 reply; 75+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-21  2:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gregory, 61514, mah

>> I don't think we want that for `emacs-29`, but unless there's some
>> objection I'll push this to `master`,
>
> Assuming all the regex-emacs-tests and search-tests pass after this
> change, please install on emacs-29, and thanks.

It passed the tests, I pushed it to `emacs-29`.


        Stefan







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-21  2:23                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-21  9:39                       ` Gregory Heytings
  2023-02-21 12:44                         ` Eli Zaretskii
  0 siblings, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-21  9:39 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 61514, mah


>>> I don't think we want that for `emacs-29`, but unless there's some 
>>> objection I'll push this to `master`,
>>
>> Assuming all the regex-emacs-tests and search-tests pass after this 
>> change, please install on emacs-29, and thanks.
>
> It passed the tests, I pushed it to `emacs-29`.
>

Great!  I guess we should also fix the bug in nXML?







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-20 20:13                       ` Gregory Heytings
@ 2023-02-21 12:05                         ` Eli Zaretskii
  2023-02-21 12:37                           ` Gregory Heytings
  0 siblings, 1 reply; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-21 12:05 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: mah, 61514, monnier

> Date: Mon, 20 Feb 2023 20:13:38 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: Eli Zaretskii <eliz@gnu.org>, 61514@debbugs.gnu.org, mah@everybody.org
> 
> > Maybe we should reduce the scope of the search for the fallback case 
> > (the case where we add the "[^...]+\\<" prefix) since AFAICT its only 
> > purpose is to try and guess a helpful error message when the XML is 
> > ill-formed.
> >
> 
> That's an idea, yes.  With the following patch even your "n_n_..." example 
> opens almost instantaneously:
> 
> diff --git a/lisp/nxml/xmltok.el b/lisp/nxml/xmltok.el
> index c36d225c7c9..61783ea4dec 100644
> --- a/lisp/nxml/xmltok.el
> +++ b/lisp/nxml/xmltok.el
> @@ -734,7 +734,7 @@ xmltok-scan-attributes
>          (atts-needing-normalization nil))
>       (while (cond ((or (looking-at (xmltok-attribute regexp))
>                        ;; use non-greedy group
> -                     (when (looking-at (concat "[^<>\n]+?"
> +                     (when (looking-at (concat "[^<>\n]\\{1,1000\\}?\\<"
>                                                  (xmltok-attribute regexp)))

SGTM, but isn't 1000 a somewhat low value?  What if we use half of the
value of long-line-optimizations-region-size instead?






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-21 12:05                         ` Eli Zaretskii
@ 2023-02-21 12:37                           ` Gregory Heytings
  2023-02-21 13:07                             ` Eli Zaretskii
  2023-02-21 13:24                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-02-21 12:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mah, 61514, monnier


>
> SGTM, but isn't 1000 a somewhat low value?  What if we use half of the 
> value of long-line-optimizations-region-size instead?
>

Here are some benchmarks.  The times taken by Emacs to open the 4 MB 
"n_n_..." file with different regexps are:

"[^<>\n]\\{1,100\\}?\\<": 0.8 seconds
"[^<>\n]\\{1,1000\\}?\\<": 3.4 seconds
"[^<>\n]\\{1,10000\\}?\\<": 28.5 seconds
"[^<>\n]\\{1,65535\\}?\\<": 162.9 seconds
"[^<>\n]+?\\<": 356.6 seconds

65535 is the upper limit for such ranges, it's not possible to use a 
larger value.
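
To put these numbers side by side, here is a small Python sketch that
computes the speedup of each bounded regexp relative to the unbounded
one.  The figures are copied from the list above; the names TIMINGS and
speedups are made up for the illustration.

```python
# Timings in seconds quoted above for opening the 4 MB "n_n_..." file.
# The bound None stands for the unbounded "+?" variant.
TIMINGS = [(100, 0.8), (1000, 3.4), (10000, 28.5), (65535, 162.9), (None, 356.6)]


def speedups(timings):
    """Return {bound: speedup relative to the unbounded regexp}."""
    unbounded = dict(timings)[None]
    return {bound: unbounded / secs
            for bound, secs in timings if bound is not None}


if __name__ == "__main__":
    for bound, factor in speedups(TIMINGS).items():
        print(f"bound {bound:>5}: about {factor:.0f}x faster than unbounded")
```

On these figures the time grows by somewhat less than a factor of ten
for each tenfold increase of the bound.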







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-21  9:39                       ` Gregory Heytings
@ 2023-02-21 12:44                         ` Eli Zaretskii
  0 siblings, 0 replies; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-21 12:44 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: mah, 61514, monnier

> Date: Tue, 21 Feb 2023 09:39:21 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: Eli Zaretskii <eliz@gnu.org>, 61514@debbugs.gnu.org, mah@everybody.org
> 
> 
> >>> I don't think we want that for `emacs-29`, but unless there's some 
> >>> objection I'll push this to `master`,
> >>
> >> Assuming all the regex-emacs-tests and search-tests pass after this 
> >> change, please install on emacs-29, and thanks.
> >
> > It passed the tests, I pushed it to `emacs-29`.
> >
> 
> Great!  I guess we should also fix the bug in nXML?

Yes, I'd like that to be fixed as well, but see my question about the
hardcoded 1000.






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-21 12:37                           ` Gregory Heytings
@ 2023-02-21 13:07                             ` Eli Zaretskii
  2023-02-21 14:38                               ` Gregory Heytings
  2023-02-21 13:24                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-21 13:07 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: mah, 61514, monnier

> Date: Tue, 21 Feb 2023 12:37:11 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: mah@everybody.org, 61514@debbugs.gnu.org, monnier@iro.umontreal.ca
> 
> 
> >
> > SGTM, but isn't 1000 a somewhat low value?  What if we use half of the 
> > value of long-line-optimizations-region-size instead?
> >
> 
> Here are some benchmarks.  The times taken by Emacs to open the 4 MB 
> "n_n_..." file with different regexps are:
> 
> "[^<>\n]\\{1,100\\}?\\<": 0.8 seconds
> "[^<>\n]\\{1,1000\\}?\\<": 3.4 seconds
> "[^<>\n]\\{1,10000\\}?\\<": 28.5 seconds
> "[^<>\n]\\{1,65535\\}?\\<": 162.9 seconds
> "[^<>\n]+?\\<": 356.6 seconds
> 
> 65535 is the upper limit for such ranges, it's not possible to use a 
> larger value.

OK, but does it sound outrageous to have more than 1K of non-newline
characters in a row without any brackets?

At the very least, maybe make the value be in some variable?






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-21 12:37                           ` Gregory Heytings
  2023-02-21 13:07                             ` Eli Zaretskii
@ 2023-02-21 13:24                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-02-21 13:35                               ` Gregory Heytings
  1 sibling, 1 reply; 75+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-21 13:24 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: Eli Zaretskii, 61514, mah

>> SGTM, but isn't 1000 a somewhat low value?  What if we use half of the
>> value of long-line-optimizations-region-size instead?
>>
>
> Here are some benchmarks.  The times taken by Emacs to open the 4 MB
> "n_n_..." file with different regexps are:
>
> "[^<>\n]\\{1,100\\}?\\<": 0.8 seconds
> "[^<>\n]\\{1,1000\\}?\\<": 3.4 seconds
> "[^<>\n]\\{1,10000\\}?\\<": 28.5 seconds
> "[^<>\n]\\{1,65535\\}?\\<": 162.9 seconds
> "[^<>\n]+?\\<": 356.6 seconds
>
> 65535 is the upper limit for such ranges, it's not possible to use
> a larger value.

BTW, personally when I suggested to limit the search I was thinking of
`narrow-to-region` (which bounds both N factors in the N² complexity).

AFAIK this part of the code is intended mostly when editing XML by
hand, where attributes aren't expected to be ridiculously long, so
limiting to a few kB would be perfectly acceptable (and if the search
fails it's no big deal: when the search succeeds we don't *really* know
what it means either, it may be a false positive anyway).


        Stefan







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-21 13:24                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-21 13:35                               ` Gregory Heytings
  0 siblings, 0 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-02-21 13:35 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 61514, mah



>
> BTW, personally when I suggested to limit the search I was thinking of 
> `narrow-to-region` (which bounds both N factors in the N² complexity).
>

Indeed, that's another way to cope with that problem, and a better one:

diff --git a/lisp/nxml/xmltok.el b/lisp/nxml/xmltok.el
index c36d225c7c9..9badd7e4c53 100644
--- a/lisp/nxml/xmltok.el
+++ b/lisp/nxml/xmltok.el
@@ -734,8 +734,10 @@ xmltok-scan-attributes
         (atts-needing-normalization nil))
      (while (cond ((or (looking-at (xmltok-attribute regexp))
                       ;; use non-greedy group
-                     (when (looking-at (concat "[^<>\n]+?"
-                                               (xmltok-attribute regexp)))
+                     (when (with-restriction
+                             (point) (+ (point) 10000)
+                             (looking-at (concat "[^<>\n]+?"
+                                                (xmltok-attribute regexp))))
                         (unless recovering
                           (xmltok-add-error "Malformed attribute"
                                             (point)

With this opening the 4 MB file takes 1.6 seconds.  With 5000 instead of 
10000 it takes 0.8 seconds.
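
The same idea, limiting how far a non-greedy prefix may scan, can be
sketched outside Emacs with Python's re module.  This is an analogy
only: the ATTR pattern below is a simplified stand-in and not the regexp
built by xmltok-attribute, and looking_at_within is a hypothetical
helper name.

```python
import re

# Simplified stand-in for "junk, then something that looks like name="value"".
ATTR = re.compile(r'[^<>\n]+?\b([A-Za-z_][\w.-]*)\s*=\s*"[^"<]*"')


def looking_at_within(pattern, text, pos, width=10000):
    """Match PATTERN at POS, seeing at most WIDTH characters of TEXT.

    Pattern.match treats the string as if it ended at `endpos`, which
    plays the role of the narrowed region in the patch above.
    """
    return pattern.match(text, pos, pos + width)


near = '?? name="v" '
far = "n_" * 10000 + ' name="v"'
```

With the default width the attribute in `near` is found, while the one
in `far` lies beyond the window and the fallback simply reports no
match, which bounds the work done per call.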

>
> AFAIK this part of the code is intended mostly when editing XML by hand, 
> where attributes aren't expected to be ridiculously long, so limiting to 
> a few kB would be perfectly acceptable (and if the search fails it's no 
> big deal: when the search succeeds we don't *really* know what it means 
> either, it may be a false positive anyway).
>

Indeed.


* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-21 13:07                             ` Eli Zaretskii
@ 2023-02-21 14:38                               ` Gregory Heytings
  2023-02-21 14:48                                 ` Eli Zaretskii
  0 siblings, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-21 14:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mah, 61514, monnier


>
> OK, but does it sound outrageous to have more than 1K of non-newline 
> characters in a row without any brackets?
>
> At the very least, maybe make the value be in some variable?
>

See my reply to Stefan.  With a 'with-restriction' of 10000 chars, the 
file opens in 1.6 seconds.  I'm not sure it would make sense to add a 
variable/defcustom there instead of the (admittedly somewhat arbitrary) 
constant 10000, which should be large enough in practice.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-21 14:38                               ` Gregory Heytings
@ 2023-02-21 14:48                                 ` Eli Zaretskii
  2023-02-21 15:25                                   ` Gregory Heytings
  0 siblings, 1 reply; 75+ messages in thread
From: Eli Zaretskii @ 2023-02-21 14:48 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: mah, 61514, monnier

> Date: Tue, 21 Feb 2023 14:38:41 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: mah@everybody.org, 61514@debbugs.gnu.org, monnier@iro.umontreal.ca
> 
> 
> >
> > OK, but does it sound outrageous to have more than 1K of non-newline 
> > characters in a row without any brackets?
> >
> > At the very least, maybe make the value be in some variable?
> >
> 
> See my reply to Stefan.  With a 'with-restriction' of 10000 chars, the 
> file opens in 1.6 seconds.  I'm not sure it would make sense to add a 
> variable/defcustom there instead of the (admittedly somewhat arbitrary) 
> constant 10000, which should be large enough in practice.

OK, then let's go with that version.






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-21 14:48                                 ` Eli Zaretskii
@ 2023-02-21 15:25                                   ` Gregory Heytings
  2023-02-21 15:44                                     ` Gregory Heytings
  0 siblings, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-21 15:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mah, 61514, monnier


>> See my reply to Stefan.  With a 'with-restriction' of 10000 chars, the 
>> file opens in 1.6 seconds.  I'm not sure it would make sense to add a 
>> variable/defcustom there instead of the (admittedly somewhat arbitrary) 
>> constant 10000, which should be large enough in practice.
>
> OK, then let's go with that version.
>

OK, thanks.  Stefan, do you have any further comments/objections on that 
version?







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-21 15:25                                   ` Gregory Heytings
@ 2023-02-21 15:44                                     ` Gregory Heytings
  2023-02-21 16:58                                       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 75+ messages in thread
From: Gregory Heytings @ 2023-02-21 15:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mah, 61514, monnier


>>> See my reply to Stefan.  With a 'with-restriction' of 10000 chars, the 
>>> file opens in 1.6 seconds.  I'm not sure it would make sense to add a 
>>> variable/defcustom there instead of the (admittedly somewhat 
>>> arbitrary) constant 10000, which should be large enough in practice.
>> 
>> OK, then let's go with that version.
>
> OK, thanks.  Stefan, do you have any further comments/objections on that 
> version?
>

By the way, I noted that a variant of the regexp still produces stack 
overflows:

(with-current-buffer (get-buffer-create "*bug*")
   (erase-buffer)
   (insert (make-string 266665 ?x) "=")
   (goto-char (point-min))
   (looking-at "[^y]*=*"))

266665 overflows, 266664 does not.  Is that expected?







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-21 15:44                                     ` Gregory Heytings
@ 2023-02-21 16:58                                       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-03-18 10:59                                         ` Gregory Heytings
  0 siblings, 1 reply; 75+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-21 16:58 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: Eli Zaretskii, 61514, mah

>> OK, thanks.  Stefan, do you have any further comments/objections on
>> that version?

LGTM.

> By the way, I noted that a variant of the regexp still produces stack
>  overflows:
>
> (with-current-buffer (get-buffer-create "*bug*")
>   (erase-buffer)
>   (insert (make-string 266665 ?x) "=")
>   (goto-char (point-min))
>   (looking-at "[^y]*=*"))
>
> 266665 overflows, 266664 does not.  Is that expected?

Yes, there's "nothing" we can do about it (short of a significant
redesign of the engine): [^y] also matches = so at every iteration of
the loop, both paths (perform one more iteration, or exit the loop) are
valid, so we need to try them both, which we do via backtracking.

We'd need a "Thompson NFA" or something along the same lines to avoid
it.

Of course, we could also just backtrack less deep by exploring the
search space in a different order (e.g. the `*?` repetition does that),
but if we want to still return the same end result, we'd then have to
explore more of the search space (and after the fact, choose which
match we should return) rather than stop at the first match.
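
To make the difference between "x*=*" and "[^y]*=*" concrete, here is a
toy Python model of a greedy loop with a backtracking stack.  It is a
deliberately simplified sketch and not how regex-emacs.c is written: the
function name is made up, character classes are plain sets, and the
"exclusive" test is reduced to set disjointness.

```python
def match_two_stars(text, first, second):
    """Toy model of matching FIRST*SECOND* greedily from position 0.

    FIRST and SECOND are sets of characters.  Returns the pair
    (match_end, peak_stack_depth).
    """
    # Failure points can be dropped only when no character can both
    # continue the first loop and start what follows it.
    exclusive = first.isdisjoint(second)
    stack, peak, i = [], 0, 0
    while i < len(text) and text[i] in first:
        if not exclusive:
            # Remember "leave the loop here" as an alternative to retry.
            stack.append(i)
            peak = max(peak, len(stack))
        i += 1
    while i < len(text) and text[i] in second:
        i += 1
    return i, peak


ALPHABET = set("xy=")
NOT_Y = ALPHABET - {"y"}   # stands in for [^y]; note that it contains "="
```

For a text of N "x" followed by "=", the model keeps the stack empty for
{"x"} followed by {"="}, but needs one entry per consumed character for
NOT_Y followed by {"="}, which is the growth that eventually overflows.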


        Stefan







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-02-21 16:58                                       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-03-18 10:59                                         ` Gregory Heytings
  2023-03-18 11:10                                           ` Eli Zaretskii
  2023-03-19  2:39                                           ` mah via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-03-18 10:59 UTC (permalink / raw)
  To: mah; +Cc: Eli Zaretskii, 61514-done, Stefan Monnier


The patch to xmltok.el has just been pushed to emacs-29 (0eddfa28eb), and 
I'm therefore closing this bug.

Thanks again for your bug report, Mark.  Now that the bugs in the regexp 
engine and in xmltok have been fixed, your file opens in a fraction of a 
second.  I suggest you also try to open a similar 40 MB or 400 MB one-line 
file, to see how Emacs 29 handles files with long lines.







* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-03-18 10:59                                         ` Gregory Heytings
@ 2023-03-18 11:10                                           ` Eli Zaretskii
  2023-03-18 15:06                                             ` Gregory Heytings
  2023-03-19  2:39                                           ` mah via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 75+ messages in thread
From: Eli Zaretskii @ 2023-03-18 11:10 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: mah, 61514, monnier

> Date: Sat, 18 Mar 2023 10:59:20 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: Eli Zaretskii <eliz@gnu.org>, 61514-done@debbugs.gnu.org, 
>     Stefan Monnier <monnier@iro.umontreal.ca>
> 
> 
> The patch to xmltok.el has just been pushed to emacs-29 (0eddfa28eb), and 
> I'm therefore closing this bug.

Thanks, but shouldn't we limit the END argument of with-restriction to
not exceed (point-max)?  I see no protection from this anywhere in the
subroutines called by with-restriction.
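
The clamping being asked for amounts to the following arithmetic, shown
here as a small Python sketch.  The function name and the 1-based
positions are assumptions chosen to mirror buffer positions; the actual
fix may be written differently.

```python
def restriction_bounds(point, point_max, width=10000):
    """Return (start, end) for a window of at most WIDTH characters.

    END is clamped so that it never exceeds POINT_MAX, the last valid
    position, even when fewer than WIDTH characters remain.
    """
    if not 1 <= point <= point_max:
        raise ValueError("point outside the accessible region")
    return point, min(point + width, point_max)
```

Near the end of a buffer the window simply shrinks instead of naming a
position past the end.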






* bug#61514: 30.0.50; sadistically long xml line hangs emacs
  2023-03-18 11:10                                           ` Eli Zaretskii
@ 2023-03-18 15:06                                             ` Gregory Heytings
  0 siblings, 0 replies; 75+ messages in thread
From: Gregory Heytings @ 2023-03-18 15:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mah, 61514, monnier


>> The patch to xmltok.el has just been pushed to emacs-29 (0eddfa28eb), 
>> and I'm therefore closing this bug.
>
> Thanks, but shouldn't we limit the END argument of with-restriction to 
> not exceed (point-max)?  I see no protection from this anywhere in the 
> subroutines called by with-restriction.
>

Good catch, thanks!  Now fixed (11592bcfda).







* bug#61514: 30.0.50;  sadistically long xml line hangs emacs
  2023-03-18 10:59                                         ` Gregory Heytings
  2023-03-18 11:10                                           ` Eli Zaretskii
@ 2023-03-19  2:39                                           ` mah via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 75+ messages in thread
From: mah via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-03-19  2:39 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: Eli Zaretskii, 61514-done, Stefan Monnier



I have a few processes running on my laptop, so the load is pretty high right now.  This means the file does not load in a fraction of a second on Emacs 29 for me at the moment.

It does load fairly quickly, though.  And editing close to the end of the file is pretty snappy.  Inserting characters closer to the beginning is slower.

But, yes, this bug is fixed.

I'm very pleased :)

 



end of thread, other threads:[~2023-03-19  2:39 UTC | newest]

Thread overview: 75+ messages
2023-02-14 21:02 bug#61514: 30.0.50; sadistically long xml line hangs emacs Mark A. Hershberger via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-14 22:05 ` Gregory Heytings
2023-02-15  1:04   ` Mark A. Hershberger
2023-02-15  8:39     ` Gregory Heytings
2023-02-15 10:24       ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 10:41         ` Gregory Heytings
2023-02-15 10:52           ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 10:59             ` Gregory Heytings
2023-02-15 11:52               ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 12:11                 ` Gregory Heytings
2023-02-15 12:54                   ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 13:31                     ` Gregory Heytings
2023-02-15 13:56                 ` Eli Zaretskii
2023-02-15 12:20       ` Dmitry Gutov
2023-02-15 13:58         ` Gregory Heytings
2023-02-15 14:17           ` Eli Zaretskii
2023-02-15 14:34             ` Gregory Heytings
2023-02-18 16:22 ` Eli Zaretskii
2023-02-18 17:06   ` Mark A. Hershberger
2023-02-18 17:58     ` Eli Zaretskii
2023-02-18 23:06   ` Gregory Heytings
2023-02-19  0:46     ` Gregory Heytings
2023-02-19  6:42       ` Eli Zaretskii
2023-02-19 23:12         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-19 23:48         ` Gregory Heytings
2023-02-19 23:58           ` Gregory Heytings
2023-02-20  2:05             ` Gregory Heytings
2023-02-20  4:24               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 11:28                 ` Gregory Heytings
2023-02-20 12:33               ` Eli Zaretskii
2023-02-20 12:31             ` Eli Zaretskii
2023-02-20 12:40               ` Gregory Heytings
2023-02-20 13:14                 ` Eli Zaretskii
2023-02-20 14:17                   ` Gregory Heytings
2023-02-20  0:14           ` Gregory Heytings
2023-02-20 12:32             ` Eli Zaretskii
2023-02-19 23:48   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 12:19     ` Eli Zaretskii
2023-02-20 13:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 13:54         ` Eli Zaretskii
2023-02-20 14:59           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 15:56             ` Gregory Heytings
2023-02-20 16:47               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 17:14                 ` Gregory Heytings
2023-02-20 17:34                   ` Gregory Heytings
2023-02-20 18:49                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 19:11                   ` Gregory Heytings
2023-02-20 19:29                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 19:37                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 20:13                       ` Gregory Heytings
2023-02-21 12:05                         ` Eli Zaretskii
2023-02-21 12:37                           ` Gregory Heytings
2023-02-21 13:07                             ` Eli Zaretskii
2023-02-21 14:38                               ` Gregory Heytings
2023-02-21 14:48                                 ` Eli Zaretskii
2023-02-21 15:25                                   ` Gregory Heytings
2023-02-21 15:44                                     ` Gregory Heytings
2023-02-21 16:58                                       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-18 10:59                                         ` Gregory Heytings
2023-03-18 11:10                                           ` Eli Zaretskii
2023-03-18 15:06                                             ` Gregory Heytings
2023-03-19  2:39                                           ` mah via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-21 13:24                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-21 13:35                               ` Gregory Heytings
2023-02-20 20:01                   ` Eli Zaretskii
2023-02-21  2:23                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-21  9:39                       ` Gregory Heytings
2023-02-21 12:44                         ` Eli Zaretskii
2023-02-20 17:04               ` Gregory Heytings
2023-02-20 14:06         ` Gregory Heytings
2023-02-20 14:16           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 14:24             ` Gregory Heytings
2023-02-20 15:02               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-19 23:38 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 12:41   ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
