unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
@ 2018-12-27 10:13 Vincent Lefevre
  2018-12-27 16:02 ` Eli Zaretskii
                   ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Vincent Lefevre @ 2018-12-27 10:13 UTC
  To: 33887


When I open a large XML file and immediately go to the end of the
file with '<ESC> >', Emacs hangs for several seconds. For instance,
on /usr/share/xml/iso-codes/iso_639-3.xml from iso-codes in Debian
(a 1-MB file), it takes 5 seconds. On a 4-MB personal XML file, it
takes 15 seconds.

This is a regression: Emacs 25 did not hang at all.


In GNU Emacs 26.1 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.2)
 of 2018-12-26, modified by Debian built on x86-ubc-01
Windowing system distributor 'The X.Org Foundation', version 11.0.12003000
System Description:	Debian GNU/Linux buster/sid

Recent messages:
Loading /etc/emacs/site-start.d/50latex-cjk-common.el (source)...done
Loading /etc/emacs/site-start.d/50latex-cjk-thai.el (source)...done
Loading /etc/emacs/site-start.d/50maxima-emacs.el (source)...done
Loading /etc/emacs/site-start.d/50psvn.el (source)...done
Loading /etc/emacs/site-start.d/50python-docutils.el (source)...done
Loading /etc/emacs/site-start.d/50texlive-lang-english.el (source)...done
Loading /etc/emacs/site-start.d/50why3.el (source)...done
Loading /home/vinc17/share/emacs/site-lisp/mutteditor.el (source)...done
Loading time...done
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured using:
 'configure --build x86_64-linux-gnu --prefix=/usr
 --sharedstatedir=/var/lib --libexecdir=/usr/lib
 --localstatedir=/var/lib --infodir=/usr/share/info
 --mandir=/usr/share/man --enable-libsystemd --with-pop=yes
 --enable-locallisppath=/etc/emacs:/usr/local/share/emacs/26.1/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/26.1/site-lisp:/usr/share/emacs/site-lisp
 --with-sound=alsa --without-gconf --with-mailutils --build
 x86_64-linux-gnu --prefix=/usr --sharedstatedir=/var/lib
 --libexecdir=/usr/lib --localstatedir=/var/lib
 --infodir=/usr/share/info --mandir=/usr/share/man --enable-libsystemd
 --with-pop=yes
 --enable-locallisppath=/etc/emacs:/usr/local/share/emacs/26.1/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/26.1/site-lisp:/usr/share/emacs/site-lisp
 --with-sound=alsa --without-gconf --with-mailutils --with-x=yes
 --with-x-toolkit=gtk3 --with-toolkit-scroll-bars 'CFLAGS=-g -O2
 -fdebug-prefix-map=/build/emacs-3ThesY/emacs-26.1+1=.
 -fstack-protector-strong -Wformat -Werror=format-security -Wall'
 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' LDFLAGS=-Wl,-z,relro'

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS NOTIFY
ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 THREADS LIBSYSTEMD LCMS2

Important settings:
  value of $LC_COLLATE: POSIX
  value of $LC_CTYPE: en_US.UTF-8
  value of $LC_TIME: en_DK
  value of $LANG: POSIX
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  display-time-mode: t
  show-paren-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
/usr/share/emacs/site-lisp/llvm-3.5/tablegen-mode hides /usr/share/emacs/site-lisp/llvm-3.6/tablegen-mode
/usr/share/emacs/site-lisp/llvm-3.5/llvm-mode hides /usr/share/emacs/site-lisp/llvm-3.6/llvm-mode
/usr/share/emacs/site-lisp/llvm-3.5/emacs hides /usr/share/emacs/site-lisp/llvm-3.6/emacs
/usr/share/emacs/site-lisp/llvm-3.5/tablegen-mode hides /usr/share/emacs/site-lisp/llvm-3.7/tablegen-mode
/usr/share/emacs/site-lisp/llvm-3.5/llvm-mode hides /usr/share/emacs/site-lisp/llvm-3.7/llvm-mode
/usr/share/emacs/site-lisp/llvm-3.5/emacs hides /usr/share/emacs/site-lisp/llvm-3.7/emacs
/usr/share/emacs/site-lisp/llvm-3.5/tablegen-mode hides /usr/share/emacs/site-lisp/llvm-3.8/tablegen-mode
/usr/share/emacs/site-lisp/llvm-3.5/llvm-mode hides /usr/share/emacs/site-lisp/llvm-3.8/llvm-mode
/usr/share/emacs/site-lisp/llvm-3.5/emacs hides /usr/share/emacs/site-lisp/llvm-3.8/emacs
/usr/share/emacs/site-lisp/llvm-3.5/tablegen-mode hides /usr/share/emacs/site-lisp/llvm-3.9/tablegen-mode
/usr/share/emacs/site-lisp/llvm-3.5/llvm-mode hides /usr/share/emacs/site-lisp/llvm-3.9/llvm-mode
/usr/share/emacs/site-lisp/llvm-3.5/emacs hides /usr/share/emacs/site-lisp/llvm-3.9/emacs
/usr/share/emacs/site-lisp/llvm-3.5/tablegen-mode hides /usr/share/emacs/site-lisp/llvm-4.0/tablegen-mode
/usr/share/emacs/site-lisp/llvm-3.5/llvm-mode hides /usr/share/emacs/site-lisp/llvm-4.0/llvm-mode
/usr/share/emacs/site-lisp/llvm-3.5/emacs hides /usr/share/emacs/site-lisp/llvm-4.0/emacs
/usr/share/emacs/site-lisp/rst hides /usr/share/emacs/26.1/lisp/textmodes/rst
/usr/share/emacs/site-lisp/latex-cjk-thai/thai-word hides /usr/share/emacs/26.1/lisp/language/thai-word

Features:
(shadow sort mail-extr warnings emacsbug message rmc puny seq byte-opt
gv bytecomp byte-compile cconv dired dired-loaddefs format-spec rfc822
mml easymenu mml-sec password-cache epa derived epg epg-config gnus-util
rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util mail-prsvr mail-utils elec-pair time cus-start cus-load paren
cc-styles cc-align cc-engine cc-vars cc-defs edmacro kmacro cl-loaddefs
cl-lib time-date mule-util tooltip eldoc electric uniquify ediff-hook
vc-hooks lisp-float-type mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode elisp-mode lisp-mode prog-mode register page
menu-bar rfn-eshadow isearch timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core term/tty-colors frame cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote dbusbind inotify lcms2
dynamic-setting system-font-setting font-render-setting move-toolbar gtk
x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 118562 10618)
 (symbols 48 23199 1)
 (miscs 40 54 133)
 (strings 32 34944 2101)
 (string-bytes 1 946046)
 (vectors 16 15937)
 (vector-slots 8 510844 4784)
 (floats 8 56 97)
 (intervals 56 279 0)
 (buffers 992 12))

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2018-12-27 10:13 bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode Vincent Lefevre
@ 2018-12-27 16:02 ` Eli Zaretskii
  2018-12-27 16:39   ` Stefan Monnier
  2019-01-17 22:57   ` Stefan Monnier
  2019-01-08 22:11 ` Fernando Jascovich
  2019-05-15 23:53 ` Noam Postavsky
  2 siblings, 2 replies; 42+ messages in thread
From: Eli Zaretskii @ 2018-12-27 16:02 UTC
  To: Vincent Lefevre, Stefan Monnier; +Cc: 33887

> From: Vincent Lefevre <vincent@vinc17.net>
> Date: Thu, 27 Dec 2018 11:13:06 +0100
> 
> When I open a large XML file and immediately go to the end of the
> file with '<ESC> >', Emacs hangs for several seconds. For instance,
> on /usr/share/xml/iso-codes/iso_639-3.xml from iso-codes in Debian
> (a 1-MB file), it takes 5 seconds. On a 4-MB personal XML file, it
> takes 15 seconds.
> 
> This is a regression: Emacs 25 did not hang at all.

Confirmed, thanks.

The profile (see below) blames syntax-ppss called by
sgml-syntax-propertize, so I suspect commit 0055190, which added
sgml-syntax-propertize-inside to sgml-syntax-propertize.

CC'ing Stefan who made those changes.

Here's the profile:

  - command-execute                                                 532  77%
   - call-interactively                                             532  77%
    - funcall-interactively                                         522  75%
     - end-of-buffer                                                500  72%
      - recenter                                                    496  71%
       - jit-lock-function                                          496  71%
	- jit-lock-fontify-now                                      496  71%
	 - jit-lock--run-functions                                  496  71%
	  - run-hook-wrapped                                        496  71%
	   - #<compiled 0x200000000b3a7fd0>                         496  71%
	    - font-lock-fontify-region                              496  71%
	     - font-lock-default-fontify-region                     496  71%
	      - nxml-extend-region                                  496  71%
	       - skip-syntax-forward                                496  71%
		- internal--syntax-propertize                       496  71%
		 - syntax-propertize                                496  71%
		  - sgml-syntax-propertize                          490  71%
		     syntax-ppss                                    445  64%
	push-mark                                                     1   0%
     - find-file                                                     20   2%
      - find-file-noselect                                           20   2%
       - find-file-noselect-1                                        19   2%
	- after-find-file                                            17   2%
	 - normal-mode                                               17   2%
	  - set-auto-mode                                            17   2%
	   - set-auto-mode-0                                         17   2%
	    - xml-mode                                               17   2%
	     - byte-code                                             14   2%
	      - require                                              12   1%
	       - byte-code                                           11   1%
		- require                                            10   1%
		 - byte-code                                          9   1%
		  - require                                           6   0%
		   - byte-code                                        6   0%
		    - cl-generic-define-method                        4   0%
		     - cl--generic-make-function                      4   0%
		      - cl--generic-make-next-function                  4   0%
		       - cl--generic-get-dispatcher                   4   0%
			- byte-compile                                3   0%
			   byte-code                                  1   0%
			 - #<compiled 0x200000000b325048>                  1   0%
			    byte-compile-top-level                    1   0%
		  - custom-declare-variable                           1   0%
		   - custom-initialize-reset                          1   0%
		    - eval                                            1   0%
		     - funcall                                        1   0%
		      - #<compiled 0x200000000b3c88b8>                  1   0%
		       - executable-find                              1   0%
			  locate-file                                 1   0%
		 file-truename                                        1   0%
	     - rng-nxml-mode-init                                     2   0%
	      - rng-validate-mode                                     2   0%
	       - rng-auto-set-schema                                  2   0%
		- rng-locate-schema-file                              2   0%
		 - rng-locate-schema-file-using                       2   0%
		  - rng-get-parsed-schema-locating-file                  2   0%
		   - rng-parse-schema-locating-file                   1   0%
		    - rng-parse-validate-file                         1   0%
		     - nxml-parse-instance                            1   0%
			nxml-parse-instance-1                         1   0%
	     - file-truename                                          1   0%
	      - file-truename                                         1   0%
	       - file-truename                                        1   0%
		  file-truename                                       1   0%
	- insert-file-contents                                        1   0%
	   xml-find-file-coding-system                                1   0%
     - execute-extended-command                                       1   0%
      - sit-for                                                       1   0%
	 redisplay                                                    1   0%
     - minibuffer-complete                                            1   0%
      - completion-in-region                                          1   0%
       - completion--in-region                                        1   0%
	- #<compiled 0x2000000001b04c20>                              1   0%
	 - apply                                                      1   0%
	  - #<compiled 0x20000000013baac8>                            1   0%
	   - completion--in-region-1                                  1   0%
	    - completion--do-completion                               1   0%
	     - completion-try-completion                              1   0%
	      - completion--nth-completion                            1   0%
	       - completion--some                                     1   0%
		- #<compiled 0x2000000001b0bd20>                      1   0%
		 - completion-basic-try-completion                    1   0%
		  - try-completion                                    1   0%
		     completion-file-name-table                       1   0%
    - byte-code                                                      10   1%
     - read-extended-command                                          9   1%
      - completing-read                                               9   1%
       - completing-read-default                                      9   1%
	  read-from-minibuffer                                        9   1%
     - find-file-read-args                                            1   0%
      - read-file-name                                                1   0%
       - read-file-name-default                                       1   0%
	- completing-read                                             1   0%
	 - completing-read-default                                    1   0%
	  - read-from-minibuffer                                      1   0%
	   - redisplay_internal (C function)                          1   0%
	      find-image                                              1   0%
  - ...                                                             158  22%
     Automatic GC                                                   156  22%
   - macroexp--all-forms                                              1   0%
    - macroexp--expand-all                                            1   0%
     - #<compiled 0x2000000001375130>                                 1   0%
      - macroexp--all-forms                                           1   0%
       - macroexp--expand-all                                         1   0%
	- macroexp--all-forms                                         1   0%
	 - macroexp--expand-all                                       1   0%
	  - #<compiled 0x2000000001375130>                            1   0%
	   - macroexp--all-forms                                      1   0%
	    - macroexp--expand-all                                    1   0%
	     - #<compiled 0x2000000001375068>                         1   0%
	      - macroexp--all-forms                                   1   0%
	       - macroexp--expand-all                                 1   0%
		- macroexp-macroexpand                                1   0%
		 - macroexpand                                        1   0%
		    #<compiled 0x20000000013f0600>                    1   0%
   - rng-compute-start-tag-open-deriv                                 1   0%
    - rng-element-get-child                                           1   0%
     - rng-compile                                                    1   0%
      - apply                                                         1   0%
       - rng-compile-group                                            1   0%
	- mapcar                                                      1   0%
	 - rng-compile                                                1   0%
	  - apply                                                     1   0%
	   - rng-compile-attribute                                    1   0%
	    - rng-compile                                             1   0%
	     - apply                                                  1   0%
	      - rng-compile-ref                                       1   0%
	       - rng-compile                                          1   0%
		- apply                                               1   0%
		 - rng-compile-data                                   1   0%
		    rng-compile-dt                                    1   0%


* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2018-12-27 16:02 ` Eli Zaretskii
@ 2018-12-27 16:39   ` Stefan Monnier
  2018-12-27 16:43     ` Eli Zaretskii
  2019-01-17 22:57   ` Stefan Monnier
  1 sibling, 1 reply; 42+ messages in thread
From: Stefan Monnier @ 2018-12-27 16:39 UTC
  To: Eli Zaretskii; +Cc: Vincent Lefevre, 33887

>> When I open a large XML file and immediately go to the end of the
>> file with '<ESC> >', Emacs hangs for several seconds. For instance,
>> on /usr/share/xml/iso-codes/iso_639-3.xml from iso-codes in Debian
>> (a 1-MB file), it takes 5 seconds. On a 4-MB personal XML file, it
>> takes 15 seconds.
>> 
>> This is a regression: Emacs 25 did not hang at all.
>
> Confirmed, thanks.
>
> The profile (see below) blames syntax-ppss called by
> sgml-syntax-propertize, so I suspect commit 0055190, which added
> sgml-syntax-propertize-inside to sgml-syntax-propertize.

Sounds right, but I'm not sure what to do about this.
I do wonder why so much time is spent in syntax-ppss, which is
generally expected to be relatively fast.
Maybe sgml-syntax-propertize is called too often (I see it's mostly
called from skip-syntax-forward; maybe we should call syntax-propertize
explicitly beforehand with a more distant position so
sgml-syntax-propertize is called just once).
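A toy cost model of the hypothesis above, written in Python purely for illustration (the function `scanned_chars` and its cost model are hypothetical, not Emacs's implementation). It assumes each propertize call must first know the parse state at its start position; if that state cannot be reused from the previous call and has to be recomputed by scanning from the buffer start, total work grows quadratically with the number of calls, while a single call over the whole range scans each character once.

```python
# Toy model (hypothetical, not Emacs code): count the characters "scanned"
# when syntax properties are applied in many small chunks versus in one
# call covering the same region.


def scanned_chars(buffer_size: int, chunk_size: int, reuse_state: bool) -> int:
    """Return the total characters scanned to propertize [0, buffer_size).

    Each chunk call first needs the parse state at the chunk start.
    With reuse_state=False we assume (hypothetically) that this state is
    recomputed by scanning from position 0; with reuse_state=True the
    state carried over from the previous chunk is reused at no cost.
    """
    total = 0
    start = 0
    while start < buffer_size:
        end = min(start + chunk_size, buffer_size)
        if not reuse_state:
            total += start        # rescan from 0 to reach the chunk start
        total += end - start      # scan the chunk itself
        start = end
    return total


if __name__ == "__main__":
    size = 1_000_000  # roughly the 1-MB iso_639-3.xml from the report
    for chunk in (500, 100_000, size):
        print(chunk,
              scanned_chars(size, chunk, reuse_state=False),
              scanned_chars(size, chunk, reuse_state=True))
```

With state reuse the chunk size makes no difference; without it, splitting 1 MB into 500-character chunks scans about a thousand times more characters than one call. In this model, calling syntax-propertize once with a distant position only pays off if the per-call setup is what is expensive.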


        Stefan


* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2018-12-27 16:39   ` Stefan Monnier
@ 2018-12-27 16:43     ` Eli Zaretskii
  2018-12-27 17:32       ` Stefan Monnier
  0 siblings, 1 reply; 42+ messages in thread
From: Eli Zaretskii @ 2018-12-27 16:43 UTC
  To: Stefan Monnier; +Cc: vincent, 33887

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: Vincent Lefevre <vincent@vinc17.net>, 33887@debbugs.gnu.org
> Date: Thu, 27 Dec 2018 11:39:06 -0500
> 
> > The profile (see below) blames syntax-ppss called by
> > sgml-syntax-propertize, so I suspect commit 0055190, which added
> > sgml-syntax-propertize-inside to sgml-syntax-propertize.
> 
> Sounds right, but I'm not sure what to do about this.
> I do wonder why so much time is spent in syntax-ppss, which is
> generally expected to be relatively fast.

Why was sgml-syntax-propertize-inside added?  Is its effect an
absolute must, or merely a nice-to-have feature?  If the latter,
perhaps a defcustom that could disable that call will be an okay
solution, at least as a stopgap?

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2018-12-27 16:43     ` Eli Zaretskii
@ 2018-12-27 17:32       ` Stefan Monnier
  2018-12-27 17:47         ` Eli Zaretskii
  2018-12-27 18:43         ` Vincent Lefevre
  0 siblings, 2 replies; 42+ messages in thread
From: Stefan Monnier @ 2018-12-27 17:32 UTC
  To: Eli Zaretskii; +Cc: vincent, 33887

> Why was sgml-syntax-propertize-inside added?  Is its effect an
> absolute must, or merely a nice-to-have feature?

It's needed for correctness in the presence of <?...?> or <![CDATA[...]]>

> If the latter, perhaps a defcustom that could disable that call will
> be an okay solution, at least as a stopgap?

I don't think it should be terribly expensive, so I'd rather first try
and better understand the performance issue.


        Stefan

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2018-12-27 17:32       ` Stefan Monnier
@ 2018-12-27 17:47         ` Eli Zaretskii
  2018-12-27 18:43         ` Vincent Lefevre
  1 sibling, 0 replies; 42+ messages in thread
From: Eli Zaretskii @ 2018-12-27 17:47 UTC
  To: Stefan Monnier; +Cc: vincent, 33887

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: vincent@vinc17.net, 33887@debbugs.gnu.org
> Date: Thu, 27 Dec 2018 12:32:21 -0500
> 
> > If the latter, perhaps a defcustom that could disable that call will
> > be an okay solution, at least as a stopgap?
> 
> I don't think it should be terribly expensive, so I'd rather first try
> and better understand the performance issue.

Sure.  I thought you already did ;-)

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2018-12-27 17:32       ` Stefan Monnier
  2018-12-27 17:47         ` Eli Zaretskii
@ 2018-12-27 18:43         ` Vincent Lefevre
  2018-12-28 17:18           ` Stefan Monnier
  1 sibling, 1 reply; 42+ messages in thread
From: Vincent Lefevre @ 2018-12-27 18:43 UTC
  To: Stefan Monnier; +Cc: 33887

On 2018-12-27 12:32:21 -0500, Stefan Monnier wrote:
> > Why was sgml-syntax-propertize-inside added?  Is its effect an
> > absolute must, or merely a nice-to-have feature?
> 
> It's needed for correctness in the presence of <?...?> or <![CDATA[...]]>

I use both in some of my XML files and I have never found any issue
with them. Or perhaps this is just for particular cases?

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2018-12-27 18:43         ` Vincent Lefevre
@ 2018-12-28 17:18           ` Stefan Monnier
  0 siblings, 0 replies; 42+ messages in thread
From: Stefan Monnier @ 2018-12-28 17:18 UTC (permalink / raw)
  To: Vincent Lefevre; +Cc: 33887

>> > Why was sgml-syntax-propertize-inside added?  Is its effect an
>> > absolute must, or merely a nice-to-have feature?
>> It's needed for correctness in the presence of <?...?> or <![CDATA[...]]>
> I use both in some of my XML files and I have never found any issue
> with them. Or perhaps this is just for particular cases?

Yes, it only makes a real difference when the content of those things
ends up confusing the parser (e.g. it looks like an unclosed tag, or
things along these lines).


        Stefan

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2018-12-27 10:13 bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode Vincent Lefevre
  2018-12-27 16:02 ` Eli Zaretskii
@ 2019-01-08 22:11 ` Fernando Jascovich
  2019-01-10 15:09   ` Eli Zaretskii
  2019-05-15 23:53 ` Noam Postavsky
  2 siblings, 1 reply; 42+ messages in thread
From: Fernando Jascovich @ 2019-01-08 22:11 UTC (permalink / raw)
  To: 33887

Hi everyone, this is my first email to bug-gnu-emacs, so please let me
know if I am making any mistakes.
For no particular reason, I picked this bug as a way to start getting
to know the Emacs code.
After reproducing the bug, I found that the performance issue was indeed
introduced by commit 0055190174, but not because of the addition of
`sgml-syntax-propertize-inside`.
The problem is with the last rule:
```
("\"" (0 (if (prog1 (zerop (car (syntax-ppss (match-beginning 0))))
                    (goto-char (match-end 0)))
                  (string-to-syntax "."))))
```
I can't see the real effect of this rule.  I tested XML parsing without
it and it works fine: double quotes inside tags are still marked as
expected, and the performance issue goes away.
Do we need to target double quotes outside tags explicitly?

-- 
Fernando Jascovich
developer
m: +54 9 3548 63 9833
github: https://github.com/fernando-jascovich/
linkedin: https://www.linkedin.com/in/fernandojascovich/

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-01-08 22:11 ` Fernando Jascovich
@ 2019-01-10 15:09   ` Eli Zaretskii
  2019-01-17 23:25     ` Stefan Monnier
  0 siblings, 1 reply; 42+ messages in thread
From: Eli Zaretskii @ 2019-01-10 15:09 UTC (permalink / raw)
  To: Fernando Jascovich, Stefan Monnier; +Cc: 33887

> From: Fernando Jascovich <fernando.ej@gmail.com>
> Date: Tue, 08 Jan 2019 19:11:02 -0300
> 
> Hi everyone, this is my first email to bug-gnu-emacs, so please let me
> know if I am making any mistakes.
> For no particular reason, I picked this bug as a way to start getting
> to know the Emacs code.
> After reproducing the bug, I found that the performance issue was indeed
> introduced by commit 0055190174, but not because of the addition of
> `sgml-syntax-propertize-inside`.
> The problem is with the last rule:
> ```
> ("\"" (0 (if (prog1 (zerop (car (syntax-ppss (match-beginning 0))))
>                     (goto-char (match-end 0)))
>                   (string-to-syntax "."))))
> ```
> I can't see the real effect of this rule.  I tested XML parsing without
> it and it works fine: double quotes inside tags are still marked as
> expected, and the performance issue goes away.
> Do we need to target double quotes outside tags explicitly?

Stefan, any comments?

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2018-12-27 16:02 ` Eli Zaretskii
  2018-12-27 16:39   ` Stefan Monnier
@ 2019-01-17 22:57   ` Stefan Monnier
  1 sibling, 0 replies; 42+ messages in thread
From: Stefan Monnier @ 2019-01-17 22:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Vincent Lefevre, 33887

> The profile (see below) blames syntax-ppss called by
> sgml-syntax-propertize, so I suspect commit 0055190, which added
> sgml-syntax-propertize-inside to sgml-syntax-propertize.

Hmm... actually, the syntax-ppss calls that take time are made directly
from within sgml-syntax-propertize rather than from within
sgml-syntax-propertize-inside, which doesn't even appear in your profile
(in my profile I get 8099 units of time in sgml-syntax-propertize, of
which 7611 are in syntax-ppss and only 77 in sgml-syntax-propertize-inside).
The problem seems to come from the following syntax propertize rule:

     ;; Double quotes outside of tags should not introduce strings.
     ;; Be careful to call `syntax-ppss' on a position before the one we're
     ;; going to change, so as not to need to flush the data we just computed.
     ("\"" (0 (if (prog1 (zerop (car (syntax-ppss (match-beginning 0))))
                    (goto-char (match-end 0)))
                  (string-to-syntax "."))))

If I comment it out, the delay is *much* smaller.

The problem is that " characters are very common in XML files, so
the regexp matches often and we end up calling syntax-ppss for each
one.

I'm trying to figure out how to avoid calling syntax-ppss for every
" character.  I'm thinking of looking at pairs of " chars and only
doing extra work if there's a < or > between the two.


        Stefan
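Stefan's pairing idea can be illustrated with a small stand-alone sketch. The helper `my-count-quote-checks` below is hypothetical (it is not part of sgml-mode): it counts how many quote matches would need a syntax-ppss call under the original rule, which checks every quote, and under a pairing heuristic that borrows the regexp and same-quote test from the mixed-quote fix later in this thread. It only counts matches and applies no syntax properties.

```elisp
(defun my-count-quote-checks (text)
  "Return (OLD . NEW) counts of `syntax-ppss' calls for quotes in TEXT.
OLD assumes every quote is checked, as the original rule did.  NEW
skips a quote that is closed by the same kind of quote before any
< or > (hypothetical helper; it only counts matches)."
  (with-temp-buffer
    (insert text)
    (let ((old 0) (new 0))
      ;; Original rule: one check per quote character.
      (goto-char (point-min))
      (while (re-search-forward "[\"']" nil t)
        (setq old (1+ old)))
      ;; Pairing heuristic: a quote, then anything but < > or quotes,
      ;; then the first < > or quote that follows.
      (goto-char (point-min))
      (while (re-search-forward "\\([\"']\\)[^<>\"']*[<>\"']" nil t)
        (unless (eq (char-after (match-beginning 1)) (char-before))
          ;; Not a plain matched pair: this is where the real rule calls
          ;; `syntax-ppss'.  Step back so the terminator is rescanned.
          (setq new (1+ new))
          (goto-char (1- (match-end 0)))))
      (cons old new))))
```

On `<a href="x">"hi"</a>` it returns (4 . 0): all four quotes form plain pairs with no < or > inside, so none of them needs a syntax-ppss call under the heuristic.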

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-01-10 15:09   ` Eli Zaretskii
@ 2019-01-17 23:25     ` Stefan Monnier
  0 siblings, 0 replies; 42+ messages in thread
From: Stefan Monnier @ 2019-01-17 23:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Fernando Jascovich, 33887

>> From: Fernando Jascovich <fernando.ej@gmail.com>
>> Date: Tue, 08 Jan 2019 19:11:02 -0300
>> 
>> Hi everyone, this is my first email to bug-gnu-emacs, so please let me
>> know if I am making any mistakes.
>> For no particular reason, I picked this bug as a way to start getting
>> to know the Emacs code.
>> After reproducing the bug, I found that the performance issue was indeed
>> introduced by commit 0055190174, but not because of the addition of
>> `sgml-syntax-propertize-inside`.
>> The problem is with the last rule:
>> ```
>> ("\"" (0 (if (prog1 (zerop (car (syntax-ppss (match-beginning 0))))
>>                     (goto-char (match-end 0)))
>>                   (string-to-syntax "."))))
>> ```
>> I can't see the real effect of this rule.  I tested XML parsing without
>> it and it works fine: double quotes inside tags are still marked as
>> expected, and the performance issue goes away.
>> Do we need to target double quotes outside tags explicitly?
>
> Stefan, any comments?

Yes, he's exactly right.

I just pushed a patch to master which should significantly reduce
this delay.


        Stefan

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2018-12-27 10:13 bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode Vincent Lefevre
  2018-12-27 16:02 ` Eli Zaretskii
  2019-01-08 22:11 ` Fernando Jascovich
@ 2019-05-15 23:53 ` Noam Postavsky
  2019-05-16 10:54   ` Vincent Lefevre
                     ` (2 more replies)
  2 siblings, 3 replies; 42+ messages in thread
From: Noam Postavsky @ 2019-05-15 23:53 UTC (permalink / raw)
  To: Vincent Lefevre; +Cc: 33887

Vincent Lefevre <vincent@vinc17.net> writes:

> This is a regression: Emacs 25 did not hang at all.

Should we backport Stefan's fix to emacs-26?  Or specifically, backport
[1: e7e92dc5d2], which is Stefan's fix on top of my fix for the
loss-of-single-quote-fontification bug (Bug#35381).

[1: e7e92dc5d2]: 2019-05-15 19:04:14 -0400
  Fix merge of sgml-syntax-propertize-rules
  https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=e7e92dc5d24ac3bcde69732bab6a6c3c0d9de97b

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-15 23:53 ` Noam Postavsky
@ 2019-05-16 10:54   ` Vincent Lefevre
  2019-05-16 12:15   ` Noam Postavsky
  2019-05-16 14:01   ` Eli Zaretskii
  2 siblings, 0 replies; 42+ messages in thread
From: Vincent Lefevre @ 2019-05-16 10:54 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: 33887

Hi,

On 2019-05-15 19:53:08 -0400, Noam Postavsky wrote:
> Vincent Lefevre <vincent@vinc17.net> writes:
> 
> > This is a regression: Emacs 25 did not hang at all.
> 
> Should we backport Stefan's fix to emacs-26?  Or specifically, backport
> [1: e7e92dc5d2], which is Stefan's fix on top of my fix for the
> loss-of-single-quote-fontification bug (Bug#35381).
> 
> [1: e7e92dc5d2]: 2019-05-15 19:04:14 -0400
>   Fix merge of sgml-syntax-propertize-rules
>   https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=e7e92dc5d24ac3bcde69732bab6a6c3c0d9de97b

It would be nice if this could be fixed quickly in emacs-26, in the
hope that it could also be fixed in Debian before the next stable
release.

(I'm still using Emacs 25 because of this bug.)


* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-15 23:53 ` Noam Postavsky
  2019-05-16 10:54   ` Vincent Lefevre
@ 2019-05-16 12:15   ` Noam Postavsky
  2019-05-17 21:36     ` Vincent Lefevre
  2019-05-16 14:01   ` Eli Zaretskii
  2 siblings, 1 reply; 42+ messages in thread
From: Noam Postavsky @ 2019-05-16 12:15 UTC (permalink / raw)
  To: Vincent Lefevre; +Cc: 33887

[-- Attachment #1: Type: text/plain, Size: 336 bytes --]

Noam Postavsky <npostavs@gmail.com> writes:

> [1: e7e92dc5d2]: 2019-05-15 19:04:14 -0400
>   Fix merge of sgml-syntax-propertize-rules
>   https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=e7e92dc5d24ac3bcde69732bab6a6c3c0d9de97b

Uh, I goofed that one; Stefan fixed it [2: 9a74e5666b].  The corrected patch would be as follows:


[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 3164 bytes --]

From 2221c244ee01c4c336ec860cf52a1ef37111ff19 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Wed, 15 May 2019 18:51:30 -0400
Subject: [PATCH] Backport sgml-syntax-propertize-rules speedup (Bug#33887)

* lisp/textmodes/sgml-mode.el (sgml-syntax-propertize-rules): Reapply
2019-01-17 "* lisp/textmodes/sgml-mode.el: Try and fix bug#33887."
taking into account 2019-05-09 "Recognize single quote attribute
values in nxml and sgml (Bug#35381)" which means we have to handle
single quotes as well.
* test/lisp/textmodes/sgml-mode-tests.el (sgml-tests--quotes-syntax):
New test.
---
 lisp/textmodes/sgml-mode.el            | 21 +++++++++++++++------
 test/lisp/textmodes/sgml-mode-tests.el |  7 +++++++
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/lisp/textmodes/sgml-mode.el b/lisp/textmodes/sgml-mode.el
index 128e58810e..1c307d12b0 100644
--- a/lisp/textmodes/sgml-mode.el
+++ b/lisp/textmodes/sgml-mode.el
@@ -347,12 +347,21 @@ sgml-font-lock-keywords
      ("--[ \t\n]*\\(>\\)" (1 "> b"))
      ("\\(<\\)[?!]" (1 (prog1 "|>"
                          (sgml-syntax-propertize-inside end))))
-     ;; Quotes outside of tags should not introduce strings.
-     ;; Be careful to call `syntax-ppss' on a position before the one we're
-     ;; going to change, so as not to need to flush the data we just computed.
-     ("[\"']" (0 (if (prog1 (zerop (car (syntax-ppss (match-beginning 0))))
-                       (goto-char (match-end 0)))
-                     (string-to-syntax ".")))))))
+     ;; Quotes outside of tags should not introduce strings which end up
+     ;; hiding tags.  We used to test every quote and mark it as "."
+     ;; if it's outside of tags, but there are too many quotes and
+     ;; the resulting number of calls to syntax-ppss made it too slow
+     ;; (bug#33887), so we're now careful to leave alone any pair
+     ;; of quotes that doesn't hold a < or > char, which is the vast majority.
+     ("\\(?:\\(?1:\"\\)[^\"<>]*[<>\"]\\|\\(?1:'\\)[^'<>]*[<>']\\)"
+      (1 (unless (memq (char-before) '(?\' ?\"))
+           ;; Be careful to call `syntax-ppss' on a position before the one
+           ;; we're going to change, so as not to need to flush the data we
+           ;; just computed.
+           (if (prog1 (zerop (car (syntax-ppss (match-beginning 0))))
+                 (goto-char (1- (match-end 0))))
+               (string-to-syntax ".")))))
+     )))
 
 (defun sgml-syntax-propertize (start end)
   "Syntactic keywords for `sgml-mode'."
diff --git a/test/lisp/textmodes/sgml-mode-tests.el b/test/lisp/textmodes/sgml-mode-tests.el
index 7318a667b3..1c501abf38 100644
--- a/test/lisp/textmodes/sgml-mode-tests.el
+++ b/test/lisp/textmodes/sgml-mode-tests.el
@@ -130,5 +130,12 @@ sgml-with-content
    (sgml-delete-tag 1)
    (should (string= "Winter is comin'" (buffer-string)))))
 
+(ert-deftest sgml-tests--quotes-syntax ()
+  (with-temp-buffer
+    (sgml-mode)
+    (insert "a\"b <tag>c'd</tag>")
+    (should (= 1 (car (syntax-ppss (1- (point-max))))))
+    (should (= 0 (car (syntax-ppss (point-max)))))))
+
 (provide 'sgml-mode-tests)
 ;;; sgml-mode-tests.el ends here
-- 
2.11.0


[-- Attachment #3: Type: text/plain, Size: 215 bytes --]


[2: 9a74e5666b]: 2019-05-15 22:21:36 -0400
  * lisp/textmodes/sgml-mode.el (sgml-syntax-propertize-rules): Fix typo
  https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=9a74e5666b022098c63d0047c0df90c66e1aa64a


* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-15 23:53 ` Noam Postavsky
  2019-05-16 10:54   ` Vincent Lefevre
  2019-05-16 12:15   ` Noam Postavsky
@ 2019-05-16 14:01   ` Eli Zaretskii
  2 siblings, 0 replies; 42+ messages in thread
From: Eli Zaretskii @ 2019-05-16 14:01 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: vincent, 33887

> From: Noam Postavsky <npostavs@gmail.com>
> Date: Wed, 15 May 2019 19:53:08 -0400
> Cc: 33887@debbugs.gnu.org
> 
> Vincent Lefevre <vincent@vinc17.net> writes:
> 
> > This is a regression: Emacs 25 did not hang at all.
> 
> Should we backport Stefan's fix to emacs-26?  Or specifically, backport
> [1: e7e92dc5d2], which is Stefan's fix on top of my fix for the
> loss-of-single-quote-fontification bug (Bug#35381).
> 
> [1: e7e92dc5d2]: 2019-05-15 19:04:14 -0400
>   Fix merge of sgml-syntax-propertize-rules
>   https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=e7e92dc5d24ac3bcde69732bab6a6c3c0d9de97b

I'd like to leave this fix on master for a while, so that we could
make sure it has no adverse consequences.  Can we revisit this in a
month's time, say?

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-16 12:15   ` Noam Postavsky
@ 2019-05-17 21:36     ` Vincent Lefevre
  2019-05-18  4:15       ` Noam Postavsky
  0 siblings, 1 reply; 42+ messages in thread
From: Vincent Lefevre @ 2019-05-17 21:36 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: 33887

[-- Attachment #1: Type: text/plain, Size: 793 bytes --]

On 2019-05-16 08:15:58 -0400, Noam Postavsky wrote:
> The corrected patch would be as follows:
[...]

I've tried the combination of

  ca14dd1d4628094dd33d5d94694dcf5f29e843b8
  7dab3ee7ab54b3c2e7bc24170376054786c01d6f

and this patch against Debian's current source package.

Emacs no longer hangs, but I get incorrect highlighting,
for instance on the following XML file.

<root>
<!-- comment -->
<a>"a'</a>
<!-- comment -->
</root>

Highlighting starts to be wrong at the single-quote character.
I've attached a screenshot obtained with the -Q option.

Did I miss anything?

[-- Attachment #2: nxml.png --]
[-- Type: image/png, Size: 5294 bytes --]


* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-17 21:36     ` Vincent Lefevre
@ 2019-05-18  4:15       ` Noam Postavsky
  2019-05-18 14:47         ` Vincent Lefevre
  0 siblings, 1 reply; 42+ messages in thread
From: Noam Postavsky @ 2019-05-18  4:15 UTC (permalink / raw)
  To: Vincent Lefevre; +Cc: Stefan Monnier, 33887

[-- Attachment #1: Type: text/plain, Size: 634 bytes --]

Vincent Lefevre <vincent@vinc17.net> writes:

> I've tried the combination of
>
>   ca14dd1d4628094dd33d5d94694dcf5f29e843b8
>   7dab3ee7ab54b3c2e7bc24170376054786c01d6f
>
> and this patch against Debian's current source package.
>
> Emacs no longer hangs, but I get incorrect highlighting,
> for instance on the following XML file.
>
> <root>
> <!-- comment -->
> <a>"a'</a>
> <!-- comment -->
> </root>
>
> Highlighting starts to be wrong at the single-quote character.
> I've attached a screenshot obtained with the -Q option.
>
> Did I miss anything?

Ah, I didn't get the mixed quote handling right.  Here's the fix for master:


[-- Attachment #2: patch --]
[-- Type: text/x-diff, Size: 2449 bytes --]

From 4677edd8dd65b5d956732821e78794f35b275418 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Sat, 18 May 2019 00:04:01 -0400
Subject: [PATCH] Fix Bug#33887 for mixed quote usage

* lisp/textmodes/sgml-mode.el (sgml-syntax-propertize-rules): Only
skip syntax-ppss for matched quotes.
* test/lisp/textmodes/sgml-mode-tests.el (sgml-tests--quotes-syntax):
Expand test.
---
 lisp/textmodes/sgml-mode.el            |  4 ++--
 test/lisp/textmodes/sgml-mode-tests.el | 17 ++++++++++++-----
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/lisp/textmodes/sgml-mode.el b/lisp/textmodes/sgml-mode.el
index 1b064fb825..e3cf56aa0e 100644
--- a/lisp/textmodes/sgml-mode.el
+++ b/lisp/textmodes/sgml-mode.el
@@ -345,8 +345,8 @@ sgml-font-lock-keywords
      ;; the resulting number of calls to syntax-ppss made it too slow
      ;; (bug#33887), so we're now careful to leave alone any pair
      ;; of quotes that doesn't hold a < or > char, which is the vast majority.
-     ("\\(?:\\(?1:\"\\)[^\"<>]*[<>\"]\\|\\(?1:'\\)[^'<>]*[<>']\\)"
-      (1 (unless (memq (char-before) '(?\' ?\"))
+     ("\\([\"']\\)[^<>\"']*[<>\"']"
+      (1 (unless (eq (char-after (match-beginning 1)) (char-before))
            ;; Be careful to call `syntax-ppss' on a position before the one
            ;; we're going to change, so as not to need to flush the data we
            ;; just computed.
diff --git a/test/lisp/textmodes/sgml-mode-tests.el b/test/lisp/textmodes/sgml-mode-tests.el
index a900e8dcf2..ffcc2cd840 100644
--- a/test/lisp/textmodes/sgml-mode-tests.el
+++ b/test/lisp/textmodes/sgml-mode-tests.el
@@ -161,11 +161,18 @@ sgml-with-content
       (should (string= "&&" (buffer-string))))))
 
 (ert-deftest sgml-tests--quotes-syntax ()
-  (with-temp-buffer
-    (sgml-mode)
-    (insert "a\"b <tag>c'd</tag>")
-    (should (= 1 (car (syntax-ppss (1- (point-max))))))
-    (should (= 0 (car (syntax-ppss (point-max)))))))
+  (dolist (str '("a\"b <t>c'd</t>"
+                 "a'b <t>c\"d</t>"
+                 "<t>\"a'</t>"
+                 "<t>'a\"</t>"
+                 "<t>\"a'\"</t>"
+                 "<t>'a\"'</t>"))
+   (with-temp-buffer
+     (sgml-mode)
+     (insert str)
+     ;; Check that last tag is parsed as a tag.
+     (should (= 1 (car (syntax-ppss (1- (point-max))))))
+     (should (= 0 (car (syntax-ppss (point-max))))))))
 
 (provide 'sgml-mode-tests)
 ;;; sgml-mode-tests.el ends here
-- 
2.11.0


[-- Attachment #3: Type: text/plain, Size: 47 bytes --]


And the corresponding patch against emacs-26:


[-- Attachment #4: patch --]
[-- Type: text/plain, Size: 3402 bytes --]

From 3a1a36b0b42772f35c70fb7e996ba8fed787e1c2 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Wed, 15 May 2019 18:51:30 -0400
Subject: [PATCH] Backport sgml-syntax-propertize-rules speedup (Bug#33887)

* lisp/textmodes/sgml-mode.el (sgml-syntax-propertize-rules): Reapply
2019-01-17 "* lisp/textmodes/sgml-mode.el: Try and fix bug#33887."
taking into account 2019-05-09 "Recognize single quote attribute
values in nxml and sgml (Bug#35381)" which means we have to handle
single quotes as well.
* test/lisp/textmodes/sgml-mode-tests.el (sgml-tests--quotes-syntax):
New test.
---
 lisp/textmodes/sgml-mode.el            | 21 +++++++++++++++------
 test/lisp/textmodes/sgml-mode-tests.el | 14 ++++++++++++++
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/lisp/textmodes/sgml-mode.el b/lisp/textmodes/sgml-mode.el
index 128e58810e..f8a37c3820 100644
--- a/lisp/textmodes/sgml-mode.el
+++ b/lisp/textmodes/sgml-mode.el
@@ -347,12 +347,21 @@ sgml-font-lock-keywords
      ("--[ \t\n]*\\(>\\)" (1 "> b"))
      ("\\(<\\)[?!]" (1 (prog1 "|>"
                          (sgml-syntax-propertize-inside end))))
-     ;; Quotes outside of tags should not introduce strings.
-     ;; Be careful to call `syntax-ppss' on a position before the one we're
-     ;; going to change, so as not to need to flush the data we just computed.
-     ("[\"']" (0 (if (prog1 (zerop (car (syntax-ppss (match-beginning 0))))
-                       (goto-char (match-end 0)))
-                     (string-to-syntax ".")))))))
+     ;; Quotes outside of tags should not introduce strings which end up
+     ;; hiding tags.  We used to test every quote and mark it as "."
+     ;; if it's outside of tags, but there are too many quotes and
+     ;; the resulting number of calls to syntax-ppss made it too slow
+     ;; (bug#33887), so we're now careful to leave alone any pair
+     ;; of quotes that doesn't hold a < or > char, which is the vast majority.
+     ("\\([\"']\\)[^<>\"']*[<>\"']"
+      (1 (unless (eq (char-after (match-beginning 1)) (char-before))
+           ;; Be careful to call `syntax-ppss' on a position before the one
+           ;; we're going to change, so as not to need to flush the data we
+           ;; just computed.
+           (if (prog1 (zerop (car (syntax-ppss (match-beginning 0))))
+                 (goto-char (1- (match-end 0))))
+               (string-to-syntax ".")))))
+     )))
 
 (defun sgml-syntax-propertize (start end)
   "Syntactic keywords for `sgml-mode'."
diff --git a/test/lisp/textmodes/sgml-mode-tests.el b/test/lisp/textmodes/sgml-mode-tests.el
index 7318a667b3..8d0bb88163 100644
--- a/test/lisp/textmodes/sgml-mode-tests.el
+++ b/test/lisp/textmodes/sgml-mode-tests.el
@@ -130,5 +130,19 @@ sgml-with-content
    (sgml-delete-tag 1)
    (should (string= "Winter is comin'" (buffer-string)))))
 
+(ert-deftest sgml-tests--quotes-syntax ()
+  (dolist (str '("a\"b <t>c'd</t>"
+                 "a'b <t>c\"d</t>"
+                 "<t>\"a'</t>"
+                 "<t>'a\"</t>"
+                 "<t>\"a'\"</t>"
+                 "<t>'a\"'</t>"))
+   (with-temp-buffer
+     (sgml-mode)
+     (insert str)
+     ;; Check that last tag is parsed as a tag.
+     (should (= 1 (car (syntax-ppss (1- (point-max))))))
+     (should (= 0 (car (syntax-ppss (point-max))))))))
+
 (provide 'sgml-mode-tests)
 ;;; sgml-mode-tests.el ends here
-- 
2.11.0
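Vincent's <a>"a'</a> example can be traced with another hypothetical helper, `my-visited-quotes` (not part of Emacs). It records which quotes start a match, using either the regexp and skip test of the first backport patch or those of the mixed-quote fix. A quote that is swallowed inside another match is never examined, so it keeps its string-quote syntax and can open a string that never closes. This is a simplified model that leaves out the syntax-ppss call and the other syntax rules.

```elisp
(defconst my-old-quote-re
  "\\(?:\\(?1:\"\\)[^\"<>]*[<>\"]\\|\\(?1:'\\)[^'<>]*[<>']\\)"
  "Quote regexp from the first backport patch.")

(defconst my-new-quote-re "\\([\"']\\)[^<>\"']*[<>\"']"
  "Quote regexp from the mixed-quote fix.")

(defun my-visited-quotes (text fixed)
  "Return buffer positions of the quotes in TEXT that begin a match.
Quotes missing from the result are swallowed by another match and are
never examined.  FIXED non-nil selects the mixed-quote fix."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((re (if fixed my-new-quote-re my-old-quote-re))
          (visited nil))
      (while (re-search-forward re nil t)
        (push (match-beginning 1) visited)
        (unless (if fixed
                    ;; Skip only when the closing char repeats the opener.
                    (eq (char-after (match-beginning 1)) (char-before))
                  ;; Old test: skip whenever the match ends in any quote.
                  (memq (char-before) '(?\' ?\")))
          ;; The real rule calls `syntax-ppss' here, then rewinds so
          ;; the terminating character is scanned again.
          (goto-char (1- (match-end 0)))))
      (nreverse visited))))
```

For <a>"a'</a>, the old rule only visits the " at position 4, because [^"<>]* consumes the ' on its way to <; the fixed rule stops at the ' and visits it as well, returning (4 6).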


* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-18  4:15       ` Noam Postavsky
@ 2019-05-18 14:47         ` Vincent Lefevre
  2019-05-18 14:55           ` Vincent Lefevre
  2019-05-18 18:49           ` Noam Postavsky
  0 siblings, 2 replies; 42+ messages in thread
From: Vincent Lefevre @ 2019-05-18 14:47 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Stefan Monnier, 33887

There's still an issue. On the following XML file

<root>
<a>text</a>
<!-- ' -->
<a>text</a>
</root>

the part after the comment <!-- ' --> is highlighted as a comment.


* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-18 14:47         ` Vincent Lefevre
@ 2019-05-18 14:55           ` Vincent Lefevre
  2019-05-18 14:57             ` Vincent Lefevre
  2019-05-18 18:49           ` Noam Postavsky
  1 sibling, 1 reply; 42+ messages in thread
From: Vincent Lefevre @ 2019-05-18 14:55 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Stefan Monnier, 33887

On 2019-05-18 16:47:56 +0200, Vincent Lefevre wrote:
> There's still an issue. On the following XML file
> 
> <root>
> <a>text</a>
> <!-- ' -->
> <a>text</a>
> </root>
> 
> the part after the comment <!-- ' --> is highlighted as a comment.

And on the following XML file too:

<root>
<!DOCTYPE root [
<!ENTITY f SYSTEM "f.xml">
]>
<a>ab'cd</a>
<a>text</a>
</root>


* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-18 14:55           ` Vincent Lefevre
@ 2019-05-18 14:57             ` Vincent Lefevre
  2019-05-18 15:01               ` Vincent Lefevre
  0 siblings, 1 reply; 42+ messages in thread
From: Vincent Lefevre @ 2019-05-18 14:57 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Stefan Monnier, 33887

On 2019-05-18 16:55:43 +0200, Vincent Lefevre wrote:
> And on the following XML file too:
> 
> <root>
> <!DOCTYPE root [
> <!ENTITY f SYSTEM "f.xml">
> ]>
> <a>ab'cd</a>
> <a>text</a>
> </root>

I actually meant

<!DOCTYPE root [
<!ENTITY f SYSTEM "f.xml">
]>
<root>
<a>ab'cd</a>
<a>text</a>
</root>


* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-18 14:57             ` Vincent Lefevre
@ 2019-05-18 15:01               ` Vincent Lefevre
  0 siblings, 0 replies; 42+ messages in thread
From: Vincent Lefevre @ 2019-05-18 15:01 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Stefan Monnier, 33887

And another one:

<root>
<a>text</a>
<!-- "don't" -->
<a>text</a>
</root>

The second text is highlighted as a comment.


* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-18 14:47         ` Vincent Lefevre
  2019-05-18 14:55           ` Vincent Lefevre
@ 2019-05-18 18:49           ` Noam Postavsky
  2019-05-19  0:17             ` Vincent Lefevre
  2019-05-20 11:47             ` Vincent Lefevre
  1 sibling, 2 replies; 42+ messages in thread
From: Noam Postavsky @ 2019-05-18 18:49 UTC (permalink / raw)
  To: Vincent Lefevre; +Cc: Stefan Monnier, 33887

[-- Attachment #1: Type: text/plain, Size: 576 bytes --]

Vincent Lefevre <vincent@vinc17.net> writes:

> There's still an issue. On the following XML file
>
> <root>
> <a>text</a>
> <!-- ' -->
> <a>text</a>
> </root>
>
> the part after the comment <!-- ' --> is highlighted as a comment.

> And another one:
>
> <root>
> <a>text</a>
> <!-- "don't" -->
> <a>text</a>
> </root>
>
> The second text is highlighted as a comment.

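Right, this is a collision between the syntax rules: inside a comment
such as <!-- ' -->, the quote rule's match runs up to the final >, so
point ends up past the -- and the "--[ \t\n]*\\(>\\)" comment-end rule
never matches.  The following patch fixes it, though perhaps it would
be better to just search for the end of the comment in the
("\\(<\\)!--" (1 "< b")) rule instead?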


[-- Attachment #2: patch --]
[-- Type: text/x-diff, Size: 2702 bytes --]

From a866e4f4b556fb4a346fa68c62296f10966690a1 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Sat, 18 May 2019 13:18:19 -0400
Subject: [PATCH] Fix sgml syntax handling of quotes in comments

* lisp/textmodes/sgml-mode.el (sgml-syntax-propertize-rules): Make
sure not to skip over comment ender when searching for quotes.
* test/lisp/textmodes/sgml-mode-tests.el (sgml-tests--quotes-syntax):
Add some more cases.
---
 lisp/textmodes/sgml-mode.el            | 11 ++++++++---
 test/lisp/textmodes/sgml-mode-tests.el | 16 +++++++++-------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/lisp/textmodes/sgml-mode.el b/lisp/textmodes/sgml-mode.el
index e3cf56aa0e..1af1d1eaef 100644
--- a/lisp/textmodes/sgml-mode.el
+++ b/lisp/textmodes/sgml-mode.el
@@ -350,9 +350,14 @@ sgml-font-lock-keywords
            ;; Be careful to call `syntax-ppss' on a position before the one
            ;; we're going to change, so as not to need to flush the data we
            ;; just computed.
-           (if (prog1 (zerop (car (syntax-ppss (match-beginning 0))))
-                 (goto-char (1- (match-end 0))))
-               (string-to-syntax ".")))))
+           (let ((ppss (syntax-ppss (match-beginning 0))))
+             (if (prog1 (zerop (car ppss)) ; Outside tag.
+                   (goto-char (1- (match-end 0)))
+                   ;; If we're in a comment, don't skip over comment
+                   ;; ender.
+                   (when (nth 4 ppss)
+                     (skip-chars-backward "- \t\n")))
+                (string-to-syntax "."))))))
      )))
 
 (defun sgml-syntax-propertize (start end)
diff --git a/test/lisp/textmodes/sgml-mode-tests.el b/test/lisp/textmodes/sgml-mode-tests.el
index ffcc2cd840..7e1ddf4047 100644
--- a/test/lisp/textmodes/sgml-mode-tests.el
+++ b/test/lisp/textmodes/sgml-mode-tests.el
@@ -166,13 +166,15 @@ sgml-with-content
                  "<t>\"a'</t>"
                  "<t>'a\"</t>"
                  "<t>\"a'\"</t>"
-                 "<t>'a\"'</t>"))
-   (with-temp-buffer
-     (sgml-mode)
-     (insert str)
-     ;; Check that last tag is parsed as a tag.
-     (should (= 1 (car (syntax-ppss (1- (point-max))))))
-     (should (= 0 (car (syntax-ppss (point-max))))))))
+                 "<t>'a\"'</t>"
+                 "<t><!-- ' --></t>"
+                 "<t><!-- \" --></t>"))
+    (ert-info (str :prefix "Test string: ")
+      (sgml-with-content
+       str
+       ;; Check that last tag is parsed as a tag.
+       (should (= 1 (car (syntax-ppss (1- (point-max))))))
+       (should (= 0 (car (syntax-ppss (point-max)))))))))
 
 (provide 'sgml-mode-tests)
 ;;; sgml-mode-tests.el ends here
-- 
2.11.0


[-- Attachment #3: Type: text/plain, Size: 449 bytes --]


> <!DOCTYPE root [
> <!ENTITY f SYSTEM "f.xml">
> ]>
> <root>
> <a>ab'cd</a>
> <a>text</a>
> </root>

This is a different issue; I think the problem is that
sgml-syntax-propertize-inside doesn't handle nesting in the DTD
definition <! [ <! ... > ]>.  The patch below just avoids calling
sgml-syntax-propertize-inside on the prolog in nxml-mode (but the
problem remains in sgml-mode).  Though you'll hit Bug#18871/23668 if you
try to edit the DTD.


[-- Attachment #4: patch --]
[-- Type: text/x-diff, Size: 2580 bytes --]

From 9a50fc38b537d570f739c428a57c66557152151b Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Sat, 18 May 2019 14:37:51 -0400
Subject: [PATCH] Don't sgml-syntax-propertize-inside XML prolog

* lisp/nxml/nxml-mode.el (nxml-syntax-propertize): New function.
(nxml-mode): Use it as the syntax-propertize-function.
* test/lisp/nxml/nxml-mode-tests.el (nxml-mode-doctype-and-quote-syntax):
New test.
---
 lisp/nxml/nxml-mode.el            | 16 +++++++++++++++-
 test/lisp/nxml/nxml-mode-tests.el |  8 ++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/lisp/nxml/nxml-mode.el b/lisp/nxml/nxml-mode.el
index ab035b927e..7c39c5023c 100644
--- a/lisp/nxml/nxml-mode.el
+++ b/lisp/nxml/nxml-mode.el
@@ -423,6 +423,20 @@ nxml-parent-document-set
     (when rng-validate-mode
       (rng-validate-while-idle (current-buffer)))))
 
+(defvar nxml-prolog-end) ;; nxml-rap.el
+(defun nxml-syntax-propertize (start end)
+  "Syntactic keywords for `nxml-mode'."
+  ;; Like `sgml-syntax-propertize', but skip prolog.
+  (setq start (max start nxml-prolog-end))
+  (if (>= start end)
+      (goto-char end)
+    (goto-char start)
+    (sgml-syntax-propertize-inside end)
+    (funcall
+     (syntax-propertize-rules sgml-syntax-propertize-rules)
+     start end)))
+
+
 (defvar tildify-space-string)
 (defvar tildify-foreach-region-function)
 
@@ -518,7 +532,7 @@ nxml-mode
 	(nxml-with-invisible-motion
 	  (nxml-scan-prolog)))))
   (setq-local syntax-ppss-table sgml-tag-syntax-table)
-  (setq-local syntax-propertize-function #'sgml-syntax-propertize)
+  (setq-local syntax-propertize-function #'nxml-syntax-propertize)
   (add-hook 'change-major-mode-hook #'nxml-cleanup nil t)
 
   ;; Emacs 23 handles the encoding attribute on the xml declaration
diff --git a/test/lisp/nxml/nxml-mode-tests.el b/test/lisp/nxml/nxml-mode-tests.el
index 92744be619..2bbf92bc96 100644
--- a/test/lisp/nxml/nxml-mode-tests.el
+++ b/test/lisp/nxml/nxml-mode-tests.el
@@ -78,5 +78,13 @@ nxml-mode-tests-correctly-indented-string
       (should-not (equal (get-text-property squote-txt-pos 'face)
                          (get-text-property dquote-att-pos 'face))))))
 
+(ert-deftest nxml-mode-doctype-and-quote-syntax ()
+  (with-temp-buffer
+    (insert "<!DOCTYPE t [\n<!ENTITY f SYSTEM \"f.xml\">\n]>\n<t>'</t>")
+    (nxml-mode)
+    ;; Check that last tag is parsed as a tag.
+    (should (= 1 (car (syntax-ppss (1- (point-max))))))
+    (should (= 0 (car (syntax-ppss (point-max)))))))
+
 (provide 'nxml-mode-tests)
 ;;; nxml-mode-tests.el ends here
-- 
2.11.0

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-18 18:49           ` Noam Postavsky
@ 2019-05-19  0:17             ` Vincent Lefevre
  2019-05-19 17:43               ` Noam Postavsky
  2019-05-20 11:47             ` Vincent Lefevre
  1 sibling, 1 reply; 42+ messages in thread
From: Vincent Lefevre @ 2019-05-19  0:17 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Stefan Monnier, 33887

There's an issue with the following XML file:

<root>
<a>don't</a>
<a>text</a>
<a>></a>
<a>don't</a>
<a>text</a>
</root>

where highlighting becomes wrong starting at the second '.

However, even though > is valid, I normally use &gt; instead.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-19  0:17             ` Vincent Lefevre
@ 2019-05-19 17:43               ` Noam Postavsky
  2019-05-19 18:48                 ` Stefan Monnier
  0 siblings, 1 reply; 42+ messages in thread
From: Noam Postavsky @ 2019-05-19 17:43 UTC (permalink / raw)
  To: Vincent Lefevre; +Cc: Stefan Monnier, 33887

Vincent Lefevre <vincent@vinc17.net> writes:

> There's an issue with the following XML file:
>
> <root>
> <a>don't</a>
> <a>text</a>
> <a>></a>
> <a>don't</a>
> <a>text</a>
> </root>
>
> where highlighting becomes wrong starting at the second '.
>
> However, even though > is valid, I normally use &gt; instead.

Hmm, I can't see a way to handle this case without making the
syntax propertizing slow again.  Stefan, any ideas?

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-19 17:43               ` Noam Postavsky
@ 2019-05-19 18:48                 ` Stefan Monnier
  2019-05-19 19:03                   ` Noam Postavsky
  0 siblings, 1 reply; 42+ messages in thread
From: Stefan Monnier @ 2019-05-19 18:48 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Vincent Lefevre, 33887

> Hmm, I can't see a way to handle this case without making the
> syntax propertizing slow again.  Stefan, any ideas?

Can you summarize the origin of the problem in his example?


        Stefan

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-19 18:48                 ` Stefan Monnier
@ 2019-05-19 19:03                   ` Noam Postavsky
  2019-05-19 19:24                     ` Stefan Monnier
  0 siblings, 1 reply; 42+ messages in thread
From: Noam Postavsky @ 2019-05-19 19:03 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Vincent Lefevre, 33887

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Can you summarize the origin of the problem in his example?

<t>>1</t>

(syntax-ppss) on the location of "1" in the above gives (-1 ...).  And
then (syntax-ppss) on the "/" will give (0 ...).  So the syntax-propertize
rule for quotes, which uses (zerop (car (syntax-ppss))), no longer
works correctly to determine whether it's inside or outside a tag.

">" outside of tags should be set to syntax ".", but I would assume that
adding a syntax-propertize rule which calls syntax-ppss for every ">"
(to check whether it's inside a tag or not) will be very slow, just like
calling it for every quote was.

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-19 19:03                   ` Noam Postavsky
@ 2019-05-19 19:24                     ` Stefan Monnier
  2019-05-20 20:47                       ` Noam Postavsky
  2019-05-22 21:44                       ` Stefan Monnier
  0 siblings, 2 replies; 42+ messages in thread
From: Stefan Monnier @ 2019-05-19 19:24 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Vincent Lefevre, 33887

>> Can you summarize the origin of the problem in his example?
>
> <t>>1</t>
>
> (syntax-ppss) on the location of "1" in the above gives (-1 ...).  And
> then (syntax-ppss) on the "/" will give (0 ...).  So the syntax-propertize
> rule for quotes, which uses (zerop (car (syntax-ppss))), no longer
> works correctly to determine whether it's inside or outside a tag.
>
> ">" outside of tags should be set to syntax ".", but I would assume that
> adding a syntax-propertize rule which calls syntax-ppss for every ">"
> (to check whether it's inside a tag or not) will be very slow, just like
> calling it for every quote was.

Oh, damn!  Hmm...


        Stefan

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-18 18:49           ` Noam Postavsky
  2019-05-19  0:17             ` Vincent Lefevre
@ 2019-05-20 11:47             ` Vincent Lefevre
  1 sibling, 0 replies; 42+ messages in thread
From: Vincent Lefevre @ 2019-05-20 11:47 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Stefan Monnier, 33887

There's an issue with the following XML file, which does not have
any special character, except a single quote in the middle of the
text.

<root>
<a>12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
</a>
</root>

Note that the newline character before the </a> is important.

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-19 19:24                     ` Stefan Monnier
@ 2019-05-20 20:47                       ` Noam Postavsky
  2019-05-21  1:06                         ` Vincent Lefevre
  2019-05-22 22:37                         ` Stefan Monnier
  2019-05-22 21:44                       ` Stefan Monnier
  1 sibling, 2 replies; 42+ messages in thread
From: Noam Postavsky @ 2019-05-20 20:47 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Vincent Lefevre, 33887

[-- Attachment #1: Type: text/plain, Size: 817 bytes --]

> There's an issue with the following XML file, which does not have
> any special character, except a single quote in the middle of the
> text.
>
> <root>
> <a>12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
> </a>
> </root>
>
> Note that the newline character before the </a> is important.

Right, this is due to chunking by syntax-propertize.  Here's the fix:


[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 3469 bytes --]

From 2025fa25f76fd8a2df46fca8807ca386372757d5 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Mon, 20 May 2019 16:04:24 -0400
Subject: [PATCH 1/2] Handle lone quote 500+ characters away from XML tag
 (Bug#33887)

Because syntax-propertize works in small buffer chunks, the rule for
finding quotes which don't contain angle brackets failed to trigger
when the angle bracket was outside of the current chunk.
* lisp/textmodes/sgml-mode.el (sgml-syntax-propertize-rules): Match
quotes on lines with no other angle bracket or quote too (the
syntax-propertize chunk is extended to cover whole lines).
* test/lisp/nxml/nxml-mode-tests.el (nxml-mode-quote-in-long-text):
New test.
---
 lisp/textmodes/sgml-mode.el       |  9 +++++++--
 test/lisp/nxml/nxml-mode-tests.el | 22 ++++++++++++++++++++++
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/lisp/textmodes/sgml-mode.el b/lisp/textmodes/sgml-mode.el
index 137745fbc1..b555db7b76 100644
--- a/lisp/textmodes/sgml-mode.el
+++ b/lisp/textmodes/sgml-mode.el
@@ -353,8 +353,13 @@ sgml-font-lock-keywords
      ;; the resulting number of calls to syntax-ppss made it too slow
      ;; (bug#33887), so we're now careful to leave alone any pair
      ;; of quotes that doesn't hold a < or > char, which is the vast majority.
-     ("\\([\"']\\)[^<>\"']*[<>\"']"
-      (1 (unless (eq (char-after (match-beginning 1)) (char-before))
+     ;; We also check quotes which are unpaired to end of line,
+     ;; otherwise we miss the case where the quote might "contain" an
+     ;; angle bracket outside of the current syntax-propertize chunk
+     ;; (this relies on `syntax-propertize-wholelines' being enabled).
+     ("\\([\"']\\)[^<>\"']*\\([<>\"']\\|$\\)"
+      (1 (unless (eq (char-after (match-beginning 1))
+                     (char-after (match-beginning 2)))
            ;; Be careful to call `syntax-ppss' on a position before the one
            ;; we're going to change, so as not to need to flush the data we
            ;; just computed.
diff --git a/test/lisp/nxml/nxml-mode-tests.el b/test/lisp/nxml/nxml-mode-tests.el
index 2bbf92bc96..0916a1e652 100644
--- a/test/lisp/nxml/nxml-mode-tests.el
+++ b/test/lisp/nxml/nxml-mode-tests.el
@@ -86,5 +86,27 @@ nxml-mode-tests-correctly-indented-string
     (should (= 1 (car (syntax-ppss (1- (point-max))))))
     (should (= 0 (car (syntax-ppss (point-max)))))))
 
+(ert-deftest nxml-mode-quote-in-long-text ()
+  (with-temp-buffer
+    (nxml-mode)
+    (insert "<t>"
+            ;; `syntax-propertize-wholelines' extends chunk size based
+            ;; on line length, so newlines are significant!
+            (make-string syntax-propertize-chunk-size ?a) "\n"
+            "'"
+            (make-string syntax-propertize-chunk-size ?a) "\n"
+            "</t>")
+    ;; If we just check (syntax-ppss (point-max)) immediately, then
+    ;; we'll end up propertizing the whole buffer in one chunk (so the
+    ;; test is useless).  Simulate something more like what happens
+    ;; when the buffer is viewed normally.
+    (cl-loop for pos from (point-min) to (point-max)
+             by syntax-propertize-chunk-size
+             do (syntax-ppss pos))
+    (syntax-ppss (point-max))
+    ;; Check that last tag is parsed as a tag.
+    (should (= 1 (- (car (syntax-ppss (1- (point-max))))
+                    (car (syntax-ppss (point-max))))))))
+
 (provide 'nxml-mode-tests)
 ;;; nxml-mode-tests.el ends here
-- 
2.11.0
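As a rough illustration of the regexp change in the patch above, the old and new quote patterns can be compared on plain strings with `string-match'.  This is a sketch only: the `my-' names are hypothetical, and plain strings stand in for the buffer text that syntax-propertize actually searches, so chunk boundaries and `syntax-propertize-wholelines' are not modelled.

```elisp
;; Hypothetical copies of the quote-rule regexps, before and after the patch.
(defconst my-old-quote-re "\\([\"']\\)[^<>\"']*[<>\"']"
  "Quote rule regexp before the patch: needs a closing < > \" or '.")

(defconst my-new-quote-re "\\([\"']\\)[^<>\"']*\\([<>\"']\\|$\\)"
  "Quote rule regexp after the patch: may also end at end of line.")

(defun my-quote-needs-check-p (text)
  "Non-nil if the patched rule would consult `syntax-ppss' for TEXT.
Mirrors the (eq OPEN CLOSE) test in the patch: a quote closed by the
same quote character is skipped; anything else gets the slower check."
  (when (string-match my-new-quote-re text)
    (let* ((open (aref text (match-beginning 1)))
           (end (match-beginning 2))
           ;; When group 2 matched `$' at the end of TEXT there is no
           ;; closing char, much like `char-after' returning nil or a
           ;; newline in the buffer.
           (close (and (< end (length text)) (aref text end))))
      (not (eq open close)))))
```

With these definitions, a lone quote such as "x'yy" is not matched at all by the old pattern, while the new pattern matches it (group 2 is empty) and the rule goes on to check it; a paired quote such as "x'yy'z" is still skipped cheaply.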


[-- Attachment #3: Type: text/plain, Size: 896 bytes --]


Note that you have to be sure to recompile sgml-mode.el AND nxml-mode.el
after applying these patches; 'make' isn't smart enough to do it
automatically (yes, I figured this out the hard way).

>> <t>>1</t>
>>
>> (syntax-ppss) on the location of "1" in the above gives (-1 ...).  And
>> then (syntax-ppss) on the "/" will give (0 ...).  So the syntax-propertize
>> rule for quotes, which uses (zerop (car (syntax-ppss))), no longer
>> works correctly to determine whether it's inside or outside a tag.
>>
>> ">" outside of tags should be set to syntax ".", but I would assume that
>> adding a syntax-propertize rule which calls syntax-ppss for every ">"
>> (to check whether it's inside a tag or not) will be very slow, just like
>> calling it for every quote was.

Oh, I figured it out: we can just look at (nth 9 ppss), because the list
of open parens is still correct, regardless of unmatched close parens.
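A minimal sketch of the difference, assuming a hypothetical syntax table in which only < and > form a paren pair (a stand-in for sgml-tag-syntax-table, not the real one).  `parse-partial-sexp' returns the same kind of state as `syntax-ppss'; in "<t>>1</t>", buffer position 5 is just before "1" and position 7 is just before "/".

```elisp
;; All `my-' names are hypothetical illustrations.
(defun my-tag-syntax-table ()
  "Minimal syntax table treating < and > as a matching paren pair."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?< "(>" table)
    (modify-syntax-entry ?> ")<" table)
    table))

(defun my-tag-state (text pos)
  "Return the `parse-partial-sexp' state at buffer position POS in TEXT."
  (with-temp-buffer
    (set-syntax-table (my-tag-syntax-table))
    (insert text)
    (parse-partial-sexp (point-min) pos)))

(defun my-outside-tag-by-depth-p (state)
  "Old test: depth zero means outside any tag."
  (zerop (car state)))

(defun my-outside-tag-by-open-list-p (state)
  "Patched test: no pending open paren means outside any tag."
  (null (nth 9 state)))
```

At position 5 the depth test reports "inside a tag" (depth -1) and at position 7 it reports "outside" (depth 0), which is wrong both times; the open-paren list is nil at 5 and (6) at 7, which is right both times.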


[-- Attachment #4: patch --]
[-- Type: text/plain, Size: 2400 bytes --]

From d1520ab5b94d0f130955800ea11222a3702a5519 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Mon, 20 May 2019 16:29:04 -0400
Subject: [PATCH 2/2] Handle ">" outside SGML/XML tags (Bug#33887)

* lisp/textmodes/sgml-mode.el (sgml-syntax-propertize-rules): Check
the list of open parens rather than current depth, the latter is not
reliable.
* test/lisp/textmodes/sgml-mode-tests.el (sgml-tests--quotes-syntax):
Extend test for this case.
---
 lisp/textmodes/sgml-mode.el            | 4 +++-
 test/lisp/textmodes/sgml-mode-tests.el | 9 ++++++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/lisp/textmodes/sgml-mode.el b/lisp/textmodes/sgml-mode.el
index b555db7b76..052201e5ee 100644
--- a/lisp/textmodes/sgml-mode.el
+++ b/lisp/textmodes/sgml-mode.el
@@ -364,7 +364,9 @@ sgml-font-lock-keywords
            ;; we're going to change, so as not to need to flush the data we
            ;; just computed.
            (let ((ppss (syntax-ppss (match-beginning 0))))
-             (if (prog1 (zerop (car ppss)) ; Outside tag.
+             ;; Can't rely on depth (nth 0 ppss), because we don't
+             ;; mark ">" outside of tags.
+             (if (prog1 (null (nth 9 ppss)) ; Outside tag.
                    (goto-char (1- (match-end 0)))
                    ;; If we're in a comment, don't skip over comment
                    ;; ender.
diff --git a/test/lisp/textmodes/sgml-mode-tests.el b/test/lisp/textmodes/sgml-mode-tests.el
index 09941fe6f1..d6913863d6 100644
--- a/test/lisp/textmodes/sgml-mode-tests.el
+++ b/test/lisp/textmodes/sgml-mode-tests.el
@@ -138,13 +138,16 @@ sgml-with-content
                  "<t>\"a'\"</t>"
                  "<t>'a\"'</t>"
                  "<t><!-- ' --></t>"
-                 "<t><!-- \" --></t>"))
+                 "<t><!-- \" --></t>"
+                 ;; Yes, ">" is technically valid outside tags!
+                 "<t>>'</t>"
+                 ))
     (ert-info (str :prefix "Test string: ")
       (sgml-with-content
        str
        ;; Check that last tag is parsed as a tag.
-       (should (= 1 (car (syntax-ppss (1- (point-max))))))
-       (should (= 0 (car (syntax-ppss (point-max)))))))))
+       (should (= 1 (- (car (syntax-ppss (1- (point-max))))
+                       (car (syntax-ppss (point-max))))))))))
 
 (provide 'sgml-mode-tests)
 ;;; sgml-mode-tests.el ends here
-- 
2.11.0

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-20 20:47                       ` Noam Postavsky
@ 2019-05-21  1:06                         ` Vincent Lefevre
  2019-05-21 12:27                           ` Noam Postavsky
  2019-05-22 22:37                         ` Stefan Monnier
  1 sibling, 1 reply; 42+ messages in thread
From: Vincent Lefevre @ 2019-05-21  1:06 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Stefan Monnier, 33887

Thanks for the fixes.

Also I don't think that in a text node, the " and ' characters should
be interpreted for highlighting. In particular, ' is generally used
as an apostrophe, not as a quote. For instance, this looks strange:

<a>This "shouldn't" and "can't" be right.</a>

These characters have no special meaning in a text node.

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-21  1:06                         ` Vincent Lefevre
@ 2019-05-21 12:27                           ` Noam Postavsky
  2019-05-22 13:58                             ` Stefan Monnier
  0 siblings, 1 reply; 42+ messages in thread
From: Noam Postavsky @ 2019-05-21 12:27 UTC (permalink / raw)
  To: Vincent Lefevre; +Cc: Stefan Monnier, 33887

Vincent Lefevre <vincent@vinc17.net> writes:

> Also I don't think that in a text node, the " and ' characters should
> be interpreted for highlighting. In particular, ' is generally used
> as an apostrophe, not as a quote. For instance, this looks strange:
>
> <a>This "shouldn't" and "can't" be right.</a>
>
> These characters have no special meaning in a text node.

Hmm, right, it should be possible to fix the crossing quotes in the
above case, but even the simpler

    <a>"oops" 'oops'</a>

shows the same highlighting.  This seems directly due to "we're now
careful to leave alone any pair of quotes that doesn't hold a < or >
char".  So uh, Stefan, how was that supposed to work exactly?

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-21 12:27                           ` Noam Postavsky
@ 2019-05-22 13:58                             ` Stefan Monnier
  2019-05-22 15:44                               ` Vincent Lefevre
  0 siblings, 1 reply; 42+ messages in thread
From: Stefan Monnier @ 2019-05-22 13:58 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Vincent Lefevre, 33887

> shows the same highlighting.  This seems directly due to "we're now
> careful to leave alone any pair of quotes that doesn't hold a < or >
> char".  So uh, Stefan, how was that supposed to work exactly?

Remember: when I wrote this, we only supported "..." and not '...'.


        Stefan

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-22 13:58                             ` Stefan Monnier
@ 2019-05-22 15:44                               ` Vincent Lefevre
  2019-05-22 16:01                                 ` Stefan Monnier
  0 siblings, 1 reply; 42+ messages in thread
From: Vincent Lefevre @ 2019-05-22 15:44 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Noam Postavsky, 33887

On 2019-05-22 09:58:54 -0400, Stefan Monnier wrote:
> > shows the same highlighting.  This seems directly due to "we're now
> > careful to leave alone any pair of quotes that doesn't hold a < or >
> > char".  So uh, Stefan, how was that supposed to work exactly?
> 
> Remember: when I wrote this, we only supported "..." and not '...'.

I'm not sure what you mean by that, but the single quotes are not
the only issue. In general, you don't know the quoting rules in a
text node used by the underlying language (if any), even if you
have only double quotes. For instance, a text node may contain C
or shell code, which can be:

  "a string with \"double quotes\"..."

And one does not expect this to be interpreted as two pairs of
double-quoted text ("a string with \" and "..."). In short, you
should leave text nodes with no specific highlighting, as was
the case with Emacs 25.

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-22 15:44                               ` Vincent Lefevre
@ 2019-05-22 16:01                                 ` Stefan Monnier
  0 siblings, 0 replies; 42+ messages in thread
From: Stefan Monnier @ 2019-05-22 16:01 UTC (permalink / raw)
  To: Vincent Lefevre; +Cc: Noam Postavsky, 33887

> I'm not sure what you mean by that, but the single quotes are not
> the only issue.

No, but it introduces problems a lot more often.

> In general, you don't know the quoting rules in a
> text node used by the underlying language (if any), even if you
> have only double quotes. For instance, a text node may contain C
> or shell code, which can be:
>
>   "a string with \"double quotes\"..."

Of course.  But to the extent that it doesn't break the rest of the SGML
support, I think it was a pretty good tradeoff (and its effect is arguably
more often beneficial than harmful).

> And one does not expect this to be interpreted as two pairs of
> double-quoted text ("a string with \" and "..."). In short, you
> should leave text nodes with no specific highlighting, as was
> the case with Emacs 25.

IIRC in Emacs-24 it was yet different.  Basically, the focus should be
to handle tags correctly and what happens in the regular text between
tags is not nearly as important.


        Stefan

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-19 19:24                     ` Stefan Monnier
  2019-05-20 20:47                       ` Noam Postavsky
@ 2019-05-22 21:44                       ` Stefan Monnier
  1 sibling, 0 replies; 42+ messages in thread
From: Stefan Monnier @ 2019-05-22 21:44 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Vincent Lefevre, 33887

>> <t>>1</t>
> Oh, damn!  Hmm...

Maybe the best way to detect this is using `parse-partial-sexp` passing
it a `targetdepth` of -1.  The trick will be when/where to call it so
it's cheap enough.
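A sketch of the TARGETDEPTH idea, assuming START is already known to be outside any tag (depth 0) and using a hypothetical minimal < > syntax table rather than sgml-mode's own.  It only shows the detection itself; it does not address the question of when to call it so that it stays cheap.

```elisp
;; `my-find-lone-gt' is a hypothetical helper, not part of sgml-mode.
(defun my-find-lone-gt (text start)
  "Return the position of the first unmatched > in TEXT at or after START.
START must be outside any tag.  Return nil if every > is matched."
  (with-temp-buffer
    (let ((table (make-syntax-table)))
      (modify-syntax-entry ?< "(>" table)
      (modify-syntax-entry ?> ")<" table)
      (set-syntax-table table))
    (insert text)
    ;; With TARGETDEPTH -1, parsing stops (leaving point just after the
    ;; offending char) as soon as a close paren takes the depth below 0.
    (let ((state (parse-partial-sexp start (point-max) -1)))
      (when (= (car state) -1)
        (1- (point))))))
```

For example, in "<t>>1</t>" starting at 1 this finds the second ">" at position 4, while for "<t>ok</t>" it parses to the end and returns nil.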


        Stefan

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-20 20:47                       ` Noam Postavsky
  2019-05-21  1:06                         ` Vincent Lefevre
@ 2019-05-22 22:37                         ` Stefan Monnier
  2019-05-26 22:17                           ` Noam Postavsky
  1 sibling, 1 reply; 42+ messages in thread
From: Stefan Monnier @ 2019-05-22 22:37 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Vincent Lefevre, 33887

> Right, this is due to chunking by syntax-propertize.  Here's the fix:

I pushed a patch which should fix the "lone >" problem without
introducing any undue extra cost.  It should also fix the "very long
line" case.


        Stefan

* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-22 22:37                         ` Stefan Monnier
@ 2019-05-26 22:17                           ` Noam Postavsky
  2019-05-27  9:18                             ` Vincent Lefevre
  0 siblings, 1 reply; 42+ messages in thread
From: Noam Postavsky @ 2019-05-26 22:17 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Vincent Lefevre, 33887

[-- Attachment #1: Type: text/plain, Size: 516 bytes --]

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> I pushed a patch which should fix the "lone >" problem without
> introducing any undue extra cost.  It should also fix the "very long
> line" case.

Seems to pass my tests.  Not sure if you missed the alternate fix I
proposed in https://debbugs.gnu.org/33887#94 or not.  It does have the
disadvantage of leaving (car (syntax-ppss)) unreliable for any other
code which uses it.

Here's a patch against master that should cover the remaining cases
Vincent raised:


[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 4011 bytes --]

From 2ffdab0e86161396e3d2606949d1fcf93c58b592 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Sun, 26 May 2019 11:07:14 -0400
Subject: [PATCH 1/2] Fix some SGML syntax edge cases (Bug#33887)

* lisp/textmodes/sgml-mode.el (sgml-syntax-propertize-rules): Handle
single and double quotes symmetrically.  Don't skip quoted comment
enders.
* test/lisp/textmodes/sgml-mode-tests.el (sgml-tests--quotes-syntax):
Add more test cases.
(sgml-mode-quote-in-long-text): New test.
---
 lisp/textmodes/sgml-mode.el            |  5 +++-
 test/lisp/textmodes/sgml-mode-tests.el | 45 ++++++++++++++++++++++++++++------
 2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/lisp/textmodes/sgml-mode.el b/lisp/textmodes/sgml-mode.el
index 75f20722b0..1df7e78afc 100644
--- a/lisp/textmodes/sgml-mode.el
+++ b/lisp/textmodes/sgml-mode.el
@@ -363,9 +363,12 @@ (eval-and-compile
      ;; the resulting number of calls to syntax-ppss made it too slow
      ;; (bug#33887), so we're now careful to leave alone any pair
      ;; of quotes that doesn't hold a < or > char, which is the vast majority.
-     ("\\(?:\\(?1:\"\\)[^\"<>]*\\|\\(?1:'\\)[^'\"<>]*\\)"
+     ("\\([\"']\\)[^\"'<>]*"
       (1 (if (eq (char-after) (char-after (match-beginning 0)))
              (forward-char 1)
+           ;; Avoid skipping comment ender.
+           (when (eq (char-after) ?>)
+             (skip-chars-backward "-"))
            ;; Be careful to call `syntax-ppss' on a position before the one
            ;; we're going to change, so as not to need to flush the data we
            ;; just computed.
diff --git a/test/lisp/textmodes/sgml-mode-tests.el b/test/lisp/textmodes/sgml-mode-tests.el
index 1b8965e344..34d26480a4 100644
--- a/test/lisp/textmodes/sgml-mode-tests.el
+++ b/test/lisp/textmodes/sgml-mode-tests.el
@@ -161,15 +161,46 @@ (ert-deftest sgml-quote-works ()
       (should (string= "&&" (buffer-string))))))
 
 (ert-deftest sgml-tests--quotes-syntax ()
+  (dolist (str '("a\"b <t>c'd</t>"
+                 "a'b <t>c\"d</t>"
+                 "<t>\"a'</t>"
+                 "<t>'a\"</t>"
+                 "<t>\"a'\"</t>"
+                 "<t>'a\"'</t>"
+                 "a\"b <tag>c'd</tag>"
+                 "<tag>c>'d</tag>"
+                 "<t><!-- \" --></t>"
+                 "<t><!-- ' --></t>"
+                 ))
+   (with-temp-buffer
+     (sgml-mode)
+     (insert str)
+     (ert-info ((format "%S" str) :prefix "Test case: ")
+       ;; Check that last tag is parsed as a tag.
+       (should (= 1 (car (syntax-ppss (1- (point-max))))))
+       (should (= 0 (car (syntax-ppss (point-max)))))))))
+
+(ert-deftest sgml-mode-quote-in-long-text ()
   (with-temp-buffer
     (sgml-mode)
-    (insert "a\"b <tag>c'd</tag>")
-    (should (= 1 (car (syntax-ppss (1- (point-max))))))
-    (should (= 0 (car (syntax-ppss (point-max)))))
-    (erase-buffer)
-    (insert "<tag>c>d</tag>")
-    (should (= 1 (car (syntax-ppss (1- (point-max))))))
-    (should (= 0 (car (syntax-ppss (point-max)))))))
+    (insert "<t>"
+            ;; `syntax-propertize-wholelines' extends chunk size based
+            ;; on line length, so newlines are significant!
+            (make-string syntax-propertize-chunk-size ?a) "\n"
+            "'"
+            (make-string syntax-propertize-chunk-size ?a) "\n"
+            "</t>")
+    ;; If we just check (syntax-ppss (point-max)) immediately, then
+    ;; we'll end up propertizing the whole buffer in one chunk (so the
+    ;; test is useless).  Simulate something more like what happens
+    ;; when the buffer is viewed normally.
+    (cl-loop for pos from (point-min) to (point-max)
+             by syntax-propertize-chunk-size
+             do (syntax-ppss pos))
+    (syntax-ppss (point-max))
+    ;; Check that last tag is parsed as a tag.
+    (should (= 1 (- (car (syntax-ppss (1- (point-max))))
+                    (car (syntax-ppss (point-max))))))))
 
 (provide 'sgml-mode-tests)
 ;;; sgml-mode-tests.el ends here
-- 
2.11.0


[-- Attachment #3: Type: text/plain, Size: 134 bytes --]


And about the highlighting of quoted text outside tags, we can just
disable fontification, while leaving the syntax code untouched:


[-- Attachment #4: patch --]
[-- Type: text/plain, Size: 4141 bytes --]

From a4a6008d96011e2517939cb8cb51624802a8c31e Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Sun, 26 May 2019 17:41:22 -0400
Subject: [PATCH 2/2] Don't fontify text outside of SGML/XML tags (Bug#33887)

* lisp/font-lock.el (font-lock-syntactic-face-function-default): New
function.
(font-lock-syntactic-face-function): Use it as default value.
* lisp/textmodes/sgml-mode.el (sgml-font-lock-syntactic-face): New
function.
(sgml-mode):
* lisp/nxml/nxml-mode.el (nxml-mode): Use it as
font-lock-syntactic-face-function value.
---
 lisp/font-lock.el           |  7 +++++--
 lisp/nxml/nxml-mode.el      |  4 +++-
 lisp/textmodes/sgml-mode.el | 11 +++++++++--
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/lisp/font-lock.el b/lisp/font-lock.el
index 3991a4ee8e..ddf1cbdb9f 100644
--- a/lisp/font-lock.el
+++ b/lisp/font-lock.el
@@ -527,9 +527,12 @@ (defvar font-lock-syntactically-fontified 0
 sometimes be slightly incorrect.")
 (make-variable-buffer-local 'font-lock-syntactically-fontified)
 
+(defun font-lock-syntactic-face-function-default (state)
+  "Default value for `font-lock-syntactic-face-function'."
+  (if (nth 3 state) font-lock-string-face font-lock-comment-face))
+
 (defvar font-lock-syntactic-face-function
-  (lambda (state)
-    (if (nth 3 state) font-lock-string-face font-lock-comment-face))
+  #'font-lock-syntactic-face-function-default
   "Function to determine which face to use when fontifying syntactically.
 The function is called with a single parameter (the state as returned by
 `parse-partial-sexp' at the beginning of the region to highlight) and
diff --git a/lisp/nxml/nxml-mode.el b/lisp/nxml/nxml-mode.el
index da01b2a342..05044d66df 100644
--- a/lisp/nxml/nxml-mode.el
+++ b/lisp/nxml/nxml-mode.el
@@ -551,7 +551,9 @@ (define-derived-mode nxml-mode text-mode "nXML"
           nil  ; no special syntax table
           (font-lock-extend-region-functions . (nxml-extend-region))
           (jit-lock-contextually . t)
-          (font-lock-unfontify-region-function . nxml-unfontify-region)))
+          (font-lock-unfontify-region-function . nxml-unfontify-region)
+          (font-lock-syntactic-face-function
+           . sgml-font-lock-syntactic-face)))
 
   (with-demoted-errors (rng-nxml-mode-init)))
 
diff --git a/lisp/textmodes/sgml-mode.el b/lisp/textmodes/sgml-mode.el
index 1df7e78afc..225fe72a01 100644
--- a/lisp/textmodes/sgml-mode.el
+++ b/lisp/textmodes/sgml-mode.el
@@ -329,6 +329,11 @@ (defconst sgml-font-lock-keywords-2
 (defvar sgml-font-lock-keywords sgml-font-lock-keywords-1
   "Rules for highlighting SGML code.  See also `sgml-tag-face-alist'.")
 
+(defun sgml-font-lock-syntactic-face (state)
+  "`font-lock-syntactic-face-function' for `sgml-mode'."
+  (and (nth 9 state) ;; Only use faces within tags.
+       (font-lock-syntactic-face-function-default state)))
+
 (defvar-local sgml--syntax-propertize-ppss nil)
 
 (defun sgml--syntax-propertize-ppss (pos)
@@ -573,7 +578,7 @@ (define-derived-mode sgml-mode text-mode '(sgml-xml-mode "XML" "SGML")
   ;; This is desirable because SGML discards a newline that appears
   ;; immediately after a start tag or immediately before an end tag.
   (setq-local paragraph-start (concat "[ \t]*$\\|\
-[ \t]*</?\\(" sgml-name-re sgml-attrs-re "\\)?>"))
+\[ \t]*</?\\(" sgml-name-re sgml-attrs-re "\\)?>"))
   (setq-local paragraph-separate (concat paragraph-start "$"))
   (setq-local adaptive-fill-regexp "[ \t]*")
   (add-hook 'fill-nobreak-predicate 'sgml-fill-nobreak nil t)
@@ -591,7 +596,9 @@ (define-derived-mode sgml-mode text-mode '(sgml-xml-mode "XML" "SGML")
   (setq font-lock-defaults '((sgml-font-lock-keywords
 			      sgml-font-lock-keywords-1
 			      sgml-font-lock-keywords-2)
-			     nil t))
+                             nil t nil
+                             (font-lock-syntactic-face-function
+                              . sgml-font-lock-syntactic-face)))
   (setq-local syntax-propertize-function #'sgml-syntax-propertize)
   (setq-local facemenu-add-face-function 'sgml-mode-facemenu-add-face-function)
   (setq-local sgml-xml-mode (sgml-xml-guess))
-- 
2.11.0


* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-26 22:17                           ` Noam Postavsky
@ 2019-05-27  9:18                             ` Vincent Lefevre
  2019-05-27 12:02                               ` Noam Postavsky
  0 siblings, 1 reply; 42+ messages in thread
From: Vincent Lefevre @ 2019-05-27  9:18 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Stefan Monnier, 33887

On 2019-05-26 18:17:55 -0400, Noam Postavsky wrote:
> And about the highlighting of quoted text outside tags, we can just
> disable fontification, while leaving the syntax code untouched:
[...]

I've applied it with a minor change against Emacs 26 (context lines
for hunk #1 of sgml-mode.el are different), but the comments are
no longer highlighted as comments.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)





* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-27  9:18                             ` Vincent Lefevre
@ 2019-05-27 12:02                               ` Noam Postavsky
  2019-05-29  0:30                                 ` Vincent Lefevre
  0 siblings, 1 reply; 42+ messages in thread
From: Noam Postavsky @ 2019-05-27 12:02 UTC (permalink / raw)
  To: Vincent Lefevre; +Cc: Stefan Monnier, 33887

[-- Attachment #1: Type: text/plain, Size: 864 bytes --]

Vincent Lefevre <vincent@vinc17.net> writes:

> On 2019-05-26 18:17:55 -0400, Noam Postavsky wrote:
>> And about the highlighting of quoted text outside tags, we can just
>> disable fontification, while leaving the syntax code untouched:
> [...]
>
> I've applied it with a minor change against Emacs 26 (context lines
> for hunk #1 of sgml-mode.el are different), but the comments are
> no longer highlighted as comments.

Ah, I guess reusing the default font-lock-syntactic-face-function
doesn't really make sense after all.  So sgml-font-lock-syntactic-face
should be like this:

    (defun sgml-font-lock-syntactic-face (state)
      "`font-lock-syntactic-face-function' for `sgml-mode'."
      ;; Don't use string face outside of tags.
      (cond ((and (nth 9 state) (nth 3 state)) font-lock-string-face)
            ((nth 4 state) font-lock-comment-face)))
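
To see concretely why the first version lost comment highlighting, here is a small standalone sketch. It is not part of either patch. The names `my-fake-ppss`, `my-sgml-face-v1` and `my-sgml-face-v2` are hypothetical. The parse states are hand-built lists that only imitate `syntax-ppss` results. Only three slots are filled in:

* slot 3: inside a string
* slot 4: inside a comment
* slot 9: open-paren positions, which the patch uses as its "inside a tag" test

`my-sgml-face-v1` inlines the default face logic rather than calling `font-lock-syntactic-face-function-default`, because that function only exists once the patch is applied. The apparent cause of the lost highlighting is that v1 gates both faces on slot 9, while v2 gates only the string face.

```elisp
;; Sketch only: hypothetical helper names, not part of the patch.

(defun my-fake-ppss (&rest slots)
  "Return a fake parse state with SLOTS (INDEX VALUE ...) filled in.
Every other slot is nil, so this only imitates a `syntax-ppss' result."
  (let ((state (make-list 11 nil)))
    (while slots
      (let ((idx (pop slots))
            (val (pop slots)))
        (setf (nth idx state) val)))
    state))

(defun my-sgml-face-v1 (state)
  "First version: both faces are gated on being inside a tag (slot 9)."
  (and (nth 9 state)
       (if (nth 3 state) 'font-lock-string-face 'font-lock-comment-face)))

(defun my-sgml-face-v2 (state)
  "Revised version: only the string face is gated on slot 9."
  (cond ((and (nth 9 state) (nth 3 state)) 'font-lock-string-face)
        ((nth 4 state) 'font-lock-comment-face)))

;; A comment outside any tag: slot 4 set, no open `<' in slot 9.
(let ((comment-outside (my-fake-ppss 4 t))
      ;; A quote in text content: "in a string", but not inside a tag.
      (quote-in-text (my-fake-ppss 3 ?\'))
      ;; An attribute value: in a string and inside an open `<'.
      (attr-value (my-fake-ppss 3 ?\" 9 '(1))))
  (list (my-sgml-face-v1 comment-outside)   ; comment gets no face
        (my-sgml-face-v2 comment-outside)   ; comment face restored
        (my-sgml-face-v2 quote-in-text)     ; quote in text stays plain
        (my-sgml-face-v2 attr-value)))      ; attribute value is a string
```

Evaluating the final `let` form returns `(nil font-lock-comment-face nil font-lock-string-face)`.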


[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 4188 bytes --]

From 0c3e6a97f92dec31e7e186dae933c86700034089 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Sun, 26 May 2019 17:41:22 -0400
Subject: [PATCH] Don't fontify text outside of SGML/XML tags (Bug#33887)

* lisp/font-lock.el (font-lock-syntactic-face-function-default): New
function.
(font-lock-syntactic-face-function): Use it as default value.
* lisp/textmodes/sgml-mode.el (sgml-font-lock-syntactic-face): New
function.
(sgml-mode):
* lisp/nxml/nxml-mode.el (nxml-mode): Use it as
font-lock-syntactic-face-function value.
---
 lisp/font-lock.el           |  7 +++++--
 lisp/nxml/nxml-mode.el      |  4 +++-
 lisp/textmodes/sgml-mode.el | 12 ++++++++++--
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/lisp/font-lock.el b/lisp/font-lock.el
index 3991a4ee8e..ddf1cbdb9f 100644
--- a/lisp/font-lock.el
+++ b/lisp/font-lock.el
@@ -527,9 +527,12 @@ (defvar font-lock-syntactically-fontified 0
 sometimes be slightly incorrect.")
 (make-variable-buffer-local 'font-lock-syntactically-fontified)
 
+(defun font-lock-syntactic-face-function-default (state)
+  "Default value for `font-lock-syntactic-face-function'."
+  (if (nth 3 state) font-lock-string-face font-lock-comment-face))
+
 (defvar font-lock-syntactic-face-function
-  (lambda (state)
-    (if (nth 3 state) font-lock-string-face font-lock-comment-face))
+  #'font-lock-syntactic-face-function-default
   "Function to determine which face to use when fontifying syntactically.
 The function is called with a single parameter (the state as returned by
 `parse-partial-sexp' at the beginning of the region to highlight) and
diff --git a/lisp/nxml/nxml-mode.el b/lisp/nxml/nxml-mode.el
index da01b2a342..05044d66df 100644
--- a/lisp/nxml/nxml-mode.el
+++ b/lisp/nxml/nxml-mode.el
@@ -551,7 +551,9 @@ (define-derived-mode nxml-mode text-mode "nXML"
           nil  ; no special syntax table
           (font-lock-extend-region-functions . (nxml-extend-region))
           (jit-lock-contextually . t)
-          (font-lock-unfontify-region-function . nxml-unfontify-region)))
+          (font-lock-unfontify-region-function . nxml-unfontify-region)
+          (font-lock-syntactic-face-function
+           . sgml-font-lock-syntactic-face)))
 
   (with-demoted-errors (rng-nxml-mode-init)))
 
diff --git a/lisp/textmodes/sgml-mode.el b/lisp/textmodes/sgml-mode.el
index 1df7e78afc..da25665e62 100644
--- a/lisp/textmodes/sgml-mode.el
+++ b/lisp/textmodes/sgml-mode.el
@@ -329,6 +329,12 @@ (defconst sgml-font-lock-keywords-2
 (defvar sgml-font-lock-keywords sgml-font-lock-keywords-1
   "Rules for highlighting SGML code.  See also `sgml-tag-face-alist'.")
 
+(defun sgml-font-lock-syntactic-face (state)
+  "`font-lock-syntactic-face-function' for `sgml-mode'."
+  ;; Don't use string face outside of tags.
+  (cond ((and (nth 9 state) (nth 3 state)) font-lock-string-face)
+        ((nth 4 state) font-lock-comment-face)))
+
 (defvar-local sgml--syntax-propertize-ppss nil)
 
 (defun sgml--syntax-propertize-ppss (pos)
@@ -573,7 +579,7 @@ (define-derived-mode sgml-mode text-mode '(sgml-xml-mode "XML" "SGML")
   ;; This is desirable because SGML discards a newline that appears
   ;; immediately after a start tag or immediately before an end tag.
   (setq-local paragraph-start (concat "[ \t]*$\\|\
-[ \t]*</?\\(" sgml-name-re sgml-attrs-re "\\)?>"))
+\[ \t]*</?\\(" sgml-name-re sgml-attrs-re "\\)?>"))
   (setq-local paragraph-separate (concat paragraph-start "$"))
   (setq-local adaptive-fill-regexp "[ \t]*")
   (add-hook 'fill-nobreak-predicate 'sgml-fill-nobreak nil t)
@@ -591,7 +597,9 @@ (define-derived-mode sgml-mode text-mode '(sgml-xml-mode "XML" "SGML")
   (setq font-lock-defaults '((sgml-font-lock-keywords
 			      sgml-font-lock-keywords-1
 			      sgml-font-lock-keywords-2)
-			     nil t))
+                             nil t nil
+                             (font-lock-syntactic-face-function
+                              . sgml-font-lock-syntactic-face)))
   (setq-local syntax-propertize-function #'sgml-syntax-propertize)
   (setq-local facemenu-add-face-function 'sgml-mode-facemenu-add-face-function)
   (setq-local sgml-xml-mode (sgml-xml-guess))
-- 
2.11.0


* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-27 12:02                               ` Noam Postavsky
@ 2019-05-29  0:30                                 ` Vincent Lefevre
  2019-06-04 12:55                                   ` Noam Postavsky
  0 siblings, 1 reply; 42+ messages in thread
From: Vincent Lefevre @ 2019-05-29  0:30 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Stefan Monnier, 33887

Thanks. A last issue: a comment before the root element is not
highlighted. Example: in

<?xml version="1.0" encoding="utf-8"?>
<!-- comment -->
<root>
<!-- comment -->
</root>
<!-- comment -->

the first comment is not highlighted, but the other two comments are.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)





* bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
  2019-05-29  0:30                                 ` Vincent Lefevre
@ 2019-06-04 12:55                                   ` Noam Postavsky
  0 siblings, 0 replies; 42+ messages in thread
From: Noam Postavsky @ 2019-06-04 12:55 UTC (permalink / raw)
  To: Vincent Lefevre; +Cc: Stefan Monnier, 33887

tags 33887 fixed
close 33887 27.1
quit

Vincent Lefevre <vincent@vinc17.net> writes:

> Thanks. A last issue: a comment before the root element is not
> highlighted. Example: in
>
> <?xml version="1.0" encoding="utf-8"?>
> <!-- comment -->
> <root>
> <!-- comment -->
> </root>
> <!-- comment -->
>
> the first comment is not highlighted, but the other two comments are.

This was followed up in https://debbugs.gnu.org/32823#45

I'm pushing the current patches to master and closing this bug, as I
think all the issues here are resolved (if not, we can open new bugs).

e04f93e18a 2019-06-04T08:42:50-04:00 "Don't fontify text outside of SGML/XML tags (Bug#33887)"
https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=e04f93e18a8083d3a4930decc523c4f5d9a97c9e

438e4804d1 2019-06-04T08:42:50-04:00 "Fix some SGML syntax edge cases (Bug#33887)"
https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=438e4804d107720f526d0c7c367cbd029f264676






Thread overview: 42+ messages
2018-12-27 10:13 bug#33887: 26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode Vincent Lefevre
2018-12-27 16:02 ` Eli Zaretskii
2018-12-27 16:39   ` Stefan Monnier
2018-12-27 16:43     ` Eli Zaretskii
2018-12-27 17:32       ` Stefan Monnier
2018-12-27 17:47         ` Eli Zaretskii
2018-12-27 18:43         ` Vincent Lefevre
2018-12-28 17:18           ` Stefan Monnier
2019-01-17 22:57   ` Stefan Monnier
2019-01-08 22:11 ` Fernando Jascovich
2019-01-10 15:09   ` Eli Zaretskii
2019-01-17 23:25     ` Stefan Monnier
2019-05-15 23:53 ` Noam Postavsky
2019-05-16 10:54   ` Vincent Lefevre
2019-05-16 12:15   ` Noam Postavsky
2019-05-17 21:36     ` Vincent Lefevre
2019-05-18  4:15       ` Noam Postavsky
2019-05-18 14:47         ` Vincent Lefevre
2019-05-18 14:55           ` Vincent Lefevre
2019-05-18 14:57             ` Vincent Lefevre
2019-05-18 15:01               ` Vincent Lefevre
2019-05-18 18:49           ` Noam Postavsky
2019-05-19  0:17             ` Vincent Lefevre
2019-05-19 17:43               ` Noam Postavsky
2019-05-19 18:48                 ` Stefan Monnier
2019-05-19 19:03                   ` Noam Postavsky
2019-05-19 19:24                     ` Stefan Monnier
2019-05-20 20:47                       ` Noam Postavsky
2019-05-21  1:06                         ` Vincent Lefevre
2019-05-21 12:27                           ` Noam Postavsky
2019-05-22 13:58                             ` Stefan Monnier
2019-05-22 15:44                               ` Vincent Lefevre
2019-05-22 16:01                                 ` Stefan Monnier
2019-05-22 22:37                         ` Stefan Monnier
2019-05-26 22:17                           ` Noam Postavsky
2019-05-27  9:18                             ` Vincent Lefevre
2019-05-27 12:02                               ` Noam Postavsky
2019-05-29  0:30                                 ` Vincent Lefevre
2019-06-04 12:55                                   ` Noam Postavsky
2019-05-22 21:44                       ` Stefan Monnier
2019-05-20 11:47             ` Vincent Lefevre
2019-05-16 14:01   ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
