unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#25706: 26.0.50; Slow C file fontification
@ 2017-02-13 18:20 Sujith
  2020-11-30 11:26 ` Lars Ingebrigtsen
  2020-11-30 12:46 ` Mattias Engdegård
  0 siblings, 2 replies; 45+ messages in thread
From: Sujith @ 2017-02-13 18:20 UTC (permalink / raw)
  To: 25706

On a machine that is not very high-powered, opening some C files
and trying to edit/view them is very slow.

For example:
https://raw.githubusercontent.com/qca/qcamain_open_hal_public/master/hal/ar9300/osprey_reg_map_macro.h

This is a large file filled with macros.
Is there any way to view it without disabling font-lock entirely?
I am using the master branch and I have these in my .emacs:

(global-font-lock-mode t)
(setq font-lock-maximum-decoration
      (quote ((c-mode . 2) (c++-mode . 2) (t . t))))
(setq c-font-lock-extra-types
      (quote
       ("\\sw+_t" "bool" "complex" "imaginary" "FILE" "lconv" "tm" "va_list" "jmp_buf" "Lisp_Object"
	"u8" "u16" "u32" "u64"
	"s8" "s16" "s32" "s64"
	"__le16" "__le32" "__le64"
	"__be16" "__be32" "__be64"
	"__s8" "__s16" "__s32" "__s64"
	"__u8" "__u16" "__u32" "__u64")))

The machine is a low-end 10-inch netbook. Some details:

$ uname -a
Linux the-damned 4.10.0-rc7-wt #16 SMP PREEMPT Tue Feb 7 10:47:38 IST 2017 x86_64 GNU/Linux

$ free -m -h
              total        used        free      shared  buff/cache   available
Mem:           1.8G        542M        560M         87M        766M        1.0G
Swap:          2.0G         24M        2.0G

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 55
model name      : Intel(R) Celeron(R) CPU  N2807  @ 1.58GHz
stepping        : 8
microcode       : 0x811
cpu MHz         : 1828.644
cache size      : 1024 KB

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 55
model name      : Intel(R) Celeron(R) CPU  N2807  @ 1.58GHz
stepping        : 8
microcode       : 0x811
cpu MHz         : 1805.267
cache size      : 1024 KB


$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/6.3.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc-multilib/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --enable-libmpx --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release
Thread model: posix
gcc version 6.3.1 20170109 (GCC)


In GNU Emacs 26.0.50.1 (x86_64-unknown-linux-gnu, GTK+ Version 3.22.7)
 of 2017-02-13 built on the-damned
Repository revision: 271dcf8652ccf94d8582b2bcdb26f066d0b946a2
Windowing system distributor 'The X.Org Foundation', version 11.0.11901000
Recent messages:
Checking 57 files in /usr/share/emacs/26.0.50/lisp/eshell...
Checking 70 files in /usr/share/emacs/26.0.50/lisp/erc...
Checking 34 files in /usr/share/emacs/26.0.50/lisp/emulation...
Checking 172 files in /usr/share/emacs/26.0.50/lisp/emacs-lisp...
Checking 24 files in /usr/share/emacs/26.0.50/lisp/cedet...
Checking 57 files in /usr/share/emacs/26.0.50/lisp/calendar...
Checking 87 files in /usr/share/emacs/26.0.50/lisp/calc...
Checking 103 files in /usr/share/emacs/26.0.50/lisp/obsolete...
Checking for load-path shadows...done
Message modified; kill anyway? (y or n) y

Configured using:
 'configure --prefix=/usr --without-libsystemd --without-dbus
 --without-gconf --without-gsettings --without-selinux --without-threads
 --without-gpm --without-xaw3d --without-toolkit-scroll-bars
 --without-m17n-flt --without-libotf --without-imagemagick
 --without-rsvg --without-png --without-gif --without-tiff
 --without-jpeg --without-xpm --with-sound=no CFLAGS=-O3'

Configured features:
NOTIFY ACL GNUTLS LIBXML2 FREETYPE XFT ZLIB GTK3 X11

Important settings:
  value of $LANG: en_IN.UTF-8
  locale-coding-system: utf-8-unix

Major mode: C

Minor modes in effect:
  global-magit-file-mode: t
  magit-file-mode: t
  diff-auto-refine-mode: t
  magit-auto-revert-mode: t
  auto-revert-mode: t
  global-git-commit-mode: t
  async-bytecomp-package-mode: t
  shell-dirtrack-mode: t
  display-battery-mode: t
  display-time-mode: t
  iswitchb-mode: t
  savehist-mode: t
  save-place-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: 1
  line-number-mode: t
  transient-mark-mode: t
  abbrev-mode: t

Load-path shadows:
/home/sujith/.emacs.d/elpa/emms-20160304.920/tq hides /usr/share/emacs/26.0.50/lisp/emacs-lisp/tq

Features:
(cc-mode cc-fonts cc-guess cc-menus cc-cmds cc-styles cc-align cc-engine
cc-vars cc-defs pp shadow flyspell ispell face-remap emacsbug ibuf-ext
ibuffer ibuffer-loaddefs w3m-form w3m-filter w3m-bookmark w3m-tabmenu
w3m-session ffap w3m timezone w3m-hist w3m-fb bookmark-w3m w3m-ems
wid-edit w3m-ccl ccl w3m-favicon w3m-image w3m-proc w3m-util dired-aux
magit-obsolete magit-blame magit-stash magit-bisect magit-remote
magit-commit magit-sequence magit-notes magit-worktree magit-branch
magit-files magit-refs magit-status magit magit-repos magit-apply
magit-wip magit-log magit-diff smerge-mode diff-mode magit-core
magit-autorevert autorevert filenotify magit-process magit-margin
magit-mode magit-git crm magit-section magit-popup git-commit
magit-utils log-edit easy-mmode pcvs-util add-log with-editor
async-bytecomp async tramp-sh tramp tramp-compat tramp-loaddefs trampver
ucs-normalize shell pcomplete parse-time dash advice mu4e-contrib mu4e
desktop frameset mu4e-speedbar speedbar sb-image ezimage dframe
mu4e-main mu4e-context mu4e-view cal-menu calendar cal-loaddefs
thingatpt browse-url comint ansi-color mu4e-headers mu4e-compose
mu4e-draft mu4e-actions ido rfc2368 smtpmail sendmail mu4e-mark
mu4e-message flow-fill html2text mu4e-proc mu4e-proc-mu mu4e-utils
doc-view jka-compr image-mode mu4e-lists mu4e-vars message puny
format-spec rfc822 mml mml-sec epa derived epg gnus-util rmail
rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047
rfc2045 mm-util ietf-drums mail-prsvr mailabbrev mail-utils gmm-utils
mailheader hl-line cl mu4e-meta battery time dired-x dired
dired-loaddefs edmacro kmacro xcscope ring server iswitchb savehist
saveplace finder-inf info package epg-config url-handlers url-parse
auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs
password-cache url-vars seq byte-opt subr-x gv bytecomp byte-compile
cl-extra help-mode easymenu cconv cl-loaddefs pcase cl-lib time-date
mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite charscript case-table
epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote inotify dynamic-setting
font-render-setting move-toolbar gtk x-toolkit x multi-tty
make-network-process emacs)

Memory information:
((conses 16 1112940 3463)
 (symbols 48 36857 5)
 (miscs 40 96 357)
 (strings 32 73343 16247)
 (string-bytes 1 2484237)
 (vectors 16 59639)
 (vector-slots 8 1342999 94842)
 (floats 8 475 158)
 (intervals 56 127752 56)
 (buffers 976 21))






* bug#25706: 26.0.50; Slow C file fontification
  2017-02-13 18:20 bug#25706: 26.0.50; Slow C file fontification Sujith
@ 2020-11-30 11:26 ` Lars Ingebrigtsen
  2020-11-30 11:37   ` Lars Ingebrigtsen
  2020-11-30 12:46 ` Mattias Engdegård
  1 sibling, 1 reply; 45+ messages in thread
From: Lars Ingebrigtsen @ 2020-11-30 11:26 UTC (permalink / raw)
  To: Sujith; +Cc: 25706

Sujith <m.sujith@gmail.com> writes:

> On a machine that is not very high-powered, opening some C files
> and trying to edit/view them is very slow.
>
> For example:
> https://raw.githubusercontent.com/qca/qcamain_open_hal_public/master/hal/ar9300/osprey_reg_map_macro.h
>
> This is a large file filled with macros.
> Is there any way to view it without disabling font-lock entirely?

(This bug report unfortunately got no response at the time.)

I tried reproducing this on a pretty new laptop, and opening the file in
question (with your settings) took less than a second with Emacs 28.

You say "very slow", but not what kind of time scale you mean -- one
second or one minute or something.

Are you still seeing this issue in more recent versions of Emacs?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#25706: 26.0.50; Slow C file fontification
  2020-11-30 11:26 ` Lars Ingebrigtsen
@ 2020-11-30 11:37   ` Lars Ingebrigtsen
  0 siblings, 0 replies; 45+ messages in thread
From: Lars Ingebrigtsen @ 2020-11-30 11:37 UTC (permalink / raw)
  To: 25706

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Are you still seeing this issue in more recent versions of Emacs?

The mail bounced, so I guess there's unlikely to be any further progress
in this bug report, and I'm closing it.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#25706: 26.0.50; Slow C file fontification
  2017-02-13 18:20 bug#25706: 26.0.50; Slow C file fontification Sujith
  2020-11-30 11:26 ` Lars Ingebrigtsen
@ 2020-11-30 12:46 ` Mattias Engdegård
  2020-11-30 12:49   ` Lars Ingebrigtsen
                     ` (3 more replies)
  1 sibling, 4 replies; 45+ messages in thread
From: Mattias Engdegård @ 2020-11-30 12:46 UTC (permalink / raw)
  To: 25706; +Cc: Alan Mackenzie, Lars Ingebrigtsen

>> https://raw.githubusercontent.com/qca/qcamain_open_hal_public/master/hal/ar9300/osprey_reg_map_macro.h
> 
> I tried reproducing this on a pretty new laptop, and opening the file in
> question (with your settings) took less than a second with Emacs 28.

My lappy is less new but not really that slow -- compared to the hardware of the original reporter it's a speed demon --
but opening the file takes almost 4 s here. More importantly, scrolling through the file is painfully slow.

The code in the file is nothing out of the ordinary; it consists of macros that are 1-3 lines each; definitely not a pathological case. The entire fontification takes 64 s for this file.

I'd say the complaint is warranted, even if the original reporter is no longer reachable. Reopen?

Alan, do you have a diagnosis?







* bug#25706: 26.0.50; Slow C file fontification
  2020-11-30 12:46 ` Mattias Engdegård
@ 2020-11-30 12:49   ` Lars Ingebrigtsen
  2020-11-30 16:27   ` Eli Zaretskii
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 45+ messages in thread
From: Lars Ingebrigtsen @ 2020-11-30 12:49 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Alan Mackenzie, 25706

Mattias Engdegård <mattiase@acm.org> writes:

> I'd say the complaint is warranted, even if the original reporter is
> no longer reachable. Reopen?

OK; reopening.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#25706: 26.0.50; Slow C file fontification
  2020-11-30 12:46 ` Mattias Engdegård
  2020-11-30 12:49   ` Lars Ingebrigtsen
@ 2020-11-30 16:27   ` Eli Zaretskii
  2020-11-30 16:38   ` Alan Mackenzie
  2020-11-30 18:30   ` Alan Mackenzie
  3 siblings, 0 replies; 45+ messages in thread
From: Eli Zaretskii @ 2020-11-30 16:27 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: acm, larsi, 25706

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Mon, 30 Nov 2020 13:46:30 +0100
> Cc: Alan Mackenzie <acm@muc.de>, Lars Ingebrigtsen <larsi@gnus.org>
> 
> >> https://raw.githubusercontent.com/qca/qcamain_open_hal_public/master/hal/ar9300/osprey_reg_map_macro.h
> > 
> > I tried reproducing this on a pretty new laptop, and opening the file in
> > question (with your settings) took less than a second with Emacs 28.
> 
> My lappy is less new but not really that slow -- compared to the hardware of the original reporter it's a speed demon --
> but opening the file takes almost 4 s here. More importantly, scrolling through the file is painfully slow.
> 
> The code in the file is nothing out of the ordinary; it consists of macros that are 1-3 lines each; definitely not a pathological case. The entire fontification takes 64 s for this file.
> 
> I'd say the complaint is warranted, even if the original reporter is no longer reachable. Reopen?
> 
> Alan, do you have a diagnosis?

I suggest running this under "M-x profiler-start" and posting the fully
expanded profile you get from that.  Bonus points for doing that after
loading the CC Mode files as .el (not .elc), which will make the
profile more detailed.
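
For reference, a profiling run of that kind might look like the minimal
sketch below.  It assumes Emacs 25.1 or later (for `font-lock-ensure'),
and the command name `my/profile-fontification' is made up for
illustration; it is not part of Emacs or CC Mode.

```elisp
;; Sketch: CPU-profile one complete fontification of the current buffer.
(require 'profiler)

(defun my/profile-fontification ()
  "Profile fontifying the whole current buffer, then show the report.
This is a hypothetical helper, not an existing Emacs command."
  (interactive)
  ;; `profiler-start' signals an error if the CPU profiler is already
  ;; running, so stop any earlier session first.
  (profiler-start 'cpu)
  (unwind-protect
      (progn
        ;; Discard any existing fontification, then redo all of it.
        (font-lock-flush)
        (font-lock-ensure)
        ;; Build the report while the profiler is still running, which
        ;; works across Emacs versions.
        (profiler-report))
    (profiler-stop)))
```

In the report buffer, "C-u RET" on an entry expands its whole subtree,
which gives the fully expanded profile asked for above.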






* bug#25706: 26.0.50; Slow C file fontification
  2020-11-30 12:46 ` Mattias Engdegård
  2020-11-30 12:49   ` Lars Ingebrigtsen
  2020-11-30 16:27   ` Eli Zaretskii
@ 2020-11-30 16:38   ` Alan Mackenzie
  2020-11-30 16:53     ` Mattias Engdegård
  2020-11-30 18:30   ` Alan Mackenzie
  3 siblings, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2020-11-30 16:38 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Lars Ingebrigtsen, 25706

Hello, Mattias.

On Mon, Nov 30, 2020 at 13:46:30 +0100, Mattias Engdegård wrote:
> >> https://raw.githubusercontent.com/qca/qcamain_open_hal_public/master/hal/ar9300/osprey_reg_map_macro.h

> > I tried reproducing this on a pretty new laptop, and opening the file
> > in question (with your settings) took less than a second with Emacs
> > 28.

> My lappy is less new but not really that slow -- compared to the
> hardware of the original reporter it's a speed demon -- but opening the
> file takes almost 4 s here. More importantly, scrolling through the
> file is painfully slow.

> The code in the file is nothing out of the ordinary; it consists of
> macros that are 1-3 lines each; definitely not a pathological case. The
> entire fontification takes 64 s for this file.

> I'd say the complaint is warranted, even if the original reporter is no
> longer reachable. Reopen?

> Alan, do you have a diagnosis?

Yes.  I've had a look at the file, and it's large and lacking in braces.
There are functions in CC Mode which search backwards for opening braces
to establish context.  When there are none, the search goes back to BOB.
Lots of these searches, not efficiently cached, take a long time.

It's a problem with CC Mode, not with the source file.  It's a known
problem, and not easy to fix.

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
  2020-11-30 16:38   ` Alan Mackenzie
@ 2020-11-30 16:53     ` Mattias Engdegård
  2020-11-30 17:04       ` Mattias Engdegård
  2020-12-01  9:21       ` Alan Mackenzie
  0 siblings, 2 replies; 45+ messages in thread
From: Mattias Engdegård @ 2020-11-30 16:53 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Lars Ingebrigtsen, 25706

30 nov. 2020 kl. 17.38 skrev Alan Mackenzie <acm@muc.de>:

> Yes.  I've had a look at the file, and it's large and lacking in braces.
> There are functions in CC Mode which search backwards for opening braces
> to establish context.  When there are none, the search goes back to BOB.
> Lots of these searches, not efficiently cached, take a long time.
> 
> It's a problem with CC Mode, not with the source file.  It's a known
> problem, and not easy to fix.

Actually, it's the underscores!
Demo: fill a file with the line pairs

#define abc_defg_hij_klm__nop_qrst_uvw_xyz_w__ooa_cin_e__aoi__uynv(s) \
 0

repeated 1000 times, thus making it 2000 lines. Save as something.h. Slow!
Now replace each underscore with a letter. Save. Fast!

Fontifying the 2000 line file (with underscores) takes longer than the original 80000 line file.
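
The demo file described above can be generated and timed with a sketch
like the following.  The function names are hypothetical, and timing a
whole-buffer `font-lock-ensure' is only an approximation of what
happens when scrolling interactively.

```elisp
;; Sketch: build the 2000-line test file and time its fontification.
(require 'benchmark)

(defun my/make-underscore-demo (file &optional pairs)
  "Write PAIRS (default 1000) copies of the two-line macro to FILE."
  (with-temp-file file
    (dotimes (_ (or pairs 1000))
      ;; First line ends in a backslash continuation; second is \" 0\".
      (insert "#define abc_defg_hij_klm__nop_qrst_uvw_xyz_w__ooa_cin_e__aoi__uynv(s) \\\n"
              " 0\n"))))

(defun my/time-fontification (file)
  "Return the seconds taken to fontify all of FILE."
  (with-current-buffer (find-file-noselect file)
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
    (car (benchmark-run 1
           (font-lock-flush)
           (font-lock-ensure)))))
```

For example, (my/make-underscore-demo "/tmp/something.h") followed by
(my/time-fontification "/tmp/something.h") gives one number to compare
against the same file with the underscores replaced by letters.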

I started going through c-find-decl-spots and c-find-decl-prefix-search (together there are while statements nested 4 deep) but am not sure exactly where the trouble is. A regexp? Something syntax-char related (since '_' has symbol syntax, not word)?

CC-mode in general thrashes the regexp cache; the miss rate is at 27 % for the original file, which is way too high. Enlarging the cache enough to eliminate misses helps, but not nearly enough.







* bug#25706: 26.0.50; Slow C file fontification
  2020-11-30 16:53     ` Mattias Engdegård
@ 2020-11-30 17:04       ` Mattias Engdegård
  2020-12-01  5:48         ` Ravine Var
  2020-12-01  9:29         ` Alan Mackenzie
  2020-12-01  9:21       ` Alan Mackenzie
  1 sibling, 2 replies; 45+ messages in thread
From: Mattias Engdegård @ 2020-11-30 17:04 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Lars Ingebrigtsen, 25706

[-- Attachment #1: Type: text/plain, Size: 516 bytes --]

> Actually, it's the underscores!

Found it. Suggested fix attached.

It can be improved: at least one pair of regexp group brackets can be removed, but I didn't dare do so because I wasn't sure if it would throw some group numbers off by one.

Alan, please, let's work together and remove unnecessary capture groups from the regexps! Even XEmacs regexps support non-capturing brackets, \(?:...\), and they save time, regexp stack space, and reduce the hassle of computing the 'regexp depth' everywhere.
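
To illustrate the point about groups, here is a small sketch comparing
the old alternation, the character class from the attached patch, and a
non-capturing variant that the patch does not use.  The test string and
the labels are made up; `regexp-opt-depth' is the standard function for
counting capturing groups.

```elisp
;; Sketch: all three patterns match the same identifier, but they differ
;; in how many capturing groups they define.
(let ((name "abc_defg_hij")
      (patterns '(("alternation" . "\\(\\sw\\|_\\)+")
                  ("char class"  . "\\([[:word:]_]\\)+")
                  ("shy group"   . "\\(?:[[:word:]_]\\)+"))))
  (with-temp-buffer
    ;; \sw and [[:word:]] consult the current buffer's syntax table.  A
    ;; temp buffer uses the standard table, where `_' has symbol syntax.
    (dolist (p patterns)
      (string-match (cdr p) name)
      (message "%-12s matched %S, capturing groups: %d"
               (car p)
               (match-string 0 name)
               (regexp-opt-depth (cdr p))))))
```

Each pattern matches the whole name.  The first two define one
capturing group and the last defines none, so code that refers to later
groups by number has to be adjusted whenever a group is made
non-capturing.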


[-- Attachment #2: cc-underscores.diff --]
[-- Type: application/octet-stream, Size: 1322 bytes --]

diff --git a/lisp/progmodes/cc-langs.el b/lisp/progmodes/cc-langs.el
index d6089ea295..d1f795053c 100644
--- a/lisp/progmodes/cc-langs.el
+++ b/lisp/progmodes/cc-langs.el
@@ -967,7 +967,7 @@ c-opt-cpp-macro-define-start
   t (if (c-lang-const c-opt-cpp-macro-define)
 	(concat (c-lang-const c-opt-cpp-prefix)
 		(c-lang-const c-opt-cpp-macro-define)
-		"[ \t]+\\(\\(\\sw\\|_\\)+\\)\\(([^)]*)\\)?"
+		"[ \t]+\\(\\([[:word:]_]\\)+\\)\\(([^)]*)\\)?"
 		;;       ^                 ^ #defined name
 		"\\([ \t]\\|\\\\\n\\)*")))
 (c-lang-defvar c-opt-cpp-macro-define-start
@@ -979,7 +979,7 @@ c-opt-cpp-macro-define-id
   t (if (c-lang-const c-opt-cpp-macro-define)
 	(concat (c-lang-const c-opt-cpp-prefix)	; #
 		(c-lang-const c-opt-cpp-macro-define) ; define
-		"[ \t]+\\(\\sw\\|_\\)+")))
+		"[ \t]+\\([[:word:]_]\\)+")))
 (c-lang-defvar c-opt-cpp-macro-define-id
   (c-lang-const c-opt-cpp-macro-define-id))
 
@@ -989,7 +989,7 @@ c-anchored-hash-define-no-parens
   t (if (c-lang-const c-opt-cpp-macro-define)
 	(concat (c-lang-const c-anchored-cpp-prefix)
 		(c-lang-const c-opt-cpp-macro-define)
-		"[ \t]+\\(\\sw\\|_\\)+\\([^(a-zA-Z0-9_]\\|$\\)")))
+		"[ \t]+\\([[:word:]_]\\)+\\([^(a-zA-Z0-9_]\\|$\\)")))
 
 (c-lang-defconst c-cpp-expr-directives
   "List of cpp directives (without the prefix) that are followed by an


* bug#25706: 26.0.50; Slow C file fontification
  2020-11-30 12:46 ` Mattias Engdegård
                     ` (2 preceding siblings ...)
  2020-11-30 16:38   ` Alan Mackenzie
@ 2020-11-30 18:30   ` Alan Mackenzie
  3 siblings, 0 replies; 45+ messages in thread
From: Alan Mackenzie @ 2020-11-30 18:30 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: acm, Lars Ingebrigtsen, 25706

Hello, Mattias.

On Mon, Nov 30, 2020 at 13:46:30 +0100, Mattias Engdegård wrote:
> >> https://raw.githubusercontent.com/qca/qcamain_open_hal_public/master/hal/ar9300/osprey_reg_map_macro.h

> > I tried reproducing this on a pretty new laptop, and opening the file
> > in question (with your settings) took less than a second with Emacs
> > 28.

> My lappy is less new but not really that slow -- compared to the
> hardware of the original reporter it's a speed demon -- but opening the
> file takes almost 4 s here. More importantly, scrolling through the
> file is painfully slow.

Hah!  I just tried it, all the way through the file, and it took me
3568.429881811142 seconds, i.e. all of an hour bar 32 seconds.

My machine is in no way slow, being a first generation AMD Ryzen from 2017.

> Alan, do you have a diagnosis?

Other than what I told you in my last post (lack of braces), not yet,
but I'm going to take the first tenth of the OP's file (which is 4 MB)
to test on.

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
  2020-11-30 17:04       ` Mattias Engdegård
@ 2020-12-01  5:48         ` Ravine Var
  2020-12-01 13:34           ` Mattias Engdegård
  2020-12-01  9:29         ` Alan Mackenzie
  1 sibling, 1 reply; 45+ messages in thread
From: Ravine Var @ 2020-12-01  5:48 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Alan Mackenzie, Lars Ingebrigtsen, 25706

Mattias Engdegård <mattiase@acm.org> writes:
> Found it. Suggested fix attached.
>
> It can be improved: at least one pair of regexp group brackets can be
> removed, but I didn't dare do so because I wasn't sure if it would
> throw some group numbers off by one.

Thanks for working on this!

Will this patch fix the problem with big header files like
the one originally reported?

I tested this patch and the issue is still there.

Also, such header files are very common. For example:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg






* bug#25706: 26.0.50; Slow C file fontification
  2020-11-30 16:53     ` Mattias Engdegård
  2020-11-30 17:04       ` Mattias Engdegård
@ 2020-12-01  9:21       ` Alan Mackenzie
  2020-12-01 12:03         ` Mattias Engdegård
  1 sibling, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-01  9:21 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Lars Ingebrigtsen, 25706

Hello, Mattias.

On Mon, Nov 30, 2020 at 17:53:04 +0100, Mattias Engdegård wrote:
> 30 nov. 2020 kl. 17.38 skrev Alan Mackenzie <acm@muc.de>:

> > Yes.  I've had a look at the file, and it's large and lacking in
> > braces.  There are functions in CC Mode which search backwards for
> > opening braces to establish context.  When there are none, the
> > search goes back to BOB.  Lots of these searches, not efficiently
> > cached, take a long time.

> > It's a problem with CC Mode, not with the source file.  It's a known
> > problem, and not easy to fix.

> Actually, it's the underscores!
> Demo: fill a file with the line pairs

> #define abc_defg_hij_klm__nop_qrst_uvw_xyz_w__ooa_cin_e__aoi__uynv(s) \
>  0

> repeated 1000 times, thus making it 2000 lines. Save as something.h. Slow!
> Now replace each underscore with a letter. Save. Fast!

> Fontifying the 2000 line file (with underscores) takes longer than the
> original 80000 line file.

Hey, wonderful!  I haven't tried it yet, but I did try this:
(i) Take the first 10% of the original 4MB file, and save it in a
  different file.
(ii) Fontify that file from top to bottom: according to ELP, 292s.
(iii) Insert 9 new lines "{}", one at every 10% of that new file.
(iv) Fontify the amended file top to bottom: new time 98s.

That's a difference of a factor of 3.
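
Step (iii) can be reproduced with a sketch along these lines.  The
function name is hypothetical, and the placement is only approximate:
it splits the buffer by line count and does not check whether an
insertion lands between a backslash-continued line and its
continuation.

```elisp
;; Sketch: insert "{}" lines at evenly spaced line boundaries.
(defun my/insert-brace-lines (&optional count)
  "Insert COUNT (default 9) lines \"{}\" evenly spaced through the buffer."
  (interactive)
  (let* ((count (or count 9))
         (total (count-lines (point-min) (point-max)))
         (step (/ total (1+ count)))
         (i count))
    (save-excursion
      ;; Work from the bottom up so that earlier insertions do not
      ;; shift the line numbers of positions still to be visited.
      (while (> i 0)
        (goto-char (point-min))
        (forward-line (* i step))
        (insert "{}\n")
        (setq i (1- i))))))
```

Running it in a copy of the truncated file and then timing a full
fontification gives the "with braces" half of the comparison.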

> I started going through c-find-decl-spots and
> c-find-decl-prefix-search (together there are while statements nested
> 4 deep) but am not sure exactly where the trouble is. A regexp?
> Something syntax-char related (since '_' has symbol syntax, not word)?

> CC-mode in general thrashes the regexp cache; the miss rate is at 27 %
> for the original file, which is way too high. Enlarging the cache
> enough to eliminate misses helps, but not nearly enough.

So, you reckon replacing "\\(" by "\\(?:" wherever the first isn't
really needed would make a big difference?  Have I understood you right?
If so, I've got a big job ahead of me, going through all the regexps in
CC Mode doing the replacement, and fixing all the match-beginnings and
match-ends, and so on, which depend on them.

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
  2020-11-30 17:04       ` Mattias Engdegård
  2020-12-01  5:48         ` Ravine Var
@ 2020-12-01  9:29         ` Alan Mackenzie
  2020-12-01  9:44           ` martin rudalics
  1 sibling, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-01  9:29 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Lars Ingebrigtsen, 25706

Hello again, Mattias.

On Mon, Nov 30, 2020 at 18:04:56 +0100, Mattias Engdegård wrote:
> > Actually, it's the underscores!

> Found it. Suggested fix attached.

> It can be improved: at least one pair of regexp group brackets can be
> removed, but I didn't dare do so because I wasn't sure if it would
> throw some group numbers off by one.

> Alan, please, let's work together and remove unnecessary capture groups
> from the regexps! Even XEmacs regexps support non-capturing brackets,
> \(?:...\), and they save time, regexp stack space, and reduce the
> hassle of computing the 'regexp depth' everywhere.

There are 342 occurrences of '\\\\([^?]' in CC Mode.  Most of these can
surely be replaced by "\\(?:", but not all, by a long way.  This change
will be fun.
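
A count like that can be taken per file with a sketch such as this one.
The function name is hypothetical; the regexp is the one quoted above,
written as a Lisp string.

```elisp
;; Sketch: count "\\(" openers in Lisp source that are not followed by
;; "?", i.e. candidates for conversion to "\\(?:".
(defun my/count-capturing-groups (file)
  "Return the number of capturing-group openers in the text of FILE.
An opener is a doubled backslash and an open paren, not followed by a
question mark."
  (with-temp-buffer
    (insert-file-contents file)
    ;; As a regexp: two literal backslashes, a literal "(", then any
    ;; character other than "?".
    (how-many "\\\\\\\\([^?]" (point-min) (point-max))))
```

Mapping this over the cc-*.el files gives a per-file breakdown.  It is
only a textual count, so it cannot tell which groups are actually
referenced by number elsewhere.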

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
  2020-12-01  9:29         ` Alan Mackenzie
@ 2020-12-01  9:44           ` martin rudalics
  2020-12-01 10:07             ` Alan Mackenzie
  0 siblings, 1 reply; 45+ messages in thread
From: martin rudalics @ 2020-12-01  9:44 UTC (permalink / raw)
  To: Alan Mackenzie, Mattias Engdegård; +Cc: Lars Ingebrigtsen, 25706

[-- Attachment #1: Type: text/plain, Size: 375 bytes --]

 > There are 342 occurrences of '\\\\([^?]' in CC Mode.  Most of these can
 > surely be replaced by "\\(?:", but not all, by a long way.  This change
 > will be fun.

Years ago I wrote the attached that might help you in this regard (load
it and do 'turn-on-regexp-lock-mode').  If you move point before the "("
of a "\\(" it should give you the appropriate nesting.

martin

[-- Attachment #2: regexp-lock.el --]
[-- Type: text/x-emacs-lisp; name="regexp-lock.el", Size: 71548 bytes --]

;;; regexp-lock.el --- minor mode for highlighting Emacs Lisp regexps

;; Copyright (C) 2005 Martin Rudalics

;; Time-stamp: "2013-09-27 16:59:18 martin"
;; Author: Martin Rudalics <rudalics@gmx.at>
;; Keywords: regular expressions
;; Version: 0.1

;; regexp-lock.el is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 2, or (at your option)
;; any later version.

;; regexp-lock.el is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;;; Commentary:

;; Regexp Lock is a minor mode for highlighting regular expressions in Emacs
;; Lisp mode.

;; `regexp-lock-mode' turns on/off Regexp Lock in the current buffer.  For
;; further information consult the documentation of `regexp-lock-mode'.

;; To turn on Regexp Lock in any Emacs Lisp file you open, add the lines
;;   (require 'regexp-lock)
;;   (add-hook 'emacs-lisp-mode-hook 'turn-on-regexp-lock-mode)
;; to your .emacs.

;;; Code:

;; _____________________________________________________________________________
;;
;;;                      Faces and customizable variables
;; _____________________________________________________________________________
;;
(defgroup regexp-lock nil
  "Highlight regular expressions in Lisp modes."
  :version "22.1"
  :group 'font-lock)

(defface regexp-lock-regexp
  '((((class color)) :background "Grey87")
    (t :underline t))
  "Face for highlighting regexp at point."
  :group 'regexp-lock)

(defface regexp-lock-group
  '((((class color)) :bold t :foreground "Black" :background "Orange")
    (t :bold t))
  "Face for highlighting group numbers in regexp at point."
  :group 'regexp-lock)

(defface regexp-lock-match
  '((((class color)) :background "Turquoise1")
    (t :underline t))
  "Face for highlighting match of regexp at point."
  :group 'regexp-lock)

(defface regexp-lock-match-group
  '((((class color)) :bold t :foreground "Black" :background "Turquoise1")
    (t :bold t))
  "Face for highlighting group numbers in match of regexp at point."
  :group 'regexp-lock)

(defface regexp-lock-match-other
  '((((class color)) :background "PaleTurquoise1")
    (t :underline t))
  "Face for highlighting other matches of regexp at point."
  :group 'regexp-lock)

(defcustom regexp-lock-minor-mode-string nil
  "*String to display in mode line when Regexp Lock is enabled."
  :type '(choice string (const :tag "none" nil))
  :group 'regexp-lock)

(defcustom regexp-lock-regexp-string
  "\\\\\\\\[](|)>}`'=_sSwWcCbB0-9]\\|\\[\\(?:[ ^:]\\|\\\\[tnf]\\)\\|\\][*+?]"
  "*Strings matching this regexp are considered regexp subexpressions.

This regexp is used to discriminate strings representing regular
expressions from \"ordinary\" strings.  The default value has Regexp
Lock search for one of the following:

- two backslashes preceding any of the characters expected in regexp
  backslash constructs but \"[\", \"{\" and \"<\" - the latter being
  excluded because the corresponding constructs have a special meaning
  in `substitute-command-keys'

- a left bracket followed by a space, a caret, a colon, or a backslash
  that precedes one of the characters \"t\", \"n\", or \"f\"

- a right bracket followed by one of \"*\", \"+\", or \"?\"

If any of these items is present in a string, that individual string is
considered part of a regular expression.  If, moreover, the string
literally appears within the argument list of a `concat' or `mapconcat',
all components of that list are considered regular expressions too."
  :type 'regexp
  :group 'regexp-lock)

(defcustom regexp-lock-redo-delay 0.1
  "*Time in seconds Regexp Lock waits before refontifying text.

By default, Regexp Lock refontifies text in order to correctly assign
the text properties of all regexps displayed.  When the value of this
variable is nil Regexp Lock never refontifies text.  As a consequence
regexps may appear improperly fontified after a buffer has been altered,
scrolled, or is displayed for the first time."
  :type '(choice (const :tag "never" nil) (number :tag "seconds"))
  :set (lambda (symbol value)
         (set-default symbol value)
         (when (boundp 'regexp-lock-redo-timer)
           (when regexp-lock-redo-timer
             (cancel-timer regexp-lock-redo-timer)
             (setq regexp-lock-redo-timer nil))
           (when value
             (setq regexp-lock-redo-timer
                   (run-with-idle-timer value t 'regexp-lock-redo)))))
  :group 'regexp-lock)

(defcustom regexp-lock-pause nil
  "*Time in seconds Regexp Lock pauses during refontifying and rechecking.

When the value of this variable is nil `regexp-lock-redo' and
`regexp-lock-recheck' never pause."
  :type '(choice (const :tag "never" nil) (number :tag "seconds"))
  :group 'regexp-lock)

(defcustom regexp-lock-redo-size 500
  "*Number of characters Regexp Lock refontifies without pause."
  :type 'integer
  :group 'regexp-lock)

(defcustom regexp-lock-recheck-delay 1
  "*Time in seconds Regexp Lock waits before rechecking.

Rechecking is needed since refontification \(`regexp-lock-redo'\)
cannot tell whether a multi-line string that matches - or does not match -
`regexp-lock-regexp-string' did so in earlier fontifications too.  The
function `regexp-lock-recheck' periodically checks whether strings
\(still\) qualify as regexp subexpressions.  It does so by searching
windows for `regexp-lock-regexp-string' and requesting refontification
whenever the semantics of a string might have changed.  If the value of
this variable is nil, no rechecking is done.

In practice, the semantics of expressions change rarely.  A noticeable
exception occurs when you compose a regexp spanning multiple lines and
the first match for `regexp-lock-regexp-string' does not occur on the
first lines."
  :type '(choice (const :tag "never" nil) (number :tag "seconds"))
  :set (lambda (symbol value)
         (set-default symbol value)
         (when (boundp 'regexp-lock-recheck-timer)
           (when regexp-lock-recheck-timer
             (cancel-timer regexp-lock-recheck-timer)
             (setq regexp-lock-recheck-timer nil))
           (when value
             (setq regexp-lock-recheck-timer
                   (run-with-idle-timer value t 'regexp-lock-recheck)))))
  :group 'regexp-lock)

(defcustom regexp-lock-show-priority 1000
  "*Priority of overlays highlighting the regexp at point.

Regexp Lock uses this priority for overlays highlighting the regexp at
point and group numbers."
  :type 'integer
  :group 'regexp-lock)

(defcustom regexp-lock-show-delay 0.2
  "*Time in seconds to wait before highlighting the regexp at point.

Regexp Lock waits this many seconds before highlighting the regexp at
point and any group numbers.  A value of nil means that no such
highlighting is performed."
  :type '(choice (const :tag "never" nil) (number :tag "seconds"))
  :set (lambda (symbol value)
         (set-default symbol value)
         (when (boundp 'regexp-lock-show-timer)
           (when regexp-lock-show-timer
             (cancel-timer regexp-lock-show-timer))
           (setq regexp-lock-show-timer nil)
           (when value
             (setq regexp-lock-show-timer
                   (run-with-idle-timer value t 'regexp-lock-show)))))
  :group 'regexp-lock)
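
;; Illustration only: the three delay options above install their idle
;; timers through a `:set' function, so a plain `setq' changes the value
;; but does not run that function and hence does not restart a timer.
;; A sketch of changing them from an init file, assuming this library
;; has already been loaded:
;;
;;   (customize-set-variable 'regexp-lock-redo-delay 0.5)
;;   (customize-set-variable 'regexp-lock-recheck-delay 2)
;;   (customize-set-variable 'regexp-lock-show-delay nil) ; no highlighting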

(defcustom regexp-lock-match-before-group "{"
  "*String displayed before group number of matching expression.

Matching the regexp at point has Regexp Lock display group numbers of
corresponding regexp subexpressions.  These numbers are indicated with
the help of overlays appearing before and after the match.  If two or
more subexpressions match at the same position, you may discriminate
them more easily by displaying this string before any group number."
  :type 'string
  :group 'regexp-lock)

(defcustom regexp-lock-match-after-group "}"
  "*String displayed after group number of matching expression.

Matching the regexp at point has Regexp Lock display group numbers of
corresponding regexp subexpressions.  These numbers are indicated with
the help of overlays appearing before and after the match.  If two or
more subexpressions match at the same position, you may discriminate
them more easily by displaying this string after any group number."
  :type 'string
  :group 'regexp-lock)

(defcustom regexp-lock-hook nil
  "Hook run after Regexp Lock has been turned on or off."
  :type 'hook
  :group 'regexp-lock)

;; _____________________________________________________________________________
;;
;;;                              Mode definitions
;; _____________________________________________________________________________
;;
(define-minor-mode regexp-lock-mode
  "Toggle Regexp Lock.

Regexp Lock is a minor mode for highlighting regular expressions in
Emacs Lisp mode.  When activated, it has font-lock modify syntactic
properties and appearance of regexp constituents as follows:

- Ordinary brackets, parentheses, and semicolons are assigned the
  `symbol' syntax-table property.  As a consequence, `forward-sexp' and
  `backward-sexp' within strings will skip parenthesized groups and
  alternatives in a more intuitive way.  `blink-matching-open' and
  `show-paren-mode' will not falsely indicate mismatching parens.

- Brackets delimiting character alternatives are highlighted with
  `font-lock-regexp-grouping-construct' face.  Special parentheses and
  brackets that don't match are signaled with `font-lock-warning-face'.

- The regular expression at point is highlighted with
  `regexp-lock-regexp' face.  In addition, the backslashes used to
  escape subgroup delimiting parens are overlaid with the associated
  group number.  Group numbers are displayed with `regexp-lock-group'
  face.  These overlays are installed whenever `point' is immediately
  before or after a string or subgroup delimiter of the regexp at point.

The commands \\[regexp-lock-match-next] and \\[regexp-lock-match-prev]
highlight the next and previous expression, respectively, matching the
regexp at point in another window.  These commands use `eval' to
evaluate the regexp at point.  For the current match they highlight:

- The entire match `(match-string 0)' with `regexp-lock-match' face.

- Group numbers corresponding to subgroup matches are highlighted with
  `regexp-lock-match-group' face.  In addition, the strings specified by
  `regexp-lock-match-before-group' and `regexp-lock-match-after-group'
  are used to separate group numbers.

Matches before and after the current match are highlighted with
`regexp-lock-match-other' face.  If necessary, Regexp Lock splits the
selected window in order to display matches.  Initially, matches are
shown for the buffer containing the regexp at point.  Matches for any
other buffer can be shown by switching to that buffer in the window
displaying matches.

Finally, Regexp Lock provides a function `regexp-lock-increment' which
increments or decrements arguments of `match-beginning' or `match-end'
within the region.


Caveats:

- Regexp Lock uses a number of heuristics to detect regexps.  Hence you
  will occasionally see ordinary strings highlighted as regexps as well
  as regexps highlighted as ordinary strings.  In some cases customizing
  the variable `regexp-lock-regexp-string' might help.

- Regexp Lock analyzes regular expressions literally.  Hence if you
  write something like

  \(defvar foo \"\\\\(\") \(defvar bar (concat foo \"bar\\\\)\"))

  Regexp Lock is not able to indicate group numbers correctly and will
  additionally issue two warnings.

- Regexp Lock expects that a regexp produced by `regexp-opt' is
  contained in a grouping construct iff the second argument of
  `regexp-opt' is present and does not equal one of the character
  sequences `nil' or `()'.

- Regexp Lock does not recognize expressions constructed by `rx' or
  `sregex'.

- Regexp Lock consumes processor resources.  On battery-powered systems
  you should turn it off whenever you don't need it."
  :lighter regexp-lock-minor-mode-string
  :group 'regexp-lock
  :keymap '(("\C-c(" . regexp-lock-match-next)
            ("\C-c)" . regexp-lock-match-prev)
            ("\C-c#" . regexp-lock-increment))
  (if regexp-lock-mode
      (regexp-lock-activate)
    (regexp-lock-deactivate))
  (run-hooks 'regexp-lock-hook))

(defun turn-on-regexp-lock-mode ()
  "Unequivocally turn on `regexp-lock-mode'."
  (interactive)
  (regexp-lock-mode 1))
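
;; Illustration only: one way to enable the mode automatically, assuming
;; this library is on `load-path' and has been loaded (or autoloaded) by
;; your init file:
;;
;;   (add-hook 'emacs-lisp-mode-hook 'turn-on-regexp-lock-mode)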

;; _____________________________________________________________________________
;;
;;;                          Local definitions
;; _____________________________________________________________________________
;;
(defvar regexp-lock-redo t
  "When non-nil refontify this buffer.")

(defvar regexp-lock-redo-timer nil
  "Idle timer for `regexp-lock-redo'.")

(defvar regexp-lock-recheck t
  "When non-nil recheck this buffer.")

(defvar regexp-lock-recheck-timer nil
  "Idle timer for `regexp-lock-recheck'.")

(defvar regexp-lock-overlays nil
  "Overlays used by `regexp-lock-show'.")

(defvar regexp-lock-show-timer nil
  "Idle timer for `regexp-lock-show'.")

(defvar regexp-lock-match-regexp nil
  "`regexp-lock-match' searches for this regexp.")

(defvar regexp-lock-match-window nil
  "`regexp-lock-match' displays matches in this window.")

(defvar regexp-lock-match-buffer nil
  "`regexp-lock-match-window' displays this buffer.")

(defvar regexp-lock-match-overlays nil
  "Overlays that highlight matches in `regexp-lock-match-window'.")

(defvar regexp-lock-match-from (make-marker)
  "Marker for match begin in `regexp-lock-match-buffer'.")

(defvar regexp-lock-match-to (make-marker)
  "Marker for match end in `regexp-lock-match-buffer'.")

(eval-when-compile
  (defmacro save-regexp-lock (&rest body)
    "Eval BODY with match-data, excursion, restrictions saved, buffer widened."
    `(save-match-data
       (save-excursion
         (save-restriction
           (widen)
           (progn ,@body)))))
  (put 'save-regexp-lock 'lisp-indent-function 0)
  (def-edebug-spec save-regexp-lock let)
  (defmacro with-regexp-lock (&rest body)
    "Eval BODY, preserving current buffer's modified and undo states."
    (let ((modified (make-symbol "modified")))
      `(let ((,modified (buffer-modified-p))
             (buffer-undo-list t)
             (inhibit-read-only t)
             (inhibit-point-motion-hooks t)
             (inhibit-modification-hooks t)
             deactivate-mark
             buffer-file-name
             buffer-file-truename)
	 (unwind-protect
	     (progn ,@body)
	   (unless ,modified
	     (restore-buffer-modified-p nil))))))
  (put 'with-regexp-lock 'lisp-indent-function 0)
  (def-edebug-spec with-regexp-lock let))
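
;; Usage sketch for the two macros above (illustration only;
;; `my-regexp-lock-clear-redo' is a hypothetical name, not part of this
;; file).  Because the macros are defined inside `eval-when-compile',
;; such code has to live in this file or be compiled together with it.
;;
;;   (defun my-regexp-lock-clear-redo ()
;;     "Remove all `regexp-lock-redo' marks in the current buffer."
;;     (save-regexp-lock               ; widen, keep point and match data
;;       (with-regexp-lock             ; keep modified flag and undo list
;;         (remove-text-properties
;;          (point-min) (point-max) '(regexp-lock-redo nil)))))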

(defsubst regexp-lock-string-face-p (face)
  "Return non-nil when FACE is or contains `font-lock-string-face'.

FACE is the value of a `face' text property: a face symbol or a list of
faces."
  (or (and (listp face)
           (memq 'font-lock-string-face face))
      (eq face 'font-lock-string-face)))

(defsubst regexp-lock-syntactic-face-p (face)
  "Return non-nil when FACE indicates syntactic context.

More precisely, return non-nil when FACE, the value of a `face' text
property, is or contains one of `font-lock-string-face',
`font-lock-comment-face', or `font-lock-doc-face'."
  (or (and (listp face)
           (or (memq 'font-lock-string-face face)
               (memq 'font-lock-comment-face face)
               (memq 'font-lock-doc-face face)))
      (memq face '(font-lock-string-face
                   font-lock-comment-face
                   font-lock-doc-face))))

;; the following function is commented out in font-lock.el
(defun remove-text-property (start end property &optional object)
 "Remove a property from text from START to END.
Argument PROPERTY is the property to remove.
Optional argument OBJECT is the string or buffer containing the text.
Return t if the property was actually removed, nil otherwise."
 (remove-text-properties start end (list property) object))

;; the following function is commented out in font-lock.el
(defun remove-single-text-property (start end prop value &optional object)
 "Remove a specific property value from text from START to END.
Arguments PROP and VALUE specify the property and value to remove.  The
resulting property values are not equal to VALUE nor lists containing VALUE.
Optional argument OBJECT is the string or buffer containing the text."
 (let ((start (text-property-not-all start end prop nil object)) next prev)
   (while start
     (setq next (next-single-property-change start prop object end)
	    prev (get-text-property start prop object))
     (cond ((and (symbolp prev) (eq value prev))
	     (remove-text-property start next prop object))
	    ((and (listp prev) (memq value prev))
	     (let ((new (delq value prev)))
	       (cond ((null new)
		      (remove-text-property start next prop object))
		     ((= (length new) 1)
		      (put-text-property start next prop (car new) object))
		     (t
		      (put-text-property start next prop new object))))))
     (setq start (text-property-not-all next end prop nil object)))))

;; _____________________________________________________________________________
;;
;;;                        Activate / Deactivate
;; _____________________________________________________________________________
;;
(defun regexp-lock-activate ()
  "Activate Regexp Lock in current buffer."
  (if (not (memq major-mode
                 '(emacs-lisp-mode lisp-mode lisp-interaction-mode reb-mode)))
      (error "Regexp Lock can be used in Lisp modes only")
    ;; turn on font-lock if necessary and integrate ourselves
    (unless font-lock-mode (font-lock-mode 1))
    (set (make-local-variable 'font-lock-extra-managed-props)
         (append font-lock-extra-managed-props
                 (list 'syntax-table 'regexp-lock)))
    (font-lock-add-keywords nil '(regexp-lock-fontify . nil) t)
    (font-lock-unfontify-buffer)
    (save-restriction
      (widen)
      (with-regexp-lock
        (remove-text-properties (point-min) (point-max) '(fontified t))))
    ;; syntax properties
    (set (make-local-variable 'parse-sexp-lookup-properties) t)
    ;; hooks
    (add-hook 'after-change-functions 'regexp-lock-after-change nil t)
    (add-hook 'window-scroll-functions 'regexp-lock-window-redo t t)
    (add-hook 'window-size-change-functions 'regexp-lock-frame-redo)
    (add-hook 'change-major-mode-hook 'regexp-lock-deactivate nil t)
    ;; redo-timer
    (when regexp-lock-redo-timer
      (cancel-timer regexp-lock-redo-timer)
      (setq regexp-lock-redo-timer nil))
    (when regexp-lock-redo-delay
      (setq regexp-lock-redo-timer
            (run-with-idle-timer regexp-lock-redo-delay t 'regexp-lock-redo)))
    (set (make-local-variable 'regexp-lock-redo) nil)
    ;; recheck-timer
    (when regexp-lock-recheck-timer
      (cancel-timer regexp-lock-recheck-timer)
      (setq regexp-lock-recheck-timer nil))
    (when regexp-lock-recheck-delay
      (setq regexp-lock-recheck-timer
            (run-with-idle-timer
             regexp-lock-recheck-delay t 'regexp-lock-recheck)))
    (set (make-local-variable 'regexp-lock-recheck) nil)
    ;; show-timer
    (when regexp-lock-show-timer
      (cancel-timer regexp-lock-show-timer)
      (setq regexp-lock-show-timer nil))
    (when regexp-lock-show-delay
      (setq regexp-lock-show-timer
            (run-with-idle-timer regexp-lock-show-delay t 'regexp-lock-show)))))

(defun regexp-lock-deactivate ()
  "Deactivate Regexp Lock in current buffer."
  ;; syntax properties
  (setq parse-sexp-lookup-properties nil)
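  ;; local hooks (added buffer-locally in `regexp-lock-activate', hence
  ;; the LOCAL argument must be non-nil to remove them again)
  (remove-hook 'after-change-functions 'regexp-lock-after-change t)
  (remove-hook 'window-scroll-functions 'regexp-lock-window-redo t)
  (remove-hook 'change-major-mode-hook 'regexp-lock-deactivate t)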
  (remove-hook 'pre-command-hook 'regexp-lock-match-pre-command)
  ;; redo
  (with-regexp-lock
    (remove-text-properties (point-min) (point-max) '(regexp-lock-redo nil)))
  ;; font lock
  (font-lock-unfontify-buffer)
  (setq font-lock-extra-managed-props
        (delq 'syntax-table
              (delq 'regexp-lock
                    font-lock-extra-managed-props)))
  (font-lock-remove-keywords nil '(regexp-lock-fontify . nil))
  (save-restriction
    (widen)
    (with-regexp-lock
      (remove-text-properties (point-min) (point-max) '(fontified t))))
  (unless (catch 'found
            (dolist (buffer (buffer-list))
              (when (with-current-buffer buffer regexp-lock-mode)
                (throw 'found t))))
    ;; markers
    (set-marker regexp-lock-match-from nil)
    (set-marker regexp-lock-match-to nil)
    ;; global hook
    (remove-hook 'window-size-change-functions 'regexp-lock-frame-redo)
    ;; redo-timer
    (when regexp-lock-redo-timer
      (cancel-timer regexp-lock-redo-timer)
      (setq regexp-lock-redo-timer nil))
    ;; recheck-timer
    (when regexp-lock-recheck-timer
      (cancel-timer regexp-lock-recheck-timer)
      (setq regexp-lock-recheck-timer nil))
    ;; show-timer
    (when regexp-lock-show-timer
      (cancel-timer regexp-lock-show-timer)
      (setq regexp-lock-show-timer nil))))

;; _____________________________________________________________________________
;;
;;;                           Text Properties
;; _____________________________________________________________________________
;;
(defun regexp-lock-after-change (start end old-len)
  "Mark text after buffer change to trigger `regexp-lock-redo'."
  (when regexp-lock-mode
    (with-regexp-lock
      (save-excursion
        (goto-char start)
        (if (save-match-data
              (save-excursion
                (beginning-of-line)
                (re-search-forward
                 regexp-lock-regexp-string (max end (line-end-position)) t)))
            (put-text-property
             (line-beginning-position) (min (max end (1+ start)) (point-max))
             'regexp-lock-redo 2)
          (put-text-property
           (line-beginning-position) (min (max end (1+ start)) (point-max))
           'regexp-lock-redo t))
        (setq regexp-lock-redo t)))))

(defun regexp-lock-window-redo (window start)
  "Mark text after window scroll to trigger `regexp-lock-redo'."
  (with-current-buffer (window-buffer window)
    (when regexp-lock-mode
      (setq regexp-lock-redo t))))

(defun regexp-lock-frame-redo (frame)
  "Mark text after window size change to trigger `regexp-lock-redo'."
  ;; Use frame-first-window since selected-window may be on a different frame.
  (with-selected-window (frame-first-window frame)
    (dolist (window (window-list frame 'nominibuf))
      (with-current-buffer (window-buffer window)
        (when regexp-lock-mode
          (setq regexp-lock-redo t))))))

(defun regexp-lock-redo ()
  "Refontify with Regexp Lock.

Currently this operates on all windows of the selected frame."
  (catch 'input
    (let ((current-buffer (current-buffer))
          (current-point (point))
          (current-point-min (point-min))
          (current-point-max (point-max)))
      (dolist (window (window-list nil 'nominibuf))
        (with-current-buffer (window-buffer window)
          (when (and regexp-lock-mode regexp-lock-redo font-lock-mode)
            (let ((window-start (window-start window))
                  (window-end (window-end window))
                  (parse-sexp-ignore-comments t))
              (save-regexp-lock
               (let* ((bod (save-excursion
                             ;; bod is the last beginning-of-defun
                             ;; preceding start of window or point-min
                             (goto-char window-start)
                             (or (condition-case nil
                                     (progn
                                       (beginning-of-defun)
                                       (line-beginning-position))
                                   (error (point-min)))
                                 (point-min))))
                      (eod (save-excursion
                             ;; eod is the first end-of-defun following
                             ;; end of window or point-max
                             (goto-char window-end)
                             (or (condition-case nil
                                     (progn
                                       (beginning-of-defun -1)
                                       (max window-end
                                            (line-beginning-position)))
                                   (error (point-max)))
                                 (point-max))))
                      ;; from is the first redo position between bod
                      ;; and eod
                      (from (min (or (text-property-any
                                      bod eod 'regexp-lock-redo t)
                                     eod)
                                 (or (text-property-any
                                      bod eod 'fontified nil)
                                     eod)))
                      to)
                 (when (and from (< from eod))
                   (save-excursion
                     (goto-char from)
                     (setq from (line-beginning-position)))
                   ;; adjust from
                   (when (or (< from bod)
                             (and (> from bod)
                                  (not (get-text-property
                                        (1- from) 'fontified))))
                     ;; refontify from bod
                     (setq from bod))
                   ;; initialize to
                   (when (or (< from window-end)
                             (not (equal (get-text-property
                                          (1- from) 'regexp-lock)
                                         (get-text-property
                                          from 'regexp-lock))))
                     (setq to (min (save-excursion
                                     (goto-char
                                      (+ from regexp-lock-redo-size))
                                     (line-beginning-position 2))
                                   eod))
                     ;; fontify
                     (while (and (< from to)
                                 (or (not regexp-lock-pause)
                                     (save-excursion
                                       (with-current-buffer current-buffer
                                         (save-restriction
                                           (goto-char current-point)
                                           (narrow-to-region
                                            current-point-min
                                            current-point-max)
                                           (sit-for regexp-lock-pause))))
                                     (throw 'input t)))
                       (with-regexp-lock
                         ;; record the following two properties _now_
                         ;; since font-lock may fontify past to
                         (let ((fontified-at-to
                                (get-text-property to 'fontified))
                               (lock-at-to
                                (get-text-property to 'regexp-lock)))
                           (put-text-property from to 'fontified t)
                           (if jit-lock-mode
                               ;; as jit-lock-fontify-now
                               (condition-case err
                                   (run-hook-with-args
                                    'jit-lock-functions from to)
                                 (quit (put-text-property
                                        from to 'fontified nil)
                                       (funcall
                                        'signal (car err) (cdr err))))
                             ;; plain font-lock-fontify-region
                             (font-lock-fontify-region from to))
                           (remove-text-properties
                            from to '(regexp-lock-redo nil))
                           (setq from to)
                           (when (and (< to eod)
                                      (or (not fontified-at-to)
                                          (not (equal (get-text-property
                                                       (1- to) 'regexp-lock)
                                                      lock-at-to))))
                             (put-text-property
                              to (min (1+ to) (point-max))
                              'regexp-lock-redo t)
                             (setq to (min (save-excursion
                                             (goto-char
                                              (+ to regexp-lock-redo-size))
                                             (line-beginning-position 2))
                                           eod))))))))))
              ;; keep the following always _within_ the outermost
              ;; let to avoid that other idle timers get confused
              (timer-activate-when-idle regexp-lock-show-timer t)
              (setq regexp-lock-redo nil)
              (setq regexp-lock-recheck t))))
        (or (not regexp-lock-pause)
            (sit-for regexp-lock-pause)
            (throw 'input t))))))

(defsubst regexp-lock-set-redo (from to)
  "Set `regexp-lock-redo' from `regexp-lock-recheck'.

Search forward from `point' to TO for `regexp-lock-regexp-string'.  Set
the `regexp-lock-redo' text property at FROM as well as the buffer-local
value of `regexp-lock-redo' to t in either of two cases: a match is
found but the `regexp-lock' text property at FROM is not set, or no
match is found but the `regexp-lock' text property at FROM is set."
  (if (re-search-forward regexp-lock-regexp-string to 'to)
      ;; match for regexp-lock-regexp-string
      (unless (get-text-property from 'regexp-lock)
        ;; regexp-lock not set, redo
        (with-regexp-lock
          (put-text-property from (1+ from) 'regexp-lock-redo t))
        (setq regexp-lock-redo t))
    ;; no match for regexp-lock-regexp-string
    (when (get-text-property from 'regexp-lock)
      ;; regexp-lock set, redo
      (with-regexp-lock
        (put-text-property from (1+ from) 'regexp-lock-redo t))
      (setq regexp-lock-redo t))))

(defun regexp-lock-recheck ()
  "Recheck windows with Regexp Lock.

Currently this operates on all windows of the selected frame."
  (catch 'input
    (let ((current-buffer (current-buffer))
          (current-point (point))
          (current-point-min (point-min))
          (current-point-max (point-max)))
      (dolist (window (window-list nil 'nominibuf))
        (with-current-buffer (window-buffer window)
          (when (and regexp-lock-mode regexp-lock-recheck font-lock-mode)
            (let ((window-start (window-start window))
                  (window-end (window-end window))
                  (parse-sexp-ignore-comments t))
              (save-regexp-lock
               (let* ((from (save-excursion
                              ;; from is the last beginning-of-defun
                              ;; preceding start of window or point-min
                              (goto-char window-start)
                              (or (condition-case nil
                                      (progn
                                        (beginning-of-defun)
                                        (line-beginning-position))
                                    (error (point-min)))
                                  (point-min))))
                      to face)
                 ;; check iff from has been already fontified
                 (when (get-text-property from 'fontified)
                   (goto-char from)
                   (while (re-search-forward "\\(\"\\)\
\\|(\\(\\(?:map\\)?concat\\)\\>\
\\|(\\(re-search-\\(?:for\\|back\\)ward\\|looking-\\(?:at\\|back\\)\\|string-match\\|replace-regexp-in-string\
\\|message\\|error\\|skip-\\(?:syntax\\|chars\\)-\\(?:for\\|back\\)ward\\|search-\\(?:for\\|back\\)ward\\)\\>"
                                             window-end 'window-end)
                     (setq face (get-text-property
                                 (or (match-end 1) (match-beginning 0))
                                 'face))
                     (cond
                      ((match-beginning 1)
                       ;; double-quote
                       (cond
                        ((and (regexp-lock-string-face-p face)
                              (save-excursion
                                (condition-case nil
                                    (progn
                                      (setq from (match-beginning 1))
                                      (goto-char from)
                                      (forward-sexp)
                                      (setq to (point)))
                                  (error nil))))
                         (regexp-lock-set-redo from to)
                         (goto-char (min to window-end)))
                        ((and (or (and (listp face)
                                       (memq 'font-lock-doc-face face))
                                  (eq 'font-lock-doc-face face))
                              (save-excursion
                                (condition-case nil
                                    (progn
                                      (goto-char (match-beginning 1))
                                      (forward-sexp)
                                      (setq to (point)))
                                  (error nil))))
                         ;; doc-string, skip
                         (goto-char (min to window-end)))))
                      ((match-beginning 2)
                       ;; concat, mapconcat
                       (when (and (not (regexp-lock-syntactic-face-p face))
                                  (save-excursion
                                    (condition-case nil
                                        (progn
                                          (setq from (match-beginning 0))
                                          (goto-char from)
                                          (forward-sexp)
                                          (setq to (point)))
                                      (error nil)))
                                  (goto-char from))
                         (regexp-lock-set-redo from to)
                         (goto-char (min to window-end))))
                      ((match-beginning 3)
                       ;; re-search- / looking- / string-match /
                       ;; replace-regexp-in-string /
                       ;; message / error / search- / skip-syntax- /
                       ;; skip-chars-, skip
                       (if (and (not (regexp-lock-syntactic-face-p face))
                                (save-excursion
                                  (condition-case nil
                                      (progn
                                        (goto-char (match-beginning 0))
                                        (forward-sexp)
                                        (setq to (point)))
                                    (error nil))))
                           (goto-char (min to window-end))
                         (goto-char (min (point) window-end)))))))
                 (setq regexp-lock-recheck nil)
                 (when regexp-lock-redo
                   ;; activate regexp-lock-redo-timer
                   (timer-activate-when-idle
                    regexp-lock-redo-timer t)))))))))
    (or (not regexp-lock-pause)
        (sit-for regexp-lock-pause)
        (throw 'input t))))

(defun regexp-lock-fontify (bound)
  "Fontify region from `point' to BOUND."
  (let ((lock (unless (= (point) (point-min))
                (get-text-property (1- (point)) 'regexp-lock)))
        ;; `lock' - the `regexp-lock' text property - is interpreted as:
        ;; nil - no regexp around point (nil is not stored as text property)
        ;; 0 - the following sexp is a regexp
        ;; 1 - within a regexp-string that is not argument of a `concat'
        ;; >= 2 - within a `concat' that has at least one regexp argument
        ;; within a character alternative values are negative
        (from (point))
        (parse-sexp-ignore-comments t)
        to face)
    (while (< (point) bound)
      (catch 'lock
        (if lock
            (while (re-search-forward
                    "\\(^\\s(\\)\\|\\(\"\\)\\|\\(?:\\\\\\\\\\)\\(?:\\(?:\\\\\\\\\\)\\|\\([()]\\)\\|\\(|\\)\\|\\(\\[\\)\\|\\(\\]\\)\\)\
\\|\\(\\\\[][()]\\)\\|\\(\\[:[a-zA-Z]+:\\]\\)\\|\\(\\[\\)\\|\\(\\]\\)\\|\\(;\\)\\|\\((\\)\\|\\()\\)\\|`\\(\\sw\\sw+\\)'" bound 'bound)
              (setq face (get-text-property (1- (point)) 'face))
              (cond
               ((match-beginning 1)
                ;; paren in column zero, throw
                (put-text-property from (match-beginning 1) 'regexp-lock lock)
                (setq lock nil)
                (throw 'lock nil))
               ((match-beginning 2)
                ;; double-quote, ignore for lock not in {-1,0,1}
                (cond
                 ((zerop lock)
                  ;; start new regexp-string
                  (put-text-property from (match-beginning 2) 'regexp-lock 0)
                  (setq from (match-beginning 2))
                  (goto-char (1+ from))
                  (setq lock 1))
                 ((and (or (= lock 1) (= lock -1))
                       ;; the following skips adjacent double-quotes as in
                       ;; "string1""string2" which should not do much harm
                       (regexp-lock-string-face-p face)
                       (or (= (point) bound) ; fails with escaped `"' at eob
                           (not (regexp-lock-string-face-p
                                 (get-text-property (point) 'face)))))
                  ;; terminate current regexp-string
                  (put-text-property from (point) 'regexp-lock lock)
                  (when (= lock -1)
                    ;; unclosed character alternative, warn
                    (put-text-property
                     (1- (point)) (point) 'face 'font-lock-warning-face))
                  (setq lock nil)
                  (throw 'lock nil))))
               ((and (match-beginning 12)
                     (not (regexp-lock-syntactic-face-p face)))
                ;; non-syntactic left paren, expects lock not in {-1,1}
                (put-text-property from (match-beginning 12) 'regexp-lock lock)
                (setq from (match-beginning 12))
                (cond
                 ((>= lock 2) (setq lock (1+ lock)))
                 ((<= lock -2) (setq lock (1- lock)))
                 ((zerop lock) (setq lock 2))
                 (t (setq lock nil)     ; loses
                    (throw 'lock nil))))
               ((and (match-beginning 13)
                     (not (regexp-lock-syntactic-face-p face)))
                ;; non-syntactic right paren, expects lock not in {-1,1}
                (put-text-property from (match-end 13) 'regexp-lock lock)
                (setq from (match-end 13))
                (cond
                 ((> lock 2) (setq lock (1- lock)))
                 ((< lock -2) (setq lock (1+ lock)))
                 (t (when (= lock -2)
                      ;; unclosed character alternative, warn
                      (put-text-property
                       (1- (point)) (point) 'face 'font-lock-warning-face))
                    (setq lock nil)     ; end of sexp or loser
                    (throw 'lock nil))))
               ((regexp-lock-string-face-p face)
                ;; matches below are valid within strings only
                (cond
                 ((match-beginning 3)   ; \\( or \\)
                  (when (< lock 0)
                    ;; within character alternative, set symbol syntax
                    (put-text-property (1- (point)) (point) 'syntax-table '(3))
                    ;; remove faces that are silly here
                    (remove-single-text-property
                     (match-beginning 0) (1- (match-end 0))
                     'face 'font-lock-regexp-backslash)
                    (remove-single-text-property
                     (1- (match-end 0)) (match-end 0)
                     'face 'font-lock-regexp-grouping-construct)))
                 ((match-beginning 4)   ; \\|
                  (when (< lock 0)
                    ;; within character alternative remove regexp-lock faces
                    (remove-single-text-property
                     (match-beginning 0) (1- (match-end 0))
                     'face 'font-lock-regexp-backslash)
                    (remove-single-text-property
                     (1- (match-end 0)) (match-end 0)
                     'face 'font-lock-regexp-grouping-construct)))
                 ((match-beginning 5)   ; \\[
                  (let ((face (get-text-property (point) 'face)))
                    (when (and (listp face)
                               (memq 'font-lock-constant-face face))
                      ;; remove font-lock-constant-face
                      (remove-single-text-property
                       (point) (next-single-property-change
                                (point) 'face nil (line-end-position))
                       'face 'font-lock-constant-face)))
                  (if (< lock 0)
                      ;; within character alternative, reread bracket
                      (goto-char (1- (point)))
                    ;; not within character alternative, set symbol syntax
                    (put-text-property
                     (1- (point)) (point) 'syntax-table '(3))))
                 ((match-beginning 6)   ; \\]
                  (if (< lock 0)
                      ;; within character alternative, reread bracket
                      (goto-char (1- (point)))
                    ;; not within character alternative, set symbol syntax
                    (put-text-property
                     (1- (point)) (point) 'syntax-table '(3))))
                 ((match-beginning 7)   ; escaped parenthesis or bracket
                  ;; set symbol syntax for backslash and reread paren
                  (put-text-property
                   (match-beginning 0) (1+ (match-beginning 0))
                   'syntax-table '(3))
                  (goto-char (1+ (match-beginning 0))))
                 ((match-beginning 8))  ; POSIX character class, skip
                 ((match-beginning 9)   ; [
                  (let ((face (get-text-property (point) 'face)))
                    (when (and (listp face)
                               (memq 'font-lock-constant-face face))
                      ;; remove font-lock-constant-face
                      (remove-single-text-property
                       (point) (next-single-property-change
                                (point) 'face nil (line-end-position))
                       'face 'font-lock-constant-face)))
                  (if (< lock 0)
                      ;; within character alternative, set symbol syntax
                      (put-text-property
                       (1- (point)) (point) 'syntax-table '(3))
                    ;; start new character alternative
                    (put-text-property from (1- (point)) 'regexp-lock lock)
                    (setq from (1- (point)))
                    (setq lock (- lock))
                    (font-lock-prepend-text-property
                     (match-beginning 9) (match-end 9)
                     'face 'font-lock-regexp-grouping-construct)
                    (when (looking-at "\\(?:\\\\?\\^\\)?\\\\?\\(\\]\\)")
                      ;; non-special right bracket, set symbol syntax
                      (put-text-property
                       (match-beginning 1) (match-end 1) 'syntax-table '(3))
                      (goto-char (match-end 1)))))
                 ((match-beginning 10)  ; ]
                  (if (> lock 0)
                      ;; not within character alternative, warn
                      (font-lock-prepend-text-property
                       (match-beginning 10) (match-end 10)
                       'face 'font-lock-warning-face)
                    ;; terminate alternative
                    (font-lock-prepend-text-property
                     (match-beginning 10) (match-end 10)
                     'face 'font-lock-regexp-grouping-construct)
                    (put-text-property from (point) 'regexp-lock lock)
                    (setq from (point))
                    (setq lock (- lock))))
                 ((or (match-beginning 11)
                      (match-beginning 12)
                      (match-beginning 13)) ; (;), set symbol syntax
                  (put-text-property (1- (point)) (point) 'syntax-table '(3)))
                 ((match-beginning 14)  ; `..', remove constant face property
                  (remove-single-text-property
                   (match-beginning 0) (match-end 0)
                   'face 'font-lock-constant-face))))))
          ;; no lock
          (while (re-search-forward "\\(\"\\)\
\\|(\\(re-search-\\(?:for\\|back\\)ward\\|looking-\\(?:at\\|back\\)\\|string-match\\|replace-regexp-in-string\\)\\>\
\\|(\\(\\(?:map\\)?concat\\)\\>\
\\|(\\(message\\|error\\|skip-\\(?:syntax\\|chars\\)-\\(?:for\\|back\\)ward\\|search-\\(?:for\\|back\\)ward\\)\\>"
                                    bound 'bound)
            (setq face (get-text-property
                        (or (match-end 1) (match-beginning 0)) 'face))
            (cond
             ((match-beginning 1)
              ;; double-quote, search for `regexp-lock-regexp-string'
              (cond
               ((and (regexp-lock-string-face-p face)
                     (save-excursion
                       (condition-case nil
                           (progn
                             (setq from (match-beginning 1))
                             (goto-char from)
                             (forward-sexp)
                             (setq to (point)))
                         (error nil))))
                (if (re-search-forward regexp-lock-regexp-string to t)
                    ;; plain string matching `regexp-lock-regexp-string'
                    (progn
                      (setq lock 1)
                      (goto-char (1+ from))
                      (throw 'lock nil))
                  ;; plain string that does not match, skip
                  (goto-char (min to bound))))
               ((and (or (and (listp face) (memq 'font-lock-doc-face face))
                         (eq 'font-lock-doc-face face))
                     (save-excursion
                       (condition-case nil
                           (progn
                             (goto-char (match-beginning 1))
                             (forward-sexp)
                             (setq to (point)))
                         (error nil))))
                ;; doc-string, skip
                (goto-char (min to bound)))))
             ((match-beginning 2)
              ;; re-search- / looking- / string-match / replace-regexp-in-string
              (unless (regexp-lock-syntactic-face-p face)
                (setq from (match-end 2))
                (setq lock 0)
                (throw 'lock nil)))
             ((match-beginning 3)
              ;; concat / mapconcat, search arguments for
              ;; `regexp-lock-regexp-string'
              (if (and (not (regexp-lock-syntactic-face-p face))
                       (save-excursion
                         (condition-case nil
                             (progn
                               (setq from (match-beginning 0))
                               (goto-char from)
                               (forward-sexp)
                               (setq to (point)))
                           (error nil)))
                       (goto-char from)
                       (re-search-forward
                        (concat regexp-lock-regexp-string
                                "\\|regexp-opt") to 'to))
                  (progn
                    (setq lock 2)
                    (goto-char (1+ from))
                    (throw 'lock nil))
                (goto-char (min (point) bound))))
             ((match-beginning 4)
              ;; message / error / search- / skip-syntax- / skip-chars-, skip
              (if (and (not (regexp-lock-syntactic-face-p face))
                       (save-excursion
                         (condition-case nil
                             (progn
                               (goto-char (match-beginning 0))
                               (forward-sexp)
                               (setq to (point)))
                           (error nil))))
                  (goto-char (min to bound))
                (goto-char (min (point) bound)))))))))
    (when lock (put-text-property from bound 'regexp-lock lock))))
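
;; The helper below is an illustrative sketch and not part of the
;; original regexp-lock.el.  It only restates, as code, how
;; `regexp-lock-fontify' above interprets the value of the `regexp-lock'
;; text property.  The name `regexp-lock-example-describe-lock' and the
;; returned symbols are hypothetical.
(defun regexp-lock-example-describe-lock (lock)
  "Return a symbol describing LOCK, a `regexp-lock' property value.
Hypothetical helper, for illustration only."
  (cond
   ((null lock) 'no-regexp)                    ; no regexp around point
   ((< lock 0) 'within-character-alternative)  ; negative values
   ((zerop lock) 'regexp-sexp-follows)         ; 0
   ((= lock 1) 'within-regexp-string)          ; 1
   (t 'within-concat)))                        ; >= 2
;; For example, (regexp-lock-example-describe-lock -2) returns
;; `within-character-alternative'.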

;; _____________________________________________________________________________
;;
;;;                              Overlays
;; _____________________________________________________________________________
;;
(defun regexp-lock-show ()
  "Display numbers of regular expression groups.

Groups considered are subexpressions enclosed by escaped parentheses
`\\(' and `\\)'.  Shy groups are not counted.  Group numbers overlay one
or both backslashes of any `\\(' and `\\)' of the same regexp with the
number of the group.  Overlays are highlighted whenever `point' is
before the left or after the right parenthesis of an `\\(' or `\\)'.
Hence the group enclosed by `\\1(...\\1)', for example, represents the
subexpression matching `(match-string 1)'.  Overlays are also shown when
`point' is before a double-quote beginning, or after a double-quote
terminating a string that is part of the regular expression.

Group numbers are displayed whenever Emacs becomes idle after a delay of
`regexp-lock-show-delay' seconds.  Group numbers are highlighted with
`regexp-lock-group' face."
  (when regexp-lock-overlays
    (dolist (overlay regexp-lock-overlays)
      (delete-overlay overlay))
    (setq regexp-lock-overlays nil))
  (when (and regexp-lock-mode
             (not (eq (selected-window) regexp-lock-match-window))
             (or (and (< 2 (point))     ; \\^(
                      (< (point) (point-max))
                      (char-equal (char-after) ?\( )
                      (get-text-property (1- (point)) 'regexp-lock)
                      (> (get-text-property (1- (point)) 'regexp-lock) 0)
                      (char-equal (char-before) ?\\ )
                      (char-equal (char-before (1- (point))) ?\\ ))
                 (and (< 3 (point))     ; \\)^
                      (char-equal (char-before) ?\) )
                      (get-text-property (1- (point)) 'regexp-lock)
                      (> (get-text-property (1- (point)) 'regexp-lock) 0)
                      (char-equal (char-before (1- (point))) ?\\ )
                      (char-equal (char-before (- (point) 2)) ?\\ ))
                 (and (< (point) (point-max)) ; ^"
                      (char-equal (char-after) ?\" )
                      (get-text-property (point) 'regexp-lock)
                      (regexp-lock-string-face-p
                       (get-text-property (point) 'face))
                      (or (= (point) (point-min))
                          (not (regexp-lock-string-face-p
                                (get-text-property (1- (point)) 'face)))))
                 (and (< 3 (point))     ; "^
                      (char-equal (char-before) ?\" )
                      (get-text-property (1- (point)) 'regexp-lock)
                      (regexp-lock-string-face-p
                       (get-text-property (1- (point)) 'face))
                      (or (= (point) (point-max))
                          (not (regexp-lock-string-face-p
                                (get-text-property (point) 'face)))))))
    (save-match-data
      (save-excursion
        (let* ((at (point)) (groups nil) (number 0) (total 0)
               (from at) (to at)
               (parse-sexp-ignore-comments t))
          ;; search beginning and end, tedious
          (while (and (> from (point-min))
                      (get-text-property (1- from) 'regexp-lock)
                      (not (zerop (get-text-property (1- from) 'regexp-lock)))
                      (setq from (previous-single-property-change
                                  (point) 'regexp-lock nil (point-min)))
                      (goto-char from)))
          (goto-char at)
          (while (and (< to (point-max))
                      (get-text-property to 'regexp-lock)
                      (setq to (next-single-property-change
                                (point) 'regexp-lock nil (point-max)))
                      (goto-char to)))
          ;; make overlay for group zero
          (let ((overlay (make-overlay from to)))
            (overlay-put overlay 'face 'regexp-lock-regexp)
            (overlay-put overlay 'window (selected-window))
            (overlay-put overlay 'cursor t)
            (overlay-put overlay 'priority regexp-lock-show-priority)
            (setq regexp-lock-overlays (cons overlay regexp-lock-overlays)))
          ;; using a fixed-size vector here would avoid consing but
          ;; introduce an upper limit on the number of groupings
          (goto-char from)
          (while (re-search-forward "\\(?:\\\\\\\\\\)\\(?:\\(?:\\\\\\\\\\)\\|\\((\\(\\?:\\)?\\)\\|\\()\\)\\)\\|\\(regexp-opt\\)" to t)
            (cond
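             ((and (match-beginning 4)  ; (regexp-opt ...)
                   ;; test the face at the match, as at the other call
                   ;; sites, rather than the buffer position itself
                   (not (regexp-lock-syntactic-face-p
                         (get-text-property (match-beginning 4) 'face))))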
              (save-match-data
                (let (at-too)           ; Re-search from here.
                  (when (save-excursion
                          (goto-char (match-end 4))
                          (condition-case nil
                              (progn
                                (forward-sexp)
                                (forward-comment (buffer-size))
                                (setq at-too (point))
                                ;; Anything but `nil' and `()' counts as non-nil.
                                (when (looking-at "\\(?:nil\\|()\\)")
                                  (goto-char (match-end 0))
                                  (forward-comment (buffer-size)))
                                (and (looking-at "[^)]")))
                            (error nil)))
                    (setq total (1+ total)))
                  (when at-too (goto-char at-too)))))
             ((or (not (regexp-lock-string-face-p
                        (get-text-property (1- (point)) 'face)))
                  (< (get-text-property (1- (point)) 'regexp-lock) 0)))
             ((match-beginning 2)       ; \\(?:
              (setq groups (cons 0 groups)))
             ((match-beginning 1)       ; \\(
              (setq number (1+ total))
              (setq total (1+ total))
              (let* ((number-string (number-to-string number))
                     (length (min (length number-string) 2))
                     (overlay (make-overlay
                               (- (match-beginning 1) length)
                               (match-beginning 1))))
                (overlay-put overlay 'display
                             (propertize number-string 'face 'regexp-lock-group))
                (overlay-put overlay 'window (selected-window))
                (overlay-put overlay 'cursor t)
                (overlay-put overlay 'priority regexp-lock-show-priority)
                (setq regexp-lock-overlays (cons overlay regexp-lock-overlays)))
              (setq groups (cons number groups)))
             ((match-beginning 3)       ; \\)
              (cond
               (groups
                (setq number (car groups))
                (unless (zerop number)
                  (let* ((number-string (number-to-string number))
                         (length (min (length number-string) 2))
                         (overlay (make-overlay
                                   (- (match-beginning 3) length)
                                   (match-beginning 3))))
                    (overlay-put overlay 'display
                                 (propertize
                                  number-string 'face 'regexp-lock-group))
                    (overlay-put overlay 'window (selected-window))
                    (overlay-put overlay 'cursor t)
                    (overlay-put overlay 'priority regexp-lock-show-priority)
                    (setq regexp-lock-overlays
                          (cons overlay regexp-lock-overlays))))
                (setq groups (cdr groups)))
               (t                       ; no open group, warn
                (let ((overlay (make-overlay (1- (match-end 3)) (match-end 3))))
                  (overlay-put overlay 'face font-lock-warning-face)
                  (overlay-put overlay 'window (selected-window))
                  (overlay-put overlay 'priority regexp-lock-show-priority)
                  (setq regexp-lock-overlays
                        (cons overlay regexp-lock-overlays))))))))
          (when groups
            ;; unclosed group, warn
            (let ((overlay (make-overlay (1- to) to)))
              (overlay-put overlay 'face font-lock-warning-face)
              (overlay-put overlay 'window (selected-window))
              (overlay-put overlay 'priority regexp-lock-show-priority)
              (setq regexp-lock-overlays
                    (cons overlay regexp-lock-overlays)))))))))
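
;; The helper below is an illustrative sketch and not part of the
;; original regexp-lock.el.  It shows the numbering convention that
;; `regexp-lock-show' displays: only non-shy groups get a number, and
;; that number is the argument to pass to `match-string'.  It relies on
;; the standard function `regexp-opt-depth' to count non-shy groups; the
;; name `regexp-lock-example-groups' is hypothetical.
(defun regexp-lock-example-groups (regexp string)
  "Return an alist (N . TEXT) of the groups of REGEXP matched in STRING.
Return nil when REGEXP does not match.  Hypothetical helper, for
illustration only."
  (save-match-data
    (when (string-match regexp string)
      (let ((n (regexp-opt-depth regexp))
            result)
        ;; walk the group numbers downwards so the result is ascending
        (while (> n 0)
          (push (cons n (match-string n string)) result)
          (setq n (1- n)))
        result))))
;; The shy group in the middle is not counted:
;; (regexp-lock-example-groups "\\(a+\\)\\(?:-\\)\\(b+\\)" "aa-bbb")
;; => ((1 . "aa") (2 . "bbb"))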

;; _____________________________________________________________________________
;;
;;;                                  Matching
;; _____________________________________________________________________________
;;
(defun regexp-lock-match-pre-command ()
  "Remove match overlays."
  (when regexp-lock-match-overlays
    (dolist (overlay regexp-lock-match-overlays)
      (delete-overlay overlay))
    (setq regexp-lock-match-overlays nil))
  ;; remove ourselves from pre-command-hook
  (remove-hook 'pre-command-hook 'regexp-lock-match-pre-command))

(defun regexp-lock-match (direction)
  "Highlight expressions matching current regexp.
A negative DIRECTION means search backward, otherwise search forward."
  (interactive "p")
  (unless (and regexp-lock-match-regexp
               (memq last-command
                     '(regexp-lock-match-next regexp-lock-match-prev)))
    (if (or (and (< (point) (point-max))
                 (get-text-property (point) 'regexp-lock))
            (and (> (point) (point-min))
                 (get-text-property (1- (point)) 'regexp-lock)))
        (save-match-data
          (save-excursion
            (let* ((at (point)) (from at) (to at)
                   (parse-sexp-ignore-comments t))
              ;; search beginning and end, tedious
              (while (and (> from (point-min))
                          (get-text-property (1- from) 'regexp-lock)
                          (not (zerop (get-text-property
                                       (1- from) 'regexp-lock)))
                          (setq from (previous-single-property-change
                                      (point) 'regexp-lock nil (point-min)))
                          (goto-char from)))
              (goto-char at)
              (while (and (< to (point-max))
                          (get-text-property to 'regexp-lock)
                          (setq to (next-single-property-change
                                    (point) 'regexp-lock nil (point-max)))
                          (goto-char to)))

              (save-restriction
                (narrow-to-region from to)
                (goto-char (point-min))
                (setq regexp-lock-match-regexp
                      (condition-case var
                          (eval (read (current-buffer)))
                        ;; display signal information
                        (error (message "%s" var) nil)))))))
      (message "No regexp around point")))
  (when regexp-lock-match-regexp
    (if (and regexp-lock-match-window
             (window-live-p regexp-lock-match-window)
             (not (eq regexp-lock-match-window (selected-window))))
        ;; remember buffer
        (setq regexp-lock-match-buffer (window-buffer regexp-lock-match-window))
      ;; unless regexp-lock-match-window is a live window different from
      ;; the selected one, split the selected window and make the newly
      ;; created one the new regexp-lock-match-window
      (setq regexp-lock-match-window (split-window))
      (if (and (not (eq (window-buffer regexp-lock-match-window)
                        regexp-lock-match-buffer))
               (buffer-live-p regexp-lock-match-buffer))
          (progn
            ;; when regexp-lock-match-buffer is a live buffer assert
            ;; that it is displayed in regexp-lock-match-window (make
            ;; sure we're not affected by Stefan's `set-window-buffer'
            ;; fix).
            (set-window-buffer
             regexp-lock-match-window regexp-lock-match-buffer)
            (when (eq regexp-lock-match-window (selected-window))
              (set-buffer regexp-lock-match-buffer)))
        ;; remember buffer
        (setq regexp-lock-match-buffer
              (window-buffer regexp-lock-match-window))))
    (save-match-data
      (save-excursion
        (with-selected-window regexp-lock-match-window
          ;; handle direction changes in an intuitive way
          (cond
           ((and (eq last-command 'regexp-lock-match-next)
                 (< direction 0)
                 (eq (marker-buffer regexp-lock-match-from)
                     regexp-lock-match-buffer))
            ;; use from marker
            (goto-char regexp-lock-match-from))
           ((and (eq last-command 'regexp-lock-match-prev)
                 (> direction 0)
                 (eq (marker-buffer regexp-lock-match-to)
                     regexp-lock-match-buffer))
            ;; use to marker
            (goto-char regexp-lock-match-to)))
          (let ((at (point))
                bound first)
            (catch 'empty
              (while (if (< direction 0)
                         (re-search-backward regexp-lock-match-regexp bound t)
                       (re-search-forward regexp-lock-match-regexp bound t))
                (if (= (match-beginning 0) (match-end 0))
                    (progn
                      (message "Empty match ...")
                      (sit-for 1)
                      (throw 'empty nil))
                  (let ((overlay (make-overlay
                                  (match-beginning 0) (match-end 0)))
                        (matches (cddr (match-data)))
                        (index 1))
                    (setq regexp-lock-match-overlays
                          (cons overlay regexp-lock-match-overlays))
                    (overlay-put overlay 'face
                                 (if first
                                     'regexp-lock-match-other
                                   'regexp-lock-match))
                    (overlay-put overlay 'window regexp-lock-match-window)
                    (unless first
                      (setq first (point))
                      (set-marker regexp-lock-match-from (match-beginning 0))
                      (set-marker regexp-lock-match-to (match-end 0))
                      (setq bound
                            (save-excursion
                              (vertical-motion
                               (if (< direction 0)
                                   (- (window-height))
                                 (window-height)))
                              (point)))
                      ;; set pre-command-hook to remove match overlays eventually
                      (add-hook 'pre-command-hook 'regexp-lock-match-pre-command)
                      (while matches
                        (cond
                         ((eq (car matches) nil)
                          (setq index (1+ index))
                          (setq matches (cddr matches)))
                         ((integer-or-marker-p (car matches))
                          (setq overlay
                                (make-overlay (car matches) (cadr matches)))
                          (overlay-put
                           overlay 'before-string
                           (propertize (concat regexp-lock-match-before-group
                                               (number-to-string index))
                                       'face 'regexp-lock-match-group))
                          (overlay-put
                           overlay 'after-string
                           (propertize (concat (number-to-string index)
                                               regexp-lock-match-after-group)
                                       'face 'regexp-lock-match-group))
                          (overlay-put overlay 'priority index)
                          (overlay-put overlay 'window regexp-lock-match-window)
                          ;; record the overlay once for later removal
                          (setq regexp-lock-match-overlays
                                (cons overlay regexp-lock-match-overlays))
                          (setq index (1+ index))
                          (setq matches (cddr matches)))
                         (t (setq matches nil))))))))
              (let ((to (or (and first regexp-lock-match-from) at)))
                (save-excursion
                  (goto-char to)
                  (vertical-motion (- (window-height)))
                  (while (re-search-forward regexp-lock-match-regexp to t)
                    (cond
                     ((= (match-beginning 0) (match-end 0))
                      (message "Empty match ...")
                      (sit-for 1)
                      (throw 'empty nil))
                     (t
                      (let ((overlay (make-overlay
                                      (match-beginning 0) (match-end 0))))
                        (setq regexp-lock-match-overlays
                              (cons overlay regexp-lock-match-overlays))
                        (overlay-put overlay 'face 'regexp-lock-match-other)
                        (overlay-put
                         overlay 'window regexp-lock-match-window)))))
                  (goto-char (or (and first regexp-lock-match-to) to))
                  (setq to (save-excursion
                             (vertical-motion (window-height))
                             (point)))
                  (while (re-search-forward regexp-lock-match-regexp to t)
                    (cond
                     ((= (match-beginning 0) (match-end 0))
                      (message "Empty match ...")
                      (sit-for 1)
                      (throw 'empty nil))
                     (t
                      (let ((overlay (make-overlay
                                      (match-beginning 0) (match-end 0))))
                        (setq regexp-lock-match-overlays
                              (cons overlay regexp-lock-match-overlays))
                        (overlay-put overlay 'face 'regexp-lock-match-other)
                        (overlay-put
                         overlay 'window regexp-lock-match-window))))))))
            (if first
                (progn
                  (goto-char first)
                  (unless (pos-visible-in-window-p)
                    (if (< direction 0)
                        (recenter -3)
                      (recenter 3))))
              (goto-char at)
              (set-marker regexp-lock-match-from nil)
              (set-marker regexp-lock-match-to nil)
              (message "No (more) matches ...")
              (sit-for 1))))))))

(defun regexp-lock-match-next ()
  "Move to next matching expression."
  (interactive)
  (if (memq last-command '(regexp-lock-match-next regexp-lock-match-prev))
      (regexp-lock-match 1)
    (regexp-lock-match 0)))

(defun regexp-lock-match-prev ()
  "Move to previous matching expression."
  (interactive)
  (regexp-lock-match -1))

;; _____________________________________________________________________________
;;
;;;                 Increment / Decrement group numbers
;; _____________________________________________________________________________
;;
(defun regexp-lock-increment (above increment start end)
  "In-/Decrement group numbers within region.

Within the region, add INCREMENT to every group number greater than or
equal to ABOVE that is passed to `match-beginning', `match-end',
`match-string', `match-string-no-properties', or `replace-match'."
  (interactive "nIn-/Decrement group numbers >=: \nnBy: \nr")
  (save-excursion
    (goto-char start)
    (let ((count 0))
      (while (re-search-forward
	      ;; Added `replace-match' on 2009-08-04.
              "\\((match-\\(?:beginning\\|end\\|string\\(?:-no-properties\\)?\\)[ \t\n\f]+\\([0-9]+\\))\\)\
\\|\\((replace-match\\)"
              end t)
	(cond
	 ((match-beginning 1)
	  (let ((number (string-to-number (match-string 2))))
	    (when (>= number above)
	      (replace-match
	       (number-to-string (+ number increment)) nil nil nil 2)
	      (setq count (1+ count)))))
	 ((match-beginning 3)
	  ;; `replace-match' is hairy because the SUBEXP arg is optional.
	  (condition-case nil
	      (progn
		(forward-sexp 4)
		(forward-comment (buffer-size))
		(when (looking-at "[0-9]+")
		  (let ((number (string-to-number (match-string 0))))
		    (when (>= number above)
		      (replace-match
		       (number-to-string (+ number increment)))
		      (setq count (1+ count))))))
	    (error nil)))))
      (if (zerop count)
          (message "No substitutions performed")
        (message "%s substitution(s) performed" count)))))

(provide 'regexp-lock)

;;; regexp-lock.el ends here


* bug#25706: 26.0.50; Slow C file fontification
  2020-12-01  9:44           ` martin rudalics
@ 2020-12-01 10:07             ` Alan Mackenzie
  0 siblings, 0 replies; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-01 10:07 UTC (permalink / raw)
  To: martin rudalics; +Cc: Mattias Engdegård, Lars Ingebrigtsen, 25706

Hello, Martin.

On Tue, Dec 01, 2020 at 10:44:31 +0100, martin rudalics wrote:
>  > There are 342 occurrences of '\\\\([^?]' in CC Mode.  Most of these can
>  > surely be replaced by "\\(?:", but not all, by a long way.  This change
>  > will be fun.

> Years ago I wrote the attached that might help you in this regard (load
> it and do 'turn-on-regexp-lock-mode').  If you move point before the "("
> of a "\\(" it should give you the appropriate nesting.

Thanks!  I'll have a look at it.

> martin

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
  2020-12-01  9:21       ` Alan Mackenzie
@ 2020-12-01 12:03         ` Mattias Engdegård
  2020-12-01 12:57           ` Alan Mackenzie
  0 siblings, 1 reply; 45+ messages in thread
From: Mattias Engdegård @ 2020-12-01 12:03 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Lars Ingebrigtsen, 25706

1 dec. 2020 kl. 10.21 skrev Alan Mackenzie <acm@muc.de>:

> (i) Take the first 10% of the original 4MB file, and save it in a
>  different file.
> (ii) Fontify that file from top to bottom: according to ELP, 292s
> (iii) Insert 9 new lines "{}" every 10% of that new file.
> (iv) Fontify the amended file top to bottom: new time 98s.
> 
> That's a difference of a factor of 3.

Thank you, quite remarkable and a very useful piece of information!
Please let me curb some unwarranted optimism that I'm guilty of engendering:

We have been measuring slightly different things. Being lazy, I timed the fontification in one go:

 (font-lock-ensure (point-min) (point-max))

which took about 65 s originally and went down to about 24 s by fixing the regexps as previously mentioned. Much better but still not wonderful.

You have measured interactive scrolling, which is more realistic, but fontifying the buffer piecemeal exercises slightly different code paths. Fixing those regexps helps, but not as much, and clearly more work is needed.

(By the way, could you direct me to your benchmark code? I don't think I have it.)

Still, improving regexps is clearly beneficial. Reducing allocation can be effective as well; a fair bit of the profile is in the GC.
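
To see how much of a one-shot fontification run goes to garbage collection, the measurement above can be wrapped in `benchmark-run', which reports the elapsed time, the number of GCs and the time spent in GC. This is only a minimal sketch; the function name `my-time-fontify-buffer' is made up for illustration and is not part of Emacs or of any patch in this thread.

```elisp
(require 'benchmark)

(defun my-time-fontify-buffer ()
  "Refontify the whole current buffer once and time it.
Return a list (ELAPSED GC-COUNT GC-ELAPSED), as `benchmark-run' does.
Hypothetical helper, for illustration only."
  ;; Discard any existing fontification so the run starts from scratch.
  (font-lock-flush)
  (benchmark-run 1
    (font-lock-ensure (point-min) (point-max))))
```

Calling it with M-: (my-time-fontify-buffer) in the buffer of interest gives the three numbers; comparing the first and third shows roughly what share of the run was GC.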







* bug#25706: 26.0.50; Slow C file fontification
  2020-12-01 12:03         ` Mattias Engdegård
@ 2020-12-01 12:57           ` Alan Mackenzie
  2020-12-01 14:07             ` Mattias Engdegård
  0 siblings, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-01 12:57 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Lars Ingebrigtsen, 25706

Hello, Mattias.

On Tue, Dec 01, 2020 at 13:03:21 +0100, Mattias Engdegård wrote:
> 1 dec. 2020 kl. 10.21 skrev Alan Mackenzie <acm@muc.de>:

> > (i) Take the first 10% of the original 4MB file, and save it in a
> >  different file.
> > (ii) Fontify that file from top to bottom: according to ELP, 292s
> > (iii) Insert 9 new lines "{}" every 10% of that new file.
> > (iv) Fontify the amended file top to bottom: new time 98s.

> > That's a difference of a factor of 3.

> Thank you, quite remarkable and a very useful piece of information!
> Please let me curb some unwarranted optimism that I'm guilty of
> engendering:

> We have been measuring slightly different things. Being lazy, I timed
> the fontification in one go:

>  (font-lock-ensure (point-min) (point-max))

> which took about 65 s originally and went down to about 24 s by fixing
> the regexps as previously mentioned. Much better but still not
> wonderful.

> You have measured interactive scrolling, which is more realistic, but
> fontifying the buffer piecemeal exercises slightly different code
> paths. Fixing those regexps helps, but not as much, and clearly more
> work is needed.

> (By the way, could you direct me to your benchmark code? I don't think
> I have it.)

Just something I threw together a few years ago, and use regularly on
xdisp.c to check nothing's gone seriously slow/see how well my latest
optimisation has worked.

(defmacro time-it (&rest forms)
  "Time the running of a sequence of forms using `float-time'.
Call like this: \"M-: (time-it (foo ...) (bar ...) ...)\"."
  `(let ((start (float-time)))
    ,@forms
    (- (float-time) start)))

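(defun time-scroll (&optional arg)
  "Scroll through the buffer a screen at a time and report the elapsed time.
Scroll up (towards the end of the buffer), or down if ARG is non-nil,
redisplaying after each step until the buffer boundary signals an error."
  (interactive "P")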
  (message "%s"
           (time-it
            (condition-case nil
                (while t
                  (if arg (scroll-down) (scroll-up))
                  (sit-for 0))
              (error nil)))))

Put point at the start or end of a buffer and do M-: (time-scroll) or M-:
(time-scroll t) as appropriate.

> Still, improving regexps is clearly beneficial. Reducing allocation can
> be effective as well; a fair bit of the profile is in the GC.

How much time does this regexp change save on a "normal" file, such as
src/xdisp.c?

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
  2020-12-01  5:48         ` Ravine Var
@ 2020-12-01 13:34           ` Mattias Engdegård
  0 siblings, 0 replies; 45+ messages in thread
From: Mattias Engdegård @ 2020-12-01 13:34 UTC (permalink / raw)
  To: Ravine Var; +Cc: Alan Mackenzie, Lars Ingebrigtsen, 25706

1 dec. 2020 kl. 06.48 skrev Ravine Var <ravine.var@gmail.com>:

> Will this patch fix the problem with big header files like
> the one originally reported ?

Unfortunately it seems that my benchmarking approach was misleading; see my previous reply to Alan. Sorry about that.
The patch helps a bit but not nearly enough, so for big header files like the ones you mention in the asic_reg directory, it may not make much of a difference.

The patch is obviously worthwhile but, as Alan noted, the incremental fontification cost grows with distance from the start of the file (when the file contains nothing but preprocessor definitions), leading to the observed superlinear behaviour. More robust heuristics are needed.







* bug#25706: 26.0.50; Slow C file fontification
  2020-12-01 12:57           ` Alan Mackenzie
@ 2020-12-01 14:07             ` Mattias Engdegård
  2020-12-01 15:27               ` Alan Mackenzie
  0 siblings, 1 reply; 45+ messages in thread
From: Mattias Engdegård @ 2020-12-01 14:07 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Lars Ingebrigtsen, 25706

1 dec. 2020 kl. 13.57 skrev Alan Mackenzie <acm@muc.de>:

> Just something I threw together a few years ago, and use regularly on
> xdisp.c to check nothing's gone seriously slow/see how well my latest
> optimisation has worked.

Thank you, good, I just wanted to know that we are measuring the same thing!

> How much time does this regexp change save on a "normal" file, such as
> src/xdisp.c?

Not much, but clearly measurable -- about 1.5 % (scrolling benchmark).

What can be done for big files that mainly consist of preprocessor definitions? 





* bug#25706: 26.0.50; Slow C file fontification
  2020-12-01 14:07             ` Mattias Engdegård
@ 2020-12-01 15:27               ` Alan Mackenzie
  2020-12-01 18:59                 ` Mattias Engdegård
  0 siblings, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-01 15:27 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: acm, Lars Ingebrigtsen, 25706

Hello, Mattias.

On Tue, Dec 01, 2020 at 15:07:02 +0100, Mattias Engdegård wrote:
> 1 dec. 2020 kl. 13.57 skrev Alan Mackenzie <acm@muc.de>:

> > Just something I threw together a few years ago, and use regularly on
> > xdisp.c to check nothing's gone seriously slow/see how well my latest
> > optimisation has worked.

> Thank you, good, I just wanted to know that we are measuring the same thing!

> > How much time does this regexp change save on a "normal" file, such as
> > src/xdisp.c?

> Not much, but clearly measurable -- about 1.5 % (scrolling benchmark).

Ah.  ;-)  Do you think the difference might be significantly more if I
were systematically to expunge "\\("s from CC Mode?

> What can be done for big files that mainly consist of preprocessor
> definitions? 

Add in yet another cache (or fix the existing cache which is buggy) for
whatever it is that's searching backwards for braces.

The cache would look something like (P . St), meaning that P is the
position of the last opening brace before St.  A nil P would mean there
is no opening brace at all before St.

So any backward search for a { starting between P and St could just
return P, any search starting after St would only need to search back to
St, and so on.  It's rather messy and easy to get wrong the first time,
but it could make a tremendous difference to these crazy include files.
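
The (P . St) idea can be sketched in a few lines.  This is not CC Mode code: every name below is hypothetical, and the sketch assumes, for simplicity, that a plain `search-backward' for "{" is good enough (it ignores braces inside strings and comments, which a real implementation could not do).

```elisp
(defvar-local my-brace-cache nil
  "Hypothetical cache: a cons (P . ST).
P is the position of the last \"{\" before ST, or nil if there is none.")

(defun my-search-brace-backward (from bound)
  "Return the position of the last \"{\" in [BOUND, FROM), or nil."
  (save-excursion
    (goto-char from)
    (when (search-backward "{" bound t)
      (point))))

(defun my-last-open-brace-before (pos)
  "Return the position of the last \"{\" before POS, or nil.
Consult `my-brace-cache' and extend it when POS lies beyond it."
  (let ((p (car my-brace-cache))
        (st (cdr my-brace-cache)))
    (cond
     ;; No cache yet, or POS is at or before P: do a full search.
     ((or (null my-brace-cache) (and p (<= pos p)))
      (let ((found (my-search-brace-backward pos (point-min))))
        (unless my-brace-cache
          (setq my-brace-cache (cons found pos)))
        found))
     ;; POS lies between P and ST: the answer is P, no search needed.
     ((<= pos st) p)
     ;; POS lies after ST: search back only as far as ST.
     (t
      (let ((found (or (my-search-brace-backward pos st) p)))
        (setq my-brace-cache (cons found pos))
        found)))))

(defun my-brace-cache-invalidate (beg _end _len)
  "Drop the cache when a change starts before its ST position.
Intended for `after-change-functions'."
  (when (and my-brace-cache (< beg (cdr my-brace-cache)))
    (setq my-brace-cache nil)))
```

In a file with no braces at all, the first call pays for one scan back to the start of the buffer; later calls further down only scan the text added since the previous ST, which is the saving described above.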

I put in a cache like that for macros after somebody complained about the
sluggishness in his file (which was basically a single 4,000 line macro).

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
  2020-12-01 15:27               ` Alan Mackenzie
@ 2020-12-01 18:59                 ` Mattias Engdegård
  2020-12-02 10:15                   ` Alan Mackenzie
       [not found]                   ` <X8dpQeGaDD1w3kXX@ACM>
  0 siblings, 2 replies; 45+ messages in thread
From: Mattias Engdegård @ 2020-12-01 18:59 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Lars Ingebrigtsen, 25706

1 dec. 2020 kl. 16.27 skrev Alan Mackenzie <acm@muc.de>:

> Ah.  ;-)  Do you think the difference might be significantly more if I
> were systematically to expunge "\\("s from CC Mode?

No, probably not. It's just obvious low-hanging fruit; every little helps some. Doing so also makes the regexps a little less mystifying for the reader since the only capture groups left are those actually used. Finally, it removes or at least raises some hard limits that we had in the past (from regexp stack overflow).
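
As a small illustration of the point about capture groups: a capturing group and its shy counterpart match the same text, but only the former makes the matcher record a submatch.  The regexps and the helper below are invented for this example and are not taken from CC Mode.

```elisp
(require 'regexp-opt)

(defconst my-capturing-re "\\(u\\|s\\)\\(8\\|16\\|32\\|64\\)"
  "Hypothetical regexp with two capture groups that no caller reads.")

(defconst my-shy-re "\\(?:u\\|s\\)\\(?:8\\|16\\|32\\|64\\)"
  "The same regexp written with shy groups; it records no submatches.")

(defun my-match-extent (regexp string)
  "Return (START . END) of the first match of REGEXP in STRING, or nil."
  (save-match-data
    (when (string-match regexp string)
      (cons (match-beginning 0) (match-end 0)))))
```

Both (my-match-extent my-capturing-re "x u32 y") and (my-match-extent my-shy-re "x u32 y") give the same extent, while `regexp-opt-depth' reports 2 groups for the first regexp and 0 for the second.  A group whose number is later passed to `match-string' or similar must of course stay capturing, which is why not every "\\(" can be converted.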

> Add in yet another cache (or fix the existing cache which is buggy) for
> whatever it is that's searching backwards for braces.

Are the bugs in the existing cache preventing it from making the cases under discussion faster?

A naïve question: the files we are talking about are dominated by (mostly single-line) preprocessor directives whose fontification should be invariant of context (as long as they are not inside comments or strings, but that's not hard to find out). Why do we then spend time looking for context at all?

From profiling, it seems that about 30 % of the time is spent in c-determine-limit, called from c-fl-decl-start, c-font-lock-enclosing-decls and c-font-lock-cut-off-declarators (about 10 % each). 





* bug#25706: 26.0.50; Slow C file fontification
  2020-12-01 18:59                 ` Mattias Engdegård
@ 2020-12-02 10:15                   ` Alan Mackenzie
       [not found]                   ` <X8dpQeGaDD1w3kXX@ACM>
  1 sibling, 0 replies; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-02 10:15 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Lars Ingebrigtsen, 25706

Hello, Mattias.

On Tue, Dec 01, 2020 at 19:59:04 +0100, Mattias Engdegård wrote:
> 1 dec. 2020 kl. 16.27 skrev Alan Mackenzie <acm@muc.de>:

> > Ah.  ;-)  Do you think the difference might be significantly more if I
> > were systematically to expunge "\\("s from CC Mode?

> No, probably not. It's just obvious low-hanging fruit; every little
> helps some. Doing so also makes the regexps a little less mystifying
> for the reader since the only capture groups left are those actually
> used. Finally, it removes or at least raises some hard limits that we
> had in the past (from regexp stack overflow).

OK.  That's a project for ASAP, but not, then, urgent.

> > Add in yet another cache (or fix the existing cache which is buggy)
> > for whatever it is that's searching backwards for braces.

> Are the bugs in the existing cache preventing it from making the cases
> under discussion faster?

I spent yesterday evening investigating the "CC Mode state cache", i.e.
the thing that keeps track of braces and open parens/brackets.  I found a
place where it was unnecessarily causing scanning from BOB, and fixed it
provisionally.  On doing a (time-scroll) on the entire monster buffer, it
saved ~25% of the run time.  There is definitely something else scanning
repeatedly from BOB - the screen scrolling was more sluggish near the end
of the buffer than half way through.

Here's that provisional patch, if you'd like to try it:



diff -r 863d08a1858a cc-engine.el
--- a/cc-engine.el	Thu Nov 26 11:27:52 2020 +0000
+++ b/cc-engine.el	Wed Dec 02 09:55:50 2020 +0000
@@ -3672,9 +3672,9 @@
 	    how-far 0))
      ((<= good-pos here)
       (setq strategy 'forward
-	    start-point (if changed-macro-start
-			    cache-pos
-			  (max good-pos cache-pos))
+	    start-point  ;; (if changed-macro-start  OLD STOUGH, 2020-12-01
+			 ;;    cache-pos
+			  (max good-pos cache-pos);; )
 	    how-far (- here start-point)))
      ((< (- good-pos here) (- here cache-pos)) ; FIXME!!! ; apply some sort of weighting.
       (setq strategy 'backward



> A naïve question: the files we are talking about are dominated by
> (mostly single-line) preprocessor directives whose fontification should
> be invariant of context (as long as they are not inside comments or
> strings, but that's not hard to find out). Why do we then spend time
> looking for context at all?

Because many situations are context dependent, particularly in C++ Mode.
That raises the possibility of not tracking context for these monster
files.h, but how would one distinguish between these different "types" of
CC Mode file?

> From profiling, it seems that about 30 % of the time is spent in
> c-determine-limit, called from c-fl-decl-start,
> c-font-lock-enclosing-decls and c-font-lock-cut-off-declarators (about
> 10 % each).

Yes.  c-determine-limit scans backwards over a buffer to find a position
that is around N non-string non-comment characters before point.

I put some instrumentation on it yesterday evening, and it is apparent
that it is getting called four times in succession from the same point
with N = 500, 1000, 1000, 1000.  This screams out for a simple cache,
which I intend to implement.  Also, maybe I should always call
c-determine-limit with the same N, and perhaps even cut N to 500 in all
cases.  Or something like that.  It is clear that a great deal of run
time could be saved, here.
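
A "simple cache" of the kind meant here could be as small as a single remembered call.  The following is only a sketch under stated assumptions: `my-cached-limit' and `my-limit-cache' are hypothetical names, COMPUTE stands in for the expensive function (it is not the real c-determine-limit), and the cache is assumed valid as long as `buffer-chars-modified-tick' has not changed.

```elisp
(defvar-local my-limit-cache nil
  "Hypothetical single-entry cache: a list (TICK START N RESULT).")

(defun my-cached-limit (compute n start)
  "Return (funcall COMPUTE N START), reusing the previous result if valid.
The previous result is reused only when N and START are the same as in
the last call and the buffer text has not changed since."
  (let ((tick (buffer-chars-modified-tick)))
    (if (and my-limit-cache
             (equal (list tick start n) (butlast my-limit-cache)))
        (nth 3 my-limit-cache)
      (let ((result (funcall compute n start)))
        (setq my-limit-cache (list tick start n result))
        result))))
```

With the call pattern N = 500, 1000, 1000, 1000 from one START, this sketch would compute twice and answer the last two calls from the cache; always passing the same N, as suggested above, would bring that down to a single computation.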

Also, I intend to track down whatever the other thing is that is scanning
from the previous brace or BOB.  It may be possible to alter the handling
of these monster files from impossibly slow to somewhat sluggish.

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
       [not found]                   ` <X8dpQeGaDD1w3kXX@ACM>
@ 2020-12-02 15:06                     ` Mattias Engdegård
  2020-12-03 10:48                       ` Alan Mackenzie
  0 siblings, 1 reply; 45+ messages in thread
From: Mattias Engdegård @ 2020-12-02 15:06 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Lars Ingebrigtsen, 25706

2 dec. 2020 kl. 11.15 skrev Alan Mackenzie <acm@muc.de>:

> I spent yesterday evening investigating the "CC Mode state cache", i.e.
> the thing that keeps track of braces and open parens/brackets.  I found a
> place where it was unnecessarily causing scanning from BOB, and fixed it
> provisionally.  On doing a (time-scroll) on the entire monster buffer, it
> saved ~25% of the run time.  There is definitely something else scanning
> repeatedly from BOB - the screen scrolling was more sluggish near the end
> of the buffer than half way through.
> 
> Here's that provisional patch, if you'd like to try it:

Thanks, it does indeed speed things up in various synthetic tests as well. You are right that there still seems to be at least a quadratic term left.

> Because many situations are context dependent, particularly in C++ Mode.
> That raises the possibility of not tracking context for these monster
> files.h, but how would one distinguish between these different "types" of
> CC Mode file?

Please bear with my lack of understanding of how this works, but what I meant is that a preprocessor line neither affects nor is affected by the context, so until something other than such lines (and comments) is found in the region being fontified, there should be no need to determine the context in the first place.

> I put some instrumentation on it yesterday evening, and it is apparent
> that it is getting called four times in succession from the same point
> with N = 500, 1000, 1000, 1000.  This screams out for a simple cache,
> which I intend to implement.  Also, maybe I should always call
> c-determine-limit with the same N, and perhaps even cut N to 500 in all
> cases.  Or something like that.  It is clear that a great deal of run
> time could be saved, here.
> 
> Also, I intend to track down whatever the other thing is that is scanning
> from the previous brace or BOB.  It may be possible to alter the handling
> of these monster files from impossibly slow to somewhat sluggish.

There is optimism then! Some of the files from the Linux tree mentioned by Ravine Var are also good to try, such as
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/drivers/gpu/drm/amd/include/asic_reg/bif/bif_5_1_sh_mask.h







* bug#25706: 26.0.50; Slow C file fontification
  2020-12-02 15:06                     ` Mattias Engdegård
@ 2020-12-03 10:48                       ` Alan Mackenzie
  2020-12-03 14:03                         ` Mattias Engdegård
  0 siblings, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-03 10:48 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Lars Ingebrigtsen, 25706

Hello, Mattias.

On Wed, Dec 02, 2020 at 16:06:43 +0100, Mattias Engdegård wrote:
> 2 dec. 2020 kl. 11.15 skrev Alan Mackenzie <acm@muc.de>:

> > I spent yesterday evening investigating the "CC Mode state cache", i.e.
> > the thing that keeps track of braces and open parens/brackets.  I found a
> > place where it was unnecessarily causing scanning from BOB, and fixed it
> > provisionally.  On doing a (time-scroll) on the entire monster buffer, it
> > saved ~25% of the run time.  There is definitely something else scanning
> > repeatedly from BOB - the screen scrolling was more sluggish near the end
> > of the buffer than half way through.

I've found it.  There was a "harmless" c-backward-syntactic-ws invocation
in c-determine-limit.  This Lisp macro moves back over syntactic
whitespace, which includes preprocessor macros.  So this was going back
all the way to BOB, from which we scanned forward again.

In the enclosed patch (which includes my previous amendment) I've removed
this.

There are many other places which invoke c-backward-syntactic-ws without
giving the limit argument, and these slow down CC Mode too, though not as
dramatically as the removed one.  I have given limit arguments to two of
these in c-font-lock-complex-decl-prepare, which reduces the (time-scroll)
time for the last 10% of the entire monster file from ~77s to ~44s.
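
The effect of a limit argument can be shown with a much simpler primitive.  The sketch below is not c-backward-syntactic-ws: it uses `skip-chars-backward', which only understands plain whitespace (no comments, no preprocessor macros), and the function name is made up.  It merely shows how capping the distance bounds the work per call.

```elisp
(defun my-skip-ws-backward-bounded (max-distance)
  "Move point back over blanks and newlines, at most MAX-DISTANCE chars.
Return the new value of point.  Hypothetical helper, for illustration."
  (let ((limit (max (- (point) max-distance) (point-min))))
    ;; The second argument stops the skip at LIMIT, however much
    ;; skippable text lies before it.
    (skip-chars-backward " \t\n\f" limit)
    (point)))
```

Without the limit, a call near the end of a buffer consisting almost entirely of skippable text walks back to the start every time; with it, the cost per call is bounded by MAX-DISTANCE.  The price is that the caller has to cope with stopping at the limit rather than at a real token.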

I intend to instrument c-backward-sws to determine which of the other
invocations of c-backward-syntactic-ws are most time consuming.  There
are around 90 such calls in CC Mode.  :-(

It now takes me just under 6 minutes to (time-scroll) through the entire
buffer, compared with an hour previously.  As already mentioned, it is still
slightly more sluggish near the end of the buffer than near the start.

> > Here's that provisional patch, if you'd like to try it:

So, here's another provisional patch:



diff -r 863d08a1858a cc-engine.el
--- a/cc-engine.el	Thu Nov 26 11:27:52 2020 +0000
+++ b/cc-engine.el	Thu Dec 03 10:43:45 2020 +0000
@@ -3672,9 +3672,7 @@
 	    how-far 0))
      ((<= good-pos here)
       (setq strategy 'forward
-	    start-point (if changed-macro-start
-			    cache-pos
-			  (max good-pos cache-pos))
+	    start-point (max good-pos cache-pos)
 	    how-far (- here start-point)))
      ((< (- good-pos here) (- here cache-pos)) ; FIXME!!! ; apply some sort of weighting.
       (setq strategy 'backward
@@ -5778,8 +5776,6 @@
   ;; Get a "safe place" approximately TRY-SIZE characters before START.
   ;; This defsubst doesn't preserve point.
   (goto-char start)
-  (c-backward-syntactic-ws)
-  (setq start (point))
   (let* ((pos (max (- start try-size) (point-min)))
 	 (s (c-semi-pp-to-literal pos))
 	 (cand (or (car (cddr s)) pos)))
diff -r 863d08a1858a cc-fonts.el
--- a/cc-fonts.el	Thu Nov 26 11:27:52 2020 +0000
+++ b/cc-fonts.el	Thu Dec 03 10:43:45 2020 +0000
@@ -948,7 +948,7 @@
     ;; closest token before the region.
     (save-excursion
       (let ((pos (point)))
-	(c-backward-syntactic-ws)
+	(c-backward-syntactic-ws (max (- (point) 500) (point-min)))
 	(c-clear-char-properties
 	 (if (and (not (bobp))
 		  (memq (c-get-char-property (1- (point)) 'c-type)
@@ -970,7 +970,7 @@
     ;; The declared identifiers are font-locked correctly as types, if
     ;; that is what they are.
     (let ((prop (save-excursion
-		  (c-backward-syntactic-ws)
+		  (c-backward-syntactic-ws (max (- (point) 500) (point-min)))
 		  (unless (bobp)
 		    (c-get-char-property (1- (point)) 'c-type)))))
       (when (memq prop '(c-decl-id-start c-decl-type-start))



[ .... ]

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
  2020-12-03 10:48                       ` Alan Mackenzie
@ 2020-12-03 14:03                         ` Mattias Engdegård
  2020-12-04 21:04                           ` Alan Mackenzie
       [not found]                           ` <X8qkcokfZGbaK5A2@ACM>
  0 siblings, 2 replies; 45+ messages in thread
From: Mattias Engdegård @ 2020-12-03 14:03 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Lars Ingebrigtsen, 25706

3 dec. 2020 kl. 11.48 skrev Alan Mackenzie <acm@muc.de>:

> I've found it.  There was a "harmless" c-backward-syntactic-ws invocation
> in c-determine-limit.  This macro moves back over syntactic whitespace,
> which includes macros.  So this was going back all the way to BOB, from
> which we scanned forward again.

Not bad. Now Emacs starts becoming usable for real code!
I can confirm a big subjective improvement on several big preprocessor-heavy files, and measurements agree.

> It now takes me just under 6 minutes to (time-scroll) through the entire
> buffer, compared with an hour previously.  As already mentioned, it is still
> slightly more sluggish near the end of the buffer than near the start.

Is that with or without my regexp patch?

It looks like there may be more regexp improvements possible. We can take a closer look later on, when the running time is less dominated by other issues.







* bug#25706: 26.0.50; Slow C file fontification
  2020-12-03 14:03                         ` Mattias Engdegård
@ 2020-12-04 21:04                           ` Alan Mackenzie
       [not found]                           ` <X8qkcokfZGbaK5A2@ACM>
  1 sibling, 0 replies; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-04 21:04 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Lars Ingebrigtsen, 25706

Hello, Mattias.

On Thu, Dec 03, 2020 at 15:03:27 +0100, Mattias Engdegård wrote:
> 3 dec. 2020 kl. 11.48 skrev Alan Mackenzie <acm@muc.de>:

> > I've found it.  There was a "harmless" c-backward-syntactic-ws
> > invocation in c-determine-limit.  This macro moves back over
> > syntactic whitespace, which includes macros.  So this was going back
> > all the way to BOB, from which we scanned forward again.

> Not bad. Now Emacs starts becoming usable for real code!  I can confirm
> a big subjective improvement on several big preprocessor-heavy files,
> and measurements agree.

I think you'll like my latest provisional patch!

I've tracked down and eliminated a ~0.5s delay when typing characters
into a "monster" buffer near the end.

> > It now takes me just under 6 minutes to (time-scroll) through the entire
> > buffer, compared with an hour previously.  As already mentioned, it is still
> > slightly more sluggish near the end of the buffer than near the start.

With the latest patch, it takes me 121s.

> Is that with or without my regexp patch?

Without.

> It looks like there may be more regexp improvements possible. We can
> take a closer look later on, when the running time is less dominated by
> other issues.

Maybe that time is now.  Please try the latest patch.  I think there are
still things needing optimisation in C++ Mode (make sure your monster
buffers are in C Mode, please).  But for now....



diff --git a/lisp/progmodes/cc-engine.el b/lisp/progmodes/cc-engine.el
index 252eec138c..22e6ef5894 100644
--- a/lisp/progmodes/cc-engine.el
+++ b/lisp/progmodes/cc-engine.el
@@ -972,7 +972,7 @@ c-beginning-of-statement-1
       ;; that we've moved.
       (while (progn
 	       (setq pos (point))
-	       (c-backward-syntactic-ws)
+	       (c-backward-syntactic-ws lim)
 	       ;; Protect post-++/-- operators just before a virtual semicolon.
 	       (and (not (c-at-vsemi-p))
 		    (/= (skip-chars-backward "-+!*&~@`#") 0))))
@@ -984,7 +984,7 @@ c-beginning-of-statement-1
       (if (and (memq (char-before) delims)
 	       (progn (forward-char -1)
 		      (setq saved (point))
-		      (c-backward-syntactic-ws)
+		      (c-backward-syntactic-ws lim)
 		      (or (memq (char-before) delims)
 			  (memq (char-before) '(?: nil))
 			  (eq (char-syntax (char-before)) ?\()
@@ -1164,7 +1164,7 @@ c-beginning-of-statement-1
                 ;; HERE IS THE SINGLE PLACE INSIDE THE PDA LOOP WHERE WE MOVE
 		;; BACKWARDS THROUGH THE SOURCE.
 
-		(c-backward-syntactic-ws)
+		(c-backward-syntactic-ws lim)
 		(let ((before-sws-pos (point))
 		      ;; The end position of the area to search for statement
 		      ;; barriers in this round.
@@ -1188,7 +1188,7 @@ c-beginning-of-statement-1
 			 ((and (not macro-start)
 			       (c-beginning-of-macro))
 			  (save-excursion
-			    (c-backward-syntactic-ws)
+			    (c-backward-syntactic-ws lim)
 			    (setq before-sws-pos (point)))
 			  ;; Have we crossed a statement boundary?  If not,
 			  ;; keep going back until we find one or a "real" sexp.
@@ -1413,7 +1413,7 @@ c-beginning-of-statement-1
 
       ;; Skip over the unary operators that can start the statement.
       (while (progn
-	       (c-backward-syntactic-ws)
+	       (c-backward-syntactic-ws lim)
 	       ;; protect AWK post-inc/decrement operators, etc.
 	       (and (not (c-at-vsemi-p (point)))
 		    (/= (skip-chars-backward "-.+!*&~@`#") 0)))
@@ -3568,15 +3568,18 @@ c-get-fallback-scan-pos
   ;; Return a start position for building `c-state-cache' from
   ;; scratch.  This will be at the top level, 2 defuns back.
   (save-excursion
-    ;; Go back 2 bods, but ignore any bogus positions returned by
-    ;; beginning-of-defun (i.e. open paren in column zero).
-    (goto-char here)
-    (let ((cnt 2))
-      (while (not (or (bobp) (zerop cnt)))
-	(c-beginning-of-defun-1)	; Pure elisp BOD.
-	(if (eq (char-after) ?\{)
-	    (setq cnt (1- cnt)))))
-    (point)))
+    (save-restriction
+      (when (> here (* 10 c-state-cache-too-far))
+	(narrow-to-region (- here (* 10 c-state-cache-too-far)) here))
+      ;; Go back 2 bods, but ignore any bogus positions returned by
+      ;; beginning-of-defun (i.e. open paren in column zero).
+      (goto-char here)
+      (let ((cnt 2))
+	(while (not (or (bobp) (zerop cnt)))
+	  (c-beginning-of-defun-1)	; Pure elisp BOD.
+	  (if (eq (char-after) ?\{)
+	      (setq cnt (1- cnt)))))
+      (point))))
 
 (defun c-state-balance-parens-backwards (here- here+ top)
   ;; Return the position of the opening paren/brace/bracket before HERE- which
@@ -3667,9 +3670,7 @@ c-parse-state-get-strategy
 	    how-far 0))
      ((<= good-pos here)
       (setq strategy 'forward
-	    start-point (if changed-macro-start
-			    cache-pos
-			  (max good-pos cache-pos))
+	    start-point (max good-pos cache-pos)
 	    how-far (- here start-point)))
      ((< (- good-pos here) (- here cache-pos)) ; FIXME!!! ; apply some sort of weighting.
       (setq strategy 'backward
@@ -4337,8 +4338,12 @@ c-invalidate-state-cache-1
       (if (and dropped-cons
 	       (<= too-high-pa here))
 	  (c-append-lower-brace-pair-to-state-cache too-high-pa here here-bol))
-      (setq c-state-cache-good-pos (or (c-state-cache-after-top-paren)
-				       (c-state-get-min-scan-pos)))))
+      (if (and c-state-cache-good-pos (< here c-state-cache-good-pos))
+	  (setq c-state-cache-good-pos
+		(or (save-excursion
+		      (goto-char here)
+		      (c-literal-start))
+		    here)))))
 
   ;; The brace-pair desert marker:
   (when (car c-state-brace-pair-desert)
@@ -5402,8 +5407,11 @@ c-syntactic-skip-backward
 	       ;; Optimize for, in particular, large blocks of comments from
 	       ;; `comment-region'.
 	       (progn (when opt-ws
-			(c-backward-syntactic-ws)
-			(setq paren-level-pos (point)))
+			(let ((opt-pos (point)))
+			  (c-backward-syntactic-ws limit)
+			  (if (> (point) limit)
+			      (setq paren-level-pos (point))
+			    (goto-char opt-pos))))
 		      t)
 	       ;; Move back to a candidate end point which isn't in a literal
 	       ;; or in a macro we didn't start in.
@@ -5423,7 +5431,10 @@ c-syntactic-skip-backward
 				     (setq macro-start (point))))
 			    (goto-char macro-start))))
 		   (when opt-ws
-		     (c-backward-syntactic-ws)))
+		     (let ((opt-pos (point)))
+		       (c-backward-syntactic-ws limit)
+		       (if (<= (point) limit)
+			   (goto-char opt-pos)))))
 		 (< (point) pos))
 
 	       ;; Check whether we're at the wrong level of nesting (when
@@ -5766,8 +5777,6 @@ c-determine-limit-get-base
   ;; Get a "safe place" approximately TRY-SIZE characters before START.
   ;; This defsubst doesn't preserve point.
   (goto-char start)
-  (c-backward-syntactic-ws)
-  (setq start (point))
   (let* ((pos (max (- start try-size) (point-min)))
 	 (s (c-semi-pp-to-literal pos))
 	 (cand (or (car (cddr s)) pos)))
@@ -6248,8 +6257,13 @@ c-find-decl-prefix-search
        ;; preceding syntactic ws to set `cfd-match-pos' and to catch
        ;; any decl spots in the syntactic ws.
        (unless cfd-re-match
-	 (c-backward-syntactic-ws)
-	 (setq cfd-re-match (point))))
+	 (let ((cfd-cbsw-lim (- (point) 1000)))
+	   (c-backward-syntactic-ws cfd-cbsw-lim)
+	   (setq cfd-re-match
+		 (if (> (point) cfd-cbsw-lim)
+		     (point)
+		   0)))		   ; Set BOB case if the token's too far back.
+	 ))
 
      ;; Choose whichever match is closer to the start.
      (if (< cfd-re-match cfd-prop-match)
@@ -6482,7 +6496,10 @@ c-find-decl-spots
 	(c-invalidate-find-decl-cache cfd-start-pos)
 
 	(setq syntactic-pos (point))
-	(unless (eq syntactic-pos c-find-decl-syntactic-pos)
+	(unless
+	    (or (eq syntactic-pos c-find-decl-syntactic-pos)
+		(null c-find-decl-syntactic-pos)
+		(< c-find-decl-syntactic-pos (- (point) 10000)))
 	  ;; Don't have to do this if the cache is relevant here,
 	  ;; typically if the same line is refontified again.  If
 	  ;; we're just some syntactic whitespace further down we can
diff --git a/lisp/progmodes/cc-fonts.el b/lisp/progmodes/cc-fonts.el
index bb7e5bea6e..07dcefb8d1 100644
--- a/lisp/progmodes/cc-fonts.el
+++ b/lisp/progmodes/cc-fonts.el
@@ -947,7 +947,7 @@ c-font-lock-complex-decl-prepare
     ;; closest token before the region.
     (save-excursion
       (let ((pos (point)))
-	(c-backward-syntactic-ws)
+	(c-backward-syntactic-ws (max (- (point) 500) (point-min)))
 	(c-clear-char-properties
 	 (if (and (not (bobp))
 		  (memq (c-get-char-property (1- (point)) 'c-type)
@@ -969,7 +969,7 @@ c-font-lock-complex-decl-prepare
     ;; The declared identifiers are font-locked correctly as types, if
     ;; that is what they are.
     (let ((prop (save-excursion
-		  (c-backward-syntactic-ws)
+		  (c-backward-syntactic-ws (max (- (point) 500) (point-min)))
 		  (unless (bobp)
 		    (c-get-char-property (1- (point)) 'c-type)))))
       (when (memq prop '(c-decl-id-start c-decl-type-start))
@@ -1496,7 +1496,8 @@ c-font-lock-declarations
 
 		 ;; Check we haven't missed a preceding "typedef".
 		 (when (not (looking-at c-typedef-key))
-		   (c-backward-syntactic-ws)
+		   (c-backward-syntactic-ws
+		    (max (- (point) 1000) (point-min)))
 		   (c-backward-token-2)
 		   (or (looking-at c-typedef-key)
 		       (goto-char start-pos)))
@@ -1536,8 +1537,10 @@ c-font-lock-declarations
 				     (c-backward-token-2)
 				     (and
 				      (not (looking-at c-opt-<>-sexp-key))
-				      (progn (c-backward-syntactic-ws)
-					     (memq (char-before) '(?\( ?,)))
+				      (progn
+					(c-backward-syntactic-ws
+					 (max (- (point) 1000) (point-min)))
+					(memq (char-before) '(?\( ?,)))
 				      (not (eq (c-get-char-property (1- (point))
 								    'c-type)
 					       'c-decl-arg-start))))))
@@ -2295,7 +2298,8 @@ c-font-lock-c++-using
 		  (and c-colon-type-list-re
 		       (c-go-up-list-backward)
 		       (eq (char-after) ?{)
-		       (eq (car (c-beginning-of-decl-1)) 'same)
+		       (eq (car (c-beginning-of-decl-1
+				 (c-determine-limit 1000))) 'same)
 		       (looking-at c-colon-type-list-re)))
 		;; Inherited protected member: leave unfontified
 		)


-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
       [not found]                           ` <X8qkcokfZGbaK5A2@ACM>
@ 2020-12-05 15:20                             ` Mattias Engdegård
  2020-12-08 18:42                               ` Alan Mackenzie
       [not found]                               ` <X8/JG7eD7SfkEimH@ACM>
  0 siblings, 2 replies; 45+ messages in thread
From: Mattias Engdegård @ 2020-12-05 15:20 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Lars Ingebrigtsen, 25706

4 dec. 2020 kl. 22.04 skrev Alan Mackenzie <acm@muc.de>:

> I think you'll like my latest provisional patch!
> 
> I've tracked down and eliminated a ~0.5s delay when typing characters
> into a "monster" buffer near the end.

That's nice, thank you! It seems to be about 19 % faster than the previous patch on this particular file, which is not bad at all.

Somehow, the delay when inserting a newline (pressing return) at line 83610 of osprey_reg_map_macro.h becomes longer with the patch. Of course this is more than compensated by the speed-up in general, but it may be worth taking a look at.

There is also a new and noticeable delay (0.5-1 s) in the very beginning when scrolling through the file. (This is with the frame sized to show 41 lines of 80 chars of a window, excluding mode line and echo area.)







* bug#25706: 26.0.50; Slow C file fontification
  2020-12-05 15:20                             ` Mattias Engdegård
@ 2020-12-08 18:42                               ` Alan Mackenzie
       [not found]                               ` <X8/JG7eD7SfkEimH@ACM>
  1 sibling, 0 replies; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-08 18:42 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Lars Ingebrigtsen, 25706

Hello again, Mattias.

On Sat, Dec 05, 2020 at 16:20:54 +0100, Mattias Engdegård wrote:
> 4 dec. 2020 kl. 22.04 skrev Alan Mackenzie <acm@muc.de>:

[ .... ]

> That's nice, thank you! It seems to be about 19 % faster than the
> previous patch on this particular file, which is not bad at all.

Well, the enclosed patch improves on this a little, particularly in C++
Mode.  (The monster file.h is now worth trying in C++ Mode.)

Just as a matter of interest, I've done a fair bit of testing with a
larger monster file (~14 MB) in the Linux kernel:

    linux/drivers/gpu/drm/amd/include/asic_reg/nbio/nbio_6_1_sh_mask.h

That's 133,000 lines, give or take.  Even our largest file,
src/xdisp.c, is only 36,000 lines.  I don't understand how a file
describing hardware can come to anything like 133k lines.  It must be
soul-destroying to have to write a driver based on a file like this.
That file was put together by AMD, and I suspect they didn't take all
that much care to make it usable.

> Somehow, the delay when inserting a newline (pressing return) at line
> 83610 of osprey_reg_map_macro.h becomes longer with the patch.

I think I've fixed this.  Thanks for prompting me.
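
For anyone wanting to put a number on that delay, here is a rough
sketch of how it could be measured.  The helper name is made up for
illustration and is not part of CC Mode or of the patch; it only uses
`benchmark-run' from benchmark.el, and it assumes that forcing one
redisplay is a fair stand-in for what the user sees after pressing
return.

```elisp
;; Hypothetical helper, not part of CC Mode or of the patch.
(require 'benchmark)

(defun my-time-newline-at-line (line)
  "Insert a newline at the end of LINE and return the elapsed seconds.
The timing covers the buffer's change hooks and one forced redisplay."
  (save-excursion
    (goto-char (point-min))
    (forward-line (1- line))
    (end-of-line)
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
    (car (benchmark-run 1
           (newline)
           (redisplay t)))))
```

In the buffer visiting osprey_reg_map_macro.h this would be called as
M-: (my-time-newline-at-line 83610).  The newline stays in the buffer,
so undo it before measuring again.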

> Of course this is more than compensated by the speed-up in general,
> but it may be worth taking a look at.

There's one thing which still puzzles me.  In osprey_reg....h, when
scrolling through it (e.g. with (time-scroll)), it stutters markedly at
around 13% of the way through.  I've managed to localize this: it's
happening in the macro c-find-decl-prefix-search (invoked only from
c-find-decl-spots), and has something to do with the call to
re-search-forward there, but I've not managed to pin down exactly what
the cause is.
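
The definition of time-scroll isn't included in this message, so as a
rough idea of the kind of measurement meant, here is a minimal sketch
of a scroll timer.  The name my-time-scroll is made up, and this is
not necessarily how time-scroll itself is written; it relies only on
`scroll-up' signalling an error at the end of the buffer.

```elisp
;; A sketch of a scroll timer; the name is hypothetical and this is not
;; necessarily the definition of `time-scroll' referred to above.
(defun my-time-scroll ()
  "Scroll the selected window to the end of its buffer, timing it.
Force a redisplay after every screenful and return the elapsed seconds."
  (interactive)
  (let ((start (float-time)))
    ;; `scroll-up' signals `end-of-buffer' once nothing is left to
    ;; scroll, which is what terminates the loop.
    (condition-case nil
        (while t
          (scroll-up)
          (redisplay t))
      (error nil))
    (let ((elapsed (- (float-time) start)))
      (message "Scrolled in %.3f s" elapsed)
      elapsed)))
```

Starting it with point at the beginning of the buffer times a complete
pass; a stutter like the one described shows up as a visible pause
part of the way through.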

> There is also a new and noticeable delay (0.5-1 s) in the very
> beginning when scrolling through the file. (This is with the frame
> sized to show 41 lines of 80 chars of a window, excluding mode line
> and echo area.)

This seems still to be there.  I'll admit, I haven't really looked at
this yet.

Anyhow, please try out the (?)final version of my patch before I commit
it and close the bug.  It should apply cleanly to the master branch.  I
might well split it into three changes, two small and one large, since
there are, in a sense, three distinct fixes in it.

Thanks!
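
Many of the hunks below follow one pattern: a backward skip which used
to be unbounded now gets a limit a fixed distance behind point, and
the result is discarded if the skip ran into that limit.  As a
simplified illustration only, with `skip-chars-backward' standing in
for `c-backward-syntactic-ws' and a made-up function name, the idea is
roughly:

```elisp
;; Hypothetical stand-in for the pattern; `skip-chars-backward' replaces
;; `c-backward-syntactic-ws' so the example needs nothing from CC Mode.
(defun my-bounded-skip-ws-backward (max-distance)
  "Skip whitespace backward from point, at most MAX-DISTANCE characters.
Return the new position if the skip ended before reaching the limit (or
at the beginning of the buffer).  Otherwise restore point and return nil."
  (let* ((start (point))
         (limit (max (- start max-distance) (point-min))))
    (skip-chars-backward " \t\n" limit)
    (if (or (bobp) (> (point) limit))
        (point)
      ;; We hit the limit without finding a token: give up and go back.
      (goto-char start)
      nil)))
```

The cost of each call is then bounded by MAX-DISTANCE rather than by
the size of the buffer, at the price of occasionally missing a token
which lies further back than the limit.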



diff --git a/lisp/progmodes/cc-engine.el b/lisp/progmodes/cc-engine.el
index 252eec138c..2365085036 100644
--- a/lisp/progmodes/cc-engine.el
+++ b/lisp/progmodes/cc-engine.el
@@ -972,7 +972,7 @@ c-beginning-of-statement-1
       ;; that we've moved.
       (while (progn
 	       (setq pos (point))
-	       (c-backward-syntactic-ws)
+	       (c-backward-syntactic-ws lim)
 	       ;; Protect post-++/-- operators just before a virtual semicolon.
 	       (and (not (c-at-vsemi-p))
 		    (/= (skip-chars-backward "-+!*&~@`#") 0))))
@@ -984,7 +984,7 @@ c-beginning-of-statement-1
       (if (and (memq (char-before) delims)
 	       (progn (forward-char -1)
 		      (setq saved (point))
-		      (c-backward-syntactic-ws)
+		      (c-backward-syntactic-ws lim)
 		      (or (memq (char-before) delims)
 			  (memq (char-before) '(?: nil))
 			  (eq (char-syntax (char-before)) ?\()
@@ -1164,7 +1164,7 @@ c-beginning-of-statement-1
                 ;; HERE IS THE SINGLE PLACE INSIDE THE PDA LOOP WHERE WE MOVE
 		;; BACKWARDS THROUGH THE SOURCE.
 
-		(c-backward-syntactic-ws)
+		(c-backward-syntactic-ws lim)
 		(let ((before-sws-pos (point))
 		      ;; The end position of the area to search for statement
 		      ;; barriers in this round.
@@ -1174,33 +1174,35 @@ c-beginning-of-statement-1
 		  ;; Go back over exactly one logical sexp, taking proper
 		  ;; account of macros and escaped EOLs.
 		  (while
-		      (progn
-			(setq comma-delimited (and (not comma-delim)
-						   (eq (char-before) ?\,)))
-			(unless (c-safe (c-backward-sexp) t)
-			  ;; Give up if we hit an unbalanced block.  Since the
-			  ;; stack won't be empty the code below will report a
-			  ;; suitable error.
-			  (setq pre-stmt-found t)
-			  (throw 'loop nil))
-			(cond
-			 ;; Have we moved into a macro?
-			 ((and (not macro-start)
-			       (c-beginning-of-macro))
-			  (save-excursion
-			    (c-backward-syntactic-ws)
-			    (setq before-sws-pos (point)))
-			  ;; Have we crossed a statement boundary?  If not,
-			  ;; keep going back until we find one or a "real" sexp.
-			  (and
+		      (and
+		       (progn
+			 (setq comma-delimited (and (not comma-delim)
+						    (eq (char-before) ?\,)))
+			 (unless (c-safe (c-backward-sexp) t)
+			   ;; Give up if we hit an unbalanced block.  Since the
+			   ;; stack won't be empty the code below will report a
+			   ;; suitable error.
+			   (setq pre-stmt-found t)
+			   (throw 'loop nil))
+			 (cond
+			  ;; Have we moved into a macro?
+			  ((and (not macro-start)
+				(c-beginning-of-macro))
 			   (save-excursion
-			     (c-end-of-macro)
-			     (not (c-crosses-statement-barrier-p
-				   (point) maybe-after-boundary-pos)))
-			   (setq maybe-after-boundary-pos (point))))
-			 ;; Have we just gone back over an escaped NL?  This
-			 ;; doesn't count as a sexp.
-			 ((looking-at "\\\\$")))))
+			     (c-backward-syntactic-ws lim)
+			     (setq before-sws-pos (point)))
+			   ;; Have we crossed a statement boundary?  If not,
+			   ;; keep going back until we find one or a "real" sexp.
+			   (and
+			    (save-excursion
+			      (c-end-of-macro)
+			      (not (c-crosses-statement-barrier-p
+				    (point) maybe-after-boundary-pos)))
+			    (setq maybe-after-boundary-pos (point))))
+			  ;; Have we just gone back over an escaped NL?  This
+			  ;; doesn't count as a sexp.
+			  ((looking-at "\\\\$"))))
+		       (>= (point) lim)))
 
 		  ;; Have we crossed a statement boundary?
 		  (setq boundary-pos
@@ -1413,7 +1415,7 @@ c-beginning-of-statement-1
 
       ;; Skip over the unary operators that can start the statement.
       (while (progn
-	       (c-backward-syntactic-ws)
+	       (c-backward-syntactic-ws lim)
 	       ;; protect AWK post-inc/decrement operators, etc.
 	       (and (not (c-at-vsemi-p (point)))
 		    (/= (skip-chars-backward "-.+!*&~@`#") 0)))
@@ -3568,15 +3570,18 @@ c-get-fallback-scan-pos
   ;; Return a start position for building `c-state-cache' from
   ;; scratch.  This will be at the top level, 2 defuns back.
   (save-excursion
-    ;; Go back 2 bods, but ignore any bogus positions returned by
-    ;; beginning-of-defun (i.e. open paren in column zero).
-    (goto-char here)
-    (let ((cnt 2))
-      (while (not (or (bobp) (zerop cnt)))
-	(c-beginning-of-defun-1)	; Pure elisp BOD.
-	(if (eq (char-after) ?\{)
-	    (setq cnt (1- cnt)))))
-    (point)))
+    (save-restriction
+      (when (> here (* 10 c-state-cache-too-far))
+	(narrow-to-region (- here (* 10 c-state-cache-too-far)) here))
+      ;; Go back 2 bods, but ignore any bogus positions returned by
+      ;; beginning-of-defun (i.e. open paren in column zero).
+      (goto-char here)
+      (let ((cnt 2))
+	(while (not (or (bobp) (zerop cnt)))
+	  (c-beginning-of-defun-1)	; Pure elisp BOD.
+	  (if (eq (char-after) ?\{)
+	      (setq cnt (1- cnt)))))
+      (point))))
 
 (defun c-state-balance-parens-backwards (here- here+ top)
   ;; Return the position of the opening paren/brace/bracket before HERE- which
@@ -3667,9 +3672,7 @@ c-parse-state-get-strategy
 	    how-far 0))
      ((<= good-pos here)
       (setq strategy 'forward
-	    start-point (if changed-macro-start
-			    cache-pos
-			  (max good-pos cache-pos))
+	    start-point (max good-pos cache-pos)
 	    how-far (- here start-point)))
      ((< (- good-pos here) (- here cache-pos)) ; FIXME!!! ; apply some sort of weighting.
       (setq strategy 'backward
@@ -4337,8 +4340,12 @@ c-invalidate-state-cache-1
       (if (and dropped-cons
 	       (<= too-high-pa here))
 	  (c-append-lower-brace-pair-to-state-cache too-high-pa here here-bol))
-      (setq c-state-cache-good-pos (or (c-state-cache-after-top-paren)
-				       (c-state-get-min-scan-pos)))))
+      (if (and c-state-cache-good-pos (< here c-state-cache-good-pos))
+	  (setq c-state-cache-good-pos
+		(or (save-excursion
+		      (goto-char here)
+		      (c-literal-start))
+		    here)))))
 
   ;; The brace-pair desert marker:
   (when (car c-state-brace-pair-desert)
@@ -4796,7 +4803,7 @@ c-on-identifier
 
      ;; Handle the "operator +" syntax in C++.
      (when (and c-overloadable-operators-regexp
-		(= (c-backward-token-2 0) 0))
+		(= (c-backward-token-2 0 nil (c-determine-limit 500)) 0))
 
        (cond ((and (looking-at c-overloadable-operators-regexp)
 		   (or (not c-opt-op-identifier-prefix)
@@ -5065,7 +5072,8 @@ c-backward-token-2
 	  (while (and
 		  (> count 0)
 		  (progn
-		    (c-backward-syntactic-ws)
+		    (c-backward-syntactic-ws
+		     limit)
 		    (backward-char)
 		    (if (looking-at jump-syntax)
 			(goto-char (scan-sexps (1+ (point)) -1))
@@ -5402,8 +5410,12 @@ c-syntactic-skip-backward
 	       ;; Optimize for, in particular, large blocks of comments from
 	       ;; `comment-region'.
 	       (progn (when opt-ws
-			(c-backward-syntactic-ws)
-			(setq paren-level-pos (point)))
+			(let ((opt-pos (point)))
+			  (c-backward-syntactic-ws limit)
+			  (if (or (null limit)
+			      (> (point) limit))
+			      (setq paren-level-pos (point))
+			    (goto-char opt-pos))))
 		      t)
 	       ;; Move back to a candidate end point which isn't in a literal
 	       ;; or in a macro we didn't start in.
@@ -5423,7 +5435,11 @@ c-syntactic-skip-backward
 				     (setq macro-start (point))))
 			    (goto-char macro-start))))
 		   (when opt-ws
-		     (c-backward-syntactic-ws)))
+		     (let ((opt-pos (point)))
+		       (c-backward-syntactic-ws limit)
+		       (if (and limit
+			   (<= (point) limit))
+			   (goto-char opt-pos)))))
 		 (< (point) pos))
 
 	       ;; Check whether we're at the wrong level of nesting (when
@@ -5474,7 +5490,7 @@ c-syntactic-skip-backward
 	     (progn
 	       ;; Skip syntactic ws afterwards so that we don't stop at the
 	       ;; end of a comment if `skip-chars' is something like "^/".
-	       (c-backward-syntactic-ws)
+	       (c-backward-syntactic-ws limit)
 	       (point)))))
 
     ;; We might want to extend this with more useful return values in
@@ -5762,12 +5778,23 @@ c-literal-type
 	      (t 'c)))			; Assuming the range is valid.
     range))
 
+(defun c-determine-limit-no-macro (here org-start)
+  ;; If HERE is inside a macro, and ORG-START is not also in the same macro,
+  ;; return the beginning of the macro.  Otherwise return HERE.  Point is not
+  ;; preserved by this function.
+  (goto-char here)
+  (let ((here-BOM (and (c-beginning-of-macro) (point))))
+    (if (and here-BOM
+	     (not (eq (progn (goto-char org-start)
+			     (and (c-beginning-of-macro) (point)))
+		      here-BOM)))
+	here-BOM
+      here)))
+
 (defsubst c-determine-limit-get-base (start try-size)
   ;; Get a "safe place" approximately TRY-SIZE characters before START.
   ;; This defsubst doesn't preserve point.
   (goto-char start)
-  (c-backward-syntactic-ws)
-  (setq start (point))
   (let* ((pos (max (- start try-size) (point-min)))
 	 (s (c-semi-pp-to-literal pos))
 	 (cand (or (car (cddr s)) pos)))
@@ -5776,20 +5803,23 @@ c-determine-limit-get-base
       (parse-partial-sexp pos start nil nil (car s) 'syntax-table)
       (point))))
 
-(defun c-determine-limit (how-far-back &optional start try-size)
+(defun c-determine-limit (how-far-back &optional start try-size org-start)
   ;; Return a buffer position approximately HOW-FAR-BACK non-literal
   ;; characters from START (default point).  The starting position, either
   ;; point or START may not be in a comment or string.
   ;;
   ;; The position found will not be before POINT-MIN and won't be in a
-  ;; literal.
+  ;; literal.  It will also not be inside a macro, unless START/point is also
+  ;; in the same macro.
   ;;
   ;; We start searching for the sought position TRY-SIZE (default
   ;; twice HOW-FAR-BACK) bytes back from START.
   ;;
   ;; This function must be fast.  :-)
+
   (save-excursion
     (let* ((start (or start (point)))
+	   (org-start (or org-start start))
 	   (try-size (or try-size (* 2 how-far-back)))
 	   (base (c-determine-limit-get-base start try-size))
 	   (pos base)
@@ -5842,21 +5872,27 @@ c-determine-limit
 	(setq elt (car stack)
 	      stack (cdr stack))
 	(setq count (+ count (cdr elt))))
-
-      ;; Have we found enough yet?
       (cond
        ((null elt)			; No non-literal characters found.
-	(if (> base (point-min))
-	    (c-determine-limit how-far-back base (* 2 try-size))
-	  (point-min)))
+	(cond
+	 ((> pos start)			; Nothing but literals
+	  base)
+	 ((> base (point-min))
+	  (c-determine-limit how-far-back base (* 2 try-size) org-start))
+	 (t base)))
        ((>= count how-far-back)
-	(+ (car elt) (- count how-far-back)))
+	(c-determine-limit-no-macro
+	(+ (car elt) (- count how-far-back))
+	org-start))
        ((eq base (point-min))
 	(point-min))
        ((> base (- start try-size)) ; Can only happen if we hit point-min.
-	(car elt))
+	(c-determine-limit-no-macro
+	(car elt)
+	org-start))
        (t
-	(c-determine-limit (- how-far-back count) base (* 2 try-size)))))))
+	(c-determine-limit (- how-far-back count) base (* 2 try-size)
+			   org-start))))))
 
 (defun c-determine-+ve-limit (how-far &optional start-pos)
   ;; Return a buffer position about HOW-FAR non-literal characters forward
@@ -6153,7 +6189,8 @@ c-bs-at-toplevel-p
     (or (null stack)			; Probably unnecessary.
 	(<= (cadr stack) 1))))
 
-(defmacro c-find-decl-prefix-search ()
+(defmacro
+    c-find-decl-prefix-search ()
   ;; Macro used inside `c-find-decl-spots'.  It ought to be a defun,
   ;; but it contains lots of free variables that refer to things
   ;; inside `c-find-decl-spots'.  The point is left at `cfd-match-pos'
@@ -6248,8 +6285,14 @@ c-find-decl-prefix-search
        ;; preceding syntactic ws to set `cfd-match-pos' and to catch
        ;; any decl spots in the syntactic ws.
        (unless cfd-re-match
-	 (c-backward-syntactic-ws)
-	 (setq cfd-re-match (point))))
+	 (let ((cfd-cbsw-lim
+		(max (- (point) 1000) (point-min))))
+	   (c-backward-syntactic-ws cfd-cbsw-lim)
+	   (setq cfd-re-match
+		 (if (or (bobp) (> (point) cfd-cbsw-lim))
+		     (point)
+		   (point-min))))  ; Set BOB case if the token's too far back.
+	 ))
 
      ;; Choose whichever match is closer to the start.
      (if (< cfd-re-match cfd-prop-match)
@@ -6410,7 +6453,7 @@ c-find-decl-spots
 	   (while (and (not (bobp))
 		       (c-got-face-at (1- (point)) c-literal-faces))
 	     (goto-char (previous-single-property-change
-			 (point) 'face nil (point-min))))
+			 (point) 'face nil (point-min)))) ; No limit.  FIXME, perhaps?  2020-12-07.
 
 	   ;; XEmacs doesn't fontify the quotes surrounding string
 	   ;; literals.
@@ -6482,12 +6525,15 @@ c-find-decl-spots
 	(c-invalidate-find-decl-cache cfd-start-pos)
 
 	(setq syntactic-pos (point))
-	(unless (eq syntactic-pos c-find-decl-syntactic-pos)
+	(unless
+	    (eq syntactic-pos c-find-decl-syntactic-pos)
 	  ;; Don't have to do this if the cache is relevant here,
 	  ;; typically if the same line is refontified again.  If
 	  ;; we're just some syntactic whitespace further down we can
 	  ;; still use the cache to limit the skipping.
-	  (c-backward-syntactic-ws c-find-decl-syntactic-pos))
+	  (c-backward-syntactic-ws 
+	   (max (or c-find-decl-syntactic-pos (point-min))
+		(- (point) 10000) (point-min))))
 
 	;; If we hit `c-find-decl-syntactic-pos' and
 	;; `c-find-decl-match-pos' is set then we install the cached
@@ -6613,7 +6659,8 @@ c-find-decl-spots
 	  ;; syntactic ws.
 	  (when (and cfd-match-pos (< cfd-match-pos syntactic-pos))
 	    (goto-char syntactic-pos)
-	    (c-forward-syntactic-ws)
+	    (c-forward-syntactic-ws
+	     (min (+ (point) 2000) (point-max)))
 	    (and cfd-continue-pos
 		 (< cfd-continue-pos (point))
 		 (setq cfd-token-pos (point))))
@@ -6654,7 +6701,8 @@ c-find-decl-spots
 			;; can't be nested, and that's already been done in
 			;; `c-find-decl-prefix-search'.
 			(when (> cfd-continue-pos cfd-token-pos)
-			  (c-forward-syntactic-ws)
+			  (c-forward-syntactic-ws
+			   (min (+ (point) 2000) (point-max)))
 			  (setq cfd-token-pos (point)))
 
 			;; Continue if the following token fails the
@@ -8817,7 +8865,7 @@ c-back-over-member-initializer-braces
     (or res (goto-char here))
     res))
 
-(defmacro c-back-over-list-of-member-inits ()
+(defmacro c-back-over-list-of-member-inits (limit)
   ;; Go back over a list of elements, each looking like:
   ;; <symbol> (<expression>) ,
   ;; or <symbol> {<expression>} , (with possibly a <....> expressions
@@ -8826,21 +8874,21 @@ c-back-over-list-of-member-inits
   ;; a comma.  If either of <symbol> or bracketed <expression> is missing,
   ;; throw nil to 'level.  If the terminating } or ) is unmatched, throw nil
   ;; to 'done.  This is not a general purpose macro!
-  '(while (eq (char-before) ?,)
+  `(while (eq (char-before) ?,)
      (backward-char)
-     (c-backward-syntactic-ws)
+     (c-backward-syntactic-ws ,limit)
      (when (not (memq (char-before) '(?\) ?})))
        (throw 'level nil))
      (when (not (c-go-list-backward))
        (throw 'done nil))
-     (c-backward-syntactic-ws)
+     (c-backward-syntactic-ws ,limit)
      (while (eq (char-before) ?>)
        (when (not (c-backward-<>-arglist nil))
 	 (throw 'done nil))
-       (c-backward-syntactic-ws))
+       (c-backward-syntactic-ws ,limit))
      (when (not (c-back-over-compound-identifier))
        (throw 'level nil))
-     (c-backward-syntactic-ws)))
+     (c-backward-syntactic-ws ,limit)))
 
 (defun c-back-over-member-initializers (&optional limit)
   ;; Test whether we are in a C++ member initializer list, and if so, go back
@@ -8859,14 +8907,14 @@ c-back-over-member-initializers
 	    (catch 'done
 	      (setq level-plausible
 		    (catch 'level
-		      (c-backward-syntactic-ws)
+		      (c-backward-syntactic-ws limit)
 		      (when (memq (char-before) '(?\) ?}))
 			(when (not (c-go-list-backward))
 			  (throw 'done nil))
-			(c-backward-syntactic-ws))
+			(c-backward-syntactic-ws limit))
 		      (when (c-back-over-compound-identifier)
-			(c-backward-syntactic-ws))
-		      (c-back-over-list-of-member-inits)
+			(c-backward-syntactic-ws limit))
+		      (c-back-over-list-of-member-inits limit)
 		      (and (eq (char-before) ?:)
 			   (save-excursion
 			     (c-backward-token-2)
@@ -8880,14 +8928,14 @@ c-back-over-member-initializers
 		(setq level-plausible
 		      (catch 'level
 			(goto-char pos)
-			(c-backward-syntactic-ws)
+			(c-backward-syntactic-ws limit)
 			(when (not (c-back-over-compound-identifier))
 			  (throw 'level nil))
-			(c-backward-syntactic-ws)
-			(c-back-over-list-of-member-inits)
+			(c-backward-syntactic-ws limit)
+			(c-back-over-list-of-member-inits limit)
 			(and (eq (char-before) ?:)
 			     (save-excursion
-			       (c-backward-token-2)
+			       (c-backward-token-2 nil nil limit)
 			       (not (looking-at c-:$-multichar-token-regexp)))
 			     (c-just-after-func-arglist-p)))))
 
@@ -12012,7 +12060,7 @@ c-looking-at-inexpr-block
 	(goto-char haskell-op-pos))
 
       (while (and (eq res 'maybe)
-		  (progn (c-backward-syntactic-ws)
+		  (progn (c-backward-syntactic-ws lim)
 			 (> (point) closest-lim))
 		  (not (bobp))
 		  (progn (backward-char)
@@ -12783,7 +12831,7 @@ c-guess-basic-syntax
 		  (setq paren-state (cons containing-sexp paren-state)
 			containing-sexp nil)))
 	      (setq lim (1+ containing-sexp))))
-	(setq lim (point-min)))
+	(setq lim (c-determine-limit 1000)))
 
       ;; If we're in a parenthesis list then ',' delimits the
       ;; "statements" rather than being an operator (with the
@@ -13025,7 +13073,9 @@ c-guess-basic-syntax
        ;; CASE 4: In-expression statement.  C.f. cases 7B, 16A and
        ;; 17E.
        ((setq placeholder (c-looking-at-inexpr-block
-			   (c-safe-position containing-sexp paren-state)
+			   (or
+			    (c-safe-position containing-sexp paren-state)
+			    (c-determine-limit 1000 containing-sexp))
 			   containing-sexp
 			   ;; Have to turn on the heuristics after
 			   ;; the point even though it doesn't work
@@ -13150,7 +13200,8 @@ c-guess-basic-syntax
 	 ;; init lists can, in practice, be very large.
 	 ((save-excursion
 	    (when (and (c-major-mode-is 'c++-mode)
-		       (setq placeholder (c-back-over-member-initializers)))
+		       (setq placeholder (c-back-over-member-initializers
+					  lim)))
 	      (setq tmp-pos (point))))
 	  (if (= (c-point 'bosws) (1+ tmp-pos))
 		(progn
@@ -13469,7 +13520,7 @@ c-guess-basic-syntax
 	 ;; CASE 5I: ObjC method definition.
 	 ((and c-opt-method-key
 	       (looking-at c-opt-method-key))
-	  (c-beginning-of-statement-1 nil t)
+	  (c-beginning-of-statement-1 (c-determine-limit 1000) t)
 	  (if (= (point) indent-point)
 	      ;; Handle the case when it's the first (non-comment)
 	      ;; thing in the buffer.  Can't look for a 'same return
@@ -13542,7 +13593,16 @@ c-guess-basic-syntax
 			  (if (>= (point) indent-point)
 			      (throw 'not-in-directive t))
 			  (setq placeholder (point)))
-			nil)))))
+			nil))
+	         (and macro-start
+		      (not (c-beginning-of-statement-1 lim nil nil nil t))
+		      (setq placeholder
+			    (let ((ps-top (car paren-state)))
+			      (if (consp ps-top)
+				  (progn
+				    (goto-char (cdr ps-top))
+				    (c-forward-syntactic-ws indent-point))
+				(point-min))))))))
 	  ;; For historic reasons we anchor at bol of the last
 	  ;; line of the previous declaration.  That's clearly
 	  ;; highly bogus and useless, and it makes our lives hard
@@ -13591,19 +13651,30 @@ c-guess-basic-syntax
 	       (eq (char-before) ?<)
 	       (not (and c-overloadable-operators-regexp
 			 (c-after-special-operator-id lim))))
-	  (c-beginning-of-statement-1 (c-safe-position (point) paren-state))
+	  (c-beginning-of-statement-1
+	   (or
+	    (c-safe-position (point) paren-state)
+	    (c-determine-limit 1000)))
 	  (c-add-syntax 'template-args-cont (c-point 'boi)))
 
 	 ;; CASE 5Q: we are at a statement within a macro.
-	 (macro-start
-	  (c-beginning-of-statement-1 containing-sexp)
+	 ((and
+	   macro-start
+	   (save-excursion
+	     (prog1
+		 (not (eq (c-beginning-of-statement-1
+			   (or containing-sexp (c-determine-limit 1000))
+			   nil nil nil t)
+			  nil)))
+	       (setq placeholder (point))))
+	  (goto-char placeholder)
 	  (c-add-stmt-syntax 'statement nil t containing-sexp paren-state))
 
-	 ;;CASE 5N: We are at a topmost continuation line and the only
+	 ;;CASE 5S: We are at a topmost continuation line and the only
 	 ;;preceding items are annotations.
 	 ((and (c-major-mode-is 'java-mode)
 	       (setq placeholder (point))
-	       (c-beginning-of-statement-1)
+	       (c-beginning-of-statement-1 lim)
 	       (progn
 		 (while (and (c-forward-annotation))
 		   (c-forward-syntactic-ws))
@@ -13615,7 +13686,9 @@ c-guess-basic-syntax
 
 	 ;; CASE 5M: we are at a topmost continuation line
 	 (t
-	  (c-beginning-of-statement-1 (c-safe-position (point) paren-state))
+	  (c-beginning-of-statement-1
+	   (or (c-safe-position (point) paren-state)
+	       (c-determine-limit 1000)))
 	  (when (c-major-mode-is 'objc-mode)
 	    (setq placeholder (point))
 	    (while (and (c-forward-objc-directive)
@@ -13671,8 +13744,9 @@ c-guess-basic-syntax
 		   (setq tmpsymbol '(block-open . inexpr-statement)
 			 placeholder
 			 (cdr-safe (c-looking-at-inexpr-block
-				    (c-safe-position containing-sexp
-						     paren-state)
+				    (or
+				     (c-safe-position containing-sexp paren-state)
+				     (c-determine-limit 1000 containing-sexp))
 				    containing-sexp)))
 		   ;; placeholder is nil if it's a block directly in
 		   ;; a function arglist.  That makes us skip out of
@@ -13804,7 +13878,9 @@ c-guess-basic-syntax
 			  (setq placeholder (c-guess-basic-syntax))))
 	      (setq c-syntactic-context placeholder)
 	    (c-beginning-of-statement-1
-	     (c-safe-position (1- containing-sexp) paren-state))
+	     (or
+	      (c-safe-position (1- containing-sexp) paren-state)
+	      (c-determine-limit 1000 (1- containing-sexp))))
 	    (c-forward-token-2 0)
 	    (while (cond
 		    ((looking-at c-specifier-key)
@@ -13838,7 +13914,8 @@ c-guess-basic-syntax
 	      (c-add-syntax 'brace-list-close (point))
 	    (setq lim (or (save-excursion
 			    (and
-			     (c-back-over-member-initializers)
+			     (c-back-over-member-initializers
+			      (c-determine-limit 1000))
 			     (point)))
 			  (c-most-enclosing-brace state-cache (point))))
 	    (c-beginning-of-statement-1 lim nil nil t)
@@ -13871,7 +13948,8 @@ c-guess-basic-syntax
 		(c-add-syntax 'brace-list-intro (point))
 	      (setq lim (or (save-excursion
 			      (and
-			       (c-back-over-member-initializers)
+			       (c-back-over-member-initializers
+				(c-determine-limit 1000))
 			       (point)))
 			    (c-most-enclosing-brace state-cache (point))))
 	      (c-beginning-of-statement-1 lim nil nil t)
@@ -13927,7 +14005,9 @@ c-guess-basic-syntax
 	 ;; CASE 16A: closing a lambda defun or an in-expression
 	 ;; block?  C.f. cases 4, 7B and 17E.
 	 ((setq placeholder (c-looking-at-inexpr-block
-			     (c-safe-position containing-sexp paren-state)
+			     (or
+			      (c-safe-position containing-sexp paren-state)
+			      (c-determine-limit 1000 containing-sexp))
 			     nil))
 	  (setq tmpsymbol (if (eq (car placeholder) 'inlambda)
 			      'inline-close
@@ -14090,7 +14170,9 @@ c-guess-basic-syntax
 	 ;; CASE 17E: first statement in an in-expression block.
 	 ;; C.f. cases 4, 7B and 16A.
 	 ((setq placeholder (c-looking-at-inexpr-block
-			     (c-safe-position containing-sexp paren-state)
+			     (or
+			      (c-safe-position containing-sexp paren-state)
+			      (c-determine-limit 1000 containing-sexp))
 			     nil))
 	  (setq tmpsymbol (if (eq (car placeholder) 'inlambda)
 			      'defun-block-intro
diff --git a/lisp/progmodes/cc-fonts.el b/lisp/progmodes/cc-fonts.el
index bb7e5bea6e..166cbd7a49 100644
--- a/lisp/progmodes/cc-fonts.el
+++ b/lisp/progmodes/cc-fonts.el
@@ -947,7 +947,7 @@ c-font-lock-complex-decl-prepare
     ;; closest token before the region.
     (save-excursion
       (let ((pos (point)))
-	(c-backward-syntactic-ws)
+	(c-backward-syntactic-ws (max (- (point) 500) (point-min)))
 	(c-clear-char-properties
 	 (if (and (not (bobp))
 		  (memq (c-get-char-property (1- (point)) 'c-type)
@@ -969,7 +969,7 @@ c-font-lock-complex-decl-prepare
     ;; The declared identifiers are font-locked correctly as types, if
     ;; that is what they are.
     (let ((prop (save-excursion
-		  (c-backward-syntactic-ws)
+		  (c-backward-syntactic-ws (max (- (point) 500) (point-min)))
 		  (unless (bobp)
 		    (c-get-char-property (1- (point)) 'c-type)))))
       (when (memq prop '(c-decl-id-start c-decl-type-start))
@@ -1008,15 +1008,24 @@ c-font-lock-<>-arglists
 	     (boundp 'parse-sexp-lookup-properties)))
 	  (c-parse-and-markup-<>-arglists t)
 	  c-restricted-<>-arglists
-	  id-start id-end id-face pos kwd-sym)
+	  id-start id-end id-face pos kwd-sym
+	  old-pos)
 
       (while (and (< (point) limit)
-		  (re-search-forward c-opt-<>-arglist-start limit t))
-
-	(setq id-start (match-beginning 1)
-	      id-end (match-end 1)
-	      pos (point))
-
+		  (setq old-pos (point))
+		  (c-syntactic-re-search-forward "<" limit t nil t))
+	(setq pos (point))
+	(save-excursion
+	  (backward-char)
+	  (c-backward-syntactic-ws old-pos)
+	  (if (re-search-backward
+	       (concat "\\(\\`\\|" c-nonsymbol-key "\\)\\(" c-symbol-key"\\)\\=")
+	       old-pos t)
+	      (setq id-start (match-beginning 2)
+		    id-end (match-end 2))
+	    (setq id-start nil id-end nil)))
+
+	(when id-start
 	(goto-char id-start)
 	(unless (c-skip-comments-and-strings limit)
 	  (setq kwd-sym nil
@@ -1033,7 +1042,7 @@ c-font-lock-<>-arglists
 		(when (looking-at c-opt-<>-sexp-key)
 		  ;; There's a special keyword before the "<" that tells
 		  ;; that it's an angle bracket arglist.
-		  (setq kwd-sym (c-keyword-sym (match-string 1)))))
+		  (setq kwd-sym (c-keyword-sym (match-string 2)))))
 
 	       (t
 		;; There's a normal identifier before the "<".  If we're not in
@@ -1067,7 +1076,7 @@ c-font-lock-<>-arglists
 						       'font-lock-type-face))))))
 
 		  (goto-char pos)))
-	    (goto-char pos))))))
+	    (goto-char pos)))))))
   nil)
 
 (defun c-font-lock-declarators (limit list types not-top
@@ -1496,7 +1505,8 @@ c-font-lock-declarations
 
 		 ;; Check we haven't missed a preceding "typedef".
 		 (when (not (looking-at c-typedef-key))
-		   (c-backward-syntactic-ws)
+		   (c-backward-syntactic-ws
+		    (max (- (point) 1000) (point-min)))
 		   (c-backward-token-2)
 		   (or (looking-at c-typedef-key)
 		       (goto-char start-pos)))
@@ -1536,8 +1546,10 @@ c-font-lock-declarations
 				     (c-backward-token-2)
 				     (and
 				      (not (looking-at c-opt-<>-sexp-key))
-				      (progn (c-backward-syntactic-ws)
-					     (memq (char-before) '(?\( ?,)))
+				      (progn
+					(c-backward-syntactic-ws
+					 (max (- (point) 1000) (point-min)))
+					(memq (char-before) '(?\( ?,)))
 				      (not (eq (c-get-char-property (1- (point))
 								    'c-type)
 					       'c-decl-arg-start))))))
@@ -2295,7 +2307,8 @@ c-font-lock-c++-using
 		  (and c-colon-type-list-re
 		       (c-go-up-list-backward)
 		       (eq (char-after) ?{)
-		       (eq (car (c-beginning-of-decl-1)) 'same)
+		       (eq (car (c-beginning-of-decl-1
+				 (c-determine-limit 1000))) 'same)
 		       (looking-at c-colon-type-list-re)))
 		;; Inherited protected member: leave unfontified
 		)
diff --git a/lisp/progmodes/cc-langs.el b/lisp/progmodes/cc-langs.el
index d6089ea295..4d1aeaa5cb 100644
--- a/lisp/progmodes/cc-langs.el
+++ b/lisp/progmodes/cc-langs.el
@@ -699,6 +699,7 @@ c-populate-syntax-table
   ;; The same thing regarding Unicode identifiers applies here as to
   ;; `c-symbol-key'.
   t (concat "[" (c-lang-const c-nonsymbol-chars) "]"))
+(c-lang-defvar c-nonsymbol-key (c-lang-const c-nonsymbol-key))
 
 (c-lang-defconst c-identifier-ops
   "The operators that make up fully qualified identifiers.  nil in
diff --git a/lisp/progmodes/cc-mode.el b/lisp/progmodes/cc-mode.el
index c5201d1af5..df9709df94 100644
--- a/lisp/progmodes/cc-mode.el
+++ b/lisp/progmodes/cc-mode.el
@@ -499,11 +499,14 @@ c-unfind-coalesced-tokens
   (save-excursion
     (when (< beg end)
       (goto-char beg)
+      (let ((lim (c-determine-limit 1000))
+	    (lim+ (c-determine-+ve-limit 1000 end)))
       (when
 	  (and (not (bobp))
-	       (progn (c-backward-syntactic-ws) (eq (point) beg))
+	       (progn (c-backward-syntactic-ws lim) (eq (point) beg))
 	       (/= (skip-chars-backward c-symbol-chars (1- (point))) 0)
-	       (progn (goto-char beg) (c-forward-syntactic-ws) (<= (point) end))
+	       (progn (goto-char beg) (c-forward-syntactic-ws lim+)
+		      (<= (point) end))
 	       (> (point) beg)
 	       (goto-char end)
 	       (looking-at c-symbol-char-key))
@@ -514,14 +517,14 @@ c-unfind-coalesced-tokens
       (goto-char end)
       (when
 	  (and (not (eobp))
-	       (progn (c-forward-syntactic-ws) (eq (point) end))
+	       (progn (c-forward-syntactic-ws lim+) (eq (point) end))
 	       (looking-at c-symbol-char-key)
-	       (progn (c-backward-syntactic-ws) (>= (point) beg))
+	       (progn (c-backward-syntactic-ws lim) (>= (point) beg))
 	       (< (point) end)
 	       (/= (skip-chars-backward c-symbol-chars (1- (point))) 0))
 	(goto-char (1+ end))
 	(c-end-of-current-token)
-	(c-unfind-type (buffer-substring-no-properties end (point)))))))
+	(c-unfind-type (buffer-substring-no-properties end (point))))))))
 
 ;; c-maybe-stale-found-type records a place near the region being
 ;; changed where an element of `found-types' might become stale.  It
@@ -1993,10 +1996,10 @@ c-before-change
 		;; inserting stuff after "foo" in "foo bar;", or
 		;; before "foo" in "typedef foo *bar;"?
 		;;
-		;; We search for appropriate c-type properties "near"
-		;; the change.  First, find an appropriate boundary
-		;; for this property search.
-		(let (lim
+		;; We search for appropriate c-type properties "near" the
+		;; change.  First, find an appropriate boundary for this
+		;; property search.
+		(let (lim lim-2
 		      type type-pos
 		      marked-id term-pos
 		      (end1
@@ -2007,8 +2010,11 @@ c-before-change
 		  (when (>= end1 beg) ; Don't hassle about changes entirely in
 					; comments.
 		    ;; Find a limit for the search for a `c-type' property
+		    ;; Point is currently undefined.  A `goto-char' somewhere is needed.  (2020-12-06).
+		    (setq lim-2 (c-determine-limit 1000 (point) ; that is wrong.  FIXME!!!  (2020-12-06)
+						   ))
 		    (while
-			(and (/= (skip-chars-backward "^;{}") 0)
+			(and (/= (skip-chars-backward "^;{}" lim-2) 0)
 			     (> (point) (point-min))
 			     (memq (c-get-char-property (1- (point)) 'face)
 				   '(font-lock-comment-face font-lock-string-face))))
@@ -2032,7 +2038,8 @@ c-before-change
 				(buffer-substring-no-properties (point) type-pos)))
 
 			(goto-char end1)
-			(skip-chars-forward "^;{}") ; FIXME!!!  loop for
+			(setq lim-2 (c-determine-+ve-limit 1000))
+			(skip-chars-forward "^;{}" lim-2) ; FIXME!!!  loop for
 					; comment, maybe
 			(setq lim (point))
 			(setq term-pos
@@ -2270,9 +2277,11 @@ c-fl-decl-end
   ;; preserved.
   (goto-char pos)
   (let ((lit-start (c-literal-start))
+	(lim (c-determine-limit 1000))
 	enclosing-attribute pos1)
     (unless lit-start
-      (c-backward-syntactic-ws)
+      (c-backward-syntactic-ws
+       lim)
       (when (setq enclosing-attribute (c-enclosing-c++-attribute))
 	(goto-char (car enclosing-attribute))) ; Only happens in C++ Mode.
       (when (setq pos1 (c-on-identifier))
@@ -2296,14 +2305,14 @@ c-fl-decl-end
 			   (setq pos1 (c-on-identifier))
 			   (goto-char pos1)
 			   (progn
-			     (c-backward-syntactic-ws)
+			     (c-backward-syntactic-ws lim)
 			     (eq (char-before) ?\())
 			   (c-fl-decl-end (1- (point))))
-			(c-backward-syntactic-ws)
+			(c-backward-syntactic-ws lim)
 			(point))))
 		 (and (progn (c-forward-syntactic-ws lim)
 			     (not (eobp)))
-		      (c-backward-syntactic-ws)
+		      (c-backward-syntactic-ws lim)
 		      (point)))))))))
 
 (defun c-change-expand-fl-region (_beg _end _old-len)


-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
       [not found]                               ` <X8/JG7eD7SfkEimH@ACM>
@ 2020-12-08 19:32                                 ` Mattias Engdegård
  2020-12-09  7:31                                 ` Ravine Var
  2020-12-09 17:00                                 ` Mattias Engdegård
  2 siblings, 0 replies; 45+ messages in thread
From: Mattias Engdegård @ 2020-12-08 19:32 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Lars Ingebrigtsen, 25706

Hello Alan,

8 dec. 2020 kl. 19.42 skrev Alan Mackenzie <acm@muc.de>:
>   That's 133,000 lines, give or take.  Even our largest file,
> src/xdisp.c is only 36,000 lines.  I don't understand how a file
> describing hardware can come to anything like 133k lines.  It must be
> soul destroying to have to write a driver based on a file like this.
> That file was put together by AMD, and I suspect they didn't take all
> that much care to make it usable.

Those files are most likely not hand-written but generated from a hardware description language, in which device registers are declared in more comfortable ways. Such descriptions are often part of, or at least tied into, VLSI synthesis tools.

Nevertheless, there are quite big files that are crafted by hand, and in any case users need to look at them sooner or later in an editor anyway (hence the bug report), so the speed-up job here is essential and benefits everyone.

> There's one thing which still puzzles me.  In osprey_reg....h, when
> scrolling through it (e.g. with (time-scroll)), it stutters markedly at
> around 13% of the way through.

Have you tried applying my regexp patch? If it reduces the pain, that would indicate that the stuttering is caused by severe regexp backtracking.

> Anyhow, please try out the (?)final version of my patch before I commit
> it and close the bug.  It should apply cleanly to the master branch.  I
> might well split it into three changes, two small, one large, since
> there are, in a sense three distinct fixes there.

Thank you very much, I'll take a look, and as promised I'll put together a more detailed guide to what I think could be done about some of the regexps.








* bug#25706: 26.0.50; Slow C file fontification
       [not found]                               ` <X8/JG7eD7SfkEimH@ACM>
  2020-12-08 19:32                                 ` Mattias Engdegård
@ 2020-12-09  7:31                                 ` Ravine Var
  2020-12-09  7:47                                   ` Ravine Var
                                                     ` (2 more replies)
  2020-12-09 17:00                                 ` Mattias Engdegård
  2 siblings, 3 replies; 45+ messages in thread
From: Ravine Var @ 2020-12-09  7:31 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Mattias Engdegård, Lars Ingebrigtsen, 25706

Alan Mackenzie <acm@muc.de> writes:
> Anyhow, please try out the (?)final version of my patch before I commit
> it and close the bug.  It should apply cleanly to the master branch.  I
> might well split it into three changes, two small, one large, since
> there are, in a sense three distinct fixes there.

I tested this patch, along with Mattias' patch posted earlier, on two
machines.

On a reasonably fast machine (AMD Ryzen 3 3200G with 16 GB RAM), there
is a marked improvement in visiting and scrolling the header files
in the Linux kernel tree. The complete lockups that happened earlier
did not happen.

I also tested the patches on a Chromebook (Intel Celeron N2840 with 4GB
RAM), which is similar to the machine in the original report.
Unfortunately, the behavior was still bad, with lockups and freezing.
I tried both c-mode and c++-mode with font-lock-maximum-decoration set
to 2.






* bug#25706: 26.0.50; Slow C file fontification
  2020-12-09  7:31                                 ` Ravine Var
@ 2020-12-09  7:47                                   ` Ravine Var
  2020-12-10  8:08                                     ` Alan Mackenzie
  2020-12-09 18:46                                   ` Alan Mackenzie
       [not found]                                   ` <X9Ebn7hKnG/vpDcZ@ACM>
  2 siblings, 1 reply; 45+ messages in thread
From: Ravine Var @ 2020-12-09  7:47 UTC (permalink / raw)
  To: 25706

I came across another place where a similar lockup happens
(even with the patches posted here).

https://gitlab.com/wireshark/wireshark/-/raw/master/epan/dissectors/packet-rrc.c

Towards the end of the file, once we get to the function
proto_register_rrc(void), the slowdown of scrolling starts and eventually
things freeze.

Just copying that function to a smaller C file is enough to
reproduce the issue. (I found that C-M-h is a nifty command to do this.)

I can open a new bug report if required.






* bug#25706: 26.0.50; Slow C file fontification
       [not found]                               ` <X8/JG7eD7SfkEimH@ACM>
  2020-12-08 19:32                                 ` Mattias Engdegård
  2020-12-09  7:31                                 ` Ravine Var
@ 2020-12-09 17:00                                 ` Mattias Engdegård
  2020-12-10 12:26                                   ` Alan Mackenzie
  2 siblings, 1 reply; 45+ messages in thread
From: Mattias Engdegård @ 2020-12-09 17:00 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Lars Ingebrigtsen, 25706

[-- Attachment #1: Type: text/plain, Size: 1804 bytes --]

First, some Emacs regexp basics:

1. If A and B match single characters, then A\|B should be written [AB] whenever possible. The reason is that A\|B adds a backtrack record which uses stack space and wastes time if matching fails later on. The cost can be quite noticeable, which we have seen.

2. Syntax-class constructs are usually better written as character alternatives when possible.

The \sX construct, for some X, is typically somewhat slower to match than explicitly listing the characters to match. For example, if all you care about are space and tab, then "\\s *" should be written "[ \t]*".

3. Unicode character classes are slower to match than ASCII-only ones. For example, [[:alpha:]] is slower than [A-Za-z], assuming only those characters are of interest.

4. [^...] will match \n unless \n is included in the set. For example, "[^a]\\|$" will almost never match the $ (end-of-line) branch, because a newline will be matched by the first branch. The only exception is at the very end of the buffer if it is not newline-terminated, but that is rarely worth considering for source code.

5. \r (carriage return) normally doesn't appear in buffers even if the file uses DOS line endings. Line endings are converted into a single \n (newline) when the buffer is read. In particular, $ does NOT match at \r, only before \n.

When \r appears it is usually because the file contains a mixture of line-ending styles, typically from being edited using broken tools. Whether you want to take such files into account is a matter of judgement; most modes don't bother.

6. Capturing groups cost more than non-capturing groups, but you already know that.
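
Points 1 and 4 can be checked directly in the *scratch* buffer or with M-x ielm. What follows is only a sketch: the function names and sample strings are made up for illustration and do not come from CC Mode, although the second regexp has the same shape as the "=" special case in c-assignment-op-regexp.

```elisp
;;; Illustrative only; the my-regexp-demo-* names are hypothetical.

(defun my-regexp-demo-alternation-vs-set (text)
  "Return where the alternation form and the bracket form match in TEXT."
  (list (string-match "a\\|b" text)     ; alternation: pushes a backtrack record
        (string-match "[ab]" text)))    ; character alternative: same matches

(defun my-regexp-demo-negated-set-branch (text)
  "Return the text that group 1 matched after an equals sign in TEXT."
  (when (string-match "=\\([^=]\\|$\\)" text)
    (match-string 1 text)))

;; Point 1: both forms find the same position.
(my-regexp-demo-alternation-vs-set "xxb")    ; => (2 2)

;; Point 4: before a newline the [^=] branch consumes the newline itself,
;; so the $ branch is never reached.
(my-regexp-demo-negated-set-branch "x=\ny")  ; => "\n"

;; Only when the text ends without a newline does $ do any work.
(my-regexp-demo-negated-set-branch "x=")     ; => ""
```

The two forms in the first function accept exactly the same characters; the difference Mattias describes is in the work the matcher does, which this snippet does not measure.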

On to specifics: here are annotations for possible improvements in cc-langs.el. (I didn't bother about capturing groups here.)


[-- Attachment #2: cc-regexp-annot.diff --]
[-- Type: application/octet-stream, Size: 7058 bytes --]

diff --git a/lisp/progmodes/cc-langs.el b/lisp/progmodes/cc-langs.el
index d6089ea295..695c41fce6 100644
--- a/lisp/progmodes/cc-langs.el
+++ b/lisp/progmodes/cc-langs.el
@@ -903,6 +903,7 @@ c-opt-cpp-prefix
   ;; TODO (ACM, 2005-04-01).  Amend the following to recognize escaped NLs;
   ;; amend all uses of c-opt-cpp-prefix which count regexp-depth.
   t "\\s *#\\s *"
+;;; XXX replace "\\s " with char alt, presumably [ \t] (2x)
   (java awk) nil)
 (c-lang-defvar c-opt-cpp-prefix (c-lang-const c-opt-cpp-prefix))
 
@@ -910,6 +911,7 @@ c-anchored-cpp-prefix
   "Regexp matching the prefix of a cpp directive anchored to BOL,
 in the languages that have a macro preprocessor."
   t "^\\s *\\(#\\)\\s *"
+;;; XXX replace "\\s " with char alt, presumably [ \t] (2x)
   (java awk) nil)
 (c-lang-defvar c-anchored-cpp-prefix (c-lang-const c-anchored-cpp-prefix))
 
@@ -920,6 +922,7 @@ c-opt-cpp-start
   t    (if (c-lang-const c-opt-cpp-prefix)
 	   (concat (c-lang-const c-opt-cpp-prefix)
 		   "\\([" c-alnum "]+\\)"))
+;;; XXX all cpp directives are lower-case ASCII letters; should be [a-z]+
   ;; Pike, being a scripting language, recognizes hash-bangs too.
   pike (concat (c-lang-const c-opt-cpp-prefix)
 	       "\\([" c-alnum "]+\\|!\\)"))
@@ -968,6 +971,8 @@ c-opt-cpp-macro-define-start
 	(concat (c-lang-const c-opt-cpp-prefix)
 		(c-lang-const c-opt-cpp-macro-define)
 		"[ \t]+\\(\\(\\sw\\|_\\)+\\)\\(([^)]*)\\)?"
+;;; XXX \\(\\sw\\|_\\)+ should be [[:word:]_]+,
+;;; XXX or more likely [[:alpha:]_][[:alnum:]_]*
 		;;       ^                 ^ #defined name
 		"\\([ \t]\\|\\\\\n\\)*")))
 (c-lang-defvar c-opt-cpp-macro-define-start
@@ -980,6 +985,8 @@ c-opt-cpp-macro-define-id
 	(concat (c-lang-const c-opt-cpp-prefix)	; #
 		(c-lang-const c-opt-cpp-macro-define) ; define
 		"[ \t]+\\(\\sw\\|_\\)+")))
+;;; XXX \\(\\sw\\|_\\)+ should be [[:word:]_]+,
+;;; XXX or more likely [[:alpha:]_][[:alnum:]_]*
 (c-lang-defvar c-opt-cpp-macro-define-id
   (c-lang-const c-opt-cpp-macro-define-id))
 
@@ -990,6 +997,10 @@ c-anchored-hash-define-no-parens
 	(concat (c-lang-const c-anchored-cpp-prefix)
 		(c-lang-const c-opt-cpp-macro-define)
 		"[ \t]+\\(\\sw\\|_\\)+\\([^(a-zA-Z0-9_]\\|$\\)")))
+;;; XXX \\(\\sw\\|_\\)+ should be [[:word:]_]+,
+;;; XXX or more likely [[:alpha:]_][[:alnum:]_]*
+;;; XXX but what about the ASCII-only tail? Besides, [^(a-zA-Z0-9_] will
+;;; XXX always match \n so the $ is almost never useful!
 
 (c-lang-defconst c-cpp-expr-directives
   "List of cpp directives (without the prefix) that are followed by an
@@ -1353,6 +1364,7 @@ c-assignment-op-regexp
 	(concat
 	 ;; Need special case for "=" since it's a prefix of "==".
 	 "=\\([^=]\\|$\\)"
+;;; XXX [^=] matches \n so the $ is almost never useful
 	 "\\|"
 	 (c-make-keywords-re nil
 	   (c--set-difference (c-lang-const c-assignment-operators)
@@ -1412,6 +1424,7 @@ c-<-pseudo-digraph-cont-regexp
 template opener followed by the \"::\" operator - usually."
   t regexp-unmatchable
   c++ "::\\([^:>]\\|$\\)")
+;;; XXX [^:>] matches \n so the $ is almost never useful
 (c-lang-defvar c-<-pseudo-digraph-cont-regexp
 	       (c-lang-const c-<-pseudo-digraph-cont-regexp))
 
@@ -1599,6 +1612,7 @@ c-simple-ws
 Does not contain a \\| operator at the top level."
   ;; "\\s " is not enough since it doesn't match line breaks.
   t "\\(\\s \\|[\n\r]\\)")
+;;; XXX replace with single char alt: [ \t\n\r\f]
 
 (c-lang-defconst c-simple-ws-depth
   ;; Number of regexp grouping parens in `c-simple-ws'.
@@ -1702,6 +1716,7 @@ c-last-c-comment-end-on-line-re
 comments.  When a match is found, submatch 1 contains the comment
 ender."
   t "\\(\\*/\\)\\([^*]\\|\\*+\\([^*/]\\|$\\)\\)*$"
+;;; XXX [^*/] matches \n so the $ is almost never useful
   awk nil)
 (c-lang-defvar c-last-c-comment-end-on-line-re
 	       (c-lang-const c-last-c-comment-end-on-line-re))
@@ -1778,6 +1793,7 @@ comment-start-skip
 			   (c-lang-const c-block-comment-starter)))
 	     "\\|")
 	    "\\)\\s *"))
+;;; XXX replace "\\s " with char alt, presumably [ \t]
 (c-lang-setvar comment-start-skip (c-lang-const comment-start-skip))
 
 (c-lang-defconst comment-end-can-be-escaped
@@ -1792,6 +1808,7 @@ c-syntactic-ws-start
   ;; Regexp matching any sequence that can start syntactic whitespace.
   ;; The only uncertain case is '#' when there are cpp directives.
   t (concat "\\s \\|"
+;;; XXX replace "\\s " with char alt, presumably [ \t]
 	    (c-make-keywords-re nil
 	      (append (list (c-lang-const c-line-comment-starter)
 			    (c-lang-const c-block-comment-starter)
@@ -1799,6 +1816,7 @@ c-syntactic-ws-start
 			      "#"))
 		      '("\n" "\r")))
 	    "\\|\\\\[\n\r]"
+;;; XXX unclear if \r is ever relevant here (2x)
 	    (when (memq 'gen-comment-delim c-emacs-features)
 	      "\\|\\s!")))
 (c-lang-defvar c-syntactic-ws-start (c-lang-const c-syntactic-ws-start))
@@ -1847,6 +1865,8 @@ c-unterminated-block-comment-regexp
 			"]"
 			"[^" (substring end 0 1) "\n\r]*"
 			"\\)*"))
+;;; XXX this is baroque, since c-block-comment-ender is either nil or "*/",
+;;; XXX so why not special case those and be done with it?
 	       (t
 		(error "Can't handle a block comment ender of length %s"
 		       (length end))))))))
@@ -1868,6 +1888,7 @@ c-block-comment-regexp
 	       ((= (length end) 2)
 		(concat (regexp-quote (substring end 0 1)) "+"
 			(regexp-quote (substring end 1 2))))
+;;; XXX see above; c-block-comment-ender is nil or "*/"
 	       (t
 		(error "Can't handle a block comment ender of length %s"
 		       (length end))))))))
@@ -1883,6 +1904,7 @@ c-nonwhite-syntactic-ws
 		     "[^\n\r]*[\n\r]"))
 	   (c-lang-const c-block-comment-regexp)
 	   "\\\\[\n\r]"
+;;; XXX \r here is probably unnecessary (3x)
 	   (when (memq 'gen-comment-delim c-emacs-features)
 	     "\\s!\\S!*\\s!"))
      "\\|"))
@@ -1927,6 +1949,7 @@ c-single-line-syntactic-ws
 		(c-lang-const c-block-comment-regexp)
 		"\\s *\\)*")
       "\\s *"))
+;;; XXX replace "\\s " with char alt, presumably [ \t] (3x)
 
 (c-lang-defconst c-single-line-syntactic-ws-depth
   ;; Number of regexp grouping parens in `c-single-line-syntactic-ws'.
@@ -3476,6 +3499,7 @@ c-type-decl-prefix-key
 	       "\\)"
 	       "\\([^=]\\|$\\)")
   pike "\\(\\*\\)\\([^=]\\|$\\)")
+;;; XXX [^=] matches \n so the $ is almost never useful (3x)
 (c-lang-defvar c-type-decl-prefix-key (c-lang-const c-type-decl-prefix-key)
   'dont-doc)
 
@@ -3498,6 +3522,7 @@ c-type-decl-operator-prefix-key
 	       "\\)"
 	       "\\([^=]\\|$\\)")
   pike "\\(\\*\\)\\([^=]\\|$\\)")
+;;; XXX [^=] matches \n so the $ is almost never useful (3x)
 (c-lang-defvar c-type-decl-operator-prefix-key
   (c-lang-const c-type-decl-operator-prefix-key))
 
@@ -3647,6 +3672,8 @@ c-pre-id-bracelist-key
 "
   t regexp-unmatchable
   c++ "new\\([^[:alnum:]_$]\\|$\\)\\|&&?\\(\\S.\\|$\\)")
+;;; XXX [^[:alnum:]_$] matches \n so the $ is almost never useful
+;;; XXX \\S. matches \n so the $ is almost never useful
 (c-lang-defvar c-pre-id-bracelist-key (c-lang-const c-pre-id-bracelist-key))
 
 (c-lang-defconst c-recognize-typeless-decls


* bug#25706: 26.0.50; Slow C file fontification
  2020-12-09  7:31                                 ` Ravine Var
  2020-12-09  7:47                                   ` Ravine Var
@ 2020-12-09 18:46                                   ` Alan Mackenzie
       [not found]                                   ` <X9Ebn7hKnG/vpDcZ@ACM>
  2 siblings, 0 replies; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-09 18:46 UTC (permalink / raw)
  To: Ravine Var; +Cc: Mattias Engdegård, Lars Ingebrigtsen, 25706

Hello, Ravine.

Thanks for doing all this testing!

On Wed, Dec 09, 2020 at 13:01:31 +0530, Ravine Var wrote:
> Alan Mackenzie <acm@muc.de> writes:
> > Anyhow, please try out the (?)final version of my patch before I commit
> > it and close the bug.  It should apply cleanly to the master branch.  I
> > might well split it into three changes, two small, one large, since
> > there are, in a sense three distinct fixes there.

> I tested this patch, along with Mattias' patch posted earlier, on two
> machines.

> On a reasonably fast machine (AMD Ryzen 3 3200G with 16 GB RAM), there
> is a marked improvement in visiting and scrolling the header files
> in the Linux kernel tree. The complete lockups that happened earlier
> did not happen.

That is close to the spec of my machine, and I find that these large .h
files (without braces), with the patch, now work fast enough for me.

> I also tested the patches on a Chromebook (Intel Celeron N2840 with 4GB
> RAM), which is similar to the machine in the original report.
> Unfortunately, the behavior was still bad, with lockups and freezing.
> I tried both c-mode and c++-mode with font-lock-maximum-decoration set
> to 2.

Thank you indeed for taking the trouble to test the patch on the lesser
machine.  I do not have access to such a machine.  I am assuming that
before this patch, a large file like osprey_reg....h would have
been completely unworkable on the machine.  It sounds as though it still
is.  However, have you noticed any improvement at all in performance?

Could I ask you please to do one more thing, and that is to take a
profile on this machine where it is giving trouble.  From a freshly
loaded buffer, move forward (if necessary) to a troublesome spot.  N.B.
C-u 1 M-> moves to 10% away from the end of the buffer, C-u 2 M-> 20%,
and so on.  Then start the profiler and do what is causing sluggish
performance.  Then have a look at the final profiler output, and expand
it sensibly so that the troublesome function can be found.

(Optional paragraph.)  How to use the profiler: Do M-x profiler-start
RET, and accept the default mode with another RET.  Perform the stuff to
be profiled.  Do M-x profiler-report, which gives three or four lines of
output, each with a number and a percentage.  Move point to a line with
a large percentage and type RET to expand it.  You can repeat this to
expand further.  Please expand the lines down to where the percentages
remaining are around 5% or 6%.  There will be quite a lot of lines near
the start showing the same large percentage.
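
The interactive steps above can also be scripted. The sketch below assumes the stock profiler library shipped with Emacs; the helper name and the choice of scrolling as the sluggish operation are illustrative and not part of any patch in this thread.

```elisp
(require 'profiler)

(defun my-profile-scrolling (screens)
  "Profile scrolling forward SCREENS screenfuls in the current buffer.
Hypothetical helper, for illustration only."
  (interactive "p")
  (profiler-start 'cpu)                 ; same as accepting the default mode
  (unwind-protect
      (dotimes (_ screens)
        (condition-case nil
            (scroll-up-command)
          (end-of-buffer nil))          ; ignore hitting the end of the buffer
        (redisplay t))                  ; make sure each screen gets fontified
    ;; Take the report while the profiler is still running, then stop it.
    (profiler-report)
    (profiler-stop)))
```

From a freshly loaded buffer, something like C-u 20 M-x my-profile-scrolling would scroll twenty screenfuls and then pop up the report, which can be expanded with RET as described.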

Then could you please post that output here, so as to give me some idea
of where the poor performance is coming from.  Thanks!

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
       [not found]                                   ` <X9Ebn7hKnG/vpDcZ@ACM>
@ 2020-12-09 20:04                                     ` Eli Zaretskii
  2020-12-09 20:32                                       ` Alan Mackenzie
  2020-12-10 17:02                                     ` Ravine Var
  1 sibling, 1 reply; 45+ messages in thread
From: Eli Zaretskii @ 2020-12-09 20:04 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: mattiase, ravine.var, larsi, 25706

> Date: Wed, 9 Dec 2020 18:46:55 +0000
> From: Alan Mackenzie <acm@muc.de>
> Cc: Mattias Engdegård <mattiase@acm.org>,
>  Lars Ingebrigtsen <larsi@gnus.org>, 25706@debbugs.gnu.org
> 
> Move point to a line with a large percentage and type RET to expand
> it.  You can repeat this to expand further.  Please expand the lines
> down to where the percentages remaining are around 5% or 6%.  There
> will be quite a lot of lines near the start showing the same large
> percentage.

One can also expand everything with "C-u RET".






* bug#25706: 26.0.50; Slow C file fontification
  2020-12-09 20:04                                     ` Eli Zaretskii
@ 2020-12-09 20:32                                       ` Alan Mackenzie
  0 siblings, 0 replies; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-09 20:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mattiase, ravine.var, larsi, 25706

Hello, Eli.

On Wed, Dec 09, 2020 at 22:04:20 +0200, Eli Zaretskii wrote:
> > Date: Wed, 9 Dec 2020 18:46:55 +0000
> > From: Alan Mackenzie <acm@muc.de>
> > Cc: Mattias Engdegård <mattiase@acm.org>,
> >  Lars Ingebrigtsen <larsi@gnus.org>, 25706@debbugs.gnu.org

> > Move point to a line with a large percentage and type RET to expand
> > it.  You can repeat this to expand further.  Please expand the lines
> > down to where the percentages remaining are around 5% or 6%.  There
> > will be quite a lot of lines near the start showing the same large
> > percentage.

> One can also expand everything with "C-u RET".

Thanks.  I didn't know that.  I don't think that's in the Elisp manual.

Also useful would be a command to expand "everything which is
sufficiently big" for some value of "sufficiently big", to avoid swathes
of irrelevancies down at 1% or 0%.

I once tried to amend the profiler to move its statistics columns
further to the right, because I was seeing far too many truncated
function names.  But I gave up, because the code was masses and masses
of tiny functions, largely without doc strings or comments, and I just
couldn't make sense of it.

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
  2020-12-09  7:47                                   ` Ravine Var
@ 2020-12-10  8:08                                     ` Alan Mackenzie
  0 siblings, 0 replies; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-10  8:08 UTC (permalink / raw)
  To: Ravine Var; +Cc: 25706

Hello again, Ravine.

On Wed, Dec 09, 2020 at 13:17:23 +0530, Ravine Var wrote:
> I came across another place where a similar lockup happens
> (even with the patches posted here).

> https://gitlab.com/wireshark/wireshark/-/raw/master/epan/dissectors/packet-rrc.c

> Towards the end of the file, once we get to the function
> proto_register_rrc(void), the slowdown of scrolling starts and eventually
> things freeze.

Ouch!  That's a 50,000 line long function.  ;-(  I've lost some naivety
about "reasonableness" in the past week or two.

> Just copying that function to a smaller C file is enough to
> reproduce the issue. (I found that C-M-h is a nifty command to do this.)

> I can open a new bug report if required.

Would you do this, please.  The mechanism for the slowdown in that
function is entirely different from that in the .h files with lots of
macros.  In the .c file, there are lots and lots of braces, and it seems
we need a new cache to handle them faster.  In the .h files, there are
no braces, and we needed to put limits into backward searches.
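
As a rough illustration of what putting limits into backward searches means, the sketch below bounds a backward scan to a fixed distance instead of letting it run to the start of the buffer. It uses a plain character count where the patch uses c-determine-limit; the function name and the default of 1000 are assumptions made for the example.

```elisp
(defun my-bounded-statement-start (&optional max-distance)
  "Move back towards the nearest semicolon or brace and return point.
Scan at most MAX-DISTANCE characters (default 1000).
Hypothetical helper, for illustration only."
  (let ((lim (max (- (point) (or max-distance 1000)) (point-min))))
    ;; The second argument stops the scan at LIM, so a huge region without
    ;; any of ; { } costs at most MAX-DISTANCE characters of work.
    (skip-chars-backward "^;{}" lim)
    (point)))
```

Without the limit, each such scan in a brace-free header can travel all the way back to the start of the buffer, and repeating that for every fontified chunk is what makes the cost add up.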

Thanks again for taking the trouble to report all these bugs.

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
  2020-12-09 17:00                                 ` Mattias Engdegård
@ 2020-12-10 12:26                                   ` Alan Mackenzie
  0 siblings, 0 replies; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-10 12:26 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Lars Ingebrigtsen, 25706

Hello, Mattias.

Thanks for this!

On Wed, Dec 09, 2020 at 18:00:30 +0100, Mattias Engdegård wrote:
> First, some Emacs regexp basics:

> 1. If A and B match single characters, then A\|B should be written
> [AB] whenever possible. The reason is that A\|B adds a backtrack
> record which uses stack space and wastes time if matching fails later
> on. The cost can be quite noticeable, which we have seen.

> 2. Syntax-class constructs are usually better written as character
> alternatives when possible.

> The \sX construct, for some X, is typically somewhat slower to match
> than explicitly listing the characters to match. For example, if all
> you care about are space and tab, then "\\s *" should be written "[
> \t]*".

> 3. Unicode character classes are slower to match than ASCII-only ones.
> For example, [[:alpha:]] is slower than [A-Za-z], assuming only those
> characters are of interest.

> 4. [^...] will match \n unless \n is included in the set. For example,
> "[^a]\\|$" will almost never match the $ (end-of-line) branch, because
> a newline will be matched by the first branch. The only exception is
> at the very end of the buffer if it is not newline-terminated, but
> that is rarely worth considering for source code.

> 5. \r (carriage return) normally doesn't appear in buffers even if the
> file uses DOS line endings. Line endings are converted into a single
> \n (newline) when the buffer is read. In particular, $ does NOT match
> at \r, only before \n.

> When \r appears it is usually because the file contains a mixture of
> line-ending styles, typically from being edited using broken tools.
> Whether you want to take such files into account is a matter of
> judgement; most modes don't bother.

> 6. Capturing groups cost more than non-capturing groups, but you
> already know that.

> On to specifics: here are annotations for possible improvements in
> cc-langs.el. (I didn't bother about capturing groups here.)

I think we should get around to fixing the regexps in CC Mode soon.  But
I think I would rather do this as a separate exercise, since the patch
for this bug is already around 800 lines and Ravine Var, the OP, has
found further problems on a slowish machine.

In particular, some of the fixes in your patch relate to the CPP
constructs, and they might well be slowing down that regexp in
c-find-decl-spots I highlighted earlier.  So I'm keen to look at this
again, once the current bug is settled.
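
To make points 1, 2 and 4 of the quoted list concrete, here is a minimal sketch in Emacs Lisp.  The helper name `regexp-demo-span' is made up for this illustration and is not part of CC Mode or of the patch.  It only reports where the first match starts and ends, so it shows what is matched, not how fast.

```elisp
(defun regexp-demo-span (regexp string)
  "Return (START . END) of the first match of REGEXP in STRING, or nil.
Hypothetical helper for illustration only."
  (save-match-data
    (when (string-match regexp string)
      (cons (match-beginning 0) (match-end 0)))))

;; Point 1: for single characters, [ab] matches the same text as a\|b
;; but avoids the alternation and its backtrack record.
(regexp-demo-span "\\(?:a\\|b\\)+c" "ababc")
(regexp-demo-span "[ab]+c" "ababc")

;; Point 2: if only space and tab matter, list them explicitly
;; instead of using the whitespace syntax class.
(regexp-demo-span "\\s-*" " \tx")
(regexp-demo-span "[ \t]*" " \tx")

;; Point 4: [^a] also matches a newline, so in "a\nb" the first branch
;; consumes the newline and the $ branch is never tried.
(regexp-demo-span "[^a]\\|$" "a\nb")
```

Assuming a syntax table in which space and tab are whitespace, both forms in each of the first two pairs should report the same span.  The difference the quoted list describes is in matching cost, which this sketch does not measure.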

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
       [not found]                                   ` <X9Ebn7hKnG/vpDcZ@ACM>
  2020-12-09 20:04                                     ` Eli Zaretskii
@ 2020-12-10 17:02                                     ` Ravine Var
  2020-12-10 20:02                                       ` Alan Mackenzie
  1 sibling, 1 reply; 45+ messages in thread
From: Ravine Var @ 2020-12-10 17:02 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Mattias Engdegård, Lars Ingebrigtsen, 25706

> Thank you indeed for taking the trouble to test the patch on the lesser
> machine.  I do not have access to such a machine.  I am assuming that
> before this patch, a large file like osprey_reg....h would have
> been completely unworkable on the machine.  It sounds as though it still
> is.  However, have you noticed any improvement at all in performance?

There is a marginal improvement - recovery from scroll lockups is
slightly faster. But, in general, working with the osprey header
file is still very painful.

> Could I ask you please to do one more thing, and that is to take a
> profile on this machine where it is giving trouble.  From a freshly
> loaded buffer, move forward (if necessary) to a troublesome spot.  N.B.
> C-u 1 M-> moves to 10% away from the end of the buffer, C-u 2 M-> 20%,
> and so on.  Then start the profiler and do what is causing sluggish
> performance.  Then have a look at the final profiler output, and expand
> it sensibly so that the troublesome function can be found.
>
> (Optional paragraph.)  How to use the profiler: Do M-x profiler-start
> RET, and accept the default mode with another RET.  Perform the stuff to
> be profiled.  Do M-x profiler-report, which gives three or four lines of
> output, each with a number and a percentage.  Move point to a line with
> a large percentage and type RET to expand it.  You can repeat this to
> expand further.  Please expand the lines down to where the percentages
> remaining are around 5% or 6%.  There will be quite a lot of lines near
> the start showing the same large percentage.

I opened the osprey file and started scrolling down and the screen
locked up. Here is the profile report (with emacs -Q):

https://gist.github.com/ravine-var/0c293968a902cde76af77f2872dde1d7

I am using emacs master (along with your patch) built with LTO enabled
and CFLAGS set to '-O2 -march=native'.






* bug#25706: 26.0.50; Slow C file fontification
  2020-12-10 17:02                                     ` Ravine Var
@ 2020-12-10 20:02                                       ` Alan Mackenzie
  2020-12-11 10:55                                         ` Ravine Var
  0 siblings, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-10 20:02 UTC (permalink / raw)
  To: Ravine Var; +Cc: Mattias Engdegård, Lars Ingebrigtsen, 25706

Hello, Ravine.

On Thu, Dec 10, 2020 at 22:32:17 +0530, Ravine Var wrote:
> > Thank you indeed for taking the trouble to test the patch on the lesser
> > machine.  I do not have access to such a machine.  I am assuming that
> > before this patch, a large file like osprey_reg....h would have
> > been completely unworkable on the machine.  It sounds as though it still
> > is.  However, have you noticed any improvement at all in performance?

> There is a marginal improvement - recovery from scroll lockups is
> slightly faster. But, in general, working with the osprey header
> file is still very painful.

OK, I still have some work to do, here.

> > Could I ask you please to do one more thing, and that is to take a
> > profile on this machine where it is giving trouble.  From a freshly
> > loaded buffer, move forward (if necessary) to a troublesome spot.  N.B.
> > C-u 1 M-> moves to 10% away from the end of the buffer, C-u 2 M-> 20%,
> > and so on.  Then start the profiler and do what is causing sluggish
> > performance.  Then have a look at the final profiler output, and expand
> > it sensibly so that the troublesome function can be found.

[ .... ]

> I opened the osprey file and started scrolling down and the screen
> locked up. Here is the profile report (with emacs -Q):

> https://gist.github.com/ravine-var/0c293968a902cde76af77f2872dde1d7

Thanks.  That was very helpful.  I've still got to analyse it more
deeply, but one thing that stood out (to me, at least), was
c-forward-name taking up 13% of the run time in your profile.  If we
include the garbage collection this will have caused, it might be as
high as 20% of the time, and that's right at the beginning of your
buffer.

To fix this, can I ask you, please, to try adding the following patch to
your already patched software, and let me know if it helps at all.  If
it does, that's great, if not, could I ask you to do another profile for
me on the less powerful machine, say by opening the buffer, starting the
profiler, then moving to the middle of the buffer with C-u 5 M->.  This
may take some time to profile.  Thanks!

> I am using emacs master (along with your patch) built with LTO enabled
> and CFLAGS set to '-O2 -march=native'.

That's the ideal testing setup.

Here's that patch:



diff -r 863d08a1858a cc-engine.el
--- a/cc-engine.el	Thu Nov 26 11:27:52 2020 +0000
+++ b/cc-engine.el	Tue Dec 08 19:48:50 2020 +0000
@@ -8276,7 +8325,8 @@
 	;; typically called from `c-forward-type' in this case, and
 	;; the caller only wants the top level type that it finds to
 	;; be promoted.
-	c-promote-possible-types)
+	c-promote-possible-types
+	(lim+ (c-determine-+ve-limit 500)))
     (while
 	(and
 	 (looking-at c-identifier-key)
@@ -8306,7 +8359,7 @@
 
 		 ;; Handle a C++ operator or template identifier.
 		 (goto-char id-end)
-		 (c-forward-syntactic-ws)
+		 (c-forward-syntactic-ws lim+)
 		 (cond ((eq (char-before id-end) ?e)
 			;; Got "... ::template".
 			(let ((subres (c-forward-name)))
@@ -8336,13 +8389,13 @@
 					     (looking-at "::")
 					     (progn
 					       (goto-char (match-end 0))
-					       (c-forward-syntactic-ws)
+					       (c-forward-syntactic-ws lim+)
 					       (eq (char-after) ?*))
 					     (progn
 					       (forward-char)
 					       t))))
 			    (while (progn
-				     (c-forward-syntactic-ws)
+				     (c-forward-syntactic-ws lim+)
 				     (setq pos (point))
 				     (looking-at c-opt-type-modifier-key))
 			      (goto-char (match-end 1))))))
@@ -8352,7 +8405,7 @@
 			(setq c-last-identifier-range
 			      (cons (point) (match-end 0)))
 			(goto-char (match-end 0))
-			(c-forward-syntactic-ws)
+			(c-forward-syntactic-ws lim+)
 			(setq pos (point)
 			      res 'operator)))
 
@@ -8366,7 +8419,7 @@
 	       (setq c-last-identifier-range
 		     (cons id-start id-end)))
 	     (goto-char id-end)
-	     (c-forward-syntactic-ws)
+	     (c-forward-syntactic-ws lim+)
 	     (setq pos (point)
 		   res t)))
 
@@ -8382,7 +8435,7 @@
 	       ;; cases with tricky syntactic whitespace that aren't
 	       ;; covered in `c-identifier-key'.
 	       (goto-char (match-end 0))
-	       (c-forward-syntactic-ws)
+	       (c-forward-syntactic-ws lim+)
 	       t)
 
 	      ((and c-recognize-<>-arglists
@@ -8391,7 +8444,7 @@
 	       (when (let (c-last-identifier-range)
 		       (c-forward-<>-arglist nil))
 
-		 (c-forward-syntactic-ws)
+		 (c-forward-syntactic-ws lim+)
 		 (unless (eq (char-after) ?\()
 		   (setq c-last-identifier-range nil)
 		   (c-add-type start (1+ pos)))
@@ -8406,7 +8459,7 @@
 		       (when (and c-record-type-identifiers id-start)
 			 (c-record-ref-id (cons id-start id-end)))
 		       (forward-char 2)
-		       (c-forward-syntactic-ws)
+		       (c-forward-syntactic-ws lim+)
 		       t)
 
 		   (when (and c-record-type-identifiers id-start


-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
  2020-12-10 20:02                                       ` Alan Mackenzie
@ 2020-12-11 10:55                                         ` Ravine Var
  2020-12-12 15:34                                           ` Alan Mackenzie
       [not found]                                           ` <X9TjCeydJaE2mpK8@ACM>
  0 siblings, 2 replies; 45+ messages in thread
From: Ravine Var @ 2020-12-11 10:55 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Mattias Engdegård, Lars Ingebrigtsen, 25706

Alan Mackenzie <acm@muc.de> writes:
> To fix this, can I ask you, please, to try adding the following patch to
> your already patched software, and let me know if it helps at all.  If
> it does, that's great, if not, could I ask you to do another profile for
> me on the less powerful machine, say by opening the buffer, starting the
> profiler, then moving to the middle of the buffer with C-u 5 M->.  This
> may take some time to profile.  Thanks!

Doing C-u 5 M-> just jumps to the middle immediately. The problem
happens when the file is opened and I start scrolling with C-v.
With the new patch, things are still bad - emacs freezes almost
instantly.

I tested with 3 patches applied from messages 35, 95 and 128.

Here's the profile with emacs -Q :

https://gist.github.com/ravine-var/48b3e1469ac5a7f3c3df8d6d9313661a






* bug#25706: 26.0.50; Slow C file fontification
  2020-12-11 10:55                                         ` Ravine Var
@ 2020-12-12 15:34                                           ` Alan Mackenzie
       [not found]                                           ` <X9TjCeydJaE2mpK8@ACM>
  1 sibling, 0 replies; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-12 15:34 UTC (permalink / raw)
  To: Ravine Var; +Cc: Mattias Engdegård, Lars Ingebrigtsen, 25706

Hello, Ravine.

On Fri, Dec 11, 2020 at 16:25:20 +0530, Ravine Var wrote:
> Alan Mackenzie <acm@muc.de> writes:
> > To fix this, can I ask you, please, to try adding the following patch to
> > your already patched software, and let me know if it helps at all.  If
> > it does, that's great, if not, could I ask you to do another profile for
> > me on the less powerful machine, say by opening the buffer, starting the
> > profiler, then moving to the middle of the buffer with C-u 5 M->.  This
> > may take some time to profile.  Thanks!

> Doing C-u 5 M-> just jumps to the middle immediately. The problem
> happens when the file is opened and I start scrolling with C-v.
> With the new patch, things are still bad - emacs freezes almost
> instantly.

I've had a good look at your latest profile result.  There doesn't seem
to be any further untoward looping of low-level functions.  So I'm not
sure what more to fix, other than....

Have you got the option fast-but-imprecise-scrolling set (or customized)
to non-nil?  If not, could I suggest you try it.  Its effect is to stop
Emacs fontifying every screen it scrolls over, instead only fontifying
screens when it's got no more input commands waiting.  This speeds
things up quite a bit on a slower machine.
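
For reference, a minimal way to turn the option on is sketched below.  Setting it through M-x customize-option instead is equally valid.

```elisp
;; Enable for the current session; put this line in the init file
;; to make it permanent.
(setq fast-but-imprecise-scrolling t)
```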

> I tested with 3 patches applied from messages 35, 95 and 128.

> Here's the profile with emacs -Q :

> https://gist.github.com/ravine-var/48b3e1469ac5a7f3c3df8d6d9313661a

Thanks!  There appear to be about 8 seconds worth of profile data there.
How many screenfuls, approximately, did you actually scroll over in
that time?  Or, rather than answering that question, could I get you to
try another timing test?

Please put the following code into your *scratch* buffer (it's the same
code I've posted before) and evaluate it:

    (defmacro time-it (&rest forms)
      "Time the running of a sequence of forms using `float-time'.
    Call like this: \"M-: (time-it (foo ...) (bar ...) ...)\"."
      `(let ((start (float-time)))
        ,@forms
        (- (float-time) start)))

Then please load osprey_reg_map_macro.h freshly into a buffer, and type
(or cut and paste) the following into M-:

    (time-it (let ((n 10)) (while (> n 0) (scroll-up) (sit-for 0) (setq n (1- n)))))

What is the reported timing for scrolling these ten screens?
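
If the single total turns out to be hard to interpret, a per-call breakdown may help, since the first screen or a garbage collection could dominate the sum.  The following is only a sketch; `time-each-call' is a hypothetical helper, not something posted earlier in this thread.

```elisp
(defun time-each-call (n thunk)
  "Call THUNK N times; return a list of the elapsed seconds for each call.
Hypothetical helper for illustration only."
  (let (times)
    (dotimes (_ n)
      (let ((start (float-time)))
        (funcall thunk)
        (push (- (float-time) start) times)))
    ;; `push' built the list newest-first, so restore call order.
    (nreverse times)))

;; Possible usage in the freshly loaded osprey_reg_map_macro.h buffer:
;; M-: (time-each-call 10 (lambda () (scroll-up) (sit-for 0)))
```

The sum of the returned list should be close to what `time-it' reports for the same ten scrolls.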

Thanks!

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
       [not found]                                           ` <X9TjCeydJaE2mpK8@ACM>
@ 2020-12-14  7:20                                             ` Ravine Var
  2020-12-14 11:44                                               ` Alan Mackenzie
  0 siblings, 1 reply; 45+ messages in thread
From: Ravine Var @ 2020-12-14  7:20 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Mattias Engdegård, Lars Ingebrigtsen, 25706

Alan Mackenzie <acm@muc.de> writes:
> Have you got the option fast-but-imprecise-scrolling set (or customized)
> to non-nil?  If not, could I suggest you try it.  Its effect is to stop
> Emacs fontifying every screen it scrolls over, instead only fontifying
> screens when it's got no more input commands waiting.  This speeds
> things up quite a bit on a slower machine.

Turning on fast-but-imprecise-scrolling improves things by a lot.
Viewing and scrolling the osprey file is much faster/smoother and the
screen doesn't freeze.

> Please put the following code into your *scratch* buffer (it's the same
> code I've posted before) and evaluate it:
>
>     (defmacro time-it (&rest forms)
>       "Time the running of a sequence of forms using `float-time'.
>     Call like this: \"M-: (time-it (foo ...) (bar ...) ...)\"."
>       `(let ((start (float-time)))
>         ,@forms
>         (- (float-time) start)))
>
> Then please load osprey_reg_map_macro.h freshly into a buffer, and type
> (or cut and paste) the following into M-:
>
>     (time-it (let ((n 10)) (while (> n 0) (scroll-up) (sit-for 0) (setq n (1- n)))))
>
> What is the reported timing for scrolling these ten screens?

Running emacs -Q (master + 3 patches) :

With fast-but-imprecise-scrolling: 0.9250097274780273
Without fast-but-imprecise-scrolling: 0.8903303146362305

I think using the fast-but-imprecise-scrolling option
is a workaround that can be used on underpowered machines
for big header files...






* bug#25706: 26.0.50; Slow C file fontification
  2020-12-14  7:20                                             ` Ravine Var
@ 2020-12-14 11:44                                               ` Alan Mackenzie
  2020-12-15  4:01                                                 ` Ravine Var
  0 siblings, 1 reply; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-14 11:44 UTC (permalink / raw)
  To: Ravine Var; +Cc: Mattias Engdegård, Lars Ingebrigtsen, 25706

Hello, Ravine.

On Mon, Dec 14, 2020 at 12:50:36 +0530, Ravine Var wrote:
> Alan Mackenzie <acm@muc.de> writes:
> > Have you got the option fast-but-imprecise-scrolling set (or customized)
> > to non-nil?  If not, could I suggest you try it.  Its effect is to stop
> > Emacs fontifying every screen it scrolls over, instead only fontifying
> > screens when it's got no more input commands waiting.  This speeds
> > things up quite a bit on a slower machine.

> Turning on fast-but-imprecise-scrolling improves things by a lot.
> Viewing and scrolling the osprey file is much faster/smoother and the
> screen doesn't freeze.

:-)

> > Please put the following code into your *scratch* buffer (it's the same
> > code I've posted before) and evaluate it:

> >     (defmacro time-it (&rest forms)
> >       "Time the running of a sequence of forms using `float-time'.
> >     Call like this: \"M-: (time-it (foo ...) (bar ...) ...)\"."
> >       `(let ((start (float-time)))
> >         ,@forms
> >         (- (float-time) start)))

> > Then please load osprey_reg_map_macro.h freshly into a buffer, and type
> > (or cut and paste) the following into M-:

> >     (time-it (let ((n 10)) (while (> n 0) (scroll-up) (sit-for 0) (setq n (1- n)))))

> > What is the reported timing for scrolling these ten screens?

> Running emacs -Q (master + 3 patches) :

> With fast-but-imprecise-scrolling: 0.9250097274780273
> Without fast-but-imprecise-scrolling: 0.8903303146362305

Thanks for doing that further testing.

That's 0.09 seconds per scrolling of a screen.  That is surely an
acceptably low delay.

> I think using the fast-but-imprecise-scrolling option
> is a workaround that can be used on underpowered machines
> for big header files...

Or even on up-to-date, full-powered machines.  ;-)  I have it enabled all
the time, and my PC is very similar to your faster one.

So, I propose that these two patches (the big one and the smaller one for
all the c-forward-syntactic-ws's) are sufficient to fix the bug, and I
propose closing it now.  What do you say to that?

I have looked at the other problem you mention (slow scrolling through
the machine-generated function proto_register_rrc in the wireshark file
packet-rrc.c) and have made significant progress towards implementing a
cache for the CC Mode function c-looking-at-or-maybe-in-bracelist, which
should eliminate the long delays.  Have you raised a new bug for this
problem, yet?

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#25706: 26.0.50; Slow C file fontification
  2020-12-14 11:44                                               ` Alan Mackenzie
@ 2020-12-15  4:01                                                 ` Ravine Var
  2020-12-15 12:27                                                   ` Alan Mackenzie
  0 siblings, 1 reply; 45+ messages in thread
From: Ravine Var @ 2020-12-15  4:01 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Mattias Engdegård, Lars Ingebrigtsen, 25706

> So, I propose that these two patches (the big one and the smaller one for
> all the c-forward-syntactic-ws's) are sufficient to fix the bug, and I
> propose closing it now.  What do you say to that?

Works for me. Thanks for the patches. :-)

> I have looked at the other problem you mention (slow scrolling through
> the machine-generated function proto_register_rrc in the wireshark file
> packet-rrc.c) and have made significant progress towards implementing a
> cache for the CC Mode function c-looking-at-or-maybe-in-bracelist, which
> should eliminate the long delays.  Have you raised a new bug for this
> problem, yet?

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=45248






* bug#25706: 26.0.50; Slow C file fontification
  2020-12-15  4:01                                                 ` Ravine Var
@ 2020-12-15 12:27                                                   ` Alan Mackenzie
  0 siblings, 0 replies; 45+ messages in thread
From: Alan Mackenzie @ 2020-12-15 12:27 UTC (permalink / raw)
  To: Ravine Var; +Cc: Mattias Engdegård, Lars Ingebrigtsen, 25706-done

Hello, Ravine.

On Tue, Dec 15, 2020 at 09:31:01 +0530, Ravine Var wrote:
> > So, I propose that these two patches (the big one and the smaller
> > one for all the c-forward-syntactic-ws's) are sufficient to fix the
> > bug, and I propose closing it now.  What do you say to that?

> Works for me. Thanks for the patches. :-)

Thank you for all the testing!  I've committed the changes to everywhere
relevant, and I'm closing the bug with this post.

> > I have looked at the other problem you mention (slow scrolling
> > through the machine-generated function proto_register_rrc in the
> > wireshark file packet-rrc.c) and have made significant progress
> > towards implementing a cache for the CC Mode function
> > c-looking-at-or-maybe-in-bracelist, which should eliminate the long
> > delays.  Have you raised a new bug for this problem, yet?

> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=45248

Thank you for this new bug report.  I'll carry on trying to fix it.

-- 
Alan Mackenzie (Nuremberg, Germany).





