unofficial mirror of bug-gnu-emacs@gnu.org 
From: Alan Mackenzie <acm@muc.de>
To: "Mattias Engdegård" <mattiase@acm.org>
Cc: Lars Ingebrigtsen <larsi@gnus.org>, 25706@debbugs.gnu.org
Subject: bug#25706: 26.0.50; Slow C file fontification
Date: Thu, 10 Dec 2020 12:26:48 +0000
Message-ID: <X9IUCMA+SPu3fmi1__4728.1057968334$1607607835$gmane$org@ACM>
In-Reply-To: <FF2C8BEC-A227-4533-8ADC-93080A5BB5DF@acm.org>

Hello, Mattias.

Thanks for this!

On Wed, Dec 09, 2020 at 18:00:30 +0100, Mattias Engdegård wrote:
> First, some Emacs regexp basics:

> 1. If A and B match single characters, then A\|B should be written
> [AB] whenever possible. The reason is that A\|B adds a backtrack
> record which uses stack space and wastes time if matching fails later
> on. The cost can be quite noticeable, which we have seen.
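
A minimal Emacs Lisp sketch of point 1, not part of the original message. The function name `demo-alt-vs-class' and the sample text are made up for illustration; the two regexps accept the same strings, and only the second avoids the alternation.

```elisp
;; Illustrative only: `demo-alt-vs-class' is a hypothetical helper.
(defun demo-alt-vs-class (text)
  "Return the match start in TEXT for both spellings of the regexp."
  (list
   ;; Alternation between single characters: a\|b inside a shy group.
   (string-match-p "\\(?:a\\|b\\)*c" text)
   ;; The same language written as a character alternative.
   (string-match-p "[ab]*c" text)))

(demo-alt-vs-class "ababbac")   ; both elements are 0
```

The results are identical; any difference would only show up in matching time and stack use on long inputs.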

> 2. Syntax-class constructs are usually better written as character
> alternatives when possible.

> The \sX construct, for some X, is typically somewhat slower to match
> than explicitly listing the characters to match. For example, if all
> you care about are space and tab, then "\\s *" should be written
> "[ \t]*".
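
A small sketch of point 2, assuming a temporary buffer with the standard syntax table (where space and tab have whitespace syntax). `demo-skip-blanks' is a hypothetical name, not something from CC Mode.

```elisp
;; Illustrative only: `demo-skip-blanks' is a hypothetical helper.
(defun demo-skip-blanks (text)
  "Return where each regexp stops when matched at the start of TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (list
     ;; Syntax-class form: depends on the buffer's syntax table.
     (progn (looking-at "\\s *") (match-end 0))
     ;; Explicit character alternative: space and tab only.
     (progn (looking-at "[ \t]*") (match-end 0)))))

(demo-skip-blanks "  \tfoo")    ; both elements are 4
```

The two forms agree here, but the syntax-class form can match other characters in a mode whose syntax table gives them whitespace syntax, so the rewrite is only safe when space and tab really are all that matter.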

> 3. Unicode character classes are slower to match than ASCII-only ones.
> For example, [[:alpha:]] is slower than [A-Za-z], assuming only those
> characters are of interest.

> 4. [^...] will match \n unless included in the set. For example,
> "[^a]\\|$" will almost never match the $ (end-of-line) branch, because
> a newline will be matched by the first branch. The only exception is
> at the very end of the buffer if it is not newline-terminated, but
> that is rarely worth considering for source code.
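
Point 4 can be checked directly. The sketch below is not from the original message; `demo-match-at-newline' is a hypothetical helper that places point just before a newline and reports the span of the match.

```elisp
;; Illustrative only: `demo-match-at-newline' is a hypothetical helper.
(defun demo-match-at-newline (regexp)
  "Match REGEXP just before the newline in \"x\\ny\"; return (START END)."
  (with-temp-buffer
    (insert "x\ny")
    (goto-char 2)                       ; point is now before the newline
    (and (looking-at regexp)
         (list (match-beginning 0) (match-end 0)))))

(demo-match-at-newline "[^a]\\|$")      ; (2 3): [^a] consumed the newline
(demo-match-at-newline "[^a\n]\\|$")    ; (2 2): the $ branch matched
```

Adding \n to the negated set is what lets the $ branch take effect.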

> 5. \r (carriage return) normally doesn't appear in buffers even if the
> file uses DOS line endings. Line endings are converted into a single
> \n (newline) when the buffer is read. In particular, $ does NOT match
> at \r, only before \n.

> When \r appears it is usually because the file contains a mixture of
> line-ending styles, typically from being edited using broken tools.
> Whether you want to take such files into account is a matter of
> judgement; most modes don't bother.
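
A sketch of point 5, assuming a buffer that really does contain a stray \r (inserted by hand here, so no end-of-line conversion takes place). `demo-search-crlf' is a hypothetical name.

```elisp
;; Illustrative only: `demo-search-crlf' is a hypothetical helper.
(defun demo-search-crlf (regexp)
  "Search for REGEXP in a buffer holding \"foo\\r\\nbar\"."
  (with-temp-buffer
    (insert "foo\r\nbar")               ; literal CR left in the buffer
    (goto-char (point-min))
    (re-search-forward regexp nil t)))

(demo-search-crlf "foo$")       ; nil: $ does not match before \r
(demo-search-crlf "foo\r?$")    ; 5: an optional \r must be spelled out
```

Whether a mode should carry the extra \r? is the judgement call mentioned above.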

> 6. Capturing groups cost more than non-capturing groups, but you
> already know that.
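
For reference, a sketch of the difference in point 6. It is not from the original message and `demo-group-1-start' is a hypothetical helper; it only shows that the shy group \(?: ... \) records no submatch, which is the bookkeeping the matcher is spared.

```elisp
;; Illustrative only: `demo-group-1-start' is a hypothetical helper.
(defun demo-group-1-start (regexp text)
  "Match REGEXP against TEXT and return where group 1 starts, if any."
  (and (string-match regexp text)
       (match-beginning 1)))

(demo-group-1-start "\\(foo\\|bar\\)+" "foobar")     ; 3: group 1 recorded
(demo-group-1-start "\\(?:foo\\|bar\\)+" "foobar")   ; nil: nothing recorded
```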

> On to specifics: here are annotations for possible improvements in
> cc-langs.el. (I didn't bother about capturing groups here.)

I think we should get around to fixing the regexps in CC Mode soon.  But
I think I would rather do this as a separate exercise, since the patch
for this bug is already around 800 lines and Ravine Var, the OP, has
found further problems on a slowish machine.

In particular, some of the fixes in your patch relate to the CPP
constructs, and they might well be slowing down that regexp in
c-find-decl-spots I highlighted earlier.  So I'm keen to look at this
again, once the current bug is settled.

-- 
Alan Mackenzie (Nuremberg, Germany).





