Re: A possible way for CC Mode to resolve its sluggishness

all messages for Emacs-related lists mirrored at yhetil.org

From: Alan Mackenzie <acm@muc.de>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: cc-mode-help@lists.sourceforge.net, emacs-devel@gnu.org
Subject: Re: A possible way for CC Mode to resolve its sluggishness
Date: Sat, 27 Apr 2019 13:57:25 +0000
Message-ID: <20190427135725.GB4822@ACM>
In-Reply-To: <jwv4l6kkzf6.fsf-monnier+emacs@gnu.org>

Hello, Stefan.

On Fri, Apr 26, 2019 at 22:10:23 -0400, Stefan Monnier wrote:
> > The problem is that CC Mode's before/after-change-functions are very
> > general, and scan the buffer looking for situations which only arise
> > sporadically.  Things like an open string getting closed, or a >
> > being inserted which needs to be checked for a template delimiter.
> > However, these expensive checks are performed for _every_ buffer
> > change.  Even doing something like inserting a letter or a digit
> > causes the full range of tests to be performed.  This is not good.

> Part of the problem is that CC-mode is very eager in its management of
> syntax information: the `syntax-table` text-properties are always kept
> up-to-date over the whole buffer right after every single change.

That is not part of the problem.  That is part of the challenge.

> Modes using syntax-propertize work more lazily:
> before-change-functions only marks that some change occurred at
> position POS and the syntax-table properties after that position are
> only updated afterward on-demand.

Yes, but it is somewhat unclear whether, how, and when modes using
syntax-propertize can update syntax-table properties on positions
_before_ a change.  This is a prime reason for CC Mode not using this
strategy.
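For concreteness, here is a toy model of the lazy scheme Stefan describes, written in Python rather than Emacs Lisp. It is a sketch under stated assumptions, not Emacs's implementation: the class and method names are hypothetical, and the model ignores the fact that insertions and deletions change the buffer size. A change only lowers a high-water mark, and properties are recomputed on demand up to the position a caller asks about. Nothing before the change position is ever revisited in this model.

```python
class LazyPropertizer:
    """Toy model of lazy syntax-property maintenance (hypothetical names).

    Properties are assumed valid on [0, done); everything from `done`
    onwards is recomputed only when someone asks for it.
    """

    def __init__(self, size):
        self.size = size   # buffer size, held fixed in this model
        self.done = 0      # high-water mark of valid properties
        self.work = 0      # total characters (re)propertized so far

    def before_change(self, pos):
        # A change at POS invalidates only positions at or after POS.
        self.done = min(self.done, pos)

    def ensure(self, pos):
        # Propertize on demand, just far enough to cover POS.
        target = min(pos + 1, self.size)
        if target > self.done:
            self.work += target - self.done
            self.done = target


buf = LazyPropertizer(10_000)
buf.ensure(9_999)              # initial pass over the whole buffer
for pos in range(100, 110):    # ten consecutive edits near position 100
    buf.before_change(pos)
buf.ensure(200)                # a single query afterwards
print(buf.done, buf.work)      # 201 10101
```

After the initial pass, the ten edits together cost a single re-propertization of the 101 characters from position 100 up to the query, however many edits precede it.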

> CC-mode tries to make up for it by being more clever about which parts
> of the buffer after position POS actually need to be updated, but when
> there are several consecutive changes, the extra work performed
> between each one of those changes adds up quickly.

My proposal is to reduce this amount of work when it's not needed.
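A minimal Python sketch of that kind of fast path, under loudly stated assumptions: before running the expensive checks, look at the text inserted or deleted, plus the characters adjoining it, and skip the checks when none of them could affect string, comment, or template-delimiter structure. The trigger set `SENSITIVE` and the function names are hypothetical, chosen for illustration; they are not CC Mode's.

```python
# Hypothetical trigger set: characters whose insertion or deletion might
# alter string, comment, preprocessor, or template structure in C-like code.
SENSITIVE = set('"\'/*<>\\\n#')


def needs_full_check(changed_text, char_before="", char_after=""):
    """Return True if a change might affect syntactic structure.

    CHANGED_TEXT is the inserted (or deleted) text; CHAR_BEFORE and
    CHAR_AFTER are the buffer characters adjoining the change.
    """
    context = char_before + changed_text + char_after
    return any(c in SENSITIVE for c in context)


def after_change(changed_text, char_before, char_after, expensive_check):
    """Run EXPENSIVE_CHECK only when the change could matter (hypothetical)."""
    if needs_full_check(changed_text, char_before, char_after):
        expensive_check()
        return True
    return False   # fast path: e.g. an ordinary letter or digit


scans = []
after_change("x", "a", "b", lambda: scans.append("scan"))   # skipped
after_change(">", "a", "b", lambda: scans.append("scan"))   # scanned
print(scans)   # ['scan']
```

Checking the neighbours matters because inserting an ordinary letter between `/` and `*` breaks a comment opener even though the letter itself is harmless.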

> [ Of course, there are cases where the approach used in
>   syntax-propertize loses big time.  E.g. if you have a loop that first
>   modifies a char near point-min, then asks for the syntax-table
>   properties near point-max, and then repeats... performance will suck.
>   But luckily I haven't yet seen a real-world use case where
>   this occurs.  ]
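The worst case in the bracketed paragraph is easy to reproduce in a toy Python model (hypothetical, and ignoring buffer-size changes): an edit near the start lowers the high-water mark, so the next query near the end re-propertizes almost the whole buffer. The function `lazy_work` is illustrative only.

```python
def lazy_work(size, rounds, edit_pos, query_pos):
    """Characters re-propertized by a toy lazy scheme (hypothetical).

    Each round makes one edit at EDIT_POS, then asks for properties
    at QUERY_POS.  The buffer size is held fixed.
    """
    done = 0    # high-water mark: properties valid on [0, done)
    work = 0
    for _ in range(rounds):
        done = min(done, edit_pos)          # before-change: invalidate
        target = min(query_pos + 1, size)   # on-demand propertization
        if target > done:
            work += target - done
            done = target
    return work


size = 100_000
# Pathological: edit near point-min, then query near point-max, repeatedly.
print(lazy_work(size, rounds=50, edit_pos=10, query_pos=size - 1))   # 4999510
# Benign: edits and queries stay in the same neighbourhood.
print(lazy_work(size, rounds=50, edit_pos=5_000, query_pos=5_100))   # 10050
```

In the first pattern each round after the first costs nearly the whole buffer; in the second, about 101 characters.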

> Maybe another part of the problem is that CC-mode tries to do more than
> most other major modes: e.g. the highlighting of unclosed strings.
> For plain single-line strings this can be fairly cheap, but for
> multiline strings, keeping this information constantly up-to-date over
> the whole buffer can be costly.

CC Mode is successful in this regard.  The highlighting with
warning-face of unclosed string openers is a useful feature which other
modes could emulate.

I think I suggested a little while ago that this could be done in
syntactic analysis and font-lock.  We have a syntax flag saying "this
character (LF) terminates a style b comment"; we could equally well have
a flag saying it terminates a string.  Then font-lock could examine the
string terminator, and use string-face or warning-face on the opener
depending on the terminating character.
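The proposed flag can be modelled outside Emacs. This Python sketch assumes, hypothetically, that a newline carries a "terminates a string" syntax flag analogous to LF ending a style b comment; no such flag is claimed to exist today. The scanner records which character ended each string, and the opener gets `font-lock-string-face` for a normal close or `font-lock-warning-face` when LF (or end of text) ended it. Only `"` strings are handled, a backslash simply skips the next character (so backslash-newline continues the string, as in C), and comments are ignored.

```python
def classify_strings(text):
    """Return (opener_position, face) for each "..." string in TEXT.

    Models a hypothetical syntax flag under which LF terminates a
    string.  A string closed by its quote gets the string face; one
    ended by LF or by end of text gets the warning face.
    """
    result = []
    i, n = 0, len(text)
    while i < n:
        if text[i] == '"':
            opener = i
            i += 1
            terminator = None
            while i < n:
                if text[i] == "\\":        # escape: skip the next character
                    i += 2
                    continue
                if text[i] in '"\n':       # the flagged LF also ends a string
                    terminator = text[i]
                    break
                i += 1
            face = ("font-lock-string-face" if terminator == '"'
                    else "font-lock-warning-face")
            result.append((opener, face))
        i += 1
    return result


src = 'a = "ok";\nb = "oops;\nc = "x\\"y";\n'
print(classify_strings(src))
# [(4, 'font-lock-string-face'), (14, 'font-lock-warning-face'),
#  (25, 'font-lock-string-face')]
```

Because the face decision depends only on the terminator, an unclosed string stops at the end of its line instead of bleeding into the following code.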

But that's a digression from the topic of this thread.

> Most other major modes just let the font-lock-string-face bleed further
> than the user intended, which requires much less work and works well
> enough for all other syntactic elements (CC-mode doesn't highlight
> unclosed parens, or mismatched parens, or `do` with missing `while`,
> ...).  When needed these many different kinds of errors are detected and
> shown to the user via things like flymake or LSP instead, which work
> much more lazily w.r.t. buffer changes, so they don't need the same kind
> of engineering efforts to make them fast enough.

> > Thoughts?

> Not sure whether you intend this to be just a change to CC-mode (it does
> sound like it can all be implemented in Elisp) or you intend for some
> change at the C level.

At the Lisp level.  I hadn't even considered any C enhancements.

> My gut feeling is that the checks you suggest in (iii) could be
> implemented in Elisp without losing too much performance (they should
> spend most of their time within a few C primitives), tho it depends on
> the specifics of the cases you'll want to catch.  Also if you want to
> implement it in C those same specifics will need to be spelled out to
> figure out how a major mode will communicate them to the C code (for
> this to be useful beyond CC-mode, it would need to be very general, so
> it could be tricky to design).

> But to tell you the truth, other than CC-mode, I'm having a hard time
> imagining which other major mode will want to use such a thing.
> Performance of syntax-propertize is not stellar but doesn't seem
> problematic, and it is not too hard to use (its functioning is not
> exactly the same as what a real lexer would do, but you can make use of
> the language spec more or less straightforwardly), ....

Again, can syntax-propertize work on positions _before_ a buffer change?

> .... whereas I get the impression that your suggestion relies on
> properties of the language which are not often used, so are less
> familiar to the average mode implementor (and a language spec is
> unlikely to help you figure out what to do).

If other modes were to use the mechanism, they would need to define
their syntactic cell boundaries, as indeed I have yet to do for CC Mode.

> Maybe if we want to speed things up, we should consider a new parsing
> engine (instead of parse-partial-sexp and syntax-tables) based maybe on
> a DFA for the tokenizer and GLR parser on top.  That might arguably be
> more generally useful and easier to use (in the sense that one can more
> or less follow the language spec when implementing the major mode).

That would be a lot of design and a lot of work, and sounds like
something from the distant rather than medium future.  The indentation
and font-lock routines would have to be rewritten for each mode using
it.
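To give a rough idea of the tokenizer half of Stefan's suggestion, here is a table-driven DFA with maximal munch, in Python. The character classes, states, and token kinds are a hypothetical toy set, and there is no GLR parser here; nothing in it reflects an actual Emacs design.

```python
def char_class(c):
    """Map a character to a toy character class (hypothetical)."""
    if c.isalpha() or c == "_":
        return "alpha"
    if c.isdigit():
        return "digit"
    if c.isspace():
        return "space"
    return "other"


# DFA transitions: (state, char_class) -> next state.
TRANSITIONS = {
    ("start", "alpha"): "ident",
    ("ident", "alpha"): "ident",
    ("ident", "digit"): "ident",
    ("start", "digit"): "number",
    ("number", "digit"): "number",
    ("start", "space"): "space",
    ("space", "space"): "space",
    ("start", "other"): "punct",   # single-character punctuation
}
ACCEPTING = {"ident", "number", "space", "punct"}


def tokenize(text):
    """Maximal-munch tokenizer driven by the DFA table above."""
    tokens, pos = [], 0
    while pos < len(text):
        state, i, last_accept = "start", pos, None
        while i < len(text):
            nxt = TRANSITIONS.get((state, char_class(text[i])))
            if nxt is None:
                break
            state, i = nxt, i + 1
            if state in ACCEPTING:
                last_accept = (state, i)   # longest match so far
        if last_accept is None:           # defensive; unreachable here
            raise ValueError(f"no token at position {pos}")
        kind, end = last_accept
        if kind != "space":               # drop whitespace tokens
            tokens.append((kind, text[pos:end]))
        pos = end
    return tokens


print(tokenize("x1 = 42;"))
# [('ident', 'x1'), ('punct', '='), ('number', '42'), ('punct', ';')]
```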

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).

Thread overview: 10+ messages
2019-04-26 19:30 A possible way for CC Mode to resolve its sluggishness Alan Mackenzie
2019-04-26 19:53 ` Eli Zaretskii
2019-04-26 20:11   ` Alan Mackenzie
2019-04-27  2:10 ` Stefan Monnier
2019-04-27  3:34   ` Óscar Fuentes
2019-04-27 13:57   ` Alan Mackenzie [this message]
2019-04-28 17:32     ` Stephen Leake
2019-04-29  1:46     ` Stefan Monnier
2019-04-29  9:23       ` Alan Mackenzie
2019-04-29 12:19         ` Stefan Monnier
