From: Alan Mackenzie <acm@muc.de>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: cc-mode-help@lists.sourceforge.net, emacs-devel@gnu.org
Subject: Re: A possible way for CC Mode to resolve its sluggishness
Date: Sat, 27 Apr 2019 13:57:25 +0000
Message-ID: <20190427135725.GB4822@ACM>
In-Reply-To: <jwv4l6kkzf6.fsf-monnier+emacs@gnu.org>
Hello, Stefan.
On Fri, Apr 26, 2019 at 22:10:23 -0400, Stefan Monnier wrote:
> > The problem is that CC Mode's before/after-change-functions are very
> > general, and scan the buffer looking for situations which only arise
> > sporadically. Things like an open string getting closed, or a >
> > being inserted which needs to be checked for a template delimiter.
> > However, these expensive checks are performed for _every_ buffer
> > change. Even doing something like inserting a letter or a digit
> > causes the full range of tests to be performed. This is not good.
> Part of the problem is that CC-mode is very eager in its management of
> syntax information: the `syntax-table` text-properties are always kept
> up-to-date over the whole buffer right after every single change.
That is not part of the problem. That is part of the challenge.
> Modes using syntax-propertize work more lazily:
> before-change-functions only marks that some change occurred at
> position POS and the syntax-table properties after that position are
> only updated afterward on-demand.
Yes, but it is somewhat unclear whether, how, and when modes using
syntax-propertize can update syntax-table properties at positions
_before_ a change. This is a prime reason for CC Mode not using this
strategy.
> CC-mode tries to make up for it by being more clever about which parts
> of the buffer after position POS actually need to be updated, but when
> there are several consecutive changes, the extra work performed
> between each one of those changes adds up quickly.
My proposal is to reduce this amount of work when it's not needed.
> [ Of course, there are cases where the approach used in
> syntax-propertize loses big time. E.g. if you have a loop that first
> modifies a char near point-min, then asks for the syntax-table
> properties near point-max, and then repeats... performance will suck.
> But luckily I haven't yet seen a real-world use case where
> this occurs. ]
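The lazy strategy described above, and the pathological case in the bracketed aside, can be illustrated with a toy Python model. It assumes a single "valid up to" mark, counts scanned characters as a stand-in for real work, and ignores how positions shift after insertions. The class and method names are hypothetical and do not correspond to the internals of `syntax-propertize`.

```python
class LazyPropertizer:
    """Toy model of lazy, on-demand syntax propertization.

    Properties are assumed valid on [0, done).  A buffer change only lowers
    `done`; the actual (re)scan happens when a caller needs properties at
    some position at or beyond `done`.  `work` counts characters scanned."""

    def __init__(self, size):
        self.size = size
        self.done = 0
        self.work = 0

    def on_change(self, pos, inserted=0):
        """Record a change at POS that inserted INSERTED chars (cheap)."""
        self.size += inserted
        self.done = min(self.done, pos)

    def ensure(self, pos):
        """Make properties valid up to and including POS, scanning lazily."""
        end = min(pos + 1, self.size)
        if end > self.done:
            self.work += end - self.done   # stand-in for the real scan
            self.done = end


if __name__ == "__main__":
    # Consecutive edits with no query in between cost one rescan in total.
    buf = LazyPropertizer(1000)
    buf.ensure(999)                  # initial full scan: 1000 chars
    for i in range(100):             # type 100 characters starting at 500
        buf.on_change(500 + i, 1)
    buf.ensure(600)                  # rescans only [500, 601)
    print(buf.work)                  # prints 1101

    # The pathological loop: edit at point-min, then query near point-max.
    bad = LazyPropertizer(1000)
    for _ in range(10):
        bad.on_change(0)
        bad.ensure(999)              # whole buffer rescanned every time
    print(bad.work)                  # prints 10000
```

A run of edits with no intervening query costs only one rescan of the stale region, whereas alternating an edit at point-min with a query near point-max rescans the whole buffer on every iteration.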
> Maybe another part of the problem is that CC-mode tries to do more than
> most other major modes: e.g. the highlighting of unclosed strings.
> For plain single-line strings this can be fairly cheap, but for
> multiline strings, keeping this information constantly up-to-date over
> the whole buffer can be costly.
CC Mode is successful in this regard. The highlighting with
warning-face of unclosed string openers is a useful feature which other
modes could emulate.
I think I suggested a little while ago that this could be done in
syntactic analysis and font-lock. We have a syntax flag saying "this
character (LF) terminates a style b comment", we could equally well have
a flag saying it terminates a string. Then font-lock could examine the
string terminator, and use string-face or warning-face on the opener
depending on the terminating character.
But that's a digression from the topic of this thread.
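A minimal Python sketch of the behavior suggested above, assuming an unescaped newline carries a hypothetical "terminates a string" syntax flag analogous to the style-b comment flag. The scanner reports what ended the string, and the font-lock side picks a face for the opener from that. Only the face names are real Emacs faces; the functions are illustrative.

```python
def classify_string(text, start):
    """Scan the string whose opening quote is at START.

    Models the suggested syntax flag: an unescaped newline ends the string
    (badly), just as LF ends a style-b comment.  Returns a pair
    (index of the terminator or len(text), kind), where kind is
    'quote', 'newline', or 'eob' (end of buffer)."""
    quote = text[start]
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2                  # skip the escaped char, including an escaped LF
            continue
        if ch == quote:
            return i, "quote"
        if ch == "\n":
            return i, "newline"
        i += 1
    return len(text), "eob"


def opener_face(text, start):
    """Choose the opener's face from the character that terminated the string."""
    _, kind = classify_string(text, start)
    return "font-lock-string-face" if kind == "quote" else "font-lock-warning-face"


if __name__ == "__main__":
    src = 'x = "ok";\ny = "oops;\n'
    print(opener_face(src, 4))      # prints font-lock-string-face
    print(opener_face(src, 14))     # prints font-lock-warning-face
```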
> Most other major modes just let the font-lock-string-face bleed further
> than the user intended, which requires much less work and works well
> enough for all other syntactic elements (CC-mode doesn't highlight
> unclosed parens, or mismatched parens, or `do` with missing `while`,
> ...). When needed, these many different kinds of errors are detected and
> shown to the user via things like flymake or LSP instead, which work
> much more lazily w.r.t. buffer changes, so they don't need the same kind
> of engineering effort to make them fast enough.
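The laziness described here usually amounts to debouncing: record that an edit happened, and run the expensive analysis only after the user has been idle for a while. The Python sketch below uses explicit millisecond timestamps instead of real timers so that it is deterministic. `IdleChecker` is a hypothetical name, and this is a generic pattern rather than flymake's or any LSP client's actual implementation (flymake exposes a comparable setting, `flymake-no-changes-timeout`).

```python
class IdleChecker:
    """Debounce an expensive check until edits have stopped for a while."""

    def __init__(self, idle_ms, check):
        self.idle_ms = idle_ms      # required quiet period after the last edit
        self.check = check          # the expensive analysis to run
        self.last_edit = None       # time of the most recent unchecked edit
        self.runs = 0

    def on_edit(self, now_ms):
        # Called from a change hook: only remember when the edit happened.
        self.last_edit = now_ms

    def tick(self, now_ms):
        # Called periodically, e.g. from a timer: run the check once the
        # buffer has been quiet for at least idle_ms.
        if self.last_edit is not None and now_ms - self.last_edit >= self.idle_ms:
            self.last_edit = None
            self.check()
            self.runs += 1


if __name__ == "__main__":
    checker = IdleChecker(500, lambda: print("running checker"))
    for t in range(0, 1000, 100):   # ten keystrokes, 100 ms apart
        checker.on_edit(t)
        checker.tick(t)             # never idle long enough while typing
    checker.tick(1500)              # 600 ms after the last edit
    print(checker.runs)             # prints 1
```

However fast the user types, the expensive check runs once per pause rather than once per change.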
> > Thoughts?
> Not sure whether you intend this to be just a change to CC-mode (it does
> sound like it can all be implemented in Elisp) or you intend for some
> change at the C level.
At the Lisp level. I hadn't even considered any C enhancements.
> My gut feeling is that the checks you suggest in (iii) could be
> implemented in Elisp without losing too much performance (they should
> spend most of their time within a few C primitives), tho it depends on
> the specifics of the cases you'll want to catch. Also if you want to
> implement it in C those same specifics will need to be spelled out to
> figure out how a major mode will communicate them to the C code (for
> this to be useful beyond CC-mode, it would need to be very general, so
> it could be tricky to design).
> But to tell you the truth, other than CC-mode, I'm having a hard time
> imagining which other major mode will want to use such a thing.
> Performance of syntax-propertize is not stellar but doesn't seem
> problematic, and it is not too hard to use (its functioning is not
> exactly the same as what a real lexer would do, but you can make use of
> the language spec more or less straightforwardly), ....
Again, can syntax-propertize work on positions _before_ a buffer change?
> .... whereas I get the impression that your suggestion relies on
> properties of the language which are not often used, so are less
> familiar to the average mode implementor (and a language spec is
> unlikely to help you figure out what to do).
If other modes were to use the mechanism, they would need to define
their syntactic cell boundaries, as indeed I have yet to do for CC Mode.
> Maybe if we want to speed things up, we should consider a new parsing
> engine (instead of parse-partial-sexp and syntax-tables) based maybe on
> a DFA for the tokenizer and GLR parser on top. That might arguably be
> more generally useful and easier to use (in the sense that one can more
> or less follow the language spec when implementing the major mode).
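For a feel of what the tokenizer half of such an engine involves, here is a minimal table-driven DFA tokenizer in Python using maximal munch (the longest accepted prefix wins). The character classes, states, and token kinds are hypothetical choices for a C-like toy language; the GLR parser layer, and any incremental re-lexing after buffer changes, are not shown.

```python
def char_class(ch):
    """Map a character to the DFA's input alphabet."""
    if ch.isalpha() or ch == "_":
        return "alpha"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "space"
    return "other"


# Transition table: (state, input class) -> next state.  A missing entry
# means "no transition", which ends the current token.
TRANSITIONS = {
    ("start", "alpha"): "ident",
    ("ident", "alpha"): "ident",
    ("ident", "digit"): "ident",
    ("start", "digit"): "number",
    ("number", "digit"): "number",
    ("start", "space"): "space",
    ("space", "space"): "space",
    ("start", "other"): "punct",
}

ACCEPTING = {"ident", "number", "space", "punct"}


def tokenize(text):
    """Split TEXT into (kind, lexeme) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        state, i = "start", pos
        last_state, last_end = None, pos   # longest accepted prefix so far
        while i < len(text):
            state = TRANSITIONS.get((state, char_class(text[i])))
            if state is None:
                break
            i += 1
            if state in ACCEPTING:
                last_state, last_end = state, i
        if last_state is None:
            raise ValueError(f"no token at position {pos}")
        if last_state != "space":
            tokens.append((last_state, text[pos:last_end]))
        pos = last_end
    return tokens


if __name__ == "__main__":
    print(tokenize("int x1 = 42;"))
```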
That would be a lot of design and a lot of work, and sounds like
something from the distant rather than medium future. The indentation
and font-lock routines would have to be rewritten for each mode using
it.
> Stefan
--
Alan Mackenzie (Nuremberg, Germany).