From: Alan Mackenzie
Newsgroups: gmane.emacs.devel,gmane.emacs.cc-mode.general
Subject: Re: A possible way for CC Mode to resolve its sluggishness
Date: Sat, 27 Apr 2019 13:57:25 +0000
Message-ID: <20190427135725.GB4822@ACM>
References: <20190426193056.GC4720@ACM>
To: Stefan Monnier
Cc: cc-mode-help@lists.sourceforge.net, emacs-devel@gnu.org

Hello, Stefan.

On Fri, Apr 26, 2019 at 22:10:23 -0400, Stefan Monnier wrote:
> > The problem is that CC Mode's before/after-change-functions are very
> > general, and scan the buffer looking for situations which only arise
> > sporadically. Things like an open string getting closed, or a
> > > being inserted which needs to be checked for a template delimiter.
> > However, these expensive checks are performed for _every_ buffer
> > change. Even doing something like inserting a letter or a digit
> > causes the full range of tests to be performed. This is not good.

> Part of the problem is that CC-mode is very eager in its management of
> syntax information: the `syntax-table` text-properties are always kept
> up-to-date over the whole buffer right after every single change.

That is not part of the problem. That is part of the challenge.

> Modes using syntax-propertize work more lazily:
> before-change-functions only marks that some change occurred at
> position POS and the syntax-table properties after that position are
> only updated afterward on-demand.
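
As a rough illustration of the lazy scheme described in the quoted paragraph above, here is a toy Python model. It is not Emacs code and not how syntax-propertize is implemented: the class `LazySyntaxCache` and its methods are hypothetical names, and "syntax" is reduced to unescaped double-quoted strings. The only point it tries to show is that an edit merely lowers a "valid up to" mark, and scanning happens later, on demand.

```python
class LazySyntaxCache:
    """Toy model of lazy syntax-property maintenance (hypothetical, not Emacs)."""

    def __init__(self, text):
        self.text = text
        self.in_string = [False] * len(text)   # per-character "property"
        self.open_after = [False] * len(text)  # is a string open after char i?
        self.valid_upto = 0                    # properties before this index are trusted
        self.chars_scanned = 0                 # work counter, for illustration only

    def insert(self, pos, s):
        """Edit the text; only record that everything from POS on is stale."""
        self.text = self.text[:pos] + s + self.text[pos:]
        pad = [False] * len(s)
        self.in_string[pos:pos] = pad
        self.open_after[pos:pos] = pad
        self.valid_upto = min(self.valid_upto, pos)

    def ensure(self, end):
        """Bring properties up to date through END, resuming from the valid mark."""
        end = min(end, len(self.text))
        i = self.valid_upto
        is_open = self.open_after[i - 1] if i > 0 else False
        while i < end:
            if self.text[i] == '"':
                self.in_string[i] = True       # both quotes count as string chars
                is_open = not is_open
            else:
                self.in_string[i] = is_open
            self.open_after[i] = is_open
            self.chars_scanned += 1
            i += 1
        self.valid_upto = max(self.valid_upto, end)

    def string_p(self, pos):
        """Return True if the character at POS is part of a string."""
        self.ensure(pos + 1)
        return self.in_string[pos]
```

In this model several consecutive insertions cost no scanning at all until something asks for a property at or after the edited position. The weakness raised in the reply is also visible: `insert` can only invalidate from POS onwards, so a change whose effect reaches positions before POS has no place in this simple scheme.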

Yes, but it is somewhat unclear whether, how, and when modes using
syntax-propertize can update syntax-table properties on positions
_before_ a change. This is a prime reason for CC Mode not using this
strategy.

> CC-mode tries to make up for it by being more clever about which parts
> of the buffer after position POS actually need to be updated, but when
> there are several consecutive changes, the extra work performed
> between each one of those changes adds up quickly.

My proposal is to reduce this amount of work when it's not needed.

> [ Of course, there are cases where the approach used in
>   syntax-propertize loses big time. E.g. if you have a loop that first
>   modifies a char near point-min, then asks for the syntax-table
>   properties near point-max, and then repeats... performance will suck.
>   But luckily I haven't yet seen a real-world use case where
>   this occurs. ]

> Maybe another part of the problem is that CC-mode tries to do more than
> most other major modes: e.g. the highlighting of unclosed strings.
> For plain single-line strings this can be fairly cheap, but for
> multiline strings, keeping this information constantly up-to-date over
> the whole buffer can be costly.

CC Mode is successful in this regard. The highlighting with
warning-face of unclosed string openers is a useful feature which other
modes could emulate.

I think I suggested a little while ago that this could be done in
syntactic analysis and font-lock. We have a syntax flag saying "this
character (LF) terminates a style b comment"; we could equally well have
a flag saying it terminates a string. Then font-lock could examine the
string terminator, and use string-face or warning-face on the opener
depending on the terminating character.

But that's a digression from the topic of this thread.
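
The "examine the terminator, then pick the face for the opener" idea might be sketched like this. It is a toy Python model under stated assumptions, not CC Mode or font-lock code: the function `opener_face` is a hypothetical name, the face names are simply returned as strings, and a backslash is assumed to escape the following character (including a newline, as in a C line continuation).

```python
def opener_face(text, opener_index):
    """Return the face name for the string opener at OPENER_INDEX.

    Scans forward from the opener.  Reaching a matching unescaped quote
    means the string was properly closed; reaching a newline (or the end
    of the text) first means the line end terminated it.
    """
    quote = text[opener_index]
    i = opener_index + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2              # skip the escaped character
            continue
        if ch == quote:
            return "font-lock-string-face"
        if ch == "\n":
            return "font-lock-warning-face"
        i += 1
    return "font-lock-warning-face"   # ran off the end: never closed


# Closed string: the opener gets the ordinary string face.
print(opener_face('s = "abc";\n', 4))
# Unclosed string: the newline terminated it, so the opener is flagged.
print(opener_face('s = "abc;\nnext', 4))
```

In the proposal the scan itself would be done by the syntactic analysis, with the flag on the newline recording the outcome, so font-lock would only need to look at which character ended the string.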

> Most other major modes just let the font-lock-string-face bleed further
> than the user intended, which requires much less work and works well
> enough for all other syntactic elements (CC-mode doesn't highlight
> unclosed parens, or mismatched parens, or `do` with missing `while`,
> ...). When needed these many different kinds of errors are detected and
> shown to the user via things like flymake or LSP instead, which work
> much more lazily w.r.t. buffer changes, so they don't need the same kind
> of engineering efforts to make them fast enough.

> > Thoughts?

> Not sure whether you intend this to be just a change to CC-mode (it does
> sound like it can all be implemented in Elisp) or you intend for some
> change at the C level.

At the Lisp level. I hadn't even considered any C enhancements.

> My gut feeling is that the checks you suggest in (iii) could be
> implemented in Elisp without losing too much performance (they should
> spend most of their time within a few C primitives), tho it depends on
> the specifics of the cases you'll want to catch. Also if you want to
> implement it in C those same specifics will need to be spelled out to
> figure out how a major mode will communicate them to the C code (for
> this to be useful beyond CC-mode, it would need to be very general, so
> it could be tricky to design).

> But to tell you the truth, other than CC-mode, I'm having a hard time
> imagining which other major mode will want to use such a thing.
> Performance of syntax-propertize is not stellar but doesn't seem
> problematic, and it is not too hard to use (its functioning is not
> exactly the same as what a real lexer would do, but you can make use of
> the language spec more or less straightforwardly), ....

Again, can syntax-propertize work on positions _before_ a buffer change?

> .... whereas I get the impression that your suggestion relies on
> properties of the language which are not often used, so are less
> familiar to the average mode implementor (and a language spec is
> unlikely to help you figure out what to do).

If other modes were to use the mechanism, they would need to define
their syntactic cell boundaries, as indeed I have yet to do for CC Mode.

> Maybe if we want to speed things up, we should consider a new parsing
> engine (instead of parse-partial-sexp and syntax-tables) based maybe on
> a DFA for the tokenizer and GLR parser on top. That might arguably be
> more generally useful and easier to use (in the sense that one can more
> or less follow the language spec when implementing the major mode).

That would be a lot of design and a lot of work, and sounds like
something from the distant rather than medium future. The indentation
and font-lock routines would have to be rewritten for each mode using
it.

> Stefan

--
Alan Mackenzie (Nuremberg, Germany).