From: Stefan Monnier
Newsgroups: gmane.emacs.cc-mode.general,gmane.emacs.devel
Subject: Re: A possible way for CC Mode to resolve its sluggishness
Date: Fri, 26 Apr 2019 22:10:23 -0400
References: <20190426193056.GC4720@ACM>
To: cc-mode-help@lists.sourceforge.net
Cc: emacs-devel@gnu.org

> The problem is that CC Mode's before/after-change-functions are very
> general, and scan the buffer looking for situations which only arise
> sporadically.
> Things like an open string getting closed, or a > being
> inserted which needs to be checked for a template delimiter. However,
> these expensive checks are performed for _every_ buffer change. Even
> doing something like inserting a letter or a digit causes the full range
> of tests to be performed. This is not good.

Part of the problem is that CC-mode is very eager in its management of
syntax information: the `syntax-table` text-properties are always kept
up-to-date over the whole buffer right after every single change.
Modes using syntax-propertize work more lazily: before-change-functions
only marks that some change occurred at position POS and the
syntax-table properties after that position are only updated afterward
on-demand. CC-mode tries to make up for it by being more clever about
which parts of the buffer after position POS actually need to be
updated, but when there are several consecutive changes, the extra work
performed between each one of those changes adds up quickly.

[ Of course, there are cases where the approach used in
  syntax-propertize loses big time. E.g. if you have a loop that first
  modifies a char near point-min, then asks for the syntax-table
  properties near point-max, and then repeats... performance will suck.
  But luckily I haven't yet seen a real-world use case where this
  occurs. ]

Maybe another part of the problem is that CC-mode tries to do more than
most other major modes: e.g. the highlighting of unclosed strings.
For plain single-line strings this can be fairly cheap, but for
multiline strings, keeping this information constantly up-to-date over
the whole buffer can be costly. Most other major modes just let the
font-lock-string-face bleed further than the user intended, which
requires much less work and works well enough for all other syntactic
elements (CC-mode doesn't highlight unclosed parens, or mismatched
parens, or `do` with missing `while`, ...).
When needed, these many different kinds of errors are instead detected
and shown to the user via things like flymake or LSP, which work much
more lazily w.r.t. buffer changes, so they don't need the same kind of
engineering effort to make them fast enough.

> Thoughts?

Not sure whether you intend this to be just a change to CC-mode (it does
sound like it can all be implemented in Elisp) or you intend for some
change at the C level.

My gut feeling is that the checks you suggest in (iii) could be
implemented in Elisp without losing too much performance (they should
spend most of their time within a few C primitives), tho it depends on
the specifics of the cases you'll want to catch. Also if you want to
implement it in C those same specifics will need to be spelled out to
figure out how a major mode will communicate them to the C code (for
this to be useful beyond CC-mode, it would need to be very general, so
it could be tricky to design).

But to tell you the truth, other than CC-mode, I'm having a hard time
imagining which other major mode will want to use such a thing.
Performance of syntax-propertize is not stellar but doesn't seem
problematic, and it is not too hard to use (its functioning is not
exactly the same as what a real lexer would do, but you can make use of
the language spec more or less straightforwardly), whereas I get the
impression that your suggestion relies on properties of the language
which are not often used, so are less familiar to the average mode
implementor (and a language spec is unlikely to help you figure out what
to do).

Maybe if we want to speed things up, we should consider a new parsing
engine (instead of parse-partial-sexp and syntax-tables) based maybe on
a DFA for the tokenizer and a GLR parser on top. That might arguably be
more generally useful and easier to use (in the sense that one can more
or less follow the language spec when implementing the major mode).


        Stefan
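
[ The eager-versus-lazy trade-off described above can be illustrated
  with a toy model. This is only a sketch under loud assumptions: it is
  Python rather than Elisp, the class and function names are made up for
  illustration, "one character rescanned" stands in for real syntactic
  analysis, and the eager variant is deliberately naive (CC-mode is, as
  noted above, more clever about what it rescans). ]

```python
class EagerBuffer:
    """Toy model: refresh per-character properties right after each change."""

    def __init__(self, text):
        self.text = list(text)
        self.props = [None] * len(self.text)
        self.scanned = 0  # characters rescanned so far (our cost measure)
        self._rescan(0)

    def _rescan(self, start):
        # Stand-in for syntax analysis: one unit of work per character.
        for i in range(start, len(self.text)):
            self.props[i] = self.text[i].isalpha()
            self.scanned += 1

    def replace_char(self, pos, ch):
        self.text[pos] = ch
        self._rescan(pos)  # eager: pay for the update immediately

    def prop_at(self, pos):
        return self.props[pos]  # always up to date, so lookups are free


class LazyBuffer:
    """Toy model: only note where a change happened, catch up on demand."""

    def __init__(self, text):
        self.text = list(text)
        self.props = [None] * len(self.text)
        self.valid_upto = 0  # props[0:valid_upto] can be trusted
        self.scanned = 0

    def replace_char(self, pos, ch):
        self.text[pos] = ch
        # Everything from POS onward is now suspect; do no other work.
        self.valid_upto = min(self.valid_upto, pos)

    def prop_at(self, pos):
        # Catch up only as far as the caller actually needs.
        for i in range(self.valid_upto, pos + 1):
            self.props[i] = self.text[i].isalpha()
            self.scanned += 1
        self.valid_upto = max(self.valid_upto, pos + 1)
        return self.props[pos]


def burst(buf, edits=10):
    """Several consecutive changes near the start, then one query near the end."""
    for _ in range(edits):
        buf.replace_char(0, "x")
    buf.prop_at(len(buf.text) - 1)
    return buf.scanned


def ping_pong(buf, rounds=10):
    """Bad case for laziness: change near the start, query near the end, repeat."""
    for _ in range(rounds):
        buf.replace_char(0, "x")
        buf.prop_at(len(buf.text) - 1)
    return buf.scanned


if __name__ == "__main__":
    text = "a" * 1000
    print("burst:     eager", burst(EagerBuffer(text)), "lazy", burst(LazyBuffer(text)))
    print("ping-pong: eager", ping_pong(EagerBuffer(text)), "lazy", ping_pong(LazyBuffer(text)))
```

[ In this model a burst of ten edits costs the eager buffer eleven full
  scans (one initial, one per edit) against a single scan for the lazy
  one, while the ping-pong loop forces the lazy buffer to rescan on
  every round, so its advantage mostly disappears. ]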