From: Alan Mackenzie
Newsgroups: gmane.emacs.bugs
Subject: bug#25706: 26.0.50; Slow C file fontification
Date: Wed, 2 Dec 2020 10:15:29 +0000
To: Mattias Engdegård
Cc: Lars Ingebrigtsen, 25706@debbugs.gnu.org

Hello, Mattias.

On Tue, Dec 01, 2020 at 19:59:04 +0100, Mattias Engdegård wrote:
> On 1 Dec 2020, at 16:27, Alan Mackenzie wrote:

> > Ah.
> > ;-)  Do you think the difference might be significantly more if I
> > were systematically to expunge "\\("s from CC Mode?

> No, probably not. It's just obvious low-hanging fruit; every little
> helps some. Doing so also makes the regexps a little less mystifying
> for the reader since the only capture groups left are those actually
> used. Finally, it removes or at least raises some hard limits that we
> had in the past (from regexp stack overflow).

OK.  That's a project for ASAP, but not, then, urgent.

> > Add in yet another cache (or fix the existing cache which is buggy)
> > for whatever it is that's searching backwards for braces.

> Are the bugs in the existing cache preventing it from making the cases
> under discussion faster?

I spent yesterday evening investigating the "CC Mode state cache", i.e.
the thing that keeps track of braces and open parens/brackets.  I found
a place where it was unnecessarily causing scanning from BOB, and fixed
it provisionally.  On doing a (time-scroll) on the entire monster
buffer, it saved ~25% of the run time.  There is definitely something
else scanning repeatedly from BOB - the screen scrolling was more
sluggish near the end of the buffer than half way through.

Here's that provisional patch, if you'd like to try it:


diff -r 863d08a1858a cc-engine.el
--- a/cc-engine.el	Thu Nov 26 11:27:52 2020 +0000
+++ b/cc-engine.el	Wed Dec 02 09:55:50 2020 +0000
@@ -3672,9 +3672,9 @@
                  how-far 0))
           ((<= good-pos here)
            (setq strategy 'forward
-                 start-point (if changed-macro-start
-                                 cache-pos
-                               (max good-pos cache-pos))
+                 start-point ;; (if changed-macro-start  OLD STOUGH, 2020-12-01
+                 ;; cache-pos
+                 (max good-pos cache-pos);; )
                  how-far (- here start-point)))
           ((< (- good-pos here) (- here cache-pos)) ; FIXME!!! ; apply some sort of weighting.
            (setq strategy 'backward

(setq strategy 'backward > A naïve question: the files we are talking about are dominated by > (mostly single-line) preprocessor directives whose fontification should > be invariant of context (as long as they are not inside comments or > strings, but that's not hard to find out). Why do we then spend time > looking for context at all? Because many situations are context dependent, particularly in C++ Mode. That raises the possibility of not tracking context for these monster files.h, but how would one distinguish between these different "types" of CC Mode file? > From profiling, it seems that about 30 % of the time is spent in > c-determine-limit, called from c-fl-decl-start, > c-font-lock-enclosing-decls and c-font-lock-cut-off-declarators (about > 10 % each). Yes. c-determine-limit scans backwards over a buffer to find a position that is around N non-string non-comment characters before point. I put some instrumentation on it yesterday evening, and it is apparent that it is getting called four times in succession from the same point with N = 500, 1000, 1000, 1000. This screams out for a simple cache, which I intend to implement. Also, maybe I should always call c-determine-limit with the same N, and perhaps even cut N to 500 in all cases. Or something like that. It is clear that a great deal of run time could be saved, here. Also, I intend to track down whatever the other thing is that is scanning from the previous brace or BOB. It may be possible to alter the handling of these monster files from impossibly slow to somewhat sluggish. -- Alan Mackenzie (Nuremberg, Germany).