From: Alan Mackenzie
Newsgroups: gmane.emacs.bugs
Subject: bug#25706: 26.0.50; Slow C file fontification
Date: Thu, 10 Dec 2020 12:26:48 +0000
To: Mattias Engdegård
Cc: Lars Ingebrigtsen, 25706@debbugs.gnu.org

Hello, Mattias.

Thanks for this!

On Wed, Dec 09, 2020 at 18:00:30 +0100, Mattias Engdegård wrote:
> First, some Emacs regexp basics:
>
> 1. If A and B match single characters, then A\|B should be written
>    [AB] whenever possible. The reason is that A\|B adds a backtrack
>    record which uses stack space and wastes time if matching fails
>    later on. The cost can be quite noticeable, which we have seen.
>
> 2. Syntax-class constructs are usually better written as character
>    alternatives when possible.
>    The \sX construct, for some X, is typically somewhat slower to
>    match than explicitly listing the characters to match. For
>    example, if all you care about are space and tab, then "\\s *"
>    should be written "[ \t]*".
>
> 3. Unicode character classes are slower to match than ASCII-only
>    ones. For example, [[:alpha:]] is slower than [A-Za-z], assuming
>    only those characters are of interest.
>
> 4. [^...] will match \n unless included in the set. For example,
>    "[^a]\\|$" will almost never match the $ (end-of-line) branch,
>    because a newline will be matched by the first branch. The only
>    exception is at the very end of the buffer if it is not
>    newline-terminated, but that is rarely worth considering for
>    source code.
>
> 5. \r (carriage return) normally doesn't appear in buffers even if
>    the file uses DOS line endings. Line endings are converted into a
>    single \n (newline) when the buffer is read. In particular, $
>    does NOT match at \r, only before \n.
>    When \r appears it is usually because the file contains a mixture
>    of line-ending styles, typically from being edited using broken
>    tools. Whether you want to take such files into account is a
>    matter of judgement; most modes don't bother.
>
> 6. Capturing groups costs more than non-capturing groups, but you
>    already know that.
>
> On to specifics: here are annotations for possible improvements in
> cc-langs.el. (I didn't bother about capturing groups here.)

I think we should get around to fixing the regexps in CC Mode soon.
But I think I would rather do this as a separate exercise, since the
patch for this bug is already around 800 lines and Ravine Var, the OP,
has found further problems on a slowish machine.

In particular, some of the fixes in your patch relate to the CPP
constructs, and they might well be slowing down that regexp in
c-find-decl-spots I highlighted earlier. So I'm keen to look at this
again, once the current bug is settled.

-- 
Alan Mackenzie (Nuremberg, Germany).
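
As a footnote to the quoted basics, points 1 and 4 can be checked directly in Emacs Lisp. What follows is only a minimal sketch: the helper name my-regexp-demo-match-length is hypothetical and is not part of CC Mode or of the patch under discussion. It shows match behaviour only and does not measure the speed difference described in point 1.

```elisp
;; Illustrative only: this helper is a hypothetical name, not CC Mode code.
(defun my-regexp-demo-match-length (regexp text pos)
  "Return the length of the match of REGEXP at POS in TEXT, or nil if none."
  (with-temp-buffer
    (insert text)
    (goto-char pos)
    (when (looking-at regexp)
      (- (match-end 0) (match-beginning 0)))))

;; Point 1: "a\\|b" and "[ab]" accept the same single characters.
(my-regexp-demo-match-length "a\\|b" "b" 1)          ; => 1
(my-regexp-demo-match-length "[ab]" "b" 1)           ; => 1

;; Point 4: position 2 in "a\nb" is just before the newline.  [^a]
;; consumes that newline, so the $ branch is never tried.
(my-regexp-demo-match-length "[^a]\\|$" "a\nb" 2)    ; => 1

;; Adding \n to the excluded set lets $ match (an empty match).
(my-regexp-demo-match-length "[^a\n]\\|$" "a\nb" 2)  ; => 0
```

Evaluating the forms in the *scratch* buffer is enough to try this. A match length of 1 in the third form means the newline itself was matched, which is the pitfall described in point 4.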