From: Theodor Thornhill via "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Newsgroups: gmane.emacs.bugs
Subject: bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
Date: Sun, 20 Nov 2022 21:33:06 +0100
Message-ID: <87v8n9qscd.fsf@thornhill.no>
References: <83v8n94ij9.fsf@gnu.org> <87k03pwgf6.fsf@thornhill.no> <83h6yt4c12.fsf@gnu.org>
Reply-To: Theodor Thornhill
To: Eli Zaretskii
Cc: 59415@debbugs.gnu.org, casouri@gmail.com
In-Reply-To: <83h6yt4c12.fsf@gnu.org>

Eli Zaretskii writes:

>> From: Theodor Thornhill
>> Cc: Yuan Fu
>> Date: Sun, 20 Nov 2022 20:54:05 +0100
>>
>> > Observe that fontifications stop at this line for some reason.
>> > Fontification reappears on line 209271.  Maybe it's because of the many
>> > braces that appear in warning face?  Why does TS think there are syntax
>> > errors here?
>> > The C++ TS parser doesn't have that problem, btw.
>>
>> It seems the c parser definitely can't handle what it's seeing.
>
> Yes, but do you have any clue why it gives up at that line?
>

No, not yet.

> One thing that I see is that many braces around there are shown in warning
> face, so perhaps the parser is overwhelmed by the amount of parsing errors?
>

Yeah, that's my first guess, but that shouldn't be an issue: it should
be able to font-lock _something_.

>> > P.S. Btw, isn't the treesit-max-buffer-size limit too low?  4 MiB?
>>
>> It might be!  IIRC treesit uses 10x the buffer size to store the ast, so
>> it'll be some more memory usage.
>
> After lifting the limit to allow visiting the file, this file causes Emacs
> to go up to 350 MiB.  Which is significant, but definitely not outrageous
> enough to prevent using TS with this file.  And I'm sure "normal" C files
> (as opposed to ones written by a program) will need less memory.  So 4 MiB
> sounds too restrictive to me.  We should maybe increase that to 15 MiB on
> 32-bit systems and say 40 MiB on 64-bit?
>

I think it should probably be the same as at the C level, as I mentioned
in the other mail?

>> I'll do some more digging, but in the
>> meantime I attach this profiler report that shows font-locking as the
>> culprit:
>
> Culprit for what?  For slow performance?

Yeah.

> Don't get me wrong: from my POV, TS works here better than CC Mode, in
> many use cases which are much more important than scrolling through
> the entire humongous file top to bottom.  For example, just visiting
> the file takes 3 times as much with CC Mode as with c-ts-mode; going
> to EOB with CC Mode takes more 1 min 20 sec, whereas TS does it in 2.5
> sec.  And likewise jumping into a random point in the file.  Instead
> of Alan's 150 sec for a full scroll by CC Mode I get 27 min.  The
> number of GC cycles with CC Mode is 10 times as large as with TS.
> (Caveat: my Emacs is built without optimizations, whereas Tree-sitter
> and the language support libraries are, of course, fully optimized.)
>

Ok, that's good to know!

>> In this profile I followed your repro, and did some more movement around
>> the buffer after.  This isn't from emacs -Q, but I believe the results
>> will be just the same, considering where the slowness seems to be
>>
>>
>>        16695  85% - redisplay_internal (C function)
>>        16695  85%  - jit-lock-function
>>        16695  85%   - jit-lock-fontify-now
>>        16695  85%    - jit-lock--run-functions
>>        16695  85%     - run-hook-wrapped
>>        16695  85%      - #
>>        16695  85%       - font-lock-fontify-region
>>        16695  85%        - font-lock-default-fontify-region
>>        16679  84%         - treesit-font-lock-fontify-region
>
> Yes, treesit-font-lock-fontify-region takes the lion's share.  If you or
> Yuan can speed this up, please do.  But I see no reason to consider this a
> catastrophe, quite to the contrary.

I think it boils down to getting the root node too many times.  In an
unmodified buffer, getting the root node should be instant, but it
seems to take some time.  I'll try to figure out why.

Theo
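
A minimal sketch for checking the root-node hypothesis above, assuming an
Emacs 29 build with tree-sitter support and a buffer that already has a
parser (e.g. one in c-ts-mode).  The helper name `my/time-root-node' is
hypothetical and not part of Emacs; `benchmark-run',
`treesit-parser-list' and `treesit-buffer-root-node' are the existing
functions it relies on.  If root-node retrieval really is cheap in an
unmodified buffer, the elapsed time reported here should be negligible
compared with the font-lock samples in the profile.

```elisp
;; Sketch only: `my/time-root-node' is a hypothetical helper name.
(require 'benchmark)

(defun my/time-root-node ()
  "Time 100 calls to `treesit-buffer-root-node' in the current buffer.
Return a list (ELAPSED-SECONDS GC-COUNT GC-SECONDS), as `benchmark-run' does."
  (interactive)
  ;; Refuse to run when this Emacs has no tree-sitter support at all.
  (unless (and (fboundp 'treesit-available-p) (treesit-available-p))
    (user-error "This Emacs was built without tree-sitter support"))
  ;; The buffer needs at least one parser, e.g. created by c-ts-mode.
  (unless (treesit-parser-list)
    (user-error "No tree-sitter parser in this buffer"))
  (let ((result (benchmark-run 100 (treesit-buffer-root-node))))
    (message "100 root-node lookups: %.4fs (%d GCs, %.4fs in GC)"
             (nth 0 result) (nth 1 result) (nth 2 result))
    result))
```

Run it with M-x my/time-root-node once right after visiting the file and
once after an edit; the difference between the two runs is only a rough
hint at whether reparsing, rather than font-locking itself, dominates.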