From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Lynn Winebarger Newsgroups: gmane.emacs.devel Subject: Re: New optimisations for long raw strings in C++ Mode. Date: Fri, 12 Aug 2022 09:05:06 -0400 Message-ID: References: <87fsi5xw9l.fsf@gnus.org> <83wnbhtlzb.fsf@gnu.org> <703c2351d96919276449@heytings.org> <83o7wsqlcm.fsf@gnu.org> <83edxoqcnl.fsf@gnu.org> <83a68cqbm0.fsf@gnu.org> <834jykq9m6.fsf@gnu.org> Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="0000000000001a375d05e60aef42" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="21373"; mail-complaints-to="usenet@ciao.gmane.io" Cc: Alan Mackenzie , gregory@heytings.org, Lars Ingebrigtsen , emacs-devel To: Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Fri Aug 12 15:08:16 2022 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1oMUOe-0005RP-G2 for ged-emacs-devel@m.gmane-mx.org; Fri, 12 Aug 2022 15:08:16 +0200 Original-Received: from localhost ([::1]:57240 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oMUOd-00089T-Ik for ged-emacs-devel@m.gmane-mx.org; Fri, 12 Aug 2022 09:08:15 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:51910) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oMULt-0006u3-IG for emacs-devel@gnu.org; Fri, 12 Aug 2022 09:05:33 -0400 Original-Received: from mail-pl1-x630.google.com ([2607:f8b0:4864:20::630]:37676) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1oMULr-0000eS-Ay; Fri, 12 Aug 2022 09:05:25 -0400 Original-Received: by mail-pl1-x630.google.com with SMTP id m2so717167pls.4; Fri, 12 Aug 2022 06:05:21 
-0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc; bh=C1HV7OWYHrxdH9ZyBfEfYx5OKvddpCXkuAgEQX89GlU=; b=kNUCl5xzbY3xXeQGJjQ5OMWRUk3dXWjTregrytznqfvthHIS5F9+KpdnpF1fi7ovgD crXxoyk9w3zyp8UM0m8NAOgPTZBxNr6QHuTbI4XR5J4N5dFYA9WrvABwYjgSYisaEMWu kGn1qW3b7G8uvxl4v69/VTqkUe4OFXXEU2VaXxWYRBMghVRWE7hgFeb9CirCkRRVClPm kbOMf1zOHz0ts9S20LIUXwh2xxrCYVxsF8rlvqkWzJdRitpHeKBbtgdo6hBAUUT5ww72 G5l4f/Wc6eqMDxbEAYdFtY6j7BSgq0p2JPNa9db8Oynw3RP4L5DF4NbJxmwspDy/wAAu cEbQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc; bh=C1HV7OWYHrxdH9ZyBfEfYx5OKvddpCXkuAgEQX89GlU=; b=ttSaFswg246izuPJk+qZgBR8l/JXUilPXPyc0KTt+EVjJY36RpDKvFG9wrbbjbqY2O 5L+eXtL+ltob7xM8JYBWvt8gM3X1A589kACJNfRz9Ni2JLFg2FWNgBSKlNle36Em7/3S oqHoKXRlSENNrFngKxCmzw1AlrZaV2nMSTSYvKq6hxfyAOAPtZwNvnCPxc1y8K7dFr+I PSFtArJtNBDaR1fHwt2mlOWtxL0YmfVtA9iKCQZYSsDiUfFt7/lECObTwVu1FJYXDxan 0WqN9rhere0L1IfaNqecXzfPVGpJgX6gdz8r6v50NzTAPyr0UhfIpPhdzfD77nICKuf3 gkxQ== X-Gm-Message-State: ACgBeo3Mi7ylEYlHHwhimzPs6S9dJ/rRaHduDIGF2gIyCaOLyCCjEBo0 HHxdYjQWtNAm6RygEcwdiX4WDieMYrCw0bumlsSg/tda X-Google-Smtp-Source: AA6agR7rKFZeacIWcggJvsG+ZANUdaEDfyrMWt5Asv7Mha0Y9AFBUQOkcnaLXpbFK4iEsRFFuX+8CUFdVD5FalC7VkA= X-Received: by 2002:a17:903:50e:b0:170:d829:b3bb with SMTP id jn14-20020a170903050e00b00170d829b3bbmr3882719plb.93.1660309519085; Fri, 12 Aug 2022 06:05:19 -0700 (PDT) In-Reply-To: <834jykq9m6.fsf@gnu.org> Received-SPF: pass client-ip=2607:f8b0:4864:20::630; envelope-from=owinebar@gmail.com; helo=mail-pl1-x630.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, 
SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:293386 Archived-At: --0000000000001a375d05e60aef42 Content-Type: text/plain; charset="UTF-8"

On Wed, Aug 10, 2022, 1:44 PM Eli Zaretskii wrote:

>
> Really? Then please tell me how is it that we the humans can detect
> incorrect fontifications even when shown partial strings and comments?
> We know that fontifications are incorrect, and where strings or
> comments start or end immediately, just after a single glance. We
> never need to go to BOB to find that out.

Serious question: is fontification intended to display text according to what the author probably intended, or according to how a compiler will process that text (leaving correctness to a more precise tool than font-lock, whether Semantic, tree-sitter, LSP, whatever)?
Because I can definitely write code that has some subtle issue that I will miss, and erroneously think should display one way but which would be processed in a different way. Should fontification show my likely intention (plus, and only for bonus points, possibly highlight the error that disconnects the likely intended from the actual parse), or should it display according to the way the tools will interpret it so the author will find errors that way?

When I use a dedicated IDE of recent vintage, it feels less like I am writing a stream of characters than filling in partially constructed objects representing the abstract syntax of the language I'm writing in (with grammar that has allowances for incomplete or erroneous constructs), with the text being displayed as a representation of the underlying object.
IOW, the relationship of the syntactic object and the text is inverted compared to emacs's design, where (if I understand correctly) the properties of the syntactic object are only tied to the text through text properties. With the other approach, the fontification and the syntax object are tied together, but with emacs the relationship seems much more tenuous. E.g. completion and fontification are completely separate activities as far as I know, though the same contextual information should be useful for both activities.

I have this CC-mode derived mode for a DSL I did not design. I'm currently the sole user of the mode, so I just wanted something quick and dirty. But as the pile of code I deal with in this DSL grows, I want to put in Semantic support for it to get context-aware completion, precise fontification, etc. The current discussion has made me wonder if deriving from CC mode is having some non-obvious effects on how font-lock works, making it non-local in ways that are not necessary, so the re-entrant nature of the Semantic parsers won't cure some of the slowness. For example, I want to use the font-lock of that mode in the REPL to fontify the statements/expressions I enter at the prompt, but otherwise ignore text. Particularly, at the end and the beginning of the REPL buffer. I don't want to narrow the buffer, just the area fontification applies to. Fontifying hundreds of megabytes of tracing print statements is not just unnecessary, it's bad news for the GC even after the buffer is cleared IME.

If CC mode is determining more syntactic information than tree-sitter's incremental parsing provides (per Immanuel Lizroth's comment in this thread), then there is a disconnect somewhere in the scope of expectations for what font-lock is supposed to do. I'm certainly not clear (yet) on how to cleanly separate and then rejoin a proper syntactic analysis with fontification, and if there is "an Emacs way" to do it.
Lynn

--0000000000001a375d05e60aef42--