From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Lynn Winebarger <owinebar@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: New optimisations for long raw strings in C++ Mode.
Date: Fri, 12 Aug 2022 09:05:06 -0400
Message-ID: <CAM=F=bCCPL3n_F=xcVtPQoxiDwC9hwKQhsOSM4H0CcpX1oJFmg@mail.gmail.com>
References: <87fsi5xw9l.fsf@gnus.org> <YvKF1LhBlcCP/LCd@ACM>
 <83wnbhtlzb.fsf@gnu.org> <703c2351d96919276449@heytings.org>
 <YvLVAHO4RlLPZ9Mj@ACM> <83o7wsqlcm.fsf@gnu.org> <YvPbfzIcL7ibpAa/@ACM>
 <83edxoqcnl.fsf@gnu.org> <YvPh41Fc/2+x2dj7@ACM> <83a68cqbm0.fsf@gnu.org>
 <YvPrvna9FdJoCt10@ACM> <834jykq9m6.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="0000000000001a375d05e60aef42"
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="21373"; mail-complaints-to="usenet@ciao.gmane.io"
Cc: Alan Mackenzie <acm@muc.de>, gregory@heytings.org,
 Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Fri Aug 12 15:08:16 2022
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1oMUOe-0005RP-G2
	for ged-emacs-devel@m.gmane-mx.org; Fri, 12 Aug 2022 15:08:16 +0200
Original-Received: from localhost ([::1]:57240 helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1oMUOd-00089T-Ik
	for ged-emacs-devel@m.gmane-mx.org; Fri, 12 Aug 2022 09:08:15 -0400
Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:51910)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <owinebar@gmail.com>)
 id 1oMULt-0006u3-IG
 for emacs-devel@gnu.org; Fri, 12 Aug 2022 09:05:33 -0400
Original-Received: from mail-pl1-x630.google.com ([2607:f8b0:4864:20::630]:37676)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.90_1) (envelope-from <owinebar@gmail.com>)
 id 1oMULr-0000eS-Ay; Fri, 12 Aug 2022 09:05:25 -0400
Original-Received: by mail-pl1-x630.google.com with SMTP id m2so717167pls.4;
 Fri, 12 Aug 2022 06:05:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:from:to:cc;
 bh=C1HV7OWYHrxdH9ZyBfEfYx5OKvddpCXkuAgEQX89GlU=;
 b=kNUCl5xzbY3xXeQGJjQ5OMWRUk3dXWjTregrytznqfvthHIS5F9+KpdnpF1fi7ovgD
 crXxoyk9w3zyp8UM0m8NAOgPTZBxNr6QHuTbI4XR5J4N5dFYA9WrvABwYjgSYisaEMWu
 kGn1qW3b7G8uvxl4v69/VTqkUe4OFXXEU2VaXxWYRBMghVRWE7hgFeb9CirCkRRVClPm
 kbOMf1zOHz0ts9S20LIUXwh2xxrCYVxsF8rlvqkWzJdRitpHeKBbtgdo6hBAUUT5ww72
 G5l4f/Wc6eqMDxbEAYdFtY6j7BSgq0p2JPNa9db8Oynw3RP4L5DF4NbJxmwspDy/wAAu
 cEbQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:x-gm-message-state:from:to:cc;
 bh=C1HV7OWYHrxdH9ZyBfEfYx5OKvddpCXkuAgEQX89GlU=;
 b=ttSaFswg246izuPJk+qZgBR8l/JXUilPXPyc0KTt+EVjJY36RpDKvFG9wrbbjbqY2O
 5L+eXtL+ltob7xM8JYBWvt8gM3X1A589kACJNfRz9Ni2JLFg2FWNgBSKlNle36Em7/3S
 oqHoKXRlSENNrFngKxCmzw1AlrZaV2nMSTSYvKq6hxfyAOAPtZwNvnCPxc1y8K7dFr+I
 PSFtArJtNBDaR1fHwt2mlOWtxL0YmfVtA9iKCQZYSsDiUfFt7/lECObTwVu1FJYXDxan
 0WqN9rhere0L1IfaNqecXzfPVGpJgX6gdz8r6v50NzTAPyr0UhfIpPhdzfD77nICKuf3
 gkxQ==
X-Gm-Message-State: ACgBeo3Mi7ylEYlHHwhimzPs6S9dJ/rRaHduDIGF2gIyCaOLyCCjEBo0
 HHxdYjQWtNAm6RygEcwdiX4WDieMYrCw0bumlsSg/tda
X-Google-Smtp-Source: AA6agR7rKFZeacIWcggJvsG+ZANUdaEDfyrMWt5Asv7Mha0Y9AFBUQOkcnaLXpbFK4iEsRFFuX+8CUFdVD5FalC7VkA=
X-Received: by 2002:a17:903:50e:b0:170:d829:b3bb with SMTP id
 jn14-20020a170903050e00b00170d829b3bbmr3882719plb.93.1660309519085; Fri, 12
 Aug 2022 06:05:19 -0700 (PDT)
In-Reply-To: <834jykq9m6.fsf@gnu.org>
Received-SPF: pass client-ip=2607:f8b0:4864:20::630;
 envelope-from=owinebar@gmail.com; helo=mail-pl1-x630.google.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001,
 HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Original-Sender: "Emacs-devel"
 <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Xref: news.gmane.io gmane.emacs.devel:293386
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/293386>

--0000000000001a375d05e60aef42
Content-Type: text/plain; charset="UTF-8"

On Wed, Aug 10, 2022, 1:44 PM Eli Zaretskii <eliz@gnu.org> wrote:

>
> Really?  Then please tell me how is it that we the humans can detect
> incorrect fontifications even when shown partial strings and comments?
> We know that fontifications are incorrect, and where strings or
> comments start or end immediately, just after a single glance.  We
> never need to go to BOB to find that out.


Serious question: is fontification intended to display text according to
what the author probably intended, or according to how a compiler will
process that text (leaving correctness to a more precise tool than
font-lock, whether Semantic, tree-sitter, LSP, whatever)?
I ask because I can definitely write code that has some subtle issue that I
will miss, and that I erroneously think should display one way but which
would be processed in a different way.  Should fontification show my likely
intention (plus, and only for bonus points, possibly highlight the error
that disconnects the likely intended parse from the actual parse), or should
it display according to the way the tools will interpret it so the author
will find errors that way?

When I use a dedicated IDE of recent vintage, it feels less like I am
writing a stream of characters than filling in partially constructed
objects representing the abstract syntax of the language I'm writing in
(with a grammar that has allowances for incomplete or erroneous constructs),
with the text being displayed as a representation of the underlying
object.  IOW, the relationship of the syntactic object and the text is
inverted compared to Emacs's design, where (if I understand correctly) the
properties of the syntactic object are only tied to the text through text
properties.  With the other approach, the fontification and the syntax
object are tied together, but with Emacs the relationship seems much more
tenuous.  E.g. completion and fontification are completely separate
activities as far as I know, though the same contextual information should
be useful for both activities.

I have this CC-mode derived mode for a DSL I did not design.  I'm currently
the sole user of the mode, so I just wanted something quick and dirty.  But
as the pile of code I deal with in this DSL grows, I want to put in
Semantic support for it to get context-aware completion, precise
fontification, etc.  The current discussion has made me wonder if deriving
from CC mode is having some non-obvious effects on how font-lock works,
making it non-local in ways that are not necessary, so the re-entrant
nature of the Semantic parsers won't cure some of the slowness.  For
example, I want to use the font-lock of that mode in the REPL to fontify
the statements/expressions I enter at the prompt, but ignore all other
text, particularly at the end and the beginning of the REPL buffer.  I
don't want to narrow the buffer, just the area fontification applies to.
Fontifying hundreds of megabytes of tracing print statements is not just
unnecessary, it's bad news for the GC even after the buffer is cleared, IME.

If CC mode is determining more syntactic information than tree-sitter's
incremental parsing provides (per Immanuel Litzroth's comment in this
thread), then there is a disconnect somewhere in the scope of expectations
for what font-lock is supposed to do.  I'm certainly not clear (yet) on how
to cleanly separate and then rejoin a proper syntactic analysis with
fontification, or on whether there is "an Emacs way" to do it.

Lynn

--0000000000001a375d05e60aef42--