From: JD Smith
Newsgroups: gmane.emacs.bugs
Subject: bug#71345: Feature: unleash font-lock's secret weapon; handle Qfontified = non-nil
Date: Tue, 4 Jun 2024 11:38:05 -0400
Message-ID: <798B70AF-69BD-479E-992E-5CE9B4924820@gmail.com>
References: <8A929E16-AF10-4D2B-AD71-AEAD4435F016@gmail.com> <1F2B8726-7594-494F-AB9D-08C48B7BCC43@gmail.com>
To: Stefan Monnier
Cc: dmitry@gutov.dev, 71345@debbugs.gnu.org

> On Jun 4, 2024, at 10:15 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
>>>> That starts to sound like a lot of property slinging, which might even
>>>> dominate the work done.
>>> Indeed, this amount of work could become significant.  It's my main
>>> worry, but I don't have a clear feel for how serious it would be
>>> in practice.
>> In my situation, the most likely scenario is that fontified=nil is noticed
>> during redisplay when there is a fairly large stretch of already-fontified
>> property having the same value.  So jit-lock-fontify-now will quickly find
>> a nice large chunk to call my FONTIFICATION-FUNCTION=F-F with.
>>
>> Since jit-lock-after-change will likely clear away already-fontified and set
>> fontified=nil, a single additional F-F on top of jit-lock-function will
>> probably be very well handled.  A good question is how it would scale with
>> more functions all operating in the same region.  One idea is to rig up
>> a test file, do some fake jit-lock-flushing on it, and check performance of
>> just subtracting/searching/dividing the already-fontified property as you
>> add more (fake) F-F's.  For me, jit-lock-fontify-now of a 2500 char chunk
>> in a heavy treesitter buffer is in the 2-5ms range.  Individual F-F's could
>> be much lighter weight.
>
> I must say that I can't follow you.  I suspect we're not talking about
> quite the same thing.  Could you clarify what is the costs you imagine
> could be significant?  What you compare it to?

Apologies for the lack of clarity.  Here I was revisiting the notion that "this amount of work could become significant."  I was trying to convey that the costs of (i) applying the proposed jit-lock-already-fontified property (with subtraction, as in your original idea), and (ii) parsing it into regions in jit-lock-fontify-now might in fact be fairly minimal for my situation, where my situation = font-lock-fontify-region + my-special-fontify-region.

In other words, in many cases there would in fact be little property-management work.  This leads naturally to considering more complicated cases, with several additional fontification functions all interoperating.  There the property work will grow quickly (though I also outlined some ideas to keep it under control, which have probably already occurred to you).
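
To make step (ii) concrete, here is a minimal sketch of parsing a list-valued property into the regions a given backend still has to handle.  It is not jit-lock's code: the property name `my-already-fontified' stands in for the proposed jit-lock-already-fontified, and the function name is hypothetical.

```elisp
;; Sketch only.  `my-already-fontified' is an assumed property whose value
;; is a list of the backends that have already run over that stretch.

(defun my-pending-regions (backend beg end)
  "Return a list of (START . STOP) in BEG..END where BACKEND has not run.
A stretch counts as done when BACKEND is `memq' of the
`my-already-fontified' property found there."
  (let ((pos beg)
        (regions nil))
    (while (< pos end)
      ;; Jump to the end of the current run of constant property value.
      (let ((next (next-single-property-change
                   pos 'my-already-fontified nil end)))
        (unless (memq backend (get-text-property pos 'my-already-fontified))
          ;; Merge with the previous pending region when they touch.
          (if (and regions (= (cdar regions) pos))
              (setcdr (car regions) next)
            (push (cons pos next) regions)))
        (setq pos next)))
    (nreverse regions)))
```

With one large run of constant value, the loop body executes once, which is the "fairly minimal" case described above; the cost grows with the number of distinct runs in the chunk.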

> You seem to be comparing "a single big jit-lock backend" vs "several
> jit-lock backends", which is a completely different worry from mine.

This is indeed the implicit comparison I'm making:

  1. the current situation of a single big backend which redoes EVERYTHING as potentially large regions are invalidated, with much of its work done unnecessarily, vs.
  2. multiple backends used for more targeted & orthogonal updates, at the cost of additional property management in jit-lock.

As long as the additional property management costs are well below the savings you reap from not repeating the unnecessary work, this would be a positive outcome.  The 2-5ms I mention is the cost for me of running "one large backend" over one chunk, namely font-lock-fontify-region with treesitter backing.  In my scenario of bar updates resulting from point motion, this represents purely wasted work.  So if the additional "property management" costs per chunk are, say, 100x below that (roughly 20-50 microseconds), you are safely in "well worth it" territory.
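
One hedged way to check that ratio is to time both pieces over the same chunk.  The helper below is a hypothetical sketch built on `benchmark-run'; its name and the default of 20 repetitions are my own choices.

```elisp
(require 'benchmark)

(defun my-time-per-call (fn beg end &optional reps)
  "Return the average seconds per call of FN on BEG..END.
FN is called with BEG and END, REPS times (default 20)."
  (let* ((reps (or reps 20))
         ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
         (result (benchmark-run reps (funcall fn beg end))))
    (/ (car result) (float reps))))

;; Possible use, in a buffer with font-lock enabled: compare
;;   (my-time-per-call #'font-lock-fontify-region beg end)
;; against the same call on a function that only does the property
;; bookkeeping for that chunk.
```

The elapsed time includes any garbage collection triggered during the runs, so small differences between two measurements should not be over-interpreted.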

> Splitting a backend into several backends comes with many more issues
> (such as the issue of fighting over which one controls which properties,
> or removing internal dependencies such that none of them needs to look
> at the properties set by the others, ...) but that seems largely
> orthogonal to the question at hand: if you want to be able to refresh
> the position-dependent highlighting separately from the rest of the
> highlighting you need that position-dependent highlighting to be
> independent anyway (e.g. you need to be able to remove it without
> affecting the position-independent highlighting).

Agreed that could be an issue.  In practice, keyword-based fontification can lead to these same sorts of conflicts for non-trivial FACE forms too.  So backends would need to ensure the changes they make in the buffer are interoperable with the other likely backends (in particular font-lock).

This also raises the question of what should happen after a change.  In my view, a change should wipe the slate fully clean in the changed region.  This means other backends would still need to add any unusual properties they apply to font-lock-extra-managed-props (or do the equivalent on their own during unfontify).  And the order of backend registration would be significant, with the last one having "the final word".  Context re-fontification is a special case of this: some backends could ignore it, others would need to be re-run, something they'd have to decide for themselves.
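
As a rough illustration of a backend that cleans up after itself, here is a sketch using the existing `jit-lock-register' and `font-lock-extra-managed-props'.  The `my-bar' property and both function names are hypothetical, and the body that applies the property is a placeholder for real work.

```elisp
(defun my-bar-fontify (beg end)
  "Hypothetical extra backend: clear, then reapply, its own `my-bar' property."
  (with-silent-modifications
    ;; Remove only what this backend owns, leaving font-lock's faces alone.
    (remove-list-of-text-properties beg end '(my-bar))
    ;; Placeholder for the real, position-dependent work.
    (put-text-property beg end 'my-bar t)))

(defun my-bar-setup ()
  "Register the hypothetical backend in the current buffer."
  ;; Let font-lock's unfontify step strip `my-bar' as well, so that
  ;; refontification after a change starts from a clean slate.
  (setq-local font-lock-extra-managed-props
              (cons 'my-bar (remq 'my-bar font-lock-extra-managed-props)))
  (jit-lock-register #'my-bar-fontify))
```

This only shows the property ownership side; how registration order would translate into priority between backends is exactly the open question raised above.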

>> But things like `text-property-any' will be quickly defeated by the
>> combinatorics of a large F-F set.
>
> `text-property-any` only tests `eq`ness so it works just as quickly with
> a property made up of a million-element list as with a property made of
> a boolean.
>
> IOW, I again can't follow you.

I was referring to the number of such lists, not the speed of testing them.  Imagine the following scenario: four different backends, F (for normal font-lock), A, B, and C, all operate over the same region.  As various invalidation events occur and backends call jit-lock-flush, a given region of text may accumulate a patchwork of already-fontified lists (here assuming F always wipes the slate clean as it works, and therefore always appears on the already-fontified list):

'(F) '(F A) '(F B) '(F C) '(F A B) '(F A C) '(F B C) '(F A B C)

So jit-lock-fontify-now's job has gotten quite challenging, as it decides over what region to apply a particular backend, say A.  To know whether it can skip A, it must either look inside all the lists to see if there's an A, or it must look for lists `eq` to all possible combinations which contain A (4 of the 8 lists above, and 2^(N-1) in general for N backends besides F).
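
The second option can be sketched as follows.  This is an illustration only: `my-already-fontified' is an assumed property name, and KNOWN-LISTS is a hypothetical table of the shared list objects actually stored in the buffer (an `equal' but freshly consed list would never match, since `text-property-any' compares with `eq').

```elisp
(defun my-first-run-with (backend known-lists beg end)
  "Return the first position in BEG..END already covering BACKEND, or nil.
KNOWN-LISTS holds the list objects used as `my-already-fontified'
values.  One `text-property-any' scan is needed for each of those
lists that contains BACKEND."
  (let ((best nil))
    (dolist (l known-lists best)
      (when (memq backend l)
        (let ((pos (text-property-any beg end 'my-already-fontified l)))
          (when (and pos (or (null best) (< pos best)))
            (setq best pos)))))))
```

Each individual scan is fast, as you say; the point is that the number of scans per query grows with the number of combinations.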

It's possible you've already conceived of this and have a solution in mind; apologies if so.  My simple solution to this was to let a set of per-backend properties, rather than the value of one list-valued property, constitute the record of already-done/pending backends.  Then it's much easier to ask "is A already fontified everywhere in this block?"

>> So here's an idea.  You could invert the logic, and have a set of
>> `fontified-pending' properties which jit-lock-flush adds to as it sets
>> fontified=nil,
>
> Yes, of course, we could use the complement set.

The distinct idea here was to map each backend to an individual property, in place of a single property holding a list of already-done or pending backends, with the aim of significantly reducing property management costs.  That's really just an implementation detail though.
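
A minimal sketch of the one-property-per-backend idea, in its "already done" form.  Everything here is hypothetical: the `my-fontified-NAME' naming scheme and all four function names are assumptions for illustration, not anything jit-lock provides.

```elisp
(defun my-backend-prop (backend)
  "Return the hypothetical per-backend property symbol for BACKEND."
  (intern (format "my-fontified-%s" backend)))

(defun my-mark-done (backend beg end)
  "Record that BACKEND has fontified BEG..END."
  (with-silent-modifications
    (put-text-property beg end (my-backend-prop backend) t)))

(defun my-flush-backend (backend beg end)
  "Invalidate only BACKEND on BEG..END and request refontification."
  (with-silent-modifications
    (remove-list-of-text-properties beg end (list (my-backend-prop backend)))
    ;; A nil `fontified' property is what makes redisplay ask for work.
    (put-text-property beg end 'fontified nil)))

(defun my-backend-done-everywhere-p (backend beg end)
  "Non-nil if BACKEND is marked done on every character of BEG..END."
  (null (text-property-not-all beg end (my-backend-prop backend) t)))
```

The "is A already fontified everywhere in this block?" question then becomes a single `text-property-not-all' scan, independent of what the other backends have or have not done.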

I think your concern about backend priority and the related issue of how after-change and contextual refontification are handled are probably more important to sort out.