* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-06 20:13 Major modes using `widen' is a good, even essential, programming practice Alan Mackenzie
@ 2022-08-06 21:05 ` Stefan Monnier
2022-08-07 6:03 ` Eli Zaretskii
` (2 subsequent siblings)
3 siblings, 0 replies; 136+ messages in thread
From: Stefan Monnier @ 2022-08-06 21:05 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel, Gregory Heytings
> Narrowing is primarily a user feature. Users can arbitrarily narrow a
> buffer to ANY contiguous region of text. So when a major mode needs to
> examine text even slightly distant from point, it MUST widen, to be sure
> that the text to be examined is within the visible region.
Agreed.
> Also, in font locking, a major mode might need to examine text
> arbitrarily far from the fontification region.
Agreed.
But the two cases are quite different: the font-lock case goes through
font-lock, which knows very well that the whole buffer is needed, so *it*
widens, making it unnecessary for the major mode to do so.
The same holds for `indent-line-function` where the generic code widens
before calling that function.
In both cases, it's then very important that the major mode does not
redundantly widen on its own because it's not just redundant but it
breaks those cases where widening is wrong (most commonly because of
something like MMM-mode).
So, yes, calling widen is normal for a command provided by a major mode.
But it's almost always a bug for a major mode function called by
font-lock or by the indentation code.
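A minimal sketch of that distinction, assuming a hypothetical `my-mode'
(both function names below are made up for illustration): the helper
meant for a command widens temporarily, while the indentation function
trusts whatever restriction its caller has set up.

```elisp
;; Hypothetical names throughout; `my-mode' is not a real mode.

(defun my-mode-enclosing-open-paren ()
  "Return the position of the innermost enclosing open paren, or nil.
Meant to be called from an interactive command, so it widens
temporarily and restores the user's narrowing afterwards."
  (save-restriction
    (widen)
    (nth 1 (syntax-ppss))))

(defun my-mode-indent-line ()
  "Indent the current line by two columns per level of paren nesting.
Meant for `indent-line-function': it deliberately does not call
`widen', because the generic indentation code has already set up
the restriction it wants."
  (let ((depth (save-excursion
                 (car (syntax-ppss (line-beginning-position))))))
    (indent-line-to (* 2 (max depth 0)))))
```

With something like MMM-mode, the caller's restriction may cover only one
sub-language chunk, which is why the second function leaves it alone.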
Stefan
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-06 20:13 Major modes using `widen' is a good, even essential, programming practice Alan Mackenzie
2022-08-06 21:05 ` Stefan Monnier
@ 2022-08-07 6:03 ` Eli Zaretskii
2022-08-07 13:31 ` Gregory Heytings
2022-08-07 17:57 ` Dmitry Gutov
3 siblings, 0 replies; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-07 6:03 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel, gregory
> Date: Sat, 6 Aug 2022 20:13:13 +0000
> From: Alan Mackenzie <acm@muc.de>
>
> Narrowing is primarily a user feature. Users can arbitrarily narrow a
> buffer to ANY contiguous region of text.
This feature doesn't interfere with user-level narrowing.
> So when a major mode needs to examine text even slightly distant
> from point, it MUST widen
No, it doesn't. It shouldn't. If the buffer is narrowed, the
portions outside the narrowing don't exist, period. That's how most
of Emacs behaves, starting from display and ending with navigation and
search commands. With display, for example, narrowing a buffer in a
certain way can completely disrupt how bidirectional text is shown,
and yet we intentionally do nothing to avoid that. Neither should any
major mode.
A mode is not exempt from obeying the narrowing. It cannot claim that
it "knows better". If some features of the mode work in suboptimal
fashion in the presence of narrowing, the mode has no business
"fixing" that -- it's what whoever did the narrowing wanted, so that
is what they should get.
> Also, in font locking, a major mode might need to examine text
> arbitrarily far from the fontification region.
It shouldn't. The Emacs display code specifically asks the mode to
fontify only a relatively small chunk of text, and it does so for a
very good reason: performance, especially in large buffers.
That CC Mode unilaterally decided not to be bound by these
considerations is IMNSHO a very unfortunate decision, to say the least,
that causes us and me personally (as someone who works on C and C++
code a lot) a lot of grief. Editing C/C++ code is lamentably becoming
more and more slow and sluggish with each new Emacs release, even on
moderately fast machines. Significantly, it doesn't become more
stable, as bugs keep being reported.
As I've said many times, if this is the direction you think CC Mode
should be developed, I'd prefer less accurate fontification that would
give us simpler, faster, less buggy, and more easily maintained code.
My hope is that tree-sitter integration will perhaps solve this for
good, because frankly, I'm sorry to say that I've lost all hope that
CC Mode by itself will ever be fixed to perform better.
> For example, to know whether or not a particular place is inside a
> comment or a string, or to know whether to fontify a string opener
> with f-l-warning-face (when the string is unterminated) or
> f-l-string-face (for a validly terminated string).
If a string begins outside of the narrowing, it is not a string,
period.
> There's no "weren't supposed to do" involved anywhere. Major modes,
> like all Emacs Lisp programs, have, by design, access to the full range
> of Emacs's facilities.
No, they don't, they shouldn't. You have it backwards: the design of
Emacs in the presence of narrowing is to behave as if the text outside
of the restriction didn't exist.
> I've recently spent many hours reading the discussion for bug #56682 and
> its predecessor bug, and if I recall correctly, you have put the
> `widen'/ `narrow-to-region' "bug" into other hooks called from
> redisplay. If I'm right, there, was there any specific reason for this?
> Like, did it dramatically speed anything up (and I'm not talking about a
> mere 10%, say, here)?
The speedups are dramatic. You can easily try that yourself; the
recipe for doing that was described here many times.
> Several times in the thread when other posters have complained about the
> new long lines facility, you have invited them to "set
> long-line-threshold to nil". That doesn't allow them to use the other
> benefits of the facility whilst retaining fully functional `widen',
> `narrow-to-region' and font-locking. Would it be possible, please, to
> add an option to enable such a mix of features?
Which "facility" are you talking about, specifically? The changes
related to long lines involve locked narrowing in several places; they
don't involve anything else.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-06 20:13 Major modes using `widen' is a good, even essential, programming practice Alan Mackenzie
2022-08-06 21:05 ` Stefan Monnier
2022-08-07 6:03 ` Eli Zaretskii
@ 2022-08-07 13:31 ` Gregory Heytings
2022-08-07 14:13 ` Alan Mackenzie
2022-08-07 17:57 ` Dmitry Gutov
3 siblings, 1 reply; 136+ messages in thread
From: Gregory Heytings @ 2022-08-07 13:31 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel
>
> Major modes, like all Emacs Lisp programs, have, by design, access to
> the full range of Emacs's facilities. Should it make your task more
> difficult (and I don't see that it does), that's your problem, not the
> major modes' maintainers'.
>
I fear it is not my problem, no.
jit-lock-functions is an API, with a contract. CC Mode decided to break
that contract. It is wrong that "arbitrary Lisp can be executed through
fontification-functions", as you said earlier. A function called from
fontification-functions isn't supposed to download a file, or to send an
email, or to change a user option, or to remove or create a file, or to
remove or insert text in the buffer, or to kill Emacs or a frame or a
window or the current buffer, or to change the window layout, and so on
and so forth.
So far breaching that contract has not posed major problems, except of
course for CC Mode users, who had to bear its sluggishness. We now have a
new feature which assumes, and rightly so, that this contract is obeyed.
This breaks CC Mode.
From where I stand, there are two options here: either CC Mode disables
the new feature (which is, on purpose, 100% backward compatible), or CC
Mode is fixed to obey the jit-lock-functions contract, and to become
compatible with the new feature (which will likely also, by side-effect
and as a bonus, fix the aforementioned sluggishness).
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 13:31 ` Gregory Heytings
@ 2022-08-07 14:13 ` Alan Mackenzie
2022-08-07 14:20 ` Eli Zaretskii
` (2 more replies)
0 siblings, 3 replies; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-07 14:13 UTC (permalink / raw)
To: Gregory Heytings; +Cc: emacs-devel
Hello, Gregory.
On Sun, Aug 07, 2022 at 13:31:05 +0000, Gregory Heytings wrote:
> > Major modes, like all Emacs Lisp programs, have, by design, access to
> > the full range of Emacs's facilities. Should it make your task more
> > difficult (and I don't see that it does), that's your problem, not the
> > major modes' maintainers'.
> I fear it is not my problem, no.
> jit-lock-functions is an API, with a contract. CC Mode decided to break
> that contract.
Where, exactly, are the terms of this supposed contract formulated? And
which part of this supposed contract has CC Mode broken?
I suspect that this "contract" is something implicit you have in your
understanding of Emacs, shared by some other people, and that you have
assumed that everybody else shares that understanding. Such conflicts
have occurred before.
> It is wrong that "arbitrary Lisp can be executed through
> fontification-functions", as you said earlier.
It is not wrong. If you disagree, point out where, say in the Elisp
manual, these restrictions are imposed.
> A function called from fontification-functions isn't supposed to
> download a file, or to send an email, or to change a user option, or
> to remove or create a file, or to remove or insert text in the buffer,
> or to kill Emacs or a frame or a window or the current buffer, or to
> change the window layout, and so on and so forth.
That doesn't deserve a reply, so won't be getting one.
> So far breaching that contract has not posed major problems, except of
> course for CC Mode users, who had to bear its sluggishness. We now have a
> new feature which assumes, and rightly so, that this contract is obeyed.
This new feature deliberately breaks contracts, in particular the
definitions of widen and narrow-to-region. That will, sooner or later,
break existing code. Maybe this is necessary, but having read the bug
threads, I didn't notice much effort being put into preserving these
contracts.
If you find CC Mode too sluggish for you (I don't), then configure it to
be faster and inaccurate by setting font-lock-maximum-decoration.
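For instance, a setting along the following lines limits C and C++
buffers to decoration level 2 while leaving every other mode at its
maximum. This is only a sketch; the choice of level 2 is an arbitrary
illustration.

```elisp
;; Goes in the init file, before any C or C++ buffer is visited.
;; Each element is (MAJOR-MODE . LEVEL); the `t' entry is the default.
(setq font-lock-maximum-decoration
      '((c-mode . 2)
        (c++-mode . 2)
        (t . t)))
```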
> This breaks CC Mode.
CC Mode will cope.
> > From where I stand, there are two options here: either CC Mode disables
> the new feature (which is, on purpose, 100% backward compatible), or CC
> Mode is fixed to obey the jit-lock-functions contract, and to become
> compatible with the new feature (which will likely also, by side-effect
> and as a bonus, fix the aforementioned sluggishness).
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 14:13 ` Alan Mackenzie
@ 2022-08-07 14:20 ` Eli Zaretskii
2022-08-07 14:59 ` Alan Mackenzie
2022-08-07 20:17 ` Major modes using `widen' is a good, even essential, programming practice Gregory Heytings
2022-08-07 23:21 ` Stefan Monnier
2 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-07 14:20 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: gregory, emacs-devel
> Date: Sun, 7 Aug 2022 14:13:36 +0000
> Cc: emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> > jit-lock-functions is an API, with a contract. CC Mode decided to break
> > that contract.
>
> Where, exactly, are the terms of this supposed contract formulated? And
> which part of this supposed contract has CC Mode broken?
jit-lock calls the functions with two arguments, BEG and END, and
expects them to work only on that chunk of text.
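As an illustration of that calling convention, here is a minimal sketch
of a function suitable for `jit-lock-register'. The function name and
the TODO highlighting are hypothetical; the point is only that it never
reads or changes text outside BEG..END.

```elisp
(defun my-fontify-todo (beg end)
  "Put a warning face on each symbol \"TODO\" between BEG and END.
Never looks at or modifies text outside that range."
  (let ((case-fold-search nil))         ; match upper-case TODO only
    (save-excursion
      (goto-char beg)
      ;; The bound END keeps the search inside the requested chunk.
      (while (re-search-forward "\\_<TODO\\_>" end t)
        (put-text-property (match-beginning 0) (match-end 0)
                           'face 'font-lock-warning-face)))))

;; Registration, typically done when a mode is enabled:
;; (jit-lock-register #'my-fontify-todo)
```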
> If you find CC Mode too sluggish for you (I don't), then configure it to
> be faster and inaccurate by setting font-lock-maximum-decoration.
It doesn't help, IME. The mode is still slow.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 14:20 ` Eli Zaretskii
@ 2022-08-07 14:59 ` Alan Mackenzie
2022-08-07 15:13 ` Eli Zaretskii
0 siblings, 1 reply; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-07 14:59 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: gregory, emacs-devel
Hello, Eli.
On Sun, Aug 07, 2022 at 17:20:52 +0300, Eli Zaretskii wrote:
> > Date: Sun, 7 Aug 2022 14:13:36 +0000
> > Cc: emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>
> > > jit-lock-functions is an API, with a contract. CC Mode decided to break
> > > that contract.
> > Where, exactly, are the terms of this supposed contract formulated? And
> > which part of this supposed contract has CC Mode broken?
> jit-lock calls the functions with two arguments, BEG and END, and
> expects them to work only on that chunk of text.
I don't think you really mean that. Consider the second jit-lock chunk
at the beginning of xdisp.c. Fontifying that chunk involves looking
back 1500 characters before BEG to see that it needs
font-lock-comment-face. You might argue that that information will be
in a cache anyway, but that's not dependable.
Also, the (BEG END) region will typically get rounded up to whole lines,
again "violating" that chunk. In principle, font-lock needs to look
outside of (BEG END).
> > If you find CC Mode too sluggish for you (I don't), then configure it to
> > be faster and inaccurate by setting font-lock-maximum-decoration.
> It doesn't help, IME. The mode is still slow.
OK, I need to look at CC Mode with f-l-m-d 2. There are surely things
which can be taken out of that.
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 14:59 ` Alan Mackenzie
@ 2022-08-07 15:13 ` Eli Zaretskii
2022-08-07 17:01 ` Alan Mackenzie
0 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-07 15:13 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: gregory, emacs-devel
> Date: Sun, 7 Aug 2022 14:59:06 +0000
> Cc: gregory@heytings.org, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> > jit-lock calls the functions with two arguments, BEG and END, and
> > expects them to work only on that chunk of text.
>
> I don't think you really mean that.
No, I really do.
> Consider the second jit-lock chunk
> at the beginning of xdisp.c. Fontifying that chunk involves looking
> back 1500 characters before BEG to see that it needs
> font-lock-comment-face. You might argue that that information will be
> in a cache anyway, but that's not dependable.
Either in the cache or in the buffer: the previous chunk was
fontified, so its end has the font-lock-comment-face. So you know.
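A rough sketch of that check, with a made-up function name: look at the
face of the character just before BEG. It only gives a meaningful answer
when the preceding text really has been fontified already, which is an
assumption of the sketch.

```elisp
(defun my-prev-chunk-ends-in-comment-p (beg)
  "Return non-nil if the character before BEG has the comment face.
Assumes the text before BEG has already been fontified."
  (and (> beg (point-min))
       (let ((face (get-text-property (1- beg) 'face)))
         ;; The `face' property may be a single face or a list of faces.
         (if (listp face)
             (and (memq 'font-lock-comment-face face) t)
           (eq face 'font-lock-comment-face)))))
```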
> Also, the (BEG END) region will typically get rounded up to whole lines,
> again "violating" that chunk.
That's a far cry from going to BOB. And if you ask nicely, we could
arrange that jit-lock calls you only on line boundaries (unless lines
are longer than some reasonable value).
> In principle, font-lock needs to look outside of (BEG END).
No, it doesn't. A string cannot begin before a beginning of a
function, for example. And if you need to go too far, just give up
and blame the user who writes such code. It is much better than
letting every use of CC Mode wait because once in a blue moon someone
could have a very long string.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 15:13 ` Eli Zaretskii
@ 2022-08-07 17:01 ` Alan Mackenzie
2022-08-07 17:23 ` Eli Zaretskii
0 siblings, 1 reply; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-07 17:01 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: gregory, emacs-devel
Hello, Eli.
On Sun, Aug 07, 2022 at 18:13:47 +0300, Eli Zaretskii wrote:
> > Date: Sun, 7 Aug 2022 14:59:06 +0000
> > Cc: gregory@heytings.org, emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>
> > > jit-lock calls the functions with two arguments, BEG and END, and
> > > expects them to work only on that chunk of text.
> > I don't think you really mean that.
> No, I really do.
> > Consider the second jit-lock chunk
> > at the beginning of xdisp.c. Fontifying that chunk involves looking
> > back 1500 characters before BEG to see that it needs
> > font-lock-comment-face. You might argue that that information will be
> > in a cache anyway, but that's not dependable.
> Either in the cache or in the buffer: the previous chunk was
> fontified, so its end has the font-lock-comment-face. So you know.
No, you don't. The buffer might be being opened by desktop in a large
comment in the middle of the file.
What jit-lock/font-lock actually do at the moment is to widen, then use
syntax-ppss, i.e. in effect scan from BOB.
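Roughly, and only as a sketch with a hypothetical function name (this is
not the actual font-lock code), that amounts to:

```elisp
(defun my-syntactic-context (pos)
  "Return `comment', `string', or nil for buffer position POS.
Widens first, so the answer reflects the whole buffer rather
than just the accessible portion."
  (save-restriction
    (widen)
    (save-excursion                     ; `syntax-ppss' moves point to POS
      (let ((state (syntax-ppss pos)))
        (cond ((nth 4 state) 'comment)  ; element 4: inside a comment
              ((nth 3 state) 'string)))))) ; element 3: inside a string
```

`syntax-ppss' caches earlier results, so repeated calls do not normally
rescan from BOB each time.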
> > Also, the (BEG END) region will typically get rounded up to whole lines,
> > again "violating" that chunk.
> That's a far cry from going to BOB. And if you ask nicely, we could
> arrange that jit-lock calls you only on line boundaries (unless lines
> are longer than some reasonable value).
The search for line boundaries is done by font-lock.el.
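The rounding to whole lines can be pictured like this. It is a simplified
sketch with an invented function name, not the code in font-lock.el:

```elisp
(defun my-extend-to-whole-lines (beg end)
  "Return (NEW-BEG . NEW-END), the range BEG..END rounded out to whole lines.
NEW-BEG is the start of the line containing BEG; NEW-END is END
itself if it is already at a line start, else the start of the
following line (or end of buffer on the last line)."
  (save-excursion
    (cons (progn (goto-char beg)
                 (line-beginning-position))
          (progn (goto-char end)
                 (if (bolp)
                     (point)
                   (line-beginning-position 2))))))
```

On a buffer with one enormous line, the rounded range is the entire
line, which is where the cost comes from.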
> > In principle, font-lock needs to look outside of (BEG END).
> No, it doesn't. A string cannot begin before a beginning of a
> function, for example. And if you need to go too far, just give up
> and blame the user who writes such code. It is much better than
> letting every use of CC Mode wait because once in a blue moon someone
> could have a very long string.
That "needing to go too far" is an instantaneous jump, not a scanning.
The string start will be in a parse-partial-sexp result somewhere.
Sometimes people write long strings. They certainly write long comments.
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 17:01 ` Alan Mackenzie
@ 2022-08-07 17:23 ` Eli Zaretskii
2022-08-07 17:53 ` Dmitry Gutov
2022-08-07 19:20 ` Alan Mackenzie
0 siblings, 2 replies; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-07 17:23 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: gregory, emacs-devel
> Date: Sun, 7 Aug 2022 17:01:09 +0000
> Cc: gregory@heytings.org, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> > > Consider the second jit-lock chunk
> > > at the beginning of xdisp.c. Fontifying that chunk involves looking
> > > back 1500 characters before BEG to see that it needs
> > > font-lock-comment-face. You might argue that that information will be
> > > in a cache anyway, but that's not dependable.
>
> > Either in the cache or in the buffer: the previous chunk was
> > fontified, so its end has the font-lock-comment-face. So you know.
>
> No, you don't. The buffer might be being opened by desktop in a large
> comment in the middle of the file.
You've changed the scenario, yes?
> What jit-lock/font-lock actually do at the moment is to widen, then use
> syntax-ppss, i.e. in effect scan from BOB.
Yes, and that's SLOOOWWWW!
> > > Also, the (BEG END) region will typically get rounded up to whole lines,
> > > again "violating" that chunk.
>
> > That's a far cry from going to BOB. And if you ask nicely, we could
> > arrange that jit-lock calls you only on line boundaries (unless lines
> > are longer than some reasonable value).
>
> The search for line boundaries is done by font-lock.el.
I don't trust it to DTRT when lines are very long.
> > > In principle, font-lock needs to look outside of (BEG END).
>
> > No, it doesn't. A string cannot begin before a beginning of a
> > function, for example. And if you need to go too far, just give up
> > and blame the user who writes such code. It is much better than
> > letting every use of CC Mode wait because once in a blue moon someone
> > could have a very long string.
>
> That "needing to go too far" is an instantaneous jump, not a scanning.
Please tell that to someone who doesn't edit C sources as frequently
as I do.
> The string start will be in a parse-partial-sexp result somewhere.
> Sometimes people write long strings. They certainly write long comments.
Why do I have to suffer every day just because someone, somewhere,
might do that? I'd rather we "punish" those few people who do it
(rarely).
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 17:23 ` Eli Zaretskii
@ 2022-08-07 17:53 ` Dmitry Gutov
2022-08-07 18:00 ` Eli Zaretskii
2022-08-07 19:20 ` Alan Mackenzie
1 sibling, 1 reply; 136+ messages in thread
From: Dmitry Gutov @ 2022-08-07 17:53 UTC (permalink / raw)
To: Eli Zaretskii, Alan Mackenzie; +Cc: gregory, emacs-devel
On 07.08.2022 20:23, Eli Zaretskii wrote:
>> What jit-lock/font-lock actually do at the moment is to widen, then use
>> syntax-ppss, i.e. in effect scan from BOB.
> Yes, and that's SLOOOWWWW!
It's not slow on files of reasonable size. E.g. even if we take xdisp.c
(the file I've seen referred to in complaints on CC Mode's speed),
(benchmark 1 '(save-excursion (parse-partial-sexp (point-min)
(point-max))))
reports 20-50 ms on my machine. And that result is cached too.
So whatever perceived slowness is there, it most likely comes from other
sources.
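For anyone who wants to repeat the measurement, here is a small sketch
using `benchmark-run' (the function name is made up). It widens first so
that a narrowed buffer is still measured in full, and it calls
`parse-partial-sexp' directly, so the `syntax-ppss' cache is not involved.

```elisp
(require 'benchmark)

(defun my-time-full-parse ()
  "Return the seconds taken to parse the whole current buffer once.
Widens temporarily so a narrowed buffer is measured in full, and
restores point afterwards."
  (save-restriction
    (widen)
    (save-excursion
      ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
      (car (benchmark-run 1
             (parse-partial-sexp (point-min) (point-max)))))))
```

Evaluate (my-time-full-parse) with M-: in the buffer of interest, e.g.
one visiting xdisp.c.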
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 17:53 ` Dmitry Gutov
@ 2022-08-07 18:00 ` Eli Zaretskii
2022-08-07 18:05 ` Dmitry Gutov
` (2 more replies)
0 siblings, 3 replies; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-07 18:00 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: acm, gregory, emacs-devel
> Date: Sun, 7 Aug 2022 20:53:58 +0300
> Cc: gregory@heytings.org, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> On 07.08.2022 20:23, Eli Zaretskii wrote:
> >> What jit-lock/font-lock actually do at the moment is to widen, then use
> >> syntax-ppss, i.e. in effect scan from BOB.
> > Yes, and that's SLOOOWWWW!
>
> It's not slow on files of reasonable size. E.g. even if we take xdisp.c
> (the file I've seen referred to in complaints on CC Mode's speed),
>
> (benchmark 1 '(save-excursion (parse-partial-sexp (point-min)
> (point-max))))
>
> reports 20-50 ms on my machine.
It takes 0.5 sec here.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 18:00 ` Eli Zaretskii
@ 2022-08-07 18:05 ` Dmitry Gutov
2022-08-07 18:37 ` Eli Zaretskii
2022-08-07 18:49 ` Óscar Fuentes
2022-08-07 18:56 ` Lars Ingebrigtsen
2 siblings, 1 reply; 136+ messages in thread
From: Dmitry Gutov @ 2022-08-07 18:05 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: acm, gregory, emacs-devel
On 07.08.2022 21:00, Eli Zaretskii wrote:
>> Date: Sun, 7 Aug 2022 20:53:58 +0300
>> Cc: gregory@heytings.org, emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>> On 07.08.2022 20:23, Eli Zaretskii wrote:
>>>> What jit-lock/font-lock actually do at the moment is to widen, then use
>>>> syntax-ppss, i.e. in effect scan from BOB.
>>> Yes, and that's SLOOOWWWW!
>> It's not slow on files of reasonable size. E.g. even if we take xdisp.c
>> (the file I've seen referred to in complaints on CC Mode's speed),
>>
>> (benchmark 1 '(save-excursion (parse-partial-sexp (point-min)
>> (point-max))))
>>
>> reports 20-50 ms on my machine.
> It takes 0.5 sec here.
Interesting. That would mean that scrolling to the end of xdisp.c will
at least take this long the first time around.
The more than 10x difference in performance is weird, though. Perhaps
you have any ideas why parse-partial-sexp's implementation might behave
poorly on your un-optimized build? Some tweak in the code might make a
lot of difference.
This is an area I can't be of much help with, unfortunately.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 18:05 ` Dmitry Gutov
@ 2022-08-07 18:37 ` Eli Zaretskii
2022-08-07 23:02 ` Stefan Monnier
0 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-07 18:37 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: acm, gregory, emacs-devel
> Date: Sun, 7 Aug 2022 21:05:44 +0300
> Cc: acm@muc.de, gregory@heytings.org, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> >> reports 20-50 ms on my machine.
> > It takes 0.5 sec here.
>
> Interesting. That would mean that scrolling to the end of xdisp.c will
> at least take this long the first time around.
>
> The more than 10x difference in performance is weird, though. Perhaps
> you have any ideas why parse-partial-sexp's implementation might behave
> poorly on your un-optimized build? Some tweak in the code might make a
> lot of difference.
I hope Stefan could do something about it.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 18:37 ` Eli Zaretskii
@ 2022-08-07 23:02 ` Stefan Monnier
0 siblings, 0 replies; 136+ messages in thread
From: Stefan Monnier @ 2022-08-07 23:02 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Dmitry Gutov, acm, gregory, emacs-devel
>> The more than 10x difference in performance is weird, though. Perhaps
>> you have any ideas why parse-partial-sexp's implementation might behave
>> poorly on your un-optimized build? Some tweak in the code might make a
>> lot of difference.
>
> I hope Stefan could do something about it.
Not sure I can be of help. Over the last few years we've had a few
significant performance regressions in the unoptimized build,
typically because of code cleanups that replace macros with inlinable
functions or things like that, where the optimized build usually doesn't
suffer (or even gets faster) while the unoptimized build can be
significantly impacted.
Way back when, an unoptimized build was not terribly slower than a -O2
build, but our current coding style in src/*.[ch] tends to make that
difference much larger.
The last few times we saw that, Paul has been quite good at finding
a few culprits and tweaking the code so that the overall impact is
brought back down to something acceptable.
Stefan
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 18:00 ` Eli Zaretskii
2022-08-07 18:05 ` Dmitry Gutov
@ 2022-08-07 18:49 ` Óscar Fuentes
2022-08-07 18:59 ` Eli Zaretskii
2022-08-07 18:56 ` Lars Ingebrigtsen
2 siblings, 1 reply; 136+ messages in thread
From: Óscar Fuentes @ 2022-08-07 18:49 UTC (permalink / raw)
To: emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> > Yes, and that's SLOOOWWWW!
>>
>> It's not slow on files of reasonable size. E.g. even if we take xdisp.c
>> (the file I've seen referred to in complaints on CC Mode's speed),
>>
>> (benchmark 1 '(save-excursion (parse-partial-sexp (point-min)
>> (point-max))))
>>
>> reports 20-50 ms on my machine.
>
> It takes 0.5 sec here.
Is that the unoptimized build you usually work with?
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 18:49 ` Óscar Fuentes
@ 2022-08-07 18:59 ` Eli Zaretskii
0 siblings, 0 replies; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-07 18:59 UTC (permalink / raw)
To: Óscar Fuentes; +Cc: emacs-devel
> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Sun, 07 Aug 2022 20:49:03 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> > Yes, and that's SLOOOWWWW!
> >>
> >> It's not slow on files of reasonable size. E.g. even if we take xdisp.c
> >> (the file I've seen referred to in complaints on CC Mode's speed),
> >>
> >> (benchmark 1 '(save-excursion (parse-partial-sexp (point-min)
> >> (point-max))))
> >>
> >> reports 20-50 ms on my machine.
> >
> > It takes 0.5 sec here.
>
> Is that the unoptimized build you usually work with?
Yes.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 18:00 ` Eli Zaretskii
2022-08-07 18:05 ` Dmitry Gutov
2022-08-07 18:49 ` Óscar Fuentes
@ 2022-08-07 18:56 ` Lars Ingebrigtsen
2 siblings, 0 replies; 136+ messages in thread
From: Lars Ingebrigtsen @ 2022-08-07 18:56 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Dmitry Gutov, acm, gregory, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> It's not slow on files of reasonable size. E.g. even if we take xdisp.c
>> (the file I've seen referred to in complaints on CC Mode's speed),
>>
>> (benchmark 1 '(save-excursion (parse-partial-sexp (point-min)
>> (point-max))))
>>
>> reports 20-50 ms on my machine.
>
> It takes 0.5 sec here.
Takes 0.03s here on an --enable-checking=yes build.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 17:23 ` Eli Zaretskii
2022-08-07 17:53 ` Dmitry Gutov
@ 2022-08-07 19:20 ` Alan Mackenzie
2022-08-07 19:26 ` Dmitry Gutov
2022-08-08 2:36 ` Eli Zaretskii
1 sibling, 2 replies; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-07 19:20 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: gregory, emacs-devel
Hello, Eli.
On Sun, Aug 07, 2022 at 20:23:21 +0300, Eli Zaretskii wrote:
> > Date: Sun, 7 Aug 2022 17:01:09 +0000
> > Cc: gregory@heytings.org, emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>
> > > > Consider the second jit-lock chunk
> > > > at the beginning of xdisp.c. Fontifying that chunk involves looking
> > > > back 1500 characters before BEG to see that it needs
> > > > font-lock-comment-face. You might argue that that information will be
> > > > in a cache anyway, but that's not dependable.
> > > Either in the cache or in the buffer: the previous chunk was
> > > fontified, so its end has the font-lock-comment-face. So you know.
> > No, you don't. The buffer might be being opened by desktop in a large
> > comment in the middle of the file.
> You've changed the scenario, yes?
Yes. We've got to deal with all scenarios, preferably without
special-casing special cases.
> > What jit-lock/font-lock actually do at the moment is to widen, then use
> > syntax-ppss, i.e. in effect scan from BOB.
> Yes, and that's SLOOOWWWW!
On my machine, with an optimised build, it takes just under 20 ms to
parse-partial-sexp over xdisp.c (not counting any redisplay at the end).
I don't understand any more than Dmitry does, why your unoptimised build
is taking 25 times as long.
> > > > Also, the (BEG END) region will typically get rounded up to whole
> > > > lines, again "violating" that chunk.
> > > That's a far cry from going to BOB. And if you ask nicely, we
> > > could arrange that jit-lock calls you only on line boundaries
> > > (unless lines are longer than some reasonable value).
> > The search for line boundaries is done by font-lock.el.
> I don't trust it to DTRT when lines are very long.
I think I raised the topic a few days ago of font-lock expanding regions
to whole lines. Maybe we shouldn't do it for long lines. We'd need
something in its place, though.
> > > > In principle, font-lock needs to look outside of (BEG END).
> > > No, it doesn't. A string cannot begin before a beginning of a
> > > function, for example. And if you need to go too far, just give up
> > > and blame the user who writes such code. It is much better than
> > > letting every use of CC Mode wait because once in a blue moon someone
> > > could have a very long string.
> > That "needing to go too far" is an instantaneous jump, not a scanning.
> Please tell that to someone who doesn't edit C sources as frequently
> as I do.
Are you saying that long strings and long comments cause a particular
slowdown in C Mode, not seen when strings and comments are all short?
> > The string start will be in a parse-partial-sexp result somewhere.
> > Sometimes people write long strings. They certainly write long comments.
> Why do I have to suffer every day just because someone, somewhere,
> might do that? I'd rather we "punish" those few people who do it
> (rarely).
I don't think we should punish people who write comments. I'm thinking
of Gerd M., who was likely the writer of the comment at the beginning of
xdisp.c.
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 19:20 ` Alan Mackenzie
@ 2022-08-07 19:26 ` Dmitry Gutov
2022-08-08 2:36 ` Eli Zaretskii
1 sibling, 0 replies; 136+ messages in thread
From: Dmitry Gutov @ 2022-08-07 19:26 UTC (permalink / raw)
To: Alan Mackenzie, Eli Zaretskii; +Cc: gregory, emacs-devel
On 07.08.2022 22:20, Alan Mackenzie wrote:
>>> The search for line boundaries is done by font-lock.el.
>> I don't trust it to DTRT when lines are very long.
> I think I raised the topic a few days ago of font-lock expanding regions
> to whole lines. Maybe we shouldn't do it for long lines. We'd need
> something in its place, though.
>
See the recent addition of syntax-wholeline-max.
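For readers who want to try this, a minimal configuration sketch follows. It assumes an Emacs build recent enough to define `syntax-wholeline-max`; the value 5000 is an arbitrary illustration, not a recommendation from this thread.

```elisp
;; Assumption: `syntax-wholeline-max' exists in this Emacs (it is a recent
;; addition).  Lines longer than this are handled in chunks rather than
;; being extended to whole lines, at the risk of some misfontification.
;; The value 5000 is only an example.
(setq syntax-wholeline-max 5000)
```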
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 19:20 ` Alan Mackenzie
2022-08-07 19:26 ` Dmitry Gutov
@ 2022-08-08 2:36 ` Eli Zaretskii
2022-08-08 9:58 ` Alan Mackenzie
1 sibling, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-08 2:36 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: gregory, emacs-devel
> Date: Sun, 7 Aug 2022 19:20:44 +0000
> Cc: gregory@heytings.org, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> > > > Either in the cache or in the buffer: the previous chunk was
> > > > fontified, so its end has the font-lock-comment-face. So you know.
>
> > > No, you don't. The buffer might be being opened by desktop in a large
> > > comment in the middle of the file.
>
> > You've changed the scenario, yes?
>
> Yes. We've got to deal with all scenarios, preferably without
> special-casing special cases.
No one said that all the scenarios must have the same solution.
> > > What jit-lock/font-lock actually do at the moment is to widen, then use
> > > syntax-ppss, i.e. in effect scan from BOB.
>
> > Yes, and that's SLOOOWWWW!
>
> On my machine, with an optimised build, it takes just under 20 ms to
> parse-partial-sexp over xdisp.c (not counting any redisplay at the end).
> I don't understand any more than Dmitry does, why your unoptimised build
> is taking 25 times as long.
It doesn't help to know that some very fast machine can do this stuff
quickly enough to remain below the annoyance threshold. 20 ms is a
very long time by the current CPU speed measure: just calculate the
number of CPU cycles in that time and you will see it.
> > > That "needing to go too far" is an instantaneous jump, not a scanning.
>
> > Please tell that to someone who doesn't edit C sources as frequently
> > as I do.
>
> Are you saying that long strings and long comments cause a particular
> slowdown in C Mode, not seen when strings and comments are all short?
I don't know what makes it slow, but it feels sluggish in even the
simplest editing operations, and font-lock updates are slow as well.
> > > The string start will be in a parse-partial-sexp result somewhere.
> > > Sometimes people write long strings. They certainly write long comments.
>
> > Why do I have to suffer every day just because someone, somewhere,
> > might do that? I'd rather we "punish" those few people who do it
> > (rarely).
>
> I don't think we should punish people who write comments. I'm thinking
> of Gerd M., who was likely the writer of the comment at the beginning of
> xdisp.c.
We are still talking about long lines, yes? There are no long lines
in that commentary at the beginning of xdisp.c.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 2:36 ` Eli Zaretskii
@ 2022-08-08 9:58 ` Alan Mackenzie
2022-08-08 11:39 ` Eli Zaretskii
0 siblings, 1 reply; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-08 9:58 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: gregory, emacs-devel
Hello, Eli.
On Mon, Aug 08, 2022 at 05:36:08 +0300, Eli Zaretskii wrote:
> > Date: Sun, 7 Aug 2022 19:20:44 +0000
> > Cc: gregory@heytings.org, emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>
> > > > > Either in the cache or in the buffer: the previous chunk was
> > > > > fontified, so its end has the font-lock-comment-face. So you know.
> > > > No, you don't. The buffer might be being opened by desktop in a large
> > > > comment in the middle of the file.
> > > You've changed the scenario, yes?
> > Yes. We've got to deal with all scenarios, preferably without
> > special-casing special cases.
> No one said that all the scenarios must have the same solution.
I suppose not, but optimising for different scenarios would be, well, an
optimisation. Do we really need it?
> > > > What jit-lock/font-lock actually do at the moment is to widen, then use
> > > > syntax-ppss, i.e. in effect scan from BOB.
> > > Yes, and that's SLOOOWWWW!
> > On my machine, with an optimised build, it takes just under 20 ms to
> > parse-partial-sexp over xdisp.c (not counting any redisplay at the end).
> > I don't understand any more than Dmitry does, why your unoptimised build
> > is taking 25 times as long.
> It doesn't help to know that some very fast machine can do this stuff
> quickly enough to remain below the annoyance threshold. 20 ms is a
> very long time by the current CPU speed measure: just calculate the
> number of CPU cycles in that time and you will see it.
We're talking about 1.2 MB here. That works out at less than 17 ns per
character. Each round of the loop is a fairly sophisticated finite state
machine pass. Possibly it could be optimised, but I doubt by very much.
Some things take a certain amount of time, and there's nothing we can do
about it. (Of course we can do things about some other aspects.)
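As a rough check of the figures above, the per-character cost and the cycle count can be worked out directly. This is only a sketch: the 1209233-byte size is the one quoted for xdisp.c elsewhere in this thread, and the 3 GHz clock rate is an assumed value, not a measurement from either machine.

```elisp
(let* ((seconds 20e-3)      ; "just under 20 ms" for the whole scan
       (chars 1209233)      ; size of xdisp.c in bytes, as quoted in this thread
       (clock-hz 3e9))      ; ASSUMPTION: a 3 GHz CPU, for illustration only
  (list
   ;; Nanoseconds per character: about 16.5, i.e. "less than 17 ns".
   (* 1e9 (/ seconds chars))
   ;; CPU cycles in 20 ms at the assumed clock rate: 6e7.
   (* seconds clock-hz)
   ;; Cycles per character at the assumed clock rate: about 50.
   (/ (* seconds clock-hz) chars)))
```

Under that assumption, 20 ms is tens of millions of cycles in total, but only a few dozen cycles for each character scanned.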
> > > > That "needing to go too far" is an instantaneous jump, not a scanning.
> > > Please tell that to someone who doesn't edit C sources as frequently
> > > as I do.
> > Are you saying that long strings and long comments cause a particular
> > slowdown in C Mode, not seen when strings and comments are all short?
> I don't know what makes it slow, but it feels sluggish in even the
> simplest editing operations, and font-lock updates are slow as well.
How about us opening a bug report for CC Mode's speed with
font-lock-maximum-decoration = 2?
> > > > The string start will be in a parse-partial-sexp result somewhere.
> > > > Sometimes people write long strings. They certainly write long comments.
> > > Why do I have to suffer every day just because someone, somewhere,
> > > might do that? I'd rather we "punish" those few people who do it
> > > (rarely).
> > I don't think we should punish people who write comments. I'm thinking
> > of Gerd M., who was likely the writer of the comment at the beginning of
> > xdisp.c.
> We are still talking about long lines, yes? There are no long lines
> in that commentary at the beginning of xdisp.c.
I don't think we were still talking about long lines. We were talking
about parse-partial-sexp on large files.
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 9:58 ` Alan Mackenzie
@ 2022-08-08 11:39 ` Eli Zaretskii
2022-08-08 15:05 ` CC Mode with font-lock-maximum-decoration 2 [Was Major modes using `widen' is a good, even essential, programming practice.] Alan Mackenzie
0 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-08 11:39 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: gregory, emacs-devel
> Date: Mon, 8 Aug 2022 09:58:29 +0000
> Cc: gregory@heytings.org, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> > > > You've changed the scenario, yes?
>
> > > Yes. We've got to deal with all scenarios, preferably without
> > > special-casing special cases.
>
> > No one said that all the scenarios must have the same solution.
>
> I suppose not, but optimising for different scenarios would be, well, an
> optimisation. Do we really need it?
I thought the facts and findings in this and related discussions
recently have shown beyond any reasonable doubt that we do.
> > > On my machine, with an optimised build, it takes just under 20 ms to
> > > parse-partial-sexp over xdisp.c (not counting any redisplay at the end).
> > > I don't understand any more than Dmitry does, why your unoptimised build
> > > is taking 25 times as long.
>
> > It doesn't help to know that some very fast machine can do this stuff
> > quickly enough to remain below the annoyance threshold. 20 ms is a
> > very long time by the current CPU speed measure: just calculate the
> > number of CPU cycles in that time and you will see it.
>
> We're talking about 1.2 MB here. That works out at less than 17 ns per
> character. Each round of the loop is a fairly sophisticated finite state
> machine pass. Possibly it could be optimised, but I doubt by very much.
I guess you have just explained to yourself why processing all of the
buffer is unscalable? Which was my point all the way.
> Some things take a certain amount of time, and there's nothing we can do
> about it.
Of course, there's something to do: do less of that! This is exactly
the main idea behind narrowing, and any other similar limitations.
For example, bidi.c stops looking back for a paragraph's beginning if
it didn't find one within a predefined number of lines. Failure to
find and process the paragraph's beginning could potentially produce
an utterly incorrect display of bidirectional text, but we do that
anyway. Why? because it's unreasonable to slow down redisplay to a
crawl for the benefit of very rare situations.
Why cannot CC Mode do something similar, with a similar justification?
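The "give up after a bounded distance" idea can be sketched in Emacs Lisp. This is not the code in bidi.c and not anything CC Mode does; `my-bounded-paragraph-start` is a hypothetical helper, and treating a blank line as the paragraph separator is a simplifying assumption.

```elisp
;; Hypothetical helper, for illustration only (not part of Emacs).
(defun my-bounded-paragraph-start (max-lines)
  "Return the start of the paragraph containing point.
A paragraph is assumed to start after a blank line.  Look back at
most MAX-LINES lines; if no blank line is found within that range,
return the position where the search gave up."
  (save-excursion
    (beginning-of-line)
    (let ((n 0))
      ;; Step back one line at a time until the previous line is blank,
      ;; we reach the beginning of the buffer, or the budget runs out.
      (while (and (< n max-lines)
                  (not (bobp))
                  (not (save-excursion
                         (forward-line -1)
                         (looking-at-p "[ \t]*$"))))
        (forward-line -1)
        (setq n (1+ n)))
      (point))))
```

The cost of a call is bounded by MAX-LINES whatever the buffer size; the price is a possibly wrong answer when the real paragraph start lies further back.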
> > I don't know what makes it slow, but it feels sluggish in even the
> > simplest editing operations, and font-lock updates are slow as well.
>
> How about us opening a bug report for CC Mode's speed with
> font-lock-maximum-decoration = 2?
Consider it open. I'm sure I complained about this more than once
already, and yet here we are.
> > We are still talking about long lines, yes? There are no long lines
> > in that commentary at the beginning of xdisp.c.
>
> I don't think we were still talking about long lines. We were talking
> about parse-partial-sexp on large files.
Well, I _am_ talking about long lines. Because you insist on the
necessity to go far back even in that case.
When we solve the long-line case, we can consider the large-buffer
case, and perhaps even find that the same solution fits both.
* CC Mode with font-lock-maximum-decoration 2 [Was Major modes using `widen' is a good, even essential, programming practice.]
2022-08-08 11:39 ` Eli Zaretskii
@ 2022-08-08 15:05 ` Alan Mackenzie
2022-08-08 15:51 ` Gregory Heytings
2022-08-08 17:15 ` CC Mode with font-lock-maximum-decoration 2 [Was Major modes using `widen' is a good, even essential, programming practice.] Eli Zaretskii
0 siblings, 2 replies; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-08 15:05 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Hello, Eli.
On Mon, Aug 08, 2022 at 14:39:56 +0300, Eli Zaretskii wrote:
> > Date: Mon, 8 Aug 2022 09:58:29 +0000
> > Cc: gregory@heytings.org, emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>
[ .... ]
> > > I don't know what makes it [CC Mode with
> > > font-lock-maximum-decoration 2] slow, but it feels sluggish in even
> > > the simplest editing operations, and font-lock updates are slow as
> > > well.
I don't think that's entirely true. I've just measured the font-locking
time, and (let's call it) CC Mode/2 fontifies at 72% the speed of Emacs
Lisp Mode. I don't think it's reasonable to expect more than that, given
that C is more complicated than Lisp.
I think it more likely that the before/after-change-functions in CC
Mode/2 are excessive, and making it look like fontification is slow.
For this measurement, I started with subr.el, and appended copies of it
to itself, then took functions off the end, to make it the same size as
xdisp.c. xdisp.c is 1209233 bytes, my .el buffer was 1209371 bytes.
I used M-: (benchmark-run 1 (time-scroll-b)) on each buffer, with:
(defun time-scroll-b (&optional arg) ; For use in `benchmark-run'.
  (condition-case nil
      (while t
        (if arg (scroll-down) (scroll-up))
        (sit-for 0))
    (error nil)))
.. The exact results were:
(xdisp.c): (5.7370774540000005 9 0.7672129740000013)
(elisp): (4.1201735589999995 5 0.42918214299999846).
This was, of course, on an optimised build on GNU/Linux using the Linux
console, both measurements starting at BOB, having typed and deleted a
character to erase existing font-locking.
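The 72% figure can be recomputed from the two results above. This sketch assumes the usual `benchmark-run` result layout of (ELAPSED GC-COUNT GC-ELAPSED); the second ratio, with garbage collection time subtracted, is an extra illustration and not a number quoted in this thread.

```elisp
(let ((c-run    '(5.7370774540000005 9 0.7672129740000013))    ; xdisp.c
      (lisp-run '(4.1201735589999995 5 0.42918214299999846)))  ; elisp buffer
  (list
   ;; Relative speed from total elapsed time: about 0.72.
   (/ (nth 0 lisp-run) (nth 0 c-run))
   ;; The same ratio with GC time excluded: about 0.74.
   (/ (- (nth 0 lisp-run) (nth 2 lisp-run))
      (- (nth 0 c-run) (nth 2 c-run)))))
```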
> > How about us opening a bug report for CC Mode's speed with
> > font-lock-maximum-decoration = 2?
> Consider it open. I'm sure I complained about this more than once
> already, and yet here we are.
[ .... ]
--
Alan Mackenzie (Nuremberg, Germany).
* Re: CC Mode with font-lock-maximum-decoration 2 [Was Major modes using `widen' is a good, even essential, programming practice.]
2022-08-08 15:05 ` CC Mode with font-lock-maximum-decoration 2 [Was Major modes using `widen' is a good, even essential, programming practice.] Alan Mackenzie
@ 2022-08-08 15:51 ` Gregory Heytings
2022-08-08 16:05 ` CC Mode with font-lock-maximum-decoration 2 Alan Mackenzie
2022-08-08 17:15 ` CC Mode with font-lock-maximum-decoration 2 [Was Major modes using `widen' is a good, even essential, programming practice.] Eli Zaretskii
1 sibling, 1 reply; 136+ messages in thread
From: Gregory Heytings @ 2022-08-08 15:51 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: Eli Zaretskii, emacs-devel
>
> I've just measured the font-locking time, and (let's call it) CC Mode/2
> fontifies at 72% the speed of Emacs Lisp Mode.
>
I don't know under which exact conditions you obtained these (very
favorable) numbers, but I at least cannot reproduce them. Here are two
ways to measure the relative speed of fontification in Emacs Lisp and C.
First create a "complex.el" file with
for i in $(seq 1 3); do cat lisp/simple.el; done > complex.el
(Note that the resulting complex.el file is slightly larger than xdisp.c.)
Then:
1. Use your (benchmark-run 1 (time-scroll)) after loading complex.el and
xdisp.c in emacs -Q. The numbers here are 4.5 seconds for complex.el and
20.5 seconds for xdisp.c. More than four times slower.
2. Use (benchmark-run 1 (font-lock-fontify-region (point-min) (point-max)))
after loading complex.el and xdisp.c in emacs -Q. The numbers here are
1.2 seconds for complex.el and 12.5 seconds for xdisp.c. More than ten
times slower.
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-08 15:51 ` Gregory Heytings
@ 2022-08-08 16:05 ` Alan Mackenzie
2022-08-08 16:50 ` Gregory Heytings
0 siblings, 1 reply; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-08 16:05 UTC (permalink / raw)
To: Gregory Heytings; +Cc: Eli Zaretskii, emacs-devel
Hello, Gregory.
On Mon, Aug 08, 2022 at 15:51:50 +0000, Gregory Heytings wrote:
> > I've just measured the font-locking time, and (let's call it) CC Mode/2
> > fontifies at 72% the speed of Emacs Lisp Mode.
> I don't know under which exact conditions you obtained these (very
> favorable) numbers, but I at least cannot reproduce them. ....
I think the bit you missed was:
>>>> I don't know what makes it [CC Mode with
>>>> font-lock-maximum-decoration 2] slow,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Try it again with that setting.
[ .... ]
--
Alan Mackenzie (Nuremberg, Germany).
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-08 16:05 ` CC Mode with font-lock-maximum-decoration 2 Alan Mackenzie
@ 2022-08-08 16:50 ` Gregory Heytings
2022-08-09 19:49 ` Gregory Heytings
0 siblings, 1 reply; 136+ messages in thread
From: Gregory Heytings @ 2022-08-08 16:50 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: Eli Zaretskii, emacs-devel
> I think the bit you missed was:
>
>> I don't know what makes it [CC Mode with font-lock-maximum-decoration 2]
>
> Try it again with that setting.
>
Indeed, that was it.
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-08 16:50 ` Gregory Heytings
@ 2022-08-09 19:49 ` Gregory Heytings
0 siblings, 0 replies; 136+ messages in thread
From: Gregory Heytings @ 2022-08-09 19:49 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: Eli Zaretskii, emacs-devel
>> I think the bit you missed was:
>>
>>> I don't know what makes it [CC Mode with font-lock-maximum-decoration 2]
>>
>> Try it again with that setting.
>
> Indeed, that was it.
>
That being said, IMO that comparison is unfair, because Emacs Lisp has
only two fontification levels, so font-lock-maximum-decoration t and 2 are
the same (and likewise nil and 1 are the same). A fairer comparison would
be to use one less than the maximum level in each case, that is,
font-lock-maximum-decoration 2 for CC Mode and
font-lock-maximum-decoration 1 for Emacs Lisp. In which case the numbers
are much less favorable:
1. (benchmark-run 1 (time-scroll)) is 3.3 seconds for complex.el and 6.2
seconds for xdisp.c, two times slower;
2. (benchmark-run 1 (font-lock-fontify-region (point-min) (point-max))) is
0.25 seconds for complex.el and 1.75 seconds for xdisp.c, seven times
slower.
* Re: CC Mode with font-lock-maximum-decoration 2 [Was Major modes using `widen' is a good, even essential, programming practice.]
2022-08-08 15:05 ` CC Mode with font-lock-maximum-decoration 2 [Was Major modes using `widen' is a good, even essential, programming practice.] Alan Mackenzie
2022-08-08 15:51 ` Gregory Heytings
@ 2022-08-08 17:15 ` Eli Zaretskii
2022-08-08 17:41 ` Eli Zaretskii
2022-08-08 18:20 ` CC Mode with font-lock-maximum-decoration 2 Alan Mackenzie
1 sibling, 2 replies; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-08 17:15 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel
> Date: Mon, 8 Aug 2022 15:05:29 +0000
> Cc: emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> For this measurement, I started with subr.el, and appended copies of it
> to itself, then took functions off the end, to make it the same size as
> xdisp.c. xdisp.c is 1209233 bytes, my .el buffer was 1209371 bytes.
>
> I used M-: (benchmark-run 1 (time-scroll-b)) on each buffer, with:
>
> (defun time-scroll-b (&optional arg) ; For use in `benchmark-run'.
>   (condition-case nil
>       (while t
>         (if arg (scroll-down) (scroll-up))
>         (sit-for 0))
>     (error nil)))
>
> .. The exact results were:
> (xdisp.c): (5.7370774540000005 9 0.7672129740000013)
> (elisp): (4.1201735589999995 5 0.42918214299999846).
>
> This was, of course, on an optimised build on GNU/Linux using the Linux
> console, both measurements starting at BOB, having typed and deleted a
> character to erase existing font-locking.
Editing source code is more than just scrolling through the text and
getting it fontified, though. For realistic measurements, you need to
emulate and time a typical mix of editing operations.
* Re: CC Mode with font-lock-maximum-decoration 2 [Was Major modes using `widen' is a good, even essential, programming practice.]
2022-08-08 17:15 ` CC Mode with font-lock-maximum-decoration 2 [Was Major modes using `widen' is a good, even essential, programming practice.] Eli Zaretskii
@ 2022-08-08 17:41 ` Eli Zaretskii
2022-08-08 18:41 ` CC Mode with font-lock-maximum-decoration 2 Alan Mackenzie
2022-08-08 18:20 ` CC Mode with font-lock-maximum-decoration 2 Alan Mackenzie
1 sibling, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-08 17:41 UTC (permalink / raw)
To: acm; +Cc: emacs-devel
> Date: Mon, 08 Aug 2022 20:15:25 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org
>
> > For this measurement, I started with subr.el, and appended copies of it
> > to itself, then took functions off the end, to make it the same size as
> > xdisp.c. xdisp.c is 1209233 bytes, my .el buffer was 1209371 bytes.
> >
> > I used M-: (benchmark-run 1 (time-scroll-b)) on each buffer, with:
> >
> > (defun time-scroll-b (&optional arg) ; For use in `benchmark-run'.
> >   (condition-case nil
> >       (while t
> >         (if arg (scroll-down) (scroll-up))
> >         (sit-for 0))
> >     (error nil)))
> >
> > .. The exact results were:
> > (xdisp.c): (5.7370774540000005 9 0.7672129740000013)
> > (elisp): (4.1201735589999995 5 0.42918214299999846).
> >
> > This was, of course, on an optimised build on GNU/Linux using the Linux
> > console, both measurements starting at BOB, having typed and deleted a
> > character to erase existing font-locking.
>
> Editing source code is more than just scrolling through the text and
> getting it fontified, though. For realistic measurements, you need to
> emulate and time a typical mix of editing operations.
And btw, I'm not sure I understand what you are saying. Are you
saying that level 2 is enough for fontifications in C mode? If so,
what are we losing when compared to the value t, and if we don't lose
anything important, why do we need any fontifications beyond what
level 2 gives us?
And what about the value nil instead of 2?
IOW, if you are saying that you consider level 2 to be the recommended
level for C sources, why didn't we make that change long ago?
For Lisp, btw, the difference between level 2 and t is negligible.
And the same goes for most/all other modes, which is the reason why we
have set the value to t years ago. I'm quite sure at that time the
difference between 2 and t for C mode was also very small.
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-08 17:41 ` Eli Zaretskii
@ 2022-08-08 18:41 ` Alan Mackenzie
2022-08-08 18:51 ` Eli Zaretskii
0 siblings, 1 reply; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-08 18:41 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On Mon, Aug 08, 2022 at 20:41:37 +0300, Eli Zaretskii wrote:
> > Date: Mon, 08 Aug 2022 20:15:25 +0300
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: emacs-devel@gnu.org
> > > For this measurement, I started with subr.el, and appended copies of it
> > > to itself, then took functions off the end, to make it the same size as
> > > xdisp.c. xdisp.c is 1209233 bytes, my .el buffer was 1209371 bytes.
> > > I used M-: (benchmark-run 1 (time-scroll-b)) on each buffer, with:
> > > (defun time-scroll-b (&optional arg) ; For use in `benchmark-run'.
> > >   (condition-case nil
> > >       (while t
> > >         (if arg (scroll-down) (scroll-up))
> > >         (sit-for 0))
> > >     (error nil)))
> > > .. The exact results were:
> > > (xdisp.c): (5.7370774540000005 9 0.7672129740000013)
> > > (elisp): (4.1201735589999995 5 0.42918214299999846).
> > > This was, of course, on an optimised build on GNU/Linux using the Linux
> > > console, both measurements starting at BOB, having typed and deleted a
> > > character to erase existing font-locking.
> > Editing source code is more than just scrolling through the text and
> > getting it fontified, though. For realistic measurements, you need to
> > emulate and time a typical mix of editing operations.
> And btw, I'm not sure I understand what you are saying. Are you
> saying that level 2 is enough for fontifications in C mode?
No.
> If so, what are we losing when compared to the value t, and if we don't
> lose anything important, why do we need any fontifications beyond what
> level 2 gives us?
We lose accuracy. That is important to a lot of people, including the
many who have sent in bug reports because of lack of accuracy.
> And what about the value nil instead of 2?
I haven't tried that, yet.
> IOW, if you are saying that you consider level 2 to be the recommended
> level for C sources, why didn't we make that change long ago?
I'm not saying that. I think, on balance, most users prefer the accuracy
of level 3 to the speed of level 2. I've got no real evidence for that,
however.
> For Lisp, btw, the difference between level 2 and t is negligible.
> And the same goes for most/all other modes, which is the reason why we
> have set the value to t years ago. I'm quite sure at that time the
> difference between 2 and t for C mode was also very small.
Martin Stjernholm wrote (what has become) the current level 3 around 20
years ago, noting specifically it was expected to be slower than before,
and that the new level 2 was comparable in both speed and accuracy to the
old level 3. Since then level 3 has become considerably more accurate
and quite a bit slower, too.
My impression of those times was that the old level 3 was just incapable
of being amended to satisfy users' demands for accurate fontification.
Again, I'd have to check old CC Mode bug list archives to be sure.
--
Alan Mackenzie (Nuremberg, Germany).
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-08 18:41 ` CC Mode with font-lock-maximum-decoration 2 Alan Mackenzie
@ 2022-08-08 18:51 ` Eli Zaretskii
2022-08-08 19:09 ` Alan Mackenzie
0 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-08 18:51 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel
> Date: Mon, 8 Aug 2022 18:41:01 +0000
> Cc: emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> > And btw, I'm not sure I understand what you are saying. Are you
> > saying that level 2 is enough for fontifications in C mode?
>
> No.
>
> > If so, what are we losing when compared to the value t, and if we don't
> > lose anything important, why do we need any fontifications beyond what
> > level 2 gives us?
>
> We lose accuracy. That is important to a lot of people, including the
> many who have sent in bug reports because of lack of accuracy.
Then what is the importance of these measurements of yours? The fact
that at level 2 C mode is only slightly slower than Lisp mode is
therefore purely academic: you don't expect anyone to use it, and
don't recommend using it.
> > For Lisp, btw, the difference between level 2 and t is negligible.
> > And the same goes for most/all other modes, which is the reason why we
> > have set the value to t years ago. I'm quite sure at that time the
> > difference between 2 and t for C mode was also very small.
>
> Martin Stjernholm wrote (what has become) the current level 3 around 20
> years ago, noting specifically it was expected to be slower than before,
> and that the new level 2 was comparable in both speed and accuracy to the
> old level 3. Since then level 3 has become considerably more accurate
> and quite a bit slower, too.
That's almost certainly what happened.
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-08 18:51 ` Eli Zaretskii
@ 2022-08-08 19:09 ` Alan Mackenzie
2022-08-09 2:24 ` Eli Zaretskii
0 siblings, 1 reply; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-08 19:09 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Hello, Eli.
On Mon, Aug 08, 2022 at 21:51:47 +0300, Eli Zaretskii wrote:
> > Date: Mon, 8 Aug 2022 18:41:01 +0000
> > Cc: emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>
> > > And btw, I'm not sure I understand what you are saying. Are you
> > > saying that level 2 is enough for fontifications in C mode?
> > No.
> > > If so, what are we losing when compared to the value t, and if we don't
> > > lose anything important, why do we need any fontifications beyond what
> > > level 2 gives us?
> > We lose accuracy. That is important to a lot of people, including the
> > many who have sent in bug reports because of lack of accuracy.
> Then what is the importance of these measurements of yours?
They show that an attempt to speed up CC Mode/2 should be concentrating
on the code which isn't fontification code.
> The fact that at level 2 C mode is only slightly slower than Lisp mode
> is therefore purely academic: you don't expect anyone to use it, and
> don't recommend using it.
The _FONTIFICATION_ of CC Mode/2 is only a little slower than that of
Emacs Lisp Mode. Reports from you show that the mode as a whole is too
slow. I do expect people to use it, those for whom lightning fast
response is more important than accuracy. I just don't think these users
constitute a majority.
> > > For Lisp, btw, the difference between level 2 and t is negligible.
> > > And the same goes for most/all other modes, which is the reason why we
> > > have set the value to t years ago. I'm quite sure at that time the
> > > difference between 2 and t for C mode was also very small.
> > Martin Stjernholm wrote (what has become) the current level 3 around 20
> > years ago, noting specifically it was expected to be slower than before,
> > and that the new level 2 was comparable in both speed and accuracy to the
> > old level 3. Since then level 3 has become considerably more accurate
> > and quite a bit slower, too.
> That's almost certainly what happened.
--
Alan Mackenzie (Nuremberg, Germany).
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-08 19:09 ` Alan Mackenzie
@ 2022-08-09 2:24 ` Eli Zaretskii
2022-08-09 8:00 ` Alan Mackenzie
0 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-09 2:24 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel
> Date: Mon, 8 Aug 2022 19:09:50 +0000
> Cc: emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> > > We lose accuracy. That is important to a lot of people, including the
> > > many who have sent in bug reports because of lack of accuracy.
>
> > Then what is the importance of these measurements of yours?
>
> They show that an attempt to speed up CC Mode/2 should be concentrating
> on the code which isn't fontification code.
Please elaborate on this conclusion, because I don't think I
understand how you arrived at it, based on your measurements. With
the default value of font-lock-maximum-decoration, the fontifications
are also very slow, relative to other modes.
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-09 2:24 ` Eli Zaretskii
@ 2022-08-09 8:00 ` Alan Mackenzie
2022-08-09 11:07 ` Eli Zaretskii
0 siblings, 1 reply; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-09 8:00 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Hello, Eli.
On Tue, Aug 09, 2022 at 05:24:02 +0300, Eli Zaretskii wrote:
> > Date: Mon, 8 Aug 2022 19:09:50 +0000
> > Cc: emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>
> > > > We lose accuracy. That is important to a lot of people, including the
> > > > many who have sent in bug reports because of lack of accuracy.
> > > Then what is the importance of these measurements of yours?
> > They show that an attempt to speed up CC Mode/2 should be concentrating
> > on the code which isn't fontification code.
> Please elaborate on this conclusion, because I don't think I
> understand how you arrived at it, based on your measurements. With
> the default value of font-lock-maximum-decoration, the fontifications
> are also very slow, relative to other modes.
I mean CC Mode with font-lock-maximum-decoration = 2, particularly. The
fontification in this setup is not slow (72% of Emacs Lisp Mode's
speed). The setup as a whole is not fast enough. Therefore to speed it
up, fontification is not the aspect to concentrate on.
This has no relevance to CC Mode with font-lock-maximum-decoration with
the default value.
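
As a rough way to check a ratio like the one above, one could time a full
fontification of the same file under each setting. This is only a sketch and
not necessarily how the figures in this thread were obtained;
`my-time-fontification' is a hypothetical helper, and it measures
fontification alone, not the rest of what the mode does while editing.

```elisp
;;; -*- lexical-binding: t -*-
(require 'benchmark)

(defun my-time-fontification (file mode &optional level)
  "Return the seconds needed to fontify all of FILE in major mode MODE.
LEVEL is bound to `font-lock-maximum-decoration'; it defaults to t,
meaning maximum decoration.  Hypothetical helper, for illustration only."
  (let ((font-lock-maximum-decoration (or level t)))
    (with-temp-buffer
      (insert-file-contents file)
      (funcall mode)                     ; e.g. #'c-mode or #'emacs-lisp-mode
      ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
      (car (benchmark-run 1 (font-lock-ensure))))))

;; Example comparison (the file names are placeholders):
;; (my-time-fontification "foo.c" #'c-mode 2)
;; (my-time-fontification "foo.el" #'emacs-lisp-mode)
```

A single run is noisy, so repeating the call a few times and comparing the
spread is probably wiser than trusting one number.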
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-09 8:00 ` Alan Mackenzie
@ 2022-08-09 11:07 ` Eli Zaretskii
2022-08-09 11:24 ` Alan Mackenzie
0 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-09 11:07 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel
> Date: Tue, 9 Aug 2022 08:00:16 +0000
> Cc: emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> > > > Then what is the importance of these measurements of yours?
>
> > > They show that an attempt to speed up CC Mode/2 should be concentrating
> > > on the code which isn't fontification code.
>
> > Please elaborate on this conclusion, because I don't think I
> > understand how you arrived at it, based on your measurements. With
> > the default value of font-lock-maximum-decoration, the fontifications
> > are also very slow, relative to other modes.
>
> I mean CC Mode with font-lock-maximum-decoration = 2, particularly. The
> fontification in this setup is not slow (72% of Emacs Lisp Mode's
> speed). The setup as a whole is not fast enough. Therefore to speed it
> up, fontification is not the aspect to concentrate on.
But if we will never recommend using level 2, those conclusions are
again of no practical value for our users. Right?
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-09 11:07 ` Eli Zaretskii
@ 2022-08-09 11:24 ` Alan Mackenzie
2022-08-09 11:57 ` Eli Zaretskii
0 siblings, 1 reply; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-09 11:24 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Hello, Eli.
On Tue, Aug 09, 2022 at 14:07:33 +0300, Eli Zaretskii wrote:
> > Date: Tue, 9 Aug 2022 08:00:16 +0000
> > Cc: emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>
> > > > > Then what is the importance of these measurements of yours?
> > > > They show that an attempt to speed up CC Mode/2 should be concentrating
> > > > on the code which isn't fontification code.
> > > Please elaborate on this conclusion, because I don't think I
> > > understand how you arrived at it, based on your measurements. With
> > > the default value of font-lock-maximum-decoration, the fontifications
> > > are also very slow, relative to other modes.
> > I mean CC Mode with font-lock-maximum-decoration = 2, particularly. The
> > fontification in this setup is not slow (72% of Emacs Lisp Mode's
> > speed). The setup as a whole is not fast enough. Therefore to speed it
> > up, fontification is not the aspect to concentrate on.
> But if we will never recommend using level 2, those conclusions are
> again of no practical value for our users. Right?
I don't agree. If there is some place in our documentation to do it,
then we should recommend level 2 for those, like you, who want rapid
response, and level 3 for those, like me, who want accurate
fontification. It's a simple (or complicated) user choice.
You have stated that CC Mode with level 2 is not fast enough. I intend
to make this (what I call CC Mode/2) faster.
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-09 11:24 ` Alan Mackenzie
@ 2022-08-09 11:57 ` Eli Zaretskii
2022-08-09 16:36 ` Alan Mackenzie
0 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-09 11:57 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel
> Date: Tue, 9 Aug 2022 11:24:20 +0000
> Cc: emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> Hello, Eli.
>
> On Tue, Aug 09, 2022 at 14:07:33 +0300, Eli Zaretskii wrote:
> > > Date: Tue, 9 Aug 2022 08:00:16 +0000
> > > Cc: emacs-devel@gnu.org
> > > From: Alan Mackenzie <acm@muc.de>
>
> > > > > > Then what is the importance of these measurements of yours?
>
> > > > > They show that an attempt to speed up CC Mode/2 should be concentrating
> > > > > on the code which isn't fontification code.
>
> > > > Please elaborate on this conclusion, because I don't think I
> > > > understand how you arrived at it, based on your measurements. With
> > > > the default value of font-lock-maximum-decoration, the fontifications
> > > > are also very slow, relative to other modes.
>
> > > I mean CC Mode with font-lock-maximum-decoration = 2, particularly. The
> > > fontification in this setup is not slow (72% of Emacs Lisp Mode's
> > > speed). The setup as a whole is not fast enough. Therefore to speed it
> > > up, fontification is not the aspect to concentrate on.
>
> > But if we will never recommend using level 2, those conclusions are
> > again of no practical value for our users. Right?
>
> I don't agree. If there is some place in our documentation to do it,
> then we should recommend level 2 for those, like you, who want rapid
> response, and level 3 for those, like me, who want accurate
> fontification. It's a simple (or complicated) user choice.
We are not talking about my personal customizations, we are talking
about what CC Mode does by default. If we'd changed the default to be
level 2 for CC Mode, I could understand your line of reasoning. But
since you don't think this should be the default, I say what CC Mode
does at level 2 is not of practical importance for making CC Mode fast
enough.
> You have stated that CC Mode with level 2 is not fast enough. I intend
> to make this (what I call CC Mode/2) faster.
That factoid doesn't do anything for making CC Mode faster for our
users, even if you assume that I personally will use that level.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-09 11:57 ` Eli Zaretskii
@ 2022-08-09 16:36 ` Alan Mackenzie
2022-08-09 16:59 ` Eli Zaretskii
0 siblings, 1 reply; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-09 16:36 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Hello, Eli.
On Tue, Aug 09, 2022 at 14:57:25 +0300, Eli Zaretskii wrote:
> > Date: Tue, 9 Aug 2022 11:24:20 +0000
> > Cc: emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>
> > Hello, Eli.
> > On Tue, Aug 09, 2022 at 14:07:33 +0300, Eli Zaretskii wrote:
> > > > Date: Tue, 9 Aug 2022 08:00:16 +0000
> > > > Cc: emacs-devel@gnu.org
> > > > From: Alan Mackenzie <acm@muc.de>
> > > > > > > Then what is the importance of these measurements of yours?
> > > > > > They show that an attempt to speed up CC Mode/2 should be concentrating
> > > > > > on the code which isn't fontification code.
> > > > > Please elaborate on this conclusion, because I don't think I
> > > > > understand how you arrived at it, based on your measurements. With
> > > > > the default value of font-lock-maximum-decoration, the fontifications
> > > > > are also very slow, relative to other modes.
> > > > I mean CC Mode with font-lock-maximum-decoration = 2, particularly. The
> > > > fontification in this setup is not slow (72% of Emacs Lisp Mode's
> > > > speed). The setup as a whole is not fast enough. Therefore to speed it
> > > > up, fontification is not the aspect to concentrate on.
> > > But if we will never recommend using level 2, those conclusions are
> > > again of no practical value for our users. Right?
> > I don't agree. If there is some place in our documentation to do it,
> > then we should recommend level 2 for those, like you, who want rapid
> > response, and level 3 for those, like me, who want accurate
> > fontification. It's a simple (or complicated) user choice.
> We are not talking about my personal customizations, we are talking
> about what CC Mode does by default. If we'd changed the default to be
> level 2 for CC Mode, I could understand your line of reasoning. But
> since you don't think this should be the default, I say what CC Mode
> does at level 2 is not of practical importance for making CC Mode fast
> enough.
Fast enough for what? CC Mode at level 3 is fast enough for many,
probably most, users. Over the years there've been fewer complaints
about speed than correctness, and most of these have been in connection
with unusual files. There's never any objection to more speed, but for
those who really want instantaneous response, there is level 2, or even
level 1, and beyond that, fundamental-mode.
> > You have stated that CC Mode with level 2 is not fast enough. I intend
> > to make this (what I call CC Mode/2) faster.
> That factoid doesn't do anything for making CC Mode faster for our
> users, even if you assume that I personally will use that level.
This is the old argument that users can't change settings from their
defaults. I do assume that you use level 2 when you're a user (as
distinct from the maintainer). Am I right?
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-09 16:36 ` Alan Mackenzie
@ 2022-08-09 16:59 ` Eli Zaretskii
2022-08-09 17:43 ` Alan Mackenzie
0 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-09 16:59 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel
> Date: Tue, 9 Aug 2022 16:36:04 +0000
> Cc: emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> > We are not talking about my personal customizations, we are talking
> > about what CC Mode does by default. If we'd changed the default to be
> > level 2 for CC Mode, I could understand your line of reasoning. But
> > since you don't think this should be the default, I say what CC Mode
> > does at level 2 is not of practical importance for making CC Mode fast
> > enough.
>
> Fast enough for what?
Fast enough for editing free of annoying delays and sluggishness.
> CC Mode at level 3 is fast enough for many, probably most, users.
I don't think so. How am I different from other users? If you think
I always use an unoptimized build, you are wrong: my production
sessions run fully optimized builds, and CC Mode still feels sluggish,
perhaps because I unconsciously compare it with other major modes (like
ELisp).
> Over the years there've been fewer complaints about speed than
> correctness, and most of these have been in connection with unusual
> files. There's never any objection to more speed, but for those who
> really want instantaneous response, there is level 2, or even level
> 1, and beyond that, fundamental-mode.
What you describe is factually incorrect, but I don't want to argue
about whether we did or didn't have complaints. I'm complaining now
(and did so a few months ago, but maybe you forgot).
> I do assume that you use level 2 when you're a user (as distinct
> from the maintainer). Am I right?
No, you are wrong. I use the default all the time. And since you
didn't really describe the effect of going down to level 2, I cannot
even begin thinking whether using level 2 is worth considering for my
purposes.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-09 16:59 ` Eli Zaretskii
@ 2022-08-09 17:43 ` Alan Mackenzie
2022-08-09 17:55 ` Eli Zaretskii
0 siblings, 1 reply; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-09 17:43 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Hello, Eli.
On Tue, Aug 09, 2022 at 19:59:36 +0300, Eli Zaretskii wrote:
> > Date: Tue, 9 Aug 2022 16:36:04 +0000
> > Cc: emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>
> > > We are not talking about my personal customizations, we are talking
> > > about what CC Mode does by default. If we'd changed the default to be
> > > level 2 for CC Mode, I could understand your line of reasoning. But
> > > since you don't think this should be the default, I say what CC Mode
> > > does at level 2 is not of practical importance for making CC Mode fast
> > > enough.
> > Fast enough for what?
> Fast enough for editing free of annoying delays and sluggishness.
What's annoying and sluggish for Alice is perfectly fine for Bob.
Everybody's different.
> > CC Mode at level 3 is fast enough for many, probably most, users.
> I don't think so. How am I different from other users?
You're probably a lot faster than most users at just about everything
you do. How else could you keep Emacs under control? If your mental
processes are faster than most people's, then what's sluggish to you
would be perfectly OK to other people. Everybody's different.
> If you think I always use an unoptimized build, you are wrong: my
> production sessions run fully optimized builds, and CC Mode still
> feels sluggish, perhaps because I unconsciously compare it with other
> major modes (like ELisp).
Emacs Lisp Mode cannot help but be much faster than CC Mode. It is
unreasonable to expect parity in their speeds.
> > Over the years there've been fewer complaints about speed than
> > correctness, and most of these have been in connection with unusual
> > files. There's never any objection to more speed, but for those who
> > really want instantaneous response, there is level 2, or even level
> > 1, and beyond that, fundamental-mode.
> What you describe is factually incorrect, but I don't want to argue
> about whether we did or didn't have complaints. I'm complaining now
> (and did so a few months ago, but maybe you forgot).
No, I haven't forgotten.
> > I do assume that you use level 2 when you're a user (as distinct
> > from the maintainer). Am I right?
> No, you are wrong. I use the default all the time. And since you
> didn't really describe the effect of going down to level 2, I cannot
> even begin thinking whether using level 2 is worth considering for my
> purposes.
Well, you could always try it out for an evening. I think I've
described it reasonably well - faster, but less accurate. It doesn't
seem worth the time it would take to catalogue each deficiency in its
fontification. Maybe the inaccuracies would annoy you less than the
sluggishness of level 3. Clearly you don't think so, but only you can
say.
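
For anyone who wants to try that, a minimal sketch of the setting, assuming
it goes in the init file and that only the C and C++ modes should drop to
level 2 while every other mode keeps maximum decoration:

```elisp
;; Per-mode decoration levels: each entry is (MAJOR-MODE . LEVEL), and the
;; entry keyed by t is the default for modes not listed.
(setq font-lock-maximum-decoration
      '((c-mode . 2)
        (c++-mode . 2)
        (t . t)))
```

The value is consulted when font-lock sets itself up in a buffer, so buffers
that are already open would likely need the major mode re-enabled (or to be
revisited) before the change shows.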
But I don't like your proposed solution, which you've mentioned several
times, namely to make level 3 more like level 2. I.e., to deliberately
reduce its accuracy in the name of speed.
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-09 17:43 ` Alan Mackenzie
@ 2022-08-09 17:55 ` Eli Zaretskii
2022-08-10 0:22 ` Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2) Lynn Winebarger
0 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-09 17:55 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel
> Date: Tue, 9 Aug 2022 17:43:31 +0000
> Cc: emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> But I don't like your proposed solution, which you've mentioned several
> times, namely to make level 3 more like level 2. I.e., to deliberately
> reduce its accuracy in the name of speed.
Then I guess CC Mode will remain slow, until maybe tree-sitter
integration will fix it. Sadly.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-09 17:55 ` Eli Zaretskii
@ 2022-08-10 0:22 ` Lynn Winebarger
2022-08-10 2:14 ` Po Lu
` (2 more replies)
0 siblings, 3 replies; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-10 0:22 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Alan Mackenzie, emacs-devel
On Tue, Aug 9, 2022, 1:58 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Tue, 9 Aug 2022 17:43:31 +0000
> > Cc: emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>
> >
> > But I don't like your proposed solution, which you've mentioned several
> > times, namely to make level 3 more like level 2. I.e., to deliberately
> > reduce its accuracy in the name of speed.
>
> Then I guess CC Mode will remain slow, until maybe tree-sitter
> integration will fix it. Sadly.
>
I have a related question. I have this programming mode derived from CC
mode. The formal syntax is on the one hand much simpler than anything like
C++, but I find trying to capture the constructs by the regular expression
rules employed by font lock very difficult to get right.
I have a re-entrant LALR grammar for this language that I intend to use
with Semantic to get proper handling of all constructs. That's one of the
main reasons I wanted to be sure to get as efficient a baseline system as
possible (and can now proceed with).
I'm curious, though, as to why Semantic/CEDET seems to have been superseded
by external solutions like tree-sitter or LSP-based (non-emacs) servers.
One of the draws of Emacs for me is the "batteries included" nature of it
having Emacs Lisp built in. Is there a downside to using Semantic as the
basis for improving my derived mode that's non-obvious? Would producing a
threaded code parser instead of a straight table driven parser be a better
approach with the native compilation option now available?
I also use this mode to fontify a REPL session for this language, which has
pretty awful performance when producing tracing output that reaches 100K to
500K lines in a buffer, which is why this narrowing discussion
interests me, even though the buffers in question don't have particularly
long lines. It just chews up memory if I try to jump from the end of the
buffer to the beginning. Or at least it did in v24.3.
Thanks,
Lynn
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-10 0:22 ` Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2) Lynn Winebarger
@ 2022-08-10 2:14 ` Po Lu
2022-08-10 2:42 ` Eli Zaretskii
2022-08-14 19:24 ` Eric Ludlam
2022-08-10 17:03 ` Tassilo Horn
2022-08-13 14:40 ` Jostein Kjønigsen
2 siblings, 2 replies; 136+ messages in thread
From: Po Lu @ 2022-08-10 2:14 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Eli Zaretskii, Alan Mackenzie, emacs-devel
Lynn Winebarger <owinebar@gmail.com> writes:
> I'm curious, though, as to why Semantic/CEDET seems to have been
> superseded by external solutions like tree-sitter or LSP-based
> (non-emacs) servers. One of the draws of Emacs for me is the
> "batteries included" nature of it having Emacs Lisp built in. Is
> there a downside to using Semantic as the basis for improving my
> derived mode that's non-obvious?
I think Semantic lost inertia after the original author lost interest
in it (or left for unrelated reasons, I don't remember which.)
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-10 2:14 ` Po Lu
@ 2022-08-10 2:42 ` Eli Zaretskii
2022-08-10 10:05 ` Lynn Winebarger
2022-08-14 19:24 ` Eric Ludlam
1 sibling, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-10 2:42 UTC (permalink / raw)
To: Po Lu; +Cc: owinebar, acm, emacs-devel
> From: Po Lu <luangruo@yahoo.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, Alan Mackenzie <acm@muc.de>, emacs-devel
> <emacs-devel@gnu.org>
> Date: Wed, 10 Aug 2022 10:14:58 +0800
>
> Lynn Winebarger <owinebar@gmail.com> writes:
>
> > I'm curious, though, as to why Semantic/CEDET seems to have been
> > superseded by external solutions like tree-sitter or LSP-based
> > (non-emacs) servers. One of the draws of Emacs for me is the
> > "batteries included" nature of it having Emacs Lisp built in. Is
> > there a downside to using Semantic as the basis for improving my
> > derived mode that's non-obvious?
>
> I think Semantic lost inertia after the original author lost interest
> in it (or left for unrelated reasons, I don't remember which.)
It is simply too slow to be a modern solution for these features.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-10 2:42 ` Eli Zaretskii
@ 2022-08-10 10:05 ` Lynn Winebarger
2022-08-10 10:49 ` Po Lu
2022-08-10 11:31 ` Eli Zaretskii
0 siblings, 2 replies; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-10 10:05 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Po Lu, Alan Mackenzie, emacs-devel, Stefan Monnier
Explicitly adding Stefan M on the CC: list since, as I understand it, he is
a primary driver of tree-sitter integration, as well as having been
involved in bringing Semantic/CEDET into the core (if I'm not
misremembering the acknowledgements in the source code and/or doc files).
On Tue, Aug 9, 2022, 10:42 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Po Lu <luangruo@yahoo.com>
> > Cc: Eli Zaretskii <eliz@gnu.org>, Alan Mackenzie <acm@muc.de>, emacs-devel
> > <emacs-devel@gnu.org>
> > Date: Wed, 10 Aug 2022 10:14:58 +0800
> >
> > Lynn Winebarger <owinebar@gmail.com> writes:
> >
> > > I'm curious, though, as to why Semantic/CEDET seems to have been
> > > > superseded by external solutions like tree-sitter or LSP-based
> > > (non-emacs) servers. One of the draws of Emacs for me is the
> > > "batteries included" nature of it having Emacs Lisp built in. Is
> > > there a downside to using Semantic as the basis for improving my
> > > derived mode that's non-obvious?
> >
> > I think Semantic lost inertia after the original author lost interest
> > in it (or left for unrelated reasons, I don't remember which.)
>
> It is simply too slow to be a modern solution for these features.
>
Can you (or anyone on the list) provide a more detailed analysis? Is the
slowness inherent in the algorithm design, the implementation method (eg
table driven parsing designed before the availability of the native
compiler), the basic synchronous nature of ELisp, the impact on garbage
collection, etc?
If the analyzer were run in a second emacs process using mmaped files to
share buffers being analyzed, then communicating the results either via LSP
or some other channel, would that make it usable?
There are dangling references in the semantic/wisent manual to docs that
are in the last sourceforge repo around the time of the migration into core
emacs, but never made it in. The grammar framework doc in particular is
needed to make sense of the existing grammars.
I've definitely noticed more pausing with Semantic turned on, but it's not
unusable so far (but I'm also not looking at any C++ source, just ELisp and
C, maybe some shell scripts, info files and Markdown).
Lynn
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-10 10:05 ` Lynn Winebarger
@ 2022-08-10 10:49 ` Po Lu
2022-08-10 11:31 ` Eli Zaretskii
1 sibling, 0 replies; 136+ messages in thread
From: Po Lu @ 2022-08-10 10:49 UTC (permalink / raw)
To: Lynn Winebarger
Cc: Eli Zaretskii, Alan Mackenzie, emacs-devel, Stefan Monnier
Lynn Winebarger <owinebar@gmail.com> writes:
> Explicitly adding Stefan M on the CC: list since, as I understand it,
> he is a primary driver of tree-sitter integration
I think you want Yuan Fu <casouri@gmail.com> instead.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-10 10:05 ` Lynn Winebarger
2022-08-10 10:49 ` Po Lu
@ 2022-08-10 11:31 ` Eli Zaretskii
2022-08-12 12:37 ` Lynn Winebarger
1 sibling, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-10 11:31 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: luangruo, acm, emacs-devel, monnier
> From: Lynn Winebarger <owinebar@gmail.com>
> Date: Wed, 10 Aug 2022 06:05:56 -0400
> Cc: Po Lu <luangruo@yahoo.com>, Alan Mackenzie <acm@muc.de>, emacs-devel <emacs-devel@gnu.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>
>
> It is simply too slow to be a modern solution for these features.
>
> Can you (or anyone on the list) provide a more detailed analysis? Is the slowness inherent in the algorithm
> design, the implementation method (eg table driven parsing designed before the availability of the native
> compiler), the basic synchronous nature of ELisp, the impact on garbage collection, etc?
I don't have this information. Maybe someone else does. But in
general, it is a very small wonder that a parser written in optimized
C is much faster than anything written in Emacs Lisp, given that Lisp
is an interpreted language that has no special support for writing
parsers.
> If the analyzer were run in a second emacs process using mmaped files to share buffers being analyzed,
> then communicating the results either via LSP or some other channel, would that make it usable?
I doubt that. In particular, LSP-style communications are a cause of
slower operation, not faster operation. Various LSP-based packages
tolerate that because the server can do stuff clients cannot easily do
without investing a lot of language-specific efforts and expertise.
> I've definitely noticed more pausing with Semantic turned on, but it's not unusable so far (but I'm also not
> looking at any C++ source, just ELisp and C, maybe some shell scripts, info files and Markdown).
I didn't say Semantic is unusable. It certainly is usable.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-10 11:31 ` Eli Zaretskii
@ 2022-08-12 12:37 ` Lynn Winebarger
2022-08-12 12:50 ` Eli Zaretskii
2022-08-12 16:00 ` Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2) Akib Azmain Turja
0 siblings, 2 replies; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-12 12:37 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: luangruo, acm, emacs-devel, monnier, Yuan Fu
CC'ing Yuan Fu on Po Lu's recommendation earlier in the thread.
On Wed, Aug 10, 2022 at 7:31 AM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Lynn Winebarger <owinebar@gmail.com>
> > Date: Wed, 10 Aug 2022 06:05:56 -0400
> > Cc: Po Lu <luangruo@yahoo.com>, Alan Mackenzie <acm@muc.de>, emacs-devel <emacs-devel@gnu.org>,
> > Stefan Monnier <monnier@iro.umontreal.ca>
> >
> > It is simply too slow to be a modern solution for these features.
> >
> > Can you (or anyone on the list) provide a more detailed analysis? Is the slowness inherent in the algorithm
> > design, the implementation method (eg table driven parsing designed before the availability of the native
> > compiler), the basic synchronous nature of ELisp, the impact on garbage collection, etc?
>
> I don't have this information. Maybe someone else does. But in
> general, it is a very small wonder that a parser written in optimized
> C is much faster than anything written in Emacs Lisp, given that Lisp
> is an interpreted language that has no special support for writing
> parsers.
That can be cured over time, now that the bulk of the core of emacs
uses lexical scoping. With proper tail recursion, ELisp should be
able to produce lexers and parsers roughly as efficient as C code, if
not more efficient (depending on whether you allow use of "computed goto"
in the C code for the lexers and parsers). That does require changes
to the byte code VM, but it's doable.
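
To give a flavour of the style in question, here is a small sketch of a
scanner written as a tail-recursive loop. It assumes Emacs 28 or later, where
`named-let' (from subr-x) turns self-calls in tail position into a loop under
lexical binding; that is a macro-level optimization, not the general proper
tail recursion in the VM discussed above. The function name is made up for
illustration.

```elisp
;;; -*- lexical-binding: t -*-
(require 'subr-x)                       ; provides `named-let' (Emacs 28+)

(defun my-count-tokens (string)
  "Count whitespace-separated tokens in STRING.
Each step either recurs in tail position with an updated position and
count, or returns the count when no further token is found."
  (named-let scan ((pos 0) (count 0))
    (if (string-match "[^ \t\n]+" string pos)
        (scan (match-end 0) (1+ count)) ; tail call: becomes a loop
      count)))

(my-count-tokens "int main ( void )")   ; 5 tokens
```

A real lexer would dispatch between several mutually recursive states, and
that case is where general tail-call support would matter.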
>
> > If the analyzer were run in a second emacs process using mmaped files to share buffers being analyzed,
> > then communicating the results either via LSP or some other channel, would that make it usable?
>
> I doubt that. In particular, LSP-style communications are a cause of
> slower operation, not faster operation. Various LSP-based packages
> tolerate that because the server can do stuff clients cannot easily do
> without investing a lot of language-specific efforts and expertise.
That's more along the lines of what I expected (and Tassilo's
response) - the overhead of creating and maintaining these language
analyses in Emacs. But for a DSL, I'd personally prefer to be able to
put something together "quickly" in emacs to get a working IDE.
> > I've definitely noticed more pausing with Semantic turned on, but it's not unusable so far (but I'm also not
> > looking at any C++ source, just ELisp and C, maybe some shell scripts, info files and Markdown).
>
> I didn't say Semantic is unusable. It certainly is usable.
The pausing was in fact due to a different library - tabbar-ruler -
that had for some reason made frequently refreshing the tab list call
eval on a thunk instead of just calling the thunk, generating huge
amounts of garbage from "eval". Since I corrected that, I haven't
noticed much pausing. A better fix would make it stop refreshing tabs
that aren't even displayed when Semantic takes over the header line,
but that would require more effort.
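
The difference between the two calling styles can be sketched as follows.
This is not tabbar-ruler's actual code; all names here are hypothetical
stand-ins, and the only point is the contrast between handing a freshly
built form to `eval' and calling the function directly.

```elisp
;;; -*- lexical-binding: t -*-
(defvar my-refresh-count 0
  "How many times `my-refresh-tabs' has run (illustration only).")

(defun my-refresh-tabs ()
  "Stand-in for a tab-list refresh function."
  (setq my-refresh-count (1+ my-refresh-count)))

;; Costly pattern: build a fresh form on every call and hand it to `eval'.
(defun my-refresh-via-eval ()
  (eval (list 'my-refresh-tabs) t))

;; Direct pattern: call the function itself; no form is built or interpreted.
(defun my-refresh-via-funcall ()
  (funcall #'my-refresh-tabs))
```

Both functions do the same work; when something like this runs on every
redisplay, the direct call is the one to prefer.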
Lynn
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-12 12:37 ` Lynn Winebarger
@ 2022-08-12 12:50 ` Eli Zaretskii
2022-08-12 21:50 ` Ideal performance of ELisp (was: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)) Stefan Monnier
2022-08-12 16:00 ` Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2) Akib Azmain Turja
1 sibling, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-12 12:50 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: luangruo, acm, emacs-devel, monnier, casouri
> From: Lynn Winebarger <owinebar@gmail.com>
> Date: Fri, 12 Aug 2022 08:37:25 -0400
> Cc: luangruo@yahoo.com, acm@muc.de, emacs-devel@gnu.org,
> monnier@iro.umontreal.ca, Yuan Fu <casouri@gmail.com>
>
> > I don't have this information. Maybe someone else does. But in
> > general, it is a very small wonder that a parser written in optimized
> > C is much faster than anything written in Emacs Lisp, given that Lisp
> > is an interpreted language that has no special support for writing
> > parsers.
>
> That can be cured over time, now that the bulk of the core of emacs
> uses lexical scoping.
I very much doubt that. ELisp code cannot match the speed of
optimized C code.
> That does require changes to the byte code VM, but it's doable.
Then you are no longer talking about Emacs as we know it.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Ideal performance of ELisp (was: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2))
2022-08-12 12:50 ` Eli Zaretskii
@ 2022-08-12 21:50 ` Stefan Monnier
2022-08-12 23:26 ` Lynn Winebarger
2022-08-13 4:39 ` Ideal performance of ELisp (was: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)) Ihor Radchenko
0 siblings, 2 replies; 136+ messages in thread
From: Stefan Monnier @ 2022-08-12 21:50 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Lynn Winebarger, luangruo, acm, emacs-devel, casouri
>> > I don't have this information. Maybe someone else does. But in
>> > general, it is a very small wonder that a parser written in optimized
>> > C is much faster than anything written in Emacs Lisp, given that Lisp
>> > is an interpreted language that has no special support for writing
>> > parsers.
>> That can be cured over time, now that the bulk of the core of emacs
>> uses lexical scoping.
> I very much doubt that.
Agreed.
> ELisp code cannot match the speed of optimized C code.
I suspect it could, to some extent, in theory. Getting there would
require a fair bit more work, probably using a different compilation
strategy than the AOT compiler we have now.
A good reference is JavaScript which shouldn't be noticeably easier to
compile efficiently and where many millions of dollars have been poured
over several years to speed it up. The result is indeed able to match
the performance of C nowadays for several non-trivial examples of code,
but it's far from obvious that we have the resources to reproduce this
feat for ELisp.
Stefan
* Re: Ideal performance of ELisp (was: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2))
2022-08-12 21:50 ` Ideal performance of ELisp (was: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)) Stefan Monnier
@ 2022-08-12 23:26 ` Lynn Winebarger
2022-08-13 2:11 ` Ideal performance of ELisp Stefan Monnier
2022-08-13 4:39 ` Ideal performance of ELisp (was: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)) Ihor Radchenko
1 sibling, 1 reply; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-12 23:26 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eli Zaretskii, Po Lu, Alan Mackenzie, emacs-devel, Yuan Fu
On Fri, Aug 12, 2022, 5:50 PM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
> >> > I don't have this information. Maybe someone else does. But in
> >> > general, it is a very small wonder that a parser written in optimized
> >> > C is much faster than anything written in Emacs Lisp, given that Lisp
> >> > is an interpreted language that has no special support for writing
> >> > parsers.
> >> That can be cured over time, now that the bulk of the core of emacs
> >> uses lexical scoping.
> > I very much doubt that.
>
> Agreed.
>
I noted the switch to lexical scope, because dynamic scope prevents any
call from ever being a tail call in any meaningful sense. It's not
automatic, but when the core of Emacs lisp code used dynamic scoping,
changes to the VM to support proper tail recursion would have been
meaningless. That is no longer the case.
Once the VM supports proper tail recursion, it's straightforward to
generate automata that never perform a function call, at least not as part
of the recognizer. Lisp with proper tail recursion gives you the
equivalent of computed goto's, which are not available in
standards-conformant C AFAIK. If the compiler can prove certain local
variables are always fixnums and remove the tag/untag overhead, then the
generated IR passed to libgccjit (or clang) should be able to match
anything expressible in C for those automata.
The code in the actions may not be as optimizable as the recognition
automata, I'll grant you.
> > ELisp code cannot match the speed of optimized C code.
>
> I suspect it could, to some extent, in theory. Getting there would
> require a fair bit more work, probably using a different compilation
> strategy than the AOT compiler we have now.
> A good reference is JavaScript which shouldn't be noticeably easier to
> compile efficiently and where many millions of dollars have been poured
> over several years to speed it up. The result is indeed able to match
> the performance of C nowadays for several non-trivial examples of code,
> but it's far from obvious that we have the resources to reproduce this
> feat for ELisp.
The claim wasn't that every piece of ELisp code could be as optimized as
sharply written C or C++ code, but that there is a subset that is (that can
be targeted by tools like parser and lexer generators), at least at the
level of an abstract RISC machine that the IRs for compiler backends like
libgccjit or llvm let you target.
I don't think the compiler has to match the level of optimization of the V8
compiler to get significant improvements in the performance of ELisp.
We're really only talking about optimizations known to Common Lisp and
Scheme compiler writers in the 1980s. However, the most significant
improvements in performance will probably only come after changes that
remove impediments to parallelizing work and improved memory management.
Lynn
* Re: Ideal performance of ELisp
2022-08-12 23:26 ` Lynn Winebarger
@ 2022-08-13 2:11 ` Stefan Monnier
2022-08-13 10:51 ` Lynn Winebarger
0 siblings, 1 reply; 136+ messages in thread
From: Stefan Monnier @ 2022-08-13 2:11 UTC (permalink / raw)
To: Lynn Winebarger
Cc: Eli Zaretskii, Po Lu, Alan Mackenzie, emacs-devel, Yuan Fu
> Once the VM supports proper tail recursion, it's straightforward to
> generate automata that never perform a function call, at least not as part
> of the recognizer.
It was straightforward beforehand as well (using a `while` loop instead
of recursion). And if you do use recursion, then it's not very much
simpler with `lexical-binding` than without because you still have to
take into account the possibility that the function gets redefined
during your recursion :-(
Don't get me wrong: `lexical-binding` is definitely very useful for
native compilation (and it does help for tail-calls in some cases,
e.g. in `named-let`), but I suspect that for the foreseeable future
it'll stay hard to be competitive with something like tree-sitter when
writing the code in ELisp.
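As a small illustration of the `named-let` case mentioned above (a
sketch only, not code from this thread; the function name
`my/count-leading-digits` is made up for the example, and it assumes
Emacs 28 or later with `lexical-binding` enabled via the file's first
line):

```elisp
;;; -*- lexical-binding: t -*-
(require 'subr-x)   ; `named-let' is defined here in Emacs 28 and later

(defun my/count-leading-digits (string)
  "Return the number of leading ASCII digits in STRING."
  (named-let scan ((i 0))
    (if (and (< i (length string))
             (<= ?0 (aref string i) ?9))
        ;; Self tail call: with `lexical-binding' this is the kind of
        ;; call that can be turned into a loop instead of a real call.
        (scan (1+ i))
      i)))
```

Only the self-call in tail position benefits here; calls to other
functions are still ordinary function calls.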
Stefan
* Re: Ideal performance of ELisp
2022-08-13 2:11 ` Ideal performance of ELisp Stefan Monnier
@ 2022-08-13 10:51 ` Lynn Winebarger
2022-08-13 11:13 ` Lynn Winebarger
2022-08-13 14:07 ` Stefan Monnier
0 siblings, 2 replies; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-13 10:51 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eli Zaretskii, Po Lu, Alan Mackenzie, emacs-devel, Yuan Fu
On Fri, Aug 12, 2022, 10:11 PM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
> > Once the VM supports proper tail recursion, it's straightforward to
> > generate automata that never perform a function call, at least not as part
> > of the recognizer.
>
> It was straightforward beforehand as well (using a `while` loop instead
> of recursion). And if you do use recursion, then it's not very much
> simpler with `lexical-binding` than without because you still have to
> take into account the possibility that the function gets redefined
> during your recursion :-(
>
I think you're mistaking self-tail recursion for tail recursion. I mean
proper tail recursion in Clinger's sense. Any program written in CPS form
will not "accumulate stack" (note the scare quotes, I know the details
depend on implementation). You can use a while loop with a trampoline to
emulate it, sure, but that's not the same as having all control transfers
take place by simple branching. If you lift all the lambdas, there's no
implicit memory allocation either. I used to write code like that all the
time - it's just "higher order assembly language".
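To make the "while loop with a trampoline" emulation concrete, here is
a minimal sketch (the names `my/trampoline`, `my/even-step` and
`my/odd-step` are invented for this example, and it assumes
`lexical-binding` is enabled so that the lambdas capture `n`):

```elisp
;;; -*- lexical-binding: t -*-

(defun my/trampoline (step)
  "Call STEP repeatedly while the result is a function.
Return the first result that is not a function."
  (while (functionp step)
    (setq step (funcall step)))
  step)

;; Two mutually recursive "states".  Instead of calling each other
;; directly, each returns a closure describing the next step, so the
;; Lisp stack does not grow with N.
(defun my/even-step (n)
  (if (= n 0) t (lambda () (my/odd-step (1- n)))))

(defun my/odd-step (n)
  (if (= n 0) nil (lambda () (my/even-step (1- n)))))

;; Example: (my/trampoline (my/even-step 100000))
```

Every bounce in this sketch goes through `funcall` and allocates a
fresh closure, which is exactly the overhead that direct branching
would avoid.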
You're right about the hiccup introduced by having a "Lisp-2" without
locally scoped function names. That could be solved by introducing an
explicit "function lambda" whose parameters provide lexically scoped
function variables, or by simply using funcall to dispatch to closures on
ordinary variables. As long as the dispatch happens on locally scoped
names, the compiler should be able to tell if they are constants.
> Don't get me wrong: `lexical-binding` is definitely very useful for
> native compilation (and it does help for tail-calls in some cases,
> e.g. in `named-let`), but I suspect that for the foreseeable future
> it'll stay hard to be competitive with something like tree-sitter when
> writing the code in ELisp
This is Emacs. Even if there was a new VM with these features, and a
transpiler for porting existing ELC files, available today, I wouldn't be
sure it would be integrated anytime soon.
I just think the main barrier to introducing such improvements was,
historically, the dynamic scoping in the massive lisp code base of Emacs.
With that removed, I don't think Eli's skepticism is warranted. This was
all hashed out in the 1980s.
Lynn
* Re: Ideal performance of ELisp
2022-08-13 10:51 ` Lynn Winebarger
@ 2022-08-13 11:13 ` Lynn Winebarger
2022-08-13 14:07 ` Stefan Monnier
1 sibling, 0 replies; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-13 11:13 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eli Zaretskii, Po Lu, Alan Mackenzie, emacs-devel, Yuan Fu
On Sat, Aug 13, 2022, 6:51 AM Lynn Winebarger <owinebar@gmail.com> wrote:
> On Fri, Aug 12, 2022, 10:11 PM Stefan Monnier <monnier@iro.umontreal.ca>
> wrote:
>
>> > Once the VM supports proper tail recursion, it's straightforward to
>> > generate automata that never perform a function call, at least not as part
>> > of the recognizer.
>>
>> It was straightforward beforehand as well (using a `while` loop instead
>> of recursion). And if you do use recursion, then it's not very much
>> simpler with `lexical-binding` than without because you still have to
>> take into account the possibility that the function gets redefined
>> during your recursion :-(
>>
>
> I think you're mistaking self-tail recursion for tail recursion. I mean
> proper tail recursion in Clinger's sense. Any program written in CPS form
> will not "accumulate stack" (note the scare quotes, I know the details
> depend on implementation). You can use a while loop with a trampoline to
> emulate it, sure, but that's not the same as having all control transfers
> take place by simple branching. If you lift all the lambdas, there's no
> implicit memory allocation either. I used to write code like that all the
> time - it's just "higher order assembly language".
>
> You're right about the hiccup introduced by having a "Lisp-2" without
> locally scoped function names. That could be solved by introducing an
> explicit "function lambda" whose parameters provide lexically scoped
> function variables, or by simply using funcall to dispatch to closures on
> ordinary variables. As long as the dispatch happens on locally scoped
> names, the compiler should be able to tell if they are constants.
>
I should have said "(eval-when-compile #'funcall)" in place of "funcall",
to guarantee the operator is a constant involving no runtime lookup of a
function symbol.
>
>> Don't get me wrong: `lexical-binding` is definitely very useful for
>> native compilation (and it does help for tail-calls in some cases,
>> e.g. in `named-let`), but I suspect that for the foreseeable future
>> it'll stay hard to be competitive with something like tree-sitter when
>> writing the code in ELisp
>
>
> This is Emacs. Even if there was a new VM with these features, and a
> transpiler for porting existing ELC files, available today, I wouldn't be
> sure it would be integrated anytime soon.
> I just think the main barrier to introducing such improvements was,
> historically, the dynamic scoping in the massive lisp code base of Emacs.
> With that removed, I don't think Eli's skepticism is warranted. This was
> all hashed out in the 1980s.
>
> Lynn
* Re: Ideal performance of ELisp
2022-08-13 10:51 ` Lynn Winebarger
2022-08-13 11:13 ` Lynn Winebarger
@ 2022-08-13 14:07 ` Stefan Monnier
2022-08-13 14:56 ` Lynn Winebarger
1 sibling, 1 reply; 136+ messages in thread
From: Stefan Monnier @ 2022-08-13 14:07 UTC (permalink / raw)
To: Lynn Winebarger
Cc: Eli Zaretskii, Po Lu, Alan Mackenzie, emacs-devel, Yuan Fu
>> > Once the VM supports proper tail recursion, it's straightforward to
>> > generate automata that never perform a function call, at least not as part
>> > of the recognizer.
>>
>> It was straightforward beforehand as well (using a `while` loop instead
>> of recursion). And if you do use recursion, then it's not very much
>> simpler with `lexical-binding` than without because you still have to
>> take into account the possibility that the function gets redefined
>> during your recursion :-(
>>
>
> I think you're mistaking self-tail recursion for tail recursion.
No, I was simply restricting the discussion to the case you mention of
"generat[ing an] automata", in which case you usually have enough
control over the generated code to use a `while` loop if desired.
Stefan
* Re: Ideal performance of ELisp
2022-08-13 14:07 ` Stefan Monnier
@ 2022-08-13 14:56 ` Lynn Winebarger
2022-08-16 16:46 ` Lynn Winebarger
0 siblings, 1 reply; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-13 14:56 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eli Zaretskii, Po Lu, Alan Mackenzie, emacs-devel, Yuan Fu
On Sat, Aug 13, 2022, 10:07 AM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
> >> > Once the VM supports proper tail recursion, it's straightforward to
> >> > generate automata that never perform a function call, at least not as part
> >> > of the recognizer.
> >>
> >> It was straightforward beforehand as well (using a `while` loop instead
> >> of recursion). And if you do use recursion, then it's not very much
> >> simpler with `lexical-binding` than without because you still have to
> >> take into account the possibility that the function gets redefined
> >> during your recursion :-(
> >>
> >
> > I think you're mistaking self-tail recursion for tail recursion.
>
> No, I was simply restricting the discussion to the case you mention of
> "generat[ing an] automata", in which case you usually have enough
> control over the generated code to use a `while` loop if desired.
It's true you can avoid funcall dispatch overhead that way, but unless I'm
missing something you are stuck with the overhead of the while loop plus
whatever conditional branching mechanism you would use for dispatching to a
state label, as opposed to simply jumping to the contents of a register.
I'm not 100% on whether there's an ELisp construct that the byte compiler
can turn into byte code that uses a table of labels for conditional
dispatch the way a switch statement may be implemented in C.
Lynn
* Re: Ideal performance of ELisp
2022-08-13 14:56 ` Lynn Winebarger
@ 2022-08-16 16:46 ` Lynn Winebarger
2022-08-16 17:22 ` Stefan Monnier
0 siblings, 1 reply; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-16 16:46 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eli Zaretskii, Po Lu, Alan Mackenzie, emacs-devel, Yuan Fu
On Sat, Aug 13, 2022 at 10:56 AM Lynn Winebarger <owinebar@gmail.com> wrote:
>
> On Sat, Aug 13, 2022, 10:07 AM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>>
>> >> > Once the VM supports proper tail recursion, it's straightforward to
>> >> > generate automata that never perform a function call, at least not as part
>> >> > of the recognizer.
>> >>
>> >> It was straightforward beforehand as well (using a `while` loop instead
>> >> of recursion). And if you do use recursion, then it's not very much
>> >> simpler with `lexical-binding` than without because you still have to
>> >> take into account the possibility that the function gets redefined
>> >> during your recursion :-(
>> >>
>> >
>> > I think you're mistaking self-tail recursion for tail recursion.
>>
>> No, I was simply restricting the discussion to the case you mention of
>> "generat[ing an] automata", in which case you usually have enough
>> control over the generated code to use a `while` loop if desired.
>
>
> It's true you can avoid funcall dispatch overhead that way, but unless I'm missing something you are stuck with the overhead of the while loop plus whatever conditional branching mechanism you would use for dispatching to a state label, as opposed to simply jumping to the contents of a register. I'm not 100% on whether there's an ELisp construct that the byte compiler can turn into byte code that uses a table of labels for conditional dispatch the way a switch statement may be implemented in C.
>
BTW, I was being serious - if there's a way to write a simple jump to
do case-dispatching for that trampolining while loop, I'd definitely
look at making semantic produce such automata so the native compiler,
in particular, could optimize the result properly.
Plug for inline LAP versus "C-like DSL" - At least then you could
generate bytecode that jumps to calculated offsets, by constructing a
hash table of symbol values to byte-code labels only available at
assembly time?
Lynn
* Re: Ideal performance of ELisp
2022-08-16 16:46 ` Lynn Winebarger
@ 2022-08-16 17:22 ` Stefan Monnier
2022-08-17 12:41 ` Lynn Winebarger
0 siblings, 1 reply; 136+ messages in thread
From: Stefan Monnier @ 2022-08-16 17:22 UTC (permalink / raw)
To: Lynn Winebarger
Cc: Eli Zaretskii, Po Lu, Alan Mackenzie, emacs-devel, Yuan Fu
> BTW, I was being serious - if there's a way to write a simple jump to
> do case-dispatching for that trampolining while loop, I'd definitely
> look at making semantic produce such automata so the native compiler,
> in particular, could optimize the result properly.
AFAIK a (pcase x ('foo ..) ('bar ...) ...) should be compiled to
a `switch` bytecode which uses a hash-table lookup to find the
destination target to jump to.
Not sure how well it works for very large tables where we risk bumping
into the 32K limit of bytecode jumps.
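For illustration, a `while` loop that dispatches on a state symbol
with `pcase` could look like the sketch below (the function name
`my/number-token-p` and the tiny grammar are made up for the example;
whether the byte compiler really emits a `switch` for it is best
checked by byte-compiling the function and running `disassemble` on
it):

```elisp
;;; -*- lexical-binding: t -*-

(defun my/number-token-p (string)
  "Return non-nil if STRING is digits, optionally followed by `.' and digits."
  (let ((state 'start)
        (i 0)
        (n (length string)))
    (while (and (< i n) (not (eq state 'fail)))
      (let ((digit (<= ?0 (aref string i) ?9))
            (dot (eq (aref string i) ?.)))
        ;; Dispatch on the current state; each clause compares STATE
        ;; against a constant symbol.
        (setq state
              (pcase state
                ('start (if digit 'int 'fail))
                ('int   (cond (digit 'int) (dot 'dot) (t 'fail)))
                ('dot   (if digit 'frac 'fail))
                ('frac  (if digit 'frac 'fail))
                (_      'fail)))
        (setq i (1+ i))))
    ;; Accept only if the input ended in one of the accepting states.
    (memq state '(int frac))))
```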
Stefan
* Re: Ideal performance of ELisp
2022-08-16 17:22 ` Stefan Monnier
@ 2022-08-17 12:41 ` Lynn Winebarger
2022-08-17 14:04 ` Stefan Monnier
0 siblings, 1 reply; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-17 12:41 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eli Zaretskii, Po Lu, Alan Mackenzie, emacs-devel, Yuan Fu
On Tue, Aug 16, 2022 at 1:22 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> > BTW, I was being serious - if there's a way to write a simple jump to
> > do case-dispatching for that trampolining while loop, I'd definitely
> > look at making semantic produce such automata so the native compiler,
> > in particular, could optimize the result properly.
>
> AFAIK a (pcase x ('foo ..) ('bar ...) ...) should be compiled to
> a `switch` bytecode which uses a hash-table lookup to find the
> destination target to jump to.
>
> Not sure how well it works for very large tables where we risk bumping
> into the 32K limit of bytecode jumps.
That did occur to me.
Perhaps one way to address that limitation without completely
overhauling the VM would be to introduce a form of segmented memory
for byte strings. It would require modification of the byte-code
vector during the activation frame of exec_byte_code, but I believe
there is a scheme that would ensure the byte-code object would be
returned to its non-modified state before either executing a funcall
or returning.
It would require one new byte-code op, which I will call "trampoline",
and a vector of byte-code strings (or "code segments") either at a
known location in the constants vector, or as an additional entry in
the byte-code vector itself. I'd lean toward the latter because it
would be easy to discriminate byte-code vectors that are allowed to
use the trampoline instruction without imposing additional space
overhead for existing byte-vectors.
The first entry in the code segment vector would be the "real" or
"entry" byte-string, and no funcalls or returns would be performed
except when that code segment was active.
The trampoline instruction would pop two arguments off the operand stack -
a code segment, and an offset in the code segment - set the byte
string of the code vector to the indicated code segment, then set the
pc to the offset to perform the jump.
A typical entry byte-string for an automaton might take a label and an
arbitrary number of arguments, look up the label in a dispatch table,
and trampoline to that label with the rest of the arguments on the
stack. The following instructions might raise an error to indicate the
label was invalid. The rest of the entry byte string would contain a
block that would expect the operand stack to have the form [ fxn,
arg0, .., argn, cont-segment,cont-offset ], then perform a funcall
using all the elements on the operand stack but the bottom two, then
swap the return value to the bottom of the operand stack and
trampoline to the continuation label. Another block would be required
that would just perform the return operation. There might be another
for dealing with stack unwinding, I don't know the details of how that
works.
At any rate, I believe that would be a feasible scheme that would
allow the byte-compiler to (a) overcome the address space limit, and
(b) allow implementation of efficient tail recursion in simple cases
like dispatching to locally-bound closures that are closed over the
same lexical environment. The compiler could presumably do (b) now if
there wasn't the concern for the limited address space.
Such an instruction would definitely make writing an efficient automaton easier.
Lynn
* Re: Ideal performance of ELisp
2022-08-17 12:41 ` Lynn Winebarger
@ 2022-08-17 14:04 ` Stefan Monnier
2022-08-17 14:19 ` Mattias Engdegård
2022-08-17 14:25 ` Lynn Winebarger
0 siblings, 2 replies; 136+ messages in thread
From: Stefan Monnier @ 2022-08-17 14:04 UTC (permalink / raw)
To: Lynn Winebarger
Cc: Eli Zaretskii, Po Lu, Alan Mackenzie, emacs-devel, Yuan Fu
>> Not sure how well it works for very large tables where we risk bumping
>> into the 32K limit of bytecode jumps.
> That did occur to me.
> Perhaps one way to address that limitation without completely
> overhauling the VM would be to introduce a form of segmented memory
> for byte strings.
IIRC the limitation is only in the size of relative jumps, so we might
be able to fix it just by adding a new "longbranch" instruction.
Stefan
* Re: Ideal performance of ELisp
2022-08-17 14:04 ` Stefan Monnier
@ 2022-08-17 14:19 ` Mattias Engdegård
2022-08-17 22:18 ` Stefan Monnier
2022-08-17 14:25 ` Lynn Winebarger
1 sibling, 1 reply; 136+ messages in thread
From: Mattias Engdegård @ 2022-08-17 14:19 UTC (permalink / raw)
To: Stefan Monnier
Cc: Lynn Winebarger, Eli Zaretskii, Po Lu, Alan Mackenzie,
emacs-devel, Yuan Fu
17 aug. 2022 kl. 16.04 skrev Stefan Monnier <monnier@iro.umontreal.ca>:
> IIRC the limitation is only in the size of relative jumps, so we might
> be able to fix it just by adding a new "longbranch" instruction,
All bytecode jumps are 16-bit absolute. It is something we would like to change; relative jumps would make the interpreter faster.
* Re: Ideal performance of ELisp
2022-08-17 14:19 ` Mattias Engdegård
@ 2022-08-17 22:18 ` Stefan Monnier
0 siblings, 0 replies; 136+ messages in thread
From: Stefan Monnier @ 2022-08-17 22:18 UTC (permalink / raw)
To: Mattias Engdegård
Cc: Lynn Winebarger, Eli Zaretskii, Po Lu, Alan Mackenzie,
emacs-devel, Yuan Fu
Mattias Engdegård [2022-08-17 16:19:08] wrote:
> 17 aug. 2022 kl. 16.04 skrev Stefan Monnier <monnier@iro.umontreal.ca>:
>> IIRC the limitation is only in the size of relative jumps, so we might
>> be able to fix it just by adding a new "longbranch" instruction,
> All bytecode jumps are 16-bit absolute.
Oh well.
> It is something we would like to change; relative jumps would make the
> interpreter faster.
Sounds like it would break backward compatibility, tho :-(
We can increase the range fairly easily if 16bit is not enough, tho.
E.g. we can just use the destination 65535 as a flag to indicate that
the next 4 bytes contain a 32bit destination.
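A rough sketch of how such an escaped operand could be decoded,
written in ELisp purely for illustration (the real decoder would live
in the C bytecode interpreter; the name `my/decode-jump-target` is
made up, the 16-bit operand is read low byte first as I understand
the current encoding, and the byte order of the 32-bit extension is
purely an assumption):

```elisp
;;; -*- lexical-binding: t -*-

(defun my/decode-jump-target (code pc)
  "Decode the jump operand starting at index PC of the byte vector CODE.
Return (TARGET . NEXT-PC).  A 16-bit operand equal to 65535 is treated
as an escape meaning that the next 4 bytes hold a 32-bit target."
  (let ((short (logior (aref code pc)
                       (ash (aref code (1+ pc)) 8))))
    (if (/= short #xFFFF)
        ;; Ordinary 16-bit absolute destination.
        (cons short (+ pc 2))
      ;; Escape value: read the hypothetical 32-bit destination.
      (cons (logior (aref code (+ pc 2))
                    (ash (aref code (+ pc 3)) 8)
                    (ash (aref code (+ pc 4)) 16)
                    (ash (aref code (+ pc 5)) 24))
            (+ pc 6)))))
```

One consequence of this scheme is that 65535 itself could no longer be
used as a plain 16-bit destination.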
Stefan
* Re: Ideal performance of ELisp
2022-08-17 14:04 ` Stefan Monnier
2022-08-17 14:19 ` Mattias Engdegård
@ 2022-08-17 14:25 ` Lynn Winebarger
1 sibling, 0 replies; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-17 14:25 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eli Zaretskii, Po Lu, Alan Mackenzie, emacs-devel, Yuan Fu
On Wed, Aug 17, 2022, 10:04 AM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
> >> Not sure how well it works for very large tables where we risk bumping
> >> into the 32K limit of bytecode jumps.
> > That did occur to me.
> > Perhaps one way to address that limitation without completely
> > overhauling the VM would be to introduce a form of segmented memory
> > for byte strings.
>
> IIRC the limitation is only in the size of relative jumps, so we might
> be able to fix it just by adding a new "longbranch" instruction,
The issue there is that (IIRC) the relative branch opcodes haven't been
used since v20, and are marked as obsolete/slated for removal. So, the
compiler would have to be altered to use long branches initially, with some
later phase replacing them with either relative or short absolute branch
codes currently emitted if applicable. The trampoline extension wouldn't
require alterations to any construct that is already correctly compiled.
The other potential bonus of the trampoline approach is that it could be
used to "link" byte code vectors at some point in the future or provide a
form of compilation unit for byte-code.
Lynn
* Re: Ideal performance of ELisp (was: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2))
2022-08-12 21:50 ` Ideal performance of ELisp (was: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)) Stefan Monnier
2022-08-12 23:26 ` Lynn Winebarger
@ 2022-08-13 4:39 ` Ihor Radchenko
2022-08-13 7:45 ` Ideal performance of ELisp Philip Kaludercic
2022-08-13 14:15 ` Stefan Monnier
1 sibling, 2 replies; 136+ messages in thread
From: Ihor Radchenko @ 2022-08-13 4:39 UTC (permalink / raw)
To: Stefan Monnier
Cc: Eli Zaretskii, Lynn Winebarger, luangruo, acm, emacs-devel,
casouri
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> ELisp code cannot match the speed of optimized C code.
>
> I suspect it could, to some extent, in theory. Getting there would
> require a fair bit more work, probably using a different compilation
> strategy than the AOT compiler we have now.
Could it be possible to embed C snippets into lisp functions directly?
Similar to assembler snippets in C.
--
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92
* Re: Ideal performance of ELisp
2022-08-13 4:39 ` Ideal performance of ELisp (was: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)) Ihor Radchenko
@ 2022-08-13 7:45 ` Philip Kaludercic
2022-08-13 11:58 ` Ihor Radchenko
2022-08-13 14:15 ` Stefan Monnier
1 sibling, 1 reply; 136+ messages in thread
From: Philip Kaludercic @ 2022-08-13 7:45 UTC (permalink / raw)
To: Ihor Radchenko
Cc: Stefan Monnier, Eli Zaretskii, Lynn Winebarger, luangruo, acm,
emacs-devel, casouri
Ihor Radchenko <yantar92@gmail.com> writes:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
>>> ELisp code cannot match the speed of optimized C code.
>>
>> I suspect it could, to some extent, in theory. Getting there would
>> require a fair bit more work, probably using a different compilation
>> strategy than the AOT compiler we have now.
>
> Could it be possible to embed C snippets into lisp functions directly?
> Similar to assembler snippets in C.
Inline assembler is usually just pasted verbatim by the C compiler into
the assembled output, but if Lisp is interpreted, the best thing I
can imagine would be the automatic generation and loading
of dynamic modules, which considering the call overhead involved would
rarely be worthwhile for just a "snippet".
* Re: Ideal performance of ELisp
2022-08-13 7:45 ` Ideal performance of ELisp Philip Kaludercic
@ 2022-08-13 11:58 ` Ihor Radchenko
0 siblings, 0 replies; 136+ messages in thread
From: Ihor Radchenko @ 2022-08-13 11:58 UTC (permalink / raw)
To: Philip Kaludercic
Cc: Stefan Monnier, Eli Zaretskii, Lynn Winebarger, luangruo, acm,
emacs-devel, casouri
Philip Kaludercic <philipk@posteo.net> writes:
>> Could it be possible to embed C snippets into lisp functions directly?
>> Similar to assembler snippets in C.
>
> Inline assembler is usually just pasted verbatim by the C compiler into
> the assembled output, but if Lisp is interpreted, the best thing I
> can imagine would be the automatic generation and loading
> of dynamic modules, which considering the call overhead involved would
> rarely be worthwhile for just a "snippet".
I had something like a macro in mind.
Say, we got something like:
(c-code
"lines"
"of"
"C code")
would expand to
#<subr anonymous-C-code>
utilising the JIT compiler we already use for native-comp.
Normally, this would happen at compile time, and we do not need to care
about JIT overheads. Otherwise, JIT-compilation can be used as we
already do.
Of course, this is just an idea. But it could be an extremely useful
feature when we want to beat certain performance bottlenecks and do not
want to change Emacs core every time we need such optimization.
Dynamic modules are not a good idea, AFAIU. The communication between
modules and the Elisp machine is very limited.
* Re: Ideal performance of ELisp
2022-08-13 4:39 ` Ideal performance of ELisp (was: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)) Ihor Radchenko
2022-08-13 7:45 ` Ideal performance of ELisp Philip Kaludercic
@ 2022-08-13 14:15 ` Stefan Monnier
2022-08-14 9:25 ` Andrea Corallo
1 sibling, 1 reply; 136+ messages in thread
From: Stefan Monnier @ 2022-08-13 14:15 UTC (permalink / raw)
To: Ihor Radchenko
Cc: Eli Zaretskii, Lynn Winebarger, luangruo, acm, emacs-devel,
casouri
> Could it be possible to embed C snippets into lisp functions directly?
> Similar to assembler snippets in C.
Andrea is better placed to answer that, but I think it would be fairly
easy to allow insertion of "C-like" snippets when the code gets compiled
with the native compiler.
Of course, we'd probably want to make it work even when the code is
interpreted (or only compiled to bytecode).
It would probably offer very handy improvements to the module API.
Stefan
* Re: Ideal performance of ELisp
2022-08-13 14:15 ` Stefan Monnier
@ 2022-08-14 9:25 ` Andrea Corallo
2022-08-14 9:34 ` Ihor Radchenko
2022-08-14 13:01 ` Stefan Monnier
0 siblings, 2 replies; 136+ messages in thread
From: Andrea Corallo @ 2022-08-14 9:25 UTC (permalink / raw)
To: Stefan Monnier
Cc: Ihor Radchenko, Eli Zaretskii, Lynn Winebarger, luangruo, acm,
emacs-devel, casouri
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> Could it be possible to embed C snippets into lisp functions directly?
>> Similar to assembler snippets in C.
>
> Andrea is better placed to answer that, but I think it would be fairly
> easy to allow insertion of "C-like" snippets when the code gets compiled
> with the native compiler.
>
> Of course, we'd probably want to make it work even when the code is
> interpreted (or only compiled to bytecode).
>
> It would probably offer very handy improvements to the module API.
I think the main issue is that libgccjit does not compile C but
libgccir. I can't think of a nice way to blend the two things as of
now.
Andrea
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Ideal performance of ELisp
2022-08-14 9:25 ` Andrea Corallo
@ 2022-08-14 9:34 ` Ihor Radchenko
2022-08-14 13:01 ` Eli Zaretskii
2022-08-16 19:23 ` Andrea Corallo
2022-08-14 13:01 ` Stefan Monnier
1 sibling, 2 replies; 136+ messages in thread
From: Ihor Radchenko @ 2022-08-14 9:34 UTC (permalink / raw)
To: Andrea Corallo
Cc: Stefan Monnier, Eli Zaretskii, Lynn Winebarger, luangruo, acm,
emacs-devel, casouri
Andrea Corallo <akrl@sdf.org> writes:
> I think the main issue is that libgccjit does not compile C but
> libgccir. I can't think of a nice way to blend the two things as of
> now.
What about https://gcc.gnu.org/onlinedocs/jit/intro/tutorial02.html ?
Isn't it showing exactly C compilation?
--
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Ideal performance of ELisp
2022-08-14 9:34 ` Ihor Radchenko
@ 2022-08-14 13:01 ` Eli Zaretskii
2022-08-16 19:23 ` Andrea Corallo
1 sibling, 0 replies; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-14 13:01 UTC (permalink / raw)
To: Ihor Radchenko
Cc: akrl, monnier, owinebar, luangruo, acm, emacs-devel, casouri
> From: Ihor Radchenko <yantar92@gmail.com>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, Eli Zaretskii
> <eliz@gnu.org>, Lynn Winebarger <owinebar@gmail.com>,
> luangruo@yahoo.com, acm@muc.de, emacs-devel@gnu.org, casouri@gmail.com
> Date: Sun, 14 Aug 2022 17:34:34 +0800
>
> Andrea Corallo <akrl@sdf.org> writes:
>
> > I think the main issue is that libgccjit does not compile C but
> > libgccir. I can't think of a nice way to blend the two things as of
> > now.
>
> What about https://gcc.gnu.org/onlinedocs/jit/intro/tutorial02.html ?
> Isn't it showing exactly C compilation?
No, it doesn't. It shows how to write a C program which, when run,
will emit libgccir, which will then be compiled by libgccjit.
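To make the distinction concrete, here is a minimal sketch using a toy
stand-in for libgccjit. The `struct node` type and the `new_node`,
`build_square`, `eval_node` and `free_node` functions are all hypothetical
and are not libgccjit's API. The point is only that the C source never
contains `square` as C code: it calls builder functions that construct a
description of it, and a separate back end (here a trivial evaluator, in
libgccjit's case the GCC back end) consumes that description.

```c
#include <stdlib.h>

/* Hypothetical toy IR: an expression tree over one integer parameter.
   This is an illustration only, not libgccjit's data model. */
enum node_kind { NODE_PARAM, NODE_CONST, NODE_MUL, NODE_ADD };

struct node {
    enum node_kind kind;
    int value;               /* used by NODE_CONST only */
    struct node *lhs, *rhs;  /* used by NODE_MUL and NODE_ADD only */
};

/* Builder call: allocate one IR node. */
static struct node *new_node(enum node_kind kind, int value,
                             struct node *lhs, struct node *rhs)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->kind = kind;
    n->value = value;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* The "host" program: it describes square(i) = i * i as data.
   Nothing here is the C source of square itself. */
static struct node *build_square(void)
{
    struct node *lhs = new_node(NODE_PARAM, 0, NULL, NULL);
    struct node *rhs = new_node(NODE_PARAM, 0, NULL, NULL);
    return new_node(NODE_MUL, 0, lhs, rhs);
}

/* Stand-in for the back end: consume the description.  A real back
   end would emit machine code instead of evaluating the tree. */
static int eval_node(const struct node *n, int param)
{
    switch (n->kind) {
    case NODE_PARAM: return param;
    case NODE_CONST: return n->value;
    case NODE_MUL:   return eval_node(n->lhs, param) * eval_node(n->rhs, param);
    case NODE_ADD:   return eval_node(n->lhs, param) + eval_node(n->rhs, param);
    }
    return 0;
}

/* Release a tree built with new_node. */
static void free_node(struct node *n)
{
    if (n == NULL)
        return;
    free_node(n->lhs);
    free_node(n->rhs);
    free(n);
}
```

With these definitions, `eval_node(build_square(), 5)` evaluates to 25,
yet no C function named `square` exists anywhere in the source.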
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Ideal performance of ELisp
2022-08-14 9:34 ` Ihor Radchenko
2022-08-14 13:01 ` Eli Zaretskii
@ 2022-08-16 19:23 ` Andrea Corallo
1 sibling, 0 replies; 136+ messages in thread
From: Andrea Corallo @ 2022-08-16 19:23 UTC (permalink / raw)
To: Ihor Radchenko
Cc: Stefan Monnier, Eli Zaretskii, Lynn Winebarger, luangruo, acm,
emacs-devel, casouri
Ihor Radchenko <yantar92@gmail.com> writes:
> Andrea Corallo <akrl@sdf.org> writes:
>
>> I think the main issue is that libgccjit does not compile C but
>> libgccir. I can't think of a nice way to blend the two things as of
>> now.
>
> What about https://gcc.gnu.org/onlinedocs/jit/intro/tutorial02.html ?
> Isn't it showing exactly C compilation?
No, as Eli explained, that is about writing a C program that describes to
libgccjit the equivalent of another C program.
Andrea
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Ideal performance of ELisp
2022-08-14 9:25 ` Andrea Corallo
2022-08-14 9:34 ` Ihor Radchenko
@ 2022-08-14 13:01 ` Stefan Monnier
2022-08-16 8:59 ` Andrea Corallo
1 sibling, 1 reply; 136+ messages in thread
From: Stefan Monnier @ 2022-08-14 13:01 UTC (permalink / raw)
To: Andrea Corallo
Cc: Ihor Radchenko, Eli Zaretskii, Lynn Winebarger, luangruo, acm,
emacs-devel, casouri
>>> Could it be possible to embed C snippets into lisp functions directly?
>>> Similar to assembler snippets in C.
>>
>> Andrea is better placed to answer that, but I think it would be fairly
>> easy to allow insertion of "C-like" snippets when the code gets compiled
>> with the native compiler.
>>
>> Of course, we'd probably want to make it work even when the code is
>> interpreted (or only compiled to bytecode).
>>
>> It would probably offer very handy improvements to the module API.
>
> I think the main issue is that libgccjit does not compile C but
> libgccir. I can't think of a nice way to blend the two things as of
> now.
I was thinking of a C-like DSL (probably with a Lisp-style syntax).
Designed to be easy to translate to libgccir as well as not too hard
to interpret when libgccjit is not available.
Stefan
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Ideal performance of ELisp
2022-08-14 13:01 ` Stefan Monnier
@ 2022-08-16 8:59 ` Andrea Corallo
2022-08-16 9:50 ` Ihor Radchenko
2022-08-16 15:06 ` Lynn Winebarger
0 siblings, 2 replies; 136+ messages in thread
From: Andrea Corallo @ 2022-08-16 8:59 UTC (permalink / raw)
To: Stefan Monnier
Cc: Ihor Radchenko, Eli Zaretskii, Lynn Winebarger, luangruo, acm,
emacs-devel, casouri
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>>> Could it be possible to embed C snippets into lisp functions directly?
>>>> Similar to assembler snippets in C.
>>>
>>> Andrea is better placed to answer that, but I think it would be fairly
>>> easy to allow insertion of "C-like" snippets when the code gets compiled
>>> with the native compiler.
>>>
>>> Of course, we'd probably want to make it work even when the code is
>>> interpreted (or only compiled to bytecode).
>>>
>>> It would probably offer very handy improvements to the module API.
>>
>> I think the main issue is that libgccjit does not compile C but
>> libgccir. I can't think of a nice way to blend the two things as of
>> now.
>
> I was thinking of a C-like DSL (probably with a Lisp-style syntax).
> Designed to be easy to translate to libgccir as well as not too hard
> to interpret when libgccjit is not available.
Yep that's an option, I thought about that as well, not sure how
practical it is tho. One of the downsides is that we could not consume
nor provide any of the existing C header files.
Andrea
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Ideal performance of ELisp
2022-08-16 8:59 ` Andrea Corallo
@ 2022-08-16 9:50 ` Ihor Radchenko
2022-08-16 18:21 ` Andrea Corallo
2022-08-16 15:06 ` Lynn Winebarger
1 sibling, 1 reply; 136+ messages in thread
From: Ihor Radchenko @ 2022-08-16 9:50 UTC (permalink / raw)
To: Andrea Corallo
Cc: Stefan Monnier, Eli Zaretskii, Lynn Winebarger, luangruo, acm,
emacs-devel, casouri
Andrea Corallo <akrl@sdf.org> writes:
>> I was thinking of a C-like DSL (probably with a Lisp-style syntax).
>> Designed to be easy to translate to libgccir as well as not too hard
>> to interpret when libgccjit is not available.
>
> Yep that's an option, I thought about that as well, not sure how
> practical it is tho. One of the downsides is that we could not consume
> nor provide any of the existing C header files.
Do I understand correctly that a certain pre-defined set of C headers
can still be included? If so, it is not necessarily a downside.
We may not want to allow loading arbitrary #include-s because it will
pose a risk of running non-free libraries inside Emacs. Yet, using the
libraries already utilized by Emacs + internal Emacs constructs could be
good enough.
--
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Ideal performance of ELisp
2022-08-16 9:50 ` Ihor Radchenko
@ 2022-08-16 18:21 ` Andrea Corallo
2022-08-17 9:48 ` Ihor Radchenko
0 siblings, 1 reply; 136+ messages in thread
From: Andrea Corallo @ 2022-08-16 18:21 UTC (permalink / raw)
To: Ihor Radchenko
Cc: Stefan Monnier, Eli Zaretskii, Lynn Winebarger, luangruo, acm,
emacs-devel, casouri
Ihor Radchenko <yantar92@gmail.com> writes:
> Andrea Corallo <akrl@sdf.org> writes:
>
>>> I was thinking of a C-like DSL (probably with a Lisp-style syntax).
>>> Designed to be easy to translate to libgccir as well as not too hard
>>> to interpret when libgccjit is not available.
>>
>> Yep that's an option, I thought about that as well, not sure how
>> practical it is tho. One of the downsides is that we could not consume
>> nor provide any of the existing C header files.
>
> Do I understand correctly that a certain pre-defined set of C headers
> can still be included?
Where do you understand that from? libgccjit does *not* consume nor
compile C code as input, header files are just that.
Andrea
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Ideal performance of ELisp
2022-08-16 18:21 ` Andrea Corallo
@ 2022-08-17 9:48 ` Ihor Radchenko
2022-08-17 12:02 ` Eli Zaretskii
0 siblings, 1 reply; 136+ messages in thread
From: Ihor Radchenko @ 2022-08-17 9:48 UTC (permalink / raw)
To: Andrea Corallo
Cc: Stefan Monnier, Eli Zaretskii, Lynn Winebarger, luangruo, acm,
emacs-devel, casouri
Andrea Corallo <akrl@sdf.org> writes:
>>> Yep that's an option, I thought about that as well, not sure how
>>> practical it is tho. One of the downsides is that we could not consume
>>> nor provide any of the existing C header files.
>>
>> Do I understand correctly that a certain pre-defined set of C headers
>> can still be included?
>
> Where do you understand that from? libgccjit does *not* consume nor
> compile C code as input, header files are just that.
An example in https://gcc.gnu.org/onlinedocs/jit/intro/tutorial01.html
is using functions from stdio.
I did not refer to C code verbatim but rather to whatever function
would be used to translate an Elisp "C" macro into libgccjit code. Such
a macro may refer to C primitives that are included in the file defining
the libgccjit C wrapper.
--
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Ideal performance of ELisp
2022-08-17 9:48 ` Ihor Radchenko
@ 2022-08-17 12:02 ` Eli Zaretskii
0 siblings, 0 replies; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-17 12:02 UTC (permalink / raw)
To: Ihor Radchenko
Cc: akrl, monnier, owinebar, luangruo, acm, emacs-devel, casouri
> From: Ihor Radchenko <yantar92@gmail.com>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, Eli Zaretskii
> <eliz@gnu.org>, Lynn Winebarger <owinebar@gmail.com>,
> luangruo@yahoo.com, acm@muc.de, emacs-devel@gnu.org, casouri@gmail.com
> Date: Wed, 17 Aug 2022 17:48:44 +0800
>
> Andrea Corallo <akrl@sdf.org> writes:
>
> >> Do I understand correctly that a certain pre-defined set of C headers
> >> can still be included?
> >
> > Where do you understand that from? libgccjit does *not* consume nor
> > compile C code as input, header files are just that.
>
> An example in https://gcc.gnu.org/onlinedocs/jit/intro/tutorial01.html
> is using functions from stdio.
Only for the program which creates libgccir, not for libgccir itself.
The program that creates libgccir uses functions like 'exit' and
'fprintf', so it needs the headers which declare them. But the
generated code, which is then compiled by libgccjit, doesn't call
those functions.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Ideal performance of ELisp
2022-08-16 8:59 ` Andrea Corallo
2022-08-16 9:50 ` Ihor Radchenko
@ 2022-08-16 15:06 ` Lynn Winebarger
2022-08-16 18:24 ` Andrea Corallo
1 sibling, 1 reply; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-16 15:06 UTC (permalink / raw)
To: Andrea Corallo
Cc: Stefan Monnier, Ihor Radchenko, Eli Zaretskii, luangruo, acm,
emacs-devel, casouri
On Tue, Aug 16, 2022 at 4:59 AM Andrea Corallo <akrl@sdf.org> wrote:
>
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
> >>>> Could it be possible to embed C snippets into lisp functions directly?
> >>>> Similar to assembler snippets in C.
> >>>
> >>> Andrea is better placed to answer that, but I think it would be fairly
> >>> easy to allow insertion of "C-like" snippets when the code gets compiled
> >>> with the native compiler.
> > I was thinking of a C-like DSL (probably with a Lisp-style syntax).
> > Designed to be easy to translate to libgccir as well as not too hard
> > to interpret when libgccjit is not available.
>
> Yep that's an option, I thought about that as well, not sure how
> practical it is tho. One of the downsides is that we could not consume
> nor provide any of the existing C header files.
Ugh. Why not inline LAP blocks?
Lynn
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Ideal performance of ELisp
2022-08-16 15:06 ` Lynn Winebarger
@ 2022-08-16 18:24 ` Andrea Corallo
2022-08-17 13:04 ` Lynn Winebarger
0 siblings, 1 reply; 136+ messages in thread
From: Andrea Corallo @ 2022-08-16 18:24 UTC (permalink / raw)
To: Lynn Winebarger
Cc: Stefan Monnier, Ihor Radchenko, Eli Zaretskii, luangruo, acm,
emacs-devel, casouri
Lynn Winebarger <owinebar@gmail.com> writes:
> On Tue, Aug 16, 2022 at 4:59 AM Andrea Corallo <akrl@sdf.org> wrote:
>>
>> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>
>> >>>> Could it be possible to embed C snippets into lisp functions directly?
>> >>>> Similar to assembler snippets in C.
>> >>>
>> >>> Andrea is better placed to answer that, but I think it would be fairly
>> >>> easy to allow insertion of "C-like" snippets when the code gets compiled
>> >>> with the native compiler.
>> > I was thinking of a C-like DSL (probably with a Lisp-style syntax).
>> > Designed to be easy to translate to libgccir as well as not too hard
>> > to interpret when libgccjit is not available.
>>
>> Yep that's an option, I thought about that as well, not sure how
>> practical it is tho. One of the downsides is that we could not consume
>> nor provide any of the existing C header files.
>
> Ugh. Why not inline LAP blocks?
You could inline LAP or even LIMPLE relatively easily but I don't see
any perf opportunity to take advantage of in doing that.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Ideal performance of ELisp
2022-08-16 18:24 ` Andrea Corallo
@ 2022-08-17 13:04 ` Lynn Winebarger
2022-08-17 14:18 ` Andrea Corallo
0 siblings, 1 reply; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-17 13:04 UTC (permalink / raw)
To: Andrea Corallo
Cc: Stefan Monnier, Ihor Radchenko, Eli Zaretskii, luangruo, acm,
emacs-devel, casouri
On Tue, Aug 16, 2022 at 2:24 PM Andrea Corallo <akrl@sdf.org> wrote:
>
> Lynn Winebarger <owinebar@gmail.com> writes:
>
> > On Tue, Aug 16, 2022 at 4:59 AM Andrea Corallo <akrl@sdf.org> wrote:
> >>
> >> Stefan Monnier <monnier@iro.umontreal.ca> writes:
> >>
> >
> > Ugh. Why not inline LAP blocks?
>
> You could inline LAP or even LIMPLE relatively easily but I don't see
> any perf opportunity to take advantage of in doing that.
I suppose it depends on what type of instructions/machine model are
operated on by the respective IRs (there's no spec for Emacs LAP
code). Assuming one of them allows you to operate directly on machine
values, without any implicit type-tagging/untagging, then you should
be able to do all the same kind of
abstraction-breaking-but-difficult-or-impossible-for-the-compiler-to-prove-safe
operations that C code could perform. That is the point of the
proposed feature, isn't it?
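As a rough illustration of what "implicit type-tagging/untagging" costs,
here is a sketch in C. The tagging scheme (two low tag bits, tag value 2
for fixnums) and all the names are made up for the example; this is not
Emacs's actual Lisp_Object representation. `sum_boxed` checks, unboxes
and re-boxes on every step, the way code going through generic primitives
has to, while `sum_raw` works on machine integers directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tagged word: the low two bits hold a type tag. */
typedef intptr_t boxed;

#define TAG_MASK   ((intptr_t) 3)
#define TAG_FIXNUM ((intptr_t) 2)

/* Box a small integer: shift it left by two bits and add the tag.
   Multiplication is used so negative values are well defined. */
static boxed box_fixnum(intptr_t n)
{
    return n * 4 + TAG_FIXNUM;
}

/* Type check that generic code must perform before each use. */
static int is_fixnum(boxed v)
{
    return (v & TAG_MASK) == TAG_FIXNUM;
}

/* Remove the tag and recover the machine integer. */
static intptr_t unbox_fixnum(boxed v)
{
    return (v - TAG_FIXNUM) / 4;
}

/* Generic-style sum: every iteration checks a tag, unboxes both
   operands and boxes the result again. */
static boxed sum_boxed(const boxed *v, size_t n)
{
    boxed acc = box_fixnum(0);
    for (size_t i = 0; i < n; i++) {
        if (!is_fixnum(v[i]))
            return box_fixnum(0);  /* a real runtime would signal an error */
        acc = box_fixnum(unbox_fixnum(acc) + unbox_fixnum(v[i]));
    }
    return acc;
}

/* Unboxed sum: no tags, no checks, just machine arithmetic. */
static intptr_t sum_raw(const intptr_t *v, size_t n)
{
    intptr_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}
```

Both functions compute the same number; the second is the kind of code a
compiler can only produce when it can prove, or is told, that every
value involved is an untagged machine integer.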
Assuming LIMPLE is required, I'm not sure how the feature would be
incorporated for users without access to libgccjit. Perhaps an
additional byte-code operator like "execute-limple-insn" could be
implemented that would support a set of supported "unsafe" LIMPLE
instructions?
Lynn
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Ideal performance of ELisp
2022-08-17 13:04 ` Lynn Winebarger
@ 2022-08-17 14:18 ` Andrea Corallo
2022-08-18 12:17 ` Lynn Winebarger
0 siblings, 1 reply; 136+ messages in thread
From: Andrea Corallo @ 2022-08-17 14:18 UTC (permalink / raw)
To: Lynn Winebarger
Cc: Stefan Monnier, Ihor Radchenko, Eli Zaretskii, luangruo, acm,
emacs-devel, casouri
Lynn Winebarger <owinebar@gmail.com> writes:
> On Tue, Aug 16, 2022 at 2:24 PM Andrea Corallo <akrl@sdf.org> wrote:
>>
>> Lynn Winebarger <owinebar@gmail.com> writes:
>>
>> > On Tue, Aug 16, 2022 at 4:59 AM Andrea Corallo <akrl@sdf.org> wrote:
>> >>
>> >> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> >>
>> >
>> > Ugh. Why not inline LAP blocks?
>>
>> You could inline LAP or even LIMPLE relatively easily but I don't see
>> any perf opportunity to take advantage of in doing that.
>
> I suppose it depends on what type of instructions/machine model are
> operated on by the respective IRs (there's no spec for Emacs LAP
> code).
LAP ATM is defined only by the current implementation, so that is what
we refer to when we talk about it.
> Assuming one of them allows you to operate directly on machine
> values, without any implicit type-tagging/untagging, then you should
> be able to do all the same kind of
> abstraction-breaking-but-difficult-or-impossible-for-the-compiler-to-prove-safe
> operations that C code could perform. That is the point of the
> proposed feature, isn't it?
ATM LAP (apart from some exceptions) relies on calling primitive
functions, and those do not manipulate unboxed objects.
But yeah, in principle changing LAP, exposing it, and exposing a
number of functions capable of working on unboxed objects might be
useful for writing some optimized code, *but*...
...this is a ton of changes for what? Having a hard-to-use DSL
that is capable of optimizing only some very specific cases?
I think there are better ways to improve in this area.
Don't want to sound harsh, but the thing about these discussions IMO is
that they are typically more about writing the longest and last mail in
order to prove oneself right than about implementing real changes and
improvements. I'm not a big fan of this; my personal preference goes
for seeing a definitely higher LOC/discussion ratio.
Andrea
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Ideal performance of ELisp
2022-08-17 14:18 ` Andrea Corallo
@ 2022-08-18 12:17 ` Lynn Winebarger
0 siblings, 0 replies; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-18 12:17 UTC (permalink / raw)
To: Andrea Corallo
Cc: Stefan Monnier, Ihor Radchenko, Eli Zaretskii, luangruo, acm,
emacs-devel, casouri
On Wed, Aug 17, 2022 at 10:18 AM Andrea Corallo <akrl@sdf.org> wrote:
> Lynn Winebarger <owinebar@gmail.com> writes:
>
> > Assuming one of them allows you to operate directly on machine
> > values, without any implicit type-tagging/untagging, then you should
> > be able to do all the same kind of
> > abstraction-breaking-but-difficult-or-impossible-for-the-compiler-to-prove-safe
> > operations that C code could perform. That is the point of the
> > proposed feature, isn't it?
>
> ATM LAP (apart from some exceptions) relies on calling primitive
> functions, and those do not manipulate unboxed objects.
>
> But yeah, in principle changing LAP, exposing it, and exposing a
> number of functions capable of working on unboxed objects might be
> useful for writing some optimized code, *but*...
>
> ...this is a ton of changes for what? Having a hard-to-use DSL
> that is capable of optimizing only some very specific cases?
The original question for this subthread was around whether there
could be a way to inline C snippets in Elisp the way assembler
(usually, and in an implementation-specific way) can be included in C
programs. Assuming that any such extension would have to provide a
meaningful semantics for users that don't have libgccjit, it seems a
lot more useful to me to define access to the equivalent of assembly
language in a way both implementations can make use of it. Then
anyone can layer a DSL targeting that core, whether for a C-like
syntax or whatever, and get defined behavior on all platforms Emacs
supports. I mean, *if* you (or anyone) were going to implement low
level facilities, I'd rather end up with something like this I could
use to extend or replace the compiler dynamically than some partial
recreation of C semantics, for whatever that is worth.
I'm not suggesting it should be the first on anyone's list. OTOH, as
you noted, providing the ability to inline LAP or LIMPLE is relatively
low-hanging fruit. Then it would be on whoever wanted to use that
facility to implement any extensions required to do so. Anyone who is
using such targets is pretty much declaring that optimizations by the
native compiler passes (other than those by/following libgccjit) are
of no interest.
> Don't want to sound harsh, but the thing about these discussions IMO is
> that they are typically more about writing the longest and last mail in
> order to prove oneself right than about implementing real changes and
> improvements. I'm not a big fan of this; my personal preference goes
> for seeing a definitely higher LOC/discussion ratio.
You're entitled to your preferences, but the last word in these
discussions is what's checked into the code base. And, unfortunately,
due to the nature of employment law in the US and the expense involved
to verify the enforceability or non-enforceability of broadly written
contracts of adhesion by employers, I am unable to contribute any LOC
for the time being. OTOH, the same lack of value you see in
discussing design points is what allows me to engage in them. It's an
odd fact that usable code might be claimed as IP, but merely
describing the required modifications (so long as they only involve
publicly available software and well-known techniques apparent to
anyone with the appropriate expertise) isn't subject to such claims.
Or, at least, I'm not willing to countenance the possibility that such
an act would be subject to IP claims or ownership by an employer. So
the best I can do is try to clarify any misinterpretation of what I've
written (presumably due to non-intentional opaqueness on my part), or
correct errors in things I've written.
As a general rule, I prefer to ask people about why code is a certain
way, or what their preferences are for solving a particular issue,
before embarking on extensive rework of a piece of code. I can, have,
and do speculate on what their answers might be based on the code as I
work with it, but I just think it's better to simply ask and let them
speak for themselves than to reverse-engineer (potentially
incorrectly) their intentions.
Personally, I'm a little discouraged that I've reported issues that
have been ignored or dismissed, then see them discussed later as
though they were a surprise. I assume it's because the maintainers
don't know me from Adam, and my discussion points are on the
idiosyncratic side. But I'm contributing what I can at the moment.
Lynn
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-12 12:37 ` Lynn Winebarger
2022-08-12 12:50 ` Eli Zaretskii
@ 2022-08-12 16:00 ` Akib Azmain Turja
2022-08-12 19:06 ` tomas
2022-08-13 11:57 ` Lynn Winebarger
1 sibling, 2 replies; 136+ messages in thread
From: Akib Azmain Turja @ 2022-08-12 16:00 UTC (permalink / raw)
To: Lynn Winebarger
Cc: Eli Zaretskii, luangruo, acm, emacs-devel, monnier, Yuan Fu
[-- Attachment #1: Type: text/plain, Size: 1606 bytes --]
Lynn Winebarger <owinebar@gmail.com> writes:
> On Wed, Aug 10, 2022 at 7:31 AM Eli Zaretskii <eliz@gnu.org> wrote:
>>
>> I don't have this information. Maybe someone else does. But in
>> general, it is a very small wonder that a parser written in optimized
>> C is much faster than anything written in Emacs Lisp, given that Lisp
>> is an interpreted language that has no special support for writing
>> parsers.
>
> That can be cured over time, now that the bulk of the core of emacs
> uses lexical scoping. With proper tail recursion, ELisp should be
> able to produce lexers and parsers roughly as efficient as C code, if
> not more efficient (depending on if you allow use of "computed goto"
> in the C code for the lexers and parsers). That does require changes
> to the byte code VM, but it's doable.
It's hard for any compiled language to beat C code, and I believe it's
*impossible* for any interpreted language to do that. And if it somehow
does that, I would believe that the result is *hard-coded* in it.
By the way, is native compiled Emacs Lisp faster than the code produced
by Guile's JIT? If so, we can write the parser and the lexer in Scheme
and use the result in Emacs.
(Triggering a heated discussion again...) Or maybe we can link Guile
to Emacs so that people can extend Emacs with the "GNU’s Ubiquitous
Intelligent Language for Extensions".
--
Akib Azmain Turja
Find me on Mastodon at @akib@hostux.social.
This message is signed by me with my GnuPG key. Its fingerprint is:
7001 8CE5 819F 17A3 BBA6 66AF E74F 0EFA 922A E7F5
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-12 16:00 ` Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2) Akib Azmain Turja
@ 2022-08-12 19:06 ` tomas
2022-08-13 4:41 ` Akib Azmain Turja
2022-08-13 11:57 ` Lynn Winebarger
1 sibling, 1 reply; 136+ messages in thread
From: tomas @ 2022-08-12 19:06 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 895 bytes --]
On Fri, Aug 12, 2022 at 10:00:43PM +0600, Akib Azmain Turja wrote:
[...]
> It's hard for any compiled language to beat C code, and I believe it's
> *impossible* for any interpreted language to do that. And if it somehow
> does that, I would believe that the result is *hard-coded* in it.
[...]
I think this is too simplistic. There are known (small) cases where
(compiled) Common Lisp beats C code, or where LuaJit [1] does (don't
forget: a JIT knows things about your program a compiler can't). At
the other end of the scale (the very complex), where you end up
writing a whole garbage collector in your C app, it will be pretty
hard to beat one of the modern GCs you'll find in Schemes or
Javascripts.
So the answer to this is most probably "it depends" :)
Cheers
[1] https://wingolog.org/archives/2014/09/02/high-performance-packet-filtering-with-pflua
--
t
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-12 19:06 ` tomas
@ 2022-08-13 4:41 ` Akib Azmain Turja
2022-08-13 5:14 ` tomas
0 siblings, 1 reply; 136+ messages in thread
From: Akib Azmain Turja @ 2022-08-13 4:41 UTC (permalink / raw)
To: tomas; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1415 bytes --]
<tomas@tuxteam.de> writes:
> On Fri, Aug 12, 2022 at 10:00:43PM +0600, Akib Azmain Turja wrote:
>
> [...]
>
>> It's hard for any compiled language to beat C code, and I believe it's
>> *impossible* for any interpreted language to do that. And if it somehow
>> does that, I would believe that the result is *hard-coded* in it.
>
> [...]
>
> I think this is too simplistic. There are known (small) cases where
> (compiled) Common Lisp beats C code, or where LuaJit [1] does (don't
> forget: a JIT knows things about your program a compiler can't). At
> the other end of the scale (the very complex), where you end up
> writing a whole garbage collector in your C app, it will be pretty
> hard to beat one of the modern GCs you'll find in Schemes or
> Javascripts.
>
> So the answer to this is most probably "it depends" :)
>
> Cheers
>
> [1] https://wingolog.org/archives/2014/09/02/high-performance-packet-filtering-with-pflua
Yeah, it's possible for very optimized Brainfuck code to beat poor C
code. Emacs has a native compiler, and AFAIK it's an ahead-of-time (AOT)
compiler. If you really need a JIT, do performance-critical things in
Guile Scheme and use results from Emacs.
--
Akib Azmain Turja
Find me on Mastodon at @akib@hostux.social.
This message is signed by me with my GnuPG key. Its fingerprint is:
7001 8CE5 819F 17A3 BBA6 66AF E74F 0EFA 922A E7F5
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-13 4:41 ` Akib Azmain Turja
@ 2022-08-13 5:14 ` tomas
0 siblings, 0 replies; 136+ messages in thread
From: tomas @ 2022-08-13 5:14 UTC (permalink / raw)
To: Akib Azmain Turja; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1288 bytes --]
On Sat, Aug 13, 2022 at 10:41:16AM +0600, Akib Azmain Turja wrote:
[...]
> Yeah, it's possible for very optimized Brainfuck code to beat poor C
> code. Emacs has a native compiler, and AFAIK it's an ahead-of-time (AOT)
> compiler. If you really need a JIT, do performance-critical things in
> Guile Scheme and use results from Emacs.
I wasn't talking about brainfuck, but about Lua. And about the fact
that performance is such a broad topic as to make sweeping assertions
(like "C is faster than...") almost always wrong in some way.
Of course, current C compilers are extremely good in the niche they
were designed for (AOT compiling, not extremely complex systems [1]),
because tons and tons of resources went into them already. When other
systems get that attention (see Stefan's example with Javascript in
this thread), you see results. They can play out advantages found in
other niches (no AOT, e.g. JIT) and so on.
As for Guile/Emacs, there are people working on that [2]. Their approach,
though, is to teach Guile to understand Emacs Lisp. It moves slowly...
for a lack of resources.
Cheers
[1] Extremely complex systems are still possible, but very expensive.
Cf. the Linux kernel.
[2] e.g. https://www.emacswiki.org/emacs/GuileEmacs
--
t
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-12 16:00 ` Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2) Akib Azmain Turja
2022-08-12 19:06 ` tomas
@ 2022-08-13 11:57 ` Lynn Winebarger
2022-08-13 14:28 ` Akib Azmain Turja
1 sibling, 1 reply; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-13 11:57 UTC (permalink / raw)
To: Akib Azmain Turja
Cc: Eli Zaretskii, Po Lu, Alan Mackenzie, emacs-devel, Stefan Monnier,
Yuan Fu
[-- Attachment #1: Type: text/plain, Size: 666 bytes --]
On Fri, Aug 12, 2022, 12:03 PM Akib Azmain Turja <akib@disroot.org> wrote:
>
>
> (Triggering a heated discussion again...) Or maybe we can link Guile
> to Emacs so that people can extend Emacs with the "GNU’s Ubiquitous
> Intelligent Language for Extensions".
>
If the maintainers of Guile actually dropped their opposition to enforcing
proper tail recursion at some point in the last 2 decades (when I last
looked at it), then that's an option. The doc for v3 says proper tail
recursion is supported, but then the section on history discusses the
"proposed" support for proper tail recursion, so I don't really know what
the status is.
Lynn
[-- Attachment #2: Type: text/html, Size: 1325 bytes --]
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-13 11:57 ` Lynn Winebarger
@ 2022-08-13 14:28 ` Akib Azmain Turja
0 siblings, 0 replies; 136+ messages in thread
From: Akib Azmain Turja @ 2022-08-13 14:28 UTC (permalink / raw)
To: Lynn Winebarger
Cc: Eli Zaretskii, Po Lu, Alan Mackenzie, emacs-devel, Stefan Monnier,
Yuan Fu
[-- Attachment #1: Type: text/plain, Size: 1134 bytes --]
Lynn Winebarger <owinebar@gmail.com> writes:
> On Fri, Aug 12, 2022, 12:03 PM Akib Azmain Turja <akib@disroot.org> wrote:
>
>>
>>
>> (Triggering a heated discussion again...) Or maybe we can link Guile
>> to Emacs so that people can extend Emacs with the "GNU’s Ubiquitous
>> Intelligent Language for Extensions".
>>
>
> If the maintainers of Guile actually dropped their opposition to enforcing
> proper tail recursion at some point in the last 2 decades (when I last
> looked at it), then that's an option. The doc for v3 says proper tail
> recursion is supported, but then the section on history discusses the
> "proposed" support for proper tail recursion, so I don't really know what
> the status is.
>
> Lynn
The manual says tail call optimization is supported because it is
required by the Scheme specification. And IIUC proper tail recursion is
a proposed feature of *WebAssembly*, not Guile.
--
Akib Azmain Turja
Find me on Mastodon at @akib@hostux.social.
This message is signed by me with my GnuPG key. Its fingerprint is:
7001 8CE5 819F 17A3 BBA6 66AF E74F 0EFA 922A E7F5
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-10 2:14 ` Po Lu
2022-08-10 2:42 ` Eli Zaretskii
@ 2022-08-14 19:24 ` Eric Ludlam
2022-08-16 10:42 ` Lynn Winebarger
1 sibling, 1 reply; 136+ messages in thread
From: Eric Ludlam @ 2022-08-14 19:24 UTC (permalink / raw)
To: Po Lu, Lynn Winebarger; +Cc: Eli Zaretskii, Alan Mackenzie, emacs-devel
On 8/9/22 10:14 PM, Po Lu wrote:
> Lynn Winebarger <owinebar@gmail.com> writes:
>
>> I'm curious, though, as to why Semantic/CEDET seems to have been
>> superseded by external solutions like tree-sitter or LSP-based
>> (non-emacs) servers. One of the draws of Emacs for me is the
>> "batteries included" nature of it having Emacs Lisp built in. Is
>> there a downside to using Semantic as the basis for improving my
>> derived mode that's non-obvious?
>
> I think Semantic lost inertia after the original author lost interest
> in it (or left for unrelated reasons, I don't remember which.)
I eventually stopped pushing on CEDET for a few reasons - but a big one
was that I don't code professionally anymore, and trying to wrestle the
legal paperwork from my company and merges between repositories
necessitated by those restrictions was just too troublesome.
I was also frequently surprised by how hard it was to get CEDET to 'just
work' well enough for everyone to use it as intended, and how often
people just jumped over to simpler one-off external tools because the
full suite way CEDET works was too heavy a lift. That in turn resulted
in not a lot of contributors to help support/improve those workflows.
Tools like LSP also became good enough where there was no way I could
keep up. I had hoped to pull data from external tools like lsp into the
framework CEDET used, but again the simpler one-off tools were too
appealing to that audience.
Overall, I think that is fine though - having many projects
experimenting with different techniques, and having the best solution
win is the benefit of free software. Developing CEDET back when it was
the only game in town was a good time with many good people helping, and
I am glad to have been a part of that, and I'm glad CEDET is still
useful in many cases.
Eric
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-14 19:24 ` Eric Ludlam
@ 2022-08-16 10:42 ` Lynn Winebarger
2022-08-17 1:56 ` Eric Ludlam
0 siblings, 1 reply; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-16 10:42 UTC (permalink / raw)
To: Eric Ludlam; +Cc: Po Lu, Eli Zaretskii, Alan Mackenzie, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 3042 bytes --]
On Sun, Aug 14, 2022 at 3:24 PM Eric Ludlam <ericludlam@gmail.com> wrote:
>
> On 8/9/22 10:14 PM, Po Lu wrote:
> > Lynn Winebarger <owinebar@gmail.com> writes:
> >
> >> I'm curious, though, as to why Semantic/CEDET seems to have been
> >> superseded by external solutions like tree-sitter or LSP-based
> >> (non-emacs) servers. One of the draws of Emacs for me is the
> >> "batteries included" nature of it having Emacs Lisp built in. Is
> >> there a downside to using Semantic as the basis for improving my
> >> derived mode that's non-obvious?
> >
> > I think Semantic lost inertia after the original author lost interest
> > in it (or left for unrelated reasons, I don't remember which.)
>
> I eventually stopped pushing on CEDET for a few reasons - but a big one
> was that I don't code professionally anymore, and trying to wrestle the
> legal paperwork from my company and merges between repositories
> necessitated by those restrictions was just too troublesome.
Unfortunately, I feel your pain.
> I was also frequently surprised by how hard it was to get CEDET to 'just
> work' well enough for everyone to use it as intended, and how often
> people just jumped over to simpler one-off external tools because the
> full suite way CEDET works was too heavy a lift. That in turn resulted
> in not a lot of contributors to help support/improve those workflows.
> Tools like LSP also became good enough where there was no way I could
> keep up. I had hoped to pull data from external tools like lsp into the
> framework CEDET used, but again the simpler one-off tools were too
> appealing to that audience.
>
> Overall, I think that is fine though - having many projects
> experimenting with different techniques, and having the best solution
> win is the benefit of free software. Developing CEDET back when it was
> the only game in town was a good time with many good people helping, and
> I am glad to have been a part of that, and I'm glad CEDET is still
> useful in many cases.
I think there should be a substantive place for such a framework in Emacs,
regardless of external tools that can be used to provide some of the
analysis. Even if a mode doesn't use the parser generated by a grammar,
the grammar can also provide a description of the syntactic structure that
can be used in separating fontification from syntactic analysis. If I
understand it correctly, Semantic provides support for that generic
approach and tying the classification to the text through overlays. It
should be straightforward for a major mode to create a set of faces that
can be applied generically by font-lock based on those overlays instead of
via regular expressions on the underlying text.
I've skimmed lsp-mode, but I can't tell how it attaches the analysis from
the server to the text.
Just looking at tsc-core's GitHub page, I don't see a similar generic
approach being provided. I get the impression there is a lot of dependence
on the individual language/mode as to how the information gets incorporated
in the fontification.
Lynn
[-- Attachment #2: Type: text/html, Size: 3720 bytes --]
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-16 10:42 ` Lynn Winebarger
@ 2022-08-17 1:56 ` Eric Ludlam
0 siblings, 0 replies; 136+ messages in thread
From: Eric Ludlam @ 2022-08-17 1:56 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Po Lu, Eli Zaretskii, Alan Mackenzie, emacs-devel
On 8/16/22 6:42 AM, Lynn Winebarger wrote:
> On Sun, Aug 14, 2022 at 3:24 PM Eric Ludlam <ericludlam@gmail.com> wrote:
> > I was also frequently surprised by how hard it was to get CEDET to 'just
> > work' well enough for everyone to use it as intended, and how often
> > people just jumped over to simpler one-off external tools because the
> > full suite way CEDET works was too heavy a lift. That in turn resulted
> > in not a lot of contributors to help support/improve those workflows.
> > Tools like LSP also became good enough where there was no way I could
> > keep up. I had hoped to pull data from external tools like lsp into the
> > framework CEDET used, but again the simpler one-off tools were too
> > appealing to that audience.
> >
> > Overall, I think that is fine though - having many projects
> > experimenting with different techniques, and having the best solution
> > win is the benefit of free software. Developing CEDET back when it was
> > the only game in town was a good time with many good people helping, and
> > I am glad to have been a part of that, and I'm glad CEDET is still
> > useful in many cases.
>
> I think there should be a substantive place for such a framework in
> Emacs, regardless of external tools that can be used to provide some
> of the analysis. Even if a mode doesn't use the parser generated by a
> grammar, the grammar can also provide a description of the syntactic
> structure that can be used in separating fontification from syntactic
> analysis. If I understand it correctly, Semantic provides support for
> that generic approach and tying the classification to the text through
> overlays. It should be straightforward for a major mode to create a
> set of faces that can be applied generically by font-lock based on
> those overlays instead of via regular expressions on the underlying text.
> I've skimmed lsp-mode, but I can't tell how it attaches the analysis
> from the server to the text.
> Just looking at tsc-core's GitHub page, I don't see a similar generic
> approach being provided. I get the impression there is a lot of
> dependence on the individual language/mode as to how the information
> gets incorporated in the fontification.
One of my goals in CEDET/Semantic was to provide distinct interfaces
between layers. Thus, there is a parsing/tagging layer, a middle layer
to manage tagging data, and a few example tools that could take
advantage of the data.
This mostly worked well. One parser that didn't make it into Emacs used
ctags, which added support for 8 additional modes such as sh, asm, and
Pascal. This tool would populate the tag data the same way as other
Semantic parsers, so when you use some of the tools that sit on top, such
as 'senator', everything just works.
So indeed, if the middle layer were to be used with more tools, new
parser/taggers could just fill it in as a way to let those tools work.
In fact, there's no reason why lsp couldn't populate the tagging layer
while continuing to do the other things it does well.
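To make the layering concrete, here is a minimal sketch of a tool that sits on top of the tag data. The helper name `my-semantic-tag-summary' is made up for illustration; it only assumes that `semantic-mode' is enabled and that the current buffer's major mode has some Semantic parser or tagger behind it. It never asks which one produced the tags.

```elisp
(require 'semantic)
(require 'semantic/tag)

(defun my-semantic-tag-summary ()
  "Return a list of (NAME . CLASS) pairs for the buffer's top-level tags.
Hypothetical helper for illustration only.  Returns nil when no
Semantic parser is set up for the current buffer."
  (mapcar (lambda (tag)
            ;; Only the generic tag accessors are used here, so the
            ;; code is the same whichever parser filled in the data.
            (cons (semantic-tag-name tag)
                  (semantic-tag-class tag)))
          (semantic-fetch-tags)))
```

Evaluating (my-semantic-tag-summary) in a parsed buffer should give entries whose class is a symbol such as `function' or `variable'; the exact contents depend on the buffer.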
Eric
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-10 0:22 ` Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2) Lynn Winebarger
2022-08-10 2:14 ` Po Lu
@ 2022-08-10 17:03 ` Tassilo Horn
2022-08-13 14:40 ` Jostein Kjønigsen
2 siblings, 0 replies; 136+ messages in thread
From: Tassilo Horn @ 2022-08-10 17:03 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Eli Zaretskii, Alan Mackenzie, emacs-devel
Lynn Winebarger <owinebar@gmail.com> writes:
> I'm curious, though, as to why Semantic/CEDET seems to have been
> superseded by external solutions like tree-sitter or LSP-based
> (non-emacs) servers.
I guess another very large bonus point for tree-sitter is that grammars
for all major and not so major languages are already there [1] and don't
need to be written and maintained by us.
Bye,
Tassilo
[1] https://tree-sitter.github.io/tree-sitter/#available-parsers
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-10 0:22 ` Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2) Lynn Winebarger
2022-08-10 2:14 ` Po Lu
2022-08-10 17:03 ` Tassilo Horn
@ 2022-08-13 14:40 ` Jostein Kjønigsen
2022-08-14 1:23 ` Po Lu
2 siblings, 1 reply; 136+ messages in thread
From: Jostein Kjønigsen @ 2022-08-13 14:40 UTC (permalink / raw)
To: Lynn Winebarger, Eli Zaretskii; +Cc: Alan Mackenzie, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 2535 bytes --]
It seems (as usual) the core question asked in the thread gets buried
in lots of other discussion, which while I'm sure is worthwhile, doesn't
quite answer the original question.
Specifically I'm referring to the email-topic and the question asked in
the email:
On 10.08.2022 02:22, Lynn Winebarger wrote:
> I'm curious, though, as to why Semantic/CEDET seems to have been
> superseded by external solutions like tree-sitter or LSP-based
> (non-emacs) servers.
Maybe I'm not an average Emacs'er. I'll even be willing to admit that
one of the main problems here is my lack of knowledge in this matter.
But I've yet to date seen any case where Semantic or CEDET has provided
me with value as an Emacs end-user and programmer. Maybe it works. Maybe
it's useful. Maybe it does magical things. But as far as I know, it's a
mode you can enable, and then .... nothing happens.
Compare that to LSP... I install a package (lsp-client), add the
relevant hook to prog-mode, and /instantly/ every major-mode I interact
with gets intelligent, contextually correct, auto-complete. I can do
safe renames across projects, and lots of actual useful, observable
magic happens. And I get the exact same language support as every other
editor out there, and I'm not limited to who is willing to contribute to
the Emacs-ecosystem.
This is why LSP has such great adoption "everywhere", and became the
universal standard almost overnight.
If CEDET can do this too... Then why not provide some documentation or
guides on how to set it up to be equally useful? If it can't, then one
needs to clarify what CEDETs role is, in a world where everyone else has
decided on LSP being the way forward, and provide documentation/guides
which helps people exploit what CEDET has to offer. Otherwise what I
suspect is already a niche mode will become, if possible, even more niche.
Back to tree-sitter...I'm not saying that the same thing as LSP is
guaranteed to happen to tree-sitter, not by far. But it has some of the
same great things giving it momentum: It's fast (instant?),
editor-external, meaning it gets contributions from lots of developers
outside the Emacs ecosystem, and as a result already has parsers for
pretty much all languages out there.
If I'm making a new major-mode these days, for the same reason, I'm
going to be basing it on tree-sitter in one form or the other.
--
Kind regards
*Jostein Kjønigsen*
jostein@kjonigsen.net 🍵 jostein@gmail.com
https://jostein.kjønigsen.no
[-- Attachment #2: Type: text/html, Size: 4120 bytes --]
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-13 14:40 ` Jostein Kjønigsen
@ 2022-08-14 1:23 ` Po Lu
2022-08-16 9:06 ` Lynn Winebarger
0 siblings, 1 reply; 136+ messages in thread
From: Po Lu @ 2022-08-14 1:23 UTC (permalink / raw)
To: Jostein Kjønigsen
Cc: Lynn Winebarger, Eli Zaretskii, jostein, Alan Mackenzie,
emacs-devel
Jostein Kjønigsen <jostein@secure.kjonigsen.net> writes:
> But I've yet to date seen any case where Semantic or CEDET has
> provided me with value as an Emacs end-user and programmer. Maybe it
> works. Maybe it's useful. Maybe it does magical things. But as far as
> I know, it's a mode you can enable, and then .... nothing happens.
That's not true. If you enable Semantic and EDE, and add system
includes (with semantic-add-system-include), and wait for the initial
parse to finish after visiting a file in a project, it becomes
immediately useful for editing C code. It's actually what I use for my
day job.
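As a rough sketch, those steps could look like this in an init file. The include directory is only a placeholder (an assumption, not something from this thread); use whatever directories your project actually needs.

```elisp
;; Sketch only: enable Semantic and EDE, then register a system
;; include directory for C buffers.
(require 'semantic)
(require 'semantic/dep)   ; defines `semantic-add-system-include'

(semantic-mode 1)         ; turn on the Semantic parsers
(global-ede-mode 1)       ; turn on EDE project support

;; "/usr/include" is a placeholder path chosen for illustration.
(semantic-add-system-include "/usr/include" 'c-mode)
```

After that, visiting a C file in a project starts the initial parse mentioned above.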
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-14 1:23 ` Po Lu
@ 2022-08-16 9:06 ` Lynn Winebarger
2022-08-16 11:05 ` Po Lu
2022-08-16 11:41 ` Eli Zaretskii
0 siblings, 2 replies; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-16 9:06 UTC (permalink / raw)
To: Po Lu
Cc: Jostein Kjønigsen, Eli Zaretskii, jostein, Alan Mackenzie,
emacs-devel, Yuan Fu
On Sat, Aug 13, 2022 at 9:23 PM Po Lu <luangruo@yahoo.com> wrote:
>
> Jostein Kjønigsen <jostein@secure.kjonigsen.net> writes:
>
> > But I've yet to date seen any case where Semantic or CEDET has
> > provided me with value as an Emacs end-user and programmer. Maybe it
> > works. Maybe it's useful. Maybe it does magical things. But as far as
> > I know, it's a mode you can enable, and then .... nothing happens.
>
> That's not true. If you enable Semantic and EDE, and add system
> includes (with semantic-add-system-include), and wait for the initial
> parse to finish after visiting a file in a project, it becomes
> immediately useful for editing C code. It's actually what I use for my
> day job.
I think what Jostein means is - how would you know you need to take
all those steps?
I'm not 100% on what Semantic is doing for me. I did run "make TAGS"
in the Emacs source directory, and it seems to do something, although
I don't know how much of that is Semantic versus one of the many other
packages I've loaded into the emacs dump I use. I know I get great
completion sometimes, and I see notifications that my Elisp buffers
get parsed by an LL parser. The fontification is different, and the
header line is used for displaying the function scope point is in.
I added Yuan Fu to the CC per your earlier recommendation. I'm not
sure what their role is - migrating the documentation?
Lynn
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-16 9:06 ` Lynn Winebarger
@ 2022-08-16 11:05 ` Po Lu
2022-08-16 11:41 ` Eli Zaretskii
1 sibling, 0 replies; 136+ messages in thread
From: Po Lu @ 2022-08-16 11:05 UTC (permalink / raw)
To: Lynn Winebarger
Cc: Jostein Kjønigsen, Eli Zaretskii, jostein, Alan Mackenzie,
emacs-devel, Yuan Fu
Lynn Winebarger <owinebar@gmail.com> writes:
> I'm not 100% on what Semantic is doing for me. I did run "make TAGS"
> in the Emacs source directory, and it seems to do something, although
> I don't know how much of that is Semantic versus one of the many other
> packages I've loaded into the emacs dump I use.
make TAGS generates a tags table, not really related to Semantic.
> I know I get great completion sometimes, and I see notifications that
> my Elisp buffers get parsed by an LL parser.
This is Semantic.
> The fontification is different, and the header line is used for
> displaying the function scope point is in.
That's probably Semantic too, but some other third party code does that
as well.
> I added Yuan Fu to the CC per your earlier recommendation. I'm not
> sure what their role is - migrating the documentation?
Yuan Fu is the author of the tree sitter code in the feature branch.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-16 9:06 ` Lynn Winebarger
2022-08-16 11:05 ` Po Lu
@ 2022-08-16 11:41 ` Eli Zaretskii
2022-08-16 16:33 ` Lynn Winebarger
1 sibling, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-16 11:41 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: luangruo, jostein, jostein, acm, emacs-devel, casouri
> From: Lynn Winebarger <owinebar@gmail.com>
> Date: Tue, 16 Aug 2022 05:06:58 -0400
> Cc: Jostein Kjønigsen <jostein@secure.kjonigsen.net>,
> Eli Zaretskii <eliz@gnu.org>, jostein@kjonigsen.net, Alan Mackenzie <acm@muc.de>,
> emacs-devel <emacs-devel@gnu.org>, Yuan Fu <casouri@gmail.com>
>
> On Sat, Aug 13, 2022 at 9:23 PM Po Lu <luangruo@yahoo.com> wrote:
> >
> > Jostein Kjønigsen <jostein@secure.kjonigsen.net> writes:
> >
> > That's not true. If you enable Semantic and EDE, and add system
> > includes (with semantic-add-system-include), and wait for the initial
> > parse to finish after visiting a file in a project, it becomes
> > immediately useful for editing C code. It's actually what I use for my
> > day job.
>
> I think what Jostein means is - how would you know you need to take
> all those steps?
As usual: by reading the fine documentation. Amazingly enough,
Semantic does have an Info manual, which comes with Emacs, and those
steps are documented there.
More generally: Semantic's problems, issues, and disadvantages aside,
let's not pretend that Emacs maintainers are incompetent. When the
decision was made to add parts of Semantic to Emacs core, back in
Emacs 23 days, a lot of effort went into its proper integration,
including making its documentation available. So if someone asks
him/herself how do I use this stuff, I expect that someone to make the
minimal effort of reading the available documentation and trying to
follow it. If that doesn't work, then yes, by all means do complain.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-16 11:41 ` Eli Zaretskii
@ 2022-08-16 16:33 ` Lynn Winebarger
2022-08-16 17:19 ` Stefan Monnier
0 siblings, 1 reply; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-16 16:33 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: luangruo, jostein, jostein, acm, emacs-devel, casouri
On Tue, Aug 16, 2022 at 7:42 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Lynn Winebarger <owinebar@gmail.com>
> > Date: Tue, 16 Aug 2022 05:06:58 -0400
> > Cc: Jostein Kjønigsen <jostein@secure.kjonigsen.net>,
> > Eli Zaretskii <eliz@gnu.org>, jostein@kjonigsen.net, Alan Mackenzie <acm@muc.de>,
> > emacs-devel <emacs-devel@gnu.org>, Yuan Fu <casouri@gmail.com>
> >
> > On Sat, Aug 13, 2022 at 9:23 PM Po Lu <luangruo@yahoo.com> wrote:
> > >
> > > Jostein Kjønigsen <jostein@secure.kjonigsen.net> writes:
> > >
> > > That's not true. If you enable Semantic and EDE, and add system
> > > includes (with semantic-add-system-include), and wait for the initial
> > > parse to finish after visiting a file in a project, it becomes
> > > immediately useful for editing C code. It's actually what I use for my
> > > day job.
> >
> > I think what Jostein means is - how would you know you need to take
> > all those steps?
>
> As usual: by reading the fine documentation. Amazingly enough,
> Semantic does have an Info manual, which comes with Emacs, and those
> steps are documented there.
>
> More generally: Semantic's problems, issues, and disadvantages aside,
> let's not pretend that Emacs maintainers are incompetent. When the
> decision was made to add parts of Semantic to Emacs core, back in
> Emacs 23 days, a lot of effort went into its proper integration,
> including making its documentation available. So if someone asks
> him/herself how do I use this stuff, I expect that someone to make the
> minimal effort of reading the available documentation and trying to
> follow it. If that doesn't work, then yes, by all means do complain.
I'm only saying there's a disconnect between Jostein's report and Po's
response. It's probably a UI issue. There's a checkbox in a dropdown
menu that says "Source Code Parsers (Semantic)". That probably
suggests to the casual user that all they have to do is check the box
to get the full benefit. Or that if they need to do more
configuration, clicking the box will lead to a process to do the
configuration. Whether that's a reasonable expectation on the user's
part, I couldn't say.
That said, as a developer there are some missing pieces in the
Semantic docs. For one, some of the texi files - particularly the
ones for the grammar framework and language dev - did not make it into
the emacs source tree, though there are still dangling references to
those documents. I went to the last (circa 2014) version of CEDET
from sourceforge and grabbed the corresponding docs from there.
I will say one of my disappointments with the Semantic grammar
framework (once I grabbed those grammar-fw and langdev docs) is that
the lexers rely on the syntax tables to identify blocks.
Unfortunately, that is one of the limitations I had hoped to overcome
by using Semantic. For example,
* "${" and "{" could both open a block closed by "}"
* if/fi, case/esac, etc, or possibly all keyword blocks are closed by "end"
* "variadic" structures like try/catch+/finally?
It's not clear from the doc just how much this reliance on the syntax
table based block identification is baked into the lexer/parser
generation.
Lynn
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-16 16:33 ` Lynn Winebarger
@ 2022-08-16 17:19 ` Stefan Monnier
2022-08-16 17:40 ` Lynn Winebarger
0 siblings, 1 reply; 136+ messages in thread
From: Stefan Monnier @ 2022-08-16 17:19 UTC (permalink / raw)
To: Lynn Winebarger
Cc: Eli Zaretskii, luangruo, jostein, jostein, acm, emacs-devel,
casouri
> I'm only saying there's a disconnect between Jostein's report and Po's
> response. It's probably a UI issue. There's a checkbox in a dropdown
> menu that says "Source Code Parsers (Semantic)".
FWIW, I've used (semantic-mode 1) to enable CEDET in Emacs's C source
files and that was all that was needed to get TAB completion of struct
field's names working.
I haven't used it for much more than that, admittedly.
> * "${" and "{" could both open a block closed by "}"
Why do you think it's a problem?
> * if/fi, case/esac, etc, or possibly all keyword blocks are closed by "end"
These aren't handled by syntax tables.
> * "variadic" structures like try/catch+/finally?
Same.
> It's not clear from the doc just how much this reliance on the syntax
> table based block identification is baked into the lexer/parser
> generation.
IIRC the syntax-tables are used for speed so as to skip whole blocks
without fully parsing their contents.
Stefan
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-16 17:19 ` Stefan Monnier
@ 2022-08-16 17:40 ` Lynn Winebarger
2022-08-17 1:41 ` Eric Ludlam
0 siblings, 1 reply; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-16 17:40 UTC (permalink / raw)
To: Stefan Monnier
Cc: Eli Zaretskii, luangruo, jostein, jostein, acm, emacs-devel,
casouri
On Tue, Aug 16, 2022 at 1:19 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> > I'm only saying there's a disconnect between Jostein's report and Po's
> > response. It's probably a UI issue. There's a checkbox in a dropdown
> > menu that says "Source Code Parsers (Semantic)".
>
> FWIW, I've used (semantic-mode 1) to enable CEDET in Emacs's C source
> files and that was all that was needed to get TAB completion of struct
> field's names working.
> I haven't used it for much more than that, admittedly.
It also works for me, but I also have been mostly looking at Emacs
source with it, and Semantic knows how to use the TAGS file for
context-sensitive completion in C. And something is working
gangbusters in Elisp, but unfortunately I can't really identify which
package is doing the work.
> > * "${" and "{" could both open a block closed by "}"
>
> Why do you think it's a problem?
If you want the lexer to tokenize the ${ as a symbol while still
recognizing the text in between as delimited, it seems like a problem.
I mean, I already deal with that in ordinary font-lock, I was hoping
the parser/lexer generation would address the issue independently of
syntax tables.
>
> > * if/fi, case/esac, etc, or possibly all keyword blocks are closed by "end"
>
> These aren't handled by syntax tables.
>
> > * "variadic" structures like try/catch+/finally?
>
> Same.
>
I know they aren't.
> > It's not clear from the doc just how much this reliance on the syntax
> > table based block identification is baked into the lexer/parser
> > generation.
>
> IIRC the syntax-tables are used for speed so as to skip whole blocks
> without fully parsing their contents.
As I wrote, I'm not sure how baked into the parser/lexer generation
these special token types are, and if blocks are used for things like
error recovery or limiting the scope where the syntactic structure is
either illegal or incomplete. That's part of the selling point of the
design.
I can make the necessary modifications for myself, but if it was
already in Semantic I could make use of it immediately without having
to worry about these copyright assignment issues.
Lynn
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-16 17:40 ` Lynn Winebarger
@ 2022-08-17 1:41 ` Eric Ludlam
2022-08-18 12:34 ` Lynn Winebarger
0 siblings, 1 reply; 136+ messages in thread
From: Eric Ludlam @ 2022-08-17 1:41 UTC (permalink / raw)
To: Lynn Winebarger, Stefan Monnier
Cc: Eli Zaretskii, luangruo, jostein, jostein, acm, emacs-devel,
casouri
On 8/16/22 1:40 PM, Lynn Winebarger wrote:
> On Tue, Aug 16, 2022 at 1:19 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>>
>>> I'm only saying there's a disconnect between Jostein's report and Po's
>>> response. It's probably a UI issue. There's a checkbox in a dropdown
>>> menu that says "Source Code Parsers (Semantic)".
>>
>> FWIW, I've used (semantic-mode 1) to enable CEDET in Emacs's C source
>> files and that was all that was needed to get TAB completion of struct
>> field's names working.
>> I haven't used it for much more than that, admittedly.
>
> It also works for me, but I also have been mostly looking at Emacs
> source with it, and Semantic knows how to use the TAGS file for
> context-sensitive completion in C. And something is working
> gangbusters in Elisp, but unfortunately I can't really identify which
> package is doing the work.
>
>>> * "${" and "{" could both open a block closed by "}"
>>
>> Why do you think it's a problem?
> If you want the lexer to tokenize the ${ as a symbol while still
> recognizing the text in between as delimited, it seems like a problem.
> I mean, I already deal with that in ordinary font-lock, I was hoping
> the parser/lexer generation would address the issue independently of
> syntax tables.
Lexers are built per-language from a set of analyzers. Thus, you call
(define-lex ...) and list a bunch of analyzers, which are created with
`define-lex-analyzer' or one of the variants.
The analyzers mostly use regular expressions, and when possible, use
expressions that use the syntax table because they are quite fast. If
you restrict yourself to the built-in named lexer analyzers, like
'semantic-lex-whitespace', then that is what they are, but you can use
`define-lex-analyzer' or `define-lex-regex-analyzer' and write any code
you want to do a match, push a token, and find the end point. The C
lexer/parser does this a lot.
For a very simple case like matching ${:
(define-lex-simple-regex-analyzer my-dollar-curly
"doc string"
"\\${" 'dollar-curly)
and then put this in front of the { } block analyzer when you build up
your lexer.
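A minimal sketch of that ordering follows. The lexer name `my-example-lexer' is hypothetical, and the selection of stock analyzers is an assumption about what a small language might need. Note that in Emacs regexps a bare `{' is literal while `\{' starts an interval, so the sketch writes the pattern as "\\${". Whether "{" and "}" are treated as block delimiters by the stock paren analyzers depends on the major mode's syntax table.

```elisp
(require 'semantic/lex)

;; Emit "${" as a single `dollar-curly' token.
(define-lex-simple-regex-analyzer my-dollar-curly
  "Match the two-character opener \"${\" as one token."
  "\\${" 'dollar-curly)

;; Analyzers are tried in the order listed, so `my-dollar-curly'
;; must come before the paren/block analyzers that would otherwise
;; claim the \"{\" on their own.
(define-lex my-example-lexer
  "Hypothetical lexer showing where a custom analyzer is placed."
  semantic-lex-ignore-whitespace
  semantic-lex-ignore-newline
  my-dollar-curly
  semantic-lex-paren-or-list
  semantic-lex-open-paren
  semantic-lex-close-paren
  semantic-lex-symbol-or-keyword
  semantic-lex-punctuation
  semantic-lex-default-action)
```

A language setup function would then typically point the buffer-local `semantic-lex-analyzer' at `my-example-lexer'.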
Eric
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-17 1:41 ` Eric Ludlam
@ 2022-08-18 12:34 ` Lynn Winebarger
2022-08-20 13:15 ` Eric Ludlam
0 siblings, 1 reply; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-18 12:34 UTC (permalink / raw)
To: Eric Ludlam
Cc: Stefan Monnier, Eli Zaretskii, luangruo, jostein, jostein, acm,
emacs-devel, casouri
On Tue, Aug 16, 2022 at 9:41 PM Eric Ludlam <ericludlam@gmail.com> wrote:
>
> On 8/16/22 1:40 PM, Lynn Winebarger wrote:
> > On Tue, Aug 16, 2022 at 1:19 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> >>
> >>> I'm only saying there's a disconnect between Jostein's report and Po's
> >>> response. It's probably a UI issue. There's a checkbox in a dropdown
> >>> menu that says "Source Code Parsers (Semantic)".
> >>
> >> FWIW, I've used (semantic-mode 1) to enable CEDET in Emacs's C source
> >> files and that was all that was needed to get TAB completion of struct
> >> field's names working.
> >> I haven't used it for much more than that, admittedly.
> >
> > It also works for me, but I also have been mostly looking at Emacs
> > source with it, and Semantic knows how to use the TAGS file for
> > context-sensitive completion in C. And something is working
> > gangbusters in Elisp, but unfortunately I can't really identify which
> > package is doing the work.
> >
> >>> * "${" and "{" could both open a block closed by "}"
> >>
> >> Why do you think it's a problem?
> > If you want the lexer to tokenize the ${ as a symbol while still
> > recognizing the text in between as delimited, it seems like a problem.
> > I mean, I already deal with that in ordinary font-lock, I was hoping
> > the parser/lexer generation would address the issue independently of
> > syntax tables.
>
> Lexers are built per-language from a set of analyzers. Thus, you call
> (define-lex ...) and list a bunch of analyzers, which are created with
> `define-lex-analyzer' or one of the variants.
>
> The analyzers mostly use regular expressions, and when possible, use
> expressions that use the syntax table because they are quite fast. If
> you restrict yourself to the built-in named lexer analyzers, like
> 'semantic-lex-whitespace', then that is what they are, but you can use
> `define-lex-analyzer' or `define-lex-regex-analyzer' and write any code
> you want to do a match, push a token, and find the end point. The C
> lexer/parser does this a lot.
>
> For a very simple case like matching ${:
> (define-lex-simple-regex-analyzer my-dollar-curly
> "doc string"
> "\\${" 'dollar-curly)
>
> and then put this in front of the { } block analyzer when you build up
> your lexer.
Thanks for the details. I'm not sure what you mean by "put this in
front of the ... block analyzer" though. I just don't understand how
the different token types interact with each other and/or the "block"
(or other) construct well enough to confidently use the built-in
types.
What I will take away here is that I can closely review the C
lexer/parser to see how someone who does understand the interaction of
those types uses them effectively, before investing a lot of time
studying the construction of the built-in types for the purpose of
extending them. Which I'm not sure I would do for the problem I'm
currently dealing with in any case.
Am I right that the "block" classification is used to allow Semantic
to localize the impact of unparseable text? It sounds like the system
will still function without explicitly declaring block constructs, but
some useful features might be effectively disabled.
Lynn
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with font-lock-maximum-decoration 2)
2022-08-18 12:34 ` Lynn Winebarger
@ 2022-08-20 13:15 ` Eric Ludlam
0 siblings, 0 replies; 136+ messages in thread
From: Eric Ludlam @ 2022-08-20 13:15 UTC (permalink / raw)
To: Lynn Winebarger
Cc: Stefan Monnier, Eli Zaretskii, luangruo, jostein, jostein, acm,
emacs-devel, casouri
On 8/18/22 8:34 AM, Lynn Winebarger wrote:
> On Tue, Aug 16, 2022 at 9:41 PM Eric Ludlam <ericludlam@gmail.com> wrote:
>> On 8/16/22 1:40 PM, Lynn Winebarger wrote:
>>> On Tue, Aug 16, 2022 at 1:19 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>>>>> I'm only saying there's a disconnect between Jostein's report and Po's
>>>>> response. It's probably a UI issue. There's a checkbox in a dropdown
>>>>> menu that says "Source Code Parsers (Semantic)".
>>>> FWIW, I've used (semantic-mode 1) to enable CEDET in Emacs's C source
>>>> files and that was all that was needed to get TAB completion of struct
>>>> field's names working.
>>>> I haven't used it for much more than that, admittedly.
>>> It also works for me, but I also have been mostly looking at Emacs
>>> source with it, and Semantic knows how to use the TAGS file for
>>> context-sensitive completion in C. And something is working
>>> gangbusters in Elisp, but unfortunately I can't really identify which
>>> package is doing the work.
>>>
>>>>> * "${" and "{" could both open a block closed by "}"
>>>> Why do you think it's a problem?
>>> If you want the lexer to tokenize the ${ as a symbol while still
>>> recognizing the text in between as delimited, it seems like a problem.
>>> I mean, I already deal with that in ordinary font-lock, I was hoping
>>> the parser/lexer generation would address the issue independently of
>>> syntax tables.
>> Lexers are built per-language from a set of analyzers. Thus, you call
>> (define-lex ...) and list a bunch of analyzers, which are created with
>> `define-lex-analyzer' or one of the variants.
>>
>> The analyzers mostly use regular expressions, and when possible, use
>> expressions that use the syntax table because they are quite fast. If
>> you restrict yourself to the built-in named lexer analyzers, like
>> 'semantic-lex-whitespace', then that is what they are, but you can use
>> `define-lex-analyzer' or `define-lex-regex-analyzer' and write any code
>> you want to do a match, push a token, and find the end point. The C
>> lexer/parser does this a lot.
>>
>> For a very simple case like matching ${:
>> (define-lex-simple-regex-analyzer my-dollar-curly
>> "doc string"
>> "\\${" 'dollar-curly)
>>
>> and then put this in front of the { } block analyzer when you build up
>> your lexer.
> Thanks for the details. I'm not sure what you mean by "put this in
> front of the ... block analyzer" though. I just don't understand how
> the different token types interact with each other and/or the "block"
> (or other) construct well enough to confidently use the built-in
> types.
> What I will take away here is that I can closely review the C
> lexer/parser to see how someone who does understand the interaction of
> those types uses them effectively, before investing a lot of time
> studying the construction of the built-in types for the purpose of
> extending them. Which I'm not sure I would do for the problem I'm
> currently dealing with in any case.
> Am I right that the "block" classification is used to allow Semantic
> to localize the impact of unparseable text? It sounds like the system
> will still function without explicitly declaring block constructs, but
> some useful features might be effectively disabled.
Building a lexer is done in two steps. In one step, you would build
some analyzers for specific matches such as the example above. Once you
have a set of analyzers for specific syntaxes, you assemble them into a
lexer, like this:
(define-lex my-lexer
"Doc string"
semantic-lex-ignore-whitespace
;; Custom stuff that conflicts with blocks
my-dollar-curly
;; Do some blocks
semantic-lex-paren-or-list
semantic-lex-close-paren
;; Other stuff
semantic-lex-number
;; End with this
semantic-lex-default-action)
Hopefully this explains the basics of building out some analyzers and
your lexer.
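To make the ordering concrete, here is a minimal sketch of both steps plus
a way to run the result on a string. It is a sketch, not production code:
`my-identifier', `my-lexer' and `my-lex-string' are made-up names, the
regexp is written as "\\${" on the assumption that a bare { is literal in
Emacs regexps, and `semantic-lex-init' is called only so that the
temporary buffer gets a lexer syntax table.

```elisp
(require 'semantic/lex)

;; Step 1: analyzers for specific matches.
(define-lex-simple-regex-analyzer my-dollar-curly
  "Detect the two-character opener \"${\" as a single token."
  "\\${" 'dollar-curly)

(define-lex-simple-regex-analyzer my-identifier
  "Detect a plain identifier (hypothetical helper for this sketch)."
  "\\(\\sw\\|\\s_\\)+" 'symbol)

;; Step 2: assemble the analyzers.  They are tried in the order listed,
;; so `my-dollar-curly' gets first refusal on "${".
(define-lex my-lexer
  "Hypothetical lexer that treats \"${\" specially."
  semantic-lex-ignore-whitespace
  semantic-lex-ignore-newline
  my-dollar-curly
  semantic-lex-paren-or-list
  semantic-lex-close-paren
  semantic-lex-number
  my-identifier
  semantic-lex-punctuation
  semantic-lex-default-action)

(defun my-lex-string (string)
  "Return the token list `my-lexer' produces for STRING."
  (with-temp-buffer
    (insert string)
    ;; Set up `semantic-lex-syntax-table' for this buffer.
    (semantic-lex-init)
    (my-lexer (point-min) (point-max))))
```

Because `my-dollar-curly' is listed before `semantic-lex-paren-or-list',
something like (my-lex-string "${name} 42") should consume "${" as one
dollar-curly token before the block analyzer ever looks at the "{".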
If you are building out a lexer just to do some tokenizing, then this is
about what you need, plus what is in the documentation for more details.
If you want to build a parser that sits on the lexer, there is more to
it, as I recommend using the wisent parser-generator, as it creates
faster parsers. In the wisent .wy files, you define %tokens using a
bison-like syntax, and that in turn builds analyzers that you include
in your lexer. The java parser & lexer has a lot of cases, though the
calc parser is smaller and easier to grok.
The purpose of 'block' constructs in the lexer is just to cut out large
chunks of text that you don't have to write a parser generator for. My
goal was creating tags, and parsing the body of a function, for example,
is not needed. Thus using the lexer to skip all that speeds things up.
If you want to parse the ENTIRE file, just don't put blocks in your
lexer, and only put in the open/close paren analyzers.
Hope this helps.
Eric
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: CC Mode with font-lock-maximum-decoration 2
2022-08-08 17:15 ` CC Mode with font-lock-maximum-decoration 2 [Was Major modes using `widen' is a good, even essential, programming practice.] Eli Zaretskii
2022-08-08 17:41 ` Eli Zaretskii
@ 2022-08-08 18:20 ` Alan Mackenzie
1 sibling, 0 replies; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-08 18:20 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Hello, Eli.
On Mon, Aug 08, 2022 at 20:15:25 +0300, Eli Zaretskii wrote:
> > Date: Mon, 8 Aug 2022 15:05:29 +0000
> > Cc: emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>
> > For this measurement, I started with subr.el, and appended copies of it
> > to itself, then took functions off the end, to make it the same size as
> > xdisp.c. xdisp.c is 1209233 bytes, my .el buffer was 1209371 bytes.
> > I used M-: (benchmark-run 1 (time-scroll-b)) on each buffer, with:
> > (defun time-scroll-b (&optional arg) ; For use in `benchmark-run'.
> > (condition-case nil
> > (while t
> > (if arg (scroll-down) (scroll-up))
> > (sit-for 0))
> > (error nil)))
> > .. The exact results were:
> > (xdisp.c): (5.7370774540000005 9 0.7672129740000013)
> > (elisp): (4.1201735589999995 5 0.42918214299999846).
> > This was, of course, on an optimised build on GNU/Linux using the Linux
> > console, both measurements starting at BOB, having typed and deleted a
> > character to erase existing font-locking.
> Editing source code is more than just scrolling through the text and
> getting it fontified, though.
Yes, of course. Here I'm deliberately separating the fontification from
the buffer changes to get a ballpark figure purely for the fontification.
And the results suggest that optimising that in CC Mode/2 is not going to
be fruitful.
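For anyone reading the figures above: `benchmark-run' returns a list of
the total elapsed time, the number of garbage collections, and the time
spent in garbage collection. A small helper (the name
`my-benchmark-net-time' is invented here) can subtract the GC share:

```elisp
(require 'benchmark)

(defun my-benchmark-net-time (result)
  "Return the elapsed time in RESULT minus the time spent in GC.
RESULT is a list (ELAPSED GC-COUNT GC-ELAPSED) as returned by
`benchmark-run'."
  (pcase-let ((`(,elapsed ,_gc-count ,gc-elapsed) result))
    (- elapsed gc-elapsed)))

;; Typical use, with `time-scroll-b' as defined in the quoted message:
;;   (my-benchmark-net-time (benchmark-run 1 (time-scroll-b)))
```

Applied to the two results quoted above, that leaves about 4.97 s for
xdisp.c and about 3.69 s for the Elisp buffer.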
> For realistic measurements, you need to emulate and time a typical mix
> of editing operations.
Yes. I'm intending to do that sometime soon.
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 14:13 ` Alan Mackenzie
2022-08-07 14:20 ` Eli Zaretskii
@ 2022-08-07 20:17 ` Gregory Heytings
2022-08-07 20:46 ` Alan Mackenzie
2022-08-07 23:21 ` Stefan Monnier
2 siblings, 1 reply; 136+ messages in thread
From: Gregory Heytings @ 2022-08-07 20:17 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel
>> It is wrong that "arbitrary Lisp can be executed through
>> fontification-functions", as you said earlier.
>
> It is not wrong. [...]
>
>> A function called from fontification-functions isn't supposed to
>> download a file, or to send an email, or to change a user option, or to
>> remove or create a file, or to remove or insert text in the buffer, or
>> to kill Emacs or a frame or a window or the current buffer, or to
>> change the window layout, and so on and so forth.
>
> That doesn't deserve a reply, so won't be getting one.
>
Yet the above are all perfectly legitimate examples of "arbitrary Lisp".
So I take it that in fact you agree that it is wrong that "arbitrary Lisp
can be executed through fontification-functions". Neither the
deliberately extreme examples above, nor anything else that is outside of
the scope of the API contract. Code executed through
fontification-functions should do what it was designed to do, and only
that, otherwise it breaks the API contract.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 20:17 ` Major modes using `widen' is a good, even essential, programming practice Gregory Heytings
@ 2022-08-07 20:46 ` Alan Mackenzie
2022-08-07 20:53 ` Gregory Heytings
2022-08-08 2:37 ` Eli Zaretskii
0 siblings, 2 replies; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-07 20:46 UTC (permalink / raw)
To: Gregory Heytings; +Cc: emacs-devel
Hello, Gregory.
On Sun, Aug 07, 2022 at 20:17:26 +0000, Gregory Heytings wrote:
> >> It is wrong that "arbitrary Lisp can be executed through
> >> fontification-functions", as you said earlier.
> > It is not wrong. [...]
> >> A function called from fontification-functions isn't supposed to
> >> download a file, or to send an email, or to change a user option, or to
> >> remove or create a file, or to remove or insert text in the buffer, or
> >> to kill Emacs or a frame or a window or the current buffer, or to
> >> change the window layout, and so on and so forth.
> > That doesn't deserve a reply, so won't be getting one.
> Yet the above are all perfectly legitimate examples of "arbitrary Lisp".
> So I take it that in fact you agree that it is wrong that "arbitrary Lisp
> can be executed through fontification-functions".
You can take no such thing.
> Neither the deliberately extreme examples above, nor anything else
> that is outside of the scope of the API contract.
I challenged you to state what you think this API contract consists of.
You seem unable to meet this challenge. So I take it that in fact you
agree there is no such thing as this "API contract".
> Code executed through fontification-functions should do what it was
> designed to do, and only that, otherwise it breaks the API contract.
Yet you are unable to state precisely what this "designed to do" is.
This "API contract" is a mythological creature. We've already
established, in conversation with Eli, that widening is routinely done
by functions on fontification-functions, and arbitrary buffer positions
are accessed.
If you look closely at CC Mode's use of font locking you'll see that it
font locks, and nothing else.
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 20:46 ` Alan Mackenzie
@ 2022-08-07 20:53 ` Gregory Heytings
2022-08-08 2:37 ` Eli Zaretskii
1 sibling, 0 replies; 136+ messages in thread
From: Gregory Heytings @ 2022-08-07 20:53 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel
>> Neither the deliberately extreme examples above, nor anything else that
>> is outside of the scope of the API contract.
>
> I challenged you to state what you think this API contract consists of.
> You seem unable to meet this challenge. So I take it that in fact you
> agree there is no such thing as this "API contract".
>
Eli already answered that point, I did not have anything to add.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 20:46 ` Alan Mackenzie
2022-08-07 20:53 ` Gregory Heytings
@ 2022-08-08 2:37 ` Eli Zaretskii
2022-08-08 10:33 ` Alan Mackenzie
1 sibling, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-08 2:37 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: gregory, emacs-devel
> Date: Sun, 7 Aug 2022 20:46:17 +0000
> Cc: emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> > Code executed through fontification-functions should do what it was
> > designed to do, and only that, otherwise it breaks the API contract.
>
> Yet you are unable to state precisely what this "designed to do" is.
> This "API contract" is a mythological creature. We've already
> established, in conversation with Eli, that widening is routinely done
> by functions on fontification-functions, and arbitrary buffer positions
> are accessed.
Establishing that sad fact doesn't mean we agree with it.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 2:37 ` Eli Zaretskii
@ 2022-08-08 10:33 ` Alan Mackenzie
2022-08-08 11:41 ` Eli Zaretskii
0 siblings, 1 reply; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-08 10:33 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: gregory, emacs-devel
On Mon, Aug 08, 2022 at 05:37:08 +0300, Eli Zaretskii wrote:
> > Date: Sun, 7 Aug 2022 20:46:17 +0000
> > Cc: emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>
> > > Code executed through fontification-functions should do what it was
> > > designed to do, and only that, otherwise it breaks the API contract.
> > Yet you are unable to state precisely what this "designed to do" is.
> > This "API contract" is a mythological creature. We've already
> > established, in conversation with Eli, that widening is routinely done
> > by functions on fontification-functions, and arbitrary buffer positions
> > are accessed.
> Establishing that sad fact doesn't mean we agree with it.
Death and taxes are sad facts too. Some things, whether we agree with
them or not, just are. Calling them "sad" doesn't help deal with them.
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 10:33 ` Alan Mackenzie
@ 2022-08-08 11:41 ` Eli Zaretskii
0 siblings, 0 replies; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-08 11:41 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: gregory, emacs-devel
> Date: Mon, 8 Aug 2022 10:33:17 +0000
> Cc: gregory@heytings.org, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> On Mon, Aug 08, 2022 at 05:37:08 +0300, Eli Zaretskii wrote:
> > > Date: Sun, 7 Aug 2022 20:46:17 +0000
> > > Cc: emacs-devel@gnu.org
> > > From: Alan Mackenzie <acm@muc.de>
>
> > > > Code executed through fontification-functions should do what it was
> > > > designed to do, and only that, otherwise it breaks the API contract.
>
> > > Yet you are unable to state precisely what this "designed to do" is.
> > > This "API contract" is a mythological creature. We've already
> > > established, in conversation with Eli, that widening is routinely done
> > > by functions on fontification-functions, and arbitrary buffer positions
> > > are accessed.
>
> > Establishing that sad fact doesn't mean we agree with it.
>
> Death and taxes are sad facts too.
Except that in this case, there's no external authority which imposes
that.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 14:13 ` Alan Mackenzie
2022-08-07 14:20 ` Eli Zaretskii
2022-08-07 20:17 ` Major modes using `widen' is a good, even essential, programming practice Gregory Heytings
@ 2022-08-07 23:21 ` Stefan Monnier
2022-08-08 2:29 ` Eli Zaretskii
` (2 more replies)
2 siblings, 3 replies; 136+ messages in thread
From: Stefan Monnier @ 2022-08-07 23:21 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: Gregory Heytings, emacs-devel
> Where, exactly are the terms of this supposed contract formulated?
I'm not sure it's written anywhere.
More specifically, for `jit-lock-functions`, the contract is not
very constraining.
For font-lock the contract is still not very explicit but is more
constraining in that we expect major modes not to look before point-min
or after point-max. For that reason font-lock normally widens the
buffer before it does anything else (unless `font-lock-dont-widen` is
set).
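A minimal sketch of that split, assuming a hypothetical `my-mode': the
command widens for itself, while the font-lock matcher trusts whatever
restriction font-lock has already set up. Both function names are
invented for illustration.

```elisp
(defun my-mode-buffer-line-count ()
  "Command: count the lines of the whole buffer, ignoring narrowing.
A command cannot know how the user narrowed, so it widens itself and
restores the restriction afterwards."
  (interactive)
  (save-restriction
    (widen)
    (let ((n (count-lines (point-min) (point-max))))
      (when (called-interactively-p 'interactive)
        (message "%d lines" n))
      n)))

(defun my-mode-match-todo (limit)
  "Font-lock matcher: find the next TODO before LIMIT.
Deliberately does NOT call `widen'; font-lock has already arranged the
restriction it wants before calling matchers."
  (re-search-forward "\\_<TODO\\_>" limit t))
```

The matcher would be used as an entry in `font-lock-keywords', for
example (my-mode-match-todo 0 font-lock-warning-face t).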
> And which part of this supposed contract has CC Mode broken?
It calls `widen` within its font-lock code.
Eli Zaretskii [2022-08-07 17:20:52] wrote:
> jit-lock calls the functions with two arguments, BEG and END, and
> expects them to work only on that chunk of text.
That is not the case: it expects the function to "fontify" *at least*
from BEG to END, but is quite happy to let it fontify more (and the
function can return a value indicating which portion was actually
fontified in that case). Furthermore, it's clear that fontification of
BEG..END may need to look at text before BEG (and occasionally beyond
END as well).
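As an illustration, a function registered with `jit-lock-register' might
round the chunk out to whole lines and report that back. This is only a
sketch: `my-jit-fontify' is an invented name and the face is arbitrary.

```elisp
(defun my-jit-fontify (beg end)
  "Fontify at least BEG..END, extended to whole lines.
Return (jit-lock-bounds BEG . END) describing what was really done."
  (save-excursion
    (let ((beg (progn (goto-char beg) (line-beginning-position)))
          (end (progn (goto-char end) (line-end-position))))
      (with-silent-modifications
        (put-text-property beg end 'face 'italic))
      ;; Tell jit-lock which region was actually fontified.
      `(jit-lock-bounds ,beg . ,end))))

;; In a mode's setup code one would then call:
;;   (jit-lock-register #'my-jit-fontify)
```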
Stefan
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 23:21 ` Stefan Monnier
@ 2022-08-08 2:29 ` Eli Zaretskii
2022-08-08 9:25 ` Stefan Monnier
2022-08-08 10:38 ` Alan Mackenzie
2022-08-08 10:41 ` Gregory Heytings
2 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-08 2:29 UTC (permalink / raw)
To: Stefan Monnier; +Cc: acm, gregory, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Gregory Heytings <gregory@heytings.org>, emacs-devel@gnu.org
> Date: Sun, 07 Aug 2022 19:21:32 -0400
>
> Eli Zaretskii [2022-08-07 17:20:52] wrote:
> > jit-lock calls the functions with two arguments, BEG and END, and
> > expects them to work only on that chunk of text.
>
> That is not the case: it expects the function to "fontify" *at least*
> from BEG to END, but is quite happy to let it fontify more (and the
> function can return a value indicating which portion was actually
> returned in that case). Furthermore, it's clear that fontification of
> BEG..END may need to look at text before BEG (and occasionally beyond
> END as well).
The intent is clearly that fontifications don't look far beyond these
two points, because otherwise the whole design of jit-lock and its
invocations during redisplay is basically thrown out the window.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 2:29 ` Eli Zaretskii
@ 2022-08-08 9:25 ` Stefan Monnier
2022-08-08 11:16 ` Lynn Winebarger
2022-08-08 11:30 ` Eli Zaretskii
0 siblings, 2 replies; 136+ messages in thread
From: Stefan Monnier @ 2022-08-08 9:25 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: acm, gregory, emacs-devel
>> Eli Zaretskii [2022-08-07 17:20:52] wrote:
>> > jit-lock calls the functions with two arguments, BEG and END, and
>> > expects them to work only on that chunk of text.
>>
>> That is not the case: it expects the function to "fontify" *at least*
>> from BEG to END, but is quite happy to let it fontify more (and the
>> function can return a value indicating which portion was actually
>> returned in that case). Furthermore, it's clear that fontification of
>> BEG..END may need to look at text before BEG (and occasionally beyond
>> END as well).
>
> The intent is clearly that fontifications don't look far beyond these
> two points, because otherwise the whole design of jit-lock and its
> invocations during redisplay is basically thrown out the window.
Usually, font-lock rules don't look before BOL or after EOL, indeed,
*except* via `syntax-ppss` which does look at all the text from BOB
to point. To make up for that, `syntax-ppss` relies heavily on caching,
so that it *usually* doesn't need to look very far at all (and if
there's no `syntax-propertize-function`, it's usually quite fast
because it's fully coded in C).
For GB-sized buffers, even the fast C code of `syntax-ppss` incurs
a significant delay in the "unusual" case, so we have various options:
- suck it up (potentially wait several minutes when jumping to the end
of the file).
- give up providing more or less correct highlighting (either via some
arbitrary narrowing like we do now, or turning off font-lock).
- try and find some clever heuristic that can find a "nearby safe spot",
i.e. a position for which we can guess the PPSS value (usually we
look for a position that is "known" to be outside of any string,
comment, or parenthesis).
- display the buffer quickly without highlighting while the fontification
is computed in the background.
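For reference, the kind of `syntax-ppss` query that font-lock rules make
looks roughly like this. It is a minimal sketch; the function name is
made up, and the cost of the call depends on how much of the text before
POS is already covered by the cache.

```elisp
(defun my-in-string-or-comment-p (&optional pos)
  "Return t if POS (default: point) is inside a string or comment.
The parse state comes from `syntax-ppss', which conceptually scans
from the beginning of the buffer but normally restarts from a cached
position."
  (let ((ppss (save-excursion (syntax-ppss pos))))
    ;; Element 3 is non-nil inside a string, element 4 inside a comment.
    (and (or (nth 3 ppss) (nth 4 ppss)) t)))
```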
Stefan
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 9:25 ` Stefan Monnier
@ 2022-08-08 11:16 ` Lynn Winebarger
2022-08-08 11:47 ` Eli Zaretskii
2022-08-08 11:30 ` Eli Zaretskii
1 sibling, 1 reply; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-08 11:16 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eli Zaretskii, Alan Mackenzie, gregory, emacs-devel
On Mon, Aug 8, 2022, 5:29 AM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
> >> Eli Zaretskii [2022-08-07 17:20:52] wrote:
> >> > jit-lock calls the functions with two arguments, BEG and END, and
> >> > expects them to work only on that chunk of text.
> >>
> >> That is not the case: it expects the function to "fontify" *at least*
> >> from BEG to END, but is quite happy to let it fontify more (and the
> >> function can return a value indicating which portion was actually
> >> returned in that case). Furthermore, it's clear that fontification of
> >> BEG..END may need to look at text before BEG (and occasionally beyond
> >> END as well).
> >
> > The intent is clearly that fontifications don't look far beyond these
> > two points, because otherwise the whole design of jit-lock and its
> > invocations during redisplay is basically thrown out the window.
>
> Usually, font-lock rules don't look before BOL or after EOL, indeed,
> *except* via `syntax-ppss` which does look at all the text from BOB
> to point. To make up for that, `syntax-ppss` relies heavily on caching,
> so that it *usually* doesn't need to look very far at all (and if
> there's no `syntax-propertize-function`, it's usually quite fast
> because it's fully coded in C).
>
> For GB-sized buffers, even the fast C code of `syntax-ppss` incurs
> a significant delay in the "unusual" case, so we have various options:
> - suck it up (potentially wait several minutes when jumping to the end
> of the file).
> - give up providing more or less correct highlighting (either via some
> arbitrary narrowing like we do now, or turning off font-lock).
> - try and find some clever heuristic that can find a "nearby safe spot",
> i.e. a position for which we can guess the PPSS value (usually we
> look for a position that is "known" to be outside of any string,
> comment, or parenthesis).
> - display the buffer quickly without highlighting while the fontification
> is computed in the background.
I know CC mode relies on heuristics to identify syntactic structures, and
not a full parser (whether from semantic or LSP), but it seems the issue is
that you don't have a parse state for the beginning of the narrowed buffer,
where an initial parse state is inappropriate. Assuming that text outside
the narrowing is not allowed to change, determining the appropriate parse
state should only be required once on narrowing.
So, could there be a pre-narrowing hook to run before narrowing takes
effect to allow a major mode to determine the appropriate parse state for
the beginning of the narrowed buffer?
Also, as I'm not a big user of explicit narrowing, the only place I've
noticed it happening is in info mode, where the focus is narrowed to a
particular syntactic unit.
Is there a way for a major mode to let the user signal the syntactic unit
that they believe they are narrowing to, either with command variants or an
interrogative (with a list of options supplied by the mode) when narrowing
is performed by the user interactively? With the fall-back of either
having the mode determine the correct initial state or turning off
fontification during the narrowing?
Lynn
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 11:16 ` Lynn Winebarger
@ 2022-08-08 11:47 ` Eli Zaretskii
2022-08-08 14:24 ` Lynn Winebarger
0 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-08 11:47 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: monnier, acm, gregory, emacs-devel
> From: Lynn Winebarger <owinebar@gmail.com>
> Date: Mon, 8 Aug 2022 07:16:44 -0400
> Cc: Eli Zaretskii <eliz@gnu.org>, Alan Mackenzie <acm@muc.de>, gregory@heytings.org,
> emacs-devel <emacs-devel@gnu.org>
>
> I know CC mode relies on heuristics to identify syntactic structures, and not a full parser (whether from
> semantic or LSP), but it seems the issue is that you don't have a parse state for the beginning of the
> narrowed buffer, where an initial parse state is inappropriate. Assuming that text outside the narrowing is
> not allowed to change, determining the appropriate parse state should only be required once on narrowing.
> So, could there be a pre-narrowing hook to run before narrowing takes effect to allow a major mode to
> determine the appropriate parse state for the beginning of the narrowed buffer?
Why do you need a hook? When the mode is first enabled in the buffer,
there will be no narrowing in effect yet, so the mode could do
whatever it wants at that time.
Of course, this won't help us to solve the issues discussed here,
because scanning the entire buffer at any time is slow and
non-scalable.
> Also, as I'm not a big user of explicit narrowing, the only place I've noticed it happening is in info mode, where
> the focus is narrowed to a particular syntactic unit.
> Is there a way for a major mode to let the user signal the syntactic unit that they believe they are narrowing
> to, either with command variants or an interrogative (with a list of options supplied by the mode) when
> narrowing is performed by the user interactively? With the fall-back of either having the mode determine the
> correct initial state or turning off fontification during the narrowing?
We are not talking about user narrowing here.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 11:47 ` Eli Zaretskii
@ 2022-08-08 14:24 ` Lynn Winebarger
0 siblings, 0 replies; 136+ messages in thread
From: Lynn Winebarger @ 2022-08-08 14:24 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: monnier, acm, gregory, emacs-devel
On Mon, Aug 8, 2022 at 7:48 AM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Lynn Winebarger <owinebar@gmail.com>
> > Date: Mon, 8 Aug 2022 07:16:44 -0400
> > Cc: Eli Zaretskii <eliz@gnu.org>, Alan Mackenzie <acm@muc.de>, gregory@heytings.org,
> > emacs-devel <emacs-devel@gnu.org>
> >
> > I know CC mode relies on heuristics to identify syntactic structures, and not a full parser (whether from
> > semantic or LSP), but it seems the issue is that you don't have a parse state for the beginning of the
> > narrowed buffer, where an initial parse state is inappropriate. Assuming that text outside the narrowing is
> > not allowed to change, determining the appropriate parse state should only be required once on narrowing.
> > So, could there be a pre-narrowing hook to run before narrowing takes effect to allow a major mode to
> > determine the appropriate parse state for the beginning of the narrowed buffer?
>
> Why do you need a hook? When the mode is first enabled in the buffer,
> there will be no narrowing in effect yet, so the mode could do
> whatever it wants at that time.
>
> Of course, this won't help us to solve the issues discussed here,
> because scanning the entire buffer at any time is slow and
> non-scalable.
>
I think you answered your own question - the code (I believe we're
discussing jit-lock on arbitrarily long lines in particular) doing
the narrowing would have to identify the starting and ending points
via some special variable prior to running the hook, so the thunks
would be able to determine the right state *before* narrowing is
actually done.
> > Also, as I'm not a big user of explicit narrowing, the only place I've noticed it happening is in info mode, where
> > the focus is narrowed to a particular syntactic unit.
> > Is there a way for a major mode to let the user signal the syntactic unit that they believe they are narrowing
> > to, either with command variants or an interrogative (with a list of options supplied by the mode) when
> > narrowing is performed by the user interactively? With the fall-back of either having the mode determine the
> > correct initial state or turning off fontification during the narrowing?
>
> We are not talking about user narrowing here.
Maybe not explicit user narrowing, but I don't think we're discussing
a piece of code that is just randomly jumping around a buffer,
narrowing and then requiring fontification without user interaction.
So, one way of handling it would be to have the code doing the
narrowing take the place of the user in the above scenario in some
programmatic way, perhaps using some prespecified (by the mode)
regular expressions to make a best guess at what the user response
would be based solely on local conditions. A mode that doesn't
specify a way to make a selection could be defaulted to turn off
fontification while narrowing.
An approach that doesn't require cooperation from the mode would be to
* open a new buffer in the same mode
* insert the characters from the narrowed region linearly in electric
mode (or something that would create newlines in "appropriate" places
for the user's preferred style),
* indent and fontify that buffer in some way so the indentation of
the first line is consistent with the indentation of the last line
(e.g. if the narrowed region is a series of '}' in CC mode, the first
"}" should be indented to reflect the level implied by the number of
"}"s therein - if necessary by inserting leading matching delimiters
in the reverse order)
* copy the fontification back to the region in the original buffer
Or to do something equivalent to that without actually opening the
additional buffer. This method would have the advantage of being
entirely local.
Of course, as a user, I would like to have a command that lets me take
a point that I *know* the correct syntactic classification of, and
specify that from a menu, then have the mode fontify with the
constraint that my assertion is held constant, at least until there is
a modification at the point I made the assertion.
Lynn
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 9:25 ` Stefan Monnier
2022-08-08 11:16 ` Lynn Winebarger
@ 2022-08-08 11:30 ` Eli Zaretskii
2022-08-08 12:05 ` Stefan Monnier
2022-08-08 21:16 ` Dmitry Gutov
1 sibling, 2 replies; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-08 11:30 UTC (permalink / raw)
To: Stefan Monnier; +Cc: acm, gregory, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: acm@muc.de, gregory@heytings.org, emacs-devel@gnu.org
> Date: Mon, 08 Aug 2022 05:25:21 -0400
>
> > The intent is clearly that fontifications don't look far beyond these
> > two points, because otherwise the whole design of jit-lock and its
> > invocations during redisplay is basically thrown out the window.
>
> Usually, font-lock rules don't look before BOL or after EOL, indeed,
But BOL and EOL could also be very far away, so that, too, needs some
reasonable limit.
> *except* via `syntax-ppss` which does look at all the text from BOB
> to point. To make up for that, `syntax-ppss` relies heavily on caching,
> so that it *usually* doesn't need to look very far at all (and if
> there's no `syntax-propertize-function`, it's usually quite fast
> because it's fully coded in C).
>
> For GB-sized buffers, even the fast C code of `syntax-ppss` incurs
> a significant delay in the "unusual" case, so have various options:
I reported this for a 18MB single-line file, which is way below the GB
bar.
The problem is that the initial full scan can take "forever" in those
files, and that basically means we cannot edit such files in practice.
So if you dislike the current solution of locked narrowing, how about
making syntax-ppss work in chunks (perhaps from an idle timer?), after
initially scanning only the first small portion of the file. The goal
is to have the file displayed quickly enough, and thereafter complete
the scan when possible.
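
A rough sketch of the chunked scan, assuming it is enough to call
`syntax-ppss' at increasing positions so that its own cache fills up.
The `my/' names and the chunk size are made up for illustration, and a
real implementation would also have to reset the position after edits.

```elisp
;;; -*- lexical-binding: t -*-
(defvar-local my/ppss-warm-pos 1
  "Position up to which the `syntax-ppss' cache has been warmed.")

(defvar my/ppss-chunk-size 100000
  "Characters to scan per idle-timer firing (arbitrary value).")

(defun my/ppss-warm-step (buffer)
  "Scan one more chunk of BUFFER; return non-nil if more remains."
  (when (buffer-live-p buffer)
    (with-current-buffer buffer
      (save-excursion
        (save-restriction
          (widen)
          (let ((target (min (point-max)
                             (+ my/ppss-warm-pos my/ppss-chunk-size))))
            (syntax-ppss target)        ; fills the cache up to TARGET
            (setq my/ppss-warm-pos target)
            (< target (point-max))))))))

(defun my/ppss-warm-start (&optional buffer)
  "Warm BUFFER's `syntax-ppss' cache, one chunk per idle period."
  (let ((buffer (or buffer (current-buffer)))
        timer)
    (setq timer
          (run-with-idle-timer
           0.5 t
           (lambda ()
             ;; Stop once the whole buffer has been scanned.
             (unless (my/ppss-warm-step buffer)
               (cancel-timer timer)))))
    timer))
```

Note that a repeating idle timer fires once per idle period, so this
processes one chunk each time Emacs goes idle rather than scanning
continuously.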
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 11:30 ` Eli Zaretskii
@ 2022-08-08 12:05 ` Stefan Monnier
2022-08-08 12:40 ` Eli Zaretskii
2022-08-08 21:16 ` Dmitry Gutov
1 sibling, 1 reply; 136+ messages in thread
From: Stefan Monnier @ 2022-08-08 12:05 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: acm, gregory, emacs-devel
>> > The intent is clearly that fontifications don't look far beyond these
>> > two points, because otherwise the whole design of jit-lock and its
>> > invocations during redisplay is basically thrown out the window.
>> Usually, font-lock rules don't look before BOL or after EOL, indeed,
> But BOL and EOL could also be very far away, so that, too, needs some
> reasonable limit.
Indeed. But in general this is easier to address, because the use of
BOL/EOL is itself a heuristic, so we can impose some other limit (as is
done now with the forced narrowing or with `syntax-wholeline-max`) and
the harm is usually tolerable, IME.
>> *except* via `syntax-ppss` which does look at all the text from BOB
>> to point. To make up for that, `syntax-ppss` relies heavily on caching,
>> so that it *usually* doesn't need to look very far at all (and if
>> there's no `syntax-propertize-function`, it's usually quite fast
>> because it's fully coded in C).
>>
>> For GB-sized buffers, even the fast C code of `syntax-ppss` incurs
>> a significant delay in the "unusual" case, so have various options:
>
> I reported this for a 18MB single-line file, which is way below the GB
> bar.
On my own builds (which are slow on old machines), 18MB typically
results in a delay of just a few seconds. Given that it's occasional,
I find it very tolerable (similar delays can occur for other reasons,
such as swapping, or a GC when the heap is large).
But this scales linearly, so sooner or later on the way to multi-GB
files the delay becomes intolerable.
> The problem is that the initial full scan can take "forever" in those
> files, and that basically means we cannot edit such files in practice.
> So if you dislike the current solution of locked narrowing, how about
> making syntax-ppss work in chunks (perhaps from an idle timer?), after
> initially scanning only the first small portion of the file. The goal
> is to have the file displayed quickly enough, and thereafter complete
> the scan when possible.
Yes, that's probably the better long term solution.
Stefan
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 12:05 ` Stefan Monnier
@ 2022-08-08 12:40 ` Eli Zaretskii
2022-08-08 17:22 ` Stefan Monnier
0 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-08 12:40 UTC (permalink / raw)
To: Stefan Monnier; +Cc: acm, gregory, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: acm@muc.de, gregory@heytings.org, emacs-devel@gnu.org
> Date: Mon, 08 Aug 2022 08:05:31 -0400
>
> >> For GB-sized buffers, even the fast C code of `syntax-ppss` incurs
> >> a significant delay in the "unusual" case, so have various options:
> >
> > I reported this for a 18MB single-line file, which is way below the GB
> > bar.
>
> On my own builds (which are slow on old machines), 18MB typically
> results in a delay of just a few seconds.
Optimized build or unoptimized?
And is that before or after the syntax-wholeline-max limitation?
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 12:40 ` Eli Zaretskii
@ 2022-08-08 17:22 ` Stefan Monnier
2022-08-08 17:34 ` Eli Zaretskii
0 siblings, 1 reply; 136+ messages in thread
From: Stefan Monnier @ 2022-08-08 17:22 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: acm, gregory, emacs-devel
>> On my own builds (which are slow on old machines), 18MB typically
>> results in a delay of just a few seconds.
> Optimized build or unoptimized?
Unoptimized.
> And is that before or after the syntax-wholeline-max limitation?
Before. I'm basically talking about the time to do `parse-partial-sexp`
from BOB to EOB.
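
For anyone wanting to compare numbers, the measurement can be sketched
as below. The function name is made up; `benchmark-run' and
`parse-partial-sexp' are the standard ones.

```elisp
;;; -*- lexical-binding: t -*-
(require 'benchmark)

(defun my/time-full-parse ()
  "Return the seconds `parse-partial-sexp' takes from BOB to EOB.
The figure includes any time spent in garbage collection."
  (save-excursion
    (save-restriction
      (widen)
      ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
      (car (benchmark-run 1
             (parse-partial-sexp (point-min) (point-max)))))))
```

Evaluating (my/time-full-parse) with M-: in the buffer in question
gives a figure that can be compared across builds and machines.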
Stefan
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 17:22 ` Stefan Monnier
@ 2022-08-08 17:34 ` Eli Zaretskii
0 siblings, 0 replies; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-08 17:34 UTC (permalink / raw)
To: Stefan Monnier; +Cc: acm, gregory, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: acm@muc.de, gregory@heytings.org, emacs-devel@gnu.org
> Date: Mon, 08 Aug 2022 13:22:03 -0400
>
> >> On my own builds (which are slow on old machines), 18MB typically
> >> results in a delay of just a few seconds.
> > Optimized build or unoptimized?
>
> Unoptimized.
>
> > And is that before or after the syntax-wholeline-max limitation?
>
> Before. I'm basically talking about the time to do `parse-partial-sexp`
> from BOB to EOB.
Then your old machine is much faster than my new one.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 11:30 ` Eli Zaretskii
2022-08-08 12:05 ` Stefan Monnier
@ 2022-08-08 21:16 ` Dmitry Gutov
2022-08-09 11:30 ` Eli Zaretskii
1 sibling, 1 reply; 136+ messages in thread
From: Dmitry Gutov @ 2022-08-08 21:16 UTC (permalink / raw)
To: Eli Zaretskii, Stefan Monnier; +Cc: acm, gregory, emacs-devel
On 08.08.2022 14:30, Eli Zaretskii wrote:
> So if you dislike the current solution of locked narrowing, how about
> making syntax-ppss work in chunks (perhaps from an idle timer?), after
> initially scanning only the first small portion of the file. The goal
> is to have the file displayed quickly enough, and thereafter complete
> the scan when possible.
The file is already displayed "quickly enough". The problem arrives when
you try to navigate far from BOB (e.g. to EOB).
What's going to happen then, if the timer hasn't fired yet? And for the
timer's work to be useful, it has to have happened between the last edit
and the subsequent navigation. A lot of idle timers like that = a lot of
discarded work.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 21:16 ` Dmitry Gutov
@ 2022-08-09 11:30 ` Eli Zaretskii
2022-08-09 14:38 ` Dmitry Gutov
0 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-09 11:30 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: monnier, acm, gregory, emacs-devel
> Date: Tue, 9 Aug 2022 00:16:23 +0300
> Cc: acm@muc.de, gregory@heytings.org, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> On 08.08.2022 14:30, Eli Zaretskii wrote:
> > So if you dislike the current solution of locked narrowing, how about
> > making syntax-ppss work in chunks (perhaps from an idle timer?), after
> > initially scanning only the first small portion of the file. The goal
> > is to have the file displayed quickly enough, and thereafter complete
> > the scan when possible.
>
> The file is already displayed "quickly enough".
What do you mean by "quickly enough"? With this recipe:
emacs -Q
M-: (setq long-line-threshold nil) RET
M-: (setq syntax-wholeline-max most-positive-fixnum) RET
visiting dictionary.json, a 19MB single-line file, takes "forever" (I
killed it after 20 minutes) before it shows anything in the window.
And since both variables use "arbitrary restrictions", and both can
cause inaccurate/incorrect/wrong/buggy/<your euphemism here>
fontifications, my proposal above was to do something smarter.
> What's going to happen then, if the timer hasn't fired yet?
We should process a relatively small portion of the buffer around the
new position of point.
Not surprisingly, this is precisely how jit-lock is supposed to work,
if only the stuff called through fontification-functions obeyed the
region which it was told to process.
> And for the timer's work to be useful, it has to have happened
> between the last edit and the subsequent navigation. A lot of idle
> timers like that = a lot of discarded work.
Not if the user will subsequently visit the place where the "discarded
work" was invested.
Full disclosure: I'm a long-time and very happy user of
jit-lock stealth fontifications.
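
For reference, stealth fontification is controlled by a few jit-lock
user options. The values below are only illustrative and are not
settings anyone in this thread reported using.

```elisp
;; Illustrative values only.
(setq jit-lock-stealth-time 16)   ; start after 16 s of idle; nil disables
(setq jit-lock-stealth-nice 0.5)  ; pause between chunks, in seconds
(setq jit-lock-stealth-load 200)  ; suspend above this load, in percent
```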
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-09 11:30 ` Eli Zaretskii
@ 2022-08-09 14:38 ` Dmitry Gutov
2022-08-09 16:12 ` Eli Zaretskii
0 siblings, 1 reply; 136+ messages in thread
From: Dmitry Gutov @ 2022-08-09 14:38 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: monnier, acm, gregory, emacs-devel
On 09.08.2022 14:30, Eli Zaretskii wrote:
>> Date: Tue, 9 Aug 2022 00:16:23 +0300
>> Cc: acm@muc.de, gregory@heytings.org, emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>> On 08.08.2022 14:30, Eli Zaretskii wrote:
>>> So if you dislike the current solution of locked narrowing, how about
>>> making syntax-ppss work in chunks (perhaps from an idle timer?), after
>>> initially scanning only the first small portion of the file. The goal
>>> is to have the file displayed quickly enough, and thereafter complete
>>> the scan when possible.
>>
>> The file is already displayed "quickly enough".
>
> What do you mean by "quickly enough"? With this recipe:
>
> emacs -Q
> M-: (setq long-line-threshold nil) RET
> M-: (setq syntax-wholeline-max most-positive-fixnum) RET
>
> visiting dictionary.json, a 19MB single-line file, takes "forever" (I
> killed it after 20 minutes) before it shows anything in the window.
> And since both variables use "arbitrary restrictions", and both can
> cause inaccurate/incorrect/wrong/buggy/<your euphemism here>
> fontifications, my proposal above was to do something smarter.
I never recommended you to change any of those vars.
Doing that brings back pathological slowdowns that don't have anything
to do with the speed of parse-partial-sexp.
>> What's going to happen then, if the timer hasn't fired yet?
>
> We should process a relatively small portion of the buffer around the
> new position of point.
To speed up the jump to a far distant part of the buffer after doing an
edit "here", the timer would have to parse the whole buffer between here
and there. Or most of it.
> Not surprisingly, this is precisely how jit-lock is supposed to work,
> if only the stuff called through fontification-functions obeyed the
> region which it was told to process.
If you are concerned with the speed of font-lock itself (and not with the
speed of syntax-ppss cache maintenance which we've talked about before),
and in your case it might be justified, given the unoptimized build,
then using something like stealth fontifications could indeed speed up
C-v/M-v. Not M->, though.
>> And for the timer's work to be useful, it has to have happened
>> between the last edit and the subsequent navigation. A lot of idle
>> timers like that = a lot of discarded work.
>
> Not if the user will subsequently visit the place where the "discarded
> work" was invested.
>
> Full disclosure: I'm a long-time and very happy user of
> jit-lock stealth fontifications.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-09 14:38 ` Dmitry Gutov
@ 2022-08-09 16:12 ` Eli Zaretskii
2022-08-09 16:52 ` Dmitry Gutov
0 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-09 16:12 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: monnier, acm, gregory, emacs-devel
> Date: Tue, 9 Aug 2022 17:38:22 +0300
> Cc: monnier@iro.umontreal.ca, acm@muc.de, gregory@heytings.org,
> emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> > emacs -Q
> > M-: (setq long-line-threshold nil) RET
> > M-: (setq syntax-wholeline-max most-positive-fixnum) RET
> >
> > visiting dictionary.json, a 19MB single-line file, takes "forever" (I
> > killed it after 20 minutes) before it shows anything in the window.
> > And since both variables use "arbitrary restrictions", and both can
> > cause inaccurate/incorrect/wrong/buggy/<your euphemism here>
> > fontifications, my proposal above was to do something smarter.
>
> I never recommended you to change any of those vars.
Then I don't really understand what is it that you are arguing about.
My proposal to Stefan was to make syntax-ppss and friends less of a
burden _instead_ of the currently implemented "arbitrary restrictions"
that he doesn't like. You seemed to have contradicted my proposal by
saying that the file is already displayed quickly enough, but that
only happens _with_ those "arbitrary restrictions".
So what is your point here?
> >> What's going to happen then, if the timer hasn't fired yet?
> >
> > We should process a relatively small portion of the buffer around the
> > new position of point.
>
> To speed up the jump to a far distant part of the buffer after doing an
> edit "here", the timer would have to parse the whole buffer between here
> and there. Or most of it.
I didn't say it should be done from a timer in this case. And it
shouldn't.
> > Not surprisingly, this is precisely how jit-lock is supposed to work,
> > if only the stuff called through fontification-functions obeyed the
> > region which it was told to process.
>
> If you are concerned with the speed of font-lock itself (and not with the
> speed of syntax-ppss cache maintenance which we've talked about before),
I'm concerned with both, because font-lock typically calls syntax-ppss
in many modes.
> and in your case it might be justified, given the unoptimized build,
> then using something like stealth fontifications could indeed speed up
> C-v/M-v. Not M->, though.
I'm a happy user of stealth fontification in my production sessions,
where I run fully optimized builds.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-09 16:12 ` Eli Zaretskii
@ 2022-08-09 16:52 ` Dmitry Gutov
2022-08-09 17:05 ` Eli Zaretskii
0 siblings, 1 reply; 136+ messages in thread
From: Dmitry Gutov @ 2022-08-09 16:52 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: monnier, acm, gregory, emacs-devel
On 09.08.2022 19:12, Eli Zaretskii wrote:
>> I never recommended you to change any of those vars.
>
> Then I don't really understand what is it that you are arguing about.
>
> My proposal to Stefan was to make syntax-ppss and friends less of a
> burden _instead_ of the currently implemented "arbitrary restrictions"
> that he doesn't like. You seemed to have contradicted my proposal by
> saying that the file is already displayed quickly enough, but that
> only happens _with_ those "arbitrary restrictions".
No, it doesn't.
You might recall the patch I suggested recently that doesn't change
either of those vars but disables narrowing in handle_fontified_prop.
BTW, you can try js-json-mode in the latest master, I have fixed another
source of slow font-locking there (coming from js-mode).
Just remove the expression that starts with 'if
(current_buffer->long_line_optimizations_p)' from handle_fontified_prop,
recompile, and visit dictionary.json.
>>> Not surprisingly, this is precisely how jit-lock is supposed to work,
>>> if only the stuff called through fontification-functions obeyed the
>>> region which it was told to process.
>>
>> If you are concerned with the speed of font-lock itself (and not with the
>> speed of syntax-ppss cache maintenance which we've talked about before),
>
> I'm concerned with both, because font-lock typically calls syntax-ppss
> in many modes.
"Stealth" syntax-ppss, to have any visible impact, is likely to have the
problem I described: lots of work, the results of which are regularly
discarded. Meaning, lots of wasted CPU energy.
What might work better instead (and would benefit specifically the
scenario with a lot of jumping around and editing in different parts of
a large file) is to try to avoid dumping the whole spss cache when the
user edits near BOB, and instead record the fact of such edits, but
later try to "revalidate" the cache entries by calling
parse-partial-sexp on the interval where the edits occurred in the
meantime, and keep them if the result shows that the edits should have
no effect on the later values. That's something tree-sitter does, AFAIU,
but for a much more complex parse tree.
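
A small sketch of the revalidation check, under the assumption that
comparing the position-independent parts of the parser state (depth,
string and comment status) is a good enough test. The `my/' names are
hypothetical, and this is not how syntax-ppss currently manages its
cache.

```elisp
;;; -*- lexical-binding: t -*-
(defun my/ppss-shape (state)
  "Return the position-independent part of parser STATE."
  (list (nth 0 state)    ; paren depth
        (nth 3 state)    ; non-nil inside a string
        (nth 4 state)    ; non-nil inside a comment
        (nth 7 state)))  ; comment style

(defun my/edit-preserves-shape-p (from to old-shape &optional from-state)
  "Non-nil if parsing FROM..TO still ends in OLD-SHAPE.
FROM-STATE, if given, is a trusted parser state at FROM."
  (save-excursion
    (equal old-shape
           (my/ppss-shape
            (parse-partial-sexp from to nil nil from-state)))))

;; Example: record the shape just before the inner paren, insert one
;; ordinary character earlier in the buffer, then re-check at the
;; shifted position.
(with-temp-buffer
  (insert "(foo (bar) baz)")
  (let ((shape (my/ppss-shape (parse-partial-sexp (point-min) 6))))
    (goto-char 2)
    (insert "x")
    (my/edit-preserves-shape-p (point-min) 7 shape)))
```

When the check succeeds, cache entries after the edit would only need
their recorded positions shifted by the size of the edit instead of
being thrown away.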
Anyway, that approach would require some work and subsequent testing,
and it would improve performance for a particular class of operations.
It's not a given that the performance issues you see in CC Mode fit that
profile.
And I hope that somebody could look into improving that 10x difference
between your and my performance of parse-partial-sexp first, so then
we could see where the remaining bottlenecks are.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-09 16:52 ` Dmitry Gutov
@ 2022-08-09 17:05 ` Eli Zaretskii
2022-08-09 18:52 ` Dmitry Gutov
0 siblings, 1 reply; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-09 17:05 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: monnier, acm, gregory, emacs-devel
> Date: Tue, 9 Aug 2022 19:52:46 +0300
> Cc: monnier@iro.umontreal.ca, acm@muc.de, gregory@heytings.org,
> emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> > Then I don't really understand what is it that you are arguing about.
> >
> > My proposal to Stefan was to make syntax-ppss and friends less of a
> > burden _instead_ of the currently implemented "arbitrary restrictions"
> > that he doesn't like. You seemed to have contradicted my proposal by
> > saying that the file is already displayed quickly enough, but that
> > only happens _with_ those "arbitrary restrictions".
>
> No, it doesn't.
>
> You might recall the patch I suggested recently that doesn't change
> either of those vars but disables narrowing in handle_fontified_prop.
Why is that of importance? More importantly, how is that proposal
related to what I was discussing with Stefan?
> BTW, you can try js-json-mode in the latest master, I have fixed another
> source of slow font-locking there (coming from js-mode).
I already did. This time I got impatient more quickly, and killed
the session only after 5 minutes that it was unable to show me
dictionary.json (after disabling the narrowing).
> Just remove the expression that starts with 'if
> (current_buffer->long_line_optimizations_p)' from handle_fontified_prop,
> recompile, and visit dictionary.json.
Sorry, I cannot afford trying half-baked solutions. I asked you to
push a feature branch or an optional feature on master precisely so
that I won't need to hack my development branch. When such a feature
is available, I'll surely test it in a variety of scenarios.
> >> If you are concerned with the speed of font-lock itself (and not with the
> >> speed of syntax-ppss cache maintenance which we've talked about before),
> >
> > I'm concerned with both, because font-lock typically calls syntax-ppss
> > in many modes.
>
> "Stealth" syntax-ppss, to have any visible impact, is likely to have the
> problem I described: lots of work, the results of which are regularly
> discarded. Meaning, lots of wasted CPU energy.
Well, my many years of using jit-lock-stealth clearly prove otherwise.
By the time I get to revisit the buffers after some break, they are
already fully fontified.
> What might work better instead (and would benefit specifically the
> scenario with a lot of jumping around and editing in different parts of
> a large file) is to try to avoid dumping the whole spss cache when the
> user edits near BOB, and instead record the fact of such edits, but
> later try to "revalidate" the cache entries by calling
> parse-partial-sexp on the interval where the edits occurred in the
> meantime, and keep them if the result shows that the edits should have
> no effect on the later values. That's something tree-sitter does, AFAIU,
> but for a much more complex parse tree.
>
> Anyway, that approach would require some work and subsequent testing,
> and it would improve performance for a particular class of operations.
> It's not a given that the performance issues you see in CC Mode fit that
> profile.
Well, I hope someone will actually try to make that happen.
> And I hope that somebody could look into improving that 10x difference
> between your and my performance of parse-partial-sexp first, so then
> we could see where the remaining bottlenecks are.
That too.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-09 17:05 ` Eli Zaretskii
@ 2022-08-09 18:52 ` Dmitry Gutov
2022-08-09 19:46 ` Gregory Heytings
0 siblings, 1 reply; 136+ messages in thread
From: Dmitry Gutov @ 2022-08-09 18:52 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: monnier, acm, gregory, emacs-devel
On 09.08.2022 20:05, Eli Zaretskii wrote:
>> Date: Tue, 9 Aug 2022 19:52:46 +0300
>> Cc: monnier@iro.umontreal.ca, acm@muc.de, gregory@heytings.org,
>> emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>>> Then I don't really understand what is it that you are arguing about.
>>>
>>> My proposal to Stefan was to make syntax-ppss and friends less of a
>>> burden _instead_ of the currently implemented "arbitrary restrictions"
>>> that he doesn't like. You seemed to have contradicted my proposal by
>>> saying that the file is already displayed quickly enough, but that
>>> only happens _with_ those "arbitrary restrictions".
>>
>> No, it doesn't.
>>
>> You might recall the patch I suggested recently that doesn't change
>> either of those vars but disables narrowing in handle_fontified_prop.
>
> Why is that of importance? More importantly, how is that proposal
> related to what I was discussing with Stefan?
We've been talking about "unconstrained" font-lock and the performance
problems it causes or can cause. The patch is the way to actually try it
without bringing back unrelated performance problems.
>> BTW, you can try js-json-mode in the latest master, I have fixed another
>> source of slow font-locking there (coming from js-mode).
>
> I already did. This time I got impatient more quickly, and killed
> the session only after 5 minutes that it was unable to show me
> dictionary.json (after disabling the narrowing).
Am I correct to assume that you tried it while setting
long-line-threshold to nil?
That kind of experiment tells me nothing, as explained before.
>> Just remove the expression that starts with 'if
>> (current_buffer->long_line_optimizations_p)' from handle_fontified_prop,
>> recompile, and visit dictionary.json.
>
> Sorry, I cannot afford trying half-baked solutions. I asked you to
> push a feature branch or an optional feature on master precisely so
> that I won't need to hack my development branch. When such a feature
> is available, I'll surely test it in a variety of scenarios.
It's a tiny patch. I can push it to a branch, but it will still be a
tiny patch that just removes 14 lines.
I'm also working on a bigger change that will push the
narrowing/limiting mechanics down to font-lock, but I'm yet to find the
best place to put that logic.
And the problem with working on a feature like that is that it will be
fixing performance problems I don't really have. And, as such, cannot
evaluate different tradeoffs. And neither you nor Gregory want to give
me feedback by actually trying that tiny patch.
>>>> If you are concerned with the speed of font-lock itself (and not with the
>>>> speed of syntax-ppss cache maintenance which we've talked about before),
>>>
>>> I'm concerned with both, because font-lock typically calls syntax-ppss
>>> in many modes.
>>
>> "Stealth" syntax-ppss, to have any visible impact, is likely to have the
>> problem I described: lots of work, the results of which are regularly
>> discarded. Meaning, lots of wasted CPU energy.
>
> Well, my many years of using jit-lock-stealth clearly prove otherwise.
> By the time I get to revisit the buffers after some break, they are
> already fully fontified.
Then you are not in the same usage scenario that we described before
(lots of jumping around *and* lots of editing). Because said edits
invalidate the syntax highlighting, forcing Emacs to do it all over again.
>> What might work better instead (and would benefit specifically the
>> scenario with a lot of jumping around and editing in different parts of
>> a large file) is to try to avoid dumping the whole spss cache when the
>> user edits near BOB, and instead record the fact of such edits, but
>> later try to "revalidate" the cache entries by calling
>> parse-partial-sexp on the interval where the edits occurred in the
>> meantime, and keep them if the result shows that the edits should have
>> no effect on the later values. That's something tree-sitter does, AFAIU,
>> but for a much more complex parse tree.
>>
>> Anyway, that approach would require some work and subsequent testing,
>> and it would improve performance for a particular class of operations.
>> It's not a given that the performance issues you see in CC Mode fit that
>> profile.
>
> Well, I hope someone will actually try to make that happen.
Definitely. But so far I'm not convinced that your impression of CC
Mode's "sluggishness" comes from syntax-ppss at all. Or, at least, most
of it.
And not from CC Mode's approach to maintaining syntax information
eagerly, through buffer change hooks (after-change-functions, etc).
>> And I hope that somebody could look into improving that 10x difference
>> between your and my performance of parse-partial-sexp first, so then
>> we could see where the remaining bottlenecks are.
>
> That too.
Yep.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-09 18:52 ` Dmitry Gutov
@ 2022-08-09 19:46 ` Gregory Heytings
0 siblings, 0 replies; 136+ messages in thread
From: Gregory Heytings @ 2022-08-09 19:46 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: Eli Zaretskii, monnier, acm, emacs-devel
>
> I'm also working on a bigger change that will push the
> narrowing/limiting mechanics down to font-lock, but I'm yet to find the
> best place to put that logic.
>
You may have seen that this has already been tried, and found
insufficient.
>
> And the problem with working on a feature like that is that it will be
> fixing performance problems I don't really have. And, as such, cannot
> evaluate different tradeoffs. And neither you nor Gregory want to give
> me feedback by actually trying that tiny patch.
>
Sorry, I cannot reply to all posts in a few hours.
I hope you realize how... local that patch is. Sure, it makes the 18 MB
json file example (and only that example) slightly better. But on my
machine I still see Emacs stuttering when leaning on C-v, I still see
delays when searching through the file with C-s, and so forth. And of
course with larger files the delays become more and more significant.
With a 300 MB json file I have to wait about 30 s after pressing M->.
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 23:21 ` Stefan Monnier
2022-08-08 2:29 ` Eli Zaretskii
@ 2022-08-08 10:38 ` Alan Mackenzie
2022-08-08 11:49 ` Eli Zaretskii
2022-08-08 10:41 ` Gregory Heytings
2 siblings, 1 reply; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-08 10:38 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Gregory Heytings, emacs-devel
Hello, Stefan.
On Sun, Aug 07, 2022 at 19:21:32 -0400, Stefan Monnier wrote:
> > Where, exactly are the terms of this supposed contract formulated?
> I'm not sure it's written anywhere.
> More specifically, for `jit-lock-functions`, the contract is not
> very constraining.
> For font-lock the contract is not still very explicit but is more
> constraining in that we expect major modes not to look before point-min
> or after point-max. For that reason font-lock normally widens the
> buffer before it does anything else (unless `font-lock-dont-widen` is
> set).
> > And which part of this supposed contract has CC Mode broken?
> It calls `widen` within its font-lock code.
I can't remember exactly why CC Mode widens here (though I could surely
find it in my notes), but it should be largely harmless in normal
buffers.
> Eli Zaretskii [2022-08-07 17:20:52] wrote:
> > jit-lock calls the functions with two arguments, BEG and END, and
> > expects them to work only on that chunk of text.
> That is not the case: it expects the function to "fontify" *at least*
> from BEG to END, but is quite happy to let it fontify more (and the
> function can return a value indicating which portion was actually
> returned in that case). Furthermore, it's clear that fontification of
> BEG..END may need to look at text before BEG (and occasionally beyond
> END as well).
It's also worth pointing out that _looking_ at text, whatever that means
exactly, is an order of magnitude faster than _fontifying_ that text.
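The jit-lock contract described in the quoted text (fontify at least
BEG..END, possibly more, and report what was really covered) can be
sketched roughly as follows. This is only an illustration: the function
name `my-demo-fontify-region' and the choice of face are hypothetical,
and the "fontification" is a placeholder. The `jit-lock-bounds' return
form is the one documented for `jit-lock-functions'.

```elisp
(require 'jit-lock)

(defun my-demo-fontify-region (beg end)
  "Hypothetical jit-lock function: fontify at least BEG..END.
The region is extended to whole lines, so text before BEG and
after END may be examined and fontified as well."
  (save-excursion
    (let ((real-beg (progn (goto-char beg) (line-beginning-position)))
          (real-end (progn (goto-char end) (line-end-position))))
      ;; Placeholder for the real fontification work of a major mode.
      (with-silent-modifications
        (put-text-property real-beg real-end
                           'face 'font-lock-comment-face))
      ;; Report the span that was actually fontified, which may be
      ;; larger than the BEG..END that jit-lock asked for.
      `(jit-lock-bounds ,real-beg . ,real-end))))
```

A mode would typically install such a function with
(jit-lock-register #'my-demo-fontify-region). Note that the function
does not call `widen' itself.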
> Stefan
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-08 10:38 ` Alan Mackenzie
@ 2022-08-08 11:49 ` Eli Zaretskii
0 siblings, 0 replies; 136+ messages in thread
From: Eli Zaretskii @ 2022-08-08 11:49 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: monnier, gregory, emacs-devel
> Date: Mon, 8 Aug 2022 10:38:36 +0000
> Cc: Gregory Heytings <gregory@heytings.org>, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> It's also worth pointing out that _looking_ at text, whatever that means
> exactly, is an order of magnitude faster than _fontifying_ that text.
I'm quite sure this is not why major modes (and CC Mode in particular)
widen. They must do something entirely non-trivial at BOB.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 23:21 ` Stefan Monnier
2022-08-08 2:29 ` Eli Zaretskii
2022-08-08 10:38 ` Alan Mackenzie
@ 2022-08-08 10:41 ` Gregory Heytings
2 siblings, 0 replies; 136+ messages in thread
From: Gregory Heytings @ 2022-08-08 10:41 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Alan Mackenzie, emacs-devel
>> jit-lock calls the functions with two arguments, BEG and END, and
>> expects them to work only on that chunk of text.
>
> That is not the case: it expects the function to "fontify" *at least*
> from BEG to END, but is quite happy to let it fontify more (and the
> function can return a value indicating which portion was actually
> returned in that case).
>
And how much is "more"? There's a reason it's called "jit-lock" and not
"aot-lock", isn't it?
The docstring of jit-lock-mode is quite clear about this: it's a
"demand-driven buffer fontification", "triggered by Emacs C code", with
which "fontification occurs when necessary" when motion commands "would
otherwise reveal unfontified areas". Likewise "the START and END of the
region to fontify" in the docstring of jit-lock-functions give the bounds
within which fontification is supposed to happen. Exceeding these bounds
a bit, say by a few hundred characters, is okay; considering that they are
mere hints and that the whole buffer can potentially be modified isn't.
>
> Furthermore, it's clear that fontification of BEG..END may need to look
> at text before BEG (and occasionally beyond END as well).
>
Yes, and that's one of the reasons why the locked narrowed region is in fact
quite large. For example, with emacs -Q, the size (width x height) of the
window is 2880 characters, and the locked narrowed region is 16800
characters, roughly equally distributed before and after point. That's
(inside a long line) about three screenfuls before and three screenfuls
after point.
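The "about three screenfuls" figure follows from the two numbers above.
A small sketch of the arithmetic, where the helper name
`my-demo-screenfuls' is hypothetical and only the two sizes come from
this message:

```elisp
(defun my-demo-screenfuls (region-chars window-chars)
  "Return how many windows of WINDOW-CHARS fit in REGION-CHARS."
  (/ (float region-chars) window-chars))

;; Total size of the locked region in screenfuls, and the share on
;; each side of point if it is split roughly equally.
(let ((total (my-demo-screenfuls 16800 2880)))
  (list total (/ total 2)))
```

The total is a little under six screenfuls, hence a little under three
on each side of point.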
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-06 20:13 Major modes using `widen' is a good, even essential, programming practice Alan Mackenzie
` (2 preceding siblings ...)
2022-08-07 13:31 ` Gregory Heytings
@ 2022-08-07 17:57 ` Dmitry Gutov
2022-08-22 11:26 ` Alan Mackenzie
3 siblings, 1 reply; 136+ messages in thread
From: Dmitry Gutov @ 2022-08-07 17:57 UTC (permalink / raw)
To: Alan Mackenzie, emacs-devel, Gregory Heytings
On 06.08.2022 23:13, Alan Mackenzie wrote:
> Narrowing is primarily a user feature. Users can arbitrarily narrow a
> buffer to ANY contiguous region of text. So when a major mode needs to
> examine text even slightly distant from point, it MUST widen, to be sure
> that the text to be examined is within the visible region.
Now wouldn't it have been nice if user-level narrowing didn't create an
*actual* narrowing but only some visual perception of it? IIRC there is
a third-party package which implements this approach.
From what I've seen of feature requests related to narrowing in my
packages, it's always along the lines of "please add (save-restriction
(widen) ...) around the whole implementation".
Are there actually user-level commands which should not ignore
narrowing? If not, it would be better if user-level narrowing was
implemented as something else (e.g. two invisible overlays). Then all
other code wouldn't have to bother with undoing it.
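For reference, the "(save-restriction (widen) ...)" pattern mentioned
above looks roughly like this in practice. The function name
`my-demo-line-number-in-buffer' is hypothetical; it only illustrates
code that ignores the user's narrowing and then restores it.

```elisp
(defun my-demo-line-number-in-buffer ()
  "Return the line number of point, counted from the real buffer start.
Any narrowing in effect is lifted temporarily and restored on exit."
  (save-restriction
    (widen)
    ;; With the buffer widened, lines are counted from position 1.
    (line-number-at-pos (point))))
```

Inside a narrowed buffer, a plain call to `line-number-at-pos' counts
from the start of the accessible portion, whereas the wrapped version
counts from the start of the whole buffer.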
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-07 17:57 ` Dmitry Gutov
@ 2022-08-22 11:26 ` Alan Mackenzie
2022-08-22 23:59 ` Dmitry Gutov
0 siblings, 1 reply; 136+ messages in thread
From: Alan Mackenzie @ 2022-08-22 11:26 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: emacs-devel, Gregory Heytings
Hello, Dmitry.
A bit late, but ....
On Sun, Aug 07, 2022 at 20:57:59 +0300, Dmitry Gutov wrote:
> On 06.08.2022 23:13, Alan Mackenzie wrote:
> > Narrowing is primarily a user feature. Users can arbitrarily narrow a
> > buffer to ANY contiguous region of text. So when a major mode needs to
> > examine text even slightly distant from point, it MUST widen, to be sure
> > that the text to be examined is within the visible region.
> Now wouldn't it have been nice if user-level narrowing didn't create an
> *actual* narrowing but only some visual perception of it? IIRC there is
> a third-party package which implements this approach.
I'm not convinced, given how well narrowing currently works. I don't
think it's useful to debate how things _would_ have been, when they are
currently very different.
> From what I've seen of feature requests related to narrowing in my
> packages, it's always along the lines of "please add (save-restriction
> (widen) ...) around the whole implementation".
> Are there actually user-level commands which should not ignore
> narrowing?
Yes, lots and lots of them. goto-char, isearch, occur, and many others.
It might be easier to answer the question which user-level commands are
not restricted by narrowing.
> If not, it would be better if user-level narrowing was implemented as
> something else (e.g. two invisible overlays). Then all other code
> wouldn't have to bother with undoing it.
But "all" other code would instead have to take account of the invisible
overlays instead. I don't think this would be better. It would involve
a _lot_ of work to implement and we'd be left with some other
inconveniences instead of the currently perceived ones.
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: Major modes using `widen' is a good, even essential, programming practice.
2022-08-22 11:26 ` Alan Mackenzie
@ 2022-08-22 23:59 ` Dmitry Gutov
0 siblings, 0 replies; 136+ messages in thread
From: Dmitry Gutov @ 2022-08-22 23:59 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel, Gregory Heytings
Hi Alan,
On 22.08.2022 14:26, Alan Mackenzie wrote:
> A bit late, but ....
Not at all.
> On Sun, Aug 07, 2022 at 20:57:59 +0300, Dmitry Gutov wrote:
>> On 06.08.2022 23:13, Alan Mackenzie wrote:
>>> Narrowing is primarily a user feature. Users can arbitrarily narrow a
>>> buffer to ANY contiguous region of text. So when a major mode needs to
>>> examine text even slightly distant from point, it MUST widen, to be sure
>>> that the text to be examined is within the visible region.
>
>> Now wouldn't it have been nice if user-level narrowing didn't create an
>> *actual* narrowing but only some visual perception of it? IIRC there is
>> a third-party package which implements this approach.
>
> I'm not convinced, given how well narrowing currently works. I don't
> think it's useful to debate how things _would_ have been, when they are
> currently very different.
I admit the migration path looks murky.
>> From what I've seen of feature requests related to narrowing in my
>> packages, it's always along the lines of "please add (save-restriction
>> (widen) ...) around the whole implementation".
>
>> Are there actually user-level commands which should not ignore
>> narrowing?
>
> Yes, lots and lots of them. goto-char, isearch, occur, and many others.
> It might be easier to answer the question which user-level commands are
> not restricted by narrowing.
You might want to take a look at the patch I posted in
https://lists.gnu.org/archive/html/emacs-devel/2022-08/msg00644.html
It handles a bunch of commands OOTB (namely, simple navigation and
editing), and Isearch support took about 2 lines of code.
Occur should require attention, one way or the other (it's not obvious
to me that Occur should limit itself to accessible region, but if not,
navigation to inaccessible parts should undo soft-narrowing), but the
implementation should likewise be trivial.
Choosing the desired behavior for the various commands should take
most of the effort.
>> If not, it would be better if user-level narrowing was implemented as
>> something else (e.g. two invisible overlays). Then all other code
>> wouldn't have to bother with undoing it.
>
> But "all" other code would instead have to take account of the invisible
> overlays instead. I don't think this would be better. It would involve
> a _lot_ of work to implement and we'd be left with some other
> inconveniences instead of the currently perceived ones.
Some of the code would be handled automagically. A lot of the code
doesn't want to obey user-level narrowing anyway. And the rest, yes,
would need to be updated.
^ permalink raw reply [flat|nested] 136+ messages in thread