* Re: Gap buffer problem?
@ 2024-12-12 23:40 arthur miller
2024-12-13 3:19 ` Gerd Möllmann
0 siblings, 1 reply; 35+ messages in thread
From: arthur miller @ 2024-12-12 23:40 UTC (permalink / raw)
To: gerd.moellmann@gmail.com; +Cc: emacs-devel@gnu.org
[-- Attachment #1: Type: text/plain, Size: 3456 bytes --]
>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>> Cc: emacs-devel@gnu.org, ofv@wanadoo.es, pipcet@protonmail.com
>>> Date: Wed, 11 Dec 2024 16:51:56 +0100
>>>
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>>
>>> > Unless you have a huge (and I mean a HUGE) buffer, and some Lisp that
>>> > moves point, then inserts a small number of characters, then moves
>>> > point far away and again inserts a small number of characters, etc.,
>>> > I'd be very surprised if the gap buffer caused significant performance
>>> > problems on a modern CPU.
>>> >
>>> > Can you profile that case and post the expanded profile? I'm always
>>> > happy to be wrong about performance bottlenecks, and profiles are good
>>> > at proving me wrong.
>>>
>>> Maybe I'll try to investigate that further at some point. Such things
>>> always tend to be so time consuming...
>>
>> I meant profiling with "M-x profile-start", then run your slow-down
>> recipe. That should be easy and should not consume any significant
>> time. Analyzing the profile could, but producing it shouldn't.
>
>Plus making it reproducible, if it is.
There is a paper by R. Strandh & others from they work on Climacs.
They have, by now a 20 old year Flexichain implements a circular gap
buffer with the explicit goal to workaround that case. At the same time
it also turns gap buffer into a flexible array usable for other use-cases
like qeues and stacks.
However, they have also gone away from the gap buffer, to something they
call "Cluffer", and which is a strategy where they use a double
linked list of lines. The "open" line is a gap buffer. As they say
in the introduction, the similiar strategy was used in Multics Emacs.
The idea is to allow for incremental editing, a lá
tree sitter I guess, and it is also suitable for multiple cursors. They
also say that for modern hardware the additional memory cost of double
linked lists is not prohibitive.
For those interested relevant papers and sources are here:
https://flexichain.common-lisp.dev/download/StrandhVilleneuveMoore.pdf
https://www.european-lisp-workshop.org/archives/2004/slides/Strandh-slides.pdf
https://github.com/robert-strandh/Flexichain
http://metamodular.com/cluffer.pdf
https://hal.science/hal-01887230v1/file/5-incremental-parsing.pdf
https://research.gold.ac.uk/id/eprint/2351/1/climacssyntax.pdf
https://github.com/robert-strandh/Cluffer
https://github.com/scymtym/text.editing <-- a text editor sans the
application based on the Cluffer
https://github.com/robert-strandh/Second-Climacs <-- text editor
application based on the above text.editing and Cluffer
There is also Lem which uses a similar strategy for its text representation
(if I am not misstaken, long time ago I looked at their code):
https://github.com/lem-project/lem
[https://opengraph.githubassets.com/936483c1586b682884e2dfee10ced041f6306d29afd3df15a26830cc9eb47f65/lem-project/lem]<https://github.com/lem-project/lem>
GitHub
- lem-project/lem: Common Lisp editor/IDE with high expansibility<https://github.com/lem-project/lem>
Common
Lisp editor/IDE with high expansibility. Contribute to lem-project/lem development by creating an account on GitHub.
github.com
Don't know if it is of use for you or not, but perhaps there is some
inspiration there. Haven't seen those papers mentioned anywhere
in this discussion, so thought they might be of interest to some of you.
[-- Attachment #2: Type: text/html, Size: 14511 bytes --]
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-12 23:40 Gap buffer problem? arthur miller @ 2024-12-13 3:19 ` Gerd Möllmann 0 siblings, 0 replies; 35+ messages in thread From: Gerd Möllmann @ 2024-12-13 3:19 UTC (permalink / raw) To: arthur miller; +Cc: emacs-devel@gnu.org arthur miller <arthur.miller@live.com> writes: > There is a paper by R. Strandh & others from they work on Climacs. > > They have, by now a 20 old year Flexichain implements a circular gap > buffer with the explicit goal to workaround that case. At the same time > it also turns gap buffer into a flexible array usable for other use-cases > like qeues and stacks. > > However, they have also gone away from the gap buffer, to something they > call "Cluffer", and which is a strategy where they use a double > linked list of lines. The "open" line is a gap buffer. As they say > in the introduction, the similiar strategy was used in Multics Emacs. > The idea is to allow for incremental editing, a lá > tree sitter I guess, and it is also suitable for multiple cursors. They > also say that for modern hardware the additional memory cost of double > linked lists is not prohibitive. > > For those interested relevant papers and sources are here: > > https://flexichain.common-lisp.dev/download/StrandhVilleneuveMoore.pdf > https://www.european-lisp-workshop.org/archives/2004/slides/Strandh-slides.pdf > https://github.com/robert-strandh/Flexichain > > http://metamodular.com/cluffer.pdf > https://hal.science/hal-01887230v1/file/5-incremental-parsing.pdf > https://research.gold.ac.uk/id/eprint/2351/1/climacssyntax.pdf > https://github.com/robert-strandh/Cluffer > > https://github.com/scymtym/text.editing <-- a text editor sans the > application based on the Cluffer > > https://github.com/robert-strandh/Second-Climacs <-- text editor > application based on the above text.editing and Cluffer > > There is also Lem which uses a similar strategy for its text representation > (if I am not misstaken, long time ago I looked at their code): > > https://github.com/lem-project/lem > > * GitHub > - lem-project/lem: Common Lisp editor/IDE with high expansibility > Common > Lisp editor/IDE with high expansibility. Contribute to lem-project/lem development by creating an account on GitHub. > github.com > * > > Don't know if it is of use for you or not, but perhaps there is some > inspiration there. Haven't seen those papers mentioned anywhere > in this discussion, so thought they might be of interest to some of you. Thanks for the links! I find them interesting. ^ permalink raw reply [flat|nested] 35+ messages in thread
[parent not found: <mailman.39.1723910423.12184.emacs-devel@gnu.org>]
* Re: pdumper on Solaris 10 @ 2024-12-08 16:49 ` Eli Zaretskii 2024-12-08 17:37 ` Pip Cet via Emacs development discussions. 0 siblings, 1 reply; 35+ messages in thread From: Eli Zaretskii @ 2024-12-08 16:49 UTC (permalink / raw) To: Pip Cet; +Cc: luangruo, ali_gnu2, emacs-devel > Date: Sun, 08 Dec 2024 16:17:53 +0000 > From: Pip Cet <pipcet@protonmail.com> > Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org > > "Eli Zaretskii" <eliz@gnu.org> writes: > > >> Date: Sun, 08 Dec 2024 13:52:09 +0000 > >> From: Pip Cet <pipcet@protonmail.com> > >> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org > >> > >> "Eli Zaretskii" <eliz@gnu.org> writes: > >> > >> > Which builds except WIDE_EMACS_INT need to use !USE_LSB? > >> > >> The only platforms that "need" to use !USE_LSB are those that cannot > >> guarantee 8-byte alignment for static objects, which is why I asked > >> about those. > > > > That means: none, AFAIK. At least not given the platforms we > > currently support. So it's little wonder that configuration had > > bit-rotten. > > So let's remove it, and switch WIDE_EMACS_INT builds to USE_LSB? That'd be a waste of effort. What we have now works, and works well. I'm not interested in throwing away a lot of hard work which got us to where we are with WIDE_EMACS_INT, for advantages which I'm not sure even exist, let alone are significant. Those bits are unused in the WIDE_EMACS_INT build, so using them is a no-brainer, IMO. > >> In particular, WIDE_EMACS_INT shouldn't imply !USE_LSB. That it > >> currently does is a very questionable optimization at best (fixnum > >> manipulation may be very slightly faster with !USE_LSB, but pointer > >> manipulation will be slower and requires extra registers, which is an > >> issue on i386). > > > > Where can one find i386 these days, except in a museum? > > I meant all x86 systems using the 32-bit instruction set (and, in > particular, its limited exposed register set). Those will be around for > a while. Modern x86 CPUs can handle 64-bit values just fine, thank you. > >> (Of course, WIDE_EMACS_INT is almost always a bad deal, anyway. As far > >> as I can tell, the justification for its continued existence is that > >> some C code assumes buffer positions are fixnums (and, because we expose > >> fixnum-ness to Lisp, some broken Lisp code might do that, too). If we > >> had implemented fixnums to be transparent, we could simply remove > >> WIDE_EMACS_INT, but that mistake has been made...) > > > > I'm a very happy user of WIDE_EMACS_INT, so bad-mouthing it is not > > recommended ;-) > > I don't think you should be happy; WIDE_EMACS_INT is sadly necessary to > support buffers > 512MB on 32-bit systems, but you're wasting 32 bits > for almost every Lisp_Object, and registers as well. Why should I care? It isn't like each wasted bit comes with some monetary fine, does it? > As 32-bit systems go away, it will become harder to write Lisp code that > works correctly in !WIDE_EMACS_INT 32-bit builds, so we may well have to > make WIDE_EMACS_INT the default at some point. If you are trying to convince me to switch to 64-bit development environment, you are wasting your time. I have my very good reasons, and don't plan on doing so any time soon. And 64-but Windows supports 32-bit code perfectly for my needs. > > In fact, one of my strongest reservations about the igc branch is that > > it will most probably force me to lose WIDE_EMACS_INT. > > I believe that problem is exclusively due to the fact that > WIDE_EMACS_INT implies USE_LSB=0. Dropping !USE_LSB should allow us to > use WIDE_EMACS_INT normally in MPS builds, I think. No, there's also a built-in assumption in MPS about the size of a word. > The "low-hanging fruit" performance improvements USE_LSB allows for > (faster stack scanning during GC and many places which don't need to > look at the MSB word at all) are, I think, real, while the way in which > !USE_LSB is superior (we dereference pointer words without having to > untag them first) may reduce code size slightly, but shouldn't really > affect performance. I have no problems with performance that I can report, so I don't expect anyone to waste time and effort on these optimizations. We have enough real problems for the resources we have. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: pdumper on Solaris 10 2024-12-08 16:49 ` pdumper on Solaris 10 Eli Zaretskii @ 2024-12-08 17:37 ` Pip Cet via Emacs development discussions. 2024-12-08 18:41 ` Eli Zaretskii 0 siblings, 1 reply; 35+ messages in thread From: Pip Cet via Emacs development discussions. @ 2024-12-08 17:37 UTC (permalink / raw) To: Eli Zaretskii; +Cc: luangruo, ali_gnu2, emacs-devel "Eli Zaretskii" <eliz@gnu.org> writes: >> So let's remove it, and switch WIDE_EMACS_INT builds to USE_LSB? > > That'd be a waste of effort. It'd be a good investment of effort today, in exchange for making the GC code significantly easier to understand and maintain in the future. It would certainly not be without its benefits, so calling it a "waste of effort" is unfair. > I'm not interested in throwing away a lot of hard work which got us to > where we are with WIDE_EMACS_INT, for advantages which I'm not sure > even exist, let alone are significant. I think maintainability of the GC code is significant. > Those bits are unused in the WIDE_EMACS_INT build, so using them is a > no-brainer, IMO. As are the low-order bits of pointers, which have the advantage of already being present in the 32-bit register rather than needing a second register. >> >> In particular, WIDE_EMACS_INT shouldn't imply !USE_LSB. That it >> >> currently does is a very questionable optimization at best (fixnum >> >> manipulation may be very slightly faster with !USE_LSB, but pointer >> >> manipulation will be slower and requires extra registers, which is an >> >> issue on i386). >> > >> > Where can one find i386 these days, except in a museum? >> >> I meant all x86 systems using the 32-bit instruction set (and, in >> particular, its limited exposed register set). Those will be around for >> a while. > > Modern x86 CPUs can handle 64-bit values just fine, thank you. Modern x86 CPUs running 32-bit code (x86, not x32) still need two register names for each 64-bit value. With 8 GPRs, that's a significant problem. So, no, "just fine" isn't accurate here. >> >> (Of course, WIDE_EMACS_INT is almost always a bad deal, anyway. As far >> >> as I can tell, the justification for its continued existence is that >> >> some C code assumes buffer positions are fixnums (and, because we expose >> >> fixnum-ness to Lisp, some broken Lisp code might do that, too). If we >> >> had implemented fixnums to be transparent, we could simply remove >> >> WIDE_EMACS_INT, but that mistake has been made...) >> > >> > I'm a very happy user of WIDE_EMACS_INT, so bad-mouthing it is not >> > recommended ;-) >> >> I don't think you should be happy; WIDE_EMACS_INT is sadly necessary to >> support buffers > 512MB on 32-bit systems, but you're wasting 32 bits >> for almost every Lisp_Object, and registers as well. > > Why should I care? It isn't like each wasted bit comes with some > monetary fine, does it? I think most users of 32-bit systems at this stage do care about wasting a lot of memory, even if you personally don't. >> As 32-bit systems go away, it will become harder to write Lisp code that >> works correctly in !WIDE_EMACS_INT 32-bit builds, so we may well have to >> make WIDE_EMACS_INT the default at some point. > > If you are trying to convince me to switch to 64-bit development > environment, you are wasting your time. I have my very good reasons, > and don't plan on doing so any time soon. I wasn't, and I'm not sure how you got the impression I was. I meant what I said, that we may have to give up on !WIDE_EMACS_INT 32-bit builds at some point. As you're using WIDE_EMACS_INT already, this wouldn't affect you. >> > In fact, one of my strongest reservations about the igc branch is that >> > it will most probably force me to lose WIDE_EMACS_INT. >> >> I believe that problem is exclusively due to the fact that >> WIDE_EMACS_INT implies USE_LSB=0. Dropping !USE_LSB should allow us to >> use WIDE_EMACS_INT normally in MPS builds, I think. > > No, there's also a built-in assumption in MPS about the size of a > word. That's very vague. If there is an assumption that EMACS_INT == mps_word_t, it would certainly not be built into MPS, which doesn't know about EMACS_INT at all. But as it is, I have no idea where you even suspect this "built-in" assumption is made. >> The "low-hanging fruit" performance improvements USE_LSB allows for >> (faster stack scanning during GC and many places which don't need to >> look at the MSB word at all) are, I think, real, while the way in which >> !USE_LSB is superior (we dereference pointer words without having to >> untag them first) may reduce code size slightly, but shouldn't really >> affect performance. > > I have no problems with performance that I can report, so I don't > expect anyone to waste time and effort on these optimizations. We > have enough real problems for the resources we have. If performance and wasted memory aren't issues, then it's a tradeoff between leaving old code untouched and simplifying it to enable future development. Pip ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: pdumper on Solaris 10 2024-12-08 17:37 ` Pip Cet via Emacs development discussions. @ 2024-12-08 18:41 ` Eli Zaretskii 2024-12-09 4:59 ` Stefan Kangas 0 siblings, 1 reply; 35+ messages in thread From: Eli Zaretskii @ 2024-12-08 18:41 UTC (permalink / raw) To: Pip Cet; +Cc: luangruo, ali_gnu2, emacs-devel > Date: Sun, 08 Dec 2024 17:37:50 +0000 > From: Pip Cet <pipcet@protonmail.com> > Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org > > "Eli Zaretskii" <eliz@gnu.org> writes: > > >> So let's remove it, and switch WIDE_EMACS_INT builds to USE_LSB? > > > > That'd be a waste of effort. > > It'd be a good investment of effort today, in exchange for making the GC > code significantly easier to understand and maintain in the future. It > would certainly not be without its benefits, so calling it a "waste of > effort" is unfair. I disagree. We've lived with this GC code for a long time, and I don't see any complications due to !USE_LSB. And if we are going to switch to igc at some point, investment in GC is even less sensible. I don't see what's unfair in making my position clear. > > I'm not interested in throwing away a lot of hard work which got us to > > where we are with WIDE_EMACS_INT, for advantages which I'm not sure > > even exist, let alone are significant. > > I think maintainability of the GC code is significant. It is, but there are no significant issues there at this time due to !USE_LSB. > > Those bits are unused in the WIDE_EMACS_INT build, so using them is a > > no-brainer, IMO. > > As are the low-order bits of pointers, which have the advantage of > already being present in the 32-bit register rather than needing a > second register. What's your point? The !USE_LSB ode works, the one you suggest needs to be written and debugged. > >> >> In particular, WIDE_EMACS_INT shouldn't imply !USE_LSB. That it > >> >> currently does is a very questionable optimization at best (fixnum > >> >> manipulation may be very slightly faster with !USE_LSB, but pointer > >> >> manipulation will be slower and requires extra registers, which is an > >> >> issue on i386). > >> > > >> > Where can one find i386 these days, except in a museum? > >> > >> I meant all x86 systems using the 32-bit instruction set (and, in > >> particular, its limited exposed register set). Those will be around for > >> a while. > > > > Modern x86 CPUs can handle 64-bit values just fine, thank you. > > Modern x86 CPUs running 32-bit code (x86, not x32) still need two > register names for each 64-bit value. With 8 GPRs, that's a significant > problem. So, no, "just fine" isn't accurate here. I again disagree. And you forget other registers. > >> > In fact, one of my strongest reservations about the igc branch is that > >> > it will most probably force me to lose WIDE_EMACS_INT. > >> > >> I believe that problem is exclusively due to the fact that > >> WIDE_EMACS_INT implies USE_LSB=0. Dropping !USE_LSB should allow us to > >> use WIDE_EMACS_INT normally in MPS builds, I think. > > > > No, there's also a built-in assumption in MPS about the size of a > > word. > > That's very vague. If there is an assumption that EMACS_INT == > mps_word_t, it would certainly not be built into MPS, which doesn't know > about EMACS_INT at all. Not EMACS_INT, Lisp_Object. At least that's what Gerd explained to me back when I asked about WIDE_EMACS_INT in the MPS build. Maybe he can chime in and clarify this. > >> The "low-hanging fruit" performance improvements USE_LSB allows for > >> (faster stack scanning during GC and many places which don't need to > >> look at the MSB word at all) are, I think, real, while the way in which > >> !USE_LSB is superior (we dereference pointer words without having to > >> untag them first) may reduce code size slightly, but shouldn't really > >> affect performance. > > > > I have no problems with performance that I can report, so I don't > > expect anyone to waste time and effort on these optimizations. We > > have enough real problems for the resources we have. > > If performance and wasted memory aren't issues, then it's a tradeoff > between leaving old code untouched and simplifying it to enable future > development. The existing code doesn't preclude nor interfere with future development. So yes, leaving working code untouched is the preference here. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: pdumper on Solaris 10 2024-12-08 18:41 ` Eli Zaretskii @ 2024-12-09 4:59 ` Stefan Kangas 2024-12-09 14:39 ` Eli Zaretskii 0 siblings, 1 reply; 35+ messages in thread From: Stefan Kangas @ 2024-12-09 4:59 UTC (permalink / raw) To: Eli Zaretskii, Pip Cet; +Cc: luangruo, ali_gnu2, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> Date: Sun, 08 Dec 2024 17:37:50 +0000 >> From: Pip Cet <pipcet@protonmail.com> >> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org >> >> "Eli Zaretskii" <eliz@gnu.org> writes: >> >> >> So let's remove it, and switch WIDE_EMACS_INT builds to USE_LSB? >> > >> > That'd be a waste of effort. >> >> It'd be a good investment of effort today, in exchange for making the GC >> code significantly easier to understand and maintain in the future. It >> would certainly not be without its benefits, so calling it a "waste of >> effort" is unfair. > > I disagree. We've lived with this GC code for a long time, and I > don't see any complications due to !USE_LSB. And if we are going to > switch to igc at some point, investment in GC is even less sensible. Assuming that we are 100% sure that mpc will land, then I can agree that making any changes here is basically wasted effort. Unless, of course, the change would also simplify the mpc work (would it?). On the other hand, IIUC, we have some way to go with making the merging of the mpc branch a guarantee. While I'm an enthusiastic supporter of the great work that's being done on the mpc branch, isn't hedging our bets prudent until that work is done? Or am I misunderstanding how close we are to merging the mpc branch? >> If performance and wasted memory aren't issues, then it's a tradeoff >> between leaving old code untouched and simplifying it to enable future >> development. > > The existing code doesn't preclude nor interfere with future > development. So yes, leaving working code untouched is the preference > here. Based on my limited mucking around in the GC, it does interfere somewhat because you do need to understand both configurations, at least on a high level, and once you do you need to mentally filter that stuff out when reading the code. So I think I'd appreciate the simplification, at least. If the only known drawbacks are stability concerns, we could also consider an intermediate step along these lines: Leave the USE_LSB_TAG code as is, but set it to 1 in all configurations on master. See what issues crop up, if any. If anything does come up, ask Pip Cet to fix it (he volunteered, IIUC), and if things are starting to look too hairy, revert EMACS_WIDE_INT back to !USE_LSB_TAG. If nothing too bad comes up, we can then consider removing the associated code in Emacs 32. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: pdumper on Solaris 10 2024-12-09 4:59 ` Stefan Kangas @ 2024-12-09 14:39 ` Eli Zaretskii 2024-12-10 0:09 ` Stefan Kangas 0 siblings, 1 reply; 35+ messages in thread From: Eli Zaretskii @ 2024-12-09 14:39 UTC (permalink / raw) To: Stefan Kangas; +Cc: pipcet, luangruo, ali_gnu2, emacs-devel > From: Stefan Kangas <stefankangas@gmail.com> > Date: Sun, 8 Dec 2024 23:59:14 -0500 > Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org > > Eli Zaretskii <eliz@gnu.org> writes: > > >> Date: Sun, 08 Dec 2024 17:37:50 +0000 > >> From: Pip Cet <pipcet@protonmail.com> > >> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org > >> > >> "Eli Zaretskii" <eliz@gnu.org> writes: > >> > >> >> So let's remove it, and switch WIDE_EMACS_INT builds to USE_LSB? > >> > > >> > That'd be a waste of effort. > >> > >> It'd be a good investment of effort today, in exchange for making the GC > >> code significantly easier to understand and maintain in the future. It > >> would certainly not be without its benefits, so calling it a "waste of > >> effort" is unfair. > > > > I disagree. We've lived with this GC code for a long time, and I > > don't see any complications due to !USE_LSB. And if we are going to > > switch to igc at some point, investment in GC is even less sensible. > > Assuming that we are 100% sure that mpc will land, then I can agree that > making any changes here is basically wasted effort. Unless, of course, > the change would also simplify the mpc work (would it?). The igc branch already dropped WIDE_EMACS_INT support, so it only supports USE_LSB anyway. > On the other hand, IIUC, we have some way to go with making the merging > of the mpc branch a guarantee. While I'm an enthusiastic supporter of > the great work that's being done on the mpc branch, isn't hedging our > bets prudent until that work is done? From where I stand, what's left to do on the branch is stability: using the branch, reporting bugs, and fixing them, especially on some rarer platforms (*BSD, for example). Plus some decisions: do we fork MPS or not, for example. So it isn't such a distant future. > Or am I misunderstanding how close we are to merging the mpc branch? Possibly. > >> If performance and wasted memory aren't issues, then it's a tradeoff > >> between leaving old code untouched and simplifying it to enable future > >> development. > > > > The existing code doesn't preclude nor interfere with future > > development. So yes, leaving working code untouched is the preference > > here. > > Based on my limited mucking around in the GC, it does interfere somewhat > because you do need to understand both configurations, at least on a > high level, and once you do you need to mentally filter that stuff out > when reading the code. So I think I'd appreciate the simplification, at > least. The simplification is minuscule at best. We need to mask some bits, either at the LSB end or at MSB end, that's all the difference. And we have macros that hide the differences from most levels. And remember that the original scheme of tagging in Emacs was !USE_LSB, so some veterans might even prefer it. > If the only known drawbacks are stability concerns, we could also > consider an intermediate step along these lines: > > Leave the USE_LSB_TAG code as is, but set it to 1 in all configurations > on master. That would put the WIDE_EMACS_INT configuration at risk, since that configuration will need changes. > See what issues crop up, if any. If anything does come up, > ask Pip Cet to fix it (he volunteered, IIUC), and if things are starting > to look too hairy, revert EMACS_WIDE_INT back to !USE_LSB_TAG. If > nothing too bad comes up, we can then consider removing the associated > code in Emacs 32. My point is that all of that could be avoided entirely, given some development decisions which basically drop !USE_LSB_TAG configurations. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: pdumper on Solaris 10 2024-12-09 14:39 ` Eli Zaretskii @ 2024-12-10 0:09 ` Stefan Kangas 2024-12-10 12:59 ` Eli Zaretskii 0 siblings, 1 reply; 35+ messages in thread From: Stefan Kangas @ 2024-12-10 0:09 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, luangruo, ali_gnu2, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Stefan Kangas <stefankangas@gmail.com> >> Date: Sun, 8 Dec 2024 23:59:14 -0500 >> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org >> >> Assuming that we are 100% sure that mpc will land, then I can agree that >> making any changes here is basically wasted effort. Unless, of course, >> the change would also simplify the mpc work (would it?). > > The igc branch already dropped WIDE_EMACS_INT support, so it only > supports USE_LSB anyway. I thought that WIDE_EMACS_INT will remain supported in non-MPS (i.e. "old GC") builds even after the igc merge? Am I mistaken? >> Based on my limited mucking around in the GC, it does interfere somewhat >> because you do need to understand both configurations, at least on a >> high level, and once you do you need to mentally filter that stuff out >> when reading the code. So I think I'd appreciate the simplification, at >> least. > > The simplification is minuscule at best. We need to mask some bits, > either at the LSB end or at MSB end, that's all the difference. And > we have macros that hide the differences from most levels. I agree that it's not a major issue, indeed. You don't need to look at this unless you want to understand how we do GC tagging in detail. OTOH, complexity almost always presents itself in small increments that individually don't look like much. It's only with the combined effect of many such small increments that they become a concern; hence the desire to take similarly small steps towards removing complexity. >> If the only known drawbacks are stability concerns, we could also >> consider an intermediate step along these lines: >> >> Leave the USE_LSB_TAG code as is, but set it to 1 in all configurations >> on master. > > That would put the WIDE_EMACS_INT configuration at risk, since that > configuration will need changes. That's why I proposed disabling it on master tentatively, with the option to revert the change if we don't like it. Setting a flag back to 0 is easy enough. But making the experiment I proposed might also demonstrate that we're fine, after all. OTOH, if we don't make the experiment, we have less data on which to base our decision. >> See what issues crop up, if any. If anything does come up, >> ask Pip Cet to fix it (he volunteered, IIUC), and if things are starting >> to look too hairy, revert EMACS_WIDE_INT back to !USE_LSB_TAG. If >> nothing too bad comes up, we can then consider removing the associated >> code in Emacs 32. > > My point is that all of that could be avoided entirely, given some > development decisions which basically drop !USE_LSB_TAG > configurations. Is your thinking here that we could merge MPS, wait, and then when it comes time to remove the old GC, we will get to drop !USE_LSB_TAG for free? If yes, couldn't that leave us waiting for a very long time indeed? Or are you saying something else? ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: pdumper on Solaris 10 2024-12-10 0:09 ` Stefan Kangas @ 2024-12-10 12:59 ` Eli Zaretskii 2024-12-10 13:39 ` Óscar Fuentes 0 siblings, 1 reply; 35+ messages in thread From: Eli Zaretskii @ 2024-12-10 12:59 UTC (permalink / raw) To: Stefan Kangas; +Cc: pipcet, luangruo, ali_gnu2, emacs-devel > From: Stefan Kangas <stefankangas@gmail.com> > Date: Mon, 9 Dec 2024 19:09:59 -0500 > Cc: pipcet@protonmail.com, luangruo@yahoo.com, ali_gnu2@emvision.com, > emacs-devel@gnu.org > > Eli Zaretskii <eliz@gnu.org> writes: > > >> From: Stefan Kangas <stefankangas@gmail.com> > >> Date: Sun, 8 Dec 2024 23:59:14 -0500 > >> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org > >> > >> Assuming that we are 100% sure that mpc will land, then I can agree that > >> making any changes here is basically wasted effort. Unless, of course, > >> the change would also simplify the mpc work (would it?). > > > > The igc branch already dropped WIDE_EMACS_INT support, so it only > > supports USE_LSB anyway. > > I thought that WIDE_EMACS_INT will remain supported in non-MPS > (i.e. "old GC") builds even after the igc merge? Am I mistaken? Probably, but who will want to give up igc to get back WIDE_EMACS_INT (if indeed they are incompatible, which seems to be in disagreement)? I most probably won't. > OTOH, complexity almost always presents itself in small increments that > individually don't look like much. But here we have only a handful of increments, so the sum is also minuscule. > >> Leave the USE_LSB_TAG code as is, but set it to 1 in all configurations > >> on master. > > > > That would put the WIDE_EMACS_INT configuration at risk, since that > > configuration will need changes. > > That's why I proposed disabling it on master tentatively, with the > option to revert the change if we don't like it. Setting a flag back to > 0 is easy enough. But making the experiment I proposed might also > demonstrate that we're fine, after all. I think we already know that we are "not fine"? Didn't someone say that stack scanning is broken? > > My point is that all of that could be avoided entirely, given some > > development decisions which basically drop !USE_LSB_TAG > > configurations. > > Is your thinking here that we could merge MPS, wait, and then when it > comes time to remove the old GC, we will get to drop !USE_LSB_TAG for > free? If yes, couldn't that leave us waiting for a very long time > indeed? Maybe so, but why is such a long wait a problem? GC _works_, and works well. There are no pressing problems there, and we've lived with it for many years virtually without changes. What's the urge to make modifications there now, especially when there are chances we will be dropping this GC at some point? IMO, our main task here is to develop the application levels of Emacs, and infrastructure needed to enable such developments. We should only invest efforts in stuff like GC and other basics if we see significant issues, or could envision significant performance gains. There are no such issues or gains here, AFAIU. So diverting our humble resources to such jobs is a mistake, IMO. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: pdumper on Solaris 10 2024-12-10 12:59 ` Eli Zaretskii @ 2024-12-10 13:39 ` Óscar Fuentes 2024-12-10 15:38 ` Pip Cet via Emacs development discussions. 0 siblings, 1 reply; 35+ messages in thread From: Óscar Fuentes @ 2024-12-10 13:39 UTC (permalink / raw) To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: > Maybe so, but why is such a long wait a problem? GC _works_, and > works well. Working on certain projects with lsp-mode is a miserable experience due to all the random pauses. My perception of the past week or two using igc is that those pauses are much less jarring, if perceptible at all. I need more time to make a definitive judgment, though. As code edition evolves and Emacs is put on more demanding tasks the limitations of GC become more obvious (and CPUs are not getting faster anymore). Apart from that, I'm convinced that there is quite a bit of evolutionary pressure exerted by GC on the Elisp package ecosystem: something that works too slowly or is too bumpy does not atract users and die. Others may end devoting a lot of effort to optimize GC usage and when they finally work "well enough" (for some generous interpretation) most potential users already made their mind (flx.el is a paradigmatic case) or the package author simply stops working on it, sometimes without making the first release. GC also diminishes the benefits of native-comp and other performance enhancements: no matter how fast you make your Elisp execution engine, the time taken by GC stablishes a hard limit. But the "stop the world" mode of GC operation makes user experience quite worse even if the total time to perform a task is smaller. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: pdumper on Solaris 10 2024-12-10 13:39 ` Óscar Fuentes @ 2024-12-10 15:38 ` Pip Cet via Emacs development discussions. 2024-12-11 5:27 ` Gap buffer problem? Gerd Möllmann 0 siblings, 1 reply; 35+ messages in thread From: Pip Cet via Emacs development discussions. @ 2024-12-10 15:38 UTC (permalink / raw) To: Óscar Fuentes; +Cc: emacs-devel On Tuesday, December 10th, 2024 at 13:39, Óscar Fuentes <ofv@wanadoo.es> wrote: > Eli Zaretskii eliz@gnu.org writes: > > Maybe so, but why is such a long wait a problem? GC works, and > > works well. > > Working on certain projects with lsp-mode is a miserable experience due > to all the random pauses. To be fair, part of that may be the gap buffer problem rather than GC. > My perception of the past week or two using igc is that those pauses are > much less jarring, if perceptible at all. I need more time to make a > definitive judgment, though. If you do, and it's negative, please take into account that MPS offers many tunable parameters, and hasn't been fine-tuned for Emacs yet. Even if the current scratch/igc branch isn't satisfactory by itself, it's very likely it can be improved by changing some numbers. > But the "stop the world" mode of GC operation makes user experience > quite worse even if the total time to perform a task is smaller. Of course, these problems are largely fixable, and have been fixed, by such approaches as the fork()-based GC I wrote, which Eli vetoed (I believe the same applies to moving the GC mark bits to their own memory regions, which would have allowed us to interrupt GC on user input). The "don't touch the GC" edict has done a great deal of harm to Emacs; this is relevant because we're now discussing a simplification of the GC code which would help MPS, but is being vetoed (again), while putting effort into making our current code even more complicated by including an impossible code path is being encouraged. So, no, the current GC doesn't work well, it does cause problems, its code is overly complicated, and simplifications would make switching to MPS a lot easier. All is not well in GC land. Put drastically, if MPS fails to land, the most likely reason is the capriciously-applied "do not touch the GC" rule. Pip ^ permalink raw reply [flat|nested] 35+ messages in thread
* Gap buffer problem? 2024-12-10 15:38 ` Pip Cet via Emacs development discussions. @ 2024-12-11 5:27 ` Gerd Möllmann 2024-12-11 8:50 ` Pip Cet via Emacs development discussions. 2024-12-11 14:22 ` Eli Zaretskii 0 siblings, 2 replies; 35+ messages in thread From: Gerd Möllmann @ 2024-12-11 5:27 UTC (permalink / raw) To: Pip Cet via Emacs development discussions.; +Cc: Óscar Fuentes, Pip Cet Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org> writes: > On Tuesday, December 10th, 2024 at 13:39, Óscar Fuentes <ofv@wanadoo.es> wrote: >> Eli Zaretskii eliz@gnu.org writes: >> > Maybe so, but why is such a long wait a problem? GC works, and >> > works well. >> >> Working on certain projects with lsp-mode is a miserable experience due >> to all the random pauses. > > To be fair, part of that may be the gap buffer problem rather than GC. Could you please tell more about the gap buffer problem? I've read a little about the tradeoffs between gap buffers, piece tables, ropes, but I'm wondering if there is something concrete already known for sure that is a performance problem in Emacs. Maybe a bug that has been analyzed or something. (I'm asking because I just recently encountered a performance problem when adding something to xdisp.c:27339 (with cc-mode, Eglot, Corfu), and editing there was so slow that it was absolutely no fun, and that on a an M1 pro. Haven't investigated the reason.) ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 5:27 ` Gap buffer problem? Gerd Möllmann @ 2024-12-11 8:50 ` Pip Cet via Emacs development discussions. 2024-12-11 9:35 ` Gerd Möllmann 2024-12-11 14:22 ` Eli Zaretskii 1 sibling, 1 reply; 35+ messages in thread From: Pip Cet via Emacs development discussions. @ 2024-12-11 8:50 UTC (permalink / raw) To: Gerd Möllmann Cc: Pip Cet via "Emacs development discussions.", Óscar Fuentes Gerd Möllmann <gerd.moellmann@gmail.com> writes: > Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org> > writes: > >> On Tuesday, December 10th, 2024 at 13:39, Óscar Fuentes <ofv@wanadoo.es> wrote: >>> Eli Zaretskii eliz@gnu.org writes: >>> > Maybe so, but why is such a long wait a problem? GC works, and >>> > works well. >>> >>> Working on certain projects with lsp-mode is a miserable experience due >>> to all the random pauses. >> >> To be fair, part of that may be the gap buffer problem rather than GC. > > Could you please tell more about the gap buffer problem? Just anecdotes, I'm afraid. My problem was a large buffer of test descriptions for a programming language, and I was running the tests and modifying the buffer to contain the output for each test in a block after the test itself. That worked, but running several tests in parallel, moving back and forth in the buffer to modify text as the output came in ... not so much. I also recall discussion somewhere (nullprogram.com, maybe) about multiple cursors and the gap buffer, and that's also a potential use case where the gap buffer would make things very slow. > I've read a little about the tradeoffs between gap buffers, piece > tables, ropes, but I'm wondering if there is something concrete already > known for sure that is a performance problem in Emacs. Maybe a bug that > has been analyzed or something. I'd be very interested in such a bug. Replacing the gap buffer assumption is quite hard: IIRC, the main problem is that the regexp code has been hacked to support gap buffers but not other data structures, so we'd need to do something about that. > (I'm asking because I just recently encountered a performance problem > when adding something to xdisp.c:27339 (with cc-mode, Eglot, Corfu), and > editing there was so slow that it was absolutely no fun, and that on a > an M1 pro. Haven't investigated the reason.) Interesting. It may be worth it to try reproducing that and disabling modes one by one to find out which one is at fault. I suspect that it's overlays/the interval tree rather than the gap buffer per se (however, if we ever replace the gap buffer code, we should make sure its replacement actually handles buffer text and text properties/intervals in an integrated manner, rather than storing just buffer text). Pip ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 8:50 ` Pip Cet via Emacs development discussions. @ 2024-12-11 9:35 ` Gerd Möllmann 2024-12-11 11:50 ` Pip Cet via Emacs development discussions. 2024-12-11 12:27 ` Pip Cet via Emacs development discussions. 0 siblings, 2 replies; 35+ messages in thread From: Gerd Möllmann @ 2024-12-11 9:35 UTC (permalink / raw) To: Pip Cet Cc: Pip Cet via "Emacs development discussions.", Óscar Fuentes Pip Cet <pipcet@protonmail.com> writes: > Gerd Möllmann <gerd.moellmann@gmail.com> writes: > >> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org> >> writes: >> >>> On Tuesday, December 10th, 2024 at 13:39, Óscar Fuentes <ofv@wanadoo.es> wrote: >>>> Eli Zaretskii eliz@gnu.org writes: >>>> > Maybe so, but why is such a long wait a problem? GC works, and >>>> > works well. >>>> >>>> Working on certain projects with lsp-mode is a miserable experience due >>>> to all the random pauses. >>> >>> To be fair, part of that may be the gap buffer problem rather than GC. >> >> Could you please tell more about the gap buffer problem? > > Just anecdotes, I'm afraid. My problem was a large buffer of test > descriptions for a programming language, and I was running the tests and > modifying the buffer to contain the output for each test in a block > after the test itself. That worked, but running several tests in > parallel, moving back and forth in the buffer to modify text as the > output came in ... not so much. > > I also recall discussion somewhere (nullprogram.com, maybe) about > multiple cursors and the gap buffer, and that's also a potential use > case where the gap buffer would make things very slow. Thanks. > >> I've read a little about the tradeoffs between gap buffers, piece >> tables, ropes, but I'm wondering if there is something concrete already >> known for sure that is a performance problem in Emacs. Maybe a bug that >> has been analyzed or something. > > I'd be very interested in such a bug. Replacing the gap buffer > assumption is quite hard: IIRC, the main problem is that the regexp code > has been hacked to support gap buffers but not other data structures, so > we'd need to do something about that. > >> (I'm asking because I just recently encountered a performance problem >> when adding something to xdisp.c:27339 (with cc-mode, Eglot, Corfu), and >> editing there was so slow that it was absolutely no fun, and that on a >> an M1 pro. Haven't investigated the reason.) > > Interesting. It may be worth it to try reproducing that and disabling > modes one by one to find out which one is at fault. I suspect that it's > overlays/the interval tree rather than the gap buffer per se (however, Yeah, maybe I'll investigate that further at some point, not sure. I did try with VSCode and Zed now, though, for no good reason. They don't have a problem. > if we ever replace the gap buffer code, we should make sure its > replacement actually handles buffer text and text properties/intervals > in an integrated manner, rather than storing just buffer text). > > Pip And if I may add a wish to the future author: Make whatever you use persistent data structures, so that one could think of letting redisplay run concurrently. Really! :-) ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 9:35 ` Gerd Möllmann @ 2024-12-11 11:50 ` Pip Cet via Emacs development discussions. 2024-12-11 13:22 ` Gerd Möllmann 2024-12-11 12:27 ` Pip Cet via Emacs development discussions. 1 sibling, 1 reply; 35+ messages in thread From: Pip Cet via Emacs development discussions. @ 2024-12-11 11:50 UTC (permalink / raw) To: Gerd Möllmann Cc: Pip Cet via "Emacs development discussions.", Óscar Fuentes Gerd Möllmann <gerd.moellmann@gmail.com> writes: > Pip Cet <pipcet@protonmail.com> writes: >> if we ever replace the gap buffer code, we should make sure its >> replacement actually handles buffer text and text properties/intervals >> in an integrated manner, rather than storing just buffer text). >> >> Pip > > And if I may add a wish to the future author: Make whatever you use > persistent data structures, so that one could think of letting redisplay > run concurrently. Really! :-) You won't be surprised to hear I've been playing with some code, so could I ask you to expand on this point? What precisely does redisplay require? Full snapshotting or would it be sufficient to have fine-grained locking? (However, before anyone gets their hopes and/or fears up, my code depends on disabling most of the regexp code, and the additional number of garbage-collected objects is so great that I concluded I'd wait for MPS to land before resuming work on it. One of the few distinct advantages of the current gap buffer approach is that it doesn't affect GC...) I know virtually nothing about redisplay. Pip ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 11:50 ` Pip Cet via Emacs development discussions. @ 2024-12-11 13:22 ` Gerd Möllmann 2024-12-11 14:53 ` Pip Cet via Emacs development discussions. 0 siblings, 1 reply; 35+ messages in thread From: Gerd Möllmann @ 2024-12-11 13:22 UTC (permalink / raw) To: Pip Cet Cc: Pip Cet via "Emacs development discussions.", Óscar Fuentes [-- Attachment #1: Type: text/plain, Size: 1621 bytes --] Pip Cet <pipcet@protonmail.com> writes: > Gerd Möllmann <gerd.moellmann@gmail.com> writes: > >> Pip Cet <pipcet@protonmail.com> writes: >>> if we ever replace the gap buffer code, we should make sure its >>> replacement actually handles buffer text and text properties/intervals >>> in an integrated manner, rather than storing just buffer text). >>> >>> Pip >> >> And if I may add a wish to the future author: Make whatever you use >> persistent data structures, so that one could think of letting redisplay >> run concurrently. Really! :-) > > You won't be surprised to hear I've been playing with some code, Indeed, I was just thinking to myself "I knew it" :-). Two thumbs up! > so could I ask you to expand on this point? What precisely does > redisplay require? Full snapshotting or would it be sufficient to have > fine-grained locking? Maybe it's helpful when I tell something about the background. Some time last year I asked myself if I could make Emacs more than one of my plenty of CPU cores without solving the multi-threaded Elisp problem. And the idea was that I could do that, possibly, by letting redisplay happen in another thread. I later realized while thinking about the details, that this undertaking is an order of magnitude too large for me. Everything taking more than a few months is. And, in addition, I wouldn't want to do data structures in C anyway. So it's history. Won't happen. But, there is an incomplete, terse, terrible Org file from those times that I kept. I talked a bit about this with Stefan Monnier and Eli at the time, just FYI. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: Concurrent redisplay --] [-- Type: text/x-org, Size: 19475 bytes --] :PROPERTIES: :ID: E5E87FA1-48D1-4753-AAAE-E86FB36F5742 :END: #+title: Concurrent Redisplay # -*- mode: org; eval: (auto-fill-mode 1) -*- #+STARTUP: content #+AUTHOR: gerd@gnu.org * Concurrent Redisplay Redisplay is currently performed sequentially as part of Emacs' command loop. The command loop calls =redisplay= to make sure that changes in buffers are made visible on the screen. Concurrent redisplay means to change Emacs' architecture, so that redisplay can be done concurrently with the command loop and running Elisp. In this document, I'm trying to get an impression if a parallel redisplay is achievable, from a very high-level perspective at least. To make thinking about this possible, I make a number of assumptions and simplications, which are described in the following. ** Multi-threaded Lisp This document is no way concerned with making Elisp multi-threaded, if that's possible, if so how, and what else. Due to demand from others, I'm also considering the case that a concurrent redisplay can call Lisp. How this is made possible, I'm not considering. ** Possible Gains - Distribute work on more than one CPU core - Makes it possible to implement advanced display features in the future that would be too costly to perform in a sequential redisplay. ** Concurrency Architecture As a simple to reason about architecture, I assume that Emacs will consist of two modules: - The =main= module consists of command loop and Lisp, and runs in one thread. - The =redisplay= module runs in another thread. Both modules are isolated from each other, and may not access data owned by the other module. Communication between modules only happens by exchanging non-blocking messages. I could imagine a GUI/TUI backend model in this picture, for good measure, but won't consider that further. Random links: The Problem with Threads https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf Plain Threads are the Goto of Modern Computing https://isocpp.org/blog/2014/12/plain-threads-are-the-goto-of-todays-computing ** Display Model Concurrent redisplay in the architecture described must work on a model that it owns. It is assumed for now, that this model represents a buffer's text plus a number of properties/variables relevant to redisplay, like faces that apply to regions of text. See [[*Redisplay Model]]. ** Triggering Redisplay Concurrentl redisplay could choose to display at its own whim, or triggered by receiving a message from the main module. It could, for example, decide to redisplay based on available hardware frame rates. How this is done is not considered here. ** Display Update Roughly speaking, current redisplay can be divided into two parts: - Produce desired glyphs, which describe what the display should look like. - Update the display by comparing current and desired glyphs and calling the GUI/TUI backend(s). Then set the current glyphs to the desired glyphs for the next round. The update part is not considered in the following. There are several conceivable ways to implement an update: - Update in the =redisplay= module * Call GUI backend directly * Post messages to the =main= module, which calls GUI backend * Post messages to a possible GUI module - Update in the =main= module + =Redisplay= posts message containing desired glyphs This looks like a solvable problem to me. So, for simplicity, I don't consider it here. In the following, "redisplay" mainly refers to producing desired glyphs. ** One Window/Buffer For simplicity, I only consider one window displaying one buffer. An interesting, maybe even natural, idea might be to run more than one redisplay in parallel, one for each window, but that is also not in scope here. ** Frame-based Redisplay Also not considered here is the update phase of TTY frames, which currently requires a view of all windows on a frame at once, which is commonly called frame-based redisplay. * Aspects to Consider The following is a list of things to consider when thinking of making redisplax concurrent. ** jit-lock Current redisplay calls run =fontification-functions= to ensure that properties are up to date for the text being displayed. This will not be possible in a concurrent redisplay, unless one assumes that Lisp can be called from multiple threads. Stefan (and Eli?) thinks we will eventually need to be able to call some Lisp-ish code from a concurrent redisplay before it can fully replace the existing synchronous redisplay. I myself would accept a display that is not always 100% accorate, for exmaple because parts of the text have not been fontified yet, or compositions determined. Instead of calling Lisp from redisplay, one use background fontification in the main module together with guesstimates which regions of text will actually be displayed to make sure that fontification results are visible as soon as possible. The redisplay module could also post messages to guide this guessing, for example at the end of redisplay. ** Hooks and functions In general, the idea is that =redisplay= posts messages to the =main= module that certain things have happened. Hooks/functions run by current redisplay are: - window-scroll-functions - activate-menubar-hook (redisplay_internal -> prepare_menu_bar -> update_menu_bar) - update-menubar-hook (update_menu_bar) - fontification-functions (jit-lock, and maybe others) - ? Open: what are the expectations of these hook function about variables, buffers? ** Caches Redisplay needs access to face, font and image caches which are stored on frames (owned by main module). I propose as an idea to remove the caches from frames and give ownership of these to concurrent redisplay. Could be a table frame -> caches. Some communication is necessary from the main module to redisplay for frame changes, clearing caches from Lisp, .... - does face/font/image code called during redisplay call Lisp? - can face etc. code be run from another thread? ** Glyph matrices Glyph matrices are a form of cache, so they should be treated likewise. Depends on which module does the update phase of redisplay. ** Point and mark Needed for region highlighting. Create new model version when changing these. This requires that creating new model versions is reasonably fast. ** window-start computation Redisplay posts message back to main module containing information about what is on the display. Window start/end could be part of that. ** move_it functions This concerns Lisp functions like vertical-motion. Rely (relied?) partly on current glyph matricx, and otherwise on redisplay functions that are used without producing glyphs. - do expect results based on current text, even if not displayed yet. Open. No solution in mind that isn't ugly (locking, maybe). - need display model to operate on. - Pixel positions in text? ** Mouse highlight Open. No idea how that is done nowadays. It used to use the current matrices, only. ** TTYs The update phase of redisplay on ttys needs a view of the whole frame's current and desired glyph matrices for optimization. This is done by giving tty frames matrices, and sub-allocating window matrices from these. The descriptions above are not affected by this, but it has to be kept in mind, for the update phase, which is not yet taken into account. ** minibuffer, reading from Seems to be all the same as other windows, but open. ** Echo area Open (-> window geometry?) ** Bidi Eli says: #+begin_quote I think this can be removed from the list of issues. Basically (with a few caveats, see below, which I don't think change anything in principle), bidi.c is just a subroutine of set_iterator_to_next, which implements the non-linear scanning of buffer text needed for bidi reordering. It effectively causes set_iterator_to_next to move to the next character in _visual_ order, not in buffer position order (the latter would require just incrementing the buffer position). To do this, bidi.c needs access to buffer text, and little else. The caveats I mentioned are: . sometimes we need to figure out the base paragraph direction, either L2R or R2L (the latter will be displayed with characters starting at the right edge of the window instead of the left), in this case bidi.c looks back using regexps for the beginning of the paragraph, because the Unicode Bidirectional Algorithm mandates that the paragraph direction is determined by the first strong directional character of the paragraph . when the buffer includes display properties, bidi.c treats all the characters "covered" by the property as a single neutral character, since this is how images and other such stuff needs to be handled for display reordering purposes -- this requires partial processing of display properties for the single purpose of determining whether they are "replacing" or "non-replacing" properties, and in the former case to determine at which buffer position the display property ends I don't think these caveats change anything, since again they only need to access buffer text. The bidi reordering code maintains a state (struct bidi_it), but it is a sub-structure of struct it, and lives only as long as the iterator object lives. #+end_quote ** Narrowing Don't remember how that is done. ** Selective display Open. Should at long last die. ** Window geometry changes Open. ** Others? * Redisplay Model What is being displayed and how it is displayed depends on - buffer text - Properties of the text (overlays, text properties) - Values of display-relevant variables (=truncate-lines=, ...) Concurrent redisplay mustown such a model, so that no synchronization is necessary between =main= and =redisplay= module. ** Buffer Text *** Copying One could think of making copies of all what is needed for redisplay and let concurrent redisplay work on such a model. I believe this is out of question, for performance reasons. Such a copy would have to be made by the main module, and that could easily cost more than what what we do now in sequential redisplay, especially if we don't exactly know what data redisplay will need (range of text, for example). *** Another "Copying" Possibility Stefan Monnier had another interesting idea that I quote here #+begin_quote Note that you can also use the current text representation with a concurrent redisplay: simply keep a whole copy of the buffer over in the redisplay side. Updating that whole copy should usually be quite efficient thanks to BEG/END_UNCHANGED. #+end_quote *** Persistence To avoid copying, let buffer text be represented as a persistent data structure. Conceptually, this persistent data structure contains an ordered set of buffer-text versions. When the =main= module modifies buffer text, new versions are created. When =redisplay= starts, it picks the youngest version available as buffer zexz. because it is known that any modification in =main= will lead to a new version, and not modify an existing version. The "piece table" is an interesting representation for such a persistent buffer text data structure. Some later descriptions assume that buffer text uses a persistent piece table. Some links: An interesting paper about text representations in general: https://www.cs.unm.edu/~crowley/papers/sds.pdf Piece tables: https://www.averylaird.com/programming/the%20text%20editor/2017/09/30/the-piece-table An implementation of a persistent piece table: https://github.com/cdacamar/fredbuf A blog post about VSCode using piece tables: https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation An implementation of a persistent tree: https://cglab.ca/~dana/pbst/#:~:text=A%20persistent%20binary%20search%20tree,into%2Fdeletion%20from%20the%20tree. ** Properties (Overlays + Text Properties) Properties that are relevant for redisplay are: - =face= - =invisible= - =display= - =composition= Redisplay needs the following information about properties: - start + end position - property value Property values can contain constructs that eval Lisp. Examples: - display (=:when=, =(:height FN)=, ...) - =mode-line-format= may also contain =:eval= If concurrent redisplay cannot call Lisp: The parts of the property values that require evaluating Lisp must be part of the display model in evaluated form. Such a model could contain a map =Lisp_Object= -> =value= (at the time the model version was current), where - the key =Lisp_Object= is the part of the original property value containing =:eval=, for instance. It could be the =cons= cell of an =(:eval ...)= - =value= is the evaluated value Changing properties must create new model versions. - adding/removing/changing props -> new model version - each piece in a piece table could have a list of applicable props for the whole piece. - mass changes could be done without producing lots of new model versions + requires that concurrent redisplay doesn't work on a model that is mass-updated, which could require synchronization, which is ugly. Possible optimizations: - discard/coalesce old model versions in the background, to reduce memory footprint? The main module creates new versions, only. Redisplay uses only the latest version. ** Variables The display model must also contain a snapshot of the values of all relevant variables at the time of the model version. Relevant values are: - truncate-lines - scroll-conservatively - window, frame, buffer, global values (window-start, ...) - ? - todo. make a list * Persistent Data Structures Wikipedia: https://en.wikipedia.org/wiki/Persistent_data_structure ** Terminology Short summary of the terminology: - persistent + general term encompassing veriations below + always preserves versions of itself when modified. + immutable in the sense that they are not changed in-place. - partially persistent + all versions can be read + only newest version can be modified. - fully persistent + all versions can be read + every version can be modified. - confluently persistent + fully persistent + versions can be merged (melded). ** Links Kind of a brief overview: https://academic-accelerator.com/encyclopedia/persistent-data-structure Irmin: Mergeable ropes https://inria.hal.science/hal-01099136v1/document Intersting article: https://blog.acolyer.org/2015/01/14/mergeable-persistent-data-structures/ Partially and fully persistent DS in C (no merges) https://github.com/vineeths96/Persistent-Data-Structures Confluently persistent DS paper https://arxiv.org/pdf/1301.3388.pdf https://www.cs.utexas.edu/~ecprice/papers/confluent_swat.pdf Data visualization with persistent DS https://www.researchgate.net/publication/258713092_Efficient_Dynamic_Data_Visualization_with_Persistent_Data_Structures * Redisplay calling Lisp This is a hypotheical scenario, but Eli and Stefan seem to assume that it is important to have to make concurrent redisplay acceptable to users. - Redisplay calls Lisp to fontify etc. + just assuming that is possible in the future + as a substitute for storing a snapshot in display model. + how calling Lisp from redisplay works on the Lisp side, is not yet specified My conclusions from this: - Properties must be persistent data structures + because no props snapshot in display model + because redisplay needs props corresponding to its buffer-text version - Properties must be confluently persistent data structures + need to be able to modify prop versions in Lisp + need to merge back changes into current versions - buffer modifications from Lisp either + should be prevented (how?) + or buffer-text must be confluently persistent - merge or discard any buffer-text changes (delete version, if it was created). Probably discard. - what about if Lisp changes display-relevant variables? - unclear - Other modifications? ** Merging properties Imagine =fontification-functions= adding properties for font-lock. These modifications should not be lost once concurrent redisplay has finished. That means the properties added to an old version of the buffer text etc must be merged into newer versions. - Confluently persistent props require + way to merge changes to newer versions + consider only merge version n-1 to n - single prop = (beg end value) + position translation - know what changed in buffer-texts from n-1 to n + wanting translation pos_{n-1} to pos_n + piece added in n in front of pos_{n-1} => add length + piece delete in from of pos_{n-1} => subtract + details depend on buffer-text DS - buffer-text DS must take into account that translations must be possible - looks doable + value merging + assume interval [beg end] in the following (including beg and end) + added property - [beg_n end_n] may intersect with 0+ props in version n. - say first intersection is in [a b], a >= beg, b <= end with value val - [beg a-1] -> new value - [a b] -> value-dependent handling of old/new value (merge/discard..., must be defined) - [b+1 end] -> either new value or merge with next intersecing prop from n + changed values - treat as remove + add + removed props - no direct representation in version of props in version n-1 - assume scan whole version n for prop of the same kind - could record min/max pos of changes in n-1 - let [a b val] be "interesting" prop in n (face, ...) - if there is no intersecting prop in [a b] in n-1, what does that mean? - it has been newly added in n compared to n-1 - it was in n-2 and been removed in n-1 - must find out to resolve + can we in all cases? Assuming everything still open can be resolved, this looks doable, but it is certainly non-trivial. * Performance/Memory Considerations The use of persistent data structures will have an impoact on both performance and memory consumption. How large this impact will be I find impossible to tell, especially on older hardware. But keep in mind, that at the time this might be implemented, current hardware will be old. * Personal Conclusions I'm stopping here, despite open questions, because I think I have reached a sufficient level of gut feeling about the subject. I'd summarize my thoughts as: - Concurrent redisplay is feasible, both with and without being able to call Lisp from redisplay. - Changing buffer-text representation using a piece table is a big enough bite that it is only worth it only if a concurrent redisplay comes at some point. - If performance on old hardware will be acceptable, for some value of acceptable, I find unpredictable. - Concurrent redisplay with the ability to call Lisp from redisplay is considerably more complex than without being able to call Lisp. I'd say at least 2 times. - Concurrent redisplay will not happen unless at least 2 or 3 people with enought time decide to work on it. * Random Grab Bag - make pieces for long lines (max length of piece) - concurrency -> dump complicated redisplay optimizations? - pieces provide more detailed information about what text has changed (compared to BEG_UNCHANGED and END_UNCHANGED). Zed editor, rasterization on GPU https://zed.dev/blog/videogame # end. [-- Attachment #3: Type: text/plain, Size: 630 bytes --] It's probably not very helpful, but at least I get the idea of a concurrent redisplay planted into brains, where it can do it's evil work :-). > > (However, before anyone gets their hopes and/or fears up, my code > depends on disabling most of the regexp code, and the additional number > of garbage-collected objects is so great that I concluded I'd wait for > MPS to land before resuming work on it. One of the few distinct > advantages of the current gap buffer approach is that it doesn't affect > GC...) > > I know virtually nothing about redisplay. > > Pip What I've written is pretty high-level, nothing to worry about. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 13:22 ` Gerd Möllmann @ 2024-12-11 14:53 ` Pip Cet via Emacs development discussions. 2024-12-11 15:33 ` Gerd Möllmann 0 siblings, 1 reply; 35+ messages in thread From: Pip Cet via Emacs development discussions. @ 2024-12-11 14:53 UTC (permalink / raw) To: Gerd Möllmann Cc: Pip Cet via "Emacs development discussions.", Óscar Fuentes Gerd Möllmann <gerd.moellmann@gmail.com> writes: > Pip Cet <pipcet@protonmail.com> writes: > >> Gerd Möllmann <gerd.moellmann@gmail.com> writes: >> >>> Pip Cet <pipcet@protonmail.com> writes: >>>> if we ever replace the gap buffer code, we should make sure its >>>> replacement actually handles buffer text and text properties/intervals >>>> in an integrated manner, rather than storing just buffer text). >>>> >>>> Pip >>> >>> And if I may add a wish to the future author: Make whatever you use >>> persistent data structures, so that one could think of letting redisplay >>> run concurrently. Really! :-) >> >> You won't be surprised to hear I've been playing with some code, > > Indeed, I was just thinking to myself "I knew it" :-). > Two thumbs up! > >> so could I ask you to expand on this point? What precisely does >> redisplay require? Full snapshotting or would it be sufficient to have >> fine-grained locking? > > Maybe it's helpful when I tell something about the background. Some time > last year I asked myself if I could make Emacs more than one of my > plenty of CPU cores without solving the multi-threaded Elisp problem. > And the idea was that I could do that, possibly, by letting redisplay > happen in another thread. This may be a very stupid idea, but why not use a separate process? fork() is fast on GNU/Linux, and I suspect on macOS too, and the redisplay child would receive a consistent snapshot of the data to inspect and/or modify while coming up with the redisplay instructions, which it would then send back via a pipe or shared memory to be executed in the main process. I suggested doing something similar for GC (the GC child would perform a full GC and send back the Lisp_Objects which are definitely unreachable via a pipe. No, I never figured out how to make that work for weak hash tables which may resurrect references, I just made all hash tables strong...), and in that case the pipe seemed sufficient for the amount of data that was transferred, but I'm not sure how compact (or otherwise) serialized redisplay "instructions" would be. One issue I see is that fork() does a lot of housekeeping work in addition to marking the child's memory as a COW copy of the parent's memory at the time of the fork(). ISTR you can split that process on GNU/Linux (probably not Android), so you'd already have a prepared thread/LWP which wouldn't need to "start up" when you un-share the memory, but I can't find the relevant manpage right now. However, I have no real idea just how bad the fork() latency would be (as you point out, most people have more CPU cores than they can use, so I don't consider the approximate doubling of CPU usage a problem). This would deal very nicely with fontification code attempting to modify data it shouldn't, by ignoring such modifications. It would also deal with catastrophic failure in the redisplay code, as it's insulated in a separate process and we could just print a nice message in the main process rather than crashing all of Emacs. I'm emphatically not suggesting letting the redisplay child actually communicate with the X server or equivalent. That would be much more difficult. In fact, I think a good way to test this approach would be to use the tty code, since there's already a standard serialization of redisplay instructions for tty displays: VT100 escape sequences. > I later realized while thinking about the details, that this undertaking > is an order of magnitude too large for me. Everything taking more than a > few months is. And, in addition, I wouldn't want to do data structures > in C anyway. I think the VT100 case could be done as a weekend project (those always end up taking several weeks for me...), but I'm not sure it's worth it as VT100 redisplay isn't the common use case, and the performance problems are more visible on GUI terminals. And, like pretty much all Emacs ideas, this depends on having a better GC. (However, I've just experimented with an 8 GB process forking, and it's much slower than I'd hoped for - about 70 ms. I wouldn't be surprised if most of that cost is setting up page tables for the ridiculously small 4KB page size x86 uses, so it may work a lot better for AArch64 systems such as yours). Pip ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 14:53 ` Pip Cet via Emacs development discussions. @ 2024-12-11 15:33 ` Gerd Möllmann 2024-12-11 16:58 ` Eli Zaretskii 0 siblings, 1 reply; 35+ messages in thread From: Gerd Möllmann @ 2024-12-11 15:33 UTC (permalink / raw) To: Pip Cet Cc: Pip Cet via "Emacs development discussions.", Óscar Fuentes Pip Cet <pipcet@protonmail.com> writes: > Gerd Möllmann <gerd.moellmann@gmail.com> writes: > >> Pip Cet <pipcet@protonmail.com> writes: >> >>> Gerd Möllmann <gerd.moellmann@gmail.com> writes: >>> >>>> Pip Cet <pipcet@protonmail.com> writes: >>>>> if we ever replace the gap buffer code, we should make sure its >>>>> replacement actually handles buffer text and text properties/intervals >>>>> in an integrated manner, rather than storing just buffer text). >>>>> >>>>> Pip >>>> >>>> And if I may add a wish to the future author: Make whatever you use >>>> persistent data structures, so that one could think of letting redisplay >>>> run concurrently. Really! :-) >>> >>> You won't be surprised to hear I've been playing with some code, >> >> Indeed, I was just thinking to myself "I knew it" :-). >> Two thumbs up! >> >>> so could I ask you to expand on this point? What precisely does >>> redisplay require? Full snapshotting or would it be sufficient to have >>> fine-grained locking? >> >> Maybe it's helpful when I tell something about the background. Some time >> last year I asked myself if I could make Emacs more than one of my >> plenty of CPU cores without solving the multi-threaded Elisp problem. >> And the idea was that I could do that, possibly, by letting redisplay >> happen in another thread. > > This may be a very stupid idea, but why not use a separate process? Not stupid at all. I thought about something similar in a different context, namely if one could decouple the GUI part of Emacs from the rest. Something like that has been done by Eberhard Mattes for OS/2 with the old redisplay. He had to do that because the whole Presentation Manager (GUI) in OS/2 would block, for all process, when an application did not timely handle events and return to the PM. Something like that. Eberhard's OS/2 Emacs had one process doing the GUI stuff, and one for the rest. Both communicated with each other using a defined message protocol. It worked. Don't remember what he used for process communication, pipes or something else. I got stuck with this idea because everything seemed to depend on everything else nowadays. Redisplay needs to execute Lisp, Font backends I think, not sure. Some GUIs call redisplay (nsterm). And then I imagined the licensing issues, and dropped the idea. Although - NS could really need something done, IMO, which was the reason I thought about that in the first place. NS is not working for me at least. I always wonder why nobody else has the same freezing problems that I have. I think the same dependency problems also creep up with to concurrent redisplay, don't know. Values of variables, faces, jit-lock, and so on. I think it would be "easier" to handle if one has everything in one process. But in principle both could be done. An actor model. > fork() is fast on GNU/Linux, and I suspect on macOS too, and the > redisplay child would receive a consistent snapshot of the data to > inspect and/or modify while coming up with the redisplay instructions, > which it would then send back via a pipe or shared memory to be executed > in the main process. > > I suggested doing something similar for GC (the GC child would perform a > full GC and send back the Lisp_Objects which are definitely unreachable > via a pipe. No, I never figured out how to make that work for weak hash > tables which may resurrect references, I just made all hash tables > strong...), and in that case the pipe seemed sufficient for the amount > of data that was transferred, but I'm not sure how compact (or > otherwise) serialized redisplay "instructions" would be. > > One issue I see is that fork() does a lot of housekeeping work in > addition to marking the child's memory as a COW copy of the parent's > memory at the time of the fork(). ISTR you can split that process on > GNU/Linux (probably not Android), so you'd already have a prepared > thread/LWP which wouldn't need to "start up" when you un-share the > memory, but I can't find the relevant manpage right now. However, I have > no real idea just how bad the fork() latency would be (as you point out, > most people have more CPU cores than they can use, so I don't consider > the approximate doubling of CPU usage a problem). > > This would deal very nicely with fontification code attempting to modify > data it shouldn't, by ignoring such modifications. It would also deal > with catastrophic failure in the redisplay code, as it's insulated in a > separate process and we could just print a nice message in the main > process rather than crashing all of Emacs. > > I'm emphatically not suggesting letting the redisplay child actually > communicate with the X server or equivalent. That would be much more > difficult. > > In fact, I think a good way to test this approach would be to use the > tty code, since there's already a standard serialization of redisplay > instructions for tty displays: VT100 escape sequences. > >> I later realized while thinking about the details, that this undertaking >> is an order of magnitude too large for me. Everything taking more than a >> few months is. And, in addition, I wouldn't want to do data structures >> in C anyway. > > I think the VT100 case could be done as a weekend project (those always > end up taking several weeks for me...), but I'm not sure it's worth it > as VT100 redisplay isn't the common use case, and the performance > problems are more visible on GUI terminals. Yes. In a way, it's already the case that the GUI part of Emacs that I described above for OS/2, is the terminal emulator, and the protocol is VT100. > And, like pretty much all Emacs ideas, this depends on having a better > GC. > > (However, I've just experimented with an 8 GB process forking, and it's > much slower than I'd hoped for - about 70 ms. I wouldn't be surprised > if most of that cost is setting up page tables for the ridiculously > small 4KB page size x86 uses, so it may work a lot better for AArch64 > systems such as yours). > > Pip ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 15:33 ` Gerd Möllmann @ 2024-12-11 16:58 ` Eli Zaretskii 2024-12-11 17:13 ` Gerd Möllmann 2024-12-11 17:41 ` Pip Cet via Emacs development discussions. 0 siblings, 2 replies; 35+ messages in thread From: Eli Zaretskii @ 2024-12-11 16:58 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, emacs-devel, ofv > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>, > Óscar Fuentes <ofv@wanadoo.es> > Date: Wed, 11 Dec 2024 16:33:18 +0100 > > Pip Cet <pipcet@protonmail.com> writes: > > > This may be a very stupid idea, but why not use a separate process? > > Not stupid at all. I thought about something similar in a different > context, namely if one could decouple the GUI part of Emacs from the > rest. If it can be done by two processes, it can also be done by two threads in the same process. Right? ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 16:58 ` Eli Zaretskii @ 2024-12-11 17:13 ` Gerd Möllmann 2024-12-11 17:45 ` Robert Pluim 2024-12-11 17:41 ` Pip Cet via Emacs development discussions. 1 sibling, 1 reply; 35+ messages in thread From: Gerd Möllmann @ 2024-12-11 17:13 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, emacs-devel, ofv Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>, >> Óscar Fuentes <ofv@wanadoo.es> >> Date: Wed, 11 Dec 2024 16:33:18 +0100 >> >> Pip Cet <pipcet@protonmail.com> writes: >> >> > This may be a very stupid idea, but why not use a separate process? >> >> Not stupid at all. I thought about something similar in a different >> context, namely if one could decouple the GUI part of Emacs from the >> rest. > > If it can be done by two processes, it can also be done by two threads > in the same process. Right? Yes, I think so. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 17:13 ` Gerd Möllmann @ 2024-12-11 17:45 ` Robert Pluim 2024-12-11 18:11 ` Gerd Möllmann 2024-12-11 19:08 ` Eli Zaretskii 0 siblings, 2 replies; 35+ messages in thread From: Robert Pluim @ 2024-12-11 17:45 UTC (permalink / raw) To: Gerd Möllmann; +Cc: Eli Zaretskii, pipcet, emacs-devel, ofv >>>>> On Wed, 11 Dec 2024 18:13:41 +0100, Gerd Möllmann <gerd.moellmann@gmail.com> said: Gerd> Eli Zaretskii <eliz@gnu.org> writes: >>> From: Gerd Möllmann <gerd.moellmann@gmail.com> >>> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>, >>> Óscar Fuentes <ofv@wanadoo.es> >>> Date: Wed, 11 Dec 2024 16:33:18 +0100 >>> >>> Pip Cet <pipcet@protonmail.com> writes: >>> >>> > This may be a very stupid idea, but why not use a separate process? >>> >>> Not stupid at all. I thought about something similar in a different >>> context, namely if one could decouple the GUI part of Emacs from the >>> rest. >> >> If it can be done by two processes, it can also be done by two threads >> in the same process. Right? Gerd> Yes, I think so. But then you have to throw a lock over all the memory in the non-display thread that might affect redisplay (although come to think of it, youʼd probably need that even when using fork) Robert -- ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 17:45 ` Robert Pluim @ 2024-12-11 18:11 ` Gerd Möllmann 2024-12-11 19:08 ` Eli Zaretskii 1 sibling, 0 replies; 35+ messages in thread From: Gerd Möllmann @ 2024-12-11 18:11 UTC (permalink / raw) To: Robert Pluim; +Cc: Eli Zaretskii, pipcet, emacs-devel, ofv Robert Pluim <rpluim@gmail.com> writes: >>>>>> On Wed, 11 Dec 2024 18:13:41 +0100, Gerd Möllmann <gerd.moellmann@gmail.com> said: > > Gerd> Eli Zaretskii <eliz@gnu.org> writes: > >>> From: Gerd Möllmann <gerd.moellmann@gmail.com> > >>> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>, > >>> Óscar Fuentes <ofv@wanadoo.es> > >>> Date: Wed, 11 Dec 2024 16:33:18 +0100 > >>> > >>> Pip Cet <pipcet@protonmail.com> writes: > >>> > >>> > This may be a very stupid idea, but why not use a separate process? > >>> > >>> Not stupid at all. I thought about something similar in a different > >>> context, namely if one could decouple the GUI part of Emacs from the > >>> rest. > >> > >> If it can be done by two processes, it can also be done by two threads > >> in the same process. Right? > > Gerd> Yes, I think so. > > But then you have to throw a lock over all the memory in the > non-display thread that might affect redisplay (although come to think > of it, youʼd probably need that even when using fork) > > Robert Well, it depends. Assume you have a solution that works in a second process. That solution wouldn't use things in the first process because it can't. Now move that code of the second process to the first process, and make two threads out of the two process, and replace process communication with inter-thread message passing like in an actor model. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 17:45 ` Robert Pluim 2024-12-11 18:11 ` Gerd Möllmann @ 2024-12-11 19:08 ` Eli Zaretskii 1 sibling, 0 replies; 35+ messages in thread From: Eli Zaretskii @ 2024-12-11 19:08 UTC (permalink / raw) To: Robert Pluim; +Cc: gerd.moellmann, pipcet, emacs-devel, ofv > From: Robert Pluim <rpluim@gmail.com> > Cc: Eli Zaretskii <eliz@gnu.org>, pipcet@protonmail.com, > emacs-devel@gnu.org, ofv@wanadoo.es > Date: Wed, 11 Dec 2024 18:45:15 +0100 > > >>>>> On Wed, 11 Dec 2024 18:13:41 +0100, Gerd Möllmann <gerd.moellmann@gmail.com> said: > > Gerd> Eli Zaretskii <eliz@gnu.org> writes: > >> > >> If it can be done by two processes, it can also be done by two threads > >> in the same process. Right? > > Gerd> Yes, I think so. > > But then you have to throw a lock over all the memory in the > non-display thread that might affect redisplay No, you copy on write. Exactly like the OS does with forked process. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 16:58 ` Eli Zaretskii 2024-12-11 17:13 ` Gerd Möllmann @ 2024-12-11 17:41 ` Pip Cet via Emacs development discussions. 2024-12-11 19:04 ` Eli Zaretskii 2024-12-11 19:09 ` Gerd Möllmann 1 sibling, 2 replies; 35+ messages in thread From: Pip Cet via Emacs development discussions. @ 2024-12-11 17:41 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Gerd Möllmann, emacs-devel, ofv "Eli Zaretskii" <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>, >> Óscar Fuentes <ofv@wanadoo.es> >> Date: Wed, 11 Dec 2024 16:33:18 +0100 >> >> Pip Cet <pipcet@protonmail.com> writes: >> >> > This may be a very stupid idea, but why not use a separate process? >> >> Not stupid at all. I thought about something similar in a different >> context, namely if one could decouple the GUI part of Emacs from the >> rest. > > If it can be done by two processes, it can also be done by two threads > in the same process. Right? AFAIU: No, not right. I may have misunderstood, but if the idea is to preserve a consistent state of all Lisp data and buffer text for redisplay to use, the easiest way to ensure that consistency is fork(). The other ways, such as copying all heap objects that might be used by redisplay (and adjusting all internal pointers in such heap objects to point to the copy rather than the original data), probably will end up either being a lot slower or being very specific to the system we're running on. I know that implementing fork() on Windows is very slow, and I don't know about a comparable snapshotting mechanism for Windows. To be honest, though, I'm a bit disappointed that GNU/Linux appears to make fork() take significant time that is proportional to the size of the mapped address space, even if it's never COW-faulted in. I'm pretty sure that could be avoided (and I hope the Linux kernel avoids doing it for swapped-out memory, not that anyone still does that). Concurrent access to Lisp data from several threads requires a locking mechanism (fine-grained or coarse) for all such data, and possibly requires rewriting addresses, which means no "ambiguous" references whatsoever. That's a lot harder than using MPS, which generously allows for ambiguous references. It's possible we could have gotten away with concurrent access by the redisplay machinery if we inhibited GC while the redisplay thread was busy inspecting our data, but inhibiting MPS GC is a lot harder and shouldn't be done for ordinary operations. Oh, and of course mmap() breaks fork()'s snapshotting magic. The reason I said this depends on a new GC is a bit subtle, by the way: the old GC does best if we sacrifice a lot of memory and only run it rarely, which we can usually get away with because RAM is cheap. With a fork()-based approach, memory usage comes with a performance penalty for every fork(), so we need to reduce both memory usage and GC time, which we can't do with non-incremental GC. The last reason it's difficult is that MPS isn't optimized for multi-thread settings: in an ideal world, "scanning" a memory area would use a secondary mapping of the memory, known only to the scanning code, so other threads could continue running while an area is being scanned. With MPS, there is only one mapping, so we need to stop all other threads while one thread un-mprotect()s a memory area to scan it. Unless MPS breaks POSIX threads in some spectacular way, fork() should still work, though. Pip ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 17:41 ` Pip Cet via Emacs development discussions. @ 2024-12-11 19:04 ` Eli Zaretskii 2024-12-11 19:54 ` Pip Cet via Emacs development discussions. 2024-12-11 19:09 ` Gerd Möllmann 1 sibling, 1 reply; 35+ messages in thread From: Eli Zaretskii @ 2024-12-11 19:04 UTC (permalink / raw) To: Pip Cet; +Cc: gerd.moellmann, emacs-devel, ofv > Date: Wed, 11 Dec 2024 17:41:29 +0000 > From: Pip Cet <pipcet@protonmail.com> > Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, emacs-devel@gnu.org, ofv@wanadoo.es > > "Eli Zaretskii" <eliz@gnu.org> writes: > > > If it can be done by two processes, it can also be done by two threads > > in the same process. Right? > > AFAIU: No, not right. > > I may have misunderstood, but if the idea is to preserve a consistent > state of all Lisp data and buffer text for redisplay to use, the easiest > way to ensure that consistency is fork(). The other ways, such as > copying all heap objects that might be used by redisplay (and adjusting > all internal pointers in such heap objects to point to the copy rather > than the original data), probably will end up either being a lot slower > or being very specific to the system we're running on. How do you do the same in a forked process? The glyph matrices are not allocated once, they are reallocated constantly. Are you going to fork each time? And if you are, how is it different from copying stuff lazily within the same process, exactly like the OS does with forked processes? > I know that implementing fork() on Windows is very slow, and I don't > know about a comparable snapshotting mechanism for Windows. I'm not talking about Windows, I'm talking about Posix systems. Anyway, the fact that redisplay calls Lisp and Lisp calls back into redisplay all but kills this idea. Gerd's document has also other gotchas. We didn't just give up easily back when we discussed that. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 19:04 ` Eli Zaretskii @ 2024-12-11 19:54 ` Pip Cet via Emacs development discussions. 2024-12-11 20:26 ` Eli Zaretskii 2024-12-11 22:07 ` Dmitry Gutov 0 siblings, 2 replies; 35+ messages in thread From: Pip Cet via Emacs development discussions. @ 2024-12-11 19:54 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gerd.moellmann, emacs-devel, ofv "Eli Zaretskii" <eliz@gnu.org> writes: >> Date: Wed, 11 Dec 2024 17:41:29 +0000 >> From: Pip Cet <pipcet@protonmail.com> >> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, emacs-devel@gnu.org, ofv@wanadoo.es >> >> "Eli Zaretskii" <eliz@gnu.org> writes: >> >> > If it can be done by two processes, it can also be done by two threads >> > in the same process. Right? >> >> AFAIU: No, not right. >> >> I may have misunderstood, but if the idea is to preserve a consistent >> state of all Lisp data and buffer text for redisplay to use, the easiest >> way to ensure that consistency is fork(). The other ways, such as >> copying all heap objects that might be used by redisplay (and adjusting >> all internal pointers in such heap objects to point to the copy rather >> than the original data), probably will end up either being a lot slower >> or being very specific to the system we're running on. > > How do you do the same in a forked process? The glyph matrices are > not allocated once, they are reallocated constantly. Are you going to > fork each time? Not necessarily "each time" (meaning once per frame/keystroke), but quite frequently, yes. > And if you are, how is it different from copying > stuff lazily within the same process, exactly like the OS does with > forked processes? It is very different indeed: Copying within a process involves changing the (virtual) addresses that the copied data is at (unless you use an architecture-specific implementation of TLS). The beauty of fork() is that the virtual addresses stay the same, so we don't need to adjust any pointers, which we cannot do because there are ambiguous references to Lisp data. IOW, no, you can't lazily create two copies of Lisp data in the same process. You have to do so eagerly, adjusting any and all pointers (and only those) in the Lisp data before the new data is read for the first time (because what you read might be a pointer, and then it needs to be adjusted). With fork(), you only have to make the copy when the data is being written to, by either process. (Of course you can just access all memory through some sort of API that translates addresses for you, but that would effectively mean we'd be running Emacs on a virtual machine and simulate fork() on it). > Anyway, the fact that redisplay calls Lisp and Lisp calls back into > redisplay all but kills this idea. Gerd's document has also other > gotchas. We didn't just give up easily back when we discussed that. I don't see why the redisplay process would not be able to call Lisp; it's a full Emacs process (with a single thread), except it doesn't have an FD or socket for the window system, and has an extra pipe to communicate with the parent process instead. It's true that the side effects of the called Lisp code won't be visible to the next redisplay process, but such side effects are perilous anyway, and avoiding them would seem to me to be a feature, not a bug. However, if such side effects are desired, we can use IPC to execute Lisp in the main process (some effort) or simply send a "this redisplay needs to happen synchronously" message to the main process, which would kill the current redisplay process and perform a synchronous redisplay (as not all operating systems support fork() reliably, we'll have to retain the ability to redisplay synchronously, either way). But, to be perfectly honest, I'm not sure redisplay is slowing me down the way traditional GC is. Pip ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 19:54 ` Pip Cet via Emacs development discussions. @ 2024-12-11 20:26 ` Eli Zaretskii 2024-12-11 22:07 ` Dmitry Gutov 1 sibling, 0 replies; 35+ messages in thread From: Eli Zaretskii @ 2024-12-11 20:26 UTC (permalink / raw) To: Pip Cet; +Cc: gerd.moellmann, emacs-devel, ofv > Date: Wed, 11 Dec 2024 19:54:07 +0000 > From: Pip Cet <pipcet@protonmail.com> > Cc: gerd.moellmann@gmail.com, emacs-devel@gnu.org, ofv@wanadoo.es > > "Eli Zaretskii" <eliz@gnu.org> writes: > > > Anyway, the fact that redisplay calls Lisp and Lisp calls back into > > redisplay all but kills this idea. Gerd's document has also other > > gotchas. We didn't just give up easily back when we discussed that. > > I don't see why the redisplay process would not be able to call Lisp; > it's a full Emacs process (with a single thread) So you are going to fork on each redisplay? And how will you pass back the results of Lisp evaluation, if the other process meanwhile changes the global state (as it's running concurrently)? > except it doesn't have > an FD or socket for the window system, and has an extra pipe to > communicate with the parent process instead. Do you have an estimation of the throughput that such a pipe will need to handle in order to support GUI display? What will you send through the pipe? If you send only some kind of commands, then the other process will need to generate the font glyphs in some way -- the same glyphs that the "redisplay" process already produced. And if you intend to send the pixels, that would be too much traffic, I think. And again, the global state of the receiving process could have changed, which means any high-level data might be useless (e.g., using a font that was unloaded). > It's true that the side effects of the called Lisp code won't be visible > to the next redisplay process, but such side effects are perilous > anyway, and avoiding them would seem to me to be a feature, not a bug. In Emacs, they are a feature, and are expected to work. You'd be surprised to see how many packages and user code rely on that. > But, to be perfectly honest, I'm not sure redisplay is slowing me down > the way traditional GC is. It's the other way around: the Lisp machine blocks user interaction, including the UI and display. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 19:54 ` Pip Cet via Emacs development discussions. 2024-12-11 20:26 ` Eli Zaretskii @ 2024-12-11 22:07 ` Dmitry Gutov 1 sibling, 0 replies; 35+ messages in thread From: Dmitry Gutov @ 2024-12-11 22:07 UTC (permalink / raw) To: Pip Cet, Eli Zaretskii; +Cc: gerd.moellmann, emacs-devel, ofv On 11/12/2024 21:54, Pip Cet via Emacs development discussions. wrote: > It is very different indeed: > > Copying within a process involves changing the (virtual) addresses that > the copied data is at (unless you use an architecture-specific > implementation of TLS). The beauty of fork() is that the virtual > addresses stay the same, so we don't need to adjust any pointers, which > we cannot do because there are ambiguous references to Lisp data. > > IOW, no, you can't lazily create two copies of Lisp data in the same > process. You have to do so eagerly, adjusting any and all pointers (and > only those) in the Lisp data before the new data is read for the first > time (because what you read might be a pointer, and then it needs to be > adjusted). With fork(), you only have to make the copy when the data is > being written to, by either process. > > (Of course you can just access all memory through some sort of API that > translates addresses for you, but that would effectively mean we'd be > running Emacs on a virtual machine and simulate fork() on it). Could one really avoid global interpreter lock using fork()? It doesn't sound right: even if you get cheap snapshotting using the underlying OS's mechanisms, could we guarantee that the snapshot is "consistent" from the Lisp VM's point of view. Or if Lisp were not in the picture, that the data structures are consistent anyway, unless the second process adheres to the first one's locks anyway. > But, to be perfectly honest, I'm not sure redisplay is slowing me down > the way traditional GC is. IMO redisplay is not the problem most of time indeed. Sometimes it is, but I'm not sure parallelizing the rendering is the best answer in those scenarios either. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 17:41 ` Pip Cet via Emacs development discussions. 2024-12-11 19:04 ` Eli Zaretskii @ 2024-12-11 19:09 ` Gerd Möllmann 2024-12-12 8:55 ` Robert Pluim 1 sibling, 1 reply; 35+ messages in thread From: Gerd Möllmann @ 2024-12-11 19:09 UTC (permalink / raw) To: Pip Cet; +Cc: Eli Zaretskii, emacs-devel, ofv Pip Cet <pipcet@protonmail.com> writes: > "Eli Zaretskii" <eliz@gnu.org> writes: > >>> From: Gerd Möllmann <gerd.moellmann@gmail.com> >>> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>, >>> Óscar Fuentes <ofv@wanadoo.es> >>> Date: Wed, 11 Dec 2024 16:33:18 +0100 >>> >>> Pip Cet <pipcet@protonmail.com> writes: >>> >>> > This may be a very stupid idea, but why not use a separate process? >>> >>> Not stupid at all. I thought about something similar in a different >>> context, namely if one could decouple the GUI part of Emacs from the >>> rest. >> >> If it can be done by two processes, it can also be done by two threads >> in the same process. Right? > > AFAIU: No, not right. > > I may have misunderstood, but if the idea is to preserve a consistent > state of all Lisp data and buffer text for redisplay to use, the easiest > way to ensure that consistency is fork(). The other ways, such as > copying all heap objects that might be used by redisplay (and adjusting > all internal pointers in such heap objects to point to the copy rather > than the original data), probably will end up either being a lot slower > or being very specific to the system we're running on. I may also be misunderstanding, but in principle, I agree with Eli. Say we have processes A and B communicating with each other. Take the code of A and move it to B, possibly with some automatic transformations if A and B have the same source code. Make two threads in the result process for A and B. Replace inter-process message passing with inter-thread message passing. Initial message may be "fork" transferring the world of thread A to thread B. But I'm also thinking too abstract sometimes. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 19:09 ` Gerd Möllmann @ 2024-12-12 8:55 ` Robert Pluim 2024-12-12 10:14 ` Gerd Möllmann 0 siblings, 1 reply; 35+ messages in thread From: Robert Pluim @ 2024-12-12 8:55 UTC (permalink / raw) To: Gerd Möllmann; +Cc: Pip Cet, Eli Zaretskii, emacs-devel, ofv >>>>> On Wed, 11 Dec 2024 20:09:51 +0100, Gerd Möllmann <gerd.moellmann@gmail.com> said: Gerd> I may also be misunderstanding, but in principle, I agree with Eli. Gerd> Say we have processes A and B communicating with each other. Take the Gerd> code of A and move it to B, possibly with some automatic transformations Gerd> if A and B have the same source code. Make two threads in the result Gerd> process for A and B. Replace inter-process message passing with Gerd> inter-thread message passing. Initial message may be "fork" transferring Gerd> the world of thread A to thread B. Your first sentence is doing a lot of lifting there :-) In the two process situation, you automatically have two copies of every object, so you donʼt need to ensure that the processes are not stepping on each other. In the two thread situation you donʼt have that guarantee, unless you have *first* ensured (via locking, or COW, etc) that the object states are consistent. Once youʼve done that, then you can use messaging to keep things synchronized (or again, locking etc) Robert -- ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-12 8:55 ` Robert Pluim @ 2024-12-12 10:14 ` Gerd Möllmann 0 siblings, 0 replies; 35+ messages in thread From: Gerd Möllmann @ 2024-12-12 10:14 UTC (permalink / raw) To: Robert Pluim; +Cc: Pip Cet, Eli Zaretskii, emacs-devel, ofv Robert Pluim <rpluim@gmail.com> writes: >>>>>> On Wed, 11 Dec 2024 20:09:51 +0100, Gerd Möllmann <gerd.moellmann@gmail.com> said: > > Gerd> I may also be misunderstanding, but in principle, I agree with Eli. > > Gerd> Say we have processes A and B communicating with each other. Take the > Gerd> code of A and move it to B, possibly with some automatic transformations > Gerd> if A and B have the same source code. Make two threads in the result > Gerd> process for A and B. Replace inter-process message passing with > Gerd> inter-thread message passing. Initial message may be "fork" transferring > Gerd> the world of thread A to thread B. > > Your first sentence is doing a lot of lifting there :-) > > In the two process situation, you automatically have two copies of > every object, so you donʼt need to ensure that the processes are not > stepping on each other. In the two thread situation you donʼt have > that guarantee, unless you have *first* ensured (via locking, or COW, > etc) that the object states are consistent. Once youʼve done that, > then you can use messaging to keep things synchronized (or again, > locking etc) > > Robert Sure, a bit of effort is left to the implementer :-). ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 9:35 ` Gerd Möllmann 2024-12-11 11:50 ` Pip Cet via Emacs development discussions. @ 2024-12-11 12:27 ` Pip Cet via Emacs development discussions. 2024-12-11 13:27 ` Gerd Möllmann 1 sibling, 1 reply; 35+ messages in thread From: Pip Cet via Emacs development discussions. @ 2024-12-11 12:27 UTC (permalink / raw) To: Gerd Möllmann Cc: Pip Cet via "Emacs development discussions.", Óscar Fuentes Gerd Möllmann <gerd.moellmann@gmail.com> writes: > Pip Cet <pipcet@protonmail.com> writes: > >> Gerd Möllmann <gerd.moellmann@gmail.com> writes: >> I also recall discussion somewhere (nullprogram.com, maybe) about >> multiple cursors and the gap buffer, and that's also a potential use >> case where the gap buffer would make things very slow. It was nullprogram.com, at https://nullprogram.com/blog/2017/09/07/. The title is "Gap Buffers Are Not Optimized for Multiple Cursors", which seems accurate to me. Pip ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 12:27 ` Pip Cet via Emacs development discussions. @ 2024-12-11 13:27 ` Gerd Möllmann 2024-12-11 15:06 ` Marcus Harnisch 0 siblings, 1 reply; 35+ messages in thread From: Gerd Möllmann @ 2024-12-11 13:27 UTC (permalink / raw) To: Pip Cet Cc: Pip Cet via "Emacs development discussions.", Óscar Fuentes Pip Cet <pipcet@protonmail.com> writes: > Gerd Möllmann <gerd.moellmann@gmail.com> writes: > >> Pip Cet <pipcet@protonmail.com> writes: >> >>> Gerd Möllmann <gerd.moellmann@gmail.com> writes: > >>> I also recall discussion somewhere (nullprogram.com, maybe) about >>> multiple cursors and the gap buffer, and that's also a potential use >>> case where the gap buffer would make things very slow. > > It was nullprogram.com, at https://nullprogram.com/blog/2017/09/07/. The > title is "Gap Buffers Are Not Optimized for Multiple Cursors", which > seems accurate to me. > > Pip Thanks! Added to my collection. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 13:27 ` Gerd Möllmann @ 2024-12-11 15:06 ` Marcus Harnisch 2024-12-11 22:11 ` Dmitry Gutov 0 siblings, 1 reply; 35+ messages in thread From: Marcus Harnisch @ 2024-12-11 15:06 UTC (permalink / raw) To: emacs-devel On 11/12/2024 14.27, Gerd Möllmann wrote: > Pip Cet <pipcet@protonmail.com> writes: > >> It was nullprogram.com, at https://nullprogram.com/blog/2017/09/07/. The >> title is "Gap Buffers Are Not Optimized for Multiple Cursors", which >> seems accurate to me. > > Thanks! Added to my collection. You may be interested in this article, too, which refererences the blog post above: https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/ ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 15:06 ` Marcus Harnisch @ 2024-12-11 22:11 ` Dmitry Gutov 2024-12-12 3:49 ` Gerd Möllmann 2024-12-12 6:01 ` Eli Zaretskii 0 siblings, 2 replies; 35+ messages in thread From: Dmitry Gutov @ 2024-12-11 22:11 UTC (permalink / raw) To: Marcus Harnisch, emacs-devel On 11/12/2024 17:06, Marcus Harnisch wrote: > On 11/12/2024 14.27, Gerd Möllmann wrote: >> Pip Cet <pipcet@protonmail.com> writes: >> >>> It was nullprogram.com, at https://nullprogram.com/blog/2017/09/07/. The >>> title is "Gap Buffers Are Not Optimized for Multiple Cursors", which >>> seems accurate to me. >> >> Thanks! Added to my collection. > > You may be interested in this article, too, which refererences the blog > post above: > https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/ To quote from the bottom of the article: The way I see it, gap buffers are better for searching and memory usage, but ropes are better at non-local editing patterns. Despite their simplicity, gap buffers can hold their own in the modern world. Maybe Emacs was on to something. This is also my takeaway from reading a number of other texts on the subject (not benchmarking personally, though, TBF). ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 22:11 ` Dmitry Gutov @ 2024-12-12 3:49 ` Gerd Möllmann 2024-12-12 19:07 ` Dmitry Gutov 2024-12-12 6:01 ` Eli Zaretskii 1 sibling, 1 reply; 35+ messages in thread From: Gerd Möllmann @ 2024-12-12 3:49 UTC (permalink / raw) To: Dmitry Gutov; +Cc: Marcus Harnisch, emacs-devel Dmitry Gutov <dmitry@gutov.dev> writes: > On 11/12/2024 17:06, Marcus Harnisch wrote: >> On 11/12/2024 14.27, Gerd Möllmann wrote: >>> Pip Cet <pipcet@protonmail.com> writes: >>> >>>> It was nullprogram.com, at https://nullprogram.com/blog/2017/09/07/. The >>>> title is "Gap Buffers Are Not Optimized for Multiple Cursors", which >>>> seems accurate to me. >>> >>> Thanks! Added to my collection. >> You may be interested in this article, too, which refererences the >> blog post above: >> https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/ > > To quote from the bottom of the article: > > The way I see it, gap buffers are better for searching and memory > usage, but ropes are better at non-local editing patterns. Despite > their simplicity, gap buffers can hold their own in the modern world. > Maybe Emacs was on to something. > > This is also my takeaway from reading a number of other texts on the > subject (not benchmarking personally, though, TBF). The Zed editor, which is heavily performance-oriented, decided to use ropes. They have are a number of blog entries that I find interesting, for example https://zed.dev/blog/zed-decoded-rope-sumtree VSCode uses persistent piece tables https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-12 3:49 ` Gerd Möllmann @ 2024-12-12 19:07 ` Dmitry Gutov 2024-12-12 19:30 ` Eli Zaretskii 2024-12-12 19:40 ` Gerd Möllmann 0 siblings, 2 replies; 35+ messages in thread From: Dmitry Gutov @ 2024-12-12 19:07 UTC (permalink / raw) To: Gerd Möllmann; +Cc: Marcus Harnisch, emacs-devel On 12/12/2024 05:49, Gerd Möllmann wrote: > The Zed editor, which is heavily performance-oriented, decided to use > ropes. They have are a number of blog entries that I find interesting, > for example > > https://zed.dev/blog/zed-decoded-rope-sumtree IIUC their goal there was a use a data structure that can do everything. They also have an ambition to support live collaboration, which we don't have anything for, and not for the reasons of performance. > VSCode uses persistent piece tables > > https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation And this article compared to the previous "array of strings" implementation. Both editors' data structures (not ropes) seem to have something that can be used like our "newline cache", so if anything I would try to understand whether either has an advantage in that area. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-12 19:07 ` Dmitry Gutov @ 2024-12-12 19:30 ` Eli Zaretskii 2024-12-12 19:40 ` Gerd Möllmann 1 sibling, 0 replies; 35+ messages in thread From: Eli Zaretskii @ 2024-12-12 19:30 UTC (permalink / raw) To: Dmitry Gutov; +Cc: gerd.moellmann, mh-gmane, emacs-devel > Date: Thu, 12 Dec 2024 21:07:25 +0200 > Cc: Marcus Harnisch <mh-gmane@online.de>, emacs-devel@gnu.org > From: Dmitry Gutov <dmitry@gutov.dev> > > > https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation > > And this article compared to the previous "array of strings" implementation. > > Both editors' data structures (not ropes) seem to have something that > can be used like our "newline cache", so if anything I would try to > understand whether either has an advantage in that area. Our newline cache doesn't really help to solve the problems in the display engine for which it would be good to find an alternative to gap buffer. So I wouldn't waste time on thinking how to replace the newline cache. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-12 19:07 ` Dmitry Gutov 2024-12-12 19:30 ` Eli Zaretskii @ 2024-12-12 19:40 ` Gerd Möllmann 1 sibling, 0 replies; 35+ messages in thread From: Gerd Möllmann @ 2024-12-12 19:40 UTC (permalink / raw) To: Dmitry Gutov; +Cc: Marcus Harnisch, emacs-devel Dmitry Gutov <dmitry@gutov.dev> writes: > On 12/12/2024 05:49, Gerd Möllmann wrote: > >> The Zed editor, which is heavily performance-oriented, decided to use >> ropes. They have are a number of blog entries that I find interesting, >> for example >> https://zed.dev/blog/zed-decoded-rope-sumtree > > IIUC their goal there was a use a data structure that can do everything. > > They also have an ambition to support live collaboration, which we > don't have anything for, and not for the reasons of performance. I don't find it surprising that being fast isn't the only requirement for a modern editor, TBH. Thing is that from what I observe, Zed seems to have no problem with long lines. But maybe that's a wrong observation. I never did anything that required long-line support. Point I was trying to make is that apparently ropes can support long lines. One could maybe learn from what they do; Rust sources are available, Of course, additional data structures like ones to make positions to actual text and so on belong into the picture. It not just using ropes, but in this case the rope is the basic data structure, and additional data structures depend on the properties of that. > >> VSCode uses persistent piece tables >> https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation > > And this article compared to the previous "array of strings" implementation. > And shows that they are using piece tables, the alternative to ropes, if one doesn't count the other two which are gap buffer and set of lines for the moment. And I think they got long-lines support working, too. And again there are additional data structure involved, and blah blah. > Both editors' data structures (not ropes) seem to have something that > can be used like our "newline cache", so if anything I would try to > understand whether either has an advantage in that area. For me that would be a too narrow perspective, I must admit. But I won't do anything in that area anyway, so please ignore me :-). ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 22:11 ` Dmitry Gutov 2024-12-12 3:49 ` Gerd Möllmann @ 2024-12-12 6:01 ` Eli Zaretskii 1 sibling, 0 replies; 35+ messages in thread From: Eli Zaretskii @ 2024-12-12 6:01 UTC (permalink / raw) To: Dmitry Gutov; +Cc: mh-gmane, emacs-devel, Pip Cet > Date: Thu, 12 Dec 2024 00:11:54 +0200 > From: Dmitry Gutov <dmitry@gutov.dev> > > > https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/ > > To quote from the bottom of the article: > > The way I see it, gap buffers are better for searching and memory > usage, but ropes are better at non-local editing patterns. Despite > their simplicity, gap buffers can hold their own in the modern world. > Maybe Emacs was on to something. > > This is also my takeaway from reading a number of other texts on the > subject (not benchmarking personally, though, TBF). Yes. But one important aspect that blog doesn't touch is the potential effect of changing the buffer text data structure on the various Emacs display issues. Some problems in the current display code that cause slow redisplay in some situations (mainly, very long lines) cannot really be solved as long as we stay with buffer text stored as a long C string, with or without the gap. This important aspect of Emacs still awaits serious research of possible alternatives, IMO. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 5:27 ` Gap buffer problem? Gerd Möllmann 2024-12-11 8:50 ` Pip Cet via Emacs development discussions. @ 2024-12-11 14:22 ` Eli Zaretskii 2024-12-11 15:51 ` Gerd Möllmann 1 sibling, 1 reply; 35+ messages in thread From: Eli Zaretskii @ 2024-12-11 14:22 UTC (permalink / raw) To: Gerd Möllmann; +Cc: emacs-devel, ofv, pipcet > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: Óscar Fuentes <ofv@wanadoo.es>, Pip Cet > <pipcet@protonmail.com> > Date: Wed, 11 Dec 2024 06:27:43 +0100 > > Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org> > writes: > > > To be fair, part of that may be the gap buffer problem rather than GC. > > Could you please tell more about the gap buffer problem? > > I've read a little about the tradeoffs between gap buffers, piece > tables, ropes, but I'm wondering if there is something concrete already > known for sure that is a performance problem in Emacs. Maybe a bug that > has been analyzed or something. > > (I'm asking because I just recently encountered a performance problem > when adding something to xdisp.c:27339 (with cc-mode, Eglot, Corfu), and > editing there was so slow that it was absolutely no fun, and that on a > an M1 pro. Haven't investigated the reason.) Unless you have a huge (and I mean a HUGE) buffer, and some Lisp that moves point, then inserts a small number of characters, then moves point far away and again inserts a small number of characters, etc., I'd be very surprised if the gap buffer caused significant performance problems on a modern CPU. Can you profile that case and post the expanded profile? I'm always happy to be wrong about performance bottlenecks, and profiles are good at proving me wrong. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 14:22 ` Eli Zaretskii @ 2024-12-11 15:51 ` Gerd Möllmann 2024-12-11 17:06 ` Eli Zaretskii 0 siblings, 1 reply; 35+ messages in thread From: Gerd Möllmann @ 2024-12-11 15:51 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel, ofv, pipcet Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: Óscar Fuentes <ofv@wanadoo.es>, Pip Cet >> <pipcet@protonmail.com> >> Date: Wed, 11 Dec 2024 06:27:43 +0100 >> >> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org> >> writes: >> >> > To be fair, part of that may be the gap buffer problem rather than GC. >> >> Could you please tell more about the gap buffer problem? >> >> I've read a little about the tradeoffs between gap buffers, piece >> tables, ropes, but I'm wondering if there is something concrete already >> known for sure that is a performance problem in Emacs. Maybe a bug that >> has been analyzed or something. >> >> (I'm asking because I just recently encountered a performance problem >> when adding something to xdisp.c:27339 (with cc-mode, Eglot, Corfu), and >> editing there was so slow that it was absolutely no fun, and that on a >> an M1 pro. Haven't investigated the reason.) > > Unless you have a huge (and I mean a HUGE) buffer, and some Lisp that > moves point, then inserts a small number of characters, then moves > point far away and again inserts a small number of characters, etc., > I'd be very surprised if the gap buffer caused significant performance > problems on a modern CPU. > > Can you profile that case and post the expanded profile? I'm always > happy to be wrong about performance bottlenecks, and profiles are good > at proving me wrong. Maybe I'll try to investigate that further at some point. Such things always tend to be so time consuming... ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 15:51 ` Gerd Möllmann @ 2024-12-11 17:06 ` Eli Zaretskii 2024-12-11 17:15 ` Gerd Möllmann 0 siblings, 1 reply; 35+ messages in thread From: Eli Zaretskii @ 2024-12-11 17:06 UTC (permalink / raw) To: Gerd Möllmann; +Cc: emacs-devel, ofv, pipcet > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: emacs-devel@gnu.org, ofv@wanadoo.es, pipcet@protonmail.com > Date: Wed, 11 Dec 2024 16:51:56 +0100 > > Eli Zaretskii <eliz@gnu.org> writes: > > > Unless you have a huge (and I mean a HUGE) buffer, and some Lisp that > > moves point, then inserts a small number of characters, then moves > > point far away and again inserts a small number of characters, etc., > > I'd be very surprised if the gap buffer caused significant performance > > problems on a modern CPU. > > > > Can you profile that case and post the expanded profile? I'm always > > happy to be wrong about performance bottlenecks, and profiles are good > > at proving me wrong. > > Maybe I'll try to investigate that further at some point. Such things > always tend to be so time consuming... I meant profiling with "M-x profile-start", then run your slow-down recipe. That should be easy and should not consume any significant time. Analyzing the profile could, but producing it shouldn't. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Gap buffer problem? 2024-12-11 17:06 ` Eli Zaretskii @ 2024-12-11 17:15 ` Gerd Möllmann 0 siblings, 0 replies; 35+ messages in thread From: Gerd Möllmann @ 2024-12-11 17:15 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel, ofv, pipcet Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: emacs-devel@gnu.org, ofv@wanadoo.es, pipcet@protonmail.com >> Date: Wed, 11 Dec 2024 16:51:56 +0100 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> > Unless you have a huge (and I mean a HUGE) buffer, and some Lisp that >> > moves point, then inserts a small number of characters, then moves >> > point far away and again inserts a small number of characters, etc., >> > I'd be very surprised if the gap buffer caused significant performance >> > problems on a modern CPU. >> > >> > Can you profile that case and post the expanded profile? I'm always >> > happy to be wrong about performance bottlenecks, and profiles are good >> > at proving me wrong. >> >> Maybe I'll try to investigate that further at some point. Such things >> always tend to be so time consuming... > > I meant profiling with "M-x profile-start", then run your slow-down > recipe. That should be easy and should not consume any significant > time. Analyzing the profile could, but producing it shouldn't. Plus making it reproducible, if it is. ^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2024-12-13 3:19 UTC | newest] Thread overview: 35+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-12-12 23:40 Gap buffer problem? arthur miller 2024-12-13 3:19 ` Gerd Möllmann [not found] <mailman.39.1723910423.12184.emacs-devel@gnu.org> 2024-12-08 16:49 ` pdumper on Solaris 10 Eli Zaretskii 2024-12-08 17:37 ` Pip Cet via Emacs development discussions. 2024-12-08 18:41 ` Eli Zaretskii 2024-12-09 4:59 ` Stefan Kangas 2024-12-09 14:39 ` Eli Zaretskii 2024-12-10 0:09 ` Stefan Kangas 2024-12-10 12:59 ` Eli Zaretskii 2024-12-10 13:39 ` Óscar Fuentes 2024-12-10 15:38 ` Pip Cet via Emacs development discussions. 2024-12-11 5:27 ` Gap buffer problem? Gerd Möllmann 2024-12-11 8:50 ` Pip Cet via Emacs development discussions. 2024-12-11 9:35 ` Gerd Möllmann 2024-12-11 11:50 ` Pip Cet via Emacs development discussions. 2024-12-11 13:22 ` Gerd Möllmann 2024-12-11 14:53 ` Pip Cet via Emacs development discussions. 2024-12-11 15:33 ` Gerd Möllmann 2024-12-11 16:58 ` Eli Zaretskii 2024-12-11 17:13 ` Gerd Möllmann 2024-12-11 17:45 ` Robert Pluim 2024-12-11 18:11 ` Gerd Möllmann 2024-12-11 19:08 ` Eli Zaretskii 2024-12-11 17:41 ` Pip Cet via Emacs development discussions. 2024-12-11 19:04 ` Eli Zaretskii 2024-12-11 19:54 ` Pip Cet via Emacs development discussions. 2024-12-11 20:26 ` Eli Zaretskii 2024-12-11 22:07 ` Dmitry Gutov 2024-12-11 19:09 ` Gerd Möllmann 2024-12-12 8:55 ` Robert Pluim 2024-12-12 10:14 ` Gerd Möllmann 2024-12-11 12:27 ` Pip Cet via Emacs development discussions. 2024-12-11 13:27 ` Gerd Möllmann 2024-12-11 15:06 ` Marcus Harnisch 2024-12-11 22:11 ` Dmitry Gutov 2024-12-12 3:49 ` Gerd Möllmann 2024-12-12 19:07 ` Dmitry Gutov 2024-12-12 19:30 ` Eli Zaretskii 2024-12-12 19:40 ` Gerd Möllmann 2024-12-12 6:01 ` Eli Zaretskii 2024-12-11 14:22 ` Eli Zaretskii 2024-12-11 15:51 ` Gerd Möllmann 2024-12-11 17:06 ` Eli Zaretskii 2024-12-11 17:15 ` Gerd Möllmann
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).