Gap buffer problem?

all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed

* Gap buffer problem?
  2024-12-10 15:38                 ` Pip Cet via Emacs development discussions.
@ 2024-12-11  5:27                   ` Gerd Möllmann
  2024-12-11  8:50                     ` Pip Cet via Emacs development discussions.
  2024-12-11 14:22                     ` Eli Zaretskii
  0 siblings, 2 replies; 35+ messages in thread
From: Gerd Möllmann @ 2024-12-11  5:27 UTC (permalink / raw)
  To: Pip Cet via Emacs development discussions.; +Cc: Óscar Fuentes, Pip Cet

Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
writes:

> On Tuesday, December 10th, 2024 at 13:39, Óscar Fuentes <ofv@wanadoo.es> wrote:
>> Eli Zaretskii eliz@gnu.org writes:
>> > Maybe so, but why is such a long wait a problem? GC works, and
>> > works well.
>> 
>> Working on certain projects with lsp-mode is a miserable experience due
>> to all the random pauses.
>
> To be fair, part of that may be the gap buffer problem rather than GC.

Could you please tell more about the gap buffer problem?

I've read a little about the tradeoffs between gap buffers, piece
tables, ropes, but I'm wondering if there is something concrete already
known for sure that is a performance problem in Emacs. Maybe a bug that
has been analyzed or something.

(I'm asking because I just recently encountered a performance problem
when adding something to xdisp.c:27339 (with cc-mode, Eglot, Corfu), and
editing there was so slow that it was absolutely no fun, and that on a
an M1 pro. Haven't investigated the reason.)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11  5:27                   ` Gap buffer problem? Gerd Möllmann
@ 2024-12-11  8:50                     ` Pip Cet via Emacs development discussions.
  2024-12-11  9:35                       ` Gerd Möllmann
  2024-12-11 14:22                     ` Eli Zaretskii
  1 sibling, 1 reply; 35+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-11  8:50 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
> writes:
>
>> On Tuesday, December 10th, 2024 at 13:39, Óscar Fuentes <ofv@wanadoo.es> wrote:
>>> Eli Zaretskii eliz@gnu.org writes:
>>> > Maybe so, but why is such a long wait a problem? GC works, and
>>> > works well.
>>>
>>> Working on certain projects with lsp-mode is a miserable experience due
>>> to all the random pauses.
>>
>> To be fair, part of that may be the gap buffer problem rather than GC.
>
> Could you please tell more about the gap buffer problem?

Just anecdotes, I'm afraid.  My problem was a large buffer of test
descriptions for a programming language, and I was running the tests and
modifying the buffer to contain the output for each test in a block
after the test itself. That worked, but running several tests in
parallel, moving back and forth in the buffer to modify text as the
output came in ... not so much.

I also recall discussion somewhere (nullprogram.com, maybe) about
multiple cursors and the gap buffer, and that's also a potential use
case where the gap buffer would make things very slow.

> I've read a little about the tradeoffs between gap buffers, piece
> tables, ropes, but I'm wondering if there is something concrete already
> known for sure that is a performance problem in Emacs. Maybe a bug that
> has been analyzed or something.

I'd be very interested in such a bug. Replacing the gap buffer
assumption is quite hard: IIRC, the main problem is that the regexp code
has been hacked to support gap buffers but not other data structures, so
we'd need to do something about that.

> (I'm asking because I just recently encountered a performance problem
> when adding something to xdisp.c:27339 (with cc-mode, Eglot, Corfu), and
> editing there was so slow that it was absolutely no fun, and that on a
> an M1 pro. Haven't investigated the reason.)

Interesting. It may be worth it to try reproducing that and disabling
modes one by one to find out which one is at fault. I suspect that it's
overlays/the interval tree rather than the gap buffer per se (however,
if we ever replace the gap buffer code, we should make sure its
replacement actually handles buffer text and text properties/intervals
in an integrated manner, rather than storing just buffer text).

Pip

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11  8:50                     ` Pip Cet via Emacs development discussions.
@ 2024-12-11  9:35                       ` Gerd Möllmann
  2024-12-11 11:50                         ` Pip Cet via Emacs development discussions.
  2024-12-11 12:27                         ` Pip Cet via Emacs development discussions.
  0 siblings, 2 replies; 35+ messages in thread
From: Gerd Möllmann @ 2024-12-11  9:35 UTC (permalink / raw)
  To: Pip Cet
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
>> writes:
>>
>>> On Tuesday, December 10th, 2024 at 13:39, Óscar Fuentes <ofv@wanadoo.es> wrote:
>>>> Eli Zaretskii eliz@gnu.org writes:
>>>> > Maybe so, but why is such a long wait a problem? GC works, and
>>>> > works well.
>>>>
>>>> Working on certain projects with lsp-mode is a miserable experience due
>>>> to all the random pauses.
>>>
>>> To be fair, part of that may be the gap buffer problem rather than GC.
>>
>> Could you please tell more about the gap buffer problem?
>
> Just anecdotes, I'm afraid.  My problem was a large buffer of test
> descriptions for a programming language, and I was running the tests and
> modifying the buffer to contain the output for each test in a block
> after the test itself. That worked, but running several tests in
> parallel, moving back and forth in the buffer to modify text as the
> output came in ... not so much.
>
> I also recall discussion somewhere (nullprogram.com, maybe) about
> multiple cursors and the gap buffer, and that's also a potential use
> case where the gap buffer would make things very slow.

Thanks.

>
>> I've read a little about the tradeoffs between gap buffers, piece
>> tables, ropes, but I'm wondering if there is something concrete already
>> known for sure that is a performance problem in Emacs. Maybe a bug that
>> has been analyzed or something.
>
> I'd be very interested in such a bug. Replacing the gap buffer
> assumption is quite hard: IIRC, the main problem is that the regexp code
> has been hacked to support gap buffers but not other data structures, so
> we'd need to do something about that.
>
>> (I'm asking because I just recently encountered a performance problem
>> when adding something to xdisp.c:27339 (with cc-mode, Eglot, Corfu), and
>> editing there was so slow that it was absolutely no fun, and that on a
>> an M1 pro. Haven't investigated the reason.)
>
> Interesting. It may be worth it to try reproducing that and disabling
> modes one by one to find out which one is at fault. I suspect that it's
> overlays/the interval tree rather than the gap buffer per se (however,

Yeah, maybe I'll investigate that further at some point, not sure. I did
try with VSCode and Zed now, though, for no good reason. They don't have
a problem.

> if we ever replace the gap buffer code, we should make sure its
> replacement actually handles buffer text and text properties/intervals
> in an integrated manner, rather than storing just buffer text).
>
> Pip

And if I may add a wish to the future author: Make whatever you use 
persistent data structures, so that one could think of letting redisplay
run concurrently. Really! :-)



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11  9:35                       ` Gerd Möllmann
@ 2024-12-11 11:50                         ` Pip Cet via Emacs development discussions.
  2024-12-11 13:22                           ` Gerd Möllmann
  2024-12-11 12:27                         ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 35+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-11 11:50 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>> if we ever replace the gap buffer code, we should make sure its
>> replacement actually handles buffer text and text properties/intervals
>> in an integrated manner, rather than storing just buffer text).
>>
>> Pip
>
> And if I may add a wish to the future author: Make whatever you use
> persistent data structures, so that one could think of letting redisplay
> run concurrently. Really! :-)

You won't be surprised to hear I've been playing with some code, so
could I ask you to expand on this point? What precisely does redisplay
require? Full snapshotting or would it be sufficient to have
fine-grained locking?

(However, before anyone gets their hopes and/or fears up, my code
depends on disabling most of the regexp code, and the additional number
of garbage-collected objects is so great that I concluded I'd wait for
MPS to land before resuming work on it. One of the few distinct
advantages of the current gap buffer approach is that it doesn't affect
GC...)

I know virtually nothing about redisplay.

Pip

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11  9:35                       ` Gerd Möllmann
  2024-12-11 11:50                         ` Pip Cet via Emacs development discussions.
@ 2024-12-11 12:27                         ` Pip Cet via Emacs development discussions.
  2024-12-11 13:27                           ` Gerd Möllmann
  1 sibling, 1 reply; 35+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-11 12:27 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:

>> I also recall discussion somewhere (nullprogram.com, maybe) about
>> multiple cursors and the gap buffer, and that's also a potential use
>> case where the gap buffer would make things very slow.

It was nullprogram.com, at https://nullprogram.com/blog/2017/09/07/. The
title is "Gap Buffers Are Not Optimized for Multiple Cursors", which
seems accurate to me.

Pip




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 11:50                         ` Pip Cet via Emacs development discussions.
@ 2024-12-11 13:22                           ` Gerd Möllmann
  2024-12-11 14:53                             ` Pip Cet via Emacs development discussions.
  0 siblings, 1 reply; 35+ messages in thread
From: Gerd Möllmann @ 2024-12-11 13:22 UTC (permalink / raw)
  To: Pip Cet
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

[-- Attachment #1: Type: text/plain, Size: 1621 bytes --]

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Pip Cet <pipcet@protonmail.com> writes:
>>> if we ever replace the gap buffer code, we should make sure its
>>> replacement actually handles buffer text and text properties/intervals
>>> in an integrated manner, rather than storing just buffer text).
>>>
>>> Pip
>>
>> And if I may add a wish to the future author: Make whatever you use
>> persistent data structures, so that one could think of letting redisplay
>> run concurrently. Really! :-)
>
> You won't be surprised to hear I've been playing with some code, 

Indeed, I was just thinking to myself "I knew it" :-).
Two thumbs up!

> so could I ask you to expand on this point? What precisely does
> redisplay require? Full snapshotting or would it be sufficient to have
> fine-grained locking?

Maybe it's helpful when I tell something about the background. Some time
last year I asked myself if I could make Emacs more than one of my
plenty of CPU cores without solving the multi-threaded Elisp problem.
And the idea was that I could do that, possibly, by letting redisplay
happen in another thread.

I later realized while thinking about the details, that this undertaking
is an order of magnitude too large for me. Everything taking more than a
few months is. And, in addition, I wouldn't want to do data structures
in C anyway.

So it's history. Won't happen. But, there is an incomplete, terse,
terrible Org file from those times that I kept. I talked a bit about
this with Stefan Monnier and Eli at the time, just FYI. 

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Concurrent redisplay --]
[-- Type: text/x-org, Size: 19475 bytes --]

:PROPERTIES:
:ID:       E5E87FA1-48D1-4753-AAAE-E86FB36F5742
:END:
#+title: Concurrent Redisplay
# -*- mode: org; eval: (auto-fill-mode 1)  -*-
#+STARTUP: content
#+AUTHOR: gerd@gnu.org

* Concurrent Redisplay

Redisplay is currently performed sequentially as part of Emacs'
command loop. The command loop calls =redisplay= to make sure that
changes in buffers are made visible on the screen.

Concurrent redisplay means to change Emacs' architecture, so that
redisplay can be done concurrently with the command loop and running
Elisp.

In this document, I'm trying to get an impression if a parallel
redisplay is achievable, from a very high-level perspective at least.
To make thinking about this possible, I make a number of assumptions
and simplications, which are described in the following.

** Multi-threaded Lisp

This document is no way concerned with making Elisp multi-threaded, if
that's possible, if so how, and what else.

Due to demand from others, I'm also considering the case that a
concurrent redisplay can call Lisp.  How this is made possible, I'm
not considering.

** Possible Gains

- Distribute work on more than one CPU core
- Makes it possible to implement advanced display features in the
  future that would be too costly to perform in a sequential
  redisplay.

** Concurrency Architecture

As a simple to reason about architecture, I assume that Emacs will
consist of two modules:

- The =main= module consists of command loop and Lisp, and runs in
  one thread.
- The =redisplay= module runs in another thread.

Both modules are isolated from each other, and may not access data
owned by the other module.  Communication between modules only happens
by exchanging non-blocking messages.

I could imagine a GUI/TUI backend model in this picture, for good
measure, but won't consider that further.

Random links:

The Problem with Threads
https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf

Plain Threads are the Goto of Modern Computing
https://isocpp.org/blog/2014/12/plain-threads-are-the-goto-of-todays-computing

** Display Model

Concurrent redisplay in the architecture described must work on a
model that it owns.

It is assumed for now, that this model represents a buffer's text plus
a number of properties/variables relevant to redisplay, like faces
that apply to regions of text. See [[*Redisplay Model]].

** Triggering Redisplay

Concurrentl redisplay could choose to display at its own whim, or
triggered by receiving a message from the main module.  It could, for
example, decide to redisplay based on available hardware frame
rates. How this is done is not considered here.

** Display Update

Roughly speaking, current redisplay can be divided into two parts:

- Produce desired glyphs, which describe what the display should look
  like.
- Update the display by comparing current and desired glyphs and
  calling the GUI/TUI backend(s).  Then set the current glyphs to the
  desired glyphs for the next round.

The update part is not considered in the following.  There are several
conceivable ways to implement an update:

- Update in the =redisplay= module
  * Call GUI backend directly
  * Post messages to the =main= module, which calls GUI backend
  * Post messages to a possible GUI module
- Update in the =main= module
  + =Redisplay= posts message containing desired glyphs

This looks like a solvable problem to me. So, for simplicity, I don't
consider it here.  In the following, "redisplay" mainly refers to
producing desired glyphs.

** One Window/Buffer

For simplicity, I only consider one window displaying one buffer.

An interesting, maybe even natural, idea might be to run more than one
redisplay in parallel, one for each window, but that is also not in
scope here.

** Frame-based Redisplay

Also not considered here is the update phase of TTY frames, which
currently requires a view of all windows on a frame at once, which is
commonly called frame-based redisplay.

* Aspects to Consider

The following is a list of things to consider when thinking of making
redisplax concurrent.

** jit-lock

Current redisplay calls run =fontification-functions= to ensure that
properties are up to date for the text being displayed.  This will not
be possible in a concurrent redisplay, unless one assumes that Lisp
can be called from multiple threads.

Stefan (and Eli?) thinks we will eventually need to be able to call
some Lisp-ish code from a concurrent redisplay before it can fully
replace the existing synchronous redisplay.

I myself would accept a display that is not always 100% accorate, for
exmaple because parts of the text have not been fontified yet, or
compositions determined.

Instead of calling Lisp from redisplay, one use background
fontification in the main module together with guesstimates which
regions of text will actually be displayed to make sure that
fontification results are visible as soon as possible.

The redisplay module could also post messages to guide this guessing,
for example at the end of redisplay.

** Hooks and functions

In general, the idea is that =redisplay= posts messages to the =main=
module that certain things have happened.

Hooks/functions run by current redisplay are:
  - window-scroll-functions
  - activate-menubar-hook (redisplay_internal -> prepare_menu_bar ->
    update_menu_bar)
  - update-menubar-hook (update_menu_bar)
  - fontification-functions (jit-lock, and maybe others)
  - ?

Open: what are the expectations of these hook function about
variables, buffers?

** Caches

Redisplay needs access to face, font and image caches which are stored
on frames (owned by main module).

I propose as an idea to remove the caches from frames and give
ownership of these to concurrent redisplay.  Could be a table frame ->
caches.  Some communication is necessary from the main module to
redisplay for frame changes, clearing caches from Lisp, ....

- does face/font/image code called during redisplay call Lisp?
- can face etc. code be run from another thread?

** Glyph matrices

Glyph matrices are a form of cache, so they should be treated likewise.

Depends on which module does the update phase of redisplay.

** Point and mark

Needed for region highlighting.

Create new model version when changing these.  This requires that
creating new model versions is reasonably fast.

** window-start computation

Redisplay posts message back to main module containing information
about what is on the display.  Window start/end could be part of that.

** move_it functions

This concerns Lisp functions like vertical-motion.  Rely (relied?)
partly on current glyph matricx, and otherwise on redisplay functions
that are used without producing glyphs.

- do expect results based on current text, even if not displayed yet.

Open.  No solution in mind that isn't ugly (locking, maybe).

- need display model to operate on.
- Pixel positions in text?

** Mouse highlight

Open. No idea how that is done nowadays.  It used to use the current
matrices, only.

** TTYs

The update phase of redisplay on ttys needs a view of the whole
frame's current and desired glyph matrices for optimization.  This is
done by giving tty frames matrices, and sub-allocating window matrices
from these.

The descriptions above are not affected by this, but it has to be kept
in mind, for the update phase, which is not yet taken into account.

** minibuffer, reading from

Seems to be all the same as other windows, but open.

** Echo area

Open (-> window geometry?)

** Bidi

Eli says:

#+begin_quote
I think this can be removed from the list of issues.  Basically (with
a few caveats, see below, which I don't think change anything in
principle), bidi.c is just a subroutine of set_iterator_to_next, which
implements the non-linear scanning of buffer text needed for bidi
reordering.  It effectively causes set_iterator_to_next to move to the
next character in _visual_ order, not in buffer position order (the
latter would require just incrementing the buffer position).  To do
this, bidi.c needs access to buffer text, and little else.

The caveats I mentioned are:

  . sometimes we need to figure out the base paragraph direction,
    either L2R or R2L (the latter will be displayed with characters
    starting at the right edge of the window instead of the left), in
    this case bidi.c looks back using regexps for the beginning of the
    paragraph, because the Unicode Bidirectional Algorithm mandates
    that the paragraph direction is determined by the first strong
    directional character of the paragraph
  . when the buffer includes display properties, bidi.c treats all
    the characters "covered" by the property as a single neutral
    character, since this is how images and other such stuff needs to
    be handled for display reordering purposes -- this requires
    partial processing of display properties for the single purpose of
    determining whether they are "replacing" or "non-replacing"
    properties, and in the former case to determine at which buffer
    position the display property ends

I don't think these caveats change anything, since again they only
need to access buffer text.

The bidi reordering code maintains a state (struct bidi_it), but it is
a sub-structure of struct it, and lives only as long as the iterator
object lives.
#+end_quote

** Narrowing

Don't remember how that is done.

** Selective display

Open.  Should at long last die.

** Window geometry changes

Open.

** Others?

* Redisplay Model

What is being displayed and how it is displayed depends on

- buffer text
- Properties of the text (overlays, text properties)
- Values of display-relevant variables (=truncate-lines=, ...)

Concurrent redisplay mustown such a model, so that no synchronization
is necessary between =main= and =redisplay= module.

** Buffer Text

*** Copying

One could think of making copies of all what is needed for redisplay
and let concurrent redisplay work on such a model.

I believe this is out of question, for performance reasons.

Such a copy would have to be made by the main module, and that could
easily cost more than what what we do now in sequential redisplay,
especially if we don't exactly know what data redisplay will need
(range of text, for example).

*** Another "Copying" Possibility

Stefan Monnier had another interesting idea that I quote here
#+begin_quote
Note that you can also use the current text representation with
a concurrent redisplay: simply keep a whole copy of the buffer over in
the redisplay side.  Updating that whole copy should usually be quite
efficient thanks to BEG/END_UNCHANGED.
#+end_quote

*** Persistence

To avoid copying, let buffer text be represented as a persistent data
structure.

Conceptually, this persistent data structure contains an ordered set
of buffer-text versions.  When the =main= module modifies buffer text,
new versions are created.  When =redisplay= starts, it picks the
youngest version available as buffer zexz. because it is known that
any modification in =main= will lead to a new version, and not modify an
existing version.

The "piece table" is an interesting representation for such a
persistent buffer text data structure.  Some later descriptions assume
that buffer text uses a persistent piece table.

Some links:

An interesting paper about text representations in general:
https://www.cs.unm.edu/~crowley/papers/sds.pdf

Piece tables:
https://www.averylaird.com/programming/the%20text%20editor/2017/09/30/the-piece-table

An implementation of a persistent piece table:
https://github.com/cdacamar/fredbuf

A blog post about VSCode using piece tables:
https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation

An implementation of a persistent tree:
https://cglab.ca/~dana/pbst/#:~:text=A%20persistent%20binary%20search%20tree,into%2Fdeletion%20from%20the%20tree.

** Properties (Overlays + Text Properties)

Properties that are relevant for redisplay are:

  - =face=
  - =invisible=
  - =display=
  - =composition=

Redisplay needs the following information about properties:

  - start + end position
  - property value

Property values can contain constructs that eval Lisp.  Examples:

  - display (=:when=, =(:height FN)=, ...)
  - =mode-line-format= may also contain =:eval=

If concurrent redisplay cannot call Lisp:

The parts of the property values that require evaluating Lisp must be
part of the display model in evaluated form.

Such a model could contain a map =Lisp_Object= -> =value= (at the time the
model version was current), where

  - the key =Lisp_Object= is the part of the original property value
    containing =:eval=, for instance.  It could be the =cons= cell of
    an =(:eval ...)=
  - =value= is the evaluated value

Changing properties must create new model versions.

  - adding/removing/changing props -> new model version
  - each piece in a piece table could have a list of applicable props
    for the whole piece.
  - mass changes could be done without producing lots of new model
    versions
    + requires that concurrent redisplay doesn't work on a model that
      is mass-updated, which could require synchronization, which is
      ugly.

Possible optimizations:

  - discard/coalesce old model versions in the background, to reduce
    memory footprint? The main module creates new versions, only.
    Redisplay uses only the latest version.

** Variables

The display model must also contain a snapshot of the values of all
relevant variables at the time of the model version.

Relevant values are:

  - truncate-lines
  - scroll-conservatively
  - window, frame, buffer, global values (window-start, ...)
  - ?
  - todo. make a list

* Persistent Data Structures

Wikipedia:
https://en.wikipedia.org/wiki/Persistent_data_structure

** Terminology

Short summary of the terminology:

- persistent
  + general term encompassing veriations below
  + always preserves versions of itself when modified.
  + immutable in the sense that they are not changed in-place.

- partially persistent
  + all versions can be read
  + only newest version can be modified.

- fully persistent
  + all versions can be read
  + every version can be modified.

- confluently persistent
  + fully persistent
  + versions can be merged (melded).

** Links

Kind of a brief overview:
https://academic-accelerator.com/encyclopedia/persistent-data-structure

Irmin: Mergeable ropes
https://inria.hal.science/hal-01099136v1/document
Intersting article:
https://blog.acolyer.org/2015/01/14/mergeable-persistent-data-structures/

Partially and fully persistent DS in C (no merges)
https://github.com/vineeths96/Persistent-Data-Structures

Confluently persistent DS paper
https://arxiv.org/pdf/1301.3388.pdf
https://www.cs.utexas.edu/~ecprice/papers/confluent_swat.pdf

Data visualization with persistent DS
https://www.researchgate.net/publication/258713092_Efficient_Dynamic_Data_Visualization_with_Persistent_Data_Structures

* Redisplay calling Lisp

This is a hypotheical scenario, but Eli and Stefan seem to assume that
it is important to have to make concurrent redisplay acceptable to
users.

- Redisplay calls Lisp to fontify etc.
  + just assuming that is possible in the future
  + as a substitute for storing a snapshot in display model.
  + how calling Lisp from redisplay works on the Lisp side, is
    not yet specified

My conclusions from this:

- Properties must be persistent data structures
  + because no props snapshot in display model
  + because redisplay needs props corresponding to its buffer-text
    version

- Properties must be confluently persistent data structures
  + need to be able to modify prop versions in Lisp
  + need to merge back changes into current versions

- buffer modifications from Lisp either
  + should be prevented (how?)
  + or buffer-text must be confluently persistent
    - merge or discard any buffer-text changes (delete version, if it
      was created).  Probably discard.

- what about if Lisp changes display-relevant variables?
  - unclear

- Other modifications?

** Merging properties

Imagine =fontification-functions= adding properties for font-lock.
These modifications should not be lost once concurrent redisplay has
finished.  That means the properties added to an old version of the
buffer text etc must be merged into newer versions.

- Confluently persistent props require
  + way to merge changes to newer versions
  + consider only merge version n-1 to n
- single prop = (beg end value)
  + position translation
    - know what changed in buffer-texts from n-1 to n
      + wanting translation pos_{n-1} to pos_n
      + piece added in n in front of pos_{n-1} => add length
      + piece delete in from of pos_{n-1} => subtract
      + details depend on buffer-text DS
	- buffer-text DS must take into account that translations must
          be possible
    - looks doable
  + value merging
    + assume interval [beg end] in the following (including beg and end)
    + added property
      - [beg_n end_n] may intersect with 0+ props in version n.
      - say first intersection is in [a b], a >= beg, b <= end with
        value val
      - [beg a-1] -> new value
      - [a b] -> value-dependent handling of old/new value
        (merge/discard..., must be defined)
      - [b+1 end] -> either new value or merge with next intersecing
        prop from n
    + changed values
      - treat as remove + add
    + removed props
      - no direct representation in version of props in version n-1
      - assume scan whole version n for prop of the same kind
	- could record min/max pos of changes in n-1
        - let [a b val] be "interesting" prop in n (face, ...)
        - if there is no intersecting prop in [a b] in n-1, what does
          that mean?
	  - it has been newly added in n compared to n-1
	  - it was in n-2 and been removed in n-1
	  - must find out to resolve
            + can we in all cases?

Assuming everything still open can be resolved, this looks doable, but
it is certainly non-trivial.

* Performance/Memory Considerations

The use of persistent data structures will have an impoact on both
performance and memory consumption.

How large this impact will be I find impossible to tell, especially on
older hardware.  But keep in mind, that at the time this might be
implemented, current hardware will be old.

* Personal Conclusions

I'm stopping here, despite open questions, because I think I have
reached a sufficient level of gut feeling about the subject.

I'd summarize my thoughts as:

- Concurrent redisplay is feasible, both with and without being able
  to call Lisp from redisplay.
- Changing buffer-text representation using a piece table is a big
  enough bite that it is only worth it only if a concurrent redisplay
  comes at some point.
- If performance on old hardware will be acceptable, for some value of
  acceptable, I find unpredictable.
- Concurrent redisplay with the ability to call Lisp from redisplay is
  considerably more complex than without being able to call Lisp.  I'd
  say at least 2 times.
- Concurrent redisplay will not happen unless at least 2 or 3 people
  with enought time decide to work on it.

* Random Grab Bag

- make pieces for long lines (max length of piece)
- concurrency -> dump complicated redisplay optimizations?
- pieces provide more detailed information about what text has changed
  (compared to BEG_UNCHANGED and END_UNCHANGED).

Zed editor, rasterization on GPU
https://zed.dev/blog/videogame

# end.

[-- Attachment #3: Type: text/plain, Size: 630 bytes --]

It's probably not very helpful, but at least I get the idea of a
concurrent redisplay planted into brains, where it can do it's evil work
:-).

>
> (However, before anyone gets their hopes and/or fears up, my code
> depends on disabling most of the regexp code, and the additional number
> of garbage-collected objects is so great that I concluded I'd wait for
> MPS to land before resuming work on it. One of the few distinct
> advantages of the current gap buffer approach is that it doesn't affect
> GC...)
>
> I know virtually nothing about redisplay.
>
> Pip

What I've written is pretty high-level, nothing to worry about.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 12:27                         ` Pip Cet via Emacs development discussions.
@ 2024-12-11 13:27                           ` Gerd Möllmann
  2024-12-11 15:06                             ` Marcus Harnisch
  0 siblings, 1 reply; 35+ messages in thread
From: Gerd Möllmann @ 2024-12-11 13:27 UTC (permalink / raw)
  To: Pip Cet
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>>> I also recall discussion somewhere (nullprogram.com, maybe) about
>>> multiple cursors and the gap buffer, and that's also a potential use
>>> case where the gap buffer would make things very slow.
>
> It was nullprogram.com, at https://nullprogram.com/blog/2017/09/07/. The
> title is "Gap Buffers Are Not Optimized for Multiple Cursors", which
> seems accurate to me.
>
> Pip

Thanks! Added to my collection. 



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11  5:27                   ` Gap buffer problem? Gerd Möllmann
  2024-12-11  8:50                     ` Pip Cet via Emacs development discussions.
@ 2024-12-11 14:22                     ` Eli Zaretskii
  2024-12-11 15:51                       ` Gerd Möllmann
  1 sibling, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2024-12-11 14:22 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: emacs-devel, ofv, pipcet

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Óscar Fuentes <ofv@wanadoo.es>,  Pip Cet
>  <pipcet@protonmail.com>
> Date: Wed, 11 Dec 2024 06:27:43 +0100
> 
> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
> writes:
> 
> > To be fair, part of that may be the gap buffer problem rather than GC.
> 
> Could you please tell more about the gap buffer problem?
> 
> I've read a little about the tradeoffs between gap buffers, piece
> tables, ropes, but I'm wondering if there is something concrete already
> known for sure that is a performance problem in Emacs. Maybe a bug that
> has been analyzed or something.
> 
> (I'm asking because I just recently encountered a performance problem
> when adding something to xdisp.c:27339 (with cc-mode, Eglot, Corfu), and
> editing there was so slow that it was absolutely no fun, and that on a
> an M1 pro. Haven't investigated the reason.)

Unless you have a huge (and I mean a HUGE) buffer, and some Lisp that
moves point, then inserts a small number of characters, then moves
point far away and again inserts a small number of characters, etc.,
I'd be very surprised if the gap buffer caused significant performance
problems on a modern CPU.

Can you profile that case and post the expanded profile?  I'm always
happy to be wrong about performance bottlenecks, and profiles are good
at proving me wrong.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 13:22                           ` Gerd Möllmann
@ 2024-12-11 14:53                             ` Pip Cet via Emacs development discussions.
  2024-12-11 15:33                               ` Gerd Möllmann
  0 siblings, 1 reply; 35+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-11 14:53 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>
>>> Pip Cet <pipcet@protonmail.com> writes:
>>>> if we ever replace the gap buffer code, we should make sure its
>>>> replacement actually handles buffer text and text properties/intervals
>>>> in an integrated manner, rather than storing just buffer text).
>>>>
>>>> Pip
>>>
>>> And if I may add a wish to the future author: Make whatever you use
>>> persistent data structures, so that one could think of letting redisplay
>>> run concurrently. Really! :-)
>>
>> You won't be surprised to hear I've been playing with some code,
>
> Indeed, I was just thinking to myself "I knew it" :-).
> Two thumbs up!
>
>> so could I ask you to expand on this point? What precisely does
>> redisplay require? Full snapshotting or would it be sufficient to have
>> fine-grained locking?
>
> Maybe it's helpful when I tell something about the background. Some time
> last year I asked myself if I could make Emacs more than one of my
> plenty of CPU cores without solving the multi-threaded Elisp problem.
> And the idea was that I could do that, possibly, by letting redisplay
> happen in another thread.

This may be a very stupid idea, but why not use a separate process?
fork() is fast on GNU/Linux, and I suspect on macOS too, and the
redisplay child would receive a consistent snapshot of the data to
inspect and/or modify while coming up with the redisplay instructions,
which it would then send back via a pipe or shared memory to be executed
in the main process.

I suggested doing something similar for GC (the GC child would perform a
full GC and send back the Lisp_Objects which are definitely unreachable
via a pipe. No, I never figured out how to make that work for weak hash
tables which may resurrect references, I just made all hash tables
strong...), and in that case the pipe seemed sufficient for the amount
of data that was transferred, but I'm not sure how compact (or
otherwise) serialized redisplay "instructions" would be.

One issue I see is that fork() does a lot of housekeeping work in
addition to marking the child's memory as a COW copy of the parent's
memory at the time of the fork(). ISTR you can split that process on
GNU/Linux (probably not Android), so you'd already have a prepared
thread/LWP which wouldn't need to "start up" when you un-share the
memory, but I can't find the relevant manpage right now. However, I have
no real idea just how bad the fork() latency would be (as you point out,
most people have more CPU cores than they can use, so I don't consider
the approximate doubling of CPU usage a problem).

This would deal very nicely with fontification code attempting to modify
data it shouldn't, by ignoring such modifications. It would also deal
with catastrophic failure in the redisplay code, as it's insulated in a
separate process and we could just print a nice message in the main
process rather than crashing all of Emacs.

I'm emphatically not suggesting letting the redisplay child actually
communicate with the X server or equivalent. That would be much more
difficult.

In fact, I think a good way to test this approach would be to use the
tty code, since there's already a standard serialization of redisplay
instructions for tty displays: VT100 escape sequences.

> I later realized while thinking about the details, that this undertaking
> is an order of magnitude too large for me. Everything taking more than a
> few months is. And, in addition, I wouldn't want to do data structures
> in C anyway.

I think the VT100 case could be done as a weekend project (those always
end up taking several weeks for me...), but I'm not sure it's worth it
as VT100 redisplay isn't the common use case, and the performance
problems are more visible on GUI terminals.

And, like pretty much all Emacs ideas, this depends on having a better
GC.

(However, I've just experimented with an 8 GB process forking, and it's
much slower than I'd hoped for - about 70 ms.  I wouldn't be surprised
if most of that cost is setting up page tables for the ridiculously
small 4KB page size x86 uses, so it may work a lot better for AArch64
systems such as yours).

Pip

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 13:27                           ` Gerd Möllmann
@ 2024-12-11 15:06                             ` Marcus Harnisch
  2024-12-11 22:11                               ` Dmitry Gutov
  0 siblings, 1 reply; 35+ messages in thread
From: Marcus Harnisch @ 2024-12-11 15:06 UTC (permalink / raw)
  To: emacs-devel

On 11/12/2024 14.27, Gerd Möllmann wrote:
> Pip Cet <pipcet@protonmail.com> writes:
> 
>> It was nullprogram.com, at https://nullprogram.com/blog/2017/09/07/. The
>> title is "Gap Buffers Are Not Optimized for Multiple Cursors", which
>> seems accurate to me.
> 
> Thanks! Added to my collection.

You may be interested in this article, too, which refererences the blog 
post above:
https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 14:53                             ` Pip Cet via Emacs development discussions.
@ 2024-12-11 15:33                               ` Gerd Möllmann
  2024-12-11 16:58                                 ` Eli Zaretskii
  0 siblings, 1 reply; 35+ messages in thread
From: Gerd Möllmann @ 2024-12-11 15:33 UTC (permalink / raw)
  To: Pip Cet
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>>
>>>> Pip Cet <pipcet@protonmail.com> writes:
>>>>> if we ever replace the gap buffer code, we should make sure its
>>>>> replacement actually handles buffer text and text properties/intervals
>>>>> in an integrated manner, rather than storing just buffer text).
>>>>>
>>>>> Pip
>>>>
>>>> And if I may add a wish to the future author: Make whatever you use
>>>> persistent data structures, so that one could think of letting redisplay
>>>> run concurrently. Really! :-)
>>>
>>> You won't be surprised to hear I've been playing with some code,
>>
>> Indeed, I was just thinking to myself "I knew it" :-).
>> Two thumbs up!
>>
>>> so could I ask you to expand on this point? What precisely does
>>> redisplay require? Full snapshotting or would it be sufficient to have
>>> fine-grained locking?
>>
>> Maybe it's helpful when I tell something about the background. Some time
>> last year I asked myself if I could make Emacs more than one of my
>> plenty of CPU cores without solving the multi-threaded Elisp problem.
>> And the idea was that I could do that, possibly, by letting redisplay
>> happen in another thread.
>
> This may be a very stupid idea, but why not use a separate process?

Not stupid at all. I thought about something similar in a different
context, namely if one could decouple the GUI part of Emacs from the
rest.

Something like that has been done by Eberhard Mattes for OS/2 with the
old redisplay. He had to do that because the whole Presentation Manager
(GUI) in OS/2 would block, for all process, when an application did not
timely handle events and return to the PM. Something like that.
Eberhard's OS/2 Emacs had one process doing the GUI stuff, and one for
the rest. Both communicated with each other using a defined message
protocol. It worked. Don't remember what he used for process
communication, pipes or something else.

I got stuck with this idea because everything seemed to depend on
everything else nowadays. Redisplay needs to execute Lisp, Font backends
I think, not sure. Some GUIs call redisplay (nsterm). And then I
imagined the licensing issues, and dropped the idea. Although - NS could
really need something done, IMO, which was the reason I thought about
that in the first place. NS is not working for me at least. I always
wonder why nobody else has the same freezing problems that I have.

I think the same dependency problems also creep up with to concurrent
redisplay, don't know. Values of variables, faces, jit-lock, and so on.
I think it would be "easier" to handle if one has everything in one
process.

But in principle both could be done. An actor model.

> fork() is fast on GNU/Linux, and I suspect on macOS too, and the
> redisplay child would receive a consistent snapshot of the data to
> inspect and/or modify while coming up with the redisplay instructions,
> which it would then send back via a pipe or shared memory to be executed
> in the main process.
>
> I suggested doing something similar for GC (the GC child would perform a
> full GC and send back the Lisp_Objects which are definitely unreachable
> via a pipe. No, I never figured out how to make that work for weak hash
> tables which may resurrect references, I just made all hash tables
> strong...), and in that case the pipe seemed sufficient for the amount
> of data that was transferred, but I'm not sure how compact (or
> otherwise) serialized redisplay "instructions" would be.
>
> One issue I see is that fork() does a lot of housekeeping work in
> addition to marking the child's memory as a COW copy of the parent's
> memory at the time of the fork(). ISTR you can split that process on
> GNU/Linux (probably not Android), so you'd already have a prepared
> thread/LWP which wouldn't need to "start up" when you un-share the
> memory, but I can't find the relevant manpage right now. However, I have
> no real idea just how bad the fork() latency would be (as you point out,
> most people have more CPU cores than they can use, so I don't consider
> the approximate doubling of CPU usage a problem).
>
> This would deal very nicely with fontification code attempting to modify
> data it shouldn't, by ignoring such modifications. It would also deal
> with catastrophic failure in the redisplay code, as it's insulated in a
> separate process and we could just print a nice message in the main
> process rather than crashing all of Emacs.
>
> I'm emphatically not suggesting letting the redisplay child actually
> communicate with the X server or equivalent. That would be much more
> difficult.
>
> In fact, I think a good way to test this approach would be to use the
> tty code, since there's already a standard serialization of redisplay
> instructions for tty displays: VT100 escape sequences.
>
>> I later realized while thinking about the details, that this undertaking
>> is an order of magnitude too large for me. Everything taking more than a
>> few months is. And, in addition, I wouldn't want to do data structures
>> in C anyway.
>
> I think the VT100 case could be done as a weekend project (those always
> end up taking several weeks for me...), but I'm not sure it's worth it
> as VT100 redisplay isn't the common use case, and the performance
> problems are more visible on GUI terminals.

Yes. In a way, it's already the case that the GUI part of Emacs that I
described above for OS/2, is the terminal emulator, and the protocol is
VT100.

> And, like pretty much all Emacs ideas, this depends on having a better
> GC.
>
> (However, I've just experimented with an 8 GB process forking, and it's
> much slower than I'd hoped for - about 70 ms.  I wouldn't be surprised
> if most of that cost is setting up page tables for the ridiculously
> small 4KB page size x86 uses, so it may work a lot better for AArch64
> systems such as yours).
>
> Pip



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 14:22                     ` Eli Zaretskii
@ 2024-12-11 15:51                       ` Gerd Möllmann
  2024-12-11 17:06                         ` Eli Zaretskii
  0 siblings, 1 reply; 35+ messages in thread
From: Gerd Möllmann @ 2024-12-11 15:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, ofv, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: Óscar Fuentes <ofv@wanadoo.es>,  Pip Cet
>>  <pipcet@protonmail.com>
>> Date: Wed, 11 Dec 2024 06:27:43 +0100
>> 
>> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
>> writes:
>> 
>> > To be fair, part of that may be the gap buffer problem rather than GC.
>> 
>> Could you please tell more about the gap buffer problem?
>> 
>> I've read a little about the tradeoffs between gap buffers, piece
>> tables, ropes, but I'm wondering if there is something concrete already
>> known for sure that is a performance problem in Emacs. Maybe a bug that
>> has been analyzed or something.
>> 
>> (I'm asking because I just recently encountered a performance problem
>> when adding something to xdisp.c:27339 (with cc-mode, Eglot, Corfu), and
>> editing there was so slow that it was absolutely no fun, and that on a
>> an M1 pro. Haven't investigated the reason.)
>
> Unless you have a huge (and I mean a HUGE) buffer, and some Lisp that
> moves point, then inserts a small number of characters, then moves
> point far away and again inserts a small number of characters, etc.,
> I'd be very surprised if the gap buffer caused significant performance
> problems on a modern CPU.
>
> Can you profile that case and post the expanded profile?  I'm always
> happy to be wrong about performance bottlenecks, and profiles are good
> at proving me wrong.

Maybe I'll try to investigate that further at some point. Such things
always tend to be so time consuming...



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 15:33                               ` Gerd Möllmann
@ 2024-12-11 16:58                                 ` Eli Zaretskii
  2024-12-11 17:13                                   ` Gerd Möllmann
  2024-12-11 17:41                                   ` Pip Cet via Emacs development discussions.
  0 siblings, 2 replies; 35+ messages in thread
From: Eli Zaretskii @ 2024-12-11 16:58 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, emacs-devel, ofv

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>,
>  Óscar Fuentes <ofv@wanadoo.es>
> Date: Wed, 11 Dec 2024 16:33:18 +0100
> 
> Pip Cet <pipcet@protonmail.com> writes:
> 
> > This may be a very stupid idea, but why not use a separate process?
> 
> Not stupid at all. I thought about something similar in a different
> context, namely if one could decouple the GUI part of Emacs from the
> rest.

If it can be done by two processes, it can also be done by two threads
in the same process.  Right?



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 15:51                       ` Gerd Möllmann
@ 2024-12-11 17:06                         ` Eli Zaretskii
  2024-12-11 17:15                           ` Gerd Möllmann
  0 siblings, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2024-12-11 17:06 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: emacs-devel, ofv, pipcet

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: emacs-devel@gnu.org,  ofv@wanadoo.es,  pipcet@protonmail.com
> Date: Wed, 11 Dec 2024 16:51:56 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Unless you have a huge (and I mean a HUGE) buffer, and some Lisp that
> > moves point, then inserts a small number of characters, then moves
> > point far away and again inserts a small number of characters, etc.,
> > I'd be very surprised if the gap buffer caused significant performance
> > problems on a modern CPU.
> >
> > Can you profile that case and post the expanded profile?  I'm always
> > happy to be wrong about performance bottlenecks, and profiles are good
> > at proving me wrong.
> 
> Maybe I'll try to investigate that further at some point. Such things
> always tend to be so time consuming...

I meant profiling with "M-x profile-start", then run your slow-down
recipe.  That should be easy and should not consume any significant
time.  Analyzing the profile could, but producing it shouldn't.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 16:58                                 ` Eli Zaretskii
@ 2024-12-11 17:13                                   ` Gerd Möllmann
  2024-12-11 17:45                                     ` Robert Pluim
  2024-12-11 17:41                                   ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 35+ messages in thread
From: Gerd Möllmann @ 2024-12-11 17:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, emacs-devel, ofv

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>,
>>  Óscar Fuentes <ofv@wanadoo.es>
>> Date: Wed, 11 Dec 2024 16:33:18 +0100
>> 
>> Pip Cet <pipcet@protonmail.com> writes:
>> 
>> > This may be a very stupid idea, but why not use a separate process?
>> 
>> Not stupid at all. I thought about something similar in a different
>> context, namely if one could decouple the GUI part of Emacs from the
>> rest.
>
> If it can be done by two processes, it can also be done by two threads
> in the same process.  Right?

Yes, I think so.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 17:06                         ` Eli Zaretskii
@ 2024-12-11 17:15                           ` Gerd Möllmann
  0 siblings, 0 replies; 35+ messages in thread
From: Gerd Möllmann @ 2024-12-11 17:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, ofv, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: emacs-devel@gnu.org,  ofv@wanadoo.es,  pipcet@protonmail.com
>> Date: Wed, 11 Dec 2024 16:51:56 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > Unless you have a huge (and I mean a HUGE) buffer, and some Lisp that
>> > moves point, then inserts a small number of characters, then moves
>> > point far away and again inserts a small number of characters, etc.,
>> > I'd be very surprised if the gap buffer caused significant performance
>> > problems on a modern CPU.
>> >
>> > Can you profile that case and post the expanded profile?  I'm always
>> > happy to be wrong about performance bottlenecks, and profiles are good
>> > at proving me wrong.
>> 
>> Maybe I'll try to investigate that further at some point. Such things
>> always tend to be so time consuming...
>
> I meant profiling with "M-x profile-start", then run your slow-down
> recipe.  That should be easy and should not consume any significant
> time.  Analyzing the profile could, but producing it shouldn't.

Plus making it reproducible, if it is.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 16:58                                 ` Eli Zaretskii
  2024-12-11 17:13                                   ` Gerd Möllmann
@ 2024-12-11 17:41                                   ` Pip Cet via Emacs development discussions.
  2024-12-11 19:04                                     ` Eli Zaretskii
  2024-12-11 19:09                                     ` Gerd Möllmann
  1 sibling, 2 replies; 35+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-11 17:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Gerd Möllmann, emacs-devel, ofv

"Eli Zaretskii" <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>,
>>  Óscar Fuentes <ofv@wanadoo.es>
>> Date: Wed, 11 Dec 2024 16:33:18 +0100
>>
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>> > This may be a very stupid idea, but why not use a separate process?
>>
>> Not stupid at all. I thought about something similar in a different
>> context, namely if one could decouple the GUI part of Emacs from the
>> rest.
>
> If it can be done by two processes, it can also be done by two threads
> in the same process.  Right?

AFAIU: No, not right.

I may have misunderstood, but if the idea is to preserve a consistent
state of all Lisp data and buffer text for redisplay to use, the easiest
way to ensure that consistency is fork().  The other ways, such as
copying all heap objects that might be used by redisplay (and adjusting
all internal pointers in such heap objects to point to the copy rather
than the original data), probably will end up either being a lot slower
or being very specific to the system we're running on.

I know that implementing fork() on Windows is very slow, and I don't
know about a comparable snapshotting mechanism for Windows.

To be honest, though, I'm a bit disappointed that GNU/Linux appears to
make fork() take significant time that is proportional to the size of
the mapped address space, even if it's never COW-faulted in.  I'm pretty
sure that could be avoided (and I hope the Linux kernel avoids doing it
for swapped-out memory, not that anyone still does that).

Concurrent access to Lisp data from several threads requires a locking
mechanism (fine-grained or coarse) for all such data, and possibly
requires rewriting addresses, which means no "ambiguous" references
whatsoever. That's a lot harder than using MPS, which generously allows
for ambiguous references.

It's possible we could have gotten away with concurrent access by the
redisplay machinery if we inhibited GC while the redisplay thread was
busy inspecting our data, but inhibiting MPS GC is a lot harder and
shouldn't be done for ordinary operations.

Oh, and of course mmap() breaks fork()'s snapshotting magic.

The reason I said this depends on a new GC is a bit subtle, by the way:
the old GC does best if we sacrifice a lot of memory and only run it
rarely, which we can usually get away with because RAM is cheap.  With a
fork()-based approach, memory usage comes with a performance penalty for
every fork(), so we need to reduce both memory usage and GC time, which
we can't do with non-incremental GC.

The last reason it's difficult is that MPS isn't optimized for
multi-thread settings: in an ideal world, "scanning" a memory area would
use a secondary mapping of the memory, known only to the scanning code,
so other threads could continue running while an area is being scanned.
With MPS, there is only one mapping, so we need to stop all other
threads while one thread un-mprotect()s a memory area to scan it.

Unless MPS breaks POSIX threads in some spectacular way, fork() should
still work, though.

Pip

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 17:13                                   ` Gerd Möllmann
@ 2024-12-11 17:45                                     ` Robert Pluim
  2024-12-11 18:11                                       ` Gerd Möllmann
  2024-12-11 19:08                                       ` Eli Zaretskii
  0 siblings, 2 replies; 35+ messages in thread
From: Robert Pluim @ 2024-12-11 17:45 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, pipcet, emacs-devel, ofv

>>>>> On Wed, 11 Dec 2024 18:13:41 +0100, Gerd Möllmann <gerd.moellmann@gmail.com> said:

    Gerd> Eli Zaretskii <eliz@gnu.org> writes:
    >>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
    >>> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>,
    >>> Óscar Fuentes <ofv@wanadoo.es>
    >>> Date: Wed, 11 Dec 2024 16:33:18 +0100
    >>> 
    >>> Pip Cet <pipcet@protonmail.com> writes:
    >>> 
    >>> > This may be a very stupid idea, but why not use a separate process?
    >>> 
    >>> Not stupid at all. I thought about something similar in a different
    >>> context, namely if one could decouple the GUI part of Emacs from the
    >>> rest.
    >> 
    >> If it can be done by two processes, it can also be done by two threads
    >> in the same process.  Right?

    Gerd> Yes, I think so.

But then you have to throw a lock over all the memory in the
non-display thread that might affect redisplay (although come to think
of it, youʼd probably need that even when using fork)

Robert
-- 



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 17:45                                     ` Robert Pluim
@ 2024-12-11 18:11                                       ` Gerd Möllmann
  2024-12-11 19:08                                       ` Eli Zaretskii
  1 sibling, 0 replies; 35+ messages in thread
From: Gerd Möllmann @ 2024-12-11 18:11 UTC (permalink / raw)
  To: Robert Pluim; +Cc: Eli Zaretskii, pipcet, emacs-devel, ofv

Robert Pluim <rpluim@gmail.com> writes:

>>>>>> On Wed, 11 Dec 2024 18:13:41 +0100, Gerd Möllmann <gerd.moellmann@gmail.com> said:
>
>     Gerd> Eli Zaretskii <eliz@gnu.org> writes:
>     >>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>     >>> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>,
>     >>> Óscar Fuentes <ofv@wanadoo.es>
>     >>> Date: Wed, 11 Dec 2024 16:33:18 +0100
>     >>> 
>     >>> Pip Cet <pipcet@protonmail.com> writes:
>     >>> 
>     >>> > This may be a very stupid idea, but why not use a separate process?
>     >>> 
>     >>> Not stupid at all. I thought about something similar in a different
>     >>> context, namely if one could decouple the GUI part of Emacs from the
>     >>> rest.
>     >> 
>     >> If it can be done by two processes, it can also be done by two threads
>     >> in the same process.  Right?
>
>     Gerd> Yes, I think so.
>
> But then you have to throw a lock over all the memory in the
> non-display thread that might affect redisplay (although come to think
> of it, youʼd probably need that even when using fork)
>
> Robert

Well, it depends. Assume you have a solution that works in a second
process. That solution wouldn't use things in the first process because
it can't. Now move that code of the second process to the first process,
and make two threads out of the two process, and replace process
communication with inter-thread message passing like in an actor model.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 17:41                                   ` Pip Cet via Emacs development discussions.
@ 2024-12-11 19:04                                     ` Eli Zaretskii
  2024-12-11 19:54                                       ` Pip Cet via Emacs development discussions.
  2024-12-11 19:09                                     ` Gerd Möllmann
  1 sibling, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2024-12-11 19:04 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, emacs-devel, ofv

> Date: Wed, 11 Dec 2024 17:41:29 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, emacs-devel@gnu.org, ofv@wanadoo.es
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> > If it can be done by two processes, it can also be done by two threads
> > in the same process.  Right?
> 
> AFAIU: No, not right.
> 
> I may have misunderstood, but if the idea is to preserve a consistent
> state of all Lisp data and buffer text for redisplay to use, the easiest
> way to ensure that consistency is fork().  The other ways, such as
> copying all heap objects that might be used by redisplay (and adjusting
> all internal pointers in such heap objects to point to the copy rather
> than the original data), probably will end up either being a lot slower
> or being very specific to the system we're running on.

How do you do the same in a forked process?  The glyph matrices are
not allocated once, they are reallocated constantly.  Are you going to
fork each time?  And if you are, how is it different from copying
stuff lazily within the same process, exactly like the OS does with
forked processes?

> I know that implementing fork() on Windows is very slow, and I don't
> know about a comparable snapshotting mechanism for Windows.

I'm not talking about Windows, I'm talking about Posix systems.

Anyway, the fact that redisplay calls Lisp and Lisp calls back into
redisplay all but kills this idea.  Gerd's document has also other
gotchas.  We didn't just give up easily back when we discussed that.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 17:45                                     ` Robert Pluim
  2024-12-11 18:11                                       ` Gerd Möllmann
@ 2024-12-11 19:08                                       ` Eli Zaretskii
  1 sibling, 0 replies; 35+ messages in thread
From: Eli Zaretskii @ 2024-12-11 19:08 UTC (permalink / raw)
  To: Robert Pluim; +Cc: gerd.moellmann, pipcet, emacs-devel, ofv

> From: Robert Pluim <rpluim@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  pipcet@protonmail.com,
>   emacs-devel@gnu.org,  ofv@wanadoo.es
> Date: Wed, 11 Dec 2024 18:45:15 +0100
> 
> >>>>> On Wed, 11 Dec 2024 18:13:41 +0100, Gerd Möllmann <gerd.moellmann@gmail.com> said:
> 
>     Gerd> Eli Zaretskii <eliz@gnu.org> writes:
>     >> 
>     >> If it can be done by two processes, it can also be done by two threads
>     >> in the same process.  Right?
> 
>     Gerd> Yes, I think so.
> 
> But then you have to throw a lock over all the memory in the
> non-display thread that might affect redisplay

No, you copy on write.  Exactly like the OS does with forked process.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 17:41                                   ` Pip Cet via Emacs development discussions.
  2024-12-11 19:04                                     ` Eli Zaretskii
@ 2024-12-11 19:09                                     ` Gerd Möllmann
  2024-12-12  8:55                                       ` Robert Pluim
  1 sibling, 1 reply; 35+ messages in thread
From: Gerd Möllmann @ 2024-12-11 19:09 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, emacs-devel, ofv

Pip Cet <pipcet@protonmail.com> writes:

> "Eli Zaretskii" <eliz@gnu.org> writes:
>
>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>,
>>>  Óscar Fuentes <ofv@wanadoo.es>
>>> Date: Wed, 11 Dec 2024 16:33:18 +0100
>>>
>>> Pip Cet <pipcet@protonmail.com> writes:
>>>
>>> > This may be a very stupid idea, but why not use a separate process?
>>>
>>> Not stupid at all. I thought about something similar in a different
>>> context, namely if one could decouple the GUI part of Emacs from the
>>> rest.
>>
>> If it can be done by two processes, it can also be done by two threads
>> in the same process.  Right?
>
> AFAIU: No, not right.
>
> I may have misunderstood, but if the idea is to preserve a consistent
> state of all Lisp data and buffer text for redisplay to use, the easiest
> way to ensure that consistency is fork().  The other ways, such as
> copying all heap objects that might be used by redisplay (and adjusting
> all internal pointers in such heap objects to point to the copy rather
> than the original data), probably will end up either being a lot slower
> or being very specific to the system we're running on.

I may also be misunderstanding, but in principle, I agree with Eli.

Say we have processes A and B communicating with each other. Take the
code of A and move it to B, possibly with some automatic transformations
if A and B have the same source code. Make two threads in the result
process for A and B. Replace inter-process message passing with
inter-thread message passing. Initial message may be "fork" transferring
the world of thread A to thread B.

But I'm also thinking too abstract sometimes.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 19:04                                     ` Eli Zaretskii
@ 2024-12-11 19:54                                       ` Pip Cet via Emacs development discussions.
  2024-12-11 20:26                                         ` Eli Zaretskii
  2024-12-11 22:07                                         ` Dmitry Gutov
  0 siblings, 2 replies; 35+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-11 19:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, emacs-devel, ofv

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Wed, 11 Dec 2024 17:41:29 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, emacs-devel@gnu.org, ofv@wanadoo.es
>>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>> > If it can be done by two processes, it can also be done by two threads
>> > in the same process.  Right?
>>
>> AFAIU: No, not right.
>>
>> I may have misunderstood, but if the idea is to preserve a consistent
>> state of all Lisp data and buffer text for redisplay to use, the easiest
>> way to ensure that consistency is fork().  The other ways, such as
>> copying all heap objects that might be used by redisplay (and adjusting
>> all internal pointers in such heap objects to point to the copy rather
>> than the original data), probably will end up either being a lot slower
>> or being very specific to the system we're running on.
>
> How do you do the same in a forked process?  The glyph matrices are
> not allocated once, they are reallocated constantly.  Are you going to
> fork each time?

Not necessarily "each time" (meaning once per frame/keystroke), but
quite frequently, yes.

> And if you are, how is it different from copying
> stuff lazily within the same process, exactly like the OS does with
> forked processes?

It is very different indeed:

Copying within a process involves changing the (virtual) addresses that
the copied data is at (unless you use an architecture-specific
implementation of TLS). The beauty of fork() is that the virtual
addresses stay the same, so we don't need to adjust any pointers, which
we cannot do because there are ambiguous references to Lisp data.

IOW, no, you can't lazily create two copies of Lisp data in the same
process. You have to do so eagerly, adjusting any and all pointers (and
only those) in the Lisp data before the new data is read for the first
time (because what you read might be a pointer, and then it needs to be
adjusted). With fork(), you only have to make the copy when the data is
being written to, by either process.

(Of course you can just access all memory through some sort of API that
translates addresses for you, but that would effectively mean we'd be
running Emacs on a virtual machine and simulate fork() on it).

> Anyway, the fact that redisplay calls Lisp and Lisp calls back into
> redisplay all but kills this idea.  Gerd's document has also other
> gotchas.  We didn't just give up easily back when we discussed that.

I don't see why the redisplay process would not be able to call Lisp;
it's a full Emacs process (with a single thread), except it doesn't have
an FD or socket for the window system, and has an extra pipe to
communicate with the parent process instead.

It's true that the side effects of the called Lisp code won't be visible
to the next redisplay process, but such side effects are perilous
anyway, and avoiding them would seem to me to be a feature, not a bug.

However, if such side effects are desired, we can use IPC to execute
Lisp in the main process (some effort) or simply send a "this redisplay
needs to happen synchronously" message to the main process, which would
kill the current redisplay process and perform a synchronous redisplay
(as not all operating systems support fork() reliably, we'll have to
retain the ability to redisplay synchronously, either way).

But, to be perfectly honest, I'm not sure redisplay is slowing me down
the way traditional GC is.

Pip

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 19:54                                       ` Pip Cet via Emacs development discussions.
@ 2024-12-11 20:26                                         ` Eli Zaretskii
  2024-12-11 22:07                                         ` Dmitry Gutov
  1 sibling, 0 replies; 35+ messages in thread
From: Eli Zaretskii @ 2024-12-11 20:26 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, emacs-devel, ofv

> Date: Wed, 11 Dec 2024 19:54:07 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, emacs-devel@gnu.org, ofv@wanadoo.es
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> > Anyway, the fact that redisplay calls Lisp and Lisp calls back into
> > redisplay all but kills this idea.  Gerd's document has also other
> > gotchas.  We didn't just give up easily back when we discussed that.
> 
> I don't see why the redisplay process would not be able to call Lisp;
> it's a full Emacs process (with a single thread)

So you are going to fork on each redisplay?

And how will you pass back the results of Lisp evaluation, if the
other process meanwhile changes the global state (as it's running
concurrently)?

> except it doesn't have
> an FD or socket for the window system, and has an extra pipe to
> communicate with the parent process instead.

Do you have an estimation of the throughput that such a pipe will need
to handle in order to support GUI display?  What will you send through
the pipe?  If you send only some kind of commands, then the other
process will need to generate the font glyphs in some way -- the same
glyphs that the "redisplay" process already produced.  And if you
intend to send the pixels, that would be too much traffic, I think.
And again, the global state of the receiving process could have
changed, which means any high-level data might be useless (e.g., using
a font that was unloaded).

> It's true that the side effects of the called Lisp code won't be visible
> to the next redisplay process, but such side effects are perilous
> anyway, and avoiding them would seem to me to be a feature, not a bug.

In Emacs, they are a feature, and are expected to work.  You'd be
surprised to see how many packages and user code rely on that.

> But, to be perfectly honest, I'm not sure redisplay is slowing me down
> the way traditional GC is.

It's the other way around: the Lisp machine blocks user interaction,
including the UI and display.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 19:54                                       ` Pip Cet via Emacs development discussions.
  2024-12-11 20:26                                         ` Eli Zaretskii
@ 2024-12-11 22:07                                         ` Dmitry Gutov
  1 sibling, 0 replies; 35+ messages in thread
From: Dmitry Gutov @ 2024-12-11 22:07 UTC (permalink / raw)
  To: Pip Cet, Eli Zaretskii; +Cc: gerd.moellmann, emacs-devel, ofv

On 11/12/2024 21:54, Pip Cet via Emacs development discussions. wrote:

> It is very different indeed:
> 
> Copying within a process involves changing the (virtual) addresses that
> the copied data is at (unless you use an architecture-specific
> implementation of TLS). The beauty of fork() is that the virtual
> addresses stay the same, so we don't need to adjust any pointers, which
> we cannot do because there are ambiguous references to Lisp data.
> 
> IOW, no, you can't lazily create two copies of Lisp data in the same
> process. You have to do so eagerly, adjusting any and all pointers (and
> only those) in the Lisp data before the new data is read for the first
> time (because what you read might be a pointer, and then it needs to be
> adjusted). With fork(), you only have to make the copy when the data is
> being written to, by either process.
> 
> (Of course you can just access all memory through some sort of API that
> translates addresses for you, but that would effectively mean we'd be
> running Emacs on a virtual machine and simulate fork() on it).

Could one really avoid global interpreter lock using fork()? It doesn't 
sound right: even if you get cheap snapshotting using the underlying 
OS's mechanisms, could we guarantee that the snapshot is "consistent" 
from the Lisp VM's point of view. Or if Lisp were not in the picture, 
that the data structures are consistent anyway, unless the second 
process adheres to the first one's locks anyway.

> But, to be perfectly honest, I'm not sure redisplay is slowing me down
> the way traditional GC is.

IMO redisplay is not the problem most of time indeed. Sometimes it is, 
but I'm not sure parallelizing the rendering is the best answer in those 
scenarios either.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 15:06                             ` Marcus Harnisch
@ 2024-12-11 22:11                               ` Dmitry Gutov
  2024-12-12  3:49                                 ` Gerd Möllmann
  2024-12-12  6:01                                 ` Eli Zaretskii
  0 siblings, 2 replies; 35+ messages in thread
From: Dmitry Gutov @ 2024-12-11 22:11 UTC (permalink / raw)
  To: Marcus Harnisch, emacs-devel

On 11/12/2024 17:06, Marcus Harnisch wrote:
> On 11/12/2024 14.27, Gerd Möllmann wrote:
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>>> It was nullprogram.com, at https://nullprogram.com/blog/2017/09/07/. The
>>> title is "Gap Buffers Are Not Optimized for Multiple Cursors", which
>>> seems accurate to me.
>>
>> Thanks! Added to my collection.
> 
> You may be interested in this article, too, which refererences the blog 
> post above:
> https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/

To quote from the bottom of the article:

   The way I see it, gap buffers are better for searching and memory
   usage, but ropes are better at non-local editing patterns. Despite
   their simplicity, gap buffers can hold their own in the modern world.
   Maybe Emacs was on to something.

This is also my takeaway from reading a number of other texts on the 
subject (not benchmarking personally, though, TBF).



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 22:11                               ` Dmitry Gutov
@ 2024-12-12  3:49                                 ` Gerd Möllmann
  2024-12-12 19:07                                   ` Dmitry Gutov
  2024-12-12  6:01                                 ` Eli Zaretskii
  1 sibling, 1 reply; 35+ messages in thread
From: Gerd Möllmann @ 2024-12-12  3:49 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Marcus Harnisch, emacs-devel

Dmitry Gutov <dmitry@gutov.dev> writes:

> On 11/12/2024 17:06, Marcus Harnisch wrote:
>> On 11/12/2024 14.27, Gerd Möllmann wrote:
>>> Pip Cet <pipcet@protonmail.com> writes:
>>>
>>>> It was nullprogram.com, at https://nullprogram.com/blog/2017/09/07/. The
>>>> title is "Gap Buffers Are Not Optimized for Multiple Cursors", which
>>>> seems accurate to me.
>>>
>>> Thanks! Added to my collection.
>> You may be interested in this article, too, which refererences the
>> blog post above:
>> https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/
>
> To quote from the bottom of the article:
>
>   The way I see it, gap buffers are better for searching and memory
>   usage, but ropes are better at non-local editing patterns. Despite
>   their simplicity, gap buffers can hold their own in the modern world.
>   Maybe Emacs was on to something.
>
> This is also my takeaway from reading a number of other texts on the
> subject (not benchmarking personally, though, TBF).

The Zed editor, which is heavily performance-oriented, decided to use
ropes. They have are a number of blog entries that I find interesting,
for example

  https://zed.dev/blog/zed-decoded-rope-sumtree

VSCode uses persistent piece tables

  https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 22:11                               ` Dmitry Gutov
  2024-12-12  3:49                                 ` Gerd Möllmann
@ 2024-12-12  6:01                                 ` Eli Zaretskii
  1 sibling, 0 replies; 35+ messages in thread
From: Eli Zaretskii @ 2024-12-12  6:01 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: mh-gmane, emacs-devel, Pip Cet

> Date: Thu, 12 Dec 2024 00:11:54 +0200
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> > https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/
> 
> To quote from the bottom of the article:
> 
>    The way I see it, gap buffers are better for searching and memory
>    usage, but ropes are better at non-local editing patterns. Despite
>    their simplicity, gap buffers can hold their own in the modern world.
>    Maybe Emacs was on to something.
> 
> This is also my takeaway from reading a number of other texts on the 
> subject (not benchmarking personally, though, TBF).

Yes.  But one important aspect that blog doesn't touch is the
potential effect of changing the buffer text data structure on the
various Emacs display issues.  Some problems in the current display
code that cause slow redisplay in some situations (mainly, very long
lines) cannot really be solved as long as we stay with buffer text
stored as a long C string, with or without the gap.  This important
aspect of Emacs still awaits serious research of possible
alternatives, IMO.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 19:09                                     ` Gerd Möllmann
@ 2024-12-12  8:55                                       ` Robert Pluim
  2024-12-12 10:14                                         ` Gerd Möllmann
  0 siblings, 1 reply; 35+ messages in thread
From: Robert Pluim @ 2024-12-12  8:55 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Pip Cet, Eli Zaretskii, emacs-devel, ofv

>>>>> On Wed, 11 Dec 2024 20:09:51 +0100, Gerd Möllmann <gerd.moellmann@gmail.com> said:

    Gerd> I may also be misunderstanding, but in principle, I agree with Eli.

    Gerd> Say we have processes A and B communicating with each other. Take the
    Gerd> code of A and move it to B, possibly with some automatic transformations
    Gerd> if A and B have the same source code. Make two threads in the result
    Gerd> process for A and B. Replace inter-process message passing with
    Gerd> inter-thread message passing. Initial message may be "fork" transferring
    Gerd> the world of thread A to thread B.

Your first sentence is doing a lot of lifting there :-)

In the two process situation, you automatically have two copies of
every object, so you donʼt need to ensure that the processes are not
stepping on each other. In the two thread situation you donʼt have
that guarantee, unless you have *first* ensured (via locking, or COW,
etc) that the object states are consistent. Once youʼve done that,
then you can use messaging to keep things synchronized (or again,
locking etc)

Robert
-- 



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-12  8:55                                       ` Robert Pluim
@ 2024-12-12 10:14                                         ` Gerd Möllmann
  0 siblings, 0 replies; 35+ messages in thread
From: Gerd Möllmann @ 2024-12-12 10:14 UTC (permalink / raw)
  To: Robert Pluim; +Cc: Pip Cet, Eli Zaretskii, emacs-devel, ofv

Robert Pluim <rpluim@gmail.com> writes:

>>>>>> On Wed, 11 Dec 2024 20:09:51 +0100, Gerd Möllmann <gerd.moellmann@gmail.com> said:
>
>     Gerd> I may also be misunderstanding, but in principle, I agree with Eli.
>
>     Gerd> Say we have processes A and B communicating with each other. Take the
>     Gerd> code of A and move it to B, possibly with some automatic transformations
>     Gerd> if A and B have the same source code. Make two threads in the result
>     Gerd> process for A and B. Replace inter-process message passing with
>     Gerd> inter-thread message passing. Initial message may be "fork" transferring
>     Gerd> the world of thread A to thread B.
>
> Your first sentence is doing a lot of lifting there :-)
>
> In the two process situation, you automatically have two copies of
> every object, so you donʼt need to ensure that the processes are not
> stepping on each other. In the two thread situation you donʼt have
> that guarantee, unless you have *first* ensured (via locking, or COW,
> etc) that the object states are consistent. Once youʼve done that,
> then you can use messaging to keep things synchronized (or again,
> locking etc)
>
> Robert

Sure, a bit of effort is left to the implementer :-).



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-12  3:49                                 ` Gerd Möllmann
@ 2024-12-12 19:07                                   ` Dmitry Gutov
  2024-12-12 19:30                                     ` Eli Zaretskii
  2024-12-12 19:40                                     ` Gerd Möllmann
  0 siblings, 2 replies; 35+ messages in thread
From: Dmitry Gutov @ 2024-12-12 19:07 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Marcus Harnisch, emacs-devel

On 12/12/2024 05:49, Gerd Möllmann wrote:

> The Zed editor, which is heavily performance-oriented, decided to use
> ropes. They have are a number of blog entries that I find interesting,
> for example
> 
>    https://zed.dev/blog/zed-decoded-rope-sumtree

IIUC their goal there was a use a data structure that can do everything.

They also have an ambition to support live collaboration, which we don't 
have anything for, and not for the reasons of performance.

> VSCode uses persistent piece tables
> 
>    https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation

And this article compared to the previous "array of strings" implementation.

Both editors' data structures (not ropes) seem to have something that 
can be used like our "newline cache", so if anything I would try to 
understand whether either has an advantage in that area.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-12 19:07                                   ` Dmitry Gutov
@ 2024-12-12 19:30                                     ` Eli Zaretskii
  2024-12-12 19:40                                     ` Gerd Möllmann
  1 sibling, 0 replies; 35+ messages in thread
From: Eli Zaretskii @ 2024-12-12 19:30 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: gerd.moellmann, mh-gmane, emacs-devel

> Date: Thu, 12 Dec 2024 21:07:25 +0200
> Cc: Marcus Harnisch <mh-gmane@online.de>, emacs-devel@gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> >    https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation
> 
> And this article compared to the previous "array of strings" implementation.
> 
> Both editors' data structures (not ropes) seem to have something that 
> can be used like our "newline cache", so if anything I would try to 
> understand whether either has an advantage in that area.

Our newline cache doesn't really help to solve the problems in the
display engine for which it would be good to find an alternative to
gap buffer.  So I wouldn't waste time on thinking how to replace the
newline cache.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-12 19:07                                   ` Dmitry Gutov
  2024-12-12 19:30                                     ` Eli Zaretskii
@ 2024-12-12 19:40                                     ` Gerd Möllmann
  1 sibling, 0 replies; 35+ messages in thread
From: Gerd Möllmann @ 2024-12-12 19:40 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Marcus Harnisch, emacs-devel

Dmitry Gutov <dmitry@gutov.dev> writes:

> On 12/12/2024 05:49, Gerd Möllmann wrote:
>
>> The Zed editor, which is heavily performance-oriented, decided to use
>> ropes. They have are a number of blog entries that I find interesting,
>> for example
>>    https://zed.dev/blog/zed-decoded-rope-sumtree
>
> IIUC their goal there was a use a data structure that can do everything.
>
> They also have an ambition to support live collaboration, which we
> don't have anything for, and not for the reasons of performance.

I don't find it surprising that being fast isn't the only requirement
for a modern editor, TBH. Thing is that from what I observe, Zed seems
to have no problem with long lines. But maybe that's a wrong
observation. I never did anything that required long-line support.

Point I was trying to make is that apparently ropes can support long
lines. One could maybe learn from what they do; Rust sources are
available, Of course, additional data structures like ones to make
positions to actual text and so on belong into the picture. It not just
using ropes, but in this case the rope is the basic data structure, and
additional data structures depend on the properties of that.

>
>> VSCode uses persistent piece tables
>>    https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation
>
> And this article compared to the previous "array of strings" implementation.
>

And shows that they are using piece tables, the alternative to ropes, if
one doesn't count the other two which are gap buffer and set of lines
for the moment. And I think they got long-lines support working, too.
And again there are additional data structure involved, and blah blah.

> Both editors' data structures (not ropes) seem to have something that
> can be used like our "newline cache", so if anything I would try to
> understand whether either has an advantage in that area.

For me that would be a too narrow perspective, I must admit. But I won't
do anything in that area anyway, so please ignore me :-).

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
@ 2024-12-12 23:40 arthur miller
  2024-12-13  3:19 ` Gerd Möllmann
  0 siblings, 1 reply; 35+ messages in thread
From: arthur miller @ 2024-12-12 23:40 UTC (permalink / raw)
  To: gerd.moellmann@gmail.com; +Cc: emacs-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 3456 bytes --]

>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>> Cc: emacs-devel@gnu.org,  ofv@wanadoo.es,  pipcet@protonmail.com
>>> Date: Wed, 11 Dec 2024 16:51:56 +0100
>>>
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>>
>>> > Unless you have a huge (and I mean a HUGE) buffer, and some Lisp that
>>> > moves point, then inserts a small number of characters, then moves
>>> > point far away and again inserts a small number of characters, etc.,
>>> > I'd be very surprised if the gap buffer caused significant performance
>>> > problems on a modern CPU.
>>> >
>>> > Can you profile that case and post the expanded profile?  I'm always
>>> > happy to be wrong about performance bottlenecks, and profiles are good
>>> > at proving me wrong.
>>>
>>> Maybe I'll try to investigate that further at some point. Such things
>>> always tend to be so time consuming...
>>
>> I meant profiling with "M-x profile-start", then run your slow-down
>> recipe.  That should be easy and should not consume any significant
>> time.  Analyzing the profile could, but producing it shouldn't.
>
>Plus making it reproducible, if it is.

There is a paper by R. Strandh & others from they work on Climacs.

They have, by now a 20 old year Flexichain implements a circular gap
buffer with the explicit goal to workaround that case. At the same time
it also turns gap buffer into a flexible array usable for other use-cases
like qeues and stacks.

However, they have also gone away from the gap buffer, to something they
call "Cluffer", and which is a strategy where they use a double
linked list of lines. The "open" line is a gap buffer. As they say
in the introduction, the similiar strategy was used in Multics Emacs.
The idea is to allow for incremental editing, a lá
tree sitter I guess, and it is also suitable for multiple cursors. They
also say that for modern hardware the additional memory cost of double
linked lists is not prohibitive.

For those interested relevant papers and sources are here:

https://flexichain.common-lisp.dev/download/StrandhVilleneuveMoore.pdf
https://www.european-lisp-workshop.org/archives/2004/slides/Strandh-slides.pdf
https://github.com/robert-strandh/Flexichain

http://metamodular.com/cluffer.pdf
https://hal.science/hal-01887230v1/file/5-incremental-parsing.pdf
https://research.gold.ac.uk/id/eprint/2351/1/climacssyntax.pdf
https://github.com/robert-strandh/Cluffer

https://github.com/scymtym/text.editing <-- a text editor sans the
application based on the Cluffer

https://github.com/robert-strandh/Second-Climacs <-- text editor
application based on the above text.editing and Cluffer

There is also Lem which uses a similar strategy for its text representation
(if I am not misstaken, long time ago I looked at their code):

https://github.com/lem-project/lem
[https://opengraph.githubassets.com/936483c1586b682884e2dfee10ced041f6306d29afd3df15a26830cc9eb47f65/lem-project/lem]<https://github.com/lem-project/lem>
GitHub
 - lem-project/lem: Common Lisp editor/IDE with high expansibility<https://github.com/lem-project/lem>
Common
 Lisp editor/IDE with high expansibility. Contribute to lem-project/lem development by creating an account on GitHub.
github.com

Don't know if it is of use for you or not, but perhaps there is some
inspiration there. Haven't seen those papers mentioned anywhere
in this discussion, so thought they might be of interest to some of you.

[-- Attachment #2: Type: text/html, Size: 14511 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Gap buffer problem?
  2024-12-12 23:40 Gap buffer problem? arthur miller
@ 2024-12-13  3:19 ` Gerd Möllmann
  0 siblings, 0 replies; 35+ messages in thread
From: Gerd Möllmann @ 2024-12-13  3:19 UTC (permalink / raw)
  To: arthur miller; +Cc: emacs-devel@gnu.org

arthur miller <arthur.miller@live.com> writes:

> There is a paper by R. Strandh & others from they work on Climacs. 
>
> They have, by now a 20 old year Flexichain implements a circular gap
> buffer with the explicit goal to workaround that case. At the same time
> it also turns gap buffer into a flexible array usable for other use-cases 
> like qeues and stacks.
>
> However, they have also gone away from the gap buffer, to something they
> call "Cluffer", and which is a strategy where they use a double
> linked list of lines. The "open" line is a gap buffer. As they say
> in the introduction, the similiar strategy was used in Multics Emacs.
> The idea is to allow for incremental editing, a lá
> tree sitter I guess, and it is also suitable for multiple cursors. They
> also say that for modern hardware the additional memory cost of double
> linked lists is not prohibitive.
>
> For those interested relevant papers and sources are here:
>
> https://flexichain.common-lisp.dev/download/StrandhVilleneuveMoore.pdf
> https://www.european-lisp-workshop.org/archives/2004/slides/Strandh-slides.pdf
> https://github.com/robert-strandh/Flexichain
>
> http://metamodular.com/cluffer.pdf
> https://hal.science/hal-01887230v1/file/5-incremental-parsing.pdf
> https://research.gold.ac.uk/id/eprint/2351/1/climacssyntax.pdf
> https://github.com/robert-strandh/Cluffer
>
> https://github.com/scymtym/text.editing <-- a text editor sans the
> application based on the Cluffer
>
> https://github.com/robert-strandh/Second-Climacs <-- text editor
> application based on the above text.editing and Cluffer
>
> There is also Lem which uses a similar strategy for its text representation
> (if I am not misstaken, long time ago I looked at their code):
>
> https://github.com/lem-project/lem
>
>  *  GitHub  
>     - lem-project/lem: Common Lisp editor/IDE with high expansibility  
>    Common  
>     Lisp editor/IDE with high expansibility. Contribute to lem-project/lem development by creating an account on GitHub.  
>    github.com  
> *
>
> Don't know if it is of use for you or not, but perhaps there is some
> inspiration there. Haven't seen those papers mentioned anywhere
> in this discussion, so thought they might be of interest to some of you.

Thanks for the links! I find them interesting. 



^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2024-12-13  3:19 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-12 23:40 Gap buffer problem? arthur miller
2024-12-13  3:19 ` Gerd Möllmann
     [not found] <mailman.39.1723910423.12184.emacs-devel@gnu.org>
2024-12-08 16:49 ` pdumper on Solaris 10 Eli Zaretskii
2024-12-08 17:37   ` Pip Cet via Emacs development discussions.
2024-12-08 18:41     ` Eli Zaretskii
2024-12-09  4:59       ` Stefan Kangas
2024-12-09 14:39         ` Eli Zaretskii
2024-12-10  0:09           ` Stefan Kangas
2024-12-10 12:59             ` Eli Zaretskii
2024-12-10 13:39               ` Óscar Fuentes
2024-12-10 15:38                 ` Pip Cet via Emacs development discussions.
2024-12-11  5:27                   ` Gap buffer problem? Gerd Möllmann
2024-12-11  8:50                     ` Pip Cet via Emacs development discussions.
2024-12-11  9:35                       ` Gerd Möllmann
2024-12-11 11:50                         ` Pip Cet via Emacs development discussions.
2024-12-11 13:22                           ` Gerd Möllmann
2024-12-11 14:53                             ` Pip Cet via Emacs development discussions.
2024-12-11 15:33                               ` Gerd Möllmann
2024-12-11 16:58                                 ` Eli Zaretskii
2024-12-11 17:13                                   ` Gerd Möllmann
2024-12-11 17:45                                     ` Robert Pluim
2024-12-11 18:11                                       ` Gerd Möllmann
2024-12-11 19:08                                       ` Eli Zaretskii
2024-12-11 17:41                                   ` Pip Cet via Emacs development discussions.
2024-12-11 19:04                                     ` Eli Zaretskii
2024-12-11 19:54                                       ` Pip Cet via Emacs development discussions.
2024-12-11 20:26                                         ` Eli Zaretskii
2024-12-11 22:07                                         ` Dmitry Gutov
2024-12-11 19:09                                     ` Gerd Möllmann
2024-12-12  8:55                                       ` Robert Pluim
2024-12-12 10:14                                         ` Gerd Möllmann
2024-12-11 12:27                         ` Pip Cet via Emacs development discussions.
2024-12-11 13:27                           ` Gerd Möllmann
2024-12-11 15:06                             ` Marcus Harnisch
2024-12-11 22:11                               ` Dmitry Gutov
2024-12-12  3:49                                 ` Gerd Möllmann
2024-12-12 19:07                                   ` Dmitry Gutov
2024-12-12 19:30                                     ` Eli Zaretskii
2024-12-12 19:40                                     ` Gerd Möllmann
2024-12-12  6:01                                 ` Eli Zaretskii
2024-12-11 14:22                     ` Eli Zaretskii
2024-12-11 15:51                       ` Gerd Möllmann
2024-12-11 17:06                         ` Eli Zaretskii
2024-12-11 17:15                           ` Gerd Möllmann

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.