line buffer as Red Black Trees instead of linear

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* line buffer as Red Black Trees instead of linear
@ 2014-05-15 14:33 Alin Soare
  2014-05-15 17:15 ` Alin Soare
  2014-05-15 19:17 ` Stefan Monnier
  0 siblings, 2 replies; 22+ messages in thread
From: Alin Soare @ 2014-05-15 14:33 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2755 bytes --]

Hi,

I am writing a tutorial about red-black trees, in which I present an
recursive formula that I deduced for left leaning red-black trees . I
implemented it in java. In emacs the red-black trees are already used in
this moment for keeping the  memory segments of lisp symbols.

An application of red-black trees in emacs, which would drastically improve
the redisplay system, would be to keep the text of each line as a rbt.

I give a rough explanation of the basic operations, without having studied
recently the details of buffer implementation.

In emacs the buffers text contents is kept in memory linearly at the
address buffer_text->beg.

One could keep as well the text from each line of the buffer into a
red-black tree.

Each node will contain

* a buffer with a part of the text (for gap an empty text)
* a pointer to the next node, such that the search operation to be done in
constant time
* the number of characters of text are on the nodes at left of the current
node

The speed of the basic operations will be so:

1. search for text (remain the same)
   -- it will remain linear as it is now.

2. seach for a given offset into a line (almost no change)
   -- this will be logarithmic , logarithm of the length of the line
   -- in this moment I think it's constant time, so no visible change

3. insert a new gap (major improvement)
   -- this will be logarithmic in time, and no need to move memory
segments. To insert a new gap at offset X of a line, we start at the root,
if X > number of characters at the left of root we go to right, ifnot, we
either move to left or insert a new node and move the current node to the
left of the new node.

4. insert a new text into the buffer. (the same)
   -- This will insert into the gap from the point, and when the gap is
filled, a new gap is created.

5. remove text.
   -- this supposes a lot of things, like re-checking the values of all
values of properties at the left and right, so many changes, but it will
remain as fast as in this moment.

6. merging 2 lines means to combine the text from 2 red-black trees, and
this is done in constant time, with no need to move memory segments.

The text properties, overlays, markers, etc will remain almost the same as
now. Or they can be moved on the node of a corresponding text. In this
moment these are kept into lists, but one can move the given properties
into the node that keeps the corresponding text.

The whole buffer would be encoded as a list of lines, and each line is
encoded as a red-black tree, as I described.

This will make fast operations involving the gap, the red-black trees are
quite easy to implement, but I cannot see now the bad side effects involved
by such a change. The major improvement would be the redisplay.

[-- Attachment #2: Type: text/html, Size: 3502 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: line buffer as Red Black Trees instead of linear
  2014-05-15 14:33 line buffer as Red Black Trees instead of linear Alin Soare
@ 2014-05-15 17:15 ` Alin Soare
  2014-05-15 19:17 ` Stefan Monnier
  1 sibling, 0 replies; 22+ messages in thread
From: Alin Soare @ 2014-05-15 17:15 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 779 bytes --]

>
>
> The whole buffer would be encoded as a list of lines, and each line is
> encoded as a red-black tree, as I described.
>
> This will make fast operations involving the gap, the red-black trees are
> quite easy to implement, but I cannot see now the bad side effects involved
> by such a change. The major improvement would be the redisplay.
>
>
>
As a completion, I want to add that the whole text of a buffer can be kept
in a single rbt, and at each node one can add an integer attribute that
keeps the number of lines at the left of the node, such that searching for
a particular line offset to be done in logarithmic time.

All the operations would be in constant or logarithmic time, but the major
improvement are the operations on buffer gap, which avoids memory move.

[-- Attachment #2: Type: text/html, Size: 1230 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: line buffer as Red Black Trees instead of linear
  2014-05-15 14:33 line buffer as Red Black Trees instead of linear Alin Soare
  2014-05-15 17:15 ` Alin Soare
@ 2014-05-15 19:17 ` Stefan Monnier
  2014-05-15 19:47   ` Alin Soare
  1 sibling, 1 reply; 22+ messages in thread
From: Stefan Monnier @ 2014-05-15 19:17 UTC (permalink / raw)
  To: Alin Soare; +Cc: emacs-devel

> The major improvement would be the redisplay.

Actually, no.  Redisplay scans the buffer linearly, so it would
be unaffected.


        Stefan



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: line buffer as Red Black Trees instead of linear
  2014-05-15 19:17 ` Stefan Monnier
@ 2014-05-15 19:47   ` Alin Soare
  2014-05-15 20:06     ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Alin Soare @ 2014-05-15 19:47 UTC (permalink / raw)
  To: Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1228 bytes --]

>
> > The major improvement would be the redisplay.
>
> Actually, no.  Redisplay scans the buffer linearly, so it would
> be unaffected.
>
>
I did not check recently which operations are called by redisplay, in order
to see exactly where redisplay and other operations get stuck. But emacs
works terrible with long lines.

Emacs works bad with buffers with long lines. This would work well, and all
the others atomic operations, like random access, search, will preserve the
current speed.

Try this:

(dotimes (j 10000)
  (dotimes (i 10000)
    (insert 1000) ) )

This will insert a random unicode character many times in each line, and
many lines.

Try to do these operations a few times: end-of-buffer (M->) and beginning
of buffer (M-<).

These are problems that arise with long lines, in which redisplay gets
stuck. I did not check recently the atomic operations on buffer data
structure, where emacs gets stuck, but I am sure that redisplay gets stuck
in buffer data structure.

For me the rediaplay works bad when a program outputs long lines in
*shell*  -- probably because it modify the gap many times, and redisplay
gets stuck. Konsole of KDE for example, works perfectly with any kind of
outputs from any process.

[-- Attachment #2: Type: text/html, Size: 1915 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: line buffer as Red Black Trees instead of linear
  2014-05-15 19:47   ` Alin Soare
@ 2014-05-15 20:06     ` Eli Zaretskii
  2014-05-15 20:38       ` Slow redisplay with long lines Stefan Monnier
  2014-05-15 23:40       ` line buffer as Red Black Trees instead of linear Alin Soare
  0 siblings, 2 replies; 22+ messages in thread
From: Eli Zaretskii @ 2014-05-15 20:06 UTC (permalink / raw)
  To: Alin Soare; +Cc: monnier, emacs-devel

> Date: Thu, 15 May 2014 22:47:27 +0300
> From: Alin Soare <as1789@gmail.com>
> 
> > > The major improvement would be the redisplay.
> >
> > Actually, no.  Redisplay scans the buffer linearly, so it would
> > be unaffected.
> >
> >
> I did not check recently which operations are called by redisplay, in order
> to see exactly where redisplay and other operations get stuck. But emacs
> works terrible with long lines.
> 
> Emacs works bad with buffers with long lines. This would work well, and all
> the others atomic operations, like random access, search, will preserve the
> current speed.

You cannot help redisplay with long lines by giving it random access
to buffer text.

Long lines slow down redisplay because it needs to scan the entire
line to see how tall it will be on display.  For that, you must scan
the line in its entirety, since Emacs supports variable-size fonts and
images on the same line, so determining the height of a line requires
to find the largest display element (character glyph or image)
displayed on that line.

How can random access to buffer text help here?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Slow redisplay with long lines
  2014-05-15 20:06     ` Eli Zaretskii
@ 2014-05-15 20:38       ` Stefan Monnier
  2014-05-15 20:57         ` Stephan Mueller
  2014-05-16  5:39         ` Eli Zaretskii
  2014-05-15 23:40       ` line buffer as Red Black Trees instead of linear Alin Soare
  1 sibling, 2 replies; 22+ messages in thread
From: Stefan Monnier @ 2014-05-15 20:38 UTC (permalink / raw)
  To: emacs-devel

> Long lines slow down redisplay because it needs to scan the entire
> line to see how tall it will be on display.

Why should it care about the whole line rather than only the part
of it that's displayed?


        Stefan "who presumes there are several other reasons why long
                lines cause severe performance issues"



^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Slow redisplay with long lines
  2014-05-15 20:38       ` Slow redisplay with long lines Stefan Monnier
@ 2014-05-15 20:57         ` Stephan Mueller
  2014-05-16  5:40           ` Eli Zaretskii
  2014-05-16  5:39         ` Eli Zaretskii
  1 sibling, 1 reply; 22+ messages in thread
From: Stephan Mueller @ 2014-05-15 20:57 UTC (permalink / raw)
  To: Stefan Monnier, emacs-devel@gnu.org

" > Long lines slow down redisplay because it needs to scan the entire
" > line to see how tall it will be on display.
"
" Why should it care about the whole line rather than only the part
" of it that's displayed?

So that the line height doesn't change when scrolling horizontally
and a taller character comes into view?

stephan();

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: line buffer as Red Black Trees instead of linear
  2014-05-15 20:06     ` Eli Zaretskii
  2014-05-15 20:38       ` Slow redisplay with long lines Stefan Monnier
@ 2014-05-15 23:40       ` Alin Soare
  2014-05-16  5:41         ` Eli Zaretskii
  1 sibling, 1 reply; 22+ messages in thread
From: Alin Soare @ 2014-05-15 23:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2496 bytes --]

>
>
> >
> > Emacs works bad with buffers with long lines. This would work well, and
> all
> > the others atomic operations, like random access, search, will preserve
> the
> > current speed.
>
> You cannot help redisplay with long lines by giving it random access
> to buffer text.
>

This is right. Redisplay is one, and computing the result of some functions
is another thing.

I did not check the flow of the calls that take place in that situation,
but I am sure that it happens at the level of buffer access, Some atomic
procedures of access to buffer are linear in time, instead of constant or
logarithmic in the worse case.

I provided a solution in which all atomic accesses to buffer are in good
time.

>
> Long lines slow down redisplay because it needs to scan the entire
> line to see how tall it will be on display.  For that, you must scan
> the line in its entirety, since Emacs supports variable-size fonts and
> images on the same line, so determining the height of a line requires
> to find the largest display element (character glyph or image)
> displayed on that line.
>

If we put at each node of a red-black tree an attribute which says how tall
the block of text at that node is  --  (for example, if there are 2K
characters at that block, all identical, we already computed 2K of all the
line, and pass to the next node, where there may be 5K or 0.5K characters,
and it is GUARANTEED that in the worse case we have log # of nodes ).

Then re-computing the glyph size will be logarithmic in the worse case,
because we need to check maximum a log # of nodes, to detect all the
properties of a line.

Each node must also have an attribute that says how many lines are at the
left of that node, another attrib that says how many chars are at the left,
and each node must also have a pointer to the next node, such that the
nondeterministic finite automata that use lazy search (for example in
search-forward) will take constant time on each change of state (as it is
now).

>
> How can random access to buffer text help here?
>

I repeat : it is very long since I read the code of emacs, so I cannot
remember the details of all the operations, but while I was working on the
tutorial in which I present the details how I deduced the formla of
red-black trees, I realized that these are the wonderful data structure to
work with in emacs buffers.

I knew that emacs loops in buffer data structure, hence the slow operations
in the cases that I said in my prev. message .

[-- Attachment #2: Type: text/html, Size: 3491 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Slow redisplay with long lines
  2014-05-15 20:38       ` Slow redisplay with long lines Stefan Monnier
  2014-05-15 20:57         ` Stephan Mueller
@ 2014-05-16  5:39         ` Eli Zaretskii
  1 sibling, 0 replies; 22+ messages in thread
From: Eli Zaretskii @ 2014-05-16  5:39 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Thu, 15 May 2014 16:38:52 -0400
> 
> > Long lines slow down redisplay because it needs to scan the entire
> > line to see how tall it will be on display.
> 
> Why should it care about the whole line rather than only the part
> of it that's displayed?

It does indeed care only about the displayed part of the line.  At
least it tries to, although sometimes it is hard.  E.g., when you need
to go back, you need to start from the beginning of the previous line,
because only there you know you are at column zero.

>         Stefan "who presumes there are several other reasons why long
>                 lines cause severe performance issues"

The need to traverse the long lines is the single most import one,
because it comes into play at the lowest level of display iteration,
and so pops up in many higher-level operations.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Slow redisplay with long lines
  2014-05-15 20:57         ` Stephan Mueller
@ 2014-05-16  5:40           ` Eli Zaretskii
  2014-05-16 18:08             ` Stephan Mueller
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2014-05-16  5:40 UTC (permalink / raw)
  To: Stephan Mueller; +Cc: monnier, emacs-devel

> From: Stephan Mueller <Stephan.Mueller@microsoft.com>
> Date: Thu, 15 May 2014 20:57:08 +0000
> 
> " > Long lines slow down redisplay because it needs to scan the entire
> " > line to see how tall it will be on display.
> "
> " Why should it care about the whole line rather than only the part
> " of it that's displayed?
> 
> So that the line height doesn't change when scrolling horizontally
> and a taller character comes into view?

No, horizontal scrolling causes another redisplay, which examines the
characters that came into view at that time.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: line buffer as Red Black Trees instead of linear
  2014-05-15 23:40       ` line buffer as Red Black Trees instead of linear Alin Soare
@ 2014-05-16  5:41         ` Eli Zaretskii
  2014-05-16  7:44           ` Alin Soare
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2014-05-16  5:41 UTC (permalink / raw)
  To: Alin Soare; +Cc: monnier, emacs-devel

> Date: Fri, 16 May 2014 02:40:00 +0300
> From: Alin Soare <as1789@gmail.com>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> 
> > > Emacs works bad with buffers with long lines. This would work well, and
> > all
> > > the others atomic operations, like random access, search, will preserve
> > the
> > > current speed.
> >
> > You cannot help redisplay with long lines by giving it random access
> > to buffer text.
> >
> 
> 
> This is right. Redisplay is one, and computing the result of some functions
> is another thing.
> 
> I did not check the flow of the calls that take place in that situation,
> but I am sure that it happens at the level of buffer access, Some atomic
> procedures of access to buffer are linear in time, instead of constant or
> logarithmic in the worse case.

Not in redisplay.  It always moves to the next character, and
remembers the last place it was at, so it mostly only needs to advance
to the next character.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: line buffer as Red Black Trees instead of linear
  2014-05-16  5:41         ` Eli Zaretskii
@ 2014-05-16  7:44           ` Alin Soare
  2014-05-16  8:28             ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Alin Soare @ 2014-05-16  7:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 825 bytes --]

>
>
> > logarithmic in the worse case.
>
> Not in redisplay.  It always moves to the next character, and
> remembers the last place it was at, so it mostly only needs to advance
> to the next character.
>

Whatever are the functions that scan the whole line of a buffer, slowing
emacs, this can be improved if the values are precomputed and kept at the
tree nodes, and when some change is made at a property, like the font, it
will affect only some nodes, where it is precomputed.

The solution of this problem is the red-black trees, by inserting at each
node all the pre-computed characteristics of the block of text kept at the
given node. Any modification will pre-compute a finite number of nodes, and
when need to access them , only a logarithmic # of nodes will be taken in
consideration for all possible operations.

[-- Attachment #2: Type: text/html, Size: 1305 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: line buffer as Red Black Trees instead of linear
  2014-05-16  7:44           ` Alin Soare
@ 2014-05-16  8:28             ` Eli Zaretskii
  2014-05-16  8:53               ` Alin Soare
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2014-05-16  8:28 UTC (permalink / raw)
  To: Alin Soare; +Cc: monnier, emacs-devel

> Date: Fri, 16 May 2014 10:44:44 +0300
> From: Alin Soare <as1789@gmail.com>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> 
> > > logarithmic in the worse case.
> >
> > Not in redisplay.  It always moves to the next character, and
> > remembers the last place it was at, so it mostly only needs to advance
> > to the next character.
> 
> Whatever are the functions that scan the whole line of a buffer, slowing
> emacs, this can be improved if the values are precomputed and kept at the
> tree nodes, and when some change is made at a property, like the font, it
> will affect only some nodes, where it is precomputed.

The values that need to be precomputed are the metrics of each
"display element" (character glyph, stretch glyph, image glyph, etc.)
that will (or would) be displayed when/if the line is shown on the
screen.  Your suggestion calls for precomputing those in advance,
while the current implementation computes them only when they are
actually needed.  I very much doubt that your suggestion, if
implemented, will be a net win.

It might make sense to cache the values, once they are computed, so
that any subsequent routines that need to access the same portions of
buffer text won't need to recompute the same metrics.  But before this
kind of caching is attempted, Someone(TM) should demonstrate that the
same portions of the long lines are traversed by the display engine
more than once, because otherwise what you suggest will be pure
overhead with no hope of any gains.

> The solution of this problem is the red-black trees, by inserting at each
> node all the pre-computed characteristics of the block of text kept at the
> given node. Any modification will pre-compute a finite number of nodes, and
> when need to access them , only a logarithmic # of nodes will be taken in
> consideration for all possible operations.

"Premature optimization is the root of all evil."  (Donald Knuth)

I would suggest to study the relevant code and, more importantly, time
it to find the _real_ (as opposed to imaginary) hot spots.  Then
present that data, and we could thereafter discuss whether your
suggested solution might indeed speed up the code which takes up the
lion's share of the time when Emacs needs to display very long lines.

If indeed these techniques could help, patches will be very welcome,
as this is an annoying problem, and there's a bug report for it (see
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13675).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: line buffer as Red Black Trees instead of linear
  2014-05-16  8:28             ` Eli Zaretskii
@ 2014-05-16  8:53               ` Alin Soare
  2014-05-16  9:24                 ` Thien-Thi Nguyen
  2014-05-21  8:34                 ` Alin Soare
  0 siblings, 2 replies; 22+ messages in thread
From: Alin Soare @ 2014-05-16  8:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 3021 bytes --]

>
> The values that need to be precomputed are the metrics of each
> "display element" (character glyph, stretch glyph, image glyph, etc.)
> that will (or would) be displayed when/if the line is shown on the
> screen.  Your suggestion calls for precomputing those in advance,
> while the current implementation computes them only when they are
> actually needed.  I very much doubt that your suggestion, if
> implemented, will be a net win.
>
>
There are many possibilities.

In the moment when we need to access them (lazy evaluation) or when we
modify them, one can split a node in many nodes, such that all the block of
text at a given node will have the same properties, so one can compute any
property quickly.


In the worse case, when each character of a buffer has a distinct property
than its neighbours, a buffer will have a rbt with 1 charater on each node,
in which case the timing will be log(length(buffer->text)) for all
operations.




> It might make sense to cache the values, once they are computed, so
> that any subsequent routines that need to access the same portions of
> buffer text won't need to recompute the same metrics.  But before this
> kind of caching is attempted, Someone(TM) should demonstrate that the
> same portions of the long lines are traversed by the display engine
> more than once, because otherwise what you suggest will be pure
> overhead with no hope of any gains.
>

One can modify accordingly the rbt at the first transversal, or at
modification, and at a 2nd nothing more to do, apart from reading the
result in log(N).



>
> > The solution of this problem is the red-black trees, by inserting at each
> > node all the pre-computed characteristics of the block of text kept at
> the
> > given node. Any modification will pre-compute a finite number of nodes,
> and
> > when need to access them , only a logarithmic # of nodes will be taken in
> > consideration for all possible operations.
>
> "Premature optimization is the root of all evil."  (Donald Knuth)
>
> I would suggest to study the relevant code and, more importantly, time
> it to find the _real_ (as opposed to imaginary) hot spots.  Then
> present that data, and we could thereafter discuss whether your
> suggested solution might indeed speed up the code which takes up the
> lion's share of the time when Emacs needs to display very long lines.
>
>
I know the theory, but I wanted to report the bug, and provide the
solution. I thought that those who have fresh in mind the place in emacs
where it slows will say exactly why it slows, and indeed they answered.



> If indeed these techniques could help, patches will be very welcome,
> as this is an annoying problem, and there's a bug report for it (see
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13675).
>

This technique can help for sure, but I will not have time to implement it
now. I did not know about bug reports, but this problem is annoying for me
, as I am using emacs all the time.

I will solve it myself, if nobody solves it before.

[-- Attachment #2: Type: text/html, Size: 4278 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: line buffer as Red Black Trees instead of linear
  2014-05-16  8:53               ` Alin Soare
@ 2014-05-16  9:24                 ` Thien-Thi Nguyen
  2014-05-21  8:34                 ` Alin Soare
  1 sibling, 0 replies; 22+ messages in thread
From: Thien-Thi Nguyen @ 2014-05-16  9:24 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 473 bytes --]

() Alin Soare <as1789@gmail.com>
() Fri, 16 May 2014 11:53:25 +0300

   I will solve it myself, if nobody solves it before.

I think if you can include height-caching in the rbt rework
(or redesign) you will win the enthusiasm of others.

-- 
Thien-Thi Nguyen
   GPG key: 4C807502
   (if you're human and you know it)
      read my lisp: (responsep (questions 'technical)
                               (not (via 'mailing-list)))
                     => nil

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Slow redisplay with long lines
  2014-05-16  5:40           ` Eli Zaretskii
@ 2014-05-16 18:08             ` Stephan Mueller
  2014-05-16 20:36               ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Stephan Mueller @ 2014-05-16 18:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org

From: Eli Zaretskii [mailto:eliz@gnu.org] 
" > From: Stephan Mueller <Stephan.Mueller@microsoft.com>
" > " > Long lines slow down redisplay because it needs to scan the entire
" > " > line to see how tall it will be on display.
" > "
" > " Why should it care about the whole line rather than only the part
" > " of it that's displayed?
" > 
" > So that the line height doesn't change when scrolling horizontally
" > and a taller character comes into view?
"
" No, horizontal scrolling causes another redisplay, which examines the
" characters that came into view at that time.

So, IIUC, that means that scrolling horizontally _can_ cause the user's
experience to be that the height of a line was changed by looking at a
different portion of it.  Understood it is redisplayed (and so no doubt
draws correctly), but I'd still consider the height change to be an odd
behaviour.  I'm curious if it was an intentional choice (perhaps in the
name of perf, removing one of the reasons to scan the whole line)?

stephan($0.02);

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Slow redisplay with long lines
  2014-05-16 18:08             ` Stephan Mueller
@ 2014-05-16 20:36               ` Eli Zaretskii
  0 siblings, 0 replies; 22+ messages in thread
From: Eli Zaretskii @ 2014-05-16 20:36 UTC (permalink / raw)
  To: Stephan Mueller; +Cc: monnier, emacs-devel

> From: Stephan Mueller <Stephan.Mueller@microsoft.com>
> Date: Fri, 16 May 2014 18:08:18 +0000
> Cc: "monnier@iro.umontreal.ca" <monnier@iro.umontreal.ca>,
> 	"emacs-devel@gnu.org" <emacs-devel@gnu.org>
> 
> From: Eli Zaretskii [mailto:eliz@gnu.org] 
> " > From: Stephan Mueller <Stephan.Mueller@microsoft.com>
> " > " > Long lines slow down redisplay because it needs to scan the entire
> " > " > line to see how tall it will be on display.
> " > "
> " > " Why should it care about the whole line rather than only the part
> " > " of it that's displayed?
> " > 
> " > So that the line height doesn't change when scrolling horizontally
> " > and a taller character comes into view?
> "
> " No, horizontal scrolling causes another redisplay, which examines the
> " characters that came into view at that time.
> 
> So, IIUC, that means that scrolling horizontally _can_ cause the user's
> experience to be that the height of a line was changed by looking at a
> different portion of it.

Yes, it can happen.  You can see this yourself like this:

 emacs -Q
 M-x font-lock-mode RET
 M-x load-library RET info RET
 C-u 122 M-x goto-char RET
 C-SPC C-e
 M-o O info-title-1 RET
 C-a
 C-x 3
 C-e

> Understood it is redisplayed (and so no doubt draws correctly), but
> I'd still consider the height change to be an odd behaviour.  I'm
> curious if it was an intentional choice (perhaps in the name of
> perf, removing one of the reasons to scan the whole line)?

It is the design principle of the Emacs display engine to examine as
little of buffer text as possible, because otherwise it would be
unbearably slow, even on a fast machine.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: line buffer as Red Black Trees instead of linear
  2014-05-16  8:53               ` Alin Soare
  2014-05-16  9:24                 ` Thien-Thi Nguyen
@ 2014-05-21  8:34                 ` Alin Soare
  2014-05-21 15:17                   ` Eli Zaretskii
  1 sibling, 1 reply; 22+ messages in thread
From: Alin Soare @ 2014-05-21  8:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 3615 bytes --]

Here is how the buffer text could be encoded. In this case, I pasted here
the output of a Visual method , with the input text "RedBlackTree."

Each node has a block of the whole text, the whole text is ordered in
in-order (left - print_node - right).

In this demo that I wrote in Java, each node keeps only 1 character of the
whole text. Each node has also a number, which represents the depth of that
node (those that are starred are at the same level as their parent, because
they are marked RED -- red means star in my visual output).

(Use a font with fixed width size of each character to see the output
clearly)

** In Emacs each node there must be a linked list of structures that keeps
not a char, but a few blocks of text, like this

struct block_text {
  char *text;
  struct block_text * next;
  struct block_text * prev;
  Node *owner;
  ...
}

The owner is a pointer to the node where this block is kept, and it is
useful for fast splitting and joining nodes.

Note that at each node there can be more than 1 structure into the double
linked list , because this helps operations like delete and insert to work
fast, without having to move text in the moment when 2 nodes are joined or
removed.

The last node of a list of a given node points to the beginning at the list
of the following node, such that operations like search-forward and
backward, whose NFA needs to know the following char to work as fast as it
does in current linear buffer implementation,

RBT corresponding to the input text "RedBlackTree."

        // Visual RedBlackTree            k----0                     |
        // Example                        ·                          |
        //                :´´´´´´´´´´´´´´´°¯¯¯¯¯¯¯¦                  |
        //                B----0-(*)              r---1              |
        //                ·                       ·                  |
        //        ¦¯¯¯¯¯¯¯°¯¯¯¯¯¯¯¦         ¦¯¯¯¯¯°¯¯¯¯¯¯¯¯¯¯¯¦      |
        //        e---1           a---1     T---2             e---2  |
        //        ·               ·         ·                 ·      |
        //  ¦¯¯¯¯¯°¯¦       ¦¯¯¯¯¯°¯¦      ¦°¦      :´´´´´´´´´°¦     |
        //  R---2   d---2   l---2   c---2  # #      e---2-(*)  #     |
        //  ·       ·       ·       ·               ·                |
        // ¦°¦     ¦°¦     ¦°¦     ¦°¦             ¦°¦               |
        // # #     # #     # #     # #             # #               |

Ideally, at each node all the text should have the same properties (marks,
fonts, etc), and chaging 1 such property can be implemented by splitting
the node or removing a few nodes, and their text be preserved in 1 node
(only the field owner must be changed when a few nodes are joined in only
1).

Each node in RBT must have pre-computed all the characteristics of the
block of text keps at that node, including the number of newlines of the
nodes at its left, the number of chars at its left, such that the searach
operations of a given line and a given point offset to work fast.

Note that the gap space has trivial implementation like this, and all the
operations on gap work logirithmically in the worse case.

This is the idea that I will implement in future Clearly, it works, and all
the details can be solved with such a data structure. For me the red black
trees are trivial to imeplement, and integrating them in emacs buffer will
be a lot of fun.

[-- Attachment #2: Type: text/html, Size: 4658 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: line buffer as Red Black Trees instead of linear
  2014-05-21  8:34                 ` Alin Soare
@ 2014-05-21 15:17                   ` Eli Zaretskii
  2014-05-21 16:00                     ` Alin Soare
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2014-05-21 15:17 UTC (permalink / raw)
  To: Alin Soare; +Cc: monnier, emacs-devel

> Date: Wed, 21 May 2014 11:34:10 +0300
> From: Alin Soare <as1789@gmail.com>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> 
> Ideally, at each node all the text should have the same properties (marks,
> fonts, etc), and chaging 1 such property can be implemented by splitting
> the node or removing a few nodes, and their text be preserved in 1 node
> (only the field owner must be changed when a few nodes are joined in only
> 1).

That would mean we will need to rearrange the tree each time text
properties or overlays change, something we avoid today.

Also please note that 'display' properties can specify that Emacs
displays something entirely different from buffer text, so the display
engine will still need to consult Lisp strings, in addition to the
text.  (Yes, it's lots of fun.)



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: line buffer as Red Black Trees instead of linear
  2014-05-21 15:17                   ` Eli Zaretskii
@ 2014-05-21 16:00                     ` Alin Soare
  2014-05-21 17:03                       ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Alin Soare @ 2014-05-21 16:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2272 bytes --]

2014-05-21 18:17 GMT+03:00 Eli Zaretskii <eliz@gnu.org>:

> > Date: Wed, 21 May 2014 11:34:10 +0300
> > From: Alin Soare <as1789@gmail.com>
> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> >
> > Ideally, at each node all the text should have the same properties
> (marks,
> > fonts, etc), and chaging 1 such property can be implemented by splitting
> > the node or removing a few nodes, and their text be preserved in 1 node
> > (only the field owner must be changed when a few nodes are joined in only
> > 1).
>
> That would mean we will need to rearrange the tree each time text
> properties or overlays change, something we avoid today.
>

Yes, each time ! But however, a simple mental computation is still much
faster than now, because the computation is made only 1 for each change,
and there will be lots of simple pre-computations that never repeat. Not
all the attributes will be pre-computed at the creation of the node. Some
attributes, like # of newlines on the text at the left of the node and on
the self will be precomputed at the creation. The Node->marks will be
perhaps precomputed when some mark is added in the text within the node.
Display props are pre-computed when the redisplay-internal requires it, so
kind of lazy.

> Also please note that 'display' properties can specify that Emacs
> displays something entirely different from buffer text, so the display
> engine will still need to consult Lisp strings, in addition to the
> text.  (Yes, it's lots of fun.)
>

What is displayed will be all the time pre-computed for each node, and this
will never be re-computed for that node before modifications. The attribute
NODE->what-to-display-data perhaps will be evaluated at the 1st display,
not at the node creation.

Note also that I will do such that each node keeps not a block of text, but
a linked list of blocks of text, such that to be fast to manipulate
operations like join and split.

Apart from search in buffer and reading the text at a given offset(point),
that will be a little slower because the problem of page cache, all the
other functions will be faster (at a first sight).

We will see. When i start working, I will send you private message to ask
you to read what I did. Try since October on to do it.

[-- Attachment #2: Type: text/html, Size: 3169 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: line buffer as Red Black Trees instead of linear
  2014-05-21 16:00                     ` Alin Soare
@ 2014-05-21 17:03                       ` Eli Zaretskii
  2014-05-22  9:33                         ` Alin Soare
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2014-05-21 17:03 UTC (permalink / raw)
  To: Alin Soare; +Cc: monnier, emacs-devel

> Date: Wed, 21 May 2014 19:00:49 +0300
> From: Alin Soare <as1789@gmail.com>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> 
> > > Ideally, at each node all the text should have the same properties
> > (marks,
> > > fonts, etc), and chaging 1 such property can be implemented by splitting
> > > the node or removing a few nodes, and their text be preserved in 1 node
> > > (only the field owner must be changed when a few nodes are joined in only
> > > 1).
> >
> > That would mean we will need to rearrange the tree each time text
> > properties or overlays change, something we avoid today.
> >
> 
> Yes, each time ! But however, a simple mental computation is still much
> faster than now, because the computation is made only 1 for each change,
> and there will be lots of simple pre-computations that never repeat.

Not necessarily.  One scenario that comes to mind is when you visit a
large buffer, and have jit-stealth enabled, which fontifies the buffer
when Emacs is idle.  With your suggestion, each stealth fontification
will cause tree rearrangement; most of these will be in portions of
the buffer that are never displayed at all.

More generally, the savings from maintaining face information as part
of the buffer text are highly dependent on the usage patterns, and it
is not clear to me they will always be a win.

Once again, I urge you to measure and time the relevant code, and also
collect the various popular use cases relevant to the issue, because
otherwise you will be most probably optimizing in the wrong place.

> > Also please note that 'display' properties can specify that Emacs
> > displays something entirely different from buffer text, so the display
> > engine will still need to consult Lisp strings, in addition to the
> > text.  (Yes, it's lots of fun.)
> 
> What is displayed will be all the time pre-computed for each node, and this
> will never be re-computed for that node before modifications. The attribute
> NODE->what-to-display-data perhaps will be evaluated at the 1st display,
> not at the node creation.

So this means you will also have to rearrange the tree whenever
display properties change, which is more work, and non-trivial one at
that (access to overlay strings is not very efficient in Emacs, when
there are many of them).

And what happens with text that is covered with an invisible text
property?  Does it disappear from the tree?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: line buffer as Red Black Trees instead of linear
  2014-05-21 17:03                       ` Eli Zaretskii
@ 2014-05-22  9:33                         ` Alin Soare
  0 siblings, 0 replies; 22+ messages in thread
From: Alin Soare @ 2014-05-22  9:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1014 bytes --]

>
>
>
> So this means you will also have to rearrange the tree whenever
> display properties change, which is more work, and non-trivial one at
> that (access to overlay strings is not very efficient in Emacs, when
> there are many of them).
>
> And what happens with text that is covered with an invisible text
> property?  Does it disappear from the tree?
>

If you keep a single RBT only for overlays, this tree will be modified only
in case of overlay change.

And if you have a single property "narrow from X to Y", and inside X and Y
there is other property, in the moment when you compute something on this
tree, you can simply jump from the node X to the node keeping offset Y. You
can ignore what is inside the tree, and the jump is done logarithmically,
because searching in RBT is guaranteed to be log in the worse time.

Your example shows clearly that it is better to make a separate tree for
each class of properties, than keeping text together overlays , together
with font locks in the same tree.

[-- Attachment #2: Type: text/html, Size: 1754 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-05-22  9:33 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-15 14:33 line buffer as Red Black Trees instead of linear Alin Soare
2014-05-15 17:15 ` Alin Soare
2014-05-15 19:17 ` Stefan Monnier
2014-05-15 19:47   ` Alin Soare
2014-05-15 20:06     ` Eli Zaretskii
2014-05-15 20:38       ` Slow redisplay with long lines Stefan Monnier
2014-05-15 20:57         ` Stephan Mueller
2014-05-16  5:40           ` Eli Zaretskii
2014-05-16 18:08             ` Stephan Mueller
2014-05-16 20:36               ` Eli Zaretskii
2014-05-16  5:39         ` Eli Zaretskii
2014-05-15 23:40       ` line buffer as Red Black Trees instead of linear Alin Soare
2014-05-16  5:41         ` Eli Zaretskii
2014-05-16  7:44           ` Alin Soare
2014-05-16  8:28             ` Eli Zaretskii
2014-05-16  8:53               ` Alin Soare
2014-05-16  9:24                 ` Thien-Thi Nguyen
2014-05-21  8:34                 ` Alin Soare
2014-05-21 15:17                   ` Eli Zaretskii
2014-05-21 16:00                     ` Alin Soare
2014-05-21 17:03                       ` Eli Zaretskii
2014-05-22  9:33                         ` Alin Soare

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).