bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful

unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
@ 2022-09-29  5:29 Gerd Möllmann
  2022-09-29  6:28 ` Eli Zaretskii
                   ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Gerd Möllmann @ 2022-09-29  5:29 UTC (permalink / raw)
  To: 58158

In its current form, interval tree iteration works like this:

1. Call begin_iteration(T) to iterate over tree T
2. do stuff
3. Call end_iteration(T)

with the following rules:

- Begin_iteration and end_iteration must be paired.

- There can be only one iteration per tree at a time.  Nested iteration
  over the same tree is not supported (abort).

- No GC may happen in step 2.  This is because mark_buffer iterates over
  buffer overlays.

I think this is an exceedingly dangerous design.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29  5:29 bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful Gerd Möllmann
@ 2022-09-29  6:28 ` Eli Zaretskii
  2022-09-29  7:03   ` Gerd Möllmann
  2022-10-06 22:26 ` Matt Armstrong
  2023-10-06 13:14 ` Gerd Möllmann
  2 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2022-09-29  6:28 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: 58158

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Date: Thu, 29 Sep 2022 07:29:25 +0200
> 
> In its current form, interval tree iteration works like this:
> 
> 1. Call begin_iteration(T) to iterate over tree T
> 2. do stuff
> 3. Call end_iteration(T)
> 
> with the following rules:
> 
> - Begin_iteration and end_iteration must be paired.
> 
> - There can be only one iteration per tree at a time.  Nested iteration
>   over the same tree is not supported (abort).
> 
> - No GC may happen in step 2.  This is because mark_buffer iterates over
>   buffer overlays.
> 
> I think this is an exceedingly dangerous design.

Why, because of "no GC" requirement?  We could ensure that by calling
inhibit_garbage_collection (if the code doesn't do that already).
That is not very elegant, and might even cause memory pressure in some
(hopefully rare) situations, but we do have some users of this in the
sources.

What higher-level operations require "interval tree iteration" that
you describe?  Which primitives end up doing such iterations?





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29  6:28 ` Eli Zaretskii
@ 2022-09-29  7:03   ` Gerd Möllmann
  2022-09-29  8:08     ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Gerd Möllmann @ 2022-09-29  7:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 58158

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Date: Thu, 29 Sep 2022 07:29:25 +0200
>> 
>> In its current form, interval tree iteration works like this:
>> 
>> 1. Call begin_iteration(T) to iterate over tree T
>> 2. do stuff
>> 3. Call end_iteration(T)
>> 
>> with the following rules:
>> 
>> - Begin_iteration and end_iteration must be paired.
>> 
>> - There can be only one iteration per tree at a time.  Nested iteration
>>   over the same tree is not supported (abort).
>> 
>> - No GC may happen in step 2.  This is because mark_buffer iterates over
>>   buffer overlays.
>> 
>> I think this is an exceedingly dangerous design.
>
> Why, because of "no GC" requirement?  We could ensure that by calling
> inhibit_garbage_collection (if the code doesn't do that already).

It doesn't.

BTW, if anything signals in step 2, so that end_iteration isn't called,
we're also hosed.

> What higher-level operations require "interval tree iteration" that
> you describe?  Which primitives end up doing such iterations?

What has to do with overlays.  To name a few: overlay-at, overlays-in,
next-overlay-change, previous-overlay-change, overlay-lists, ...

I personally think this is a no-go.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29  7:03   ` Gerd Möllmann
@ 2022-09-29  8:08     ` Eli Zaretskii
  2022-09-29  9:09       ` Gerd Möllmann
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2022-09-29  8:08 UTC (permalink / raw)
  To: Gerd Möllmann, Stefan Monnier; +Cc: 58158

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: 58158@debbugs.gnu.org
> Date: Thu, 29 Sep 2022 09:03:17 +0200
> 
> >> - No GC may happen in step 2.  This is because mark_buffer iterates over
> >>   buffer overlays.
> >> 
> >> I think this is an exceedingly dangerous design.
> >
> > Why, because of "no GC" requirement?  We could ensure that by calling
> > inhibit_garbage_collection (if the code doesn't do that already).
> 
> It doesn't.

Should be easy to fix, no?

> BTW, if anything signals in step 2, so that end_iteration isn't called,
> we're also hosed.

record_unwind_protect should fix that, right?
(inhibit_garbage_collection already employs this mechanism).

> > What higher-level operations require "interval tree iteration" that
> > you describe?  Which primitives end up doing such iterations?
> 
> What has to do with overlays.  To name a few: overlay-at, overlays-in,
> next-overlay-change, previous-overlay-change, overlay-lists, ...
> 
> I personally think this is a no-go.

Really?  Even if we take all the measures mentioned above?  Why so?





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29  8:08     ` Eli Zaretskii
@ 2022-09-29  9:09       ` Gerd Möllmann
  2022-09-29  9:37         ` Eli Zaretskii
  2022-10-01  1:57         ` Richard Stallman
  0 siblings, 2 replies; 46+ messages in thread
From: Gerd Möllmann @ 2022-09-29  9:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 58158, Stefan Monnier

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> I personally think this is a no-go.
>
> Really?  Even if we take all the measures mentioned above?

Yes.

> Why so?

I think it simply can't be that what is basically walking a binary tree
requires such restrictions.  And if it does because of some quirk of the
interval tree in itree.c, something is seriously wrong with the design.

That's a bit harsh, but I mean it :-).

^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29  9:09       ` Gerd Möllmann
@ 2022-09-29  9:37         ` Eli Zaretskii
  2022-09-29 10:05           ` Gerd Möllmann
  2022-10-01  1:57         ` Richard Stallman
  1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2022-09-29  9:37 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: 58158, monnier

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,  58158@debbugs.gnu.org
> Date: Thu, 29 Sep 2022 11:09:17 +0200
> 
> I think it simply can't be that what is basically walking a binary tree
> requires such restrictions.  And if it does because of some quirk of the
> interval tree in itree.c, something is seriously wrong with the design.
> 
> That's a bit harsh, but I mean it :-).

Fair enough.

Can you propose a fix?

I guess the begin_iteration/end_iteration dance is because the tree
could be in inconsistent state in-between?  If so, what would it take
to avoid the inconsistency?





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29  9:37         ` Eli Zaretskii
@ 2022-09-29 10:05           ` Gerd Möllmann
  2022-09-29 10:43             ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Gerd Möllmann @ 2022-09-29 10:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 58158, monnier

Eli Zaretskii <eliz@gnu.org> writes:

> Can you propose a fix?

Other than completely rewrite at least this part, no.

> I guess the begin_iteration/end_iteration dance is because the tree
> could be in inconsistent state in-between?  If so, what would it take
> to avoid the inconsistency?

I find it very hard to tell why this is done the way it is.  If someone
knowing the code can think of a reason, please speak up.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29 10:05           ` Gerd Möllmann
@ 2022-09-29 10:43             ` Eli Zaretskii
  2022-09-29 11:33               ` Gerd Möllmann
  2022-09-29 13:10               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 46+ messages in thread
From: Eli Zaretskii @ 2022-09-29 10:43 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: 58158, monnier

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: monnier@iro.umontreal.ca,  58158@debbugs.gnu.org
> Date: Thu, 29 Sep 2022 12:05:50 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Can you propose a fix?
> 
> Other than completely rewrite at least this part, no.
> 
> > I guess the begin_iteration/end_iteration dance is because the tree
> > could be in inconsistent state in-between?  If so, what would it take
> > to avoid the inconsistency?
> 
> I find it very hard to tell why this is done the way it is.  If someone
> knowing the code can think of a reason, please speak up.

I may be missing something, but it looks like the sole purpose of the
iter_start/iter_finish dance is to ensure only one iteration per tree
is running at any given time, and that's because the iteration uses
some state variable(s) of which there's only one instance per tree.

Stefan, am I missing something?





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29 10:43             ` Eli Zaretskii
@ 2022-09-29 11:33               ` Gerd Möllmann
  2022-09-29 13:10               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 46+ messages in thread
From: Gerd Möllmann @ 2022-09-29 11:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 58158, monnier

Eli Zaretskii <eliz@gnu.org> writes:

>> I find it very hard to tell why this is done the way it is.  If someone
>> knowing the code can think of a reason, please speak up.
>
> I may be missing something, but it looks like the sole purpose of the
> iter_start/iter_finish dance is to ensure only one iteration per tree
> is running at any given time, and that's because the iteration uses
> some state variable(s) of which there's only one instance per tree.

BTW, It currently uses a "visited" flag in tree nodes, which is kind of
such a state, but something like that is normally not needed when
walking an rb-tree.






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29 10:43             ` Eli Zaretskii
  2022-09-29 11:33               ` Gerd Möllmann
@ 2022-09-29 13:10               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-09-29 13:23                 ` Eli Zaretskii
                                   ` (2 more replies)
  1 sibling, 3 replies; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-29 13:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Gerd Möllmann, 58158

> I may be missing something, but it looks like the sole purpose of the
> iter_start/iter_finish dance is to ensure only one iteration per tree
> is running at any given time, and that's because the iteration uses
> some state variable(s) of which there's only one instance per tree.
>
> Stefan, am I missing something?

One reason is that traversing a binary tree usually requires something
like recursion, but that wouldn't fit very conveniently with the current
code (nor with C in general since you can't make a local recursive
closure which accesses local variables from the surrounding function).

Another is the need to update the begin/end fields (these need updating
because of insertions/deletions but they're updated lazily while
traversing the tree to avoid an O(N) complexity during the
insertions/deletions).  Hiding that behind 'some kind of "next node"
function keeps the code more readable.

But yes, the current restriction to have a single iteration at a time is
a bit of a problem, especially because it's very "global".  I added
a comment yesterday describing how we could make it non-global (hence
getting rid of the `visited` flag in the nodes).

For now, I pushed a simple fix to traverse the tree "by hand" in the GC
rather than via the iterator.

        Stefan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29 13:10               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-09-29 13:23                 ` Eli Zaretskii
  2022-09-29 16:48                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-09-29 13:40                 ` Eli Zaretskii
  2022-09-29 14:15                 ` Gerd Möllmann
  2 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2022-09-29 13:23 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: gerd.moellmann, 58158

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>,
>   58158@debbugs.gnu.org
> Date: Thu, 29 Sep 2022 09:10:19 -0400
> 
> > I may be missing something, but it looks like the sole purpose of the
> > iter_start/iter_finish dance is to ensure only one iteration per tree
> > is running at any given time, and that's because the iteration uses
> > some state variable(s) of which there's only one instance per tree.
> >
> > Stefan, am I missing something?
> 
> One reason is that traversing a binary tree usually requires something
> like recursion, but that wouldn't fit very conveniently with the current
> code (nor with C in general since you can't make a local recursive
> closure which accesses local variables from the surrounding function).

I'm not sure I understand how recursion is related.  Are you saying
that recursion is replaced with iteration?  But then, if _start and
_finish are called by the same caller, we don't really need the
protection, since no one can start another iteration while the first
one runs.  Right?

> Another is the need to update the begin/end fields (these need updating
> because of insertions/deletions but they're updated lazily while
> traversing the tree to avoid an O(N) complexity during the
> insertions/deletions).  Hiding that behind 'some kind of "next node"
> function keeps the code more readable.

Where in the code do you see iteration that adds or deletes nodes?

> For now, I pushed a simple fix to traverse the tree "by hand" in the GC
> rather than via the iterator.

So this removes the restriction of not having GC during iteration?

Also, I take it that you don't consider the current code is as
"harmful" as Gerd thinks it is?  IOW, you don't share his opinion that
this implementation is a "no-go"?





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29 13:10               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-09-29 13:23                 ` Eli Zaretskii
@ 2022-09-29 13:40                 ` Eli Zaretskii
  2022-09-29 14:15                 ` Gerd Möllmann
  2 siblings, 0 replies; 46+ messages in thread
From: Eli Zaretskii @ 2022-09-29 13:40 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: gerd.moellmann, 58158

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>,
>   58158@debbugs.gnu.org
> Date: Thu, 29 Sep 2022 09:10:19 -0400
> 
> Another is the need to update the begin/end fields (these need updating
> because of insertions/deletions but they're updated lazily while
> traversing the tree to avoid an O(N) complexity during the
> insertions/deletions).  Hiding that behind 'some kind of "next node"
> function keeps the code more readable.

Btw, couldn't we handle this by having a flag in the node that tells
us whether the begin/end fields can be trusted?  Then the first caller
who need them would run the update and reset the flags, and we still
have lazy update, albeit somewhat less lazy, but without the need for
guarding the iteration with start/finish calls.  Would that work?





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29 13:10               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-09-29 13:23                 ` Eli Zaretskii
  2022-09-29 13:40                 ` Eli Zaretskii
@ 2022-09-29 14:15                 ` Gerd Möllmann
  2022-09-29 14:37                   ` Gerd Möllmann
  2022-09-29 16:40                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 2 replies; 46+ messages in thread
From: Gerd Möllmann @ 2022-09-29 14:15 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 58158

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> One reason is that traversing a binary tree usually requires something
> like recursion, but that wouldn't fit very conveniently with the current
> code (nor with C in general since you can't make a local recursive
> closure which accesses local variables from the surrounding function).

Ok, usually, but not necessarily.  The alternative is to implement an
iterator that starts with a node N, and an implementation of a successor
function, which return the successor of N in a given order.  This
requires a parent pointer in nodes, but that we have.

(Something like this is used for ordered containers like "map" and "set"
in C++ STL, for instance, which are based on rb-trees in GCC's
libstdc++.)

> Another is the need to update the begin/end fields (these need updating
> because of insertions/deletions but they're updated lazily while
> traversing the tree to avoid an O(N) complexity during the
> insertions/deletions).  Hiding that behind 'some kind of "next node"
> function keeps the code more readable.

Is this for buffer text changes, something akin to a delayed update of
marker positions?  I already wondered where that was.

> But yes, the current restriction to have a single iteration at a time is
> a bit of a problem, especially because it's very "global".  I added
> a comment yesterday describing how we could make it non-global (hence
> getting rid of the `visited` flag in the nodes).

BTW, and related to iteration directly, did you notice this in
interval_tree_insert?

      /* This suggests that nodes in the right subtree are strictly
         greater.  But this is not true due to later rotations. */
      child = node->begin <= child->begin ? child->left : child->right;





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29 14:15                 ` Gerd Möllmann
@ 2022-09-29 14:37                   ` Gerd Möllmann
  2022-09-29 22:09                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-09-29 16:40                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 46+ messages in thread
From: Gerd Möllmann @ 2022-09-29 14:37 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 58158

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
>> One reason is that traversing a binary tree usually requires something
>> like recursion, but that wouldn't fit very conveniently with the current
>> code (nor with C in general since you can't make a local recursive
>> closure which accesses local variables from the surrounding function).
>
> Ok, usually, but not necessarily.  The alternative is to implement an
> iterator that starts with a node N, and an implementation of a successor
> function, which return the successor of N in a given order.  This
> requires a parent pointer in nodes, but that we have.
>
> (Something like this is used for ordered containers like "map" and "set"
> in C++ STL, for instance, which are based on rb-trees in GCC's
> libstdc++.)

The code from libstdc++ is this (from libstdc++-v3/src/c++98/tree.cc):

  static _Rb_tree_node_base*
  local_Rb_tree_increment(_Rb_tree_node_base* __x) throw ()
  {
    if (__x->_M_right != 0)
      {
        __x = __x->_M_right;
        while (__x->_M_left != 0)
          __x = __x->_M_left;
      }
    else
      {
        _Rb_tree_node_base* __y = __x->_M_parent;
        while (__x == __y->_M_right)
          {
            __x = __y;
            __y = __y->_M_parent;
          }
        if (__x->_M_right != __y)
          __x = __y;
      }
    return __x;
  }

I hope one can read that.

The idea is to find the root of the smallest subtree containing the
current node, and proceed from there.  That's why the parent pointer is
needed.

Symmmetrical for max->min ordering.  And finding min/max is trivial.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29 14:15                 ` Gerd Möllmann
  2022-09-29 14:37                   ` Gerd Möllmann
@ 2022-09-29 16:40                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-29 16:40 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, 58158

Gerd Möllmann [2022-09-29 16:15:09] wrote:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> One reason is that traversing a binary tree usually requires something
>> like recursion, but that wouldn't fit very conveniently with the current
>> code (nor with C in general since you can't make a local recursive
>> closure which accesses local variables from the surrounding function).
> Ok, usually, but not necessarily.  The alternative is to implement an
> iterator that starts with a node N, and an implementation of a successor
> function, which return the successor of N in a given order.

The approach currently used is somewhat similar to that.  Some of the
difference is that we need an actual "iterator/generator" object to
remember the parameter of the filtering we want to apply to the set of
objects.  And the problem is that this "object" is currently implemented
not only as a global value (thus restricting us to one-iteration at
a time) but also with some parts of the data stored in the tree.
I think this is the part that really needs to be changed.

> This requires a parent pointer in nodes, but that we have.
>
> (Something like this is used for ordered containers like "map" and "set"
> in C++ STL, for instance, which are based on rb-trees in GCC's
> libstdc++.)

Another difference is that itree.c's iterator uses a local "work stack"
instead of traversing the tree exclusively via left/right/parent like in
the code you show.  I don't know if that difference is important, tho.

>> Another is the need to update the begin/end fields (these need updating
>> because of insertions/deletions but they're updated lazily while
>> traversing the tree to avoid an O(N) complexity during the
>> insertions/deletions).  Hiding that behind 'some kind of "next node"
>> function keeps the code more readable.
>
> Is this for buffer text changes, something akin to a delayed update of
> marker positions?

Yes, exactly.

>> But yes, the current restriction to have a single iteration at a time is
>> a bit of a problem, especially because it's very "global".  I added
>> a comment yesterday describing how we could make it non-global (hence
>> getting rid of the `visited` flag in the nodes).
>
> BTW, and related to iteration directly, did you notice this in
> interval_tree_insert?
>
>       /* This suggests that nodes in the right subtree are strictly
>          greater.  But this is not true due to later rotations. */
>       child = node->begin <= child->begin ? child->left : child->right;

No, I had not noticed that yet, and I don't understand this comment.


        Stefan






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29 13:23                 ` Eli Zaretskii
@ 2022-09-29 16:48                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-29 16:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, 58158

>> > I may be missing something, but it looks like the sole purpose of the
>> > iter_start/iter_finish dance is to ensure only one iteration per tree
>> > is running at any given time, and that's because the iteration uses
>> > some state variable(s) of which there's only one instance per tree.
>> >
>> > Stefan, am I missing something?
>> 
>> One reason is that traversing a binary tree usually requires something
>> like recursion, but that wouldn't fit very conveniently with the current
>> code (nor with C in general since you can't make a local recursive
>> closure which accesses local variables from the surrounding function).
>
> I'm not sure I understand how recursion is related.  Are you saying
> that recursion is replaced with iteration?  But then, if _start and
> _finish are called by the same caller, we don't really need the
> protection, since no one can start another iteration while the first
> one runs.  Right?

Typically, current code will look something like:

    int x;
    Lisp_Object y;

    buffer_overlay_iter_start (current_buffer, prev, pos, ITREE_DESCENDING);
    while ((node = buffer_overlay_iter_next (current_buffer)))
      {
        ... do something that updates x and y ...
      }
    buffer_overlay_iter_finish (current_buffer);

If we were to use recursion, then we'd need to define a new (recursive)
function which does what's currently done in the `while` loop, but this
function can't access `x` and `y`, so it would need to take them as
argument, or a reference to them...

The use of an iterator is definitely convenient (and is a standard
approach in many languages).

>> Another is the need to update the begin/end fields (these need updating
>> because of insertions/deletions but they're updated lazily while
>> traversing the tree to avoid an O(N) complexity during the
>> insertions/deletions).  Hiding that behind 'some kind of "next node"
>> function keeps the code more readable.
>
> Where in the code do you see iteration that adds or deletes nodes?

No, I meant insertion/deletion of text in the buffer, thus requiring
updates to `begin/end` fields.

> Btw, couldn't we handle this by having a flag in the node that tells
> us whether the begin/end fields can be trusted?  Then the first caller
> who need them would run the update and reset the flags, and we still
> have lazy update, albeit somewhat less lazy, but without the need for
> guarding the iteration with start/finish calls.  Would that work?

Yes, it would, but it's still O(N).
The current approach is inspired by the approach used in `intervals.c`
which also updates those fields lazily/ondemand so as to avoid the O(N)
performance impact.

>> For now, I pushed a simple fix to traverse the tree "by hand" in the GC
>> rather than via the iterator.
> So this removes the restriction of not having GC during iteration?

Yes.

> Also, I take it that you don't consider the current code is as
> "harmful" as Gerd thinks it is?

I don't like the global state it uses, but I think we can fix this
aspect without too much trouble.

> IOW, you don't share his opinion that this implementation is
> a "no-go"?

No, indeed, I don't.


        Stefan






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29 14:37                   ` Gerd Möllmann
@ 2022-09-29 22:09                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-09-30  5:28                       ` Gerd Möllmann
  0 siblings, 1 reply; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-29 22:09 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, 58158

>   static _Rb_tree_node_base*
>   local_Rb_tree_increment(_Rb_tree_node_base* __x) throw ()
>   {
>     if (__x->_M_right != 0)
>       {
>         __x = __x->_M_right;
>         while (__x->_M_left != 0)
>           __x = __x->_M_left;
>       }
>     else
>       {
>         _Rb_tree_node_base* __y = __x->_M_parent;
>         while (__x == __y->_M_right)
>           {
>             __x = __y;
>             __y = __y->_M_parent;
>           }
>         if (__x->_M_right != __y)
>           __x = __y;
>       }
>     return __x;
>   }
>
> I hope one can read that.

I changed the code to store the `visited` bit in the work stack, but if
you could rewrite the `interval_generator_next` along the lines of the
code above that would be great.
[ An alternative would be to try and get rid of the `parent` field :-)  ]


        Stefan






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29 22:09                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-09-30  5:28                       ` Gerd Möllmann
  2022-09-30  6:11                         ` Eli Zaretskii
  2022-09-30 13:25                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 46+ messages in thread
From: Gerd Möllmann @ 2022-09-30  5:28 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 58158

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> I changed the code to store the `visited` bit in the work stack, but if
> you could rewrite the `interval_generator_next` along the lines of the
> code above that would be great.

Ok, I'll rewrite that :-).  When I understand what that "narrowing" is
and how and for what it is used.

BTW, what do you think of changing function names to something a bit
shorter?  I find myself constantly getting confused when reading the
code.  I think an "itree_" prefix would suffice for non-static
functions, and static ones without prefix.  Renaming that is hopefully
not that much work with LSP.

> [ An alternative would be to try and get rid of the `parent` field :-)
> ]

:-)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-30  5:28                       ` Gerd Möllmann
@ 2022-09-30  6:11                         ` Eli Zaretskii
  2022-09-30 11:31                           ` Gerd Möllmann
  2022-10-06 22:36                           ` Dmitry Gutov
  2022-09-30 13:25                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 2 replies; 46+ messages in thread
From: Eli Zaretskii @ 2022-09-30  6:11 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: 58158, monnier

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  58158@debbugs.gnu.org
> Date: Fri, 30 Sep 2022 07:28:26 +0200
> 
> Renaming that is hopefully not that much work with LSP.

You should be able to do this with M-? followed by
"M-x xref-query-replace-in-results RET" (the latter
should be invoked in the "*xref*" buffer produced by M-?).





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-30  6:11                         ` Eli Zaretskii
@ 2022-09-30 11:31                           ` Gerd Möllmann
  2022-09-30 18:29                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-10-06 22:36                           ` Dmitry Gutov
  1 sibling, 1 reply; 46+ messages in thread
From: Gerd Möllmann @ 2022-09-30 11:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 58158, monnier

On 22-09-30 8:11 , Eli Zaretskii wrote:
>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  58158@debbugs.gnu.org
>> Date: Fri, 30 Sep 2022 07:28:26 +0200
>>
>> Renaming that is hopefully not that much work with LSP.
> 
> You should be able to do this with M-? followed by
> "M-x xref-query-replace-in-results RET" (the latter
> should be invoked in the "*xref*" buffer produced by M-?).

Ok, thanks for the tip.

Info for Stefan: I've now removed the per-tree null node, and make check 
shows no unexpected results.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-30  5:28                       ` Gerd Möllmann
  2022-09-30  6:11                         ` Eli Zaretskii
@ 2022-09-30 13:25                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-09-30 14:08                           ` Gerd Möllmann
  1 sibling, 1 reply; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-30 13:25 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, 58158

Gerd Möllmann [2022-09-30 07:28:26] wrote:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> I changed the code to store the `visited` bit in the work stack, but if
>> you could rewrite the `interval_generator_next` along the lines of the
>> code above that would be great.
> Ok, I'll rewrite that :-).  When I understand what that "narrowing" is
> and how and for what it is used.

The traversals are always bound by begin..end boundaries.  Usually we
know those bounds upfront (e.g. for `overlays-in` or `overlay-at`), but
when doing things like `next/previous-overlay-change` one of the bounds
is not know at first, so in order to try and avoid the O(N) complexity
the code refines that other bound on the fly (e.g. when searching
forward, after seeing an overlay that ends at POS we know that there's no
point looking further than POS so we can move the end boundary to POS).

> BTW, what do you think of changing function names to something a bit
> shorter?  I find myself constantly getting confused when reading the
> code.  I think an "itree_" prefix would suffice for non-static
> functions, and static ones without prefix.

Fine by me, yes.


        Stefan






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-30 13:25                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-09-30 14:08                           ` Gerd Möllmann
  2022-09-30 15:25                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 46+ messages in thread
From: Gerd Möllmann @ 2022-09-30 14:08 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 58158

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> The traversals are always bound by begin..end boundaries.  Usually we
> know those bounds upfront (e.g. for `overlays-in` or `overlay-at`), but
> when doing things like `next/previous-overlay-change` one of the bounds
> is not know at first, so in order to try and avoid the O(N) complexity
> the code refines that other bound on the fly (e.g. when searching
> forward, after seeing an overlay that ends at POS we know that there's no
> point looking further than POS so we can move the end boundary to
> POS).

Thanks, that helps.

Maybe you could also help me with the questions below?

I'm assuming, from a comment somewhere, that an interval tree is an
rb-tree with keys being interval start positions, and allowing
duplicates.

That is, if N is a node, all nodes in the subtree N->left are strictly <
N, and nodes in N->right are >=.  Or in a picture, if [start, end] is an
interval, and we insert some intervals with the same start, we could
arrive at

                 [5, a]
                /      \
            [4, b]     [6, c]
              /\         /  \
             0  0       0   [6, d]
                              /\
                              ...

where 0 means null, and 6 is a duplicate start.

1. Is that correct?

2. Does the tree order duplicates in a particular way?  Maybe by overlay
priority, or by interval end?  And, perhaps, if it doesn't order
duplicates, would it be an idea to order them, maybe at some later
point?  I'm asking this because a successor(N) function would return
nodes in the order in the tree, always, and I don't know what the order
is now.

3. If my mental picture is right, we could of course end up with a tree
that has degenerated to a list.  Or a subtree, maybe.  Do you know if
that can happen?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-30 14:08                           ` Gerd Möllmann
@ 2022-09-30 15:25                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-09-30 16:04                               ` Eli Zaretskii
                                                 ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-30 15:25 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, Andreas Politz, 58158

> Maybe you could also help me with the questions below?

I'll try (BTW, the original author is Andreas Politz who we can still
reach at <mail@andreas-politz.de>.  He doesn't have much time to devote
to it, tho, so best not to Cc him through all the conversations).

> I'm assuming, from a comment somewhere, that an interval tree is an
> rb-tree with keys being interval start positions, and allowing
> duplicates.

Yup.

> That is, if N is a node, all nodes in the subtree N->left are strictly <
> N, and nodes in N->right are >=.

The following code in `interval_tree_insert`:

  while (child != ITREE_NULL)
    {
      parent = child;
      offset += child->offset;
      child->limit = max (child->limit, node->end - offset);
      /* This suggests that nodes in the right subtree are strictly
         greater.  But this is not true due to later rotations. */
      child = node->begin <= child->begin ? child->left : child->right;
    }

suggests that N->left are <= and N->right are > but my reading of the
comment is that the only thing we can rely on is that N-<left is <= and
N->right is >=

[ I do understand this comment now :-)  ]

> 2. Does the tree order duplicates in a particular way?
> Maybe by overlay priority, or by interval end?

AFAICT, no it does not.

> And, perhaps, if it doesn't order duplicates, would it be an idea to
> order them, maybe at some later point?  I'm asking this because
> a successor(N) function would return nodes in the order in the tree,
> always, and I don't know what the order is now.

Ordering based on interval end could arguably make sense.  I'm not
completely sure how/where it would benefit us, tho.  Most times we look
at the itree, we just look at *all* the nodes within a specific
interval, so the order in which they're returned doesn't matter very
much (the ordering within the tree does matter in terms of how we manage
to prune the tree, but that has more to do with which elements are near
the root, which is a different kind of "ordering" than the "binary tree
ordering" itself).  Maybe the `next/previous-overlay-change` code
benefit from sub-ordering based on `end`, but I suspect the effect would
be lost in the noise: if we want to speed up that part of the code,
I expect that a better avenue is to rewrite the
`next/previous-single-overlay-change` so as not to work by (while ..
(next/previous-overlay-change ..) .. (get-char-property ..) ..) where
both functions do the O(log N) itree-iteration dance, but instead doing
the itree iteration itself.

> 3. If my mental picture is right, we could of course end up with a tree
> that has degenerated to a list.  Or a subtree, maybe.  Do you know if
> that can happen?

In terms of tree depth, no: the code preserves the RB invariants, which
ensure balance regardless of ordering (or lack thereof).

        Stefan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-30 15:25                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-09-30 16:04                               ` Eli Zaretskii
  2022-09-30 17:11                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-10-01  5:06                               ` Gerd Möllmann
  2022-10-01  7:25                               ` Gerd Möllmann
  2 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2022-09-30 16:04 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: gerd.moellmann, mail, 58158

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz@gnu.org>,  58158@debbugs.gnu.org, Andreas Politz
>  <mail@andreas-politz.de>
> Date: Fri, 30 Sep 2022 11:25:30 -0400
> 
> > And, perhaps, if it doesn't order duplicates, would it be an idea to
> > order them, maybe at some later point?  I'm asking this because
> > a successor(N) function would return nodes in the order in the tree,
> > always, and I don't know what the order is now.
> 
> Ordering based on interval end could arguably make sense.  I'm not
> completely sure how/where it would benefit us, tho.  Most times we look
> at the itree, we just look at *all* the nodes within a specific
> interval, so the order in which they're returned doesn't matter very
> much (the ordering within the tree does matter in terms of how we manage
> to prune the tree, but that has more to do with which elements are near
> the root, which is a different kind of "ordering" than the "binary tree
> ordering" itself).  Maybe the `next/previous-overlay-change` code
> benefit from sub-ordering based on `end`, but I suspect the effect would
> be lost in the noise: if we want to speed up that part of the code,
> I expect that a better avenue is to rewrite the
> `next/previous-single-overlay-change` so as not to work by (while ..
> (next/previous-overlay-change ..) .. (get-char-property ..) ..) where
> both functions do the O(log N) itree-iteration dance, but instead doing
> the itree iteration itself.

The display engine sorts the overlays, so order could be important.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-30 16:04                               ` Eli Zaretskii
@ 2022-09-30 17:11                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-30 17:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, mail, 58158

>> Ordering based on interval end could arguably make sense.  I'm not
>> completely sure how/where it would benefit us, tho.  Most times we look
>> at the itree, we just look at *all* the nodes within a specific
>> interval, so the order in which they're returned doesn't matter very
>> much (the ordering within the tree does matter in terms of how we manage
>> to prune the tree, but that has more to do with which elements are near
>> the root, which is a different kind of "ordering" than the "binary tree
>> ordering" itself).  Maybe the `next/previous-overlay-change` code
>> benefit from sub-ordering based on `end`, but I suspect the effect would
>> be lost in the noise: if we want to speed up that part of the code,
>> I expect that a better avenue is to rewrite the
>> `next/previous-single-overlay-change` so as not to work by (while ..
>> (next/previous-overlay-change ..) .. (get-char-property ..) ..) where
>> both functions do the O(log N) itree-iteration dance, but instead doing
>> the itree iteration itself.
>
> The display engine sorts the overlays, so order could be important.

Indeed, for things like `overlays_at` it would be handy if the iterator
could return the right overlays already in the right order, so we don't
have to `sort_overlays`.

But in `sort_overlays` the `priority` property takes precedence, so if
we order the RB tree based on that ordering, we will lose the main
purpose of the change, i.e. finding the overlays at POS in O(log N)
time, because the `compare_overlays` ordering does not let us know that
all of the left or right subtree is outside of a given interval.
IOW the itree has to use a different ordering than that of
`compare_overlays` anyway, so we're still forced to (re-)sort afterwards
with `sort_overlays` :-(


	Stefan






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-30 11:31                           ` Gerd Möllmann
@ 2022-09-30 18:29                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-10-02  8:06                               ` Gerd Möllmann
  0 siblings, 1 reply; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-30 18:29 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, 58158

> Info for Stefan: I've now removed the per-tree null node, and make check
> shows no unexpected results.

Thanks.  It's an improvement, but I still don't understand how it
works :-(


        Stefan






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29  9:09       ` Gerd Möllmann
  2022-09-29  9:37         ` Eli Zaretskii
@ 2022-10-01  1:57         ` Richard Stallman
  2022-10-01  7:00           ` Eli Zaretskii
  1 sibling, 1 reply; 46+ messages in thread
From: Richard Stallman @ 2022-10-01  1:57 UTC (permalink / raw)
  To: Gerd MÃ¶llmann; +Cc: eliz, 58158, monnier

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I think it simply can't be that what is basically walking a binary tree
  > requires such restrictions.

If we allow running complicated code during the tree walk, that would
be asking for trouble.  But if the operations done during the tree
walk are simple and written in C, it should be clear that there is no
possibility of this kind of a bug.  Isn't that the case?

  > > What has to do with overlays.  To name a few: overlay-at, overlays-in,
  > > next-overlay-change, previous-overlay-change, overlay-lists, ...

Those operations can be done in C with no risk of signaling an error.

That ought to be sufficient, as long as we don't need the code to be
reentrant.  If the code needs to be reentrant then we would have
to eliminate the "visited" flag.  But if we don't run Lisp code
during the tree-walk, we should not need it to be reentrant.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-30 15:25                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-09-30 16:04                               ` Eli Zaretskii
@ 2022-10-01  5:06                               ` Gerd Möllmann
  2022-10-01 13:54                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-10-01  7:25                               ` Gerd Möllmann
  2 siblings, 1 reply; 46+ messages in thread
From: Gerd Möllmann @ 2022-10-01  5:06 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, Andreas Politz, 58158

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> That is, if N is a node, all nodes in the subtree N->left are strictly <
>> N, and nodes in N->right are >=.
>
> The following code in `interval_tree_insert`:
>
>   while (child != ITREE_NULL)
>     {
>       parent = child;
>       offset += child->offset;
>       child->limit = max (child->limit, node->end - offset);
>       /* This suggests that nodes in the right subtree are strictly
>          greater.  But this is not true due to later rotations. */
>       child = node->begin <= child->begin ? child->left : child->right;
>     }
>
> suggests that N->left are <= and N->right are > but my reading of the
> comment is that the only thing we can rely on is that N-<left is <= and
> N->right is >=

Phew.  I'm not sure but I get the feeling that makes implementing a
successor function, let's say, challenging.

I wonder how std::multimap deals with that.  Hm.

Anyways, with your removal of the visited flag (N thumbs up, for large
values of N :-)), most of the problems I preceived are gone anyway.
So, I guess we can live with the iteration stack, which seems to work
fine, and the alternative is only nice to have, now.

(And impacts are getting closer in the real world here, so I'm
a bit distracted anyway.)

>
> [ I do understand this comment now :-)  ]

:-)

>
>> 2. Does the tree order duplicates in a particular way?
>> Maybe by overlay priority, or by interval end?
>
> AFAICT, no it does not.

Ok.

>> And, perhaps, if it doesn't order duplicates, would it be an idea to
>> order them, maybe at some later point?  I'm asking this because
>> a successor(N) function would return nodes in the order in the tree,
>> always, and I don't know what the order is now.
>
> Ordering based on interval end could arguably make sense.  I'm not
> completely sure how/where it would benefit us, tho.  Most times we look
> at the itree, we just look at *all* the nodes within a specific
> interval, so the order in which they're returned doesn't matter very
> much (the ordering within the tree does matter in terms of how we manage
> to prune the tree, but that has more to do with which elements are near
> the root, which is a different kind of "ordering" than the "binary tree
> ordering" itself).  Maybe the `next/previous-overlay-change` code
> benefit from sub-ordering based on `end`, but I suspect the effect would
> be lost in the noise: if we want to speed up that part of the code,
> I expect that a better avenue is to rewrite the
> `next/previous-single-overlay-change` so as not to work by (while ..
> (next/previous-overlay-change ..) .. (get-char-property ..) ..) where
> both functions do the O(log N) itree-iteration dance, but instead doing
> the itree iteration itself.

Thanks.

>
>> 3. If my mental picture is right, we could of course end up with a tree
>> that has degenerated to a list.  Or a subtree, maybe.  Do you know if
>> that can happen?
>
> In terms of tree depth, no: the code preserves the RB invariants, which
> ensure balance regardless of ordering (or lack thereof).

Ok, thanks for your insights!





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-10-01  1:57         ` Richard Stallman
@ 2022-10-01  7:00           ` Eli Zaretskii
  0 siblings, 0 replies; 46+ messages in thread
From: Eli Zaretskii @ 2022-10-01  7:00 UTC (permalink / raw)
  To: rms; +Cc: gerd.moellmann, 58158, monnier

> From: Richard Stallman <rms@gnu.org>
> Cc: eliz@gnu.org, 58158@debbugs.gnu.org, monnier@iro.umontreal.ca
> Date: Fri, 30 Sep 2022 21:57:36 -0400
> 
>   > > What has to do with overlays.  To name a few: overlay-at, overlays-in,
>   > > next-overlay-change, previous-overlay-change, overlay-lists, ...
> 
> Those operations can be done in C with no risk of signaling an error.

I think such assumptions were proven dangerous at best in the long
run.  The way Emacs develops, we constantly add more and more hooks to
C code, and more and more direct calls into Lisp from C.  These
invalidate any assumptions about "no risk of signaling an error", even
if they were originally true when the code was first written.

Moreover, we test for QUIT in operations that can be prolonged ones
(and for a good reason), so any long loop in C could potentially throw
to top level if the user pressed C-g.

All in all, experience shows that making such assumptions in Emacs is
unsafe.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-30 15:25                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-09-30 16:04                               ` Eli Zaretskii
  2022-10-01  5:06                               ` Gerd Möllmann
@ 2022-10-01  7:25                               ` Gerd Möllmann
  2022-10-01 10:55                                 ` Gerd Möllmann
  2022-10-01 14:01                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 2 replies; 46+ messages in thread
From: Gerd Möllmann @ 2022-10-01  7:25 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, Andreas Politz, 58158

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Maybe you could also help me with the questions below?
>
> I'll try (BTW, the original author is Andreas Politz who we can still
> reach at <mail@andreas-politz.de>.  He doesn't have much time to devote
> to it, tho, so best not to Cc him through all the conversations).
>
>> I'm assuming, from a comment somewhere, that an interval tree is an
>> rb-tree with keys being interval start positions, and allowing
>> duplicates.
>
> Yup.
>
>> That is, if N is a node, all nodes in the subtree N->left are strictly <
>> N, and nodes in N->right are >=.
>
> The following code in `interval_tree_insert`:
>
>   while (child != ITREE_NULL)
>     {
>       parent = child;
>       offset += child->offset;
>       child->limit = max (child->limit, node->end - offset);
>       /* This suggests that nodes in the right subtree are strictly
>          greater.  But this is not true due to later rotations. */
>       child = node->begin <= child->begin ? child->left : child->right;
>     }
>
> suggests that N->left are <= and N->right are > but my reading of the
> comment is that the only thing we can rely on is that N-<left is <= and
> N->right is >=
>

Yup.  I've used overlay-tree a bit (compile with ITREE_DEBUG defined
after pulling), and used this:

(defun make-ivs ()
  (with-current-buffer (get-buffer-create "*iv*")
    (delete-all-overlays)
    (erase-buffer)
    (insert (make-string 50 ?x))
    (let ((o1 (make-overlay 1 1))
	  (o2 (make-overlay 10 11))
	  (o3 (make-overlay 10 12))
	  (o4 (make-overlay 1 1))
	  )
      (move-overlay o4 10 13)
      (overlay-tree))))

(pp (make-ivs))
((:begin 10 :end 12 :limit 13 :offset 0 :rear-advance nil :front-advance nil)
 ((:begin 1 :end 1 :limit 13 :offset 0 :rear-advance nil :front-advance nil)
  nil
  ((:begin 10 :end 13 :limit 13 :offset 0 :rear-advance nil :front-advance nil)
   nil nil))
 ((:begin 10 :end 11 :limit 11 :offset 0 :rear-advance nil :front-advance nil)
  nil nil))

That's 

                    [10, 12]
                   /        \
                 [1, 1]      [10, 11]
                 /    \         /\
                    [10, 13]
                     /    \
                          
The 10 is found "all over the place".

I surmise no reasonable successor function can be written for such a
tree.

I have to look at std::multimap, they manage to do this somehow.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-10-01  7:25                               ` Gerd Möllmann
@ 2022-10-01 10:55                                 ` Gerd Möllmann
  2022-10-01 14:01                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 46+ messages in thread
From: Gerd Möllmann @ 2022-10-01 10:55 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, Andreas Politz, 58158

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> I have to look at std::multimap, they manage to do this somehow.

Well, I should have thought of that, because it's obvious from the fact
they are able to use successor/predecessor functions :-/.

The equivalent in our code would go like below, which is just written
down in a hurry, so please bear with me.  It's just about the idea.  So:

Insert a new interval_node only if overlay start is not in the tree
already.  If the contains a node for the start, make the node's value a
list of overlays, and add the new one to it in an order that's
convenient (The STL uses insertion order).

As a consequence, keys is the rb-tree are unique, and it is always
strictly ordered, rotations don't change that.  The min node is the
left-most node in a tree, and so on.

So far so good.

An iterator in the order min->max would have to record the interval_node
in the tree plus information where in the list of overlays it is, if it
is in any.  Finding the next node is implemented with a successor
function like the one I've shown from libstdc++.

To find all overlays containing a given position POS, find the node
whose start is <= POS.  Start at the root of the tree, and walk the left
link until we find the ndoes's start is <= POS.  Then iterate forward
until start > POS, returning only overlapping overlays.

Finding overlays intersecting an interval [BEG, END] is not as nice, but
we can exclude intervals starting after END.  So we have to iterate over
all overlays from the minimum of the tree, and iterate forward until we
reach the first excluded one.

That's so "easy" that even I can understand how it works :-).

What am I overlooking?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-10-01  5:06                               ` Gerd Möllmann
@ 2022-10-01 13:54                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-10-02  8:22                                   ` Gerd Möllmann
  0 siblings, 1 reply; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-10-01 13:54 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, Andreas Politz, 58158

>> The following code in `interval_tree_insert`:
[...]
>> suggests that N->left are <= and N->right are > but my reading of the
>> comment is that the only thing we can rely on is that N-<left is <= and
>> N->right is >=
>
> Phew.  I'm not sure but I get the feeling that makes implementing a
> successor function, let's say, challenging.

I don't think it makes any difference in that respect, no.  The notion of
"successor" is just "the successor in the in-order traversal of the
tree" and while rotations change the initial property that "N->right are >",
they don't change the in-order traversal output (they don't re-order
nodes w.r.t in-order traversal (or in reverse order for that matter)).

As alluded to at the end of my previous email, while RB trees are
typically used for your usual binary tree which comes with some notion
of ordering that makes lookup O(log N), the RB trees ensure balance
without relying on any notion of ordering since the rotations just
blindly preserve the order as a side effect but they don't themselves
need to call the ordering function to decide how to do their job.

        Stefan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-10-01  7:25                               ` Gerd Möllmann
  2022-10-01 10:55                                 ` Gerd Möllmann
@ 2022-10-01 14:01                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-10-01 14:01 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, Andreas Politz, 58158

> That's 
>
>                     [10, 12]
>                    /        \
>                  [1, 1]      [10, 11]
>                  /    \         /\
>                     [10, 13]
>                      /    \
>                           
> The 10 is found "all over the place".

"all over the place" if you consider the pre-order or post-order
traversal, indeed.  But if you look at the in-order traversal (e.g., the
one you'd get from C++ `local_Rb_tree_increment` you showed), you get
the expected result:

   [1, 1]
   [10, 13]
   [10, 12]
   [10, 11]


-- Stefan






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-30 18:29                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-10-02  8:06                               ` Gerd Möllmann
  0 siblings, 0 replies; 46+ messages in thread
From: Gerd Möllmann @ 2022-10-02  8:06 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 58158

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Info for Stefan: I've now removed the per-tree null node, and make check
>> shows no unexpected results.
>
> Thanks.  It's an improvement, but I still don't understand how it
> works :-(

Is it still null->parent?

If so, maybe it helps to picture that null is a child of many parents
:-).  Every node having null on the left or right is a parent in that
sense.  Therefore, if some code uses null->parent to do something it can
not be right.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-10-01 13:54                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-10-02  8:22                                   ` Gerd Möllmann
  2022-10-02 16:32                                     ` Andreas Politz
  0 siblings, 1 reply; 46+ messages in thread
From: Gerd Möllmann @ 2022-10-02  8:22 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, Andreas Politz, 58158

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Phew.  I'm not sure but I get the feeling that makes implementing a
>> successor function, let's say, challenging.
>
> I don't think it makes any difference in that respect, no.

The reason I find it challenging, and I'm sure now, is that I've written
the following code, and failed :-).

/* FIXME: This assumption is wrong.  Nodes on the left of P are <=,
   and nodes on the right are >=.  */

/* Value is the successor of interval node X in ascending order.  It
   is assumed that the tree is organized so that nodes < X are in
   X->left and nodes in X->right are >= X.  */

static struct interval_node *
in_order_successor (struct interval_node *x)
{
  if (x->right != ITREE_NULL)
    {
      /* X has a right child, which means X's right subtree has
	 elements >= X.  Proceed to the left-most child in the right
	 subtree, which is the smallest in that subtree.  */
      x = x->right;
      while (x->left != 0)
	x = x->left;
    }
  else
    {
      /* X's left subtree is uninteresting, because everything there
	 is < X.  Therefore follow the parent chain.  If X is the
	 parent's right child, this means the parent is < X,  We are
	 looking for a parent that is >=.  */
      struct interval_node *y = x->parent;
      while (x == y->right)
	{
	  x = y;
	  y = y->parent;
	}

      /* If we found a parent that's >=, the parent is what we sought.
	 Otherwise, X has arrived at the null node, whose right child
	 is the sentinel node itself.  */
      if (x->right != y)
	x = y;
    }

  return x;
}

I tried to change the comments and/or modify the code for a tree like we
have (left subtree <=, right >=), and couldn't explain why it works,
but I also couldn't produce a counter-example that the existing tree
code actually can produce.  IOW, I couldn't prove anything.

P.S.

With the "all over the place" I indented to hint at the fact that the
tree is not a "normal" BST.  I should probably have said that directly,
sorry.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-10-02  8:22                                   ` Gerd Möllmann
@ 2022-10-02 16:32                                     ` Andreas Politz
  2022-10-03  4:35                                       ` Gerd Möllmann
  0 siblings, 1 reply; 46+ messages in thread
From: Andreas Politz @ 2022-10-02 16:32 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, 58158, Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 450 bytes --]

I've managed to construct a prototype of this "stateless" iterator.

I've only implemented the in-order case and only applied it in the overlays_in function.

The only real trouble I had was with pushing the offset to the children
during the tree navigation in some kind of sensible way.  In the end I
gave up and simply pasted calls to the corresponding function "all over
the place".  It seems to work, at least buffer-tests are passing.

Andreas


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: noverlay-stateless-iterator.diff --]
[-- Type: text/x-patch, Size: 6871 bytes --]

diff --git a/src/buffer.c b/src/buffer.c
index f59fddcbde..6a53b49aad 100644
--- a/src/buffer.c
+++ b/src/buffer.c
@@ -2948,15 +2948,16 @@ overlays_in (ptrdiff_t beg, ptrdiff_t end, bool extend,
   ptrdiff_t next = ZV;
   Lisp_Object *vec = *vec_ptr;
   struct interval_node *node;
+  struct interval_tree_iterator iterator;
 
-  ITREE_FOREACH (node, current_buffer->overlays, beg,
-                 /* Find empty OV at Z ? */
-                 (end >= ZV && empty) ? ZV + 1 : ZV, ASCENDING)
+  interval_tree_iterator_init(&iterator, current_buffer->overlays, beg,
+			      (end >= ZV && empty) ? ZV + 1 : ZV, ITREE_ASCENDING);
+
+  while ((node = interval_tree_iterator_next(&iterator)))
     {
       if (node->begin > end)
         {
           next = min (next, node->begin);
-          ITREE_FOREACH_ABORT ();
           break;
         }
       else if (node->begin == end)
@@ -2964,7 +2965,6 @@ overlays_in (ptrdiff_t beg, ptrdiff_t end, bool extend,
           next = node->begin;
           if ((! empty || end < ZV) && beg < end)
             {
-              ITREE_FOREACH_ABORT ();
               break;
             }
         }
diff --git a/src/itree.c b/src/itree.c
index eeecaf1839..21549fe8a7 100644
--- a/src/itree.c
+++ b/src/itree.c
@@ -1190,3 +1190,132 @@ interval_tree_subtree_min (const struct interval_tree *tree, struct interval_nod
  * +===================================================================================+ */
 
 /* See Foverlay_tree in buffer.c */
+
+\f
+/* +===================================================================================+
+ * | Stateless iterator
+ * +===================================================================================+ */
+
+static bool
+interval_tree_iter_traverse_p(struct interval_tree_iterator *iterator,
+                              struct interval_node *node);
+static
+struct interval_node *
+interval_tree_iterator_in_order_successor (struct interval_tree_iterator* iterator);
+
+void
+interval_tree_iterator_init(struct interval_tree_iterator *iterator,
+                            struct interval_tree *tree,
+                            ptrdiff_t begin,
+                            ptrdiff_t end,
+                            enum interval_tree_order order) {
+  iterator->tree = tree;
+  iterator->node = tree && begin <= end ? ITREE_NULL : NULL;
+  iterator->begin = begin;
+  iterator->end = end;
+  iterator->order = order;
+}
+
+struct interval_node *
+interval_tree_iterator_next(struct interval_tree_iterator *iterator) {
+  if (iterator->node) {
+    do {
+      switch (iterator->order) {
+      case ITREE_ASCENDING:
+        iterator->node = interval_tree_iterator_in_order_successor (iterator);
+        break;
+      default:
+        fprintf (stderr, "interval_tree_order != ITREE_ASCENDING not implemented");
+        emacs_abort ();
+      }
+    } while (iterator->node &&
+             ! interval_node_intersects (iterator->node, iterator->begin, iterator->end));
+  }
+
+  if (iterator->node == ITREE_NULL) {
+    fprintf (stderr, "Next node: ITREE_NULL\n");
+  } else if (! iterator->node) {
+    fprintf (stderr, "Next node: NULL\n");
+  } else {
+    fprintf (stderr, "Next node: begin = %ld, end = %ld (iterator: begin = %ld, end = %ld)\n",
+             iterator->node->begin, iterator->node->end, iterator->begin, iterator->end);
+  }
+  return iterator->node;
+}
+
+static bool
+interval_tree_iter_traverse_p(struct interval_tree_iterator *iterator,
+                              struct interval_node *node) {
+  eassert (node);
+
+  if (node == ITREE_NULL) {
+    return false;
+  } else {
+    eassert (node->parent != ITREE_NULL);
+    if (node->parent->left == node) {
+      return iterator->begin <= node->limit + node->offset;
+    } else {
+      return node->parent->begin <= iterator->end;
+    }
+  }
+}
+
+static
+struct interval_node *
+interval_tree_iterator_in_order_successor (struct interval_tree_iterator* iterator)
+{
+  struct interval_node *node = iterator->node;
+
+  if (node != ITREE_NULL) {
+    interval_tree_inherit_offset (iterator->tree, node);
+  }
+
+  if (node == ITREE_NULL) {
+    node = iterator->tree->root;
+    if (node != ITREE_NULL) {
+      interval_tree_inherit_offset (iterator->tree, node);
+    }
+    while (interval_tree_iter_traverse_p(iterator, node->left)) {
+      node = node->left;
+      if (node != ITREE_NULL) {
+        interval_tree_inherit_offset (iterator->tree, node);
+      }
+    }
+  } else if (interval_tree_iter_traverse_p(iterator, node->right))
+    {
+      node = node->right;
+      if (node != ITREE_NULL) {
+        interval_tree_inherit_offset (iterator->tree, node);
+      }
+      while (interval_tree_iter_traverse_p(iterator, node->left)) {
+        node = node->left;
+        if (node != ITREE_NULL) {
+          interval_tree_inherit_offset (iterator->tree, node);
+        }
+      }
+    }
+  else
+    {
+      struct interval_node *parent = node->parent;
+      while (node == parent->right)
+        {
+          node = parent;
+          parent = parent->parent;
+        }
+      if (node != ITREE_NULL)
+        node = parent;
+    }
+
+  return node == ITREE_NULL ? NULL : node;
+}
+
+void
+interval_tree_iterator_narrow (struct interval_tree_iterator *iterator,
+                               ptrdiff_t begin,
+                               ptrdiff_t end)
+{
+  eassert (begin >= iterator->begin);
+  eassert (end <= iterator->end);
+  iterator->begin =  max (begin, iterator->begin);
+  iterator->end =  min (end, iterator->end);
+}
diff --git a/src/itree.h b/src/itree.h
index 1f019a2607..53f03cca35 100644
--- a/src/itree.h
+++ b/src/itree.h
@@ -72,6 +72,15 @@ #define ITREE_NULL (&itree_null)
   ITREE_PRE_ORDER,
 };
 
+struct interval_tree_iterator
+{
+  struct interval_tree *tree;
+  struct interval_node *node;
+  ptrdiff_t begin;
+  ptrdiff_t end;
+  enum interval_tree_order order;
+};
+
 void interval_node_init (struct interval_node *, ptrdiff_t, ptrdiff_t, bool, bool, Lisp_Object);
 ptrdiff_t interval_node_begin (struct interval_tree *, struct interval_node *);
 ptrdiff_t interval_node_end (struct interval_tree *, struct interval_node *);
@@ -135,4 +144,15 @@ #define ITREE_FOREACH_ABORT() \
 #define ITREE_FOREACH_NARROW(beg, end) \
   interval_generator_narrow (itree_iter_, beg, end)
 
+void interval_tree_nodes (struct interval_tree *tree, struct interval_node **nodes, enum interval_tree_order order);
+
+void interval_tree_iterator_init(struct interval_tree_iterator *iterator,
+				 struct interval_tree *tree,
+				 ptrdiff_t begin,
+				 ptrdiff_t end,
+				 enum interval_tree_order order);
+struct interval_node *interval_tree_iterator_next(struct interval_tree_iterator *iterator);
+void interval_tree_iterator_narrow (struct interval_tree_iterator *iterator,
+			       ptrdiff_t begin,
+			       ptrdiff_t end);
 #endif

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-10-02 16:32                                     ` Andreas Politz
@ 2022-10-03  4:35                                       ` Gerd Möllmann
  2022-10-04 10:50                                         ` Andreas Politz
  0 siblings, 1 reply; 46+ messages in thread
From: Gerd Möllmann @ 2022-10-03  4:35 UTC (permalink / raw)
  To: Andreas Politz; +Cc: Eli Zaretskii, 58158, Stefan Monnier

Andreas Politz <mail@andreas-politz.de> writes:

> It seems to work, at least buffer-tests are passing.

"seems to work" is a bit weak.  Can we prove it?

I don't mean mathematically, but by reasoning like I tried in the
comments in successor function I posted,





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-10-03  4:35                                       ` Gerd Möllmann
@ 2022-10-04 10:50                                         ` Andreas Politz
  0 siblings, 0 replies; 46+ messages in thread
From: Andreas Politz @ 2022-10-04 10:50 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, 58158, Stefan Monnier

My implementation seems to work. The correctness of the algorithm would follow from 

1. The Rb tree algorithm produces a tree with left <= root <= right,
2. the algorithm you’ve posted, and I’ve  adapted, produces an in–order of the tree and
3. the conditions guiding the traversal are correct, i.e. they don’t exclude matching intervals.

Since I believe these statements are true, I believe the algorithm is correct.

Andreas

> Am 03.10.2022 um 06:35 schrieb Gerd Möllmann <gerd.moellmann@gmail.com>:
> 
> Andreas Politz <mail@andreas-politz.de> writes:
> 
>> It seems to work, at least buffer-tests are passing.
> 
> "seems to work" is a bit weak.  Can we prove it?
> 
> I don't mean mathematically, but by reasoning like I tried in the
> comments in successor function I posted,

^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29  5:29 bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful Gerd Möllmann
  2022-09-29  6:28 ` Eli Zaretskii
@ 2022-10-06 22:26 ` Matt Armstrong
  2023-10-06 13:14 ` Gerd Möllmann
  2 siblings, 0 replies; 46+ messages in thread
From: Matt Armstrong @ 2022-10-06 22:26 UTC (permalink / raw)
  To: Andreas Politz, Gerd Möllmann; +Cc: Eli Zaretskii, 58158, Stefan Monnier

Andreas Politz <mail@andreas-politz.de> writes:

> I've managed to construct a prototype of this "stateless" iterator.
>
> I've only implemented the in-order case and only applied it in the overlays_in function.
>
> The only real trouble I had was with pushing the offset to the children
> during the tree navigation in some kind of sensible way.  In the end I
> gave up and simply pasted calls to the corresponding function "all over
> the place".  It seems to work, at least buffer-tests are passing.

Hey Andreas, this looks pretty good to me but I have some questions on
it.


> +static
> +struct interval_node *
> +interval_tree_iterator_in_order_successor (struct interval_tree_iterator* iterator)
> +{
> +  struct interval_node *node = iterator->node;
> +
> +  if (node != ITREE_NULL) {
> +    interval_tree_inherit_offset (iterator->tree, node);
> +  }
> +  if (node == ITREE_NULL) {
> +    node = iterator->tree->root;

I don't understand this branch.  If the node is NULL upon entry to the
successor call, why start at the root?  Why is the "next in order node"
of NULL the root?  The root is just an node whose BEG is roughly in the
middle of the total inorder traversal, right?

The "stateless" inorder traversal algorithm I am used to starts with the
minimum node (walk only left links down from the root until at a leaf
and that is the minimum).  Then do inorder traversal from there.


> +    if (node != ITREE_NULL) {
> +      interval_tree_inherit_offset (iterator->tree, node);
> +    }
> +    while (interval_tree_iter_traverse_p(iterator, node->left)) {
> +      node = node->left;
> +      interval_tree_inherit_offset (iterator->tree, node);
> +    }
> +  } else if (interval_tree_iter_traverse_p(iterator, node->right))
> +    {
> +      node = node->right;
> +      interval_tree_inherit_offset (iterator->tree, node);
> +      while (interval_tree_iter_traverse_p(iterator, node->left)) {
> +        node = node->left;
> +        interval_tree_inherit_offset (iterator->tree, node);
> +      }
> +    }
> +  else
> +    {
> +      struct interval_node *parent = node->parent;
> +      while (node == parent->right)
> +        {
> +          node = parent;
> +          parent = parent->parent;
> +        }
> +      if (node != ITREE_NULL)
> +        node = parent;
> +    }
> +
> +  return node == ITREE_NULL ? NULL : node;
> +}

I don't understand how the last "else" case works correctly in the face
of a call to `interval_generator_narrow` during iteration.  Narrowing
could render a node's parent out of the "interesting" range, and so the
iterator should not return it but instead keep trying to find a
successor in range.  Right?





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-30  6:11                         ` Eli Zaretskii
  2022-09-30 11:31                           ` Gerd Möllmann
@ 2022-10-06 22:36                           ` Dmitry Gutov
  2022-10-07 19:47                             ` Eli Zaretskii
  1 sibling, 1 reply; 46+ messages in thread
From: Dmitry Gutov @ 2022-10-06 22:36 UTC (permalink / raw)
  To: Eli Zaretskii, Gerd Möllmann; +Cc: 58158, monnier

On 30.09.2022 09:11, Eli Zaretskii wrote:
> "M-x xref-query-replace-in-results RET"

JFYI this command has the convenient binding 'r' in Xref output buffers.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-10-06 22:36                           ` Dmitry Gutov
@ 2022-10-07 19:47                             ` Eli Zaretskii
  2022-10-08 18:50                               ` Dmitry Gutov
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2022-10-07 19:47 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: gerd.moellmann, 58158, monnier

> Date: Fri, 7 Oct 2022 01:36:17 +0300
> Cc: 58158@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Dmitry Gutov <dgutov@yandex.ru>
> 
> On 30.09.2022 09:11, Eli Zaretskii wrote:
> > "M-x xref-query-replace-in-results RET"
> 
> JFYI this command has the convenient binding 'r' in Xref output buffers.

It also works only when invoked from the Xref buffer, right?

Btw, what am I doing wrong below?

 emacs -Q
 C-x C-f src/character.h RET
 M-x visit-tags-table RET RET
 M-. char_string RET
 r whatever RET

Unexpected result: "No suitable matches here".  Huh? what did I miss?





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-10-07 19:47                             ` Eli Zaretskii
@ 2022-10-08 18:50                               ` Dmitry Gutov
  2022-10-10  8:10                                 ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Gutov @ 2022-10-08 18:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, 58158, monnier

On 07.10.2022 22:47, Eli Zaretskii wrote:
>> Date: Fri, 7 Oct 2022 01:36:17 +0300
>> Cc: 58158@debbugs.gnu.org, monnier@iro.umontreal.ca
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>> On 30.09.2022 09:11, Eli Zaretskii wrote:
>>> "M-x xref-query-replace-in-results RET"
>>
>> JFYI this command has the convenient binding 'r' in Xref output buffers.
> 
> It also works only when invoked from the Xref buffer, right?

Yep.

But we also have commands like xref-find-references-and-replace and 
dired-do-find-regexp-and-replace.

> Btw, what am I doing wrong below?
> 
>   emacs -Q
>   C-x C-f src/character.h RET
>   M-x visit-tags-table RET RET
>   M-. char_string RET
>   r whatever RET
> 
> Unexpected result: "No suitable matches here".  Huh? what did I miss?

We can't replace over "find definition" matches: they are more abstract 
and don't contain the necessary information to perform the replacement 
(such as the length of a match, for instance).

And such xrefs might navigate you to the beginning of the line, for 
example, rather than to the beginning of the name.

But that makes sense, doesn't it? If replacing over "find definitions" 
results worked fine, in the end you would get a codebase where all 
declarations of a method 'foo' got renamed, but all callsites of it 
remain unchanged. That couldn't have been your intention, could it?

The error message could use some improvement, I suppose, but I'm not 
sure how to make it better.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-10-08 18:50                               ` Dmitry Gutov
@ 2022-10-10  8:10                                 ` Eli Zaretskii
  2022-10-11  2:12                                   ` Dmitry Gutov
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2022-10-10  8:10 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: gerd.moellmann, 58158, monnier

> Date: Sat, 8 Oct 2022 21:50:39 +0300
> Cc: gerd.moellmann@gmail.com, 58158@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Dmitry Gutov <dgutov@yandex.ru>
> 
> > Btw, what am I doing wrong below?
> > 
> >   emacs -Q
> >   C-x C-f src/character.h RET
> >   M-x visit-tags-table RET RET
> >   M-. char_string RET
> >   r whatever RET
> > 
> > Unexpected result: "No suitable matches here".  Huh? what did I miss?
> 
> We can't replace over "find definition" matches: they are more abstract 
> and don't contain the necessary information to perform the replacement 
> (such as the length of a match, for instance).
> 
> And such xrefs might navigate you to the beginning of the line, for 
> example, rather than to the beginning of the name.
> 
> But that makes sense, doesn't it? If replacing over "find definitions" 
> results worked fine, in the end you would get a codebase where all 
> declarations of a method 'foo' got renamed, but all callsites of it 
> remain unchanged. That couldn't have been your intention, could it?
> 
> The error message could use some improvement, I suppose, but I'm not 
> sure how to make it better.

I tried to improve the situation, both in the error message, the doc
string, and the manual.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-10-10  8:10                                 ` Eli Zaretskii
@ 2022-10-11  2:12                                   ` Dmitry Gutov
  2022-10-11  6:37                                     ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Gutov @ 2022-10-11  2:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, 58158, monnier

On 10.10.2022 11:10, Eli Zaretskii wrote:
>> Date: Sat, 8 Oct 2022 21:50:39 +0300
>> Cc:gerd.moellmann@gmail.com,58158@debbugs.gnu.org,monnier@iro.umontreal.ca
>> From: Dmitry Gutov<dgutov@yandex.ru>
>>
>>> Btw, what am I doing wrong below?
>>>
>>>    emacs -Q
>>>    C-x C-f src/character.h RET
>>>    M-x visit-tags-table RET RET
>>>    M-. char_string RET
>>>    r whatever RET
>>>
>>> Unexpected result: "No suitable matches here".  Huh? what did I miss?
>> We can't replace over "find definition" matches: they are more abstract
>> and don't contain the necessary information to perform the replacement
>> (such as the length of a match, for instance).
>>
>> And such xrefs might navigate you to the beginning of the line, for
>> example, rather than to the beginning of the name.
>>
>> But that makes sense, doesn't it? If replacing over "find definitions"
>> results worked fine, in the end you would get a codebase where all
>> declarations of a method 'foo' got renamed, but all callsites of it
>> remain unchanged. That couldn't have been your intention, could it?
>>
>> The error message could use some improvement, I suppose, but I'm not
>> sure how to make it better.
> I tried to improve the situation, both in the error message, the doc
> string, and the manual.

It's probably better now, but still likely confusing.

What is a "subset of matches"? The user makes a search, gets a bunch of 
matches. Is it fair to call them "only a subset" is the user only 
searched for definitions?

Or to take another example, the user can search for some regexp inside a 
particular directly, with dired-do-find-regexp. Imagine that that 
directory is not the whole project (perhaps just a minor subdirectory). 
Would it be fair to call the results "only a subset" then? But 
xref-query-replace-in-results will work in that case, because the search 
returns values which have enough info to perform the replacements.

Perhaps we should make the error very specific, like "you can't replace 
inside xref-find-definitions results". Since that is going to be the 
exception in like 99.9% of the cases.

It's possible for other backend commands (such as find-references) to 
return unsuitable xref values in third-party backends, or for other code 
to use xref-show-xrefs with such, but those will be really very rare, if 
happen at all.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-10-11  2:12                                   ` Dmitry Gutov
@ 2022-10-11  6:37                                     ` Eli Zaretskii
  0 siblings, 0 replies; 46+ messages in thread
From: Eli Zaretskii @ 2022-10-11  6:37 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: gerd.moellmann, 58158, monnier

> Date: Tue, 11 Oct 2022 05:12:11 +0300
> Cc: gerd.moellmann@gmail.com, 58158@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Dmitry Gutov <dgutov@yandex.ru>
> 
> What is a "subset of matches"?

Feel free to suggest a less vague description.  The idea is that the
list in Xref buffer doesn't show all the references to the identifier,
making renaming infeasible.

> Perhaps we should make the error very specific, like "you can't replace 
> inside xref-find-definitions results". Since that is going to be the 
> exception in like 99.9% of the cases.

That'd be my preference, but what are those 0.1% of cases where the
Xref buffer produced by other commands could fit?

More generally, what exactly does xref.el test to produce the error
message, and how to describe that in user-level terms?

> It's possible for other backend commands (such as find-references) to 
> return unsuitable xref values in third-party backends, or for other code 
> to use xref-show-xrefs with such, but those will be really very rare, if 
> happen at all.

You are saying that 'r' is only useful after M-?, is that right?  The
manual says so, but the manual doesn't have to say "the whole truth".
The doc string should.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
  2022-09-29  5:29 bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful Gerd Möllmann
  2022-09-29  6:28 ` Eli Zaretskii
  2022-10-06 22:26 ` Matt Armstrong
@ 2023-10-06 13:14 ` Gerd Möllmann
  2 siblings, 0 replies; 46+ messages in thread
From: Gerd Möllmann @ 2023-10-06 13:14 UTC (permalink / raw)
  To: 58158

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> In its current form, interval tree iteration works like this:
>
> 1. Call begin_iteration(T) to iterate over tree T
> 2. do stuff
> 3. Call end_iteration(T)
>
> with the following rules:
>
> - Begin_iteration and end_iteration must be paired.
>
> - There can be only one iteration per tree at a time.  Nested iteration
>   over the same tree is not supported (abort).
>
> - No GC may happen in step 2.  This is because mark_buffer iterates over
>   buffer overlays.
>
> I think this is an exceedingly dangerous design.

Time has passed, and I think this is no longer relevant.
I'm therefore closing this bug.





^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2023-10-06 13:14 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-29  5:29 bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful Gerd Möllmann
2022-09-29  6:28 ` Eli Zaretskii
2022-09-29  7:03   ` Gerd Möllmann
2022-09-29  8:08     ` Eli Zaretskii
2022-09-29  9:09       ` Gerd Möllmann
2022-09-29  9:37         ` Eli Zaretskii
2022-09-29 10:05           ` Gerd Möllmann
2022-09-29 10:43             ` Eli Zaretskii
2022-09-29 11:33               ` Gerd Möllmann
2022-09-29 13:10               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-09-29 13:23                 ` Eli Zaretskii
2022-09-29 16:48                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-09-29 13:40                 ` Eli Zaretskii
2022-09-29 14:15                 ` Gerd Möllmann
2022-09-29 14:37                   ` Gerd Möllmann
2022-09-29 22:09                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-09-30  5:28                       ` Gerd Möllmann
2022-09-30  6:11                         ` Eli Zaretskii
2022-09-30 11:31                           ` Gerd Möllmann
2022-09-30 18:29                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-10-02  8:06                               ` Gerd Möllmann
2022-10-06 22:36                           ` Dmitry Gutov
2022-10-07 19:47                             ` Eli Zaretskii
2022-10-08 18:50                               ` Dmitry Gutov
2022-10-10  8:10                                 ` Eli Zaretskii
2022-10-11  2:12                                   ` Dmitry Gutov
2022-10-11  6:37                                     ` Eli Zaretskii
2022-09-30 13:25                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-09-30 14:08                           ` Gerd Möllmann
2022-09-30 15:25                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-09-30 16:04                               ` Eli Zaretskii
2022-09-30 17:11                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-10-01  5:06                               ` Gerd Möllmann
2022-10-01 13:54                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-10-02  8:22                                   ` Gerd Möllmann
2022-10-02 16:32                                     ` Andreas Politz
2022-10-03  4:35                                       ` Gerd Möllmann
2022-10-04 10:50                                         ` Andreas Politz
2022-10-01  7:25                               ` Gerd Möllmann
2022-10-01 10:55                                 ` Gerd Möllmann
2022-10-01 14:01                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-09-29 16:40                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-10-01  1:57         ` Richard Stallman
2022-10-01  7:00           ` Eli Zaretskii
2022-10-06 22:26 ` Matt Armstrong
2023-10-06 13:14 ` Gerd Möllmann

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).