* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful @ 2022-09-29 5:29 Gerd Möllmann 2022-09-29 6:28 ` Eli Zaretskii ` (2 more replies) 0 siblings, 3 replies; 46+ messages in thread From: Gerd Möllmann @ 2022-09-29 5:29 UTC (permalink / raw) To: 58158 In its current form, interval tree iteration works like this: 1. Call begin_iteration(T) to iterate over tree T 2. do stuff 3. Call end_iteration(T) with the following rules: - Begin_iteration and end_iteration must be paired. - There can be only one iteration per tree at a time. Nested iteration over the same tree is not supported (abort). - No GC may happen in step 2. This is because mark_buffer iterates over buffer overlays. I think this is an exceedingly dangerous design. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 5:29 bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful Gerd Möllmann @ 2022-09-29 6:28 ` Eli Zaretskii 2022-09-29 7:03 ` Gerd Möllmann 2022-10-06 22:26 ` Matt Armstrong 2023-10-06 13:14 ` Gerd Möllmann 2 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2022-09-29 6:28 UTC (permalink / raw) To: Gerd Möllmann; +Cc: 58158 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Date: Thu, 29 Sep 2022 07:29:25 +0200 > > In its current form, interval tree iteration works like this: > > 1. Call begin_iteration(T) to iterate over tree T > 2. do stuff > 3. Call end_iteration(T) > > with the following rules: > > - Begin_iteration and end_iteration must be paired. > > - There can be only one iteration per tree at a time. Nested iteration > over the same tree is not supported (abort). > > - No GC may happen in step 2. This is because mark_buffer iterates over > buffer overlays. > > I think this is an exceedingly dangerous design. Why, because of "no GC" requirement? We could ensure that by calling inhibit_garbage_collection (if the code doesn't do that already). That is not very elegant, and might even cause memory pressure in some (hopefully rare) situations, but we do have some users of this in the sources. What higher-level operations require "interval tree iteration" that you describe? Which primitives end up doing such iterations? ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 6:28 ` Eli Zaretskii @ 2022-09-29 7:03 ` Gerd Möllmann 2022-09-29 8:08 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Gerd Möllmann @ 2022-09-29 7:03 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 58158 Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Date: Thu, 29 Sep 2022 07:29:25 +0200 >> >> In its current form, interval tree iteration works like this: >> >> 1. Call begin_iteration(T) to iterate over tree T >> 2. do stuff >> 3. Call end_iteration(T) >> >> with the following rules: >> >> - Begin_iteration and end_iteration must be paired. >> >> - There can be only one iteration per tree at a time. Nested iteration >> over the same tree is not supported (abort). >> >> - No GC may happen in step 2. This is because mark_buffer iterates over >> buffer overlays. >> >> I think this is an exceedingly dangerous design. > > Why, because of "no GC" requirement? We could ensure that by calling > inhibit_garbage_collection (if the code doesn't do that already). It doesn't. BTW, if anything signals in step 2, so that end_iteration isn't called, we're also hosed. > What higher-level operations require "interval tree iteration" that > you describe? Which primitives end up doing such iterations? What has to do with overlays. To name a few: overlay-at, overlays-in, next-overlay-change, previous-overlay-change, overlay-lists, ... I personally think this is a no-go. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 7:03 ` Gerd Möllmann @ 2022-09-29 8:08 ` Eli Zaretskii 2022-09-29 9:09 ` Gerd Möllmann 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2022-09-29 8:08 UTC (permalink / raw) To: Gerd Möllmann, Stefan Monnier; +Cc: 58158 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: 58158@debbugs.gnu.org > Date: Thu, 29 Sep 2022 09:03:17 +0200 > > >> - No GC may happen in step 2. This is because mark_buffer iterates over > >> buffer overlays. > >> > >> I think this is an exceedingly dangerous design. > > > > Why, because of "no GC" requirement? We could ensure that by calling > > inhibit_garbage_collection (if the code doesn't do that already). > > It doesn't. Should be easy to fix, no? > BTW, if anything signals in step 2, so that end_iteration isn't called, > we're also hosed. record_unwind_protect should fix that, right? (inhibit_garbage_collection already employs this mechanism). > > What higher-level operations require "interval tree iteration" that > > you describe? Which primitives end up doing such iterations? > > What has to do with overlays. To name a few: overlay-at, overlays-in, > next-overlay-change, previous-overlay-change, overlay-lists, ... > > I personally think this is a no-go. Really? Even if we take all the measures mentioned above? Why so? ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 8:08 ` Eli Zaretskii @ 2022-09-29 9:09 ` Gerd Möllmann 2022-09-29 9:37 ` Eli Zaretskii 2022-10-01 1:57 ` Richard Stallman 0 siblings, 2 replies; 46+ messages in thread From: Gerd Möllmann @ 2022-09-29 9:09 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 58158, Stefan Monnier Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> I personally think this is a no-go. > > Really? Even if we take all the measures mentioned above? Yes. > Why so? I think it simply can't be that what is basically walking a binary tree requires such restrictions. And if it does because of some quirk of the interval tree in itree.c, something is seriously wrong with the design. That's a bit harsh, but I mean it :-). ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 9:09 ` Gerd Möllmann @ 2022-09-29 9:37 ` Eli Zaretskii 2022-09-29 10:05 ` Gerd Möllmann 2022-10-01 1:57 ` Richard Stallman 1 sibling, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2022-09-29 9:37 UTC (permalink / raw) To: Gerd Möllmann; +Cc: 58158, monnier > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 58158@debbugs.gnu.org > Date: Thu, 29 Sep 2022 11:09:17 +0200 > > I think it simply can't be that what is basically walking a binary tree > requires such restrictions. And if it does because of some quirk of the > interval tree in itree.c, something is seriously wrong with the design. > > That's a bit harsh, but I mean it :-). Fair enough. Can you propose a fix? I guess the begin_iteration/end_iteration dance is because the tree could be in inconsistent state in-between? If so, what would it take to avoid the inconsistency? ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 9:37 ` Eli Zaretskii @ 2022-09-29 10:05 ` Gerd Möllmann 2022-09-29 10:43 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Gerd Möllmann @ 2022-09-29 10:05 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 58158, monnier Eli Zaretskii <eliz@gnu.org> writes: > Can you propose a fix? Other than completely rewrite at least this part, no. > I guess the begin_iteration/end_iteration dance is because the tree > could be in inconsistent state in-between? If so, what would it take > to avoid the inconsistency? I find it very hard to tell why this is done the way it is. If someone knowing the code can think of a reason, please speak up. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 10:05 ` Gerd Möllmann @ 2022-09-29 10:43 ` Eli Zaretskii 2022-09-29 11:33 ` Gerd Möllmann 2022-09-29 13:10 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 2 replies; 46+ messages in thread From: Eli Zaretskii @ 2022-09-29 10:43 UTC (permalink / raw) To: Gerd Möllmann; +Cc: 58158, monnier > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: monnier@iro.umontreal.ca, 58158@debbugs.gnu.org > Date: Thu, 29 Sep 2022 12:05:50 +0200 > > Eli Zaretskii <eliz@gnu.org> writes: > > > Can you propose a fix? > > Other than completely rewrite at least this part, no. > > > I guess the begin_iteration/end_iteration dance is because the tree > > could be in inconsistent state in-between? If so, what would it take > > to avoid the inconsistency? > > I find it very hard to tell why this is done the way it is. If someone > knowing the code can think of a reason, please speak up. I may be missing something, but it looks like the sole purpose of the iter_start/iter_finish dance is to ensure only one iteration per tree is running at any given time, and that's because the iteration uses some state variable(s) of which there's only one instance per tree. Stefan, am I missing something? ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 10:43 ` Eli Zaretskii @ 2022-09-29 11:33 ` Gerd Möllmann 2022-09-29 13:10 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 1 sibling, 0 replies; 46+ messages in thread From: Gerd Möllmann @ 2022-09-29 11:33 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 58158, monnier Eli Zaretskii <eliz@gnu.org> writes: >> I find it very hard to tell why this is done the way it is. If someone >> knowing the code can think of a reason, please speak up. > > I may be missing something, but it looks like the sole purpose of the > iter_start/iter_finish dance is to ensure only one iteration per tree > is running at any given time, and that's because the iteration uses > some state variable(s) of which there's only one instance per tree. BTW, It currently uses a "visited" flag in tree nodes, which is kind of such a state, but something like that is normally not needed when walking an rb-tree. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 10:43 ` Eli Zaretskii 2022-09-29 11:33 ` Gerd Möllmann @ 2022-09-29 13:10 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-29 13:23 ` Eli Zaretskii ` (2 more replies) 1 sibling, 3 replies; 46+ messages in thread From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-29 13:10 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Gerd Möllmann, 58158 > I may be missing something, but it looks like the sole purpose of the > iter_start/iter_finish dance is to ensure only one iteration per tree > is running at any given time, and that's because the iteration uses > some state variable(s) of which there's only one instance per tree. > > Stefan, am I missing something? One reason is that traversing a binary tree usually requires something like recursion, but that wouldn't fit very conveniently with the current code (nor with C in general since you can't make a local recursive closure which accesses local variables from the surrounding function). Another is the need to update the begin/end fields (these need updating because of insertions/deletions but they're updated lazily while traversing the tree to avoid an O(N) complexity during the insertions/deletions). Hiding that behind 'some kind of "next node" function keeps the code more readable. But yes, the current restriction to have a single iteration at a time is a bit of a problem, especially because it's very "global". I added a comment yesterday describing how we could make it non-global (hence getting rid of the `visited` flag in the nodes). For now, I pushed a simple fix to traverse the tree "by hand" in the GC rather than via the iterator. Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 13:10 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-29 13:23 ` Eli Zaretskii 2022-09-29 16:48 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-29 13:40 ` Eli Zaretskii 2022-09-29 14:15 ` Gerd Möllmann 2 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2022-09-29 13:23 UTC (permalink / raw) To: Stefan Monnier; +Cc: gerd.moellmann, 58158 > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, > 58158@debbugs.gnu.org > Date: Thu, 29 Sep 2022 09:10:19 -0400 > > > I may be missing something, but it looks like the sole purpose of the > > iter_start/iter_finish dance is to ensure only one iteration per tree > > is running at any given time, and that's because the iteration uses > > some state variable(s) of which there's only one instance per tree. > > > > Stefan, am I missing something? > > One reason is that traversing a binary tree usually requires something > like recursion, but that wouldn't fit very conveniently with the current > code (nor with C in general since you can't make a local recursive > closure which accesses local variables from the surrounding function). I'm not sure I understand how recursion is related. Are you saying that recursion is replaced with iteration? But then, if _start and _finish are called by the same caller, we don't really need the protection, since no one can start another iteration while the first one runs. Right? > Another is the need to update the begin/end fields (these need updating > because of insertions/deletions but they're updated lazily while > traversing the tree to avoid an O(N) complexity during the > insertions/deletions). Hiding that behind 'some kind of "next node" > function keeps the code more readable. Where in the code do you see iteration that adds or deletes nodes? > For now, I pushed a simple fix to traverse the tree "by hand" in the GC > rather than via the iterator. So this removes the restriction of not having GC during iteration? Also, I take it that you don't consider the current code is as "harmful" as Gerd thinks it is? IOW, you don't share his opinion that this implementation is a "no-go"? ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 13:23 ` Eli Zaretskii @ 2022-09-29 16:48 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 0 replies; 46+ messages in thread From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-29 16:48 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gerd.moellmann, 58158 >> > I may be missing something, but it looks like the sole purpose of the >> > iter_start/iter_finish dance is to ensure only one iteration per tree >> > is running at any given time, and that's because the iteration uses >> > some state variable(s) of which there's only one instance per tree. >> > >> > Stefan, am I missing something? >> >> One reason is that traversing a binary tree usually requires something >> like recursion, but that wouldn't fit very conveniently with the current >> code (nor with C in general since you can't make a local recursive >> closure which accesses local variables from the surrounding function). > > I'm not sure I understand how recursion is related. Are you saying > that recursion is replaced with iteration? But then, if _start and > _finish are called by the same caller, we don't really need the > protection, since no one can start another iteration while the first > one runs. Right? Typically, current code will look something like: int x; Lisp_Object y; buffer_overlay_iter_start (current_buffer, prev, pos, ITREE_DESCENDING); while ((node = buffer_overlay_iter_next (current_buffer))) { ... do something that updates x and y ... } buffer_overlay_iter_finish (current_buffer); If we were to use recursion, then we'd need to define a new (recursive) function which does what's currently done in the `while` loop, but this function can't access `x` and `y`, so it would need to take them as argument, or a reference to them... The use of an iterator is definitely convenient (and is a standard approach in many languages). >> Another is the need to update the begin/end fields (these need updating >> because of insertions/deletions but they're updated lazily while >> traversing the tree to avoid an O(N) complexity during the >> insertions/deletions). Hiding that behind 'some kind of "next node" >> function keeps the code more readable. > > Where in the code do you see iteration that adds or deletes nodes? No, I meant insertion/deletion of text in the buffer, thus requiring updates to `begin/end` fields. > Btw, couldn't we handle this by having a flag in the node that tells > us whether the begin/end fields can be trusted? Then the first caller > who need them would run the update and reset the flags, and we still > have lazy update, albeit somewhat less lazy, but without the need for > guarding the iteration with start/finish calls. Would that work? Yes, it would, but it's still O(N). The current approach is inspired by the approach used in `intervals.c` which also updates those fields lazily/ondemand so as to avoid the O(N) performance impact. >> For now, I pushed a simple fix to traverse the tree "by hand" in the GC >> rather than via the iterator. > So this removes the restriction of not having GC during iteration? Yes. > Also, I take it that you don't consider the current code is as > "harmful" as Gerd thinks it is? I don't like the global state it uses, but I think we can fix this aspect without too much trouble. > IOW, you don't share his opinion that this implementation is > a "no-go"? No, indeed, I don't. Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 13:10 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-29 13:23 ` Eli Zaretskii @ 2022-09-29 13:40 ` Eli Zaretskii 2022-09-29 14:15 ` Gerd Möllmann 2 siblings, 0 replies; 46+ messages in thread From: Eli Zaretskii @ 2022-09-29 13:40 UTC (permalink / raw) To: Stefan Monnier; +Cc: gerd.moellmann, 58158 > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, > 58158@debbugs.gnu.org > Date: Thu, 29 Sep 2022 09:10:19 -0400 > > Another is the need to update the begin/end fields (these need updating > because of insertions/deletions but they're updated lazily while > traversing the tree to avoid an O(N) complexity during the > insertions/deletions). Hiding that behind 'some kind of "next node" > function keeps the code more readable. Btw, couldn't we handle this by having a flag in the node that tells us whether the begin/end fields can be trusted? Then the first caller who need them would run the update and reset the flags, and we still have lazy update, albeit somewhat less lazy, but without the need for guarding the iteration with start/finish calls. Would that work? ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 13:10 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-29 13:23 ` Eli Zaretskii 2022-09-29 13:40 ` Eli Zaretskii @ 2022-09-29 14:15 ` Gerd Möllmann 2022-09-29 14:37 ` Gerd Möllmann 2022-09-29 16:40 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2 siblings, 2 replies; 46+ messages in thread From: Gerd Möllmann @ 2022-09-29 14:15 UTC (permalink / raw) To: Stefan Monnier; +Cc: Eli Zaretskii, 58158 Stefan Monnier <monnier@iro.umontreal.ca> writes: > One reason is that traversing a binary tree usually requires something > like recursion, but that wouldn't fit very conveniently with the current > code (nor with C in general since you can't make a local recursive > closure which accesses local variables from the surrounding function). Ok, usually, but not necessarily. The alternative is to implement an iterator that starts with a node N, and an implementation of a successor function, which return the successor of N in a given order. This requires a parent pointer in nodes, but that we have. (Something like this is used for ordered containers like "map" and "set" in C++ STL, for instance, which are based on rb-trees in GCC's libstdc++.) > Another is the need to update the begin/end fields (these need updating > because of insertions/deletions but they're updated lazily while > traversing the tree to avoid an O(N) complexity during the > insertions/deletions). Hiding that behind 'some kind of "next node" > function keeps the code more readable. Is this for buffer text changes, something akin to a delayed update of marker positions? I already wondered where that was. > But yes, the current restriction to have a single iteration at a time is > a bit of a problem, especially because it's very "global". I added > a comment yesterday describing how we could make it non-global (hence > getting rid of the `visited` flag in the nodes). BTW, and related to iteration directly, did you notice this in interval_tree_insert? /* This suggests that nodes in the right subtree are strictly greater. But this is not true due to later rotations. */ child = node->begin <= child->begin ? child->left : child->right; ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 14:15 ` Gerd Möllmann @ 2022-09-29 14:37 ` Gerd Möllmann 2022-09-29 22:09 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-29 16:40 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 1 sibling, 1 reply; 46+ messages in thread From: Gerd Möllmann @ 2022-09-29 14:37 UTC (permalink / raw) To: Stefan Monnier; +Cc: Eli Zaretskii, 58158 Gerd Möllmann <gerd.moellmann@gmail.com> writes: > Stefan Monnier <monnier@iro.umontreal.ca> writes: > >> One reason is that traversing a binary tree usually requires something >> like recursion, but that wouldn't fit very conveniently with the current >> code (nor with C in general since you can't make a local recursive >> closure which accesses local variables from the surrounding function). > > Ok, usually, but not necessarily. The alternative is to implement an > iterator that starts with a node N, and an implementation of a successor > function, which return the successor of N in a given order. This > requires a parent pointer in nodes, but that we have. > > (Something like this is used for ordered containers like "map" and "set" > in C++ STL, for instance, which are based on rb-trees in GCC's > libstdc++.) The code from libstdc++ is this (from libstdc++-v3/src/c++98/tree.cc): static _Rb_tree_node_base* local_Rb_tree_increment(_Rb_tree_node_base* __x) throw () { if (__x->_M_right != 0) { __x = __x->_M_right; while (__x->_M_left != 0) __x = __x->_M_left; } else { _Rb_tree_node_base* __y = __x->_M_parent; while (__x == __y->_M_right) { __x = __y; __y = __y->_M_parent; } if (__x->_M_right != __y) __x = __y; } return __x; } I hope one can read that. The idea is to find the root of the smallest subtree containing the current node, and proceed from there. That's why the parent pointer is needed. Symmmetrical for max->min ordering. And finding min/max is trivial. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 14:37 ` Gerd Möllmann @ 2022-09-29 22:09 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-30 5:28 ` Gerd Möllmann 0 siblings, 1 reply; 46+ messages in thread From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-29 22:09 UTC (permalink / raw) To: Gerd Möllmann; +Cc: Eli Zaretskii, 58158 > static _Rb_tree_node_base* > local_Rb_tree_increment(_Rb_tree_node_base* __x) throw () > { > if (__x->_M_right != 0) > { > __x = __x->_M_right; > while (__x->_M_left != 0) > __x = __x->_M_left; > } > else > { > _Rb_tree_node_base* __y = __x->_M_parent; > while (__x == __y->_M_right) > { > __x = __y; > __y = __y->_M_parent; > } > if (__x->_M_right != __y) > __x = __y; > } > return __x; > } > > I hope one can read that. I changed the code to store the `visited` bit in the work stack, but if you could rewrite the `interval_generator_next` along the lines of the code above that would be great. [ An alternative would be to try and get rid of the `parent` field :-) ] Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 22:09 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-30 5:28 ` Gerd Möllmann 2022-09-30 6:11 ` Eli Zaretskii 2022-09-30 13:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 2 replies; 46+ messages in thread From: Gerd Möllmann @ 2022-09-30 5:28 UTC (permalink / raw) To: Stefan Monnier; +Cc: Eli Zaretskii, 58158 Stefan Monnier <monnier@iro.umontreal.ca> writes: > I changed the code to store the `visited` bit in the work stack, but if > you could rewrite the `interval_generator_next` along the lines of the > code above that would be great. Ok, I'll rewrite that :-). When I understand what that "narrowing" is and how and for what it is used. BTW, what do you think of changing function names to something a bit shorter? I find myself constantly getting confused when reading the code. I think an "itree_" prefix would suffice for non-static functions, and static ones without prefix. Renaming that is hopefully not that much work with LSP. > [ An alternative would be to try and get rid of the `parent` field :-) > ] :-) ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-30 5:28 ` Gerd Möllmann @ 2022-09-30 6:11 ` Eli Zaretskii 2022-09-30 11:31 ` Gerd Möllmann 2022-10-06 22:36 ` Dmitry Gutov 2022-09-30 13:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 1 sibling, 2 replies; 46+ messages in thread From: Eli Zaretskii @ 2022-09-30 6:11 UTC (permalink / raw) To: Gerd Möllmann; +Cc: 58158, monnier > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: Eli Zaretskii <eliz@gnu.org>, 58158@debbugs.gnu.org > Date: Fri, 30 Sep 2022 07:28:26 +0200 > > Renaming that is hopefully not that much work with LSP. You should be able to do this with M-? followed by "M-x xref-query-replace-in-results RET" (the latter should be invoked in the "*xref*" buffer produced by M-?). ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-30 6:11 ` Eli Zaretskii @ 2022-09-30 11:31 ` Gerd Möllmann 2022-09-30 18:29 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-10-06 22:36 ` Dmitry Gutov 1 sibling, 1 reply; 46+ messages in thread From: Gerd Möllmann @ 2022-09-30 11:31 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 58158, monnier On 22-09-30 8:11 , Eli Zaretskii wrote: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: Eli Zaretskii <eliz@gnu.org>, 58158@debbugs.gnu.org >> Date: Fri, 30 Sep 2022 07:28:26 +0200 >> >> Renaming that is hopefully not that much work with LSP. > > You should be able to do this with M-? followed by > "M-x xref-query-replace-in-results RET" (the latter > should be invoked in the "*xref*" buffer produced by M-?). Ok, thanks for the tip. Info for Stefan: I've now removed the per-tree null node, and make check shows no unexpected results. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-30 11:31 ` Gerd Möllmann @ 2022-09-30 18:29 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-10-02 8:06 ` Gerd Möllmann 0 siblings, 1 reply; 46+ messages in thread From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-30 18:29 UTC (permalink / raw) To: Gerd Möllmann; +Cc: Eli Zaretskii, 58158 > Info for Stefan: I've now removed the per-tree null node, and make check > shows no unexpected results. Thanks. It's an improvement, but I still don't understand how it works :-( Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-30 18:29 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-10-02 8:06 ` Gerd Möllmann 0 siblings, 0 replies; 46+ messages in thread From: Gerd Möllmann @ 2022-10-02 8:06 UTC (permalink / raw) To: Stefan Monnier; +Cc: Eli Zaretskii, 58158 Stefan Monnier <monnier@iro.umontreal.ca> writes: >> Info for Stefan: I've now removed the per-tree null node, and make check >> shows no unexpected results. > > Thanks. It's an improvement, but I still don't understand how it > works :-( Is it still null->parent? If so, maybe it helps to picture that null is a child of many parents :-). Every node having null on the left or right is a parent in that sense. Therefore, if some code uses null->parent to do something it can not be right. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-30 6:11 ` Eli Zaretskii 2022-09-30 11:31 ` Gerd Möllmann @ 2022-10-06 22:36 ` Dmitry Gutov 2022-10-07 19:47 ` Eli Zaretskii 1 sibling, 1 reply; 46+ messages in thread From: Dmitry Gutov @ 2022-10-06 22:36 UTC (permalink / raw) To: Eli Zaretskii, Gerd Möllmann; +Cc: 58158, monnier On 30.09.2022 09:11, Eli Zaretskii wrote: > "M-x xref-query-replace-in-results RET" JFYI this command has the convenient binding 'r' in Xref output buffers. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-10-06 22:36 ` Dmitry Gutov @ 2022-10-07 19:47 ` Eli Zaretskii 2022-10-08 18:50 ` Dmitry Gutov 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2022-10-07 19:47 UTC (permalink / raw) To: Dmitry Gutov; +Cc: gerd.moellmann, 58158, monnier > Date: Fri, 7 Oct 2022 01:36:17 +0300 > Cc: 58158@debbugs.gnu.org, monnier@iro.umontreal.ca > From: Dmitry Gutov <dgutov@yandex.ru> > > On 30.09.2022 09:11, Eli Zaretskii wrote: > > "M-x xref-query-replace-in-results RET" > > JFYI this command has the convenient binding 'r' in Xref output buffers. It also works only when invoked from the Xref buffer, right? Btw, what am I doing wrong below? emacs -Q C-x C-f src/character.h RET M-x visit-tags-table RET RET M-. char_string RET r whatever RET Unexpected result: "No suitable matches here". Huh? what did I miss? ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-10-07 19:47 ` Eli Zaretskii @ 2022-10-08 18:50 ` Dmitry Gutov 2022-10-10 8:10 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Dmitry Gutov @ 2022-10-08 18:50 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gerd.moellmann, 58158, monnier On 07.10.2022 22:47, Eli Zaretskii wrote: >> Date: Fri, 7 Oct 2022 01:36:17 +0300 >> Cc: 58158@debbugs.gnu.org, monnier@iro.umontreal.ca >> From: Dmitry Gutov <dgutov@yandex.ru> >> >> On 30.09.2022 09:11, Eli Zaretskii wrote: >>> "M-x xref-query-replace-in-results RET" >> >> JFYI this command has the convenient binding 'r' in Xref output buffers. > > It also works only when invoked from the Xref buffer, right? Yep. But we also have commands like xref-find-references-and-replace and dired-do-find-regexp-and-replace. > Btw, what am I doing wrong below? > > emacs -Q > C-x C-f src/character.h RET > M-x visit-tags-table RET RET > M-. char_string RET > r whatever RET > > Unexpected result: "No suitable matches here". Huh? what did I miss? We can't replace over "find definition" matches: they are more abstract and don't contain the necessary information to perform the replacement (such as the length of a match, for instance). And such xrefs might navigate you to the beginning of the line, for example, rather than to the beginning of the name. But that makes sense, doesn't it? If replacing over "find definitions" results worked fine, in the end you would get a codebase where all declarations of a method 'foo' got renamed, but all callsites of it remain unchanged. That couldn't have been your intention, could it? The error message could use some improvement, I suppose, but I'm not sure how to make it better. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-10-08 18:50 ` Dmitry Gutov @ 2022-10-10 8:10 ` Eli Zaretskii 2022-10-11 2:12 ` Dmitry Gutov 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2022-10-10 8:10 UTC (permalink / raw) To: Dmitry Gutov; +Cc: gerd.moellmann, 58158, monnier > Date: Sat, 8 Oct 2022 21:50:39 +0300 > Cc: gerd.moellmann@gmail.com, 58158@debbugs.gnu.org, monnier@iro.umontreal.ca > From: Dmitry Gutov <dgutov@yandex.ru> > > > Btw, what am I doing wrong below? > > > > emacs -Q > > C-x C-f src/character.h RET > > M-x visit-tags-table RET RET > > M-. char_string RET > > r whatever RET > > > > Unexpected result: "No suitable matches here". Huh? what did I miss? > > We can't replace over "find definition" matches: they are more abstract > and don't contain the necessary information to perform the replacement > (such as the length of a match, for instance). > > And such xrefs might navigate you to the beginning of the line, for > example, rather than to the beginning of the name. > > But that makes sense, doesn't it? If replacing over "find definitions" > results worked fine, in the end you would get a codebase where all > declarations of a method 'foo' got renamed, but all callsites of it > remain unchanged. That couldn't have been your intention, could it? > > The error message could use some improvement, I suppose, but I'm not > sure how to make it better. I tried to improve the situation, both in the error message, the doc string, and the manual. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-10-10 8:10 ` Eli Zaretskii @ 2022-10-11 2:12 ` Dmitry Gutov 2022-10-11 6:37 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Dmitry Gutov @ 2022-10-11 2:12 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gerd.moellmann, 58158, monnier On 10.10.2022 11:10, Eli Zaretskii wrote: >> Date: Sat, 8 Oct 2022 21:50:39 +0300 >> Cc:gerd.moellmann@gmail.com,58158@debbugs.gnu.org,monnier@iro.umontreal.ca >> From: Dmitry Gutov<dgutov@yandex.ru> >> >>> Btw, what am I doing wrong below? >>> >>> emacs -Q >>> C-x C-f src/character.h RET >>> M-x visit-tags-table RET RET >>> M-. char_string RET >>> r whatever RET >>> >>> Unexpected result: "No suitable matches here". Huh? what did I miss? >> We can't replace over "find definition" matches: they are more abstract >> and don't contain the necessary information to perform the replacement >> (such as the length of a match, for instance). >> >> And such xrefs might navigate you to the beginning of the line, for >> example, rather than to the beginning of the name. >> >> But that makes sense, doesn't it? If replacing over "find definitions" >> results worked fine, in the end you would get a codebase where all >> declarations of a method 'foo' got renamed, but all callsites of it >> remain unchanged. That couldn't have been your intention, could it? >> >> The error message could use some improvement, I suppose, but I'm not >> sure how to make it better. > I tried to improve the situation, both in the error message, the doc > string, and the manual. It's probably better now, but still likely confusing. What is a "subset of matches"? The user makes a search, gets a bunch of matches. Is it fair to call them "only a subset" is the user only searched for definitions? Or to take another example, the user can search for some regexp inside a particular directly, with dired-do-find-regexp. Imagine that that directory is not the whole project (perhaps just a minor subdirectory). Would it be fair to call the results "only a subset" then? But xref-query-replace-in-results will work in that case, because the search returns values which have enough info to perform the replacements. Perhaps we should make the error very specific, like "you can't replace inside xref-find-definitions results". Since that is going to be the exception in like 99.9% of the cases. It's possible for other backend commands (such as find-references) to return unsuitable xref values in third-party backends, or for other code to use xref-show-xrefs with such, but those will be really very rare, if happen at all. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-10-11 2:12 ` Dmitry Gutov @ 2022-10-11 6:37 ` Eli Zaretskii 0 siblings, 0 replies; 46+ messages in thread From: Eli Zaretskii @ 2022-10-11 6:37 UTC (permalink / raw) To: Dmitry Gutov; +Cc: gerd.moellmann, 58158, monnier > Date: Tue, 11 Oct 2022 05:12:11 +0300 > Cc: gerd.moellmann@gmail.com, 58158@debbugs.gnu.org, monnier@iro.umontreal.ca > From: Dmitry Gutov <dgutov@yandex.ru> > > What is a "subset of matches"? Feel free to suggest a less vague description. The idea is that the list in Xref buffer doesn't show all the references to the identifier, making renaming infeasible. > Perhaps we should make the error very specific, like "you can't replace > inside xref-find-definitions results". Since that is going to be the > exception in like 99.9% of the cases. That'd be my preference, but what are those 0.1% of cases where the Xref buffer produced by other commands could fit? More generally, what exactly does xref.el test to produce the error message, and how to describe that in user-level terms? > It's possible for other backend commands (such as find-references) to > return unsuitable xref values in third-party backends, or for other code > to use xref-show-xrefs with such, but those will be really very rare, if > happen at all. You are saying that 'r' is only useful after M-?, is that right? The manual says so, but the manual doesn't have to say "the whole truth". The doc string should. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-30 5:28 ` Gerd Möllmann 2022-09-30 6:11 ` Eli Zaretskii @ 2022-09-30 13:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-30 14:08 ` Gerd Möllmann 1 sibling, 1 reply; 46+ messages in thread From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-30 13:25 UTC (permalink / raw) To: Gerd Möllmann; +Cc: Eli Zaretskii, 58158 Gerd Möllmann [2022-09-30 07:28:26] wrote: > Stefan Monnier <monnier@iro.umontreal.ca> writes: >> I changed the code to store the `visited` bit in the work stack, but if >> you could rewrite the `interval_generator_next` along the lines of the >> code above that would be great. > Ok, I'll rewrite that :-). When I understand what that "narrowing" is > and how and for what it is used. The traversals are always bound by begin..end boundaries. Usually we know those bounds upfront (e.g. for `overlays-in` or `overlay-at`), but when doing things like `next/previous-overlay-change` one of the bounds is not know at first, so in order to try and avoid the O(N) complexity the code refines that other bound on the fly (e.g. when searching forward, after seeing an overlay that ends at POS we know that there's no point looking further than POS so we can move the end boundary to POS). > BTW, what do you think of changing function names to something a bit > shorter? I find myself constantly getting confused when reading the > code. I think an "itree_" prefix would suffice for non-static > functions, and static ones without prefix. Fine by me, yes. Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-30 13:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-30 14:08 ` Gerd Möllmann 2022-09-30 15:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 1 reply; 46+ messages in thread From: Gerd Möllmann @ 2022-09-30 14:08 UTC (permalink / raw) To: Stefan Monnier; +Cc: Eli Zaretskii, 58158 Stefan Monnier <monnier@iro.umontreal.ca> writes: > The traversals are always bound by begin..end boundaries. Usually we > know those bounds upfront (e.g. for `overlays-in` or `overlay-at`), but > when doing things like `next/previous-overlay-change` one of the bounds > is not know at first, so in order to try and avoid the O(N) complexity > the code refines that other bound on the fly (e.g. when searching > forward, after seeing an overlay that ends at POS we know that there's no > point looking further than POS so we can move the end boundary to > POS). Thanks, that helps. Maybe you could also help me with the questions below? I'm assuming, from a comment somewhere, that an interval tree is an rb-tree with keys being interval start positions, and allowing duplicates. That is, if N is a node, all nodes in the subtree N->left are strictly < N, and nodes in N->right are >=. Or in a picture, if [start, end] is an interval, and we insert some intervals with the same start, we could arrive at [5, a] / \ [4, b] [6, c] /\ / \ 0 0 0 [6, d] /\ ... where 0 means null, and 6 is a duplicate start. 1. Is that correct? 2. Does the tree order duplicates in a particular way? Maybe by overlay priority, or by interval end? And, perhaps, if it doesn't order duplicates, would it be an idea to order them, maybe at some later point? I'm asking this because a successor(N) function would return nodes in the order in the tree, always, and I don't know what the order is now. 3. If my mental picture is right, we could of course end up with a tree that has degenerated to a list. Or a subtree, maybe. Do you know if that can happen? ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-30 14:08 ` Gerd Möllmann @ 2022-09-30 15:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-30 16:04 ` Eli Zaretskii ` (2 more replies) 0 siblings, 3 replies; 46+ messages in thread From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-30 15:25 UTC (permalink / raw) To: Gerd Möllmann; +Cc: Eli Zaretskii, Andreas Politz, 58158 > Maybe you could also help me with the questions below? I'll try (BTW, the original author is Andreas Politz who we can still reach at <mail@andreas-politz.de>. He doesn't have much time to devote to it, tho, so best not to Cc him through all the conversations). > I'm assuming, from a comment somewhere, that an interval tree is an > rb-tree with keys being interval start positions, and allowing > duplicates. Yup. > That is, if N is a node, all nodes in the subtree N->left are strictly < > N, and nodes in N->right are >=. The following code in `interval_tree_insert`: while (child != ITREE_NULL) { parent = child; offset += child->offset; child->limit = max (child->limit, node->end - offset); /* This suggests that nodes in the right subtree are strictly greater. But this is not true due to later rotations. */ child = node->begin <= child->begin ? child->left : child->right; } suggests that N->left are <= and N->right are > but my reading of the comment is that the only thing we can rely on is that N-<left is <= and N->right is >= [ I do understand this comment now :-) ] > 2. Does the tree order duplicates in a particular way? > Maybe by overlay priority, or by interval end? AFAICT, no it does not. > And, perhaps, if it doesn't order duplicates, would it be an idea to > order them, maybe at some later point? I'm asking this because > a successor(N) function would return nodes in the order in the tree, > always, and I don't know what the order is now. Ordering based on interval end could arguably make sense. I'm not completely sure how/where it would benefit us, tho. Most times we look at the itree, we just look at *all* the nodes within a specific interval, so the order in which they're returned doesn't matter very much (the ordering within the tree does matter in terms of how we manage to prune the tree, but that has more to do with which elements are near the root, which is a different kind of "ordering" than the "binary tree ordering" itself). Maybe the `next/previous-overlay-change` code benefit from sub-ordering based on `end`, but I suspect the effect would be lost in the noise: if we want to speed up that part of the code, I expect that a better avenue is to rewrite the `next/previous-single-overlay-change` so as not to work by (while .. (next/previous-overlay-change ..) .. (get-char-property ..) ..) where both functions do the O(log N) itree-iteration dance, but instead doing the itree iteration itself. > 3. If my mental picture is right, we could of course end up with a tree > that has degenerated to a list. Or a subtree, maybe. Do you know if > that can happen? In terms of tree depth, no: the code preserves the RB invariants, which ensure balance regardless of ordering (or lack thereof). Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-30 15:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-30 16:04 ` Eli Zaretskii 2022-09-30 17:11 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-10-01 5:06 ` Gerd Möllmann 2022-10-01 7:25 ` Gerd Möllmann 2 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2022-09-30 16:04 UTC (permalink / raw) To: Stefan Monnier; +Cc: gerd.moellmann, mail, 58158 > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Eli Zaretskii <eliz@gnu.org>, 58158@debbugs.gnu.org, Andreas Politz > <mail@andreas-politz.de> > Date: Fri, 30 Sep 2022 11:25:30 -0400 > > > And, perhaps, if it doesn't order duplicates, would it be an idea to > > order them, maybe at some later point? I'm asking this because > > a successor(N) function would return nodes in the order in the tree, > > always, and I don't know what the order is now. > > Ordering based on interval end could arguably make sense. I'm not > completely sure how/where it would benefit us, tho. Most times we look > at the itree, we just look at *all* the nodes within a specific > interval, so the order in which they're returned doesn't matter very > much (the ordering within the tree does matter in terms of how we manage > to prune the tree, but that has more to do with which elements are near > the root, which is a different kind of "ordering" than the "binary tree > ordering" itself). Maybe the `next/previous-overlay-change` code > benefit from sub-ordering based on `end`, but I suspect the effect would > be lost in the noise: if we want to speed up that part of the code, > I expect that a better avenue is to rewrite the > `next/previous-single-overlay-change` so as not to work by (while .. > (next/previous-overlay-change ..) .. (get-char-property ..) ..) where > both functions do the O(log N) itree-iteration dance, but instead doing > the itree iteration itself. The display engine sorts the overlays, so order could be important. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-30 16:04 ` Eli Zaretskii @ 2022-09-30 17:11 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 0 replies; 46+ messages in thread From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-30 17:11 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gerd.moellmann, mail, 58158 >> Ordering based on interval end could arguably make sense. I'm not >> completely sure how/where it would benefit us, tho. Most times we look >> at the itree, we just look at *all* the nodes within a specific >> interval, so the order in which they're returned doesn't matter very >> much (the ordering within the tree does matter in terms of how we manage >> to prune the tree, but that has more to do with which elements are near >> the root, which is a different kind of "ordering" than the "binary tree >> ordering" itself). Maybe the `next/previous-overlay-change` code >> benefit from sub-ordering based on `end`, but I suspect the effect would >> be lost in the noise: if we want to speed up that part of the code, >> I expect that a better avenue is to rewrite the >> `next/previous-single-overlay-change` so as not to work by (while .. >> (next/previous-overlay-change ..) .. (get-char-property ..) ..) where >> both functions do the O(log N) itree-iteration dance, but instead doing >> the itree iteration itself. > > The display engine sorts the overlays, so order could be important. Indeed, for things like `overlays_at` it would be handy if the iterator could return the right overlays already in the right order, so we don't have to `sort_overlays`. But in `sort_overlays` the `priority` property takes precedence, so if we order the RB tree based on that ordering, we will lose the main purpose of the change, i.e. finding the overlays at POS in O(log N) time, because the `compare_overlays` ordering does not let us know that all of the left or right subtree is outside of a given interval. IOW the itree has to use a different ordering than that of `compare_overlays` anyway, so we're still forced to (re-)sort afterwards with `sort_overlays` :-( Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-30 15:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-30 16:04 ` Eli Zaretskii @ 2022-10-01 5:06 ` Gerd Möllmann 2022-10-01 13:54 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-10-01 7:25 ` Gerd Möllmann 2 siblings, 1 reply; 46+ messages in thread From: Gerd Möllmann @ 2022-10-01 5:06 UTC (permalink / raw) To: Stefan Monnier; +Cc: Eli Zaretskii, Andreas Politz, 58158 Stefan Monnier <monnier@iro.umontreal.ca> writes: >> That is, if N is a node, all nodes in the subtree N->left are strictly < >> N, and nodes in N->right are >=. > > The following code in `interval_tree_insert`: > > while (child != ITREE_NULL) > { > parent = child; > offset += child->offset; > child->limit = max (child->limit, node->end - offset); > /* This suggests that nodes in the right subtree are strictly > greater. But this is not true due to later rotations. */ > child = node->begin <= child->begin ? child->left : child->right; > } > > suggests that N->left are <= and N->right are > but my reading of the > comment is that the only thing we can rely on is that N-<left is <= and > N->right is >= Phew. I'm not sure but I get the feeling that makes implementing a successor function, let's say, challenging. I wonder how std::multimap deals with that. Hm. Anyways, with your removal of the visited flag (N thumbs up, for large values of N :-)), most of the problems I preceived are gone anyway. So, I guess we can live with the iteration stack, which seems to work fine, and the alternative is only nice to have, now. (And impacts are getting closer in the real world here, so I'm a bit distracted anyway.) > > [ I do understand this comment now :-) ] :-) > >> 2. Does the tree order duplicates in a particular way? >> Maybe by overlay priority, or by interval end? > > AFAICT, no it does not. Ok. >> And, perhaps, if it doesn't order duplicates, would it be an idea to >> order them, maybe at some later point? I'm asking this because >> a successor(N) function would return nodes in the order in the tree, >> always, and I don't know what the order is now. > > Ordering based on interval end could arguably make sense. I'm not > completely sure how/where it would benefit us, tho. Most times we look > at the itree, we just look at *all* the nodes within a specific > interval, so the order in which they're returned doesn't matter very > much (the ordering within the tree does matter in terms of how we manage > to prune the tree, but that has more to do with which elements are near > the root, which is a different kind of "ordering" than the "binary tree > ordering" itself). Maybe the `next/previous-overlay-change` code > benefit from sub-ordering based on `end`, but I suspect the effect would > be lost in the noise: if we want to speed up that part of the code, > I expect that a better avenue is to rewrite the > `next/previous-single-overlay-change` so as not to work by (while .. > (next/previous-overlay-change ..) .. (get-char-property ..) ..) where > both functions do the O(log N) itree-iteration dance, but instead doing > the itree iteration itself. Thanks. > >> 3. If my mental picture is right, we could of course end up with a tree >> that has degenerated to a list. Or a subtree, maybe. Do you know if >> that can happen? > > In terms of tree depth, no: the code preserves the RB invariants, which > ensure balance regardless of ordering (or lack thereof). Ok, thanks for your insights! ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-10-01 5:06 ` Gerd Möllmann @ 2022-10-01 13:54 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-10-02 8:22 ` Gerd Möllmann 0 siblings, 1 reply; 46+ messages in thread From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-10-01 13:54 UTC (permalink / raw) To: Gerd Möllmann; +Cc: Eli Zaretskii, Andreas Politz, 58158 >> The following code in `interval_tree_insert`: [...] >> suggests that N->left are <= and N->right are > but my reading of the >> comment is that the only thing we can rely on is that N-<left is <= and >> N->right is >= > > Phew. I'm not sure but I get the feeling that makes implementing a > successor function, let's say, challenging. I don't think it makes any difference in that respect, no. The notion of "successor" is just "the successor in the in-order traversal of the tree" and while rotations change the initial property that "N->right are >", they don't change the in-order traversal output (they don't re-order nodes w.r.t in-order traversal (or in reverse order for that matter)). As alluded to at the end of my previous email, while RB trees are typically used for your usual binary tree which comes with some notion of ordering that makes lookup O(log N), the RB trees ensure balance without relying on any notion of ordering since the rotations just blindly preserve the order as a side effect but they don't themselves need to call the ordering function to decide how to do their job. Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-10-01 13:54 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-10-02 8:22 ` Gerd Möllmann 2022-10-02 16:32 ` Andreas Politz 0 siblings, 1 reply; 46+ messages in thread From: Gerd Möllmann @ 2022-10-02 8:22 UTC (permalink / raw) To: Stefan Monnier; +Cc: Eli Zaretskii, Andreas Politz, 58158 Stefan Monnier <monnier@iro.umontreal.ca> writes: >> Phew. I'm not sure but I get the feeling that makes implementing a >> successor function, let's say, challenging. > > I don't think it makes any difference in that respect, no. The reason I find it challenging, and I'm sure now, is that I've written the following code, and failed :-). /* FIXME: This assumption is wrong. Nodes on the left of P are <=, and nodes on the right are >=. */ /* Value is the successor of interval node X in ascending order. It is assumed that the tree is organized so that nodes < X are in X->left and nodes in X->right are >= X. */ static struct interval_node * in_order_successor (struct interval_node *x) { if (x->right != ITREE_NULL) { /* X has a right child, which means X's right subtree has elements >= X. Proceed to the left-most child in the right subtree, which is the smallest in that subtree. */ x = x->right; while (x->left != 0) x = x->left; } else { /* X's left subtree is uninteresting, because everything there is < X. Therefore follow the parent chain. If X is the parent's right child, this means the parent is < X, We are looking for a parent that is >=. */ struct interval_node *y = x->parent; while (x == y->right) { x = y; y = y->parent; } /* If we found a parent that's >=, the parent is what we sought. Otherwise, X has arrived at the null node, whose right child is the sentinel node itself. */ if (x->right != y) x = y; } return x; } I tried to change the comments and/or modify the code for a tree like we have (left subtree <=, right >=), and couldn't explain why it works, but I also couldn't produce a counter-example that the existing tree code actually can produce. IOW, I couldn't prove anything. P.S. With the "all over the place" I indented to hint at the fact that the tree is not a "normal" BST. I should probably have said that directly, sorry. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-10-02 8:22 ` Gerd Möllmann @ 2022-10-02 16:32 ` Andreas Politz 2022-10-03 4:35 ` Gerd Möllmann 0 siblings, 1 reply; 46+ messages in thread From: Andreas Politz @ 2022-10-02 16:32 UTC (permalink / raw) To: Gerd Möllmann; +Cc: Eli Zaretskii, 58158, Stefan Monnier [-- Attachment #1: Type: text/plain, Size: 450 bytes --] I've managed to construct a prototype of this "stateless" iterator. I've only implemented the in-order case and only applied it in the overlays_in function. The only real trouble I had was with pushing the offset to the children during the tree navigation in some kind of sensible way. In the end I gave up and simply pasted calls to the corresponding function "all over the place". It seems to work, at least buffer-tests are passing. Andreas [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: noverlay-stateless-iterator.diff --] [-- Type: text/x-patch, Size: 6871 bytes --] diff --git a/src/buffer.c b/src/buffer.c index f59fddcbde..6a53b49aad 100644 --- a/src/buffer.c +++ b/src/buffer.c @@ -2948,15 +2948,16 @@ overlays_in (ptrdiff_t beg, ptrdiff_t end, bool extend, ptrdiff_t next = ZV; Lisp_Object *vec = *vec_ptr; struct interval_node *node; + struct interval_tree_iterator iterator; - ITREE_FOREACH (node, current_buffer->overlays, beg, - /* Find empty OV at Z ? */ - (end >= ZV && empty) ? ZV + 1 : ZV, ASCENDING) + interval_tree_iterator_init(&iterator, current_buffer->overlays, beg, + (end >= ZV && empty) ? ZV + 1 : ZV, ITREE_ASCENDING); + + while ((node = interval_tree_iterator_next(&iterator))) { if (node->begin > end) { next = min (next, node->begin); - ITREE_FOREACH_ABORT (); break; } else if (node->begin == end) @@ -2964,7 +2965,6 @@ overlays_in (ptrdiff_t beg, ptrdiff_t end, bool extend, next = node->begin; if ((! empty || end < ZV) && beg < end) { - ITREE_FOREACH_ABORT (); break; } } diff --git a/src/itree.c b/src/itree.c index eeecaf1839..21549fe8a7 100644 --- a/src/itree.c +++ b/src/itree.c @@ -1190,3 +1190,132 @@ interval_tree_subtree_min (const struct interval_tree *tree, struct interval_nod * +===================================================================================+ */ /* See Foverlay_tree in buffer.c */ + +\f +/* +===================================================================================+ + * | Stateless iterator + * +===================================================================================+ */ + +static bool +interval_tree_iter_traverse_p(struct interval_tree_iterator *iterator, + struct interval_node *node); +static +struct interval_node * +interval_tree_iterator_in_order_successor (struct interval_tree_iterator* iterator); + +void +interval_tree_iterator_init(struct interval_tree_iterator *iterator, + struct interval_tree *tree, + ptrdiff_t begin, + ptrdiff_t end, + enum interval_tree_order order) { + iterator->tree = tree; + iterator->node = tree && begin <= end ? ITREE_NULL : NULL; + iterator->begin = begin; + iterator->end = end; + iterator->order = order; +} + +struct interval_node * +interval_tree_iterator_next(struct interval_tree_iterator *iterator) { + if (iterator->node) { + do { + switch (iterator->order) { + case ITREE_ASCENDING: + iterator->node = interval_tree_iterator_in_order_successor (iterator); + break; + default: + fprintf (stderr, "interval_tree_order != ITREE_ASCENDING not implemented"); + emacs_abort (); + } + } while (iterator->node && + ! interval_node_intersects (iterator->node, iterator->begin, iterator->end)); + } + + if (iterator->node == ITREE_NULL) { + fprintf (stderr, "Next node: ITREE_NULL\n"); + } else if (! iterator->node) { + fprintf (stderr, "Next node: NULL\n"); + } else { + fprintf (stderr, "Next node: begin = %ld, end = %ld (iterator: begin = %ld, end = %ld)\n", + iterator->node->begin, iterator->node->end, iterator->begin, iterator->end); + } + return iterator->node; +} + +static bool +interval_tree_iter_traverse_p(struct interval_tree_iterator *iterator, + struct interval_node *node) { + eassert (node); + + if (node == ITREE_NULL) { + return false; + } else { + eassert (node->parent != ITREE_NULL); + if (node->parent->left == node) { + return iterator->begin <= node->limit + node->offset; + } else { + return node->parent->begin <= iterator->end; + } + } +} + +static +struct interval_node * +interval_tree_iterator_in_order_successor (struct interval_tree_iterator* iterator) +{ + struct interval_node *node = iterator->node; + + if (node != ITREE_NULL) { + interval_tree_inherit_offset (iterator->tree, node); + } + + if (node == ITREE_NULL) { + node = iterator->tree->root; + if (node != ITREE_NULL) { + interval_tree_inherit_offset (iterator->tree, node); + } + while (interval_tree_iter_traverse_p(iterator, node->left)) { + node = node->left; + if (node != ITREE_NULL) { + interval_tree_inherit_offset (iterator->tree, node); + } + } + } else if (interval_tree_iter_traverse_p(iterator, node->right)) + { + node = node->right; + if (node != ITREE_NULL) { + interval_tree_inherit_offset (iterator->tree, node); + } + while (interval_tree_iter_traverse_p(iterator, node->left)) { + node = node->left; + if (node != ITREE_NULL) { + interval_tree_inherit_offset (iterator->tree, node); + } + } + } + else + { + struct interval_node *parent = node->parent; + while (node == parent->right) + { + node = parent; + parent = parent->parent; + } + if (node != ITREE_NULL) + node = parent; + } + + return node == ITREE_NULL ? NULL : node; +} + +void +interval_tree_iterator_narrow (struct interval_tree_iterator *iterator, + ptrdiff_t begin, + ptrdiff_t end) +{ + eassert (begin >= iterator->begin); + eassert (end <= iterator->end); + iterator->begin = max (begin, iterator->begin); + iterator->end = min (end, iterator->end); +} diff --git a/src/itree.h b/src/itree.h index 1f019a2607..53f03cca35 100644 --- a/src/itree.h +++ b/src/itree.h @@ -72,6 +72,15 @@ #define ITREE_NULL (&itree_null) ITREE_PRE_ORDER, }; +struct interval_tree_iterator +{ + struct interval_tree *tree; + struct interval_node *node; + ptrdiff_t begin; + ptrdiff_t end; + enum interval_tree_order order; +}; + void interval_node_init (struct interval_node *, ptrdiff_t, ptrdiff_t, bool, bool, Lisp_Object); ptrdiff_t interval_node_begin (struct interval_tree *, struct interval_node *); ptrdiff_t interval_node_end (struct interval_tree *, struct interval_node *); @@ -135,4 +144,15 @@ #define ITREE_FOREACH_ABORT() \ #define ITREE_FOREACH_NARROW(beg, end) \ interval_generator_narrow (itree_iter_, beg, end) +void interval_tree_nodes (struct interval_tree *tree, struct interval_node **nodes, enum interval_tree_order order); + +void interval_tree_iterator_init(struct interval_tree_iterator *iterator, + struct interval_tree *tree, + ptrdiff_t begin, + ptrdiff_t end, + enum interval_tree_order order); +struct interval_node *interval_tree_iterator_next(struct interval_tree_iterator *iterator); +void interval_tree_iterator_narrow (struct interval_tree_iterator *iterator, + ptrdiff_t begin, + ptrdiff_t end); #endif ^ permalink raw reply related [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-10-02 16:32 ` Andreas Politz @ 2022-10-03 4:35 ` Gerd Möllmann 2022-10-04 10:50 ` Andreas Politz 0 siblings, 1 reply; 46+ messages in thread From: Gerd Möllmann @ 2022-10-03 4:35 UTC (permalink / raw) To: Andreas Politz; +Cc: Eli Zaretskii, 58158, Stefan Monnier Andreas Politz <mail@andreas-politz.de> writes: > It seems to work, at least buffer-tests are passing. "seems to work" is a bit weak. Can we prove it? I don't mean mathematically, but by reasoning like I tried in the comments in successor function I posted, ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-10-03 4:35 ` Gerd Möllmann @ 2022-10-04 10:50 ` Andreas Politz 0 siblings, 0 replies; 46+ messages in thread From: Andreas Politz @ 2022-10-04 10:50 UTC (permalink / raw) To: Gerd Möllmann; +Cc: Eli Zaretskii, 58158, Stefan Monnier My implementation seems to work. The correctness of the algorithm would follow from 1. The Rb tree algorithm produces a tree with left <= root <= right, 2. the algorithm you’ve posted, and I’ve adapted, produces an in–order of the tree and 3. the conditions guiding the traversal are correct, i.e. they don’t exclude matching intervals. Since I believe these statements are true, I believe the algorithm is correct. Andreas > Am 03.10.2022 um 06:35 schrieb Gerd Möllmann <gerd.moellmann@gmail.com>: > > Andreas Politz <mail@andreas-politz.de> writes: > >> It seems to work, at least buffer-tests are passing. > > "seems to work" is a bit weak. Can we prove it? > > I don't mean mathematically, but by reasoning like I tried in the > comments in successor function I posted, ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-30 15:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-30 16:04 ` Eli Zaretskii 2022-10-01 5:06 ` Gerd Möllmann @ 2022-10-01 7:25 ` Gerd Möllmann 2022-10-01 10:55 ` Gerd Möllmann 2022-10-01 14:01 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2 siblings, 2 replies; 46+ messages in thread From: Gerd Möllmann @ 2022-10-01 7:25 UTC (permalink / raw) To: Stefan Monnier; +Cc: Eli Zaretskii, Andreas Politz, 58158 Stefan Monnier <monnier@iro.umontreal.ca> writes: >> Maybe you could also help me with the questions below? > > I'll try (BTW, the original author is Andreas Politz who we can still > reach at <mail@andreas-politz.de>. He doesn't have much time to devote > to it, tho, so best not to Cc him through all the conversations). > >> I'm assuming, from a comment somewhere, that an interval tree is an >> rb-tree with keys being interval start positions, and allowing >> duplicates. > > Yup. > >> That is, if N is a node, all nodes in the subtree N->left are strictly < >> N, and nodes in N->right are >=. > > The following code in `interval_tree_insert`: > > while (child != ITREE_NULL) > { > parent = child; > offset += child->offset; > child->limit = max (child->limit, node->end - offset); > /* This suggests that nodes in the right subtree are strictly > greater. But this is not true due to later rotations. */ > child = node->begin <= child->begin ? child->left : child->right; > } > > suggests that N->left are <= and N->right are > but my reading of the > comment is that the only thing we can rely on is that N-<left is <= and > N->right is >= > Yup. I've used overlay-tree a bit (compile with ITREE_DEBUG defined after pulling), and used this: (defun make-ivs () (with-current-buffer (get-buffer-create "*iv*") (delete-all-overlays) (erase-buffer) (insert (make-string 50 ?x)) (let ((o1 (make-overlay 1 1)) (o2 (make-overlay 10 11)) (o3 (make-overlay 10 12)) (o4 (make-overlay 1 1)) ) (move-overlay o4 10 13) (overlay-tree)))) (pp (make-ivs)) ((:begin 10 :end 12 :limit 13 :offset 0 :rear-advance nil :front-advance nil) ((:begin 1 :end 1 :limit 13 :offset 0 :rear-advance nil :front-advance nil) nil ((:begin 10 :end 13 :limit 13 :offset 0 :rear-advance nil :front-advance nil) nil nil)) ((:begin 10 :end 11 :limit 11 :offset 0 :rear-advance nil :front-advance nil) nil nil)) That's [10, 12] / \ [1, 1] [10, 11] / \ /\ [10, 13] / \ The 10 is found "all over the place". I surmise no reasonable successor function can be written for such a tree. I have to look at std::multimap, they manage to do this somehow. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-10-01 7:25 ` Gerd Möllmann @ 2022-10-01 10:55 ` Gerd Möllmann 2022-10-01 14:01 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 1 sibling, 0 replies; 46+ messages in thread From: Gerd Möllmann @ 2022-10-01 10:55 UTC (permalink / raw) To: Stefan Monnier; +Cc: Eli Zaretskii, Andreas Politz, 58158 Gerd Möllmann <gerd.moellmann@gmail.com> writes: > I have to look at std::multimap, they manage to do this somehow. Well, I should have thought of that, because it's obvious from the fact they are able to use successor/predecessor functions :-/. The equivalent in our code would go like below, which is just written down in a hurry, so please bear with me. It's just about the idea. So: Insert a new interval_node only if overlay start is not in the tree already. If the contains a node for the start, make the node's value a list of overlays, and add the new one to it in an order that's convenient (The STL uses insertion order). As a consequence, keys is the rb-tree are unique, and it is always strictly ordered, rotations don't change that. The min node is the left-most node in a tree, and so on. So far so good. An iterator in the order min->max would have to record the interval_node in the tree plus information where in the list of overlays it is, if it is in any. Finding the next node is implemented with a successor function like the one I've shown from libstdc++. To find all overlays containing a given position POS, find the node whose start is <= POS. Start at the root of the tree, and walk the left link until we find the ndoes's start is <= POS. Then iterate forward until start > POS, returning only overlapping overlays. Finding overlays intersecting an interval [BEG, END] is not as nice, but we can exclude intervals starting after END. So we have to iterate over all overlays from the minimum of the tree, and iterate forward until we reach the first excluded one. That's so "easy" that even I can understand how it works :-). What am I overlooking? ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-10-01 7:25 ` Gerd Möllmann 2022-10-01 10:55 ` Gerd Möllmann @ 2022-10-01 14:01 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 1 sibling, 0 replies; 46+ messages in thread From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-10-01 14:01 UTC (permalink / raw) To: Gerd Möllmann; +Cc: Eli Zaretskii, Andreas Politz, 58158 > That's > > [10, 12] > / \ > [1, 1] [10, 11] > / \ /\ > [10, 13] > / \ > > The 10 is found "all over the place". "all over the place" if you consider the pre-order or post-order traversal, indeed. But if you look at the in-order traversal (e.g., the one you'd get from C++ `local_Rb_tree_increment` you showed), you get the expected result: [1, 1] [10, 13] [10, 12] [10, 11] -- Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 14:15 ` Gerd Möllmann 2022-09-29 14:37 ` Gerd Möllmann @ 2022-09-29 16:40 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 1 sibling, 0 replies; 46+ messages in thread From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-29 16:40 UTC (permalink / raw) To: Gerd Möllmann; +Cc: Eli Zaretskii, 58158 Gerd Möllmann [2022-09-29 16:15:09] wrote: > Stefan Monnier <monnier@iro.umontreal.ca> writes: >> One reason is that traversing a binary tree usually requires something >> like recursion, but that wouldn't fit very conveniently with the current >> code (nor with C in general since you can't make a local recursive >> closure which accesses local variables from the surrounding function). > Ok, usually, but not necessarily. The alternative is to implement an > iterator that starts with a node N, and an implementation of a successor > function, which return the successor of N in a given order. The approach currently used is somewhat similar to that. Some of the difference is that we need an actual "iterator/generator" object to remember the parameter of the filtering we want to apply to the set of objects. And the problem is that this "object" is currently implemented not only as a global value (thus restricting us to one-iteration at a time) but also with some parts of the data stored in the tree. I think this is the part that really needs to be changed. > This requires a parent pointer in nodes, but that we have. > > (Something like this is used for ordered containers like "map" and "set" > in C++ STL, for instance, which are based on rb-trees in GCC's > libstdc++.) Another difference is that itree.c's iterator uses a local "work stack" instead of traversing the tree exclusively via left/right/parent like in the code you show. I don't know if that difference is important, tho. >> Another is the need to update the begin/end fields (these need updating >> because of insertions/deletions but they're updated lazily while >> traversing the tree to avoid an O(N) complexity during the >> insertions/deletions). Hiding that behind 'some kind of "next node" >> function keeps the code more readable. > > Is this for buffer text changes, something akin to a delayed update of > marker positions? Yes, exactly. >> But yes, the current restriction to have a single iteration at a time is >> a bit of a problem, especially because it's very "global". I added >> a comment yesterday describing how we could make it non-global (hence >> getting rid of the `visited` flag in the nodes). > > BTW, and related to iteration directly, did you notice this in > interval_tree_insert? > > /* This suggests that nodes in the right subtree are strictly > greater. But this is not true due to later rotations. */ > child = node->begin <= child->begin ? child->left : child->right; No, I had not noticed that yet, and I don't understand this comment. Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 9:09 ` Gerd Möllmann 2022-09-29 9:37 ` Eli Zaretskii @ 2022-10-01 1:57 ` Richard Stallman 2022-10-01 7:00 ` Eli Zaretskii 1 sibling, 1 reply; 46+ messages in thread From: Richard Stallman @ 2022-10-01 1:57 UTC (permalink / raw) To: Gerd Möllmann; +Cc: eliz, 58158, monnier [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > I think it simply can't be that what is basically walking a binary tree > requires such restrictions. If we allow running complicated code during the tree walk, that would be asking for trouble. But if the operations done during the tree walk are simple and written in C, it should be clear that there is no possibility of this kind of a bug. Isn't that the case? > > What has to do with overlays. To name a few: overlay-at, overlays-in, > > next-overlay-change, previous-overlay-change, overlay-lists, ... Those operations can be done in C with no risk of signaling an error. That ought to be sufficient, as long as we don't need the code to be reentrant. If the code needs to be reentrant then we would have to eliminate the "visited" flag. But if we don't run Lisp code during the tree-walk, we should not need it to be reentrant. -- Dr Richard Stallman (https://stallman.org) Chief GNUisance of the GNU Project (https://gnu.org) Founder, Free Software Foundation (https://fsf.org) Internet Hall-of-Famer (https://internethalloffame.org) ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-10-01 1:57 ` Richard Stallman @ 2022-10-01 7:00 ` Eli Zaretskii 0 siblings, 0 replies; 46+ messages in thread From: Eli Zaretskii @ 2022-10-01 7:00 UTC (permalink / raw) To: rms; +Cc: gerd.moellmann, 58158, monnier > From: Richard Stallman <rms@gnu.org> > Cc: eliz@gnu.org, 58158@debbugs.gnu.org, monnier@iro.umontreal.ca > Date: Fri, 30 Sep 2022 21:57:36 -0400 > > > > What has to do with overlays. To name a few: overlay-at, overlays-in, > > > next-overlay-change, previous-overlay-change, overlay-lists, ... > > Those operations can be done in C with no risk of signaling an error. I think such assumptions were proven dangerous at best in the long run. The way Emacs develops, we constantly add more and more hooks to C code, and more and more direct calls into Lisp from C. These invalidate any assumptions about "no risk of signaling an error", even if they were originally true when the code was first written. Moreover, we test for QUIT in operations that can be prolonged ones (and for a good reason), so any long loop in C could potentially throw to top level if the user pressed C-g. All in all, experience shows that making such assumptions in Emacs is unsafe. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 5:29 bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful Gerd Möllmann 2022-09-29 6:28 ` Eli Zaretskii @ 2022-10-06 22:26 ` Matt Armstrong 2023-10-06 13:14 ` Gerd Möllmann 2 siblings, 0 replies; 46+ messages in thread From: Matt Armstrong @ 2022-10-06 22:26 UTC (permalink / raw) To: Andreas Politz, Gerd Möllmann; +Cc: Eli Zaretskii, 58158, Stefan Monnier Andreas Politz <mail@andreas-politz.de> writes: > I've managed to construct a prototype of this "stateless" iterator. > > I've only implemented the in-order case and only applied it in the overlays_in function. > > The only real trouble I had was with pushing the offset to the children > during the tree navigation in some kind of sensible way. In the end I > gave up and simply pasted calls to the corresponding function "all over > the place". It seems to work, at least buffer-tests are passing. Hey Andreas, this looks pretty good to me but I have some questions on it. > +static > +struct interval_node * > +interval_tree_iterator_in_order_successor (struct interval_tree_iterator* iterator) > +{ > + struct interval_node *node = iterator->node; > + > + if (node != ITREE_NULL) { > + interval_tree_inherit_offset (iterator->tree, node); > + } > + if (node == ITREE_NULL) { > + node = iterator->tree->root; I don't understand this branch. If the node is NULL upon entry to the successor call, why start at the root? Why is the "next in order node" of NULL the root? The root is just an node whose BEG is roughly in the middle of the total inorder traversal, right? The "stateless" inorder traversal algorithm I am used to starts with the minimum node (walk only left links down from the root until at a leaf and that is the minimum). Then do inorder traversal from there. > + if (node != ITREE_NULL) { > + interval_tree_inherit_offset (iterator->tree, node); > + } > + while (interval_tree_iter_traverse_p(iterator, node->left)) { > + node = node->left; > + interval_tree_inherit_offset (iterator->tree, node); > + } > + } else if (interval_tree_iter_traverse_p(iterator, node->right)) > + { > + node = node->right; > + interval_tree_inherit_offset (iterator->tree, node); > + while (interval_tree_iter_traverse_p(iterator, node->left)) { > + node = node->left; > + interval_tree_inherit_offset (iterator->tree, node); > + } > + } > + else > + { > + struct interval_node *parent = node->parent; > + while (node == parent->right) > + { > + node = parent; > + parent = parent->parent; > + } > + if (node != ITREE_NULL) > + node = parent; > + } > + > + return node == ITREE_NULL ? NULL : node; > +} I don't understand how the last "else" case works correctly in the face of a call to `interval_generator_narrow` during iteration. Narrowing could render a node's parent out of the "interesting" range, and so the iterator should not return it but instead keep trying to find a successor in range. Right? ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful 2022-09-29 5:29 bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful Gerd Möllmann 2022-09-29 6:28 ` Eli Zaretskii 2022-10-06 22:26 ` Matt Armstrong @ 2023-10-06 13:14 ` Gerd Möllmann 2 siblings, 0 replies; 46+ messages in thread From: Gerd Möllmann @ 2023-10-06 13:14 UTC (permalink / raw) To: 58158 Gerd Möllmann <gerd.moellmann@gmail.com> writes: > In its current form, interval tree iteration works like this: > > 1. Call begin_iteration(T) to iterate over tree T > 2. do stuff > 3. Call end_iteration(T) > > with the following rules: > > - Begin_iteration and end_iteration must be paired. > > - There can be only one iteration per tree at a time. Nested iteration > over the same tree is not supported (abort). > > - No GC may happen in step 2. This is because mark_buffer iterates over > buffer overlays. > > I think this is an exceedingly dangerous design. Time has passed, and I think this is no longer relevant. I'm therefore closing this bug. ^ permalink raw reply [flat|nested] 46+ messages in thread
end of thread, other threads:[~2023-10-06 13:14 UTC | newest] Thread overview: 46+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-09-29 5:29 bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful Gerd Möllmann 2022-09-29 6:28 ` Eli Zaretskii 2022-09-29 7:03 ` Gerd Möllmann 2022-09-29 8:08 ` Eli Zaretskii 2022-09-29 9:09 ` Gerd Möllmann 2022-09-29 9:37 ` Eli Zaretskii 2022-09-29 10:05 ` Gerd Möllmann 2022-09-29 10:43 ` Eli Zaretskii 2022-09-29 11:33 ` Gerd Möllmann 2022-09-29 13:10 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-29 13:23 ` Eli Zaretskii 2022-09-29 16:48 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-29 13:40 ` Eli Zaretskii 2022-09-29 14:15 ` Gerd Möllmann 2022-09-29 14:37 ` Gerd Möllmann 2022-09-29 22:09 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-30 5:28 ` Gerd Möllmann 2022-09-30 6:11 ` Eli Zaretskii 2022-09-30 11:31 ` Gerd Möllmann 2022-09-30 18:29 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-10-02 8:06 ` Gerd Möllmann 2022-10-06 22:36 ` Dmitry Gutov 2022-10-07 19:47 ` Eli Zaretskii 2022-10-08 18:50 ` Dmitry Gutov 2022-10-10 8:10 ` Eli Zaretskii 2022-10-11 2:12 ` Dmitry Gutov 2022-10-11 6:37 ` Eli Zaretskii 2022-09-30 13:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-30 14:08 ` Gerd Möllmann 2022-09-30 15:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-30 16:04 ` Eli Zaretskii 2022-09-30 17:11 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-10-01 5:06 ` Gerd Möllmann 2022-10-01 13:54 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-10-02 8:22 ` Gerd Möllmann 2022-10-02 16:32 ` Andreas Politz 2022-10-03 4:35 ` Gerd Möllmann 2022-10-04 10:50 ` Andreas Politz 2022-10-01 7:25 ` Gerd Möllmann 2022-10-01 10:55 ` Gerd Möllmann 2022-10-01 14:01 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-29 16:40 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-10-01 1:57 ` Richard Stallman 2022-10-01 7:00 ` Eli Zaretskii 2022-10-06 22:26 ` Matt Armstrong 2023-10-06 13:14 ` Gerd Möllmann
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).