unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#58342: 29.0.50; noverlay branch is O(N) for important calls
@ 2022-10-06 23:25 Matt Armstrong
  2022-10-07  1:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 18+ messages in thread
From: Matt Armstrong @ 2022-10-06 23:25 UTC (permalink / raw)
  To: 58342; +Cc: Gerd Möllmann, Eli Zaretskii, Andreas Politz, Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 4079 bytes --]

First let me preface:

A) I hope I'm wrong here, but after careful thought I've failed to
convince myself that I am, so I am either right or need help from others
to spot my flawed logic.

B) Even if I'm right the problem may not be judged serious enough to
address.

I believe that the feature/noverlay branch uses an O(N) algorithm for
next-overlay-change and previous-overlay-change.  Or, more precisely,
O(MIN(N, (buffer-size))), where N is the overlay count.

Now, in "normal" cases the real world achieved performance will be fine.
If overlays form mostly disjoint intervals, without too many overlapping
overlays, without too many overlays that span large portions of the
buffer, the algorithm used will find the next/previous change quickly.

However, consider the new next_overlay_change:

     1	ptrdiff_t
     2	next_overlay_change (ptrdiff_t pos)
     3	{
     4	  ptrdiff_t next = ZV;
     5	  struct interval_node *node;
     6	  ITREE_FOREACH (node, current_buffer->overlays, pos, next, ASCENDING)
     7	    {
     8	      if (node->begin > pos)
     9	        {
    10	          /* If we reach this branch, node->begin must be the least upper bound
    11	             of pos, because the search is limited to [pos,next) . */
    12	          eassert (node->begin < next);
    13	          next = node->begin;
    14	          ITREE_FOREACH_ABORT ();
    15	          break;
    16	        }
    17	      else if (node->begin < node->end && node->end < next)
    18	        {
    19	          next = node->end;
    20	          ITREE_FOREACH_NARROW (pos, next);
    21	        }
    22	    }
    23	  return next;
    24	}

Here we traverse overlays in ASCENDING order of BEG positions.  The best
we can say is that this loop executes in O(K*log(N)) time, where K is
the MIN of number of overlays that overlap POS and the number of valid
positions in the buffer after POS.  It is always going to be possible to
construct pathalogical cases where lines 19-20 are taken for as many
positions as there are in the buffer after POS, since the tree is not
ordered by END.

I've attached a patch that can go in to itree.c (which, if I'm right, I
think would be good to add, especially if we decide to not improve the
data structure).  I've quoted the text:

   ====
   FIXME: this data structure is O(N) for important operations -matt
   ====

   The amortized costs of Emacs' previous-overlay-change and
   next-overlay-change functions are O(N) with this data structure.
   The root problem is that we only have an order for the BEG field,
   but not the END.  The previous/next overlay change operations need
   to find the nearest point where there is *either* an interval BEG
   or END point, but there is no efficient way to narrow the search
   space over END postions.

   Consider the case where next-overlay-change is called at POS, all
   interval BEG positions are less than pos POS and all interval END
   posistions are after.  These END positions have no order, and so
   *every* interval must be examined.  This is at least O(N).  The
   previous-overlay-change case is similar.  The root issue is that
   the iterative "narrowing" approach is not guaranteed to reduce the
   search space in logarithmic time, since END is not ordered in the
   tree.

   One might argue that the LIMIT value will do this narrowing, but
   this narrowing is O(K*log(N)) where K is the size of the result
   set.  If we are interested in finding the node in a range with the
   smallest END, we might have to examine all K nodes in that range.
   In the case of the *-overlay-channge functions, K may well be equal
   to N.

   Ideally, a tree based data structure for overlays would have
   O(log(N)) performance for previous-overlay-change and
   next-overlay-change, as these are called in performance sensitive
   situations such as redisplay.  The only way I can think of
   achieving this is by keeping one ordering by BEG and a separate
   ordering by END, and then performing logic quite similar to the
   current Emacs overlays-before and overlays-after lists.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0005-src-itree.c-Add-comment-describing-when-noverlay-is-.patch --]
[-- Type: text/x-diff, Size: 2596 bytes --]

From 6ca364a6ad45b1cfdcf080c492fde536975f6e09 Mon Sep 17 00:00:00 2001
From: Matt Armstrong <matt@rfc20.org>
Date: Thu, 6 Oct 2022 15:47:20 -0700
Subject: [PATCH 5/5] ; * src/itree.c: Add comment describing when noverlay is
 O(N)

---
 src/itree.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/src/itree.c b/src/itree.c
index 79e39d6e2a..6de84ae0b8 100644
--- a/src/itree.c
+++ b/src/itree.c
@@ -62,6 +62,42 @@ Copyright (C) 2017-2022  Free Software Foundation, Inc.
    complexity of O(K*log(N)) for this operation, where K is the size
    of the result set and N the size of the tree.
 
+   ====
+   FIXME: this data structure is O(N) for important operations -matt
+   ====
+
+   The amortized costs of Emacs' previous-overlay-change and
+   next-overlay-change functions are O(N) with this data structure.
+   The root problem is that we only have an order for the BEG field,
+   but not the END.  The previous/next overlay change operations need
+   to find the nearest point where there is *either* an interval BEG
+   or END point, but there is no efficient way to narrow the search
+   space over END postions.
+
+   Consider the case where next-overlay-change is called at POS, all
+   interval BEG positions are less than pos POS and all interval END
+   posistions are after.  These END positions have no order, and so
+   *every* interval must be examined.  This is at least O(N).  The
+   previous-overlay-change case is similar.  The root issue is that
+   the iterative "narrowing" approach is not guaranteed to reduce the
+   search space in logarithmic time, since END is not ordered in the
+   tree.
+
+   One might argue that the LIMIT value will do this narrowing, but
+   this narrowing is O(K*log(N)) where K is the size of the result
+   set.  If we are interested in finding the node in a range with the
+   smallest END, we might have to examine all K nodes in that range.
+   In the case of the *-overlay-channge functions, K may well be equal
+   to N.
+
+   Ideally, a tree based data structure for overlays would have
+   O(log(N)) performance for previous-overlay-change and
+   next-overlay-change, as these are called in performance sensitive
+   situations such as redisplay.  The only way I can think of
+   achieving this is by keeping one ordering by BEG and a separate
+   ordering by END, and then performing logic quite similar to the
+   current Emacs overlays-before and overlays-after lists.
+
    ==== Adjusting intervals ====
 
    Since this data-structure will be used for overlays in an Emacs
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-10-08 23:08 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-06 23:25 bug#58342: 29.0.50; noverlay branch is O(N) for important calls Matt Armstrong
2022-10-07  1:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-10-07  7:09   ` Eli Zaretskii
2022-10-07 13:36     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-10-07 13:57       ` Eli Zaretskii
2022-10-07 14:47         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-10-07 15:23   ` Matt Armstrong
2022-10-07 16:51     ` bug#58361: " Andreas Politz
2022-10-07 18:38       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-10-07 17:11     ` bug#58342: " Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-10-07 20:37       ` Matt Armstrong
2022-10-07 21:22         ` Drew Adams
2022-10-08  0:27           ` Drew Adams
2022-10-07 21:58         ` Dmitry Gutov
2022-10-08  6:20         ` Eli Zaretskii
2022-10-08 13:08           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-10-08 17:24             ` Drew Adams
2022-10-08 23:08               ` Matt Armstrong

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).