From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Matt Armstrong Newsgroups: gmane.emacs.bugs Subject: bug#58342: 29.0.50; noverlay branch is O(N) for important calls Date: Thu, 06 Oct 2022 16:25:48 -0700 Message-ID: <87edvkcz5v.fsf@rfc20.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="34350"; mail-complaints-to="usenet@ciao.gmane.io" Cc: Gerd =?UTF-8?Q?M=C3=B6llmann?= , Eli Zaretskii , Andreas Politz , Stefan Monnier To: 58342@debbugs.gnu.org Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Fri Oct 07 01:27:11 2022 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1ogaGl-0008kk-Ba for geb-bug-gnu-emacs@m.gmane-mx.org; Fri, 07 Oct 2022 01:27:11 +0200 Original-Received: from localhost ([::1]:52004 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ogaGk-0006t0-5C for geb-bug-gnu-emacs@m.gmane-mx.org; Thu, 06 Oct 2022 19:27:10 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:36688) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ogaGc-0006sI-Hf for bug-gnu-emacs@gnu.org; Thu, 06 Oct 2022 19:27:02 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]:34764) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1ogaGc-0005zY-7l for bug-gnu-emacs@gnu.org; Thu, 06 Oct 2022 19:27:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1ogaGc-0000ba-3A for bug-gnu-emacs@gnu.org; Thu, 06 Oct 2022 19:27:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Matt Armstrong Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Thu, 06 Oct 2022 23:27:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 58342 X-GNU-PR-Package: emacs X-Debbugs-Original-To: bug-gnu-emacs@gnu.org Original-Received: via spool by submit@debbugs.gnu.org id=B.16650987652253 (code B ref -1); Thu, 06 Oct 2022 23:27:01 +0000 Original-Received: (at submit) by debbugs.gnu.org; 6 Oct 2022 23:26:05 +0000 Original-Received: from localhost ([127.0.0.1]:33840 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ogaFg-0000aF-Jb for submit@debbugs.gnu.org; Thu, 06 Oct 2022 19:26:05 -0400 Original-Received: from lists.gnu.org ([209.51.188.17]:38080) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ogaFe-0000a7-G3 for submit@debbugs.gnu.org; Thu, 06 Oct 2022 19:26:03 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:58024) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ogaFd-00062I-Sh for bug-gnu-emacs@gnu.org; Thu, 06 Oct 2022 19:26:02 -0400 Original-Received: from relay6-d.mail.gandi.net ([2001:4b98:dc4:8::226]:52653) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ogaFa-0005uz-IR; Thu, 06 Oct 2022 19:26:01 -0400 Original-Received: (Authenticated sender: matt@rfc20.org) by mail.gandi.net (Postfix) with ESMTPSA id 66E72C0002; Thu, 6 Oct 2022 23:25:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rfc20.org; s=gm1; t=1665098752; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type; bh=5jPPHht+iP/wOYNy946ljiEGfu6vROxdEzeTPdogBFA=; b=QIo+MzHrR1XvcVpvLhYjSFJcbeBZB/7A0UDG5bvWwe17Wz03YTrvt96FlIEtt/GNkozbyU GYiwT9mfJRv7TeECldYudUSe72LoItV+zFZMbdpg9054THBIrEM1wjgfL6x5CZEZ8rC8xj 6CTKqw1MeE6SEKMgC/ZxVH9SmvjtfyABiBXo6G3YTgDvbfPFM+gFPlBruSF3bSJ54Q9Tll QM4joUjC/OWfH5nSijqS3/P/qd9V/X+ms3bOuEZxzukPvBXEUtplidFTkiLgPbA5sne4Bj 0HbLszCAUQVMW0JAR0BEtrhPjjU+K8Kxt4l9hwdnUPx+x2vC9qkPPkYPzQxgcw== Original-Received: from matt by naz with local (Exim 4.96) (envelope-from ) id 1ogaFQ-0043mO-0Y; Thu, 06 Oct 2022 16:25:48 -0700 Received-SPF: pass client-ip=2001:4b98:dc4:8::226; envelope-from=matt@rfc20.org; helo=relay6-d.mail.gandi.net X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.io gmane.emacs.bugs:244719 Archived-At: --=-=-= Content-Type: text/plain First let me preface: A) I hope I'm wrong here, but after careful thought I've failed to convince myself that I am, so I am either right or need help from others to spot my flawed logic. B) Even if I'm right the problem may not be judged serious enough to address. I believe that the feature/noverlay branch uses an O(N) algorithm for next-overlay-change and previous-overlay-change. Or, more precisely, O(MIN(N, (buffer-size))), where N is the overlay count. Now, in "normal" cases the real world achieved performance will be fine. If overlays form mostly disjoint intervals, without too many overlapping overlays, without too many overlays that span large portions of the buffer, the algorithm used will find the next/previous change quickly. However, consider the new next_overlay_change: 1 ptrdiff_t 2 next_overlay_change (ptrdiff_t pos) 3 { 4 ptrdiff_t next = ZV; 5 struct interval_node *node; 6 ITREE_FOREACH (node, current_buffer->overlays, pos, next, ASCENDING) 7 { 8 if (node->begin > pos) 9 { 10 /* If we reach this branch, node->begin must be the least upper bound 11 of pos, because the search is limited to [pos,next) . */ 12 eassert (node->begin < next); 13 next = node->begin; 14 ITREE_FOREACH_ABORT (); 15 break; 16 } 17 else if (node->begin < node->end && node->end < next) 18 { 19 next = node->end; 20 ITREE_FOREACH_NARROW (pos, next); 21 } 22 } 23 return next; 24 } Here we traverse overlays in ASCENDING order of BEG positions. The best we can say is that this loop executes in O(K*log(N)) time, where K is the MIN of number of overlays that overlap POS and the number of valid positions in the buffer after POS. It is always going to be possible to construct pathalogical cases where lines 19-20 are taken for as many positions as there are in the buffer after POS, since the tree is not ordered by END. I've attached a patch that can go in to itree.c (which, if I'm right, I think would be good to add, especially if we decide to not improve the data structure). I've quoted the text: ==== FIXME: this data structure is O(N) for important operations -matt ==== The amortized costs of Emacs' previous-overlay-change and next-overlay-change functions are O(N) with this data structure. The root problem is that we only have an order for the BEG field, but not the END. The previous/next overlay change operations need to find the nearest point where there is *either* an interval BEG or END point, but there is no efficient way to narrow the search space over END postions. Consider the case where next-overlay-change is called at POS, all interval BEG positions are less than pos POS and all interval END posistions are after. These END positions have no order, and so *every* interval must be examined. This is at least O(N). The previous-overlay-change case is similar. The root issue is that the iterative "narrowing" approach is not guaranteed to reduce the search space in logarithmic time, since END is not ordered in the tree. One might argue that the LIMIT value will do this narrowing, but this narrowing is O(K*log(N)) where K is the size of the result set. If we are interested in finding the node in a range with the smallest END, we might have to examine all K nodes in that range. In the case of the *-overlay-channge functions, K may well be equal to N. Ideally, a tree based data structure for overlays would have O(log(N)) performance for previous-overlay-change and next-overlay-change, as these are called in performance sensitive situations such as redisplay. The only way I can think of achieving this is by keeping one ordering by BEG and a separate ordering by END, and then performing logic quite similar to the current Emacs overlays-before and overlays-after lists. --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=0005-src-itree.c-Add-comment-describing-when-noverlay-is-.patch >From 6ca364a6ad45b1cfdcf080c492fde536975f6e09 Mon Sep 17 00:00:00 2001 From: Matt Armstrong Date: Thu, 6 Oct 2022 15:47:20 -0700 Subject: [PATCH 5/5] ; * src/itree.c: Add comment describing when noverlay is O(N) --- src/itree.c | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/src/itree.c b/src/itree.c index 79e39d6e2a..6de84ae0b8 100644 --- a/src/itree.c +++ b/src/itree.c @@ -62,6 +62,42 @@ Copyright (C) 2017-2022 Free Software Foundation, Inc. complexity of O(K*log(N)) for this operation, where K is the size of the result set and N the size of the tree. + ==== + FIXME: this data structure is O(N) for important operations -matt + ==== + + The amortized costs of Emacs' previous-overlay-change and + next-overlay-change functions are O(N) with this data structure. + The root problem is that we only have an order for the BEG field, + but not the END. The previous/next overlay change operations need + to find the nearest point where there is *either* an interval BEG + or END point, but there is no efficient way to narrow the search + space over END postions. + + Consider the case where next-overlay-change is called at POS, all + interval BEG positions are less than pos POS and all interval END + posistions are after. These END positions have no order, and so + *every* interval must be examined. This is at least O(N). The + previous-overlay-change case is similar. The root issue is that + the iterative "narrowing" approach is not guaranteed to reduce the + search space in logarithmic time, since END is not ordered in the + tree. + + One might argue that the LIMIT value will do this narrowing, but + this narrowing is O(K*log(N)) where K is the size of the result + set. If we are interested in finding the node in a range with the + smallest END, we might have to examine all K nodes in that range. + In the case of the *-overlay-channge functions, K may well be equal + to N. + + Ideally, a tree based data structure for overlays would have + O(log(N)) performance for previous-overlay-change and + next-overlay-change, as these are called in performance sensitive + situations such as redisplay. The only way I can think of + achieving this is by keeping one ordering by BEG and a separate + ordering by END, and then performing logic quite similar to the + current Emacs overlays-before and overlays-after lists. + ==== Adjusting intervals ==== Since this data-structure will be used for overlays in an Emacs -- 2.35.1 --=-=-=--