From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Matt Armstrong Newsgroups: gmane.emacs.devel Subject: Re: noverlay branch Date: Sun, 09 Oct 2022 19:57:43 -0700 Message-ID: <87pmf04c7s.fsf@rfc20.org> References: <87sfjzefvv.fsf@rfc20.org> <875ygt6gbj.fsf@rfc20.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="36189"; mail-complaints-to="usenet@ciao.gmane.io" Cc: emacs-devel@gnu.org To: Stefan Monnier Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Mon Oct 10 04:59:10 2022 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1ohj0X-0009D6-Kt for ged-emacs-devel@m.gmane-mx.org; Mon, 10 Oct 2022 04:59:09 +0200 Original-Received: from localhost ([::1]:33802 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ohj0W-0000S1-Ky for ged-emacs-devel@m.gmane-mx.org; Sun, 09 Oct 2022 22:59:08 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:48488) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ohizM-0007Nt-9T for emacs-devel@gnu.org; Sun, 09 Oct 2022 22:57:56 -0400 Original-Received: from relay8-d.mail.gandi.net ([217.70.183.201]:53327) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ohizI-0002Zx-L1 for emacs-devel@gnu.org; Sun, 09 Oct 2022 22:57:55 -0400 Original-Received: (Authenticated sender: matt@rfc20.org) by mail.gandi.net (Postfix) with ESMTPSA id E65711BF203; Mon, 10 Oct 2022 02:57:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rfc20.org; s=gm1; t=1665370667; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=p2CcQrNJQKBISHJblv206cuAIEwdpoek77LR4A/ccY4=; b=eA4lqqt51UmabYHndB98/z0G/nnxk/bvJw55BLHwrwURzSOyzdK5ZRsZXLDFp3KJ/IzVWV WVMVQoGnQ4bqzLtRfKW3Q7xWqbBLuWymoPatpAM/XLYu1yfPh6pNi5u6Xq4FX4JaFKMwmZ ofdJrkiDU15En85Yf3a/U8ROTmsiMWWlkSBNeVx2zjFv4LKYh45es7dad0NxxXgIID2AD9 yqgyzpMYMCX7S6DzchkNVTmHJeCDvR+2bepzn6Mc7rkjLhGpB9K+w3zr8uKyEyV33RJ+vK iIoSth1CLFqLhpxYgG/6P/PBIfejWpbOLy4tHunXnfN/3G36+PUDQBffKZo12w== Original-Received: from matt by naz with local (Exim 4.96) (envelope-from ) id 1ohiz9-006iim-2s; Sun, 09 Oct 2022 19:57:43 -0700 In-Reply-To: Received-SPF: pass client-ip=217.70.183.201; envelope-from=matt@rfc20.org; helo=relay8-d.mail.gandi.net X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:297303 Archived-At: --=-=-= Content-Type: text/plain Stefan Monnier writes: >>> node->otick = otick; >> >> Indeed, I saw later in the remove code that we do propagate offsets >> without caring if they're up-to-date, and I think the logic behind that >> is sound. And indeed we can drop the test because the only property we >> only ever care about for an `otick` is whether it's equal to >> `tree->otick`, so if node->parent->otick != otick then performing the >> assignment is equivalent to not performing it. > > The above isn't quite right: > I was thinking of `node->otick = parent->otick`. Hey Stefan, yep, I'll have to check out the changes you made today. I had a different idea of tightening the otree invariant to this: 1) A node's otick must be greater than or equal to its children's. 2) There is no tree->otick, just tree->root->otick. That is what is incremented when offsets are added. 3) All downward tree traversal propagates offsets and otick. This strikes me as conceptually simpler. E.g. since the invariant is defined recursively it might allow for more local reasoning, clearer assertions, less places where "offset math" is needed, etc. But you've lived in this code a bit longer than me. What do you think? I have a commit halfway worked up so maybe I can show that if it works out... Anyway, thanks for fixing my "thinko" bug in the check_tree code -- I had the fix but should have sent it faster. I didn't actually intend for that to be commited yet (pushed it to sr.ht by accident), but maybe it helped you chase down the bugs you fixed today? I have two patches, mostly to add more checks to the invariant checking in check_tree. Tests on noverlay are passing again as of your commits today, which is great! On https://git.sr.ht/~matta/emacs/log/feature/noverlay but also below. --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=0001-test-src-buffer-tests.el-Remove-unecessary-message-c.patch >From 246acbddbeb3e9a390fe78242259182af0c2cc18 Mon Sep 17 00:00:00 2001 From: Matt Armstrong Date: Sun, 9 Oct 2022 10:12:32 -0700 Subject: [PATCH 1/2] ; * test/src/buffer-tests.el: Remove unecessary `message' calls. --- test/src/buffer-tests.el | 2 -- 1 file changed, 2 deletions(-) diff --git a/test/src/buffer-tests.el b/test/src/buffer-tests.el index 01780a15cc..9bccbdf2e8 100644 --- a/test/src/buffer-tests.el +++ b/test/src/buffer-tests.el @@ -1554,10 +1554,8 @@ test-overlay-randomly ;; is to initially steadily increase the overlay count, then ;; steadily decrease it, then repeat. (when (and growing (= overlay-count overlay-count-limit)) - (message "now shrinking") (setq growing nil)) (when (and (not growing) (= overlay-count 0)) - (message "now growing") (setq growing t)) ;; Create or delete a random overlay according to a -- 2.35.1 --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=0002-Add-.-configure-enable-checking-overlays.patch >From 1d40c60265c34caaf9d8f5ad0c66322e7281902d Mon Sep 17 00:00:00 2001 From: Matt Armstrong Date: Sun, 9 Oct 2022 19:42:14 -0700 Subject: [PATCH 2/2] Add ./configure --enable-checking=overlays This enables expensive validation checks of overlay data structures. Some are too expensive to turn on by default with --enable-checking=all since they turn O(log N) algorithms into O(N) algorithms. * src/itree.c (ENABLE_OVERLAY_CHECKING): define to 0 if undefined. (struct check_subtree_result): new struct. (check_subtree): renamed from recurse_check_tree, modified to use check_subtree_result and check more invariants. (check_tree): use check_subtree_result. * configure.ac: add support for ENABLE_OVERLAY_CHECKING. --- configure.ac | 11 +++- src/itree.c | 164 +++++++++++++++++++++++++++++++++++++++++++-------- 2 files changed, 148 insertions(+), 27 deletions(-) diff --git a/configure.ac b/configure.ac index 8ba52a492b..42a55c7572 100644 --- a/configure.ac +++ b/configure.ac @@ -582,7 +582,7 @@ AC_DEFUN enable only specific categories of checks. Categories are: all,yes,no. Flags are: stringbytes, stringoverrun, stringfreelist, - structs, glyphs])], + structs, glyphs, overlays])], [ac_checking_flags="${enableval}"],[]) IFS="${IFS= }"; ac_save_IFS="$IFS"; IFS="$IFS," CHECK_STRUCTS=false @@ -596,7 +596,8 @@ AC_DEFUN ac_gc_check_stringbytes= ; ac_gc_check_string_overrun= ; ac_gc_check_string_free_list= ; - ac_glyphs_debug= ;; + ac_glyphs_debug= ; + ac_enable_overlay_checking= ;; all) ac_enable_checking=1 ; CHECK_STRUCTS=true ac_gc_check_stringbytes=1 ; @@ -609,6 +610,7 @@ AC_DEFUN stringfreelist) ac_gc_check_string_free_list=1 ;; structs) CHECK_STRUCTS=true ;; glyphs) ac_glyphs_debug=1 ;; + overlays) ac_enable_overlay_checking=1 ;; *) AC_MSG_ERROR([unknown check category $check]) ;; esac done @@ -645,6 +647,11 @@ AC_DEFUN AC_DEFINE([GLYPH_DEBUG], [1], [Define this to enable glyphs debugging code.]) fi +if test x$ac_enable_overlay_checking != x ; then + AC_DEFINE([ENABLE_OVERLAY_CHECKING], [1], +[Define this temporarily to hunt a bug. If defined, expensive + invariant checking of the overlay data structures is enabled.]) +fi dnl The name of this option is unfortunate. It predates, and has no dnl relation to, the "sampling-based elisp profiler" added in 24.3. diff --git a/src/itree.c b/src/itree.c index bbab70dac7..31a7e01090 100644 --- a/src/itree.c +++ b/src/itree.c @@ -144,6 +144,10 @@ Copyright (C) 2017-2022 Free Software Foundation, Inc. static struct interval_generator* interval_generator_create (struct interval_tree *); static void interval_tree_insert (struct interval_tree *, struct interval_node *); +#ifndef ENABLE_OVERLAY_CHECKING +#define ENABLE_OVERLAY_CHECKING 0 +#endif + /* The sentinel node, the null node. */ struct interval_node itree_null; @@ -205,14 +209,44 @@ itree_init (void) iter = interval_generator_create (NULL); } -static ptrdiff_t -recurse_check_tree (struct interval_node *node, uintmax_t tree_otick, - ptrdiff_t offset, ptrdiff_t min_begin, - ptrdiff_t max_begin, intmax_t *size) +struct check_subtree_result +{ + /* Were all nodes visited? */ + bool complete; + + /* Computed node count of the tree. */ + int size; + + /* Computed limit of the tree (max END). */ + ptrdiff_t limit; + + /* Computed black height of the tree (count of black nodes from the + bottom up to the root). */ + int black_height; +}; + +static struct check_subtree_result +check_subtree (struct interval_node *node, uintmax_t tree_otick, + int max_depth, ptrdiff_t offset, ptrdiff_t min_begin, + ptrdiff_t max_begin) { + struct check_subtree_result result = { .complete = false, + .size = 0, + .limit = PTRDIFF_MIN, + .black_height = 0 }; if (node == ITREE_NULL) - return PTRDIFF_MIN; - ++*size; + { + /* Every nil node of a Red-Black tree is black */ + result.black_height = 1; + result.complete = true; + return result; + }; + + if (max_depth == 0) + { + result.complete = false; + return result; + } /* Validate structure. */ eassert ( @@ -221,12 +255,23 @@ recurse_check_tree (struct interval_node *node, uintmax_t tree_otick, eassert (node->left == ITREE_NULL || node->left->parent == node); eassert (node->right == ITREE_NULL || node->right->parent == node); - /* We don't check the RB invariants here (neither the absence of - red+red nor the equal-black-depth), so that we can use this check - even while the tree is temporarily breaking some of those invarints. */ - /* Red nodes cannot have red parents. */ - /* eassert (node->parent == ITREE_NULL - || !(node->red && node->parent->red)); */ + /* We don't normally check the RB invariants here (neither the + absence of red+red nor the equal-black-depth), so that we can use + this check even while the tree is temporarily breaking some of + those invarints. You can enable them if you want. */ + if (false) + { + /* If a node is red then both of its children are black. Red + nodes cannot have red parents. */ + if (node->red) + { + eassert (node->left == ITREE_NULL + || node->left->red == false); + eassert (node->right == ITREE_NULL + || node->right->red == false); + eassert (node->parent == ITREE_NULL || !node->parent->red); + } + } eassert (node->offset == 0 || node->otick < tree_otick); @@ -240,18 +285,44 @@ recurse_check_tree (struct interval_node *node, uintmax_t tree_otick, eassert (begin <= max_begin); eassert (end <= limit); - ptrdiff_t left_limit - = recurse_check_tree (node->left, tree_otick, offset, min_begin, - begin, size); - ptrdiff_t right_limit - = recurse_check_tree (node->right, tree_otick, offset, begin, - max_begin, size); - eassert (left_limit <= limit); - eassert (right_limit <= limit); - eassert (limit == max (end, max (left_limit, right_limit))); - return limit; + struct check_subtree_result left_result + = check_subtree (node->left, tree_otick, max_depth - 1, offset, + min_begin, begin); + struct check_subtree_result right_result + = check_subtree (node->right, tree_otick, max_depth - 1, offset, + begin, max_begin); + + eassert (left_result.limit <= limit); + eassert (right_result.limit <= limit); + + result.complete = left_result.complete && right_result.complete; + if (result.complete) + { + /* Every path from a node to a descendent leaf contains the same + number of black nodes. Often said this way: all nodes have + the same "black height". */ + eassert (left_result.black_height == right_result.black_height); + result.black_height + = (node->red ? 0 : 1) + left_result.black_height; + + result.size = 1 + left_result.size + right_result.size; + result.limit + = max (end, max (left_result.limit, right_result.limit)); + + eassert (limit == result.limit); + } + + return result; } +/* Validate invariants for TREE. + + This runs in constant time when ENABLE_OVERLAY_CHECKING is 0 + (i.e. Emacs is not configured with + "--enable_checking=yes,overlays"). In this mode it can't check all + the invariants. When ENABLE_OVERLAY_CHECKING is 1 it checks the + entire tree and validates all invariants. +*/ static bool check_tree (struct interval_tree *tree) { @@ -260,9 +331,52 @@ check_tree (struct interval_tree *tree) if (tree->root == ITREE_NULL) return true; - intmax_t size = 0; - recurse_check_tree (tree->root, tree->otick, 0, - PTRDIFF_MIN, PTRDIFF_MAX, &size); + /* Establish an upper bound on the height of the tree. + + A red-black tree with n nodes has height at least log2(n + 1) but + no more than 2 * log2(n + 1). A red-black tree with height h has + at least pow(2, h / 2) - 1 nodes but no more than pow(2, h) - 1. + + On 64 bit architectures a red black tree has at least 2 pointers, + 8 bytes each. A 64 bit architecture has a theoretical maximum of + 2 ^ 64 addressable bytes. + + Putting this together, the upper bound for the height of any + red-black tree on a 64 bit architecture is 120. + + 2 * log2((2 ^ 64 bytes) / (2 * 8) bytes + 1) => 120 + + In truth, the memory overhead for our tree nodes is more than two + pointers, but even if we say each has 1 KiB overhead the bound + only drops to 108, which is an unimportant difference here. + */ + const int max_height = 120; + + /* + When ENABLE_OVERLAY_CHECKING is true, check the invariants of the + entire tree. + + Otherwise limit invariant checking to at most 257 nodes. When so + limited some of the invariants cannot be asserted, but the + asymtotic run time of the tree algorithms is preserved, so it is + practical to enable them whenever ENABLE_CHECKING (without + ENABLE_OVERLAY_CHECKING) is defined. + + Note: a tree of height 8 can have at most 2^8-1 or 257 nodes. + */ + const int max_depth + = (ENABLE_OVERLAY_CHECKING || tree->size <= 257) ? max_height : 8; + + struct interval_node *node = tree->root; + struct check_subtree_result result + = check_subtree (node, tree->otick, max_depth, node->offset, + PTRDIFF_MIN, PTRDIFF_MAX); + if (ENABLE_OVERLAY_CHECKING) + eassert (result.complete); + if (result.complete) + eassert (result.size == tree->size); + + /* The only way this function fails is eassert(). */ return true; } -- 2.35.1 --=-=-=--