From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.devel Subject: Re: noverlay branch Date: Tue, 27 Sep 2022 09:53:53 +0300 Message-ID: <83ill9joji.fsf@gnu.org> References: <87bkr1e6yb.fsf@rfc20.org> Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="22891"; mail-complaints-to="usenet@ciao.gmane.io" Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org To: Matt Armstrong Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Tue Sep 27 09:40:50 2022 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1od5Cz-0005mu-1n for ged-emacs-devel@m.gmane-mx.org; Tue, 27 Sep 2022 09:40:49 +0200 Original-Received: from localhost ([::1]:49204 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1od5Cx-0002Fy-3R for ged-emacs-devel@m.gmane-mx.org; Tue, 27 Sep 2022 03:40:48 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:43562) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1od4Ti-000634-JW for emacs-devel@gnu.org; Tue, 27 Sep 2022 02:54:03 -0400 Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]:60160) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1od4Th-0004VV-Jm; Tue, 27 Sep 2022 02:54:01 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=2UI1irjp9oLeLfPaP3KH4LeD39w8XHzBDVGx3hwpYh4=; b=S5HhJFjSWzM7 USt2QXHW2PXtBT31witowgPqTRGglos07anGfPvf/xuFGIhaUa4QWrAojAO9V3+j1EFyWh2/Ib9Xd s9ARSBvV8Wu5gP/sJrGMfqrzUkX+pCUBdQ11WPtjKZhbZ6NGMxAf5EeqpNzIZKeqSkhUjfIs1PWGH NdkY4ZV8S5+YdU59nzKr6A7skiMQOIUNhTNsngd3FyZxyNNWZSnEDXdkZ5PWXdwKnDwsN/NAT4C8+ DgV56G1PjNnCsJ6djSE3e6Q5Bn8BClD6qbrtiCp/PIvpeO3GRXUVMMdZ4PS76ll1irFcn7PWmllnG tFQ5WOclKlFn6T7e0nsB7w==; Original-Received: from [87.69.77.57] (port=3130 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1od4Tf-0007V4-LU; Tue, 27 Sep 2022 02:54:01 -0400 In-Reply-To: <87bkr1e6yb.fsf@rfc20.org> (message from Matt Armstrong on Mon, 26 Sep 2022 22:12:44 -0700) X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:296320 Archived-At: > From: Matt Armstrong > Date: Mon, 26 Sep 2022 22:12:44 -0700 > > Stefan Monnier writes: > > > I just updated the noverlay branch to the code in `master` (most of it > > was mechanical except for the fact that since the last commit on that > > branch we have gotten rid of Lisp_Misc and we moved to pdumper). > > This is pretty exciting! > > I've been looking at the noverlay branch as well in some detail, but > only as a background task, over the past year. > > I haven't considered the branch ready for merge because I saw some areas > where I think the implementation should improve. But, working code is a > compelling thing. If noverlay is ready to merge I'd suggest doing it. > Code can improve later, as and if needed. > > I was planning to takle these problems before proposing a merge: > > 1) Improve the worst case run time of `previous-overlay-change' from > O(n) to O(log N). The noverlay branch uses an O(N) algorithm, > though it is difficult to spot. Since the point of using a tree is > O(log N) algorithms, and O(n) algorithms can easily become > exponential algorithms when composed in higher level loops (the > problem overlays sees today), this strikes me as important. > > 2) Look at reducing the number of malloc'd overlay blocks in half by > expressing the tree intrusively (the same way the overlay list is > today). I don't see a lot of value in itree.h/c abstracting away > the interval logic from the overlay object itself. > > 3) Improve quality of comments in the new code. Personally, I find the > algorithms quite subtle and quite a bit more complex than what you > find on, say, https://en.wikipedia.org/wiki/Interval_tree or the > Cormen et al. Introduction to Algorithms Book. I think I pieced > most of it together but it took a lot of effort. At top of mind is > looking at the interval_node.visited flag and figure out how that > flag is used, then describe the algorithm in detail. It isn't clear > to me how that flag gets set/cleared. Best case: doing so proves me > wrong on point (1). > > And lower priority: > > 4) The overlay `front-advance' and `rear-advance' booleans are > conceptually part of the overlay's BEG and END positions, except > that this is ignored everhwhere except insertion. Upon insertion at > any given POS the overlays are according to *both* their BEG or END > positions and the *-advance booleans. Yet, this is not used when > ordering overlays in the tree. Doing that may bring an opportunity > to simplify code or make it more efficient. (Side note: there *may* > also be a way to encode the *-advance flags implicitly in the > beg/end fields positions if a way can be found to "steal" the free > bit in the currently signed ptrdiff_t fields, effectively causing > the *-advance flags to count as this extra "half" position for the > purpose of insertion). > > 5) Look at using an augmented B-tree of overlays instead of a binary > tree. B-trees are quite often faster than binary trees (at least on > hardware made over the past few decades), so there is that enticing > proposition. They're also typically not harder to implement, > either, requiring no "rotations" nor convluted deletion logic. An > *augmented* B-tree may also allow for simplification of the relative > vs. absolute offsets in the tree. The noverlay branch currently > handles this by treating the absolute offsets as cached values, > "dirtying" them whenever any mutation occurs in the tree, and > recomputing absolute offsets on demand. If augmented in the right > way a B-tree might be shallow enough in practice that recomputation > on demand is always fast enough, so the the invalidation/caching > approach (as well as the memory used to track it) can go away. > (wild idea) Thank you for your interest in this branch. We want to have this feature in Emacs 29, so, barring some grave bugs that will be uncovered soon enough, this branch will land on master soon. You can work on the items you've mentioned either now or after the branch is merged. Item 3 can be done immediately, and will be greatly appreciated, as comments that explain how the code works are always welcome. Of the rest, item 1 sounds like the most important one, but do you have ideas for how to achieve that?