From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Stefan Monnier Newsgroups: gmane.emacs.devel Subject: Re: Overlays as an AA-tree Date: Wed, 21 Sep 2016 10:52:23 -0400 Message-ID: References: <87d1jylv43.fsf@fastmail.com> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain X-Trace: blaine.gmane.org 1474470407 32353 195.159.176.226 (21 Sep 2016 15:06:47 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Wed, 21 Sep 2016 15:06:47 +0000 (UTC) User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux) To: emacs-devel@gnu.org Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed Sep 21 17:06:40 2016 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bmj6S-0005Im-Ae for ged-emacs-devel@m.gmane.org; Wed, 21 Sep 2016 17:06:28 +0200 Original-Received: from localhost ([::1]:42985 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bmivm-0002T1-Fi for ged-emacs-devel@m.gmane.org; Wed, 21 Sep 2016 10:55:26 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:59844) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bmite-0001Lq-MU for emacs-devel@gnu.org; Wed, 21 Sep 2016 10:53:16 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bmitX-0006y5-JF for emacs-devel@gnu.org; Wed, 21 Sep 2016 10:53:13 -0400 Original-Received: from [195.159.176.226] (port=58889 helo=blaine.gmane.org) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bmitX-0006uJ-Bm for emacs-devel@gnu.org; Wed, 21 Sep 2016 10:53:07 -0400 Original-Received: from list by blaine.gmane.org with local (Exim 4.84_2) (envelope-from ) id 1bmit7-00006Y-Dy for emacs-devel@gnu.org; Wed, 21 Sep 2016 16:52:41 +0200 X-Injected-Via-Gmane: http://gmane.org/ Original-Lines: 84 Original-X-Complaints-To: usenet@blaine.gmane.org Cancel-Lock: sha1:cTqueOijMgEszuBMNQRDxRe7BLs= X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 195.159.176.226 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:207657 Archived-At: > What I've been looking at is replacing the current implementation of > buffer overlays with a self balanced tree. I chose an AA-tree as that > seemed simple enough for me to grasp :). Could you describe how you use this tree? An AA-tree is indexed by a single "number", whereas overlays have two independent "numbers". So how to use this AA-tree to implement an *interval* tree? This should be documented in the code. E.g. the "struct Lisp_Overlay" should define very clearly the meaning of char_start, char_end, left, and right (i.e. which overlays go to the left, and which go to the right). While looking for an answer in the text I found: /* This function determines where overlays are inserted in the tree. FIXME: I didn't think too hard about this... */ which makes me suspect your design might not be quite right. Have you read https://en.wikipedia.org/wiki/Interval_tree ? [ BTW, our convention is to put the "*/" at the end of the last line rather than alone on the next line. ] This said, reading overlay_tree_insert_internal, I have the impression that you're implementing what wikipedia describes as an "Augmented tree" where the basic tree is your AA-tree, indexed by the left position of the overlays, with the `max` field being the "augmented" data, which sounds just fine. Is that right? > Anyway, I basically have two questions for you experts: > 1) Is it worth continuing down this path? I don't see why not. > 2) If so, what's the best way to go about something like this? Replacing > the whole thing at once seems very hard, but I don't really know how to > go about replacing it little by little. The only places you absolutely need to replace are all the places that need to modify the tree. There shouldn't be that many of them (basically, make-overlay, move-overlay, delete-overlay, plus text insertion/deletion). Then the rest can be modified little by little. Some tricky parts: - because of the insertion_type, two overlays may start with end1end2 without any call to move-overlay. - the overlay tree needs to be "weak" (i.e. we'll need to add special code in the GC). > I'm attaching the diff. It is an unholy mess of ifdefs, but the meat of > it is in overlays.[ch] and buffer.c. It is a pretty big diff, and it's > based on master from a few months ago, so I'm not even sure it applies, I wouldn't worry about merging (better yet: merge from master right away and then keep doing that on a regular basis, which should be easy since I don't think we've touched (nor will touch) this code very much). > Well, the Lisp-visible APIs weren't really the problem. The problem was > more in the 'internal' handling of overlays in buffer.c and in xdisp.c, > and also the fact that I had to reimplement a lot of the logic about how > markers are moved when the buffer changes. Speaking of which, is the > byte position stored in a marker of any significance in an overlay? > Otherwise I could at least get rid of those. AFAIK, the byte-position of markers is used, but the byte-position of overlays isn't, so you should be able to get rid of them. Richard said: > Since overlays have to be discarded whenever text is copied, > the need to go through the text properties and discard those > that are really overlays could make things substantially slower. My original idea was to keep overlays inside the intervals tree, but that'd be done by adding some fields to the interval struct, so you wouldn't need to do any extra work to "discard" the overlays when copying an interval tree: just don't copy the overlay-related fields while copying the interval tree. Stefan