From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Stefan Monnier Newsgroups: gmane.emacs.devel Subject: Markers in a gap array Date: Thu, 04 Jul 2024 00:59:56 -0400 Message-ID: Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="23361"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) To: emacs-devel@gnu.org Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Thu Jul 04 07:00:49 2024 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1sPEaP-0005v0-9X for ged-emacs-devel@m.gmane-mx.org; Thu, 04 Jul 2024 07:00:49 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sPEZw-000588-0n; Thu, 04 Jul 2024 01:00:20 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sPEZt-00057d-1o for emacs-devel@gnu.org; Thu, 04 Jul 2024 01:00:17 -0400 Original-Received: from mailscanner.iro.umontreal.ca ([132.204.25.50]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sPEZd-0001XB-FU for emacs-devel@gnu.org; Thu, 04 Jul 2024 01:00:16 -0400 Original-Received: from pmg2.iro.umontreal.ca (localhost.localdomain [127.0.0.1]) by pmg2.iro.umontreal.ca (Proxmox) with ESMTP id 9524280822; Thu, 4 Jul 2024 00:59:59 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=iro.umontreal.ca; s=mail; t=1720069198; bh=GHY/I/IL2kY5MeMzXHDR8zdmc/kbxDg7nLghCttQR5Q=; h=From:To:Subject:Date:From; b=B8kROHGUtsc4ac488og08aGvbMn8cr3HQJSXDUVsRRbyR3RF0GMAkY02IzM4O1rwp XEGYUQC9cZqq9brtZ8i5lb4EfDhjWiPzoObsOQHyYQGvYE/GesFI6sGAr7S3hP2lZp ktZJkmhOoZTaeKzRNwFbZIqhVswNGWH8vHJO1AXwtLzMeK0IOt7NfrhmdcsJEQxx++ JsY46L+6G6QtajwA1Cpp5fWM3fU2Fw+wznMurzMd46BaX51Lp5ospAzH7XLyLXCWEV K8rWj2AbCpXp3CSTcYy6xsCqdqTZx+iQWUylwpu6LwgK8duGLR/A36VlatiUi5P2TC i32rH/dsUnnGA== Original-Received: from mail01.iro.umontreal.ca (unknown [172.31.2.1]) by pmg2.iro.umontreal.ca (Proxmox) with ESMTP id 45C36807F5; Thu, 4 Jul 2024 00:59:58 -0400 (EDT) Original-Received: from pastel (unknown [45.72.245.253]) by mail01.iro.umontreal.ca (Postfix) with ESMTPSA id 240DA12034B; Thu, 4 Jul 2024 00:59:58 -0400 (EDT) Received-SPF: pass client-ip=132.204.25.50; envelope-from=monnier@iro.umontreal.ca; helo=mailscanner.iro.umontreal.ca X-Spam_score_int: -16 X-Spam_score: -1.7 X-Spam_bar: - X-Spam_report: (-1.7 / 5.0 requ) BAYES_00=-1.9, DKIM_INVALID=0.1, DKIM_SIGNED=0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:321298 Archived-At: I've just pushed to `scratch/markers-as-gap-array` code that seems to be working correctly (passes all the tests, for example). The code is not ready for `master` (there are some cleanups to do), but if you use code that uses many markers, I'd be interested to hear about your experience with it. What it does: replace the unordered singly-linked list of markers that we keep in every buffer, with an "ordered array (with gap) of markers". The main purpose is to try and eliminate some pathological behavior in the bytepos<->charpos conversion routines (which "abuse" markers as a cache of bytepos<->charpos conversions) when you have many markers in large buffers. There's no free lunch, so this comes at the cost of slowing down other marker operations, which is why I'd like to hear about your experience. Here are those operations and how the performance is expected to compare: The good: - Conversion bytepos<->charpos used to be O(N), now it's O(lg N). The frequency of such conversions can vary drastically depending on the circumstances, so a lot of use-cases will see no difference at all, whereas some particular use-cases should see a significant speed up. The indifferent: - `make-marker`: Was and is still O(1). - `marker-position`: Was and is still O(1). - Adjustments when performing text insertion/deletion: Used to be O(N) and is still O(N), but with a smaller constant factor because it can fetch the markers in parallel whereas that used to be sequentialized by the pointer chasing of the singly-linked list and its branch instructions should be easier to predict. It's unlikely you'll see the gain, tho, because those adjustments are usually drowned in the noise of everything else we do upon text insertion/deletion. The bad: - `copy-marker`: Used to be O(1), now will cost time O(M + lg N) where N is the number of markers and M is the distance between where this marker is inserted and the previous position of the gap. The `lg N` factor should be hopefully cheap enough. The `M` factor could be more worrisome, since M can be as large as N, but there are two reasons to be optimistic: - The locality which makes our "gap buffer" usually efficient should hopefully work its same magic here keeping M usually small. - Moving the gap is a `memmove` and this can be performed quite fast even for fairly large M (i.e. the constant factor applied to M should be quite small). - `(set-marker m nil)`: Used to be O(N), now costs the same as `copy-marker`, hence O(M + lg N). Even when M=3DN, this should be much better than the previous worst case because the constant factor applied to M should be *much* smaller. So it could be considered to be part of "The good" except that (set-marker m nil) was often near O(1) in practice if `m` was recently created (in which case it was found near the beginning of the singly-linked list). So, for instance, `save-excursion` used to do a `copy-marker` followed by a (set-marker m nil), both of which were, typically, O(1) and are now made slower. Side note: IIUC this design is similar to the one used in XEmacs, so we're slowly catching up to them. =F0=9F=99=82 Stefan