From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Stefan Monnier Newsgroups: gmane.emacs.devel Subject: Re: Markers in a gap array Date: Thu, 18 Jul 2024 16:46:04 -0400 Message-ID: References: <87ikxlqwu6.fsf@localhost> <87le2hp6ug.fsf@localhost> <87v81455iw.fsf@gmail.com> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="17122"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Cc: Pip Cet , Ihor Radchenko , emacs-devel@gnu.org To: Helmut Eller Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Thu Jul 18 22:46:45 2024 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1sUY1U-0004Gm-VK for ged-emacs-devel@m.gmane-mx.org; Thu, 18 Jul 2024 22:46:45 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sUY0z-0003J4-BV; Thu, 18 Jul 2024 16:46:14 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sUY0x-0003It-KQ for emacs-devel@gnu.org; Thu, 18 Jul 2024 16:46:11 -0400 Original-Received: from mailscanner.iro.umontreal.ca ([132.204.25.50]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sUY0u-0002Aw-Qs for emacs-devel@gnu.org; Thu, 18 Jul 2024 16:46:11 -0400 Original-Received: from pmg2.iro.umontreal.ca (localhost.localdomain [127.0.0.1]) by pmg2.iro.umontreal.ca (Proxmox) with ESMTP id 2EA7780883; Thu, 18 Jul 2024 16:46:06 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=iro.umontreal.ca; s=mail; t=1721335564; bh=r40bXBeD7EoGSpDHAbmZijMgnW/d6rPSDwNXY1+VWB8=; h=From:To:Cc:Subject:In-Reply-To:References:Date:From; b=dYht/Gq03numTdJdKmdqDNeSwwVpkdGOgEDhRzGW3jaN3LcCJd5HmYfr1f0SqzjTb EDejlgYqfyw9Dh/T4BT5he5iK5KTV9ODlrGP9MTW1194SDg9XE5hFUVi9235J7UpWg TrstGAilbiZLakSliqfqdRojbx/M9oBFxCeCdGeLEVRg1yvu7nh7djoX41wNHH6EXK kqEdF4wl1Lr0/H4jgO5unfsqLXxvj03Koh9KuBKKgJ7hE7Kc9tG/tHDXvdwxs0hBUE 7PBN83CxG3+zH1xYr7uB/wbD2sxeAG8sI6/vpkXdE9U1MzXUXnGJPB1LFH0hQw+F/l VzPt26Nm/2Wgw== Original-Received: from mail01.iro.umontreal.ca (unknown [172.31.2.1]) by pmg2.iro.umontreal.ca (Proxmox) with ESMTP id DA46D8014E; Thu, 18 Jul 2024 16:46:04 -0400 (EDT) Original-Received: from lechazo (lechon.iro.umontreal.ca [132.204.27.242]) by mail01.iro.umontreal.ca (Postfix) with ESMTPSA id C4851120403; Thu, 18 Jul 2024 16:46:04 -0400 (EDT) In-Reply-To: <87v81455iw.fsf@gmail.com> (Helmut Eller's message of "Wed, 17 Jul 2024 18:48:07 +0200") Received-SPF: pass client-ip=132.204.25.50; envelope-from=monnier@iro.umontreal.ca; helo=mailscanner.iro.umontreal.ca X-Spam_score_int: -42 X-Spam_score: -4.3 X-Spam_bar: ---- X-Spam_report: (-4.3 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:321811 Archived-At: > The current scratch/igc branch, configured with MPS and -O2 > -fno-omit-frame-pointer: > > | test || tot avg (s) | tot avg err (s) | > |------------------------++-------------+-----------------| > | bytechar || 12.11 | 0.18 | > | bytechar-100k || 12.38 | 0.17 | > | bytechar-100k-nolookup || 9.14 | 0.22 | > | bytechar-100k-random || 271.52 | 14.27 | > | bytechar-100k-rev || 12.38 | 0.24 | > | bytechar-10k-random || 38.08 | 1.43 | > | bytechar-1k-random || 14.95 | 0.48 | > | bytechar-nolookup || 8.97 | 0.12 | > |------------------------++-------------+-----------------| > | total || 379.53 | 14.36 | > > and without MPS: > > | test || tot avg (s) | tot avg err (s) | > |------------------------++-------------+-----------------| > | bytechar || 11.42 | 0.03 | > | bytechar-100k || 11.48 | 0.02 | > | bytechar-100k-nolookup || 9.15 | 0.00 | > | bytechar-100k-random || 16.39 | 0.02 | > | bytechar-100k-rev || 11.48 | 0.02 | > | bytechar-10k-random || 11.97 | 0.02 | > | bytechar-1k-random || 11.56 | 0.01 | > | bytechar-nolookup || 9.13 | 0.04 | > |------------------------++-------------+-----------------| > | total || 92.58 | 0.06 | > > So the weak vector doesn't compare very well to the linked list. Hmm... I wonder why there is such a large difference for markers created in a random-order compared to the cases where they're created beg-to-end and end-to-beg. My crystal ball is of no help but suggests that it might hint at the fact that it's probably a silly effect that could be fixed easily once diagnosed. > Maybe because the vector only grows and never shrinks. But why would that only show up when the order is random? > The idea with the sorted markers in a gap array would probably avoid > the worst of this. You could try porting the code in `scratch/markers-as-gap-array`. I haven't touched it recently, but IIRC the code itself is in reasonably good shape: the commits themselves need to be improved (better commit messages, maybe sliced differently) and there's a bit of cleanup needed around the pdumper code, but it should be reliable enough. [ I'm still not completely sure if it's a good idea, tho: I mean, I like the idea, but the benchmarks aren't very compelling (they're not bad, but they're not very enticing either). ] Stefan PS: BTW, apparently I was confused about XEmacs' markers, they don't use a gap-array but a doubly-linked list. They use a gap-array for the extents (where we use a fancier balanced tree instead). Also they don't use markers to convert bytes<->chars. Instead they keep a (per buffer) cache in an unsorted array of 50 pairs. Surprisingly their markers store the bytepos rather than the charpos, so `marker-position` always needs to convert to charpos and `set-marker` does the other conversion.