From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: Markers in a gap array
Date: Thu, 04 Jul 2024 16:11:28 -0400
References: <87ikxlqwu6.fsf@localhost> <87le2hp6ug.fsf@localhost>
In-Reply-To: <87le2hp6ug.fsf@localhost> (Ihor Radchenko's message of "Thu, 04 Jul 2024 14:30:47 +0000")
To: Ihor Radchenko
Cc: emacs-devel@gnu.org

Ihor Radchenko [2024-07-04 14:30:47] wrote:
> Stefan Monnier writes:
>
>>> Some perf stats:
>>>
>>> ;; Switch to todo and mark next 3 times, on branch
>>> ;; 28.72%  emacs  emacs  [.] markers_sanity_check
>>
>> Did you build with or without assertions?
>
> Without.
>
>> And indeed, I need to rework them to be "more conditional" (but I was
>> focused on correctness until now).  You should probably remove those
>> calls to `markers_sanity_check` by hand when testing performance, sorry.
>
> Without these calls, I can see some speed improvement in
> buf_bytepos_to_charpos, but I do not currently have a reliable
> reproducer to trigger buf_bytepos_to_charpos slowdown on master, so it
> is comparing very small numbers.

Hmm... I tried a benchmark based on:

    (defconst elb-bytechar-buffer
      (let ((buf (get-buffer-create " *elb-bytechar*")))
        (with-current-buffer buf
          (let ((step (apply #'concat "🙂 foo\n" (make-list 2000 "asdf "))))
            (dotimes (_ (/ 10000000 (length step)))
              (insert step))
            buf))))

    (defconst elb-bytechar-re "\\<.\\> \\<.\\> bar")

    (defun elb-bytechar--aux (nmarkers lookup &optional marker-fun)
      (with-current-buffer elb-bytechar-buffer
        (let ((step (/ (buffer-size) nmarkers))
              (markers nil))
          (dotimes (i nmarkers)
            (push (copy-marker (funcall (or marker-fun #'identity) (* i step)))
                  markers))
          (dotimes (_ 10)
            (goto-char (point-min))
            (let ((parse-sexp-lookup-properties lookup))
              (re-search-forward elb-bytechar-re nil t))))))

where I call `elb-bytechar--aux` with various arguments.

[ This benchmark is a test of the performance of bytepos->charpos
  conversion because the regexp engine works only with bytepos
  internally and it needs to convert it to charpos whenever it looks up
  the `syntax-table` text-property, which happens for example for \< and
  \>.  IME, this is the most important use of the bytepos->charpos
  conversion.  ]

And like you, I don't see any speed improvement from the branch.
On the other hand, my trivial "thinko fix" b595b4598 (which I thought
would have no real-life effect) seems to make a significant difference
(see the results below).  So maybe the reason why you can't reproduce
the slowdown is because of b595b4598?
And maybe we should install that into `emacs-30`?
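
The bytepos/charpos distinction that the benchmark exercises can be seen
directly from Lisp.  The following is only a minimal sketch: the helper
name `demo-bytepos-charpos` is made up for illustration and is not part
of the benchmark, while `position-bytes` and `byte-to-position` are the
standard Lisp-level conversions.  It reuses the same "🙂 foo\n" text as
the benchmark buffer.

```elisp
;; Hypothetical helper for illustration; not part of the benchmark.
(defun demo-bytepos-charpos ()
  "Return a list (BYTEPOS CHARPOS) for a tiny multibyte buffer.
BYTEPOS is the byte position of char position 2, and CHARPOS is the
char position of byte position 5."
  (with-temp-buffer
    (set-buffer-multibyte t)
    ;; U+1F642 occupies 4 bytes in Emacs's UTF-8-based internal
    ;; representation, so the two kinds of position diverge after it.
    (insert "🙂 foo\n")
    (list (position-bytes 2)        ; charpos 2 -> bytepos
          (byte-to-position 5))))   ; bytepos 5 -> charpos

(demo-bytepos-charpos)              ; => (5 2)
```

In a pure-ASCII buffer both numbers would be equal, which is presumably
why the benchmark text starts every step with a multibyte character.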

In any case, these benchmarks suggest my branch isn't very exciting
performancewise.  🙁

Also I don't have an explanation for the difference in performance
between bytechar-100k (8.00) and bytechar-100k-random/rev (~9.00) on
`markers-as-gap-array`: `rev` just builds the markers in reverse order
and `random` puts the markers at random positions.  Since my gap-array
keeps the markers sorted, the order in which they're created should not
affect the end result, and I don't think that placing them randomly in
the text should make much difference either (unless the performance
difference is just due to the time needed to compute `random`?).


        Stefan


PS: Beware the "tot avg error", because the machine I used for those
benchmarks is a poor fit, with a CPU whose top-frequency varies
enormously depending on temperature and such, and I was using the
machine (lightly, but still) at the same time as the benchmarks
were running.

* markers-as-gap-array

| test                   | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
|------------------------+----------------+------------+---------+-------------+-----------------|
| bytechar               |           7.86 |       0.00 |       0 |        7.86 |            0.05 |
| bytechar-100k          |           8.00 |       0.00 |       0 |        8.00 |            0.15 |
| bytechar-100k-nolookup |           5.99 |       0.00 |       0 |        5.99 |            0.07 |
| bytechar-100k-random   |           9.20 |       0.00 |       0 |        9.20 |            0.23 |
| bytechar-100k-rev      |           9.05 |       0.00 |       0 |        9.05 |            0.59 |
| bytechar-10k-random    |           8.09 |       0.00 |       0 |        8.09 |            0.06 |
| bytechar-1k-random     |           7.91 |       0.00 |       0 |        7.91 |            0.03 |
| bytechar-nolookup      |           5.91 |       0.00 |       0 |        5.91 |            0.01 |
|------------------------+----------------+------------+---------+-------------+-----------------|

* master

| test                   | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
|------------------------+----------------+------------+---------+-------------+-----------------|
| bytechar               |           7.73 |       0.00 |       0 |        7.73 |            0.40 |
| bytechar-100k          |           8.04 |       0.00 |       0 |        8.04 |            0.02 |
| bytechar-100k-nolookup |           5.93 |       0.00 |       0 |        5.93 |            0.02 |
| bytechar-100k-random   |          10.05 |       0.00 |       0 |       10.05 |            0.01 |
| bytechar-100k-rev      |           7.99 |       0.00 |       0 |        7.99 |            0.01 |
| bytechar-10k-random    |           8.23 |       0.00 |       0 |        8.23 |            0.05 |
| bytechar-1k-random     |           8.05 |       0.00 |       0 |        8.05 |            0.03 |
| bytechar-nolookup      |           5.86 |       0.00 |       0 |        5.86 |            0.01 |
|------------------------+----------------+------------+---------+-------------+-----------------|

* master before commit b595b4598 (mixup byte/char)

| test                   | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
|------------------------+----------------+------------+---------+-------------+-----------------|
| bytechar               |           7.97 |       0.00 |       0 |        7.97 |            0.60 |
| bytechar-100k          |          16.64 |       0.00 |       0 |       16.64 |            0.43 |
| bytechar-100k-nolookup |           6.80 |       0.00 |       0 |        6.80 |            1.07 |
| bytechar-100k-random   |          16.85 |       0.00 |       0 |       16.85 |            1.03 |
| bytechar-100k-rev      |          13.56 |       0.00 |       0 |       13.56 |            0.10 |
| bytechar-10k-random    |          14.15 |       0.00 |       0 |       14.15 |            0.07 |
| bytechar-1k-random     |          14.06 |       0.00 |       0 |       14.06 |            0.20 |
| bytechar-nolookup      |           5.93 |       0.00 |       0 |        5.93 |            0.03 |
|------------------------+----------------+------------+---------+-------------+-----------------|

* /usr/bin/emacs (29.4):

| test                   | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
|------------------------+----------------+------------+---------+-------------+-----------------|
| bytechar               |           7.92 |       0.00 |       0 |        7.92 |            1.07 |
| bytechar-100k          |          16.27 |       0.00 |       0 |       16.27 |            1.91 |
| bytechar-100k-nolookup |           6.16 |       0.00 |       0 |        6.16 |            0.04 |
| bytechar-100k-random   |          15.91 |       0.00 |       0 |       15.91 |            0.29 |
| bytechar-100k-rev      |          13.38 |       0.00 |       0 |       13.38 |            0.06 |
| bytechar-10k-random    |          15.47 |       0.00 |       0 |       15.47 |            3.06 |
| bytechar-1k-random     |          14.26 |       0.00 |       0 |       14.26 |            1.22 |
| bytechar-nolookup      |           6.03 |       0.00 |       0 |        6.03 |            0.02 |
|------------------------+----------------+------------+---------+-------------+-----------------|
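
To put a number on the effect of b595b4598, the "tot avg (s)" columns of
the "master before commit b595b4598" and "master" tables can be divided
row by row.  The sketch below does just that; the names `demo-tot-avg`
and `demo-speedups` are made up for illustration, the figures are copied
from the tables above, and given the noise mentioned in the PS the
ratios should be read as rough indications only.

```elisp
;; Illustrative only: rows are (TEST BEFORE AFTER), where BEFORE is the
;; "tot avg (s)" from "master before commit b595b4598" and AFTER is the
;; "tot avg (s)" from "master".
(defconst demo-tot-avg
  '((bytechar              7.97  7.73)
    (bytechar-100k        16.64  8.04)
    (bytechar-100k-random 16.85 10.05)
    (bytechar-100k-rev    13.56  7.99)
    (bytechar-10k-random  14.15  8.23)
    (bytechar-1k-random   14.06  8.05)))

(defun demo-speedups ()
  "Return an alist of (TEST . RATIO), with RATIO = BEFORE / AFTER.
RATIO is a string formatted with two decimals."
  (mapcar (lambda (row)
            (cons (nth 0 row)
                  (format "%.2f" (/ (nth 1 row) (nth 2 row)))))
          demo-tot-avg))

(demo-speedups)
;; => ((bytechar . "1.03") (bytechar-100k . "2.07")
;;     (bytechar-100k-random . "1.68") (bytechar-100k-rev . "1.70")
;;     (bytechar-10k-random . "1.72") (bytechar-1k-random . "1.75"))
```

By these figures the tests that create 1k or more markers run roughly
1.7x to 2x faster after the fix, while plain `bytechar` barely moves.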