From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Pip Cet <pipcet@protonmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: MPS: marker-vector
Date: Mon, 05 Aug 2024 21:14:13 +0000
Message-ID: <87bk26vfj1.fsf@protonmail.com>
References: <jwvv81lg65d.fsf-monnier+emacs@gnu.org> <87le2hp6ug.fsf@localhost>
 <jwvzfqx0wde.fsf-monnier+emacs@gnu.org>
 <dfJE6JR_EIS55b5yYx-PajGk5L_q5N9QyzB7X-oD2HbK6e8EO2iJs2CF8xv-DFLMpatiulXVHdMul0D25KaVbyVd6MUy78DDWD9iaZGUhZg=@protonmail.com>
 <jwvo77c28lx.fsf-monnier+emacs@gnu.org> <87v81455iw.fsf@gmail.com>
 <jwvr0bqa30h.fsf-monnier+emacs@gnu.org> <87jzh8djdt.fsf@gmail.com>
 <87a5hqsq2l.fsf_-_@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="22883"; mail-complaints-to="usenet@ciao.gmane.io"
Cc: =?utf-8?Q?Gerd_M=C3=B6llmann?= <gerd@gnu.org>,
 Stefan Monnier <monnier@iro.umontreal.ca>,
 Ihor Radchenko <yantar92@posteo.net>, emacs-devel@gnu.org
To: Helmut Eller <eller.helmut@gmail.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Tue Aug 06 04:25:45 2024
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1sb9tQ-0005iI-Kv
	for ged-emacs-devel@m.gmane-mx.org; Tue, 06 Aug 2024 04:25:44 +0200
Original-Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <emacs-devel-bounces@gnu.org>)
	id 1sb9sk-0007F0-If; Mon, 05 Aug 2024 22:25:02 -0400
Original-Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <pipcet@protonmail.com>)
 id 1sb527-0001xf-M9
 for emacs-devel@gnu.org; Mon, 05 Aug 2024 17:14:23 -0400
Original-Received: from mail-4316.protonmail.ch ([185.70.43.16])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <pipcet@protonmail.com>)
 id 1sb525-0000pR-4d; Mon, 05 Aug 2024 17:14:23 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com;
 s=protonmail3; t=1722892457; x=1723151657;
 bh=8m4pUwHO/RJTFx/B2YdKh2lM+qaNF1Xsw9763QwfnEk=;
 h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References:
 Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID:
 Message-ID:BIMI-Selector;
 b=X6I+wIW3emHxS/N3HdbIDemXW2LkVTtqqSQxe0EXFtz8F3GvuLaWzl4GQw+i/kiGz
 jRgYZx6MihStBqtNGnpzZ1LQ5pu9YC9ZDambhrj2vOI9VcUFnEj9B5b6pgSgruACwy
 3/f3KCbdIu+2CNAmgm/ZY8XAXfQLzvTNXo4Zr2lGsJTFn0ffVlY208X82+q/5EHBOY
 KBTYSSnWG4D1BuKdtkjwyia7mYreUsoqFUQiCy6ds9PY7qcwBpJQSCMNseXhsgmint
 jsdz4NIN/CFQOUWN32xZTU4cryGnVUB/aj4dusBc4CXpfMDt1d3JBnFdEtkBR2i0mE
 yDJjNqoiScWDg==
In-Reply-To: <87a5hqsq2l.fsf_-_@gmail.com>
Feedback-ID: 112775352:user:proton
X-Pm-Message-ID: 750fdc0c402afd44be07922e81b0d5c184eed4dd
Received-SPF: pass client-ip=185.70.43.16; envelope-from=pipcet@protonmail.com;
 helo=mail-4316.protonmail.ch
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001,
 RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001,
 SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Mailman-Approved-At: Mon, 05 Aug 2024 22:25:00 -0400
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Xref: news.gmane.io gmane.emacs.devel:322419
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/322419>

"Helmut Eller" <eller.helmut@gmail.com> writes:

> What would you think about changing the marker-vector-with-free-list to
> a mundane growable array like in the patch below?

> The main motivation for this would that we could then iterate in
> reverse, or more precisely, in an order that is more like the order used
> for the linked-list-of-markers.  This is relevant for the heuristic in
> buf_charpos_to_bytepos that creates a temporary marker; it works better
> if those temporary markers are visited first.

Thanks for investigating this! I finally understand why the current MPS
code is slower now, I think.

I wonder whether we couldn't reuse more of the weak hash table code for
this, though...

> This replaces the vector-with-free-list by a growable vector, i.e. the
> free entries are always kept at the end of the vector.

I don't think that's entirely accurate; it's quite possible for an entry
at the beginning of the array to be splatted and remain untouched until
the next time a DO_MARKERS reaches it, which may take a long time.

I see one somewhat theoretical problem with this patch: if a marker is
simultaneously kept in a weak hash table, it's possible for it to be
splatted from the marker vector while remaining in the weak hash table
(there's no guarantee all references will be splatted at the same time);
if it is then retrieved from the weak hash table and made to point
nowhere, we will try to remove it from the marker vector, and hit the
igc_assert.

The rest of my comments are tiny nits, really:

- capacity isn't redundant on 32-bit systems
- I'd prefer the marker index to be signed; if it is unsigned, we don't nee=
d to assert it's >=3D 0, and assigning -1 to it confused me...
- you shouldn't compare Lisp_Objects with =3D=3D
- I'd prefer checking for splatted elements before deciding to grow the vec=
tor, if we can do so efficiently
- I find XFIXNAT easier to read when the number is guaranteed to be nonnega=
tive
- using alloca is problematic for large vectors (which shouldn't be dumped,=
 thus a nit)

> Using Stefan's elb-bytechar benchmark, I get for the
> linked-list-of-markers:
>
>   | test                   || tot avg (s) | tot avg err (s) |
>   |------------------------++-------------+-----------------|
>   | bytechar               ||       11.80 |            0.00 |
>   | bytechar-100k          ||       11.85 |            0.00 |
>   | bytechar-100k-nolookup ||        9.19 |            0.00 |
>   | bytechar-100k-random   ||       16.73 |            0.02 |
>   | bytechar-100k-rev      ||       11.86 |            0.00 |
>   | bytechar-10k-random    ||       12.36 |            0.01 |
>   | bytechar-1k-random     ||       11.93 |            0.00 |
>   | bytechar-nolookup      ||        9.15 |            0.00 |
>   |------------------------++-------------+-----------------|
>   | total                  ||       94.88 |            0.02 |
>
> for the vector-with-free-list:
>
>   | test                   || tot avg (s) | tot avg err (s) |
>   |------------------------++-------------+-----------------|
>   | bytechar               ||       11.63 |            0.01 |
>   | bytechar-100k          ||       11.91 |            0.37 |
>   | bytechar-100k-nolookup ||        8.80 |            0.01 |
>   | bytechar-100k-random   ||      248.07 |            3.84 |
>   | bytechar-100k-rev      ||       11.71 |            0.02 |
>   | bytechar-10k-random    ||       35.24 |            0.53 |
>   | bytechar-1k-random     ||       14.01 |            0.06 |
>   | bytechar-nolookup      ||        8.69 |            0.13 |
>   |------------------------++-------------+-----------------|
>   | total                  ||      350.06 |            3.89 |
>
> and for the growable array:
>
>   | test                   || tot avg (s) | tot avg err (s) |
>   |------------------------++-------------+-----------------|
>   | bytechar               ||       11.34 |            0.08 |
>   | bytechar-100k          ||       11.59 |            0.47 |
>   | bytechar-100k-nolookup ||        8.78 |            0.12 |
>   | bytechar-100k-random   ||       16.17 |            0.33 |
>   | bytechar-100k-rev      ||       11.31 |            0.03 |
>   | bytechar-10k-random    ||       11.76 |            0.01 |
>   | bytechar-1k-random     ||       11.34 |            0.08 |
>   | bytechar-nolookup      ||        8.70 |            0.09 |
>   |------------------------++-------------+-----------------|
>   | total                  ||       91.00 |            0.61 |

Hard to argue with those numbers :-)

I wonder whether it wouldn't be faster, upon encountering a marker that
has been splatted, to fix the entire array all at once. That would
ensure that creation order is respected, and splatting is relatively
rare (and, when splatted, we can expect most of the array to have been
splatted; indeed, I suspect it'd be best to give up on the marker vector
and build a new one so the old one can be collected and we don't have to
worry about never shrinking it).

Pip