Re: MPS: marker-vector

From: Pip Cet <pipcet@protonmail.com>
To: Helmut Eller <eller.helmut@gmail.com>
Cc: "Gerd Möllmann" <gerd@gnu.org>,
	"Stefan Monnier" <monnier@iro.umontreal.ca>,
	"Ihor Radchenko" <yantar92@posteo.net>,
	emacs-devel@gnu.org
Subject: Re: MPS: marker-vector
Date: Mon, 05 Aug 2024 21:14:13 +0000
Message-ID: <87bk26vfj1.fsf@protonmail.com>
In-Reply-To: <87a5hqsq2l.fsf_-_@gmail.com>

"Helmut Eller" <eller.helmut@gmail.com> writes:

> What would you think about changing the marker-vector-with-free-list to
> a mundane growable array like in the patch below?

> The main motivation for this would be that we could then iterate in
> reverse, or more precisely, in an order that is more like the order used
> for the linked-list-of-markers.  This is relevant for the heuristic in
> buf_charpos_to_bytepos that creates a temporary marker; it works better
> if those temporary markers are visited first.

Thanks for investigating this! I think I finally understand why the
current MPS code is slower.

I wonder whether we couldn't reuse more of the weak hash table code for
this, though...

> This replaces the vector-with-free-list by a growable vector, i.e. the
> free entries are always kept at the end of the vector.

I don't think that's entirely accurate; it's quite possible for an entry
at the beginning of the array to be splatted and remain untouched until
the next time a DO_MARKERS reaches it, which may take a long time.

I see one somewhat theoretical problem with this patch: if a marker is
simultaneously kept in a weak hash table, it's possible for it to be
splatted from the marker vector while remaining in the weak hash table
(there's no guarantee all references will be splatted at the same time);
if it is then retrieved from the weak hash table and made to point
nowhere, we will try to remove it from the marker vector, and hit the
igc_assert.
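
To make the scenario above concrete, here is a minimal, self-contained sketch of a removal routine that tolerates a slot the collector has already splatted. It is a toy model, not the code in igc.c or in the patch: `struct marker`, `struct marker_vector`, `remove_marker_tolerant`, and the use of NULL to model a splatted weak reference are all assumptions made for illustration. It only covers the case where the splatted slot has not yet been reused for another marker.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, NOT the Emacs definitions.  */
struct marker
{
  ptrdiff_t index;		/* slot in the vector, or -1 if none */
};

struct marker_vector
{
  struct marker **slots;	/* NULL models a splatted weak reference */
  ptrdiff_t fill;		/* number of slots in use */
};

enum removal { REMOVED, ALREADY_SPLATTED };

/* Remove M from V.  A strict version would assert that the slot still
   holds M; that is the check that could fail if the collector cleared
   the slot while M stayed reachable through another weak table.
   Here a cleared slot is accepted as well.  */
enum removal
remove_marker_tolerant (struct marker_vector *v, struct marker *m)
{
  assert (0 <= m->index && m->index < v->fill);
  struct marker *slot = v->slots[m->index];
  assert (slot == m || slot == NULL);
  v->slots[m->index] = NULL;
  m->index = -1;
  return slot == NULL ? ALREADY_SPLATTED : REMOVED;
}
```

Whether relaxing the assertion like this is acceptable, or whether the marker should instead be kept alive by some other means, is a design question this sketch does not settle.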

The rest of my comments are tiny nits, really:

- capacity isn't redundant on 32-bit systems
- I'd prefer the marker index to be signed; if it is unsigned, we don't need to assert it's >= 0, and assigning -1 to it confused me...
- you shouldn't compare Lisp_Objects with ==
- I'd prefer checking for splatted elements before deciding to grow the vector, if we can do so efficiently
- I find XFIXNAT easier to read when the number is guaranteed to be nonnegative
- using alloca is problematic for large vectors (which shouldn't be dumped, thus a nit)
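
As an illustration of the nit about checking for splatted elements before growing, the following sketch compacts the array in place when it is full and only reallocates if that frees nothing. It is a toy model under stated assumptions: the type and function names (`mv_compact`, `mv_add`) are hypothetical, NULL stands in for a splatted entry, and plain `realloc` stands in for whatever allocator the real code uses. The index is a signed `ptrdiff_t`, in line with the preference stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins, NOT the Emacs definitions.  */
struct marker
{
  ptrdiff_t index;
};

struct marker_vector
{
  struct marker **slots;	/* NULL models a splatted entry */
  ptrdiff_t fill;		/* number of slots in use */
  ptrdiff_t capacity;		/* number of slots allocated */
};

/* Slide live entries toward the front, preserving their order and
   updating each marker's index.  Return the number of slots freed.  */
ptrdiff_t
mv_compact (struct marker_vector *v)
{
  ptrdiff_t live = 0;
  for (ptrdiff_t i = 0; i < v->fill; i++)
    if (v->slots[i] != NULL)
      {
	v->slots[live] = v->slots[i];
	v->slots[live]->index = live;
	live++;
      }
  ptrdiff_t freed = v->fill - live;
  v->fill = live;
  return freed;
}

/* Append M to V.  Grow only if the vector is full and compaction
   found nothing to reclaim.  Return 0 on success, -1 if out of
   memory.  */
int
mv_add (struct marker_vector *v, struct marker *m)
{
  assert (v->fill <= v->capacity);
  if (v->fill == v->capacity && mv_compact (v) == 0)
    {
      ptrdiff_t new_cap = v->capacity > 0 ? 2 * v->capacity : 4;
      struct marker **p = realloc (v->slots, new_cap * sizeof *p);
      if (p == NULL)
	return -1;
      v->slots = p;
      v->capacity = new_cap;
    }
  v->slots[v->fill] = m;
  m->index = v->fill++;
  return 0;
}
```

The compaction pass is linear in the number of entries, so this is only attractive if full vectors with very few splatted entries are uncommon; that is the "if we can do so efficiently" caveat.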

> Using Stefan's elb-bytechar benchmark, I get for the
> linked-list-of-markers:
>
>   | test                   || tot avg (s) | tot avg err (s) |
>   |------------------------++-------------+-----------------|
>   | bytechar               ||       11.80 |            0.00 |
>   | bytechar-100k          ||       11.85 |            0.00 |
>   | bytechar-100k-nolookup ||        9.19 |            0.00 |
>   | bytechar-100k-random   ||       16.73 |            0.02 |
>   | bytechar-100k-rev      ||       11.86 |            0.00 |
>   | bytechar-10k-random    ||       12.36 |            0.01 |
>   | bytechar-1k-random     ||       11.93 |            0.00 |
>   | bytechar-nolookup      ||        9.15 |            0.00 |
>   |------------------------++-------------+-----------------|
>   | total                  ||       94.88 |            0.02 |
>
> for the vector-with-free-list:
>
>   | test                   || tot avg (s) | tot avg err (s) |
>   |------------------------++-------------+-----------------|
>   | bytechar               ||       11.63 |            0.01 |
>   | bytechar-100k          ||       11.91 |            0.37 |
>   | bytechar-100k-nolookup ||        8.80 |            0.01 |
>   | bytechar-100k-random   ||      248.07 |            3.84 |
>   | bytechar-100k-rev      ||       11.71 |            0.02 |
>   | bytechar-10k-random    ||       35.24 |            0.53 |
>   | bytechar-1k-random     ||       14.01 |            0.06 |
>   | bytechar-nolookup      ||        8.69 |            0.13 |
>   |------------------------++-------------+-----------------|
>   | total                  ||      350.06 |            3.89 |
>
> and for the growable array:
>
>   | test                   || tot avg (s) | tot avg err (s) |
>   |------------------------++-------------+-----------------|
>   | bytechar               ||       11.34 |            0.08 |
>   | bytechar-100k          ||       11.59 |            0.47 |
>   | bytechar-100k-nolookup ||        8.78 |            0.12 |
>   | bytechar-100k-random   ||       16.17 |            0.33 |
>   | bytechar-100k-rev      ||       11.31 |            0.03 |
>   | bytechar-10k-random    ||       11.76 |            0.01 |
>   | bytechar-1k-random     ||       11.34 |            0.08 |
>   | bytechar-nolookup      ||        8.70 |            0.09 |
>   |------------------------++-------------+-----------------|
>   | total                  ||       91.00 |            0.61 |

Hard to argue with those numbers :-)

I wonder whether it wouldn't be faster, upon encountering a splatted
marker, to fix the entire array all at once. That would ensure that
creation order is respected, and splatting is relatively rare. When it
does happen, we can expect most of the array to have been splatted;
indeed, I suspect it'd be best to give up on the marker vector and build
a new one, so the old one can be collected and we don't have to worry
about never shrinking it.
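
The idea of giving up on the old vector and building a new one could look roughly like the sketch below. This is a toy model, not a proposal for the actual igc.c code: `mv_rebuild` and the struct layouts are hypothetical, NULL models a splatted entry, and `malloc` stands in for a GC-managed allocation. In the real setting the old vector would simply become garbage; here the caller still owns it. The sketch also assumes no entry is splatted between the counting pass and the copying pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins, NOT the Emacs definitions.  */
struct marker
{
  ptrdiff_t index;
};

struct marker_vector
{
  struct marker **slots;	/* NULL models a splatted entry */
  ptrdiff_t fill;
  ptrdiff_t capacity;
};

/* Return a freshly allocated vector holding only the live entries of
   OLD, in their original order and sized to fit, so the result is
   never larger than needed.  Each surviving marker's index is
   updated to its new position.  Return NULL if out of memory.  */
struct marker_vector *
mv_rebuild (const struct marker_vector *old)
{
  ptrdiff_t live = 0;
  for (ptrdiff_t i = 0; i < old->fill; i++)
    if (old->slots[i] != NULL)
      live++;

  struct marker_vector *fresh = malloc (sizeof *fresh);
  if (fresh == NULL)
    return NULL;
  fresh->capacity = live > 0 ? live : 1;
  fresh->slots = malloc (fresh->capacity * sizeof *fresh->slots);
  if (fresh->slots == NULL)
    {
      free (fresh);
      return NULL;
    }

  fresh->fill = 0;
  for (ptrdiff_t i = 0; i < old->fill; i++)
    if (old->slots[i] != NULL)
      {
	fresh->slots[fresh->fill] = old->slots[i];
	old->slots[i]->index = fresh->fill;
	fresh->fill++;
      }
  assert (fresh->fill == live);
  return fresh;
}
```

Because the copy walks the old vector front to back, creation order is preserved, which is the property the iteration heuristic relies on.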

Pip
