unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Pip Cet <pipcet@gmail.com>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: emacs-devel@gnu.org
Subject: Re: master 669aeaf: Prefer make_nil_vector to make-vector with nil
Date: Sat, 15 Aug 2020 19:53:32 +0000	[thread overview]
Message-ID: <CAOqdjBekWDBarVAn9fcLFJU89u7yEoTUN7+LGd6-dycxZ+oVZA@mail.gmail.com> (raw)
In-Reply-To: <fdeac2ac-8b32-cb12-5f4e-9d29174d5e52@cs.ucla.edu>

On Sat, Aug 15, 2020 at 6:48 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 8/12/20 6:05 AM, Pip Cet wrote:
>
> > bugs like the one we currently have in fns.c (Fdelete allocates an
> > uninitialized vector, then calls Fequal, which can call quit, leaving
> > a half-initialized vector on the stack and potentially marked by
> > conservative GC) are bound to bite us one day
>
> Good catch; I hadn't noticed this GC-related bug. I looked through the Emacs
> source code for similar instances of the bug, and fixed the ones I found with
> the first attached patch.

I'm not sure all of those are strictly necessary, but I'm all in favor. Thanks.

> > Particularly for
> > small-ish fixed-size vectors, it seems to me uninitialized vectors are
> > more trouble than they're worth (in fact, I could see make_vector (n,
> > Qnil) being faster on many CPUs, because the cache lines are written
> > to completely and don't have to be brought in from RAM.
>
> I don't quite follow. If Fmake_vector (n, Qnil) invokes memset to clear the
> memory, then it should work the same way as (make_uninit_vector followed by
> immediate initialization) as far as the hardware cache is concerned.

It depends on whether "immediate" is immediate enough for the
hardware. IIRC, the last CPU whose documentation I looked at closely
allowed only register operations between the movq insns, so

v = make_uninit_vector (n);
for (i = 0; i < n; i++)
  ASET (v, i, AREF (w, i));

would be enough to force the cache line containing v->contents to be
read, whereas

v = make_nil_vector (n);
for (i = 0; i < n; i++)
  ASET (v, i, AREF (w, i));

would combine the initializing zero writes to that cache line, notice
the previous cache line contents had become irrelevant, and never
actually fault in the cache line.

Needless to say, with code like

for (i = 0; i < n; i++)
  ASET (v, i, Fcomplicated_function (AREF (w, i)));

no CPU will be able to decide that the cache line containing
v->contents will be written to completely.

IOW, memset is okay both on the CPU I dimly recall and modern actual
CPUs, copying data from another vector might or might not be okay
depending on the CPU, anything more complicated than that is likely
not okay on any current CPU.

> And if
> Fmake_vector (n, Qnil) doesn't invoke memset but instead relies on calloc (which
> in turn just mmaps /dev/zero or whatever), then Fmake_vector should be slower
> than uninitialized vectors due to the cache effects that you mention.

I've not said anything about the case of pre-cleared memory; memset
can be faster than overwriting memory one word at a time, because
cache lines' previous contents become irrelevant in a way that's
obvious enough for the CPU to catch.

> (If this
> is a significant performance issue with Fmake_vector then we should fix it, but
> that issue is independent of this discussion.)

I agree. I just don't think it's at all obvious that
make_uninit_vector is worth it even if it's used perfectly, and that
erroneous usage tips the scales in favor of dropping it.

> If we changed hash_table_contents to use Fcopy_sequence, wouldn't
> hash_table_rehash become a bit slower? hash_table_rehash would need to look at
> the entire sequence instead of just its first first h->count elements.

Yeah, but the slight slowdown would be worth it to keep the code
simpler, IMHO. OTOH, I'm working on the more ambitious hash table
rewrite and it would remove the key_and_value vector entirely, so I'd
rather focus on that and go back to optimizing the current hash table
implementation if the rewrite proposal is shot down.

> That being said, you're right that the Emacs code was using make_uninit_vector
> too often, even aside from the GC bugs noted above. Inspired by your comment, I
> went through the Emacs source and replaced all calls to make_uninit_vector (and
> allocate_vector) that were easy to replace and didn't seem to make any
> performance difference, by installing the second attached patch.

Thank you!

> This patch isn't as aggressive as your comment suggested, though, as it doesn't
> replace code like this from charset.c:
>
>    val = make_uninit_vector (8);
>    for (i = 0; i < 8; i++)
>      ASET (val, i, make_fixnum (code_space[i]));
>
> Here, changing make_uninit_vector to make_nil_vector initializes unnecessarily,
> there's no chance of GC before the vector is initialized, and readability is not
> significantly improved by changing the code to something like the following:
>
>    val = CALLN (Fvector,
>                make_fixnum (code_space[0]), make_fixnum (code_space[1]),
>                make_fixnum (code_space[2]), make_fixnum (code_space[3]),
>                make_fixnum (code_space[4]), make_fixnum (code_space[5]),
>                make_fixnum (code_space[6]), make_fixnum (code_space[7]));

I agree the hypothetical code is much less readable, and your patches
(modulo the fns.c problems) look fine to me. Thank you again!



      reply	other threads:[~2020-08-15 19:53 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20200811173009.24292.8963@vcs0.savannah.gnu.org>
     [not found] ` <20200811173011.46A2E20CB8@vcs0.savannah.gnu.org>
2020-08-11 18:31   ` master 669aeaf: Prefer make_nil_vector to make-vector with nil Pip Cet
2020-08-12  0:01     ` Paul Eggert
2020-08-12 13:05       ` Pip Cet
2020-08-15 18:48         ` Paul Eggert
2020-08-15 19:53           ` Pip Cet [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOqdjBekWDBarVAn9fcLFJU89u7yEoTUN7+LGd6-dycxZ+oVZA@mail.gmail.com \
    --to=pipcet@gmail.com \
    --cc=eggert@cs.ucla.edu \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).