unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Paul Eggert <eggert@cs.ucla.edu>
To: Lars Ingebrigtsen <larsi@gnus.org>, Pip Cet <pipcet@gmail.com>
Cc: 36597-done@debbugs.gnu.org
Subject: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 02:33:34 -0700	[thread overview]
Message-ID: <c1aad0e7-6416-2327-24a2-fc5c916a87b8@cs.ucla.edu> (raw)
In-Reply-To: <87v9hq4ppp.fsf@gnus.org>

[-- Attachment #1: Type: text/plain, Size: 958 bytes --]

On 8/10/20 6:04 AM, Lars Ingebrigtsen wrote:

> time make -j32 compile-always
> ...
> Er...  It's weird that there's so much difference in time between
> runs

I  think I get less variance if I do a sequential 'make' (i.e., without -j). Of 
course this takes longer.

> if we just take the mean
> from these numbers, it seems that your patch is making compilation
> faster.  :-)

It also simplifies the code a bit, so I took the liberty of installing it, after 
updating its commit message a bit and changing it to keep a comment that was 
updated recently (I figure this was a merge error). The patch I installed is the 
first patch attached. While reviewing the patch I noticed some relatively minor 
things in the neighborhood that could easily be fixed, so I did so by installing 
some followup patches, also attached (the last patch is the only one that's at 
all nontrivial). I'll mark the bug as done. Thanks to both of you for following 
up on this.

[-- Attachment #2: 0001-Rehash-hash-tables-eagerly-after-loading-a-dump.patch --]
[-- Type: text/x-patch, Size: 19345 bytes --]

From 8a33944ee20f57a8d2efe6fe68aa0f4745aa2d03 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@gmail.com>
Date: Tue, 11 Aug 2020 02:16:53 -0700
Subject: [PATCH 1/7] Rehash hash tables eagerly after loading a dump

This simplifies code, and helps performance in some cases (Bug#36597).
* src/lisp.h (hash_rehash_needed_p): Remove.  All uses removed.
(hash_rehash_if_needed): Remove.  All uses removed.
(struct Lisp_Hash_Table): Remove comment about rehashing hash tables.
* src/pdumper.c (thaw_hash_tables): New function.
(hash_table_thaw): New function.
(hash_table_freeze): New function.
(dump_hash_table): Simplify.
(dump_hash_table_list): New function.
(hash_table_contents): New function.
(Fdump_emacs_portable): Handle hash tables by eager rehashing.
(pdumper_load): Restore hash tables.
(init_pdumper_once): New function.
---
 src/bytecode.c  |   1 -
 src/composite.c |   1 -
 src/emacs.c     |   1 +
 src/fns.c       |  65 ++++------------
 src/lisp.h      |  21 +----
 src/minibuf.c   |   3 -
 src/pdumper.c   | 201 +++++++++++++++++++++---------------------------
 src/pdumper.h   |   1 +
 8 files changed, 109 insertions(+), 185 deletions(-)

diff --git a/src/bytecode.c b/src/bytecode.c
index 1913a4812a..1c3b6eac0d 100644
--- a/src/bytecode.c
+++ b/src/bytecode.c
@@ -1401,7 +1401,6 @@ #define DEFINE(name, value) LABEL (name) ,
             Lisp_Object v1 = POP;
             ptrdiff_t i;
             struct Lisp_Hash_Table *h = XHASH_TABLE (jmp_table);
-            hash_rehash_if_needed (h);
 
             /* h->count is a faster approximation for HASH_TABLE_SIZE (h)
                here. */
diff --git a/src/composite.c b/src/composite.c
index f96f0b7772..ec2b8328f7 100644
--- a/src/composite.c
+++ b/src/composite.c
@@ -652,7 +652,6 @@ gstring_lookup_cache (Lisp_Object header)
 composition_gstring_put_cache (Lisp_Object gstring, ptrdiff_t len)
 {
   struct Lisp_Hash_Table *h = XHASH_TABLE (gstring_hash_table);
-  hash_rehash_if_needed (h);
   Lisp_Object header = LGSTRING_HEADER (gstring);
   Lisp_Object hash = h->test.hashfn (header, h);
   if (len < 0)
diff --git a/src/emacs.c b/src/emacs.c
index 8e5eaf5e43..d31fa2cb28 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -1536,6 +1536,7 @@ main (int argc, char **argv)
   if (!initialized)
     {
       init_alloc_once ();
+      init_pdumper_once ();
       init_obarray_once ();
       init_eval_once ();
       init_charset_once ();
diff --git a/src/fns.c b/src/fns.c
index 811d6e8200..41e26104f3 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -4248,50 +4248,28 @@ maybe_resize_hash_table (struct Lisp_Hash_Table *h)
 
 /* Recompute the hashes (and hence also the "next" pointers).
    Normally there's never a need to recompute hashes.
-   This is done only on first-access to a hash-table loaded from
-   the "pdump", because the object's addresses may have changed, thus
+   This is done only on first access to a hash-table loaded from
+   the "pdump", because the objects' addresses may have changed, thus
    affecting their hash.  */
 void
-hash_table_rehash (struct Lisp_Hash_Table *h)
+hash_table_rehash (Lisp_Object hash)
 {
-  ptrdiff_t size = HASH_TABLE_SIZE (h);
-
-  /* These structures may have been purecopied and shared
-     (bug#36447).  */
-  Lisp_Object hash = make_nil_vector (size);
-  h->next = Fcopy_sequence (h->next);
-  h->index = Fcopy_sequence (h->index);
-
+  struct Lisp_Hash_Table *h = XHASH_TABLE (hash);
   /* Recompute the actual hash codes for each entry in the table.
      Order is still invalid.  */
-  for (ptrdiff_t i = 0; i < size; ++i)
+  for (ptrdiff_t i = 0; i < h->count; ++i)
     {
       Lisp_Object key = HASH_KEY (h, i);
-      if (!EQ (key, Qunbound))
-        ASET (hash, i, h->test.hashfn (key, h));
+      Lisp_Object hash_code = h->test.hashfn (key, h);
+      ptrdiff_t start_of_bucket = XUFIXNUM (hash_code) % ASIZE (h->index);
+      set_hash_hash_slot (h, i, hash_code);
+      set_hash_next_slot (h, i, HASH_INDEX (h, start_of_bucket));
+      set_hash_index_slot (h, start_of_bucket, i);
+      eassert (HASH_NEXT (h, i) != i); /* Stop loops.  */
     }
 
-  /* Reset the index so that any slot we don't fill below is marked
-     invalid.  */
-  Ffillarray (h->index, make_fixnum (-1));
-
-  /* Rebuild the collision chains.  */
-  for (ptrdiff_t i = 0; i < size; ++i)
-    if (!NILP (AREF (hash, i)))
-      {
-        EMACS_UINT hash_code = XUFIXNUM (AREF (hash, i));
-        ptrdiff_t start_of_bucket = hash_code % ASIZE (h->index);
-        set_hash_next_slot (h, i, HASH_INDEX (h, start_of_bucket));
-        set_hash_index_slot (h, start_of_bucket, i);
-        eassert (HASH_NEXT (h, i) != i); /* Stop loops.  */
-      }
-
-  /* Finally, mark the hash table as having a valid hash order.
-     Do this last so that if we're interrupted, we retry on next
-     access. */
-  eassert (hash_rehash_needed_p (h));
-  h->hash = hash;
-  eassert (!hash_rehash_needed_p (h));
+  for (ptrdiff_t i = h->count; i < ASIZE (h->next) - 1; i++)
+    set_hash_next_slot (h, i, i + 1);
 }
 
 /* Lookup KEY in hash table H.  If HASH is non-null, return in *HASH
@@ -4303,8 +4281,6 @@ hash_lookup (struct Lisp_Hash_Table *h, Lisp_Object key, Lisp_Object *hash)
 {
   ptrdiff_t start_of_bucket, i;
 
-  hash_rehash_if_needed (h);
-
   Lisp_Object hash_code = h->test.hashfn (key, h);
   if (hash)
     *hash = hash_code;
@@ -4339,8 +4315,6 @@ hash_put (struct Lisp_Hash_Table *h, Lisp_Object key, Lisp_Object value,
 {
   ptrdiff_t start_of_bucket, i;
 
-  hash_rehash_if_needed (h);
-
   /* Increment count after resizing because resizing may fail.  */
   maybe_resize_hash_table (h);
   h->count++;
@@ -4373,8 +4347,6 @@ hash_remove_from_table (struct Lisp_Hash_Table *h, Lisp_Object key)
   ptrdiff_t start_of_bucket = XUFIXNUM (hash_code) % ASIZE (h->index);
   ptrdiff_t prev = -1;
 
-  hash_rehash_if_needed (h);
-
   for (ptrdiff_t i = HASH_INDEX (h, start_of_bucket);
        0 <= i;
        i = HASH_NEXT (h, i))
@@ -4415,8 +4387,7 @@ hash_clear (struct Lisp_Hash_Table *h)
   if (h->count > 0)
     {
       ptrdiff_t size = HASH_TABLE_SIZE (h);
-      if (!hash_rehash_needed_p (h))
-	memclear (xvector_contents (h->hash), size * word_size);
+      memclear (xvector_contents (h->hash), size * word_size);
       for (ptrdiff_t i = 0; i < size; i++)
 	{
 	  set_hash_next_slot (h, i, i < size - 1 ? i + 1 : -1);
@@ -4452,9 +4423,7 @@ sweep_weak_table (struct Lisp_Hash_Table *h, bool remove_entries_p)
   for (ptrdiff_t bucket = 0; bucket < n; ++bucket)
     {
       /* Follow collision chain, removing entries that don't survive
-         this garbage collection.  It's okay if hash_rehash_needed_p
-         (h) is true, since we're operating entirely on the cached
-         hash values. */
+         this garbage collection.  */
       ptrdiff_t prev = -1;
       ptrdiff_t next;
       for (ptrdiff_t i = HASH_INDEX (h, bucket); 0 <= i; i = next)
@@ -4499,7 +4468,7 @@ sweep_weak_table (struct Lisp_Hash_Table *h, bool remove_entries_p)
                     set_hash_hash_slot (h, i, Qnil);
 
                   eassert (h->count != 0);
-                  h->count += h->count > 0 ? -1 : 1;
+                  h->count--;
                 }
 	      else
 		{
@@ -4923,7 +4892,7 @@ DEFUN ("hash-table-count", Fhash_table_count, Shash_table_count, 1, 1, 0,
   (Lisp_Object table)
 {
   struct Lisp_Hash_Table *h = check_hash_table (table);
-  eassert (h->count >= 0);
+
   return make_fixnum (h->count);
 }
 
diff --git a/src/lisp.h b/src/lisp.h
index 17b92a0414..d88038d91b 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -2275,11 +2275,7 @@ #define DEFSYM(sym, name) /* empty */
 
 struct Lisp_Hash_Table
 {
-  /* Change pdumper.c if you change the fields here.
-
-     IMPORTANT!!!!!!!
-
-     Call hash_rehash_if_needed() before accessing.  */
+  /* Change pdumper.c if you change the fields here.  */
 
   /* This is for Lisp; the hash table code does not refer to it.  */
   union vectorlike_header header;
@@ -2398,20 +2394,7 @@ HASH_TABLE_SIZE (const struct Lisp_Hash_Table *h)
   return size;
 }
 
-void hash_table_rehash (struct Lisp_Hash_Table *h);
-
-INLINE bool
-hash_rehash_needed_p (const struct Lisp_Hash_Table *h)
-{
-  return NILP (h->hash);
-}
-
-INLINE void
-hash_rehash_if_needed (struct Lisp_Hash_Table *h)
-{
-  if (hash_rehash_needed_p (h))
-    hash_table_rehash (h);
-}
+void hash_table_rehash (Lisp_Object);
 
 /* Default size for hash tables if not specified.  */
 
diff --git a/src/minibuf.c b/src/minibuf.c
index 9d870ce364..cb302c5a60 100644
--- a/src/minibuf.c
+++ b/src/minibuf.c
@@ -1212,9 +1212,6 @@ DEFUN ("try-completion", Ftry_completion, Stry_completion, 2, 3, 0,
       bucket = AREF (collection, idx);
     }
 
-  if (HASH_TABLE_P (collection))
-    hash_rehash_if_needed (XHASH_TABLE (collection));
-
   while (1)
     {
       /* Get the next element of the alist, obarray, or hash-table.  */
diff --git a/src/pdumper.c b/src/pdumper.c
index 63ee0fcb7f..10dfa8737f 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -105,17 +105,6 @@ #define VM_MS_WINDOWS 2
 # define VM_SUPPORTED 0
 #endif
 
-/* PDUMPER_CHECK_REHASHING being true causes the portable dumper to
-   check, for each hash table it dumps, that the hash table means the
-   same thing after rehashing.  */
-#ifndef PDUMPER_CHECK_REHASHING
-# if ENABLE_CHECKING
-#  define PDUMPER_CHECK_REHASHING 1
-# else
-#  define PDUMPER_CHECK_REHASHING 0
-# endif
-#endif
-
 /* Require an architecture in which pointers, ptrdiff_t and intptr_t
    are the same size and have the same layout, and where bytes have
    eight bits --- that is, a general-purpose computer made after 1990.
@@ -401,6 +390,8 @@ dump_fingerprint (char const *label,
      The start of the cold region is always aligned on a page
      boundary.  */
   dump_off cold_start;
+
+  dump_off hash_list;
 };
 
 /* Double-ended singly linked list.  */
@@ -558,6 +549,8 @@ dump_fingerprint (char const *label,
      heap objects.  */
   Lisp_Object bignum_data;
 
+  Lisp_Object hash_tables;
+
   unsigned number_hot_relocations;
   unsigned number_discardable_relocations;
 };
@@ -2616,78 +2609,62 @@ dump_vectorlike_generic (struct dump_context *ctx,
   return offset;
 }
 
-/* Determine whether the hash table's hash order is stable
-   across dump and load.  If it is, we don't have to trigger
-   a rehash on access.  */
-static bool
-dump_hash_table_stable_p (const struct Lisp_Hash_Table *hash)
+/* Return a vector of KEY, VALUE pairs in the given hash table H.  The
+   first H->count pairs are valid, the rest is left as nil.  */
+static Lisp_Object
+hash_table_contents (struct Lisp_Hash_Table *h)
 {
-  if (hash->test.hashfn == hashfn_user_defined)
+  if (h->test.hashfn == hashfn_user_defined)
     error ("cannot dump hash tables with user-defined tests");  /* Bug#36769 */
-  bool is_eql = hash->test.hashfn == hashfn_eql;
-  bool is_equal = hash->test.hashfn == hashfn_equal;
-  ptrdiff_t size = HASH_TABLE_SIZE (hash);
-  for (ptrdiff_t i = 0; i < size; ++i)
+  Lisp_Object contents = Qnil;
+
+  /* Make sure key_and_value ends up in the same order; charset.c
+     relies on it by expecting hash table indices to stay constant
+     across the dump.  */
+  for (ptrdiff_t i = 0; i < HASH_TABLE_SIZE (h) - h->count; i++)
     {
-      Lisp_Object key =  HASH_KEY (hash, i);
-      if (!EQ (key, Qunbound))
-        {
-	  bool key_stable = (dump_builtin_symbol_p (key)
-			     || FIXNUMP (key)
-			     || (is_equal
-			         && (STRINGP (key) || BOOL_VECTOR_P (key)))
-			     || ((is_equal || is_eql)
-			         && (FLOATP (key) || BIGNUMP (key))));
-          if (!key_stable)
-            return false;
-        }
+      dump_push (&contents, Qnil);
+      dump_push (&contents, Qunbound);
     }
 
-  return true;
+  for (ptrdiff_t i = HASH_TABLE_SIZE (h) - 1; i >= 0; --i)
+    if (!NILP (HASH_HASH (h, i)))
+      {
+	dump_push (&contents, HASH_VALUE (h, i));
+	dump_push (&contents, HASH_KEY (h, i));
+      }
+
+  return CALLN (Fapply, Qvector, contents);
 }
 
-/* Return a list of (KEY . VALUE) pairs in the given hash table.  */
-static Lisp_Object
-hash_table_contents (Lisp_Object table)
+static dump_off
+dump_hash_table_list (struct dump_context *ctx)
 {
-  Lisp_Object contents = Qnil;
-  struct Lisp_Hash_Table *h = XHASH_TABLE (table);
-  for (ptrdiff_t i = 0; i < HASH_TABLE_SIZE (h); ++i)
-    {
-      Lisp_Object key =  HASH_KEY (h, i);
-      if (!EQ (key, Qunbound))
-        dump_push (&contents, Fcons (key, HASH_VALUE (h, i)));
-    }
-  return Fnreverse (contents);
+  if (CONSP (ctx->hash_tables))
+    return dump_object (ctx, CALLN (Fapply, Qvector, ctx->hash_tables));
+  else
+    return 0;
 }
 
-/* Copy the given hash table, rehash it, and make sure that we can
-   look up all the values in the original.  */
 static void
-check_hash_table_rehash (Lisp_Object table_orig)
-{
-  ptrdiff_t count = XHASH_TABLE (table_orig)->count;
-  hash_rehash_if_needed (XHASH_TABLE (table_orig));
-  Lisp_Object table_rehashed = Fcopy_hash_table (table_orig);
-  eassert (!hash_rehash_needed_p (XHASH_TABLE (table_rehashed)));
-  XHASH_TABLE (table_rehashed)->hash = Qnil;
-  eassert (count == 0 || hash_rehash_needed_p (XHASH_TABLE (table_rehashed)));
-  hash_rehash_if_needed (XHASH_TABLE (table_rehashed));
-  eassert (!hash_rehash_needed_p (XHASH_TABLE (table_rehashed)));
-  Lisp_Object expected_contents = hash_table_contents (table_orig);
-  while (!NILP (expected_contents))
-    {
-      Lisp_Object key_value_pair = dump_pop (&expected_contents);
-      Lisp_Object key = XCAR (key_value_pair);
-      Lisp_Object expected_value = XCDR (key_value_pair);
-      Lisp_Object arbitrary = Qdump_emacs_portable__sort_predicate_copied;
-      Lisp_Object found_value = Fgethash (key, table_rehashed, arbitrary);
-      eassert (EQ (expected_value, found_value));
-      Fremhash (key, table_rehashed);
-    }
+hash_table_freeze (struct Lisp_Hash_Table *h)
+{
+  ptrdiff_t nkeys = XFIXNAT (Flength (h->key_and_value)) / 2;
+  h->key_and_value = hash_table_contents (h);
+  h->next_free = (nkeys == h->count ? -1 : h->count);
+  h->index = Flength (h->index);
+  h->next = h->hash = make_fixnum (nkeys);
+}
 
-  eassert (EQ (Fhash_table_count (table_rehashed),
-               make_fixnum (0)));
+static void
+hash_table_thaw (Lisp_Object hash)
+{
+  struct Lisp_Hash_Table *h = XHASH_TABLE (hash);
+  h->index = Fmake_vector (h->index, make_fixnum (-1));
+  h->hash = Fmake_vector (h->hash, Qnil);
+  h->next = Fmake_vector (h->next, make_fixnum (-1));
+
+  hash_table_rehash (hash);
 }
 
 static dump_off
@@ -2699,51 +2676,11 @@ dump_hash_table (struct dump_context *ctx,
 # error "Lisp_Hash_Table changed. See CHECK_STRUCTS comment in config.h."
 #endif
   const struct Lisp_Hash_Table *hash_in = XHASH_TABLE (object);
-  bool is_stable = dump_hash_table_stable_p (hash_in);
-  /* If the hash table is likely to be modified in memory (either
-     because we need to rehash, and thus toggle hash->count, or
-     because we need to assemble a list of weak tables) punt the hash
-     table to the end of the dump, where we can lump all such hash
-     tables together.  */
-  if (!(is_stable || !NILP (hash_in->weak))
-      && ctx->flags.defer_hash_tables)
-    {
-      if (offset != DUMP_OBJECT_ON_HASH_TABLE_QUEUE)
-        {
-	  eassert (offset == DUMP_OBJECT_ON_NORMAL_QUEUE
-		   || offset == DUMP_OBJECT_NOT_SEEN);
-          /* We still want to dump the actual keys and values now.  */
-          dump_enqueue_object (ctx, hash_in->key_and_value, WEIGHT_NONE);
-          /* We'll get to the rest later.  */
-          offset = DUMP_OBJECT_ON_HASH_TABLE_QUEUE;
-          dump_remember_object (ctx, object, offset);
-          dump_push (&ctx->deferred_hash_tables, object);
-        }
-      return offset;
-    }
-
-  if (PDUMPER_CHECK_REHASHING)
-    check_hash_table_rehash (make_lisp_ptr ((void *) hash_in, Lisp_Vectorlike));
-
   struct Lisp_Hash_Table hash_munged = *hash_in;
   struct Lisp_Hash_Table *hash = &hash_munged;
 
-  /* Remember to rehash this hash table on first access.  After a
-     dump reload, the hash table values will have changed, so we'll
-     need to rebuild the index.
-
-     TODO: for EQ and EQL hash tables, it should be possible to rehash
-     here using the preferred load address of the dump, eliminating
-     the need to rehash-on-access if we can load the dump where we
-     want.  */
-  if (hash->count > 0 && !is_stable)
-    /* Hash codes will have to be recomputed anyway, so let's not dump them.
-       Also set `hash` to nil for hash_rehash_needed_p.
-       We could also refrain from dumping the `next' and `index' vectors,
-       except that `next' is currently used for HASH_TABLE_SIZE and
-       we'd have to rebuild the next_free list as well as adjust
-       sweep_weak_hash_table for the case where there's no `index'.  */
-    hash->hash = Qnil;
+  hash_table_freeze (hash);
+  dump_push (&ctx->hash_tables, object);
 
   START_DUMP_PVEC (ctx, &hash->header, struct Lisp_Hash_Table, out);
   dump_pseudovector_lisp_fields (ctx, &out->header, &hash->header);
@@ -4151,6 +4088,19 @@ DEFUN ("dump-emacs-portable",
 	 || !NILP (ctx->deferred_hash_tables)
 	 || !NILP (ctx->deferred_symbols));
 
+  ctx->header.hash_list = ctx->offset;
+  dump_hash_table_list (ctx);
+
+  do
+    {
+      dump_drain_deferred_hash_tables (ctx);
+      dump_drain_deferred_symbols (ctx);
+      dump_drain_normal_queue (ctx);
+    }
+  while (!dump_queue_empty_p (&ctx->dump_queue)
+	 || !NILP (ctx->deferred_hash_tables)
+	 || !NILP (ctx->deferred_symbols));
+
   dump_sort_copied_objects (ctx);
 
   /* While we copy built-in symbols into the Emacs image, these
@@ -5302,6 +5252,9 @@ dump_do_all_emacs_relocations (const struct dump_header *const header,
    NUMBER_DUMP_SECTIONS,
   };
 
+/* Pointer to a stack variable to avoid having to staticpro it.  */
+static Lisp_Object *pdumper_hashes = &zero_vector;
+
 /* Load a dump from DUMP_FILENAME.  Return an error code.
 
    N.B. We run very early in initialization, so we can't use lisp,
@@ -5448,6 +5401,15 @@ pdumper_load (const char *dump_filename)
   for (int i = 0; i < ARRAYELTS (sections); ++i)
     dump_mmap_reset (&sections[i]);
 
+  Lisp_Object hashes = zero_vector;
+  if (header->hash_list)
+    {
+      struct Lisp_Vector *hash_tables =
+	((struct Lisp_Vector *)(dump_base + header->hash_list));
+      XSETVECTOR (hashes, hash_tables);
+    }
+
+  pdumper_hashes = &hashes;
   /* Run the functions Emacs registered for doing post-dump-load
      initialization.  */
   for (int i = 0; i < nr_dump_hooks; ++i)
@@ -5518,6 +5480,19 @@ DEFUN ("pdumper-stats", Fpdumper_stats, Spdumper_stats, 0, 0, 0,
 #endif /* HAVE_PDUMPER */
 
 \f
+static void
+thaw_hash_tables (void)
+{
+  Lisp_Object hash_tables = *pdumper_hashes;
+  for (ptrdiff_t i = 0; i < ASIZE (hash_tables); i++)
+    hash_table_thaw (AREF (hash_tables, i));
+}
+
+void
+init_pdumper_once (void)
+{
+  pdumper_do_now_and_after_load (thaw_hash_tables);
+}
 
 void
 syms_of_pdumper (void)
diff --git a/src/pdumper.h b/src/pdumper.h
index 6a99b511f2..c793fb4058 100644
--- a/src/pdumper.h
+++ b/src/pdumper.h
@@ -256,6 +256,7 @@ pdumper_clear_marks (void)
    file was loaded.  */
 extern void pdumper_record_wd (const char *);
 
+void init_pdumper_once (void);
 void syms_of_pdumper (void);
 
 INLINE_HEADER_END
-- 
2.17.1


[-- Attachment #3: 0002-src-fns.c-hash_table_rehash-Help-the-compiler-a-bit.patch --]
[-- Type: text/x-patch, Size: 1688 bytes --]

From 5c9fa99d3f0cbb2a44d3e0507533f4ab5a13f906 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 11 Aug 2020 02:16:54 -0700
Subject: [PATCH 2/7] * src/fns.c (hash_table_rehash): Help the compiler a bit.

---
 src/fns.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/fns.c b/src/fns.c
index 41e26104f3..9199178212 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -4250,14 +4250,16 @@ maybe_resize_hash_table (struct Lisp_Hash_Table *h)
    Normally there's never a need to recompute hashes.
    This is done only on first access to a hash-table loaded from
    the "pdump", because the objects' addresses may have changed, thus
-   affecting their hash.  */
+   affecting their hashes.  */
 void
 hash_table_rehash (Lisp_Object hash)
 {
   struct Lisp_Hash_Table *h = XHASH_TABLE (hash);
+  ptrdiff_t i, count = h->count;
+
   /* Recompute the actual hash codes for each entry in the table.
      Order is still invalid.  */
-  for (ptrdiff_t i = 0; i < h->count; ++i)
+  for (i = 0; i < count; i++)
     {
       Lisp_Object key = HASH_KEY (h, i);
       Lisp_Object hash_code = h->test.hashfn (key, h);
@@ -4268,7 +4270,8 @@ hash_table_rehash (Lisp_Object hash)
       eassert (HASH_NEXT (h, i) != i); /* Stop loops.  */
     }
 
-  for (ptrdiff_t i = h->count; i < ASIZE (h->next) - 1; i++)
+  ptrdiff_t size = ASIZE (h->next);
+  for (; i + 1 < size; i++)
     set_hash_next_slot (h, i, i + 1);
 }
 
@@ -4892,7 +4895,6 @@ DEFUN ("hash-table-count", Fhash_table_count, Shash_table_count, 1, 1, 0,
   (Lisp_Object table)
 {
   struct Lisp_Hash_Table *h = check_hash_table (table);
-
   return make_fixnum (h->count);
 }
 
-- 
2.17.1


[-- Attachment #4: 0003-src-pdumper.c-pdumper_load-XSETVECTOR-make_lisp_ptr.patch --]
[-- Type: text/x-patch, Size: 1611 bytes --]

From 523b92c3e7b61e5e625101c7ec99273689f7a75a Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 11 Aug 2020 02:16:54 -0700
Subject: [PATCH 3/7] * src/pdumper.c (pdumper_load): XSETVECTOR ->
 make_lisp_ptr.

---
 src/pdumper.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/pdumper.c b/src/pdumper.c
index 10dfa8737f..c38cb2d34f 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -391,6 +391,7 @@ dump_fingerprint (char const *label,
      boundary.  */
   dump_off cold_start;
 
+  /* Offset of a vector of the dumped hash tables.  */
   dump_off hash_list;
 };
 
@@ -549,6 +550,7 @@ dump_fingerprint (char const *label,
      heap objects.  */
   Lisp_Object bignum_data;
 
+  /* List of hash tables that have been dumped.  */
   Lisp_Object hash_tables;
 
   unsigned number_hot_relocations;
@@ -2610,7 +2612,7 @@ dump_vectorlike_generic (struct dump_context *ctx,
 }
 
 /* Return a vector of KEY, VALUE pairs in the given hash table H.  The
-   first H->count pairs are valid, the rest is left as nil.  */
+   first H->count pairs are valid, and the rest are unbound.  */
 static Lisp_Object
 hash_table_contents (struct Lisp_Hash_Table *h)
 {
@@ -5405,8 +5407,8 @@ pdumper_load (const char *dump_filename)
   if (header->hash_list)
     {
       struct Lisp_Vector *hash_tables =
-	((struct Lisp_Vector *)(dump_base + header->hash_list));
-      XSETVECTOR (hashes, hash_tables);
+	(struct Lisp_Vector *) (dump_base + header->hash_list);
+      hashes = make_lisp_ptr (hash_tables, Lisp_Vectorlike);
     }
 
   pdumper_hashes = &hashes;
-- 
2.17.1


[-- Attachment #5: 0004-Don-t-needlessly-convert-to-unsigned-in-pdumper.patch --]
[-- Type: text/x-patch, Size: 5982 bytes --]

From 1d50d5a039ca4d40473c862b021d2ed97279ffe8 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 11 Aug 2020 02:16:54 -0700
Subject: [PATCH 4/7] =?UTF-8?q?Don=E2=80=99t=20needlessly=20convert=20to?=
 =?UTF-8?q?=20=E2=80=98unsigned=E2=80=99=20in=20pdumper?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/pdumper.c (PRIdDUMP_OFF): New macro.
(EMACS_INT_XDIGITS): New constant.
(struct dump_context): Use dump_off for relocation counts.
All uses changed.
(dump_queue_enqueue, dump_queue_dequeue, Fdump_emacs_portable):
Don’t assume counts fit in ‘unsigned’ or ‘unsigned long’.
Use EMACS_INT_XDIGITS instead of assuming it’s 16.
---
 src/pdumper.c | 58 +++++++++++++++++++++++++++------------------------
 1 file changed, 31 insertions(+), 27 deletions(-)

diff --git a/src/pdumper.c b/src/pdumper.c
index c38cb2d34f..6d303af77d 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -141,6 +141,9 @@ divide_round_up (size_t x, size_t y)
 typedef int_least32_t dump_off;
 #define DUMP_OFF_MIN INT_LEAST32_MIN
 #define DUMP_OFF_MAX INT_LEAST32_MAX
+#define PRIdDUMP_OFF PRIdLEAST32
+
+enum { EMACS_INT_XDIGITS = (EMACS_INT_WIDTH + 3) / 4 };
 
 static void ATTRIBUTE_FORMAT ((printf, 1, 2))
 dump_trace (const char *fmt, ...)
@@ -553,8 +556,8 @@ dump_fingerprint (char const *label,
   /* List of hash tables that have been dumped.  */
   Lisp_Object hash_tables;
 
-  unsigned number_hot_relocations;
-  unsigned number_discardable_relocations;
+  dump_off number_hot_relocations;
+  dump_off number_discardable_relocations;
 };
 
 /* These special values for use as offsets in dump_remember_object and
@@ -1007,9 +1010,9 @@ dump_queue_enqueue (struct dump_queue *dump_queue,
   if (NILP (weights))
     {
       /* Object is new.  */
-      dump_trace ("new object %016x weight=%u\n",
-                  (unsigned) XLI (object),
-                  (unsigned) weight.value);
+      EMACS_UINT uobj = XLI (object);
+      dump_trace ("new object %0*"pI"x weight=%d\n", EMACS_INT_XDIGITS, uobj,
+		  weight.value);
 
       if (weight.value == WEIGHT_NONE.value)
         {
@@ -1224,17 +1227,15 @@ dump_queue_dequeue (struct dump_queue *dump_queue, dump_off basis)
 	       + dump_tailq_length (&dump_queue->one_weight_normal_objects)
 	       + dump_tailq_length (&dump_queue->one_weight_strong_objects)));
 
-  bool dump_object_counts = true;
-  if (dump_object_counts)
-    dump_trace
-      ("dump_queue_dequeue basis=%d fancy=%u zero=%u "
-       "normal=%u strong=%u hash=%u\n",
-       basis,
-       (unsigned) dump_tailq_length (&dump_queue->fancy_weight_objects),
-       (unsigned) dump_tailq_length (&dump_queue->zero_weight_objects),
-       (unsigned) dump_tailq_length (&dump_queue->one_weight_normal_objects),
-       (unsigned) dump_tailq_length (&dump_queue->one_weight_strong_objects),
-       (unsigned) XFIXNUM (Fhash_table_count (dump_queue->link_weights)));
+  dump_trace
+    (("dump_queue_dequeue basis=%"PRIdDUMP_OFF" fancy=%"PRIdPTR
+      " zero=%"PRIdPTR" normal=%"PRIdPTR" strong=%"PRIdPTR" hash=%td\n"),
+     basis,
+     dump_tailq_length (&dump_queue->fancy_weight_objects),
+     dump_tailq_length (&dump_queue->zero_weight_objects),
+     dump_tailq_length (&dump_queue->one_weight_normal_objects),
+     dump_tailq_length (&dump_queue->one_weight_strong_objects),
+     XHASH_TABLE (dump_queue->link_weights)->count);
 
   static const int nr_candidates = 3;
   struct candidate
@@ -1307,10 +1308,10 @@ dump_queue_dequeue (struct dump_queue *dump_queue, dump_off basis)
   else
     emacs_abort ();
 
-  dump_trace ("  result score=%f src=%s object=%016x\n",
+  EMACS_UINT uresult = XLI (result);
+  dump_trace ("  result score=%f src=%s object=%0*"pI"x\n",
               best < 0 ? -1.0 : (double) candidates[best].score,
-              src,
-              (unsigned) XLI (result));
+	      src, EMACS_INT_XDIGITS, uresult);
 
   {
     Lisp_Object weights = Fgethash (result, dump_queue->link_weights, Qnil);
@@ -4162,9 +4163,9 @@ DEFUN ("dump-emacs-portable",
      of the dump.  */
   drain_reloc_list (ctx, dump_emit_dump_reloc, emacs_reloc_merger,
 		    &ctx->dump_relocs, &ctx->header.dump_relocs);
-  unsigned number_hot_relocations = ctx->number_hot_relocations;
+  dump_off number_hot_relocations = ctx->number_hot_relocations;
   ctx->number_hot_relocations = 0;
-  unsigned number_discardable_relocations = ctx->number_discardable_relocations;
+  dump_off number_discardable_relocations = ctx->number_discardable_relocations;
   ctx->number_discardable_relocations = 0;
   drain_reloc_list (ctx, dump_emit_dump_reloc, emacs_reloc_merger,
 		    &ctx->object_starts, &ctx->header.object_starts);
@@ -4188,14 +4189,17 @@ DEFUN ("dump-emacs-portable",
   dump_seek (ctx, 0);
   dump_write (ctx, &ctx->header, sizeof (ctx->header));
 
+  dump_off
+    header_bytes = header_end - header_start,
+    hot_bytes = hot_end - hot_start,
+    discardable_bytes = discardable_end - ctx->header.discardable_start,
+    cold_bytes = cold_end - ctx->header.cold_start;
   fprintf (stderr,
 	   ("Dump complete\n"
-	    "Byte counts: header=%lu hot=%lu discardable=%lu cold=%lu\n"
-	    "Reloc counts: hot=%u discardable=%u\n"),
-           (unsigned long) (header_end - header_start),
-           (unsigned long) (hot_end - hot_start),
-           (unsigned long) (discardable_end - ctx->header.discardable_start),
-           (unsigned long) (cold_end - ctx->header.cold_start),
+	    "Byte counts: header=%"PRIdDUMP_OFF" hot=%"PRIdDUMP_OFF
+	    " discardable=%"PRIdDUMP_OFF" cold=%"PRIdDUMP_OFF"\n"
+	    "Reloc counts: hot=%"PRIdDUMP_OFF" discardable=%"PRIdDUMP_OFF"\n"),
+	   header_bytes, hot_bytes, discardable_bytes, cold_bytes,
            number_hot_relocations,
            number_discardable_relocations);
 
-- 
2.17.1


[-- Attachment #6: 0005-In-pdumper-simplify-INT_MAX-computation.patch --]
[-- Type: text/x-patch, Size: 1634 bytes --]

From 597bb393156730e0f68b0b3e80098d977b8dbdb8 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 11 Aug 2020 02:16:54 -0700
Subject: [PATCH 5/7] In pdumper, simplify INT_MAX computation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/pdumper.c (dump_read_all): Avoid unnecessary cast.
Also, round down to page size, as sysdep.c does.
Also, don’t assume INT_MAX <= UINT_MAX (!).
---
 src/pdumper.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/pdumper.c b/src/pdumper.c
index 6d303af77d..fcad5242df 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -5065,14 +5065,13 @@ dump_read_all (int fd, void *buf, size_t bytes_to_read)
 {
   /* We don't want to use emacs_read, since that relies on the lisp
      world, and we're not in the lisp world yet.  */
-  eassert (bytes_to_read <= SSIZE_MAX);
   size_t bytes_read = 0;
   while (bytes_read < bytes_to_read)
     {
-      /* Some platforms accept only int-sized values to read.  */
-      unsigned chunk_to_read = INT_MAX;
-      if (bytes_to_read - bytes_read < chunk_to_read)
-	chunk_to_read = (unsigned) (bytes_to_read - bytes_read);
+      /* Some platforms accept only int-sized values to read.
+         Round this down to a page size (see MAX_RW_COUNT in sysdep.c).  */
+      int max_rw_count = INT_MAX >> 18 << 18;
+      size_t chunk_to_read = min (bytes_to_read - bytes_read, max_rw_count);
       ssize_t chunk = read (fd, (char *) buf + bytes_read, chunk_to_read);
       if (chunk < 0)
         return chunk;
-- 
2.17.1


[-- Attachment #7: 0006-pdumper-speed-tweeks-for-hash-tables.patch --]
[-- Type: text/x-patch, Size: 2685 bytes --]

From ba8bff6bfca9daefd3917d706b94fc55b7d93191 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 11 Aug 2020 02:16:54 -0700
Subject: [PATCH 6/7] pdumper speed tweeks for hash tables

* src/pdumper.c (dump_queue_empty_p): Avoid unnecessary call
to Fhash_table_count on a known hash table.
(dump_hash_table_list): !NILP, not CONSP.
(hash_table_freeze, hash_table_thaw): ASIZE, not Flength, on vectors.
Initialize in same order as struct.
(hash_table_thaw): make_nil_vector, not Fmake_vector with nil.
---
 src/pdumper.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/src/pdumper.c b/src/pdumper.c
index fcad5242df..bc9d197ca2 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -964,11 +964,9 @@ dump_queue_init (struct dump_queue *dump_queue)
 static bool
 dump_queue_empty_p (struct dump_queue *dump_queue)
 {
-  bool is_empty =
-    EQ (Fhash_table_count (dump_queue->sequence_numbers),
-        make_fixnum (0));
-  eassert (EQ (Fhash_table_count (dump_queue->sequence_numbers),
-               Fhash_table_count (dump_queue->link_weights)));
+  ptrdiff_t count = XHASH_TABLE (dump_queue->sequence_numbers)->count;
+  bool is_empty = count == 0;
+  eassert (count == XFIXNAT (Fhash_table_count (dump_queue->link_weights)));
   if (!is_empty)
     {
       eassert (!dump_tailq_empty_p (&dump_queue->zero_weight_objects)
@@ -2643,7 +2641,7 @@ hash_table_contents (struct Lisp_Hash_Table *h)
 static dump_off
 dump_hash_table_list (struct dump_context *ctx)
 {
-  if (CONSP (ctx->hash_tables))
+  if (!NILP (ctx->hash_tables))
     return dump_object (ctx, CALLN (Fapply, Qvector, ctx->hash_tables));
   else
     return 0;
@@ -2652,20 +2650,20 @@ dump_hash_table_list (struct dump_context *ctx)
 static void
 hash_table_freeze (struct Lisp_Hash_Table *h)
 {
-  ptrdiff_t nkeys = XFIXNAT (Flength (h->key_and_value)) / 2;
+  ptrdiff_t npairs = ASIZE (h->key_and_value) / 2;
   h->key_and_value = hash_table_contents (h);
-  h->next_free = (nkeys == h->count ? -1 : h->count);
-  h->index = Flength (h->index);
-  h->next = h->hash = make_fixnum (nkeys);
+  h->next = h->hash = make_fixnum (npairs);
+  h->index = make_fixnum (ASIZE (h->index));
+  h->next_free = (npairs == h->count ? -1 : h->count);
 }
 
 static void
 hash_table_thaw (Lisp_Object hash)
 {
   struct Lisp_Hash_Table *h = XHASH_TABLE (hash);
-  h->index = Fmake_vector (h->index, make_fixnum (-1));
-  h->hash = Fmake_vector (h->hash, Qnil);
+  h->hash = make_nil_vector (XFIXNUM (h->hash));
   h->next = Fmake_vector (h->next, make_fixnum (-1));
+  h->index = Fmake_vector (h->index, make_fixnum (-1));
 
   hash_table_rehash (hash);
 }
-- 
2.17.1


[-- Attachment #8: 0007-pdumper-avoid-listing-hash-table-contents.patch --]
[-- Type: text/x-patch, Size: 1807 bytes --]

From 88d3e15f47b675d8d3fc922eb5a6ff7df8295b34 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 11 Aug 2020 02:16:54 -0700
Subject: [PATCH 7/7] pdumper avoid listing hash table contents

* src/pdumper.c (hash_table_contents): Create a vector directly,
instead of creating a list and then converting that to a vector.
---
 src/pdumper.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/src/pdumper.c b/src/pdumper.c
index bc9d197ca2..94921dc9ea 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -2617,25 +2617,28 @@ hash_table_contents (struct Lisp_Hash_Table *h)
 {
   if (h->test.hashfn == hashfn_user_defined)
     error ("cannot dump hash tables with user-defined tests");  /* Bug#36769 */
-  Lisp_Object contents = Qnil;
+
+  ptrdiff_t size = HASH_TABLE_SIZE (h);
+  Lisp_Object key_and_value = make_uninit_vector (2 * size);
+  ptrdiff_t n = 0;
 
   /* Make sure key_and_value ends up in the same order; charset.c
      relies on it by expecting hash table indices to stay constant
      across the dump.  */
-  for (ptrdiff_t i = 0; i < HASH_TABLE_SIZE (h) - h->count; i++)
-    {
-      dump_push (&contents, Qnil);
-      dump_push (&contents, Qunbound);
-    }
-
-  for (ptrdiff_t i = HASH_TABLE_SIZE (h) - 1; i >= 0; --i)
+  for (ptrdiff_t i = 0; i < size; i++)
     if (!NILP (HASH_HASH (h, i)))
       {
-	dump_push (&contents, HASH_VALUE (h, i));
-	dump_push (&contents, HASH_KEY (h, i));
+	ASET (key_and_value, n++, HASH_KEY (h, i));
+	ASET (key_and_value, n++, HASH_VALUE (h, i));
       }
 
-  return CALLN (Fapply, Qvector, contents);
+  while (n < 2 * size)
+    {
+      ASET (key_and_value, n++, Qunbound);
+      ASET (key_and_value, n++, Qnil);
+    }
+
+  return key_and_value;
 }
 
 static dump_off
-- 
2.17.1


  reply	other threads:[~2020-08-11  9:33 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-11 14:05 bug#36597: 27.0.50; rehash hash tables eagerly in pdumper Pip Cet
2019-07-14 14:39 ` Paul Eggert
2019-07-14 15:01   ` Pip Cet
2019-07-14 15:49     ` Paul Eggert
2019-07-14 16:54       ` Pip Cet
2019-07-15 14:39         ` Pip Cet
2019-07-19  7:23           ` Pip Cet
2019-07-19  7:46             ` Eli Zaretskii
2019-07-20 12:38               ` Pip Cet
2019-07-21  3:18                 ` Paul Eggert
2019-07-21  5:34                   ` Pip Cet
2019-07-21  6:32                     ` Paul Eggert
2019-07-21  6:32                     ` Pip Cet
2020-08-09 19:27                       ` Lars Ingebrigtsen
2020-08-10 11:51                         ` Pip Cet
2020-08-10 13:04                           ` Lars Ingebrigtsen
2020-08-11  9:33                             ` Paul Eggert [this message]
2020-08-11  9:40                               ` Pip Cet
2020-08-11 11:50                               ` Lars Ingebrigtsen
2020-08-11 14:52                               ` Eli Zaretskii
2020-08-11 15:30                                 ` Paul Eggert
2020-08-11 17:00                                   ` Eli Zaretskii
2020-08-11 18:11                                     ` Paul Eggert
2020-08-11 18:35                                       ` Eli Zaretskii
2020-08-11 18:55                                         ` Eli Zaretskii
2020-08-11 23:43                                         ` Paul Eggert
2020-08-12 14:10                                           ` Eli Zaretskii
2020-08-12 14:46                                             ` Eli Zaretskii
2020-08-12 19:11                                             ` Paul Eggert
2020-08-12 19:28                                               ` Eli Zaretskii
2020-08-12 20:41                                                 ` Andy Moreton
2020-08-11 15:59                                 ` Pip Cet
2020-08-11 17:00                                   ` Eli Zaretskii
2020-08-11 17:31                                     ` Paul Eggert
2020-08-11 18:27                                       ` Andy Moreton
2020-08-11 18:32                                       ` Eli Zaretskii
2019-07-18  5:39   ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c1aad0e7-6416-2327-24a2-fc5c916a87b8@cs.ucla.edu \
    --to=eggert@cs.ucla.edu \
    --cc=36597-done@debbugs.gnu.org \
    --cc=larsi@gnus.org \
    --cc=pipcet@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).