unofficial mirror of bug-gnu-emacs@gnu.org 
From: Pip Cet <pipcet@gmail.com>
To: Noam Postavsky <npostavs@gmail.com>
Cc: Michael Heerdegen <michael_heerdegen@web.de>, 36447@debbugs.gnu.org
Subject: bug#36447: 27.0.50; New "Unknown keyword" errors
Date: Wed, 3 Jul 2019 11:57:52 +0000
Message-ID: <CAOqdjBd1qR4=BQrCjrXjoh1VFLFpDKdmYqHih1MuxpEZYG6FXA@mail.gmail.com>
In-Reply-To: <CAOqdjBeZgdZemt0H9xqnVcMhugD7nAHPVap_B42z3LWOGf5ZNQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1810 bytes --]

On Tue, Jul 2, 2019 at 10:50 PM Pip Cet <pipcet@gmail.com> wrote:
> > The compiler translates repeated `eq' in a cond like that into a hash
> > and jump.  See byte-compile-cond-use-jump-table.
>
> I think I found the problem. It's a bit tricky.

I'm attaching a patch which modifies standard Emacs in what should be
harmless ways, but demonstrates the bug.

As far as I can tell, this is a serious issue that will pop up
seemingly at random. We really should fix it.

However, hash_table_rehash turns out to be called less often than I
had thought, so the fix should be simple: just copy-sequence a hash
table's ->next, ->hash, and ->index vectors before rehashing.
(->key_and_value isn't actually modified, so it's safe to share
between (read-only) hash tables.)

What's happening is this:

Purecopied hash tables share structure, particularly the ->next
vector, with similar but different purecopied hash tables. Some
purecopied hash tables dumped with pdumper are rehashed upon first
access after loading the dump (in the example, these are the tables
which map 'dummy-symbol to t), while others are not (the tables with
just fixnums as keys). If a hash table is rehashed, its ->next vector
will change to reflect the changed hash (of 'dummy-symbol); however,
the non-rehashed table that shares the ->next vector will be confused:
its key_and_value array will stay the same, and valid, but the ->next
vector will no longer match it. In practice, this means (gethash
valid-key hash-table) will return nil even though the key is valid.
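
The failure mode can be reproduced outside Emacs with a toy model. The
sketch below is not the fns.c code: struct toy_table, the salt field
(standing in for a symbol whose hash changes after the dump is loaded),
and all function names are hypothetical. Two tables share their index
and next vectors; rehashing one of them in place breaks lookups in the
other, while copying the vectors first (the approach of the second
attached patch) leaves the other table intact.

```c
#include <assert.h>
#include <string.h>

#define NBUCKETS 4
#define NENTRIES 4

/* Toy stand-in for struct Lisp_Hash_Table (hypothetical layout).  */
struct toy_table {
  int keys[NENTRIES];       /* like ->key_and_value: untouched by a rehash */
  int nkeys;
  int salt;                 /* models a hash that changes after loading */
  int *index;               /* bucket -> first entry, or -1; may be shared */
  int *next;                /* entry -> next entry in bucket, or -1; may be shared */
  int own_index[NBUCKETS];  /* private storage used once the table is unshared */
  int own_next[NENTRIES];
};

static int toy_hash(const struct toy_table *t, int key)
{
  return (int) ((unsigned) (key + t->salt) % NBUCKETS);
}

/* Rebuild the bucket chains in whatever vectors the table points at.  */
static void toy_rehash(struct toy_table *t)
{
  for (int b = 0; b < NBUCKETS; b++)
    t->index[b] = -1;
  for (int i = 0; i < t->nkeys; i++) {
    int b = toy_hash(t, t->keys[i]);
    t->next[i] = t->index[b];
    t->index[b] = i;
  }
}

/* Give the table private copies of its vectors before they are modified.  */
static void toy_unshare(struct toy_table *t)
{
  memcpy(t->own_index, t->index, sizeof t->own_index);
  memcpy(t->own_next, t->next, sizeof t->own_next);
  t->index = t->own_index;
  t->next = t->own_next;
}

static int toy_lookup(const struct toy_table *t, int key)
{
  for (int i = t->index[toy_hash(t, key)]; i >= 0; i = t->next[i])
    if (t->keys[i] == key)
      return 1;
  return 0;
}

/* Returns 1 if table B can still find its key 5 after table A is rehashed.  */
int demo_shared_vectors(int copy_before_rehash)
{
  int shared_index[NBUCKETS] = { -1, -1, -1, -1 };
  int shared_next[NENTRIES] = { -1, -1, -1, -1 };
  struct toy_table a = { .keys = { 0, 1 }, .nkeys = 2, .salt = 0,
                         .index = shared_index, .next = shared_next };
  struct toy_table b = { .keys = { 4, 5 }, .nkeys = 2, .salt = 0,
                         .index = shared_index, .next = shared_next };

  /* "Dump time": one set of vectors happens to describe both tables.  */
  toy_rehash(&a);
  assert(toy_lookup(&a, 1) && toy_lookup(&b, 5));

  /* "After loading": A's keys hash differently, so A is rehashed.  */
  a.salt = 1;
  if (copy_before_rehash)
    toy_unshare(&a);
  toy_rehash(&a);
  assert(toy_lookup(&a, 1));   /* A is consistent either way */

  return toy_lookup(&b, 5);    /* B is consistent only if A copied first */
}
```

Calling demo_shared_vectors(0) models the bug: B's keys are unchanged,
but the shared vectors now describe A's new layout, so the lookup
fails. Calling demo_shared_vectors(1) models the copy-before-rehash
fix.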

While the attached patch appears to work, I would prefer simply
dumping hash tables with nil values for ->next, ->hash, and ->index,
and rebuilding the entire hash table upon first access. This would
also allow us to switch to secure randomized hashes should the need
arise.
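
As a rough illustration of that alternative, the following C sketch
keeps only the keys in the "dumped" table and rebuilds the lookup
vectors on first access. It is a toy model under stated assumptions,
not a proposal for the actual fns.c layout: struct lazy_table and the
lazy_* functions are hypothetical names, and NULL pointers play the
role of the nil vectors.

```c
#include <assert.h>
#include <stdlib.h>

#define LAZY_BUCKETS 4

/* Hypothetical toy table: only the keys are dumped; the lookup
   vectors start out NULL and are built on first access.  */
struct lazy_table {
  const int *keys;  /* may be shared: never modified */
  int nkeys;
  int *index;       /* bucket -> first entry, or -1; NULL until built */
  int *next;        /* entry -> next entry in bucket, or -1; NULL until built */
};

static int lazy_hash(int key)
{
  return (int) ((unsigned) key % LAZY_BUCKETS);
}

/* Allocate private vectors and rebuild the chains from the keys.  */
static void lazy_ensure_built(struct lazy_table *t)
{
  if (t->index)
    return;
  t->index = malloc(LAZY_BUCKETS * sizeof *t->index);
  t->next = malloc((t->nkeys ? t->nkeys : 1) * sizeof *t->next);
  if (!t->index || !t->next)
    abort();
  for (int b = 0; b < LAZY_BUCKETS; b++)
    t->index[b] = -1;
  for (int i = 0; i < t->nkeys; i++) {
    int b = lazy_hash(t->keys[i]);
    t->next[i] = t->index[b];
    t->index[b] = i;
  }
}

int lazy_lookup(struct lazy_table *t, int key)
{
  lazy_ensure_built(t);
  for (int i = t->index[lazy_hash(key)]; i >= 0; i = t->next[i])
    if (t->keys[i] == key)
      return 1;
  return 0;
}

/* Two tables over the same key array end up with separate vectors.  */
int lazy_demo(void)
{
  static const int keys[] = { 1, 5, 16 };
  struct lazy_table a = { keys, 3, NULL, NULL };
  struct lazy_table b = { keys, 3, NULL, NULL };
  int ok = lazy_lookup(&a, 5) && lazy_lookup(&b, 16)
           && !lazy_lookup(&a, 2) && a.index != b.index;
  free(a.index); free(a.next);
  free(b.index); free(b.next);
  return ok;
}
```

Because each table builds its own vectors when first used, nothing
mutable is shared, and the hash function could differ between
sessions without invalidating the dumped data.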

[-- Attachment #2: 0001-Test-bug-36447.patch --]
[-- Type: text/x-patch, Size: 2007 bytes --]

From 8fe464dbb030a722ca173c442906482ef210f46e Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@gmail.com>
Date: Wed, 3 Jul 2019 11:27:50 +0000
Subject: [PATCH 1/2] Test bug#36447

---
 lisp/custom.el | 10 ++++++++++
 src/puresize.h |  2 +-
 test-custom.el | 12 ++++++++++++
 3 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 test-custom.el

diff --git a/lisp/custom.el b/lisp/custom.el
index 736460fec7..c49321b829 100644
--- a/lisp/custom.el
+++ b/lisp/custom.el
@@ -35,6 +35,16 @@
 
 (require 'widget)
 
+(defvar hash-tables (make-vector 256 nil))
+
+(dotimes (i 256)
+  (let ((ht (make-hash-table :purecopy t :size 2 :test 'eq)))
+    (when (= (% i 3) 0)
+      (puthash 'dummy-symbol t ht))
+    (dotimes (j 16)
+      (puthash (logand i (lsh 1 j)) t ht))
+    (aset hash-tables i (purecopy ht))))
+
 (defvar custom-define-hook nil
   ;; Customize information for this option is in `cus-edit.el'.
   "Hook called after defining each customize option.")
diff --git a/src/puresize.h b/src/puresize.h
index f5fad8b42b..d29e3e80df 100644
--- a/src/puresize.h
+++ b/src/puresize.h
@@ -47,7 +47,7 @@ #define SITELOAD_PURESIZE_EXTRA 0
 #endif
 
 #ifndef BASE_PURESIZE
-#define BASE_PURESIZE (2000000 + SYSTEM_PURESIZE_EXTRA + SITELOAD_PURESIZE_EXTRA)
+#define BASE_PURESIZE (80000000 + SYSTEM_PURESIZE_EXTRA + SITELOAD_PURESIZE_EXTRA)
 #endif
 
 /* Increase BASE_PURESIZE by a ratio depending on the machine's word size.  */
diff --git a/test-custom.el b/test-custom.el
new file mode 100644
index 0000000000..b871d30cfa
--- /dev/null
+++ b/test-custom.el
@@ -0,0 +1,12 @@
+(dotimes (i 256)
+  (let ((ht (aref hash-tables i)))
+    (dotimes (j 16)
+      (unless (eq (/= 0 (logand (lsh 1 j) i))
+                  (gethash (lsh 1 j) ht))
+        (error "hash table corruption at table %S, bit %S" i j)))))
+
+(aref hash-tables 17)
+;; #s(hash-table size 3 test eq rehash-size 1.5 rehash-threshold 0.8125 purecopy t data (1 t 0 t 16 t))
+
+(gethash 1 (aref hash-tables 17))
+;; nil
-- 
2.20.1


[-- Attachment #3: 0002-Don-t-alter-shared-structure-in-dumped-purecopied-ha.patch --]
[-- Type: text/x-patch, Size: 964 bytes --]

From 70419f630f60919c8645a10aeef8d299f5098ff5 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@gmail.com>
Date: Wed, 3 Jul 2019 11:48:22 +0000
Subject: [PATCH 2/2] Don't alter shared structure in dumped purecopied hash
 tables.

* src/fns.c (hash_table_rehash): Make sure we're operating on
fresh copies of ->next, ->index, ->hash.
---
 src/fns.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/fns.c b/src/fns.c
index 2fc000a7f4..44d2de523a 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -4223,6 +4223,12 @@ hash_table_rehash (struct Lisp_Hash_Table *h)
 {
   ptrdiff_t size = HASH_TABLE_SIZE (h);
 
+  /* These structures may have been purecopied and shared
+     (bug#36447).  */
+  h->next = Fcopy_sequence (h->next);
+  h->index = Fcopy_sequence (h->index);
+  h->hash = Fcopy_sequence (h->hash);
+
   /* Recompute the actual hash codes for each entry in the table.
      Order is still invalid.  */
   for (ptrdiff_t i = 0; i < size; ++i)
-- 
2.20.1

