On Fri, Jul 5, 2019 at 8:41 AM Eli Zaretskii wrote:
>
> > From: Pip Cet
> > Date: Fri, 5 Jul 2019 08:36:57 +0000
> > Cc: michael_heerdegen@web.de, npostavs@gmail.com, 36447@debbugs.gnu.org
> >
> > > > I don't think we can sensibly add tests for this bug, but the fix I
> > > > posted earlier still seems valid to me.
> > >
> > > Sorry, I'm not tracking this part of the discussion, as it lost me
> > > long ago.
> >
> > What's the best way of getting this fixed?
>
> Sorry, I don't think I know what "this bug" is about,

The bug:

Building emacs with "-O0 -g3 -ggdb" on current Linux will result in
binaries that sometimes, depending on the precise compiler version
used, will fail weirdly if you evaluate this emacs -Q recipe line by
line:

  (custom-handle-keyword nil :group nil nil)
  (y-or-n-p "prompt")
  (custom-handle-keyword nil :group nil nil)

The error produced will be "unknown keyword :group", which is
nonsensical as :group is indeed a valid keyword.

The analysis:

It's not the byte code, which is fine and looks like this:

  byte code for custom-handle-keyword:
    doc:  For customization option SYMBOL, handle KEYWORD with VALUE. ...
    args: (arg1 arg2 arg3 arg4)
  0       varref    purify-flag
  1       goto-if-nil 1
  4       constant  purecopy
  5       stack-ref 2
  6       call      1
  7       stack-set 2
  9:1     stack-ref 2
  10      constant
  11      switch
  12      goto      9
  ...
  52:9    constant  error
  53      constant  "Unknown keyword %s"
  54      stack-ref 4
  55      call      2
  56      return

Note that the code uses a jump table, which is a hash table mapping
keys to integers for the "switch" op. This is where hash tables come
in.

We can inspect the hash table `custom-handle-keyword' uses by
evaluating

  (aref (aref (symbol-function #'custom-handle-keyword) 2) 2)

The hash table prints fine. But investigating its C in-memory
representation, we find that the hash collision chains, stored in the
->next vector, are corrupted. It turns out that this is because
hash_table_rehash was called on a different hash table which had the
same ->next vector, but different contents.
That's the problem I fixed: for reasons explained below, we sometimes
see two hash tables with the same ->next vector, then try to rehash
both of them, obtaining different results. Last caller wins, first
caller gets the corruption (each hash table is rehashed at most once).

The reason is this: when a hash table is purecopied, its ->next vector
is purecopied, which merges it with another, similar, hash table's
->next vector if purify-flag is a (third) hash table. The vectors only
need to be `equal' to be merged, but after merging the pure copies are
actually `eq', that is, one and the same object. This worked fine with
the old dumper, because we never modified pure storage. However, with
the current pdumper code, we have to do that.

The (disappointingly trivial) fix: call copy-sequence on h->next
before rehashing the table. This will make h->next impure, which is
good since we're going to modify it. While we're there, do the same
for the other vectors used in the hash table representation, except
for h->key_and_value, which we need not touch.

> and how is the issue with hash tables relevant.

The bytecode is executing incorrectly because it relies on a
purecopied hash table, which is effectively part of the compiled
function. The hash table has become corrupted.
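
To make the aliasing problem and the fix more concrete, here is a
minimal, self-contained C sketch. It is a toy model, not Emacs's
actual struct Lisp_Hash_Table or hash_table_rehash: the struct layout,
the names (struct table, private_next, first_table_intact, demo) and
the trivial hash function are all assumptions made for illustration.
Two tables with different keys start out pointing at one shared
chain-link array (standing in for the merged pure ->next vector), and
each is rehashed once, either in place or after taking a private copy.

```c
#include <assert.h>
#include <string.h>

#define SIZE 4     /* entries per table (toy value) */
#define BUCKETS 2  /* hash buckets per table (toy value) */

/* Toy model only; NOT Emacs's struct Lisp_Hash_Table.  */
struct table
{
  int keys[SIZE];          /* stored keys */
  int index[BUCKETS];      /* first entry of each bucket's chain, or -1 */
  int *next;               /* next[i]: entry after i in its chain, or -1 */
  int private_next[SIZE];  /* storage used when next is un-shared */
};

static int
hash (int key)
{
  return key % BUCKETS;    /* keys are non-negative in this sketch */
}

/* Rebuild index and next from keys.  If COPY_FIRST is nonzero, switch
   to a private copy of next before writing to it (the fix).  */
static void
rehash (struct table *t, int copy_first)
{
  if (copy_first)
    {
      memcpy (t->private_next, t->next, sizeof t->private_next);
      t->next = t->private_next;
    }
  for (int b = 0; b < BUCKETS; b++)
    t->index[b] = -1;
  for (int i = 0; i < SIZE; i++)
    {
      int b = hash (t->keys[i]);
      t->next[i] = t->index[b];   /* push entry i onto bucket b's chain */
      t->index[b] = i;
    }
}

/* Return the entry number of KEY, or -1 if its chain lacks it.  The
   step limit guards against cycles in a corrupted chain.  */
static int
lookup (const struct table *t, int key)
{
  int steps = 0;
  for (int i = t->index[hash (key)]; i >= 0 && steps < SIZE;
       i = t->next[i], steps++)
    if (t->keys[i] == key)
      return i;
  return -1;
}

/* Rehash two tables that initially share one next array.  Return 1 if
   every key of the first table can still be found afterwards.  */
static int
first_table_intact (int copy_first)
{
  int shared[SIZE] = { -1, -1, -1, -1 };
  struct table a = { .keys = { 0, 2, 4, 6 }, .next = shared };
  struct table b = { .keys = { 1, 2, 3, 4 }, .next = shared };

  rehash (&a, copy_first);
  rehash (&b, copy_first);  /* last caller wins if next is shared */

  for (int i = 0; i < SIZE; i++)
    if (lookup (&a, a.keys[i]) != i)
      return 0;
  return 1;
}

/* Exercise both variants.  */
static void
demo (void)
{
  assert (!first_table_intact (0));  /* shared links: first table breaks */
  assert (first_table_intact (1));   /* private copies: lookups succeed */
}
```

Calling demo () from a main function runs both variants. In the
shared case the second rehash overwrites the chains that the first
table's index still points into, so some of its keys are no longer
reachable, which is the toy analogue of the "switch" op missing :group.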