bug#36447: 27.0.50; New "Unknown keyword" errors

unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed

From: "Daniel Colascione" <dancol@dancol.org>
To: "Eli Zaretskii" <eliz@gnu.org>
Cc: michael_heerdegen@web.de, npostavs@gmail.com,
	Stefan Monnier <monnier@iro.umontreal.ca>,
	pipcet@gmail.com, 36447@debbugs.gnu.org
Subject: bug#36447: 27.0.50; New "Unknown keyword" errors
Date: Tue, 9 Jul 2019 20:19:38 -0700	[thread overview]
Message-ID: <735465eebd8f96d3969d42e6f734f69e.squirrel@dancol.org> (raw)
In-Reply-To: <83ef2y4nsm.fsf@gnu.org>

>> From: Stefan Monnier <monnier@iro.umontreal.ca>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  michael_heerdegen@web.de,
>> npostavs@gmail.com,  36447@debbugs.gnu.org
>> Date: Tue, 09 Jul 2019 17:05:53 -0400
>>
>> I think we should get Daniel's opinion on this.
>
> Daniel, could you please comment on the issues discussed in this bug?

Thanks for debugging this problem. It really is nasty. AIUI, the problem
is that hash-consing the hash vectors might unify vectors that happen to
have the same contents under one hashing scheme but different contents
under a different hashing scheme, so when we rehash lazily, we correctly
rehash one hash table and corrupt a different hash table's index array by
side effect. There are two proposed solutions:

1) Copy the hash table internal vectors on rehash, and
2) "Freeze" hash tables by eliminating the index arrays and rebuilding
them all eagerly on dump start.

#1 works, but it's somewhat inefficient.

#2 is a variant of #1, in a sense. Instead of copying the hash table
vectors and mutating them, we rebuild them from scratch. I don't
understand why we have to do that eagerly.

#1 isn't as bad as you might think. The existing hash table rehashing
stuff is inefficient anyway. Suppose we dump a hash table, load a dump,
and access the hash table. Right now, we do the rehashing and take COW
faults for the arrays we mutate. So far, so good. What happens if we grow
the hash table past its load factor limit? We allocate new vectors and
rehash into those, forgetting the old vectors. In the non-pdumper case, GC
will collect those older vectors eventually. In the pdumper case, those
COWed pages will stick around in memory forever, unused. I don't think it
counts as a "leak", since the memory waste is bounded, but it's still
memory waste.

If we use approach #1, we don't mutate the hash table pages mapped from
the dump. Besides, the amount of work we do is actually the same. In the
COW-page case, the kernel allocates a new page, reads the dump bytes, and
writes them to the new page. In the Fcopy_sequence case, *we* allocate a
new vector, read the dump bytes, and write them into anonymous memory.
It's the same work either way, except that if we copy, when we grow the
hash table, we can actually free the original vectors.

IMHO, the right approach is to check in #1 for now and switch to a #2-like
approach once master is stable. Noticing that we don't actually have to
store the hash table internal arrays in the dump is a good catch.

next prev parent reply	other threads:[~2019-07-10  3:19 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-30 18:23 bug#36447: 27.0.50; New "Unknown keyword" errors Michael Heerdegen
2019-06-30 18:43 ` Eli Zaretskii
2019-06-30 21:44   ` Michael Heerdegen
2019-07-01 12:25     ` Noam Postavsky
2019-07-01 13:20       ` Pip Cet
2019-07-01 22:04       ` Michael Heerdegen
2019-07-02  1:59         ` Stefan Kangas
2019-07-02 14:17           ` Eli Zaretskii
2019-07-02 13:29         ` Pip Cet
2019-07-02 15:35           ` Michael Heerdegen
2019-07-02 16:20             ` Noam Postavsky
2019-07-02 22:50               ` Pip Cet
2019-07-03 11:57                 ` Pip Cet
2019-07-05  1:59                   ` Michael Heerdegen
2019-07-05  6:35                     ` Pip Cet
2019-07-05  7:50                       ` Eli Zaretskii
2019-07-05  8:12                         ` Pip Cet
2019-07-05  8:25                           ` Eli Zaretskii
2019-07-05  8:36                             ` Pip Cet
2019-07-05  8:41                               ` Eli Zaretskii
2019-07-05  9:09                                 ` Pip Cet
2019-07-05 12:23                                   ` Robert Pluim
2019-07-05 12:33                                   ` Eli Zaretskii
2019-07-05 13:41                                     ` Pip Cet
2019-07-05 18:00                                     ` Stefan Monnier
2019-07-05 18:07                                       ` Eli Zaretskii
2019-07-05 20:16                                         ` Stefan Monnier
2019-07-05 18:57                                       ` Pip Cet
2019-07-05 19:13                                         ` Eli Zaretskii
2019-07-05 20:21                                         ` Stefan Monnier
2019-07-05 21:52                                           ` Pip Cet
2019-07-05 22:10                                             ` Stefan Monnier
2019-07-06  6:45                                               ` Eli Zaretskii
2019-07-06 15:08                                                 ` Pip Cet
2019-07-09 21:05                                                   ` Stefan Monnier
2019-07-10  2:38                                                     ` Eli Zaretskii
2019-07-10  3:19                                                       ` Daniel Colascione [this message]
2019-07-10 15:01                                                         ` Pip Cet
2019-07-10 17:16                                                           ` Daniel Colascione
2019-07-10 20:14                                                             ` Pip Cet
2019-07-06 15:32                                             ` Michael Heerdegen
2019-07-08 17:30                                               ` Lars Ingebrigtsen
2019-07-08 17:58                                                 ` Pip Cet
2019-07-08 22:18                                                   ` Lars Ingebrigtsen
2019-07-08 22:25                                                     ` Noam Postavsky
2019-07-09 14:00                                                       ` Pip Cet
2019-07-10  3:01                                                         ` Daniel Colascione
2019-07-14 14:06                                                           ` Noam Postavsky
2019-07-08 23:22                                                     ` Stefan Monnier
2019-07-08 22:23                                                   ` Michael Heerdegen
2019-07-09 15:43                                                     ` Eli Zaretskii
2019-07-09 20:15                                                   ` Stefan Monnier
2019-07-05  7:55                       ` Katsumi Yamaoka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=735465eebd8f96d3969d42e6f734f69e.squirrel@dancol.org \
    --to=dancol@dancol.org \
    --cc=36447@debbugs.gnu.org \
    --cc=eliz@gnu.org \
    --cc=michael_heerdegen@web.de \
    --cc=monnier@iro.umontreal.ca \
    --cc=npostavs@gmail.com \
    --cc=pipcet@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).