On Sat, Jul 6, 2019 at 6:45 AM Eli Zaretskii wrote:
> > > Indeed. I'm attaching a proof of concept that we can simply freeze the
> > > hash tables when dumping and thaw them when loading a dump, which
> > > includes rehashing. Do you happen to know why it wasn't done that way?
> >
> > I'd guess it was to shorten the startup time by doing this rehashing lazily.
>
> The function pdumper-stats will show the time it took to load Emacs,
> so the effect of this on the load time can be measured

I'm measuring it directly, and it's more than I expected: about a millisecond, for 4,300 hash table entries. What we can't easily measure is how much the lazy rehashing code would slow us down anyway.

For comparison, the entire time stored in pdumper-stats is 15 ms here. I don't think the extra millisecond is significant, because we'd probably end up rehashing most of the large hash tables anyway.

We're saving some 250 KB of space in the pdmp image, which was previously used for redundant information. (I'm surprised it's that much, but I guess pdumper relocations are fairly large?)

I'm attaching a revised patch, which uses vectors rather than consed lists, both for the key_and_value data (avoiding a copy in the common case where there is more than one hash table entry) and for the list of hash tables. It still contains debugging/timing code.

charset.c currently assumes hash table entries will stay at the same index in Vcharset_hash_table. I think that works okay in practice, because we don't shrink or reorder hash tables, but it was still a bit of a nasty surprise.

This concept appears to work: modify pdumper to special-case hash tables and freeze/thaw them properly. You probably shouldn't dump hash tables with complicated user-defined hash functions.

Both PURE_P and pdumper_object_p fail to distinguish between tables that were pure or impure before being dumped.
The patch also fixes the bug that (hash-table-count dumped-hash-table) returns a negative number if the hash table has not been accessed since the dump was loaded, though we could fix that directly... Of course, we're still modifying purecopied information.
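
For readers who want the freeze/thaw idea in isolation, here is a minimal standalone sketch in C. It is not the attached patch and does not use Emacs's Lisp_Hash_Table: the struct toy_table, its fields, and the toy_* functions are all hypothetical names made up for illustration, keys and values are plain ints, and entries are never removed. A frozen table keeps only the flat key/value vector and the entry count; thawing rebuilds the bucket index and collision chains by rehashing every key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy table, loosely modelled on the idea in this thread.
   While frozen, INDEX and NEXT are NULL and only KEY_AND_VALUE and
   COUNT carry information.  */
struct toy_table
{
  ptrdiff_t count;       /* number of entries; valid frozen or thawed */
  int *key_and_value;    /* key at 2*i, value at 2*i + 1 */
  ptrdiff_t *index;      /* bucket heads, -1 for an empty bucket */
  ptrdiff_t *next;       /* collision chains, -1 ends a chain */
  ptrdiff_t nbuckets;    /* 0 while frozen */
};

static size_t
toy_hash (int key)
{
  return (size_t) (unsigned) key * 2654435761u;
}

/* Build a frozen table by copying COUNT key/value pairs, roughly what
   a dump loader would see before any rehashing has happened.  */
static struct toy_table
toy_make_frozen (const int *pairs, ptrdiff_t count)
{
  struct toy_table h = { count, NULL, NULL, NULL, 0 };
  size_t bytes = 2 * (size_t) count * sizeof *pairs;
  h.key_and_value = malloc (bytes ? bytes : 1);
  assert (h.key_and_value);
  memcpy (h.key_and_value, pairs, bytes);
  return h;
}

/* Thaw: allocate the index structures and rehash every entry.  */
static void
toy_thaw (struct toy_table *h)
{
  h->nbuckets = h->count > 0 ? h->count : 1;
  h->index = malloc ((size_t) h->nbuckets * sizeof *h->index);
  h->next = malloc ((size_t) h->nbuckets * sizeof *h->next);
  assert (h->index && h->next);
  for (ptrdiff_t b = 0; b < h->nbuckets; b++)
    h->index[b] = -1;
  for (ptrdiff_t i = 0; i < h->count; i++)
    {
      size_t b = toy_hash (h->key_and_value[2 * i]) % (size_t) h->nbuckets;
      h->next[i] = h->index[b];   /* push entry i onto bucket b's chain */
      h->index[b] = i;
    }
}

/* Freeze: drop everything that can be recomputed from the keys.  */
static void
toy_freeze (struct toy_table *h)
{
  free (h->index);
  free (h->next);
  h->index = h->next = NULL;
  h->nbuckets = 0;
}

/* Look up KEY in a thawed table; return 1 and set *VALUE if found.  */
static int
toy_lookup (const struct toy_table *h, int key, int *value)
{
  assert (h->index);   /* the table must be thawed first */
  size_t b = toy_hash (key) % (size_t) h->nbuckets;
  for (ptrdiff_t i = h->index[b]; i >= 0; i = h->next[i])
    if (h->key_and_value[2 * i] == key)
      {
        *value = h->key_and_value[2 * i + 1];
        return 1;
      }
  return 0;
}
```

In this toy, the entry count stays valid while the table is frozen, so a count query never depends on whether a lookup has already triggered rehashing. The cost of toy_thaw is one pass over the entries, which is the kind of eager work that the load-time measurement above is about.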