* Re: [Emacs-diffs] master 064701d 2/3: Increase the obarray size.
2016-12-31 14:18 ` [Emacs-diffs] master 064701d 2/3: Increase the obarray size Stefan Monnier
@ 2016-12-31 20:54 ` Ken Raeburn
2017-01-01 14:43 ` Stefan Monnier
0 siblings, 1 reply; 3+ messages in thread
From: Ken Raeburn @ 2016-12-31 20:54 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
> On Dec 31, 2016, at 09:18, Stefan Monnier <monnier@IRO.UMontreal.CA> wrote:
>
>> In a typical GNU/Linux/X11 build, we wind up with over 15k symbols by
>> the time we've started. The old obarray size ensured an average chain
>> length of 10 or more.
>> * src/lread.c (OBARRAY_SIZE): Increase to 15121.
>
> Have you checked the performance impact? E.g. compared to a length of 8K?
I got a slight improvement over the old setting when loading dumped.elc, not a huge one, but I didn’t compare against a value in the 8k neighborhood.
I’m not sure what a better test might be. I suppose I could create a test case with a list of strings matching the interned symbols found in dumped.elc in content (depends on the configuration), number of repetitions (depends on the #N# hack I recently pushed to the scratch/raeburn-startup branch), and ordering (depends on the obarray size via mapatoms, of course!), and test *just* interning those strings in a new obarray preloaded with the symbols created in the Emacs C code, without the rest of the reader code. But I’m not sure that symbol set is an optimal test; certainly, if we don’t go the big-elc route, the timing for that specific set of intern operations won’t be something worth optimizing for. And such a test strips out the cache contention and other effects of reading bytes and checking for Unicode input and allocating cons cells and so on, for better or worse.
> I do believe 15K is better than 1.5K, but an average length of 1 sound
> like the hash array might end up being larger than the optimal size.
It’s 1 if you look at all the slots, but about 37% of the slots should remain empty (it’s a standard balls-in-bins question in probability), putting all the symbols (and all the symbol lookups) in the other 63%, so the slots actually used have an average length of about 1.6.
Of course, this ignores uneven distribution of the hash function over the actual, real-world set of symbol names, and since the symbols aren’t equally likely in the input, chain length is only a rough approximation for actual run-time cost. We might be much more affected by where “nil” winds up in its symbol chain than by where “describe-mode” winds up in its chain. Also, in the macOS Nextstep build, it’s more like 21k symbols to start (and ~1/4 empty slots, and nonempty-chain length 1.9 I think).
So, yeah, the initial utilization level is perhaps a bit low. In my testing with the Linux “perf” tools, it appeared that examining symbols — actually, reading the string size fields from memory — was where the CPU tended to stall and waste most of its cycles, so I was aiming to keep the chains short, at a bit of a cost in memory size and cache contention in the obarray itself. I’m happy to run some more tests and look for a better size, if you think it’s important.
Really, I think an obarray should be an opaque object able to automatically resize itself (hash table?) or reorganize itself (tree?), and not pretend to be sort of like a fixed-size array with some symbols visible and some symbols hiding invisibly inside it, but it doesn’t seem crucial enough to performance to actually do anything about it right now.
Ken
^ permalink raw reply [flat|nested] 3+ messages in thread