* Re: [Emacs-diffs] master 064701d 2/3: Increase the obarray size.
[not found] ` <20161230230257.E6F992201CD@vcs.savannah.gnu.org>
@ 2016-12-31 14:18 ` Stefan Monnier
2016-12-31 20:54 ` Ken Raeburn
0 siblings, 1 reply; 3+ messages in thread
From: Stefan Monnier @ 2016-12-31 14:18 UTC (permalink / raw)
To: emacs-devel; +Cc: Ken Raeburn
> In a typical GNU/Linux/X11 build, we wind up with over 15k symbols by
> the time we've started. The old obarray size ensured an average chain
> length of 10 or more.
> * src/lread.c (OBARRAY_SIZE): Increase to 15121.
Have you checked the performance impact? E.g. compared to a length of 8K?
I do believe 15K is better than 1.5K, but an average length of 1 sound
like the hash array might end up being larger than the optimal size.
Stefan
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [Emacs-diffs] master 064701d 2/3: Increase the obarray size.
2016-12-31 14:18 ` [Emacs-diffs] master 064701d 2/3: Increase the obarray size Stefan Monnier
@ 2016-12-31 20:54 ` Ken Raeburn
2017-01-01 14:43 ` Stefan Monnier
0 siblings, 1 reply; 3+ messages in thread
From: Ken Raeburn @ 2016-12-31 20:54 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
> On Dec 31, 2016, at 09:18, Stefan Monnier <monnier@IRO.UMontreal.CA> wrote:
>
>> In a typical GNU/Linux/X11 build, we wind up with over 15k symbols by
>> the time we've started. The old obarray size ensured an average chain
>> length of 10 or more.
>> * src/lread.c (OBARRAY_SIZE): Increase to 15121.
>
> Have you checked the performance impact? E.g. compared to a length of 8K?
I got a slight improvement over the old setting when loading dumped.elc, not a huge one, but I didn’t compare against a value in the 8k neighborhood.
I’m not sure what a better test might be. I suppose I could create a test case with a list of strings matching the interned symbols found in dumped.elc in content (depends on the configuration), number of repetitions (depends on the #N# hack I recently pushed to the scratch/raeburn-startup branch), and ordering (depends on the obarray size via mapatoms, of course!), and test *just* interning those strings in a new obarray preloaded with the symbols created in the Emacs C code, without the rest of the reader code. But I’m not sure that symbol set is an optimal test; certainly, if we don’t go the big-elc route, the timing for that specific set of intern operations won’t be something worth optimizing for. And such a test strips out the cache contention and other effects of reading bytes and checking for Unicode input and allocating cons cells and so on, for better or worse.
> I do believe 15K is better than 1.5K, but an average length of 1 sound
> like the hash array might end up being larger than the optimal size.
It’s 1 if you look at all the slots, but about 37% of the slots should remain empty (it’s a standard balls-in-bins question in probability), putting all the symbols (and all the symbol lookups) in the other 63%, so the slots actually used have an average length of about 1.6.
Of course, this ignores uneven distribution of the hash function over the actual, real-world set of symbol names, and since the symbols aren’t equally likely in the input, chain length is only a rough approximation for actual run-time cost. We might be much more affected by where “nil” winds up in its symbol chain than by where “describe-mode” winds up in its chain. Also, in the macOS Nextstep build, it’s more like 21k symbols to start (and ~1/4 empty slots, and nonempty-chain length 1.9 I think).
So, yeah, the initial utilization level is perhaps a bit low. In my testing with the Linux “perf” tools, it appeared that examining symbols — actually, reading the string size fields from memory — was where the CPU tended to stall and waste most of its cycles, so I was aiming to keep the chains short, at a bit of a cost in memory size and cache contention in the obarray itself. I’m happy to run some more tests and look for a better size, if you think it’s important.
Really, I think an obarray should be an opaque object able to automatically resize itself (hash table?) or reorganize itself (tree?), and not pretend to be sort of like a fixed-size array with some symbols visible and some symbols hiding invisibly inside it, but it doesn’t seem crucial enough to performance to actually do anything about it right now.
Ken
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [Emacs-diffs] master 064701d 2/3: Increase the obarray size.
2016-12-31 20:54 ` Ken Raeburn
@ 2017-01-01 14:43 ` Stefan Monnier
0 siblings, 0 replies; 3+ messages in thread
From: Stefan Monnier @ 2017-01-01 14:43 UTC (permalink / raw)
To: Ken Raeburn; +Cc: emacs-devel
> I got a slight improvement over the old setting when loading dumped.elc,
> not a huge one, but I didn’t compare against a value in the 8k neighborhood.
Yeah, I guess it doesn't matter that much in the end.
> Really, I think an obarray should be an opaque object able to automatically
> resize itself (hash table?) or reorganize itself (tree?), and not pretend to
> be sort of like a fixed-size array with some symbols visible and some
> symbols hiding invisibly inside it, but it doesn’t seem crucial enough to
> performance to actually do anything about it right now.
Indeed, I'd be happy to try and get rid of our obarrays and use our
hash-tables for that instead (I've already changed a few packages over
the years to use hash0tables rather than obarrays).
Stefan
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2017-01-01 14:43 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20161230230257.13334.59525@vcs.savannah.gnu.org>
[not found] ` <20161230230257.E6F992201CD@vcs.savannah.gnu.org>
2016-12-31 14:18 ` [Emacs-diffs] master 064701d 2/3: Increase the obarray size Stefan Monnier
2016-12-31 20:54 ` Ken Raeburn
2017-01-01 14:43 ` Stefan Monnier
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).