unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Re: [Emacs-diffs] master 064701d 2/3: Increase the obarray size.
       [not found] ` <20161230230257.E6F992201CD@vcs.savannah.gnu.org>
@ 2016-12-31 14:18   ` Stefan Monnier
  2016-12-31 20:54     ` Ken Raeburn
  0 siblings, 1 reply; 3+ messages in thread
From: Stefan Monnier @ 2016-12-31 14:18 UTC (permalink / raw)
  To: emacs-devel; +Cc: Ken Raeburn

>     In a typical GNU/Linux/X11 build, we wind up with over 15k symbols by
>     the time we've started.  The old obarray size ensured an average chain
>     length of 10 or more.
>     * src/lread.c (OBARRAY_SIZE): Increase to 15121.

Have you checked the performance impact?  E.g. compared to a length of 8K?

I do believe 15K is better than 1.5K, but an average length of 1 sound
like the hash array might end up being larger than the optimal size.


        Stefan



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Emacs-diffs] master 064701d 2/3: Increase the obarray size.
  2016-12-31 14:18   ` [Emacs-diffs] master 064701d 2/3: Increase the obarray size Stefan Monnier
@ 2016-12-31 20:54     ` Ken Raeburn
  2017-01-01 14:43       ` Stefan Monnier
  0 siblings, 1 reply; 3+ messages in thread
From: Ken Raeburn @ 2016-12-31 20:54 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel


> On Dec 31, 2016, at 09:18, Stefan Monnier <monnier@IRO.UMontreal.CA> wrote:
> 
>>    In a typical GNU/Linux/X11 build, we wind up with over 15k symbols by
>>    the time we've started.  The old obarray size ensured an average chain
>>    length of 10 or more.
>>    * src/lread.c (OBARRAY_SIZE): Increase to 15121.
> 
> Have you checked the performance impact?  E.g. compared to a length of 8K?

I got a slight improvement over the old setting when loading dumped.elc, not a huge one, but I didn’t compare against a value in the 8k neighborhood.

I’m not sure what a better test might be.  I suppose I could create a test case with a list of strings matching the interned symbols found in dumped.elc in content (depends on the configuration), number of repetitions (depends on the #N# hack I recently pushed to the scratch/raeburn-startup branch), and ordering (depends on the obarray size via mapatoms, of course!), and test *just* interning those strings in a new obarray preloaded with the symbols created in the Emacs C code, without the rest of the reader code.  But I’m not sure that symbol set is an optimal test; certainly, if we don’t go the big-elc route, the timing for that specific set of intern operations won’t be something worth optimizing for.  And such a test strips out the cache contention and other effects of reading bytes and checking for Unicode input and allocating cons cells and so on, for better or worse.

> I do believe 15K is better than 1.5K, but an average length of 1 sound
> like the hash array might end up being larger than the optimal size.

It’s 1 if you look at all the slots, but about 37% of the slots should remain empty (it’s a standard balls-in-bins question in probability), putting all the symbols (and all the symbol lookups) in the other 63%, so the slots actually used have an average length of about 1.6.

Of course, this ignores uneven distribution of the hash function over the actual, real-world set of symbol names, and since the symbols aren’t equally likely in the input, chain length is only a rough approximation for actual run-time cost.  We might be much more affected by where “nil” winds up in its symbol chain than by where “describe-mode” winds up in its chain.  Also, in the macOS Nextstep build, it’s more like 21k symbols to start (and ~1/4 empty slots, and nonempty-chain length 1.9 I think).

So, yeah, the initial utilization level is perhaps a bit low.  In my testing with the Linux “perf” tools, it appeared that examining symbols — actually, reading the string size fields from memory — was where the CPU tended to stall and waste most of its cycles, so I was aiming to keep the chains short, at a bit of a cost in memory size and cache contention in the obarray itself.  I’m happy to run some more tests and look for a better size, if you think it’s important.

Really, I think an obarray should be an opaque object able to automatically resize itself (hash table?) or reorganize itself (tree?), and not pretend to be sort of like a fixed-size array with some symbols visible and some symbols hiding invisibly inside it, but it doesn’t seem crucial enough to performance to actually do anything about it right now.

Ken


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Emacs-diffs] master 064701d 2/3: Increase the obarray size.
  2016-12-31 20:54     ` Ken Raeburn
@ 2017-01-01 14:43       ` Stefan Monnier
  0 siblings, 0 replies; 3+ messages in thread
From: Stefan Monnier @ 2017-01-01 14:43 UTC (permalink / raw)
  To: Ken Raeburn; +Cc: emacs-devel

> I got a slight improvement over the old setting when loading dumped.elc,
> not a huge one, but I didn’t compare against a value in the 8k neighborhood.

Yeah, I guess it doesn't matter that much in the end.

> Really, I think an obarray should be an opaque object able to automatically
> resize itself (hash table?) or reorganize itself (tree?), and not pretend to
> be sort of like a fixed-size array with some symbols visible and some
> symbols hiding invisibly inside it, but it doesn’t seem crucial enough to
> performance to actually do anything about it right now.

Indeed, I'd be happy to try and get rid of our obarrays and use our
hash-tables for that instead (I've already changed a few packages over
the years to use hash0tables rather than obarrays).


        Stefan



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-01-01 14:43 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20161230230257.13334.59525@vcs.savannah.gnu.org>
     [not found] ` <20161230230257.E6F992201CD@vcs.savannah.gnu.org>
2016-12-31 14:18   ` [Emacs-diffs] master 064701d 2/3: Increase the obarray size Stefan Monnier
2016-12-31 20:54     ` Ken Raeburn
2017-01-01 14:43       ` Stefan Monnier

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).