unofficial mirror of guile-devel@gnu.org 
* weak hash tables
@ 2011-10-24 17:19 Andy Wingo
  2011-10-28  9:02 ` Andy Wingo
  2011-11-11 20:30 ` Ludovic Courtès
  0 siblings, 2 replies; 4+ messages in thread
From: Andy Wingo @ 2011-10-24 17:19 UTC (permalink / raw)
  To: guile-devel

Hi,

While doing the recent retagging work I realized that I really wanted to
kill weak pairs.  They look like pairs but you can't actually access
their fields using car and cdr -- terrible things.  To do that I had to
reimplement the weak hash tables so that they no longer use weak
pairs.  I did
that, reimplementing them as open-addressed hash tables, with robin hood
linear probing on collision.  It seems to work well.  Besides the
performance improvement you get with just scanning linearly through
memory, open-addressed hash tables aren't structurally perturbed by the
disappearing links.  It's a lot less messy, and uses less memory, and we
have space to encode the actual hash values in the table.  (However I
don't think our current hash functions are very good.)
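
Here's roughly what insertion looks like -- the names and layout below
are illustrative only, not the actual code:

  /* Minimal robin hood linear probing insert; hash 0 marks an empty
     slot, and the caller guarantees a free slot exists.  */
  typedef struct { unsigned long hash; void *key; void *value; } entry_t;

  static unsigned long
  distance (unsigned long hash, unsigned long slot, unsigned long size)
  {
    /* How far SLOT is from this hash's preferred slot.  */
    return (slot + size - (hash % size)) % size;
  }

  static void
  insert (entry_t *entries, unsigned long size,
          unsigned long hash, void *key, void *value)
  {
    unsigned long slot = hash % size;
    while (1)
      {
        entry_t *e = &entries[slot];
        if (e->hash == 0)
          {
            /* Empty slot: take it.  */
            e->hash = hash; e->key = key; e->value = value;
            return;
          }
        if (distance (e->hash, slot, size) < distance (hash, slot, size))
          {
            /* The resident entry is closer to home than we are, so
               steal its slot and keep probing with the evicted one.  */
            entry_t evicted = *e;
            e->hash = hash; e->key = key; e->value = value;
            hash = evicted.hash; key = evicted.key; value = evicted.value;
          }
        slot = (slot + 1) % size;
      }
  }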

The one nasty thing there was adapting to the old interfaces.  In
particular the hashx interface expects to be able to assoc down a list,
but there are no assoc lists in the new weak tables.  Guess what I do?
That's right, for weak hashx tables I make new association lists in the
equal? handler, then pass that alist to the user's assoc function.
Gross.
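
In code, the trick is something like the following hypothetical helper
(the real handler lives in the weak table lookup path; names here are
made up for illustration):

  #include <libguile.h>

  /* Build a fresh one-entry alist for the probed slot and hand it to
     the user's assoc procedure, which returns the matching pair or #f.  */
  static SCM
  weak_hashx_assoc_shim (SCM assoc_proc, SCM sought_key,
                         SCM entry_key, SCM entry_value)
  {
    SCM alist = scm_acons (entry_key, entry_value, SCM_EOL);
    return scm_call_2 (assoc_proc, sought_key, alist);
  }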

Currently the new weak tables are only accessible under the old API.  I
don't know whether to publicise the new API, or make a different API, or
what.  If we hadn't coded the handle API into our hash tables I would
replace the regular (strong) hash table implementation too.  I might
still do that but it's not pressing.

Anyway, thoughts on the API are welcome.

Regards,

Andy

ps.  The new weak tables are thread-safe.  Yay.
-- 
http://wingolog.org/




* Re: weak hash tables
  2011-10-24 17:19 weak hash tables Andy Wingo
@ 2011-10-28  9:02 ` Andy Wingo
  2011-11-11 20:30 ` Ludovic Courtès
  1 sibling, 0 replies; 4+ messages in thread
From: Andy Wingo @ 2011-10-28  9:02 UTC (permalink / raw)
  To: guile-devel

Hi,

A follow-up to the weak table work.

On Mon 24 Oct 2011 19:19, Andy Wingo <wingo@pobox.com> writes:

> I [reimplemented weak hash tables] as open-addressed hash tables, with
> robin hood linear probing on collision

The tables work well.  There are a couple of drawbacks currently:

  (1) Moving weak references (disappearing links) is expensive for
      libgc, because it has to free then allocate a new entry in the
      disappearing links table.  I have a patch to add an API to fix
      this, and I think it will probably be accepted.  (There is a
      sketch of the current dance just after this list.)

  (2) You still have to run over the whole table after GC to update the
      occupancy count of the table.  I'm trying to fix that in libgc as
      well, to add a variant of the disappearing link API that
      increments a counter somewhere in memory.
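
To illustrate (1): stock libgc has no way to move a registered link, so
relocating a key during probing means an unregister/re-register pair,
each of which takes libgc's internal locks.  A sketch, with
illustrative names, assuming the key is still live:

  #include <gc/gc.h>

  /* Relocate a weakly-held key from one table slot to another.  Without
     a "move" API this costs a removal plus an insertion in libgc's own
     disappearing-links table.  */
  static void
  move_weak_key (void **from_slot, void **to_slot)
  {
    void *key = *from_slot;
    GC_unregister_disappearing_link (from_slot);
    *to_slot = key;
    *from_slot = NULL;
    GC_general_register_disappearing_link (to_slot, key);
  }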

> (However I don't think our current hash functions are very good.)

They were *terrible*.  Replacing them with hash functions from the
internet improved the performance of compiling psyntax.scm by about 25 or
30%.  It's not often you get to do that!
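
For the curious, a classic example of the kind of integer mixer I mean
is Thomas Wang's widely-posted 32-bit hash -- not necessarily the exact
function that landed, just representative:

  #include <stdint.h>

  /* Thomas Wang's 32-bit integer hash: cheap, and mixes low and high
     bits much better than our old functions did.  */
  static uint32_t
  wang_hash_u32 (uint32_t k)
  {
    k = (k ^ 61) ^ (k >> 16);
    k = k + (k << 3);
    k = k ^ (k >> 4);
    k = k * 0x27d4eb2d;
    k = k ^ (k >> 15);
    return k;
  }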

Currently, for me, compiling psyntax takes 1.4 seconds on master whereas
it takes 1.2 seconds on stable-2.0.  This bothered me for a while, until
I realized that it's actually compiling different things.
psyntax-pp.scm on master (representing the expanded, optimized form of
psyntax.scm) is different from its counterpart on stable-2.0, though in
the end the compiled psyntax-pp.go on master is actually larger than the
stable-2.0 counterpart.  For the moment I'm not worried, as there will
be many changes in 2.2 that should favor a more aggressive inliner.

Andy
-- 
http://wingolog.org/




* Re: weak hash tables
  2011-10-24 17:19 weak hash tables Andy Wingo
  2011-10-28  9:02 ` Andy Wingo
@ 2011-11-11 20:30 ` Ludovic Courtès
  2011-11-15 21:33   ` Andy Wingo
  1 sibling, 1 reply; 4+ messages in thread
From: Ludovic Courtès @ 2011-11-11 20:30 UTC (permalink / raw)
  To: guile-devel

Hi Andy,

Andy Wingo <wingo@pobox.com> skribis:

> While doing the recent retagging work I realized that I really wanted to
> kill weak pairs.  They look like pairs but you can't actually access
> their fields using car and cdr -- terrible things.

Agreed.  The reason was to keep the weak hash table API unchanged.

> To do that I had to reimplement the weak hash tables so that they no
> longer use weak pairs.  I did that, reimplementing them as
> open-addressed hash
> tables, with robin hood linear probing on collision.  It seems to work
> well.  Besides the performance improvement you get with just scanning
> linearly through memory,

Could you provide an executive summary ;-) of:

  - how good ol’ weak hash tables relate to the new weak sets and weak
    tables?

  - what the deal is wrt. performance, linear probing, and rehashing?

> open-addressed hash tables aren't structurally perturbed by the
> disappearing links.  It's a lot less messy, and uses less memory, and
> we have space to encode the actual hash values in the table.

That’s definitely a huge improvement.  :-)

> The one nasty thing there was adapting to the old interfaces.  In
> particular the hashx interface expects to be able to assoc down a list,
> but there are no assoc lists in the new weak tables.  Guess what I do?
> That's right, for weak hashx tables I make new association lists in the
> equal? handler, then pass that alist to the user's assoc function.
> Gross.
>
> Currently the new weak tables are only accessible under the old API.  I
> don't know whether to publicise the new API, or make a different API, or
> what.  If we hadn't coded the handle API into our hash tables I would
> replace the regular (strong) hash table implementation too.  I might
> still do that but it's not pressing.

FWIW I would keep the new C API private.

Is there any advantage to using the new weak table API instead of the
old weak hash table API?  The names suggest that they’re not quite
equivalent, but maybe that’s just the names?

Thanks for this brave endeavor!  :-)

Ludo’.





* Re: weak hash tables
  2011-11-11 20:30 ` Ludovic Courtès
@ 2011-11-15 21:33   ` Andy Wingo
  0 siblings, 0 replies; 4+ messages in thread
From: Andy Wingo @ 2011-11-15 21:33 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Hi Ludo,

On Fri 11 Nov 2011 21:30, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> skribis:
>
>> While doing the recent retagging work I realized that I really wanted to
>> kill weak pairs.  They look like pairs but you can't actually access
>> their fields using car and cdr -- terrible things.
>
> Agreed.  The reason was to keep the weak hash table API unchanged.

Understood.  However given that we have now prohibited the get-handle
API from being used on weak tables, we do have a bit more freedom of
implementation.

> Could you provide an executive summary ;-) of:

Hehe, OK :)

>   - how good ol’ weak hash tables related to the new weak sets and weak
>     tables?

All of the uses of weak hash tables in Guile were in a multithreaded
context.  Weak sets and weak tables now include locking, so that user
code doesn't have to lock.

Weak sets and tables store the hash value in the set or table entries,
which is obviously more efficient, and avoids the need to store hash /
equality functions in the table itself (as the old hash tables need to
do, for purposes of rehashing).
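
Concretely, an entry in the new tables looks something like this (field
names are illustrative, not the actual struct):

  /* One slot in a weak table: three words, with the cached hash making
     probing and rehashing cheap.  A zero hash can mark an empty slot.  */
  typedef struct
  {
    unsigned long hash;   /* cached hash of the key */
    void *key;            /* weak: cleared by libgc when the key dies */
    void *value;          /* weak too, for doubly-weak tables */
  } weak_entry;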

The old weak table implementation had to vacuum the ribs and buckets
after GC to avoid taking up memory on behalf of disappeared items.  The
new tables store the cells in a linear array, which can have empty
entries, meaning that adding an entry to the table doesn't allocate
memory -- besides the libgc-internal memory allocated when creating a
disappearing link.  (We could have fixed the old weak hash table with
chained finalizers, but that's tricky too.)
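
So storing a key into an existing slot is roughly just this (again a
sketch with made-up names, assuming the key is live):

  #include <gc/gc.h>

  /* No Scheme-level allocation here; the only allocation is libgc's
     internal record for the disappearing link.  */
  static void
  set_weak_key (void **key_slot, void *key)
  {
    *key_slot = key;
    GC_general_register_disappearing_link (key_slot, key);
  }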

A weak table with N available slots takes up N*3 words: one for the hash
of the key, one for the key, and one for the value.  A weak hash table
with N buckets and N/2 entries takes up N words for the buckets, and
N/2*(2+2) = N*2 words for the entries (two words for the rib pair, two
for the handle pair), yielding N*3.  So they are similar regarding
memory usage, though they impose different loads on the GC.

The old implementation used disappearing links incorrectly, as it
accessed the weak car and cdr without the GC alloc lock.  The new
implementation takes the alloc lock when getting the key and value.  It
doesn't take the lock when comparing hash values though.
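
The copy under the lock looks something like this (names illustrative;
GC_call_with_alloc_lock runs the callback with libgc's allocation lock
held, so the collector can't clear the links mid-copy):

  #include <gc/gc.h>

  struct copy_args { void **slots; void *key; void *value; };

  /* Runs with the alloc lock held: read the key and value atomically
     with respect to link clearing.  Assumes key and value are adjacent
     pointer-sized slots, as in the entry sketch above.  */
  static void *
  do_copy_weak_entry (void *data)
  {
    struct copy_args *args = data;
    args->key = args->slots[0];
    args->value = args->slots[1];
    return NULL;
  }

  static void
  copy_weak_entry (void **slots, void **key, void **value)
  {
    struct copy_args args = { slots, NULL, NULL };
    GC_call_with_alloc_lock (do_copy_weak_entry, &args);
    *key = args.key;
    *value = args.value;
  }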

>   - what the deal is wrt. performance, linear probing, and rehashing?

In theory linear probing is faster, if you can keep the max search
length under control.  (Robin hood hashing provides this property.)
This is because sequential buckets are likely to be on the same cache
lines, unlike the buckets and ribs.  SBCL hacker Paul Khuong makes that
argument here:

  http://www.pvk.ca/Blog/more_numerical_experiments_in_hashing.html
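
For reference, the bounded lookup that robin hood ordering buys you
looks something like this, reusing the entry_t and distance helpers
from the sketch in my first mail (again illustrative, not the actual
code):

  /* Probe until we find the key, hit an empty slot, or pass the point
     where the key would have been placed had it been present.  */
  static entry_t *
  lookup (entry_t *entries, unsigned long size,
          unsigned long hash, void *key)
  {
    unsigned long slot = hash % size;
    unsigned long d = 0;
    while (1)
      {
        entry_t *e = &entries[slot];
        if (e->hash == 0)
          return NULL;                        /* empty: not present */
        if (distance (e->hash, slot, size) < d)
          return NULL;                        /* passed its home range */
        if (e->hash == hash && e->key == key)
          return e;                           /* found (eq-style match) */
        slot = (slot + 1) % size;
        d++;
      }
  }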

However in our case, adding disappearing links and taking the alloc lock
likely destroys these advantages.

> Is there any advantage to using the new weak table API instead of the
> old weak hash table API?  The names suggest that they’re not quite
> equivalent, but maybe that’s just the names?

Not really.  It's just that using the old API introduces a sort of
pseudo-generic interface for hash tables: weak tables don't share any
implementation bits at all, but they are reached through the same
functions -- minus the ones, like the handle accessors, that don't
apply to them.

The thing that would make the most sense would be some sort of generic
map interface, like the array implementation interface.  But that's a
different project, eh...

Cheers,

Andy
-- 
http://wingolog.org/



