* weak hash tables

From: Andy Wingo @ 2011-10-24 17:19 UTC
To: guile-devel

Hi,

While doing the recent retagging work I realized that I really wanted to kill weak pairs. They look like pairs, but you can't actually access their fields using car and cdr -- terrible things. To do that I had to reimplement the weak hash tables so that they don't use them.

I reimplemented them as open-addressed hash tables, with robin hood linear probing on collision. It seems to work well. Besides the performance improvement you get from scanning linearly through memory, open-addressed hash tables aren't structurally perturbed by the disappearing links. It's a lot less messy, it uses less memory, and we have space to encode the actual hash values in the table. (However, I don't think our current hash functions are very good.)

The one nasty thing was adapting to the old interfaces. In particular, the hashx interface expects to be able to assoc down a list, but there are no association lists in the new weak tables. Guess what I do? That's right: for weak hashx tables I build new association lists in the equal? handler, then pass that alist to the user's assoc function. Gross.

Currently the new weak tables are only accessible under the old API. I don't know whether to publicise the new API, make a different API, or what. If we hadn't coded the handle API into our hash tables I would replace the regular (strong) hash table implementation too. I might still do that, but it's not pressing. Anyway, thoughts on the API are welcome.

Regards,

Andy

ps. The new weak tables are thread-safe. Yay.

--
http://wingolog.org/
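The robin hood scheme described above can be sketched in a few lines. This is an illustrative Python sketch under invented names, not Guile's C implementation: each slot holds a (hash, key, value) triple, and on collision the incoming entry displaces any resident entry that sits closer to its ideal bucket, so probe lengths stay evenly distributed.

```python
# Sketch of robin hood linear probing (illustrative; not Guile's actual
# C code, and with no weak references or resizing).  Each slot is either
# None or a (hash, key, value) triple -- storing the hash in the slot is
# what the message means by "space to encode the actual hash values".

class RobinHoodTable:
    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size

    def set(self, key, value):
        h = hash(key)
        idx = h % self.size
        dist = 0                      # probe distance of the entry in hand
        entry = (h, key, value)
        while True:
            slot = self.slots[idx]
            if slot is None:
                self.slots[idx] = entry
                return
            if slot[1] == entry[1]:   # same key: overwrite in place
                self.slots[idx] = entry
                return
            resident_dist = (idx - slot[0]) % self.size
            if resident_dist < dist:  # resident is "richer": steal its slot
                self.slots[idx], entry = entry, slot
                dist = resident_dist
            idx = (idx + 1) % self.size
            dist += 1

    def get(self, key, default=None):
        h = hash(key)
        idx = h % self.size
        dist = 0
        while True:
            slot = self.slots[idx]
            if slot is None:
                return default
            if slot[0] == h and slot[1] == key:
                return slot[2]
            # Robin hood invariant: once a resident entry is closer to its
            # home bucket than our probe distance, the key cannot be here.
            if (idx - slot[0]) % self.size < dist:
                return default
            idx = (idx + 1) % self.size
            dist += 1
```

The bounded-variance property of the probe lengths is also what makes early lookup termination possible, which matters for misses.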
* Re: weak hash tables

From: Andy Wingo @ 2011-10-28 9:02 UTC
To: guile-devel

Hi,

A follow-up to the weak table work.

On Mon 24 Oct 2011 19:19, Andy Wingo <wingo@pobox.com> writes:

> I [reimplemented weak hash tables] as open-addressed hash tables, with
> robin hood linear probing on collision

The tables work well. There are currently a couple of drawbacks:

  (1) Moving weak references (disappearing links) is expensive for libgc, because it has to free and then allocate a new entry in the disappearing-links table. I have a patch to add an API to fix this, and I think it will probably be accepted.

  (2) You still have to run over the whole table after GC to update its occupancy count. I'm trying to fix that in libgc as well, by adding a variant of the disappearing-link API that increments a counter somewhere in memory.

> (However I don't think our current hash functions are very good.)

They were *terrible*. Replacing them with hash functions from the internet improved the performance of compiling psyntax.scm by about 25 or 30%. It's not often you get to do that!

Currently, for me, compiling psyntax takes 1.4 seconds on master, whereas it takes 1.2 seconds on stable-2.0. This bothered me for a while, until I realized that the two are actually compiling different things: psyntax-pp.scm on master (representing the expanded, optimized form of psyntax.scm) is different from its counterpart on stable-2.0, though in the end the compiled psyntax-pp.go on master is actually larger than the stable-2.0 counterpart. For the moment I'm not worried, as there will be many changes in 2.2 that should favor a more aggressive inliner.

Andy

--
http://wingolog.org/
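To illustrate why swapping the hash functions could matter that much (the functions below are illustrative stand-ins, not the ones Guile had or adopted): a hash that fails to mix its input bits clusters regular keys, such as aligned pointer-like addresses, into a few buckets, while a well-studied mixing step such as MurmurHash3's 64-bit finalizer spreads them across the whole table.

```python
# Toy comparison of a non-mixing hash vs. a mixing one (not Guile's
# actual hash functions).  Keys shaped like 16-byte-aligned heap
# addresses all share their low bits, so an identity hash piles them
# into a single bucket.

def poor_hash(x):
    return x  # identity: low bits of aligned keys are all zero

def mix_hash(x):
    # fmix64 finalizer from MurmurHash3: xor-shift and multiply steps
    # that diffuse every input bit into every output bit.
    x &= 0xFFFFFFFFFFFFFFFF
    x ^= x >> 33
    x = (x * 0xFF51AFD7ED558CCD) & 0xFFFFFFFFFFFFFFFF
    x ^= x >> 33
    x = (x * 0xC4CEB9FE1A85EC53) & 0xFFFFFFFFFFFFFFFF
    x ^= x >> 33
    return x

def occupied_buckets(hash_fn, keys, nbuckets):
    # How many distinct buckets the keys land in.
    return len({hash_fn(k) % nbuckets for k in keys})

# 1000 keys that look like 16-byte-aligned addresses, 16 buckets:
# poor_hash sends every key to bucket 0; mix_hash uses the whole table.
keys = [i * 16 for i in range(1000)]
```

With clustering like the poor_hash case, every lookup degenerates into a long linear search, which is consistent with a hash swap alone buying a 25-30% compile-time win.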
* Re: weak hash tables

From: Ludovic Courtès @ 2011-11-11 20:30 UTC
To: guile-devel

Hi Andy,

Andy Wingo <wingo@pobox.com> skribis:

> While doing the recent retagging work I realized that I really wanted to
> kill weak pairs. They look like pairs but you can't actually access
> their fields using car and cdr -- terrible things.

Agreed. The reason was to keep the weak hash table API unchanged.

> To do that I had to reimplement the weak hash tables, to make them not
> use them. I did that, reimplementing them as open-addressed hash
> tables, with robin hood linear probing on collision. It seems to work
> well. Besides the performance improvement you get with just scanning
> linearly through memory,

Could you provide an executive summary ;-) of:

  - how the good ol’ weak hash tables relate to the new weak sets and weak tables?

  - what the deal is wrt. performance, linear probing, and rehashing?

> open-addressed hash tables aren't structurally perturbed by the
> disappearing links. It's a lot less messy, and uses less memory, and
> we have space to encode the actual hash values in the table.

That’s definitely a huge improvement. :-)

> The one nasty thing there was adapting to the old interfaces. In
> particular the hashx interface expects to be able to assoc down a list,
> but there are no assoc lists in the new weak tables. Guess what I do?
> That's right, for weak hashx tables I make new association lists in the
> equal? handler, then pass that alist to the user's assoc function.
> Gross.
>
> Currently the new weak tables are only accessible under the old API. I
> don't know whether to publicise the new API, or make a different API, or
> what. If we hadn't coded the handle API into our hash tables I would
> replace the regular (strong) hash table implementation too. I might
> still do that but it's not pressing.

FWIW I would keep the new C API private.

Is there any advantage to using the new weak table API instead of the old weak hash table API? The names suggest that they’re not quite equivalent, but maybe that’s just the names?

Thanks for this brave endeavor! :-)

Ludo’.
* Re: weak hash tables

From: Andy Wingo @ 2011-11-15 21:33 UTC
To: Ludovic Courtès; +Cc: guile-devel

Hi Ludo,

On Fri 11 Nov 2011 21:30, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> skribis:
>
>> While doing the recent retagging work I realized that I really wanted to
>> kill weak pairs. They look like pairs but you can't actually access
>> their fields using car and cdr -- terrible things.
>
> Agreed. The reason was to keep the weak hash table API unchanged.

Understood. However, given that we have now prohibited the get-handle API from being used on weak tables, we do have a bit more freedom of implementation.

> Could you provide an executive summary ;-) of:

Hehe, OK :)

> - how the good ol’ weak hash tables relate to the new weak sets and
>   weak tables?

All of the uses of weak hash tables in Guile were in a multithreaded context. Weak sets and weak tables now include locking, so that user code doesn't have to lock.

Weak sets and tables store the hash value in the set or table entries, which is obviously more efficient, and avoids the need to store hash / equality functions in the table itself (as the old hash tables need to do, for purposes of rehashing).

The old weak table implementation had to vacuum the ribs and buckets after GC to avoid taking up memory on behalf of disappeared items. The new tables store the cells in a linear array, which can have empty entries, meaning that adding an entry to the table doesn't allocate memory -- besides the libgc-internal memory allocated when creating a disappearing link. (We could have fixed the old weak hash table with chained finalizers, but that's tricky too.)

A weak table with N available slots takes up N*3 words: one for the hash of the key, one for the key, and one for the value. A weak hash table with N buckets and N/2 entries takes up N words for the buckets, plus N/2*(2+2) = N*2 words for the entries (two words for each rib pair and two for each handle pair), yielding N*3 in total. So they are similar regarding memory usage, though they impose different loads on the GC.

The old implementation used disappearing links incorrectly, as it accessed the weak car and cdr without the GC alloc lock. The new implementation takes the alloc lock when getting the key and value. It doesn't take the lock when comparing hash values, though.

> - what the deal is wrt. performance, linear probing, and rehashing?

In theory, linear probing is faster, if you can keep the maximum search length under control. (Robin hood hashing provides this property.) This is because sequential buckets are likely to be on the same cache lines, unlike the buckets and ribs of a chained table. SBCL hacker Paul Khuong argues that case here:

  http://www.pvk.ca/Blog/more_numerical_experiments_in_hashing.html

However, in our case, adding disappearing links and taking the alloc lock likely destroys these advantages.

> Is there any advantage to using the new weak table API instead of the
> old weak hash table API? The names suggest that they’re not quite
> equivalent, but maybe that’s just the names?

Not really. It's just that using the old API introduces a sort of pseudo-generic interface for hash tables: weak tables don't share any implementation bits at all, but they use the same functions -- except the ones they don't use. The thing that would make the most sense would be some sort of generic map interface, like the array implementation interface. But that's a different project, eh...

Cheers,

Andy

--
http://wingolog.org/
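The memory accounting in the message can be checked mechanically. The sketch below just encodes the word counts stated above: three words per slot for the new open-addressed layout, and for the old chained layout one word per bucket plus a two-word rib pair and a two-word handle pair per entry, at the assumed 50% occupancy.

```python
# Word counts for the two layouts, as described in the thread.

def new_table_words(slots):
    # Open-addressed: each slot holds hash + key + value.
    return slots * 3

def old_table_words(buckets, entries):
    # Chained: one word per bucket, plus per entry a rib pair (2 words)
    # and a handle pair (2 words).
    return buckets + entries * (2 + 2)

# With N buckets and N/2 entries, the old layout comes to
# N + (N/2)*4 = N*3 words -- the same total as the new layout,
# though allocated very differently (many small pairs vs. one array).
```

The equal totals are the point: the win is not raw size but allocation pattern and GC load, since the linear array never grows rib-by-rib.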