unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* rfi: hash set
@ 2011-01-06  3:56 Andy Wingo
  2011-01-15  1:42 ` Noah Lavine
  0 siblings, 1 reply; 4+ messages in thread
From: Andy Wingo @ 2011-01-06  3:56 UTC (permalink / raw)
  To: guile-devel

Hello,

Currently the symbol table takes up twice as much memory as it needs to,
because it is a hash table instead of a set. (The difference being that
the buckets in a set don't need to be pairs.)

We don't actually have a good set data type implementation, and I'm sure
people have opinions about this, so if anyone has the time, an
implementation would be appreciated. Name it hashset.[ch] and make sure
it handles the weak reference case.

Thanks! :) (Hey, it's worth a try :)

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: rfi: hash set
  2011-01-06  3:56 rfi: hash set Andy Wingo
@ 2011-01-15  1:42 ` Noah Lavine
  2011-01-17 21:27   ` Ludovic Courtès
  0 siblings, 1 reply; 4+ messages in thread
From: Noah Lavine @ 2011-01-15  1:42 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Hello,

I started looking into implementing this, and I ran into something
strange that I'd like clarification on. Am I correct in saying that
currently, hash tables can only shrink by one size index when they are
rehashed?

I think this because of hashtab.c, line 293. This is a part of
scm_i_rehash which looks like this:

 281       /* rehashing is not triggered when i <= min_size */
 290       i = SCM_HASHTABLE (table)->size_index;
 291       do
 292         --i;
 293       while (i > SCM_HASHTABLE (table)->min_size_index
 294              && SCM_HASHTABLE_N_ITEMS (table) < hashtable_size[i] / 4);

The i variable is an int representing the size of the new hash table.
It is an index into hashtable_size, an array of allowed hashtable
sizes. So i will be decremented until it represents a reasonable size
for the table.

However, i is also bounded by the table's min_size_index. Here's the
thing: based on grepping through this file, it seems that
min_size_index is set when a table is first made, to the initial size
index of the table, and never changes. Therefore, any i that
represents a shrink of the table will be <= min_size_index, so the
while's condition will always fail, so the loop can only run once, no
matter how few items are in the hash table, so i will always be the
old size_index - 1.

(This code path is only run in case of a shrink, not when a table
needs to grow.)

Is that right?
Noah

On Wed, Jan 5, 2011 at 10:56 PM, Andy Wingo <wingo@pobox.com> wrote:
> Hello,
>
> Currently the symbol table takes up twice as much memory as it needs to,
> because it is a hash table instead of a set. (The difference being that
> the buckets in a set don't need to be pairs.)
>
> We don't actually have a good set data type implementation, and I'm sure
> people have opinions about this, so if anyone has the time, an
> implementation would be appreciated. Name it hashset.[ch] and make sure
> it handles the weak reference case.
>
> Thanks! :) (Hey, it's worth a try :)
>
> Andy
> --
> http://wingolog.org/
>
>



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: rfi: hash set
  2011-01-15  1:42 ` Noah Lavine
@ 2011-01-17 21:27   ` Ludovic Courtès
  2011-01-20  2:11     ` Noah Lavine
  0 siblings, 1 reply; 4+ messages in thread
From: Ludovic Courtès @ 2011-01-17 21:27 UTC (permalink / raw)
  To: guile-devel

Hi Noah!

Noah Lavine <noah.b.lavine@gmail.com> writes:

> I started looking into implementing this, and I ran into something
> strange that I'd like clarification on. Am I correct in saying that
> currently, hash tables can only shrink by one size index when they are
> rehashed?

Yes, your analysis looks correct to me.  Would you like to look into
fixing this?  :-)

I suppose the trick would be to regularly recompile ‘min_size_index’
based on the current ‘SCM_HASHTABLE_N_ITEMS’, iterating on
HASHTABLE_SIZE, starting from the current ‘min_size_index’.

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: rfi: hash set
  2011-01-17 21:27   ` Ludovic Courtès
@ 2011-01-20  2:11     ` Noah Lavine
  0 siblings, 0 replies; 4+ messages in thread
From: Noah Lavine @ 2011-01-20  2:11 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Hello again,

Actually, I need to retract what I said before. Here is what I think
is actually going on, and it means that the current code is correct.

min_size_index will usually be 0, which means the table can shrink
down to the minimum hash table size (31). The only case it won't be 0
is if the hash table was created by passing a size argument to
make-hash-table, in which case min_size_index will be whatever it
needs to be so that the table will never shrink below the size
specified to make-hash-table. This is consistent with the
documentation, which says that a table created with a given size will
always be at least that large.

Sorry about that. I'll work on hash sets now.

Noah

On Mon, Jan 17, 2011 at 4:27 PM, Ludovic Courtès <ludo@gnu.org> wrote:
> Hi Noah!
>
> Noah Lavine <noah.b.lavine@gmail.com> writes:
>
>> I started looking into implementing this, and I ran into something
>> strange that I'd like clarification on. Am I correct in saying that
>> currently, hash tables can only shrink by one size index when they are
>> rehashed?
>
> Yes, your analysis looks correct to me.  Would you like to look into
> fixing this?  :-)
>
> I suppose the trick would be to regularly recompile ‘min_size_index’
> based on the current ‘SCM_HASHTABLE_N_ITEMS’, iterating on
> HASHTABLE_SIZE, starting from the current ‘min_size_index’.
>
> Thanks,
> Ludo’.
>
>
>



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-01-20  2:11 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-01-06  3:56 rfi: hash set Andy Wingo
2011-01-15  1:42 ` Noah Lavine
2011-01-17 21:27   ` Ludovic Courtès
2011-01-20  2:11     ` Noah Lavine

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).