* rfi: hash set @ 2011-01-06 3:56 Andy Wingo 2011-01-15 1:42 ` Noah Lavine 0 siblings, 1 reply; 4+ messages in thread From: Andy Wingo @ 2011-01-06 3:56 UTC (permalink / raw) To: guile-devel Hello, Currently the symbol table takes up twice as much memory as it needs to, because it is a hash table instead of a set. (The difference being that the buckets in a set don't need to be pairs.) We don't actually have a good set data type implementation, and I'm sure people have opinions about this, so if anyone has the time, an implementation would be appreciated. Name it hashset.[ch] and make sure it handles the weak reference case. Thanks! :) (Hey, it's worth a try :) Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: rfi: hash set 2011-01-06 3:56 rfi: hash set Andy Wingo @ 2011-01-15 1:42 ` Noah Lavine 2011-01-17 21:27 ` Ludovic Courtès 0 siblings, 1 reply; 4+ messages in thread From: Noah Lavine @ 2011-01-15 1:42 UTC (permalink / raw) To: Andy Wingo; +Cc: guile-devel Hello, I started looking into implementing this, and I ran into something strange that I'd like clarification on. Am I correct in saying that currently, hash tables can only shrink by one size index when they are rehashed? I think this because of hashtab.c, line 293. This is a part of scm_i_rehash which looks like this: 281 /* rehashing is not triggered when i <= min_size */ 290 i = SCM_HASHTABLE (table)->size_index; 291 do 292 --i; 293 while (i > SCM_HASHTABLE (table)->min_size_index 294 && SCM_HASHTABLE_N_ITEMS (table) < hashtable_size[i] / 4); The i variable is an int representing the size of the new hash table. It is an index into hashtable_size, an array of allowed hashtable sizes. So i will be decremented until it represents a reasonable size for the table. However, i is also bounded by the table's min_size_index. Here's the thing: based on grepping through this file, it seems that min_size_index is set when a table is first made, to the initial size index of the table, and never changes. Therefore, any i that represents a shrink of the table will be <= min_size_index, so the while's condition will always fail, so the loop can only run once, no matter how few items are in the hash table, so i will always be the old size_index - 1. (This code path is only run in case of a shrink, not when a table needs to grow.) Is that right? Noah On Wed, Jan 5, 2011 at 10:56 PM, Andy Wingo <wingo@pobox.com> wrote: > Hello, > > Currently the symbol table takes up twice as much memory as it needs to, > because it is a hash table instead of a set. (The difference being that > the buckets in a set don't need to be pairs.) > > We don't actually have a good set data type implementation, and I'm sure > people have opinions about this, so if anyone has the time, an > implementation would be appreciated. Name it hashset.[ch] and make sure > it handles the weak reference case. > > Thanks! :) (Hey, it's worth a try :) > > Andy > -- > http://wingolog.org/ > > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: rfi: hash set 2011-01-15 1:42 ` Noah Lavine @ 2011-01-17 21:27 ` Ludovic Courtès 2011-01-20 2:11 ` Noah Lavine 0 siblings, 1 reply; 4+ messages in thread From: Ludovic Courtès @ 2011-01-17 21:27 UTC (permalink / raw) To: guile-devel Hi Noah! Noah Lavine <noah.b.lavine@gmail.com> writes: > I started looking into implementing this, and I ran into something > strange that I'd like clarification on. Am I correct in saying that > currently, hash tables can only shrink by one size index when they are > rehashed? Yes, your analysis looks correct to me. Would you like to look into fixing this? :-) I suppose the trick would be to regularly recompile ‘min_size_index’ based on the current ‘SCM_HASHTABLE_N_ITEMS’, iterating on HASHTABLE_SIZE, starting from the current ‘min_size_index’. Thanks, Ludo’. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: rfi: hash set 2011-01-17 21:27 ` Ludovic Courtès @ 2011-01-20 2:11 ` Noah Lavine 0 siblings, 0 replies; 4+ messages in thread From: Noah Lavine @ 2011-01-20 2:11 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guile-devel Hello again, Actually, I need to retract what I said before. Here is what I think is actually going on, and it means that the current code is correct. min_size_index will usually be 0, which means the table can shrink down to the minimum hash table size (31). The only case it won't be 0 is if the hash table was created by passing a size argument to make-hash-table, in which case min_size_index will be whatever it needs to be so that the table will never shrink below the size specified to make-hash-table. This is consistent with the documentation, which says that a table created with a given size will always be at least that large. Sorry about that. I'll work on hash sets now. Noah On Mon, Jan 17, 2011 at 4:27 PM, Ludovic Courtès <ludo@gnu.org> wrote: > Hi Noah! > > Noah Lavine <noah.b.lavine@gmail.com> writes: > >> I started looking into implementing this, and I ran into something >> strange that I'd like clarification on. Am I correct in saying that >> currently, hash tables can only shrink by one size index when they are >> rehashed? > > Yes, your analysis looks correct to me. Would you like to look into > fixing this? :-) > > I suppose the trick would be to regularly recompile ‘min_size_index’ > based on the current ‘SCM_HASHTABLE_N_ITEMS’, iterating on > HASHTABLE_SIZE, starting from the current ‘min_size_index’. > > Thanks, > Ludo’. > > > ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2011-01-20 2:11 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-01-06 3:56 rfi: hash set Andy Wingo 2011-01-15 1:42 ` Noah Lavine 2011-01-17 21:27 ` Ludovic Courtès 2011-01-20 2:11 ` Noah Lavine
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).