unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: Noah Lavine <noah.b.lavine@gmail.com>
To: Andy Wingo <wingo@pobox.com>
Cc: guile-devel@gnu.org
Subject: Re: rfi: hash set
Date: Fri, 14 Jan 2011 20:42:16 -0500	[thread overview]
Message-ID: <AANLkTineFTMk-bfCJ9FtT8jw3UMDVBWQxWn_yP=-eyY4@mail.gmail.com> (raw)
In-Reply-To: <m3vd22d6rh.fsf@unquote.localdomain>

Hello,

I started looking into implementing this, and I ran into something
strange that I'd like clarification on. Am I correct in saying that
currently, hash tables can only shrink by one size index when they are
rehashed?

I think this because of hashtab.c, line 293. This is a part of
scm_i_rehash which looks like this:

 281       /* rehashing is not triggered when i <= min_size */
 290       i = SCM_HASHTABLE (table)->size_index;
 291       do
 292         --i;
 293       while (i > SCM_HASHTABLE (table)->min_size_index
 294              && SCM_HASHTABLE_N_ITEMS (table) < hashtable_size[i] / 4);

The i variable is an int representing the size of the new hash table.
It is an index into hashtable_size, an array of allowed hashtable
sizes. So i will be decremented until it represents a reasonable size
for the table.

However, i is also bounded by the table's min_size_index. Here's the
thing: based on grepping through this file, it seems that
min_size_index is set when a table is first made, to the initial size
index of the table, and never changes. Therefore, any i that
represents a shrink of the table will be <= min_size_index, so the
while's condition will always fail, so the loop can only run once, no
matter how few items are in the hash table, so i will always be the
old size_index - 1.

(This code path is only run in case of a shrink, not when a table
needs to grow.)

Is that right?
Noah

On Wed, Jan 5, 2011 at 10:56 PM, Andy Wingo <wingo@pobox.com> wrote:
> Hello,
>
> Currently the symbol table takes up twice as much memory as it needs to,
> because it is a hash table instead of a set. (The difference being that
> the buckets in a set don't need to be pairs.)
>
> We don't actually have a good set data type implementation, and I'm sure
> people have opinions about this, so if anyone has the time, an
> implementation would be appreciated. Name it hashset.[ch] and make sure
> it handles the weak reference case.
>
> Thanks! :) (Hey, it's worth a try :)
>
> Andy
> --
> http://wingolog.org/
>
>



  reply	other threads:[~2011-01-15  1:42 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-06  3:56 rfi: hash set Andy Wingo
2011-01-15  1:42 ` Noah Lavine [this message]
2011-01-17 21:27   ` Ludovic Courtès
2011-01-20  2:11     ` Noah Lavine

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='AANLkTineFTMk-bfCJ9FtT8jw3UMDVBWQxWn_yP=-eyY4@mail.gmail.com' \
    --to=noah.b.lavine@gmail.com \
    --cc=guile-devel@gnu.org \
    --cc=wingo@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).