unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
From: Mikael Djurfeldt <djurfeldt@nada.kth.se>
Cc: Greg Troxel <gdt@ir.bbn.com>
Subject: Re: Efficiency and flexibility of hash-tables
Date: Mon, 10 Feb 2003 17:52:32 +0100	[thread overview]
Message-ID: <xy77kc8krhr.fsf@nada.kth.se> (raw)
In-Reply-To: <1044889242.1033.310.camel@localhost> (Roland Orre's message of "10 Feb 2003 16:00:42 +0100")

Roland Orre <orre@nada.kth.se> writes:

(To Roland: Ledsen att jag inte hunnit svara tidigare.)

> My idea was to modify the routines 
> scm_c_make_hash_table in hashtab.c,
> scm_make_weak_key_hash_table and scm_make_weak_value_hash_table in 
> weaks.c so they allocate three extra items in the mallocated block
> but hides this to the gc by setting the actual vector length to the
> ordinary one. These extra items will contain the actual size in
> number of items stored in the table and current upper and lower
> thresholds for resizing the table.
>
> Then modify the routines:
> scm_hash_fn_create_handle_x and scm_hash_fn_remove_x in hashtab.c
> to do resizing when needed, using a table of prime numbers for next
> and previous size.
>
> The advantage with this solution is that it's minimalistic, i.e. all 
> routines working with hash tables will behave exactly the same. Only
> the routines changing the number of items in the table will be affected.

[...]

> Comments?

Firstly: People have suggested that we should re-use other
implementations of rehashing (resizable, adaptive, ...) hash-tables.

However, the kind of hash-tables we're using in Guile is basic stuff
described in any computer science textbook.  When I asked for a patch,
I asked for the Guile implemention.

Also, hash-tables are used throughout Guile so we need them in the
core, not as a loadable library.

Now a comment on Roland's suggestion:

If we want to store extra information in the hash-table object, in the
way you suggest, that information will be hidden from the user unless
we provide special selector functions which can extract it.  But this
defeats the very purpose of representing a hash-table as a simple
Scheme type, and we could just as well provide a "real" hash-table
object, isn't that so?

Also, how would a hash-function, when receiving a hash-table
represented as a vector, know that it safely can access memory
locations after it's last index?

I also think this would obfuscate the source.

What about this even more minimalistic solution:

The idea of having an upper threshold comes from the idea of
maintaining the load factor of the hash table in a reasonable
interval.  But the purpose of the load factor is to prevent to large
clusters forming in the same hash bucket.  What about simply counting
the pairs in a bucket during insertion and rehashing when we reach a
threshold?  Then no extra data needs to be stored in the table.  (We
would need to globally serialize rehashing with an external mutex,
though.)

One disadvantage of this solution is that since we're putting a hard
limit on the maximal length of a bucket list, the number of buckets
will be sensitive to things causing collisions, such as bad hash
functions or bad luck.

Another disadvantage is that the tables wouldn't shrink.  Is this an
issue?

Mikael


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


  reply	other threads:[~2003-02-10 16:52 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-02-08 11:00 Efficiency and flexibility of hash-tables Joris van der Hoeven
2003-02-08 13:57 ` Roland Orre
2003-02-08 14:14   ` Joris van der Hoeven
2003-02-08 14:55     ` Roland Orre
2003-02-08 15:14       ` Joris van der Hoeven
2003-02-08 15:31         ` Mikael Djurfeldt
2003-02-11 11:14           ` Joris van der Hoeven
2003-02-11 11:28             ` Joris van der Hoeven
2003-02-11 12:50               ` Mikael Djurfeldt
2003-02-08 15:44         ` Roland Orre
2003-02-10  9:55           ` Andreas Rottmann
2003-02-10 14:24             ` Greg Troxel
2003-02-10 15:00               ` Roland Orre
2003-02-10 16:52                 ` Mikael Djurfeldt [this message]
2003-02-10 17:09                   ` Roland Orre
2003-02-10 17:11                   ` Mikael Djurfeldt
2003-02-11 13:59                     ` Resizing hash tables in Guile Mikael Djurfeldt
2003-02-11 17:34                       ` Roland Orre
2003-02-12 11:41                         ` Marius Vollmer
2003-02-12 16:10                       ` Marius Vollmer
2003-02-12 17:53                         ` Mikael Djurfeldt
2003-02-12 20:17                           ` Roland Orre
2003-02-13  9:35                             ` Mikael Djurfeldt
2003-02-13 13:55                               ` Harvey J. Stein
2003-02-13 14:24                                 ` Joris van der Hoeven
2003-02-13 18:30                                   ` Harvey J. Stein
2003-02-13 20:02                                     ` Paul Jarc
2003-02-13  9:52                             ` Joris van der Hoeven
2003-02-12 20:55                       ` Rob Browning
2003-02-13 10:43                         ` Mikael Djurfeldt
2003-02-12 20:47       ` Efficiency and flexibility of hash-tables Paul Jarc
2003-02-12 21:58         ` Roland Orre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xy77kc8krhr.fsf@nada.kth.se \
    --to=djurfeldt@nada.kth.se \
    --cc=gdt@ir.bbn.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).