unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* doco hash tables
@ 2003-07-30 23:02 Kevin Ryde
  2003-08-17  0:35 ` Kevin Ryde
  0 siblings, 1 reply; 5+ messages in thread
From: Kevin Ryde @ 2003-07-30 23:02 UTC (permalink / raw)


This is a proposed revision for the hash tables node.  The main aim is
to cut down the duplication between the function descriptions, to try
to the basic operations clearer.  I've tried to spruce up the
introductory words too.

An example of using the hashx functions might be nice, maybe
case-insensitive string keys like already mentioned, if someone's got
a nice hash function along those lines.

I noticed there's no hashx-remove!, is that merely an omission (in the
code that is), or am I missing something?

Is it true that modifying the hash table while hash-fold is iterating
over it will have unpredicatable results?  (As far as the elements
seen or not seen go.)  Probably worth a warning about that.



Hash Table Reference
....................

A hash table stores key/value pairs, designed for fast lookups by key.
Keys and values can be any Scheme objects.

   A hash table is similar to an association list in the way it holds
keys and values (*note Association Lists::), but an association list
requires a linear search to locate an element, which can be slow when
the list is long.  A hash table instead uses arithmetic to make a array
index from a key, so lookups take a more or less fixed time no matter
how big the table.

   Like the association list functions, the hash table functions come in
several varieties, according to the equality test used for the keys.
Plain `hash-' functions use `equal?', `hashq-' functions use `eq?',
`hashv-' functions use `eqv?', and the `hashx-' functions use an
application supplied test (for instance to implement case insensitive
strings).

   A single `make-hash-table' creates a hash table suitable for use
with any set of functions, but it's imperative that just one set is
then used consistently, or results will be unpredictable.


   Hash tables are implemented as a vector indexed by an integer formed
from the key, and a association list of key/value pairs for each bucket
in case distinct keys hash together.  Direct access to the pairs in
those lists is provided by the `-handle-' functions.

   For the `hashx-' "extended" routines, an application supplies a HASH
function producing an integer index (as per `hashq' etc below), and an
ASSOC alist search function (as per `assq' etc, *Note Retrieving Alist
Entries::.).  The aim in the HASH function is to have different keys
spread out across the vector, so the bucket lists don't become long,
but the exact values generated are otherwise arbitrary.


 - Scheme Procedure: make-hash-table size
     Create a new hash table of SIZE slots.  Note that the number of
     slots does not limit the size of the table, it just tells how large
     the underlying vector will be.  The SIZE should be similar to the
     expected number of elements which will be added to the table, but
     they need not match.  For good performance, it might be a good
     idea to use a prime number as the SIZE.

 - Scheme Procedure: hash-ref table key [dflt]
 - Scheme Procedure: hashq-ref table key [dflt]
 - Scheme Procedure: hashv-ref table key [dflt]
 - Scheme Procedure: hashx-ref hash assoc table key [dflt]
 - C Function: scm_hash_ref (table, key, dflt)
 - C Function: scm_hashq_ref (table, key, dflt)
 - C Function: scm_hashv_ref (table, key, dflt)
 - C Function: scm_hashx_ref (hash, assoc, table, key, dflt)
     Lookup KEY in the given hash TABLE, and return the associated
     value.  If KEY is not found, return DFLT, or `#f' if DFLT is not
     given.  (For the C functions, DFLT must be given.)

 - Scheme Procedure: hash-set! table key val
 - Scheme Procedure: hashq-set! table key val
 - Scheme Procedure: hashv-set! table key val
 - Scheme Procedure: hashx-set! hash assoc table key val
 - C Function: scm_hash_set_x (table, key, val)
 - C Function: scm_hashq_set_x (table, key, val)
 - C Function: scm_hashv_set_x (table, key, val)
 - C Function: scm_hashx_set_x (hash, assoc, table, key, val)
     Associate VAL with KEY in the given hash TABLE.  If KEY is already
     present then it's associated value is changed.  If it's not
     present then a new entry is created.

 - Scheme Procedure: hash-remove! table key
 - Scheme Procedure: hashq-remove! table key
 - Scheme Procedure: hashv-remove! table key
 - C Function: scm_hash_remove_x (table, key)
 - C Function: scm_hashq_remove_x (table, key)
 - C Function: scm_hashv_remove_x (table, key)
     Remove any association for KEY in the given hash TABLE.  If KEY is
     not in TABLE then nothing is done.

 - Scheme Procedure: hash key size
 - Scheme Procedure: hashq key size
 - Scheme Procedure: hashv key size
 - C Function: scm_hash (key, size)
 - C Function: scm_hashq (key, size)
 - C Function: scm_hashv (key, size)
     Return a hash value for KEY.  This is a number in the range 0 to
     SIZE - 1, which is suitable for use in a hash table of the given
     SIZE.

     Note that `hashq' and `hashv' may use internal addresses of
     objects, so if an object is garbage collected and re-created it can
     have a different hash value, even when the two are notionally
     `eq?'.  For instance with symbols,

          (hashq 'something 123)   => 19
          (gc)
          (hashq 'something 123)   => 62

     In normal use this is not a problem, since an object entered into a
     hash table won't be garbage collected until removed.  It's only if
     hashing calculations are somehow separated from normal references
     to an object that its lifetime needs to be considered.

 - Scheme Procedure: hash-get-handle table key
 - Scheme Procedure: hashq-get-handle table key
 - Scheme Procedure: hashv-get-handle table key
 - Scheme Procedure: hashx-get-handle hash assoc table key
 - C Function: scm_hash_get_handle (table, key)
 - C Function: scm_hashq_get_handle (table, key)
 - C Function: scm_hashv_get_handle (table, key)
 - C Function: scm_hashx_get_handle (hash, assoc, table, key)
     Return the `(KEY . VALUE)' pair for KEY in the given hash TABLE,
     or `#f' if KEY is not in TABLE.

 - Scheme Procedure: hash-create-handle! table key init
 - Scheme Procedure: hashq-create-handle! table key init
 - Scheme Procedure: hashv-create-handle! table key init
 - Scheme Procedure: hashx-create-handle! hash assoc table key init
 - C Function: scm_hash_create_handle_x (table, key, init)
 - C Function: scm_hashq_create_handle_x (table, key, init)
 - C Function: scm_hashv_create_handle_x (table, key, init)
 - C Function: scm_hashx_create_handle_x (hash, assoc, table, key, init)
     Return the `(KEY . VALUE)' pair for KEY in the given hash TABLE.
     If KEY is not in TABLE then create an entry for it with INIT as
     the value, and return that pair.

 - Scheme Procedure: hash-fold proc init table
 - C Function: scm_hash_fold (proc, init, table)
     Accumulate a result by applying PROC to the elements of the given
     hash TABLE.

     Each call is `(PROC KEY VALUE PRIOR-RESULT)', where KEY and VALUE
     are from the TABLE and PRIOR-RESULT is the return from the previous
     PROC call.  For the first call, PRIOR-RESULT is the given INIT
     value.  Calls are made over the elements in an unspecified order.

     For example, the following returns an alist comprising all the
     entries from a hash table `mytable',

          (hash-fold acons '() mytable)



_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: doco hash tables
  2003-07-30 23:02 doco hash tables Kevin Ryde
@ 2003-08-17  0:35 ` Kevin Ryde
  2003-08-28  0:01   ` Kevin Ryde
  2003-09-15 13:48   ` Marius Vollmer
  0 siblings, 2 replies; 5+ messages in thread
From: Kevin Ryde @ 2003-08-17  0:35 UTC (permalink / raw)


I made these changes,

	* scheme-compound.texi (Hash Table Reference): Collect up groups of
	functions to avoid duplication.  Revise notes on hashx functions and
	on vector implementation.  In make-hash-table, size is now optional.
	Add hash-map and hash-for-each.

What can be said about the new resizing stuff and the initial size in
make-hash-table?  Is the initial size a permanent minimum, or just a
hint?  (I couldn't quite tell from the code.)


_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: doco hash tables
  2003-08-17  0:35 ` Kevin Ryde
@ 2003-08-28  0:01   ` Kevin Ryde
  2003-09-15 13:48   ` Marius Vollmer
  1 sibling, 0 replies; 5+ messages in thread
From: Kevin Ryde @ 2003-08-28  0:01 UTC (permalink / raw)


A bit more for the hash functions section,

        * scheme-compound.texi (Hash Table Reference): Add hashx case
        insensitive string example, add cross references to symbol-hash,
        string-hash, and char-set-hash.



   For the `hashx-' "extended" routines, an application supplies a HASH
function producing an integer index like `hashq' etc below, and an
ASSOC alist search function like `assq' etc (*note Retrieving Alist
Entries::).  Here's an example of such functions implementing
case-insensitive hashing of string keys,

     (use-modules (srfi srfi-1)
                  (srfi srfi-13))
     (define (my-hash str size)
       (remainder (string-hash-ci str) size))
     (define (my-assoc str alist)
       (find (lambda (pair) (string-ci=? str (car pair))) alist))

     (define my-table (make-hash-table))
     (hashx-set! my-hash my-assoc my-table "foo" 123)

     (hashx-ref my-hash my-assoc my-table "FOO")
     => 123

   In a `hashx-' HASH function the aim is to spread keys across the
vector, so bucket lists don't become long, but the actual values are
arbitrary (so long as they're in the range 0 to SIZE-1).  Helpful
functions for forming a hash value include `symbol-hash' (*note Symbol
Keys::), `string-hash' (*note SRFI-13 Comparison::), and
`char-set-hash' (*note SRFI-14 Predicates/Comparison::).


_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: doco hash tables
  2003-08-17  0:35 ` Kevin Ryde
  2003-08-28  0:01   ` Kevin Ryde
@ 2003-09-15 13:48   ` Marius Vollmer
  2003-10-09  0:16     ` Kevin Ryde
  1 sibling, 1 reply; 5+ messages in thread
From: Marius Vollmer @ 2003-09-15 13:48 UTC (permalink / raw)


Kevin Ryde <user42@zip.com.au> writes:

> What can be said about the new resizing stuff and the initial size in
> make-hash-table?  Is the initial size a permanent minimum, or just a
> hint?  (I couldn't quite tell from the code.)

I don't know right away, but I say it is important to document
hashtables properly.  Could you try to figure out what is going on and
document it?  That would be super great.

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405


_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: doco hash tables
  2003-09-15 13:48   ` Marius Vollmer
@ 2003-10-09  0:16     ` Kevin Ryde
  0 siblings, 0 replies; 5+ messages in thread
From: Kevin Ryde @ 2003-10-09  0:16 UTC (permalink / raw)


Marius Vollmer <mvo@zagadka.de> writes:
>
> I don't know right away, but I say it is important to document
> hashtables properly.

I think the size parameter is now a minimum size, and it'll rehash
above that.  I updated the manual accordingly.


_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2003-10-09  0:16 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-07-30 23:02 doco hash tables Kevin Ryde
2003-08-17  0:35 ` Kevin Ryde
2003-08-28  0:01   ` Kevin Ryde
2003-09-15 13:48   ` Marius Vollmer
2003-10-09  0:16     ` Kevin Ryde

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).