unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
From: "Ludovic Courtès" <ludovic.courtes@laas.fr>
Cc: guile-user@gnu.org
Subject: Re: Does anyone have a better scm_string_hash ?
Date: Fri, 14 Nov 2003 16:51:55 +0100	[thread overview]
Message-ID: <20031114155148.GI16650@powergnu.laas.fr> (raw)
In-Reply-To: <1068823738.13123.54.camel@localhost>

You might want to look at
http://srfi.schemers.org/srfi-13/mail-archive/msg00112.html .

Basically, the idea there is that a hash key for string cn..c0 is:
     h = (c0 + c1*37 + c2*37^2 + ...) % hash_size.
I used that and it *seems* to work quite well.

This can be written as follows:

  k = 0;
  while (*string)
    {
      k  = (k * 37) + (*string);
      k %= hash_size;
      string++;
    }
  return k;

Cheers,
Ludovic.

Today, 15 minutes, 40 seconds ago, Roland Orre wrote:
> I noticed that our data mining software run very very slow with a
> new data base. I localized the problem to scm_string_hash.
> 
> A hash table in this case was loaded with 14166 strings. I have
> a function which creates a reasonable sized hash table, in this
> case the hash table size was 8209.
> 
> 13856 of these strings were hashed to the same index= 1067.
> 303 strings got index = 8061.
> 2 strings got the index = 754.
> 8201 entries were empty.
> 
> We are running guile 1.6 but I checked the scm_string_hash from a recent
> 1.7 CVS also and the function in hash.c there is identical.
> 
> I added a few of the symbols hashing to 1067 below. One can of course
> argue that the symbols in this case should be hashed as numbers.
> Anyway, does anyone have any hint or have a better string hash function?
> 
> 	Best regards
> 	Roland Orre
> 
> 
> A few strings hashing to entry 1067 for hash table length 8209:
> "01632001" "01627301" "01626801" "01626601" "01626501" "01626401"
> "01626301" "01626101" "01626001" "01625901" "01625801" "01625701"
> "01625401" "01625301" "01625101" "01625001" "01624801" "01624601"
> "01624501" "01624401" "01624301" "01624101" "01624001" "01623901"
> "01623801" "01623701" "01623601" "01623501" "01623401" "01623201"
> "01623101" "01622901" "01622801" "01622701" "01622601" "01622401"
> 
> 
> 
> 
> _______________________________________________
> Guile-user mailing list
> Guile-user@gnu.org
> http://mail.gnu.org/mailman/listinfo/guile-user


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


  reply	other threads:[~2003-11-14 15:51 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-11-13 21:55 cmod-play 1 available + modsup.h additions Thien-Thi Nguyen
2003-11-14  8:26 ` Ludovic Courtès
2003-11-14 13:10   ` Thien-Thi Nguyen
2003-11-14 13:37     ` Ludovic Courtès
2003-11-14 17:38       ` Thien-Thi Nguyen
2003-11-14 14:29   ` Marius Vollmer
2003-11-14 14:17 ` Marius Vollmer
2003-11-14 15:28   ` Does anyone have a better scm_string_hash ? Roland Orre
2003-11-14 15:51     ` Ludovic Courtès [this message]
2003-11-17  8:33       ` Roland Orre
2003-11-17 13:01         ` Ludovic Courtès
2003-11-17 15:42         ` Marius Vollmer
2003-11-17 16:02           ` Marius Vollmer
2003-11-17 16:29             ` Marius Vollmer
2003-11-17 16:48               ` Allister MacLeod
2003-11-17 17:57                 ` Marius Vollmer
2003-11-17 19:17                   ` OT: x86 assembly timings/size (was Re: Does anyone have a better scm_string_hash ?) Allister MacLeod
2003-11-17 21:27                     ` OT: x86 assembly timings/size Marius Vollmer
2003-11-19  9:04             ` Does anyone have a better scm_string_hash ? Ludovic Courtès
2003-11-19 15:02               ` Marius Vollmer
2003-11-14 17:40   ` cmod-play 1 available + modsup.h additions Thien-Thi Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20031114155148.GI16650@powergnu.laas.fr \
    --to=ludovic.courtes@laas.fr \
    --cc=guile-user@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).