unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: Roland Orre <orre@nada.kth.se>
Subject: Does anyone have a better scm_string_hash ?
Date: Fri, 14 Nov 2003 19:11:35 +0100	[thread overview]
Message-ID: <1068833495.13117.57.camel@localhost> (raw)

I've earlier posted this to guile-user but I think it's such a
serious issue, as string hashing is essential to an interactive
language, that it needs to be addressed also here.
---

I noticed that our data mining software run very very slow with a
new data base. I localized the problem to scm_string_hash.

A hash table in this case was loaded with 14166 strings. I have
a function which creates a reasonable sized hash table, in this
case the hash table size was 8209.

13856 of these strings were hashed to the same index= 1067.
303 strings got index = 8061.
2 strings got the index = 754.
8201 entries were empty.

We are running guile 1.6 but I checked the scm_string_hash from a recent
1.7 CVS also and the function in hash.c there is identical.

I added a few of the symbols hashing to 1067 below. One can of course
argue that the symbols in this case should be hashed as numbers.
Anyway, does anyone have any hint or have a better string hash function?

	Best regards
	Roland Orre


A few strings hashing to entry 1067 for hash table length 8209:
"01632001" "01627301" "01626801" "01626601" "01626501" "01626401"
"01626301" "01626101" "01626001" "01625901" "01625801" "01625701"
"01625401" "01625301" "01625101" "01625001" "01624801" "01624601"
"01624501" "01624401" "01624301" "01624101" "01624001" "01623901"
"01623801" "01623701" "01623601" "01623501" "01623401" "01623201"
"01623101" "01622901" "01622801" "01622701" "01622601" "01622401"





_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel


             reply	other threads:[~2003-11-14 18:11 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-11-14 18:11 Roland Orre [this message]
2003-11-14 20:09 ` Does anyone have a better scm_string_hash ? Andreas Rottmann
     [not found] <E1AKPRR-00036l-00@surf.glug.org>
     [not found] ` <8765hnf308.fsf@zagadka.ping.de>
     [not found]   ` <1068823738.13123.54.camel@localhost>
     [not found]     ` <20031114155148.GI16650@powergnu.laas.fr>
     [not found]       ` <1069058032.1638.21.camel@localhost>
     [not found]         ` <874qx3rogk.fsf@zagadka.ping.de>
2003-11-17 16:02           ` Marius Vollmer
2003-11-17 16:29             ` Marius Vollmer
2003-11-18 20:15               ` Kevin Ryde
2003-11-19  9:04             ` Ludovic Courtès
2003-11-19 15:02               ` Marius Vollmer
  -- strict thread matches above, loose matches on Subject: below --
2003-11-20 15:48 Stephen Compall
2003-11-20 17:29 ` Marius Vollmer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1068833495.13117.57.camel@localhost \
    --to=orre@nada.kth.se \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).