From mboxrd@z Thu Jan 1 00:00:00 1970 Path: main.gmane.org!not-for-mail From: Ludovic =?iso-8859-1?Q?Court=E8s?= Newsgroups: gmane.lisp.guile.user Subject: Re: Does anyone have a better scm_string_hash ? Date: Fri, 14 Nov 2003 16:51:55 +0100 Organization: LAAS-CNRS Sender: guile-user-bounces+guile-user=m.gmane.org@gnu.org Message-ID: <20031114155148.GI16650@powergnu.laas.fr> References: <8765hnf308.fsf@zagadka.ping.de> <1068823738.13123.54.camel@localhost> NNTP-Posting-Host: deer.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: sea.gmane.org 1068825543 17521 80.91.224.253 (14 Nov 2003 15:59:03 GMT) X-Complaints-To: usenet@sea.gmane.org NNTP-Posting-Date: Fri, 14 Nov 2003 15:59:03 +0000 (UTC) Cc: guile-user@gnu.org Original-X-From: guile-user-bounces+guile-user=m.gmane.org@gnu.org Fri Nov 14 16:59:00 2003 Return-path: Original-Received: from monty-python.gnu.org ([199.232.76.173]) by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 1AKgLc-0001ZP-00 for ; Fri, 14 Nov 2003 16:59:00 +0100 Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org) by monty-python.gnu.org with esmtp (Exim 4.24) id 1AKhEb-0001Ra-6F for guile-user@m.gmane.org; Fri, 14 Nov 2003 11:55:49 -0500 Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.24) id 1AKhDd-0000mn-6T for guile-user@gnu.org; Fri, 14 Nov 2003 11:54:49 -0500 Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.24) id 1AKhCd-0007QO-C6 for guile-user@gnu.org; Fri, 14 Nov 2003 11:54:18 -0500 Original-Received: from [140.93.0.15] (helo=laas.laas.fr) by monty-python.gnu.org with esmtp (TLSv1:DES-CBC3-SHA:168) (Exim 4.24) id 1AKhCF-0006vw-2v for guile-user@gnu.org; Fri, 14 Nov 2003 11:53:23 -0500 Original-Received: by laas.laas.fr (8.12.10/8.12.10) with SMTP id hAEFpuAi024482; Fri, 14 Nov 2003 16:51:56 +0100 (CET) Original-To: Roland Orre Mail-Followup-To: Roland Orre , guile-user@gnu.org Content-Disposition: inline In-Reply-To: <1068823738.13123.54.camel@localhost> X-PGP-Fingerprint: 821D 815D 902A 7EAB 5CEE D120 7FBA 3D4F EB1F 5364 X-PGP-Key-ID: 0xEB1F5364 X-PGP-Key: http://ludo.humanoidz.org/ludovic.asc X-OS: GNU/Linux X-URL: http://ludo.humanoidz.org/ User-Agent: Mutt/1.5.4i [Guile enabled] X-Spam-Score: -5 () EMAIL_ATTRIBUTION, IN_REP_TO, QUOTED_EMAIL_TEXT, REFERENCES, REPLY_WITH_QUOTES, USER_AGENT_MUTT X-Scanned-By: MIMEDefang at CNRS-LAAS X-BeenThere: guile-user@gnu.org X-Mailman-Version: 2.1.2 Precedence: list List-Id: General Guile related discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-user-bounces+guile-user=m.gmane.org@gnu.org Xref: main.gmane.org gmane.lisp.guile.user:2387 X-Report-Spam: http://spam.gmane.org/gmane.lisp.guile.user:2387 You might want to look at http://srfi.schemers.org/srfi-13/mail-archive/msg00112.html . Basically, the idea there is that a hash key for string cn..c0 is: h = (c0 + c1*37 + c2*37^2 + ...) % hash_size. I used that and it *seems* to work quite well. This can be written as follows: k = 0; while (*string) { k = (k * 37) + (*string); k %= hash_size; string++; } return k; Cheers, Ludovic. Today, 15 minutes, 40 seconds ago, Roland Orre wrote: > I noticed that our data mining software run very very slow with a > new data base. I localized the problem to scm_string_hash. > > A hash table in this case was loaded with 14166 strings. I have > a function which creates a reasonable sized hash table, in this > case the hash table size was 8209. > > 13856 of these strings were hashed to the same index= 1067. > 303 strings got index = 8061. > 2 strings got the index = 754. > 8201 entries were empty. > > We are running guile 1.6 but I checked the scm_string_hash from a recent > 1.7 CVS also and the function in hash.c there is identical. > > I added a few of the symbols hashing to 1067 below. One can of course > argue that the symbols in this case should be hashed as numbers. > Anyway, does anyone have any hint or have a better string hash function? > > Best regards > Roland Orre > > > A few strings hashing to entry 1067 for hash table length 8209: > "01632001" "01627301" "01626801" "01626601" "01626501" "01626401" > "01626301" "01626101" "01626001" "01625901" "01625801" "01625701" > "01625401" "01625301" "01625101" "01625001" "01624801" "01624601" > "01624501" "01624401" "01624301" "01624101" "01624001" "01623901" > "01623801" "01623701" "01623601" "01623501" "01623401" "01623201" > "01623101" "01622901" "01622801" "01622701" "01622601" "01622401" > > > > > _______________________________________________ > Guile-user mailing list > Guile-user@gnu.org > http://mail.gnu.org/mailman/listinfo/guile-user _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user