From mboxrd@z Thu Jan 1 00:00:00 1970 Path: main.gmane.org!not-for-mail From: Roland Orre Newsgroups: gmane.lisp.guile.devel Subject: Does anyone have a better scm_string_hash ? Date: Fri, 14 Nov 2003 19:11:35 +0100 Organization: Royal Institute of Technology Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Message-ID: <1068833495.13117.57.camel@localhost> NNTP-Posting-Host: deer.gmane.org Mime-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Trace: sea.gmane.org 1068834040 6690 80.91.224.253 (14 Nov 2003 18:20:40 GMT) X-Complaints-To: usenet@sea.gmane.org NNTP-Posting-Date: Fri, 14 Nov 2003 18:20:40 +0000 (UTC) Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Fri Nov 14 19:20:37 2003 Return-path: Original-Received: from monty-python.gnu.org ([199.232.76.173]) by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 1AKiYe-0004P7-00 for ; Fri, 14 Nov 2003 19:20:36 +0100 Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org) by monty-python.gnu.org with esmtp (Exim 4.24) id 1AKjS2-0000ck-Lw for guile-devel@m.gmane.org; Fri, 14 Nov 2003 14:17:50 -0500 Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.24) id 1AKjRk-0000bx-DY for guile-devel@gnu.org; Fri, 14 Nov 2003 14:17:32 -0500 Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.24) id 1AKjRD-0000FM-0R for guile-devel@gnu.org; Fri, 14 Nov 2003 14:17:30 -0500 Original-Received: from [130.237.222.202] (helo=smtp.nada.kth.se) by monty-python.gnu.org with esmtp (TLSv1:DES-CBC3-SHA:168) (Exim 4.24) id 1AKjRC-0000Dw-Ge for guile-devel@gnu.org; Fri, 14 Nov 2003 14:16:58 -0500 Original-Received: from c640 (h148n2fls33o875.telia.com [217.208.54.148]) (authenticated bits=0) by smtp.nada.kth.se (8.12.10/8.12.1) with ESMTP id hAEIFVEf001780 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Fri, 14 Nov 2003 19:15:32 +0100 (MET) Original-To: guile-devel@gnu.org X-Mailer: Ximian Evolution 1.4.5 X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.2 Precedence: list List-Id: Developers list for Guile, the GNU extensibility library List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Xref: main.gmane.org gmane.lisp.guile.devel:3018 X-Report-Spam: http://spam.gmane.org/gmane.lisp.guile.devel:3018 I've earlier posted this to guile-user but I think it's such a serious issue, as string hashing is essential to an interactive language, that it needs to be addressed also here. --- I noticed that our data mining software run very very slow with a new data base. I localized the problem to scm_string_hash. A hash table in this case was loaded with 14166 strings. I have a function which creates a reasonable sized hash table, in this case the hash table size was 8209. 13856 of these strings were hashed to the same index= 1067. 303 strings got index = 8061. 2 strings got the index = 754. 8201 entries were empty. We are running guile 1.6 but I checked the scm_string_hash from a recent 1.7 CVS also and the function in hash.c there is identical. I added a few of the symbols hashing to 1067 below. One can of course argue that the symbols in this case should be hashed as numbers. Anyway, does anyone have any hint or have a better string hash function? Best regards Roland Orre A few strings hashing to entry 1067 for hash table length 8209: "01632001" "01627301" "01626801" "01626601" "01626501" "01626401" "01626301" "01626101" "01626001" "01625901" "01625801" "01625701" "01625401" "01625301" "01625101" "01625001" "01624801" "01624601" "01624501" "01624401" "01624301" "01624101" "01624001" "01623901" "01623801" "01623701" "01623601" "01623501" "01623401" "01623201" "01623101" "01622901" "01622801" "01622701" "01622601" "01622401" _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://mail.gnu.org/mailman/listinfo/guile-devel