From mboxrd@z Thu Jan 1 00:00:00 1970 Path: main.gmane.org!not-for-mail From: Mikael Djurfeldt Newsgroups: gmane.lisp.guile.user Subject: Re: Efficiency and flexibility of hash-tables Date: Mon, 10 Feb 2003 17:52:32 +0100 Sender: guile-user-bounces+guile-user=m.gmane.org@gnu.org Message-ID: References: <87lm0o7951.fsf@alice.rotty.yi.org> <1044889242.1033.310.camel@localhost> Reply-To: djurfeldt@nada.kth.se NNTP-Posting-Host: main.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: main.gmane.org 1044897061 802 80.91.224.249 (10 Feb 2003 17:11:01 GMT) X-Complaints-To: usenet@main.gmane.org NNTP-Posting-Date: Mon, 10 Feb 2003 17:11:01 +0000 (UTC) Cc: Greg Troxel Return-path: Original-Received: from monty-python.gnu.org ([199.232.76.173]) by main.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 18iHSK-0000CU-00 for ; Mon, 10 Feb 2003 18:10:57 +0100 Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org) by monty-python.gnu.org with esmtp (Exim 4.10.13) id 18iHPG-00013J-03 for guile-user@m.gmane.org; Mon, 10 Feb 2003 12:07:46 -0500 Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.10.13) id 18iHNh-0000se-00 for guile-user@gnu.org; Mon, 10 Feb 2003 12:06:09 -0500 Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.10.13) id 18iHNc-0000q7-00 for guile-user@gnu.org; Mon, 10 Feb 2003 12:06:05 -0500 Original-Received: from kvast.blakulla.net ([213.212.20.77]) by monty-python.gnu.org with esmtp (Exim 4.10.13) id 18iHAu-0007RU-00 for guile-user@gnu.org; Mon, 10 Feb 2003 11:52:57 -0500 Original-Received: from dyna224-225.nada.kth.se ([130.237.224.225] helo=linnaeus) by kvast.blakulla.net with esmtp (Exim 3.36 #1 (Debian)) id 18iHAZ-0007oo-00; Mon, 10 Feb 2003 17:52:35 +0100 Original-Received: from mdj by linnaeus with local (Exim 3.36 #1 (Debian)) id 18iHAW-0004eO-00; Mon, 10 Feb 2003 17:52:32 +0100 Original-To: Roland Orre In-Reply-To: <1044889242.1033.310.camel@localhost> (Roland Orre's message of "10 Feb 2003 16:00:42 +0100") User-Agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.2 (i386-pc-linux-gnu) Original-cc: Joris van der Hoeven Original-cc: Andreas Rottmann Original-cc: guile-user@gnu.org Original-cc: djurfeldt@nada.kth.se X-BeenThere: guile-user@gnu.org X-Mailman-Version: 2.1b5 Precedence: list List-Id: General Guile related discussions List-Help: List-Post: List-Subscribe: , List-Archive: List-Unsubscribe: , Errors-To: guile-user-bounces+guile-user=m.gmane.org@gnu.org Xref: main.gmane.org gmane.lisp.guile.user:1622 X-Report-Spam: http://spam.gmane.org/gmane.lisp.guile.user:1622 Roland Orre writes: (To Roland: Ledsen att jag inte hunnit svara tidigare.) > My idea was to modify the routines > scm_c_make_hash_table in hashtab.c, > scm_make_weak_key_hash_table and scm_make_weak_value_hash_table in > weaks.c so they allocate three extra items in the mallocated block > but hides this to the gc by setting the actual vector length to the > ordinary one. These extra items will contain the actual size in > number of items stored in the table and current upper and lower > thresholds for resizing the table. > > Then modify the routines: > scm_hash_fn_create_handle_x and scm_hash_fn_remove_x in hashtab.c > to do resizing when needed, using a table of prime numbers for next > and previous size. > > The advantage with this solution is that it's minimalistic, i.e. all > routines working with hash tables will behave exactly the same. Only > the routines changing the number of items in the table will be affected. [...] > Comments? Firstly: People have suggested that we should re-use other implementations of rehashing (resizable, adaptive, ...) hash-tables. However, the kind of hash-tables we're using in Guile is basic stuff described in any computer science textbook. When I asked for a patch, I asked for the Guile implemention. Also, hash-tables are used throughout Guile so we need them in the core, not as a loadable library. Now a comment on Roland's suggestion: If we want to store extra information in the hash-table object, in the way you suggest, that information will be hidden from the user unless we provide special selector functions which can extract it. But this defeats the very purpose of representing a hash-table as a simple Scheme type, and we could just as well provide a "real" hash-table object, isn't that so? Also, how would a hash-function, when receiving a hash-table represented as a vector, know that it safely can access memory locations after it's last index? I also think this would obfuscate the source. What about this even more minimalistic solution: The idea of having an upper threshold comes from the idea of maintaining the load factor of the hash table in a reasonable interval. But the purpose of the load factor is to prevent to large clusters forming in the same hash bucket. What about simply counting the pairs in a bucket during insertion and rehashing when we reach a threshold? Then no extra data needs to be stored in the table. (We would need to globally serialize rehashing with an external mutex, though.) One disadvantage of this solution is that since we're putting a hard limit on the maximal length of a bucket list, the number of buckets will be sensitive to things causing collisions, such as bad hash functions or bad luck. Another disadvantage is that the tables wouldn't shrink. Is this an issue? Mikael _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user