From mboxrd@z Thu Jan 1 00:00:00 1970 Path: main.gmane.org!not-for-mail From: Mikael Djurfeldt Newsgroups: gmane.lisp.guile.user,gmane.lisp.guile.devel Subject: Re: Resizing hash tables in Guile Date: Thu, 13 Feb 2003 10:35:40 +0100 Sender: guile-user-bounces+guile-user=m.gmane.org@gnu.org Message-ID: References: <87lm0o7951.fsf@alice.rotty.yi.org> <1044889242.1033.310.camel@localhost> <87fzqtjx96.fsf@zagadka.ping.de> <1045081065.1033.451.camel@localhost> Reply-To: djurfeldt@nada.kth.se NNTP-Posting-Host: main.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: main.gmane.org 1045129002 738 80.91.224.249 (13 Feb 2003 09:36:42 GMT) X-Complaints-To: usenet@main.gmane.org NNTP-Posting-Date: Thu, 13 Feb 2003 09:36:42 +0000 (UTC) Cc: Greg Troxel Return-path: Original-Received: from monty-python.gnu.org ([199.232.76.173]) by main.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 18jFnE-0000BR-00 for ; Thu, 13 Feb 2003 10:36:32 +0100 Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org) by monty-python.gnu.org with esmtp (Exim 4.10.13) id 18jFn8-0000bX-06 for guile-user@m.gmane.org; Thu, 13 Feb 2003 04:36:26 -0500 Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.10.13) id 18jFmZ-0000QF-00 for guile-user@gnu.org; Thu, 13 Feb 2003 04:35:51 -0500 Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.10.13) id 18jFmW-0000Lm-00 for guile-user@gnu.org; Thu, 13 Feb 2003 04:35:50 -0500 Original-Received: from kvast.blakulla.net ([213.212.20.77]) by monty-python.gnu.org with esmtp (Exim 4.10.13) id 18jFmS-0000CF-00; Thu, 13 Feb 2003 04:35:44 -0500 Original-Received: from dyna224-219.nada.kth.se ([130.237.224.219] helo=linnaeus) by kvast.blakulla.net with esmtp (Exim 3.36 #1 (Debian)) id 18jFmP-0001Rv-00; Thu, 13 Feb 2003 10:35:41 +0100 Original-Received: from mdj by linnaeus with local (Exim 3.36 #1 (Debian)) id 18jFmO-0000KY-00; Thu, 13 Feb 2003 10:35:40 +0100 Original-To: Roland Orre In-Reply-To: <1045081065.1033.451.camel@localhost> (Roland Orre's message of "12 Feb 2003 21:17:46 +0100") User-Agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.2 (i386-pc-linux-gnu) Original-cc: djurfeldt@nada.kth.se Original-cc: guile-user@gnu.org Original-cc: Joris van der Hoeven Original-cc: guile-devel@gnu.org Original-cc: Andreas Rottmann Original-cc: Marius Vollmer X-BeenThere: guile-user@gnu.org X-Mailman-Version: 2.1b5 Precedence: list List-Id: General Guile related discussions List-Help: List-Post: List-Subscribe: , List-Archive: List-Unsubscribe: , Errors-To: guile-user-bounces+guile-user=m.gmane.org@gnu.org Xref: main.gmane.org gmane.lisp.guile.user:1645 gmane.lisp.guile.devel:1926 X-Report-Spam: http://spam.gmane.org/gmane.lisp.guile.devel:1926 Roland Orre writes: > On Wed, 2003-02-12 at 18:53, Mikael Djurfeldt wrote: >> I've been thinking that maybe we should continue the move and let the >> resizing tables entirely replace the fixed size ones. [...] > I think that this is a a good idea, but I would like the optional size > argument to still be there as an initial guess about the number of > items to expect [...] Agreed. > Of course this additional level of abstraction is nice as it also gives > a future option to use trees if the user e.g gives an optional > comparision operator. > > By the way, Joris van der Hoeven mentioned that this type of resizing > hash tables are faster than trees, which have a time complexity > of O(log n)+reshuffling time if they are balanced. Do you have any > numbers showing how much faster and if possible if there are any > conditions? The reshuffling time will grow at least O(n) when the > size of our linear tables increases if I have understood right. I doubt that we'll ever use trees under the hash table API, the reason being that the user expects O(1) for hash table operations. Even though O(log n) grows slowly, it is substantially different for large enough data sets. Regarding reshuffling time: Yes, rehuffling means that every operation isn't O(1), but it *does* mean that they are O(1) on average. You can understand this by a trick sometimes used in algorithm run-time analysis called "amortization": The idea is that for every operation on the hash table you "pay" some extra virtual time, and the sum of this time can then be used to account for the reshuffling time. In my implementation, the hash table is roughly doubled for each rehash. Before rehashing occurs, you have inserted N elements. This has cost you less than cN seconds. Rehashing is O(2N) = O(N), so we can say it will cost us less than dN seconds. If we now pay off d seconds per operation in advance, and note that the argument above holds equally well for each rehashing point, we realize that each operation costs less than c + d seconds on average. This means that, on average, operations are O(1). Mikael _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user