From: HJSTEIN@bloomberg.com (Harvey J. Stein)
Newsgroups: gmane.lisp.guile.user, gmane.lisp.guile.devel
Subject: Re: Resizing hash tables in Guile
Date: 13 Feb 2003 08:55:23 -0500
Mikael Djurfeldt writes:

> Regarding reshuffling time: Yes, reshuffling means that every operation
> isn't O(1), but it *does* mean that they are O(1) on average.  You can
> understand this by a trick sometimes used in algorithm run-time
> analysis called "amortization":
>
> The idea is that for every operation on the hash table you "pay" some
> extra virtual time, and the sum of this time can then be used to
> account for the reshuffling time.  In my implementation, the hash
> table is roughly doubled for each rehash.  Before rehashing occurs,
> you have inserted N elements.  This has cost you less than cN seconds.
> Rehashing is O(2N) = O(N), so we can say it will cost us less than dN
> seconds.  If we now pay off d seconds per operation in advance, and
> note that the argument above holds equally well for each rehashing
> point, we realize that each operation costs less than c + d seconds on
> average.  This means that, on average, operations are O(1).

Inserts are, but lookups aren't necessarily.  O(1) lookups require the
keys to be spread uniformly across the buckets; in the worst case, when
every key lands in the same bucket, hash table lookup is still O(N).
And good hashing functions are still hard to write.

People overestimate log(N) and overuse O().
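Mikael's amortization argument can be checked numerically.  The sketch below (illustrative Python, not Guile's actual implementation; the `ToyHash` class and its cost accounting are invented for this demonstration) counts the total "work" done by a table that doubles whenever its load factor reaches 1, and shows that the average work per insert stays bounded by a small constant no matter how many inserts you do:

```python
# Toy model of a hash table that doubles its bucket vector when full.
# We count abstract work units: 1 per insert, plus N per rehash (every
# stored item gets moved once).  The claim is that total work is O(n)
# over n inserts, i.e. O(1) per insert amortized.

class ToyHash:
    def __init__(self):
        self.size = 8      # initial number of buckets
        self.count = 0     # number of stored items
        self.work = 0      # work units spent so far, including rehashes

    def insert(self):
        self.work += 1                # cost of the insert itself
        self.count += 1
        if self.count > self.size:    # load factor exceeded: rehash
            self.work += self.count   # rehashing touches every item once
            self.size *= 2            # roughly double, as in the thread

def amortized_cost(n):
    """Average work units per insert over n consecutive inserts."""
    t = ToyHash()
    for _ in range(n):
        t.insert()
    return t.work / n

# The rehash costs form a geometric series (9 + 17 + 33 + ...), so the
# average stays below 3 work units per insert for any n -- this is the
# "pay d seconds per operation in advance" argument made concrete.
```

Note that this bounds the *average* only: the single insert that triggers a rehash still pays O(N) by itself, which is exactly why amortized O(1) is not the same as a worst-case upper bound per operation.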
When comparing an O(1) algorithm to an O(log(N)) algorithm, it really
comes down to the actual constant factors and the actual problem size,
not just the asymptotic behavior.  2^32 is over 4,000,000,000, and even
with that many items log_2(N) is still just 32, so an O(log(N))
algorithm will still beat an O(1) algorithm whenever the constant hidden
in the O(1) exceeds 32.  And you can't have 2^32 items in your hash
table unless this is on a 64-bit machine.  With that many items you'd
need 16GB (32GB on a 64-bit machine) just to store the addresses, let
alone the keys and the items themselves, so by that point you're
probably hitting swap so hard that the whole question is moot.

Also, if a person is relying on O(1) hash table performance, it might
not be because they need that on average, but because they need an upper
bound on the time of each operation, in which case automatic resizing
would violate that requirement even though it maintains O(1) on average.

Trees also keep the data sorted for you, which hash tables don't.

So, ideally, one would have hash table objects with and without
resizing, and various sorts of tree objects (AVL, red/black, B*, etc.).
insert, delete, and map would be methods that work on all of the above,
with map on trees returning the data in sorted order.  For that matter,
insert & delete might as well also work on lists...

BTW, the hash table implementation in Tcl seems to be quite good.

-- 
Harvey Stein
Bloomberg LP
hjstein@bloomberg.com

_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user