From: HJSTEIN@bloomberg.com (Harvey J. Stein)
Cc: Greg Troxel <gdt@ir.bbn.com>
Subject: Re: Resizing hash tables in Guile
Date: 13 Feb 2003 08:55:23 -0500 [thread overview]
Message-ID: <kiw3cmsl1ys.fsf@blinky.bloomberg.com> (raw)
In-Reply-To: Mikael Djurfeldt's message of "Thu, 13 Feb 2003 10:35:40 +0100"
Mikael Djurfeldt <djurfeldt@nada.kth.se> writes:
> Regarding reshuffling time: Yes, rehuffling means that every operation
> isn't O(1), but it *does* mean that they are O(1) on average. You can
> understand this by a trick sometimes used in algorithm run-time
> analysis called "amortization":
>
> The idea is that for every operation on the hash table you "pay" some
> extra virtual time, and the sum of this time can then be used to
> account for the reshuffling time. In my implementation, the hash
> table is roughly doubled for each rehash. Before rehashing occurs,
> you have inserted N elements. This has cost you less than cN seconds.
> Rehashing is O(2N) = O(N), so we can say it will cost us less than dN
> seconds. If we now pay off d seconds per operation in advance, and
> note that the argument above holds equally well for each rehashing
> point, we realize that each operation costs less than c + d seconds on
> average. This means that, on average, operations are O(1).
Inserts are, but lookups aren't necessarily. Lookups being O(1)
requires uniformity of bucket sizes. Worst case hash table lookup
time is still O(N). And good hashing functions are still hard to
write.
People overestimate log(N) and overuse O(). When comparing an O(1)
algorithm to an O(log(N)) algorithm, it really comes down to the
actual functions involved, and actual problem size, not just the
asymptotic behavior. 2^32 is over 4,000,000,000. With this many
items, log(N) is still just 32, so an O(log(N)) algorithm will still
beat an O(1) algorithm if it's really log_2(N) vs 32. And you can't
have 2^32 items in your hash table unless this is on a 64bit machine.
And with this many items in your hash table, you'd need 16gb (32gb on
a 64bit machine) just to store the addresses, let alone the keys and
the items themselves, so by this time you're probably hitting
the swap so hard that the whole question is moot.
Also, if a person's relying on O(1) for hash table performance, it might be
not because they need that on average, but because they need an upper
bound on the operation time, in which case automatic resizing would
also violate this, even though it maintains O(1) on average.
Trees also sort the data for you, which hash tables don't give you.
So, ideally, one would have a hash table object with & without
resizing, and various sorts of tree (AVL, red/black, B*, etc) objects.
insert and delete and map would be methods that work on all of the
above, with map on trees returning the data in sorted order. For that
matter, insert & delete might as well also work on lists...
BTW, the hash table implementation in Tcl seems to be quite good.
--
Harvey Stein
Bloomberg LP
hjstein@bloomberg.com
_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user
next prev parent reply other threads:[~2003-02-13 13:55 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <Pine.GSO.3.96.1030208160743.22945E-100000@anh>
[not found] ` <87lm0o7951.fsf@alice.rotty.yi.org>
[not found] ` <rmiu1fcnrhj.fsf@fnord.ir.bbn.com>
[not found] ` <1044889242.1033.310.camel@localhost>
[not found] ` <xy77kc8krhr.fsf@nada.kth.se>
[not found] ` <xy74r7ckqmy.fsf@nada.kth.se>
2003-02-11 13:59 ` Resizing hash tables in Guile Mikael Djurfeldt
2003-02-11 17:34 ` Roland Orre
[not found] ` <ljy94lhgkb.fsf@burns.dt.e-technik.uni-dortmund.de>
2003-02-12 17:47 ` Mikael Djurfeldt
2003-02-12 20:44 ` Rob Browning
2003-02-12 16:10 ` Marius Vollmer
2003-02-12 17:53 ` Mikael Djurfeldt
2003-02-12 20:17 ` Roland Orre
2003-02-13 9:35 ` Mikael Djurfeldt
2003-02-13 13:55 ` Harvey J. Stein [this message]
2003-02-13 14:24 ` Joris van der Hoeven
2003-02-13 18:30 ` Harvey J. Stein
2003-02-13 20:02 ` Paul Jarc
2003-02-13 9:52 ` Joris van der Hoeven
2003-02-12 20:55 ` Rob Browning
2003-02-13 10:43 ` Mikael Djurfeldt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/guile/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=kiw3cmsl1ys.fsf@blinky.bloomberg.com \
--to=hjstein@bloomberg.com \
--cc=gdt@ir.bbn.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).