unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: Mikael Djurfeldt <djurfeldt@nada.kth.se>
Cc: Greg Troxel <gdt@ir.bbn.com>
Subject: Re: Resizing hash tables in Guile
Date: Thu, 13 Feb 2003 10:35:40 +0100	[thread overview]
Message-ID: <xy7vfzo8qvn.fsf@nada.kth.se> (raw)
In-Reply-To: <1045081065.1033.451.camel@localhost> (Roland Orre's message of "12 Feb 2003 21:17:46 +0100")

Roland Orre <orre@nada.kth.se> writes:

> On Wed, 2003-02-12 at 18:53, Mikael Djurfeldt wrote:
>> I've been thinking that maybe we should continue the move and let the
>> resizing tables entirely replace the fixed size ones.
[...]

> I think that this is a a good idea, but I would like the optional size
> argument to still be there as an initial guess about the number of
> items to expect
[...]

Agreed.

> Of course this additional level of abstraction is nice as it also gives
> a future option to use trees if the user e.g gives an optional
> comparision operator.
>
> By the way, Joris van der Hoeven mentioned that this type of resizing
> hash tables are faster than trees, which have a time complexity
> of O(log n)+reshuffling time if they are balanced. Do you have any
> numbers showing how much faster and if possible if there are any
> conditions? The reshuffling time will grow at least O(n) when the
> size of our linear tables increases if I have understood right.

I doubt that we'll ever use trees under the hash table API, the reason
being that the user expects O(1) for hash table operations.  Even
though O(log n) grows slowly, it is substantially different for large
enough data sets.

Regarding reshuffling time: Yes, rehuffling means that every operation
isn't O(1), but it *does* mean that they are O(1) on average.  You can
understand this by a trick sometimes used in algorithm run-time
analysis called "amortization":

The idea is that for every operation on the hash table you "pay" some
extra virtual time, and the sum of this time can then be used to
account for the reshuffling time.  In my implementation, the hash
table is roughly doubled for each rehash.  Before rehashing occurs,
you have inserted N elements.  This has cost you less than cN seconds.
Rehashing is O(2N) = O(N), so we can say it will cost us less than dN
seconds.  If we now pay off d seconds per operation in advance, and
note that the argument above holds equally well for each rehashing
point, we realize that each operation costs less than c + d seconds on
average.  This means that, on average, operations are O(1).

Mikael


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


  reply	other threads:[~2003-02-13  9:35 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <Pine.GSO.3.96.1030208160743.22945E-100000@anh>
     [not found] ` <87lm0o7951.fsf@alice.rotty.yi.org>
     [not found]   ` <rmiu1fcnrhj.fsf@fnord.ir.bbn.com>
     [not found]     ` <1044889242.1033.310.camel@localhost>
     [not found]       ` <xy77kc8krhr.fsf@nada.kth.se>
     [not found]         ` <xy74r7ckqmy.fsf@nada.kth.se>
2003-02-11 13:59           ` Resizing hash tables in Guile Mikael Djurfeldt
2003-02-11 17:34             ` Roland Orre
     [not found]               ` <ljy94lhgkb.fsf@burns.dt.e-technik.uni-dortmund.de>
2003-02-12 17:47                 ` Mikael Djurfeldt
2003-02-12 20:44                   ` Rob Browning
2003-02-12 16:10             ` Marius Vollmer
2003-02-12 17:53               ` Mikael Djurfeldt
2003-02-12 20:17                 ` Roland Orre
2003-02-13  9:35                   ` Mikael Djurfeldt [this message]
2003-02-13 13:55                     ` Harvey J. Stein
2003-02-13 14:24                       ` Joris van der Hoeven
2003-02-13 18:30                         ` Harvey J. Stein
2003-02-13 20:02                           ` Paul Jarc
2003-02-13  9:52                   ` Joris van der Hoeven
2003-02-12 20:55             ` Rob Browning
2003-02-13 10:43               ` Mikael Djurfeldt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xy7vfzo8qvn.fsf@nada.kth.se \
    --to=djurfeldt@nada.kth.se \
    --cc=gdt@ir.bbn.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).