Re: Growable arrays? - David Kastrup

unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed

From: David Kastrup <dak@gnu.org>
To: Krister Svanlund <krister.svanlund@gmail.com>
Cc: guile-devel@gnu.org
Subject: Re: Growable arrays?
Date: Sat, 09 Jun 2012 19:35:39 +0200	[thread overview]
Message-ID: <87aa0ctml0.fsf@fencepost.gnu.org> (raw)
In-Reply-To: <CAO_vGe8VTs589u9Ei42utuAqBrSGqUg9dgJSZ=JgeYcAwy3w3w@mail.gmail.com> (Krister Svanlund's message of "Sat, 9 Jun 2012 16:43:27 +0200")

Krister Svanlund <krister.svanlund@gmail.com> writes:

> On Sat, Jun 9, 2012 at 2:32 PM, David Kastrup <dak@gnu.org> wrote:
>
>
>     One principal distinguishing feature, like with a Scheme
>     hashtable, is the ability to grow on-demand.
>     
>     Scheme/Guile vectors are fixed size.
>
>     It is a bit of a nuisance that one can grow a hashtable
>     efficiently and on-demand, but not so an array.
>     
>     After all, there already _is_ such a mechanism underlying hash
>     tables so it seems somewhat peculiar not to have it available for
>     vectors as well.
>
> I don't know how much you know about data structures,

I do list the various implementations and details.

> and I must confess I'm not very educated on Guile or Luas
> implementations.

And I do list the details here.  Since I do it in free prose, chances
are that I am not just quoting material I have not understood.

> Based on what you are writing I would assume that the scheme
> hashtables aren't growable in the same way as a vector has to be
> growable.

I don't see anything supporting this assumption in what I wrote.  Nor in
Guile's documentation.

    5.6.12 Hash Tables
    ------------------

    Hash tables are dictionaries which offer similar functionality as
    association lists: They provide a mapping from keys to values.  The
    difference is that association lists need time linear in the size of
    elements when searching for entries, whereas hash tables can normally
    search in constant time.  The drawback is that hash tables require a
    little bit more memory, and that you can not use the normal list
    procedures (*note Lists::) for working with them.

       Guile provides two types of hashtables.  One is an abstract data type
    that can only be manipulated with the functions in this section.  The
    other type is concrete: it uses a normal vector with alists as
    elements.  The advantage of the abstract hash tables is that they will
    be automatically resized when they become too full or too empty.

[...]

    6.4.25.1 Creating hash tables
    .............................

     -- Scheme Procedure: make-hash-table [equal-proc hash-proc #:weak
              weakness start-size]
         Create and answer a new hash table with EQUAL-PROC as the equality
         function and HASH-PROC as the hashing function.

    [...]

         As a legacy of the time when Guile couldn't grow hash tables,
         START-SIZE is an optional integer argument that specifies the
         approximate starting size for the hash table, which will be
         rounded to an algorithmically-sounder number.

> The number of elements in a hashtable isn't limited by it's "size".
> They are often implemented as each position (where the hashtables size
> is the number of positions) being a linked list giving the hashtable
> (in theory) limitless actual size.

However, if the number of hash buckets is not grown along with the
number of entries, hashtable access is O(n) in cost rather than O(1)
since after the initial split into hash buckets, the cost is that of
linear search.  This is the difference in behavior between hashtables in
Guile 1.4 (?) with fixed size, and hashtables in 1.6+ with variable
size.

> Growing a vector/array involves having to allocate new continuous
> memory and copying all the elements there, so for example in C++ (i
> think) the std:vector is increased by half it's current size each time
> meaning that the more expensive the copying gets the more elements you
> can insert into the vector before it has to resize.

Sure: since the growth happens with exponential backoff, the amortized
cost for n entries is O(n).

> I would assume it wouldn't be that difficult to implement a pretty
> efficient growable vector for scheme.

Since that already is what is used internally in hashtables it can't be
difficult...  The advantage of growing a hashtable is that you don't
have waste: if you double the size of a hashtable, it means that you
split each bucket in two, and potentially any bucket after the split can
contain new data.  In contrast, after a similar vector resize, half of
the buckets are _guaranteed_ not to contain data.  You can reduce the
waste by using less than exponential backoff, but then the amortized
cost is no longer O(n).

Anyway: your answer was based on the assumption that I did not do my
homework before asking, and that two people not reading documentation
might guess better than one person not reading documentation.

I hope I have now provided adequate coverage concerning this hypothesis
so that it should be possible to focus on the smaller set of remaining
ones.

-- 
David Kastrup

next prev parent reply	other threads:[~2012-06-09 17:35 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-09 12:32 Growable arrays? David Kastrup
2012-06-09 14:43 ` Krister Svanlund
2012-06-09 17:35   ` David Kastrup [this message]
2012-06-11  4:23 ` Daniel Hartwig
2012-06-11  4:37   ` David Kastrup
2012-06-11  5:00     ` Daniel Hartwig
2012-06-11  7:25       ` David Kastrup
2012-06-11  9:01         ` Daniel Hartwig
2012-06-11  9:13           ` Daniel Hartwig
2012-06-11 10:38             ` David Kastrup
2012-06-11 11:57               ` Daniel Hartwig
2012-06-11 12:13         ` Noah Lavine
2012-06-11 12:28           ` David Kastrup
2012-06-11 23:50             ` Mark H Weaver
2012-06-12  9:34               ` David Kastrup
2012-06-12 20:34                 ` Mark H Weaver
2012-06-12 20:47                   ` David Kastrup
2012-06-12 21:03                     ` Mark H Weaver
2012-06-12 21:18                       ` David Kastrup
2012-06-11  8:14 ` Thien-Thi Nguyen
2012-06-11  9:08 ` Andy Wingo
2012-06-11  9:55   ` David Kastrup
2012-06-11 11:25     ` Andy Wingo
2012-06-11 12:00       ` David Kastrup
2012-06-11 12:12         ` David Kastrup
2012-06-11 12:20           ` David Kastrup
2012-06-11 13:04             ` Daniel Hartwig
2012-06-11 14:19               ` David Kastrup
2012-06-11 15:24                 ` Stefan Israelsson Tampe
2012-06-11 15:27                 ` Andy Wingo
2012-06-11 16:03                   ` David Kastrup
2012-06-11 12:20         ` Daniel Hartwig
2012-06-11 12:36           ` David Kastrup
2012-06-11 12:02 ` Ludovic Courtès
2012-06-12 13:36 ` Hans Aberg
2012-06-14 14:33 ` Mark H Weaver
2012-06-14 14:47   ` David Kastrup
2012-06-14 15:23     ` Daniel Hartwig
2012-06-14 15:34       ` David Kastrup
2012-06-14 16:56         ` Daniel Hartwig
2012-06-14 17:15           ` David Kastrup
2012-06-14 17:23             ` Daniel Hartwig
2012-06-14 17:49               ` David Kastrup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87aa0ctml0.fsf@fencepost.gnu.org \
    --to=dak@gnu.org \
    --cc=guile-devel@gnu.org \
    --cc=krister.svanlund@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).