unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* Growable arrays?
@ 2012-06-09 12:32 David Kastrup
  2012-06-09 14:43 ` Krister Svanlund
                   ` (6 more replies)
  0 siblings, 7 replies; 43+ messages in thread
From: David Kastrup @ 2012-06-09 12:32 UTC (permalink / raw)
  To: guile-devel


Hi,

the main data structure of Lua is a "table", an associative array, and a
table t has a continguous numerically addressed part from 1..#t, with
all other indices going through a hashing mechanism.  One principal
distinguishing feature, like with a Scheme hashtable, is the ability to
grow on-demand.

Scheme/Guile vectors are fixed size.  Now I have a situation where I
have a basic type lattice with records stored in vectors, and this type
lattice may be extended dynamically (which typically happens at the
start of a whole file, for potentially multi-file runs).  Scheme does
not offer a suitable data structure for that.  It is a bit of a nuisance
that one can grow a hashtable efficiently and on-demand, but not so an
array.

Now it would be possible when the type lattice gets extended to store
the new entries in a hashtable and go from there.  Or put them into a
list, and reallocate on first access beyond the existing element.  That
seems rather contorted.  And since there is, if I remember, a project to
run Lua on top of Guile, having a fundamental and reasonably efficient
data structure corresponding to a Lua table, or at least the contiguous
part of a Lua table, would seem like a reasonably useful idea.  After
all, there already _is_ such a mechanism underlying hash tables so it
seems somewhat peculiar not to have it available for vectors as well.

Suggestions?

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-09 12:32 Growable arrays? David Kastrup
@ 2012-06-09 14:43 ` Krister Svanlund
  2012-06-09 17:35   ` David Kastrup
  2012-06-11  4:23 ` Daniel Hartwig
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 43+ messages in thread
From: Krister Svanlund @ 2012-06-09 14:43 UTC (permalink / raw)
  To: David Kastrup; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 2500 bytes --]

On Sat, Jun 9, 2012 at 2:32 PM, David Kastrup <dak@gnu.org> wrote:

>
> Hi,
>
> the main data structure of Lua is a "table", an associative array, and a
> table t has a continguous numerically addressed part from 1..#t, with
> all other indices going through a hashing mechanism.  One principal
> distinguishing feature, like with a Scheme hashtable, is the ability to
> grow on-demand.
>
> Scheme/Guile vectors are fixed size.  Now I have a situation where I
> have a basic type lattice with records stored in vectors, and this type
> lattice may be extended dynamically (which typically happens at the
> start of a whole file, for potentially multi-file runs).  Scheme does
> not offer a suitable data structure for that.  It is a bit of a nuisance
> that one can grow a hashtable efficiently and on-demand, but not so an
> array.
>
> Now it would be possible when the type lattice gets extended to store
> the new entries in a hashtable and go from there.  Or put them into a
> list, and reallocate on first access beyond the existing element.  That
> seems rather contorted.  And since there is, if I remember, a project to
> run Lua on top of Guile, having a fundamental and reasonably efficient
> data structure corresponding to a Lua table, or at least the contiguous
> part of a Lua table, would seem like a reasonably useful idea.  After
> all, there already _is_ such a mechanism underlying hash tables so it
> seems somewhat peculiar not to have it available for vectors as well.
>
> Suggestions?
>
> --
> David Kastrup
>
>
>
I don't know how much you know about data structures, and I must confess
I'm not very educated on Guile or Luas implementations. Based on what you
are writing I would assume that the scheme hashtables aren't growable in
the same way as a vector has to be growable. The number of elements in a
hashtable isn't limited by it's "size". They are often implemented as each
position (where the hashtables size is the number of positions) being a
linked list giving the hashtable (in theory) limitless actual size. Growing
a vector/array involves having to allocate new continuous memory and
copying all the elements there, so for example in C++ (i think) the
std:vector is increased by half it's current size each time meaning that
the more expensive the copying gets the more elements you can insert into
the vector before it has to resize.

I would assume it wouldn't be that difficult to implement a pretty
efficient growable vector for scheme.

// Krister Svanlund

[-- Attachment #2: Type: text/html, Size: 2970 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-09 14:43 ` Krister Svanlund
@ 2012-06-09 17:35   ` David Kastrup
  0 siblings, 0 replies; 43+ messages in thread
From: David Kastrup @ 2012-06-09 17:35 UTC (permalink / raw)
  To: Krister Svanlund; +Cc: guile-devel

Krister Svanlund <krister.svanlund@gmail.com> writes:

> On Sat, Jun 9, 2012 at 2:32 PM, David Kastrup <dak@gnu.org> wrote:
>
>
>     One principal distinguishing feature, like with a Scheme
>     hashtable, is the ability to grow on-demand.
>     
>     Scheme/Guile vectors are fixed size.
>
>     It is a bit of a nuisance that one can grow a hashtable
>     efficiently and on-demand, but not so an array.
>     
>     After all, there already _is_ such a mechanism underlying hash
>     tables so it seems somewhat peculiar not to have it available for
>     vectors as well.
>
> I don't know how much you know about data structures,

I do list the various implementations and details.

> and I must confess I'm not very educated on Guile or Luas
> implementations.

And I do list the details here.  Since I do it in free prose, chances
are that I am not just quoting material I have not understood.

> Based on what you are writing I would assume that the scheme
> hashtables aren't growable in the same way as a vector has to be
> growable.

I don't see anything supporting this assumption in what I wrote.  Nor in
Guile's documentation.


    5.6.12 Hash Tables
    ------------------

    Hash tables are dictionaries which offer similar functionality as
    association lists: They provide a mapping from keys to values.  The
    difference is that association lists need time linear in the size of
    elements when searching for entries, whereas hash tables can normally
    search in constant time.  The drawback is that hash tables require a
    little bit more memory, and that you can not use the normal list
    procedures (*note Lists::) for working with them.

       Guile provides two types of hashtables.  One is an abstract data type
    that can only be manipulated with the functions in this section.  The
    other type is concrete: it uses a normal vector with alists as
    elements.  The advantage of the abstract hash tables is that they will
    be automatically resized when they become too full or too empty.

[...]


    6.4.25.1 Creating hash tables
    .............................

     -- Scheme Procedure: make-hash-table [equal-proc hash-proc #:weak
              weakness start-size]
         Create and answer a new hash table with EQUAL-PROC as the equality
         function and HASH-PROC as the hashing function.

    [...]

         As a legacy of the time when Guile couldn't grow hash tables,
         START-SIZE is an optional integer argument that specifies the
         approximate starting size for the hash table, which will be
         rounded to an algorithmically-sounder number.


> The number of elements in a hashtable isn't limited by it's "size".
> They are often implemented as each position (where the hashtables size
> is the number of positions) being a linked list giving the hashtable
> (in theory) limitless actual size.

However, if the number of hash buckets is not grown along with the
number of entries, hashtable access is O(n) in cost rather than O(1)
since after the initial split into hash buckets, the cost is that of
linear search.  This is the difference in behavior between hashtables in
Guile 1.4 (?) with fixed size, and hashtables in 1.6+ with variable
size.

> Growing a vector/array involves having to allocate new continuous
> memory and copying all the elements there, so for example in C++ (i
> think) the std:vector is increased by half it's current size each time
> meaning that the more expensive the copying gets the more elements you
> can insert into the vector before it has to resize.

Sure: since the growth happens with exponential backoff, the amortized
cost for n entries is O(n).

> I would assume it wouldn't be that difficult to implement a pretty
> efficient growable vector for scheme.

Since that already is what is used internally in hashtables it can't be
difficult...  The advantage of growing a hashtable is that you don't
have waste: if you double the size of a hashtable, it means that you
split each bucket in two, and potentially any bucket after the split can
contain new data.  In contrast, after a similar vector resize, half of
the buckets are _guaranteed_ not to contain data.  You can reduce the
waste by using less than exponential backoff, but then the amortized
cost is no longer O(n).

Anyway: your answer was based on the assumption that I did not do my
homework before asking, and that two people not reading documentation
might guess better than one person not reading documentation.

I hope I have now provided adequate coverage concerning this hypothesis
so that it should be possible to focus on the smaller set of remaining
ones.

-- 
David Kastrup



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-09 12:32 Growable arrays? David Kastrup
  2012-06-09 14:43 ` Krister Svanlund
@ 2012-06-11  4:23 ` Daniel Hartwig
  2012-06-11  4:37   ` David Kastrup
  2012-06-11  8:14 ` Thien-Thi Nguyen
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 43+ messages in thread
From: Daniel Hartwig @ 2012-06-11  4:23 UTC (permalink / raw)
  To: guile-devel

On 9 June 2012 20:32, David Kastrup <dak@gnu.org> wrote:
>
> Hi,
>
> the main data structure of Lua is a "table", an associative array, and a
> table t has a continguous numerically addressed part from 1..#t, with
> all other indices going through a hashing mechanism.  One principal
> distinguishing feature, like with a Scheme hashtable, is the ability to
> grow on-demand.
>

Hello

From <http://www.lua.org/doc/jucs05.pdf> it is clear that these
numerical indices of the Lua table are indeed implemented as an array.
 IIRC the implementation in guile (see branch ‘lua’) is implemented as
a straight hash table, without this array optimization.

Have you considered the vlist data type?  You would have to invert
your indices because it grows at the front, like cons grows a list.
It is also immutable for values already placed in the list, but
nothing prevents you from editing the objects referenced by those
values.

> Now it would be possible when the type lattice gets extended to store
> the new entries in a hashtable and go from there.

Why not use a hash for everything?  Unless your initial lattice is
very large there would be relatively small loss in performance.

If you were going to maintain both a vector and hash you are already
talking about wrapping the accessors, at which point you may as well
use vlist or implement growable vector yourself. It is comparable
effort to what you describe here, and the performance should be better
– especially for the part that gets stored in the hash.

>  Or put them into a
> list, and reallocate on first access beyond the existing element.  That
> seems rather contorted.

You mean “put them into a /vector/”?  Any way, this is not contorted
at all IMO but a dozen or so lines of code to implement the wrapped
vector functions.  Yes, it would be preferable if there was such a
type already implemented in guile.

If you access the list in mostly sequential order (ice-9 q) is quite
useful.  It provides O(1) insertion to the tail of the list.
Implemented using cons so anything other than in-order accessing is
expensive.

Regards



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11  4:23 ` Daniel Hartwig
@ 2012-06-11  4:37   ` David Kastrup
  2012-06-11  5:00     ` Daniel Hartwig
  0 siblings, 1 reply; 43+ messages in thread
From: David Kastrup @ 2012-06-11  4:37 UTC (permalink / raw)
  To: guile-devel

Daniel Hartwig <mandyke@gmail.com> writes:

> On 9 June 2012 20:32, David Kastrup <dak@gnu.org> wrote:
>>
>> Hi,
>>
>> the main data structure of Lua is a "table", an associative array, and a
>> table t has a continguous numerically addressed part from 1..#t, with
>> all other indices going through a hashing mechanism.  One principal
>> distinguishing feature, like with a Scheme hashtable, is the ability to
>> grow on-demand.
>>
>
> Hello
>
> From <http://www.lua.org/doc/jucs05.pdf> it is clear that these
> numerical indices of the Lua table are indeed implemented as an array.
>  IIRC the implementation in guile (see branch ‘lua’) is implemented as
> a straight hash table, without this array optimization.
>
> Have you considered the vlist data type?  You would have to invert
> your indices because it grows at the front, like cons grows a list.
> It is also immutable for values already placed in the list, but
> nothing prevents you from editing the objects referenced by those
> values.

What is a vlist?

>> Now it would be possible when the type lattice gets extended to store
>> the new entries in a hashtable and go from there.
>
> Why not use a hash for everything?  Unless your initial lattice is
> very large there would be relatively small loss in performance.

About double the memory impact (vector->1 cell, hash table->1 cell per
hash bucket+1 additional cons cell per element) and much slower copying.
And quite slower access.

>>  Or put them into a
>> list, and reallocate on first access beyond the existing element.  That
>> seems rather contorted.
>
> You mean “put them into a /vector/”?

No, since a vector can't be grown.  This would basically switch the data
structure between "write" and "read" mode, where "write mode" grows the
list of additions, and "read mode" accesses the vector.  Switching from
write to read entails creating a newly allocated vector and copying the
new additions from the list as well as the old vector into it.

> Any way, this is not contorted at all IMO but a dozen or so lines of
> code to implement the wrapped vector functions.  Yes, it would be
> preferable if there was such a type already implemented in guile.
>
> If you access the list in mostly sequential order (ice-9 q) is quite
> useful.  It provides O(1) insertion to the tail of the list.
> Implemented using cons so anything other than in-order accessing is
> expensive.

I don't access the list in mostly sequential order.  I described the use
case in the posting you replied to.  It is quite random access in
character.

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11  4:37   ` David Kastrup
@ 2012-06-11  5:00     ` Daniel Hartwig
  2012-06-11  7:25       ` David Kastrup
  0 siblings, 1 reply; 43+ messages in thread
From: Daniel Hartwig @ 2012-06-11  5:00 UTC (permalink / raw)
  To: guile-devel

On 11 June 2012 12:37, David Kastrup <dak@gnu.org> wrote:
> What is a vlist?

vlist is a data type introduced around guile 2.0.  You will find it
documented in the Guile Reference under Compound Data Types.

They are growable and provide vector-like access performances and
memory locality.

>>> Now it would be possible when the type lattice gets extended to store
>>> the new entries in a hashtable and go from there.
>>
>> Why not use a hash for everything?  Unless your initial lattice is
>> very large there would be relatively small loss in performance.
>
> About double the memory impact (vector->1 cell, hash table->1 cell per
> hash bucket+1 additional cons cell per element) and much slower copying.
> And quite slower access.
>

With these concerns your only options are really vlist or implementing
growable vector.

>>>  Or put them into a
>>> list, and reallocate on first access beyond the existing element.  That
>>> seems rather contorted.
>>
>> You mean “put them into a /vector/”?
>
> No, since a vector can't be grown.  This would basically switch the data
> structure between "write" and "read" mode, where "write mode" grows the
> list of additions, and "read mode" accesses the vector.  Switching from
> write to read entails creating a newly allocated vector and copying the
> new additions from the list as well as the old vector into it.

I see, that is rather contorted :-)



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11  5:00     ` Daniel Hartwig
@ 2012-06-11  7:25       ` David Kastrup
  2012-06-11  9:01         ` Daniel Hartwig
  2012-06-11 12:13         ` Noah Lavine
  0 siblings, 2 replies; 43+ messages in thread
From: David Kastrup @ 2012-06-11  7:25 UTC (permalink / raw)
  To: guile-devel

Daniel Hartwig <mandyke@gmail.com> writes:

> On 11 June 2012 12:37, David Kastrup <dak@gnu.org> wrote:
>> What is a vlist?
>
> vlist is a data type introduced around guile 2.0.  You will find it
> documented in the Guile Reference under Compound Data Types.
>
> They are growable and provide vector-like access performances and
> memory locality.

Ah, too bad.  This needs to run under 1.8 as well.

> With these concerns your only options are really vlist or implementing
> growable vector.

>>>>  Or put them into a
>>>> list, and reallocate on first access beyond the existing element.  That
>>>> seems rather contorted.
>>>
>>> You mean “put them into a /vector/”?
>>
>> No, since a vector can't be grown.  This would basically switch the data
>> structure between "write" and "read" mode, where "write mode" grows the
>> list of additions, and "read mode" accesses the vector.  Switching from
>> write to read entails creating a newly allocated vector and copying the
>> new additions from the list as well as the old vector into it.
>
> I see, that is rather contorted :-)

Yes, but then it will actually be quite rare that the structure is
extended while it is read rather often.  It would probably do fine to
just do the extension lazily by exception, but then wrapping a catch
around every access is likely to be slower than checking the addition
list to be empty.

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-09 12:32 Growable arrays? David Kastrup
  2012-06-09 14:43 ` Krister Svanlund
  2012-06-11  4:23 ` Daniel Hartwig
@ 2012-06-11  8:14 ` Thien-Thi Nguyen
  2012-06-11  9:08 ` Andy Wingo
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 43+ messages in thread
From: Thien-Thi Nguyen @ 2012-06-11  8:14 UTC (permalink / raw)
  To: David Kastrup; +Cc: guile-devel

() David Kastrup <dak@gnu.org>
() Sat, 09 Jun 2012 14:32:28 +0200

   Suggestions?

Guile-SDL implements (in C) collections of "enums" using both a C array
(static, used also for init) and a Scheme hash table for backing store:

http://git.savannah.gnu.org/cgit/guile-sdl.git/tree/src/sdlenums.c#n66

This is not dynamic (growable) as you specified, but i figure it
might be useful as an example building block.  The modifications
are straightforward.  You would have to add a container for the
set of arrays, and two (more) indirections to ‘lookup’ (line 97),
the first to distinguish between integer/otherwise ‘key’, the
second to index into the outer container.

Alternatively, keep track of collection size and add entries to
the hash table under both the user-given key and the monotonic
‘size - 1’ (two entries).  Extremely wasteful spacewise.

Less wasteful spacewise but more wasteful timewise would be to
combine the above approaches, where the initial (static, before
growth) set does not get a double entry, but in turn requires
another check (branch) in ‘lookup’.  Actually the cut-off index
need not be the initial set; it need only correspond to the
initial array allocation (possibly greater than the initial set).

I suspect some of these considerations have gone into the vlist
implementation (though not all since IIRC vlists alloc doubling
"on overflow"; there is not a single threshold), so another idea
is to use that, back-porting as necessary.

Tangential: I think vlists should be aggressively exercised,
debugged and then back-ported to 1.8 and before.  It's a nice
self-contained design.  This requires a change of policy, though.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11  7:25       ` David Kastrup
@ 2012-06-11  9:01         ` Daniel Hartwig
  2012-06-11  9:13           ` Daniel Hartwig
  2012-06-11 12:13         ` Noah Lavine
  1 sibling, 1 reply; 43+ messages in thread
From: Daniel Hartwig @ 2012-06-11  9:01 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 876 bytes --]

On 11 June 2012 15:25, David Kastrup <dak@gnu.org> wrote:
> Yes, but then it will actually be quite rare that the structure is
> extended while it is read rather often.  It would probably do fine to
> just do the extension lazily by exception, but then wrapping a catch
> around every access is likely to be slower than checking the addition
> list to be empty.

Certainly, catching exceptions is expensive even for the edge-case  of
the rare write op.  Not sure why you would go that route though I
suppose you understand your use case much better :-)

For reference, attached is a growable vector I use in several
projects, adapted to support the length operation similar to Lua (i.e.
first unset numerical index).  There is no catching of exceptions
here, every access to the data is through the dynvector procedures
which check the index and vector size.

[-- Attachment #2: dynvector.scm --]
[-- Type: application/octet-stream, Size: 1904 bytes --]

(define-module (dynvector)
  #:use-module (srfi srfi-9)
  #:export (dynvector
            make-dynvector
            dynvector?
            dynvector-length
            dynvector-capacity
            dynvector-ref
            dynvector-set!
            dynvector-growth-factor))

(define-record-type dynvector-type
  (%make-dynvector data cached-length)
  dynvector?
  (data dynvector-data set-dynvector-data!)
  (cached-length dynvector-cached-length set-dynvector-cached-length!))

(define dynvector-growth-factor
  (make-fluid 2))

(define (dynvector . l)
  (%make-dynvector (apply vector l) #f))

(define (make-dynvector len . rest)
  (%make-dynvector (apply make-vector len rest) #f))

(define (%dynvector-length dynvector)
  (let iter ((k 0))
    (cond ((eq? k (dynvector-capacity dynvector)) k)
          ((eq? *unspecified* (dynvector-ref dynvector k)) k)
          (else (iter (1+ k))))))

;; index of the first unspecified value
(define (dynvector-length dynvector)
  (unless (dynvector-cached-length dynvector)
    (set-dynvector-cached-length! (%dynvector-length dynvector)))
  (dynvector-cached-length dynvector))

(define (dynvector-capacity dynvector)
 (vector-length (dynvector-data dynvector)))

(define (dynvector-ref dynvector k)
  (if (< k (dynvector-capacity dynvector))
      (vector-ref (dynvector-data dynvector) k)
      *unspecified*))

(define (dynvector-set! dynvector k obj)
  (while (< (dynvector-capacity dynvector) k)
    (dynvector-grow! dynvector))
  (set-dynvector-cached-length! dynvector #f)
  (vector-set! (dynvector-data dynvector) k obj))

(define (dynvector-grow! dynvector)
  (let* ((vec1 (dynvector-data dynvector))
         (end1 (vector-length vec1))
         (vec2 (make-vector
                (* (fluid-ref dynvector-growth-factor) end1)
                *unspecified*)))
    (vector-move-left! vec1 0 end1 vec2 0)
    (set-dynvector-data! dynvector vec2)))

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-09 12:32 Growable arrays? David Kastrup
                   ` (2 preceding siblings ...)
  2012-06-11  8:14 ` Thien-Thi Nguyen
@ 2012-06-11  9:08 ` Andy Wingo
  2012-06-11  9:55   ` David Kastrup
  2012-06-11 12:02 ` Ludovic Courtès
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 43+ messages in thread
From: Andy Wingo @ 2012-06-11  9:08 UTC (permalink / raw)
  To: David Kastrup; +Cc: guile-devel

Hi David,

You raise an interesting issue.  But first, a nitpick :)

On Sat 09 Jun 2012 14:32, David Kastrup <dak@gnu.org> writes:

> Scheme hashtable

To be very pedantic, there are no hashtables in R5RS Scheme.  SRFI-69
and R6RS specify them (in different ways), but do not mandate that they
grow.  Guile's do grow, but that's a detail.

The underlying message important:  _Scheme_ doesn't give you much in the
way of data structures.  It does give you some good tools to define new
data structures, though.

Guile (as an implementation) also provides a few data structures
built-in.  It could provide more.

But, I don't think that "table" is the right strategy for what you want.

Lua (and JS) implementations typically have many different
implementation strategies for their table-like objects.  For example, V8
has over two dozen.  I don't think we want to pull all that complexity
into the core of Guile where it's not necessary for idiomatic Guile
programming.  It would certainly exist in the language-specific runtime,
though.  Perhaps a module would be more appropriate, though whether
something that fundamental could be shared between any two languages is
tricky.

You might enjoy
http://blog.mrale.ph/post/24351748336/explaining-js-vm-in-js.

For your particular problem, I would just define a new data type,
<growable-vector> or so, perhaps using srfi-9.

Regards,

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11  9:01         ` Daniel Hartwig
@ 2012-06-11  9:13           ` Daniel Hartwig
  2012-06-11 10:38             ` David Kastrup
  0 siblings, 1 reply; 43+ messages in thread
From: Daniel Hartwig @ 2012-06-11  9:13 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 512 bytes --]

On 11 June 2012 17:01, Daniel Hartwig <mandyke@gmail.com> wrote:
> For reference, attached is a growable vector I use in several
> projects, adapted to support the length operation similar to Lua (i.e.
> first unset numerical index).  There is no catching of exceptions
> here, every access to the data is through the dynvector procedures
> which check the index and vector size.

Always test before posting even small changes to existing code :-)

Updated attachment to actually run without problems

[-- Attachment #2: dynvector.scm --]
[-- Type: application/octet-stream, Size: 1915 bytes --]

(define-module (dynvector)
  #:use-module (srfi srfi-9)
  #:export (dynvector
            make-dynvector
            dynvector?
            dynvector-length
            dynvector-capacity
            dynvector-ref
            dynvector-set!
            dynvector-growth-factor))

(define-record-type dynvector-type
  (%make-dynvector data cached-length)
  dynvector?
  (data dynvector-data set-dynvector-data!)
  (cached-length dynvector-cached-length set-dynvector-cached-length!))

(define dynvector-growth-factor
  (make-fluid 2))

(define (dynvector . l)
  (%make-dynvector (apply vector l) #f))

(define (make-dynvector len . rest)
  (%make-dynvector (apply make-vector len rest) #f))

(define (%dynvector-length dynvector)
  (let iter ((k 0))
    (cond ((eq? k (dynvector-capacity dynvector)) k)
          ((eq? *unspecified* (dynvector-ref dynvector k)) k)
          (else (iter (1+ k))))))

;; index of the first unspecified value
(define (dynvector-length dynvector)
  (unless (dynvector-cached-length dynvector)
    (set-dynvector-cached-length! dynvector (%dynvector-length dynvector)))
  (dynvector-cached-length dynvector))

(define (dynvector-capacity dynvector)
 (vector-length (dynvector-data dynvector)))

(define (dynvector-ref dynvector k)
  (if (< k (dynvector-capacity dynvector))
      (vector-ref (dynvector-data dynvector) k)
      *unspecified*))

(define (dynvector-set! dynvector k obj)
  (while (<= (dynvector-capacity dynvector) k)
    (dynvector-grow! dynvector))
  (set-dynvector-cached-length! dynvector #f)
  (vector-set! (dynvector-data dynvector) k obj))

(define (dynvector-grow! dynvector)
  (let* ((vec1 (dynvector-data dynvector))
         (end1 (vector-length vec1))
         (vec2 (make-vector
                (* (fluid-ref dynvector-growth-factor) end1)
                *unspecified*)))
    (vector-move-left! vec1 0 end1 vec2 0)
    (set-dynvector-data! dynvector vec2)))

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11  9:08 ` Andy Wingo
@ 2012-06-11  9:55   ` David Kastrup
  2012-06-11 11:25     ` Andy Wingo
  0 siblings, 1 reply; 43+ messages in thread
From: David Kastrup @ 2012-06-11  9:55 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Andy Wingo <wingo@pobox.com> writes:

> You raise an interesting issue.  But first, a nitpick :)
>
> On Sat 09 Jun 2012 14:32, David Kastrup <dak@gnu.org> writes:
>
>> Scheme hashtable
>
> To be very pedantic, there are no hashtables in R5RS Scheme.  SRFI-69
> and R6RS specify them (in different ways), but do not mandate that they
> grow.  Guile's do grow, but that's a detail.
>
> The underlying message important:  _Scheme_ doesn't give you much in the
> way of data structures.

Lua gives you one data structure, period.  And it is pretty good at
making this one work well.  Sometimes, the "one thing that works well"
approach beats the "give the user choice between several options with
different drawbacks" approach, since there may be use cases exercising
at least one deficiency in every option.

> Guile (as an implementation) also provides a few data structures
> built-in.  It could provide more.
>
> But, I don't think that "table" is the right strategy for what you
> want.

Tables are a superset of what I need here.  I need the "growable vector"
aspect, not the "hash part" aspect.  Guile 1.8 only offers subsets:
"growable" does not come together with "vector".

Guile hashtables _internally_ use a growable vector, but it is not
accessible as such.

> Lua (and JS) implementations typically have many different
> implementation strategies for their table-like objects.  For example,
> V8 has over two dozen.

Uh what?  Lua has one implementation "strategy".  Array part, and hash
part.  Lua is not JavaScript.  That Lua emulators without access to a
reasonably fitting table building block have to make do with substitutes
is not much of a surprise.

There has been some sort of a language shootout on the LilyPond
developer list.  With regard to the question Guile/Lua, my respective
pro/contra list for suitability for extensions was

a) lexically Scheme is _very_ nice to integrate.  Scheme reader rules
are straightforward.  LilyPond basically starts Scheme expressions with
#, and so things like #3, #(+ 2 1) #(cond ...) provide a seamless
transition from just constants to more complex things, including the
ability to plug into scopes.  The combination of a straightforward lexer
and a basically non-existent parser (you enter the parse trees directly)
makes for predictable and extensible behavior and easy manipulation of
expressions and pattern matching.

b) from the data types, Lua is _very_ nice to integrate because it does
not give you choice.  You don't need to pick between symbols and strings
when mapping your system to the extension language, because there are
only interned strings.  You don't need to pick between lists and
fixed-size arrays because there are just variable size tables.  You
don't need to pick between arrays and structs because everything is a
table.  And so on.  Since you get only a minimal set of data types and
data structures, but those implemented in a manner where they actually
cover most use cases nevertheless, you are saved from a lot of decisions
that may have different drawbacks.

c) for general-purpose programming, Lua is more human-friendly.  That
comes at the cost of being less computer-friendly by having a parser
between input and parse trees.  Macro manipulation in Lua is sort of a
text processing exercise and not reliable.  It is off the charts as a
feasible programming technique.

Goops is powerful and flexible, but has a bit of a problem in that it is
too powerful and flexible.  As a result, there is no really established
programming style for it: it is more a tool for developing your own OOP
technology rather than prescribing one itself.

That would already be a mixed blessing.  It becomes worse since, while
the documentation does not take a lot of choices regarding how to do
things or not and leaves the programmer with a lot of freedom, the
implementation is optimized for certain uses, and if you want to find
out if your plan of mapping your OOP requirement to Goops will actually
work reasonably efficiently, you need to read all the internals of the
Goops implementation and figure out just _which_ use patterns are
supported efficiently, and which patterns aren't.

That's somewhat less than fabulous.  While the results are reasonably
easy to understand and maintain, designing the first implementation
requires a grandwizard if performance is not just an academical
consideration.

If you take a look at Lua, while things do get complex when you dive
into metatable territory, the performance implications are rather
transparent even without having to analyze the implementation of Lua
manually in fine-grained detail.

This is mostly a diversion from the original question, but it might
still be interesting as a sort of observation.

-- 
David Kastrup



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11  9:13           ` Daniel Hartwig
@ 2012-06-11 10:38             ` David Kastrup
  2012-06-11 11:57               ` Daniel Hartwig
  0 siblings, 1 reply; 43+ messages in thread
From: David Kastrup @ 2012-06-11 10:38 UTC (permalink / raw)
  To: guile-devel

Daniel Hartwig <mandyke@gmail.com> writes:

> On 11 June 2012 17:01, Daniel Hartwig <mandyke@gmail.com> wrote:
>> For reference, attached is a growable vector I use in several
>> projects, adapted to support the length operation similar to Lua (i.e.
>> first unset numerical index).  There is no catching of exceptions
>> here, every access to the data is through the dynvector procedures
>> which check the index and vector size.
>
> Always test before posting even small changes to existing code :-)
>
> Updated attachment to actually run without problems

Well, considering the cost of dynvector-grow!, doing the growth in a
loop rather then just the determination of the new size seems a bit
expensive:

  (while (<= (dynvector-capacity dynvector) k)
    (dynvector-grow! dynvector))

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11  9:55   ` David Kastrup
@ 2012-06-11 11:25     ` Andy Wingo
  2012-06-11 12:00       ` David Kastrup
  0 siblings, 1 reply; 43+ messages in thread
From: Andy Wingo @ 2012-06-11 11:25 UTC (permalink / raw)
  To: David Kastrup; +Cc: guile-devel

Hi,

On Mon 11 Jun 2012 11:55, David Kastrup <dak@gnu.org> writes:

> Tables are a superset of what I need here.  I need the "growable vector"
> aspect, not the "hash part" aspect.  Guile 1.8 only offers subsets:
> "growable" does not come together with "vector".

Why not just make your own growable vectors, then?  It will be just as
efficient.  Guile's hash tables are not designed to be addressed by
index.

I guess to summarize: if you want an abstraction like tables, you would
build it out of vectors and hash tables.  But vectors and hash tables
themselves are the lower-level building blocks.

>> Lua (and JS) implementations typically have many different
>> implementation strategies for their table-like objects.  For example,
>> V8 has over two dozen.
>
> Uh what?  Lua has one implementation "strategy".  Array part, and hash
> part.

LuaJIT doesn't pack unboxed numerics in the array part?  I would be
surprised.  I would also be surprised if it didn't do clever things in
the "properties" part, as V8 also does.

> There has been some sort of a language shootout on the LilyPond
> developer list.  With regard to the question Guile/Lua, my respective
> pro/contra list for suitability for extensions was [...]

Pretty good observations here.

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11 10:38             ` David Kastrup
@ 2012-06-11 11:57               ` Daniel Hartwig
  0 siblings, 0 replies; 43+ messages in thread
From: Daniel Hartwig @ 2012-06-11 11:57 UTC (permalink / raw)
  To: guile-devel

On 11 June 2012 18:38, David Kastrup <dak@gnu.org> wrote:
> Well, considering the cost of dynvector-grow!, doing the growth in a
> loop rather then just the determination of the new size seems a bit
> expensive:

Only if you are repeatedly setting values at indices far beyond the
current capacity.  This does not occur in my usage patterns where
values are primarily inserted at the tail position (i.e. length),
which is amortized O(1) since grow! is geometric.

This is only an example, something which fits my needs well.  You are
free to adapt it or not to your particular needs.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11 11:25     ` Andy Wingo
@ 2012-06-11 12:00       ` David Kastrup
  2012-06-11 12:12         ` David Kastrup
  2012-06-11 12:20         ` Daniel Hartwig
  0 siblings, 2 replies; 43+ messages in thread
From: David Kastrup @ 2012-06-11 12:00 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Andy Wingo <wingo@pobox.com> writes:

> Hi,
>
> On Mon 11 Jun 2012 11:55, David Kastrup <dak@gnu.org> writes:
>
>> Tables are a superset of what I need here.  I need the "growable vector"
>> aspect, not the "hash part" aspect.  Guile 1.8 only offers subsets:
>> "growable" does not come together with "vector".
>
> Why not just make your own growable vectors, then?  It will be just as
> efficient.

Sure, I will.  A native implementation would be able to benefit from
storage layout conditions that would, in some cases, allow extending the
array without allocating a new memory range, so it _can_ be done.

> Guile's hash tables are not designed to be addressed by index.

Sure they are: the hash _is_ being used as an index.  And one could
likely provide a "hash function" doing just that, but it would still be
storing a coalescing list in each bucket rather than a single element.

Most of the _mechanisms_ are there.  Just not user-accessible.

> I guess to summarize: if you want an abstraction like tables, you would
> build it out of vectors and hash tables.  But vectors and hash tables
> themselves are the lower-level building blocks.

Not low-level enough: they are already specialized in different
directions making them equally unsuitable for footing the bill.

>>> Lua (and JS) implementations typically have many different
>>> implementation strategies for their table-like objects.  For example,
>>> V8 has over two dozen.
>>
>> Uh what?  Lua has one implementation "strategy".  Array part, and hash
>> part.
>
> LuaJIT doesn't pack unboxed numerics in the array part?  I would be
> surprised.  I would also be surprised if it didn't do clever things in
> the "properties" part, as V8 also does.

LuaJIT is not Lua.  The point was that basic table usage in Lua (which
won't use preallocation) can't be mapped well to Guile's data
structures, and that is somewhat embarrassing.

In this particular case, Lua is just something used for namedropping
since it may be more relevant than my particular application.  In either
case, having to make the decision _either_ vector addressing _or_
growable is not really forthcoming, since hashtables _do_ use that
combination internally, so it is not really technical reasons preventing
it.

-- 
David Kastrup



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-09 12:32 Growable arrays? David Kastrup
                   ` (3 preceding siblings ...)
  2012-06-11  9:08 ` Andy Wingo
@ 2012-06-11 12:02 ` Ludovic Courtès
  2012-06-12 13:36 ` Hans Aberg
  2012-06-14 14:33 ` Mark H Weaver
  6 siblings, 0 replies; 43+ messages in thread
From: Ludovic Courtès @ 2012-06-11 12:02 UTC (permalink / raw)
  To: guile-devel

Hi David,

David Kastrup <dak@gnu.org> skribis:

> Scheme/Guile vectors are fixed size.  Now I have a situation where I
> have a basic type lattice with records stored in vectors, and this type
> lattice may be extended dynamically (which typically happens at the
> start of a whole file, for potentially multi-file runs).

As Daniel put it, vlists would probably be a good match for that
(info "(guile) VLists").

Ludo’.




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11 12:00       ` David Kastrup
@ 2012-06-11 12:12         ` David Kastrup
  2012-06-11 12:20           ` David Kastrup
  2012-06-11 12:20         ` Daniel Hartwig
  1 sibling, 1 reply; 43+ messages in thread
From: David Kastrup @ 2012-06-11 12:12 UTC (permalink / raw)
  To: guile-devel

David Kastrup <dak@gnu.org> writes:

> Andy Wingo <wingo@pobox.com> writes:
>
>> Hi,
>>
>> On Mon 11 Jun 2012 11:55, David Kastrup <dak@gnu.org> writes:
>>
>>> Tables are a superset of what I need here.  I need the "growable vector"
>>> aspect, not the "hash part" aspect.  Guile 1.8 only offers subsets:
>>> "growable" does not come together with "vector".
>>
>> Why not just make your own growable vectors, then?  It will be just as
>> efficient.
>
> Sure, I will.  A native implementation would be able to benefit from
> storage layout conditions that would, in some cases, allow extending the
> array without allocating a new memory range, so it _can_ be done.

P.S.: I still need to look at vlists.  They might already address this
      issue, though I can't use them in Guile 1.8.

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11  7:25       ` David Kastrup
  2012-06-11  9:01         ` Daniel Hartwig
@ 2012-06-11 12:13         ` Noah Lavine
  2012-06-11 12:28           ` David Kastrup
  1 sibling, 1 reply; 43+ messages in thread
From: Noah Lavine @ 2012-06-11 12:13 UTC (permalink / raw)
  To: David Kastrup; +Cc: guile-devel

Hello,

>> vlist is a data type introduced around guile 2.0.  You will find it
>> documented in the Guile Reference under Compound Data Types.
>>
>> They are growable and provide vector-like access performances and
>> memory locality.
>
> Ah, too bad.  This needs to run under 1.8 as well.

If you need to support older versions of Guile, then you can't use any
data structures we add now anyway, correct? So it seems like you will
have to implement growable vectors yourself no matter what.

If you want to, though, you could look at the implementation of vlists
in Guile 2.0, which I think is all in Scheme. See
module/ice-9/vlist.scm.

Even if we can't fix it, it is still nice to hear about data
structures you wish Guile had, so that in a few more versions this
might not be a problem for you.

Noah



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11 12:00       ` David Kastrup
  2012-06-11 12:12         ` David Kastrup
@ 2012-06-11 12:20         ` Daniel Hartwig
  2012-06-11 12:36           ` David Kastrup
  1 sibling, 1 reply; 43+ messages in thread
From: Daniel Hartwig @ 2012-06-11 12:20 UTC (permalink / raw)
  To: guile-devel

On 11 June 2012 20:00, David Kastrup <dak@gnu.org> wrote:
>> I guess to summarize: if you want an abstraction like tables, you would
>> build it out of vectors and hash tables.  But vectors and hash tables
>> themselves are the lower-level building blocks.
>
> Not low-level enough: they are already specialized in different
> directions making them equally unsuitable for footing the bill.

Really?

The Implementation of Lua 5.0 [1], section 4 illustrates how Lua
tables are constructed from a standard hash table and array (vector).
In particular, see Figure 2.

The contiguous, numerically indexed slots are stored only in the
array, with all other slots stored only in the hash table.  This is
perfectly able to be implemented in guile using the standard vectors
and hash tables.  It does require the vectors to be growable, a that
capability which has already been demonstrated.

[1] http://www.lua.org/doc/jucs05.pdf

As Andy points out, Scheme (and guile) provide a toolset of primitive
data types out of which you can build the particular abstractions you
require.  This has the advantage that you can optimize heavily for
your own particular needs, is that possible to the same extent with
Lua given that it only has tables as a fundamental container?

When comparing Lua to guile I would not consider it an issue that
guile does not natively provide a particular data type because most
data types are simple to implement with the tools provided.

Regards



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11 12:12         ` David Kastrup
@ 2012-06-11 12:20           ` David Kastrup
  2012-06-11 13:04             ` Daniel Hartwig
  0 siblings, 1 reply; 43+ messages in thread
From: David Kastrup @ 2012-06-11 12:20 UTC (permalink / raw)
  To: guile-devel

David Kastrup <dak@gnu.org> writes:

> David Kastrup <dak@gnu.org> writes:
>
>> Andy Wingo <wingo@pobox.com> writes:
>>
>>> Hi,
>>>
>>> On Mon 11 Jun 2012 11:55, David Kastrup <dak@gnu.org> writes:
>>>
>>>> Tables are a superset of what I need here.  I need the "growable vector"
>>>> aspect, not the "hash part" aspect.  Guile 1.8 only offers subsets:
>>>> "growable" does not come together with "vector".
>>>
>>> Why not just make your own growable vectors, then?  It will be just as
>>> efficient.
>>
>> Sure, I will.  A native implementation would be able to benefit from
>> storage layout conditions that would, in some cases, allow extending the
>> array without allocating a new memory range, so it _can_ be done.
>
> P.S.: I still need to look at vlists.  They might already address this
>       issue, though I can't use them in Guile 1.8.

No, the "immutable" angle would make them unsuitable again.

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11 12:13         ` Noah Lavine
@ 2012-06-11 12:28           ` David Kastrup
  2012-06-11 23:50             ` Mark H Weaver
  0 siblings, 1 reply; 43+ messages in thread
From: David Kastrup @ 2012-06-11 12:28 UTC (permalink / raw)
  To: guile-devel

Noah Lavine <noah.b.lavine@gmail.com> writes:

>>> vlist is a data type introduced around guile 2.0.  You will find it
>>> documented in the Guile Reference under Compound Data Types.
>>>
>>> They are growable and provide vector-like access performances and
>>> memory locality.
>>
>> Ah, too bad.  This needs to run under 1.8 as well.
>
> If you need to support older versions of Guile, then you can't use any
> data structures we add now anyway, correct? So it seems like you will
> have to implement growable vectors yourself no matter what.
>
> If you want to, though, you could look at the implementation of vlists
> in Guile 2.0, which I think is all in Scheme. See
> module/ice-9/vlist.scm.
>
> Even if we can't fix it, it is still nice to hear about data
> structures you wish Guile had, so that in a few more versions this
> might not be a problem for you.

I don't think I need yet another data structure deficient in some
respects.  We have vectors that can't grow, hashtables that can grow but
only index through a hash function, vlists that can grow but have
immutable content...

Where is the point in thinking up yet another restriction that will make
for a new data structure "complementing" the others?

Why not just have a superset without arbitrary restriction and implement
the other structures based on that?  Then each one just needs to enforce
its personal restrictions in its accessor functions, and otherwise just
use what is there in the general mechanism.

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11 12:20         ` Daniel Hartwig
@ 2012-06-11 12:36           ` David Kastrup
  0 siblings, 0 replies; 43+ messages in thread
From: David Kastrup @ 2012-06-11 12:36 UTC (permalink / raw)
  To: guile-devel

Daniel Hartwig <mandyke@gmail.com> writes:

> On 11 June 2012 20:00, David Kastrup <dak@gnu.org> wrote:
>>> I guess to summarize: if you want an abstraction like tables, you would
>>> build it out of vectors and hash tables.  But vectors and hash tables
>>> themselves are the lower-level building blocks.
>>
>> Not low-level enough: they are already specialized in different
>> directions making them equally unsuitable for footing the bill.
>
> Really?
>
> The Implementation of Lua 5.0 [1], section 4 illustrates how Lua
> tables are constructed from a standard hash table and array (vector).
> In particular, see Figure 2.

Sure.

> The contiguous, numerically indexed slots are stored only in the
> array, with all other slots stored only in the hash table.  This is
> perfectly able to be implemented in guile using the standard vectors
> and hash tables.  It does require the vectors to be growable,

Cough, cough.  Standard vectors are not growable.  Which is the original
problem of this thread, never mind Lua.

> a that capability which has already been demonstrated.

Interesting.

> As Andy points out, Scheme (and guile) provide a toolset of primitive
> data types out of which you can build the particular abstractions you
> require.

Too bad that the existing types all have rather arbitrary seeming
limitations.  Vectors don't grow, hashtables have additional indirection
through hash buckets and coalescing lists, vlists are immutable.

It seems like they are all rather similar, and all with a rather
arbitrarily chosen additional restriction tacked on making them
unsuitable for this task.

> When comparing Lua to guile I would not consider it an issue that
> guile does not natively provide a particular data type because most
> data types are simple to implement with the tools provided.

Except that this one isn't.

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11 12:20           ` David Kastrup
@ 2012-06-11 13:04             ` Daniel Hartwig
  2012-06-11 14:19               ` David Kastrup
  0 siblings, 1 reply; 43+ messages in thread
From: Daniel Hartwig @ 2012-06-11 13:04 UTC (permalink / raw)
  To: guile-devel

On 11 June 2012 20:20, David Kastrup <dak@gnu.org> wrote:
>> P.S.: I still need to look at vlists.  They might already address this
>>       issue, though I can't use them in Guile 1.8.
>
> No, the "immutable" angle would make them unsuitable again.

Note that vlists are only immutable in the sense that you can not
modify the value of a slot already allocated.  You can modify the
object that a slot points to and you can extend a vlist as much as you
like.

Multiple vlists can even share tails.

For example, this session modifies the object in slot 0 of the vlist:

> (use-modules (ice-9 vlist))
> (list->vlist '((a b) (c d) (e f)))
$1 = #<vlist ((a b) (c d) (e f))>
> (vlist-ref $1 0)
$2 = (a b)
> (set-cdr! $2 '(boo))
> $1
$3 = #<vlist ((a boo) (c d) (e f))>

> Scheme/Guile vectors are fixed size.  Now I have a situation where I
> have a basic type lattice with records stored in vectors, and this type
> lattice may be extended dynamically (which typically happens at the
> start of a whole file, for potentially multi-file runs).

From this I gather that your use case only appends to the lattice, if
so, vlist is suitable for that task.

> Cough, cough.  Standard vectors are not growable.  Which is the original
> problem of this thread, never mind Lua.

True, but a growable vector is a tiny step away from the standard vector.

> hashtables have additional indirection
> through hash buckets and coalescing lists

This is fairly standard for a hash table.  I would be quite surprised
if the hash table part of a Lua table did not also use buckets.

> Except that this one isn't.

Why not?

You take a vector and a hash table, store your values in them, and
grow either as needed.  This is not a complicated type.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11 13:04             ` Daniel Hartwig
@ 2012-06-11 14:19               ` David Kastrup
  2012-06-11 15:24                 ` Stefan Israelsson Tampe
  2012-06-11 15:27                 ` Andy Wingo
  0 siblings, 2 replies; 43+ messages in thread
From: David Kastrup @ 2012-06-11 14:19 UTC (permalink / raw)
  To: guile-devel

Daniel Hartwig <mandyke@gmail.com> writes:

> On 11 June 2012 20:20, David Kastrup <dak@gnu.org> wrote:
>>> P.S.: I still need to look at vlists.  They might already address this
>>>       issue, though I can't use them in Guile 1.8.
>>
>> No, the "immutable" angle would make them unsuitable again.
>
> Note that vlists are only immutable in the sense that you can not
> modify the value of a slot already allocated.

Which makes it useless here.

>> Scheme/Guile vectors are fixed size.  Now I have a situation where I
>> have a basic type lattice with records stored in vectors, and this type
>> lattice may be extended dynamically (which typically happens at the
>> start of a whole file, for potentially multi-file runs).
>
> From this I gather that your use case only appends to the lattice, if
> so, vlist is suitable for that task.

Wrong.  My use case only _allocates_ at the end of the existing type
lattice, but the records are not read-only.

>> Cough, cough.  Standard vectors are not growable.  Which is the
>> original problem of this thread, never mind Lua.
>
> True, but a growable vector is a tiny step away from the standard
> vector.

A tiny step if you are modifying the C code.  A not so tiny step if you
are working with Scheme.

>> hashtables have additional indirection
>> through hash buckets and coalescing lists
>
> This is fairly standard for a hash table.  I would be quite surprised
> if the hash table part of a Lua table did not also use buckets.

But it is not standard for a growable vector that it only comes with
buckets and chains.

>> Except that this one isn't.
>
> Why not?
>
> You take a vector and a hash table, store your values in them, and
> grow either as needed.  This is not a complicated type.

Except that vectors don't grow.  Are you even reading what you are
replying to?

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11 14:19               ` David Kastrup
@ 2012-06-11 15:24                 ` Stefan Israelsson Tampe
  2012-06-11 15:27                 ` Andy Wingo
  1 sibling, 0 replies; 43+ messages in thread
From: Stefan Israelsson Tampe @ 2012-06-11 15:24 UTC (permalink / raw)
  To: David Kastrup; +Cc: guile-devel


[-- Attachment #1.1: Type: text/plain, Size: 2752 bytes --]

Maybe this could be a first stub for a table structure that is uses both an
array and a hash-table. I do think that implementing this might need fine
tuning in order
to have good defaults and so on. So in that sense one probably need to
check out reference
implementations. But this is for discussion!

I Assumed growing here and have no shrinking!

I made it nonfunctional.

One note to the discussion. It is not just to combine a growable vector
with a growable hash
in ordder to have a one tool for all cases. The reason is that one need to
tackle the issue with sparse tables with integer keys. I tried to add that
feature as well in some way.

Anyway it shows that you don't need a ton of code to get something pretty
functional.


On Mon, Jun 11, 2012 at 4:19 PM, David Kastrup <dak@gnu.org> wrote:

> Daniel Hartwig <mandyke@gmail.com> writes:
>
> > On 11 June 2012 20:20, David Kastrup <dak@gnu.org> wrote:
> >>> P.S.: I still need to look at vlists.  They might already address this
> >>>       issue, though I can't use them in Guile 1.8.
> >>
> >> No, the "immutable" angle would make them unsuitable again.
> >
> > Note that vlists are only immutable in the sense that you can not
> > modify the value of a slot already allocated.
>
> Which makes it useless here.
>
> >> Scheme/Guile vectors are fixed size.  Now I have a situation where I
> >> have a basic type lattice with records stored in vectors, and this type
> >> lattice may be extended dynamically (which typically happens at the
> >> start of a whole file, for potentially multi-file runs).
> >
> > From this I gather that your use case only appends to the lattice, if
> > so, vlist is suitable for that task.
>
> Wrong.  My use case only _allocates_ at the end of the existing type
> lattice, but the records are not read-only.
>
> >> Cough, cough.  Standard vectors are not growable.  Which is the
> >> original problem of this thread, never mind Lua.
> >
> > True, but a growable vector is a tiny step away from the standard
> > vector.
>
> A tiny step if you are modifying the C code.  A not so tiny step if you
> are working with Scheme.
>
> >> hashtables have additional indirection
> >> through hash buckets and coalescing lists
> >
> > This is fairly standard for a hash table.  I would be quite surprised
> > if the hash table part of a Lua table did not also use buckets.
>
> But it is not standard for a growable vector that it only comes with
> buckets and chains.
>
> >> Except that this one isn't.
> >
> > Why not?
> >
> > You take a vector and a hash table, store your values in them, and
> > grow either as needed.  This is not a complicated type.
>
> Except that vectors don't grow.  Are you even reading what you are
> replying to?
>
> --
> David Kastrup
>
>
>

[-- Attachment #1.2: Type: text/html, Size: 3631 bytes --]

[-- Attachment #2: hasharray.scm --]
[-- Type: application/octet-stream, Size: 1549 bytes --]

(use-modules (srfi srfi-11))

(define (make-table)
  (list 10 (make-vector 10 #f) (make-hash-table)))

(define table-ref 
  (case-lambda
    ((tb x) (table-ref tb x #f))
    ((tb x fail)
     (if (integer? x)
         (let ((r (vector-ref (cadr tb) (modulo x (car tb)))))
           (if r
               (if (= (car r) x) 
                   (cdr r)
                   fail)
               fail))
         (hash-ref (caddr tb) x fail)))))

(define (make-larger vec n)
  (let loop ((n2 (* n 2)))
    (let ((vec2  (make-vector n2 #f)))
      (let loop2 ((i 0))
        (if (< i n)
            (let ((r (vector-ref vec i)))
              (if r
                  (let ((r2 (vector-ref vec2 (modulo (car r) n2))))
                    (if r2 
                        (loop (* 2 n2))
                        (begin
                          (vector-set! vec2 (modulo (car r) n2) r)
                          (loop2 (+ i 1)))))
                  (loop2 (+ i 1))))
            (values vec2 n2))))))

(define (table-set! tb k v)
  (if (integer? k)
      (let ((vec (cadr tb))
            (n   (car  tb)))
        (let ((r (vector-ref vec (modulo k n))))
          (if r
              (if (= (car r) k)
                  (set-cdr! r v)
                  (let-values (((vec2 n2) (make-larger vec n)))
                    (set-car! tb  n2)
                    (set-car! (cdr tb) vec2)
                    (table-set! tb k v)))
              (vector-set! vec (modulo k n) (cons k v)))))
      (hash-set! (caddr tb) k v)))
                    
                   

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11 14:19               ` David Kastrup
  2012-06-11 15:24                 ` Stefan Israelsson Tampe
@ 2012-06-11 15:27                 ` Andy Wingo
  2012-06-11 16:03                   ` David Kastrup
  1 sibling, 1 reply; 43+ messages in thread
From: Andy Wingo @ 2012-06-11 15:27 UTC (permalink / raw)
  To: David Kastrup; +Cc: guile-devel

On Mon 11 Jun 2012 16:19, David Kastrup <dak@gnu.org> writes:

> Are you even reading what you are replying to?

Please be civil.  People are trying to help you.

Thanks,

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11 15:27                 ` Andy Wingo
@ 2012-06-11 16:03                   ` David Kastrup
  0 siblings, 0 replies; 43+ messages in thread
From: David Kastrup @ 2012-06-11 16:03 UTC (permalink / raw)
  To: guile-devel

Andy Wingo <wingo@pobox.com> writes:

> On Mon 11 Jun 2012 16:19, David Kastrup <dak@gnu.org> writes:
>
>> Are you even reading what you are replying to?
>
> Please be civil.  People are trying to help you.

More like telling me off.  Of course, I am perfectly able to implement
my own moderately efficient version in Guile using already existing data
structures, and this is exactly what I will be doing in this case.

It is not me that needs help here but rather Guile, since the disjoint
members of the set of abstract data structures chosen by Scheme don't
include some combinations of features that map reasonably well both to
an abstract problem space as well as to the underlying implementation.
There is some sense in making use of this feature overlap in reducing
the number and increasing the versatility of the underlying primitives,
whether at the Scheme or the VM level.

Whatever.  I think we can all agree that I don't know what I am talking
about and move on.  Saves a lot of hassle for everybody.

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11 12:28           ` David Kastrup
@ 2012-06-11 23:50             ` Mark H Weaver
  2012-06-12  9:34               ` David Kastrup
  0 siblings, 1 reply; 43+ messages in thread
From: Mark H Weaver @ 2012-06-11 23:50 UTC (permalink / raw)
  To: David Kastrup; +Cc: guile-devel

Hi David,

David Kastrup <dak@gnu.org> writes:
> I don't think I need yet another data structure deficient in some
> respects.  We have vectors that can't grow, hashtables that can grow but
> only index through a hash function, vlists that can grow but have
> immutable content...
[...]
> Why not just have a superset without arbitrary restriction and implement
> the other structures based on that?  Then each one just needs to enforce
> its personal restrictions in its accessor functions, and otherwise just
> use what is there in the general mechanism.

Simpler data structures can usually be implemented with less memory,
shorter code sequences with fewer conditional branches and less space in
the instruction cache, which in turn means they can be implemented more
efficiently.  Therefore, to allow efficient compilation, primitive data
structures should be very simple, with more complex structures built on
simpler ones instead of the other way around.

For example, consider resizable vectors vs fixed-size vectors.  A
fixed-size vector can be represented as a single memory block that
contains both the length and the elements together.  A resizable vector
requires an additional level of pointer indirection, which inevitably
means slower accesses and greater code size.  Furthermore, fixed-size
vectors allow the possibility of compiling in an unsafe mode where
out-of-bounds checks can be skipped.

Even if one is willing to pay the price of a relatively heavyweight
primitive data structure, Lua's tables are not always a good fit.  What
if you need a table keyed on large, sparsely distributed integer keys?
What if you need a linked list with O(1) insertions?  What if you need a
doubly-linked list with O(1) deletions?

Data structure design involves many tradeoffs, and unfortunately there
is no complex "one-size-fits-all" data structure that is a good
universal primitive upon which to build all others.

    Regards,
      Mark



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-11 23:50             ` Mark H Weaver
@ 2012-06-12  9:34               ` David Kastrup
  2012-06-12 20:34                 ` Mark H Weaver
  0 siblings, 1 reply; 43+ messages in thread
From: David Kastrup @ 2012-06-12  9:34 UTC (permalink / raw)
  To: guile-devel

Mark H Weaver <mhw@netris.org> writes:

> Hi David,
>
> David Kastrup <dak@gnu.org> writes:
>> I don't think I need yet another data structure deficient in some
>> respects.  We have vectors that can't grow, hashtables that can grow but
>> only index through a hash function, vlists that can grow but have
>> immutable content...
> [...]
>> Why not just have a superset without arbitrary restriction and implement
>> the other structures based on that?  Then each one just needs to enforce
>> its personal restrictions in its accessor functions, and otherwise just
>> use what is there in the general mechanism.
>
> Simpler data structures can usually be implemented with less memory,
> shorter code sequences with fewer conditional branches and less space in
> the instruction cache, which in turn means they can be implemented more
> efficiently.  Therefore, to allow efficient compilation, primitive data
> structures should be very simple, with more complex structures built on
> simpler ones instead of the other way around.
>
> For example, consider resizable vectors vs fixed-size vectors.  A
> fixed-size vector can be represented as a single memory block that
> contains both the length and the elements together.  A resizable vector
> requires an additional level of pointer indirection, which inevitably
> means slower accesses and greater code size.  Furthermore, fixed-size
> vectors allow the possibility of compiling in an unsafe mode where
> out-of-bounds checks can be skipped.

I have a really hard time swallowing an efficiency argument for Scheme
that is weak enough in comparison with the associated drawbacks not to
find consideration in the C++ standard template library.  What kind of
performance gains are we talking about in a typical vector-heavy
application?  0.5%?  Scheme is, to some degree, a computer theoretic
language.  But in this case, this seems to me more like a "don't ask,
don't tell" scenario regarding the possibility of using one underlying
primitive type that dares to be a trifle more flexible than the
theoretically achievable minimum.  Scheme implementations have
considerable leeway concerning their memory and data layout.

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-09 12:32 Growable arrays? David Kastrup
                   ` (4 preceding siblings ...)
  2012-06-11 12:02 ` Ludovic Courtès
@ 2012-06-12 13:36 ` Hans Aberg
  2012-06-14 14:33 ` Mark H Weaver
  6 siblings, 0 replies; 43+ messages in thread
From: Hans Aberg @ 2012-06-12 13:36 UTC (permalink / raw)
  To: David Kastrup; +Cc: guile-devel

On 9 Jun 2012, at 14:32, David Kastrup wrote:

> Scheme/Guile vectors are fixed size.  Now I have a situation where I
> have a basic type lattice with records stored in vectors, and this type
> lattice may be extended dynamically (which typically happens at the
> start of a whole file, for potentially multi-file runs).  Scheme does
> not offer a suitable data structure for that.

If only grows at one end, something like C++ std::deque might be suitable. It has the advantage of not having to invoke copy constructors when it grows.

Hans






^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-12  9:34               ` David Kastrup
@ 2012-06-12 20:34                 ` Mark H Weaver
  2012-06-12 20:47                   ` David Kastrup
  0 siblings, 1 reply; 43+ messages in thread
From: Mark H Weaver @ 2012-06-12 20:34 UTC (permalink / raw)
  To: David Kastrup; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 2334 bytes --]

Hi David,

David Kastrup <dak@gnu.org> writes:
> Mark H Weaver <mhw@netris.org> writes:
>> Simpler data structures can usually be implemented with less memory,
>> shorter code sequences with fewer conditional branches and less space in
>> the instruction cache, which in turn means they can be implemented more
>> efficiently.  Therefore, to allow efficient compilation, primitive data
>> structures should be very simple, with more complex structures built on
>> simpler ones instead of the other way around.
>>
>> For example, consider resizable vectors vs fixed-size vectors.  A
>> fixed-size vector can be represented as a single memory block that
>> contains both the length and the elements together.  A resizable vector
>> requires an additional level of pointer indirection, which inevitably
>> means slower accesses and greater code size.  Furthermore, fixed-size
>> vectors allow the possibility of compiling in an unsafe mode where
>> out-of-bounds checks can be skipped.
>
> I have a really hard time swallowing an efficiency argument for Scheme
> that is weak enough in comparison with the associated drawbacks not to
> find consideration in the C++ standard template library.

C++, like Scheme, already supports fixed-size vectors in the core
language, so it would be redundant to include them in a library.

> What kind of performance gains are we talking about in a typical
> vector-heavy application?

If C++ supported _only_ resizable vectors, such that there was no way to
avoid the additional level of pointer indirection, and all derived data
structures had to be built upon these doubly-indirected vectors, then
I'd expect that the efficiency impact would be quite significant in both
time and space.

I acknowledge that the added cost of resizable vectors would not be
noticable in the current implementation of Guile, that might very well
change in a few years when we have native compilation.  Therefore, I
continue to believe that our primitive vector type should be fixed-size.

On the other hand, I agree that Guile needs a more extensive library of
data structures (including resizable vectors) out of the box.

I've attached a preliminary implementation of resizable vectors.
Hopefully it works on Guile 1.8 as well, though I haven't tested that.
Comments and suggestions welcome.

    Best,
     Mark



[-- Attachment #2: Preliminary growable vector implementation --]
[-- Type: text/plain, Size: 2387 bytes --]

(define-module (growable-vector)
  #:use-module (srfi srfi-9)
  #:use-module (srfi srfi-9 gnu)
  #:export (growable-vector
            make-growable-vector
            growable-vector-ref
            growable-vector-set!
            growable-vector-resize!
            list->growable-vector
            growable-vector->list))

(define-record-type <growable-vector>
  (%make-growable-vector length vector)
  growable-vector?
  (length growable-vector-length %set-growable-vector-length!)
  (vector growable-vector-vector %set-growable-vector-vector!))

(define (alloc-size len)
  (expt 2 (integer-length (- len 1))))

(define (make-growable-vector len . maybe-init)
  (let ((v (apply make-vector (alloc-size len) maybe-init)))
    (%make-growable-vector len v)))

(define (list->growable-vector lst)
  (let* ((len (length lst))
         (v (make-vector (alloc-size len))))
    (do ((i 0 (+ i 1))
         (lst lst (cdr lst)))
        ((null? lst))
      (vector-set! v i (car lst)))
    (%make-growable-vector len v)))

(define (growable-vector->list gv)
  (let ((len (growable-vector-length gv))
        (v   (growable-vector-vector gv)))
    (do ((i (- len 1) (- i 1))
         (lst '() (cons (vector-ref v i) lst)))
        ((negative? i) lst))))

(define (growable-vector . elems)
  (list->growable-vector elems))

(define (growable-vector-ref gv idx)
  (if (< -1 idx (growable-vector-length gv))
      (vector-ref (growable-vector-vector gv) idx)
      (error "growable-vector-ref: index out of range" idx gv)))

(define (growable-vector-set! gv idx val)
  (if (< -1 idx (growable-vector-length gv))
      (vector-set! (growable-vector-vector gv) idx val)
      (error "growable-vector-set!: index out of range" idx gv)))

(define (growable-vector-resize! gv new-len)
  (let* ((old-len   (growable-vector-length gv))
         (old-v     (growable-vector-vector gv))
         (old-alloc (vector-length old-v))
         (new-alloc (alloc-size new-len)))
    (if (not (= old-alloc new-alloc))
        (let ((new-v (make-vector new-alloc)))
          (vector-move-left! old-v 0 (min old-len new-len)
                             new-v 0)
          (%set-growable-vector-vector! gv new-v)))
    (%set-growable-vector-length! gv new-len)))

(set-record-type-printer! <growable-vector>
  (lambda (gv port)
    (format port "#<growable-vector ~S>"
            (growable-vector->list gv))))

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-12 20:34                 ` Mark H Weaver
@ 2012-06-12 20:47                   ` David Kastrup
  2012-06-12 21:03                     ` Mark H Weaver
  0 siblings, 1 reply; 43+ messages in thread
From: David Kastrup @ 2012-06-12 20:47 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-devel

Mark H Weaver <mhw@netris.org> writes:

> Hi David,
>
> David Kastrup <dak@gnu.org> writes:
>> Mark H Weaver <mhw@netris.org> writes:
>>> Simpler data structures can usually be implemented with less memory,
>>> shorter code sequences with fewer conditional branches and less space in
>>> the instruction cache, which in turn means they can be implemented more
>>> efficiently.  Therefore, to allow efficient compilation, primitive data
>>> structures should be very simple, with more complex structures built on
>>> simpler ones instead of the other way around.
>>>
>>> For example, consider resizable vectors vs fixed-size vectors.  A
>>> fixed-size vector can be represented as a single memory block that
>>> contains both the length and the elements together.  A resizable vector
>>> requires an additional level of pointer indirection, which inevitably
>>> means slower accesses and greater code size.  Furthermore, fixed-size
>>> vectors allow the possibility of compiling in an unsafe mode where
>>> out-of-bounds checks can be skipped.
>>
>> I have a really hard time swallowing an efficiency argument for Scheme
>> that is weak enough in comparison with the associated drawbacks not to
>> find consideration in the C++ standard template library.
>
> C++, like Scheme, already supports fixed-size vectors in the core
> language, so it would be redundant to include them in a library.

A vector with run-time determined size?  Which variant of C++ offers
that?

>> What kind of performance gains are we talking about in a typical
>> vector-heavy application?
>
> If C++ supported _only_ resizable vectors, such that there was no way
> to avoid the additional level of pointer indirection, and all derived
> data structures had to be built upon these doubly-indirected vectors,
> then I'd expect that the efficiency impact would be quite significant
> in both time and space.

Reality check: C++ does not offer structs/classes containing vectors of
run-time determinable size.  You need to allocate a pointer for them.

You are apparently confusing fixed size with compile-time size.

-- 
David Kastrup



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-12 20:47                   ` David Kastrup
@ 2012-06-12 21:03                     ` Mark H Weaver
  2012-06-12 21:18                       ` David Kastrup
  0 siblings, 1 reply; 43+ messages in thread
From: Mark H Weaver @ 2012-06-12 21:03 UTC (permalink / raw)
  To: David Kastrup; +Cc: guile-devel

David Kastrup <dak@gnu.org> writes:
> Mark H Weaver <mhw@netris.org> writes:
>> C++, like Scheme, already supports fixed-size vectors in the core
>> language, so it would be redundant to include them in a library.
>
> A vector with run-time determined size?  Which variant of C++ offers
> that?

Um, this is basic functionality that has been available in C since the
beginning, e.g.: int *v = malloc (len * sizeof(int)).  Admittedly, I've
not written a single line of C++ in the last 15 years, but as I recall
C++ has the new[] operator that handles this nicely for arbitrary
classes.

>> If C++ supported _only_ resizable vectors, such that there was no way
>> to avoid the additional level of pointer indirection, and all derived
>> data structures had to be built upon these doubly-indirected vectors,
>> then I'd expect that the efficiency impact would be quite significant
>> in both time and space.
>
> Reality check: C++ does not offer structs/classes containing vectors of
> run-time determinable size.  You need to allocate a pointer for them.

Yes, of course you need to allocate a block, and you need to do that for
Scheme vectors as well.  However, a resizable vector object needs _two_
blocks: one block containing the elements, and another block containing
the length and the pointer to the elements.  That means _two_ pointer
lookups to access an element, as opposed to one for fixed-size vectors.

     Mark



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-12 21:03                     ` Mark H Weaver
@ 2012-06-12 21:18                       ` David Kastrup
  0 siblings, 0 replies; 43+ messages in thread
From: David Kastrup @ 2012-06-12 21:18 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-devel

Mark H Weaver <mhw@netris.org> writes:

> David Kastrup <dak@gnu.org> writes:
>> Mark H Weaver <mhw@netris.org> writes:
>>> C++, like Scheme, already supports fixed-size vectors in the core
>>> language, so it would be redundant to include them in a library.
>>
>> A vector with run-time determined size?  Which variant of C++ offers
>> that?
>
> Um, this is basic functionality that has been available in C since the
> beginning, e.g.: int *v = malloc (len * sizeof(int)).

That does not offer reflection of the length, so you are talking about a
different beast here.

> Admittedly, I've not written a single line of C++ in the last 15
> years, but as I recall C++ has the new[] operator that handles this
> nicely for arbitrary classes.

Still no reflection about the length.

>>> If C++ supported _only_ resizable vectors, such that there was no way
>>> to avoid the additional level of pointer indirection, and all derived
>>> data structures had to be built upon these doubly-indirected vectors,
>>> then I'd expect that the efficiency impact would be quite significant
>>> in both time and space.
>>
>> Reality check: C++ does not offer structs/classes containing vectors of
>> run-time determinable size.  You need to allocate a pointer for them.
>
> Yes, of course you need to allocate a block, and you need to do that
> for Scheme vectors as well.  However, a resizable vector object needs
> _two_ blocks: one block containing the elements, and another block
> containing the length and the pointer to the elements.  That means
> _two_ pointer lookups to access an element, as opposed to one for
> fixed-size vectors.

Reality check: this is _exactly_ what you need to do in C++ unless you
choose _not_ to store the length as part of the vector.  One lookup to
the block containing length and pointer to data, one lookup to get to
the data.

-- 
David Kastrup



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-09 12:32 Growable arrays? David Kastrup
                   ` (5 preceding siblings ...)
  2012-06-12 13:36 ` Hans Aberg
@ 2012-06-14 14:33 ` Mark H Weaver
  2012-06-14 14:47   ` David Kastrup
  6 siblings, 1 reply; 43+ messages in thread
From: Mark H Weaver @ 2012-06-14 14:33 UTC (permalink / raw)
  To: David Kastrup; +Cc: guile-devel

David Kastrup <dak@gnu.org> writes:
> Scheme/Guile vectors are fixed size.  [...]  It is a bit of a nuisance
> that one can grow a hashtable efficiently and on-demand, but not so an
> array.

Although Scheme vectors should remain fixed-size for reasons I have
given elsewhere in this thread, Guile also includes a more complex
'array' type that includes features such as arbitrary rank (i.e. number
of dimensions), arbitrary lower bounds (not just 0), and shared views on
the same underlying array with arbitrary affine mappings of indices.

Guile 'arrays' cannot currently be resized, but I see no good reason for
this limitation.  They are already quite complex, and already require a
second level of pointer indirection.

What do other people think?

    Mark



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-14 14:33 ` Mark H Weaver
@ 2012-06-14 14:47   ` David Kastrup
  2012-06-14 15:23     ` Daniel Hartwig
  0 siblings, 1 reply; 43+ messages in thread
From: David Kastrup @ 2012-06-14 14:47 UTC (permalink / raw)
  To: guile-devel

Mark H Weaver <mhw@netris.org> writes:

> David Kastrup <dak@gnu.org> writes:
>> Scheme/Guile vectors are fixed size.  [...]  It is a bit of a nuisance
>> that one can grow a hashtable efficiently and on-demand, but not so an
>> array.
>
> Although Scheme vectors should remain fixed-size for reasons I have
> given elsewhere in this thread, Guile also includes a more complex
> 'array' type that includes features such as arbitrary rank (i.e. number
> of dimensions), arbitrary lower bounds (not just 0), and shared views on
> the same underlying array with arbitrary affine mappings of indices.
>
> Guile 'arrays' cannot currently be resized, but I see no good reason for
> this limitation.  They are already quite complex, and already require a
> second level of pointer indirection.
>
> What do other people think?

Another complex type, this time with quite more serious memory and
performance impact, that can't be implemented on top of a simple
resizable common primitive and has an almost, but not quite, overlapping
underlying feature set?

It's almost as if you _like_ not being able to reuse code.

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-14 14:47   ` David Kastrup
@ 2012-06-14 15:23     ` Daniel Hartwig
  2012-06-14 15:34       ` David Kastrup
  0 siblings, 1 reply; 43+ messages in thread
From: Daniel Hartwig @ 2012-06-14 15:23 UTC (permalink / raw)
  To: guile-devel

On 14 June 2012 22:47, David Kastrup <dak@gnu.org> wrote:
> Mark H Weaver <mhw@netris.org> writes:
>
>> David Kastrup <dak@gnu.org> writes:
>>> Scheme/Guile vectors are fixed size.  [...]  It is a bit of a nuisance
>>> that one can grow a hashtable efficiently and on-demand, but not so an
>>> array.
>>
>> Although Scheme vectors should remain fixed-size for reasons I have
>> given elsewhere in this thread, Guile also includes a more complex
>> 'array' type that includes features such as arbitrary rank (i.e. number
>> of dimensions), arbitrary lower bounds (not just 0), and shared views on
>> the same underlying array with arbitrary affine mappings of indices.
>>
>> Guile 'arrays' cannot currently be resized, but I see no good reason for
>> this limitation.  They are already quite complex, and already require a
>> second level of pointer indirection.
>>
>> What do other people think?

Perhaps drop a basic resizable type in the guile-lib?  Any
implementation is going to be less-than-ideal in some situations. The
guile core provides already a solid set of fundamental types which are
very easy to compose to produce *exactly* the type required for any
particular situation.

>
> Another complex type, this time with quite more serious memory and
> performance impact, that can't be implemented on top of a simple
> resizable common primitive and has an almost, but not quite, overlapping
> underlying feature set?
>
> It's almost as if you _like_ not being able to reuse code.
>

I don't see how either hash table or multi-dimensional array type
could benefit from a shared, resizable primitive – they are quite
different algorithms, and both different again to the basic resizable
vector already discussed.

It is moot that hash table internally manages a vector and allocates a
/new/ vector when needed.  This is not a simple resizing operation.

Are there any data types in the guile core which would benefit from
sharing a resizable vector primitive?



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-14 15:23     ` Daniel Hartwig
@ 2012-06-14 15:34       ` David Kastrup
  2012-06-14 16:56         ` Daniel Hartwig
  0 siblings, 1 reply; 43+ messages in thread
From: David Kastrup @ 2012-06-14 15:34 UTC (permalink / raw)
  To: guile-devel

Daniel Hartwig <mandyke@gmail.com> writes:

> On 14 June 2012 22:47, David Kastrup <dak@gnu.org> wrote:
>> Mark H Weaver <mhw@netris.org> writes:
>>
>>> David Kastrup <dak@gnu.org> writes:
>>>> Scheme/Guile vectors are fixed size.  [...]  It is a bit of a nuisance
>>>> that one can grow a hashtable efficiently and on-demand, but not so an
>>>> array.
>>>
>>> Although Scheme vectors should remain fixed-size for reasons I have
>>> given elsewhere in this thread, Guile also includes a more complex
>>> 'array' type that includes features such as arbitrary rank (i.e. number
>>> of dimensions), arbitrary lower bounds (not just 0), and shared views on
>>> the same underlying array with arbitrary affine mappings of indices.
>>>
>>> Guile 'arrays' cannot currently be resized, but I see no good reason for
>>> this limitation.  They are already quite complex, and already require a
>>> second level of pointer indirection.
>>>
>>> What do other people think?
>
> Perhaps drop a basic resizable type in the guile-lib?  Any
> implementation is going to be less-than-ideal in some situations. The
> guile core provides already a solid set of fundamental types which are
> very easy to compose to produce *exactly* the type required for any
> particular situation.

Except when they don't.  Which is what started this thread.

>> Another complex type, this time with quite more serious memory and
>> performance impact, that can't be implemented on top of a simple
>> resizable common primitive and has an almost, but not quite, overlapping
>> underlying feature set?
>>
>> It's almost as if you _like_ not being able to reuse code.
>
> I don't see how either hash table or multi-dimensional array type
> could benefit from a shared, resizable primitive – they are quite
> different algorithms, and both different again to the basic resizable
> vector already discussed.
>
> It is moot that hash table internally manages a vector and allocates a
> /new/ vector when needed.  This is not a simple resizing operation.

Huh?  When resizing a hash table by doubling, you need to recoalesce
each bucket to two buckets, one of which is the original.  Doubling the
size of the underlying vector is a reasonable start to do this operation
as a half-in-place rather than a three-address variant.

> Are there any data types in the guile core which would benefit from
> sharing a resizable vector primitive?

Hash tables.  Apparently vlists.  I am not convinced that this does not
make sense for normal and uniform vectors as an option as well.

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-14 15:34       ` David Kastrup
@ 2012-06-14 16:56         ` Daniel Hartwig
  2012-06-14 17:15           ` David Kastrup
  0 siblings, 1 reply; 43+ messages in thread
From: Daniel Hartwig @ 2012-06-14 16:56 UTC (permalink / raw)
  To: guile-devel

On 14 June 2012 23:34, David Kastrup <dak@gnu.org> wrote:
> Daniel Hartwig <mandyke@gmail.com> writes:
>> The
>> guile core provides already a solid set of fundamental types which are
>> very easy to compose to produce *exactly* the type required for any
>> particular situation.
>
> Except when they don't.  Which is what started this thread.

You don't find that the guile primitives are easy to compose?

Please remind me what data structure you are precisely after?  You
initially were comparing with Lua tables, but then mentioned that you
only required the resizable vector function.

There has been at least two examples of that data types posted to this
thread which use the underlying vector primitive.  Both examples are
very simple.

Compared to a lot of the data types at the core of guile, resizable
vector has *lots* of variety in it's details, the particular subset
you optimize depends on your usage.

>>> Another complex type, this time with quite more serious memory and
>>> performance impact, that can't be implemented on top of a simple
>>> resizable common primitive and has an almost, but not quite, overlapping
>>> underlying feature set?
>>>
>>> It's almost as if you _like_ not being able to reuse code.
>>
>> I don't see how either hash table or multi-dimensional array type
>> could benefit from a shared, resizable primitive – they are quite
>> different algorithms, and both different again to the basic resizable
>> vector already discussed.
>>
>> It is moot that hash table internally manages a vector and allocates a
>> /new/ vector when needed.  This is not a simple resizing operation.
>
> Huh?  When resizing a hash table by doubling, you need to recoalesce
> each bucket to two buckets, one of which is the original.  Doubling the
> size of the underlying vector is a reasonable start to do this operation
> as a half-in-place rather than a three-address variant.

What is this half-in-place algorithm that makes this efficient?  If
the table is to remain balanced, all items should be rehashed and
reallocated to a new bucket and there is no correlation between an
items old and new buckets (hash).

Perhaps I miss an obvious trick, but I am thinking of …

… when the table controls it's own memory:

1. allocate new vector
2. assign each item to it's new bucket
3. destroy old vector

… when the table does not control it's own memory but uses an
underlying resizable type:

1. resize vector
1a. allocate new vector
1b. move data to head of new vector
1c. destroy old vector
2. assign each item to it's new bucket but this time watch out for
already populated buckets …

which to me /seems/ less efficient, and particularly bad if the
buckets are linked lists or open addressed.

But I am curious if this can be done efficiently, and indeed if it is
more efficient than the table managing it's own memory.

>
>> Are there any data types in the guile core which would benefit from
>> sharing a resizable vector primitive?
>
> Hash tables.  Apparently vlists.  I am not convinced that this does not
> make sense for normal and uniform vectors as an option as well.

A vlist is a very different, it does not resize the underlying vectors
but chains them together.  This avoids the memory copy implicit to
resizing vectors at the expense of some speed during other operations.
 The particulars also permit multiple vlists to share arbitrary tails,
and other nice properties.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-14 16:56         ` Daniel Hartwig
@ 2012-06-14 17:15           ` David Kastrup
  2012-06-14 17:23             ` Daniel Hartwig
  0 siblings, 1 reply; 43+ messages in thread
From: David Kastrup @ 2012-06-14 17:15 UTC (permalink / raw)
  To: guile-devel

Daniel Hartwig <mandyke@gmail.com> writes:

> On 14 June 2012 23:34, David Kastrup <dak@gnu.org> wrote:
>
>> Huh?  When resizing a hash table by doubling, you need to recoalesce
>> each bucket to two buckets, one of which is the original.  Doubling
>> the size of the underlying vector is a reasonable start to do this
>> operation as a half-in-place rather than a three-address variant.
>
> What is this half-in-place algorithm that makes this efficient?  If
> the table is to remain balanced, all items should be rehashed and
> reallocated to a new bucket and there is no correlation between an
> items old and new buckets (hash).

Huh?  Hashing computes a hash value, and the bucket is determined by
hash % vectorsize.  Double the vector size, and the new bucket locations
are at bucket and bucket+vectorsize, so you need to coalesce the bucket
at bucket into the buckets at bucket and bucket+vectorsize.

Why would there be no correlation between old and new buckets when they
are calculated based on the same hash code?

> 1. allocate new vector
> 2. assign each item to it's new bucket
> 3. destroy old vector
>
> … when the table does not control it's own memory but uses an
> underlying resizable type:
>
> 1. resize vector
> 1a. allocate new vector
> 1b. move data to head of new vector
> 1c. destroy old vector
> 2. assign each item to it's new bucket but this time watch out for
> already populated buckets …

I have no idea what you are talking about.  Each bucket contains a list.

> which to me /seems/ less efficient, and particularly bad if the
> buckets are linked lists or open addressed.

Complexity is pretty much the same if we don't bother about memory
locality.  Being able to process the buckets sequentially, however, is
advantageous because they then stay mostly in cache.  Splitting a single
bucket list into two lists is easy to do in-place.

> But I am curious if this can be done efficiently, and indeed if it is
> more efficient than the table managing it's own memory.

It can never be more efficient than "the table managing its own memory"
since obviously it is then able to pick the best algorithm.  Which in
this case is simply identical with the general-purpose algorithm.  So
while it is not more efficient in computing resources (apart from the
small impact due to duplicate code), it is more efficient in programmer
resources: cost of writing, understanding, debugging, documenting.

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-14 17:15           ` David Kastrup
@ 2012-06-14 17:23             ` Daniel Hartwig
  2012-06-14 17:49               ` David Kastrup
  0 siblings, 1 reply; 43+ messages in thread
From: Daniel Hartwig @ 2012-06-14 17:23 UTC (permalink / raw)
  To: guile-devel

On 15 June 2012 01:15, David Kastrup <dak@gnu.org> wrote:
> Daniel Hartwig <mandyke@gmail.com> writes:
>> What is this half-in-place algorithm that makes this efficient?  If
>> the table is to remain balanced, all items should be rehashed and
>> reallocated to a new bucket and there is no correlation between an
>> items old and new buckets (hash).
>
> Huh?  Hashing computes a hash value, and the bucket is determined by
> hash % vectorsize.  Double the vector size, and the new bucket locations
> are at bucket and bucket+vectorsize, so you need to coalesce the bucket
> at bucket into the buckets at bucket and bucket+vectorsize.
>
> Why would there be no correlation between old and new buckets when they
> are calculated based on the same hash code?
…

I see.  So starting from the old tail there is little contention for
the new buckets.  This seems obvious now in hindsight :-)

Regarding the data type for your application, this is something which
needs to be accessible from the c side also?



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Growable arrays?
  2012-06-14 17:23             ` Daniel Hartwig
@ 2012-06-14 17:49               ` David Kastrup
  0 siblings, 0 replies; 43+ messages in thread
From: David Kastrup @ 2012-06-14 17:49 UTC (permalink / raw)
  To: guile-devel

Daniel Hartwig <mandyke@gmail.com> writes:

> On 15 June 2012 01:15, David Kastrup <dak@gnu.org> wrote:
>> Daniel Hartwig <mandyke@gmail.com> writes:
>>> What is this half-in-place algorithm that makes this efficient?  If
>>> the table is to remain balanced, all items should be rehashed and
>>> reallocated to a new bucket and there is no correlation between an
>>> items old and new buckets (hash).
>>
>> Huh?  Hashing computes a hash value, and the bucket is determined by
>> hash % vectorsize.  Double the vector size, and the new bucket locations
>> are at bucket and bucket+vectorsize, so you need to coalesce the bucket
>> at bucket into the buckets at bucket and bucket+vectorsize.
>>
>> Why would there be no correlation between old and new buckets when they
>> are calculated based on the same hash code?
> …
>
> I see.  So starting from the old tail there is little contention for
> the new buckets.  This seems obvious now in hindsight :-)
>
> Regarding the data type for your application, this is something which
> needs to be accessible from the c side also?

No.  And we have significant amount of C code of our own, so adding
stuff here is not a problem.  And we are talking about rather
performance-critical stuff.

For one thing, it feels ridiculous having to implement a rather basic
data type oneself.  For another, growing a memory area can sometimes be
done in-place when one talks nicely to the garbage management (allocate
from the free space following the resizable vector preferably backwards,
so that resizing may still be opportunistically possible.  Or allocate
the vector itself backwards and try growing it backwards in memory).
There are various strategies that might be more efficient than the
"naive relocator" but that can't be done without actually meddling with
Guile internals.

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2012-06-14 17:49 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-06-09 12:32 Growable arrays? David Kastrup
2012-06-09 14:43 ` Krister Svanlund
2012-06-09 17:35   ` David Kastrup
2012-06-11  4:23 ` Daniel Hartwig
2012-06-11  4:37   ` David Kastrup
2012-06-11  5:00     ` Daniel Hartwig
2012-06-11  7:25       ` David Kastrup
2012-06-11  9:01         ` Daniel Hartwig
2012-06-11  9:13           ` Daniel Hartwig
2012-06-11 10:38             ` David Kastrup
2012-06-11 11:57               ` Daniel Hartwig
2012-06-11 12:13         ` Noah Lavine
2012-06-11 12:28           ` David Kastrup
2012-06-11 23:50             ` Mark H Weaver
2012-06-12  9:34               ` David Kastrup
2012-06-12 20:34                 ` Mark H Weaver
2012-06-12 20:47                   ` David Kastrup
2012-06-12 21:03                     ` Mark H Weaver
2012-06-12 21:18                       ` David Kastrup
2012-06-11  8:14 ` Thien-Thi Nguyen
2012-06-11  9:08 ` Andy Wingo
2012-06-11  9:55   ` David Kastrup
2012-06-11 11:25     ` Andy Wingo
2012-06-11 12:00       ` David Kastrup
2012-06-11 12:12         ` David Kastrup
2012-06-11 12:20           ` David Kastrup
2012-06-11 13:04             ` Daniel Hartwig
2012-06-11 14:19               ` David Kastrup
2012-06-11 15:24                 ` Stefan Israelsson Tampe
2012-06-11 15:27                 ` Andy Wingo
2012-06-11 16:03                   ` David Kastrup
2012-06-11 12:20         ` Daniel Hartwig
2012-06-11 12:36           ` David Kastrup
2012-06-11 12:02 ` Ludovic Courtès
2012-06-12 13:36 ` Hans Aberg
2012-06-14 14:33 ` Mark H Weaver
2012-06-14 14:47   ` David Kastrup
2012-06-14 15:23     ` Daniel Hartwig
2012-06-14 15:34       ` David Kastrup
2012-06-14 16:56         ` Daniel Hartwig
2012-06-14 17:15           ` David Kastrup
2012-06-14 17:23             ` Daniel Hartwig
2012-06-14 17:49               ` David Kastrup

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).