Re: vectors are something else

unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed

* Re: vectors are something else
       [not found] <mailman.1288755.1365813667.854.guile-devel@gnu.org>
@ 2013-04-15 11:29 ` Daniel Llorens
  2013-04-15 12:28   ` Daniel Hartwig
  2013-04-16  2:00   ` vectors are something else Mark H Weaver
  0 siblings, 2 replies; 15+ messages in thread
From: Daniel Llorens @ 2013-04-15 11:29 UTC (permalink / raw)
  To: guile-devel

Date: Fri, 12 Apr 2013 17:43:14 -0400
From: Mark H Weaver <mhw@netris.org>

> I've not yet had time to carefully read this thread, but I wanted to say
> that I think we *should* prohibit passing arrays to the vector
> interfaces.  When we have native code generation, then it will be
> possible to make the vector procedures extremely simple machine code
> sequences.  We must avoid adding any complications to this.  Even a
> single additional conditional branch will be painful, in both runtime
> overhead and code size.
> 
> If we were to support a subset of arrays to the vector interface, the
> way it should be done is have the array constructors produce actual
> vector objects when possible, or at least something that can be used by
> the simple vector code sequences without any additional conditional
> branches.
> 
> Does this make sense?

Two things about this.

First, arrays cannot remain opaque types if we want any kind of performance. The compiler should know about the storage and about the array descriptor as independent objects. Then it's possible to inline the index computations, or to eliminate the array descriptor whenever it is not used or when its fields are known at compile time, or to eliminate type checks.

Second, the array operations don't involve any more branching than the vector operations, only an extra indirection that can be amortized over the size of the vector.

That's why I think that there're two options:

[1] have vector- ops only accept vector- types. This removes container type dispatch from the vector- ops but not from the array- ops. The last patch set I've sent to the list goes in this direction. It matches the behavior of other vector-like types such as bytevectors and strings.

[2] force all vector objects into arrays. This removes container type dispatch from both the vector- and the array- ops, but it adds the cost of indirection to the vector- ops. Which is, I think, a good tradeoff, because that cost is smaller, there's some hope that the compiler will remove it in many cases, and we retain the flexibility in the use of vectors that we have now.

This choice affects the interface, if you care about performance. For example in [1], you'd prefer vector- to *reject* strided rank-1 arrays (inc!=1), because those require a descriptor and you want to limit your checks to is-this-scm_tc7_vector. On the other hand, in [2], you want to *accept* offset rank-1 arrays (lbnd!=0) because to reject them you need an explicit check and it's faster to just do the index computation with this lbnd.

What I don't think is a good option is what we have now, where both the vector- and the array- ops have to be able to deal with either type. Aside from the bugs.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vectors are something else
  2013-04-15 11:29 ` vectors are something else Daniel Llorens
@ 2013-04-15 12:28   ` Daniel Hartwig
  2013-04-15 14:08     ` Daniel Llorens
  2013-04-15 14:10     ` vector types poll Daniel Llorens
  2013-04-16  2:00   ` vectors are something else Mark H Weaver
  1 sibling, 2 replies; 15+ messages in thread
From: Daniel Hartwig @ 2013-04-15 12:28 UTC (permalink / raw)
  To: Daniel Llorens; +Cc: guile-devel

On 15 April 2013 19:29, Daniel Llorens <daniel.llorens@bluewin.ch> wrote:
>
> [1] have vector- ops only accept vector- types. …
>
> [2] force all vector objects into arrays. …
>
> For example in [1], you'd prefer vector- to *reject* strided rank-1 arrays
> (inc!=1), because those require a descriptor and you want to limit your
> checks to is-this-scm_tc7_vector.

Though such an array is potentially still a vector according to the simple
explanation in the manual.  So this requires introducing a more
complicated discussion of when is an array a vector.  And a lot of the
uses for treating some arrays as vectors will be lost with this, as it
quite restrictive on the layout of the source data.

Given the semantic complications and lesser utility, it may as well be
that no vectors are arrays and reject all.

> On the other hand, in [2],
> you want to *accept* offset rank-1 arrays (lbnd!=0) because to
> reject them you need an explicit check and it's faster to just do
> the index computation with this lbnd.
>

In this case, ‘vector?’ still must answer #f as the offset indexing
makes the object not a vector.  Right?

Most non-vector objects passed to ‘vector-ref’ will throw an error,
except for offset arrays.  Since the result is unspecified aka “it is
an error”, it doesnt have to be correct (ie. include lbnd).  Is it
really much faster to do one or the other:

 if lbnd != 0: error
 index = …

vs:

 index = lbnd + ...

?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vectors are something else
  2013-04-15 12:28   ` Daniel Hartwig
@ 2013-04-15 14:08     ` Daniel Llorens
  2013-04-15 14:17       ` Daniel Hartwig
  2013-04-15 14:10     ` vector types poll Daniel Llorens
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel Llorens @ 2013-04-15 14:08 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-devel

On Apr 15, 2013, at 14:28, Daniel Hartwig wrote:

> Though such an array is potentially still a vector according to the simple
> explanation in the manual.  So this requires introducing a more
> complicated discussion of when is an array a vector.

Not really. One just needs to say 'these functions produce vectors. Everything else doesn't'.

> And a lot of the
> uses for treating some arrays as vectors will be lost with this, as it
> quite restrictive on the layout of the source data.

True. One could use arrays though.

> Given the semantic complications and lesser utility, it may as well be
> that no vectors are arrays and reject all.

No disagreement here.

> In this case, ‘vector?’ still must answer #f as the offset indexing
> makes the object not a vector.  Right?

Well... it's really the same argument of 'it is an error'. We can do whatever.

> Most non-vector objects passed to ‘vector-ref’ will throw an error,
> except for offset arrays.  Since the result is unspecified aka “it is
> an error”, it doesnt have to be correct (ie. include lbnd).

Wrong is wrong. Besides, people tend to rely on unspecified behavior. It's the worst we could do. 

> Is it
> really much faster to do one or the other:
> 
> if lbnd != 0: error
> index = …
> 
> vs:
> 
> index = lbnd + ...

The second should actually be just

index = ...

You do this by having the base index point to the element at index 0 and not to the first element. This is a mistake all through the Guile array code.

A branch would certainly be slower. However, if the array implementation was exposed to the compiler, the lbnd!=0 check could be hoisted out of any array loop. So both would end up being identical in that case.

I prefer the second option. But if it comes to you and I having to agree, I'd be ok with the first.

I'm sending a poll to get some more reactions.

Regards

	Daniel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* vector types poll
  2013-04-15 12:28   ` Daniel Hartwig
  2013-04-15 14:08     ` Daniel Llorens
@ 2013-04-15 14:10     ` Daniel Llorens
  2013-04-15 22:47       ` Daniel Hartwig
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel Llorens @ 2013-04-15 14:10 UTC (permalink / raw)
  To: guile-devel

	
Let's please agree on a behavior so we can start closing bugs. These are all the objects accepted by the array interface. I've filled the table with some ready-made choices that I think are at least internally consistent.

; --

(import (rnrs bytevectors))
(define (every-two a) (make-shared-array a (lambda (i) (list (+ 1 (* 2 i)))) 2))
(define (offset1 a) (make-shared-array a (lambda (i) (list (1- i))) `(1 ,(array-length a))))

; [1] http://lists.gnu.org/archive/html/guile-devel/2013-04/msg00158.html
; [2] stable-2.0 [e006d87]
; [3] all array-type objects *are* arrays and support offsets, strides, etc.
; [4] Common ground btw D. Hartwig and I (?), functionally r5rs vectors.

; -------------------------- [1] ------------ [2] --------- [3] ------ [4]

(define a #(1 2 3 4))
(define b (every-two a))
(define c (offset1 a))
(vector? a)                  ; #t             #t            #t         #t
(vector? b)                  ; #f             #f            #t         #t
(vector? c)                  ; #f             #f            #t         #f
(vector-ref a 1)             ; 2              2             2          2
(vector-ref b 1)             ; bad type       4             4          4
(vector-ref c 1)             ; bad type       2             1          bad type
(array-ref c 1)              ; 1

(define a "1234")
(define b (every-two a))
(define c (offset1 a))
(string? a)                  ; #t             #t            #t
(string? b)                  ; #f             #f            #t
(string? c)                  ; #f             #f            #t
(string-ref a 1)             ; #\2            #\2           #\2
(string-ref b 1)             ; bad type       bad type      #\4
(string-ref c 1)             ; bad type       bad type      #\1
(array-ref c 1)              ; #\1

(define a #s8(1 2 3 4))
(define b (every-two a))
(define c (offset1 a))
(s8vector? a)                ; #t             #t            #t
(s8vector? b)                ; #f             #t            #t
(s8vector? c)                ; #f             #t            #t
(s8vector-ref a 1)           ; 2              2             2
(s8vector-ref b 1)           ; bad type       bad type      4
(s8vector-ref c 1)           ; bad type       bad type      1
(array-ref c 1)              ; 1

(define a #s8(1 2 3 4))
(define b (every-two a))
(define c (offset1 a))
(bytevector? a)              ; #t             #t            #t
(bytevector? b)              ; #f             #f            #t
(bytevector? c)              ; #f             #f            #t
(bytevector-s8-ref a 1)      ; 2              #\2           2
(bytevector-s8-ref b 1)      ; bad type       bad type      4
(bytevector-s8-ref c 1)      ; bad type       bad type      1
(array-ref c 1)              ; 1

(define a (list->bitvector '(#t #f #t #t))) ; read syntax for vectors is broken
(define b (every-two a))
(define c (offset1 a))
(bitvector? a)               ; #t             #t            #t
(bitvector? b)               ; #f             #f            #t
(bitvector? c)               ; #f             #f            #t
(bitvector-ref a 1)          ; #f             #f            #f
(bitvector-ref b 1)          ; #t [bad type]  #t            #t
(bitvector-ref c 1)          ; #f [bad type]  #f            #t
(array-ref c 1)              ; #t

should be bad type; to be fixed ------^

Regards,

	Daniel






^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vectors are something else
  2013-04-15 14:08     ` Daniel Llorens
@ 2013-04-15 14:17       ` Daniel Hartwig
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Hartwig @ 2013-04-15 14:17 UTC (permalink / raw)
  To: Daniel Llorens; +Cc: guile-devel

On 15 April 2013 22:08, Daniel Llorens <daniel.llorens@bluewin.ch> wrote:
>
> On Apr 15, 2013, at 14:28, Daniel Hartwig wrote:
>
>> Is it
>> really much faster to do one or the other:
>>
>> if lbnd != 0: error
>> index = …
>>
>> vs:
>>
>> index = lbnd + ...
>
> The second should actually be just
>
> index = ...
>
> You do this by having the base index point to the element at index 0
> and not to the first element. This is a mistake all through the Guile
> array code.

So lbnd is only for range checking.  Neat.

>
> A branch would certainly be slower. However, if the array implementation
> was exposed to the compiler, the lbnd!=0 check could be hoisted out of
> any array loop. So both would end up being identical in that case.

I see.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vector types poll
  2013-04-15 14:10     ` vector types poll Daniel Llorens
@ 2013-04-15 22:47       ` Daniel Hartwig
  2013-04-16  0:16         ` Daniel Llorens
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Hartwig @ 2013-04-15 22:47 UTC (permalink / raw)
  To: Daniel Llorens; +Cc: guile-devel

On 15 April 2013 22:10, Daniel Llorens <daniel.llorens@bluewin.ch> wrote:
>
> Let's please agree on a behavior so we can start closing bugs. These are all the objects accepted by the array interface. I've filled the table with some ready-made choices that I think are at least internally consistent.
>
> ; --
>
> (import (rnrs bytevectors))
> (define (every-two a) (make-shared-array a (lambda (i) (list (+ 1 (* 2 i)))) 2))
> (define (offset1 a) (make-shared-array a (lambda (i) (list (1- i))) `(1 ,(array-length a))))
>
> ; [1] http://lists.gnu.org/archive/html/guile-devel/2013-04/msg00158.html
> ; [2] stable-2.0 [e006d87]
> ; [3] all array-type objects *are* arrays and support offsets, strides, etc.
> ; [4] Common ground btw D. Hartwig and I (?), functionally r5rs vectors.
>
> ; -------------------------- [1] ------------ [2] --------- [3] ------ [4]
>

Is column [4] intentionally missing from all but the first set?  I was
expecting it for atleast s8vector.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vector types poll
  2013-04-15 22:47       ` Daniel Hartwig
@ 2013-04-16  0:16         ` Daniel Llorens
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Llorens @ 2013-04-16  0:16 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-devel


On Apr 16, 2013, at 00:47, Daniel Hartwig wrote:

> Is column [4] intentionally missing from all but the first set?  I was
> expecting it for atleast s8vector.

I wasn't sure, but it makes sense. Then the same for bitvector, I think. What about bytevector? Should it remain a special 'raw' type? But the standards treat it like vector, so maybe it should follow the same logic. And string?

--

(import (rnrs bytevectors))
(define (every-two a) (make-shared-array a (lambda (i) (list (+ 1 (* 2 i)))) 2))
(define (offset1 a) (make-shared-array a (lambda (i) (list (1- i))) `(1 ,(array-length a))))

; [1] http://lists.gnu.org/archive/html/guile-devel/2013-04/msg00158.html
; [2] stable-2.0 [e006d87]
; [3] all array-type objects are actually arrays and support offsets, strides, etc.
; [4] D. Hartwig and I may agree on this, follows functional definition of r5rs vectors.

; -------------------------- [1] ------------ [2] --------- [3] ------ [4]

(define a #(1 2 3 4))
(define b (every-two a))
(define c (offset1 a))
(vector? a)                  ; #t             #t            #t         #t
(vector? b)                  ; #f             #f            #t         #t
(vector? c)                  ; #f             #f            #t         #f
(vector-ref a 1)             ; 2              2             2          2
(vector-ref b 1)             ; bad type       4             4          4
(vector-ref c 1)             ; bad type       2             1          bad type
(array-ref c 1)              ; 1

(define a "1234")
(define b (every-two a))
(define c (offset1 a))
(string? a)                  ; #t             #t            #t
(string? b)                  ; #f             #f            #t
(string? c)                  ; #f             #f            #t
(string-ref a 1)             ; #\2            #\2           #\2
(string-ref b 1)             ; bad type       bad type      #\4
(string-ref c 1)             ; bad type       bad type      #\1
(array-ref c 1)              ; #\1

(define a #s8(1 2 3 4))
(define b (every-two a))
(define c (offset1 a))
(s8vector? a)                ; #t             #t            #t         #t
(s8vector? b)                ; #f             #t            #t         #t
(s8vector? c)                ; #f             #t            #t         #f
(s8vector-ref a 1)           ; 2              2             2          2
(s8vector-ref b 1)           ; bad type       bad type      4          4
(s8vector-ref c 1)           ; bad type       bad type      1          bad type
(array-ref c 1)              ; 1

(define a #s8(1 2 3 4))
(define b (every-two a))
(define c (offset1 a))
(bytevector? a)              ; #t             #t            #t
(bytevector? b)              ; #f             #f            #t
(bytevector? c)              ; #f             #f            #t
(bytevector-s8-ref a 1)      ; 2              #\2           2
(bytevector-s8-ref b 1)      ; bad type       bad type      4
(bytevector-s8-ref c 1)      ; bad type       bad type      1
(array-ref c 1)              ; 1

(define a (list->bitvector '(#t #f #t #t))) ; ad syntax for vectors is broken
(define b (every-two a))
(define c (offset1 a))
(bitvector? a)               ; #t             #t            #t         #t
(bitvector? b)               ; #f             #f            #t         #t
(bitvector? c)               ; #f             #f            #t         #f
(bitvector-ref a 1)          ; #f             #f            #f         #f
(bitvector-ref b 1)          ; #t [bad type]  #t            #t         #t
(bitvector-ref c 1)          ; #f [bad type]  #f            #t         bad type
(array-ref c 1)              ; #t







^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vectors are something else
  2013-04-15 11:29 ` vectors are something else Daniel Llorens
  2013-04-15 12:28   ` Daniel Hartwig
@ 2013-04-16  2:00   ` Mark H Weaver
  2013-04-16  4:10     ` Daniel Llorens
  2013-04-17 15:29     ` Mutable top-level bindings (was: vectors are something else) Chris K. Jester-Young
  1 sibling, 2 replies; 15+ messages in thread
From: Mark H Weaver @ 2013-04-16  2:00 UTC (permalink / raw)
  To: Daniel Llorens; +Cc: guile-devel

Daniel Llorens <daniel.llorens@bluewin.ch> writes:

> Date: Fri, 12 Apr 2013 17:43:14 -0400
> From: Mark H Weaver <mhw@netris.org>
>
>> I've not yet had time to carefully read this thread, but I wanted to say
>> that I think we *should* prohibit passing arrays to the vector
>> interfaces.  When we have native code generation, then it will be
>> possible to make the vector procedures extremely simple machine code
>> sequences.  We must avoid adding any complications to this.  Even a
>> single additional conditional branch will be painful, in both runtime
>> overhead and code size.
>> 
>> If we were to support a subset of arrays to the vector interface, the
>> way it should be done is have the array constructors produce actual
>> vector objects when possible, or at least something that can be used by
>> the simple vector code sequences without any additional conditional
>> branches.
>> 
>> Does this make sense?
>
> Two things about this.
>
> First, arrays cannot remain opaque types if we want any kind of
> performance. The compiler should know about the storage and about the
> array descriptor as independent objects. Then it's possible to inline
> the index computations, or to eliminate the array descriptor whenever
> it is not used or when its fields are known at compile time, or to
> eliminate type checks.

Unfortunately, this is rarely possible in a language like Scheme, where
calls to procedures bound by mutable top-level variables are frequent.
We cannot fix this without making most commonly-used top-level bindings
immutable.  Last I checked, there was strong resistance to this idea.

> Second, the array operations don't involve any more branching than the
> vector operations, only an extra indirection that can be amortized
> over the size of the vector.

The cost can only be amortized in cases where the entire loop over the
vector is done without any calls to procedures bound by mutable
variables.

> That's why I think that there're two options:
>
> [1] have vector- ops only accept vector- types. This removes container
> type dispatch from the vector- ops but not from the array- ops. The
> last patch set I've sent to the list goes in this direction. It
> matches the behavior of other vector-like types such as bytevectors
> and strings.

I very much favor this approach, but unfortunately I don't think we can
do it in 2.0.  It'll have to wait until 2.2.

> [2] force all vector objects into arrays. This removes container type
> dispatch from both the vector- and the array- ops, but it adds the
> cost of indirection to the vector- ops. Which is, I think, a good
> tradeoff, because that cost is smaller, there's some hope that the
> compiler will remove it in many cases, and we retain the flexibility
> in the use of vectors that we have now.

I consider this option completely unacceptable.  Arrays are much more
complex than vectors, and will inevitably be *much* slower in the common
case where the compiler doesn't know much.  Most egregiously,
'make-shared-array' requires that array references apply an arbitrary
affine transformation that's unknown at compile time, which means
multiplications and additions for each index.

Don't get me wrong: I'm all in favor of providing flexible, generic,
extensible procedures.  They have their place.  I think that's why
there's still a justification for keeping 'generalized-vectors' around.
However, efficiency demands that we also have simple, non-generic,
non-extensible procedures as well.  IMO, we should keep the vector,
bitvector, and bytevector procedures as simple as possible.

Does this make sense?  More thoughts?

Thanks for your work on this.

      Mark

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vectors are something else
  2013-04-16  2:00   ` vectors are something else Mark H Weaver
@ 2013-04-16  4:10     ` Daniel Llorens
  2013-04-16  6:19       ` Mark H Weaver
  2013-04-17 15:29     ` Mutable top-level bindings (was: vectors are something else) Chris K. Jester-Young
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel Llorens @ 2013-04-16  4:10 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-devel

On Apr 16, 2013, at 04:00, Mark H Weaver wrote:

>> [1] have vector- ops only accept vector- types. This removes container
>> type dispatch from the vector- ops but not from the array- ops. The
>> last patch set I've sent to the list goes in this direction. It
>> matches the behavior of other vector-like types such as bytevectors
>> and strings.
> 
> I very much favor this approach, but unfortunately I don't think we can
> do it in 2.0.  It'll have to wait until 2.2.

What should we do then about the consistency bugs? like vector? being #f on something that is accepted by vector-ref and so on. Did you have a look at the table I posted?

> I consider this option completely unacceptable.  Arrays are much more
> complex than vectors, and will inevitably be *much* slower in the common
> case where the compiler doesn't know much.

...

> Most egregiously,
> 'make-shared-array' requires that array references apply an arbitrary
> affine transformation that's unknown at compile time, which means
> multiplications and additions for each index.

make-shared-array itself is a very slow function and I'd like to have some alternatives, but it's only called when setting up the shared array. In any case, it's not slow because of any multiplications, but because of how much it conses.

You overestimate (by a lot) the cost of the index computation. One works on arrays by looping over them. In these loops the strides are always constant. No matter the rank of the array, in the inner loop only one index moves. The cost of this stepping is negligible on any scenario that isn't hugely contrived. Things like traversal order and memory order have a much larger impact on performance. 90% of the reason why Guile arrays are dog slow is that they've been programmed with no concern for efficiency. The other 10% is the need to do type dispatch on various kinds of containers and various types of elements. (These numbers are invented.)

> Don't get me wrong: I'm all in favor of providing flexible, generic,
> extensible procedures.  They have their place.  I think that's why
> there's still a justification for keeping 'generalized-vectors' around.

They didn't do anything that the array functions didn't do just as well. Plus they were buggy —they are buggy in stable-2.0. And the names took half the screen.

> However, efficiency demands that we also have simple, non-generic,
> non-extensible procedures as well.  IMO, we should keep the vector,
> bitvector, and bytevector procedures as simple as possible.

Other than the bizarre idea that arrays have to be slow, I think this is a sensible position. After all, arrays have both an array descriptor and a reference to linear storage, so there must be a type for this linear storage. (E.g. in stable-2.0, vector- operations take either arrays, in a buggy way, or the underlying storage type, which is called 'simple vector' and isn't exposed to Scheme.)

What's your take on Daniel Hartwig's position? For example a and b on the table I posted —b is an array, but it prints as #(2 4). How do you justify forbidding vector-ref for something that prints #(2 4) ?

Thanks for you comments,

	Daniel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vectors are something else
  2013-04-16  4:10     ` Daniel Llorens
@ 2013-04-16  6:19       ` Mark H Weaver
  2013-04-16  8:31         ` Daniel Llorens
  0 siblings, 1 reply; 15+ messages in thread
From: Mark H Weaver @ 2013-04-16  6:19 UTC (permalink / raw)
  To: Daniel Llorens; +Cc: guile-devel

Daniel Llorens <daniel.llorens@bluewin.ch> writes:

> On Apr 16, 2013, at 04:00, Mark H Weaver wrote:
>
>>> [1] have vector- ops only accept vector- types. This removes container
>>> type dispatch from the vector- ops but not from the array- ops. The
>>> last patch set I've sent to the list goes in this direction. It
>>> matches the behavior of other vector-like types such as bytevectors
>>> and strings.
>> 
>> I very much favor this approach, but unfortunately I don't think we can
>> do it in 2.0.  It'll have to wait until 2.2.
>
> What should we do then about the consistency bugs? like vector? being
> #f on something that is accepted by vector-ref and so on.

I don't know.

> Did you have a look at the table I posted?

Yes, I did, and generally I strongly prefer column [1], but we have to
be very careful what changes we make in a stable release series that
might break compatibility with existing code.  I find this frustrating
as well, which is one reason why I agree with Andy that shifting focus
to 2.2 soon would be a good idea.

>> I consider this option completely unacceptable.  Arrays are much more
>> complex than vectors, and will inevitably be *much* slower in the common
>> case where the compiler doesn't know much.
>
> ...
>
>> Most egregiously,
>> 'make-shared-array' requires that array references apply an arbitrary
>> affine transformation that's unknown at compile time, which means
>> multiplications and additions for each index.
>
> make-shared-array itself is a very slow function and I'd like to have
> some alternatives, but it's only called when setting up the shared
> array. In any case, it's not slow because of any multiplications, but
> because of how much it conses.

I think you misunderstood me.  I'm not talking about the cost of the
'make-shared-array' call itself.  I'm talking about the fact that in
order to support arrays that might have been created using
'make-shared-array', things like 'array-ref' must apply an arbitrary
affine transformation to the indices in order to compute the offset into
the linear storage.

> You overestimate (by a lot) the cost of the index computation. One
> works on arrays by looping over them. In these loops the strides are
> always constant. No matter the rank of the array, in the inner loop
> only one index moves. The cost of this stepping is negligible on any
> scenario that isn't hugely contrived.

Here you're assuming either a single core procedure that loops over the
entire array (e.g. something like 'array-map!'), or a Scheme loop that
the compiler is able to analyze in detail.  Of course, when we have a
fancy native compiler, and if you write your Scheme carefully, you could
certainly arrange to make this optimization possible.

However, there's a lot of Scheme code that uses vectors and is not coded
with such concerns in mind.  In such code, you might have a top-level
procedure that receives an argument and applies 'vector-ref' to it.  In
such cases, if all vectors are actually arrays, the compiler cannot
assume anything about what kind of array is passed in.  Such an array
might have been created by 'make-shared-array', and thus might use an
arbitrary affine transformation.  If 'vector-ref' can be used on such an
array, then this procedure would have to fetch the stride and offset and
do the multiplication each time.  If there's a loop that calls this
top-level procedure, then the compiler can't do anything with this.

In the context of Scheme, with no static type system, with most
procedures accessed via mutable top-level variables, and where programs
are often written in terms of higher-order procedures, analyzing
programs can be very difficult or impossible.  Therefore, it's a bad
idea to have complicated primitives that are slow in the general case,
in the hope that a fancy compiler will be able to analyze entire loops
and make them fast through non-trivial analyses.

>> Don't get me wrong: I'm all in favor of providing flexible, generic,
>> extensible procedures.  They have their place.  I think that's why
>> there's still a justification for keeping 'generalized-vectors' around.
>
> They didn't do anything that the array functions didn't do just as
> well.

Well, generalized vector accessors don't have to cope with multiple
dimensions, nor do they require performing multiplications for most of
the supported vector types.

> Plus they were buggy —they are buggy in stable-2.0.

Bugs are not a good reason to remove an API.

> And the names took half the screen.

This is a weak argument.  You can always make shorter names for them.

Having said all this, I haven't yet thought much about this issue, and
don't (yet) have a strong opinion on whether the generic vector
procedures should be kept.

>> However, efficiency demands that we also have simple, non-generic,
>> non-extensible procedures as well.  IMO, we should keep the vector,
>> bitvector, and bytevector procedures as simple as possible.
>
> Other than the bizarre idea that arrays have to be slow, I think this
> is a sensible position. After all, arrays have both an array
> descriptor and a reference to linear storage, so there must be a type
> for this linear storage. (E.g. in stable-2.0, vector- operations take
> either arrays, in a buggy way, or the underlying storage type, which
> is called 'simple vector' and isn't exposed to Scheme.)
>
> What's your take on Daniel Hartwig's position? For example a and b on
> the table I posted —b is an array, but it prints as #(2 4). How do you
> justify forbidding vector-ref for something that prints #(2 4) ?

I don't think that arrays should ever be printed as #(2 4).

Thanks for the discussion.

     Regards,
       Mark

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vectors are something else
  2013-04-16  6:19       ` Mark H Weaver
@ 2013-04-16  8:31         ` Daniel Llorens
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Llorens @ 2013-04-16  8:31 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-devel


On Apr 16, 2013, at 08:19, Mark H Weaver wrote:

> Yes, I did, and generally I strongly prefer column [1], but we have to
> be very careful what changes we make in a stable release series that
> might break compatibility with existing code.  I find this frustrating
> as well, which is one reason why I agree with Andy that shifting focus
> to 2.2 soon would be a good idea.

All right.

> I think you misunderstood me.  I'm not talking about the cost of the
> 'make-shared-array' call itself.  I'm talking about the fact that in
> order to support arrays that might have been created using
> 'make-shared-array', things like 'array-ref' must apply an arbitrary
> affine transformation to the indices in order to compute the offset into
> the linear storage.

Sorry if it sounded that way; the specific mention of make-shared-array puzzled me. There are, after all, other functions that imply such a transformation, e.g. transpose-array.

> assume anything about what kind of array is passed in.  Such an array
> might have been created by 'make-shared-array', and thus might use an
> arbitrary affine transformation.  If 'vector-ref' can be used on such an
> array, then this procedure would have to fetch the stride and offset and
> do the multiplication each time.  If there's a loop that calls this
> top-level procedure, then the compiler can't do anything with this.

Since vector-ref takes a single index, 'arbitrary affine transformation' is b+d*i, all three scalars. In a context where the compiler can't do any meaningful optimization, is this going to matter?

At any rate, I accept your point that the indexing cannot be optimized out in general.

> This is a weak argument.  You can always make shorter names for them.

I wasn't serious.

> I don't think that arrays should ever be printed as #(2 4).

The obvious alternative is #1(2 4). As of now, these are 100% equivalent.




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Mutable top-level bindings (was: vectors are something else)
  2013-04-16  2:00   ` vectors are something else Mark H Weaver
  2013-04-16  4:10     ` Daniel Llorens
@ 2013-04-17 15:29     ` Chris K. Jester-Young
  2013-04-17 17:53       ` Mutable top-level bindings Mark H Weaver
                         ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Chris K. Jester-Young @ 2013-04-17 15:29 UTC (permalink / raw)
  To: guile-devel

On Mon, Apr 15, 2013 at 10:00:55PM -0400, Mark H Weaver wrote:
> Unfortunately, this is rarely possible in a language like Scheme, where
> calls to procedures bound by mutable top-level variables are frequent.
> We cannot fix this without making most commonly-used top-level bindings
> immutable.  Last I checked, there was strong resistance to this idea.

Maybe it's time to reopen this (and hope it's not a can of worms). :-)

With a proper module system, I don't see why top-level bindings should
be mutable. That would make even things like direct inlining of cons or
+ somewhat harder than it needs to be. The way I understand it, the
definition of (@ (guile) cons) or the like should not be subject to
change at runtime. If people want to change the behaviour of cons, write
a module that replaces the top-level binding.

Yes, this does disable the ability to perform monkey-patching. I don't
see this as a big loss, but perhaps there are legitimate use cases for
monkey-patching that I haven't thought of.

Another thing we will need to provide is define-values, which allows you
to make top-level bindings that are mutually-recursive. By which I mean:

    (define-values (foo bar baz)
      (letrec ((foo ...)
               (bar ...)
               (baz ...))
        (values foo bar baz)))

This would obviate the need to use the following pattern:

    (define foo #f)
    (define bar #f)
    (define baz #f)
    (letrec ((my-foo ...)
             (my-bar ...)
             (my-baz ...))
      (set! foo my-foo)
      (set! bar my-bar)
      (set! baz my-baz))

Granted, there are existing modules that use that approach currently, as
we don't currently have define-values. But I think it should be possible
to annotate individual modules as having immutable top-level bindings,
so that we can gradually migrate modules to the define-values style.

Cheers,
Chris.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Mutable top-level bindings
  2013-04-17 15:29     ` Mutable top-level bindings (was: vectors are something else) Chris K. Jester-Young
@ 2013-04-17 17:53       ` Mark H Weaver
  2013-04-17 20:25       ` Ian Price
  2013-04-20 14:00       ` Ludovic Courtès
  2 siblings, 0 replies; 15+ messages in thread
From: Mark H Weaver @ 2013-04-17 17:53 UTC (permalink / raw)
  To: guile-devel

Hi Chris,

"Chris K. Jester-Young" <cky944@gmail.com> writes:

> On Mon, Apr 15, 2013 at 10:00:55PM -0400, Mark H Weaver wrote:
>> Unfortunately, this is rarely possible in a language like Scheme, where
>> calls to procedures bound by mutable top-level variables are frequent.
>> We cannot fix this without making most commonly-used top-level bindings
>> immutable.  Last I checked, there was strong resistance to this idea.
>
> Maybe it's time to reopen this (and hope it's not a can of worms). :-)
>
> With a proper module system, I don't see why top-level bindings should
> be mutable.

Well, if top-level bindings are immutable, then you have to restart the
system, and possibly even recompile .go files that incorporated
assumptions about the immutable bindings, in order to change a binding.
This is quite contrary to the Lisp and Scheme tradition of course.

My preference would be to provide a way to mark bindings as "rarely
mutated", such that the compiler could treat them as immutable.  The
binding could still be mutated, but mutation would be very expensive: it
would cause all compiled code that incorporated assumptions about that
binding to be immediately invalidated.  This is hard to implement,
though, because it requires "on stack replacement".

> That would make even things like direct inlining of cons or
> + somewhat harder than it needs to be. The way I understand it, the
> definition of (@ (guile) cons) or the like should not be subject to
> change at runtime. If people want to change the behaviour of cons, write
> a module that replaces the top-level binding.

Currently, our compiler treats several bindings in (guile) such as '+',
'car', 'cdr', 'cons', 'eq?', etc, as if they were immutable.  See
(language tree-il primitives) for the full list.  However, it's not
handled very well: you are still allowed to mutate those bindings, and I
guess the evaluator will pick up the changes but the compiler won't.

> Another thing we will need to provide is define-values, which allows you
> to make top-level bindings that are mutually-recursive. By which I mean:
>
>     (define-values (foo bar baz)
>       (letrec ((foo ...)
>                (bar ...)
>                (baz ...))
>         (values foo bar baz)))

Yes.  I started work on this quite a while ago, but got distracted.
Doing the job right requires changes to the "fixing-letrec" pass of the
compiler.  I started by reimplementing the improved "fixing letrec
reloaded" algorithm (with some fixes), but for various reasons it's not
yet ready for publication.  The next step is to finish that work and to
extend it to handle letrec-values and letrec*-values.

     Regards,
       Mark

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Mutable top-level bindings
  2013-04-17 15:29     ` Mutable top-level bindings (was: vectors are something else) Chris K. Jester-Young
  2013-04-17 17:53       ` Mutable top-level bindings Mark H Weaver
@ 2013-04-17 20:25       ` Ian Price
  2013-04-20 14:00       ` Ludovic Courtès
  2 siblings, 0 replies; 15+ messages in thread
From: Ian Price @ 2013-04-17 20:25 UTC (permalink / raw)
  To: guile-devel; +Cc: Chris K. Jester-Young

"Chris K. Jester-Young" <cky944@gmail.com> writes:

> With a proper module system, I don't see why top-level bindings should
> be mutable. That would make even things like direct inlining of cons or
> + somewhat harder than it needs to be. The way I understand it, the
> definition of (@ (guile) cons) or the like should not be subject to
> change at runtime. If people want to change the behaviour of cons, write
> a module that replaces the top-level binding.

Not top-level, but _exports_ and _imports_. While their use is often
ugly, I'm not sure we should do away with all top level mutables.

Since I write a lot of R6RS code, I'm pretty used to the "no mutable
exports/imports" deal. It isn't too bothersome, except in some cases
with macros and so-called implicit exports/imports. (see bottom)

> Yes, this does disable the ability to perform monkey-patching. I don't
> see this as a big loss, but perhaps there are legitimate use cases for
> monkey-patching that I haven't thought of.
It is pretty useful to be able to do redefinition in a different module
at the repl. Perhaps I could come up with a tortured argument to explain
why this is different from mutating a top-level export, but I'll refrain :)

> Another thing we will need to provide is define-values, which allows you
> to make top-level bindings that are mutually-recursive. By which I mean:
>
>     (define-values (foo bar baz)
>       (letrec ((foo ...)
>                (bar ...)
>                (baz ...))
>         (values foo bar baz)))
+1

> This would obviate the need to use the following pattern:
>
>     (define foo #f)
>     (define bar #f)
>     (define baz #f)
>     (letrec ((my-foo ...)
>              (my-bar ...)
>              (my-baz ...))
>       (set! foo my-foo)
>       (set! bar my-bar)
>       (set! baz my-baz))
This example is one that does not work in R6RS if you export foo, bar,
or baz. To do this, you need an additional set of defines that indirect
to the mutated ones.
e.g. https://github.com/rotty/spells/blob/master/spells/define-values.sls

>
> Granted, there are existing modules that use that approach currently, as
> we don't currently have define-values. But I think it should be possible
> to annotate individual modules as having immutable top-level bindings,
> so that we can gradually migrate modules to the define-values style.
>


* Implicit imports/exports
(http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-10.html#node_sec_7.1)

"An exported macro may, however, implicitly export an otherwise
unexported identifier defined within or imported into the library. That
is, it may insert a reference to that identifier into the output code it
produces."

"All implicitly exported variables are also immutable in both the
exporting and importing libraries. It is thus a syntax violation if a
variable appears on the left-hand side of a set! expression in any code
produced by an exported macro outside of the library in which the
variable is defined...."

-- 
Ian Price -- shift-reset.com

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardy Compiled"



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Mutable top-level bindings
  2013-04-17 15:29     ` Mutable top-level bindings (was: vectors are something else) Chris K. Jester-Young
  2013-04-17 17:53       ` Mutable top-level bindings Mark H Weaver
  2013-04-17 20:25       ` Ian Price
@ 2013-04-20 14:00       ` Ludovic Courtès
  2 siblings, 0 replies; 15+ messages in thread
From: Ludovic Courtès @ 2013-04-20 14:00 UTC (permalink / raw)
  To: guile-devel

"Chris K. Jester-Young" <cky944@gmail.com> skribis:

> On Mon, Apr 15, 2013 at 10:00:55PM -0400, Mark H Weaver wrote:
>> Unfortunately, this is rarely possible in a language like Scheme, where
>> calls to procedures bound by mutable top-level variables are frequent.
>> We cannot fix this without making most commonly-used top-level bindings
>> immutable.  Last I checked, there was strong resistance to this idea.
>
> Maybe it's time to reopen this (and hope it's not a can of worms). :-)
>
> With a proper module system, I don't see why top-level bindings should
> be mutable.

[...]

> Yes, this does disable the ability to perform monkey-patching. I don't
> see this as a big loss, but perhaps there are legitimate use cases for
> monkey-patching that I haven't thought of.

Several uses cases would need to be addressed:

  - Uses of ‘set!’ for bootstrapping purposes as in boot-9.scm (I found
    myself using a similar idiom in
    <http://git.sv.gnu.org/cgit/libchop.git/tree/guile2/chop/internal.scm>
    because I couldn’t come up with another approach.)

  - Patching, as in
    <http://git.sv.gnu.org/cgit/guix.git/tree/guix/build/download.scm#n173>,
    where we’re able to work around a Guile bug without actually
    bundling a copy of the fixed module.

  - Live development (with Geiser), as Mark noted.  ProofGeneral does
    something interesting, where evaluated forms become immutable in the
    file, and have to be “de-evaluated” in reverse order to become
    mutable again.  That provides additional consistency guarantees, but
    remains far less flexible than what Scheme has to offer.

> Another thing we will need to provide is define-values,

+1

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-04-20 14:00 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <mailman.1288755.1365813667.854.guile-devel@gnu.org>
2013-04-15 11:29 ` vectors are something else Daniel Llorens
2013-04-15 12:28   ` Daniel Hartwig
2013-04-15 14:08     ` Daniel Llorens
2013-04-15 14:17       ` Daniel Hartwig
2013-04-15 14:10     ` vector types poll Daniel Llorens
2013-04-15 22:47       ` Daniel Hartwig
2013-04-16  0:16         ` Daniel Llorens
2013-04-16  2:00   ` vectors are something else Mark H Weaver
2013-04-16  4:10     ` Daniel Llorens
2013-04-16  6:19       ` Mark H Weaver
2013-04-16  8:31         ` Daniel Llorens
2013-04-17 15:29     ` Mutable top-level bindings (was: vectors are something else) Chris K. Jester-Young
2013-04-17 17:53       ` Mutable top-level bindings Mark H Weaver
2013-04-17 20:25       ` Ian Price
2013-04-20 14:00       ` Ludovic Courtès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).