unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* for-each et al
@ 2014-03-02 15:55 Andy Wingo
  2014-03-02 21:27 ` Ludovic Courtès
  0 siblings, 1 reply; 6+ messages in thread
From: Andy Wingo @ 2014-03-02 15:55 UTC (permalink / raw)
  To: guile-devel

Hi,

I was looking at some profiles of the compiler in master, and
surprisingly for-each was fairly high in the list.  Probably for-each
should be inlined; but be that as it may, I thought I should take a look
inside for-each and see what the deal was.

The current boot-9 for-each for one list argument is this nastiness:

  (define for-each
    (case-lambda
      ((f l)
       (let for-each1 ((hare l) (tortoise l))
         (if (pair? hare)
             (begin
               (f (car hare))
               (let ((hare (cdr hare)))
                 (if (pair? hare)
                     (begin
                       (when (eq? tortoise hare)
                         (scm-error 'wrong-type-arg "for-each" "Circular list: ~S"
                                    (list l) #f))
                       (f (car hare))
                       (for-each1 (cdr hare) (cdr tortoise)))
                     (for-each1 hare tortoise))))
             (if (not (null? hare))
                 (scm-error 'wrong-type-arg "for-each" "Not a list: ~S"
                            (list l) #f)))))
      ...))

Terrible.  And it's slower than this:

  (lambda (f l)
    (unless (list? l)
      (scm-error 'wrong-type-arg "for-each" "Not a list: ~S" (list l) #f))
    (let for-each1 ((l l))
      (unless (null? l)
        (f (car l))
        (for-each1 (cdr l)))))

So, pop quiz: what's the difference between the two?

Yes, the first version would raise a "circular list" error instead of
"not a list" on a circular list -- not that anyone cares, I think.

Also, the boot-9 version calls the procedure on earlier elements of
non-lists, something that perhaps it should avoid.

But the real difference is in the handling of concurrent manipulation of
the list being traversed.  The boot-9 version is mostly tolerant of
concurrent manipulation (either by f or by another thread), as it
detects non-lists lazily.  In contrast if a list is concurrently made
circular, the check-then-iterate version will loop infinitely.
(Incidentally the generic multi-list variant of for-each is vulnerable
to this kind of bug.)

I invite you to check out this Matthew Flatt article from November 2007:

  http://blog.racket-lang.org/2007/11/getting-rid-of-set-car-and-set-cdr.html

Of course, there are different levels at which this problem can be
solved.  I think moving towards deprecation and removal of mutable pairs
is probably a good idea, though it's really complicated and not really
the point of this mail.  OTOH I think it's reasonable to ask for a
consensus opinion on this implementation of "for-each":

  (lambda (f l)
    (unless (list? l)
      (scm-error 'wrong-type-arg "for-each" "Not a list: ~S" (list l) #f))
    (let for-each1 ((l l))
      (unless (null? l)
        (f (car l))
        (for-each1 (cdr l)))))

Is this a reasonable implementation?  Are we OK with the possibility for
infinite-loops or exceptions accessing the car/cdr of what might not be
a pair?  Alternately if we change the test to pair? are we OK with the
possibility of the loop possibly ending silently before its time?  How
do we deal with this kind of bug -- what's our perspective?

My proposal would be that eventually, I don't know how, but Guile should
remove all use of mutable pairs in its own code.  There's not much
set-car!/set-cdr! in Scheme so it's not that big of a deal.  Therefore
in light of that long-term goal, we should write library code as if
pairs were immutable, and therefore the test-then-iterate example is
fine code.

WDYT?

Cheers,

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: for-each et al
  2014-03-02 15:55 for-each et al Andy Wingo
@ 2014-03-02 21:27 ` Ludovic Courtès
  2014-03-04 20:30   ` Andy Wingo
  0 siblings, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2014-03-02 21:27 UTC (permalink / raw)
  To: guile-devel

Hello!

Welcome back to email.  ;-)

Andy Wingo <wingo@pobox.com> skribis:

> The current boot-9 for-each for one list argument is this nastiness:
>
>   (define for-each
>     (case-lambda
>       ((f l)
>        (let for-each1 ((hare l) (tortoise l))
>          (if (pair? hare)
>              (begin
>                (f (car hare))
>                (let ((hare (cdr hare)))
>                  (if (pair? hare)
>                      (begin
>                        (when (eq? tortoise hare)
>                          (scm-error 'wrong-type-arg "for-each" "Circular list: ~S"
>                                     (list l) #f))
>                        (f (car hare))
>                        (for-each1 (cdr hare) (cdr tortoise)))
>                      (for-each1 hare tortoise))))
>              (if (not (null? hare))
>                  (scm-error 'wrong-type-arg "for-each" "Not a list: ~S"
>                             (list l) #f)))))
>       ...))
>
> Terrible.  And it's slower than this:
>
>   (lambda (f l)
>     (unless (list? l)
>       (scm-error 'wrong-type-arg "for-each" "Not a list: ~S" (list l) #f))
>     (let for-each1 ((l l))
>       (unless (null? l)
>         (f (car l))
>         (for-each1 (cdr l)))))
>
> So, pop quiz: what's the difference between the two?

I think an important difference you didn’t mention is that ‘list?’ does
its list traversal in C.

> Of course, there are different levels at which this problem can be
> solved.  I think moving towards deprecation and removal of mutable pairs
> is probably a good idea, though it's really complicated and not really
> the point of this mail.  OTOH I think it's reasonable to ask for a
> consensus opinion on this implementation of "for-each":
>
>   (lambda (f l)
>     (unless (list? l)
>       (scm-error 'wrong-type-arg "for-each" "Not a list: ~S" (list l) #f))
>     (let for-each1 ((l l))
>       (unless (null? l)
>         (f (car l))
>         (for-each1 (cdr l)))))
>
> Is this a reasonable implementation?  Are we OK with the possibility for
> infinite-loops or exceptions accessing the car/cdr of what might not be
> a pair?  Alternately if we change the test to pair? are we OK with the
> possibility of the loop possibly ending silently before its time?  How
> do we deal with this kind of bug -- what's our perspective?

I’m OK with this implementation.  I’ve never found myself iterating over
a list and modifying it concurrently.

> My proposal would be that eventually, I don't know how, but Guile should
> remove all use of mutable pairs in its own code.  There's not much
> set-car!/set-cdr! in Scheme so it's not that big of a deal.  Therefore
> in light of that long-term goal, we should write library code as if
> pairs were immutable, and therefore the test-then-iterate example is
> fine code.
>
> WDYT?

I’m all in favor of immutable pairs.  However, I expect that a lot of
code out there relies on it.

Moving set-car! and set-cdr! to a different module like in R6 may be a
good first (or second) step.

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: for-each et al
  2014-03-02 21:27 ` Ludovic Courtès
@ 2014-03-04 20:30   ` Andy Wingo
  2014-03-04 22:35     ` Ludovic Courtès
  0 siblings, 1 reply; 6+ messages in thread
From: Andy Wingo @ 2014-03-04 20:30 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

On Sun 02 Mar 2014 22:27, ludo@gnu.org (Ludovic Courtès) writes:

> Welcome back to email.  ;-)

Thank you :)  This makes my first round-trip ;)

>>   (lambda (f l)
>>     (unless (list? l)
>>       (scm-error 'wrong-type-arg "for-each" "Not a list: ~S" (list l) #f))
>>     (let for-each1 ((l l))
>>       (unless (null? l)
>>         (f (car l))
>>         (for-each1 (cdr l)))))
>>
>> So, pop quiz: what's the difference between the two?
>
> I think an important difference you didn’t mention is that ‘list?’ does
> its list traversal in C.

That's true.  Let's call the above version for-each*.  Then we have
for-each**, which is the same except it calls list*, which is:

(define (list*? l)
  (let lp ((hare l) (tortoise l))
    (or (null? hare)
        (and (pair? hare)
             (let ((hare (cdr hare)))
               (or (null? hare)
                   (and (pair? hare)
                        (not (eq? tortoise hare))
                        (lp (cdr hare) (cdr tortoise)))))))))

Then we have:

  scheme@(guile-user)> (define l (iota 100000000))
  scheme@(guile-user)> ,time (for-each** (lambda (x) x) l)
  ;; 3.228201s real time, 3.223919s run time.  0.000000s spent in GC.
  scheme@(guile-user)> ,time (for-each* (lambda (x) x) l)
  ;; 2.356612s real time, 2.353496s run time.  0.000000s spent in GC.
  scheme@(guile-user)> ,time (for-each (lambda (x) x) l)
  ;; 3.142366s real time, 3.126806s run time.  0.000000s spent in GC.

So the overhead of traversing the list twice is negligible in the worst
case, and probably we still gain once the compiler gets better.

>>   (lambda (f l)
>>     (unless (list? l)
>>       (scm-error 'wrong-type-arg "for-each" "Not a list: ~S" (list l) #f))
>>     (let for-each1 ((l l))
>>       (unless (null? l)
>>         (f (car l))
>>         (for-each1 (cdr l)))))

> I’m OK with this implementation.  I’ve never found myself iterating over
> a list and modifying it concurrently.

Yeah it's something that no one in their right mind does.  But still we
should have some kind of statement about what we guarantee -- currently
the code is resilient to concurrent mutation, and we'd be relaxing that.

> I’m all in favor of immutable pairs.  However, I expect that a lot of
> code out there relies on it.
>
> Moving set-car! and set-cdr! to a different module like in R6 may be a
> good first (or second) step.

Brainstorming possibilities:

  * Deprecate set-car!/set-cdr! (replacement would be mutable cells in
    the car/cdr; currently called variables, but could be called boxes)

  * Introducing a #!lang facility, and having programs with #!lang make
    immutable pairs

  * ?

It's completely possible to remove mutable pair usage in Guile, but
what's not clear is how to migrate our users.  I assume that we would
want to make Guile's core operations only operate on immutable pairs;
how big would the mutable pair libraries have to be?

Could we forsake R6RS/R7RS and only do immutable pairs?  (I would be OK
with that.  In that case R5RS would run in its own environment.)

Dunno, just some thoughts.  Deprecating set-car!/set-cdr! would be an
easy first step, if we planned to make further steps.

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: for-each et al
  2014-03-04 20:30   ` Andy Wingo
@ 2014-03-04 22:35     ` Ludovic Courtès
  2014-03-05 19:59       ` Andy Wingo
  0 siblings, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2014-03-04 22:35 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Andy Wingo <wingo@pobox.com> skribis:

> On Sun 02 Mar 2014 22:27, ludo@gnu.org (Ludovic Courtès) writes:

[...]

>> I think an important difference you didn’t mention is that ‘list?’ does
>> its list traversal in C.
>
> That's true.  Let's call the above version for-each*.  Then we have
> for-each**, which is the same except it calls list*, which is:
>
> (define (list*? l)
>   (let lp ((hare l) (tortoise l))
>     (or (null? hare)
>         (and (pair? hare)
>              (let ((hare (cdr hare)))
>                (or (null? hare)
>                    (and (pair? hare)
>                         (not (eq? tortoise hare))
>                         (lp (cdr hare) (cdr tortoise)))))))))
>
> Then we have:
>
>   scheme@(guile-user)> (define l (iota 100000000))
>   scheme@(guile-user)> ,time (for-each** (lambda (x) x) l)
>   ;; 3.228201s real time, 3.223919s run time.  0.000000s spent in GC.
>   scheme@(guile-user)> ,time (for-each* (lambda (x) x) l)
>   ;; 2.356612s real time, 2.353496s run time.  0.000000s spent in GC.
>   scheme@(guile-user)> ,time (for-each (lambda (x) x) l)
>   ;; 3.142366s real time, 3.126806s run time.  0.000000s spent in GC.
>
> So the overhead of traversing the list twice is negligible in the worst
> case, and probably we still gain once the compiler gets better.

Yes.

>>>   (lambda (f l)
>>>     (unless (list? l)
>>>       (scm-error 'wrong-type-arg "for-each" "Not a list: ~S" (list l) #f))
>>>     (let for-each1 ((l l))
>>>       (unless (null? l)
>>>         (f (car l))
>>>         (for-each1 (cdr l)))))
>
>> I’m OK with this implementation.  I’ve never found myself iterating over
>> a list and modifying it concurrently.
>
> Yeah it's something that no one in their right mind does.  But still we
> should have some kind of statement about what we guarantee -- currently
> the code is resilient to concurrent mutation, and we'd be relaxing that.

Right.  R5 and R6 don’t say anything about that, but yes, we can add a
statement in the manual and NEWS to that effect.

I think the for-each change would be for 2.2, right?

>> I’m all in favor of immutable pairs.  However, I expect that a lot of
>> code out there relies on it.
>>
>> Moving set-car! and set-cdr! to a different module like in R6 may be a
>> good first (or second) step.
>
> Brainstorming possibilities:
>
>   * Deprecate set-car!/set-cdr! (replacement would be mutable cells in
>     the car/cdr; currently called variables, but could be called boxes)

SRFI-111 boxes.

We should check a few existing packages and usage patterns before.

For instance, I suspect SCM_SETCAR/SCM_SETCDR are actually more
widespread than their Scheme counterparts, and probably much harder to
avoid.  What can we do with them?

Another issue: what about elisp?  It needs mutable pairs, but we don’t
want to have it use a disjoint type.

>   * Introducing a #!lang facility, and having programs with #!lang make
>     immutable pairs

Not really fan of the idea.  :-)

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: for-each et al
  2014-03-04 22:35     ` Ludovic Courtès
@ 2014-03-05 19:59       ` Andy Wingo
  2014-03-05 22:32         ` Ludovic Courtès
  0 siblings, 1 reply; 6+ messages in thread
From: Andy Wingo @ 2014-03-05 19:59 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

On Tue 04 Mar 2014 23:35, ludo@gnu.org (Ludovic Courtès) writes:

> I think the for-each change would be for 2.2, right?

Sure.  I'm really just on 2.2 these days...

>>   * Deprecate set-car!/set-cdr! (replacement would be mutable cells in
>>     the car/cdr; currently called variables, but could be called boxes)
>
> SRFI-111 boxes.

Sure.  I guess we should rename our variables to boxes.

> I suspect SCM_SETCAR/SCM_SETCDR are actually more widespread than
> their Scheme counterparts, and probably much harder to avoid.  What
> can we do with them?

Depends.  If they are used to build up a data structure, I wouldn't
worry -- it's not detectable by Scheme except via continuation hacks.
There are only about 70 places in libguile itself that we use SETCDR,
and 30 or so for SETCAR.  Not that bad.  About 40 callers of
scm_reverse_x though.

> Another issue: what about elisp?  It needs mutable pairs, but we don’t
> want to have it use a disjoint type.

A very good question, and I don't know.  Would a tc7 mutable-pair type
be that bad?  Could we do it with a tc3 instead?  Dunno.  This could
make it impossible.

>>   * Introducing a #!lang facility, and having programs with #!lang make
>>     immutable pairs
>
> Not really fan of the idea.  :-)

Why not?  It makes it clear what's in scope at the beginning of a file,
which is a nice advantage.

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: for-each et al
  2014-03-05 19:59       ` Andy Wingo
@ 2014-03-05 22:32         ` Ludovic Courtès
  0 siblings, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2014-03-05 22:32 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Andy Wingo <wingo@pobox.com> skribis:

> On Tue 04 Mar 2014 23:35, ludo@gnu.org (Ludovic Courtès) writes:

[...]

>> I suspect SCM_SETCAR/SCM_SETCDR are actually more widespread than
>> their Scheme counterparts, and probably much harder to avoid.  What
>> can we do with them?
>
> Depends.  If they are used to build up a data structure, I wouldn't
> worry -- it's not detectable by Scheme except via continuation hacks.
> There are only about 70 places in libguile itself that we use SETCDR,
> and 30 or so for SETCAR.  Not that bad.  About 40 callers of
> scm_reverse_x though.

Right, maybe that’s not too scary.

>> Another issue: what about elisp?  It needs mutable pairs, but we don’t
>> want to have it use a disjoint type.
>
> A very good question, and I don't know.  Would a tc7 mutable-pair type
> be that bad?  Could we do it with a tc3 instead?  Dunno.  This could
> make it impossible.

The question IMO is not about tagging, but about having a disjoint type.
I think the whole idea behind elisp support was that it could use the
very same data types as Scheme code, which would no longer be the case
if mutable pairs and immutable pairs were disjoint.

>>>   * Introducing a #!lang facility, and having programs with #!lang make
>>>     immutable pairs
>>
>> Not really fan of the idea.  :-)
>
> Why not?  It makes it clear what's in scope at the beginning of a file,
> which is a nice advantage.

I have mixed feelings: as a way to save a few use-modules, it’s looks
both convenient and hackish (you can’t #!lang several languages
typically, right?).

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-03-05 22:32 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-02 15:55 for-each et al Andy Wingo
2014-03-02 21:27 ` Ludovic Courtès
2014-03-04 20:30   ` Andy Wingo
2014-03-04 22:35     ` Ludovic Courtès
2014-03-05 19:59       ` Andy Wingo
2014-03-05 22:32         ` Ludovic Courtès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).