* Delimited continuations to the rescue of futures
@ 2012-11-16 23:36 Ludovic Courtès
2012-11-17 4:38 ` Mark H Weaver
2013-01-14 2:34 ` Nala Ginrut
0 siblings, 2 replies; 14+ messages in thread
From: Ludovic Courtès @ 2012-11-16 23:36 UTC (permalink / raw)
To: guile-devel
Hello!
As was reported recently by Mark and others, ‘par-map’ would only use
ncores - 1, because the main thread was stuck in a
‘wait-condition-variable’ while touching one of the futures.
The obvious fix is to write ‘par-map’ like this (as can be seen from
Chapter 2 of Marc Feeley’s PhD thesis):
(define (par-mapper mapper cons)
(lambda (proc . lists)
(let loop ((lists lists))
(match lists
(((heads tails ...) ...)
(let ((tail (future (loop tails)))
(head (apply proc heads)))
(cons head (touch tail))))
(_
'())))))
However, our futures did not support “nested futures”. That is, if a
future touched another future, it would also wait on a condition
variable until the latter completes. Thus, the above code would only
use one core.
So the fix is to support nested futures, by properly scheduling futures
that are active, and adding those that are waiting to a wait queue.
Those added to the wait queue have their continuation captured (yeah!),
so that they can be later rescheduled when their “waitee” has completed.
But then there’s still the problem of the main thread: it’s silly to let
it just wait on a condition variable when it touches a future that has
not completed yet. So now it behaves as a worker, processing futures
until the one it’s waiting for has completed.
Figures from my 2-core/4-thread laptop:
--8<---------------cut here---------------start------------->8---
$ time ./meta/guile -c '(begin (use-modules (ice-9 threads)) (define (fibo n) (if (<= n 1) n (+ (fibo (- n 1)) (fibo (- n 2))))) (pk "r" (map fibo (make-list 4 30))))'
;;; ("r" (832040 832040 832040 832040))
real 0m27.864s
user 0m27.773s
sys 0m0.031s
$ time ./meta/guile -c '(begin (use-modules (ice-9 threads)) (define (fibo n) (if (<= n 1) n (+ (fibo (- n 1)) (fibo (- n 2))))) (pk "r" (par-map fibo (make-list 4 30))))'
;;; ("r" (832040 832040 832040 832040))
real 0m10.899s
user 0m42.487s
sys 0m0.051s
--8<---------------cut here---------------end--------------->8---
The speedup is not optimal, but there’s room for optimization.
I’ve pushed the result in ‘wip-nested-futures’. Please review and test!
Thanks,
Ludo’.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Delimited continuations to the rescue of futures
2012-11-16 23:36 Delimited continuations to the rescue of futures Ludovic Courtès
@ 2012-11-17 4:38 ` Mark H Weaver
2012-11-17 13:43 ` Ludovic Courtès
2013-01-14 2:34 ` Nala Ginrut
1 sibling, 1 reply; 14+ messages in thread
From: Mark H Weaver @ 2012-11-17 4:38 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guile-devel
ludo@gnu.org (Ludovic Courtès) writes:
> As was reported recently by Mark and others, ‘par-map’ would only use
> ncores - 1, because the main thread was stuck in a
> ‘wait-condition-variable’ while touching one of the futures.
>
> The obvious fix is to write ‘par-map’ like this (as can be seen from
> Chapter 2 of Marc Feeley’s PhD thesis):
>
> (define (par-mapper mapper cons)
> (lambda (proc . lists)
> (let loop ((lists lists))
> (match lists
> (((heads tails ...) ...)
> (let ((tail (future (loop tails)))
> (head (apply proc heads)))
> (cons head (touch tail))))
> (_
> '())))))
Am I correct in believing that the above code would use the main thread
only for applying 'proc' to the first element of each list?
In other words, if you have 4 cores and call 'par-map' on a list of 1000
elements, the main thread will only be used to process 1 out of 1000
elements, and only 3 cores will be used to process the other 999.
Is that right?
Mark
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Delimited continuations to the rescue of futures
2012-11-17 4:38 ` Mark H Weaver
@ 2012-11-17 13:43 ` Ludovic Courtès
2012-11-17 16:56 ` Mark H Weaver
0 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2012-11-17 13:43 UTC (permalink / raw)
To: Mark H Weaver; +Cc: guile-devel
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
> ludo@gnu.org (Ludovic Courtès) writes:
>
>> As was reported recently by Mark and others, ‘par-map’ would only use
>> ncores - 1, because the main thread was stuck in a
>> ‘wait-condition-variable’ while touching one of the futures.
>>
>> The obvious fix is to write ‘par-map’ like this (as can be seen from
>> Chapter 2 of Marc Feeley’s PhD thesis):
>>
>> (define (par-mapper mapper cons)
>> (lambda (proc . lists)
>> (let loop ((lists lists))
>> (match lists
>> (((heads tails ...) ...)
>> (let ((tail (future (loop tails)))
>> (head (apply proc heads)))
>> (cons head (touch tail))))
>> (_
>> '())))))
>
> Am I correct in believing that the above code would use the main thread
> only for applying 'proc' to the first element of each list?
Yes.
> In other words, if you have 4 cores and call 'par-map' on a list of 1000
> elements, the main thread will only be used to process 1 out of 1000
> elements, and only 3 cores will be used to process the other 999.
> Is that right?
No. :-)
That’s what would happen with current stable-2.0. But in
‘wip-nested-futures’, the main thread behaves as a worker while waiting
for the tail future to complete (which addresses your main concern, I
think, as we had discussed on IRC.) Thus, all cores are always used.
Thanks,
Ludo’.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Delimited continuations to the rescue of futures
2012-11-17 13:43 ` Ludovic Courtès
@ 2012-11-17 16:56 ` Mark H Weaver
2012-11-17 22:00 ` Ludovic Courtès
0 siblings, 1 reply; 14+ messages in thread
From: Mark H Weaver @ 2012-11-17 16:56 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guile-devel
Hi Ludovic,
ludo@gnu.org (Ludovic Courtès) writes:
> Mark H Weaver <mhw@netris.org> skribis:
>> In other words, if you have 4 cores and call 'par-map' on a list of 1000
>> elements, the main thread will only be used to process 1 out of 1000
>> elements, and only 3 cores will be used to process the other 999.
>> Is that right?
>
> No. :-)
>
> That’s what would happen with current stable-2.0. But in
> ‘wip-nested-futures’, the main thread behaves as a worker while waiting
> for the tail future to complete (which addresses your main concern, I
> think, as we had discussed on IRC.) Thus, all cores are always used.
Ah, okay, good! :)
I guess the one remaining concern I have is that if there are any
long-running futures in the process, then any 'touch' could take a very
long time to complete, even if the future it is waiting for is a very
short job.
For example, (par-map - '(1 2 3)) could take several minutes to complete
if (par-map process-image list-of-images) is being done in another
thread.
This doesn't sit well with me. It would be good if we could avoid it,
but at the very least we'll have to warn users about this in the
documentation.
I should mention that this problem already exists in Guile 2.0, and is
not related to your recent work.
Thanks,
Mark
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Delimited continuations to the rescue of futures
2012-11-17 16:56 ` Mark H Weaver
@ 2012-11-17 22:00 ` Ludovic Courtès
2012-11-18 22:19 ` Mark H Weaver
2012-11-21 23:28 ` Ludovic Courtès
0 siblings, 2 replies; 14+ messages in thread
From: Ludovic Courtès @ 2012-11-17 22:00 UTC (permalink / raw)
To: Mark H Weaver; +Cc: guile-devel
Hi Mark!
Mark H Weaver <mhw@netris.org> skribis:
> I guess the one remaining concern I have is that if there are any
> long-running futures in the process, then any 'touch' could take a very
> long time to complete, even if the future it is waiting for is a very
> short job.
>
> For example, (par-map - '(1 2 3)) could take several minutes to complete
> if (par-map process-image list-of-images) is being done in another
> thread.
Right. That’s actually easy to fix, but the advantage of the current
solution is that it uses less code. Anyway, you’re probably right, so
I’ll do that.
BTW, I’d find it awkward to do use both futures and explicit threads at
the same time.
Thanks,
Ludo’.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Delimited continuations to the rescue of futures
2012-11-17 22:00 ` Ludovic Courtès
@ 2012-11-18 22:19 ` Mark H Weaver
2012-11-20 19:20 ` Ludovic Courtès
2012-11-21 23:28 ` Ludovic Courtès
1 sibling, 1 reply; 14+ messages in thread
From: Mark H Weaver @ 2012-11-18 22:19 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guile-devel
Hi Ludovic!
ludo@gnu.org (Ludovic Courtès) writes:
> Mark H Weaver <mhw@netris.org> skribis:
>
>> I guess the one remaining concern I have is that if there are any
>> long-running futures in the process, then any 'touch' could take a very
>> long time to complete, even if the future it is waiting for is a very
>> short job.
>>
>> For example, (par-map - '(1 2 3)) could take several minutes to complete
>> if (par-map process-image list-of-images) is being done in another
>> thread.
>
> Right. That’s actually easy to fix, but the advantage of the current
> solution is that it uses less code. Anyway, you’re probably right, so
> I’ll do that.
I'd be very curious to hear about your easy fix, because I couldn't
think of anything easy.
Thanks,
Mark
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Delimited continuations to the rescue of futures
2012-11-18 22:19 ` Mark H Weaver
@ 2012-11-20 19:20 ` Ludovic Courtès
2012-11-21 12:18 ` Peter TB Brett
0 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2012-11-20 19:20 UTC (permalink / raw)
To: Mark H Weaver; +Cc: guile-devel
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Mark H Weaver <mhw@netris.org> skribis:
>>
>>> I guess the one remaining concern I have is that if there are any
>>> long-running futures in the process, then any 'touch' could take a very
>>> long time to complete, even if the future it is waiting for is a very
>>> short job.
>>>
>>> For example, (par-map - '(1 2 3)) could take several minutes to complete
>>> if (par-map process-image list-of-images) is being done in another
>>> thread.
>>
>> Right. That’s actually easy to fix, but the advantage of the current
>> solution is that it uses less code. Anyway, you’re probably right, so
>> I’ll do that.
>
> I'd be very curious to hear about your easy fix, because I couldn't
> think of anything easy.
I think you were right. :-)
The solution I tried was to have ‘touch’ directly call (process-future!
future) when it’s queued. The problem arises when the future suspends
itself. In that case, the calling ‘touch’ has then no way to
distinguish between a waiting future and a queued one, because both have
the ‘queued’ status. So it ends up re-running the just-queued future,
which fails because it’s in fact waiting.
So I introduced a third state, ‘waiting’, to help with that. I ended up
with more complex code and, more importantly, a hard-to-debug race
condition (I wish we had tools to help with that...)
Having spent some time on this, I’m inclined to leave things this way.
:-)
WDYT?
Ludo’.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Delimited continuations to the rescue of futures
2012-11-20 19:20 ` Ludovic Courtès
@ 2012-11-21 12:18 ` Peter TB Brett
2012-11-21 13:36 ` Ludovic Courtès
0 siblings, 1 reply; 14+ messages in thread
From: Peter TB Brett @ 2012-11-21 12:18 UTC (permalink / raw)
To: guile-devel
This is going to sound like a daft question, but: is there any reason
that the thread that calls 'touch' needs to be the same thread that
calls its continuation?
I.e. why does there need to be a special "main thread"? Can't "picking
up a job blocking on touch" just be another task allocated to the
thread pool?
Rubbish diagram:
Thread A Thread B
-------- --------
Creates a future F ...
... Starts computing F
Touches F ...
Starts computing future G ...
... Finishes computing F
... Continues job that touched F
Is this not a plausible approach?
Peter
--
Peter Brett <peter@peter-b.co.uk>
Remote Sensing Research Group
Surrey Space Centre
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Delimited continuations to the rescue of futures
2012-11-21 12:18 ` Peter TB Brett
@ 2012-11-21 13:36 ` Ludovic Courtès
2012-11-21 16:45 ` Peter TB Brett
0 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2012-11-21 13:36 UTC (permalink / raw)
To: guile-devel
Hi Peter,
Peter TB Brett <peter@peter-b.co.uk> skribis:
> This is going to sound like a daft question, but: is there any reason
> that the thread that calls 'touch' needs to be the same thread that
> calls its continuation?
>
> I.e. why does there need to be a special "main thread"? Can't "picking
> up a job blocking on touch" just be another task allocated to the
> thread pool?
>
> Rubbish diagram:
>
> Thread A Thread B
> -------- --------
> Creates a future F ...
> ... Starts computing F
> Touches F ...
> Starts computing future G ...
> ... Finishes computing F
> ... Continues job that touched F
>
>
> Is this not a plausible approach?
It is, IMO. This is what ‘wip-nested-futures’ currently does.
What Mark said is that, you could imagine a case where computing G
actually takes much longer than computing F. In that case, he suggested
that Thread A computes F.
However, as I said, I’m not really convinced by this argument.
Normally, both F and G are contributions to a larger computation. It
shouldn’t matter which one completes first, as long as threads are kept
busy.
Thanks,
Ludo’.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Delimited continuations to the rescue of futures
2012-11-21 13:36 ` Ludovic Courtès
@ 2012-11-21 16:45 ` Peter TB Brett
2012-11-21 21:19 ` Ludovic Courtès
0 siblings, 1 reply; 14+ messages in thread
From: Peter TB Brett @ 2012-11-21 16:45 UTC (permalink / raw)
To: guile-devel
ludo@gnu.org (Ludovic Courtès) writes:
> Hi Peter,
>
> Peter TB Brett <peter@peter-b.co.uk> skribis:
>
>> This is going to sound like a daft question, but: is there any reason
>> that the thread that calls 'touch' needs to be the same thread that
>> calls its continuation?
>>
>> I.e. why does there need to be a special "main thread"? Can't "picking
>> up a job blocking on touch" just be another task allocated to the
>> thread pool?
>>
>> Rubbish diagram:
>>
>> Thread A Thread B
>> -------- --------
>> Creates a future F ...
>> ... Starts computing F
>> Touches F ...
>> Starts computing future G ...
>> ... Finishes computing F
>> ... Continues job that touched F
>>
>>
>> Is this not a plausible approach?
>
> It is, IMO. This is what ‘wip-nested-futures’ currently does.
>
> What Mark said is that, you could imagine a case where computing G
> actually takes much longer than computing F. In that case, he suggested
> that Thread A computes F.
Okay, I'm *really* confused now. In the scenario that I've diagrammed
before, why does it matter how long G takes to compute?
> However, as I said, I’m not really convinced by this argument.
> Normally, both F and G are contributions to a larger computation. It
> shouldn’t matter which one completes first, as long as threads are kept
> busy.
I clearly don't understand the objection, so I can't really comment
either way. I would quite *like* to understand it -- I'm very
interested in doing practical parallel computations with Guile -- , so
is there any chance that you would be kind enough to explain like I'm
five or something (possibly with diagrams)?
Peter
--
Peter Brett <peter@peter-b.co.uk>
Remote Sensing Research Group
Surrey Space Centre
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Delimited continuations to the rescue of futures
2012-11-21 16:45 ` Peter TB Brett
@ 2012-11-21 21:19 ` Ludovic Courtès
0 siblings, 0 replies; 14+ messages in thread
From: Ludovic Courtès @ 2012-11-21 21:19 UTC (permalink / raw)
To: guile-devel
Hi,
Peter TB Brett <peter@peter-b.co.uk> skribis:
> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Hi Peter,
>>
>> Peter TB Brett <peter@peter-b.co.uk> skribis:
>>
>>> This is going to sound like a daft question, but: is there any reason
>>> that the thread that calls 'touch' needs to be the same thread that
>>> calls its continuation?
>>>
>>> I.e. why does there need to be a special "main thread"? Can't "picking
>>> up a job blocking on touch" just be another task allocated to the
>>> thread pool?
>>>
>>> Rubbish diagram:
>>>
>>> Thread A Thread B
>>> -------- --------
>>> Creates a future F ...
>>> ... Starts computing F
>>> Touches F ...
>>> Starts computing future G ...
>>> ... Finishes computing F
>>> ... Continues job that touched F
>>>
>>>
>>> Is this not a plausible approach?
>>
>> It is, IMO. This is what ‘wip-nested-futures’ currently does.
>>
>> What Mark said is that, you could imagine a case where computing G
>> actually takes much longer than computing F. In that case, he suggested
>> that Thread A computes F.
>
> Okay, I'm *really* confused now. In the scenario that I've diagrammed
> before, why does it matter how long G takes to compute?
It would matter in the following case:
Thread A Thread B
-------- --------
creates futures F creates future G
... ...
touches F ...
starts computing G ...
... starts computing F
... finishes computing F
returns F's result ...
... touches G
This scenario is possible with ‘wip-nested-futures’, where a thread that
touches a future picks any pending future, not necessarily the one at
hand.
This is what Mark was concerned about. I’m inclined to think that it
doesn’t matter much.
Thanks,
Ludo’.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Delimited continuations to the rescue of futures
2012-11-17 22:00 ` Ludovic Courtès
2012-11-18 22:19 ` Mark H Weaver
@ 2012-11-21 23:28 ` Ludovic Courtès
1 sibling, 0 replies; 14+ messages in thread
From: Ludovic Courtès @ 2012-11-21 23:28 UTC (permalink / raw)
To: guile-devel
ludo@gnu.org (Ludovic Courtès) skribis:
> Mark H Weaver <mhw@netris.org> skribis:
>
>> I guess the one remaining concern I have is that if there are any
>> long-running futures in the process, then any 'touch' could take a very
>> long time to complete, even if the future it is waiting for is a very
>> short job.
>>
>> For example, (par-map - '(1 2 3)) could take several minutes to complete
>> if (par-map process-image list-of-images) is being done in another
>> thread.
>
> Right. That’s actually easy to fix, but the advantage of the current
> solution is that it uses less code. Anyway, you’re probably right, so
> I’ll do that.
I didn’t do that (as discussed in this thread), and ended up merging
wip-nested-futures as is.
Thanks,
Ludo’.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Delimited continuations to the rescue of futures
2012-11-16 23:36 Delimited continuations to the rescue of futures Ludovic Courtès
2012-11-17 4:38 ` Mark H Weaver
@ 2013-01-14 2:34 ` Nala Ginrut
2013-01-14 10:37 ` Ludovic Courtès
1 sibling, 1 reply; 14+ messages in thread
From: Nala Ginrut @ 2013-01-14 2:34 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guile-devel
On Sat, 2012-11-17 at 00:36 +0100, Ludovic Courtès wrote:
> Hello!
>
> As was reported recently by Mark and others, ‘par-map’ would only use
> ncores - 1, because the main thread was stuck in a
> ‘wait-condition-variable’ while touching one of the futures.
>
> The obvious fix is to write ‘par-map’ like this (as can be seen from
> Chapter 2 of Marc Feeley’s PhD thesis):
>
> (define (par-mapper mapper cons)
> (lambda (proc . lists)
> (let loop ((lists lists))
> (match lists
> (((heads tails ...) ...)
> (let ((tail (future (loop tails)))
> (head (apply proc heads)))
> (cons head (touch tail))))
> (_
> '())))))
>
> However, our futures did not support “nested futures”. That is, if a
> future touched another future, it would also wait on a condition
> variable until the latter completes. Thus, the above code would only
> use one core.
>
Sorry, but this implementation seems to be a tail-recursive unsafe one.
------------------------------cut---------------------------
scheme@(guile-user)> (par-map 1+ (iota 10000))
ice-9/threads.scm:99:22: In procedure loop:
ice-9/threads.scm:99:22: Throw to key `vm-error' with args `(vm-run "VM:
Stack overflow" ())'.
Entering a new prompt. Type `,bt' for a backtrace or `,q' to continue.
------------------------------end---------------------------
PS: I do know '1+' here may not be a proper test case, but it doesn't
related here anyway.
> So the fix is to support nested futures, by properly scheduling futures
> that are active, and adding those that are waiting to a wait queue.
> Those added to the wait queue have their continuation captured (yeah!),
> so that they can be later rescheduled when their “waitee” has completed.
>
> But then there’s still the problem of the main thread: it’s silly to let
> it just wait on a condition variable when it touches a future that has
> not completed yet. So now it behaves as a worker, processing futures
> until the one it’s waiting for has completed.
>
> Figures from my 2-core/4-thread laptop:
>
> --8<---------------cut here---------------start------------->8---
> $ time ./meta/guile -c '(begin (use-modules (ice-9 threads)) (define (fibo n) (if (<= n 1) n (+ (fibo (- n 1)) (fibo (- n 2))))) (pk "r" (map fibo (make-list 4 30))))'
>
> ;;; ("r" (832040 832040 832040 832040))
>
> real 0m27.864s
> user 0m27.773s
> sys 0m0.031s
>
> $ time ./meta/guile -c '(begin (use-modules (ice-9 threads)) (define (fibo n) (if (<= n 1) n (+ (fibo (- n 1)) (fibo (- n 2))))) (pk "r" (par-map fibo (make-list 4 30))))'
>
> ;;; ("r" (832040 832040 832040 832040))
>
> real 0m10.899s
> user 0m42.487s
> sys 0m0.051s
> --8<---------------cut here---------------end--------------->8---
>
> The speedup is not optimal, but there’s room for optimization.
>
> I’ve pushed the result in ‘wip-nested-futures’. Please review and test!
>
> Thanks,
> Ludo’.
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Delimited continuations to the rescue of futures
2013-01-14 2:34 ` Nala Ginrut
@ 2013-01-14 10:37 ` Ludovic Courtès
0 siblings, 0 replies; 14+ messages in thread
From: Ludovic Courtès @ 2013-01-14 10:37 UTC (permalink / raw)
To: Nala Ginrut; +Cc: guile-devel
Hi,
Nala Ginrut <nalaginrut@gmail.com> skribis:
> Sorry, but this implementation seems to be a tail-recursive unsafe one.
I’ve seen the bug report and will look at it as time permits, unless
someone else beats me at it. ;-)
Ludo’.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2013-01-14 10:37 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-16 23:36 Delimited continuations to the rescue of futures Ludovic Courtès
2012-11-17 4:38 ` Mark H Weaver
2012-11-17 13:43 ` Ludovic Courtès
2012-11-17 16:56 ` Mark H Weaver
2012-11-17 22:00 ` Ludovic Courtès
2012-11-18 22:19 ` Mark H Weaver
2012-11-20 19:20 ` Ludovic Courtès
2012-11-21 12:18 ` Peter TB Brett
2012-11-21 13:36 ` Ludovic Courtès
2012-11-21 16:45 ` Peter TB Brett
2012-11-21 21:19 ` Ludovic Courtès
2012-11-21 23:28 ` Ludovic Courtès
2013-01-14 2:34 ` Nala Ginrut
2013-01-14 10:37 ` Ludovic Courtès
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).