Re: Limiting parallelism using futures, parallel forms and fibers

From: "Linus Björnstam" <linus.internet@fastmail.se>
To: guile-user@gnu.org
Subject: Re: Limiting parallelism using futures, parallel forms and fibers
Date: Wed, 08 Jan 2020 09:11:50 +0100
Message-ID: <b3d27ef2-43d7-494d-9426-71af7e6b5723@www.fastmail.com>
In-Reply-To: <04cb0461-18a1-ef17-4db7-2475c7c806e6@posteo.de>

Hi! 

I don't have much more input than to say that futures use a built-in thread pool that is limited to (current-processor-count) threads. Maybe that limit could be changed using setaffinity?
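
A rough sketch of the setaffinity idea, under a few assumptions: the platform provides getaffinity/setaffinity (e.g. GNU/Linux), (ice-9 futures) has not been loaded yet, and the futures pool is sized from (current-processor-count) when that module is first loaded. The helper name limit-cpus! is made up for illustration and is not part of Guile.

```scheme
;; Hypothetical helper (not part of Guile): shrink this process's CPU
;; affinity mask so that at most N CPUs remain enabled.  N must be >= 1,
;; since an empty mask would be rejected by the operating system.
(define (limit-cpus! n)
  (let loop ((bits (bitvector->list (getaffinity (getpid))))
             (kept 0)
             (acc '()))
    (cond ((null? bits)
           (setaffinity (getpid) (list->bitvector (reverse acc))))
          ((and (car bits) (< kept n))
           ;; Keep the first N CPUs that are currently allowed.
           (loop (cdr bits) (+ kept 1) (cons #t acc)))
          (else
           ;; Disable everything else.
           (loop (cdr bits) kept (cons #f acc))))))

(limit-cpus! 2)
(display (current-processor-count))   ; at most 2 once the mask is applied
(newline)

;; Assumption: the pool size is fixed when (ice-9 futures) is loaded, so
;; the module is resolved at run time, after the mask has been changed.
(define futures-module (resolve-interface '(ice-9 futures)))
(define make-future (module-ref futures-module 'make-future))
(define touch (module-ref futures-module 'touch))

(define results
  (map touch
       (map (lambda (i) (make-future (lambda () (* i i))))
            '(1 2 3 4))))
(display results)                     ; (1 4 9 16)
(newline)
```

Note that this restricts the whole process, not just the futures, so it is a blunt tool compared to a per-call limit.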

Hope this helps.

-- 
  Linus Björnstam

On Wed, 8 Jan 2020, at 08:56, Zelphir Kaltstahl wrote:
> Hello Guile users!
> 
> I thought about what I need for parallelizing an algorithm I am working
> on. Background: I am working on my decision tree implementation
> (https://notabug.org/ZelphirKaltstahl/guile-ml/src/wip-port-to-guile),
> which is currently only using a single core. Training the model splits
> the tree into subtrees, which is an opportunity for running things in
> parallel. Subtrees can subsequently split again and that could again
> result in a parallel evaluation of the subtrees.
> 
> Since it might not always be desired to use potentially all available
> cores (blocking the whole CPU), I would like to add the possibility for
> the user of the algorithm to limit the number of cores used by the
> algorithm, most likely using a keyword argument, which defaults to the
> number of cores. Looking at futures, parallel forms and the fibers
> library, I've had the following thoughts:
> 
> 1. Fibers supports a ~#:parallelism~ keyword, so the number of fibers
> running in parallel can be set for ~run-fibers~, which creates a
> scheduler that can be reused later, possibly avoiding the need to keep
> track of how many threads are being used. When using fibers, it would
> probably be important to always use the same scheduler, as a new
> scheduler might not know anything about other schedulers and the limit
> for parallelism might be exceeded. Schedulers that are controlled by
> the initial scheduler are probably OK. I will need to re-read the
> fibers manual on that.
> 
> 2. With parallel forms, there are ~n-par-map~ and ~n-par-for-each~,
> where one can specify the maximum number of threads running in parallel.
> However, when calling these procedures recursively (in my case
> processing subtrees of a decision tree, which might split into subtrees
> again), one does not know whether the other recursive calls have
> finished and so cannot know how many other threads are currently in
> use. Calling one of these procedures again might run more threads than
> specified in an upper-level call.
> 
> 3. With futures, one cannot directly specify the maximum number of
> futures to run. In order to control how many threads are evaluating
> code at the same time, one would have to build some kind of construct
> around them that keeps track of how many futures are running or could
> be running, and make that work for recursive creation of further
> futures as well.
> 
> 4. One could also do something ugly and create a globally defined
> active-thread counter, which requires locking, to keep track of the
> number of threads or futures running in parallel.
> 
> 5. I could just estimate the maximum number of cores currently in use
> by passing the tree depth as an argument to recursive calls and
> calculating from it how many splits, and thus evaluations, might be
> running in parallel at that point. However, this might be inaccurate,
> as some subtrees might already be finished, and then I would not use
> the maximum number of parallel evaluations specified by the user.
> 
> So my questions are:
> 
> - Is there a default / recommended way to limit parallelism for
> recursive calls to parallel forms?
> 
> - Is there a better way than a global counter with locking to limit the
> number of futures created during recursive calls? I would very much
> dislike having to resort to something like global state + mutex.
> 
> - What do you recommend in general to solve this?
> 
> Regards,
> Zelphir