unofficial mirror of help-guix@gnu.org 
* Guix for Corporate "Batch Jobs"?
@ 2022-03-08 21:16 Yasuaki Kudo
  2022-03-08 23:18 ` Phil
  0 siblings, 1 reply; 4+ messages in thread
From: Yasuaki Kudo @ 2022-03-08 21:16 UTC
  To: help-guix

Hi,

In many so-called Application Support jobs in the enterprise, one of the core responsibilities is to see through the daily completion of "batch jobs": the I/O-heavy processes that take a long time to run, even with parallel processing.

At the core of that work is re-running the jobs after due troubleshooting.

In many workplaces I have seen, teams either ended up writing their own job schedulers based on cron or used proprietary software such as Autosys (in Japan there are also local products such as A-Auto, if I remember the name correctly).

But none of the solutions above handles the mechanical, incremental-computation side well, so a lot of the optimization (say, skipping this and that because it doesn't matter during a re-run) depends on the operators' sweat and judgement 😅

Can Guix be put to good use in this area, do you think?  Or, another way of asking this question: can Guix be used as a general build tool such as 'make'?  Given that 'make' still exists, is there any reason why Guix couldn't simply take over?
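
For reference, the incremental rule that 'make' applies can be sketched in a few lines of Python. This is a minimal illustration, not how Guix or any of the schedulers mentioned here work: a step is re-run only if its output file is missing or older than one of its input files. The helper names `is_stale` and `run_step` are hypothetical.

```python
from pathlib import Path
from typing import Callable, Sequence


def is_stale(target: Path, sources: Sequence[Path]) -> bool:
    """make's rule: rebuild if the target is missing or any source is newer."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(src.stat().st_mtime > target_mtime for src in sources)


def run_step(target: Path, sources: Sequence[Path],
             action: Callable[[], None]) -> bool:
    """Run `action` (which is expected to write `target`) only when stale.

    Returns True if the step ran, False if it was skipped.
    """
    if is_stale(target, sources):
        action()
        return True
    return False
```

Calling `run_step` for each stage in dependency order means that, on a re-run, only the stages downstream of a changed input (or a deleted output) execute again. Timestamps are a fairly weak signal (clock skew, files touched without real changes), so this only approximates the incremental behaviour being asked about.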

Maybe similar questions have already been asked in the Nix world as well?   I would love to know! 😄

-Yasu

* Re: Guix for Corporate "Batch Jobs"?
  2022-03-08 21:16 Guix for Corporate "Batch Jobs"? Yasuaki Kudo
@ 2022-03-08 23:18 ` Phil
  2022-03-09  8:20   ` Ricardo Wurmus
  2022-03-09  8:49   ` Yasuaki Kudo
  0 siblings, 2 replies; 4+ messages in thread
From: Phil @ 2022-03-08 23:18 UTC
  To: Yasuaki Kudo; +Cc: help-guix

Hi Yasu,

Yasuaki Kudo writes:

> Hi,
>
> In many so-called Application Support jobs in the enterprises, one of the core responsibilities is to see through the daily completion of "batch jobs" - those I/O heavy processes that take a long time to run, even with parallel processing.
>
> And at the core of it is to "re-run" the jobs, after due troubleshooting.
>
> In many workplaces I have seen, teams ended up writing their own job schedulers based on cron or used proprietary software such as Autosys (and in Japan, there are local brews such as A-Auto, if I remember the name correctly).

Not sure if this is exactly what you're looking for, but in my
experience Guix can sit at the centre of a tech stack for providing
software on machines, and then for batch-running that software in a very
predictable way.

However, Guix is currently first and foremost a command-line tool, so I
find myself augmenting it with other standard offerings to provide
familiar front-ends for triggers, job processing, management, etc.

A few examples below.

I oversee the use of Guix in an enterprise environment.  Initially it
was used to build and test our software and to provide deployments with
their dependencies.  We wrapped Guix builds in Jenkins, which in turn
integrates with our source control, so Guix is triggered by the standard
branch workflow developers are used to.  Guix fetches and caches any
build dependencies, making subsequent builds faster, and makes artifacts
available to servers across the enterprise via a Guix substitute server.

More recently, and probably more useful to you, I've been looking at
taking the build outputs and making them available as batch jobs using
the Guix Workflow Language (https://guixwl.org).  GWL is a good fit if
your batches are compute jobs with well-defined inputs, numerous
dependent stages, and a requirement to reproduce identical numerical
output.  It provides lots of cool features, and it's somewhat like
Autosys in that it is declarative: you define dependencies (and thus an
order) between different workflow processes.  I don't think GWL can
memoize different processes in a workflow, though, so as far as I know
running a workflow several times results in all workflow processes being
run.  The point is that you should be guaranteed the same result with
the same inputs, every time.
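
Deriving a run order from declared dependencies, as described above, amounts to a topological sort. The following is a minimal Python sketch using the standard library's `graphlib` (Python 3.9+); the stage names are hypothetical and this is not GWL syntax or GWL's scheduler. Stages that land in the same wave do not depend on each other and could run in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical batch stages, each mapped to the stages it depends on.
# Names are illustrative only; this is not GWL syntax.
stages = {
    "load-trades": set(),
    "load-market-data": set(),
    "price-positions": {"load-trades", "load-market-data"},
    "aggregate-pnl": {"price-positions"},
    "report": {"aggregate-pnl"},
}


def execution_order(deps):
    """One valid sequential run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


def execution_waves(deps):
    """Group stages into waves whose members do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every dependency of these is done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(stages), start=1):
        print(i, wave)
```

For the example stages, `execution_waves` yields the two loading stages first, then `price-positions`, `aggregate-pnl` and `report` one per wave. A declared cycle is rejected up front with `CycleError` rather than discovered halfway through a nightly run.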

I tend to wrap the GWL scripts in Rundeck (a job scheduler) so that
less-technical staff can re-run batches through a web app, or construct
a daily schedule for overnight/regression tests etc., rather than use
the guix command line.

Note that GWL isn't designed for batch jobs whose aim is to have side
effects on the server you're running on.  We only use it to produce
results from calculations.  This is different from Autosys, where each
job could be made up entirely of side effects that change the state of
the server itself.

HTH,
Phil.


* Re: Guix for Corporate "Batch Jobs"?
  2022-03-08 23:18 ` Phil
@ 2022-03-09  8:20   ` Ricardo Wurmus
  2022-03-09  8:49   ` Yasuaki Kudo
  1 sibling, 0 replies; 4+ messages in thread
From: Ricardo Wurmus @ 2022-03-09  8:20 UTC
  To: Phil; +Cc: help-guix


Phil <phil@beadling.co.uk> writes:

> I don't think GWL can memoize
> different processes in a workflow tho - so running a workflow several
> times results in all workflow processes being run, as far as I know.

By default GWL caches outputs that have already been computed.
Currently there’s only one way to skip computation, and that is through
files: when a computation results in a file, the output is cached, and
if the output already exists, the computation is not re-run unless
explicitly requested.
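
The file-based rule described above can be illustrated with a short Python sketch: skip a step if its output file already exists, unless a re-run is explicitly forced. This is only an illustration of the idea, not GWL's implementation, and `run_cached` is a hypothetical helper. Writing to a temporary file and renaming it into place is an added assumption here, so that a crashed run does not leave a partial file that a later run would mistake for a cached result.

```python
from pathlib import Path
from typing import Callable


def run_cached(output: Path, compute: Callable[[Path], None],
               force: bool = False) -> bool:
    """Skip `compute` if `output` already exists, unless `force` is set.

    `compute` receives a temporary path to write to; it is renamed to
    `output` only after `compute` returns without raising.
    Returns True if the computation ran, False if the cached file was reused.
    """
    if output.exists() and not force:
        return False
    tmp = output.with_name(output.name + ".partial")
    compute(tmp)
    tmp.replace(output)  # rename into place once the write has finished
    return True
```

With this shape, "re-run after troubleshooting" means deleting the outputs that are known to be bad (or passing `force=True`) and running the whole batch again. Steps whose outputs are still present are skipped automatically.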

(The GWL needs even more caching to avoid recomputing build scripts, but
that’s a separate issue.)

-- 
Ricardo


* Re: Guix for Corporate "Batch Jobs"?
  2022-03-08 23:18 ` Phil
  2022-03-09  8:20   ` Ricardo Wurmus
@ 2022-03-09  8:49   ` Yasuaki Kudo
  1 sibling, 0 replies; 4+ messages in thread
From: Yasuaki Kudo @ 2022-03-09  8:49 UTC
  To: Phil; +Cc: help-guix

Hi Phil,

Thank you so much, yes, this does help!

I was thinking of profit/loss simulations for millions of transactions at large financial companies.  They typically have purpose-built libraries written in C and rely on server farms and beefy databases.  The acceptable range of input for such systems is quite limited, and they do fail due to bad data, wrong assumptions about dates, business events, and so forth.

I am also always looking for a good starting point for international worker cooperatives spread around the globe.  Providing a 24-hour "dev/ops" network with Guix as one of its core competencies might do 😄

-Yasu