From: Josh Marshall
Date: Mon, 2 Dec 2019 15:03:12 -0500
Subject: Re: How do I support building a guix package over multiple machines in a cloud environment?
To: gwl-devel@gnu.org

Looking at https://lists.gnu.org/archive/html/gwl-devel/2019-01/msg00034.html,
the use case I'm looking at explicitly requires the input files to be
hashed and tracked manually, as if they were a package. The actual
pipeline changes little, if at all, but those large data files must be
tracked. Nextflow is the currently fashionable pipeline tool, but it
would be nice to have a fully reproducible way to re-use any existing
workflow DSL: many are in use, and I would rather not replicate
xkcd.com/927.

Still reading over everything. I aim to have a concrete plan for
supporting this use case today. I wish I had work time for this rather
than vacation time. The technology is fascinating.
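To make the "hashed and tracked manually" requirement concrete, here is a minimal sketch of what I have in mind, assuming SHA-256 as the hash and a plain dictionary as the manifest. Neither choice comes from GWL or Guix, and the helper names (sha256_of_file, build_manifest, changed_inputs) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file.

    The file is read in chunks so that large data files do not have to
    fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Map each input file path (as a string) to its SHA-256 digest."""
    return {str(Path(p)): sha256_of_file(p) for p in paths}


def changed_inputs(manifest):
    """Return the sorted paths that are missing or whose digest changed."""
    return sorted(
        path
        for path, recorded in manifest.items()
        if not Path(path).is_file() or sha256_of_file(path) != recorded
    )
```

The idea would be to record the manifest once next to the pipeline definition and to refuse to run when changed_inputs() returns a non-empty list.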
On Mon, Dec 2, 2019 at 2:00 PM zimoun wrote:
>
> Hi (again) Josh :-)
>
> On Mon, 2 Dec 2019 at 19:38, Josh Marshall wrote:
>
> > He uses it as a bioinformatics workflow to generate some analysis. It
>
> GWL should work for this use case. o/
>
> > Is this kind of use case supported? If so, how so? Is nextflow not
> > practical to keep? Please, someone catch me up here so I can start to
> > write code to help him out. If this goes well, my company could
> > integrate for gwl/guix in our work, which would be amazing.
>
> Nextflow [1] is a Domain Specific Language (DSL): you write "rules"
> and how these rules are combined together. In the bioinformatics
> field, Snakemake [2] seems more popular. Other alternatives are CWL
> [3], WDL [4], etc.
>
> Basically, you describe:
> - what the inputs are
> - what the outputs are
> - how to process the inputs to produce the outputs
>
> You can find examples there [*]. They use the Wisp syntax [5], but
> everything works perfectly well with a Scheme syntax if you prefer
> parentheses. ;-)
>
> [*] https://guixwl.org/
>
>
> However, you might be interested in this blog post [#] by Pjotr using
> Guix and CWL and other niceties!
>
> [#] https://hpc.guix.info/blog/2019/01/creating-a-reproducible-workflow-with-cwl/
>
> AFAIK, Nextflow is not yet packaged in Guix. One direction is to
> package it and then use the workflow described in the Nextflow DSL in
> the spirit of [#]. Another direction is to rewrite the workflow using
> the GWL DSL. It depends a bit on what your final aim is.
>
>
> Hope that helps.
> simon
>
> [1] https://www.nextflow.io/
> [2] https://snakemake.readthedocs.io/en/stable/
> [3] https://www.commonwl.org/
> [4] http://www.openwdl.org/
> [5] https://srfi.schemers.org/srfi-119/srfi-119.html
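
For reference, the three-part description in the quoted reply (inputs, outputs, how to process) can be sketched in plain Python. This is only an illustration of the general idea behind rule-based workflow DSLs, not the GWL, Nextflow, or Snakemake API; the Rule class, the execution_order function, the file names, and the commands are all hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Rule:
    """One workflow step: what it reads, what it writes, and how."""
    name: str
    inputs: tuple
    outputs: tuple
    command: str  # placeholder text, not a real command line


def execution_order(rules):
    """Order rule names so each rule runs after the producers of its inputs."""
    # Which rule produces each output file?
    producer = {out: rule.name for rule in rules for out in rule.outputs}
    # For each rule, the set of rules it depends on. Inputs that no rule
    # produces are treated as external data files.
    graph = {
        rule.name: {producer[i] for i in rule.inputs if i in producer}
        for rule in rules
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical two-step pipeline, deliberately listed out of order.
call = Rule("call", ("aligned.bam",), ("variants.vcf",), "call-variants")
align = Rule("align", ("reads.fastq",), ("aligned.bam",), "align-reads")
order = execution_order([call, align])
```

Here the rules are combined purely by matching one rule's outputs against another rule's inputs, which is how the ordering is derived without stating it explicitly.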