Date: Fri, 31 Mar 2023 00:52:48 +0000
From: Kyle
To: Spencer Skylar Chan, Ricardo Wurmus
CC: Simon Tournier, guix-devel@gnu.org
Subject: Re: Google Summer of Code 2023 Inquiry

As a statistician who always wants to get the most information for the least effort, I am particularly interested in being able to reprioritize workflow jobs interactively within the equivalent portions of the topological sort. I thought perhaps this would be possible with GWL if it could talk to SLURM with DRMAA version 2 (https://en.wikipedia.org/wiki/DRMAA). This would also be more readily useful to researchers if Guix had a conveniently available slurm service which worked out of the box even on a single machine.

Stepping back, there might be a more ambitious question hidden in there in terms of how to handle indeterminism in a deterministic workflow manager. Without that external information the problem just involves choosing your random seeds up front. However, I would prefer to write a procedure which is constantly reprioritizing labeled sub jobs within their associated containers either until I hit a resource limit or I have achieved certain target statistical diagnostics. Perhaps I would want GWL to tell me how to replay my build after the fact so I can make that reproducible even though I didn't know what I needed to focus my computations on up front and let the computer do that. Making that sort of thing possible might be a longer term effort, but working out what's needed for initial steps might be a fun project.

On March 30, 2023 7:27:37 PM EDT, Spencer Skylar Chan <schan12@terpmail.umd.edu> wrote:
>Hi Ricardo,
>
>On 3/23/23 03:58, Ricardo Wurmus wrote:
>> Hi,
>>
>> Spencer Skylar Chan writes:
>>
>>> One approach could be to add CWL import/export capabilities to
>>> GWL. Then Snakemake/GWL conversion would be a 2 step process, using
>>> CWL as an intermediate step:
>>>
>>> 1. Snakemake -> CWL
>>> 2. CWL -> GWL
>>
>> This seems doable.
>
>Great! I've been reading the chapter in Evolutionary Genomics on different scalable workflows to understand this process better.
>
>>> However, CWL is not as expressive as Snakemake. There may be some
>>> details that are lost from Snakemake workflows.
>>>
>>> So a 1-step Snakemake/GWL transpiler could be interesting, as both
>>> Snakemake/GWL use a domain-specific language inside a general purpose
>>> language (Python/Guile respectively). There may be a possibility to
>>> achieve more "accurate" translations between workflows.
>>
>> Compared to the previous approach this seems vastly more complex. It’s
>> one thing to *execute* Snakemake code without running it through Python,
>> but quite a bit more challenging to transpile Python to Scheme.
>>
>> Personally, I wouldn’t know where to start. Do you have an idea
>> already?
>>
>
>Actually I was hoping you might have some ideas :)
>I do think that if the execution of the pipeline is more important than its representation (Snakemake or otherwise), then it would make more sense to focus efforts on increasing GWL's capabilities.
>
>Thanks,
>Skylar
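
To make the reprioritization idea at the top of this message a bit more concrete, here is a minimal sketch in plain Python. It is not GWL, SLURM, or DRMAA code; every name in it (`run_with_priorities`, `is_valid_replay`, the job names, the `bump` callback) is hypothetical. It assumes the workflow is a dependency graph, treats the set of currently runnable jobs as the "equivalent portion" of the topological sort, lets a callback change priorities between steps, and returns the realized order as a log that could be replayed later.

```python
from collections import defaultdict


def run_with_priorities(deps, priority, on_step=None):
    """Kahn-style topological traversal with mutable priorities.

    deps:     dict mapping job -> set of prerequisite jobs
    priority: dict mapping job -> number (higher runs first); it may be
              mutated between steps, e.g. by on_step
    on_step:  optional callback(job, priority) invoked after each job
    Returns the realized execution order, usable as a replay log.
    """
    remaining = {job: set(pre) for job, pre in deps.items()}
    dependents = defaultdict(set)
    for job, pre in deps.items():
        for p in pre:
            dependents[p].add(job)
            remaining.setdefault(p, set())  # prerequisites are jobs too

    # Jobs with no unmet prerequisites may run in any relative order.
    ready = {job for job, pre in remaining.items() if not pre}
    order = []
    while ready:
        # Highest current priority first; ties broken by name so the
        # result is deterministic for a given sequence of priorities.
        job = min(ready, key=lambda j: (-priority.get(j, 0), j))
        ready.remove(job)
        order.append(job)
        if on_step is not None:
            on_step(job, priority)  # hypothetical hook: reprioritize here
        for d in dependents[job]:
            remaining[d].discard(job)
            if not remaining[d]:
                ready.add(d)
    if len(order) != len(remaining):
        raise ValueError("dependency cycle detected")
    return order


def is_valid_replay(deps, order):
    """Check that a recorded order respects every dependency."""
    position = {job: i for i, job in enumerate(order)}
    return all(position[p] < position[job]
               for job, pre in deps.items() for p in pre)


deps = {
    "fit_a": {"prep"},
    "fit_b": {"prep"},
    "fit_c": {"prep"},
    "report": {"fit_a", "fit_b", "fit_c"},
}


def bump(job, priority):
    # Pretend a diagnostic from fit_c says fit_a is now the most useful.
    if job == "fit_c":
        priority["fit_a"] = 10


log = run_with_priorities(deps, {"fit_a": 1, "fit_b": 2, "fit_c": 3}, bump)
print(log)
```

The returned list is the only state needed to rerun the jobs in the same sequence, which is one simple reading of "tell me how to replay my build after the fact". A real scheduler would also need to record seeds and resource decisions, which this sketch leaves out.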