From: Ricardo Wurmus
To: Konrad Hinsen
Cc: gwl-devel@gnu.org
Subject: Re: Managing data files in workflows
Date: Fri, 26 Mar 2021 09:47:11 +0100
Message-ID: <87r1k2ti7k.fsf@elephly.net>

Hi Konrad,

> Coming from make-like workflow systems, I wonder how data files are best
> managed in GWL workflow. GWL is clearly less file-centric than make
> (which is a Good Thing in my opinion), but at a first reading of the
> manual, it doesn't seem to care about files at all, except for
> auto-connect.
>
> A simple example:
>
> ==================================================
> process download
>   packages "wget"
>   outputs
>     file "data/weekly-incidence.csv"
>   # { wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv }
>
> workflow influenza-incidence
>   processes download
> ==================================================

This works for me correctly:

--8<---------------cut here---------------start------------->8---
$ guix workflow run foo.w
info: Loading workflow file `foo.w'...
info: Computing workflow `influenza-incidence'...
The following derivations will be built:
   /gnu/store/59isvjs850hm6ipywhaz34zvn0235j2g-gwl-download.scm.drv
   /gnu/store/s8yx15w5zwpz500brl6mv2qf2s9id309-profile.drv
building path(s) `/gnu/store/izhflk47bpimvj3xk3r4ddzaipj87cny-ca-certificate-bundle'
building path(s) `/gnu/store/i7prqy908kfsxsvzksr06gxks2jd3s08-fonts-dir'
building path(s) `/gnu/store/pzcqa593l8msd4m3s0i0a3bx84llzlpa-info-dir'
building path(s) `/gnu/store/7f5i86dw32ikm9czq1v17spnjn61j8z6-manual-database'
Creating manual page database...
[ 2/ 3] building list of man-db entries...
108 entries processed in 0.1 s
building path(s) `/gnu/store/mrv97q0d2732bk3hmj91znzigxyv1vsh-profile'
building path(s) `/gnu/store/chz5lck01vd3wlx3jb35d3qchwi3908f-gwl-download.scm'
run: Executing: /bin/sh -c /gnu/store/chz5lck01vd3wlx3jb35d3qchwi3908f-gwl-download.scm '((inputs) (outputs "./data/weekly-incidence.csv") (values) (name . "download"))'
--2021-03-26 09:41:17--  http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
Resolving www.sentiweb.fr (www.sentiweb.fr)... 134.157.220.17
Connecting to www.sentiweb.fr (www.sentiweb.fr)|134.157.220.17|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
Saving to: ‘./data/weekly-incidence.csv’

./data/weekly-incidence.csv     [ <=>     ]  83.50K  --.-KB/s    in 0.05s

2021-03-26 09:41:18 (1.63 MB/s) - ‘./data/weekly-incidence.csv’ saved [85509]

$ guix workflow run foo.w
info: Loading workflow file `foo.w'...
info: Computing workflow `influenza-incidence'...
run: Skipping process "download" (cached at /tmp/gwl/lf6uca7zcyyldkcrxn3zwc275ax3ip676aqgjo75ybwojtl4emoq/).
$
--8<---------------cut here---------------end--------------->8---

Here’s the changed workflow:

--8<---------------cut here---------------start------------->8---
process download
  packages "wget" "coreutils"
  outputs
    file "data/weekly-incidence.csv"
  # {
    mkdir -p $(dirname {{outputs}})
    wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
  }

workflow influenza-incidence
  processes download
--8<---------------cut here---------------end--------------->8---

It skips the process because the output file exists and the daring
assumption we make is that outputs are reproducible.

I would like to make these assumptions explicit in a future version, but
I’m not sure how.
One idea is to add keyword arguments to “file” that allow us to provide a
content hash, or merely a flag to declare a file as volatile and thus in
need of recomputation.

I also wanted to have IPFS and git-annex support, but before I embark on
this I want to understand exactly how this should behave and what the UI
should be.  E.g. having an input that is declared as “IPFS-file” would
cause that input file to be fetched automatically without having to
specify a process that downloads it first.  (Something similar could be
implemented for web resources as in your example.)

-- 
Ricardo
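
The content-hash and volatile-flag idea above could behave roughly as in
the following Python sketch.  This is only an illustration of the
decision logic, not GWL code and not how GWL implements its cache: the
names needs_rerun, sha256_of, expected_sha256 and volatile are all
hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_rerun(output: Path,
                expected_sha256: Optional[str] = None,
                volatile: bool = False) -> bool:
    """Decide whether the process producing `output` should run again.

    Hypothetical policy, assumed for illustration only:
    - a volatile output is always recomputed;
    - a missing output is always recomputed;
    - with no declared hash, an existing output is trusted
      (the "outputs are reproducible" assumption);
    - with a declared hash, the output is trusted only if it matches.
    """
    if volatile:
        return True
    if not output.exists():
        return True
    if expected_sha256 is None:
        return False
    return sha256_of(output) != expected_sha256
```

With neither keyword given, the function reproduces the behaviour shown
in the transcript: the second run is skipped merely because the file
exists.  Declaring a hash or marking the file volatile makes the
assumption explicit instead of implicit.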