From: zimoun
Date: Wed, 30 Jan 2019 11:17:02 +0100
Subject: Re: support for containers
To: Ricardo Wurmus
Cc: gwl-devel@gnu.org
In-Reply-To: <87womnnjg0.fsf@elephly.net>

Hi Ricardo,

On Wed, 30 Jan 2019 at 00:16, Ricardo Wurmus wrote:

> Since we don’t hash the data (because it’s expensive) the scripts are
> “proxies” for the data files.  We compute the hashes over the dependent
> scripts and assume that this is enough to decide whether to recompute
> data files or to serve them from the cache/store.

Just to be sure I understand your point correctly, let us pick a simple
example from a genomics pipeline:

    FASTQ -align-> BAM -variant-> VCF

So, do you intend to hash:

 - the data FASTQ
 - the scripts align and variant

or only the scripts containing references to the inputs (here FASTQ),
where each reference is a location fixed by the user?

Well, hashing the scripts and assuming they "mirror" the data files
appears to me to be an efficient design for the CAS.

--
simon
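
P.S. To make the second option concrete, here is a minimal sketch of
how I picture it. It is not how the GWL does it, and it is in Python
rather than Guile only for brevity. The names process_key and
pipeline_keys, the paths and the script strings are all placeholders I
made up. The assumption is that a process key is a hash over the script
text, the input locations (as plain strings, never the file contents)
and the keys of the upstream processes.

```python
import hashlib


def process_key(script: str, input_refs: list[str], upstream_keys: list[str]) -> str:
    """Hash a script, its input locations and its upstream keys.

    The data files themselves are never read: only their locations,
    fixed by the user, enter the hash.
    """
    h = hashlib.sha256()
    h.update(script.encode("utf-8"))
    for ref in sorted(input_refs):
        h.update(b"\0input:" + ref.encode("utf-8"))
    for key in upstream_keys:
        h.update(b"\0upstream:" + key.encode("ascii"))
    return h.hexdigest()


def pipeline_keys(fastq_path: str, align_script: str, variant_script: str) -> dict[str, str]:
    """Keys for the two steps of FASTQ -align-> BAM -variant-> VCF."""
    align = process_key(align_script, [fastq_path], [])
    # variant reads the BAM produced by align, so it depends on align's key
    variant = process_key(variant_script, [], [align])
    return {"align": align, "variant": variant}


# Placeholder scripts and paths, for illustration only.
base = pipeline_keys("/data/sample.fastq", "align v1", "variant v1")
same = pipeline_keys("/data/sample.fastq", "align v1", "variant v1")
edited = pipeline_keys("/data/sample.fastq", "align v2", "variant v1")
moved = pipeline_keys("/data/other.fastq", "align v1", "variant v1")

print(base == same)                          # unchanged scripts: cache hit
print(base["variant"] != edited["variant"])  # editing align invalidates variant
print(base["align"] != moved["align"])       # a new input location invalidates align
```

The trade-off is visible here: if the content of /data/sample.fastq
changes but its location does not, the keys stay the same, so the
cached BAM and VCF would be served again. That is exactly the "assume
this is enough" part of your message.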