From: Ricardo Wurmus
To: Konrad Hinsen
Cc: gwl-devel@gnu.org
Subject: Re: Managing data files in workflows
Date: Mon, 03 May 2021 11:18:18 +0200
Message-ID: <87a6pckwf9.fsf@elephly.net>

Konrad Hinsen writes:

> Hi Ricardo,
>
>> We can fix the problem with symlinks by restoring the target of the
>> link instead of the link itself, but I feel that we need to take a
>> step back and consider what this cache is really to be used for.
>
> Indeed, and I have to admit that this isn't clear to me yet. What is it
> supposed to protect against? Modification of files by other processes of
> the workflow? Modification of files outside of the workflow? Both?
>
> For the second situation (modification outside of the workflow), I think
> it would be sufficient to store a checksum, and terminate the workflow
> with an error if it detects such tampering.
>
> The first situation is more difficult. There are actually two cases:
> 1. The workflow intentionally updates files as it proceeds.
> 2. The workflow modifies a file by mistake.
>
> Only the workflow author can make the distinction, so this needs some
> specific input syntax. Case 2 could then again be handled by a simple
> checksum test for signalling an error.
>
> This leaves case 1, for which the only good solution is to make a copy
> of the file at the end of each process, and restore it in later runs.

Yes, you are right.  On wip-drmaa I changed the cache to never symlink.
It either hardlinks or copies.  This solves the immediate problem.

Yes, the semantics of hardlink/copy differ, but since our assumption is
that intermediate files are reproducible, we can ignore this at this
point.
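
To illustrate the "hardlink or copy, never symlink" behaviour described
above, here is a minimal sketch.  It is not the code on wip-drmaa (the
GWL is written in Guile Scheme; Python is used here only for
illustration), and the names store_in_cache and demo are hypothetical.
It assumes that a hard link is tried first and that a plain copy is the
fallback when linking fails, for example across file systems.

```python
import os
import shutil
import tempfile


def store_in_cache(source: str, cache_path: str) -> str:
    """Place `source` in the cache without ever creating a symlink.

    Returns "hardlink" or "copy", depending on which method was used.
    (Hypothetical helper; the try-link-then-copy order is an assumption.)
    """
    # Resolve symlinks first so the cache holds the real file, not a link.
    real_source = os.path.realpath(source)
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    # Replace any stale cache entry.
    if os.path.lexists(cache_path):
        os.remove(cache_path)
    try:
        os.link(real_source, cache_path)
        return "hardlink"
    except OSError:
        # Hard links fail across file systems or where they are unsupported.
        shutil.copy2(real_source, cache_path)
        return "copy"


def demo() -> str:
    """Cache a small file in a temporary directory and report the method."""
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "result.txt")
    with open(source, "w") as handle:
        handle.write("intermediate data")
    cached = os.path.join(workdir, "cache", "result.txt")
    return store_in_cache(source, cached)
```

The difference in semantics shows up when a file is later modified in
place: a hard-linked cache entry changes along with it, while a copied
entry does not.  That is harmless only as long as intermediate files
are reproducible.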

I want to make the cache store/restore actions configurable, though, so
that you can implement whatever caching method you want (including
caching by copying to AWS S3).

I’d like to introduce modifiers “immutable” and “mutable”, so that you
can write “immutable file "whatever" you "want"” etc.  “immutable” would
take care of recording hashes and checking previously recorded hashes in
a local state directory.

-- 
Ricardo
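
The hash recording and checking proposed for “immutable” could look
roughly like the sketch below.  It is only an illustration in Python,
not GWL code: the function names, the use of SHA-256, and the layout of
the state directory (one JSON record per file, keyed by a hash of the
absolute path) are all assumptions.

```python
import hashlib
import json
import os


class ImmutableFileChanged(Exception):
    """Raised when a file declared immutable no longer matches its record."""


def file_sha256(path: str) -> str:
    """Return the SHA-256 hex digest of the file contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_immutable(path: str, state_dir: str) -> str:
    """Record the hash on first use; verify it on every later run.

    Returns "recorded" or "verified"; raises ImmutableFileChanged when
    the current hash differs from the recorded one.
    """
    os.makedirs(state_dir, exist_ok=True)
    absolute = os.path.abspath(path)
    # Key the record by a hash of the path to keep the state directory flat.
    key = hashlib.sha256(absolute.encode("utf-8")).hexdigest()
    record_path = os.path.join(state_dir, key + ".json")
    current = file_sha256(absolute)
    if not os.path.exists(record_path):
        with open(record_path, "w") as handle:
            json.dump({"path": absolute, "sha256": current}, handle)
        return "recorded"
    with open(record_path) as handle:
        recorded = json.load(handle)["sha256"]
    if recorded != current:
        raise ImmutableFileChanged(f"{absolute} changed since it was recorded")
    return "verified"
```

Under these assumptions the first run of a workflow would call
check_immutable for each file marked “immutable” and get "recorded";
later runs get "verified" or stop with an error, which matches the
checksum test suggested in the quoted message.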