References: <87a7f5l6e1.fsf@mdc-berlin.de> <8736knieo3.fsf@kyleam.com> <87v9xjja6b.fsf@mdc-berlin.de>
In-Reply-To: <87v9xjja6b.fsf@mdc-berlin.de>
From: zimoun
Date: Thu, 6 Jun 2019 12:55:52 +0200
Subject: Re: Next steps for the GWL
To: Ricardo Wurmus
Cc: Kyle Meyer, gwl-devel@gnu.org

Hi,

On Thu, 6 Jun 2019 at 12:11, Ricardo Wurmus wrote:

> > One of the things I'd love to do
> > with GWL is to make it play well with git-annex, something that would
> > almost certainly be too specific for GWL itself. For example
> >
> > * Make data caching git-annex aware. When deciding to recompute data
> >   files, GWL avoids computing the hash of data files, using scripts as
> >   the cheaper proxy, as you described in 87womnnjg0.fsf@elephly.net.
> >   But if the user is tracking data files with git-annex, getting the
> >   hash of data files becomes less expensive because we can ask
> >   git-annex for the hash it has already computed.
> >
> > * Support getting annex data files on demand (i.e. 'git annex get') if
> >   they are needed as inputs.
>
> I wonder what the protocol should look like. Should a workflow
> explicitly request a “git annex” file or should it be up to the person
> running the workflow, i.e. when “git annex” has been configured to be
> the cache backend it would simply look up the declared input/output
> files there.
>
> I suppose the answers would equally apply to using IPFS as a cache.

I agree that a mechanism such as `git-annex` would be nice. But is it
not a means of implementing the CAS that we previously discussed?

I fully agree with the features and their description. Totally cool!

However, I am a bit reluctant about `git-annex` because it requires a
Haskell compiler and it is far, far from being bootstrappable. I am
aware of Ricardo's attempt, which is AFAIK the only one.
And here [1] is an explanation by a Haskeller.

My opinion: GWL should stay on the path of reproducibility, end-to-end.
So `git-annex` should be a transitional step (while the Haskell
bootstrap is not solved) as a means of implementing the CAS (cache),
and I would find it more elegant to use the "data-oriented IPFS":
IPLD [2].

[1] https://www.joachim-breitner.de/blog/748-Thoughts_on_bootstrapping_GHC
[2] https://ipld.io/

All the best,
simon
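
A minimal sketch of the first quoted point (asking git-annex for the
hash it has already computed instead of hashing the data file again),
in Python only for illustration. The helper names `annex_key` and
`hash_from_annex_key` are hypothetical and not part of GWL. The sketch
assumes the usual git-annex key layout (BACKEND-sSIZE--NAME) and that
only backends whose name starts with a hash family prefix embed a
digest; both are assumptions to check against the git-annex
documentation.

```python
import subprocess
from typing import Optional

# Assumption: backends named after a hash family embed the digest in the key;
# others (for example WORM or URL) do not.
HASH_BACKEND_PREFIXES = ("SHA", "MD5", "BLAKE", "SKEIN")


def annex_key(path: str) -> Optional[str]:
    """Return the git-annex key of an annexed file, or None if it has none.

    Runs `git annex lookupkey`, so git and git-annex must be installed.
    """
    result = subprocess.run(
        ["git", "annex", "lookupkey", path],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip() if result.returncode == 0 else None


def hash_from_annex_key(key: str) -> Optional[str]:
    """Extract the digest from a key, or None if the key carries no digest."""
    # Everything before the first "--" is the backend plus metadata fields.
    fields, sep, name = key.partition("--")
    if not sep or not name:
        return None
    backend = fields.split("-", 1)[0]
    if not backend.startswith(HASH_BACKEND_PREFIXES):
        return None
    # Backends ending in "E" append the file extension to the digest.
    if backend.endswith("E"):
        name = name.split(".", 1)[0]
    return name
```

A caller would fall back to the existing script-based proxy whenever
either helper returns None, so files that are not tracked by git-annex
keep the current behaviour.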