From: Pjotr Prins
Date: Thu, 6 Jun 2019 09:06:59 -0500
To: zimoun
Cc: gwl-devel@gnu.org, Ricardo Wurmus
Subject: Re: Next steps for the GWL
Message-ID: <20190606140659.wcwhc3bcfdkaznjw@thebird.nl>
In-Reply-To: <20190606134404.g3synqkzopqab3ue@thebird.nl>

We should also assess this

https://labs.eleks.com/2019/03/ipfs-network-data-replication.html

On Thu, Jun 06, 2019 at 08:44:04AM -0500, Pjotr Prins wrote:
> IPFS is meant for data sharing and reproducibility. It also allows for
> private networks which is rather important.
>
> Scalability of IPFS is a concern, so either we cache using IPFS or we
> have some other caching mechanism.
>
> git-annex is too much of a hack in my book. It also does not scale
> that well.
>
> Pj.
>
> On Thu, Jun 06, 2019 at 12:55:52PM +0200, zimoun wrote:
> > Hi,
> >
> > On Thu, 6 Jun 2019 at 12:11, Ricardo Wurmus wrote:
> >
> > > > One of the things I'd love to do
> > > > with GWL is to make it play well with git-annex, something that would
> > > > almost certainly be too specific for GWL itself.  For example
> > > >
> > > >   * Make data caching git-annex aware.  When deciding to recompute data
> > > >     files, GWL avoids computing the hash of data files, using scripts as
> > > >     the cheaper proxy, as you described in 87womnnjg0.fsf@elephly.net.
> > > >     But if the user is tracking data files with git-annex, getting the
> > > >     hash of data files becomes less expensive because we can ask
> > > >     git-annex for the hash it has already computed.
> > > >
> > > >   * Support getting annex data files on demand (i.e. 'git annex get') if
> > > >     they are needed as inputs.
> > >
> > > I wonder what the protocol should look like.  Should a workflow
> > > explicitly request a “git annex” file or should it be up to the person
> > > running the workflow, i.e. when “git annex” has been configured to be
> > > the cache backend it would simply look up the declared input/output
> > > files there?
> > >
> > > I suppose the answers would equally apply to using IPFS as a cache.
> >
> > I agree that a mechanism such as `git-annex` would be nice.
> > But is it not a means for the CAS that we previously discussed?
> >
> > I fully agree with the features and their description. Totally cool!
> > However, I am a bit reluctant about `git-annex` because it requires a
> > Haskell compiler and it is far from "bootstrappability". I am aware
> > of Ricardo's attempt, AFAIK the only one. See [1] for an
> > explanation by one Haskeller.
> >
> > My opinion: GWL should stay on the path of Reproducibility,
> > end-to-end.
> > So `git-annex` should be a transitional step---while the
> > Haskell bootstrap is not solved---as a means for the CAS (cache) and I
> > would find it more elegant to use the "data-oriented IPFS": IPLD [2].
> >
> >
> > [1] https://www.joachim-breitner.de/blog/748-Thoughts_on_bootstrapping_GHC
> > [2] https://ipld.io/
> >
> >
> > All the best,
> > simon
> >
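
To make the "ask git-annex for the hash it has already computed" idea from the quoted thread concrete, here is a minimal sketch in Python. It is not GWL code and does not describe any existing GWL interface. It relies on the `git annex lookupkey` plumbing command and on the documented key layout `BACKEND[-sSIZE...]--NAME`. The names `AnnexKey`, `parse_annex_key` and `lookup_key` are hypothetical, chosen only for illustration.

```python
import subprocess
from typing import NamedTuple, Optional


class AnnexKey(NamedTuple):
    """Hypothetical container for the parts of a git-annex key."""
    backend: str          # e.g. "SHA256E"
    size: Optional[int]   # size in bytes, taken from the "-s" field if present
    name: str             # digest, plus a file extension for "E" backends


def parse_annex_key(key: str) -> AnnexKey:
    """Split a key of the form BACKEND[-sSIZE...]--NAME into its parts."""
    fields, sep, name = key.partition("--")
    if not sep or not name:
        raise ValueError(f"not a git-annex key: {key!r}")
    backend, *rest = fields.split("-")
    size = None
    for field in rest:
        # Only the size field is interpreted; other optional fields are skipped.
        if field.startswith("s") and field[1:].isdigit():
            size = int(field[1:])
    return AnnexKey(backend, size, name)


def lookup_key(path: str) -> Optional[AnnexKey]:
    """Ask git-annex for the key recorded for `path`.

    Returns None when the command exits non-zero (for instance when the
    file is not annexed), so a caller can fall back to hashing the file.
    """
    result = subprocess.run(
        ["git", "annex", "lookupkey", path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return parse_annex_key(result.stdout.strip())
```

A cache layer could call `lookup_key("data/reads.fastq")` (an invented path) and hash the file itself only when the result is `None`. Whether the workflow or the person running it selects such a backend is exactly the protocol question raised above, and this sketch does not settle it.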