Subject: [bug#33899] [PATCH 0/5] Distributing substitutes over IPFS
From: Alex Potsides
To: Molly Mackinlay
Cc: Hector Sanjuan, Antoine Eiche, Andrew Nesbitt, 33899@debbugs.gnu.org,
 Eric Myhre, Pierre Neidhardt, Jessica Schilling, go-ipfs-wg@ipfs.io
Date: Mon, 15 Jul 2019 10:20:26 +0100
The reason not to use the UnixFSv1 metadata field was that it's in the
spec, but it's never really been implemented.  As it stands in v1,
you'd have to add explicit metadata types to the spec (executable,
owner?, group?, etc.) because protobufs need to know about everything
ahead of time, and each implementation would then have to update to
implement those.  This is all possible & not a technical blocker, but
since most effort is centred around UnixFSv2, the timescales might not
fit with people's requirements.

The more pragmatic approach Hector suggested was to wrap a CID that
resolves to the UnixFSv1 file in a JSON object that you could use to
store application-specific metadata - something similar to the
UnixFSv1.5 section in our notes from the Package Managers deep dive we
did at camp.
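
As a rough sketch of that wrapping, assuming a local go-ipfs daemon on
the default API port and Python's requests library - the "target" and
"executable" field names are invented for the example, they're not part
of any spec:

    import json
    import requests

    API = "http://127.0.0.1:5001/api/v0"

    # Store a file as a plain UnixFSv1 object and keep its CID.
    reply = requests.post(API + "/add",
                          files={"file": ("hello", b"#!/bin/sh\necho hi\n")})
    cid = reply.json()["Hash"]

    # Wrap that CID in a dag-cbor object carrying our own metadata;
    # {"/": ...} is the JSON notation `dag put` turns into an IPLD link.
    manifest = {"target": {"/": cid}, "executable": True}
    reply = requests.post(API + "/dag/put",
                          files={"file": json.dumps(manifest)})
    print(reply.json()["Cid"]["/"])  # CID of the wrapping object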

a.

On Fri, Jul 12, 2019 at 9:03 PM Molly Mackinlay <molly@protocol.ai> wrote:
> Thanks for the update Pierre!  Also adding Alex, Jessica, Eric and
> Andrew from the package managers discussions at IPFS Camp as FYI.
>
> Generating the IPLD manifest with the metadata and the tree of files
> should also be fine AFAIK - I'm sure Hector and Eric can expand more
> on how to compose them, but the data storage format shouldn't make a
> big difference for the IPLD manifest.
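
One way the composition could look, as a sketch under the same
assumptions (local daemon, requests; the file contents and the
"tree"/"entries" layout are stand-ins):

    import json
    import requests

    API = "http://127.0.0.1:5001/api/v0"

    # Add two stand-in files under one wrapping directory, in one call.
    files = [("file", ("hello", b"#!/bin/sh\necho hi\n")),
             ("file", ("hello.1", b"man page\n"))]
    reply = requests.post(API + "/add",
                          params={"wrap-with-directory": "true"},
                          files=files)
    added = [json.loads(line) for line in reply.text.splitlines()]
    root = added[-1]["Hash"]  # the wrapping directory is listed last

    # Reference the tree once; keep per-path metadata alongside it.
    manifest = {"tree": {"/": root},
                "entries": {"hello": {"executable": True}}}
    reply = requests.post(API + "/dag/put",
                          files={"file": json.dumps(manifest)})
    print(reply.json()["Cid"]["/"])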

> On Mon, Jul 1, 2019 at 2:36 PM Pierre Neidhardt <mail@ambrevar.xyz> wrote:
>> Hi!
>>
>> (Re-sending to debbugs, sorry for the double email :p)
>>
>> A little update/recap after many months! :)

>> I talked with Héctor and some other people from IPFS, and I reviewed
>> Ludo's patch, so now I have a little better understanding of the
>> current state of affairs.

>> - We could store the substitutes as tarballs on IPFS, but this has
>>   some possible downsides:

>>   - We would need to use IPFS' tar chunker to deduplicate the
>>     content of the tarball.  But the tar chunker is not well
>>     maintained currently, and it's not clear whether it's
>>     reproducible at the moment, so it would need some more work.

>>   - Tarballs might induce some performance cost.  Nix had attempted
>>     something similar in the past and this may have incurred a
>>     significant performance penalty, although this remains to be
>>     confirmed.  Lewo?

>> - Ludo's patch stores all files on IPFS individually.  This way we
>>   don't need to touch the tar chunker, so it's less work :)
>>   This raises some other issues, however:

>>   - Extra metadata: IPFS stores files using UnixFSv1, which does not
>>     include the executable bit.

>>     - Right now we store an s-exp manifest with a list of files and
>>       a list of executable bits.  But maybe we don't have to roll
>>       our own.

>>     - UnixFSv1 has a metadata field, but Héctor and Alex did not
>>       recommend using it (not sure why, though).

>>     - We could use UnixFSv2, but it's not released yet, and it's
>>       unclear when it will be.  So we can't really count on it
>>       right now.

>>     - IPLD: As Héctor suggested in the previous email, we could
>>       leverage IPLD and generate a JSON object that references the
>>       files by their paths, together with an "executable?" property.
>>       A problem would arise if this IPLD object grew over the 2M
>>       block-size limit, because then we would have to shard it
>>       (something that UnixFS would do automatically for us).

>>   - Flat storage vs. tree storage: Right now we are storing the
>>     files separately, but this has some shortcomings: we need
>>     multiple "get" requests instead of just one, and IPFS does not
>>     "know" that those files are related (we lose the web view of the
>>     tree, etc.).  Storing them as a tree could be better.
>>     I don't understand whether that would work with the "IPLD
>>     manifest" suggested above.  Héctor?

>>   - Pinning: Pinning all files separately incurs an overhead.  It's
>>     enough to pin just the IPLD object, since pinning propagates
>>     recursively.  When adding a tree, it's no problem, since pinning
>>     is only done once.

>>   - IPFS endpoint calls: instead of adding each file individually,
>>     it's possible to add them all in one go.  Can we add all files
>>     at once while using flat storage, i.e. without adding them all
>>     under a common root folder?  (See the sketch just below.)
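
Something like this could be that single call - a sketch, again
assuming a local daemon and requests, with stand-in file contents:
pin=false leaves the blocks unpinned, raw-leaves=true avoids wrapping
small files in UnixFS blocks, and omitting wrap-with-directory keeps
the storage flat:

    import json
    import requests

    API = "http://127.0.0.1:5001/api/v0"

    # Stand-ins for real substitute files; any bytes will do here.
    blobs = {"libfoo.so": b"\x7fELF stub",
             "foo.conf": b"key = value\n",
             "README": b"docs\n"}
    files = [("file", (name, data)) for name, data in blobs.items()]
    reply = requests.post(API + "/add",
                          params={"pin": "false", "raw-leaves": "true"},
                          files=files)
    # /add streams one JSON line per file; map names back to CIDs.
    cids = {entry["Name"]: entry["Hash"]
            for entry in map(json.loads, reply.text.splitlines())}
    print(cids)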

>> To sum up, here is what remains to be done on the current patch:
>>
>> - Add all files in one go, without pinning them.
>> - Store as a file tree?  Can we still use the IPLD object to
>>   reference the files in the tree?  Else use the "raw-leaves" option
>>   to avoid wrapping small files in UnixFS blocks.
>> - Remove the Scheme manifest if IPLD can do the job.
>> - Generate the IPLD object and pin it (see the sketch just below).
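
For that last item, a sketch under the same assumptions - the manifest
layout with the "executable?" property is only an illustration of what
might replace the s-exp manifest, not a settled format:

    import json
    import requests

    API = "http://127.0.0.1:5001/api/v0"

    # Add one unpinned stand-in file...
    reply = requests.post(API + "/add", params={"pin": "false"},
                          files={"file": ("hello", b"#!/bin/sh\necho hi\n")})
    cid = reply.json()["Hash"]

    # ...then create the IPLD manifest and pin it in the same call.
    # Pinning is recursive, so the linked file gets pinned with it.
    manifest = {"files": [{"path": "hello",
                           "node": {"/": cid},
                           "executable?": True}]}
    reply = requests.post(API + "/dag/put", params={"pin": "true"},
                          files={"file": json.dumps(manifest)})
    print("pinned manifest:", reply.json()["Cid"]["/"])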

>> Any corrections?
>> Thoughts?
>>
>> Cheers!
>>
>> --
>> Pierre Neidhardt
>> https://ambrevar.xyz/
