Subject: [bug#33899] [PATCH 0/5] Distributing substitutes over IPFS
Date: Fri, 18 Jan 2019 11:26:18 +0000
From: Hector Sanjuan
To: Ludovic Courtès
Cc: go-ipfs-wg@ipfs.io, Pierre Neidhardt, 33899@debbugs.gnu.org
In-Reply-To: <8736pqthqm.fsf@gnu.org>
References: <20181228231205.8068-1-ludo@gnu.org> <87r2dfv0nj.fsf@gnu.org> <8736pqthqm.fsf@gnu.org>

------- Original Message -------
On Friday, January 18, 2019 10:52 AM, Ludovic Courtès wrote:

> Hello,
>
> Hector Sanjuan code@hector.link skribis:
>
> > ------- Original Message -------
> > On Monday, January 14, 2019 2:17 PM, Ludovic Courtès ludo@gnu.org wrote:
>
> [...]
>
> Isn't there a way, then, to achieve the same behavior with the custom
> format? The /api/v0/add entry point has a 'pin' argument; I suppose we
> could leave it to false except when we add the top-level "directory"
> node? Wouldn't that give us behavior similar to that of Unixfs?

Yes. What you could do is to add every file flatly/separately (with
pin=false) and at the end add an IPLD object with references to all the
files that you added, including the exec bit information (and size?).
This is just a JSON file:

{
  "name": "package name",
  "contents": [
    {
      "path": "/file/path",   # so you know where to extract it later
      "exec": true,
      "ipfs": { "/": "Qmhash..." }
    },
    ...
  ]
}

This needs to be added to IPFS with the /api/v0/dag/put endpoint (this
converts it to CBOR; IPLD-CBOR is the actual block format used here).
When this is pinned (?pin=true), it will recursively pin everything it
references, which is what we want. So this will be quite similar to
unixfs. But note that if this blob ever grows over the 2M block-size
limit because a package has many files, you will need to start solving
problems that unixfs solves automatically today (directory sharding).

Because IPLD-CBOR is supported, ipfs, the gateway etc. will know how to
display these manifests, the information in them and their links.
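To make that concrete, something along these lines (untested) could be
done from the Guix side. This is only an illustration in Python with the
`requests` library, assuming a local daemon API at localhost:5001; the
exact dag/put parameter names (format, input-enc) and the response field
names may differ between go-ipfs versions:

import json
import requests

API = "http://localhost:5001/api/v0"   # assumed local daemon API


def add_file(path):
    """Add a single file with pin=false and return its hash."""
    with open(path, "rb") as f:
        r = requests.post(API + "/add", params={"pin": "false"},
                          files={"file": f})
    r.raise_for_status()
    return r.json()["Hash"]


def put_manifest(name, entries):
    """dag/put the JSON manifest with pin=true so that everything it
    references gets pinned recursively.  ENTRIES is a list of
    (path, exec?, hash) tuples."""
    manifest = {
        "name": name,
        "contents": [{"path": p, "exec": x, "ipfs": {"/": h}}
                     for (p, x, h) in entries],
    }
    r = requests.post(API + "/dag/put",
                      params={"format": "cbor",      # stored as IPLD-CBOR
                              "input-enc": "json",
                              "pin": "true"},
                      files={"file": json.dumps(manifest)})
    r.raise_for_status()
    return r.json()["Cid"]["/"]

The point is that only the manifest carries a pin, so the pinset stays at
one entry per store item instead of one entry per file.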
> > When the user puts the single root hash in ipfs.io/ipfs/, it
> > will display correctly the underlying files and the people will be
> > able to navigate the actual tree with both web and cli.
>
> Right, though that's less important in my view.
>
> > Note that every file added to IPFS is getting wrapped as a Unixfs
> > block anyways. You are just saving some "directory" nodes by adding
> > them separately.
>
> Hmm weird. When I do /api/v0/add, I'm really just passing a byte
> vector; there's no notion of a "file" here, AFAICS. Or am I missing
> something?

They are wrapped in Unixfs blocks anyway by default. From the moment a
file is >256K it gets chunked into several pieces, and a Unixfs block
(or several, for a really big file) is needed to reference them. In that
case the root hash will be a Unixfs node with links to the parts. There
is a "raw-leaves" option which does not wrap the individual blocks with
unixfs, so if a file is small enough not to be chunked, you can avoid
the default unixfs wrapping this way.

> > > > It will probably need some trial and error to get the multipart right
> > > > to upload all in a single request. The Go code HTTP clients doing
> > > > this can be found at:
> > > > https://github.com/ipfs/go-ipfs-files/blob/master/multifilereader.go#L96
> > > > As you see, a directory part in the multipart will have the Content-Type
> > > > header set to "application/x-directory". The best way to see how
> > > > "abspath" etc. is set is probably to sniff an `ipfs add -r ` operation
> > > > (localhost:5001). Once UnixFSv2 lands, you will be in a position to
> > > > just drop the sexp file altogether.
> > >
> > > Yes, that makes sense. In the meantime, I guess we have to keep using
> > > our own format.
> > > What are the performance implications of adding and retrieving files one
> > > by one like I did? I understand we're doing N HTTP requests to the
> > > local IPFS daemon where "ipfs add -r" makes a single request, but this
> > > alone can't be much of a problem since communication is happening
> > > locally. Does pinning each file separately somehow incur additional
> > > overhead?
> >
> > Yes, pinning separately is slow and incurs overhead. Pins are stored
> > in a merkle tree themselves, so pinning involves reading, patching and
> > saving that tree. This gets quite slow when you have very large pinsets
> > because your pin block size grows. Your pinset will grow very large if
> > you do this. Additionally, the pinning operation itself requires a
> > global lock, making it even slower.
>
> OK, I see.

I should add that even if you want to /add all files separately (and then
put the IPLD manifest I described above), you can still add them all in
the same request (it becomes easier, as you just need to put more parts
in the multipart and don't have to worry about names/folders/paths). The
/add endpoint will forcefully close the HTTP connection for every /add
(long story), and small delays might add up to a big one. This is
especially relevant when using IPFS Cluster, where /add might send the
blocks somewhere else and needs to do some other things.
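A rough, untested sketch of that single-request variant, with the same
assumptions as before (local daemon on localhost:5001, Python `requests`
purely for illustration); /add streams back one JSON object per added
file:

import json
import requests

API = "http://localhost:5001/api/v0"   # assumed local daemon API


def add_files(paths):
    """Add all PATHS unpinned in one multipart /add request and
    return a dict mapping each name to its hash."""
    parts = [("file", (p, open(p, "rb"))) for p in paths]
    try:
        r = requests.post(API + "/add",
                          params={"pin": "false",
                                  # raw-leaves avoids the unixfs wrapping
                                  # for files small enough not to be chunked
                                  "raw-leaves": "true"},
                          files=parts)
    finally:
        for _, (_, f) in parts:
            f.close()
    r.raise_for_status()
    # /add returns one JSON object per line, one per added file.
    return {e["Name"]: e["Hash"]
            for e in map(json.loads, r.text.splitlines())}

This way you only pay for one HTTP connection per batch, which also
matters with IPFS Cluster as noted above.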
> > But, even if it was fast, you will not have a way to easily unpin
> > anything that becomes obsolete, or have an overview of where things
> > belong. It is also unlikely that a single IPFS daemon will be able to
> > store everything you build, so you might find yourself using IPFS
> > Cluster soon to distribute the storage across multiple nodes, and then
> > you will effectively be adding remotely.
>
> Currently, 'guix publish' stores things as long as they are requested,
> and then for the duration specified with '--ttl'. I suppose we could
> have similar behavior with IPFS: if an item hasn't been requested for
> the specified duration, then we unpin it.
>
> Does that make sense?

Yes. In fact I wanted IPFS Cluster to support a TTL so that things are
automatically unpinned when it expires, too.

> Thanks for your help!
>
> Ludo'.

Thanks!

Hector