Subject: [bug#33899] [PATCH 0/5] Distributing substitutes over IPFS
From: Ludovic Courtès
To: Hector Sanjuan
Cc: "go-ipfs-wg@ipfs.io", Pierre Neidhardt, "33899@debbugs.gnu.org" <33899@debbugs.gnu.org>
Date: Fri, 18 Jan 2019 10:52:49 +0100
Message-ID: <8736pqthqm.fsf@gnu.org>
References: <20181228231205.8068-1-ludo@gnu.org> <87r2dfv0nj.fsf@gnu.org>
In-Reply-To: (Hector Sanjuan's message of "Fri, 18 Jan 2019 09:08:02 +0000")

Hello,

Hector Sanjuan wrote:

> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Monday, January 14, 2019 2:17 PM, Ludovic Courtès wrote:

[...]

>> Yes, I’m well aware of “unixfs”.
>> The problem, as I see it, is that it
>> stores “too much” in a way (we don’t need to store the mtimes or
>> permissions; we could ignore them upon reconstruction though), and “not
>> enough” in another way (the executable bit is lost, IIUC.)
>
> Actually the only metadata that Unixfs stores is size:
> https://github.com/ipfs/go-unixfs/blob/master/pb/unixfs.proto and by all
> means the amount of metadata is negligible for the actual data stored
> and serves to give you a progress bar when you are downloading.

Yes, the format I came up with also stores the size so we can eventually
display a progress bar.

> Having IPFS understand what files are part of a single item is important
> because you can pin/unpin, diff, patch all of them as a whole. Unixfs
> also takes care of handling the case where the directories need to
> be sharded because there are too many entries.

Isn’t there a way, then, to achieve the same behavior with the custom
format?  The /api/v0/add entry point has a ‘pin’ argument; I suppose we
could leave it set to false except when we add the top-level “directory”
node?  Wouldn’t that give us behavior similar to that of Unixfs?

> When the user puts the single root hash in ipfs.io/ipfs/, it
> will display correctly the underlying files and the people will be
> able to navigate the actual tree with both web and cli.

Right, though that’s less important in my view.

> Note that every file added to IPFS is getting wrapped as a Unixfs
> block anyways. You are just saving some "directory" nodes by adding
> them separately.

Hmm, weird.  When I do /api/v0/add, I’m really just passing a byte
vector; there’s no notion of a “file” here, AFAICS.  Or am I missing
something?

>> > It will probably need some trial and error to get the multi-part right
>> > to upload all in a single request.
>> > The Go code for HTTP clients doing
>> > this can be found at:
>> > https://github.com/ipfs/go-ipfs-files/blob/master/multifilereader.go#L96
>> > As you see, a directory part in the multipart will have the Content-Type header
>> > set to "application/x-directory". The best way to see how "abspath" etc. is set
>> > is probably to sniff an `ipfs add -r ` operation (localhost:5001).
>> > Once UnixFSv2 lands, you will be in a position to just drop the sexp file
>> > altogether.
>>
>> Yes, that makes sense. In the meantime, I guess we have to keep using
>> our own format.
>>
>> What are the performance implications of adding and retrieving files one
>> by one like I did? I understand we’re doing N HTTP requests to the
>> local IPFS daemon where “ipfs add -r” makes a single request, but this
>> alone can’t be much of a problem since communication is happening
>> locally. Does pinning each file separately somehow incur additional
>> overhead?
>>
>
> Yes, pinning separately is slow and incurs overhead. Pins are stored
> in a merkle tree themselves so it involves reading, patching and saving. This
> gets quite slow when you have very large pinsets because your pins block size
> grows. Your pinset will grow very large if you do this. Additionally the
> pinning operation itself requires a global lock, making it slower.

OK, I see.

> But, even if it was fast, you will not have a way to easily unpin
> anything that becomes obsolete or have an overview of where things
> belong. It is also unlikely that a single IPFS daemon will be able to
> store everything you build, so you might find yourself using IPFS Cluster
> soon to distribute the storage across multiple nodes and then you will
> be effectively adding remotely.

Currently, ‘guix publish’ stores things as long as they are requested,
and then for the duration specified with ‘--ttl’.
I suppose we
could have similar behavior with IPFS: if an item hasn’t been requested for
the specified duration, then we unpin it.

Does that make sense?

Thanks for your help!

Ludo’.
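
To make the ‘--ttl’ idea above a bit more concrete, here is a minimal
sketch in Python of the bookkeeping it would seem to need: remember when
each item was last requested, and report the items whose last request is
older than the TTL.  All the names (PinTracker, touch, expired, forget)
are hypothetical, and the actual unpin request to the IPFS daemon is
deliberately left out; this is only an illustration, not how
‘guix publish’ works today.

```python
import time


class PinTracker:
    """Track last-request times for pinned items (hypothetical helper)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so the logic can be tested
        self.last_request = {}    # item hash -> time of last request

    def touch(self, item):
        """Record that ITEM was just requested, so it should stay pinned."""
        self.last_request[item] = self.clock()

    def expired(self):
        """Return the items not requested within the TTL, sorted by name."""
        now = self.clock()
        return sorted(item for item, seen in self.last_request.items()
                      if now - seen > self.ttl)

    def forget(self, item):
        """Drop ITEM from the table once the caller has unpinned it."""
        self.last_request.pop(item, None)
```

A periodic job could then call expired(), unpin each returned item
through the daemon, and call forget() on it afterwards.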