unofficial mirror of guix-patches@gnu.org 
From: Hector Sanjuan <code@hector.link>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: "go-ipfs-wg@ipfs.io" <go-ipfs-wg@ipfs.io>,
	Pierre Neidhardt <mail@ambrevar.xyz>,
	"33899@debbugs.gnu.org" <33899@debbugs.gnu.org>
Subject: [bug#33899] [PATCH 0/5] Distributing substitutes over IPFS
Date: Fri, 18 Jan 2019 09:08:02 +0000
Message-ID: <neM1uqJ3yxqbJiTzV6-q6R-8GNGjv7l_7TJhhQIGXpDQbLoS8yIYrJ4KxKYmFwpi1O9YePH3d5i3fknYgv7nfuMrXFgYoxsk_Xxgs9_Sd2U=@hector.link>
In-Reply-To: <87r2dfv0nj.fsf@gnu.org>

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, January 14, 2019 2:17 PM, Ludovic Courtès <ludo@gnu.org> wrote:

> Hi Hector,
>
> Happy new year to you too! :-)
>
> Hector Sanjuan <code@hector.link> wrote:
>
> > 1.  The doc strings usually refer to the IPFS HTTP API as GATEWAY. go-ipfs
> >     has a read/write API (on :5001) and a read-only API that we call "gateway"
> >     and which runs on :8080. The gateway, apart from handling most of the
> >     read-only methods from the HTTP API, also handles paths like "/ipfs/<cid>"
> >     or "/ipns/<name>" gracefully, and returns an autogenerated webpage for
> >     directory-type CIDs. The gateway does not allow publishing. Therefore I think
> >     the doc strings should say "IPFS daemon API" rather than "GATEWAY".
> >
>
> Indeed, I’ll change that.
>
> > 2.  I'm not proficient enough in Scheme to grasp the details of the
> >     "directory" format. If I understand it right, you keep a separate manifest
> >     object listing the directory structure, the contents and the executable bit
> >     for each. Thus, when adding a store item you add all the files separately and
> >     this manifest. And when retrieving a store item you fetch the manifest and
> >     reconstruct the tree by fetching the contents in it (and applying the
> >     executable flag). Is this correct? This works, but it can be improved:
> >
>
> That’s correct.
>
> > You can add all the files/folders in a single request. If I'm
> > reading it right, each file is currently added separately (and gets pinned
> > separately). It would probably make sense to add it all in a single request,
> > letting IPFS store the directory structure as "unixfs". You can
> > additionally add the sexp file with the dir-structure and executable flags
> > as an extra file to the root folder. This would also allow fetching the whole
> > thing with a single request, /api/v0/get?arg=<hash>, and pinning a single hash
> > recursively (rather than each one separately). After getting the whole thing,
> > you will need to chmod +x things accordingly.
>
> Yes, I’m well aware of “unixfs”. The problem, as I see it, is that it
> stores “too much” in a way (we don’t need to store the mtimes or
> permissions; we could ignore them upon reconstruction though), and “not
> enough” in another way (the executable bit is lost, IIUC.)

Actually, the only metadata that Unixfs stores is the size:
https://github.com/ipfs/go-unixfs/blob/master/pb/unixfs.proto
In any case, the amount of metadata is negligible compared to the actual
data stored, and it serves to give you a progress bar when you are
downloading.

Having IPFS understand which files are part of a single item is important
because you can then pin/unpin, diff, and patch all of them as a whole.
Unixfs also takes care of the case where directories need to be sharded
because they have too many entries. When a user puts the single root hash
in ipfs.io/ipfs/<hash>, the gateway will correctly display the underlying
files, and people will be able to navigate the actual tree with both web
and CLI. Note that every file added to IPFS gets wrapped as a Unixfs block
anyway, so by adding the files separately you are only saving a few
"directory" nodes.
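
As a rough illustration of the single-request upload, here is a minimal
Python sketch that only builds the multipart body for a directory tree (it
does not talk to a daemon). The function name build_multipart is made up
for this example, and the exact headers go-ipfs expects (the URL-escaped
"filename", whether "abspath" is needed) are assumptions that should be
checked by sniffing a real `ipfs add -r` against localhost:5001.

```python
import os
import uuid
from urllib.parse import quote


def build_multipart(root):
    """Return (content_type, body) describing ROOT and everything below it.

    Hypothetical helper: one part per directory and one part per file,
    so the whole tree could be sent in a single POST to /api/v0/add.
    """
    boundary = uuid.uuid4().hex
    base = os.path.dirname(os.path.abspath(root))
    parts = []

    def add_part(path, content_type, payload):
        # Paths are sent relative to the parent of ROOT, using "/" separators.
        rel = os.path.relpath(os.path.abspath(path), base).replace(os.sep, "/")
        headers = (
            f"--{boundary}\r\n"
            # Assumption: the file name is URL-escaped, as the Go client does.
            f'Content-Disposition: form-data; name="file"; '
            f'filename="{quote(rel, safe="")}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        )
        parts.append(headers.encode() + payload + b"\r\n")

    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        # Directories are announced with an empty "application/x-directory" part.
        add_part(dirpath, "application/x-directory", b"")
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                add_part(full, "application/octet-stream", f.read())

    body = b"".join(parts) + f"--{boundary}--\r\n".encode()
    return f"multipart/form-data; boundary={boundary}", body
```

The returned content type and body would then go into one POST request to
the daemon API, which yields a single root hash for the whole store item.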

There is an alternative way which is using IPLD to implement a custom
block format that carries the executable bit information and nothing
else. But I don't see significant advantages at this point for the extra
work it requires.
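
For the Unixfs-plus-manifest route (as opposed to a custom IPLD format),
recording and restoring the executable bit could look roughly like the
Python sketch below. It is only an illustration: the manifest file name and
the JSON encoding are stand-ins for the sexp file discussed in this thread,
both function names are hypothetical, and symlinks are not handled.

```python
import json
import os
import stat

# Hypothetical manifest name; the thread talks about a sexp file instead.
MANIFEST = ".exec-manifest.json"


def write_manifest(root):
    """Record which files under ROOT are executable, before adding ROOT to IPFS."""
    executables = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.stat(full).st_mode & stat.S_IXUSR:
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                executables.append(rel)
    executables.sort()
    with open(os.path.join(root, MANIFEST), "w") as f:
        json.dump(executables, f)
    return executables


def restore_exec_bits(root):
    """After fetching ROOT back, re-apply chmod +x to the files in the manifest."""
    with open(os.path.join(root, MANIFEST)) as f:
        executables = json.load(f)
    for rel in executables:
        full = os.path.join(root, *rel.split("/"))
        mode = os.stat(full).st_mode
        os.chmod(full, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

write_manifest would run before the tree is added, so the manifest travels
as an extra file in the root folder, and restore_exec_bits would run once
the tree has been fetched with /api/v0/get?arg=<hash>.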

>
> > It will probably need some trial and error to get the multi-part right
> > to upload all in a single request. The Go HTTP client code doing
> > this can be found at:
> > https://github.com/ipfs/go-ipfs-files/blob/master/multifilereader.go#L96
> > As you see, a directory part in the multipart will have the content-type Header
> > set to "application/x-directory". The best way to see how "abspath" etc is set
> > is probably to sniff an `ipfs add -r <testfolder>` operation (localhost:5001).
> > Once UnixFSv2 lands, you will be in a position to just drop the sexp file
> > altogether.
>
> Yes, that makes sense. In the meantime, I guess we have to keep using
> our own format.
>
> What are the performance implications of adding and retrieving files one
> by one like I did? I understand we’re doing N HTTP requests to the
> local IPFS daemon where “ipfs add -r” makes a single request, but this
> alone can’t be much of a problem since communication is happening
> locally. Does pinning each file separately somehow incur additional
> overhead?
>

Yes, pinning separately is slow and incurs overhead. Pins are themselves
stored in a merkle tree, so each pin involves reading, patching and saving
it. This gets quite slow with very large pinsets because the pin blocks
grow in size, and your pinset will grow very large if you pin every file
separately. Additionally, the pinning operation itself requires a global
lock, which makes it even slower.

But even if it were fast, you would have no easy way to unpin anything
that becomes obsolete, or to get an overview of where things belong. It is
also unlikely that a single IPFS daemon will be able to store everything
you build, so you might soon find yourself using IPFS Cluster to
distribute the storage across multiple nodes, and then you will
effectively be adding remotely.
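
To make the pin bookkeeping difference concrete, the small Python sketch
below only builds the daemon API URLs that each approach would call; it
sends nothing. The helper names and the CIDs used with them are made up
for illustration, and localhost:5001 is assumed to be the daemon API
address as mentioned earlier in the thread.

```python
from urllib.parse import urlencode

# Assumed address of the local go-ipfs daemon API.
API = "http://localhost:5001/api/v0"


def pin_url(cid):
    """URL of a recursive pin request for CID."""
    return f"{API}/pin/add?{urlencode({'arg': cid, 'recursive': 'true'})}"


def unpin_url(cid):
    """URL of the request that removes the pin on CID."""
    return f"{API}/pin/rm?{urlencode({'arg': cid})}"


def per_file_pins(file_cids):
    """One pin request, and one pinset entry, per file of the store item."""
    return [pin_url(cid) for cid in file_cids]


def single_root_pin(root_cid):
    """One pin request, and one pinset entry, for the whole store item."""
    return [pin_url(root_cid)]
```

With per-file pins, dropping an obsolete store item means remembering and
unpinning every file hash; with a single root hash, one unpin_url(root)
call covers the whole item.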


> Thanks for your feedback!
>
> Ludo’.

Thanks for working on this!

Hector
