all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Hector Sanjuan <code@hector.link>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: "go-ipfs-wg\\@ipfs.io" <go-ipfs-wg@ipfs.io>,
	Pierre Neidhardt <mail@ambrevar.xyz>,
	"33899\\@debbugs.gnu.org" <33899@debbugs.gnu.org>
Subject: [bug#33899] [PATCH 0/5] Distributing substitutes over IPFS
Date: Fri, 18 Jan 2019 11:26:18 +0000	[thread overview]
Message-ID: <pa258vxfuaaiuhOEdEpZPTqfDHe4hoICI7EPyJ284kk6qVWc_VakU3cwxbx46BVXobg0LMWyrjE_uX6nJ6XoJj78-mfjtsejqHlx3bmkNFw=@hector.link> (raw)
In-Reply-To: <8736pqthqm.fsf@gnu.org>

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, January 18, 2019 10:52 AM, Ludovic Courtès <ludo@gnu.org> wrote:

> Hello,
>
> Hector Sanjuan code@hector.link skribis:
>
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Monday, January 14, 2019 2:17 PM, Ludovic Courtès ludo@gnu.org wrote:
>
> [...]
>

>
> Isn’t there a way, then, to achieve the same behavior with the custom
> format? The /api/v0/add entry point has a ‘pin’ argument; I suppose we
> could leave it to false except when we add the top-level “directory”
> node? Wouldn’t that give us behavior similar to that of Unixfs?
>

Yes. What you could do is to add every file flatly/separately (with pin=false)
and at the end add an IPLD object with references to all the
files that you added and including the exec bit information (and size?).
This is just a JSON file:

{
   "name": "package name",
   "contents": [
       {
           "path": "/file/path", # so you know where to extract it later
           "exec": true,
           "ipfs": { "/": "Qmhash..." }
       },
       ...
}

This needs to be added to IPFS with the /api/v0/dag/put endpoint (this
converts it to CBOR - IPLD-Cbor is the actual block format used here).
When this is pinned (?pin=true), this will pin all the things referenced
from it recursively in the way we want.

So this will be quite similar to unixfs. But note that if this blob
ever grows over the 2M block-size limit because you have a package with
many files, you will need to start solving problems that unixfs solves
automatically now (directory sharding).

Because IPLD-cbor is supported, ipfs, the gateway etc will know how to
display these manifests, the info in it and their links.


> > When the user puts the single root hash in ipfs.io/ipfs/<hash>, it
> > will display correctly the underlying files and the people will be
> > able to navigate the actual tree with both web and cli.
>
> Right, though that’s less important in my view.
>
> > Note that every file added to IPFS is getting wrapped as a Unixfs
> > block anyways. You are just saving some "directory" nodes by adding
> > them separately.
>
> Hmm weird. When I do /api/v0/add, I’m really just passing a byte
> vector; there’s no notion of a “file” here, AFAICS. Or am I missing
> something?

They are wrapped in Unixfs blocks anyway by default. From the moment
the file is >256K it will get chunked into several  pieces and
a Unixfs block (or multiple, if a really big file) is necessary to
reference them. In this case the root hash will be a Unixfs node
with links to the parts.

There is a "raw-leaves" option which does not wrap the individual
blocks with unixfs, so if the file is small to not be chunked,
you can avoid the default unixfs-wrapping this way.


>
> > > > It will probably need some trial an error to get the multi-part right
> > > > to upload all in a single request. The Go code HTTP Clients doing
> > > > this can be found at:
> > > > https://github.com/ipfs/go-ipfs-files/blob/master/multifilereader.go#L96
> > > > As you see, a directory part in the multipart will have the content-type Header
> > > > set to "application/x-directory". The best way to see how "abspath" etc is set
> > > > is probably to sniff an `ipfs add -r <testfolder>` operation (localhost:5001).
> > > > Once UnixFSv2 lands, you will be in a position to just drop the sexp file
> > > > altogether.
> > >
> > > Yes, that makes sense. In the meantime, I guess we have to keep using
> > > our own format.
> > > What are the performance implications of adding and retrieving files one
> > > by one like I did? I understand we’re doing N HTTP requests to the
> > > local IPFS daemon where “ipfs add -r” makes a single request, but this
> > > alone can’t be much of a problem since communication is happening
> > > locally. Does pinning each file separately somehow incur additional
> > > overhead?
> >
> > Yes, pinning separately is slow and incurs in overhead. Pins are stored
> > in a merkle tree themselves so it involves reading, patching and saving. This
> > gets quite slow when you have very large pinsets because your pins block size
> > grow. Your pinset will grow very large if you do this. Additionally the
> > pinning operation itself requires global lock making it more slow.
>
> OK, I see.

I should add that even if you want to /add all files separately (and then
put the IPLD manifest I described above), you can still add them all in the same
request (it becomes easier as you just need to put more parts in the multipart
and don't have to worry about names/folders/paths).

The /add endpoint will forcefully close the HTTP connection for every
/add (long story) and small delays might add up to a big one. Specially relevant
if using IPFS Cluster, where /add might send the blocks somewhere else and does
needs to do some other things.


>
> > But, even if it was fast, you will not have a way to easily unpin
> > anything that becomes obsolete or have an overview of to where things
> > belong. It is also unlikely that a single IPFS daemon will be able to
> > store everything you build, so you might find yourself using IPFS Cluster
> > soon to distribute the storage across multiple nodes and then you will
> > be effectively adding remotely.
>
> Currently, ‘guix publish’ stores things as long as they are requested,
> and then for the duration specified with ‘--ttl’. I suppose we could
> have similar behavior with IPFS: if an item hasn’t been requested for
> the specified duration, then we unpin it.
>
> Does that make sense?

Yes, in fact I wanted IPFS Cluster to support a TTL so that things are
automatically unpinned when it expires too.

>
> Thanks for your help!
>
> Ludo’.

Thanks!

Hector

  reply	other threads:[~2019-01-18 11:27 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-28 23:12 [bug#33899] [PATCH 0/5] Distributing substitutes over IPFS Ludovic Courtès
2018-12-28 23:15 ` [bug#33899] [PATCH 1/5] Add (guix json) Ludovic Courtès
2018-12-28 23:15   ` [bug#33899] [PATCH 2/5] tests: 'file=?' now recurses on directories Ludovic Courtès
2018-12-28 23:15   ` [bug#33899] [PATCH 3/5] Add (guix ipfs) Ludovic Courtès
2018-12-28 23:15   ` [bug#33899] [PATCH 4/5] publish: Add IPFS support Ludovic Courtès
2018-12-28 23:15   ` [bug#33899] [PATCH 5/5] DRAFT substitute: " Ludovic Courtès
2019-01-07 14:43 ` [bug#33899] [PATCH 0/5] Distributing substitutes over IPFS Hector Sanjuan
2019-01-14 13:17   ` Ludovic Courtès
2019-01-18  9:08     ` Hector Sanjuan
2019-01-18  9:52       ` Ludovic Courtès
2019-01-18 11:26         ` Hector Sanjuan [this message]
2019-07-01 21:36           ` Pierre Neidhardt
2019-07-06  8:44             ` Pierre Neidhardt
2019-07-12 20:02             ` Molly Mackinlay
2019-07-15  9:20               ` Alex Potsides
2019-07-12 20:15             ` Ludovic Courtès
2019-07-14 22:31               ` Hector Sanjuan
2019-07-15  9:24                 ` Ludovic Courtès
2019-07-15 10:10                   ` Pierre Neidhardt
2019-07-15 10:21                     ` Hector Sanjuan
2019-05-13 18:51 ` Alex Griffin
2020-12-29  9:59 ` [bug#33899] Ludo's patch rebased on master Maxime Devos
2021-06-06 17:54 ` [bug#33899] [PATCH 0/5] Distributing substitutes over IPFS Tony Olagbaiye

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='pa258vxfuaaiuhOEdEpZPTqfDHe4hoICI7EPyJ284kk6qVWc_VakU3cwxbx46BVXobg0LMWyrjE_uX6nJ6XoJj78-mfjtsejqHlx3bmkNFw=@hector.link' \
    --to=code@hector.link \
    --cc=33899@debbugs.gnu.org \
    --cc=go-ipfs-wg@ipfs.io \
    --cc=ludo@gnu.org \
    --cc=mail@ambrevar.xyz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.