unofficial mirror of guile-user@gnu.org 
From: pukkamustard <pukkamustard@posteo.net>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guile-user@gnu.org
Subject: Re: Encoding for Robust Immutable Storage (ERIS) and Guile
Date: Thu, 10 Dec 2020 09:27:02 +0100
Message-ID: <86wnxq5and.fsf@posteo.net>
In-Reply-To: <878sa6c49s.fsf@gnu.org>


Hi Ludo,

> Block size is fixed; did you consider content-defined block boundaries
> and such?  Perhaps it doesn’t bring much though.

I looked into block boundaries with a "sliding hash" (re-compute a short
hash for every byte read and choose boundaries when the hash is zero).
This would allow a higher degree of de-duplication, but I found this to
be a bit "finicky" (and myself too impatient to tune and tweak this :).
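
To make the sliding-hash idea concrete, here is a minimal Guile sketch.
It is not what ERIS does (ERIS uses fixed block sizes). The hash is a toy
one (the sum of the bytes in a window), and `window-size`,
`boundary-mask` and `chunk-boundaries` are hypothetical names and values
chosen only for illustration.

```scheme
(use-modules (rnrs bytevectors))

;; Illustrative parameters, not taken from ERIS.
(define window-size 16)      ; bytes covered by the sliding hash
(define boundary-mask #x3f)  ; boundary when the low 6 bits are zero

(define (chunk-boundaries bv)
  "Return the end offsets of the chunks found in bytevector BV."
  (let loop ((i 0) (window-sum 0) (boundaries '()))
    (if (= i (bytevector-length bv))
        ;; The bytes after the last boundary form a final partial chunk.
        (reverse boundaries)
        (let* ((incoming (bytevector-u8-ref bv i))
               ;; Byte that just left the window, once the window is full.
               (outgoing (if (>= i window-size)
                             (bytevector-u8-ref bv (- i window-size))
                             0))
               ;; Toy rolling hash: the sum of the bytes in the window.
               (window-sum (+ window-sum incoming (- outgoing))))
          (loop (+ i 1)
                window-sum
                (if (and (>= i (- window-size 1))
                         (zero? (logand window-sum boundary-mask)))
                    (cons (+ i 1) boundaries)
                    boundaries))))))
```

Because a boundary depends only on the bytes inside the window, inserting
data early in a file shifts later boundaries instead of changing every
block after the insertion point. The tuning burden lies in picking the
window, the mask and a better hash.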

I settled on fixed block sizes, making the encoding faster and
preventing information leaks based on block size.

Another idea to increase de-duplication: when encoding a directory,
align files to the ERIS block size. This would allow de-duplication of
files across encoded images/directories.

Maybe something like SquashFS already does such an alignment? That would
be cool...

> The IPFS example is nice!  There are bindings to the IPFS HTTP interface
> floating around for Guix; would be nice to converge on these bits.

Spelunking into wip-ipfs-substitutes is on my list! Will report back on
the adventure. :)

>> ERIS is still "experimental". This release is intended to initiate
>> discussion and collect feedback from a wider circle. In particular I'd
>> be interested in your thoughts on applications and the Guile API.
>
> Do I get it right that the encoder currently keeps blocks in memory?

By default when using `(eris-encode content)`, yes. The blocks are
stored into an alist.

But the encoder is implemented as an SRFI-171 transducer that eagerly
emits (reduces) encoded blocks. So one could do this:

(eris-encode content #:block-reducer my-backend)

Where `my-backend` is an SRFI-171 reducer that takes care of the blocks
as soon as they are ready. The IPFS example implements a reducer that
stores blocks to IPFS. By default `eris-encode` just uses `rcons` from
`(srfi srfi-171)`.
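
As a rough illustration of what such a reducer could look like, here is
a sketch that stores blocks in a hash table. An SRFI-171 reducer is a
procedure with three arities: no arguments (initial accumulator), one
argument (completion) and two arguments (step). Everything else here is
an assumption: `make-table-reducer`, `table` and `get-block` are
hypothetical names, each block is assumed to arrive as a
`(reference . block)` pair (as the default alist result suggests), and
`list-transduce` with made-up data stands in for the encoder.

```scheme
(use-modules (srfi srfi-171))

;; Assumption: each emitted block is a (reference . block) pair.
(define (make-table-reducer table)
  "Return an SRFI-171 reducer that stores blocks into hash table TABLE."
  (case-lambda
    (() table)                    ; initial accumulator
    ((result) result)             ; completion: nothing left to flush
    ((result ref+block)           ; step: store one block as it arrives
     (hash-set! result (car ref+block) (cdr ref+block))
     result)))

;; Stand-in for the encoder: feed two made-up blocks through the reducer.
(define table (make-hash-table))
(list-transduce (tmap identity)
                (make-table-reducer table)
                '(("ref-1" . "block-1") ("ref-2" . "block-2")))

;; A matching lookup procedure, e.g. for the decoding side.
(define (get-block ref)
  (hash-ref table ref))
```

With the real encoder, the reducer would presumably be passed as
`#:block-reducer (make-table-reducer table)`. A backend that writes to
disk or to the network would do its work in the two-argument step and
could flush in the one-argument completion.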

The encoding transducer is stateful. But it only keeps references to
blocks in memory, and at most log(n) of them at any moment, where n is
the number of blocks to encode.

The decoding interface currently looks like this:

(eris-decode->bytevector eris-urn
  (lambda (ref) (get-block-from-my-backend ref)))

Much room for improvement...

> Do you have plans to provide an interface to the storage backend so one
> can easily switch between in-memory, Datashards, IPFS, etc.?

Currently the interface is a bit "low-level" - provide an SRFI-171
reducer. This can definitely be improved and I'd be happy for ideas on
how to make this more ergonomic.

Thank you for your comments!
-pukkamustard


