unofficial mirror of guile-user@gnu.org 
From: "Ludovic Courtès" <ludo@gnu.org>
To: pukkamustard <pukkamustard@posteo.net>
Cc: guile-user@gnu.org
Subject: Re: Encoding for Robust Immutable Storage (ERIS) and Guile
Date: Fri, 11 Dec 2020 09:10:50 +0100
Message-ID: <87tuss9305.fsf@gnu.org>
In-Reply-To: <86wnxq5and.fsf@posteo.net> (pukkamustard@posteo.net's message of "Thu, 10 Dec 2020 09:27:02 +0100")

Hello pukkamustard!

pukkamustard <pukkamustard@posteo.net> skribis:

> I looked into block boundaries with a "sliding hash" (re-compute a short
> hash for every byte read and choose boundaries when the hash is zero).
> This would allow a higher degree of de-duplication, but I found this to
> be a bit "finicky" (and myself too impatient to tune and tweak this :).
>
> I settled on fixed block sizes, making the encoding faster and
> preventing information leaks based on block size.

Yeah, sounds reasonable.  (I evaluated the benefits of this and other
approaches years ago, FWIW: <https://hal.inria.fr/hal-00187069/en>.)
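
To make the trade-off between the two boundary strategies concrete, here
is a minimal Guile sketch. It is not ERIS code: the procedure names
`fixed-boundaries` and `content-boundaries`, the window-sum checksum, and
the window/modulus parameters are all assumptions chosen for illustration.

```scheme
(use-modules (rnrs bytevectors)
             (srfi srfi-1))

;; Fixed-size strategy: cut every SIZE bytes; the last chunk may be shorter.
(define (fixed-boundaries len size)
  (let loop ((pos size) (acc '()))
    (if (>= pos len)
        (reverse (cons len acc))
        (loop (+ pos size) (cons pos acc)))))

;; Content-defined strategy (toy version): keep the sum of the last WINDOW
;; bytes and cut after any byte where that sum is divisible by MODULUS.
;; Real implementations use a stronger rolling hash; this is illustrative.
(define (content-boundaries bv window modulus)
  (let ((len (bytevector-length bv)))
    (let loop ((i 0) (sum 0) (acc '()))
      (if (= i len)
          ;; Always end with a cut at LEN, without duplicating it.
          (reverse (if (and (pair? acc) (= (car acc) len))
                       acc
                       (cons len acc)))
          (let* ((leaving (if (>= i window)
                              (bytevector-u8-ref bv (- i window))
                              0))
                 (sum (+ sum (bytevector-u8-ref bv i) (- leaving)))
                 (cut? (and (>= (+ i 1) window)
                            (zero? (modulo sum modulus)))))
            (loop (+ i 1) sum (if cut? (cons (+ i 1) acc) acc)))))))

;; Sample input, and the same input with one byte prepended.
(define data (u8-list->bytevector (iota 64 1)))
(define shifted (u8-list->bytevector (cons 0 (iota 64 1))))

(define original-cuts (content-boundaries data 4 7))
(define shifted-cuts (content-boundaries shifted 4 7))
```

With fixed sizes, prepending one byte moves every boundary relative to the
content, so no later block matches. With the content-defined cuts, each
offset in `original-cuts` reappears one position later in `shifted-cuts`,
which is where the extra de-duplication comes from. The price is
variable-sized blocks, which is the information leak mentioned above.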

> Another idea to increase de-duplication: when encoding a directory,
> align files to the ERIS block size. This would allow de-duplication of
> files across encoded images/directories.

I guess that’d work, indeed.

>> Do I get it right that the encoder currently keeps blocks in memory?
>
> By default when using `(eris-encode content)`, yes. The blocks are
> stored into an alist.
>
> But the encoder is implemented as an SRFI-171 transducer that eagerly
> emits (reduces) encoded blocks. So one could do this:
>
> (eris-encode content #:block-reducer my-backend)
>
> Where `my-backend` is a SRFI-171 reducer that takes care of the blocks
> as soon as they are ready. The IPFS example implements a reducer that
> stores blocks to IPFS. By default `eris-encode` just uses `rcons` from
> `(srfi srfi-171)`.

Ah, I see, that’s great!  I’m not familiar with the transducer API so I
always have to think twice (or more) about what’s going on; the
flexibility it gives here is really nice.
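
For readers equally unfamiliar with SRFI-171, a reducer is just a
procedure with three arities: no arguments (initial accumulator), one
argument (completion), and two arguments (step). Below is a minimal
sketch of a storage-backend reducer over a hash table. The name
`make-table-reducer` is hypothetical, and the assumption that each
emitted block arrives as a `(reference . block)` pair is a guess based on
the "alist" remark above, not something confirmed for guile-eris. The
sketch is exercised with `list-transduce` rather than `eris-encode`.

```scheme
(use-modules (srfi srfi-171))

;; Hypothetical backend: store blocks in TABLE, keyed by their reference.
;; ASSUMPTION: each input is a (reference . block) pair.
(define (make-table-reducer table)
  (case-lambda
    (() table)                       ; identity: the initial accumulator
    ((acc) acc)                      ; completion: nothing to flush here
    ((acc ref+block)                 ; step: store one block as it arrives
     (hash-set! acc (car ref+block) (cdr ref+block))
     acc)))

;; Stand-in for the encoder: feed two fake blocks through the reducer.
(define store (make-hash-table))

(list-transduce (tmap identity)
                (make-table-reducer store)
                '(("ref-a" . "block A") ("ref-b" . "block B")))
```

A reducer for GDBM or IPFS would have the same three-arity shape, with the
step clause writing to the backend and the completion clause closing or
flushing it.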

> The encoding transducer is stateful. But it only keeps references to
> blocks in memory, and at most log(n) of them at any moment, where n is
> the number of blocks to encode.
>
> The decoding interface currently looks like this:
>
> (eris-decode->bytevector eris-urn
>  (lambda (ref) (get-block-from-my-backend ref)))

OK.
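
As a sketch of what the second argument could look like for an in-memory
backend: the procedure below returns a one-argument fetcher over a hash
table. `make-table-fetcher` is a hypothetical name, and the exact
reference type the decoder passes in is an assumption (strings are used
here only for the demonstration).

```scheme
;; Hypothetical helper: build a (lambda (ref) ...) block fetcher backed by
;; TABLE.  Signals an error when a block is missing instead of returning #f.
(define (make-table-fetcher table)
  (lambda (ref)
    (or (hash-ref table ref)
        (error "missing block" ref))))

;; Demonstration with a fake reference and block.
(define blocks (make-hash-table))
(hash-set! blocks "ref-a" "block A")

(define fetch (make-table-fetcher blocks))
(define fetched (fetch "ref-a"))
```

Such a fetcher could then be passed where the lambda appears in the
quoted `eris-decode->bytevector` call.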

>> Do you have plans to provide an interface to the storage backend so one
>> can easily switch between in-memory, Datashards, IPFS, etc.?
>
> Currently the interface is a bit "low-level" - provide a SRFI-171
> reducer. This can definitely be improved and I'd be happy for ideas on
> how to make this more ergonomic.

Maybe that’s all we need after all.  Maybe what would be nice is a
couple of examples, like a high-level procedure or CLI that can insert
or fetch from either (say) a local GDBM database or IPFS.  That would
illustrate integration with backends as well as the high-level API.

Thanks!

Ludo’.



Thread overview: 6+ messages
2020-12-07 12:50 Encoding for Robust Immutable Storage (ERIS) and Guile pukkamustard
2020-12-09 16:50 ` Ludovic Courtès
2020-12-10  8:27   ` pukkamustard
2020-12-11  8:10     ` Ludovic Courtès [this message]
2020-12-09 20:01 ` Christopher Lemmer Webber
2020-12-10  9:02   ` pukkamustard
