From: "Ludovic Courtès" <ludo@gnu.org>
To: pukkamustard <pukkamustard@posteo.net>
Cc: guile-user@gnu.org
Subject: Re: Encoding for Robust Immutable Storage (ERIS) and Guile
Date: Fri, 11 Dec 2020 09:10:50 +0100
Message-ID: <87tuss9305.fsf@gnu.org>
In-Reply-To: <86wnxq5and.fsf@posteo.net> (pukkamustard@posteo.net's message of "Thu, 10 Dec 2020 09:27:02 +0100")
Hello pukkamustard!
pukkamustard <pukkamustard@posteo.net> writes:
> I looked into block boundaries with a "sliding hash" (re-compute a short
> hash for every byte read and choose boundaries when hash is zero). This
> would allow a higher degree of de-duplication, but I found this to be a
> bit "finicky" (and myself too impatient to tune and tweak this :).
>
> I settled on fixed block sizes, making the encoding faster and preventing
> information leaks based on block size.
Yeah, sounds reasonable. (I evaluated the benefits of this and other
approaches years ago, FWIW: <https://hal.inria.fr/hal-00187069/en>.)
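
To make the trade-off concrete, here is a minimal sketch of both ways of choosing block boundaries. It is not the ERIS algorithm: the window size, hash function and mask are arbitrary assumptions picked for readability, and the procedure names are hypothetical.

```scheme
(use-modules (rnrs bytevectors))

;; All constants below are illustrative assumptions, not ERIS parameters.
(define window 4)   ; number of bytes covered by the short hash
(define mask 15)    ; boundary when the low 4 bits of the hash are zero

;; Fixed-size blocks: end offsets depend only on position.
(define (fixed-boundaries len block-size)
  (let loop ((end block-size) (acc '()))
    (if (>= end len)
        (reverse (cons len acc))
        (loop (+ end block-size) (cons end acc)))))

;; Short hash of the WINDOW bytes that end just before END.  It is
;; recomputed from scratch at every position for clarity; a real rolling
;; hash would update it incrementally.
(define (window-hash bv end)
  (let loop ((i (- end window)) (h 0))
    (if (= i end)
        h
        (loop (+ i 1)
              (modulo (+ (* h 31) (bytevector-u8-ref bv i)) 65536)))))

;; Content-defined blocks: end offsets depend on the bytes themselves.
(define (content-boundaries bv)
  (let ((len (bytevector-length bv)))
    (let loop ((end window) (acc '()))
      (cond ((>= end len) (reverse (cons len acc)))
            ((zero? (logand (window-hash bv end) mask))
             (loop (+ end 1) (cons end acc)))
            (else (loop (+ end 1) acc))))))
```

When a byte is inserted at the front of the input, the content-defined boundaries move along with the data, so later blocks keep their content and still de-duplicate. Fixed boundaries stay at the same offsets, so every later block changes, but all blocks have the same size and reveal nothing through their length.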
> Another idea to increase de-duplication: When encoding a directory,
> align files to the ERIS block size. This would allow de-duplication of
> files across encoded images/directories.
I guess that’d work, indeed.
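
The alignment arithmetic could look like the sketch below. The procedure names are hypothetical and the block size is passed as a parameter; the 4096 used in the comment is only an example value, not a statement about the ERIS block size.

```scheme
;; Number of padding bytes needed so that OFFSET lands on the next
;; multiple of BLOCK-SIZE (zero when it is already aligned).
(define (padding-to-align offset block-size)
  (modulo (- offset) block-size))

;; Start offset of each file when every file begins on a block boundary.
;; SIZES is the list of file sizes in bytes, in archive order.
(define (aligned-offsets sizes block-size)
  (let loop ((sizes sizes) (offset 0) (acc '()))
    (if (null? sizes)
        (reverse acc)
        (let* ((end (+ offset (car sizes)))
               (next (+ end (padding-to-align end block-size))))
          (loop (cdr sizes) next (cons offset acc))))))

;; Example: (aligned-offsets '(5000 100 4096) 4096)
```

Because each file starts on a block boundary, its blocks are the same regardless of which files precede it in the directory, at the cost of up to one block of padding per file.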
>> Do I get it right that the encoder currently keeps blocks in memory?
>
> By default when using `(eris-encode content)`, yes. The blocks are
> stored into an alist.
>
> But the encoder is implemented as an SRFI-171 transducer that eagerly
> emits (reduces) encoded blocks. So one could do this:
>
> (eris-encode content #:block-reducer my-backend)
>
> Where `my-backend` is a SRFI-171 reducer that takes care of the blocks
> as soon as they are ready. The IPFS example implements a reducer that
> stores blocks to IPFS. By default `eris-encode` just uses `rcons` from
> `(srfi srfi-171)`.
Ah, I see, that’s great! I’m not familiar with the transducer API so I
always have to think twice (or more) about what’s going on; the
flexibility it gives here is really nice.
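
For readers in the same position, here is a minimal sketch of what such a reducer could look like. An SRFI-171 reducer is a procedure with three arities: no argument (initial state), one argument (completion) and two arguments (step). The sketch assumes each emitted block is a pair of reference and data, which the alist mentioned above suggests but which I have not checked, and it uses string references and a plain list as a stand-in for the encoder.

```scheme
(use-modules (srfi srfi-171)
             (rnrs bytevectors))

;; Reducer that stores every block in TABLE as soon as it is emitted.
;; Assumption: a block is a pair (reference . data).
(define (make-table-reducer table)
  (case-lambda
    (() table)                                   ; initial state
    ((result) result)                            ; completion
    ((result block)                              ; step: store one block
     (hash-set! result (car block) (cdr block))
     result)))

;; Stand-in for the blocks an encoder would emit (hypothetical data).
(define sample-blocks
  (list (cons "ref-a" (string->utf8 "first block"))
        (cons "ref-b" (string->utf8 "second block"))))

;; Drive the reducer with an identity transducer over the sample blocks.
(define store
  (list-transduce (tmap (lambda (block) block))
                  (make-table-reducer (make-hash-table))
                  sample-blocks))
```

With the real encoder, the same reducer would presumably be passed as `#:block-reducer (make-table-reducer table)`, following the call quoted above.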
> The encoding transducer is stateful. But it only keeps references to
> blocks in memory and at most log(n) at any moment, where n is the
> number of blocks to encode.
>
> The decoding interface currently looks like this:
>
>   (eris-decode->bytevector eris-urn
>     (lambda (ref) (get-block-from-my-backend ref)))
OK.
>> Do you have plans to provide an interface to the storage backend so one
>> can easily switch between in-memory, Datashards, IPFS, etc.?
>
> Currently the interface is a bit "low-level" - provide a SRFI-171
> reducer. This can definitely be improved and I'd be happy for ideas on
> how to make this more ergonomic.
Maybe that’s all we need after all. Maybe what would be nice is a
couple of examples, like a high-level procedure or CLI that can insert
or fetch from either (say) a local GDBM database or IPFS. That would
illustrate integration with backends as well as the high-level API.
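
As a rough idea of what such an example could contain, here is a sketch of a backend that keeps one file per block in a local directory, standing in for GDBM or IPFS. It provides both halves of the interface discussed above: a reducer for the encoder and a lookup procedure for the decoder. The procedure names are hypothetical, and it assumes that blocks are pairs of reference and bytevector and that references are strings usable as file names.

```scheme
(use-modules (ice-9 binary-ports))

;; File that holds the block named REF (assumed to be a file-name-safe string).
(define (block-file directory ref)
  (string-append directory "/" ref))

;; SRFI-171 style reducer: writes each block to its own file in DIRECTORY.
;; Assumption: a block is a pair (reference . bytevector).
(define (make-directory-reducer directory)
  (case-lambda
    (() directory)                               ; initial state
    ((result) result)                            ; completion
    ((result block)                              ; step: write one block
     (call-with-output-file (block-file result (car block))
       (lambda (port) (put-bytevector port (cdr block)))
       #:binary #t)
     result)))

;; Lookup procedure for decoding: reads the block named REF back from disk.
(define (make-directory-getter directory)
  (lambda (ref)
    (call-with-input-file (block-file directory ref)
      get-bytevector-all
      #:binary #t)))
```

Following the calls quoted earlier, the reducer would go to `eris-encode` via `#:block-reducer` and the getter would be the second argument of `eris-decode->bytevector`. Switching to another store then means supplying a different reducer and getter pair.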
Thanks!
Ludo’.