unofficial mirror of guile-user@gnu.org 
* Encoding for Robust Immutable Storage (ERIS) and Guile
From: pukkamustard @ 2020-12-07 12:50 UTC
  To: guile-user

Hello Guile Users,

I'm happy to announce guile-eris 0.2.0. This is a Guile implementation
of "Encoding for Robust Immutable Storage (ERIS)" [1].

ERIS defines how an arbitrary sequence of bytes can be encoded into a
set of uniformly sized blocks and an identifier (read capability). The
blocks are encrypted such that the original content can only be decoded
given the read capability. ERIS is a scheme for content-addressing in
the sense that the read capability is deterministically computed from
the content itself.

This is done by splitting content into blocks, encrypting them and
collecting references to blocks in a higher-level node (i.e. building a
Merkle Tree). See the specification [1] for a detailed description of
the encoding.
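
As an illustration of the first step only, here is a minimal Guile sketch
of cutting content into uniformly sized blocks. The padding rule (one #x80
byte followed by zero bytes) and treating the block size as a free
parameter are assumptions made for this example; the specification [1] is
authoritative, and encryption and the tree of references are left out.

```scheme
(use-modules (rnrs bytevectors))

;; Pad CONTENT to a multiple of BLOCK-SIZE.  Assumed rule: append one
;; #x80 byte, then #x00 bytes.  The marker is always added, even when
;; CONTENT is already a multiple of BLOCK-SIZE, so the last non-zero
;; byte is always the marker and padding can be stripped unambiguously.
(define (pad content block-size)
  (let* ((len (bytevector-length content))
         ;; Smallest multiple of BLOCK-SIZE that fits LEN + 1 bytes.
         (padded-len (* block-size (quotient (+ len block-size) block-size)))
         (out (make-bytevector padded-len 0)))
    (bytevector-copy! content 0 out 0 len)
    (bytevector-u8-set! out len #x80)
    out))

;; Cut PADDED into a list of BLOCK-SIZE bytevectors, in order.
(define (split-into-blocks padded block-size)
  (let loop ((offset (- (bytevector-length padded) block-size))
             (blocks '()))
    (if (< offset 0)
        blocks
        (let ((block (make-bytevector block-size)))
          (bytevector-copy! padded offset block 0 block-size)
          (loop (- offset block-size) (cons block blocks))))))

(define (content->blocks content block-size)
  (split-into-blocks (pad content block-size) block-size))
```

For example, `(content->blocks (string->utf8 "Hello") 4)` yields two
blocks, #vu8(72 101 108 108) and #vu8(111 128 0 0). Always adding the
marker costs an extra block when the content is an exact multiple of the
block size, but it makes removing the padding unambiguous.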

Encodings like ERIS are common in protocols such as BitTorrent, GNUnet,
IPFS, et al. ERIS decouples the encoding from any particular protocol
or application, allowing content to be referenced regardless of storage
and transport layer. For example, ERIS-encoded content can be stored and
transported over IPFS [2], but also over HTTP or via a USB stick.

My interest in developing this has been for ActivityPub-esque
applications where content can be cached and replicated to make
availability of content more robust. There seem to be many other
applications, including Guix substitutes.

ERIS is still "experimental". This release is intended to initiate
discussion and collect feedback from a wider circle. In particular, I'd
be interested in your thoughts on applications and the Guile API.

ERIS is closely related to Datashards [3], and the two are in the
process of converging.

I have submitted a patch to Guix. You should shortly be able to start
experimenting with ERIS with `guix environment --ad-hoc guile-eris guile`.

Thanks and happy hacking!
-pukkamustard

[1] http://purl.org/eris
[2] https://gitlab.com/openengiadina/eris/-/blob/main/examples/ipfs.org
[3] https://datashards.net/




* Re: Encoding for Robust Immutable Storage (ERIS) and Guile
From: Ludovic Courtès @ 2020-12-09 16:50 UTC
  To: guile-user

Hi!

pukkamustard <pukkamustard@posteo.net> skribis:

> I'm happy to announce guile-eris 0.2.0. This is a Guile implementation
> of "Encoding for Robust Immutable Storage (ERIS)" [1].

Yay, congrats!

> ERIS defines how an arbitrary sequence of bytes can be encoded into a
> set of uniformly sized blocks and an identifier (read capability). The
> blocks are encrypted such that the original content can only be
> decoded given the read capability. ERIS is a scheme for content-addressing
> in the sense that the read capability is deterministically computed from
> the content itself.
>
> This is done by splitting content into blocks, encrypting them and
> collecting references to blocks in a higher-level node (i.e. building a
> Merkle Tree). See the specification [1] for a detailed description of
> the encoding.

AIUI, this is exclusively convergent encryption, which is probably the
right choice for most applications anyway.

Block size is fixed; did you consider content-defined block boundaries
and such?  Perhaps it doesn’t bring much though.

> Encodings like ERIS are common in protocols such as BitTorrent,
> GNUnet, IPFS, et al. ERIS decouples the encoding from any particular
> protocol or application, allowing content to be referenced regardless
> of storage and transport layer. For example, ERIS-encoded content can
> be stored and transported over IPFS [2], but also over HTTP or via a
> USB stick.

The IPFS example is nice!  There are bindings to the IPFS HTTP interface
floating around for Guix; would be nice to converge on these bits.

> ERIS is still "experimental". This release is intended to initiate
> discussion and collect feedback from a wider circle. In particular I'd
> be interested in your thoughts on applications and the Guile API.

Do I get it right that the encoder currently keeps blocks in memory?

Do you have plans to provide an interface to the storage backend so one
can easily switch between in-memory, Datashards, IPFS, etc.?

Thanks for the great work!

Ludo’.





* Re: Encoding for Robust Immutable Storage (ERIS) and Guile
From: Christopher Lemmer Webber @ 2020-12-09 20:01 UTC
  To: pukkamustard; +Cc: guile-user

Congratulations pukkamustard, I really am excited by and respect the
work you're doing on this.  I think it's probably the proper replacement
slot for the storage work I'm doing in Spritely.

Once Spritely Goblins gets ported to Guile it'll be really fun to
combine these two things. :)





* Re: Encoding for Robust Immutable Storage (ERIS) and Guile
From: pukkamustard @ 2020-12-10  8:27 UTC
  To: Ludovic Courtès; +Cc: guile-user


Hi Ludo,

> Block size is fixed; did you consider content-defined block boundaries
> and such?  Perhaps it doesn’t bring much though.

I looked into block boundaries with a "sliding hash" (re-compute a short
hash for every byte read and choose boundaries when the hash is zero).
This would allow a higher degree of de-duplication, but I found this to
be a bit "finicky" (and myself too impatient to tune and tweak this :).
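
For comparison, a toy sketch of the sliding-hash approach
(content-defined chunking): a polynomial hash over the last WINDOW bytes
is updated for every byte read, and a boundary is placed wherever the
bits selected by MASK are all zero. The hash, window and mask are
arbitrary choices for illustration, not what ERIS or any particular tool
uses, and `chunk-boundaries` is a made-up name.

```scheme
(use-modules (rnrs bytevectors))

(define hash-base 257)
(define hash-modulus 1000000007)

;; Return the chunk end offsets (exclusive, ascending) for BV.  A boundary
;; is placed after byte I when the rolling hash of the WINDOW bytes ending
;; at I has every bit of MASK clear; the last offset is always the length
;; of BV.  MASK = 2^k - 1 gives chunks of roughly 2^k bytes on average for
;; random-looking data.
(define (chunk-boundaries bv window mask)
  (let ((len (bytevector-length bv))
        ;; base^(window-1) mod m: weight of the byte leaving the window.
        (drop-weight (modulo-expt hash-base (- window 1) hash-modulus)))
    (let loop ((i 0) (h 0) (bounds '()))
      (if (= i len)
          (reverse (if (and (pair? bounds) (= (car bounds) len))
                       bounds
                       (cons len bounds)))
          (let* ((h-dropped
                  (if (>= i window)
                      ;; Remove the byte that slides out of the window.
                      (modulo (- h (* drop-weight
                                      (bytevector-u8-ref bv (- i window))))
                              hash-modulus)
                      h))
                 ;; Shift in the new byte.
                 (h-new (modulo (+ (* h-dropped hash-base)
                                   (bytevector-u8-ref bv i))
                                hash-modulus)))
            (loop (+ i 1)
                  h-new
                  (if (and (>= (+ i 1) window) (zero? (logand h-new mask)))
                      (cons (+ i 1) bounds)
                      bounds)))))))
```

Because each boundary depends only on the WINDOW bytes before it,
inserting bytes near the start of a file shifts later boundaries instead
of changing every block, which is where the extra de-duplication comes
from. The cost is chunks of varying size, which fixed-size blocks avoid.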

I settled on fixed block sizes, making the encoding faster and
preventing information leaks based on block size.

Another idea to increase de-duplication: when encoding a directory,
align files to the ERIS block size. This would allow de-duplication of
files across encoded images/directories.
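
The alignment itself is simple arithmetic: before each file, advance the
offset by (B - offset mod B) mod B bytes, where B is the block size, so
that every file starts on a block boundary. A sketch with hypothetical
helper names, taking files as (name . size) pairs; the 4096-byte block
size in the example is purely illustrative.

```scheme
;; Padding bytes needed to move OFFSET to the next multiple of
;; BLOCK-SIZE (zero if OFFSET is already aligned).
(define (padding-for offset block-size)
  (modulo (- block-size (modulo offset block-size)) block-size))

;; Given FILES as a list of (name . size) pairs, return (name . offset)
;; pairs such that every file starts on a block boundary.
(define (aligned-layout files block-size)
  (let loop ((files files) (offset 0) (layout '()))
    (if (null? files)
        (reverse layout)
        (let* ((file (car files))
               (start (+ offset (padding-for offset block-size))))
          (loop (cdr files)
                (+ start (cdr file))
                (cons (cons (car file) start) layout))))))
```

With B = 4096, files of 100, 5000 and 4096 bytes land at offsets 0, 4096
and 12288. Each file's full blocks then line up identically in every
image that contains it, at the cost of some zero padding per file.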

Maybe something like SquashFS already does such an alignment? That
would be cool...

> The IPFS example is nice!  There are bindings to the IPFS HTTP interface
> floating around for Guix; would be nice to converge on these bits.

Spelunking into wip-ipfs-substitutes is on my list! Will report back on
the adventure. :)

>> ERIS is still "experimental". This release is intended to initiate
>> discussion and collect feedback from a wider circle. In particular I'd
>> be interested in your thoughts on applications and the Guile API.
>
> Do I get it right that the encoder currently keeps blocks in memory?

By default when using `(eris-encode content)`, yes. The blocks are
stored into an alist.

But the encoder is implemented as an SRFI-171 transducer that eagerly
emits (reduces) encoded blocks. So one could do this:

(eris-encode content #:block-reducer my-backend)

Where `my-backend` is an SRFI-171 reducer that takes care of the blocks
as soon as they are ready. The IPFS example implements a reducer that
stores blocks to IPFS. By default `eris-encode` just uses `rcons` from
`(srfi srfi-171)`.
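
A minimal sketch of such a reducer, assuming (hypothetically) that each
item it receives is a (reference . block) pair, like the entries of the
default alist. Following SRFI-171, a reducer takes zero arguments to
produce the initial accumulator, one argument to finish, and two
arguments to process one item. The hash table stands in for a real
backend, and `make-hash-table-reducer` is not part of guile-eris.

```scheme
(use-modules (srfi srfi-171))

;; Hypothetical reducer that stores blocks in TABLE as soon as they
;; arrive.  Assumes each item is a (reference . block) pair; the real
;; item shape in guile-eris may differ.
(define (make-hash-table-reducer table)
  (case-lambda
    ;; Initial accumulator: number of blocks stored so far.
    (() 0)
    ;; Completion: nothing to flush for a hash table, return the count.
    ((count) count)
    ;; Step: store one block and bump the count.
    ((count ref+block)
     (hash-set! table (car ref+block) (cdr ref+block))
     (+ count 1))))

;; Stand-alone demonstration with made-up pairs.  With guile-eris one
;; would instead pass the reducer as
;;   (eris-encode content #:block-reducer (make-hash-table-reducer store))
(define store (make-hash-table))
(list-transduce (tmap identity)
                (make-hash-table-reducer store)
                (list (cons "ref-a" #vu8(1 2))
                      (cons "ref-b" #vu8(3 4))))
;; => 2
```

Using a running count as the accumulator, instead of consing every
block onto a list as `rcons` does, means the reducer itself holds no
blocks; memory use is whatever the backend keeps.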

The encoding transducer is stateful, but it only keeps references to
blocks in memory, and only O(log n) of them at any moment, where n is
the number of blocks to encode.
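
One way to picture why only logarithmically many references are kept is
a sketch with a fixed tree ARITY: keep one partially filled node per
level. Each new block reference goes into level 0; when a level fills
up it is emitted as a node and the node's reference moves up one level,
so each of the roughly log_ARITY(n) levels holds fewer than ARITY
references between steps. The names below, including the `make-ref` and
`emit` parameters that stand in for encrypting, hashing and storing a
node, are invented for this illustration and are not guile-eris code.

```scheme
(use-modules (srfi srfi-1))

;; LEVELS is a list whose k-th element holds the references collected so
;; far (most recent first) for the node being filled at tree level k.
(define (push-ref levels ref arity make-ref emit)
  (if (null? levels)
      (push-ref (list '()) ref arity make-ref emit)
      (let ((level (cons ref (car levels))))
        (if (< (length level) arity)
            (cons level (cdr levels))
            ;; Level is full: emit it as a node, clear it, and carry the
            ;; node's reference one level up.
            (let ((node (reverse level)))
              (emit node)
              (cons '()
                    (push-ref (cdr levels) (make-ref node)
                              arity make-ref emit)))))))

;; Flush the partially filled levels bottom-up and return the root
;; reference (#f for no input).  Partial nodes may have a single child,
;; which keeps every leaf at the same depth.
(define (finish levels make-ref emit)
  (let loop ((levels levels) (carry #f))
    (if (null? levels)
        carry
        (let ((level (if carry (cons carry (car levels)) (car levels))))
          (cond
           ;; A single reference at the top level is the root.
           ((and (null? (cdr levels)) (= (length level) 1))
            (car level))
           ;; Nothing pending at this level: move up.
           ((null? level)
            (loop (cdr levels) #f))
           (else
            (let ((node (reverse level)))
              (emit node)
              (loop (cdr levels) (make-ref node)))))))))

(define (build-tree refs arity make-ref emit)
  (finish (fold (lambda (ref levels)
                  (push-ref levels ref arity make-ref emit))
                '() refs)
          make-ref emit))
```

With arity 2, leaves a, b, c and d, and a `make-ref` that simply returns
`(node . children)`, the root comes out as `(node (node a b) (node c d))`.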

The decoding interface currently looks like this:

(eris-decode->bytevector eris-urn
  (lambda (ref) (get-block-from-my-backend ref)))

Much room for improvement...
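
Because the decoder only needs a procedure from reference to block,
backends compose easily. A sketch, assuming a hypothetical convention
that a lookup returns #f for a block it does not have;
`hash-table-lookup` and `first-available` are made-up helpers, and
trying a local cache before a remote store is just one possible policy.

```scheme
;; Hypothetical convention for this sketch: a lookup is a procedure that
;; returns the block for REF, or #f when it does not have that block.
(define (hash-table-lookup table)
  (lambda (ref) (hash-ref table ref #f)))

;; Combine several lookups into the one-argument procedure the decoder
;; expects: try each in order (say, a local cache before a remote store)
;; and raise an error if no backend has the block.
(define (first-available . lookups)
  (lambda (ref)
    (let loop ((remaining lookups))
      (cond ((null? remaining) (error "block not found:" ref))
            (((car remaining) ref) => identity)
            (else (loop (cdr remaining)))))))

;; Example wiring (fetch-from-ipfs is hypothetical):
;;   (eris-decode->bytevector eris-urn
;;     (first-available (hash-table-lookup cache) fetch-from-ipfs))
```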

> Do you have plans to provide an interface to the storage backend so one
> can easily switch between in-memory, Datashards, IPFS, etc.?

Currently the interface is a bit "low-level": you provide an SRFI-171
reducer. This can definitely be improved and I'd be happy for ideas on
how to make this more ergonomic.

Thank you for your comments!
-pukkamustard




* Re: Encoding for Robust Immutable Storage (ERIS) and Guile
From: pukkamustard @ 2020-12-10  9:02 UTC
  To: Christopher Lemmer Webber; +Cc: guile-user


> Congratulations pukkamustard, I really am excited by and respect the
> work you're doing on this.  I think it's probably the proper replacement
> slot for the storage work I'm doing in Spritely.

Thank you!

> Once Spritely Goblins gets ported to Guile it'll be really fun to
> combine these two things. :)

I agree! Looking forward to hacking on Spritely in Guile...

-pukkamustard





* Re: Encoding for Robust Immutable Storage (ERIS) and Guile
From: Ludovic Courtès @ 2020-12-11  8:10 UTC
  To: pukkamustard; +Cc: guile-user

Hello pukkamustard!

pukkamustard <pukkamustard@posteo.net> skribis:

> I looked into block boundaries with a "sliding hash" (re-compute a short
> hash for every byte read and choose boundaries when the hash is zero).
> This would allow a higher degree of de-duplication, but I found this to
> be a bit "finicky" (and myself too impatient to tune and tweak this :).
>
> I settled on fixed block sizes, making the encoding faster and
> preventing information leaks based on block size.

Yeah, sounds reasonable.  (I evaluated the benefits of this and other
approaches years ago, FWIW: <https://hal.inria.fr/hal-00187069/en>.)

> Another idea to increase de-duplication: when encoding a directory,
> align files to the ERIS block size. This would allow de-duplication of
> files across encoded images/directories.

I guess that’d work, indeed.

>> Do I get it right that the encoder currently keeps blocks in memory?
>
> By default when using `(eris-encode content)`, yes. The blocks are
> stored into an alist.
>
> But the encoder is implemented as an SRFI-171 transducer that eagerly
> emits (reduces) encoded blocks. So one could do this:
>
> (eris-encode content #:block-reducer my-backend)
>
> Where `my-backend` is an SRFI-171 reducer that takes care of the blocks
> as soon as they are ready. The IPFS example implements a reducer that
> stores blocks to IPFS. By default `eris-encode` just uses `rcons` from
> `(srfi srfi-171)`.

Ah, I see, that’s great!  I’m not familiar with the transducer API so I
always have to think twice (or more) about what’s going on; the
flexibility it gives here is really nice.

> The encoding transducer is stateful, but it only keeps references to
> blocks in memory, and only O(log n) of them at any moment, where n is
> the number of blocks to encode.
>
> The decoding interface currently looks like this:
>
> (eris-decode->bytevector eris-urn
>  (lambda (ref) (get-block-from-my-backend ref)))

OK.

>> Do you have plans to provide an interface to the storage backend so
>> one can easily switch between in-memory, Datashards, IPFS, etc.?
>
> Currently the interface is a bit "low-level": you provide an SRFI-171
> reducer. This can definitely be improved and I'd be happy for ideas on
> how to make this more ergonomic.

Maybe that’s all we need after all.  Maybe what would be nice is a
couple of examples, like a high-level procedure or CLI that can insert
or fetch from either (say) a local GDBM database or IPFS.  That would
illustrate integration with backends as well as the high-level API.

Thanks!

Ludo’.



