From: Ludovic Courtès
Newsgroups: gmane.lisp.guile.user
Subject: Re: Encoding for Robust Immutable Storage (ERIS) and Guile
Date: Fri, 11 Dec 2020 09:10:50 +0100
Message-ID: <87tuss9305.fsf@gnu.org>
References: <86ft4hg4qu.fsf@posteo.net> <878sa6c49s.fsf@gnu.org> <86wnxq5and.fsf@posteo.net>
In-Reply-To: <86wnxq5and.fsf@posteo.net> (pukkamustard@posteo.net's message of "Thu, 10 Dec 2020 09:27:02 +0100")
To: pukkamustard
Cc: guile-user@gnu.org

Hello pukkamustard!

pukkamustard wrote:

> I looked into block boundaries with a "sliding hash" (re-compute a
> short hash for every byte read and choose boundaries when the hash is
> zero). This would allow a higher degree of de-duplication, but I found
> this to be a bit "finicky" (and myself too impatient to tune and tweak
> this :).
>
> I settled on fixed block sizes, making the encoding faster and
> preventing information leaks based on block size.

Yeah, sounds reasonable.  (I evaluated the benefits of this and other
approaches years ago, FWIW: .)

> Another idea to increase de-duplication: when encoding a directory,
> align files to the ERIS block size. This would allow de-duplication
> of files across encoded images/directories.

I guess that’d work, indeed.

>> Do I get it right that the encoder currently keeps blocks in memory?
>
> By default when using `(eris-encode content)`, yes. The blocks are
> stored into an alist.
>
> But the encoder is implemented as an SRFI-171 transducer that eagerly
> emits (reduces) encoded blocks.
> So one could do this:
>
>   (eris-encode content #:block-reducer my-backend)
>
> Where `my-backend` is a SRFI-171 reducer that takes care of the blocks
> as soon as they are ready. The IPFS example implements a reducer that
> stores blocks to IPFS. By default `eris-encode` just uses `rcons` from
> `(srfi srfi-171)`.

Ah, I see, that’s great!  I’m not familiar with the transducer API so I
always have to think twice (or more) about what’s going on; the
flexibility it gives here is really nice.

> The encoding transducer is stateful. But it only keeps references to
> blocks in memory and at most log(n) at any moment, where n is the
> number of blocks to encode.
>
> The decoding interface currently looks like this:
>
>   (eris-decode->bytevector eris-urn
>     (lambda (ref) (get-block-from-my-backend ref)))

OK.

>> Do you have plans to provide an interface to the storage backend so
>> one can easily switch between in-memory, Datashards, IPFS, etc.?
>
> Currently the interface is a bit "low-level" - provide a SRFI-171
> reducer. This can definitely be improved and I'd be happy for ideas on
> how to make this more ergonomic.

Maybe that’s all we need after all.  Maybe what would be nice is a
couple of examples, like a high-level procedure or CLI that can insert
or fetch from either (say) a local GDBM database or IPFS.  That would
illustrate integration with backends as well as the high-level API.

Thanks!

Ludo’.