From: pukkamustard
To: Ludovic Courtès
Cc: guile-user@gnu.org
Newsgroups: gmane.lisp.guile.user
Subject: Re: Encoding for Robust Immutable Storage (ERIS) and Guile
Date: Thu, 10 Dec 2020 09:27:02 +0100
Message-ID: <86wnxq5and.fsf@posteo.net>

Hi Ludo,

> Block size is fixed; did you consider content-defined block boundaries
> and such? Perhaps it doesn’t bring much though.

I looked into block boundaries with a "sliding hash" (re-compute a short
hash for every byte read and choose a boundary whenever the hash is
zero). This would allow a higher degree of de-duplication, but I found it
to be a bit "finicky" (and myself too impatient to tune and tweak it :).
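A minimal Guile sketch of the "sliding hash" idea (content-defined chunking), for illustration only: it is not part of ERIS, which uses fixed block sizes. The polynomial rolling hash and every parameter below (window size, base, modulus, mask) are assumptions chosen for the example. A boundary is cut after a byte whenever the low bits of the hash over the last 16 bytes are all zero, so each boundary depends only on nearby content.

```scheme
(use-modules (rnrs bytevectors))

;; Illustrative parameters (assumptions, not values used by ERIS).
(define window-size 16)          ; bytes covered by the rolling hash
(define hash-base 257)           ; polynomial base
(define hash-modulus 4294967291) ; largest prime below 2^32
(define boundary-mask #xFF)      ; cut when the low 8 bits are zero (roughly 256-byte chunks)

;; hash-base^window-size mod hash-modulus: weight of the byte leaving the window.
(define outgoing-weight
  (let loop ((i 0) (acc 1))
    (if (= i window-size)
        acc
        (loop (+ i 1) (modulo (* acc hash-base) hash-modulus)))))

(define (content-defined-boundaries bv)
  "Return the exclusive end offsets of the chunks of BV.  A chunk ends after
byte I when the rolling hash of the WINDOW-SIZE bytes ending at I has all
BOUNDARY-MASK bits zero.  The last offset is always the length of BV."
  (let ((len (bytevector-length bv)))
    (let loop ((i 0) (h 0) (cuts '()))
      (if (= i len)
          ;; Close the final chunk unless a cut already fell exactly at the end.
          (reverse (if (or (zero? len) (and (pair? cuts) (= (car cuts) len)))
                       cuts
                       (cons len cuts)))
          (let* ((outgoing (if (>= i window-size)
                               (bytevector-u8-ref bv (- i window-size))
                               0))
                 ;; Slide the window: shift in byte I, drop byte I - WINDOW-SIZE.
                 (h* (modulo (- (+ (* h hash-base) (bytevector-u8-ref bv i))
                                (* outgoing outgoing-weight))
                             hash-modulus))
                 ;; Only cut once the window is full, so every cut depends
                 ;; on exactly WINDOW-SIZE bytes of content.
                 (cut? (and (>= (+ i 1) window-size)
                            (zero? (logand h* boundary-mask)))))
            (loop (+ i 1) h* (if cut? (cons (+ i 1) cuts) cuts)))))))
```

Because a cut depends only on the 16 bytes before it, inserting data near the start of a file leaves every later boundary in place (just shifted), so the chunks after the insertion are identical and de-duplicate; with fixed-size blocks every following block would change. The price is variable chunk sizes, which is the kind of size-based information leak that fixed blocks avoid.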
I settled on fixed block sizes, which makes the encoding faster and
prevents information leaks based on block size.

Another idea to increase de-duplication: when encoding a directory, align
files to the ERIS block size. This would allow de-duplication of files
across encoded images/directories.

Maybe something like SquashFS already does such an alignment? That would
be cool...

> The IPFS example is nice! There are bindings to the IPFS HTTP interface
> floating around for Guix; would be nice to converge on these bits.

Spelunking into wip-ipfs-substitutes is on my list! Will report back on
the adventure. :)

>> ERIS is still "experimental". This release is intended to initiate
>> discussion and collect feedback from a wider circle. In particular I'd
>> be interested in your thoughts on applications and the Guile API.
>
> Do I get it right that the encoder currently keeps blocks in memory?

By default, when using `(eris-encode content)`, yes. The blocks are
stored in an alist.

But the encoder is implemented as an SRFI-171 transducer that eagerly
emits (reduces) encoded blocks. So one could do this:

    (eris-encode content #:block-reducer my-backend)

where `my-backend` is an SRFI-171 reducer that takes care of the blocks
as soon as they are ready. The IPFS example implements a reducer that
stores blocks to IPFS. By default `eris-encode` just uses `rcons` from
`(srfi srfi-171)`.

The encoding transducer is stateful, but it only keeps references to
blocks in memory, and at most log(n) of them at any moment, where n is
the number of blocks to encode.

The decoding interface currently looks like this:

    (eris-decode->bytevector eris-urn
                             (lambda (ref) (get-block-from-my-backend ref)))

Much room for improvement...

> Do you have plans to provide an interface to the storage backend so one
> can easily switch between in-memory, Datashards, IPFS, etc.?
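The reducer-based encoding and the block-getter decoding described above pair naturally into a storage backend. Below is a minimal sketch of an in-memory one in Guile. The names `make-in-memory-store`, `store-reducer`, `store-getter` and `ref->key` are hypothetical, and it assumes (not confirmed by the ERIS code) that each element the encoder emits is a pair `(reference . block)` with bytevector references.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: use a hex string of the reference as the hash-table
;; key, so that two equal references always map to the same entry.
(define (ref->key ref)
  (string-concatenate
   (map (lambda (byte) (string-pad (number->string byte 16) 2 #\0))
        (bytevector->u8-list ref))))

(define (make-in-memory-store)
  (make-hash-table))

;; An SRFI-171 reducer: called with no arguments for the initial
;; accumulator, with one argument on completion, and with two arguments
;; for every element.  Blocks are written into STORE as they arrive.
;; ASSUMPTION: each element the encoder emits is a pair (reference . block).
(define (store-reducer store)
  (case-lambda
    (() store)
    ((result) result)
    ((result ref+block)
     (hash-set! store (ref->key (car ref+block)) (cdr ref+block))
     result)))

;; A getter of the shape the decoding interface takes: reference -> block.
;; Returns #f when the block is not in the store.
(define (store-getter store)
  (lambda (ref)
    (hash-ref store (ref->key ref))))
```

Under that assumption, switching backends amounts to passing a different reducer/getter pair, e.g. `(eris-encode content #:block-reducer (store-reducer store))` for encoding and `(eris-decode->bytevector eris-urn (store-getter store))` for decoding; an IPFS or Datashards backend would provide the same two procedures.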
Currently the interface is a bit "low-level": you provide a SRFI-171
reducer. This can definitely be improved, and I'd be happy to hear ideas
on how to make it more ergonomic.

Thank you for your comments!

-pukkamustard