From: Simon Tournier <zimon.toutoune@gmail.com>
To: Ludovic Courtès <ludo@gnu.org>, guix-sysadmin
 <guix-sysadmin@gnu.org>
Cc: guix-devel@gnu.org
Subject: Re: Git-LFS or Git Annex?
In-Reply-To: <87mssuu57m.fsf@inria.fr>
References: <87mssuu57m.fsf@inria.fr>
Date: Thu, 25 Jan 2024 17:55:11 +0100
Message-ID: <87frylpd3k.fsf@gmail.com>
List-Id: "Development of GNU Guix and the GNU System distribution."
 <guix-devel.gnu.org>
List-Archive: <https://lists.gnu.org/archive/html/guix-devel>

Hi Ludo, all,

On Wed, 24 Jan 2024 at 16:22, Ludovic Courtès <ludo@gnu.org> wrote:

> The question boils down to: Git-LFS or Git Annex?

Some months ago, I looked into this for managing some datasets.  My
conclusion was Git-Annex.  The main drawback of Git-LFS is that the
server needs to support the protocol.  On the Git-Annex side, the main
drawback is Haskell.

Haskell may seem like a detail, but it is not when considering
architectures other than x86_64.  Have a look at the CI dashboard,
filtering on ‘ghc-’:

    http://ci.guix.gnu.org/eval/1074397/dashboard?system=i686-linux

Here I pick i686 as an example to make the point about Haskell support
on non-x86_64 architectures.  As an aside, I am not even speaking about
the resources that compiling Haskell requires.

Do not get me wrong: this is not a roadblock, but let’s keep it in
mind: Git-Annex comes with limitations because of Haskell.

That said, Git-Annex seems well suited to the workflow you describe:
backing up large files between various servers.  And it would be a
bridge between content and address.  However, the content still needs
to be stored on some servers, IMHO.  Git-Annex supports “special
remotes” [1], but it is not clear to me whether the aim is to distribute
the workload between the two main servers or just to ease the
maintenance of backups.

Last, you speak about content addressing, and this part is not clear to
me.  In Git-Annex, you have on the one hand the Git content-addressed
system and on the other hand the “key-value backends” [2].  Roughly,
Git-Annex stores the key in a file that is tracked by Git itself, while
the value is stored outside Git itself.

Recently, support for Git-LFS was added to git-download with commit
a4db19d8e07eeb26931edfde0f0e6bca4e0448d3.  In that context, and with
content addressing in mind, are you speaking about adding Git-Annex
support and thus distributing the videos as substitutes, which would
probably also ease the maintenance of backups?  Or is the question
unrelated?

On a side note, depending on the size of the videos, it is only possible
to use non-cryptographically secure backends such as URL.

All that said, let’s fix ideas with a simple example: syncing content
between machine-A and machine-B, where the original content is also
kept elsewhere.

Let’s create a Git repository with one annexed file.

--8<---------------cut here---------------start------------->8---
machine-A$ mkdir example && cd example
machine-A$ git init && git annex init

machine-A$ git annex addurl -b MD5 --file sources.json \
               https://guix.gnu.org/sources.json
addurl https://guix.gnu.org/sources.json
(to sources.json) ok
(recording state in git...)

machine-A$ file sources.json
sources.json: symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a

machine-A$ git annex add .
machine-A$ git commit -am 'Add sources.json'
[master (root-commit) bdf6bca] Add sources.json
 1 file changed, 1 insertion(+)
 create mode 120000 sources.json
--8<---------------cut here---------------end--------------->8---
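
To make the key/value split concrete: the last component of the link
target shown above is the key itself.  Here is a minimal sketch that
splits such a key into its fields, assuming the simple form
BACKEND-sSIZE--NAME seen in this example; the key format documented by
Git-Annex allows other optional fields, which this sketch does not
handle.  The function name annex_key_fields is made up for the
illustration.

```shell
#!/bin/sh
# Split a git-annex key of the simple form BACKEND-sSIZE--NAME.
# Assumption: only the size field sits between the backend and the name.
annex_key_fields() {
    key=$1
    name=${key#*--}        # everything after the first "--"
    head=${key%%--*}       # everything before the first "--"
    backend=${head%%-*}    # up to the first "-"
    size=${head#*-s}       # what follows "-s"
    printf 'backend=%s size=%s name=%s\n' "$backend" "$size" "$name"
}

# The key is the last path component of the symlink target.
target='.git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a'
annex_key_fields "${target##*/}"
# prints: backend=MD5 size=2697572 name=a2b17d21f5a209b1763ad537a97d9c5a
```

With the MD5 backend, the name part is the checksum of the content and
the size is in bytes, so Git only tracks this small pointer.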

Let’s back it up.

--8<---------------cut here---------------start------------->8---
machine-B$ git clone file:///tmp/example backup && cd backup/

machine-B$ file sources.json
sources.json: broken symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

As you see, nothing is really copied here.  It is only a symbolic link
pointing to some content outside what Git tracks.
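
Since an annexed file without its content is just a dangling link, its
state can be checked with plain shell tests.  A minimal sketch, assuming
the symlink-based layout shown in this example (files that Git-Annex
manages without symlinks are not covered); the function name annex_state
is made up for the illustration.

```shell
#!/bin/sh
# Report whether the content behind an annexed path is available locally.
# Assumption: annexed files are symlinks, as in the transcript above.
annex_state() {
    if [ -L "$1" ] && [ ! -e "$1" ]; then
        echo "missing"     # dangling symlink: only the key is here
    elif [ -e "$1" ]; then
        echo "present"     # the link resolves, or it is a regular file
    else
        echo "absent"      # no such path at all
    fi
}
```

On machine-B, `annex_state sources.json` would report "missing" right
after the clone, which matches the "broken symbolic link" that file
prints.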

--8<---------------cut here---------------start------------->8---
machine-B$ guix hash -rx .
0x8kiaprmjq6f02pdq155wlznxhzi871mk0la6sp944q854pcpn5

machine-B$ git annex get sources.json
get sources.json (from origin...)
ok
(recording state in git...)

machine-B$ guix hash -rx .
0x8kiaprmjq6f02pdq155wlznxhzi871mk0la6sp944q854pcpn5

machine-B$ file sources.json
sources.json: symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

Let’s remove the file on machine-B, for whatever reason.

--8<---------------cut here---------------start------------->8---
machine-B$ git annex drop sources.json
drop sources.json ok
(recording state in git...)

machine-B$ file sources.json
sources.json: broken symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

And assume that machine-A is now unreachable.  Let’s get the file again
on machine-B.

--8<---------------cut here---------------start------------->8---
machine-B$ git annex get sources.json
get sources.json (from web...)
ok
(recording state in git...)

machine-B$ file sources.json
sources.json: symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

As we see, since ‘origin’ is unreachable, it fetches directly from the
web.  Well, on machine-B, running:

    git annex sync && git annex get -A

first updates the keys and then fetches all the new content from
‘origin’.  It eases the maintenance of backups, IMHO.
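
For a periodic job, these two commands could be wrapped as below.  This
is only a sketch: the function name refresh_backup, the DRY_RUN switch
and the path /srv/backup/example are all made up for the illustration;
only the two git annex commands come from the example above.

```shell
#!/bin/sh
# Refresh one backup clone: update the keys, then fetch all the content.
# With DRY_RUN=1 the commands are only printed, not executed.
refresh_backup() {
    repo=$1
    for cmd in "git annex sync" "git annex get -A"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "(cd $repo && $cmd)"
        else
            # $cmd is left unquoted on purpose so it splits into words.
            (cd "$repo" && $cmd) || return 1
        fi
    done
}

# Dry run on a hypothetical clone: show what would be executed.
DRY_RUN=1
refresh_backup /srv/backup/example
```

Stopping at the first failure keeps the job from trying to fetch content
with stale keys when the sync did not go through.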

The main advantages are: everything is versioned thanks to Git, and what
is stored locally is finely controlled.

Well, if some motivated Haskeller found it fun to implement NAR as a
backend, it would allow transparent substitution; from my understanding,
if the key contained the NAR hash, then it would be possible to bridge
with the Guix content-addressed system. :-)

Cheers,
simon


1: https://git-annex.branchable.com/special_remotes/
2: https://git-annex.branchable.com/backends/
3: https://git-annex.branchable.com/internals/key_format/