From: Simon Tournier
To: Ludovic Courtès, guix-sysadmin
Cc: guix-devel@gnu.org
Subject: Re: Git-LFS or Git Annex?
In-Reply-To: <87mssuu57m.fsf@inria.fr>
Date: Thu, 25 Jan 2024 17:55:11 +0100
Message-ID: <87frylpd3k.fsf@gmail.com>

Hi Ludo, all,

On Wed, 24 Jan 2024 at 16:22, Ludovic Courtès wrote:

> The question boils down to: Git-LFS or Git Annex?

Some months ago, I looked into this question for managing some datasets.
My conclusion is Git-Annex.

The main drawback of Git-LFS is that the server needs to support the
protocol. On the Git-Annex side, the main drawback is Haskell.

Haskell might seem like a detail, but it is not when considering
architectures other than x86_64. Have a look at the CI, filtering with
’ghc-’:

    http://ci.guix.gnu.org/eval/1074397/dashboard?system=i686-linux

Here I picked i686 as an example to make the point about Haskell support
on non-x86_64 architectures. And I am not even talking about the
resources that compiling Haskell requires.
Don’t get me wrong: it is not a roadblock, but let’s keep in mind that
Git-Annex comes with limitations because of Haskell.

That said, Git-Annex seems well suited to the workflow you describe:
backing up large files across various servers. And it would act as a
bridge between content and address.

However, the content still needs to be stored on some servers, IMHO.
Git-Annex supports “special remotes” [1], but it is not clear to me
whether the aim is to distribute the workload between the two main
servers or just to ease the maintenance of backups.

Last, you mention content addressing, and this part is not clear to me.
In Git-Annex, you have on the one hand Git’s content-addressed system
and on the other hand the “key-value backends” [2]. Roughly, Git-Annex
stores the key in a file that is tracked by Git itself, while the value
is stored outside of Git.

Recently, support for Git-LFS was added to git-download with commit
a4db19d8e07eeb26931edfde0f0e6bca4e0448d3. In that context, with content
addressing in mind, are you proposing to add Git-Annex support and thus
distribute the videos as substitutes, which would probably also ease the
maintenance of backups? Or is the question unrelated?

On a side note, depending on the size of the videos, it is only possible
to use non-cryptographic backends, such as URL.

All that said, let’s make this concrete with a simple example: syncing
content between machine-A and machine-B, where the original content is
also kept elsewhere. Let’s create a Git repository with an annexed file.

--8<---------------cut here---------------start------------->8---
machine-A$ mkdir example && cd example
machine-A$ git init && git annex init
machine-A$ git annex addurl -b MD5 --file sources.json \
               https://guix.gnu.org/sources.json
addurl https://guix.gnu.org/sources.json
(to sources.json) ok
(recording state in git...)
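# Aside, not part of the original session (a sketch): with the MD5
# backend, the key should be MD5-s<size in bytes>--<md5 of the content>
# (see the key format [3]), so it can be recomputed with coreutils alone.
# The input redirections follow the symlink to the annexed content; the
# printed key should match the one in the link target shown below.
machine-A$ printf 'MD5-s%s--%s\n' "$(wc -c < sources.json | tr -d ' ')" "$(md5sum < sources.json | cut -d' ' -f1)"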
machine-A$ file sources.json
sources.json: symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
machine-A$ git annex add .
machine-A$ git commit -am 'Add sources.json'
[master (root-commit) bdf6bca] Add sources.json
 1 file changed, 1 insertion(+)
 create mode 120000 sources.json
--8<---------------cut here---------------end--------------->8---

Let’s back it up.

--8<---------------cut here---------------start------------->8---
machine-B$ git clone file:///tmp/example backup && cd backup/
machine-B$ file sources.json
sources.json: broken symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

As you can see, nothing is really copied here: there is only a symbolic
link pointing to some content outside of what Git tracks.

--8<---------------cut here---------------start------------->8---
machine-B$ guix hash -rx .
0x8kiaprmjq6f02pdq155wlznxhzi871mk0la6sp944q854pcpn5
machine-B$ git annex get sources.json
get sources.json (from origin...)
ok
(recording state in git...)
machine-B$ guix hash -rx .
0x8kiaprmjq6f02pdq155wlznxhzi871mk0la6sp944q854pcpn5
machine-B$ file sources.json
sources.json: symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

Note that the hash is the same before and after ’git annex get’: the
checkout only contains the symbolic link, whose target does not change,
and the annexed content itself lives under .git/, which ’-x’ excludes.

Let’s remove the file on machine-B, for whatever reason.

--8<---------------cut here---------------start------------->8---
machine-B$ git annex drop sources.json
drop sources.json ok
(recording state in git...)
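# Aside, not part of the original session: the drop only succeeds because
# git-annex first checks that enough other copies exist (numcopies, 1 by
# default; here origin holds one). The location log still records where
# the content lives, and 'git annex whereis' lists those locations
# (presumably origin and the web in this example).
machine-B$ git annex whereis sources.json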
machine-B$ file sources.json
sources.json: broken symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

Now assume that machine-A is unreachable. Let’s get the file again on
machine-B.

--8<---------------cut here---------------start------------->8---
machine-B$ git annex get sources.json
get sources.json (from web...)
ok
(recording state in git...)
machine-B$ file sources.json
sources.json: symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

As we can see, since ’origin’ is unreachable, it fetches the content
directly from the web.

Well, running on machine-B:

    git annex sync && git annex get -A

first updates the keys and then fetches all the new content from
’origin’. IMHO, it eases the maintenance of backups.

The main advantages are: everything is versioned thanks to Git, and what
is stored locally is finely controlled.

Well, if some motivated Haskeller found it fun to implement NAR as a
backend, it would allow transparent substitution; from my understanding,
if the key contained the NAR hash, then it would be possible to bridge
with Guix’s content-addressed system. :-)

Cheers,
simon

1: https://git-annex.branchable.com/special_remotes/
2: https://git-annex.branchable.com/backends/
3: https://git-annex.branchable.com/internals/key_format/