all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Simon Tournier <zimon.toutoune@gmail.com>
To: "Ludovic Courtès" <ludo@gnu.org>, guix-sysadmin <guix-sysadmin@gnu.org>
Cc: guix-devel@gnu.org
Subject: Re: Git-LFS or Git Annex?
Date: Thu, 25 Jan 2024 17:55:11 +0100	[thread overview]
Message-ID: <87frylpd3k.fsf@gmail.com> (raw)
In-Reply-To: <87mssuu57m.fsf@inria.fr>

Hi Ludo, all,

On mer., 24 janv. 2024 at 16:22, Ludovic Courtès <ludo@gnu.org> wrote:

> The question boils down to: Git-LFS or Git Annex?

Some months ago, I gave a look for managing some datasets.  My
conclusion is Git-Annex.  The main drawback of Git-LFS is that the
server needs to support the protocol.  On Git-Annex side, the main
drawback is Haskell.

Haskell could seem a detail but it is not when considering other
architectures than x86_64.  Give a look to CI filtering with ’ghc-’:

    http://ci.guix.gnu.org/eval/1074397/dashboard?system=i686-linux

Here I pick i686 as an example for making the point of the Haskell
support of non-x86_64.  Aside, I do not speak about the resources that
Haskell requires for being compiled.

Do not take me wrong: it does not mean that’s a roadblock but let keep
that in mind: Git-Annex comes with limitations because of Haskell.

That’s said, Git-Annex seems adapted for the workflow you describe:
backup large files between various servers.  And it would be a bridge
between content and address.  However, the content still needs to be
stored on some servers, IMHO.  Git-Annex supports “special remotes” [1]
but it is not clear for me if the aim is to distribute the workload
between the two main servers or if the aim is just to ease the
maintenance of backups.

Last, you speak about content-addressed and this part is not clear for
me.  In Git-Annex, you have in one hand the Git content-addressed system
and in the other hand the “key-value backends“ [2].  Somehow, Git-Annex
stores the key in a file that is stored in Git itself and the value is
somehow stored outside Git itself.

Recently, support of Git-LFS had been added to git-download with
a4db19d8e07eeb26931edfde0f0e6bca4e0448d3.  In that context, with
content-addressed in mind, are you speaking to add Git-Annex support and
thus distribute the videos as substitutes; probably also easing the
maintenance of backups.  Or is the question unrelated?

On a side note, depending on the size of the videos, it is only possible
to use non-cryptograpgically backends as URL.

All that said, let fix the ideas: a simple example, sync content between
machine-A and machine-B where original content is also kept elsewhere.

Let create a Git repository with a file annexed.

--8<---------------cut here---------------start------------->8---
machine-A$ mkdir example && cd example
machine-A$ git init && git annex init

machine-A$ $ git annex addurl -b MD5 --file sources.json \
                 https://guix.gnu.org/sources.json
addurl https://guix.gnu.org/sources.json 
(to sources.json) ok
(recording state in git...)

machine-A$ file sources.json
sources.json: symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a

machine-A$ git annex add .
machine-A$ git commit -am 'Add sources.json'
[master (root-commit) bdf6bca] Add sources.json
 1 file changed, 1 insertion(+)
 create mode 120000 sources.json
--8<---------------cut here---------------end--------------->8---

Let’s backup.

--8<---------------cut here---------------start------------->8---
machine-B$ $ git clone file:///tmp/example backup && cd backup/

machine-B$ file sources.json 
sources.json: broken symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

As you see, here nothing is really copied.  It is only a symbolic link
pointing to some content outside what Git trackes.

--8<---------------cut here---------------start------------->8---
machine-B$ guix hash -rx .
0x8kiaprmjq6f02pdq155wlznxhzi871mk0la6sp944q854pcpn5

machine-B$ git annex get sources.json
get sources.json (from origin...) 
ok
(recording state in git...)

machine-B$ guix hash -rx .
0x8kiaprmjq6f02pdq155wlznxhzi871mk0la6sp944q854pcpn5

machine-B$ file sources.json
sources.json: symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

Let’s remove the file on machine-B; for whatever reason.

--8<---------------cut here---------------start------------->8---
machine-B$ git annex drop sources.json
drop sources.json ok
(recording state in git...)

machine-B$ file sources.json
sources.json: broken symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

And assume that machine-A is now unreachable.   Let’s get again on
machine-B.

--8<---------------cut here---------------start------------->8---
machine-B$ git annex get sources.json
get sources.json (from web...) 
ok
(recording state in git...)

machine-B$ file sources.json
sources.json: symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

As we see, since ’origin’ is unreachable, it fetches directly from the
web.  Well, on machine-B running:

    git annex sync && git annex get -A

allows to first update the keys and then to fetch all the new content
from ’origin’.  It eases the maintenance of backups, IMHO.

The main advantages are: all is versioned thanks to Git and what is
locally stored is fine-controlled.

Well, if some motivated Haskeller would find fun to implement NAR as
backend, it would allow transparent substitution; from my understanding,
if the key contains NAR hash then it would be possible to bridge with
Guix content-addressed system. :-)

Cheers,
simon


1: https://git-annex.branchable.com/special_remotes/
2: https://git-annex.branchable.com/backends/
3: https://git-annex.branchable.com/internals/key_format/


  parent reply	other threads:[~2024-01-25 16:57 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-24 15:22 Git-LFS or Git Annex? Ludovic Courtès
2024-01-24 16:13 ` indieterminacy
2024-01-24 17:39 ` Giovanni Biscuolo
2024-01-28 10:33   ` Nicolas Graves via Development of GNU Guix and the GNU System distribution.
2024-01-28 11:32     ` Philip McGrath
2024-01-28 17:32     ` Giovanni Biscuolo
2024-01-29 11:39       ` Nicolas Graves via Development of GNU Guix and the GNU System distribution.
2024-01-24 18:41 ` pukkamustard
2024-01-24 20:32   ` Troy Figiel
2024-01-25 12:03   ` Giovanni Biscuolo
2024-01-25 16:55 ` Simon Tournier [this message]
2024-01-26  2:20   ` Kyle Meyer
2024-01-26 10:02     ` Simon Tournier
2024-01-27 16:59       ` Timothy Sample
2024-01-27 17:47         ` Kyle Meyer
2024-02-14 15:18         ` Simon Tournier
2024-01-27  4:31 ` Philip McGrath
2024-01-28 17:37 ` Efraim Flashner
2024-02-02 16:46 ` Christine Lemmer-Webber

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87frylpd3k.fsf@gmail.com \
    --to=zimon.toutoune@gmail.com \
    --cc=guix-devel@gnu.org \
    --cc=guix-sysadmin@gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.