unofficial mirror of guix-devel@gnu.org 
* Git-LFS or Git Annex?
@ 2024-01-24 15:22 Ludovic Courtès
  2024-01-24 16:13 ` indieterminacy
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Ludovic Courtès @ 2024-01-24 15:22 UTC (permalink / raw)
  To: guix-sysadmin; +Cc: guix-devel

Hello!

I’m looking for ways to incorporate videos into the repositories of our
web sites so they’re content-addressed and properly tracked, and to make
it easier to create backups (right now those videos are stored on our
two main servers and rsynced between them⁰; I’m talking about the videos
at guix.gnu.org, 10years.guix.gnu.org, and hpc.guix.info).

The question boils down to: Git-LFS or Git Annex?

From a quick look (I haven’t used them), Git-LFS seems to assume a
rather centralized model where there’s an LFS server sitting next to the
Git server¹.  Git Annex looks more decentralized, allowing you to have
several “remotes”, to check the status of each one, to sync them, etc.²
Because of this, Git Annex seems to be a better fit.

Data point: guix.gnu.org source is hosted on Savannah, which doesn’t
support Git-LFS; the two other web sites above are hosted on GitLab
instances, which I think do support Git-LFS.

What’s your experience?  What would you suggest?

Thanks,
Ludo’.

⁰ https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/berlin.scm#n193
¹ https://github.com/git-lfs/git-lfs/wiki/Tutorial
² https://git-annex.branchable.com/walkthrough/



* Re: Git-LFS or Git Annex?
  2024-01-24 15:22 Git-LFS or Git Annex? Ludovic Courtès
@ 2024-01-24 16:13 ` indieterminacy
  2024-01-24 17:39 ` Giovanni Biscuolo
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: indieterminacy @ 2024-01-24 16:13 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-sysadmin, guix-devel

Hello,

On 24-01-2024 16:22, Ludovic Courtès wrote:
> Hello!
> 
> I’m looking for ways to incorporate videos into the repositories of our
> web sites so they’re content-addressed and properly tracked, and to make
> it easier to create backups (right now those videos are stored on our
> two main servers and rsynced between them⁰; I’m talking about the videos
> at guix.gnu.org, 10years.guix.gnu.org, and hpc.guix.info).
> 
> The question boils down to: Git-LFS or Git Annex?
> 
> From a quick look (I haven’t used them), Git-LFS seems to assume a
> rather centralized model where there’s an LFS server sitting next to the
> Git server¹.  Git Annex looks more decentralized, allowing you to have
> several “remotes”, to check the status of each one, to sync them, etc.²
> Because of this, Git Annex seems to be a better fit.
> 
> Data point: guix.gnu.org source is hosted on Savannah, which doesn’t
> support Git-LFS; the two other web sites above are hosted on GitLab
> instances, which I think do support Git-LFS.
> 
> What’s your experience?  What would you suggest?
> 

In an ideal world I would encourage Guix to run its own PeerTube
instance, so that the aforementioned videos are available within the
Fediverse:
https://joinpeertube.org/
https://docs.joinpeertube.org/

Alas, looking at the list of dependencies makes me wonder how long this 
would take:
https://github.com/Chocobozzz/PeerTube/blob/develop/package.json#L86
https://docs.joinpeertube.org/support/doc/dependencies#other-distributions

> Thanks,
> Ludo’.
> 
> ⁰ https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/berlin.scm#n193
> ¹ https://github.com/git-lfs/git-lfs/wiki/Tutorial
> ² https://git-annex.branchable.com/walkthrough/

It would be nice if there were an alternative to PeerTube written in a
language for which we have more comprehensive packaging than
TypeScript.
I haven't heard of anything; hopefully something will pop up during
FOSDEM/OFFDEM in Brussels.
There should be enough Fediverse technologists at Caldarium (who will be
hosting us for the Guix Days dinner on the Friday);
I shall try to remember to ask people.


-- 
Jonathan McHugh
indieterminacy@libre.brussels



* Re: Git-LFS or Git Annex?
  2024-01-24 15:22 Git-LFS or Git Annex? Ludovic Courtès
  2024-01-24 16:13 ` indieterminacy
@ 2024-01-24 17:39 ` Giovanni Biscuolo
  2024-01-28 10:33   ` Nicolas Graves via Development of GNU Guix and the GNU System distribution.
  2024-01-24 18:41 ` pukkamustard
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Giovanni Biscuolo @ 2024-01-24 17:39 UTC (permalink / raw)
  To: Ludovic Courtès, guix-sysadmin; +Cc: guix-devel


Hi Ludo’

Ludovic Courtès <ludo@gnu.org> writes:

[...]

> The question boils down to: Git-LFS or Git Annex?
>
> From a quick look (I haven’t used them), Git-LFS seems to assume a
> rather centralized model where there’s an LFS server sitting next to the
> Git server¹.  Git Annex looks more decentralized, allowing you to have
> several “remotes”, to check the status of each one, to sync them, etc.²
> Because of this, Git Annex seems to be a better fit.

I've never used Git-LFS for my media repository (and will never use it,
never).

AFAIK these two advantages of git-annex vs Git-LFS are still valid today:

--8<---------------cut here---------------start------------->8---

A major advantage of git annex is that you can choose which file you
want to download.

You still know which files are available thanks to the symlinks.

For example suppose that you have a directory full of ISO files. You can
list the files, then decide which one you want to download by typing:
git annex get my_file.

Another advantage is that the files are not duplicated in your
checkout. With LFS, lfs files are present as git objects both in
.git/lfs/objects and in your working repository. So if you have 20 GB of
LFS files, you need 40 GB on your disk. While with git annex, files are
symlinked so in this case only 20 GB is required.

--8<---------------cut here---------------end--------------->8---
(https://stackoverflow.com/a/43277071, 2018-10-23)

So, AFAIU, with Git-LFS you can have all your media or no media, you
cannot selectively choose what media to get.

Another important limitation of Git-LFS is that you cannot delete
(remotely stored) objects [1]; with git-annex it is very easy.

> Data point: guix.gnu.org source is hosted on Savannah, which doesn’t
> support Git-LFS;

To host a Git-LFS service, a Git-LFS server implementation (one that
replies to the Git-LFS API) is needed:
https://github.com/git-lfs/git-lfs/wiki/Implementations

AFAIU we don't have one packaged (and I'd rather save the precious time
it would take to package one of them).

AFAIK Savannah does not support git-annex either, so we need to set up a
Guix git-annex server [3]; I suggest using gitolite [4].  I can help with
this task if needed!

> the two other web sites above are hosted on GitLab instances, which I
> think do support Git-LFS.

Yes, Git-LFS is supported on GitLab.com and has been included in the
Community Edition [2] since late 2015.

git-annex repository support was available on GitLab.com in 2015/16 but
was removed in 2017 [5].

> What’s your experience?  What would you suggest?

I've no experience with Git-LFS (and will never have) but from what I
read I definitely suggest git-annex: it's more efficient, it's more
flexible, can be hosted everywhere with a little bit of effort... can be
hosted on a Guix System host! :-)

As a bonus, git-annex has plenty of super cool features that will
make us very happy, e.g.:

- special remotes: https://git-annex.branchable.com/special_remotes/
  (including rclone
  https://git-annex.branchable.com/special_remotes/rclone/)

- location tracking
  (https://git-annex.branchable.com/location_tracking/)

- manage metadata of annexed files

HTH! Gio'

> Thanks,
> Ludo’.
>
> ⁰ https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/berlin.scm#n193
> ¹ https://github.com/git-lfs/git-lfs/wiki/Tutorial
> ² https://git-annex.branchable.com/walkthrough/


[1] https://github.com/git-lfs/git-lfs/wiki/Limitations

[2] GitLab Community Edition

[3]
https://git-annex.branchable.com/tips/centralized_git_repository_tutorial/on_your_own_server/

[4] https://git-annex.branchable.com/tips/using_gitolite_with_git-annex/

[5] https://about.gitlab.com/blog/2015/02/17/gitlab-annex-solves-the-problem-of-versioning-large-binaries-with-git/

-- 
Giovanni Biscuolo

Xelera IT Infrastructures


* Re: Git-LFS or Git Annex?
  2024-01-24 15:22 Git-LFS or Git Annex? Ludovic Courtès
  2024-01-24 16:13 ` indieterminacy
  2024-01-24 17:39 ` Giovanni Biscuolo
@ 2024-01-24 18:41 ` pukkamustard
  2024-01-24 20:32   ` Troy Figiel
  2024-01-25 12:03   ` Giovanni Biscuolo
  2024-01-25 16:55 ` Simon Tournier
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 19+ messages in thread
From: pukkamustard @ 2024-01-24 18:41 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, guix-sysadmin


Hi!

Ludovic Courtès <ludo@gnu.org> writes:

[..]

> From a quick look (I haven’t used them), Git-LFS seems to assume a
> rather centralized model where there’s an LFS server sitting next to the
> Git server¹.  Git Annex looks more decentralized, allowing you to have
> several “remotes”, to check the status of each one, to sync them, etc.²
> Because of this, Git Annex seems to be a better fit.

I agree that Git Annex seems to be a better fit for the reasons you
list.

> What’s your experience?  What would you suggest?

I've used Git Annex for managing many large files (~100s of GiBs) and it
worked. However, I found Git Annex to be quite complex and to do things
automatically without me fully realizing.

The use case was to use Git Annex to distribute large test vectors. This
involved many Git checkouts and worktrees on quite a few different hosts
- some of them ephemeral. When running `git annex sync`, Git Annex tries
to synchronize the current view of the state (which file is available
where) to all Git remotes, with a lot of git pushing and pulling. It
ended up sharing remotes that no longer existed or were not accessible,
and somehow it was hard/impossible to remove references to those remotes
(afaiu Git Annex remotes can only be marked as "dead" and not removed -
https://git-annex.branchable.com/git-annex-dead/). As the number of such
remotes increased, I became more and more confused.

But this is maybe a special use-case which is not relevant for video
sharing as you describe and probably reflects my inability to understand
how Git Annex works more than anything else.

Still, I would recommend NOT storing the videos in a remote Git
repository but on a publicly accessible rsync server used as a Git Annex
special remote (https://git-annex.branchable.com/special_remotes/).

Cheers,
pukkamustard



* Re: Git-LFS or Git Annex?
  2024-01-24 18:41 ` pukkamustard
@ 2024-01-24 20:32   ` Troy Figiel
  2024-01-25 12:03   ` Giovanni Biscuolo
  1 sibling, 0 replies; 19+ messages in thread
From: Troy Figiel @ 2024-01-24 20:32 UTC (permalink / raw)
  To: guix-devel



Hi all,

On 2024-01-24 19:41, pukkamustard wrote:
> I've used Git Annex for managing many large files (~100s of GiBs) and it
> worked. However, I found Git Annex to be quite complex and to do things
> automatically without me fully realizing.
> 

I can mirror this sentiment. As far as my experience with git-annex
goes, it is a very flexible tool, but difficult to grok.

Although both are FOSS, only git-annex is mostly copyleft with the
(A)GPL3+. git-lfs I believe is copyrighted mainly by two large companies
(GitHub and Atlassian) under a non-copyleft license. Something to keep
in mind at least.

Best wishes,

Troy


* Re: Git-LFS or Git Annex?
  2024-01-24 18:41 ` pukkamustard
  2024-01-24 20:32   ` Troy Figiel
@ 2024-01-25 12:03   ` Giovanni Biscuolo
  1 sibling, 0 replies; 19+ messages in thread
From: Giovanni Biscuolo @ 2024-01-25 12:03 UTC (permalink / raw)
  To: pukkamustard, Ludovic Courtès; +Cc: guix-devel, guix-sysadmin


Hi pukkamustard,

git-annex is complex but not so complicated once you learn the two
fundamental concepts (sorry if I say something obvious to you!):

1. only the names of the files and some other metadata are stored in a
git repository when using git-annex, the content is not; when you "git
annex add some-media" it is (locally!) stored in a special folder named
.git/annex/

2. content can be transferred (get, put) from one repository to another,
and the tool used for the transfer (rsync, cp or curl) is chosen
automatically by git-annex depending on the remote where the data is;
there are also many "special remotes" available for data transfer (see
https://git-annex.branchable.com/walkthrough/#index11h2 for an ssh
git-annex remote)

See https://git-annex.branchable.com/how_it_works/ for a general
description and https://git-annex.branchable.com/internals/ for a
description of the content of each git-annex managed (and reserved)
directory.
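
To make the first concept concrete, here is a small simulation in plain
shell; it does not call git-annex at all.  The key is an arbitrary
example and the object path is flattened (the real layout inserts two
hash directories), so read it as a sketch of the idea rather than of the
actual on-disk format.

```shell
# Simulation only: plain POSIX shell, no git-annex involved.
# The key and the flattened object path are illustrative assumptions.
demo=$(mktemp -d)
key="MD5-s6--b1946ac92492d2347c6235b8d2611184"
object="$demo/.git/annex/objects/$key/$key"

# What Git tracks: a symlink whose target names the key.
ln -s ".git/annex/objects/$key/$key" "$demo/hello.txt"

# Freshly cloned state: the link exists but its content is absent.
if [ -L "$demo/hello.txt" ] && [ ! -e "$demo/hello.txt" ]; then
    before="broken symlink"
fi

# Rough equivalent of "git annex get": content lands in the object store.
mkdir -p "$(dirname "$object")"
printf 'hello\n' > "$object"
after=$(cat "$demo/hello.txt")
echo "$before -> $after"
```

The tracked name never changes; only the presence of the object behind
it does, which is why a clone can list every file without holding any
of the content.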

Just to make it clear, you can have one or more "plain" git remotes just
for location tracking and one or more git-annex remotes (also special
remotes) for file transfers (and location tracking if they are also
regular git remotes).

pukkamustard <pukkamustard@posteo.net> writes:

[...]

> It ended up sharing remotes that are no longer existent or
> not accessible and somehow it was hard/impossible to remove reference
> to those remotes (afaiu Git Annex remotes can only be marked as "dead"
> and not removed -
> https://git-annex.branchable.com/git-annex-dead/). As the number of
> such remotes increased, I became more and more confused.

https://git-annex.branchable.com/git-annex-dead/:
--8<---------------cut here---------------start------------->8---

This command exists to deal with situations where data has been lost,
and you know it has, and you want to stop being reminded of that fact.

When a repository is specified, indicates that the repository has been
irretrievably lost, so it will not be listed in eg, git annex whereis.

--8<---------------cut here---------------end--------------->8---

If you want git-annex to definitely forget about dead repositories
(throwing away historical data about past locations of files) you can
use "git-annex forget --drop-dead"

If you want to remove a remote (and stop syncing with it) you can do it
as you do with any git remote: "git remote rm <remote>"
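
Put together, retiring a lost remote could look like the sketch below.
The function name and its argument are hypothetical, and the commands
are wrapped in a function so that nothing runs until it is called
inside a git-annex repository; check "git annex help forget" for the
exact behaviour of your version before relying on it.

```shell
# Hypothetical helper: retire a remote that is gone for good.
# $1 is the remote name as known to this clone (an assumption of the sketch).
retire_remote() {
    remote=$1
    # Tell git-annex the repository is irretrievably lost.
    git annex dead "$remote"
    # Stop syncing with it, as with any Git remote.
    git remote rm "$remote"
    # Throw away historical location data about dead repositories.
    git annex forget --drop-dead
}

# Example call (not executed here):
#   retire_remote old-test-host
```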

[...]

> Still, I would recommend to NOT store the videos in a remote Git
> repository but a publicly accessible rsync server as a Git Annex
> special remote (https://git-annex.branchable.com/special_remotes/).

Good catch!

This way we can still use the current Savannah git hosted remote (not
supporting git-annex-shell, AFAIK) for location tracking and the same
(or more) rsync servers we are using to store media.
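
A possible setup, sketched under assumptions: the special remote name
"videos" and the rsync URL are placeholders, and the commands are
wrapped in a function so nothing runs until it is called from inside
the website repository.

```shell
# Hypothetical helper; "videos" and the rsync URL are placeholders.
setup_video_remote() {
    rsync_url=$1    # e.g. backup.example.org:/srv/videos
    # Register an rsync special remote; public videos need no encryption.
    git annex initremote videos type=rsync rsyncurl="$rsync_url" encryption=none
    # Upload the annexed content to it.
    git annex copy . --to=videos
}

# Example call (not executed here):
#   setup_video_remote backup.example.org:/srv/videos
# In another clone, once the git-annex branch has been fetched:
#   git annex enableremote videos
```

The plain Git remote keeps carrying the history and the git-annex
branch, while the content itself only ever travels over rsync.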

Thanks! Gio'

-- 
Giovanni Biscuolo

Xelera IT Infrastructures


* Re: Git-LFS or Git Annex?
  2024-01-24 15:22 Git-LFS or Git Annex? Ludovic Courtès
                   ` (2 preceding siblings ...)
  2024-01-24 18:41 ` pukkamustard
@ 2024-01-25 16:55 ` Simon Tournier
  2024-01-26  2:20   ` Kyle Meyer
  2024-01-27  4:31 ` Philip McGrath
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Simon Tournier @ 2024-01-25 16:55 UTC (permalink / raw)
  To: Ludovic Courtès, guix-sysadmin; +Cc: guix-devel

Hi Ludo, all,

On mer., 24 janv. 2024 at 16:22, Ludovic Courtès <ludo@gnu.org> wrote:

> The question boils down to: Git-LFS or Git Annex?

Some months ago, I looked into this for managing some datasets.  My
conclusion is Git-Annex.  The main drawback of Git-LFS is that the
server needs to support the protocol.  On the Git-Annex side, the main
drawback is Haskell.

Haskell may seem a detail, but it is not when considering architectures
other than x86_64.  Have a look at CI, filtering on ’ghc-’:

    http://ci.guix.gnu.org/eval/1074397/dashboard?system=i686-linux

Here I pick i686 as an example to make the point about Haskell support
on non-x86_64.  As an aside, I am not even speaking about the resources
that compiling Haskell requires.

Do not get me wrong: it is not a roadblock, but let’s keep in mind
that Git-Annex comes with limitations because of Haskell.

That said, Git-Annex seems suited to the workflow you describe:
backing up large files between various servers.  And it would be a bridge
between content and address.  However, the content still needs to be
stored on some servers, IMHO.  Git-Annex supports “special remotes” [1]
but it is not clear to me whether the aim is to distribute the workload
between the two main servers or just to ease the maintenance of
backups.

Last, you speak about content-addressed and this part is not clear to
me.  In Git-Annex, you have on the one hand the Git content-addressed
system and on the other hand the “key-value backends” [2].  Git-Annex
stores the key in a file that is tracked by Git itself, while the value
is stored outside Git.

Recently, support for Git-LFS was added to git-download with
a4db19d8e07eeb26931edfde0f0e6bca4e0448d3.  In that context, with
content-addressed in mind, are you speaking about adding Git-Annex
support and thus distributing the videos as substitutes, which would
probably also ease the maintenance of backups?  Or is the question
unrelated?

On a side note, depending on the size of the videos, it is only possible
to use non-cryptographic backends such as URL.

All that said, let’s fix ideas with a simple example: sync content
between machine-A and machine-B where the original content is also kept
elsewhere.

Let’s create a Git repository with an annexed file.

--8<---------------cut here---------------start------------->8---
machine-A$ mkdir example && cd example
machine-A$ git init && git annex init

machine-A$ git annex addurl -b MD5 --file sources.json \
                 https://guix.gnu.org/sources.json
addurl https://guix.gnu.org/sources.json 
(to sources.json) ok
(recording state in git...)

machine-A$ file sources.json
sources.json: symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a

machine-A$ git annex add .
machine-A$ git commit -am 'Add sources.json'
[master (root-commit) bdf6bca] Add sources.json
 1 file changed, 1 insertion(+)
 create mode 120000 sources.json
--8<---------------cut here---------------end--------------->8---

Let’s backup.

--8<---------------cut here---------------start------------->8---
machine-B$ git clone file:///tmp/example backup && cd backup/

machine-B$ file sources.json 
sources.json: broken symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

As you see, nothing is really copied here.  It is only a symbolic link
pointing to some content outside what Git tracks.

--8<---------------cut here---------------start------------->8---
machine-B$ guix hash -rx .
0x8kiaprmjq6f02pdq155wlznxhzi871mk0la6sp944q854pcpn5

machine-B$ git annex get sources.json
get sources.json (from origin...) 
ok
(recording state in git...)

machine-B$ guix hash -rx .
0x8kiaprmjq6f02pdq155wlznxhzi871mk0la6sp944q854pcpn5

machine-B$ file sources.json
sources.json: symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

Let’s remove the file on machine-B; for whatever reason.

--8<---------------cut here---------------start------------->8---
machine-B$ git annex drop sources.json
drop sources.json ok
(recording state in git...)

machine-B$ file sources.json
sources.json: broken symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

And assume that machine-A is now unreachable.   Let’s get again on
machine-B.

--8<---------------cut here---------------start------------->8---
machine-B$ git annex get sources.json
get sources.json (from web...) 
ok
(recording state in git...)

machine-B$ file sources.json
sources.json: symbolic link to .git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

As we see, since ’origin’ is unreachable, it fetches directly from the
web.  Well, on machine-B running:

    git annex sync && git annex get -A

first updates the keys and then fetches all the new content
from ’origin’.  It eases the maintenance of backups, IMHO.

The main advantages are: everything is versioned thanks to Git, and
what is stored locally is finely controlled.

Well, if some motivated Haskeller found it fun to implement NAR as a
backend, it would allow transparent substitution; from my understanding,
if the key contains the NAR hash then it would be possible to bridge
with the Guix content-addressed system. :-)

Cheers,
simon


1: https://git-annex.branchable.com/special_remotes/
2: https://git-annex.branchable.com/backends/
3: https://git-annex.branchable.com/internals/key_format/



* Re: Git-LFS or Git Annex?
  2024-01-25 16:55 ` Simon Tournier
@ 2024-01-26  2:20   ` Kyle Meyer
  2024-01-26 10:02     ` Simon Tournier
  0 siblings, 1 reply; 19+ messages in thread
From: Kyle Meyer @ 2024-01-26  2:20 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Ludovic Courtès, guix-sysadmin, guix-devel

Simon Tournier writes:
> As we see, since ’origin’ is unreachable, it fetches directly from the
> web.  Well, on machine-B running:
>
>     git annex sync && git annex get -A
>
> allows to first update the keys and then to fetch all the new content
> from ’origin’.  It eases the maintenance of backups, IMHO.

One sync wrinkle to consider: by default 'git annex sync' does things
like commit staged changes and sync the checked out branch.  That's
useful in some scenarios, but, in the context of these repos, I'm
guessing people would prefer to continue to manually manage the primary
Git history.  You can tack on an --only-annex to that 'git annex sync'
to tell git-annex to just sync its git-annex branch.
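
As a sketch, the backup refresh quoted above would then become the
following.  The function name is hypothetical and the commands are
wrapped in a function so nothing runs until it is called inside the
clone on the backup host.

```shell
# Hypothetical helper for a backup host; call it inside the clone.
refresh_backup() {
    # Sync only the git-annex branch, leaving the checked-out branch
    # and any staged changes alone.
    git annex sync --only-annex
    # Then fetch the annexed content, as in the quoted command.
    git annex get -A
}

# Example call (not executed here):
#   cd /srv/backup/website && refresh_backup
```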

> Well, if some motivated Haskeller would find fun to implement NAR as
> backend, it would allow transparent substitution; from my understanding,
> if the key contains NAR hash then it would be possible to bridge with
> Guix content-addressed system. :-)

Fwiw I think someone could do that outside Haskell, if they preferred,
via a custom backend:

  https://git-annex.branchable.com/design/external_backend_protocol/

Special remotes can also be written in other languages:

  https://git-annex.branchable.com/design/external_special_remote_protocol/



* Re: Git-LFS or Git Annex?
  2024-01-26  2:20   ` Kyle Meyer
@ 2024-01-26 10:02     ` Simon Tournier
  2024-01-27 16:59       ` Timothy Sample
  0 siblings, 1 reply; 19+ messages in thread
From: Simon Tournier @ 2024-01-26 10:02 UTC (permalink / raw)
  To: Kyle Meyer; +Cc: Ludovic Courtès, guix-sysadmin, guix-devel

Hi Kyle,

On jeu., 25 janv. 2024 at 21:20, Kyle Meyer <kyle@kyleam.com> wrote:

> Fwiw I think someone could do that outside Haskell, if they preferred,
> via a custom backend:
>
>   https://git-annex.branchable.com/design/external_backend_protocol/
>
> Special remotes can also be written in other languages:
>
>   https://git-annex.branchable.com/design/external_special_remote_protocol/

Thanks!  I did not know.  Indeed, it could be a nice GSoC to implement
some ‘git-annex-backend-nar’. :-)

Cheers,
simon



* Re: Git-LFS or Git Annex?
  2024-01-24 15:22 Git-LFS or Git Annex? Ludovic Courtès
                   ` (3 preceding siblings ...)
  2024-01-25 16:55 ` Simon Tournier
@ 2024-01-27  4:31 ` Philip McGrath
  2024-01-28 17:37 ` Efraim Flashner
  2024-02-02 16:46 ` Christine Lemmer-Webber
  6 siblings, 0 replies; 19+ messages in thread
From: Philip McGrath @ 2024-01-27  4:31 UTC (permalink / raw)
  To: Ludovic Courtès, guix-sysadmin; +Cc: guix-devel, morganlemmerwebber

Hi,

On 1/24/24 10:22, Ludovic Courtès wrote:
> 
> The question boils down to: Git-LFS or Git Annex?
> 
> [...]
> 
> What’s your experience?  What would you suggest?
> 

I have a few times had a problem for which I thought Git LFS might be a 
solution, and each time I have ended up ripping out Git LFS in 
frustration before long.

I have not used Git Annex. I have looked into it a few times, but each 
time I decided it was too complex or not quite suitable for my use-case 
in some way. On the other hand, I have heard good things about it from 
people who have used it: in particular, I believe Morgan Lemmer-Webber 
(CC'ed) used it to manage a large set of art history images.

The main thing in this context that still isn't clear to me from my
reading so far is how sharing lists of remotes works with Git Annex. In
plain Git, remotes are part of the local state of a particular clone, 
not distributed as part of the repository. For the objectives here, 
though, a lot of the benefit would seem to be having many copies in 
synchronized, possibly "special" remotes so that anyone trying to get 
the videos would have plenty of ways to get them. I'm not sure to what 
extent Git Annex does that out of the box.

I did see that Git Annex can use Git LFS as a "special remote".

There are also two other approaches I think would be worth at least 
considering:

1. Just use Git

While the limitations of Git for storing large media files are well 
known, I have found it to be good enough for several use-cases, and it 
has the strong advantage of not requiring additional tools. My 
impression is that a significant factor in people using Git LFS, in 
particular, is the limit on repository size imposed by the popular 
hosting providers. There are strategies within Git to avoid having to 
download unwanted artifacts, including creating branches with unrelated 
histories, shallow clones (e.g. --depth=1 --single-branch), partial 
clones [1][2][3] (e.g. --filter=blob:none), and sparse checkouts [4][5], 
with the latter two being fairly new features.

[1]: https://git-scm.com/docs/partial-clone
[2]: 
https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---filterltfilter-specgt
[3]: 
https://git-scm.com/docs/git-rev-list#Documentation/git-rev-list.txt---filterltfilter-specgt
[4]: https://git-scm.com/docs/git-sparse-checkout
[5]: https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---sparse
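
For illustration, here is a self-contained sketch that combines a
partial clone with a sparse checkout against a throwaway local
repository.  All paths and file names are made up, and setting
uploadpack.allowFilter on the source side is an assumption about what a
real server would need to permit.

```shell
# Sketch with a throwaway local repository; every name here is made up.
work=$(mktemp -d)
git init -q "$work/site"
mkdir -p "$work/site/videos" "$work/site/pages"
printf 'pretend this is a large video\n' > "$work/site/videos/talk.webm"
printf 'hello\n' > "$work/site/pages/index.html"
git -C "$work/site" add .
git -C "$work/site" -c user.name=demo -c user.email=demo@example.org \
    commit -q -m 'Add pages and a video'
# The serving side has to allow filters for partial clones to take effect.
git -C "$work/site" config uploadpack.allowFilter true

# Partial clone: no blobs up front.  Sparse: only top-level files at first.
git clone -q --filter=blob:none --sparse "file://$work/site" "$work/clone"
# Ask for the pages directory only; its blobs are fetched on demand.
git -C "$work/clone" sparse-checkout set pages
ls "$work/clone"
```

The videos directory stays out of the working tree, and with the blob
filter its content is not downloaded until something asks for it.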

2. Mirror URLs

Another approach would be just to make each video available at a few 
URLs and have Guix origins with the list. If one of the available URLs 
were the Internet Archive, it would have a high degree of assurance of 
long-term preservation. I think the biggest downside is that this might 
not help much with managing the collection of videos.

Philip



* Re: Git-LFS or Git Annex?
  2024-01-26 10:02     ` Simon Tournier
@ 2024-01-27 16:59       ` Timothy Sample
  2024-01-27 17:47         ` Kyle Meyer
  2024-02-14 15:18         ` Simon Tournier
  0 siblings, 2 replies; 19+ messages in thread
From: Timothy Sample @ 2024-01-27 16:59 UTC (permalink / raw)
  To: Simon Tournier
  Cc: Kyle Meyer, Ludovic Courtès, guix-sysadmin, guix-devel

Simon Tournier <zimon.toutoune@gmail.com> writes:

>> Special remotes can also be written in other languages:
>>
>>   https://git-annex.branchable.com/design/external_special_remote_protocol/
>
> Thanks!  I did not know.  Indeed, it could be a nice GSoC to implement
> some ‘git-annex-backend-nar’. :-)

I’ve written a special remote in Guile.  If anyone wants to do so, the
following file might help.  It implements the basic protocol.

https://git.ngyro.com/git-annex-remote-clouda/tree/git-annex-remote-clouda/remote.scm

More generally, I’m a fan of git-annex.  The basic idea is both good and
simple: manage content-addressed, external references in a Git repo.  My
biggest complaint is that the interface and its many features often
obscure this simplicity.


-- Tim



* Re: Git-LFS or Git Annex?
  2024-01-27 16:59       ` Timothy Sample
@ 2024-01-27 17:47         ` Kyle Meyer
  2024-02-14 15:18         ` Simon Tournier
  1 sibling, 0 replies; 19+ messages in thread
From: Kyle Meyer @ 2024-01-27 17:47 UTC (permalink / raw)
  To: Timothy Sample
  Cc: Simon Tournier, Ludovic Courtès, guix-sysadmin, guix-devel

Timothy Sample writes:

> I’ve written a special remote in Guile.  If anyone wants to do so, the
> following file might help.  It implements the basic protocol.
>
> https://git.ngyro.com/git-annex-remote-clouda/tree/git-annex-remote-clouda/remote.scm

Looks like a great reference.  Thanks for sharing.

(And thanks for doing the initial work packaging git-annex years ago!  I
remember being very excited to see it because I had tried multiple times
and kept getting stuck.)



* Re: Git-LFS or Git Annex?
  2024-01-24 17:39 ` Giovanni Biscuolo
@ 2024-01-28 10:33   ` Nicolas Graves via Development of GNU Guix and the GNU System distribution.
  2024-01-28 11:32     ` Philip McGrath
  2024-01-28 17:32     ` Giovanni Biscuolo
  0 siblings, 2 replies; 19+ messages in thread
From: Nicolas Graves via Development of GNU Guix and the GNU System distribution. @ 2024-01-28 10:33 UTC (permalink / raw)
  To: Giovanni Biscuolo, Ludovic Courtès, guix-sysadmin; +Cc: guix-devel


I've left git-annex for git-lfs, I'll just add a few points about
git-lfs.


On 2024-01-24 18:39, Giovanni Biscuolo wrote:

> Hi Ludo’
>
> Ludovic Courtès <ludo@gnu.org> writes:
>
> [...]
>
>> The question boils down to: Git-LFS or Git Annex?
>>
>> From a quick look (I haven’t used them), Git-LFS seems to assume a
>> rather centralized model where there’s an LFS server sitting next to the
>> Git server¹.  Git Annex looks more decentralized, allowing you to have
>> several “remotes”, to check the status of each one, to sync them, etc.²
>> Because of this, Git Annex seems to be a better fit.

This is not always true. Git-LFS also has the concept of Custom Transfer
Agents, which in some cases do not need a running server. One example is
lfs-folderstore, which can simply use a remote directory as an LFS remote.

>
> I've never used Git-LFS for my media repository (and will never use it,
> never).
>
> AFAIK these two advantages of git-annex vs Git-LFS are still valid today:
>
> --8<---------------cut here---------------start------------->8---
>
> A major advantage of git annex is that you can choose which file you
> want to download.
>
> You still know which files are available thanks to the symlinks.
>
> For example suppose that you have a directory full of ISO files. You can
> list the files, then decide which one you want to download by typing:
> git annex get my_file.

This is true, but
1) you can still adapt your filters to ignore certain files; it's more
inconvenient, but not impossible
2) in practice, I think most uses don't need to. I just know that all .lz
files in a directory go to LFS, no questions asked.
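
A minimal sketch of what such a setup could look like, assuming only
plain git is installed. The .gitattributes line is the form that
`git lfs track "*.lz"` writes; the excluded directory name "videos/raw"
and the file name "talk.lz" are made up for illustration.

```shell
#!/bin/sh
set -eu

# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Route every .lz file through the LFS filter.
printf '%s\n' '*.lz filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# Ask git-lfs not to fetch objects under one (hypothetical) directory.
git config lfs.fetchexclude 'videos/raw'

# Show which filter git would apply to a .lz file.
git check-attr filter -- talk.lz
```

With git-lfs installed, something like `git lfs pull --include="some/path"`
should then fetch only the matching objects on demand.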

>
> Another advantage is that the files are not duplicated in your
> checkout. With LFS, lfs files are present as git objects both in
> .git/lfs/objects and in your working repository. So If you have 20 GB of
> LFS files, you need 40 GB on your disk. While with git annex, files are
> symlinked so in this case only 20 GB is required.

True.
>
> --8<---------------cut here---------------end--------------->8---
> (https://stackoverflow.com/a/43277071, 2018-10-23)
>
> So, AFAIU, with Git-LFS you can have all your media or no media, you
> cannot selectively choose what media to get.
>
> Another important limitation of Git-LFS is that you cannot delete
> (remotely stored) objects [1], with git-annex is very easy.

Probably true, haven't encountered the use-case yet.
>
>> Data point: guix.gnu.org source is hosted on Savannah, which doesn’t
>> support Git-LFS;
>
> to host a Git-LFS service a Git-LFS server implementation (one that
> reply to GIT_LFS API) is needed:
> https://github.com/git-lfs/git-lfs/wiki/Implementations

See my point on custom transfer agents.
>
> AFAIU we dont't have one packaged (I'd save some precious time trying to
> package one of them).
>
> AFAIK Savannah do not support git-annex also, so we need to set up a
> Guix git-annex server [3], I suggest using gitolite [4]: I can help with
> this task if needed!
>
>> the two other web sites above are hosted on GitLab instances, which I
>> think do support Git-LFS.
>
> Yes, Git-LFS is supported on GitLab.com and included in the Community
> Edition [2] since late 2015.
>
> git-annex repository support was available on GitLab.com in 2015/16 but
> was removed in 2017 [5]
>
>> What’s your experience?  What would you suggest?
>
> I've no experience with Git-LFS (and will never have) but from what I
> read I definitely suggest git-annex: it's more efficient, it's more
> flexible, can be hosted everywhere with a little bit of effort... can be
> hosted on a Guix System host! :-)
>
> As a bonus, git-annex have a plenty of super cool features that will
> make us very happy, i.e.:
>
> - special remotes: https://git-annex.branchable.com/special_remotes/
>   (including rclone
>   https://git-annex.branchable.com/special_remotes/rclone/)
>
> - location tracking
>   (https://git-annex.branchable.com/location_tracking/)
>
> - manage metadata of annexed files
>
> HTH! Gio'

Just a note on the upsides of Git-LFS:
- Integration with git is better. No special magit extension is needed
to use git-lfs, whereas one is needed for git-annex.
- Fewer operations: once I know which files will be my media files, I
have fewer headaches (basically the exact git experience; you don't have
to think about whether to `git add` or `git annex add` a file).

It's indeed less copyleft though. Simpler, but also maybe less suited
to this use case.

>
>> Thanks,
>> Ludo’.
>>
>> ⁰ https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/berlin.scm#n193
>> ¹ https://github.com/git-lfs/git-lfs/wiki/Tutorial
>> ² https://git-annex.branchable.com/walkthrough/
>
>
> [1] https://github.com/git-lfs/git-lfs/wiki/Limitations
>
> [2] GitLab Community Edition
>
> [3]
> https://git-annex.branchable.com/tips/centralized_git_repository_tutorial/on_your_own_server/
>
> [4] https://git-annex.branchable.com/tips/using_gitolite_with_git-annex/
>
> [5] https://about.gitlab.com/blog/2015/02/17/gitlab-annex-solves-the-problem-of-versioning-large-binaries-with-git/

-- 
Best regards,
Nicolas Graves



* Re: Git-LFS or Git Annex?
  2024-01-28 10:33   ` Nicolas Graves via Development of GNU Guix and the GNU System distribution.
@ 2024-01-28 11:32     ` Philip McGrath
  2024-01-28 17:32     ` Giovanni Biscuolo
  1 sibling, 0 replies; 19+ messages in thread
From: Philip McGrath @ 2024-01-28 11:32 UTC (permalink / raw)
  To: Nicolas Graves, Giovanni Biscuolo, Ludovic Courtès,
	guix-sysadmin
  Cc: Brian Cully

Hi,

On Sun, Jan 28, 2024, at 5:33 AM, Nicolas Graves via Development of GNU Guix and the GNU System distribution. wrote:
> I've left git-annex for git-lfs, I'll just add a few points about
> git-lfs.
>
>
> On 2024-01-24 18:39, Giovanni Biscuolo wrote:
>
>> Hi Ludo’
>>
>> Ludovic Courtès <ludo@gnu.org> writes:
>>
>> [...]
>>
>>> The question boils down to: Git-LFS or Git Annex?
>>>
>>> From a quick look (I haven’t used them), Git-LFS seems to assume a
>>> rather centralized model where there’s an LFS server sitting next to the
>>> Git server¹.  Git Annex looks more decentralized, allowing you to have
>>> several “remotes”, to check the status of each one, to sync them, etc.²
>>> Because of this, Git Annex seems to be a better fit.
>
> This is not always true. Git-LFS also has the concept of Custom Transfer
> Agents, which in some cases do not need a running server. One example is
> lfs-folderstore, which can simply use a remote directory as a LFS remote.
>

This is very interesting and could have me look at Git LFS again.

>>
>> I've never used Git-LFS for my media repository (and will never use it,
>> never).
>>
>> AFAIK this two advantages of git-annex vs Git-LFS are still valid today:
>>
>> --8<---------------cut here---------------start------------->8---
>>
>> A major advantage of git annex is that you can choose which file you
>> want to download.
>>
>> You still know which files are available thanks to the symlinks.
>>
>> For example suppose that you have a directory full of ISO files. You can
>> list the files, then decide which one you want to download by typing:
>> git annex get my_file.
>
> This is true, but
> 1) you can still adapt your filters to ignore certain files, although
> more inconvenient, it's not impossible
> 2) in practice, I think most uses don't need to. I just know that all .lz
> files in a directory go to LFS, no questions asked.
>

I think you could probably use the fairly new “sparse checkout” feature of Git to get only some Git LFS files.

>>
>> Another advantage is that the files are not duplicated in your
>> checkout. With LFS, lfs files are present as git objects both in
>> .git/lfs/objects and in your working repository. So If you have 20 GB of
>> LFS files, you need 40 GB on your disk. While with git annex, files are
>> symlinked so in this case only 20 GB is required.
>
> True.

This raises a question for me about Git Annex: if the files are symlinks, if I edit a file, is the change detected and tracked? Could the old version of the file potentially be lost, if I don’t take care to have it synced elsewhere before editing?

Philip



* Re: Git-LFS or Git Annex?
  2024-01-28 10:33   ` Nicolas Graves via Development of GNU Guix and the GNU System distribution.
  2024-01-28 11:32     ` Philip McGrath
@ 2024-01-28 17:32     ` Giovanni Biscuolo
  2024-01-29 11:39       ` Nicolas Graves via Development of GNU Guix and the GNU System distribution.
  1 sibling, 1 reply; 19+ messages in thread
From: Giovanni Biscuolo @ 2024-01-28 17:32 UTC (permalink / raw)
  To: Nicolas Graves, guix-sysadmin; +Cc: guix-devel


Hi Nicolas,

Nicolas Graves <ngraves@ngraves.fr> writes:

[...]

> This is not always true. Git-LFS also has the concept of Custom Transfer
> Agents, which in some cases do not need a running server. One example is
> lfs-folderstore, which can simply use a remote directory as a LFS
> remote.

Thanks, I didn't know about custom transfer agents; their use without an
API server is documented here:

--8<---------------cut here---------------start------------->8---

In some cases the transfer agent can figure out by itself how and where
the transfers should be made, without having to query the API server. In
this case it's possible to use the custom transfer agent directly,
without querying the server, by using the following config option:
 
 lfs.standalonetransferagent, lfs.<url>.standalonetransferagent

Specifies a custom transfer agent to be used if the API server URL
matches as in "git config --get-urlmatch lfs.standalonetransferagent
<apiurl>". git-lfs will not contact the API server. It instead sets
stage 2 transfer actions to null. "lfs.<url>.standalonetransferagent"
can be used to configure a custom transfer agent for individual
remotes. "lfs.standalonetransferagent" unconditionally configures a
custom transfer agent for all remotes. The custom transfer agent must be
specified in a "lfs.customtransfer.<name>" settings group.

--8<---------------cut here---------------end--------------->8---
(https://github.com/git-lfs/git-lfs/blob/main/docs/custom-transfers.md#using-a-custom-transfer-type-without-the-api-server)

some examples:

1. git-lfs-agent-scp: A custom transfer agent for git-lfs that uses scp
   to transfer files. This transfer agent makes it possible to use
   git-lfs in situations where the remote only speaks ssh. This is
   useful if you do not want to install a git-lfs server. (MIT license,
   written in C, URL: https://github.com/tdons/git-lfs-agent-scp)

2. git-lfs-rsync-agent: The rsync git-lfs custom transfer agent allows
   transferring the data through rsync, for example using SSH
   authentication. (MIT license, written in Go, URL:
   https://github.com/excavador/git-lfs-rsync-agent)

3. git-lfs-agent-scp-bash: A custom transfer agent for git-lfs that uses
   scp to transfer files. This is a self-contained bash script designed
   for seamless installation, requiring no prerequisites with the
   exception of the external command scp. It enables to use git-lfs even
   if you can not use http/https but ssh only. (MIT License, written in
   bash, URL: https://github.com/yoshimoto/git-lfs-agent-scp-bash)

So yes: we could use git-lfs without a git-lfs server and set an rsync
or scp transfer agent for each remote (documenting it for users, since
this must be done client-side)

It's not at all as powerful as the location tracking features of
git-annex but... doable :-)
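
A minimal sketch of that client-side configuration, using only the
config keys quoted from the git-lfs documentation above. The agent
label "rsync-agent", the binary path and the argument string are
placeholders, not the real options of any of the listed agents; check
each agent's README for what it actually expects.

```shell
#!/bin/sh
set -eu

# Throwaway repository, so the settings stay local to the example.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Declare a custom transfer agent in a "lfs.customtransfer.<name>" group.
# Path and args below are hypothetical.
git config lfs.customtransfer.rsync-agent.path /usr/local/bin/git-lfs-rsync-agent
git config lfs.customtransfer.rsync-agent.args 'backup.example.org:/srv/lfs'

# Use that agent for all remotes, without contacting an LFS API server.
# (lfs.<url>.standalonetransferagent would restrict it to one remote.)
git config lfs.standalonetransferagent rsync-agent

# Print the resulting LFS settings.
git config --get-regexp '^lfs\.'
```

Every contributor would have to repeat these settings in their own
clone, which is the documentation burden mentioned above.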

[...]

>> Another important limitation of Git-LFS is that you cannot delete
>> (remotely stored) objects [1], with git-annex is very easy.
>
> Probably true, haven't encountered the use-case yet.

IMHO this is a very important feature when you have to manage media
archives.

[...]

> Just a note on upsides of Git-LFS :
> - integration with git is better. A special magit extension to use
> git-lfs is not needed, whereas it is with git-annex.

true :-D

> - less operations: once I know which files will be my media files, I
> have less headaches (basically the exact git experience, you don't have
> to think about where I should `git add` or `git annex add` a file).

it's the same with git-annex, you just have to configure/distribute a
.gitattributes file, i.e.:

--8<---------------cut here---------------start------------->8---

* annex.largefiles=(largerthan=5Mb)
* annex.largefiles=(not(mimetype=text/*))

--8<---------------cut here---------------end--------------->8---

see https://git-annex.branchable.com/tips/largefiles/ for a description
of this feature
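
One caveat worth checking, sketched below with plain git only: when two
.gitattributes lines set the same attribute for the same pattern, git
keeps the later one. So with the two lines above, as far as I can tell,
only the mimetype rule is in effect. The file name "video.webm" is made
up.

```shell
#!/bin/sh
set -eu

# Throwaway repository holding the two-line .gitattributes from above.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

printf '%s\n' \
    '* annex.largefiles=(largerthan=5Mb)' \
    '* annex.largefiles=(not(mimetype=text/*))' > .gitattributes

# Ask git which value it resolves for an arbitrary file:
# the later line overrides the earlier one.
git check-attr annex.largefiles -- video.webm
```

To apply both conditions, they would need to go into a single
expression on one line. The largefiles tip describes parenthesized
terms for this, since attribute values cannot contain whitespace; the
exact combined syntax is not shown here because I have not tested it.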

> It's indeed less copyleft though. Simpler, but also maybe less adapted
> to this use-case.

With git-annex everyone can set up a "git-annex enabled" server
(although the Haskell dependency is a limitation since it's unsupported on
many architectures)... or use one of the available special remotes.

Thanks! Gio'

-- 
Giovanni Biscuolo

Xelera IT Infrastructures



* Re: Git-LFS or Git Annex?
  2024-01-24 15:22 Git-LFS or Git Annex? Ludovic Courtès
                   ` (4 preceding siblings ...)
  2024-01-27  4:31 ` Philip McGrath
@ 2024-01-28 17:37 ` Efraim Flashner
  2024-02-02 16:46 ` Christine Lemmer-Webber
  6 siblings, 0 replies; 19+ messages in thread
From: Efraim Flashner @ 2024-01-28 17:37 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-sysadmin, guix-devel


On Wed, Jan 24, 2024 at 04:22:05PM +0100, Ludovic Courtès wrote:
> Hello!
> 
> I’m looking for ways to incorporate videos into the repositories of our
> web sites so they’re content-addressed and properly tracked, and to make
> it easier to create backups (right now those videos are stored on our
> two main servers and rsynced between them⁰; I’m talking about the videos
> at guix.gnu.org, 10years.guix.gnu.org, and hpc.guix.info).
> 
> The question boils down to: Git-LFS or Git Annex?
> 
> From a quick look (I haven’t used them), Git-LFS seems to assume a
> rather centralized model where there’s an LFS server sitting next to the
> Git server¹.  Git Annex looks more decentralized, allowing you to have
> several “remotes”, to check the status of each one, to sync them, etc.²
> Because of this, Git Annex seems to be a better fit.
> 
> Data point: guix.gnu.org source is hosted on Savannah, which doesn’t
> support Git-LFS; the two other web sites above are hosted on GitLab
> instances, which I think do support Git-LFS.
> 
> What’s your experience?  What would you suggest?

I'll respond off the first email because I lost where I was thinking of
responding to.

One git annex repository that I sometimes visit is the
conference_proceedings¹ repository, which has many years worth of
conference videos.  With such a repo you wouldn't actually run `git
annex sync`, you'd `git pull` as desired, run `git annex get
path/to/the/video`, watch the video, and then `git annex drop
path/to/the/video`.  Last I checked there's even tie-in scripts for some
file managers like thunar.
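
That cycle can be sketched as two small shell functions. This is only
an illustration: the video path is a placeholder, and the script
defaults to a dry run that prints the commands, so it does nothing
unless DRY_RUN=0 is set inside a real git-annex repository.

```shell
#!/bin/sh
set -eu

# Dry run by default: print commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Update the symlinks/metadata, then download one file's content.
fetch_video() {
    run git pull
    run git annex get "$1"
}

# Free the local copy again; git-annex refuses if no other copy is known.
drop_video() {
    run git annex drop "$1"
}

fetch_video "path/to/the/video"
drop_video "path/to/the/video"
```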

I actually use git-annex with my family's photos and videos, with a full
master copy at my place and one at my parents' place, and a couple of
remotes on the internet.

$ git annex whereis Wedding.iso
whereis Wedding.iso (5 copies)
        00f742bc-02d6-4b05-853a-7703f87b29f9 -- efraim@debian:~/workspace/Flashner_Backup [ct-tor]
        47c3cd13-68d9-43f7-b8a7-e742dccce3be -- [scaleway]
        66babe8f-d716-4502-844f-06645eda3b23 -- efraim@raspberrypi:/media/Elements/efraim/Flashner_Backup
        c8898bb8-da93-4507-87c2-5496241b5dc6 -- efraim@3900XT:~/workspace/Flashner_Backup [here]
        d22b8903-9e94-47f3-8e8a-ef1468e478e3 -- cloud [amazon]
ok

The ISO is encrypted with my GPG key. For this repo I do just run `git
annex sync` because it's really only me interacting with it, and I don't
care about how awful the git history looks.

For the Guix videos, we could upload the actual files to, say,
archive.org or to audiovideo.gnu.org (or whatever the site is) and then
register each one with `git annex addurl https://path/to/the/video`,
and it'll just be available in the repo.
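
A rough sketch of that registration step, assuming the video is already
hosted somewhere. The URL and the in-repo file name are placeholders;
`--file` chooses where the annexed file appears in the working tree.
As written the script only prints the command; set DRY_RUN=0 inside a
git-annex repository to actually run it.

```shell
#!/bin/sh
set -eu

# Dry run by default: print the command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: public URL of the video, $2: path it should have in the repository.
register_video() {
    run git annex addurl --file="$2" "$1"
}

register_video "https://example.org/videos/talk.webm" "videos/talk.webm"
```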

¹ https://github.com/RichiH/conference_proceedings

-- 
Efraim Flashner   <efraim@flashner.co.il>   רנשלפ םירפא
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted



* Re: Git-LFS or Git Annex?
  2024-01-28 17:32     ` Giovanni Biscuolo
@ 2024-01-29 11:39       ` Nicolas Graves via Development of GNU Guix and the GNU System distribution.
  0 siblings, 0 replies; 19+ messages in thread
From: Nicolas Graves via Development of GNU Guix and the GNU System distribution. @ 2024-01-29 11:39 UTC (permalink / raw)
  To: Giovanni Biscuolo, guix-sysadmin; +Cc: guix-devel

On 2024-01-28 18:32, Giovanni Biscuolo wrote:

> Hi Nicolas,
>
> Nicolas Graves <ngraves@ngraves.fr> writes:
>
> [...]
>
>> This is not always true. Git-LFS also has the concept of Custom Transfer
>> Agents, which in some cases do not need a running server. One example is
>> lfs-folderstore, which can simply use a remote directory as a LFS
>> remote.
>
> thanks, i didn't know about custom transfer agents, the use without an
> API server is documented here:
>
> --8<---------------cut here---------------start------------->8---
>
> In some cases the transfer agent can figure out by itself how and where
> the transfers should be made, without having to query the API server. In
> this case it's possible to use the custom transfer agent directly,
> without querying the server, by using the following config option:
>
>  lfs.standalonetransferagent, lfs.<url>.standalonetransferagent
>
> Specifies a custom transfer agent to be used if the API server URL
> matches as in "git config --get-urlmatch lfs.standalonetransferagent
> <apiurl>". git-lfs will not contact the API server. It instead sets
> stage 2 transfer actions to null. "lfs.<url>.standalonetransferagent"
> can be used to configure a custom transfer agent for individual
> remotes. "lfs.standalonetransferagent" unconditionally configures a
> custom transfer agent for all remotes. The custom transfer agent must be
> specified in a "lfs.customtransfer.<name>" settings group.
>
> --8<---------------cut here---------------end--------------->8---
> (https://github.com/git-lfs/git-lfs/blob/main/docs/custom-transfers.md#using-a-custom-transfer-type-without-the-api-server)
>
> some examples:
>
> 1. git-lfs-agent-scp: A custom transfer agent for git-lfs that uses scp
>    to transfer files. This transfer agent makes it possible to use
>    git-lfs in situations where the remote only speaks ssh. This is
>    useful if you do not want to install a git-lfs server. (MIT license,
>    written in C, URL: https://github.com/tdons/git-lfs-agent-scp)
>
> 2. git-lfs-rsync-agent: The rsync git-lfs custom transfer agent allows
>    transferring the data through rsync, for example using SSH
>    authentication. (MIT license, written in Go, URL:
>    https://github.com/excavador/git-lfs-rsync-agent)
>
> 3. git-lfs-agent-scp-bash: A custom transfer agent for git-lfs that uses
>    scp to transfer files. This is a self-contained bash script designed
>    for seamless installation, requiring no prerequisites with the
>    exception of the external command scp. It enables to use git-lfs even
>    if you can not use http/https but ssh only. (MIT License, written in
>    bash, URL: https://github.com/yoshimoto/git-lfs-agent-scp-bash)
>
> So yes: we could use git-lfs without a git-lfs server and set an rsync
> or scp transfer agent for each remote (documenting it for users, since
> this must be done client-side)
>
> It's not at all as powerful as the location tracking features of
> git-annex but... doable :-)

One downside, however: for some reason, Custom Transfer Agents are
rarely well-written, well-supported projects. I had to write one myself
for my use case, so I now have a good understanding of the protocol,
which is definitely simple. Still, it's worth noting that despite this
rather easy extensibility, few CTAs are good/reliable in the long
run.

>
> [...]
>
>>> Another important limitation of Git-LFS is that you cannot delete
>>> (remotely stored) objects [1], with git-annex is very easy.
>>
>> Probably true, haven't encountered the use-case yet.
>
> IMHO this is a very important feature when you have to manage media
> archives.

Depends on the use case! If you're just looking for a tool to archive
media that has one immutable version and that you want to keep available
for an indefinite amount of time, it doesn't matter that much.

The GitLab and GitHub documentation says it's possible with the
git-filter-repo extension, but that is indeed not easy.
https://docs.gitlab.com/ee/topics/git/lfs/#removing-objects-from-lfs

>
> [...]
>
>> Just a note on upsides of Git-LFS :
>> - integration with git is better. A special magit extension to use
>> git-lfs is not needed, whereas it is with git-annex.
>
> true :-D
>
>> - less operations: once I know which files will be my media files, I
>> have less headaches (basically the exact git experience, you don't have
>> to think about where I should `git add` or `git annex add` a file).
>
> it's the same with git-annex, you just have to configure/distribute a
> .gitattributes file, i.e.:
>
> --8<---------------cut here---------------start------------->8---
>
> * annex.largefiles=(largerthan=5Mb)
> * annex.largefiles=(not(mimetype=text/*))
>
> --8<---------------cut here---------------end--------------->8---
>
> see https://git-annex.branchable.com/tips/largefiles/ for a description
> of this feature

Nice! I haven't experimented with this. Funny how it takes an extension of
a git extension to provide what LFS does natively.

>
>> It's indeed less copyleft though. Simpler, but also maybe less adapted
>> to this use-case.
>
> With git-annex everyone can set up a "git-annex enabled" server
> (although the Haskell dependency is a limitation since it's unsupported on
> many architectures)... or use one of the available special remotes.
>
> Thanks! Gio'

One upside I'd forgotten, which git-annex may also provide (does it?):
if the CTA is well written, progress updates during upload/download are
quite reassuring when sending heavy files.

-- 
Best regards,
Nicolas Graves



* Re: Git-LFS or Git Annex?
  2024-01-24 15:22 Git-LFS or Git Annex? Ludovic Courtès
                   ` (5 preceding siblings ...)
  2024-01-28 17:37 ` Efraim Flashner
@ 2024-02-02 16:46 ` Christine Lemmer-Webber
  6 siblings, 0 replies; 19+ messages in thread
From: Christine Lemmer-Webber @ 2024-02-02 16:46 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-sysadmin, guix-devel

git-annex, 10000% :)

It's not as popular, but it's much more powerful, and is some of my
favorite software!

Ludovic Courtès <ludo@gnu.org> writes:

> Hello!
>
> I’m looking for ways to incorporate videos into the repositories of our
> web sites so they’re content-addressed and properly tracked, and to make
> it easier to create backups (right now those videos are stored on our
> two main servers and rsynced between them⁰; I’m talking about the videos
> at guix.gnu.org, 10years.guix.gnu.org, and hpc.guix.info).
>
> The question boils down to: Git-LFS or Git Annex?
>
> From a quick look (I haven’t used them), Git-LFS seems to assume a
> rather centralized model where there’s an LFS server sitting next to the
> Git server¹.  Git Annex looks more decentralized, allowing you to have
> several “remotes”, to check the status of each one, to sync them, etc.²
> Because of this, Git Annex seems to be a better fit.
>
> Data point: guix.gnu.org source is hosted on Savannah, which doesn’t
> support Git-LFS; the two other web sites above are hosted on GitLab
> instances, which I think do support Git-LFS.
>
> What’s your experience?  What would you suggest?
>
> Thanks,
> Ludo’.
>
> ⁰ https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/berlin.scm#n193
> ¹ https://github.com/git-lfs/git-lfs/wiki/Tutorial
> ² https://git-annex.branchable.com/walkthrough/




* Re: Git-LFS or Git Annex?
  2024-01-27 16:59       ` Timothy Sample
  2024-01-27 17:47         ` Kyle Meyer
@ 2024-02-14 15:18         ` Simon Tournier
  1 sibling, 0 replies; 19+ messages in thread
From: Simon Tournier @ 2024-02-14 15:18 UTC (permalink / raw)
  To: Timothy Sample
  Cc: Kyle Meyer, Ludovic Courtès, guix-sysadmin, guix-devel

Hi Timothy,

On sam., 27 janv. 2024 at 10:59, Timothy Sample <samplet@ngyro.com> wrote:

> https://git.ngyro.com/git-annex-remote-clouda/tree/git-annex-remote-clouda/remote.scm

Oh cool, thanks.  Bookmarked.

Cheers,
simon



end of thread, other threads:[~2024-02-15  9:54 UTC | newest]

Thread overview: 19+ messages
2024-01-24 15:22 Git-LFS or Git Annex? Ludovic Courtès
2024-01-24 16:13 ` indieterminacy
2024-01-24 17:39 ` Giovanni Biscuolo
2024-01-28 10:33   ` Nicolas Graves via Development of GNU Guix and the GNU System distribution.
2024-01-28 11:32     ` Philip McGrath
2024-01-28 17:32     ` Giovanni Biscuolo
2024-01-29 11:39       ` Nicolas Graves via Development of GNU Guix and the GNU System distribution.
2024-01-24 18:41 ` pukkamustard
2024-01-24 20:32   ` Troy Figiel
2024-01-25 12:03   ` Giovanni Biscuolo
2024-01-25 16:55 ` Simon Tournier
2024-01-26  2:20   ` Kyle Meyer
2024-01-26 10:02     ` Simon Tournier
2024-01-27 16:59       ` Timothy Sample
2024-01-27 17:47         ` Kyle Meyer
2024-02-14 15:18         ` Simon Tournier
2024-01-27  4:31 ` Philip McGrath
2024-01-28 17:37 ` Efraim Flashner
2024-02-02 16:46 ` Christine Lemmer-Webber
