* Limited-history local archives
From: Konstantin Ryabitsev @ 2020-01-03 20:15 UTC
To: meta
Hi, all:
I wonder if it would be useful to have a feature allowing someone to run
a limited-history local copy of a larger remote archive -- for example
if someone only wanted a 3-month copy of LKML instead of the whole
20-year enchilada.
It's possible to accomplish this with git already [^1], e.g. you can use
the following to grab a copy of LKML starting with December 2019:
$ git clone --bare --shallow-since 2019-12-01 https://lore.kernel.org/lkml/git/7 lkml-since-dec.git
$ cd lkml-since-dec.git
$ git config --add remote.origin.fetch '+refs/heads/master:refs/heads/master'
You can now run "git fetch" as usual and perform all the normal
operations, such as "git show {rev}:m" to get the message contents.
Obviously, if we try to get a revision from before December 1, the
operation fails:
$ git show dae740ca679710fbe8b97b3e704d63e3e7883fd9:m
fatal: Path 'm' does not exist in 'dae740ca679710fbe8b97b3e704d63e3e7883fd9'
If we enable uploadpack.allowAnySHA1InWant on the server, we can then
fetch this object directly:
$ git fetch --depth 1 origin dae740ca679710fbe8b97b3e704d63e3e7883fd9
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 3 (delta 0)
Unpacking objects: 100% (3/3), done.
From https://lore.kernel.org/lkml/git/7
* branch dae740ca679710fbe8b97b3e704d63e3e7883fd9 -> FETCH_HEAD
Now this succeeds:
$ git show dae740ca679710fbe8b97b3e704d63e3e7883fd9:m
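The fetch-on-demand steps above could be wrapped in a small helper. This is only a minimal sketch, not something git or public-inbox ships: `show_msg` is a hypothetical name, and it assumes the current directory is the shallow clone, the remote is called `origin`, and the server has uploadpack.allowAnySHA1InWant enabled.

```shell
# show_msg REV: print the message stored at REV:m, fetching the
# commit from origin first if it is missing from the shallow clone.
# Hypothetical helper; run it from inside the repository.
show_msg() {
    rev=$1
    # cat-file -e exits non-zero when REV:m cannot be resolved locally.
    git cat-file -e "$rev:m" 2>/dev/null ||
        git fetch --depth 1 origin "$rev" ||
        return
    git show "$rev:m"
}
```

Usage would look like `show_msg dae740ca679710fbe8b97b3e704d63e3e7883fd9`. If the server does not allow fetching arbitrary object IDs, the fetch fails and the helper returns that error.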
We can then periodically reshallow the archive (e.g. once a day) in
order to get rid of older objects:
$ git fetch --shallow-since 2019-12-15 --update-shallow origin master
$ git gc --prune=now
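For a daily job, the cutoff date could be computed instead of typed in. The following is a sketch under assumptions: `cutoff_date` and `reshallow` are hypothetical names, `date -d` is the GNU coreutils form, and the commands run from inside the shallow clone.

```shell
# cutoff_date [DAYS]: print the date DAYS days ago (default 90) as
# YYYY-MM-DD, in UTC. Assumes GNU date for the -d option.
cutoff_date() {
    date -u -d "${1:-90} days ago" +%Y-%m-%d
}

# reshallow SINCE: move the shallow boundary to SINCE, then drop the
# objects that are no longer reachable. Run inside the repository.
reshallow() {
    since=$1
    git fetch --shallow-since "$since" --update-shallow origin master &&
        git gc --prune=now
}
```

A cron entry could then call something like `reshallow "$(cutoff_date 90)"` once a day to keep roughly three months of history.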
There isn't really an RFC or anything associated with this -- I just
wanted to share this idea as a possibly useful way of reducing local
storage requirements while still being able to operate directly on
public-inbox git repositories -- e.g. with a tool like l2md
(https://git.kernel.org/pub/scm/linux/kernel/git/dborkman/l2md.git/).
-K
[^1]: Theoretically, this will become even easier in the future with
partial-clone functionality, though I believe that's mostly
written to support fetching large blobs from CDNs and wouldn't be
as useful for very linear public-inbox repositories.
* Re: Limited-history local archives
From: Eric Wong @ 2020-01-03 22:02 UTC
To: meta
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Hi, all:
>
> I wonder if it would be useful to have a feature allowing someone to run
> a limited-history local copy of a larger remote archive -- for example
> if someone only wanted a 3-month copy of LKML instead of the whole
> 20-year enchilada.
Yes.
> It's possible to accomplish this with git already [^1], e.g. you can use
> the following to grab a copy of LKML starting with December 2019:
>
> $ git clone --bare --shallow-since 2019-12-01 https://lore.kernel.org/lkml/git/7 lkml-since-dec.git
> $ cd lkml-since-dec.git
> $ git config --add remote.origin.fetch '+refs/heads/master:refs/heads/master'
>
> You can now run "git fetch" as usual and perform all the normal
> operations, such as "git show {rev}:m" to get the message contents.
> Obviously, if we try to get a revision from before December 1, the
> operation fails:
>
> $ git show dae740ca679710fbe8b97b3e704d63e3e7883fd9:m
> fatal: Path 'm' does not exist in 'dae740ca679710fbe8b97b3e704d63e3e7883fd9'
>
> If we enable uploadpack.allowAnySHA1InWant on the server, we can then
> fetch this object directly:
Usability-wise, git itself seems pretty bad at this...
I haven't looked deeply at this, but could/should public-inbox
enable allowAnySHA1InWant by default?
> $ git fetch --depth 1 origin dae740ca679710fbe8b97b3e704d63e3e7883fd9
> remote: Counting objects: 3, done.
> remote: Compressing objects: 100% (2/2), done.
> remote: Total 3 (delta 0), reused 3 (delta 0)
> Unpacking objects: 100% (3/3), done.
> From https://lore.kernel.org/lkml/git/7
> * branch dae740ca679710fbe8b97b3e704d63e3e7883fd9 -> FETCH_HEAD
>
> Now this succeeds:
>
> $ git show dae740ca679710fbe8b97b3e704d63e3e7883fd9:m
>
> We can then periodically reshallow the archive (e.g. once a day) in
> order to get rid of older objects:
>
> $ git fetch --shallow-since 2019-12-15 --update-shallow origin master
> $ git gc --prune=now
>
> There isn't really an RFC or anything associated with this -- I just
> wanted to share this idea as a possibly useful way of reducing local
> storage requirements while still being able to operate directly on
> public-inbox git repositories -- e.g. with a tool like l2md
> (https://git.kernel.org/pub/scm/linux/kernel/git/dborkman/l2md.git/).
Given that allowAnySHA1InWant isn't enabled by default on servers
today, and given the number of commands needed on the client,
I'm not sure git is really great for people who want to read
mail locally...
POST + "&x=m" search queries are the easiest alternative, I think:
curl -X POST "$INBOX_URL/?q=d:$YYYYMMDD..&x=m" >mboxrd.gz
(but I wish MUAs could keep track of which messages I've read in
between queries)
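The query above could be scripted along these lines. This is a minimal sketch: `search_url` and `fetch_mbox` are hypothetical names, and the URL layout is taken directly from the curl example, with the date given as YYYYMMDD.

```shell
# search_url INBOX_URL YYYYMMDD: print the search URL that selects
# every message dated on or after the given day, in mbox form.
search_url() {
    inbox_url=${1%/}    # drop one trailing slash, if present
    printf '%s/?q=d:%s..&x=m\n' "$inbox_url" "$2"
}

# fetch_mbox INBOX_URL YYYYMMDD OUTFILE: POST the query and save the
# gzipped mboxrd response to OUTFILE.
fetch_mbox() {
    curl -sS -f -X POST "$(search_url "$1" "$2")" -o "$3"
}
```

For example, `fetch_mbox "$INBOX_URL" 20191201 mboxrd.gz` followed by `gzip -dc mboxrd.gz` would yield an mbox that most MUAs can open.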
And NNTP, which ought to be tunnel-able over HTTPS CONNECT.
> [^1]: Theoretically, this will become even easier in the future with
> partial-clone functionality, though I believe that's mostly
> written to support fetching large blobs from CDNs and wouldn't be
> as useful for very linear public-inbox repositories.
Fwiw, I really wish "git --git-dir=$URL any-read-only-command"
could work one day like it does with SVN.
WebDAV would've been nice, but AFAIK davfs2 doesn't support
Range: yet, and having to mount FSes is a drag...