* Disarchive update
@ 2021-10-09 10:05 Ludovic Courtès
2021-10-09 10:37 ` Mathieu Othacehe
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Ludovic Courtès @ 2021-10-09 10:05 UTC (permalink / raw)
To: guix-devel
Hello Guix!
This job is disassembling all the .tar.gz files packages refer to, using
the recently-added ‘etc/disarchive-manifest.scm’ file:
https://ci.guix.gnu.org/jobset/disarchive
It has just succeeded for the first time. :-)
https://ci.guix.gnu.org/eval/29213?status=succeeded
If you run:
  guix build /gnu/store/nnl67m8c2x9rwqbnych1agc6p7g5473g-disarchive-collection.drv
or:
  guix build -m etc/disarchive-manifest.scm
and if you’re patient :-), you eventually get a 579 MB directory
containing Disarchive metadata for 8,413 tarballs out of 9,113 (the
missing tarballs are those that “disarchive disassemble” fails to
handle, for instance because it couldn’t guess what compression method
is being used.)
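For reference, those two counts put the coverage at a little over 92%. A trivial sketch of the arithmetic, in Python (the variable names are only illustrative):

```python
# Figures quoted in the message above; names are illustrative only.
handled = 8_413   # tarballs for which Disarchive metadata was produced
total = 9_113     # tarballs referred to by packages

coverage = handled / total
missing = total - handled

print(f"coverage: {coverage:.1%}, missing tarballs: {missing}")
```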
Where to go from here? Timothy Sample had already set up a Disarchive
database at <https://disarchive.ngyro.com>, which (guix download) uses
as a fallback; I’m not sure exactly how it’s populated. The goal here
would be for the Guix project to set up infrastructure populating a
database automatically and creating backups, possibly via SWH (we’ll
have to discuss it with them).
A plan we can already deploy would be:
1. Add the disarchive.guix.gnu.org DNS entry, pointing to berlin.
2. On berlin, add an mcron job that periodically copies the output of
the latest “disarchive-collection” build to a directory, say
/srv/disarchive. Thus, the database would accumulate tarball
metadata over time.
3. Add an nginx route so that /srv/disarchive is served at
https://disarchive.guix.gnu.org.
4. Add disarchive.guix.gnu.org to (guix download).
How does that sound? Thoughts?
Ludo’.
* Re: Disarchive update
From: Mathieu Othacehe @ 2021-10-09 10:37 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guix-devel
Hey Ludo,
> https://ci.guix.gnu.org/eval/29213?status=succeeded
Nice! It looks like an expensive operation, maybe we should increase its
period to 24 hours or so?
> 2. On berlin, add an mcron job that periodically copies the output of
> the latest “disarchive-collection” build to a directory, say
> /srv/disarchive. Thus, the database would accumulate tarball
> metadata over time.
We could add the result as a "build-product" so that it is available at:
https://ci.guix.gnu.org/search/latest/disarchive-collection. The mcron
job could use this URL to fetch the latest archive.
> How does that sound? Thoughts?
Sounds great, happy to see more use-cases for Cuirass :)
Thanks,
Mathieu
* Re: Disarchive update
From: Ludovic Courtès @ 2021-10-10 13:22 UTC (permalink / raw)
To: Mathieu Othacehe; +Cc: guix-devel
Hi!
Mathieu Othacehe <othacehe@gnu.org> skribis:
>> https://ci.guix.gnu.org/eval/29213?status=succeeded
>
> Nice! It looks like an expensive operation, maybe we should increase its
> period to 24 hours or so?
Yes, I’ve made it 12 hours now. :-)
It shouldn’t be too expensive: there’s one derivation per tarball
disarchive and very few of them get rebuilt between subsequent
evaluations; disarchive-collection.drv depends on all of them.
However, I think the current model of Cuirass means that those
intermediate derivations aren’t retrieved on berlin so we’re potentially
building things multiple times?
>> 2. On berlin, add an mcron job that periodically copies the output of
>> the latest “disarchive-collection” build to a directory, say
>> /srv/disarchive. Thus, the database would accumulate tarball
>> metadata over time.
>
> We could add the result as a "build-product" so that it is available at:
> https://ci.guix.gnu.org/search/latest/disarchive-collection. The mcron
> job could use this URL to fetch the latest archive.
That’d be nice! How do we do that again?
I was planning on retrieving the derivation file name in the mcron job
using the (guix ci) API, but having a build product may simplify things
a bit.
Thanks for your feedback!
Ludo’.
* Re: Disarchive update
From: Mathieu Othacehe @ 2021-10-12 8:41 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guix-devel
Hey,
> That’d be nice! How do we do that again?
The build-outputs field of the <specification> record must be used as
explained here:
https://guix.gnu.org/cuirass/manual/html_node/Specifications.html#Specifications.
This field cannot be manipulated via the web interface yet.
I think the easiest way to proceed would be to create the "disarchive"
specification in the (maintenance sysadmin services) module, this way:
--8<---------------cut here---------------start------------->8---
(specification
 (name "disarchive")
 (build '(manifests "etc/disarchive-manifest.scm"))
 (build-outputs
  (list
   (build-output
    (job "disarchive-collection*")
    (type "archive")
    (path ""))))
 (notifications #$(cuirass-notifications))
 (period 43200)
 (priority 7)
 (systems '("x86_64-linux")))
--8<---------------cut here---------------end--------------->8---
I can take care of that if it's ok for you.
Thanks,
Mathieu
* Re: Disarchive update
From: zimoun @ 2021-10-12 9:19 UTC (permalink / raw)
To: Ludovic Courtès, guix-devel
Hi Ludo,
On Sat, 09 Oct 2021 at 12:05, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
> If you run:
>
> guix build /gnu/store/nnl67m8c2x9rwqbnych1agc6p7g5473g-disarchive-collection.drv
Oh, cool!
> and if you’re patient :-), you eventually get a 579 MB directory
> containing Disarchive metadata for 8,413 tarballs out of 9,113 (the
> missing tarballs are those that “disarchive disassemble” fails to
> handle, for instance because it couldn’t guess what compression method
> is being used.)
Timothy made this table months ago:
  tar+gz        9090   52.0%
  git           5294   30.3%
  tar+xz        1184   06.8%
  tar+bz2        775   04.4%
  tar            393   02.2%
  zip            273   01.6%
  svn-multi      175   01.0%
  svn            125   00.7%
  file            51   00.3%
  computed        38   00.2%
  hg              36   00.2%
  unknown-uri     20   00.1%
  tar+gz?         15   00.1%
  tar+lz          13   00.1%
  tar+Z            4   00.0%
  cvs              3   00.0%
  bzr              3   00.0%
  tar+lzma         2   00.0%
  total        17494  100.0%
What is really missing is XZ and Bzip2 support in Disarchive, I guess.
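To put a rough number on that guess, here is a small Python sketch over the table above. It assumes that "tarball" means every row whose type starts with `tar`; that is one reading of the table, not something the table itself states.

```python
# Counts copied from the table above.
counts = {
    "tar+gz": 9090, "git": 5294, "tar+xz": 1184, "tar+bz2": 775,
    "tar": 393, "zip": 273, "svn-multi": 175, "svn": 125, "file": 51,
    "computed": 38, "hg": 36, "unknown-uri": 20, "tar+gz?": 15,
    "tar+lz": 13, "tar+Z": 4, "cvs": 3, "bzr": 3, "tar+lzma": 2,
}

# Assumption: a "tarball" is any entry whose type starts with "tar".
tarballs = {kind: n for kind, n in counts.items() if kind.startswith("tar")}
n_tarballs = sum(tarballs.values())

# Share of tarballs reachable with gzip support only, then with XZ and Bzip2 added.
gz_only = tarballs["tar+gz"] / n_tarballs
with_xz_bz2 = (tarballs["tar+gz"] + tarballs["tar+xz"] + tarballs["tar+bz2"]) / n_tarballs

print(f"tarballs: {n_tarballs}")
print(f"gzip only: {gz_only:.1%}, with XZ and Bzip2: {with_xz_bz2:.1%}")
```

Under that assumption, gzip alone accounts for roughly 79% of the tarballs, and adding XZ and Bzip2 would bring the figure to about 96%.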
> Where to go from here? Timothy Sample had already set up a Disarchive
> database at <https://disarchive.ngyro.com>, which (guix download) uses
> as a fallback; I’m not sure exactly how it’s populated. The goal here
> would be for the Guix project to set up infrastructure populating a
> database automatically and creating backups, possibly via SWH (we’ll
> have to discuss it with them).
Timothy was working on feeding the database using each release. Well,
you can have a look at:
<https://git.ngyro.com/preservation-of-guix>
Then something along these lines:
  $ sqlite3 /tmp/pog.db < schema.sql
  $ guix repl -L . <(echo '
  (use-modules (pog))
  (ingest "6298c3ffd9654d3231a6f25390b056483e8f407c"
          "/tmp/pog.db")
  ')
where the commit hash corresponds to v1.0.0. I do not know if it
would be equivalent to running:
  guix time-machine --commit=6298c3ffd9654d3231a6f25390b056483e8f407c \
    -- build -m etc/disarchive-manifest.scm
> A plan we can already deploy would be:
>
> 1. Add the disarchive.guix.gnu.org DNS entry, pointing to berlin.
>
> 2. On berlin, add an mcron job that periodically copies the output of
> the latest “disarchive-collection” build to a directory, say
> /srv/disarchive. Thus, the database would accumulate tarball
> metadata over time.
>
> 3. Add an nginx route so that /srv/disarchive is served at
> https://disarchive.guix.gnu.org.
>
> 4. Add disarchive.guix.gnu.org to (guix download).
To replace (or add to) the current ’%disarchive-mirrors’ right?
Going down this road (using Cuirass), why not generate sources.json
the same way, instead of the hack using the website builder?
On my side, I will try to resume what I started months ago: measuring the
SWH coverage. For instance, of this ~92% of tarballs, how many are
currently stored in SWH? Well, do not hold your breath, and I would be
happy if someone beats me to it. ;-)
Cheers,
simon
* Re: Disarchive update
From: Timothy Sample @ 2021-10-13 14:54 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guix-devel
Hi Ludovic,
Ludovic Courtès <ludovic.courtes@inria.fr> writes:
> This job is disassembling all the .tar.gz files packages refer to, using
> the recently-added ‘etc/disarchive-manifest.scm’ file:
>
> https://ci.guix.gnu.org/jobset/disarchive
>
> It has just succeeded for the first time. :-)
Fantastic! I feel bad that I left you holding the bag on this one,
though. Sorry. I’ve been a little adrift this summer. Thanks for
picking it up!
> Where to go from here? Timothy Sample had already set up a Disarchive
> database at <https://disarchive.ngyro.com>, which (guix download) uses
> as a fallback; I’m not sure exactly how it’s populated.
Basically the same as what you are doing now. I have many Cuirass jobs,
and I use the build outputs mechanism (mentioned by Mathieu elsewhere
in this thread). I don’t have a “disarchive-collection” job, so I have
to use the Cuirass API to dig through the recent build outputs to find
new results. This happens from a cron job, which uploads each new
result to my server.
One simple but satisfying thing that I do is serve the files compressed.
That is, they are compressed on disk and nginx just passes them along
(using the “gzip_static” module). Because of Disarchive’s verbose and
repetitive output format, this makes for a huge reduction in storage
requirements.
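To illustrate why repetitive output compresses so well, here is a small Python sketch. The S-expression lines are made up for the example and are not real Disarchive output, so the ratio it prints is indicative only.

```python
import gzip

# Made-up, highly repetitive text standing in for a Disarchive metadata file.
entry = '(header (name "pkg-1.0/src/file-{i}.c") (mode 420) (uid 0) (gid 0))\n'
sample = "".join(entry.format(i=i) for i in range(2000)).encode()

compressed = gzip.compress(sample)

print(f"raw: {len(sample)} bytes, gzipped: {len(compressed)} bytes")
print(f"ratio: {len(sample) / len(compressed):.1f}x")
```

With nginx's "gzip_static", the file on disk is the gzipped one and it is sent as-is to clients that accept gzip, so the saving applies to storage as well as transfer.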
> The goal here would be for the Guix project to set up infrastructure
> populating a database automatically and creating backups, possibly via
> SWH (we’ll have to discuss it with them).
>
> A plan we can already deploy would be:
>
> 1. Add the disarchive.guix.gnu.org DNS entry, pointing to berlin.
>
> 2. On berlin, add an mcron job that periodically copies the output of
> the latest “disarchive-collection” build to a directory, say
> /srv/disarchive. Thus, the database would accumulate tarball
> metadata over time.
>
> 3. Add an nginx route so that /srv/disarchive is served at
> https://disarchive.guix.gnu.org.
>
> 4. Add disarchive.guix.gnu.org to (guix download).
>
> How does that sound? Thoughts?
This is great! I can offer some past metadata, too. Specifically, I
have ~14000 files that I generated while digging into SWH coverage.
(That’s a project I’d like to return to, but I’m still trying to get my
head back in the game and pick up where I left off.)
-- Tim
* Re: Disarchive update
From: Ludovic Courtès @ 2021-10-14 14:02 UTC (permalink / raw)
To: zimoun; +Cc: guix-devel
Hey!
zimoun <zimon.toutoune@gmail.com> skribis:
> Timothy made this table months ago:
>
>   tar+gz        9090   52.0%
>   git           5294   30.3%
>   tar+xz        1184   06.8%
>   tar+bz2        775   04.4%
>   tar            393   02.2%
>   zip            273   01.6%
>   svn-multi      175   01.0%
>   svn            125   00.7%
>   file            51   00.3%
>   computed        38   00.2%
>   hg              36   00.2%
>   unknown-uri     20   00.1%
>   tar+gz?         15   00.1%
>   tar+lz          13   00.1%
>   tar+Z            4   00.0%
>   cvs              3   00.0%
>   bzr              3   00.0%
>   tar+lzma         2   00.0%
>   total        17494  100.0%
>
> What is really missing is XZ and Bzip2 support in Disarchive, I guess.
Definitely, we know what to work on next!
> Timothy was working on feeding the database using each release. Well,
> you can have a look at:
>
> <https://git.ngyro.com/preservation-of-guix>
Ah nice! I had completely overlooked this.
[...]
>> A plan we can already deploy would be:
>>
>> 1. Add the disarchive.guix.gnu.org DNS entry, pointing to berlin.
>>
>> 2. On berlin, add an mcron job that periodically copies the output of
>> the latest “disarchive-collection” build to a directory, say
>> /srv/disarchive. Thus, the database would accumulate tarball
>> metadata over time.
>>
>> 3. Add an nginx route so that /srv/disarchive is served at
>> https://disarchive.guix.gnu.org.
>>
>> 4. Add disarchive.guix.gnu.org to (guix download).
>
> To replace (or add to) the current ’%disarchive-mirrors’ right?
Exactly.
> Going down this road (using Cuirass), why not generate sources.json
> the same way, instead of the hack using the website builder?
I guess that would also work, indeed. Then we could make /sources.json
redirect to ci.guix.gnu.org/whatever/latest.
> On my side, I will try to resume what I started months ago: measuring the
> SWH coverage. For instance, of this ~92% of tarballs, how many are
> currently stored in SWH? Well, do not hold your breath, and I would be
> happy if someone beats me to it. ;-)
Yup, we definitely need that kind of info now!
Ludo’.
* Re: Disarchive update
From: Ludovic Courtès @ 2021-10-14 14:04 UTC (permalink / raw)
To: Timothy Sample; +Cc: guix-devel
Hey Timothy!
Timothy Sample <samplet@ngyro.com> skribis:
> Fantastic! I feel bad that I left you holding the bag on this one,
> though. Sorry. I’ve been a little adrift this summer. Thanks for
> picking it up!
No problem, I’m glad to see you chime in now! :-)
> One simple but satisfying thing that I do is serve the files compressed.
> That is, they are compressed on disk and nginx just passes them along
> (using the “gzip_static” module). Because of Disarchive’s verbose and
> repetitive output format, this makes for a huge reduction in storage
> requirements.
Oh nice, thanks for sharing this tip!
> This is great! I can offer some past metadata, too. Specifically, I
> have ~14000 files that I generated while digging into SWH coverage.
> (That’s a project I’d like to return to, but I’m still trying to get my
> head back in the game and pick up where I left off.)
Alright.
Thanks!
Ludo’.
* Re: Disarchive update
From: Ludovic Courtès @ 2021-10-14 14:06 UTC (permalink / raw)
To: Mathieu Othacehe; +Cc: guix-devel
Hi,
Mathieu Othacehe <othacehe@gnu.org> skribis:
> I think the easiest way to proceed would be to create the "disarchive"
> specification in the (maintenance sysadmin services) module, this way:
>
> (specification
>  (name "disarchive")
>  (build '(manifests "etc/disarchive-manifest.scm"))
>  (build-outputs
>   (list
>    (build-output
>     (job "disarchive-collection*")
>     (type "archive")
>     (path ""))))
>  (notifications #$(cuirass-notifications))
>  (period 43200)
>  (priority 7)
>  (systems '("x86_64-linux")))
Thanks for the tip. I went ahead, committed it to maintenance.git and
deployed it. It works! :-)
Ludo’.
* Re: Disarchive update
From: Ludovic Courtès @ 2021-10-14 14:31 UTC (permalink / raw)
To: guix-devel
Hello Guix!
Ludovic Courtès <ludovic.courtes@inria.fr> skribis:
> This job is disassembling all the .tar.gz files packages refer to, using
> the recently-added ‘etc/disarchive-manifest.scm’ file:
>
> https://ci.guix.gnu.org/jobset/disarchive
[...]
> A plan we can already deploy would be:
>
> 1. Add the disarchive.guix.gnu.org DNS entry, pointing to berlin.
Done:
https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=df9e9b7f51abceb5999aabc9a7b71396600cffa4
https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=12195160432871b80d0e1eac996a9aa7d8500697
Sample URLs:
https://disarchive.guix.gnu.org/sha256/53cf3e14c71f3a149f29d13a0da64120b3c1d3334fba39c4af3e520be053982a
https://disarchive.guix.gnu.org/sha256/39052f59ff474a4a69cefc25cf3caf8429400889deba010ee6403ca188f8b311
https://disarchive.guix.gnu.org/sha256/03a71d53055bd9ec528d55e07afaf15c09dec9856cba734904bfd05acbc6cf12
Aren’t those Disarchive sexps really cute? :-)
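The URL layout is simple enough to sketch in Python. This assumes that the hexadecimal string is the SHA-256 of the tarball itself, which is how the sample URLs read; `disarchive_url` is a hypothetical helper, not a Guix or Disarchive function.

```python
import hashlib

BASE = "https://disarchive.guix.gnu.org"  # host taken from the sample URLs above


def disarchive_url(tarball_bytes: bytes, base: str = BASE) -> str:
    """Return the metadata URL for a tarball.

    Assumption: the path component is the hex SHA-256 of the tarball's bytes.
    """
    digest = hashlib.sha256(tarball_bytes).hexdigest()
    return f"{base}/sha256/{digest}"


# Hash of the empty byte string, used only to show the shape of the result.
print(disarchive_url(b""))
```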
> 2. On berlin, add an mcron job that periodically copies the output of
> the latest “disarchive-collection” build to a directory, say
> /srv/disarchive. Thus, the database would accumulate tarball
> metadata over time.
First, there’s a script to populate the database; it copies files from
the latest successful “disarchive-collection” build to a specified
directory, gzipping them on their way. It’s atomic, so the directory in
question can be directly served by nginx or similar:
https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=fb83b3d8de189c6d6c33c4cdc2ebabf6eae1463e
If you want to try it at home, just run:
./sync-disarchive-db.scm /tmp/db
It’s pretty fast! The output is only 70 MiB, now that individual files
are gzipped.
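For readers who do not want to open the commit, the general idea (copy, gzip on the way, publish each file atomically) can be sketched in Python as below. This is not the code of sync-disarchive-db.scm, which is written in Guile; `sync_gzipped`, the `.gz` suffix, and the temporary-file-then-rename approach are assumptions made for the illustration.

```python
import gzip
import os
import shutil
import tempfile
from pathlib import Path


def sync_gzipped(source: Path, target: Path) -> int:
    """Copy each file under SOURCE to TARGET as FILE.gz; return how many were added.

    Existing entries are left alone, so TARGET only accumulates over time.
    """
    copied = 0
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        dest = target / src.relative_to(source)
        dest = dest.with_name(dest.name + ".gz")
        if dest.exists():
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary file in the same directory, then rename it:
        # a web server never sees a partially written file.
        fd, tmp = tempfile.mkstemp(dir=dest.parent, prefix=".tmp-")
        try:
            with os.fdopen(fd, "wb") as raw, \
                 gzip.GzipFile(fileobj=raw, mode="wb") as out, \
                 src.open("rb") as inp:
                shutil.copyfileobj(inp, out)
            os.chmod(tmp, 0o644)   # mkstemp creates the file readable by its owner only
            os.replace(tmp, dest)  # atomic rename within one file system
        except BaseException:
            os.unlink(tmp)
            raise
        copied += 1
    return copied
```

Something like `sync_gzipped(Path("/gnu/store/...-disarchive-collection"), Path("/tmp/db"))` would then play the role of the command above, with the store path left elided on purpose.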
Then there’s the mcron job that runs it once a day on berlin:
https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=27dc74fbe33a9d929b37994e825dc202385f87c0
We could run it as well on bayfront so we have a backup.
> 3. Add an nginx route so that /srv/disarchive is served at
> https://disarchive.guix.gnu.org.
Done here:
https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=9ffb2db81a2fbee67b99c76217be874ec0fd6bde
> 4. Add disarchive.guix.gnu.org to (guix download).
Done:
https://git.savannah.gnu.org/cgit/guix.git/commit/?id=f9a506aa6a5aaeb2c06c97d5b663d01d2103db69
As I was once again modifying files by hand to test the download
fallback mechanisms, I figured we could just as well add a variable to
enable testing, which is what I did here:
https://git.savannah.gnu.org/cgit/guix.git/commit/?id=c4a7aa82e25503133a1bd33148d17968c899a5f5
So you can do, say:
  GUIX_DOWNLOAD_FALLBACK_TEST=disarchive-mirrors guix build -S r-ebimage --check
or:
  GUIX_DOWNLOAD_FALLBACK_TEST=content-addressed-mirrors guix build -S r-ebimage --check
to check whether these fallback mechanisms work as expected. (They do,
but I’ll update the ‘guix’ package because the current one has a bug
that breaks the Disarchive/SWH fallback.)
I think we’re making progress! :-)
Thanks,
Ludo’.
* Re: Disarchive update
From: zimoun @ 2021-10-14 19:17 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guix-devel
Hi,
On Thu, 14 Oct 2021 at 16:02, Ludovic Courtès <ludo@gnu.org> wrote:
>> Going down this road (using Cuirass), why not generate sources.json
>> the same way, instead of the hack using the website builder?
>
> I guess that would also work, indeed. Then we could make /sources.json
> redirect to ci.guix.gnu.org/whatever/latest.
I had a look, but it is not clear yet how to do it. Pointers or tips
welcome. :-)
>> On my side, I will try to resume what I started months ago: measuring the
>> SWH coverage. For instance, of this ~92% of tarballs, how many are
>> currently stored in SWH? Well, do not hold your breath, and I would be
>> happy if someone beats me to it. ;-)
>
> Yup, we definitely need that kind of info now!
Using the authentication mode from SWH [1] and this trivial patch, the
rate limit is 1200, which allows checking and archiving some packages.
For instance, now,
--8<---------------cut here---------------start------------->8---
for p in $(guix package -A | cut -f1 | grep "julia-")
do
    ./pre-inst-env guix lint -c archival "$p"
done
--8<---------------cut here---------------end--------------->8---
passes. The remaining work is to check with the SWH folks for a higher
value than this 1200 limit and to have a token associated with an account
on the Software Heritage Authentication service [2]. Then set a cron task
“somewhere” running:
  ./pre-inst-env guix lint -c archival
WDYT?
Cheers,
simon
1: <https://archive.softwareheritage.org/api/>
2: <https://archive.softwareheritage.org/oidc/login/>
[-- Attachment #2: swh-auth.patch --]
[-- Type: text/x-diff, Size: 1488 bytes --]
diff --git a/guix/swh.scm b/guix/swh.scm
index 5c41685a24..1aaf733b5d 100644
--- a/guix/swh.scm
+++ b/guix/swh.scm
@@ -153,12 +153,20 @@ (define url
url
(string-append url "/")))
+(define token
+ 'xxXxxXxxXxXXXxXxXxXxXxXxxXXxXxXxXxxXXxxxxxxxXxXxXXXxXXXxXXXxXxxxXxXxXXXxXXXxXXXxXxxxXxXxXxxxXXXxXXXxxX.xxXxXXXxXxXxXxXxXxxxXxXxXxxxxXXxXxXxXxxxXXXxXXxxXxxxXXXxXxxxXXxxXXXxXXXxXXxxXxXxXXXxXxxxxxXxXxxxxXXxXxxxXXXxxXxxxxXxxxXxXXxxxxxxXXxxXxxxXxxxxXXxXxXxXXxxxxxXxxXxxxXxXXxxxxxxXXxxXxxxXXXxXxxxxXXxxXXxXxxxxXXxXxXxXxXxXXXxxXXxxXXxXxXxxxXxXxXxxXxxxxXxxXxxXxXxXxXxXXXxXXXxxXXxXxXxXXXxxXXxXxXxXXXxXXxxXxxxXXXxXXXxXXxxXXXxXxxxXXxxXXXxXxXxXxXxXXXxxXXxXxXXXxXxxXxxXxxxXXxxXxxxxxxxXXxxXxXxXxXxxxXxxxxxxxXxxXXxXxXxXXXxXxxxXxxxXxxxXXXxXxXxXxXxXxxxXxxxXXXxXxXxXxXxXXXxXxxxXXXxXxxxXXxxXXXxXxXxxXxxXxXxXxXxxxXxxxxxxXxxXXXxXXxxXxx.xxXxxxxXxxxxXxxxxXXxXXxxxXXXX-xxx_xxxXXxxxx
+ )
+
;; XXX: Work around a bug in Guile 3.0.2 where #:verify-certificate? would
;; be ignored (<https://bugs.gnu.org/40486>).
(define* (http-get* uri #:rest rest)
- (apply http-request uri #:method 'GET rest))
+ (apply http-request uri #:method 'GET
+ #:headers `((authorization . (Bearer ,token)))
+ rest))
(define* (http-post* uri #:rest rest)
- (apply http-request uri #:method 'POST rest))
+ (apply http-request uri #:method 'POST
+ #:headers `((authorization . (Bearer ,token)))
+ rest))
(define %date-regexp
;; Match strings like "2014-11-17T22:09:38+01:00" or
* Re: Disarchive update
From: zimoun @ 2021-10-14 21:44 UTC (permalink / raw)
To: Ludovic Courtès, guix-devel
Hi,
On Thu, 14 Oct 2021 at 16:31, Ludovic Courtès <ludo@gnu.org> wrote:
>> 4. Add disarchive.guix.gnu.org to (guix download).
>
> Done:
[...]
> I think we’re making progress! :-)
I added the SWH authentication token in patch#51216 [1]. Using a valid
TOKEN from an account of the Software Heritage Authentication service,
it reads something along these lines,
GUIX_SWH_TOKEN=${TOKEN} guix lint -c archival
By default, the token allows 1200 requests instead of 120. The
interesting bits concerning the recent Disarchive additions are
these:
--8<---------------cut here---------------start------------->8---
Disarchive entry refers to non-existent SWH directory 'aeae11cb3c33ab33374e222dc3bdf17039808a5b'
Disarchive entry refers to non-existent SWH directory 'b25414c9864a270899ca1ff494e7ba4c437b166d'
Disarchive entry refers to non-existent SWH directory '128bbe76a82dd0b38b725565ed703a7148257ae0'
Disarchive entry refers to non-existent SWH directory '92625e2c6dbe3ad7c4f44a061ada24ce00637087'
Disarchive entry refers to non-existent SWH directory '6000a273dfff9de62725b53e41562fff711069c1'
Disarchive entry refers to non-existent SWH directory 'c68ff8714c6fd360a38158f3d8f22e555c061452'
Disarchive entry refers to non-existent SWH directory 'cb52aaa9500df2b674bf7922811deeea1b766139'
Disarchive entry refers to non-existent SWH directory '3e574043a04d77dd7231d23210547c4fe065a40c'
Disarchive entry refers to non-existent SWH directory 'aa763150704fe06f34097b38e839409cee52366d'
Disarchive entry refers to non-existent SWH directory '127c0a03c7ccba74870aef7dac36019af35798cc'
Disarchive entry refers to non-existent SWH directory 'd9745f29da983c6ad674871e68ac96362c4f11cc'
Disarchive entry refers to non-existent SWH directory '7d7ed9f88ee649a90493f54d3988a062c3ddeafb'
Disarchive entry refers to non-existent SWH directory 'f5bd0fe7450175196c57d6f6d5aca8905393e814'
Disarchive entry refers to non-existent SWH directory '92bd3b93caa9a4b0840c70ddb96ac75b0684d7ec'
--8<---------------cut here---------------end--------------->8---
We need to investigate why the Disarchive database contains some
entries that SWH does not know about. It probably means that there is an
inconsistency coming from sources.json.
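For anyone who wants to look at one of these directories by hand, here is a hedged Python sketch of an authenticated request. It only builds the request and does not send it. The API root and the `/directory/<id>/` path are assumptions based on the public SWH API, `directory_request` is a hypothetical helper, and the bearer-token header mirrors what the patch does in (guix swh).

```python
import os
import urllib.request

SWH_API = "https://archive.softwareheritage.org/api/1"  # assumed API root


def directory_request(directory_id, token=None):
    """Build (without sending) a GET request for an SWH directory.

    The token comes from the argument or, failing that, from the
    GUIX_SWH_TOKEN environment variable mentioned above.
    """
    token = token or os.environ.get("GUIX_SWH_TOKEN")
    request = urllib.request.Request(f"{SWH_API}/directory/{directory_id}/")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


# First directory identifier from the log above; the token is a placeholder.
req = directory_request("aeae11cb3c33ab33374e222dc3bdf17039808a5b",
                        token="example-token")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing the result to `urllib.request.urlopen` would perform the lookup; presumably an HTTP 404 there corresponds to the "non-existent SWH directory" message, but that mapping is a guess.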
1: <http://issues.guix.gnu.org/issue/51216>
2: <https://archive.softwareheritage.org/oidc/login/>
Cheers,
simon
* Re: Disarchive update
From: Ludovic Courtès @ 2021-10-21 19:41 UTC (permalink / raw)
To: zimoun; +Cc: guix-devel
Hi,
zimoun <zimon.toutoune@gmail.com> skribis:
> Using the authentication mode from SWH [1] and this trivial patch, the
> rate limit is 1200, which allows checking and archiving some packages.
> For instance, now,
>
>   for p in $(guix package -A | cut -f1 | grep "julia-")
>   do
>       ./pre-inst-env guix lint -c archival "$p"
>   done
>
> passes. The remaining work is to check with the SWH folks for a higher
> value than this 1200 limit and to have a token associated with an account
> on the Software Heritage Authentication service. Then set a cron task
> “somewhere” running:
>
>   ./pre-inst-env guix lint -c archival
>
> WDYT?
I think you made progress on this in the meantime: this is great!
Really cool of the SWH folks to give you a higher rate limit.
Thanks,
Ludo’.
* Re: Disarchive update
From: Ludovic Courtès @ 2021-10-21 19:44 UTC (permalink / raw)
To: guix-devel
Hi!
Ludovic Courtès <ludo@gnu.org> skribis:
> Then there’s the mcron job that runs it once a day on berlin:
>
> https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=27dc74fbe33a9d929b37994e825dc202385f87c0
>
> We could run it as well on bayfront so we have a backup.
I did that without thinking much but it won’t work: as written,
sync-disarchive-db.scm assumes ci.guix.gnu.org substitutes are
authorized, which is not the case on bayfront.
So I suppose we need to do things differently there, such as
fetching/unpacking substitutes straight from sync-disarchive-db.scm
instead of going through the daemon.
I’ll take a look sometime, but it’d be great if someone else did. :-)
Thanks,
Ludo’.
* Re: Disarchive update
From: zimoun @ 2021-10-21 19:57 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: Guix Devel
Hey,
On Thu, 21 Oct 2021 at 21:41, Ludovic Courtès <ludo@gnu.org> wrote:
> Really cool of the SWH folks to give you a higher rate limit.
It is not for me in particular. :-)
Anyone can create an account via the Software Heritage Authentication service.
<https://archive.softwareheritage.org/oidc/login/>
Then anyone can enjoy the 1200 rate limit.
Cheers,
simon
PS: I am in the process of asking for a special rate limit higher than 1200,
but that's another story. :-)
Thread overview: 15+ messages
2021-10-09 10:05 Disarchive update Ludovic Courtès
2021-10-09 10:37 ` Mathieu Othacehe
2021-10-10 13:22 ` Ludovic Courtès
2021-10-12 8:41 ` Mathieu Othacehe
2021-10-14 14:06 ` Ludovic Courtès
2021-10-12 9:19 ` zimoun
2021-10-14 14:02 ` Ludovic Courtès
2021-10-14 19:17 ` zimoun
2021-10-21 19:41 ` Ludovic Courtès
2021-10-21 19:57 ` zimoun
2021-10-13 14:54 ` Timothy Sample
2021-10-14 14:04 ` Ludovic Courtès
2021-10-14 14:31 ` Ludovic Courtès
2021-10-14 21:44 ` zimoun
2021-10-21 19:44 ` Ludovic Courtès