unofficial mirror of guix-devel@gnu.org 
* Disarchive update
@ 2021-10-09 10:05 Ludovic Courtès
  2021-10-09 10:37 ` Mathieu Othacehe
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Ludovic Courtès @ 2021-10-09 10:05 UTC
  To: guix-devel

Hello Guix!

This job is disassembling all the .tar.gz files packages refer to, using
the recently-added ‘etc/disarchive-manifest.scm’ file:

  https://ci.guix.gnu.org/jobset/disarchive

It has just succeeded for the first time.  :-)

  https://ci.guix.gnu.org/eval/29213?status=succeeded

If you run:

  guix build /gnu/store/nnl67m8c2x9rwqbnych1agc6p7g5473g-disarchive-collection.drv

or:

  guix build -m etc/disarchive-manifest.scm

and if you’re patient :-), you eventually get a 579 MB directory
containing Disarchive metadata for 8,413 tarballs out of 9,113 (the
missing tarballs are those that “disarchive disassemble” fails to
handle, for instance because it couldn’t guess what compression method
is being used.)
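
For reference, the coverage implied by these two numbers can be checked
with a one-liner.  This is only a minimal sketch; the ‘coverage’ helper
is made up for illustration and is not part of Guix or Disarchive:

```shell
# coverage HANDLED TOTAL: print the share of handled tarballs as a percentage.
# "coverage" is a hypothetical helper, not part of Guix or Disarchive.
coverage() {
  LC_ALL=C awk -v ok="$1" -v total="$2" \
    'BEGIN { printf "%.1f\n", 100 * ok / total }'
}

coverage 8413 9113        # prints 92.3
echo $((9113 - 8413))     # prints 700, the tarballs not handled yet
```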

Where to go from here?  Timothy Sample had already set up a Disarchive
database at <https://disarchive.ngyro.com>, which (guix download) uses
as a fallback; I’m not sure exactly how it’s populated.  The goal here
would be for the Guix project to set up infrastructure populating a
database automatically and creating backups, possibly via SWH (we’ll
have to discuss it with them).

A plan we can already deploy would be:

  1. Add the disarchive.guix.gnu.org DNS entry, pointing to berlin.

  2. On berlin, add an mcron job that periodically copies the output of
     the latest “disarchive-collection” build to a directory, say
     /srv/disarchive.  Thus, the database would accumulate tarball
     metadata over time.

  3. Add an nginx route so that /srv/disarchive is served at
     https://disarchive.guix.gnu.org.

  4. Add disarchive.guix.gnu.org to (guix download).

How does that sound?  Thoughts?

Ludo’.



* Re: Disarchive update
  2021-10-09 10:05 Disarchive update Ludovic Courtès
@ 2021-10-09 10:37 ` Mathieu Othacehe
  2021-10-10 13:22   ` Ludovic Courtès
  2021-10-12  9:19 ` zimoun
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Mathieu Othacehe @ 2021-10-09 10:37 UTC
  To: Ludovic Courtès; +Cc: guix-devel


Hey Ludo,

>   https://ci.guix.gnu.org/eval/29213?status=succeeded

Nice! It looks like an expensive operation, maybe we should increase its
period to 24 hours or so?

>   2. On berlin, add an mcron job that periodically copies the output of
>      the latest “disarchive-collection” build to a directory, say
>      /srv/disarchive.  Thus, the database would accumulate tarball
>      metadata over time.

We could add the result as a "build-product" so that it is available at:
https://ci.guix.gnu.org/search/latest/disarchive-collection. The mcron
job could use this URL to fetch the latest archive.

> How does that sound?  Thoughts?

Sounds great, happy to see more use-cases for Cuirass :)

Thanks,

Mathieu



* Re: Disarchive update
  2021-10-09 10:37 ` Mathieu Othacehe
@ 2021-10-10 13:22   ` Ludovic Courtès
  2021-10-12  8:41     ` Mathieu Othacehe
  0 siblings, 1 reply; 15+ messages in thread
From: Ludovic Courtès @ 2021-10-10 13:22 UTC
  To: Mathieu Othacehe; +Cc: guix-devel

Hi!

Mathieu Othacehe <othacehe@gnu.org> skribis:

>>   https://ci.guix.gnu.org/eval/29213?status=succeeded
>
> Nice! It looks like an expensive operation, maybe we should increase its
> period to 24 hours or so?

Yes, I’ve made it 12 hours now.  :-)

It shouldn’t be too expensive: there’s one derivation per disassembled
tarball, and very few of them get rebuilt between subsequent
evaluations; disarchive-collection.drv depends on all of them.

However, I think the current model of Cuirass means that those
intermediate derivations aren’t retrieved on berlin so we’re potentially
building things multiple times?

>>   2. On berlin, add an mcron job that periodically copies the output of
>>      the latest “disarchive-collection” build to a directory, say
>>      /srv/disarchive.  Thus, the database would accumulate tarball
>>      metadata over time.
>
> We could add the result as a "build-product" so that it is available at:
> https://ci.guix.gnu.org/search/latest/disarchive-collection. The mcron
> job could use this URL to fetch the latest archive.

That’d be nice!  How do we do that again?

I was planning on retrieving the derivation file name in the mcron job
using the (guix ci) API, but having a build product may simplify things
a bit.

Thanks for your feedback!

Ludo’.



* Re: Disarchive update
  2021-10-10 13:22   ` Ludovic Courtès
@ 2021-10-12  8:41     ` Mathieu Othacehe
  2021-10-14 14:06       ` Ludovic Courtès
  0 siblings, 1 reply; 15+ messages in thread
From: Mathieu Othacehe @ 2021-10-12  8:41 UTC
  To: Ludovic Courtès; +Cc: guix-devel


Hey,

> That’d be nice!  How do we do that again?

The build-outputs field of the <specification> record must be used as
explained here:
https://guix.gnu.org/cuirass/manual/html_node/Specifications.html#Specifications.

This field cannot be manipulated via the web interface yet.

I think the easiest way to proceed would be to create the "disarchive"
specification in the (maintenance sysadmin services) module, this way:

--8<---------------cut here---------------start------------->8---
(specification
 (name "disarchive")
 (build '(manifests "etc/disarchive-manifest.scm"))
 (build-outputs
  (list
   (build-output
    (job "disarchive-collection*")
    (type "archive")
    (path ""))))
 (notifications #$(cuirass-notifications))
 (period 43200)
 (priority 7)
 (systems '("x86_64-linux")))
--8<---------------cut here---------------end--------------->8---

I can take care of that if it's ok for you.

Thanks,

Mathieu



* Re: Disarchive update
  2021-10-09 10:05 Disarchive update Ludovic Courtès
  2021-10-09 10:37 ` Mathieu Othacehe
@ 2021-10-12  9:19 ` zimoun
  2021-10-14 14:02   ` Ludovic Courtès
  2021-10-13 14:54 ` Timothy Sample
  2021-10-14 14:31 ` Ludovic Courtès
  3 siblings, 1 reply; 15+ messages in thread
From: zimoun @ 2021-10-12  9:19 UTC
  To: Ludovic Courtès, guix-devel

Hi Ludo,

On Sat, 09 Oct 2021 at 12:05, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:

> If you run:
>
>   guix build /gnu/store/nnl67m8c2x9rwqbnych1agc6p7g5473g-disarchive-collection.drv

Oh, cool!

> and if you’re patient :-), you eventually get a 579 MB directory
> containing Disarchive metadata for 8,413 tarballs out of 9,113 (the
> missing tarballs are those that “disarchive disassemble” fails to
> handle, for instance because it couldn’t guess what compression method
> is being used.)

Timothy made this table months ago:

        tar+gz        9090  52.0%
        git           5294  30.3%
        tar+xz        1184  06.8%
        tar+bz2        775  04.4%
        tar            393  02.2%
        zip            273  01.6%
        svn-multi      175  01.0%
        svn            125  00.7%
        file            51  00.3%
        computed        38  00.2%
        hg              36  00.2%
        unknown-uri     20  00.1%
        tar+gz?         15  00.1%
        tar+lz          13  00.1%
        tar+Z            4  00.0%
        cvs              3  00.0%
        bzr              3  00.0%
        tar+lzma         2  00.0%
        total        17494 100.0%

What is really missing is XZ and Bzip2 support in Disarchive, I guess.


> Where to go from here?  Timothy Sample had already set up a Disarchive
> database at <https://disarchive.ngyro.com>, which (guix download) uses
> as a fallback; I’m not sure exactly how it’s populated.  The goal here
> would be for the Guix project to set up infrastructure populating a
> database automatically and creating backups, possibly via SWH (we’ll
> have to discuss it with them).

Timothy was working on feeding the database using each release.  Well,
you can have a look at:

<https://git.ngyro.com/preservation-of-guix>

Then something along these lines:

    $ sqlite3 /tmp/pog.db < schema.sql
    $ guix repl -L . <(echo '
          (use-modules (pog))
          (ingest "6298c3ffd9654d3231a6f25390b056483e8f407c"
                  "/tmp/pog.db")
      ')

where the commit hash corresponds to v1.0.0.  I do not know if it
would be equivalent to running:

   guix time-machine --commit=6298c3ffd9654d3231a6f25390b056483e8f407c \
        -- build -m etc/disarchive-manifest.scm


> A plan we can already deploy would be:
>
>   1. Add the disarchive.guix.gnu.org DNS entry, pointing to berlin.
>
>   2. On berlin, add an mcron job that periodically copies the output of
>      the latest “disarchive-collection” build to a directory, say
>      /srv/disarchive.  Thus, the database would accumulate tarball
>      metadata over time.
>
>   3. Add an nginx route so that /srv/disarchive is served at
>      https://disarchive.guix.gnu.org.
>
>   4. Add disarchive.guix.gnu.org to (guix download).

To replace (or add to) the current ’%disarchive-mirrors’ right?

Going down this road (using Cuirass), why not generate sources.json
the same way, instead of the hack using the website builder?


On my side, I will try to resume what I started months ago: measuring the
SWH coverage.  For instance, of these ~92% of tarballs, how many are
currently stored in SWH?  Well, do not hold your breath, and I would be
happy if someone beats me to it. ;-)


Cheers,
simon



* Re: Disarchive update
  2021-10-09 10:05 Disarchive update Ludovic Courtès
  2021-10-09 10:37 ` Mathieu Othacehe
  2021-10-12  9:19 ` zimoun
@ 2021-10-13 14:54 ` Timothy Sample
  2021-10-14 14:04   ` Ludovic Courtès
  2021-10-14 14:31 ` Ludovic Courtès
  3 siblings, 1 reply; 15+ messages in thread
From: Timothy Sample @ 2021-10-13 14:54 UTC
  To: Ludovic Courtès; +Cc: guix-devel

Hi Ludovic,

Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> This job is disassembling all the .tar.gz files packages refer to, using
> the recently-added ‘etc/disarchive-manifest.scm’ file:
>
>   https://ci.guix.gnu.org/jobset/disarchive
>
> It has just succeeded for the first time.  :-)

Fantastic!  I feel bad that I left you holding the bag on this one,
though.  Sorry.  I’ve been a little adrift this summer.  Thanks for
picking it up!

> Where to go from here?  Timothy Sample had already set up a Disarchive
> database at <https://disarchive.ngyro.com>, which (guix download) uses
> as a fallback; I’m not sure exactly how it’s populated.

Basically the same as what you are doing now.  I have many Cuirass jobs,
and I use the build outputs mechanism (mentioned by Mathieu elsewhere
in this thread).  I don’t have a “disarchive-collection” job, so I have
to use the Cuirass API to dig through the recent build outputs to find
new results.  This happens from a cron job, which uploads each new
result to my server.

One simple but satisfying thing that I do is serve the files compressed.
That is, they are compressed on disk and nginx just passes them along
(using the “gzip_static” module).  Because of Disarchive’s verbose and
repetitive output format, this makes for a huge reduction in storage
requirements.
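
To get a feel for the effect, here is a rough sketch that builds a
deliberately repetitive, made-up S-expression file and compresses it
next to the original, which is the layout “gzip_static” looks for
(FILE.gz alongside FILE).  The sample lines are invented for
illustration and are not real Disarchive output, so the actual ratio
will differ:

```shell
# make_sample: print 500 similar-looking lines.  Invented content, only
# meant to mimic a verbose and repetitive format.
make_sample() {
  i=0
  while [ "$i" -lt 500 ]; do
    printf '(header (name "pkg-1.0/src/file-%d.c") (mode 420) (uid 0) (gid 0))\n' "$i"
    i=$((i + 1))
  done
}

sample=$(mktemp)
make_sample > "$sample"
gzip -9 -c "$sample" > "$sample.gz"     # pre-compressed copy, FILE.gz

raw=$(wc -c < "$sample" | tr -d ' ')
packed=$(wc -c < "$sample.gz" | tr -d ' ')
echo "raw: $raw bytes, gzipped: $packed bytes"
rm -f "$sample" "$sample.gz"
```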

> The goal here would be for the Guix project to set up infrastructure
> populating a database automatically and creating backups, possibly via
> SWH (we’ll have to discuss it with them).
>
> A plan we can already deploy would be:
>
>   1. Add the disarchive.guix.gnu.org DNS entry, pointing to berlin.
>
>   2. On berlin, add an mcron job that periodically copies the output of
>      the latest “disarchive-collection” build to a directory, say
>      /srv/disarchive.  Thus, the database would accumulate tarball
>      metadata over time.
>
>   3. Add an nginx route so that /srv/disarchive is served at
>      https://disarchive.guix.gnu.org.
>
>   4. Add disarchive.guix.gnu.org to (guix download).
>
> How does that sound?  Thoughts?

This is great!  I can offer some past metadata, too.  Specifically, I
have ~14000 files that I generated while digging into SWH coverage.
(That’s a project I’d like to return to, but I’m still trying to get my
head back in the game and pick up where I left off.)


-- Tim



* Re: Disarchive update
  2021-10-12  9:19 ` zimoun
@ 2021-10-14 14:02   ` Ludovic Courtès
  2021-10-14 19:17     ` zimoun
  0 siblings, 1 reply; 15+ messages in thread
From: Ludovic Courtès @ 2021-10-14 14:02 UTC
  To: zimoun; +Cc: guix-devel

Hey!

zimoun <zimon.toutoune@gmail.com> skribis:

> Timothy made this table months ago:
>
>         tar+gz        9090  52.0%
>         git           5294  30.3%
>         tar+xz        1184  06.8%
>         tar+bz2        775  04.4%
>         tar            393  02.2%
>         zip            273  01.6%
>         svn-multi      175  01.0%
>         svn            125  00.7%
>         file            51  00.3%
>         computed        38  00.2%
>         hg              36  00.2%
>         unknown-uri     20  00.1%
>         tar+gz?         15  00.1%
>         tar+lz          13  00.1%
>         tar+Z            4  00.0%
>         cvs              3  00.0%
>         bzr              3  00.0%
>         tar+lzma         2  00.0%
>         total        17494 100.0%
>
> What is really missing is XZ and Bzip2 support in Disarchive, I guess.

Definitely, we know what to work on next!

> Timothy was working on feeding the database using each release.  Well,
> you can have a look at:
>
> <https://git.ngyro.com/preservation-of-guix>

Ah nice!  I had completely overlooked this.

[...]

>> A plan we can already deploy would be:
>>
>>   1. Add the disarchive.guix.gnu.org DNS entry, pointing to berlin.
>>
>>   2. On berlin, add an mcron job that periodically copies the output of
>>      the latest “disarchive-collection” build to a directory, say
>>      /srv/disarchive.  Thus, the database would accumulate tarball
>>      metadata over time.
>>
>>   3. Add an nginx route so that /srv/disarchive is served at
>>      https://disarchive.guix.gnu.org.
>>
>>   4. Add disarchive.guix.gnu.org to (guix download).
>
> To replace (or add to) the current ’%disarchive-mirrors’ right?

Exactly.

> Going down this road (using Cuirass), why not generate sources.json
> the same way, instead of the hack using the website builder?

I guess that would also work, indeed.  Then we could make /sources.json
redirect to ci.guix.gnu.org/whatever/latest.

> On my side, I will try to resume what I started months ago: measuring the
> SWH coverage.  For instance, of these ~92% of tarballs, how many are
> currently stored in SWH?  Well, do not hold your breath, and I would be
> happy if someone beats me to it. ;-)

Yup, we definitely need that kind of info now!

Ludo’.



* Re: Disarchive update
  2021-10-13 14:54 ` Timothy Sample
@ 2021-10-14 14:04   ` Ludovic Courtès
  0 siblings, 0 replies; 15+ messages in thread
From: Ludovic Courtès @ 2021-10-14 14:04 UTC
  To: Timothy Sample; +Cc: guix-devel

Hey Timothy!

Timothy Sample <samplet@ngyro.com> skribis:

> Fantastic!  I feel bad that I left you holding the bag on this one,
> though.  Sorry.  I’ve been a little adrift this summer.  Thanks for
> picking it up!

No problem, I’m glad to see you chime in now!  :-)

> One simple but satisfying thing that I do is serve the files compressed.
> That is, they are compressed on disk and nginx just passes them along
> (using the “gzip_static” module).  Because of Disarchive’s verbose and
> repetitive output format, this makes for a huge reduction in storage
> requirements.

Oh nice, thanks for sharing this tip!

> This is great!  I can offer some past metadata, too.  Specifically, I
> have ~14000 files that I generated while digging into SWH coverage.
> (That’s a project I’d like to return to, but I’m still trying to get my
> head back in the game and pick up where I left off.)

Alright.

Thanks!

Ludo’.



* Re: Disarchive update
  2021-10-12  8:41     ` Mathieu Othacehe
@ 2021-10-14 14:06       ` Ludovic Courtès
  0 siblings, 0 replies; 15+ messages in thread
From: Ludovic Courtès @ 2021-10-14 14:06 UTC
  To: Mathieu Othacehe; +Cc: guix-devel

Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

> I think the easiest way to proceed would be to create the "disarchive"
> specification in the (maintenance sysadmin services) module, this way:
>
> (specification
>  (name "disarchive")
>  (build '(manifests "etc/disarchive-manifest.scm"))
>  (build-outputs
>   (list
>    (build-output
>     (job "disarchive-collection*")
>     (type "archive")
>     (path ""))))
>  (notifications #$(cuirass-notifications))
>  (period 43200)
>  (priority 7)
>  (systems '("x86_64-linux")))

Thanks for the tip.  I went ahead, committed it to maintenance.git and
deployed it.  It works!  :-)

Ludo’.



* Re: Disarchive update
  2021-10-09 10:05 Disarchive update Ludovic Courtès
                   ` (2 preceding siblings ...)
  2021-10-13 14:54 ` Timothy Sample
@ 2021-10-14 14:31 ` Ludovic Courtès
  2021-10-14 21:44   ` zimoun
  2021-10-21 19:44   ` Ludovic Courtès
  3 siblings, 2 replies; 15+ messages in thread
From: Ludovic Courtès @ 2021-10-14 14:31 UTC
  To: guix-devel

Hello Guix!

Ludovic Courtès <ludovic.courtes@inria.fr> skribis:

> This job is disassembling all the .tar.gz files packages refer to, using
> the recently-added ‘etc/disarchive-manifest.scm’ file:
>
>   https://ci.guix.gnu.org/jobset/disarchive

[...]

> A plan we can already deploy would be:
>
>   1. Add the disarchive.guix.gnu.org DNS entry, pointing to berlin.

Done:

  https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=df9e9b7f51abceb5999aabc9a7b71396600cffa4
  https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=12195160432871b80d0e1eac996a9aa7d8500697

Sample URLs:

  https://disarchive.guix.gnu.org/sha256/53cf3e14c71f3a149f29d13a0da64120b3c1d3334fba39c4af3e520be053982a
  https://disarchive.guix.gnu.org/sha256/39052f59ff474a4a69cefc25cf3caf8429400889deba010ee6403ca188f8b311
  https://disarchive.guix.gnu.org/sha256/03a71d53055bd9ec528d55e07afaf15c09dec9856cba734904bfd05acbc6cf12

Aren’t those Disarchive sexps really cute? :-)
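
Judging from the sample URLs, an entry lives under the hash algorithm
name followed by the hex digest.  A small sketch for fetching one; the
helper names ‘disarchive_url’ and ‘fetch_entry’ are made up here, and
the fetch itself needs network access:

```shell
# disarchive_url ALGO HASH: print the URL of the corresponding entry.
# Hypothetical helper; the layout is inferred from the sample URLs above.
disarchive_url() {
  printf 'https://disarchive.guix.gnu.org/%s/%s\n' "$1" "$2"
}

# fetch_entry ALGO HASH: download the entry to standard output.
# -f fails on HTTP errors, -s silences progress output, and --compressed
# lets curl transparently decode a compressed response if the server
# sends one.
fetch_entry() {
  curl -fs --compressed "$(disarchive_url "$1" "$2")"
}

disarchive_url sha256 \
  53cf3e14c71f3a149f29d13a0da64120b3c1d3334fba39c4af3e520be053982a
```

Running ‘fetch_entry’ with the same two arguments should print the sexp
for that tarball, provided the entry is still being served.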

>   2. On berlin, add an mcron job that periodically copies the output of
>      the latest “disarchive-collection” build to a directory, say
>      /srv/disarchive.  Thus, the database would accumulate tarball
>      metadata over time.

First, there’s a script to populate the database; it copies files from
the latest successful “disarchive-collection” build to a specified
directory, gzipping them on their way.  It’s atomic, so the directory in
question can be directly served by nginx or similar:

  https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=fb83b3d8de189c6d6c33c4cdc2ebabf6eae1463e

If you want to try it at home, just run:

  ./sync-disarchive-db.scm /tmp/db

It’s pretty fast!  The output is only 70 MiB, now that individual files
are gzipped.
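
The real script is written in Scheme (see the commit above).  Purely as
an illustration of the idea, and not a transcription of it, here is a
shell sketch that copies new files from a source tree into a database
directory, gzipping them on the way and renaming them into place, which
is one common way to make each update atomic.  The ‘sync_db’ name and
the directory arguments are assumptions made for this example:

```shell
# sync_db SRC DEST: copy every regular file under SRC to DEST as FILE.gz,
# skipping entries already present so that DEST only ever grows.
# Hypothetical sketch, not the actual sync-disarchive-db.scm.
sync_db() {
  src=$1
  dest=$2
  (cd "$src" && find . -type f) | while IFS= read -r file; do
    target="$dest/$file.gz"
    [ -e "$target" ] && continue        # already in the database
    mkdir -p "$(dirname "$target")"
    tmp="$target.tmp.$$"
    gzip -9 -c "$src/$file" > "$tmp"    # compress under a temporary name
    mv "$tmp" "$target"                 # rename is atomic within a file system
  done
}
```

Because a file only appears under its final name once it is complete, a
web server reading DEST never sees a half-written entry.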

Then there’s the mcron job that runs it once a day on berlin:

  https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=27dc74fbe33a9d929b37994e825dc202385f87c0

We could run it as well on bayfront so we have a backup.

>   3. Add an nginx route so that /srv/disarchive is served at
>      https://disarchive.guix.gnu.org.

Done here:

  https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=9ffb2db81a2fbee67b99c76217be874ec0fd6bde

>   4. Add disarchive.guix.gnu.org to (guix download).

Done:

  https://git.savannah.gnu.org/cgit/guix.git/commit/?id=f9a506aa6a5aaeb2c06c97d5b663d01d2103db69

As I was once again modifying files by hand to test the download
fallback mechanisms, I figured we could just as well add a variable to
enable testing, which is what I did here:

  https://git.savannah.gnu.org/cgit/guix.git/commit/?id=c4a7aa82e25503133a1bd33148d17968c899a5f5

So you can do, say:

  GUIX_DOWNLOAD_FALLBACK_TEST=disarchive-mirrors guix build -S r-ebimage --check

or:

  GUIX_DOWNLOAD_FALLBACK_TEST=content-addressed-mirrors guix build -S r-ebimage --check

to check whether these fallback mechanisms work as expected.  (They do,
but I’ll update the ‘guix’ package because the current one has a bug
that breaks the Disarchive/SWH fallback.)

I think we’re making progress!  :-)

Thanks,
Ludo’.



* Re: Disarchive update
  2021-10-14 14:02   ` Ludovic Courtès
@ 2021-10-14 19:17     ` zimoun
  2021-10-21 19:41       ` Ludovic Courtès
  0 siblings, 1 reply; 15+ messages in thread
From: zimoun @ 2021-10-14 19:17 UTC
  To: Ludovic Courtès; +Cc: guix-devel


Hi,

On Thu, 14 Oct 2021 at 16:02, Ludovic Courtès <ludo@gnu.org> wrote:

>> Going down this road (using Cuirass), why not generate sources.json
>> the same way, instead of the hack using the website builder?
>
> I guess that would also work, indeed.  Then we could make /sources.json
> redirect to ci.guix.gnu.org/whatever/latest.

I had a look but it is not clear yet how to do it.  Pointers or tips
welcome. :-)


>> On my side, I will try to resume what I started months ago: measuring the
>> SWH coverage.  For instance, of these ~92% of tarballs, how many are
>> currently stored in SWH?  Well, do not hold your breath, and I would be
>> happy if someone beats me to it. ;-)
>
> Yup, we definitely need that kind of info now!

Using the authentication mode from SWH [1] and this trivial patch, the
rate limit is 1200, which allows checking and archiving some packages.
For instance, now,

--8<---------------cut here---------------start------------->8---
# Lint every package whose name contains "julia-".
for p in $(guix package -A | cut -f1 | grep "julia-")
do
   ./pre-inst-env guix lint -c archival "$p"
done
--8<---------------cut here---------------end--------------->8---

passes.  The remaining work is to check with the SWH folks for a higher
value than this 1200 limit, to have a token associated with an account on
the Software Heritage Authentication service [2], and to set up a cron
task “somewhere” running:

   ./pre-inst-env guix lint -c archival

WDYT?


Cheers,
simon

1: <https://archive.softwareheritage.org/api/>
2: <https://archive.softwareheritage.org/oidc/login/>


[-- Attachment #2: swh-auth.patch --]
[-- Type: text/x-diff, Size: 1488 bytes --]

diff --git a/guix/swh.scm b/guix/swh.scm
index 5c41685a24..1aaf733b5d 100644
--- a/guix/swh.scm
+++ b/guix/swh.scm
@@ -153,12 +153,20 @@ (define url
       url
       (string-append url "/")))
 
+(define token
+  'xxXxxXxxXxXXXxXxXxXxXxXxxXXxXxXxXxxXXxxxxxxxXxXxXXXxXXXxXXXxXxxxXxXxXXXxXXXxXXXxXxxxXxXxXxxxXXXxXXXxxX.xxXxXXXxXxXxXxXxXxxxXxXxXxxxxXXxXxXxXxxxXXXxXXxxXxxxXXXxXxxxXXxxXXXxXXXxXXxxXxXxXXXxXxxxxxXxXxxxxXXxXxxxXXXxxXxxxxXxxxXxXXxxxxxxXXxxXxxxXxxxxXXxXxXxXXxxxxxXxxXxxxXxXXxxxxxxXXxxXxxxXXXxXxxxxXXxxXXxXxxxxXXxXxXxXxXxXXXxxXXxxXXxXxXxxxXxXxXxxXxxxxXxxXxxXxXxXxXxXXXxXXXxxXXxXxXxXXXxxXXxXxXxXXXxXXxxXxxxXXXxXXXxXXxxXXXxXxxxXXxxXXXxXxXxXxXxXXXxxXXxXxXXXxXxxXxxXxxxXXxxXxxxxxxxXXxxXxXxXxXxxxXxxxxxxxXxxXXxXxXxXXXxXxxxXxxxXxxxXXXxXxXxXxXxXxxxXxxxXXXxXxXxXxXxXXXxXxxxXXXxXxxxXXxxXXXxXxXxxXxxXxXxXxXxxxXxxxxxxXxxXXXxXXxxXxx.xxXxxxxXxxxxXxxxxXXxXXxxxXXXX-xxx_xxxXXxxxx
+  )
+
 ;; XXX: Work around a bug in Guile 3.0.2 where #:verify-certificate? would
 ;; be ignored (<https://bugs.gnu.org/40486>).
 (define* (http-get* uri #:rest rest)
-  (apply http-request uri #:method 'GET rest))
+  (apply http-request uri #:method 'GET
+         #:headers `((authorization . (Bearer ,token)))
+         rest))
 (define* (http-post* uri #:rest rest)
-  (apply http-request uri #:method 'POST rest))
+  (apply http-request uri #:method 'POST
+         #:headers `((authorization . (Bearer ,token)))
+         rest))
 
 (define %date-regexp
   ;; Match strings like "2014-11-17T22:09:38+01:00" or


* Re: Disarchive update
  2021-10-14 14:31 ` Ludovic Courtès
@ 2021-10-14 21:44   ` zimoun
  2021-10-21 19:44   ` Ludovic Courtès
  1 sibling, 0 replies; 15+ messages in thread
From: zimoun @ 2021-10-14 21:44 UTC
  To: Ludovic Courtès, guix-devel

Hi,

On Thu, 14 Oct 2021 at 16:31, Ludovic Courtès <ludo@gnu.org> wrote:

>>   4. Add disarchive.guix.gnu.org to (guix download).
>
> Done:

[...]

> I think we’re making progress!  :-)

I added support for the SWH authentication token in patch#51216 [1].
Using a valid TOKEN from an account on the Software Heritage
Authentication service [2], it reads something along these lines:

   GUIX_SWH_TOKEN=${TOKEN} guix lint -c archival

By default, the token allows 1200 requests instead of 120.  The
interesting things concerning the recent Disarchive additions are these
bits:

--8<---------------cut here---------------start------------->8---
Disarchive entry refers to non-existent SWH directory 'aeae11cb3c33ab33374e222dc3bdf17039808a5b'
Disarchive entry refers to non-existent SWH directory 'b25414c9864a270899ca1ff494e7ba4c437b166d'
Disarchive entry refers to non-existent SWH directory '128bbe76a82dd0b38b725565ed703a7148257ae0'
Disarchive entry refers to non-existent SWH directory '92625e2c6dbe3ad7c4f44a061ada24ce00637087'
Disarchive entry refers to non-existent SWH directory '6000a273dfff9de62725b53e41562fff711069c1'
Disarchive entry refers to non-existent SWH directory 'c68ff8714c6fd360a38158f3d8f22e555c061452'
Disarchive entry refers to non-existent SWH directory 'cb52aaa9500df2b674bf7922811deeea1b766139'
Disarchive entry refers to non-existent SWH directory '3e574043a04d77dd7231d23210547c4fe065a40c'
Disarchive entry refers to non-existent SWH directory 'aa763150704fe06f34097b38e839409cee52366d'
Disarchive entry refers to non-existent SWH directory '127c0a03c7ccba74870aef7dac36019af35798cc'
Disarchive entry refers to non-existent SWH directory 'd9745f29da983c6ad674871e68ac96362c4f11cc'
Disarchive entry refers to non-existent SWH directory '7d7ed9f88ee649a90493f54d3988a062c3ddeafb'
Disarchive entry refers to non-existent SWH directory 'f5bd0fe7450175196c57d6f6d5aca8905393e814'
Disarchive entry refers to non-existent SWH directory '92bd3b93caa9a4b0840c70ddb96ac75b0684d7ec'
--8<---------------cut here---------------end--------------->8---

We need to investigate why the Disarchive database contains some
entries that SWH does not know about.  It probably means that there is
an inconsistency with sources.json.

1: <http://issues.guix.gnu.org/issue/51216>
2: <https://archive.softwareheritage.org/oidc/login/>

Cheers,
simon



* Re: Disarchive update
  2021-10-14 19:17     ` zimoun
@ 2021-10-21 19:41       ` Ludovic Courtès
  2021-10-21 19:57         ` zimoun
  0 siblings, 1 reply; 15+ messages in thread
From: Ludovic Courtès @ 2021-10-21 19:41 UTC
  To: zimoun; +Cc: guix-devel

Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

> Using the authentication mode from SWH [1] and this trivial patch, the
> rate limit is 1200, which allows checking and archiving some packages.
> For instance, now,
>
> for p in $(guix package -A | cut -f1 | grep "julia-")
> do
>    ./pre-inst-env guix lint -c archival "$p"
> done
>
> passes.  The remaining work is to check with the SWH folks for a higher
> value than this 1200 limit, to have a token associated with an account on
> the Software Heritage Authentication service [2], and to set up a cron
> task “somewhere” running:
>
>    ./pre-inst-env guix lint -c archival
>
> WDYT?

I think you made progress on this in the meantime: this is great!
Really cool of the SWH folks to give you a higher rate limit.

Thanks,
Ludo’.



* Re: Disarchive update
  2021-10-14 14:31 ` Ludovic Courtès
  2021-10-14 21:44   ` zimoun
@ 2021-10-21 19:44   ` Ludovic Courtès
  1 sibling, 0 replies; 15+ messages in thread
From: Ludovic Courtès @ 2021-10-21 19:44 UTC
  To: guix-devel

Hi!

Ludovic Courtès <ludo@gnu.org> skribis:

> Then there’s the mcron job that runs it once a day on berlin:
>
>   https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=27dc74fbe33a9d929b37994e825dc202385f87c0
>
> We could run it as well on bayfront so we have a backup.

I did that without thinking much but it won’t work: as written,
sync-disarchive-db.scm assumes ci.guix.gnu.org substitutes are
authorized, which is not the case on bayfront.

So I suppose we need to do things differently there, such as
fetching/unpacking substitutes straight from sync-disarchive-db.scm
instead of going through the daemon.

I’ll take a look sometime, but it’d be great if someone else did.  :-)

Thanks,
Ludo’.



* Re: Disarchive update
  2021-10-21 19:41       ` Ludovic Courtès
@ 2021-10-21 19:57         ` zimoun
  0 siblings, 0 replies; 15+ messages in thread
From: zimoun @ 2021-10-21 19:57 UTC
  To: Ludovic Courtès; +Cc: Guix Devel

Hey,

On Thu, 21 Oct 2021 at 21:41, Ludovic Courtès <ludo@gnu.org> wrote:

> Really cool of the SWH folks to give you a higher rate limit.

It is not for me in particular. :-)
Anyone can create an account via the Software Heritage Authentication service.

<https://archive.softwareheritage.org/oidc/login/>

Then anyone can enjoy the 1200 rate limit.
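
A back-of-the-envelope sketch of what the tenfold increase means.  It
assumes, for simplicity, one API request per source (the real lint
checker may well need more) and takes the 17,494 sources from the table
quoted earlier in this thread; “window” stands for whatever period the
limit applies to, and ‘windows_needed’ is a made-up helper:

```shell
# windows_needed TOTAL LIMIT: number of rate-limit windows needed to issue
# TOTAL requests at LIMIT requests per window (integer ceiling division).
# Hypothetical helper for illustration only.
windows_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

windows_needed 17494 120     # prints 146
windows_needed 17494 1200    # prints 15
```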

Cheers,
simon

PS: I am in the process of asking for a special rate limit higher than
1200, but that's another story. :-)

