Dear,

Does it make sense to add "lint -c archival" when a package is built by
Cuirass? Or on the Guix Data Service?

The idea behind this is then to ask SWH folks to increase the rate limit
for a specific IP (or a couple of IPs). Today, the SWH rate is 10 save
requests per hour, i.e., 240 per day (more or less). And the new
chart [1] shows that there are ~2000 builds per day. Ouch! :-)

[1] <https://ci.guix.gnu.org/metrics>

If it is not possible, does it make sense instead to add a script
to etc/? If SWH agrees to increase the rate for a specific machine,
the script (fold-packages+save-origin) could run with some delay and
save all the missing Git references.

Well, I do not know what the GitLab CI in Bordeaux is doing about
Guix packages; I guess some things are already sending save requests
automatically.

WDYT?

All the best,
simon
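For reference, the arithmetic behind the rate-limit concern can be sketched as follows. This is only an illustration: it assumes the worst case where every build needs its own save request, it takes the "~2000 builds per day" figure from the message above at face value, and the constant and function names are made up for the example.

```python
import math

# Figures quoted in the message above; everything else is illustrative.
SAVE_REQUESTS_PER_HOUR = 10   # current SWH limit for save requests
BUILDS_PER_DAY = 2000         # rough figure read off the Cuirass metrics chart


def daily_quota(per_hour: int) -> int:
    """Number of save requests allowed in 24 hours at a given hourly rate."""
    return per_hour * 24


def days_to_submit(n_requests: int, per_hour: int) -> int:
    """Days needed to submit n_requests when limited to per_hour requests."""
    return math.ceil(n_requests / daily_quota(per_hour))


def required_hourly_rate(requests_per_day: int) -> int:
    """Smallest hourly limit that keeps up with requests_per_day."""
    return math.ceil(requests_per_day / 24)


if __name__ == "__main__":
    print(daily_quota(SAVE_REQUESTS_PER_HOUR))
    print(days_to_submit(BUILDS_PER_DAY, SAVE_REQUESTS_PER_HOUR))
    print(required_hourly_rate(BUILDS_PER_DAY))
```

Under these assumptions, one day of builds would take more than a week to submit at the current limit, which is why a higher limit for one machine is being discussed.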
Hello zimoun,

> The idea behind this is then to ask SWH folks to increase the rate limit
> for a specific IP (or a couple of IPs). Today, the SWH rate is 10 save
> requests per hour, i.e., 240 per day (more or less). And the new
> chart [1] shows that there are ~2000 builds per day. Ouch! :-)

Yesterday almost 18,000 derivations were added, and even if only 10,000
were built, it is indeed quite substantial.

> If it is not possible, does it make sense instead to add a script
> to etc/? If SWH agrees to increase the rate for a specific machine,
> the script (fold-packages+save-origin) could run with some delay and
> save all the missing Git references.

Adding some sort of "post build" hook to Cuirass that would trigger an
SWH archival would be possible, even though it would require
implementing this mechanism.

Having a cron job archiving missing references would also be possible I
guess, but I may have a preference for the first option.

Thanks,

Mathieu
--
https://othacehe.org
Hi,

On Thu, 24 Sep 2020 at 09:28, Mathieu Othacehe <othacehe@gnu.org> wrote:

> > The idea behind this is then to ask SWH folks to increase the rate limit
> > for a specific IP (or a couple of IPs). Today, the SWH rate is 10 save
> > requests per hour, i.e., 240 per day (more or less). And the new
> > chart [1] shows that there are ~2000 builds per day. Ouch! :-)
>
> Yesterday almost 18,000 derivations were added, and even if only 10,000
> were built, it is indeed quite substantial.

That's good news. :-)
On average, it is ~2000, right?

Well, we could set a limit for the extra days, sending only the first X
builds, where X is agreed with SWH. It would be far from perfect and
some packages would not be saved, but it seems better than the current
situation (which depends on the submitter/reviewer only).

This would be something for the meantime, while waiting for the SWH
sources.json loader to accept more than 'url-fetch' sources.

> > If it is not possible, does it make sense instead to add a script
> > to etc/? If SWH agrees to increase the rate for a specific machine,
> > the script (fold-packages+save-origin) could run with some delay and
> > save all the missing Git references.
>
> Adding some sort of "post build" hook to Cuirass that would trigger an
> SWH archival would be possible, even though it would require
> implementing this mechanism.

Cool!
Yakafonkon. ;-)

> Having a cron job archiving missing references would also be possible I
> guess, but I may have a preference for the first option.

Because I am lazy, the "post build" hook seems to me more complicated
to implement than a cron job with a Scheme script (that I almost already
have :-)).

Hey, "Now is better than never. Although never is often better than
*right* now." :-)

Thanks,
simon
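The "first X builds per day" idea can be sketched as a small queue. This is a hypothetical illustration, not the Scheme script mentioned above: the function name and the use of plain URL strings are assumptions, and unlike the proposal as stated (where packages beyond X are simply not saved) this variant carries the overflow to the next day.

```python
from collections import deque
from typing import Deque, Iterable, List


def take_daily_batch(pending: Deque[str],
                     new_origins: Iterable[str],
                     daily_cap: int) -> List[str]:
    """Queue today's new origins, then return at most daily_cap of them.

    Origins are served oldest first; anything above the cap stays in
    `pending` and is picked up by a later run.
    """
    seen = set(pending)
    for url in new_origins:
        if url not in seen:          # do not queue the same origin twice
            pending.append(url)
            seen.add(url)

    batch: List[str] = []
    while pending and len(batch) < daily_cap:
        batch.append(pending.popleft())
    return batch


if __name__ == "__main__":
    queue: Deque[str] = deque()
    # Placeholder URLs; the cap X would be whatever is agreed with SWH.
    day1 = take_daily_batch(queue, ["https://example.org/a.git",
                                    "https://example.org/b.git",
                                    "https://example.org/c.git"], daily_cap=2)
    print(day1, list(queue))
```

A cron job could call such a function once per day and send one save request per returned origin.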
zimoun <zimon.toutoune@gmail.com> writes:

> Does it make sense to add "lint -c archival" when a package is built
> by Cuirass? Or on the Guix Data Service?
>
> The idea behind this is then to ask SWH folks to increase the rate limit
> for a specific IP (or a couple of IPs). Today, the SWH rate is 10 save
> requests per hour, i.e., 240 per day (more or less). And the new
> chart [1] shows that there are ~2000 builds per day. Ouch! :-)
>
> [1] <https://ci.guix.gnu.org/metrics>
>
> If it is not possible, does it make sense instead to add a script
> to etc/? If SWH agrees to increase the rate for a specific machine,
> the script (fold-packages+save-origin) could run with some delay and
> save all the missing Git references.
>
> Well, I do not know what the GitLab CI in Bordeaux is doing about
> Guix packages; I guess some things are already sending save requests
> automatically.
>
> WDYT?

So, my understanding is that Software Heritage is a potential store for
source material for Guix packages. I think the majority of builds
Cuirass does are because inputs change, rather than the source of a
package.

I'm not sure hooking this up to Cuirass would make the most sense,
because of the above point.

Also, unfortunately, the Guix Data Service doesn't have the ideal data
for this, as it doesn't really store the package source information in
the way that would be useful for this.

Personally though (and I'm rather biased), I think the Guix Data Service
might still be an approach. If you take the view on this that the
Software Heritage is a means to a store item (which I think is right?),
the Guix Data Service knows about those store items (like [1]).

1: https://data.guix.gnu.org/gnu/store/5h4dz6ild4fkida5yfv5fhh59vfd8hvk-python-boolean.py-3.6-checkout

It's already storing if substitute servers have a nar for that store
item, so I don't think storing if it's available elsewhere is
particularly out of place.
To make the information actionable though, it would be necessary to
store more information about the sources for packages in the Guix Data
Service database.

This is much more work than just using the existing linter, but it does
have the advantage that you'd be able to look at coverage statistics and
things like that, which the checker doesn't really afford.

Chris
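As an illustration of the kind of coverage statistics mentioned above, here is a minimal sketch. It assumes a hypothetical record shape of (package name, fetch method, archived or not); this is not the Guix Data Service schema, and the sample data is invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (package name, source fetch method, archived in SWH?)
SourceRecord = Tuple[str, str, bool]


def coverage_by_method(sources: Iterable[SourceRecord]) -> Dict[str, float]:
    """Return, per fetch method, the fraction of sources that are archived."""
    totals: Dict[str, int] = defaultdict(int)
    archived: Dict[str, int] = defaultdict(int)
    for _name, method, is_archived in sources:
        totals[method] += 1
        if is_archived:
            archived[method] += 1
    return {method: archived[method] / totals[method] for method in totals}


if __name__ == "__main__":
    sample = [
        ("foo", "url-fetch", True),    # invented data, for illustration only
        ("bar", "git-fetch", True),
        ("baz", "git-fetch", False),
    ]
    print(coverage_by_method(sample))
```

A one-off run of the linter cannot produce such a breakdown because it does not keep the results anywhere; a database of sources would.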
Hi,

On Thu, 24 Sep 2020 at 21:06, Christopher Baines <mail@cbaines.net> wrote:
> zimoun <zimon.toutoune@gmail.com> writes:

> So, my understanding is that Software Heritage is a potential store for
> source material for Guix packages. I think the majority of builds
> Cuirass does are because inputs change, rather than the source of a
> package.

To be precise, Software Heritage stores only upstream source code.
Their API entry point for "save" takes the URL of a Git, Mercurial or
Subversion repository, and then they ingest the content that this very
URL serves.

And it is not necessary to build the package to send a "save" request;
"guix lint -c archival foo" sends the request for the git-reference
source of Guix packages. Note that Guix does not send the result of
"guix build -S" but the real upstream URL.

> I'm not sure hooking this up to Cuirass would make the most sense,
> because of the above point.
>
> Also, unfortunately, the Guix Data Service doesn't have the ideal data
> for this, as it doesn't really store the package source information in
> the way that would be useful for this.

Somehow, the GDS has this information because it reports Lint Warnings
(for example [1]: bottom "no lint warnings"). However, if I read
correctly, you added the option "--no-network" to only use the linters
which do not require network access. Does the GDS run the linters by
itself or does it use the log from Cuirass?

[1] <https://data.guix.gnu.org/revision/c385bd69ad407f608e3da3156fed0ac915574313/package/git/2.28.0>

BTW, please consider the patch #43261 [2] fixing an issue in the current
implementation of "--no-network". :-)

[2] <http://issues.guix.gnu.org/issue/43261>

> Personally though (and I'm rather biased), I think the Guix Data Service
> might still be an approach. If you take the view on this that the
> Software Heritage is a means to a store item (which I think is right?),
> the Guix Data Service knows about those store items (like [1]).
>
> 1: https://data.guix.gnu.org/gnu/store/5h4dz6ild4fkida5yfv5fhh59vfd8hvk-python-boolean.py-3.6-checkout

Currently, Guix does not provide machinery to send its source
substitutes. I am not convinced it makes sense to do so.

The model I am imagining is:

 - short term:
    + a script runs as a cron job to lint all the packages, say once per
      day (packages will be missed but it is better than what we
      currently have)
    + try to implement the save request for hg and svn (I am working on
      it if no one beats me :-))
 - middle term: add a hook (Cuirass or GDS) to trigger an action if the
   package passes.
 - long term: SWH ingests everything via sources.json

Somehow, sending all the source substitutes should be done once, at the
transition from short to middle term.

Currently, SWH ingests all the tarballs (via sources.json) and a few
git-reference packages: the ones for which the packager/reviewer ran
"guix lint -c archival". I am proposing to automate this instead of
relying on the packager's/reviewer's goodwill. :-)

Well, from a wider point of view, the hook could send a save request to
SWH, or we could also imagine that the hook does whatever with the
result (store item): push it somewhere, or disassemble the tarball (if
any) and save it to the database (to then be able to fetch from SWH).

Note that the long term does not depend on the Guix side but on the SWH
side. So the term could be shorter. :-)

Does this make sense?

> To make the information actionable though, it would be necessary to
> store more information about the sources for packages in the Guix Data
> Service database.
>
> This is much more work than just using the existing linter, but it does
> have the advantage that you'd be able to look at coverage statistics and
> things like that, which the checker doesn't really afford.

Yes.

In summary, SWH limits the number of requests per hour (10 save
requests and 120 query requests) and so it is impossible to automate
the saving mechanism.
I am proposing to ask them to change this rate limit for one specific
trusted machine (for example, if I understand correctly, the Nix and
Debian projects are doing so).

Therefore, the questions are:

 - which machine?
 - what is the automation process? (see above)

WDYT?

All the best,
simon
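To make the "automation process" question more concrete, here is a rough sketch of what a paced batch of save requests could look like. It only builds the request plan and performs no network access. The endpoint layout follows the SWH Web API as I understand it (a POST to /api/1/origin/save/<visit-type>/url/<origin-url>/) and should be checked against the SWH documentation; the function names, the example URLs and the even-spacing strategy are assumptions made for the illustration.

```python
from typing import Iterable, List, Tuple

# Assumed from the public SWH Web API layout; verify before relying on it.
SWH_API_ROOT = "https://archive.softwareheritage.org/api/1"
VISIT_TYPES = {"git", "hg", "svn"}   # Git, Mercurial, Subversion


def save_endpoint(visit_type: str, origin_url: str) -> str:
    """Build the URL a "save" request for origin_url would be POSTed to."""
    if visit_type not in VISIT_TYPES:
        raise ValueError(f"unsupported visit type: {visit_type}")
    return f"{SWH_API_ROOT}/origin/save/{visit_type}/url/{origin_url}/"


def schedule(origins: Iterable[Tuple[str, str]],
             per_hour: int = 10) -> List[Tuple[float, str]]:
    """Spread requests evenly so that at most per_hour are sent each hour.

    Returns (delay in seconds from the start, endpoint) pairs.
    """
    interval = 3600 / per_hour
    return [(index * interval, save_endpoint(visit_type, url))
            for index, (visit_type, url) in enumerate(origins)]


if __name__ == "__main__":
    plan = schedule([("git", "https://example.org/a.git"),
                     ("hg", "https://example.org/b")])
    for delay, endpoint in plan:
        print(delay, endpoint)
```

With a higher limit granted to one machine, only the per_hour value would change; which machine runs the job remains the open question.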