* Reproductibility, Data Services, guix weather
@ 2020-10-12 21:40 zimoun
2020-10-13 19:41 ` Christopher Baines
2020-10-16 10:17 ` Ludovic Courtès
0 siblings, 2 replies; 6+ messages in thread
From: zimoun @ 2020-10-12 21:40 UTC (permalink / raw)
To: Guix Devel, Christopher Baines
Dear,
Recently, we discovered a regression in the Haskell build system that
introduced unreproducible builds. It was partly luck: I was testing
’git-annex’, hoping to get ’git-annex-assistant’, and happened to build
it several times (--check) [1].
Aside from this particular issue, ~10% of packages are not reproducible,
and I am not convinced that submitters or committers run “--check” for
each update or new package. Otherwise the case of the unreproducible Mesa
[2] would have been raised before June. :-) (That’s fine, we need packages
after all and we cannot fix the whole world at once. :-))
The issue is being able to find them. I proposed (below) running a cron
task that does ’--check’ on the build farms and then reports failures by
email. Chris pointed me to the work they are doing [3]; instead of a
cron task, they propose parsing the JSON. That’s what the tiny
attached script does.
guix repl -L . -- weather-repro.scm
For example, I run:
guix repl -L . -- weather-repro.scm | sort | grep ghc
to list (almost) all the unreproducible Haskell packages. What I would
like is to be able to filter by build system for example.
First, Chris, could you add fields for the package name and version?
It is hard to reconstruct them automatically by parsing the output path.
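To illustrate why the output path alone is fragile, here is a minimal sketch in plain Guile. The procedure name `store-path->name+version` and the "version starts at the first component beginning with a digit" rule are assumptions made for this illustration; they are not what the attached script or Guix itself does.

```scheme
(use-modules (srfi srfi-1)              ; break
             (srfi srfi-11))            ; let-values

;; Hypothetical helper, for illustration only.
(define (store-path->name+version path)
  "Heuristically split PATH (/gnu/store/HASH-NAME-VERSION) into two values,
the name and the version.  The version is #f when no component starts with
a digit."
  (let ((parts (cdr (string-split (basename path) #\-)))) ; drop the hash
    (let-values (((name version)
                  ;; Assumption: the version begins at the first component
                  ;; whose first character is a digit.
                  (break (lambda (s)
                           (and (not (string-null? s))
                                (char-numeric? (string-ref s 0))))
                         parts)))
      (values (string-join name "-")
              (and (pair? version) (string-join version "-"))))))

;; (store-path->name+version "/gnu/store/HASH-git-annex-8.20200908")
;;   => "git-annex" and "8.20200908"
;; (store-path->name+version "/gnu/store/HASH-glib-2.62.6-bin")
;;   => "glib" and "2.62.6-bin"      ; the output name leaks into the version
;; (store-path->name+version "/gnu/store/HASH-foo-3d-viewer-1.0")
;;   => "foo" and "3d-viewer-1.0"    ; hypothetical package name, split wrongly
```

The last two calls show the failure modes: non-default outputs and names with a component starting with a digit both confuse the heuristic, which is why explicit name and version fields in the JSON would be more reliable.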
Second, the revision at <https://data.guix-patches.cbaines.net/revision>
does not match the Guix commit. Is it possible to have a bridge? In other
words, how is this revision hash computed?
(A working revision is 6cf35799dec60723f37d83a559429aa8b90482d5, which
does not seem to be found in the Guix repo.)
Third, this tiny script is better than nothing but *far, far* from
perfect. The question about tooling is: does it make sense to include
something like that directly in “guix weather”? For example,
guix weather --reproducible
or maybe under “guix challenge”?
WDYT? Feedback and ideas are very welcome. :-)
All the best,
simon
PS: Below are my question and Chris’s answer. Both deserve to be public,
as Chris told me. :-)
1: <http://issues.guix.gnu.org/issue/43843>
2: <http://issues.guix.gnu.org/issue/42139>
3: <https://data.guix-patches.cbaines.net/revision/6cf35799dec60723f37d83a559429aa8b90482d5/package-derivation-outputs?search_query=&output_consistency=not-matching&system=x86_64-linux&target=none&after_path=&limit_results=10>
-------------------- Start of forwarded message --------------------
From: zimoun <zimon.toutoune@gmail.com>
Subject: [guix-sysadmin] whishlist: Hook on the build-farm?
Date: Sun, 11 Oct 2020 17:19:26 +0200
Hi,
Currently, it is hard to catch:
1. which commit breaks which package
2. whether the package builds reproducibly
The Data Service helps *a lot!*, but there are still many manual steps
needed to spot one or the other. And I fully agree that the
work initiated by Chris is The Right Thing©.
However, it is not ready yet and manpower is limited. For
#1, Danny has started a discussion.
For #2, I propose adding a cron task on one build farm. To be
concrete, let’s *randomly* pick 100 packages once a week, rebuild them with
“--check”, and send the list of unreproducible packages by email.
I even propose rotating: first week, 100 packages of build system “foo”;
second week, 100 packages of build system “bar”; third week…
It is far from perfect but it seems a good heuristic to catch
regressions, spot packages with reproducibility troubles, etc. Note that
this should not be needed since the committer should catch
reproducibility issues; but as a matter of fact that is not the case.
In a way, I am proposing a workaround.
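The weekly selection described above could be sketched as follows in plain Guile. The names `random-sample` and `weekly-batch`, and the representation of packages as (NAME . BUILD-SYSTEM) pairs, are assumptions for this illustration; a real job would obtain the package list from Guix and pass each selected name to “guix build --check”.

```scheme
(use-modules (srfi srfi-1))             ; take

;; Hypothetical helper, for illustration only.
(define (random-sample lst n state)
  "Return up to N elements of LST picked at random, without replacement,
using the random STATE."
  (let* ((vec (list->vector lst))
         (len (vector-length vec))
         (k   (min n len)))
    ;; Partial Fisher-Yates shuffle: only the first K slots are needed.
    (do ((i 0 (+ i 1)))
        ((= i k))
      (let* ((j   (+ i (random (- len i) state)))
             (tmp (vector-ref vec i)))
        (vector-set! vec i (vector-ref vec j))
        (vector-set! vec j tmp)))
    (take (vector->list vec) k)))

;; Hypothetical helper, for illustration only.
(define (weekly-batch packages build-systems week n state)
  "Return up to N random entries of PACKAGES, a list of (NAME . BUILD-SYSTEM)
pairs, restricted to the build system whose turn it is on WEEK."
  (let ((build-system (list-ref build-systems
                                (modulo week (length build-systems)))))
    (random-sample (filter (lambda (package)
                             (eq? (cdr package) build-system))
                           packages)
                   n state)))

;; Possible use, with made-up data:
;; (weekly-batch '(("ghc-a" . haskell) ("py-a" . python))
;;               '(haskell python) 0 100 (random-state-from-platform))
```

Rotating with `modulo` keeps the schedule stateless: the week number alone decides which build system is sampled.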
I volunteer to be the recipient of these automatic emails; then I can do
some triage (remove false positives, check what’s going on, etc.) and open
a bug report if there is an issue.
Currently, I do not have the CPU power to do so. So I am asking whether it
is possible to put something like that on one of the build machines. I
would totally understand an answer like: « Simon, you are enthusiastic and
that’s nice, but no, and go to hell! » :-)
Cheers,
simon
-------------------- End of forwarded message --------------------
-------------------- Start of forwarded message --------------------
From: Christopher Baines <mail@cbaines.net>
Subject: Re: [guix-sysadmin] whishlist: Hook on the build-farm?
Date: Sun, 11 Oct 2020 17:38:03 +0100
> Currently, I do not have the CPU power to do so. So I am asking whether it
> is possible to put something like that on one of the build machines. I
> would totally understand an answer like: « Simon, you are enthusiastic and
> that’s nice, but no, and go to hell! » :-)
I too would really like to be able to identify/prevent regressions,
including with respect to build reproducibility, and although the work
I'm doing on this is going slowly I'm hoping that with the Guix Build
Coordinator now I'll be able to get something sort of working.
I've just made a few tweaks to the Guix Data Service to make the data it
has on this a little easier to use.
This URL [1] should show you package reproducibility stats for each
architecture, computed from substitute data from ci.guix.gnu.org,
bayfront.guix.gnu.org, guix.tobias.gr and guix.cbaines.net. Currently
there seem to be 1515 outputs (so not exactly packages, but close) that
don't seem to have built reproducibly.
1: https://data.guix-patches.cbaines.net/repository/2/branch/master/latest-processed-revision/package-reproducibility
Clicking through to the "Not matching" ones for x86_64-linux should give
you this URL [2]. In case it's useful to have this data in a more
machine readable form, I've added a JSON output option.
2: https://data.guix-patches.cbaines.net/repository/2/branch/master/latest-processed-revision/package-derivation-outputs?output_consistency=not-matching&system=x86_64-linux
You mention triage, and that's probably the biggest blocker to being
able to methodically try and reduce the "Not matching" numbers on
[1]. As far as I know, Debian has things like [3] and [4] to help with
that.
3: https://tests.reproducible-builds.org/debian/index_issues.html
4: https://salsa.debian.org/reproducible-builds/reproducible-notes/-/blob/master/issues.yml
Going back to the issue of a cron job to run guix build --check on some
random packages and send emails, if you're looking for a list of
packages (well actually outputs) which don't build reproducibly, then
[2] might do?
The JSON output doesn't contain the package names, but it probably could
with only a little effort. If it did, you could download the JSON file
for all the non-matching package outputs, and record the package names
in a sorted list in a Git repository. If you do that every day, then you
could read the git log to spot potential patterns/regressions.
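The daily comparison could look like this minimal sketch in plain Guile. The names `sorted-names` and `snapshot-diff` are hypothetical, and the inputs are assumed to be lists of package-name strings already extracted from the JSON.

```scheme
(use-modules (srfi srfi-1))             ; delete-duplicates, lset-difference

;; Hypothetical helper, for illustration only.
(define (sorted-names names)
  "Return NAMES without duplicates, sorted, ready to be written one per
line to a file tracked in Git."
  (sort (delete-duplicates names string=?) string<?))

;; Hypothetical helper, for illustration only.
(define (snapshot-diff old new)
  "Return two values: the names only in NEW (potential regressions) and
the names only in OLD (potentially fixed)."
  (values (lset-difference string=? new old)
          (lset-difference string=? old new)))

;; (sorted-names '("mesa" "ghc-foo" "mesa"))
;;   => ("ghc-foo" "mesa")
;; (snapshot-diff '("ghc-foo" "mesa") '("ghc-bar" "mesa"))
;;   => ("ghc-bar") and ("ghc-foo")
```

Keeping the list sorted and free of duplicates means the Git diff between two days shows only real additions and removals.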
I don't think your email hit a mailing list, feel free to send my reply
to one though, maybe guix-devel as this discussion probably deserves a
wide audience.
Thanks,
Chris
-------------------- End of forwarded message --------------------
[-- Attachment #2: weather-repro.scm --]
[-- Type: application/octet-stream, Size: 4804 bytes --]
(define-module (weather-repro)
  #:use-module (json)
  #:use-module ((guix i18n) #:select (G_))
  #:use-module ((guix diagnostics) #:select (leave))
  #:use-module ((guix describe) #:select (current-profile))
  #:use-module (guix channels)
  #:use-module (guix utils)             ;package-name->name+version, version-*
  #:use-module ((guix build download) #:select (url-fetch))
  ;#:use-module (srfi srfi-11)          ; let-values
  #:use-module (ice-9 match)
  #:use-module (srfi srfi-1)            ; fold
  #:use-module (srfi srfi-37))          ; parse command line (option)

(define %temporary-directory
  ;; Temporary directory.
  (or (getenv "TMPDIR") "/tmp"))

(define %prefix-url
  "https://data.guix-patches.cbaines.net/revision")

(define %suffix-url
  "output_consistency=not-matching&target=none&all_results=on")

(define %json-name "package-derivation-outputs.json")

(define* (json-url revision
                   #:optional
                   (system (%current-system))
                   (json %json-name))
  "Return the URL of the JSON document corresponding to REVISION and SYSTEM."
  (string-append
   %prefix-url "/" revision "/" json "?" %suffix-url "&system=" system))

(define* (json-file revision
                    #:optional
                    (system (%current-system))
                    (name %json-name)
                    (tmp %temporary-directory))
  "Return the path where the JSON is stored.

By default, TMP/HASH-NAME-SYSTEM, where HASH is the first six characters of
REVISION."
  (let ((hash (substring revision 0 6)))
    (string-append tmp "/" hash "-" name "-" system)))

(define (fetch-json revision system)
  "Fetch the JSON file from the Data Service corresponding to REVISION.

Store the result in %TEMPORARY-DIRECTORY."
  (let* ((out (json-file revision system))
         (url (json-url revision system)))
    (url-fetch url out)))

(define (json-file! revision system)
  "Return the JSON filename corresponding to REVISION.

If the JSON file does not exist locally, then fetch it."
  (let ((json (json-file revision system)))
    (unless (file-exists? json)
      (fetch-json revision system))
    json))

(define (read-paths revision system)
  "Return the list corresponding to REVISION of all the store items from JSON
not matching between the build farms."
  (map (lambda (elem) (assoc-ref elem "path"))
       (vector->list
        (assoc-ref
         (call-with-input-file (json-file! revision system) json->scm)
         "store_paths"))))

(define (path->package-version gnu-store-hash-package-version)
  "Return PACKAGE-VERSION from /gnu/store/hash-package-version."
  (let* ((split (string-split gnu-store-hash-package-version #\-))
         (package-version-list (cdr split))
         (package-version (string-join package-version-list "-")))
    package-version))

(define (list-packages revision system)
  (map path->package-version (read-paths revision system)))

(define (show-help)
  (display (G_ "Usage: guix repl -L /path/to/modules -- weather-repro.scm [OPTION]...
Tiny tool to find unreproducible packages."))
  (newline)
  (display (G_ "
  -h, --help             display this help and exit"))
  (display (G_ "
  -c, --commit           revision to check"))
  (display (G_ "
  -s, --system           system to check"))
  (newline))

(define %options
  (list (option '(#\h "help") #f #f
                (lambda args
                  (show-help)
                  (exit 0)))
        (option '(#\c "commit") #t #f
                (lambda (opt name arg result . rest)
                  (format #t (G_ "commit: ~A~%") arg)
                  (alist-cons 'commit arg result)))
        (option '(#\s "system") #t #f
                (lambda (opt name arg result . rest)
                  (format #t (G_ "system: ~A~%") arg)
                  (alist-cons 'system arg result)))))

;;;
;;; Entry point
;;;

(define (main . args)
  (define opts
    (args-fold args %options
               (lambda (opt name arg loads)
                 (leave (G_ "~A: unrecognized option~%") name))
               (lambda (op loads) (cons op loads))
               '()))

  (format #t "~a\n" opts)
  (let ((commit (or (assoc-ref opts 'commit)
                    ;; Not the commit hash as in Guix
                    "6cf35799dec60723f37d83a559429aa8b90482d5"))
        ;; (match (current-profile)
        ;;   (#f %guix-version)         ;for lack of a better ID
        ;;   (profile
        ;;    (let ((channel
        ;;           (find guix-channel? (profile-channels profile))))
        ;;      (channel-commit channel))))
        (system (or (assoc-ref opts 'system)
                    (%current-system))))
    ;; Only the side effect matters here, hence 'for-each' rather than 'map'.
    (for-each (lambda (name)
                (format #t "~a\n" name))
              (list-packages commit system)))
  (exit 0))

(apply main (cdr (command-line)))
* Re: Reproductibility, Data Services, guix weather
2020-10-12 21:40 Reproductibility, Data Services, guix weather zimoun
@ 2020-10-13 19:41 ` Christopher Baines
2020-10-14 20:17 ` zimoun
2020-10-16 10:17 ` Ludovic Courtès
1 sibling, 1 reply; 6+ messages in thread
From: Christopher Baines @ 2020-10-13 19:41 UTC (permalink / raw)
To: zimoun; +Cc: Guix Devel
zimoun <zimon.toutoune@gmail.com> writes:
> The issue is being able to find them. I proposed (below) running a cron
> task that does ’--check’ on the build farms and then reports failures by
> email. Chris pointed me to the work they are doing [3]; instead of a
> cron task, they propose parsing the JSON. That’s what the tiny
> attached script does.
>
> guix repl -L . -- weather-repro.scm
>
> For example, I run:
>
> guix repl -L . -- weather-repro.scm | sort | grep ghc
>
> to list (almost) all the unreproducible Haskell packages. What I would
> like is to be able to filter by build system for example.
>
>
> First, Chris, could you add fields for the package name and version?
> It is hard to reconstruct them automatically by parsing the output path.
Done in [1], and I've updated data.guix-patches.cbaines.net.
1: https://git.savannah.gnu.org/cgit/guix/data-service.git/commit/?id=f15dc5ab0b48f4228a3c545052a1e4daf3e80f15
> Second, the revision at <https://data.guix-patches.cbaines.net/revision>
> does not match the Guix commit. Is it possible to have a bridge? In other
> words, how is this revision hash computed?
>
> (A working revision is 6cf35799dec60723f37d83a559429aa8b90482d5, which
> does not seem to be found in the Guix repo.)
So, that particular commit is just some revision of Guix with some
patches applied. I picked it because it was the most recent
data. There's now recent commits for the master branch itself [2] like
[3].
2: https://data.guix-patches.cbaines.net/repository/2/branch/master
3: https://data.guix-patches.cbaines.net/revision/ec82d58526c27a9ca26f6c5e39cec90a48cbc1cc
* Re: Reproductibility, Data Services, guix weather
2020-10-13 19:41 ` Christopher Baines
@ 2020-10-14 20:17 ` zimoun
2020-10-15 8:45 ` Christopher Baines
0 siblings, 1 reply; 6+ messages in thread
From: zimoun @ 2020-10-14 20:17 UTC (permalink / raw)
To: Christopher Baines; +Cc: Guix Devel
Hi Chris,
On Tue, 13 Oct 2020 at 21:41, Christopher Baines <mail@cbaines.net> wrote:
> > First, Chris, could you add fields for the package name and version?
> > It is hard to reconstruct them automatically by parsing the output path.
>
> Done in [1], and I've updated data.guix-patches.cbaines.net.
Neat! I have updated my script. Now, I need to add some build-system
support to ease the triage. I will keep you posted.
Thank you.
> > (A working revision is 6cf35799dec60723f37d83a559429aa8b90482d5, which
> > does not seem to be found in the Guix repo.)
>
> So, that particular commit is just some revision of Guix with some
> patches applied. I picked it because it was the most recent
> data. There's now recent commits for the master branch itself [2] like
> [3].
Can I expect that all the revisions are there? Or only some?
Cheers,
simon
* Re: Reproductibility, Data Services, guix weather
2020-10-14 20:17 ` zimoun
@ 2020-10-15 8:45 ` Christopher Baines
2020-10-15 9:18 ` zimoun
0 siblings, 1 reply; 6+ messages in thread
From: Christopher Baines @ 2020-10-15 8:45 UTC (permalink / raw)
To: zimoun; +Cc: Guix Devel
zimoun <zimon.toutoune@gmail.com> writes:
> Hi Chris,
>
> On Tue, 13 Oct 2020 at 21:41, Christopher Baines <mail@cbaines.net> wrote:
>
>> > First, Chris, could you add fields for the package name and version?
>> > It is hard to reconstruct them automatically by parsing the output path.
>>
>> Done in [1], and I've updated data.guix-patches.cbaines.net.
>
> Neat! I have updated my script. Now, I need to add some build-system
> support to ease the triage. I will keep you posted.
>
> Thank you.
>
>> > (A working revision is 6cf35799dec60723f37d83a559429aa8b90482d5, which
>> > does not seem to be found in the Guix repo.)
>>
>> So, that particular commit is just some revision of Guix with some
>> patches applied. I picked it because it was the most recent
>> data. There's now recent commits for the master branch itself [2] like
>> [3].
>
> Can I expect that all the revisions are there? Or only some?
Well, definitely not all revisions, but for the patches instance of the
Guix Data Service I'm aiming to keep recent revisions.
* Re: Reproductibility, Data Services, guix weather
2020-10-15 8:45 ` Christopher Baines
@ 2020-10-15 9:18 ` zimoun
0 siblings, 0 replies; 6+ messages in thread
From: zimoun @ 2020-10-15 9:18 UTC (permalink / raw)
To: Christopher Baines; +Cc: Guix Devel
On Thu, 15 Oct 2020 at 09:45, Christopher Baines <mail@cbaines.net> wrote:
>> Can I expect that all the revisions are there? Or only some?
>
> Well, definitely not all revisions, but for the patches instance of the
> Guix Data Service I'm aiming to keep recent revisions.
Well, I am going to put that in my script and will report to you if I am
not able to reach the expected revisions.
--8<---------------cut here---------------start------------->8---
(match (current-profile)
  (#f %guix-version)                    ;for lack of a better ID
  (profile
   (let ((channel
          (find guix-channel? (profile-channels profile))))
     (channel-commit channel))))
--8<---------------cut here---------------end--------------->8---
Thanks,
simon
* Re: Reproductibility, Data Services, guix weather
2020-10-12 21:40 Reproductibility, Data Services, guix weather zimoun
2020-10-13 19:41 ` Christopher Baines
@ 2020-10-16 10:17 ` Ludovic Courtès
1 sibling, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2020-10-16 10:17 UTC (permalink / raw)
To: zimoun; +Cc: Guix Devel
Hi!
zimoun <zimon.toutoune@gmail.com> writes:
> (define %prefix-url
> "https://data.guix-patches.cbaines.net/revision")
>
> (define %suffix-url
> "output_consistency=not-matching&target=none&all_results=on")
>
> (define %json-name "package-derivation-outputs.json")
>
>
> (define* (json-url revision
> #:optional
> (system (%current-system))
> (json %json-name))
> "Return the URL corresponding to REVISION."
> (string-append
> %prefix-url "/" revision "/" json "?" %suffix-url"&system=" system))
>
> (define* (json-file revision
> #:optional
> (system (%current-system))
> (name %json-name)
> (tmp %temporary-directory))
> "Path where the JSON is stored.
>
> By default in %TEMPORARY-DIRECTORY/REV-%JSON-NAME."
> (let ((hash (substring revision 0 6)))
> (string-append tmp "/" hash "-" name "-" system)))
>
> (define (fetch-json revision system)
> "Fetch the JSON file from the Data Service corresponding to REVISION.
>
> Store the result in %TEMPORARY-DIRECTORY."
> (let* ((out (json-file revision system))
> (url (json-url revision system)))
> (url-fetch url out)))
I think it’s a good idea. My suggestion would be to do as for (guix
ci): make a (guix data-service-client) (?) module that contains proper
bindings to a subset of the Data Service APIs, using
‘define-json-mapping’.
Once we have that, we can consider using it in ‘guix weather’ or in new
tools such as the proposed ‘guix git log’.
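As a rough idea of what such bindings would provide, here is a minimal sketch in plain Guile that maps already-decoded JSON (alists with string keys and vectors, as used by the script's `read-paths`) onto a record type. It uses SRFI-9 by hand rather than ‘define-json-mapping’, which would generate similar boilerplate. The record `<output>`, its accessors, and the "package", "name" and "version" keys are assumptions for this illustration; only "store_paths" and "path" appear in the script above.

```scheme
(use-modules (srfi srfi-9))             ; define-record-type

;; Hypothetical record, for illustration only.
(define-record-type <output>
  (make-output path package-name package-version)
  output?
  (path            output-path)
  (package-name    output-package-name)
  (package-version output-package-version))

(define (alist->output alist)
  "Convert ALIST, one decoded entry of \"store_paths\", to an <output>.
The \"package\" sub-object and its keys are assumed; missing fields are #f."
  (let ((package (or (assoc-ref alist "package") '())))
    (make-output (assoc-ref alist "path")
                 (assoc-ref package "name")
                 (assoc-ref package "version"))))

(define (response->outputs response)
  "Return the list of <output> records in RESPONSE, the decoded JSON reply."
  (map alist->output
       (vector->list (assoc-ref response "store_paths"))))

(define (outputs-with-prefix outputs prefix)
  "Return the OUTPUTS whose package name starts with PREFIX, e.g. \"ghc-\"."
  (filter (lambda (output)
            (let ((name (output-package-name output)))
              (and name (string-prefix? prefix name))))
          outputs))
```

With typed records, a filter such as `(outputs-with-prefix outputs "ghc-")` replaces the `grep ghc` step, and callers no longer depend on the exact JSON layout.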
Ludo’.