unofficial mirror of guix-devel@gnu.org 
* Reproductibility, Data Services, guix weather
@ 2020-10-12 21:40 zimoun
  2020-10-13 19:41 ` Christopher Baines
  2020-10-16 10:17 ` Ludovic Courtès
  0 siblings, 2 replies; 6+ messages in thread
From: zimoun @ 2020-10-12 21:40 UTC
  To: Guix Devel, Christopher Baines


Dear,

Recently, we discovered a regression in the Haskell build system that
introduced unreproducible builds.  Well, it was a kind of luck: I was
testing ’git-annex’, hoping to get ’git-annex-assistant’, and so I was
building it several times (--check) [1].

Aside from this particular issue, ~10% of packages are not reproducible,
and I am not convinced that submitters/committers run “--check” for each
update or new package.  Otherwise the unreproducible Mesa case [2] would
have been raised before June. :-)  (That’s fine, we need packages after
all and we cannot fix the whole world at the same time. :-))

The issue is being able to find them.  I proposed (below) running a cron
task doing ’--check’ on the build farms and then reporting the failures
by email.  Chris pointed me to the work they are doing [3] and, instead
of a cron task, they propose parsing the JSON.  That’s what the attached
tiny script does.

   guix repl -L . -- weather-repro.scm

For example, I run:

   guix repl -L . -- weather-repro.scm | sort | grep ghc

to list (almost) all the unreproducible Haskell packages.  What I would
like is to be able to filter by build system for example.
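
The `sort | grep ghc` pipeline above could also be done inside the
script.  The following is only a sketch: `select-names` is a
hypothetical helper that is not part of the attached script, and the
package names in the usage line are made-up examples.  Note that it
still matches on names, so it is not a real build-system filter.

```scheme
;; Hypothetical helper, not part of weather-repro.scm.
(define (select-names names fragment)
  "Return the elements of NAMES that contain FRAGMENT, sorted."
  (sort (filter (lambda (name) (string-contains name fragment))
                names)
        string<?))

;; Usage, with made-up names:
;; (select-names '("ghc-aeson-1.4.7.1" "mesa-20.0.7" "ghc-8.6.5") "ghc")
```

A real filter by build system would need the package objects
themselves, which the output paths alone do not provide.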


First, Chris, could you add package name and version fields?  It is hard
to reconstruct them automatically by parsing the output path.
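
To illustrate why this reconstruction is fragile, here is a minimal
sketch of a heuristic split.  `store-path->name+version` is a
hypothetical name, and the rule "the version starts after the last
hyphen that is followed by a digit" is an assumption for illustration,
not how Guix or the Data Service does it.

```scheme
;; Hypothetical heuristic, assuming paths like /gnu/store/HASH-NAME-VERSION.
(define (store-path->name+version path)
  "Return two values, the name and the version guessed from PATH.
The version is #f when the text after the last hyphen does not start
with a digit."
  (let* ((base  (basename path))                 ;HASH-NAME-VERSION
         (dash  (string-index base #\-))         ;end of the hash
         (full  (if dash (substring base (+ dash 1)) base))
         (split (string-rindex full #\-)))       ;last hyphen
    (if (and split
             (< (+ split 1) (string-length full))
             (char-numeric? (string-ref full (+ split 1))))
        (values (substring full 0 split)
                (substring full (+ split 1)))
        (values full #f))))
```

An output whose file name ends with a suffix such as "-doc" defeats this
heuristic (the version comes back as #f), which is one reason why
explicit fields in the JSON would be more robust.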

Second, the revision used by <https://data.guix-patches.cbaines.net/revision>
does not match the Guix commit.  Is it possible to have a bridge?  In
other words, how is this revision hash computed?

(A working revision is 6cf35799dec60723f37d83a559429aa8b90482d5, which
does not seem to be found in the Guix repo.)


Third, this tiny script is better than nothing but *far, far* from
perfect.  The question about tooling is: does it make sense to include
something like that directly in “guix weather”?  For example,

  guix weather --reproducible

or maybe under “guix challenge”?


WDYT?  Feedback and ideas are very welcome. :-)


All the best,
simon

PS: Below are my question and Chris’s answer.  Both deserve to be
public, as Chris told me. :-)


1: <http://issues.guix.gnu.org/issue/43843>
2: <http://issues.guix.gnu.org/issue/42139>
3: <https://data.guix-patches.cbaines.net/revision/6cf35799dec60723f37d83a559429aa8b90482d5/package-derivation-outputs?search_query=&output_consistency=not-matching&system=x86_64-linux&target=none&after_path=&limit_results=10>


-------------------- Start of forwarded message --------------------
From: zimoun <zimon.toutoune@gmail.com>
Subject: [guix-sysadmin] whishlist: Hook on the build-farm?
Date: Sun, 11 Oct 2020 17:19:26 +0200

Hi,

Currently, it is hard to catch:

  1. which commit breaks which package
  2. if the package builds reproducibly

Even if the Data Service helps, *a lot!*, there are still a lot of
manual actions to spot one or the other.  And I fully agree that the
work initiated by Chris is The Right Thing©.

However, it is not ready and the manpower is limited.  For #1, Danny
has started a discussion.


For #2, I propose adding a cron task on one build farm.  To be
concrete, let’s *randomly* pick 100 packages once a week, rebuild them
with “--check”, and send the list of unreproducible packages by email.

I even propose: 1st week, 100 packages of build-system “foo”; 2nd week,
100 packages of build-system “bar”; 3rd week…

It is far from perfect but it seems a good heuristic to catch
regressions, spot packages with reproducibility troubles, etc.  Note
that this should not happen, since the committer should catch the
reproducibility issue; but as a matter of fact that is not the case.
In a way, I am proposing a workaround.
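
The weekly random pick could be sketched as follows.  This is only an
illustration: `random-sample` is a hypothetical helper, and how the
candidate list is obtained and how the builds are launched are left
out.

```scheme
(use-modules (srfi srfi-1))             ;take

;; Hypothetical helper: pick N elements at random, without replacement.
(define (random-sample lst n)
  "Return N elements of LST picked at random without replacement.
When LST has fewer than N elements, return all of them, shuffled."
  (let ((vec (list->vector lst)))
    ;; Fisher-Yates shuffle of the vector, in place.
    (let loop ((i (- (vector-length vec) 1)))
      (when (> i 0)
        (let* ((j   (random (+ i 1)))   ;0 <= j <= i
               (tmp (vector-ref vec i)))
          (vector-set! vec i (vector-ref vec j))
          (vector-set! vec j tmp)
          (loop (- i 1)))))
    (take (vector->list vec) (min n (vector-length vec)))))

;; Usage: (random-sample package-names 100)
```

Guile's default random state is the same at each start, so a real cron
job would probably want to seed it first, for instance with
`(set! *random-state* (random-state-from-platform))`.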


I volunteer to be the recipient of these automatic emails; then I can do
some triage (remove false positives, check what’s going on, etc.) and
open a bug report if there is an issue.

Currently, I do not have the CPU power to do so.  So I am asking if it
is possible to put something like that on one of the build machines.  I
would totally understand an answer like: « Simon, you are enthusiastic
and that’s nice, but no, and go to hell! » :-)

Cheers,
simon
-------------------- End of forwarded message --------------------

-------------------- Start of forwarded message --------------------
From: Christopher Baines <mail@cbaines.net>
Subject: Re: [guix-sysadmin] whishlist: Hook on the build-farm?
Date: Sun, 11 Oct 2020 17:38:03 +0100

> Currently, I do not have the CPU power to do so.  So I am asking if it
> is possible to put something like that on one of the build machines.  I
> would totally understand an answer like: « Simon, you are enthusiastic
> and that’s nice, but no, and go to hell! » :-)

I too would really like to be able to identify/prevent regressions,
including with respect to build reproducibility, and although the work
I'm doing on this is going slowly I'm hoping that with the Guix Build
Coordinator now I'll be able to get something sort of working.

I've just made a few tweaks to the Guix Data Service to make the data it
has on this a little easier to use.

This URL [1] should show you package reproducibility stats for each
architecture, computed from substitute data from ci.guix.gnu.org,
bayfront.guix.gnu.org, guix.tobias.gr and guix.cbaines.net. Currently
there seem to be 1515 outputs (so not exactly packages, but close) that
don't seem to have built reproducibly.

1: https://data.guix-patches.cbaines.net/repository/2/branch/master/latest-processed-revision/package-reproducibility
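
As a rough illustration of what "not matching" could mean when
comparing substitute data from several servers, here is a minimal
sketch.  The classification rule, the `output-consistency` name and the
data layout are assumptions made for this example; the Data Service may
compute it differently.

```scheme
(use-modules (srfi srfi-1))             ;delete-duplicates

;; Assumed rule: an output is compared only when at least two servers
;; provide a hash for it.
(define (output-consistency hashes)
  "HASHES is a list of (SERVER . HASH) pairs for one output.  Return
'unknown, 'matching or 'not-matching."
  (let ((distinct (delete-duplicates (map cdr hashes) string=?)))
    (cond ((< (length hashes) 2) 'unknown)
          ((= (length distinct) 1) 'matching)
          (else 'not-matching))))

;; Usage, with made-up hashes:
;; (output-consistency '(("ci.guix.gnu.org" . "aaa")
;;                       ("guix.cbaines.net" . "bbb")))
```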

Clicking through to the "Not matching" ones for x86_64-linux should give
you this URL [2]. In case it's useful to have this data in a more
machine readable form, I've added a JSON output option.

2: https://data.guix-patches.cbaines.net/repository/2/branch/master/latest-processed-revision/package-derivation-outputs?output_consistency=not-matching&system=x86_64-linux

You mention triage, and that's probably the biggest blocker to being
able to methodically try and reduce the "Not matching" numbers on
[1]. As far as I know, Debian has things like [3] and [4] to help with
that.

3: https://tests.reproducible-builds.org/debian/index_issues.html
4: https://salsa.debian.org/reproducible-builds/reproducible-notes/-/blob/master/issues.yml

Going back to the issue of a cron job to run guix build --check on some
random packages and send emails, if you're looking for a list of
packages (well actually outputs) which don't build reproducibly, then
[2] might do?

The JSON output doesn't contain the package names, but it probably could
with only a little effort. If it did, you could download the JSON file
for all the non-matching package outputs, and record the package names
in a sorted list in a Git repository. If you do that every day, then you
could read the git log to spot potential patterns/regressions.
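
The daily comparison described above can also be sketched without Git,
by diffing two snapshots of the list.  `snapshot-diff` is a hypothetical
helper, and the names in the usage line are made up.

```scheme
(use-modules (srfi srfi-1))             ;lset-difference

;; Hypothetical helper comparing yesterday's and today's lists.
(define (snapshot-diff old new)
  "Return two values: the names in NEW but not in OLD (newly not
matching) and the names in OLD but not in NEW (fixed, or gone)."
  (values (lset-difference string=? new old)
          (lset-difference string=? old new)))

;; Usage, with made-up names:
;; (snapshot-diff '("ghc-aeson" "mesa") '("mesa" "git-annex"))
```

The first value is the interesting one for spotting regressions; the
second one shows what got fixed or removed since the previous snapshot.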

I don't think your email hit a mailing list, feel free to send my reply
to one though, maybe guix-devel as this discussion probably deserves a
wide audience.

Thanks,

Chris
-------------------- End of forwarded message --------------------


[-- Attachment #2: weather-repro.scm --]
[-- Type: application/octet-stream, Size: 4804 bytes --]

(define-module (weather-repro)
  #:use-module (json)

  #:use-module ((guix i18n) #:select (G_))
  #:use-module ((guix diagnostics) #:select (leave))
  #:use-module ((guix describe) #:select (current-profile))
  #:use-module (guix channels)

  #:use-module (guix utils)             ;package-name->name+version, version-*
  #:use-module ((guix build download) #:select (url-fetch))

  ;#:use-module (srfi srfi-11)           ; let-values
  #:use-module (ice-9 match)
  #:use-module (srfi srfi-1)            ; fold
  #:use-module (srfi srfi-37)           ; parse command line (option)
  )


(define %temporary-directory
  ;; Temporary directory.
  (or (getenv "TMPDIR") "/tmp"))


(define %prefix-url
  "https://data.guix-patches.cbaines.net/revision")

(define %suffix-url
  "output_consistency=not-matching&target=none&all_results=on")

(define %json-name "package-derivation-outputs.json")

(define* (json-url revision
                   #:optional
                   (system (%current-system))
                   (json %json-name))
  "Return the Data Service URL of the JSON file for REVISION and SYSTEM."
  (string-append
   %prefix-url "/" revision "/" json "?" %suffix-url "&system=" system))

(define* (json-file revision
                    #:optional
                    (system (%current-system))
                    (name %json-name)
                    (tmp %temporary-directory))
  "Return the path where the JSON for REVISION and SYSTEM is stored.

By default this is TMP/HASH-NAME-SYSTEM, where HASH is the first six
characters of REVISION."
  (let ((hash (substring revision 0 6)))
    (string-append tmp "/" hash "-" name "-" system)))

(define (fetch-json revision system)
  "Fetch the JSON file from the Data Service corresponding to REVISION.

Store the result in %TEMPORARY-DIRECTORY."
  (let* ((out (json-file revision system))
         (url (json-url revision system)))
    (url-fetch url out)))

(define (json-file! revision system)
  "Return the JSON filename corresponding to REVISION.

If the JSON file does not exist locally, then fetch it."
  (let ((json (json-file revision system)))
    (unless (access? json F_OK)
      (fetch-json revision system))
    json))

(define (read-paths revision system)
  "Return the list corresponding to REVISION of all the store items from JSON
not matching between the build farms."
  (map (lambda (elem) (assoc-ref elem "path"))
       (vector->list
        (assoc-ref
         (call-with-input-file (json-file! revision system) json->scm)
         "store_paths"))))

(define (path->package-version gnu-store-hash-package-version)
  "Return PACKAGE-VERSION from /gnu/store/hash-package-version."
  (let* ((split (string-split gnu-store-hash-package-version #\-))
         (package-version-list (cdr split))
         (package-version (string-join package-version-list "-")))
    package-version))

(define (list-packages revision system)
  (map path->package-version (read-paths revision system)))


(define (show-help)
  (display (G_ "Usage: guix repl -L /path/to/modules -- weather-repro.scm [OPTION]...
Tiny tool to find unreproducible packages."))
  (newline)
  (display (G_ "
  -h, --help             display this help and exit"))
  (display (G_ "
  -c, --commit=REVISION  revision to check"))
  (display (G_ "
  -s, --system=SYSTEM    system to check"))
  (newline))

(define %options
  (list (option '(#\h "help") #f #f
                (lambda args
                  (show-help)
                  (exit 0)))
        (option '(#\c "commit") #t #f
                (lambda (opt name arg result . rest)
                  (format #t (G_ "commit: ~A~%") arg)
                  (alist-cons 'commit arg result)))
        (option '(#\s "system") #t #f
                (lambda (opt name arg result . rest)
                  (format #t (G_ "system: ~A~%") arg)
                  (alist-cons 'system arg result)))))

;;;
;;; Entry point
;;;

(define (main . args)
  (define opts
    (args-fold args %options
               (lambda (opt name arg loads)
                 (leave (G_ "~A: unrecognized option~%") name))
               (lambda (op loads) (cons op loads))
               '()))

  (format #t "~a\n" opts)

  (let ((commit (or (assoc-ref opts 'commit)
                    ;; Not the commit hash as in Guix
                    "6cf35799dec60723f37d83a559429aa8b90482d5"))
                    ;; (match (current-profile)
                    ;;   (#f %guix-version)   ;for lack of a better ID
                    ;;   (profile
                    ;;    (let ((channel
                    ;;           (find guix-channel? (profile-channels profile))))
                    ;;      (channel-commit channel))))
        (system (or (assoc-ref opts 'system)
                     (%current-system))))

    ;; Print one name per line; 'for-each' since only the side effect matters.
    (for-each (lambda (name)
                (format #t "~a\n" name))
              (list-packages commit system)))

  (exit 0))

(apply main (cdr (command-line)))


