* bug#65979: incorrect “guix hash” for FastQC
@ 2023-09-14 16:46 Simon Tournier
2023-09-26 13:00 ` Simon Tournier
2023-09-26 13:29 ` Ludovic Courtès
0 siblings, 2 replies; 8+ messages in thread
From: Simon Tournier @ 2023-09-14 16:46 UTC (permalink / raw)
To: 65979
Hi,
Currently Guix stores in its source the hash
00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk and this has not
changed since February (Guix revision
b6a4fbb488f1f539ae45ed7924c9af8905fa0d8b).
Well, because the origin contains a snippet, “guix build -S” does not
return the result of the fixed derivation. Instead, it is one or the other:
--8<---------------cut here---------------start------------->8---
$ guix gc -R $(guix build -S fastqc -d) | grep checkout.drv
/gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv
/gnu/store/7rdcyl7iq00xfdwh6azgmc3i7mr2769b-fastqc-0.11.9-checkout.drv
--8<---------------cut here---------------end--------------->8---
Let pick the correct one and I get:
--8<---------------cut here---------------start------------->8---
$ guix hash -rx $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
--8<---------------cut here---------------end--------------->8---
Oh, it is not the expected result! Instead, I have to omit the option
’-x’:
--8<---------------cut here---------------start------------->8---
$ guix hash -r $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk
--8<---------------cut here---------------end--------------->8---
Hum, I am missing something…
Let clone from upstream.
--8<---------------cut here---------------start------------->8---
$ git clone https://github.com/s-andrews/FastQC
$ git -C FastQC checkout v0.11.9
$ guix hash -rx FastQC
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
--8<---------------cut here---------------end--------------->8---
Ah… incorrect but consistent with previously. The question is why is
this following,
--8<---------------cut here---------------start------------->8---
$ mv FastQC/.git bye-bye
$ guix hash -r FastQC
00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk
--8<---------------cut here---------------end--------------->8---
is correct? And obviously without .git we get the same hash just as
above,
--8<---------------cut here---------------start------------->8---
$ guix hash -rx FastQC
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
--8<---------------cut here---------------end--------------->8---
Any idea?
Cheers,
simon
^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#65979: incorrect “guix hash” for FastQC
2023-09-14 16:46 bug#65979: incorrect “guix hash” for FastQC Simon Tournier
@ 2023-09-26 13:00 ` Simon Tournier
2023-09-26 13:34 ` Ludovic Courtès
2023-09-26 13:29 ` Ludovic Courtès
1 sibling, 1 reply; 8+ messages in thread
From: Simon Tournier @ 2023-09-26 13:00 UTC (permalink / raw)
To: 65979
Hi,
On Thu, 14 Sep 2023 at 18:46, Simon Tournier <zimon.toutoune@gmail.com> wrote:
> $ guix hash -rx $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
> 0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
>
> Oh, it is not the expected result! Instead, I have to omit the option
> ’-x’:
>
> $ guix hash -r $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
> 00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk
[...]
> $ git clone https://github.com/s-andrews/FastQC
> $ git -C FastQC checkout v0.11.9
> $ guix hash -rx FastQC
> 0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
[...]
> $ mv FastQC/.git bye-bye
> $ guix hash -r FastQC
> 00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk
[...]
> Any idea?
The issue is from .svn folders.
--8<---------------cut here---------------start------------->8---
$ find $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv) \
-type d -name ".svn" -print
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/1 Introduction/.svn
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/3 Analysis Modules/.svn
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/2 Basic Operations/.svn
--8<---------------cut here---------------end--------------->8---
The option ’-x’ excludes them; see vcs-file? from (guix hash) module,
and let confirm:
$ git clone https://github.com/s-andrews/FastQC
$ git -C FastQC checkout v0.11.9
$ guix hash -rx FastQC
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
$ find FastQC -type d -name ".svn" -exec rm -r "{}" \;
$ mv FastQC/.git bye-bye
$ guix hash -r FastQC
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
Therefore, we have a discrepancy between “guix hash” command-line and
checksum verification from ’source’ field.
The package definition reads:
(source
(origin
(method git-fetch)
(uri (git-reference
(url "https://github.com/s-andrews/FastQC")
(commit (string-append "v" version))))
(file-name (git-file-name name version))
(sha256
(base32
"00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk"))
when this hash contains .svn subfolders that are excluded by ’-x’
command-line option.
What do we fix? The procedure vcs-file? or fixed-output computation?
Cheers,
simon
^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#65979: incorrect “guix hash” for FastQC
2023-09-14 16:46 bug#65979: incorrect “guix hash” for FastQC Simon Tournier
2023-09-26 13:00 ` Simon Tournier
@ 2023-09-26 13:29 ` Ludovic Courtès
1 sibling, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2023-09-26 13:29 UTC (permalink / raw)
To: Simon Tournier; +Cc: 65979
Hi!
Simon Tournier <zimon.toutoune@gmail.com> skribis:
> Let pick the correct one and I get:
>
> $ guix hash -rx $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
> 0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
>
>
> Oh, it is not the expected result! Instead, I have to omit the option
> ’-x’:
>
> $ guix hash -r $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
> 00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk
This is because:
--8<---------------cut here---------------start------------->8---
$ find $( guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv) -name .svn
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/3 Analysis Modules/.svn
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/1 Introduction/.svn
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/2 Basic Operations/.svn
--8<---------------cut here---------------end--------------->8---
Not a bug, IMO (‘vcs-file?’ shouldn’t try to be smart in this case.)
Ludo’.
^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#65979: incorrect “guix hash” for FastQC
2023-09-26 13:00 ` Simon Tournier
@ 2023-09-26 13:34 ` Ludovic Courtès
2023-09-26 16:23 ` Simon Tournier
2023-10-17 17:56 ` Simon Tournier
0 siblings, 2 replies; 8+ messages in thread
From: Ludovic Courtès @ 2023-09-26 13:34 UTC (permalink / raw)
To: Simon Tournier; +Cc: 65979
(Oops, now I see you had already found the reason.)
Simon Tournier <zimon.toutoune@gmail.com> skribis:
> What do we fix? The procedure vcs-file? or fixed-output computation?
There are two things are:
1. ‘vcs-file?’, used by ‘guix hash -rx’;
2. ‘git-fetch’, which does (delete-file-recursively ".git").
Clearly #2 is correct (it’s perfectly fine to have a ‘.svn’ directory in
a Git repo), whereas #1 is an approximation that, in corner cases like
this one, gives the wrong answer.
My take is that it’s OK to keep ‘vcs-file?’ as is: the best we could do
would be to add complicated heuristics in the hope corner cases like
this one would be better dealt with, but it wouldn’t be bullet-proof
anyway.
WDYT?
Ludo’.
^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#65979: incorrect “guix hash” for FastQC
2023-09-26 13:34 ` Ludovic Courtès
@ 2023-09-26 16:23 ` Simon Tournier
2023-10-01 15:07 ` Ludovic Courtès
2023-10-17 17:56 ` Simon Tournier
1 sibling, 1 reply; 8+ messages in thread
From: Simon Tournier @ 2023-09-26 16:23 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: 65979
Hi,
On Tue, 26 Sept 2023 at 15:34, Ludovic Courtès <ludo@gnu.org> wrote:
> There are two things are:
>
> 1. ‘vcs-file?’, used by ‘guix hash -rx’;
>
> 2. ‘git-fetch’, which does (delete-file-recursively ".git").
Yes.
> Clearly #2 is correct (it’s perfectly fine to have a ‘.svn’ directory in
> a Git repo), whereas #1 is an approximation that, in corner cases like
> this one, gives the wrong answer.
These corner cases matter for some SWH loader implementing Nar hashes
in Python. Since they load sources.json (conversion of hash checksum
from package recipe), read the Nar hash and compare it with the one
they internal compute, then they need to implement in Python the
correct behavior.
See https://gitlab.softwareheritage.org/swh/devel/swh-loader-git/-/issues/4751#note_149180
and all the thread for context.
> My take is that it’s OK to keep ‘vcs-file?’ as is: the best we could do
> would be to add complicated heuristics in the hope corner cases like
> this one would be better dealt with, but it wouldn’t be bullet-proof
> anyway.
Well, the question is what other VCS as 'svn-fetch', etc. are doing?
Maybe, we can just have a special case for Git repository.
Somehow, since the problem needs to be solved for SWH, the same
solution applies for vcs-file? and "guix hash", no?
Cheers,
simon
^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#65979: incorrect “guix hash” for FastQC
2023-09-26 16:23 ` Simon Tournier
@ 2023-10-01 15:07 ` Ludovic Courtès
2023-10-01 15:24 ` Simon Tournier
0 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2023-10-01 15:07 UTC (permalink / raw)
To: Simon Tournier; +Cc: 65979
Hi,
Simon Tournier <zimon.toutoune@gmail.com> skribis:
[...]
>> Clearly #2 is correct (it’s perfectly fine to have a ‘.svn’ directory in
>> a Git repo), whereas #1 is an approximation that, in corner cases like
>> this one, gives the wrong answer.
>
> These corner cases matter for some SWH loader implementing Nar hashes
> in Python. Since they load sources.json (conversion of hash checksum
> from package recipe), read the Nar hash and compare it with the one
> they internal compute, then they need to implement in Python the
> correct behavior.
I agree, SWH needs something more robust than ‘vcs-file?’.
>> My take is that it’s OK to keep ‘vcs-file?’ as is: the best we could do
>> would be to add complicated heuristics in the hope corner cases like
>> this one would be better dealt with, but it wouldn’t be bullet-proof
>> anyway.
>
> Well, the question is what other VCS as 'svn-fetch', etc. are doing?
Remove the relevant metadata directory/directories.
> Maybe, we can just have a special case for Git repository.
>
> Somehow, since the problem needs to be solved for SWH, the same
> solution applies for vcs-file? and "guix hash", no?
Yes, ‘guix hash’ uses ‘vcs-file?’.
Now, SWH does not use ‘guix hash’ (I suppose) so they’ll have to address
that on their side (in Python code I guess).
Ludo’.
^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#65979: incorrect “guix hash” for FastQC
2023-10-01 15:07 ` Ludovic Courtès
@ 2023-10-01 15:24 ` Simon Tournier
0 siblings, 0 replies; 8+ messages in thread
From: Simon Tournier @ 2023-10-01 15:24 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: 65979
Hi,
On Sun, 1 Oct 2023 at 17:07, Ludovic Courtès <ludo@gnu.org> wrote:
> >> My take is that it’s OK to keep ‘vcs-file?’ as is: the best we could do
> >> would be to add complicated heuristics in the hope corner cases like
> >> this one would be better dealt with, but it wouldn’t be bullet-proof
> >> anyway.
> >
> > Well, the question is what other VCS as 'svn-fetch', etc. are doing?
>
> Remove the relevant metadata directory/directories.
What does it mean "relevant"? :-)
Where do I have to look to implement the exact same behaviour of what
do the fetch methods?
> Now, SWH does not use ‘guix hash’ (I suppose) so they’ll have to address
> that on their side (in Python code I guess).
Yes for sure. It does not change the fact that "guix hash" has a bug.
Cheers,
simon
^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#65979: incorrect “guix hash” for FastQC
2023-09-26 13:34 ` Ludovic Courtès
2023-09-26 16:23 ` Simon Tournier
@ 2023-10-17 17:56 ` Simon Tournier
1 sibling, 0 replies; 8+ messages in thread
From: Simon Tournier @ 2023-10-17 17:56 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: 65979
[-- Attachment #1: Type: text/plain, Size: 1409 bytes --]
Hi Ludo,
On Tue, 26 Sep 2023 at 15:34, Ludovic Courtès <ludo@gnu.org> wrote:
> My take is that it’s OK to keep ‘vcs-file?’ as is: the best we could do
> would be to add complicated heuristics in the hope corner cases like
> this one would be better dealt with, but it wouldn’t be bullet-proof
> anyway.
Hum, I am probably pig-headed here. :-)
I propose the change below. Well, it does not appear to me a
complicated heuristics but I am biased. The idea is to detect at the
repository root what is the kind of VCS, just by checking if .git (or
.hg or .svn or etc.) exists or not. Then, only exclude this kind… and
not the whole list.
And I think this way is almost bullet-proof, no? Or could you provide
some contrived examples?
WDYT?
Obviously, this fixes the issue:
--8<---------------cut here---------------start------------->8---
$ guix hash -rx /tmp/FastQC
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
$ ./pre-inst-env guix hash -rx /tmp/FastQC
00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk
$ grep -A 15 'define-public fastqc' gnu/packages/bioinformatics.scm | grep -C 1 base32
(sha256
(base32
"00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk"))
--8<---------------cut here---------------end--------------->8---
And I do not observe a regression using a couple of tests.
Cheers,
simon
[-- Attachment #2: p.patch --]
[-- Type: text/x-diff, Size: 4686 bytes --]
From 1bf6555e6a89248f079437b842dbaff9c107de9a Mon Sep 17 00:00:00 2001
Message-Id: <1bf6555e6a89248f079437b842dbaff9c107de9a.1697562942.git.zimon.toutoune@gmail.com>
From: Simon Tournier <zimon.toutoune@gmail.com>
Date: Tue, 17 Oct 2023 18:53:04 +0200
Subject: [PATCH] scripts: hash: Handle repository several VCS folders.
Fixes <https://issues.guix.gnu.org/issue/65979>.
Reported by Simon Tournier <zimon.toutoune@gmail.com>
* guix/hash.scm (vcs-file?): Add optional argument for passing VCS kind of the
file/repository.
(file-hash*): Adjust accordingly.
* guix/scripts/hash.scm (guix-hash)[file-hash]: Detect VCS kind of the
file/repository and passes it.
---
guix/hash.scm | 18 ++++++++++++------
guix/scripts/hash.scm | 18 +++++++++++++-----
2 files changed, 25 insertions(+), 11 deletions(-)
diff --git a/guix/hash.scm b/guix/hash.scm
index 3cb68e5c44..8fff51e8f1 100644
--- a/guix/hash.scm
+++ b/guix/hash.scm
@@ -1,6 +1,7 @@
;;; GNU Guix --- Functional package management for GNU
;;; Copyright © 2021 Sarah Morgensen <iskarian@mgsn.dev>
;;; Copyright © 2022 Maxime Devos <maximedevos@telenet.be>
+;;; Copyright © 2023 Simon Tournier <zimon.toutoune@gmail.com>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -25,21 +26,26 @@ (define-module (guix hash)
#:export (vcs-file?
file-hash*))
-(define (vcs-file? file stat)
- "Returns true if FILE is a version control system file."
+(define* (vcs-file? file stat
+ #:optional
+ (vcses (list ".bzr" ".git" ".hg" ".svn" "CVS")))
+ "Returns true if FILE matches a version control system from the list VCSES."
(case (stat:type stat)
((directory)
- (member (basename file) '(".bzr" ".git" ".hg" ".svn" "CVS")))
+ (member (basename file) vcses))
((regular)
- ;; Git sub-modules have a '.git' file that is a regular text file.
- (string=? (basename file) ".git"))
+ (if (member ".git" vcses)
+ ;; Git sub-modules have a '.git' file that is a regular text file.
+ (string=? (basename file) ".git")
+ #f))
(else
#f)))
(define* (file-hash* file #:key
(algorithm (hash-algorithm sha256))
(recursive? 'auto)
- (select? (negate vcs-file?)))
+ (select? (negate (lambda (file stat)
+ (vcs-file? file stat)))))
"Compute the hash of FILE with ALGORITHM.
Symbolic links are only dereferenced if RECURSIVE? is false.
diff --git a/guix/scripts/hash.scm b/guix/scripts/hash.scm
index 7197d3965c..ed96e6a7e1 100644
--- a/guix/scripts/hash.scm
+++ b/guix/scripts/hash.scm
@@ -3,7 +3,7 @@
;;; Copyright © 2013 Nikita Karetnikov <nikita@karetnikov.org>
;;; Copyright © 2016 Jan Nieuwenhuizen <janneke@gnu.org>
;;; Copyright © 2018 Tim Gesthuizen <tim.gesthuizen@yahoo.de>
-;;; Copyright © 2021 Simon Tournier <zimon.toutoune@gmail.com>
+;;; Copyright © 2021, 2023 Simon Tournier <zimon.toutoune@gmail.com>
;;; Copyright © 2021 Sarah Morgensen <iskarian@mgsn.dev>
;;;
;;; This file is part of GNU Guix.
@@ -181,9 +181,6 @@ (define-command (guix-hash . args)
(_ #f))
(reverse opts)))
(fmt (assq-ref opts 'format))
- (select? (if (assq-ref opts 'exclude-vcs?)
- (negate vcs-file?)
- (const #t)))
(algorithm (assoc-ref opts 'hash-algorithm))
(serializer (assoc-ref opts 'serializer)))
@@ -193,7 +190,18 @@ (define-command (guix-hash . args)
(catch 'system-error
(lambda _
(with-error-handling
- (serializer file algorithm select?)))
+ (let* ((vcses (fold (lambda (vcs result)
+ (if (file-exists? (string-append file "/" vcs))
+ (cons vcs result)
+ result))
+ '()
+ (list ".bzr" ".git" ".hg" ".svn" "CVS")))
+ (select? (if (assq-ref opts 'exclude-vcs?)
+ (negate (lambda (file stat)
+ (vcs-file? file stat
+ vcses)))
+ (const #t))))
+ (serializer file algorithm select?))))
(lambda args
(leave (G_ "~a ~a~%")
file
base-commit: dcc5c34504c94732c135a85fb4db40ca9796270e
--
2.38.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-10-17 17:57 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-14 16:46 bug#65979: incorrect “guix hash” for FastQC Simon Tournier
2023-09-26 13:00 ` Simon Tournier
2023-09-26 13:34 ` Ludovic Courtès
2023-09-26 16:23 ` Simon Tournier
2023-10-01 15:07 ` Ludovic Courtès
2023-10-01 15:24 ` Simon Tournier
2023-10-17 17:56 ` Simon Tournier
2023-09-26 13:29 ` Ludovic Courtès
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/guix.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.