* [PATCH] Add seqtk.
From: Ben Woodcroft @ 2015-06-24 5:07 UTC
To: guix-devel@gnu.org
[-- Attachment #1: Type: text/plain, Size: 75 bytes --]
I feel somewhat honoured to even be mentioned in the same thread as kseq.h
[-- Attachment #2: 0001-gnu-Add-seqtk.patch --]
[-- Type: text/x-patch, Size: 2380 bytes --]
From 48d3adae4bcada110df3fb7d8c5ddc55ad2000ff Mon Sep 17 00:00:00 2001
From: Ben Woodcroft <donttrustben@gmail.com>
Date: Wed, 24 Jun 2015 15:04:48 +1000
Subject: [PATCH] gnu: Add seqtk.
* gnu/packages/bioinformatics.scm (seqtk): New variable.
---
gnu/packages/bioinformatics.scm | 45 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index 8dfaff3..e4575ae 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -1573,6 +1573,51 @@ any particular back-end implementation, and supports use of multiple back-ends
 simultaneously.")
     (license license:public-domain)))

+(define-public seqtk
+  (let ((commit "4feb6e814"))
+    (package
+      (name "seqtk")
+      ;; version number from running 'seqtk' after installation
+      (version (string-append "1.0-r82." commit))
+      (source (origin
+                (method git-fetch)
+                (uri (git-reference
+                      (url "https://github.com/lh3/seqtk.git")
+                      (commit commit)))
+                (sha256
+                 (base32
+                  "0wdkz8chkinfm23cg95nrn797lv12n2wxglwb3s2kvf0iv3rrx01"))))
+      (build-system gnu-build-system)
+      (arguments
+       `(#:tests? #f
+         #:phases
+         (modify-phases %standard-phases
+           (delete 'configure)
+           (replace 'build
+             (lambda* _
+               (zero? (system* "make"))))
+           (replace 'install
+             (lambda* (#:key outputs #:allow-other-keys)
+               (let ((bin (string-append
+                           (assoc-ref outputs "out")
+                           "/bin/")))
+                 (mkdir-p bin)
+                 (copy-file "seqtk" (string-append
+                                     bin "seqtk"))
+                 (copy-file "trimadap" (string-append
+                                        bin "trimadap"))))))))
+      (native-inputs
+       `(("zlib" ,zlib)))
+      (home-page "https://github.com/lh3/seqtk")
+      (synopsis "Toolkit for processing sequences in FASTA/Q formats")
+      (description
+       "Seqtk is a fast and lightweight tool for processing sequences in
+the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ
+files which can also be optionally compressed by gzip.")
+      (license (license:non-copyleft
+                "file://src/LICENSE"
+                "See src/LICENSE in the distribution.")))))
+
 (define-public ngs-java
   (package (inherit ngs-sdk)
     (name "ngs-java")
--
2.1.4
* Re: [PATCH] Add seqtk.
From: Mark H Weaver @ 2015-06-24 12:09 UTC
To: Ben Woodcroft; +Cc: guix-devel
Ben Woodcroft <b.woodcroft@uq.edu.au> writes:
> I feel somewhat honoured to even be mentioned in the same thread as kseq.h
:-)
> From 48d3adae4bcada110df3fb7d8c5ddc55ad2000ff Mon Sep 17 00:00:00 2001
> From: Ben Woodcroft <donttrustben@gmail.com>
> Date: Wed, 24 Jun 2015 15:04:48 +1000
> Subject: [PATCH] gnu: Add seqtk.
>
> * gnu/packages/bioinformatics.scm (seqtk): New variable.
> ---
> gnu/packages/bioinformatics.scm | 45 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 45 insertions(+)
>
> diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
> index 8dfaff3..e4575ae 100644
> --- a/gnu/packages/bioinformatics.scm
> +++ b/gnu/packages/bioinformatics.scm
> @@ -1573,6 +1573,51 @@ any particular back-end implementation, and supports use of multiple back-ends
> simultaneously.")
> (license license:public-domain)))
>
> +(define-public seqtk
> + (let ((commit "4feb6e814"))
> + (package
> + (name "seqtk")
> + ;; version number from running 'seqtk' after installation
> + (version (string-append "1.0-r82." commit))
> + (source (origin
> + (method git-fetch)
> + (uri (git-reference
> + (url "https://github.com/lh3/seqtk.git")
> + (commit commit)))
> + (sha256
> + (base32
> + "0wdkz8chkinfm23cg95nrn797lv12n2wxglwb3s2kvf0iv3rrx01"))))
> + (build-system gnu-build-system)
> + (arguments
> + `(#:tests? #f
> + #:phases
The misalignment of the code above in this email was caused by your use
of tabs. Please do not use tabs anywhere in *.scm files in Guix. (This
is also an issue in your yaggo patch.)
Please add a brief comment explaining why tests are disabled, in this
case: "#:tests? #f ;no test suite".
> + (modify-phases %standard-phases
> + (delete 'configure)
> + (replace 'build
> + (lambda* _
> + (zero? (system* "make"))))
Why replace the default 'build' phase of the gnu-build-system here? The
only difference is that the default build phase adds -j<N> to enable a
parallel build by default.
> + (replace 'install
> + (lambda* (#:key outputs #:allow-other-keys)
> + (let ((bin (string-append
> + (assoc-ref outputs "out")
> + "/bin/")))
> + (mkdir-p bin)
> + (copy-file "seqtk" (string-append
> + bin "seqtk"))
> + (copy-file "trimadap" (string-append
> + bin "trimadap"))))))))
Phase procedures should return a boolean indicating whether the phase
succeeded, but the return value of 'copy-file' is not specified. Please
add #t after the last call to 'copy-file'.
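Taken together, the three comments so far (a note on the disabled tests, the default 'build' phase, and an explicit #t) might yield an 'arguments' field along the lines of the sketch below. It is an untested fragment meant to sit inside the package definition, with everything else in the patch left unchanged.

```scheme
;; Sketch only, untested: the 'arguments' field with the review
;; comments above applied.
(arguments
 `(#:tests? #f                          ;no test suite
   #:phases
   (modify-phases %standard-phases
     (delete 'configure)
     ;; 'build' is no longer replaced: the default phase runs "make"
     ;; and adds -j<N> for a parallel build.
     (replace 'install
       (lambda* (#:key outputs #:allow-other-keys)
         (let ((bin (string-append (assoc-ref outputs "out")
                                   "/bin/")))
           (mkdir-p bin)
           (copy-file "seqtk" (string-append bin "seqtk"))
           (copy-file "trimadap" (string-append bin "trimadap"))
           ;; The return value of 'copy-file' is unspecified, so
           ;; return #t explicitly to signal success.
           #t))))))
```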
> + (native-inputs
> + `(("zlib" ,zlib)))
zlib needs to be a normal input here, so please replace "native-inputs"
with "inputs".
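For context: in Guix, 'native-inputs' are build-time tools that run on the build machine, whereas 'inputs' are built for the target system. Since seqtk links against zlib, the field would presumably read as in this small fragment (a sketch, using the same label style as the patch):

```scheme
;; Sketch only: zlib is linked into the resulting binaries, so it is
;; listed as a regular input rather than a native input.
(inputs
 `(("zlib" ,zlib)))
```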
> + (home-page "https://github.com/lh3/seqtk")
> + (synopsis "Toolkit for processing sequences in FASTA/Q formats")
> + (description
> + "Seqtk is a fast and lightweight tool for processing sequences in
> +the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ
> +files which can also be optionally compressed by gzip.")
> + (license (license:non-copyleft
> + "file://src/LICENSE"
> + "See src/LICENSE in the distribution.")))))
It's the expat license, so just write "(license license:expat)". The
author calls it "The MIT License", but that term is misleading since MIT
has used many licenses for software, e.g. the X11 license.
However, there's a bigger problem here. Some of the code is not clearly
licensed or has invalid copyright notices:
* The copyright notice on kstring.h lacks copyright dates.
* The copyright dates in seqtk.c and LICENSE are incorrect
("20082-2012").
* ksw.h and trimadap.c lack any copyright notice at all. The mere
presence of a LICENSE file is not enough, especially given the lack of
any statement about the code license in the README.
Would you be willing to ask the author to fix these issues?
Mark
* Re: [PATCH] Add seqtk.
From: Ben Woodcroft @ 2015-07-18 7:51 UTC
To: Mark H Weaver; +Cc: guix-devel
On 24/06/15 22:09, Mark H Weaver wrote:
> However, there's a bigger problem here. Some of the code is not clearly
> licensed or has invalid copyright notices:
>
> * The copyright notice on kstring.h lacks copyright dates.
>
> * The copyright dates in seqtk.c and LICENSE are incorrect
> ("20082-2012").
>
> * ksw.h and trimadap.c lack any copyright notice at all. The mere
> presence of a LICENSE file is not enough, especially given the lack of
> any statement about the code license in the README.
>
> Would you be willing to ask the author to fix these issues?
I did this almost a month ago
https://github.com/lh3/seqtk/pull/60
But no response as yet. I think the common sense here is that it is
expat licensed, WDYT? It would be a shame to exclude this software as it
is quite well used in bioinformatics generally, but I'm happy to defer
to your decision.
* Re: [PATCH] Add seqtk.
From: John Darrington @ 2015-07-18 9:07 UTC
To: Ben Woodcroft; +Cc: guix-devel
[-- Attachment #1: Type: text/plain, Size: 1883 bytes --]
On Sat, Jul 18, 2015 at 05:51:01PM +1000, Ben Woodcroft wrote:
> On 24/06/15 22:09, Mark H Weaver wrote:
> > However, there's a bigger problem here. Some of the code is not clearly
> > licensed or has invalid copyright notices:
> >
> > * The copyright notice on kstring.h lacks copyright dates.
> >
> > * The copyright dates in seqtk.c and LICENSE are incorrect
> >   ("20082-2012").
> >
> > * ksw.h and trimadap.c lack any copyright notice at all. The mere
> >   presence of a LICENSE file is not enough, especially given the lack of
> >   any statement about the code license in the README.
> >
> > Would you be willing to ask the author to fix these issues?
>
> I did this almost a month ago
> https://github.com/lh3/seqtk/pull/60
>
> But no response as yet. I think the common sense here is that it
> is expat licensed, WDYT? It would be a shame to exclude this
> software as it is quite well used in bioinformatics generally,
> but I'm happy to defer to your decision.
When I look at that URL, I don't see that you have asked anybody anything.
The message isn't addressed to anyone in particular; the salutation is
merely "Hi". Also, it poses no question. It just makes a (very weak) statement,
including the words "seem to be" and "unsure". I'm not surprised nobody has
replied. Probably nobody thinks it affects them.
I suggest that you post a follow-up message stating explicitly:
* To whom it is addressed.
* What is missing, and how to correct it.
* Why this is a problem.
* That you would be willing to help in the effort to fix it (assuming that
you are, of course).
J'
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
* [PATCH] gnu: Add seqtk.
From: Ben Woodcroft @ 2016-09-09 11:08 UTC
To: guix-devel
From: Ben J Woodcroft <donttrustben@gmail.com>
Well, despite the lightness of my touch, it seems the licensing is now in
order. I've updated the package; here's an updated patch. Better?
Thanks,
ben
* gnu/packages/bioinformatics.scm (seqtk): New variable.
---
gnu/packages/bioinformatics.scm | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index f34acd1..4e296f5 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -4529,6 +4529,41 @@ BioPython in a convenient way. Instead of having a big mess of scripts, there
 is one that takes arguments.")
     (license license:gpl3)))

+(define-public seqtk
+  (package
+    (name "seqtk")
+    (version "1.2")
+    (source (origin
+              (method url-fetch)
+              (uri (string-append
+                    "https://github.com/lh3/seqtk/archive/v"
+                    version ".tar.gz"))
+              (file-name (string-append name "-" version ".tar.gz"))
+              (sha256
+               (base32
+                "0ywdyzpmfiz2wp6ampbzqg4y8bj450nfgqarpamg045b8mk32lxx"))))
+    (build-system gnu-build-system)
+    (arguments
+     `(#:phases
+       (modify-phases %standard-phases
+         (delete 'configure)
+         (replace 'check
+           ;; There are no tests, so we just run a sanity check.
+           (lambda _ (zero? (system* "./seqtk" "seq"))))
+         (replace 'install
+           (lambda* (#:key outputs #:allow-other-keys)
+             (let ((bin (string-append (assoc-ref outputs "out") "/bin/")))
+               (install-file "seqtk" bin)))))))
+    (inputs
+     `(("zlib" ,zlib)))
+    (home-page "https://github.com/lh3/seqtk")
+    (synopsis "Toolkit for biological sequences in FASTA/Q formats")
+    (description
+     "Seqtk is a fast and lightweight tool for processing sequences in the
+FASTA or FASTQ format. It parses both FASTA and FASTQ files which can be
+optionally compressed by gzip.")
+    (license license:expat)))
+
 (define-public snap-aligner
   (package
     (name "snap-aligner")
--
2.9.1
* Re: [PATCH] gnu: Add seqtk.
From: Marius Bakke @ 2016-09-09 12:37 UTC
To: Ben Woodcroft, guix-devel
Ben Woodcroft <donttrustben@gmail.com> writes:
> Well, despite the lightness of my touch, it seems the licensing is now in
> order. I've updated the package; here's an updated patch. Better?
I don't think this was intended to be a commit message? :)
The program seems to bundle {khash,kseq}.h from htslib. Could you try
replacing them with the files directly from htslib? There are quite a
few examples of doing this already in bioinformatics.scm.
I also think the original description from github is better:
"Toolkit for processing sequences in FASTA/Q formats".
Other than that LGTM.
Thanks!
Marius
* Re: [PATCH] gnu: Add seqtk.
From: Marius Bakke @ 2016-09-09 12:54 UTC
To: Ben Woodcroft, guix-devel
Marius Bakke <mbakke@fastmail.com> writes:
> The program seems to bundle {khash,kseq}.h from htslib. Could you try
> replacing them with the files directly from htslib? There are quite a
> few examples of doing this already in bioinformatics.scm.
The released version bundles a few unnecessary header files as well,
that are removed in git. I think you can remove all ".h" files in an
origin snippet and substitute references to khash.h and kseq.h before
building.
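Roughly, and untested, that could look like the two fragments below. They assume that htslib is added to 'inputs' and that its headers can be included as <htslib/khash.h> and <htslib/kseq.h>; the phase name 'use-htslib-headers' is made up for illustration, and the sources may need further adjustment if they include any of the other removed headers.

```scheme
;; Sketch only, untested.  Assumes htslib is among the inputs and
;; installs khash.h and kseq.h under include/htslib.

;; Fragment 1, inside the 'origin' form: delete every bundled header.
(modules '((guix build utils)))
(snippet
 '(begin
    (for-each delete-file (find-files "." "\\.h$"))
    #t))

;; Fragment 2, among the phases: point the C sources at htslib's copies.
;; 'use-htslib-headers' is a hypothetical phase name.
(add-after 'unpack 'use-htslib-headers
  (lambda _
    (substitute* (find-files "." "\\.c$")
      (("#include \"(khash|kseq)\\.h\"" _ header)
       (string-append "#include <htslib/" header ".h>")))
    #t))
```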
Cheers,
Marius
* Re: [PATCH] gnu: Add seqtk.
From: Ben Woodcroft @ 2016-09-10 4:03 UTC
To: Marius Bakke, Ben Woodcroft, guix-devel
On 09/09/16 22:37, Marius Bakke wrote:
> Ben Woodcroft <donttrustben@gmail.com> writes:
>
>> Well, despite the lightness of my touch, it seems the licensing is now in
>> order. I've updated the package; here's an updated patch. Better?
> I don't think this was intended to be a commit message? :)
No indeed, I was responding to a thread so old I suspect it was before
your time.
> The program seems to bundle {khash,kseq}.h from htslib. Could you try
> replacing them with the files directly from htslib? There are quite a
> few examples of doing this already in bioinformatics.scm.
I see your point, though I'm not sure that htslib is really the home of
those files, and anyway our htslib doesn't provide them as an output
since they are not a shared library (I believe).
I've always been a bit fuzzy on what the official policy is, to what
extent we should remove bundled code, so I'm happy to be corrected. In
this case since there is clear precedent I don't think we should bother
removing the bundled files.
> I also think the original description from github is better:
> "Toolkit for processing sequences in FASTA/Q formats".
How about "Toolkit for processing biological sequences in FASTA/Q
format"? I wanted to make it understandable in a more general context.
I'll push in the next day or two unless there are further comments.
Thanks for the review.
ben