unofficial mirror of guix-devel@gnu.org 
* [PATCH] Add seqtk.
@ 2015-06-24  5:07 Ben Woodcroft
  2015-06-24 12:09 ` Mark H Weaver
  0 siblings, 1 reply; 8+ messages in thread
From: Ben Woodcroft @ 2015-06-24  5:07 UTC (permalink / raw)
  To: guix-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 75 bytes --]

I feel somewhat honoured to even be mentioned in the same thread as kseq.h

[-- Attachment #2: 0001-gnu-Add-seqtk.patch --]
[-- Type: text/x-patch, Size: 2380 bytes --]

From 48d3adae4bcada110df3fb7d8c5ddc55ad2000ff Mon Sep 17 00:00:00 2001
From: Ben Woodcroft <donttrustben@gmail.com>
Date: Wed, 24 Jun 2015 15:04:48 +1000
Subject: [PATCH] gnu: Add seqtk.

* gnu/packages/bioinformatics.scm (seqtk): New variable.
---
 gnu/packages/bioinformatics.scm | 45 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index 8dfaff3..e4575ae 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -1573,6 +1573,51 @@ any particular back-end implementation, and supports use of multiple back-ends
 simultaneously.")
     (license license:public-domain)))
 
+(define-public seqtk
+  (let ((commit "4feb6e814"))
+    (package
+      (name "seqtk")
+      ;; version number from running 'seqtk' after installation
+      (version (string-append "1.0-r82." commit))
+      (source (origin
+                (method git-fetch)
+                (uri (git-reference
+                      (url "https://github.com/lh3/seqtk.git")
+                      (commit commit)))
+                (sha256
+                 (base32
+                  "0wdkz8chkinfm23cg95nrn797lv12n2wxglwb3s2kvf0iv3rrx01"))))
+      (build-system gnu-build-system)
+      (arguments
+       `(#:tests? #f
+	 #:phases
+	 (modify-phases %standard-phases
+	   (delete 'configure)
+	   (replace 'build
+		    (lambda* _
+		      (zero? (system* "make"))))
+	   (replace 'install
+		    (lambda* (#:key outputs #:allow-other-keys)
+		      (let ((bin (string-append
+				  (assoc-ref outputs "out")
+				  "/bin/")))
+			(mkdir-p bin)
+			(copy-file "seqtk" (string-append
+					    bin "seqtk"))
+			(copy-file "trimadap" (string-append
+					    bin "trimadap"))))))))
+      (native-inputs
+       `(("zlib" ,zlib)))
+      (home-page "https://github.com/lh3/seqtk")
+      (synopsis "Toolkit for processing sequences in FASTA/Q formats")
+      (description
+       "Seqtk is a fast and lightweight tool for processing sequences in
+the FASTA or FASTQ format.  It seamlessly parses both FASTA and FASTQ
+files which can also be optionally compressed by gzip.")
+      (license (license:non-copyleft
+		"file://src/LICENSE"
+		"See src/LICENSE in the distribution.")))))
+
 (define-public ngs-java
   (package (inherit ngs-sdk)
     (name "ngs-java")
-- 
2.1.4



* Re: [PATCH] Add seqtk.
  2015-06-24  5:07 [PATCH] Add seqtk Ben Woodcroft
@ 2015-06-24 12:09 ` Mark H Weaver
  2015-07-18  7:51   ` Ben Woodcroft
  0 siblings, 1 reply; 8+ messages in thread
From: Mark H Weaver @ 2015-06-24 12:09 UTC (permalink / raw)
  To: Ben Woodcroft; +Cc: guix-devel

Ben Woodcroft <b.woodcroft@uq.edu.au> writes:
> I feel somewhat honoured to even be mentioned in the same thread as kseq.h

:-)

> From 48d3adae4bcada110df3fb7d8c5ddc55ad2000ff Mon Sep 17 00:00:00 2001
> From: Ben Woodcroft <donttrustben@gmail.com>
> Date: Wed, 24 Jun 2015 15:04:48 +1000
> Subject: [PATCH] gnu: Add seqtk.
>
> * gnu/packages/bioinformatics.scm (seqtk): New variable.
> ---
>  gnu/packages/bioinformatics.scm | 45 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>
> diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
> index 8dfaff3..e4575ae 100644
> --- a/gnu/packages/bioinformatics.scm
> +++ b/gnu/packages/bioinformatics.scm
> @@ -1573,6 +1573,51 @@ any particular back-end implementation, and supports use of multiple back-ends
>  simultaneously.")
>      (license license:public-domain)))
>  
> +(define-public seqtk
> +  (let ((commit "4feb6e814"))
> +    (package
> +      (name "seqtk")
> +      ;; version number from running 'seqtk' after installation
> +      (version (string-append "1.0-r82." commit))
> +      (source (origin
> +                (method git-fetch)
> +                (uri (git-reference
> +                      (url "https://github.com/lh3/seqtk.git")
> +                      (commit commit)))
> +                (sha256
> +                 (base32
> +                  "0wdkz8chkinfm23cg95nrn797lv12n2wxglwb3s2kvf0iv3rrx01"))))
> +      (build-system gnu-build-system)
> +      (arguments
> +       `(#:tests? #f
> +	 #:phases

The misalignment of the code above in this email was caused by your use
of tabs.  Please do not use tabs anywhere in *.scm files in Guix.  (This
is also an issue in your yaggo patch.)

Please add a brief comment explaining why tests are disabled, in this
case: "#:tests? #f  ;no test suite".

> +	 (modify-phases %standard-phases
> +	   (delete 'configure)
> +	   (replace 'build
> +		    (lambda* _
> +		      (zero? (system* "make"))))

Why replace the default 'build' phase of the gnu-build-system here?  The
only difference is that the default build phase adds -j<N> to enable a
parallel build by default.
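
For illustration, a minimal sketch of what the arguments could look like with
the custom 'build' phase dropped, so that the default phase of the
gnu-build-system runs "make" with -j<N>.  This is a fragment of a package
definition, not a standalone program, and it assumes that the seqtk Makefile
builds correctly in parallel, which has not been verified here.

```scheme
;; Fragment meant to sit inside the (package ...) form; spaces only, no tabs.
(arguments
 `(#:tests? #f                          ;no test suite
   #:phases
   (modify-phases %standard-phases
     ;; There is no configure script, only a plain Makefile.
     (delete 'configure)
     ;; No (replace 'build ...) here: the default 'build phase already
     ;; invokes "make", adding -j<N> for a parallel build.
     ;; The custom 'install phase from the patch would still follow here.
     )))
```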

> +	   (replace 'install
> +		    (lambda* (#:key outputs #:allow-other-keys)
> +		      (let ((bin (string-append
> +				  (assoc-ref outputs "out")
> +				  "/bin/")))
> +			(mkdir-p bin)
> +			(copy-file "seqtk" (string-append
> +					    bin "seqtk"))
> +			(copy-file "trimadap" (string-append
> +					    bin "trimadap"))))))))

Phase procedures should return a boolean indicating whether the phase
succeeded, but the return value of 'copy-file' is not specified.  Please
add #t after the last call to 'copy-file'.
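
A sketch of the 'install' phase with that change applied, reindented with
spaces.  It is a fragment that belongs inside the modify-phases form of the
patch; mkdir-p comes from (guix build utils), which the build system makes
available to phases.

```scheme
(replace 'install
  (lambda* (#:key outputs #:allow-other-keys)
    (let ((bin (string-append (assoc-ref outputs "out") "/bin/")))
      (mkdir-p bin)
      (copy-file "seqtk" (string-append bin "seqtk"))
      (copy-file "trimadap" (string-append bin "trimadap"))
      ;; The return value of copy-file is unspecified, so report
      ;; success to the build system explicitly.
      #t)))
```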

> +      (native-inputs
> +       `(("zlib" ,zlib)))

zlib needs to be a normal input here, so please replace "native-inputs"
with "inputs".

> +      (home-page "https://github.com/lh3/seqtk")
> +      (synopsis "Toolkit for processing sequences in FASTA/Q formats")
> +      (description
> +       "Seqtk is a fast and lightweight tool for processing sequences in
> +the FASTA or FASTQ format.  It seamlessly parses both FASTA and FASTQ
> +files which can also be optionally compressed by gzip.")
> +      (license (license:non-copyleft
> +		"file://src/LICENSE"
> +		"See src/LICENSE in the distribution.")))))

It's the expat license, so just write "(license license:expat)".  The
author calls it "The MIT License", but that term is misleading since MIT
has used many licenses for software, e.g. the X11 license.

However, there's a bigger problem here.  Some of the code is not clearly
licensed or has invalid copyright notices:

* The copyright notice on kstring.h lacks copyright dates.

* The copyright dates in seqtk.c and LICENSE are incorrect
  ("20082-2012").

* ksw.h and trimadap.c lack any copyright notice at all.  The mere
  presence of a LICENSE file is not enough, especially given the lack of
  any statement about the code license in the README.

Would you be willing to ask the author to fix these issues?

       Mark


* Re: [PATCH] Add seqtk.
  2015-06-24 12:09 ` Mark H Weaver
@ 2015-07-18  7:51   ` Ben Woodcroft
  2015-07-18  9:07     ` John Darrington
  0 siblings, 1 reply; 8+ messages in thread
From: Ben Woodcroft @ 2015-07-18  7:51 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guix-devel

On 24/06/15 22:09, Mark H Weaver wrote:
> However, there's a bigger problem here.  Some of the code is not clearly
> licensed or has invalid copyright notices:
>
> * The copyright notice on kstring.h lacks copyright dates.
>
> * The copyright dates in seqtk.c and LICENSE are incorrect
>    ("20082-2012").
>
> * ksw.h and trimadap.c lack any copyright notice at all.  The mere
>    presence of a LICENSE file is not enough, especially given the lack of
>    any statement about the code license in the README.
>
> Would you be willing to ask the author to fix these issues?
I did this almost a month ago:
https://github.com/lh3/seqtk/pull/60

But no response as yet. I think the common sense here is that it is 
expat licensed, WDYT? It would be a shame to exclude this software as it 
is quite well used in bioinformatics generally, but I'm happy to defer 
to your decision.


* Re: [PATCH] Add seqtk.
  2015-07-18  7:51   ` Ben Woodcroft
@ 2015-07-18  9:07     ` John Darrington
  2016-09-09 11:08       ` [PATCH] gnu: " Ben Woodcroft
  0 siblings, 1 reply; 8+ messages in thread
From: John Darrington @ 2015-07-18  9:07 UTC (permalink / raw)
  To: Ben Woodcroft; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1883 bytes --]

On Sat, Jul 18, 2015 at 05:51:01PM +1000, Ben Woodcroft wrote:
     On 24/06/15 22:09, Mark H Weaver wrote:
     >However, there's a bigger problem here.  Some of the code is not clearly
     >licensed or has invalid copyright notices:
     >
     >* The copyright notice on kstring.h lacks copyright dates.
     >
     >* The copyright dates in seqtk.c and LICENSE are incorrect
     >   ("20082-2012").
     >
     >* ksw.h and trimadap.c lack any copyright notice at all.  The mere
     >   presence of a LICENSE file is not enough, especially given the lack of
     >   any statement about the code license in the README.
     >
     >Would you be willing to ask the author to fix these issues?
     I did this almost a month ago
     https://github.com/lh3/seqtk/pull/60
     
     But no response as yet. I think the common sense here is that it
     is expat licensed, WDYT? It would be a shame to exclude this
     software as it is quite well used in bioinformatics generally,
     but I'm happy to defer to your decision.

When I look at that url, I don't see that you have asked anybody anything.
The message isn't addressed to anyone in particular; the salutation is
merely "Hi".  Also, it poses no question.  It just makes a (very weak) statement,
including the words "seem to be" and "unsure".  I'm not surprised nobody has
replied.  Probably nobody thinks it affects them.

I suggest that you post a followup message mentioning explicitly:

 * To whom it is addressed.

 * What is missing, and how to correct it.

 * Why this is a problem.

 * That you would be willing to help in the effort to fix it (assuming that
   you are, of course).


J'


-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* [PATCH] gnu: Add seqtk.
  2015-07-18  9:07     ` John Darrington
@ 2016-09-09 11:08       ` Ben Woodcroft
  2016-09-09 12:37         ` Marius Bakke
  0 siblings, 1 reply; 8+ messages in thread
From: Ben Woodcroft @ 2016-09-09 11:08 UTC (permalink / raw)
  To: guix-devel

From: Ben J Woodcroft <donttrustben@gmail.com>

Well, despite the lightness of my touch, it seems the licensing is now in
order.  I've updated the package, here's an updated patch.  Better?

Thanks,
ben

* gnu/packages/bioinformatics.scm (seqtk): New variable.
---
 gnu/packages/bioinformatics.scm | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index f34acd1..4e296f5 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -4529,6 +4529,41 @@ BioPython in a convenient way.  Instead of having a big mess of scripts, there
 is one that takes arguments.")
     (license license:gpl3)))
 
+(define-public seqtk
+  (package
+    (name "seqtk")
+    (version "1.2")
+    (source (origin
+              (method url-fetch)
+              (uri (string-append
+                    "https://github.com/lh3/seqtk/archive/v"
+                    version ".tar.gz"))
+              (file-name (string-append name "-" version ".tar.gz"))
+              (sha256
+               (base32
+                "0ywdyzpmfiz2wp6ampbzqg4y8bj450nfgqarpamg045b8mk32lxx"))))
+    (build-system gnu-build-system)
+    (arguments
+     `(#:phases
+       (modify-phases %standard-phases
+         (delete 'configure)
+         (replace 'check
+           ;; There are no tests, so we just run a sanity check.
+           (lambda _ (zero? (system* "./seqtk" "seq"))))
+         (replace 'install
+           (lambda* (#:key outputs #:allow-other-keys)
+             (let ((bin (string-append (assoc-ref outputs "out") "/bin/")))
+               (install-file "seqtk" bin)))))))
+    (inputs
+     `(("zlib" ,zlib)))
+    (home-page "https://github.com/lh3/seqtk")
+    (synopsis "Toolkit for biological sequences in FASTA/Q formats")
+    (description
+     "Seqtk is a fast and lightweight tool for processing sequences in the
+FASTA or FASTQ format.  It parses both FASTA and FASTQ files which can be
+optionally compressed by gzip.")
+    (license license:expat)))
+
 (define-public snap-aligner
   (package
     (name "snap-aligner")
-- 
2.9.1


* Re: [PATCH] gnu: Add seqtk.
  2016-09-09 11:08       ` [PATCH] gnu: " Ben Woodcroft
@ 2016-09-09 12:37         ` Marius Bakke
  2016-09-09 12:54           ` Marius Bakke
  2016-09-10  4:03           ` Ben Woodcroft
  0 siblings, 2 replies; 8+ messages in thread
From: Marius Bakke @ 2016-09-09 12:37 UTC (permalink / raw)
  To: Ben Woodcroft, guix-devel

Ben Woodcroft <donttrustben@gmail.com> writes:

> Well, despite the lightness of my touch, it seems the licensing is now in
> order.  I've updated the package, here's an updated patch.  Better?

I don't think this was intended to be a commit message? :)

The program seems to bundle {khash,kseq}.h from htslib. Could you try
replacing them with the files directly from htslib? There are quite a
few examples of doing this already in bioinformatics.scm.

I also think the original description from github is better:
"Toolkit for processing sequences in FASTA/Q formats".

Other than that LGTM.

Thanks!
Marius


* Re: [PATCH] gnu: Add seqtk.
  2016-09-09 12:37         ` Marius Bakke
@ 2016-09-09 12:54           ` Marius Bakke
  2016-09-10  4:03           ` Ben Woodcroft
  1 sibling, 0 replies; 8+ messages in thread
From: Marius Bakke @ 2016-09-09 12:54 UTC (permalink / raw)
  To: Ben Woodcroft, guix-devel

Marius Bakke <mbakke@fastmail.com> writes:

> The program seems to bundle {khash,kseq}.h from htslib. Could you try
> replacing them with the files directly from htslib? There are quite a
> few examples of doing this already in bioinformatics.scm.

The released version bundles a few unnecessary header files as well,
that are removed in git. I think you can remove all ".h" files in an
origin snippet and substitute references to khash.h and kseq.h before
building.
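
A rough sketch of what such an origin snippet might look like, reusing the
URI and hash from the patch.  Several things here are assumptions rather than
facts from this thread: that htslib is added to the package inputs, that it
installs khash.h and kseq.h under include/htslib, that the sources include
the headers with double quotes, and that no other bundled header is needed
for the build.

```scheme
;; Fragment for the (package ...) form; untested sketch.
(source (origin
          (method url-fetch)
          (uri (string-append "https://github.com/lh3/seqtk/archive/v"
                              version ".tar.gz"))
          (file-name (string-append name "-" version ".tar.gz"))
          (sha256
           (base32
            "0ywdyzpmfiz2wp6ampbzqg4y8bj450nfgqarpamg045b8mk32lxx"))
          (modules '((guix build utils)))
          (snippet
           '(begin
              ;; Remove every bundled header file.
              (for-each delete-file (find-files "." "\\.h$"))
              ;; ASSUMPTION: htslib provides these two headers as
              ;; <htslib/khash.h> and <htslib/kseq.h>.
              (substitute* (find-files "." "\\.c$")
                (("#include \"(khash|kseq)\\.h\"" _ header)
                 (string-append "#include <htslib/" header ".h>")))
              #t))))
```

Whether this is worth doing depends on htslib actually shipping those headers
in its output, which would need checking first.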

Cheers,
Marius


* Re: [PATCH] gnu: Add seqtk.
  2016-09-09 12:37         ` Marius Bakke
  2016-09-09 12:54           ` Marius Bakke
@ 2016-09-10  4:03           ` Ben Woodcroft
  1 sibling, 0 replies; 8+ messages in thread
From: Ben Woodcroft @ 2016-09-10  4:03 UTC (permalink / raw)
  To: Marius Bakke, Ben Woodcroft, guix-devel



On 09/09/16 22:37, Marius Bakke wrote:
> Ben Woodcroft <donttrustben@gmail.com> writes:
>
>> Well, despite the lightness of my touch, it seems the licensing is now in
>> order.  I've updated the package, here's an updated patch.  Better?
> I don't think this was intended to be a commit message? :)

No indeed, I was responding to a thread so old I suspect it was before 
your time.

> The program seems to bundle {khash,kseq}.h from htslib. Could you try
> replacing them with the files directly from htslib? There are quite a
> few examples of doing this already in bioinformatics.scm.

I see your point, though I'm not sure that htslib is really the home of 
those files, and anyway our htslib doesn't provide them as an output 
since they are not a shared library (I believe).

I've always been a bit fuzzy on what the official policy is, to what 
extent we should remove bundled code, so I'm happy to be corrected. In 
this case since there is clear precedent I don't think we should bother 
removing the bundled files.

> I also think the original description from github is better:
> "Toolkit for processing sequences in FASTA/Q formats".
How about "Toolkit for processing biological sequences in FASTA/Q 
format"? I wanted to make it understandable in a more general context.

I'll push in the next day or two unless there are further comments.
Thanks for the review.
ben


end of thread, other threads:[~2016-09-10  4:03 UTC | newest]
