unofficial mirror of guix-devel@gnu.org 
* [PATCH] gnu: Add TopHat.
@ 2016-01-19 13:52 Ricardo Wurmus
  2016-01-22 11:13 ` Ben Woodcroft
  2016-01-22 17:09 ` Ludovic Courtès
  0 siblings, 2 replies; 6+ messages in thread
From: Ricardo Wurmus @ 2016-01-19 13:52 UTC (permalink / raw)
  To: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1019 bytes --]

Hi Guix,

I’m happy to be able to submit a patch to add TopHat.  It’s a very
popular piece of bioinformatics software that I didn’t submit to Guix
upstream before as the license wasn’t clear.  The latest release at that
time contained a LICENSE file with the text of the Artistic license 1.0.

The new 2.1.0 release no longer contains this file; instead it declares
the license to be the Boost Software license 1.0.  Additionally, the
license has been clarified on a GitHub issue (which I linked to).  I
only just noticed this because a user asked me to package the latest
version and I was very happy to see the Artistic license removed.

The sources of TopHat bundle the SeqAn header library (version 1.3) and
the sources of samtools 0.1.18.  I’m patching the Makefile in a build
phase to use our packages for seqan@1.4.2 and samtools@0.1.19.  A
snippet removes the bundled sources.
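
To illustrate the first Makefile substitution, here is a minimal sketch
in plain Guile.  The input line is made up for illustration (it is
assumed to resemble src/Makefile.in, not quoted from it), and the sketch
uses regexp-substitute/global from (ice-9 regex) instead of Guix's
substitute*, so it only approximates what the build phase does to each
line of the file:

```scheme
(use-modules (ice-9 regex))

;; Hypothetical line, assumed to resemble what src/Makefile.in contains.
(define example-line "noinst_LIBRARIES = $(SAMLIB) libtophat.a")

;; Same pattern as in the 'use-system-samtools phase: the group captures
;; the assignment prefix, and the rest matches the literal "$(SAMLIB)".
(define pattern "(noinst_LIBRARIES = )\\$\\(SAMLIB\\)")

;; Keep the text before the match, the captured prefix (submatch 1), and
;; the text after the match; the "$(SAMLIB)" reference itself is dropped.
(display (regexp-substitute/global #f pattern example-line 'pre 1 'post))
(newline)
```

The other clauses follow the same idea but replace the whole match with
an empty string, which removes the rules that would build the bundled
samtools.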

An additional patch to the sources makes it possible to build TopHat
with SeqAn 1.4.2.

~~ Ricardo


[-- Attachment #2: 0001-gnu-Add-TopHat.patch --]
[-- Type: text/x-patch, Size: 5841 bytes --]

From 91f7bacd8657b4be6669a2e72a2ca74b2e44a62f Mon Sep 17 00:00:00 2001
From: Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
Date: Tue, 19 Jan 2016 14:29:19 +0100
Subject: [PATCH] gnu: Add TopHat.

* gnu/packages/bioinformatics.scm (tophat): New variable.
* gnu/packages/patches/tophat-build-with-later-seqan.patch: New file.
* gnu-system.am (dist_patch_DATA): Add it.
---
 gnu-system.am                                      |  1 +
 gnu/packages/bioinformatics.scm                    | 64 ++++++++++++++++++++++
 .../patches/tophat-build-with-later-seqan.patch    | 24 ++++++++
 3 files changed, 89 insertions(+)
 create mode 100644 gnu/packages/patches/tophat-build-with-later-seqan.patch

diff --git a/gnu-system.am b/gnu-system.am
index 543a825..c9d16d6 100644
--- a/gnu-system.am
+++ b/gnu-system.am
@@ -678,6 +678,7 @@ dist_patch_DATA =						\
   gnu/packages/patches/tidy-CVE-2015-5522+5523.patch		\
   gnu/packages/patches/tinyxml-use-stl.patch			\
   gnu/packages/patches/tk-find-library.patch			\
+  gnu/packages/patches/tophat-build-with-later-seqan.patch	\
   gnu/packages/patches/torsocks-dns-test.patch			\
   gnu/packages/patches/tvtime-gcc41.patch			\
   gnu/packages/patches/tvtime-pngoutput.patch			\
diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index d7ff5e8..74cadff 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -654,6 +654,70 @@ gapped, local, and paired-end alignment modes.")
     (supported-systems '("x86_64-linux"))
     (license license:gpl3+)))
 
+(define-public tophat
+  (package
+    (name "tophat")
+    (version "2.1.0")
+    (source (origin
+              (method url-fetch)
+              (uri (string-append
+                    "http://ccb.jhu.edu/software/tophat/downloads/tophat-"
+                    version ".tar.gz"))
+              (sha256
+               (base32
+                "168zlzykq622zbgkh90a90f1bdgsxkscq2zxzbj8brq80hbjpyp7"))
+              (patches (list (search-patch "tophat-build-with-later-seqan.patch")))
+              (modules '((guix build utils)))
+              (snippet
+               '(begin
+                  ;; Remove bundled SeqAn and samtools
+                  (delete-file-recursively "src/SeqAn-1.3")
+                  (delete-file-recursively "src/samtools-0.1.18")
+                  #t))))
+    (build-system gnu-build-system)
+    (arguments
+     '(#:parallel-build? #f ; not supported
+       #:phases
+       (modify-phases %standard-phases
+         (add-after 'unpack 'use-system-samtools
+           (lambda* (#:key inputs #:allow-other-keys)
+             (substitute* "src/Makefile.in"
+               (("(noinst_LIBRARIES = )\\$\\(SAMLIB\\)" _ prefix) prefix)
+               (("\\$\\(SAMPROG\\): \\$\\(SAMLIB\\)") "")
+               (("SAMPROG = samtools_0\\.1\\.18") "")
+               (("\\$\\(samtools_0_1_18_SOURCES\\)") "")
+               (("am__EXEEXT_1 = samtools_0\\.1\\.18\\$\\(EXEEXT\\)") ""))
+             (substitute* '("src/common.h"
+                            "src/bam2fastx.cpp")
+               (("#include \"bam.h\"") "#include <samtools/bam.h>")
+               (("#include \"sam.h\"") "#include <samtools/sam.h>"))
+             (substitute* '("src/bwt_map.h"
+                            "src/map2gtf.h"
+                            "src/align_status.h")
+               (("#include <bam.h>") "#include <samtools/bam.h>")
+               (("#include <sam.h>") "#include <samtools/sam.h>"))
+             #t)))))
+    (inputs
+     `(("boost" ,boost)
+       ("bowtie" ,bowtie)
+       ("samtools" ,samtools-0.1)
+       ("ncurses" ,ncurses)
+       ("python" ,python-2)
+       ("perl" ,perl)
+       ("zlib" ,zlib)
+       ("seqan" ,seqan)))
+    (home-page "http://ccb.jhu.edu/software/tophat/index.shtml")
+    (synopsis "Spliced read mapper for RNA-Seq")
+    (description
+     "TopHat is a fast splice junction mapper for RNA-Seq reads.  It aligns
+RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short
+read aligner Bowtie, and then analyzes the mapping results to identify splice
+junctions between exons.")
+    ;; TopHat is released under the Boost Software License, Version 1.0
+    ;; See https://github.com/infphilo/tophat/issues/11#issuecomment-121589893
+    (license (license:x11-style "http://www.boost.org/LICENSE_1_0.txt"
+                                "Some components have other similar licences."))))
+
 (define-public bwa
   (package
     (name "bwa")
diff --git a/gnu/packages/patches/tophat-build-with-later-seqan.patch b/gnu/packages/patches/tophat-build-with-later-seqan.patch
new file mode 100644
index 0000000..fc742e2
--- /dev/null
+++ b/gnu/packages/patches/tophat-build-with-later-seqan.patch
@@ -0,0 +1,24 @@
+This patch resolves a build failure when building TopHat 2.1.0 with SeqAn 1.4.
+This is the relevant part of a patch originally posted here:
+https://lists.fu-berlin.de/pipermail/seqan-dev/2014-July/msg00001.html
+
+--- a/src/segment_juncs.cpp
++++ b/src/segment_juncs.cpp
+@@ -2050,10 +2050,13 @@ void juncs_from_ref_segs(RefSequenceTabl
+     typedef map<uint32_t, IntronMotifs> MotifMap;
+     
+     MotifMap ims;
+-	
+-    seqan::DnaStringReverseComplement rev_donor_dinuc(donor_dinuc);
+-    seqan::DnaStringReverseComplement rev_acceptor_dinuc(acceptor_dinuc);
+-    
++
++    typedef seqan::ModifiedString<
++                    seqan::ModifiedString<seqan::DnaString const, seqan::ModView<seqan::FunctorComplement<seqan::Dna> > >,  
++                    seqan::ModReverse>   ConstDnaStringReverseComplement;
++    ConstDnaStringReverseComplement rev_donor_dinuc(donor_dinuc);
++    ConstDnaStringReverseComplement rev_acceptor_dinuc(acceptor_dinuc);
++     
+     if (talkative)
+         fprintf(stderr, "Collecting potential splice sites in islands\n");
+ 
-- 
2.1.0


* Re: [PATCH] gnu: Add TopHat.
  2016-01-19 13:52 [PATCH] gnu: Add TopHat Ricardo Wurmus
@ 2016-01-22 11:13 ` Ben Woodcroft
  2016-01-23  7:51   ` Ricardo Wurmus
  2016-01-22 17:09 ` Ludovic Courtès
  1 sibling, 1 reply; 6+ messages in thread
From: Ben Woodcroft @ 2016-01-22 11:13 UTC (permalink / raw)
  To: Ricardo Wurmus, guix-devel



On 19/01/16 23:52, Ricardo Wurmus wrote:
> Hi Guix,
>
> I’m happy to be able to submit a patch to add TopHat.  It’s a very
> popular piece of bioinformatics software that I didn’t submit to Guix
> upstream before as the license wasn’t clear.  The latest release at that
> time contained a LICENSE file with the text of the Artistic license 1.0.
>
> The new 2.1.0 release no longer contains this file; instead it declares
> the license to be the Boost Software license 1.0.  Additionally, the
> license has been clarified on a Github issue (which I linked to).  I
> only just noticed this because a user asked me to package the latest
> version and I was very happy to see the Artistic license removed.
>
> The sources of TopHat bundle the SeqAn header library (version 1.3) and
> the sources of samtools 0.1.18.  I’m patching the Makefile in a build
> phase to use our packages for seqan@1.4.2 and samtools@0.1.19.  A
> snippet removes the bundled sources.
>
> An additional patch to the sources makes it possible to build TopHat
> with SeqAn 1.4.2.
Excellent.
> +    ;; TopHat is released under the Boost Software License, Version 1.0
> +    ;; See https://github.com/infphilo/tophat/issues/11#issuecomment-121589893
> +    (license (license:x11-style "http://www.boost.org/LICENSE_1_0.txt"
> +                                "Some components have other similar licences."))))
Am I right in thinking that the only other license used is asl2.0?  If
so, why not say so?

Other than that, looks pretty good.

Thanks for packaging these important bioinformatics tools.
ben

* Re: [PATCH] gnu: Add TopHat.
  2016-01-19 13:52 [PATCH] gnu: Add TopHat Ricardo Wurmus
  2016-01-22 11:13 ` Ben Woodcroft
@ 2016-01-22 17:09 ` Ludovic Courtès
  2016-01-23  7:50   ` Ricardo Wurmus
  1 sibling, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2016-01-22 17:09 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> skribis:

> I’m happy to be able to submit a patch to add TopHat.  It’s a very
> popular piece of bioinformatics software that I didn’t submit to Guix
> upstream before as the license wasn’t clear.  The latest release at that
> time contained a LICENSE file with the text of the Artistic license 1.0.
>
> The new 2.1.0 release no longer contains this file; instead it declares
> the license to be the Boost Software license 1.0.  Additionally, the
> license has been clarified on a Github issue (which I linked to).  I
> only just noticed this because a user asked me to package the latest
> version and I was very happy to see the Artistic license removed.

One more free software package, this is good news!

> The sources of TopHat bundle the SeqAn header library (version 1.3) and
> the sources of samtools 0.1.18.  I’m patching the Makefile in a build
> phase to use our packages for seqan@1.4.2 and samtools@0.1.19.  A
> snippet removes the bundled sources.

Good.

> From 91f7bacd8657b4be6669a2e72a2ca74b2e44a62f Mon Sep 17 00:00:00 2001
> From: Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
> Date: Tue, 19 Jan 2016 14:29:19 +0100
> Subject: [PATCH] gnu: Add TopHat.
>
> * gnu/packages/bioinformatics.scm (tophat): New variable.
> * gnu/packages/patches/tophat-build-with-later-seqan.patch: New file.
> * gnu-system.am (dist_patch_DATA): Add it.

[...]

> +    (synopsis "Spliced read mapper for RNA-Seq")
> +    (description
> +     "TopHat is a fast splice junction mapper for RNA-Seq reads.  It aligns

It would be nice to contextualize a bit, for instance by adding a word
after “RNA-Seq”, like:

  … for RNA-Seq bioinformatics foobars

or:

  … for the RNA-Seq bioinformatics thingie

Something like that.  :-)

> +    ;; TopHat is released under the Boost Software License, Version 1.0
> +    ;; See https://github.com/infphilo/tophat/issues/11#issuecomment-121589893
> +    (license (license:x11-style "http://www.boost.org/LICENSE_1_0.txt"
> +                                "Some components have other similar licences."))))

You can use ‘license:boost1.0’ here.

Otherwise LGTM, thanks!

Ludo’.

* Re: [PATCH] gnu: Add TopHat.
  2016-01-22 17:09 ` Ludovic Courtès
@ 2016-01-23  7:50   ` Ricardo Wurmus
  0 siblings, 0 replies; 6+ messages in thread
From: Ricardo Wurmus @ 2016-01-23  7:50 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel


Ludovic Courtès <ludo@gnu.org> writes:

> Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> skribis:
>
>> +    (synopsis "Spliced read mapper for RNA-Seq")
>> +    (description
>> +     "TopHat is a fast splice junction mapper for RNA-Seq reads.  It aligns
>
> It would be nice to contextualize a bit, for instance by adding a word
> after “RNA-Seq”, like:
>
>   … for RNA-Seq bioinformatics foobars
>
> or:
>
>   … for the RNA-Seq bioinformatics thingie
>
> Something like that.  :-)

I changed it to “for RNA-Seq data” in the synopsis and “for nucleotide
sequence reads produced by the RNA-Seq method” in the description.
That’s probably a little clunky for bioinformaticians but I hope it’s a
little less dense for everyone else.

>
>> +    ;; TopHat is released under the Boost Software License, Version 1.0
>> +    ;; See https://github.com/infphilo/tophat/issues/11#issuecomment-121589893
>> +    (license (license:x11-style "http://www.boost.org/LICENSE_1_0.txt"
>> +                                "Some components have other similar licences."))))
>
> You can use ‘license:boost1.0’ here.

Done.  Thanks for the suggestions!
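
Roughly, the affected fields would then look like this (a fragment of
the package definition only, not a complete form; the exact wording is
an approximation of the changes described above):

```scheme
    (synopsis "Spliced read mapper for RNA-Seq data")
    (description
     "TopHat is a fast splice junction mapper for nucleotide sequence
reads produced by the RNA-Seq method.  It aligns RNA-Seq reads to
mammalian-sized genomes using the ultra high-throughput short read
aligner Bowtie, and then analyzes the mapping results to identify splice
junctions between exons.")
    ;; TopHat is released under the Boost Software License, Version 1.0
    ;; See https://github.com/infphilo/tophat/issues/11#issuecomment-121589893
    (license license:boost1.0)
```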

~~ Ricardo

* Re: [PATCH] gnu: Add TopHat.
  2016-01-22 11:13 ` Ben Woodcroft
@ 2016-01-23  7:51   ` Ricardo Wurmus
  2016-01-23  8:44     ` Ben Woodcroft
  0 siblings, 1 reply; 6+ messages in thread
From: Ricardo Wurmus @ 2016-01-23  7:51 UTC (permalink / raw)
  To: Ben Woodcroft; +Cc: guix-devel


Ben Woodcroft <b.woodcroft@uq.edu.au> writes:

>> +    ;; TopHat is released under the Boost Software License, Version 1.0
>> +    ;; See https://github.com/infphilo/tophat/issues/11#issuecomment-121589893
>> +    (license (license:x11-style "http://www.boost.org/LICENSE_1_0.txt"
>> +                                "Some components have other similar licences."))))
> Am I right in thinking that the only other license used is asl2.0 ? If 
> so why not say so?

I just copied this from the “boost” package.  After Ludo’s comment I
changed that to just “license:boost1.0”.  Maybe we can replace the
license field for Boost with “(license license:boost1.0)” as well?

~~ Ricardo

* Re: [PATCH] gnu: Add TopHat.
  2016-01-23  7:51   ` Ricardo Wurmus
@ 2016-01-23  8:44     ` Ben Woodcroft
  0 siblings, 0 replies; 6+ messages in thread
From: Ben Woodcroft @ 2016-01-23  8:44 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel



On 23/01/16 17:51, Ricardo Wurmus wrote:
> Ben Woodcroft <b.woodcroft@uq.edu.au> writes:
>
>>> +    ;; TopHat is released under the Boost Software License, Version 1.0
>>> +    ;; See https://github.com/infphilo/tophat/issues/11#issuecomment-121589893
>>> +    (license (license:x11-style "http://www.boost.org/LICENSE_1_0.txt"
>>> +                                "Some components have other similar licences."))))
>> Am I right in thinking that the only other license used is asl2.0 ? If
>> so why not say so?
> I just copied this from the “boost” package.  After Ludo’s comment I
> changed that to just “license:boost1.0”.  Maybe we can replace the
> license field for Boost with “(license license:boost1.0)” as well?
Sounds fine to me.  However, some of the TopHat source appears to be
under the ASL, e.g.
https://github.com/infphilo/tophat/blob/master/src/intervaltree/__init__.py

ben
