unofficial mirror of guix-devel@gnu.org 
* [PATCH] gnu: Add freebayes.
@ 2016-03-08 15:44 Roel Janssen
  2016-03-08 23:55 ` Leo Famulari
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Roel Janssen @ 2016-03-08 15:44 UTC (permalink / raw)
  To: guix-devel

[-- Attachment #1: 0001-gnu-Add-freebayes.patch --]
[-- Type: text/x-patch, Size: 8738 bytes --]

From 38302e8cac77275694c8793933be414ec26906ec Mon Sep 17 00:00:00 2001
From: Roel Janssen <roel@gnu.org>
Date: Tue, 8 Mar 2016 16:38:46 +0100
Subject: [PATCH] gnu: Add freebayes.

* gnu/packages/bioinformatics.scm (freebayes): New variable.
---
 gnu/packages/bioinformatics.scm | 160 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 160 insertions(+)

diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index 5d53dc9..601ab9b 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -37,6 +37,7 @@
   #:use-module (gnu packages algebra)
   #:use-module (gnu packages base)
   #:use-module (gnu packages boost)
+  #:use-module (gnu packages cmake)
   #:use-module (gnu packages compression)
   #:use-module (gnu packages cpio)
   #:use-module (gnu packages curl)
@@ -247,6 +248,165 @@ intervals from multiple files in widely-used genomic file formats such as BAM,
 BED, GFF/GTF, VCF.")
     (license license:gpl2)))
 
+(define-public freebayes
+  (let ((commit "3ce827d8ebf89bb3bdc097ee0fe7f46f9f30d5fb"))
+    (package
+      (name "freebayes")
+      (version (string-append "v1.0.2-" (string-take commit 7)))
+      (source (origin
+        (method git-fetch)
+        (uri (git-reference
+          (url "https://github.com/ekg/freebayes.git")
+          (commit commit)))
+        (file-name (string-append name "-" version "-checkout"))
+        (sha256
+         (base32 "1sbzwmcbn78ybymjnhwk7qc5r912azy5vqz2y7y81616yc3ba2a2"))))
+      (build-system gnu-build-system)
+      (native-inputs
+       `(("cmake" ,cmake)
+         ("htslib" ,htslib)
+         ("zlib" ,zlib)
+         ("python" ,python-2)
+         ("perl" ,perl)
+         ("bamtools-src"
+          ,(origin
+             (method url-fetch)
+             (uri (string-append "https://github.com/ekg/bamtools/archive/"
+                  "e77a43f5097ea7eee432ee765049c6b246d49baa" ".tar.gz"))
+             (file-name "bamtools-src.tar.gz")
+             (sha256
+              (base32 "0rqymka21g6lfjfgxzr40pxz4c4fcl77jpy1np1li70pnc7h2cs1"))))
+         ("vcflib-src"
+          ,(origin
+             (method url-fetch)
+             (uri (string-append "https://github.com/vcflib/vcflib/archive/"
+                  "5ac091365fdc716cc47cc5410bb97ee5dc2a2c92" ".tar.gz"))
+             (file-name "vcflib-5ac0913.tar.gz")
+             (sha256
+              (base32 "0ywshwpif059z5h0g7zzrdfzzdj2gr8xvwlwcsdxrms3p9iy35h8"))))
+         ;; These are submodules for the vcflib version used in freebayes
+         ("tabixpp-src"
+          ,(origin
+            (method url-fetch)
+            (uri (string-append "https://github.com/ekg/tabixpp/archive/"
+                  "bbc63a49acc52212199f92e9e3b8fba0a593e3f7" ".tar.gz"))
+            (file-name "tabixpp-src.tar.gz")
+            (sha256
+             (base32 "1s06wmpgj4my4pik5kp2lc42hzzazbp5ism2y4i2ajp2y1c68g77"))))
+         ("intervaltree-src"
+          ,(origin
+             (method url-fetch)
+             (uri (string-append
+                   "https://github.com/ekg/intervaltree/archive/"
+                   "dbb4c513d1ad3baac516fc1484c995daf9b42838" ".tar.gz"))
+             (file-name "intervaltree-src.tar.gz")
+             (sha256
+              (base32 "19prwpn2wxsrijp5svfqvfcxl5nj7zdhm3jycd5kqhl9nifpmcks"))))
+         ("smithwaterman-src"
+          ,(origin
+            (method url-fetch)
+            (uri (string-append "https://github.com/ekg/smithwaterman/archive/"
+                  "203218b47d45ac56ef234716f1bd4c741b289be1" ".tar.gz"))
+            (file-name "smithwaterman-src.tar.gz")
+            (sha256
+             (base32 "1lkxy4xkjn96l70jdbsrlm687jhisgw4il0xr2dm33qwcclzzm3b"))))
+         ("multichoose-src"
+          ,(origin
+            (method url-fetch)
+            (uri (string-append "https://github.com/ekg/multichoose/archive/"
+                  "73d35daa18bf35729b9ba758041a9247a72484a5" ".tar.gz"))
+            (file-name "multichoose-src.tar.gz")
+            (sha256
+             (base32 "07aizwdabmlnjaq4p3v0vsasgz1xzxid8xcxcw3paq8kh9c1099i"))))
+         ("fsom-src"
+          ,(origin
+            (method url-fetch)
+            (uri (string-append "https://github.com/ekg/fsom/archive/"
+                  "a6ef318fbd347c53189384aef7f670c0e6ce89a3" ".tar.gz"))
+            (file-name "fsom-src.tar.gz")
+            (sha256
+             (base32 "0q6b57ppxfvsm5cqmmbfmjpn5qvx2zi5pamvp3yh8gpmmz8cfbl3"))))
+         ("filevercmp-src"
+          ,(origin
+            (method url-fetch)
+            (uri (string-append "https://github.com/ekg/filevercmp/archive/"
+                  "1a9b779b93d0b244040274794d402106907b71b7" ".tar.gz"))
+            (file-name "filevercmp-src.tar.gz")
+            (sha256
+             (base32 "0yp5jswf5j2pqc6517x277s4s6h1ss99v57kxw9gy0jkfl3yh450"))))
+         ("fastahack-src"
+          ,(origin
+            (method url-fetch)
+            (uri (string-append "https://github.com/ekg/fastahack/archive/"
+                  "c68cebb4f2e5d5d2b70cf08fbdf1944e9ab2c2dd" ".tar.gz"))
+            (file-name "fastahack-src.tar.gz")
+            (sha256
+             (base32 "0j25lcl3jk1kls66zzxjfyq5ir6sfcvqrdwfcva61y3ajc9ssay2"))))
+            ))
+      (arguments
+       `(#:tests? #f
+         #:phases
+         (modify-phases %standard-phases
+           (delete 'configure)
+           (delete 'check)
+           (add-after 'unpack 'unpack-submodule-sources
+             (lambda* (#:key inputs #:allow-other-keys)
+               (let ((unpack (lambda (source target)
+                               (with-directory-excursion target
+                                 (zero? (system* "tar" "xvf"
+                                                 (assoc-ref inputs source)
+                                                 "--strip-components=1"))))))
+                 (and
+                  (unpack "bamtools-src" "bamtools")
+                  (unpack "vcflib-src" "vcflib")
+                  (unpack "intervaltree-src" "intervaltree")
+                  (unpack "fastahack-src" "vcflib/fastahack")
+                  (unpack "filevercmp-src" "vcflib/filevercmp")
+                  (unpack "fsom-src" "vcflib/fsom")
+                  (unpack "intervaltree-src" "vcflib/intervaltree")
+                  (unpack "multichoose-src" "vcflib/multichoose")
+                  (unpack "smithwaterman-src" "vcflib/smithwaterman")
+                  (unpack "tabixpp-src" "vcflib/tabixpp")))))
+           (add-after 'unpack-submodule-sources 'fix-makefile
+             (lambda* (#:key inputs #:allow-other-keys)
+               ;; We don't have the .git folder to get the version tag from.
+               ;; For this checkout of the code, it's v1.0.0.
+               (substitute* '("vcflib/Makefile")
+                 (("^GIT_VERSION.*") "GIT_VERSION = v1.0.0"))))
+           (replace
+            'build
+            (lambda* (#:key inputs make-flags #:allow-other-keys)
+              (and
+               ;; Compile Bamtools before compiling the main project.
+               (with-directory-excursion "bamtools"
+                 (system* "mkdir" "build")
+                 (with-directory-excursion "build"
+                   (and (zero? (system* "cmake" "../"))
+                        (zero? (system* "make")))))
+               ;; Compile vcflib before compiling the main project.
+               (with-directory-excursion "vcflib"
+                 (with-directory-excursion "tabixpp"
+                   (zero? (system* "make")))
+                 (zero? (system* "make" "CC=gcc"
+                   (string-append "CFLAGS=\"" "-Itabixpp "
+                     "-I" (assoc-ref inputs "htslib") "/include " "\"") "all")))
+               (with-directory-excursion "src"
+                 (zero? (system* "make"))))))
+           (replace
+            'install
+            (lambda* (#:key outputs #:allow-other-keys)
+              (let ((bin (string-append (assoc-ref outputs "out") "/bin")))
+                (install-file "bin/freebayes" bin)
+                (install-file "bin/bamleftalign" bin)))))))
+      (home-page "https://github.com/ekg/freebayes")
+      (synopsis "Haplotype-based variant detector")
+      (description "FreeBayes is a Bayesian genetic variant detector designed to
+find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms),
+indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and
+complex events (composite insertion and substitution events) smaller than the
+length of a short-read sequencing alignment.")
+      (license license:expat))))
+
 (define-public python2-pybedtools
   (package
     (name "python2-pybedtools")
-- 
2.5.0


[-- Attachment #2: Type: text/plain, Size: 431 bytes --]

Dear Guix,

I have a patch to add another bioinformatics tool: FreeBayes. It is a
rather long patch, so I suspect it isn't completely 'guix-proof'.

One of the problems with the patch is probably the bulk of dependencies
dragged in (for example, vcflib).  They use specific versions so they
are tied to this package (so that's why I cannot package them separately).

Hope someone is willing to review it.

Kind regards,
Roel Janssen

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] gnu: Add freebayes.
  2016-03-08 15:44 Roel Janssen
@ 2016-03-08 23:55 ` Leo Famulari
  2016-03-09  6:44   ` Pjotr Prins
  2016-03-09  6:53 ` Pjotr Prins
  2016-03-09 10:17 ` Ricardo Wurmus
  2 siblings, 1 reply; 13+ messages in thread
From: Leo Famulari @ 2016-03-08 23:55 UTC (permalink / raw)
  To: Roel Janssen; +Cc: guix-devel

On Tue, Mar 08, 2016 at 04:44:13PM +0100, Roel Janssen wrote:

Thanks for the patch! This looks like a big one!

[...]

> +      (native-inputs
> +       `(("cmake" ,cmake)
> +         ("htslib" ,htslib)
> +         ("zlib" ,zlib)
> +         ("python" ,python-2)
> +         ("perl" ,perl)
> +         ("bamtools-src"
> +          ,(origin
> +             (method url-fetch)
> +             (uri (string-append "https://github.com/ekg/bamtools/archive/"
> +                  "e77a43f5097ea7eee432ee765049c6b246d49baa" ".tar.gz"))
> +             (file-name "bamtools-src.tar.gz")
> +             (sha256
> +              (base32 "0rqymka21g6lfjfgxzr40pxz4c4fcl77jpy1np1li70pnc7h2cs1"))))

[... more sub-modules ...]

> +      (arguments
> +       `(#:tests? #f

Can you say why tests are disabled? It can be as simple as "no test
suite".

> +         #:phases
> +         (modify-phases %standard-phases
> +           (delete 'configure)
> +           (delete 'check)

This can be removed when using "#:tests? #f", which is the preferred way
to skip tests.
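
A minimal sketch of the resulting arguments. The reason given in the comment is only a placeholder and would need to be confirmed against the freebayes sources:

```scheme
(arguments
 `(#:tests? #f                    ; placeholder reason: no test suite
   #:phases
   (modify-phases %standard-phases
     ;; The 'check phase does nothing when #:tests? is #f,
     ;; so no (delete 'check) is needed.
     (delete 'configure))))
```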

[...]

> Dear Guix,
> 
> I have a patch to add another bioinformatics tool: FreeBayes. It is a
> rather long patch, so I suspect it isn't completely 'guix-proof'.
> 
> One of the problems with the patch is probably the bulk of dependencies
> dragged in (for example, vcflib).  They use specific versions so they
> are tied to this package (so that's why I cannot package them separately).

You can specify the version of dependencies when you list them in your
package. There is an example of this in the package definition of
fltk [0]. Is that not an appropriate solution for freebayes?

In any case, I think we should find some way to specify the license of
the code in all those modules.

I think somebody with some more experience should weigh in...

Cc-ing Ricardo since he also packages a lot of bioinformatics packages.

Thanks again for putting such a complicated patch together! We will find
the best way to add freebayes to the distribution :)

[0] Due to a recent change, the most correct method of specifying the
version uses '@' to separate the package name and version rather than
'-'.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gnu: Add freebayes.
  2016-03-08 23:55 ` Leo Famulari
@ 2016-03-09  6:44   ` Pjotr Prins
  0 siblings, 0 replies; 13+ messages in thread
From: Pjotr Prins @ 2016-03-09  6:44 UTC (permalink / raw)
  To: Leo Famulari; +Cc: guix-devel

On Tue, Mar 08, 2016 at 06:55:15PM -0500, Leo Famulari wrote:
> Thanks again for putting such a complicated patch together! We will find
> the best way to add freebayes to the distribution :)

+1. Freebayes is (in my opinion) one of the best variant callers out
there, both for SNPs and for somatic (i.e. cancer) variants, and a truly
free competitor to the not-so-free GATK caller.

It is great to be able to deploy it with GNU Guix (I am using Roel's
package already).

Pj.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gnu: Add freebayes.
  2016-03-08 15:44 Roel Janssen
  2016-03-08 23:55 ` Leo Famulari
@ 2016-03-09  6:53 ` Pjotr Prins
  2016-03-09  7:31   ` Leo Famulari
  2016-03-09 10:17 ` Ricardo Wurmus
  2 siblings, 1 reply; 13+ messages in thread
From: Pjotr Prins @ 2016-03-09  6:53 UTC (permalink / raw)
  To: Roel Janssen; +Cc: guix-devel

On Tue, Mar 08, 2016 at 04:44:13PM +0100, Roel Janssen wrote:
> One of the problems with the patch is probably the bulk of dependencies
> dragged in (for example, vcflib).  They use specific versions so they
> are tied to this package (so that's why I cannot package them separately).

This approach is very common in bioinformatics and one way they fight
dependency hell. Not the best way, admittedly.

It may be worthwhile to package vcflib, for example, separately as
freebayes merely requires the header files to compile. The problem is
that the current vcflib does not include the headers in a
default install, if I understand Roel correctly.  Ricardo, can you
advise on this? Should we add the headers in guix through a vcflib
installer?

Personally I favour adding freebayes as is and when we decide to
package these libraries separately revisit the issues. 

Pj.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gnu: Add freebayes.
  2016-03-09  6:53 ` Pjotr Prins
@ 2016-03-09  7:31   ` Leo Famulari
  2016-03-10  9:56     ` Roel Janssen
  0 siblings, 1 reply; 13+ messages in thread
From: Leo Famulari @ 2016-03-09  7:31 UTC (permalink / raw)
  To: Pjotr Prins; +Cc: guix-devel

On Wed, Mar 09, 2016 at 07:53:20AM +0100, Pjotr Prins wrote:
> On Tue, Mar 08, 2016 at 04:44:13PM +0100, Roel Janssen wrote:
> > One of the problems with the patch is probably the bulk of dependencies
> > dragged in (for example, vcflib).  They use specific versions so they
> > are tied to this package (so that's why I cannot package them separately).
> 
> This approach is very common in bioinformatics and one way they fight
> dependency hell. Not the best way, admittedly.
> 
> It may be worthwhile to package vcflib, for example, separately as
> freebayes merely requires the header files to compile. The problem is
> that the current vcflib does not include the headers in a
> default install, if I understand Roel correctly.  Ricardo, can you
> advise on this? Should we add the headers in guix through a vcflib
> installer?
> 
> Personally I favour adding freebayes as is and when we decide to
> package these libraries separately revisit the issues. 

Okay, I'm fine with the approach. It is reasonable if these specific
versions only make sense in this particular context. And you are the
experts in bioinformatics software, not me :)

Ricardo, your thoughts?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gnu: Add freebayes.
  2016-03-08 15:44 Roel Janssen
  2016-03-08 23:55 ` Leo Famulari
  2016-03-09  6:53 ` Pjotr Prins
@ 2016-03-09 10:17 ` Ricardo Wurmus
  2 siblings, 0 replies; 13+ messages in thread
From: Ricardo Wurmus @ 2016-03-09 10:17 UTC (permalink / raw)
  To: Roel Janssen; +Cc: guix-devel


Roel Janssen <roel@gnu.org> writes:

> One of the problems with the patch is probably the bulk of dependencies
> dragged in (for example, vcflib).  They use specific versions so they
> are tied to this package (so that's why I cannot package them separately).

Yeah, this is worrying.

> From 38302e8cac77275694c8793933be414ec26906ec Mon Sep 17 00:00:00 2001
> From: Roel Janssen <roel@gnu.org>
> Date: Tue, 8 Mar 2016 16:38:46 +0100
> Subject: [PATCH] gnu: Add freebayes.

> * gnu/packages/bioinformatics.scm (freebayes): New variable.

[...]

> +(define-public freebayes
> +  (let ((commit "3ce827d8ebf89bb3bdc097ee0fe7f46f9f30d5fb"))
> +    (package
> +      (name "freebayes")
> +      (version (string-append "v1.0.2-" (string-take commit 7)))

The version should not start with “v” and it should include a numeric
revision because later git commits may be sorted lower than the current
commit.

(let ((commit ....)
      (revision "1"))
  ...
  (version (string-append "1.0.2-" revision "." commit))
  ...)

Why does it have to be a git clone, though?  I see that only six commits
were made to master since release 1.0.2.  If fetching from git must
happen it would be good to have a reason in a comment.
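
As a rough illustration of the resulting version string, here is the same expression in plain Guile. Abbreviating the commit with string-take is carried over from the original patch and is an assumption, not part of the snippet above:

```scheme
;; Commit taken from the patch; revision "1" as suggested above.
(let ((commit "3ce827d8ebf89bb3bdc097ee0fe7f46f9f30d5fb")
      (revision "1"))
  ;; No leading "v"; the numeric revision sorts before the commit prefix.
  (string-append "1.0.2-" revision "." (string-take commit 7)))
;; => "1.0.2-1.3ce827d"
```

Bumping the revision to "2" for a later commit keeps the versions ordered regardless of how the commit hashes compare.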

> +      (native-inputs
> +       `(("cmake" ,cmake)
> +         ("htslib" ,htslib)
> +         ("zlib" ,zlib)

“htslib” and “zlib” sound like regular inputs.

> +         ("python" ,python-2)
> +         ("perl" ,perl)

These may be as well.
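
A sketch of how the split might look. Whether python and perl are needed at run time is an open question here, so they are left out of this fragment rather than guessed at:

```scheme
;; Build-time tools only.
(native-inputs
 `(("cmake" ,cmake)))
;; Libraries the resulting binaries link against.
(inputs
 `(("htslib" ,htslib)
   ("zlib" ,zlib)))
```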

> +         ("bamtools-src"
> +          ,(origin
> +             (method url-fetch)
> +             (uri (string-append "https://github.com/ekg/bamtools/archive/"
> +                  "e77a43f5097ea7eee432ee765049c6b246d49baa" ".tar.gz"))
> +             (file-name "bamtools-src.tar.gz")
> +             (sha256
> +              (base32 "0rqymka21g6lfjfgxzr40pxz4c4fcl77jpy1np1li70pnc7h2cs1"))))

We already have bamtools, I think.  Is there no way to link with that
version?  Does it have to be this arbitrary-looking commit?

> +         ("vcflib-src"
> +          ,(origin
> +             (method url-fetch)
> +             (uri (string-append "https://github.com/vcflib/vcflib/archive/"
> +                  "5ac091365fdc716cc47cc5410bb97ee5dc2a2c92" ".tar.gz"))
> +             (file-name "vcflib-5ac0913.tar.gz")
> +             (sha256
> +              (base32 "0ywshwpif059z5h0g7zzrdfzzdj2gr8xvwlwcsdxrms3p9iy35h8"))))

> +         ;; These are submodules for the vcflib version used in freebayes
> +         ("tabixpp-src"
> +         ("intervaltree-src"
> +         ("smithwaterman-src"
> +         ("multichoose-src"
> +         ("fsom-src"
> +         ("filevercmp-src"
> +         ("fastahack-src"

If these are submodules of this particular version of vcflib I think it
would be better to create a separate vcflib package where these
submodules are included.  If at all possible you would then coerce
freebayes to link with that version of vcflib.

Note that vcflib doesn’t *have* to be exported via define-public.  It
would be nice, though, if we could get a regular vcflib package as a
side-effect of all this.

> +      (arguments
> +       `(#:tests? #f

Please quickly comment on why the tests are disabled.

> +         #:phases
> +         (modify-phases %standard-phases
> +           (delete 'configure)
> +           (delete 'check)

You won’t need this when tests are disabled already.

> +           (add-after 'unpack 'unpack-submodule-sources
> +             (lambda* (#:key inputs #:allow-other-keys)
> +               (let ((unpack (lambda (source target)
> +                               (with-directory-excursion target
> +                                 (zero? (system* "tar" "xvf"
> +                                                 (assoc-ref inputs source)
> +                                                 "--strip-components=1"))))))
> +                 (and
> +                  (unpack "bamtools-src" "bamtools")
> +                  (unpack "vcflib-src" "vcflib")
> +                  (unpack "intervaltree-src" "intervaltree")
> +                  (unpack "fastahack-src" "vcflib/fastahack")
> +                  (unpack "filevercmp-src" "vcflib/filevercmp")
> +                  (unpack "fsom-src" "vcflib/fsom")
> +                  (unpack "intervaltree-src" "vcflib/intervaltree")
> +                  (unpack "multichoose-src" "vcflib/multichoose")
> +                  (unpack "smithwaterman-src" "vcflib/smithwaterman")
> +                  (unpack "tabixpp-src" "vcflib/tabixpp")))))
> +           (add-after 'unpack-submodule-sources 'fix-makefile
> +             (lambda* (#:key inputs #:allow-other-keys)
> +               ;; We don't have the .git folder to get the version tag from.
> +               ;; For this checkout of the code, it's v1.0.0.
> +               (substitute* '("vcflib/Makefile")
> +                 (("^GIT_VERSION.*") "GIT_VERSION = v1.0.0"))))
> +           (replace
> +            'build
> +            (lambda* (#:key inputs make-flags #:allow-other-keys)
> +              (and
> +               ;; Compile Bamtools before compiling the main project.
> +               (with-directory-excursion "bamtools"
> +                 (system* "mkdir" "build")
> +                 (with-directory-excursion "build"
> +                   (and (zero? (system* "cmake" "../"))
> +                        (zero? (system* "make")))))
> +               ;; Compile vcflib before compiling the main project.
> +               (with-directory-excursion "vcflib"
> +                 (with-directory-excursion "tabixpp"
> +                   (zero? (system* "make")))
> +                 (zero? (system* "make" "CC=gcc"
> +                   (string-append "CFLAGS=\"" "-Itabixpp "
> +                     "-I" (assoc-ref inputs "htslib") "/include " "\"") "all")))
> +               (with-directory-excursion "src"
> +                 (zero? (system* "make"))))))

This seems too hackish for my taste.  It would be so much nicer if
bamtools and vcflib were built in separate packages.  This would make it
much clearer what hacks are required for what package and you could use
the cmake-build-system for bamtools and the gnu-build-system for vcflib.

You might be able to force freebayes to use those separate packages by
overriding BAMTOOLS_ROOT and VCFLIB_ROOT in “src/Makefile” or by
replacing “$(BAMTOOLS_ROOT)/lib/libbamtools.a” with the path to the
actual “libbamtools.a” in the store.  Looking at “src/Makefile” the
entanglement isn’t that bad and we should be able to resolve it.
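
A rough sketch of the first option, assuming separate packages exist and are listed as inputs under the names "bamtools" and "vcflib" (both names are hypothetical here). It also assumes the standard 'build phase is kept, since a replaced 'build phase would have to pass make-flags along itself:

```scheme
(arguments
 `(#:make-flags
   (list
    ;; "bamtools" and "vcflib" are assumed input names; the directory
    ;; layout inside each store item may not match what src/Makefile expects.
    (string-append "BAMTOOLS_ROOT=" (assoc-ref %build-inputs "bamtools"))
    (string-append "VCFLIB_ROOT=" (assoc-ref %build-inputs "vcflib")))))
```

Variables given on the make command line take precedence over assignments in the Makefile, which is why no substitute* is needed for this variant.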

If you need assistance with this I could help you.

> +           (replace
> +            'install

Please leave “'install” on the same line.
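
That is, something along these lines. The trailing #t is an addition so that the phase returns a true value; the rest is the phase from the patch, only re-indented:

```scheme
(replace 'install
  (lambda* (#:key outputs #:allow-other-keys)
    (let ((bin (string-append (assoc-ref outputs "out") "/bin")))
      (install-file "bin/freebayes" bin)
      (install-file "bin/bamleftalign" bin)
      #t)))
```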

~~ Ricardo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gnu: Add freebayes.
  2016-03-09  7:31   ` Leo Famulari
@ 2016-03-10  9:56     ` Roel Janssen
  0 siblings, 0 replies; 13+ messages in thread
From: Roel Janssen @ 2016-03-10  9:56 UTC (permalink / raw)
  To: Leo Famulari; +Cc: guix-devel


Leo Famulari writes:

> On Wed, Mar 09, 2016 at 07:53:20AM +0100, Pjotr Prins wrote:
>> On Tue, Mar 08, 2016 at 04:44:13PM +0100, Roel Janssen wrote:
>> > One of the problems with the patch is probably the bulk of dependencies
>> > dragged in (for example, vcflib).  They use specific versions so they
>> > are tied to this package (so that's why I cannot package them separately).
>> 
>> This approach is very common in bioinformatics and one way they fight
>> dependency hell. Not the best way, admittedly.
>> 
>> It may be worthwhile to package vcflib, for example, separately as
>> freebayes merely requires the header files to compile. The problem is
>> that the current vcflib does not include the headers in a
>> default install, if I understand Roel correctly.  Ricardo, can you
>> advise on this? Should we add the headers in guix through a vcflib
>> installer?

I would like to propose the following:

- Separate the packages, and when needed, add specific versions of the
  packages.

  Instead of trying to package arbitrary Git commits, I will look into
  whether I can use release versions instead, without impacting the final
  FreeBayes binary.

- Use ,(package-source ...) to unpack the sources of the packages to
  satisfy the "submodules" structure of projects that use this.

This way we can separate the packages, which yields the programs that
are now hidden in submodule sources.  It also gives us more complete
package descriptions for them.  We don't have to heavily modify the
project setups that rely on submodules (just extract sources the way
we do now).
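
A small sketch of what this could look like in the inputs. The variables vcflib and tabixpp are hypothetical, since these packages do not exist yet:

```scheme
(native-inputs
 ;; package-source returns the origin of an existing package, so the
 ;; URL and hash live in one place: the separate package definition.
 `(("vcflib-src" ,(package-source vcflib))
   ("tabixpp-src" ,(package-source tabixpp))))
```

The unpack phase would still look these up with assoc-ref, as it does now; how the source is extracted depends on whether the origin is a tarball or a checkout.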

I am working on doing this for FreeBayes.  It should yield separate
packages for: vcflib, tabixpp, multichoose, smithwaterman, filevercmp
and fsom.

I want to find out whether I can make "intervaltree" a library, as it
is meant to be:
  https://github.com/ekg/intervaltree

WDYT?

Kind regards,
Roel Janssen

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] gnu: Add freebayes
@ 2016-05-02  9:25 Rob Syme
  2016-05-02 15:21 ` Ricardo Wurmus
  0 siblings, 1 reply; 13+ messages in thread
From: Rob Syme @ 2016-05-02  9:25 UTC (permalink / raw)
  To: guix-devel

A guix-friendly licensed variant caller.

From 78fb1be26ca1a0ac768ce5b98f7fd9f467870b84 Mon Sep 17 00:00:00 2001
From: Rob Syme <rob.syme@gmail.com>
Date: Mon, 2 May 2016 16:46:53 +0800
Subject: [PATCH] gnu: Add freebayes

* gnu/packages/bioinformatics.scm (freebayes): New variable.

---
 gnu/packages/bioinformatics.scm | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index 079fd46..db382d7 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -43,6 +43,7 @@
   #:use-module (gnu packages boost)
   #:use-module (gnu packages compression)
   #:use-module (gnu packages cpio)
+  #:use-module (gnu packages cmake)
   #:use-module (gnu packages curl)
   #:use-module (gnu packages doxygen)
   #:use-module (gnu packages datastructures)
@@ -1905,6 +1906,44 @@ genes in incomplete assemblies or complete genomes.")
     ;; GPL3+ according to private correspondense with the authors.
     (license license:gpl3+)))

+(define-public freebayes
+  (let ((commit "0cb269728b2db6307053cafe6f913a8b6fa1331e"))
+    (package
+      (name "freebayes")
+      (version "1.0.2")
+      (source (origin
+                (method git-fetch)
+                (uri (git-reference
+                      (url "https://github.com/ekg/freebayes.git")
+                      (commit commit)
+                      (recursive? #t)))
+                (sha256
+                 (base32
+                  "0z37ch3as3g8hx36l1lwy1v9cqahx72lb51yxrcmwymx0kcf39c5"))))
+      (build-system gnu-build-system)
+      (arguments '(#:phases
+                   (modify-phases %standard-phases
+                     (delete 'configure)
+                     (delete 'check) ; no "check" target
+                     (replace 'install
+                       (lambda* (#:key outputs #:allow-other-keys)
+                         (let* ((out (assoc-ref outputs "out"))
+                                (bin (string-append out "/bin")))
+                           (install-file "bin/freebayes" bin)
+                           (install-file "bin/bamleftalign" bin)
+                           #t))))))
+      (inputs
+       `(("cmake" ,cmake)
+         ("zlib" ,zlib)))
+      (home-page "https://github.com/ekg/freebayes")
+      (synopsis "Bayesian haplotype-based polymorphism discovery and genotyping")
+      (description "FreeBayes is a Bayesian genetic variant detector designed to
+find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms),
+indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and
+complex events (composite insertion and substitution events) smaller than the
+length of a short-read sequencing alignment.")
+      (license license:expat))))
+
 (define-public fxtract
   (let ((util-commit "776ca85a18a47492af3794745efcb4a905113115"))
     (package
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] gnu: Add freebayes
  2016-05-02  9:25 [PATCH] gnu: Add freebayes Rob Syme
@ 2016-05-02 15:21 ` Ricardo Wurmus
  2016-05-03  7:32   ` Rob Syme
  0 siblings, 1 reply; 13+ messages in thread
From: Ricardo Wurmus @ 2016-05-02 15:21 UTC (permalink / raw)
  To: Rob Syme; +Cc: guix-devel


Hi Rob,

> A guix-friendly licensed variant caller.
>
> From 78fb1be26ca1a0ac768ce5b98f7fd9f467870b84 Mon Sep 17 00:00:00 2001
> From: Rob Syme <rob.syme@gmail.com>
> Date: Mon, 2 May 2016 16:46:53 +0800
> Subject: [PATCH] gnu: Add freebayes
>
> * gnu/packages/bioinformatics.scm (freebayes): New variable.
>
> ---

thanks for the patch!  I see that freebayes has a couple of git
submodules, e.g. for bamtools, intervaltree, and vcflib.  I remember
Roel was working on this before, trying to untangle all the
dependencies.

See this discussion here:

    http://lists.gnu.org/archive/html/guix-devel/2016-03/msg00333.html

I don’t see any special treatment of these dependencies in your
package.  Is this not needed?  Or does the git checkout include all the
bundled dependencies?

I think we should use one of the release tarballs instead and make sure
to package the dependencies separately.  Maybe you can cooperate with
Roel, who has made a lot of progress on this end already.

What do you think?

~~ Ricardo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gnu: Add freebayes
  2016-05-02 15:21 ` Ricardo Wurmus
@ 2016-05-03  7:32   ` Rob Syme
  2016-05-03  7:45     ` Roel Janssen
  0 siblings, 1 reply; 13+ messages in thread
From: Rob Syme @ 2016-05-03  7:32 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1874 bytes --]

Hi Ricardo

I'm sorry for not checking the list beforehand! Interestingly, we ended up
with very different solutions to the problem of including the freebayes
dependencies. Using the recursive git fetch compiles without issue for me
and *seems* to produce sensible results. Perhaps some non-guix packages are
bleeding in from my configuration? If so, any verification that it
works/breaks would be appreciated. If it *does* work, I'd argue that using
"(recursive? #t)" is a neater and more upgradable solution to the
freebayes git submodule problem, as we wouldn't need to update the
hashes and urls for bamtools-src, vcflib-src, tabixpp-src, etc.

On Mon, 2 May 2016 at 23:21 Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
wrote:

>
> Hi Rob,
>
> > A guix-friendly licensed variant caller.
> >
> > From 78fb1be26ca1a0ac768ce5b98f7fd9f467870b84 Mon Sep 17 00:00:00 2001
> > From: Rob Syme <rob.syme@gmail.com>
> > Date: Mon, 2 May 2016 16:46:53 +0800
> > Subject: [PATCH] gnu: Add freebayes
> >
> > * gnu/packages/bioinformatics.scm (freebayes): New variable.
> >
> > ---
>
> thanks for the patch!  I see that freebayes has a couple of git
> submodules, e.g. for bamtools, intervaltree, and vcflib.  I remember
> Roel was working on this before, trying to untangle all the
> dependencies.
>
> See this discussion here:
>
>     http://lists.gnu.org/archive/html/guix-devel/2016-03/msg00333.html
>
> I don’t see any special treatment of these dependencies in your
> package.  Is this not needed?  Or does the git checkout include all the
> bundled dependencies?
>
> I think we should use one of the release tarballs instead and make sure
> to package the dependencies separately.  Maybe you can cooperate with
> Roel, who has made a lot of progress on this end already.
>
> What do you think?
>
> ~~ Ricardo
>

* Re: [PATCH] gnu: Add freebayes
  2016-05-03  7:32   ` Rob Syme
@ 2016-05-03  7:45     ` Roel Janssen
  2016-05-03  7:52       ` Rob Syme
  0 siblings, 1 reply; 13+ messages in thread
From: Roel Janssen @ 2016-05-03  7:45 UTC (permalink / raw)
  To: Rob Syme; +Cc: guix-devel

Hello Rob,

Actually, at the time I packaged freebayes, I intended to use the
recursive Git fetch, but there was a problem with it in Guix at that
moment.

So, the clutter in my package should do almost the same as the Git
recursive fetch :).
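
As an illustration only (a sketch, not the code from the earlier patch), the
manual equivalent of a recursive fetch could look roughly like this for a
single submodule. The vcflib URL is an assumption, the commit and hash are
placeholders, and the phase name is made up.

```scheme
(use-modules (guix packages)
             (guix git-download))

;; One submodule fetched as its own origin.  The commit and hash are
;; placeholders that would need updating by hand on every upgrade.
(define vcflib-src
  (origin
    (method git-fetch)
    (uri (git-reference
          (url "https://github.com/ekg/vcflib.git")             ; assumed URL
          (commit "0000000000000000000000000000000000000000"))) ; placeholder
    (file-name "vcflib-src-checkout")
    (sha256
     (base32 "0000000000000000000000000000000000000000000000000000"))))

;; Fields that would then go into the freebayes package definition:
;;
;;   (native-inputs
;;    `(("vcflib-src" ,vcflib-src)))
;;   (arguments
;;    `(#:phases
;;      (modify-phases %standard-phases
;;        (add-after 'unpack 'unpack-submodule-sources
;;          (lambda* (#:key inputs #:allow-other-keys)
;;            ;; Copy the checkout to where the submodule would live.
;;            (copy-recursively (assoc-ref inputs "vcflib-src") "vcflib")
;;            #t)))))
```

Each further submodule needs its own origin, input entry and copy step,
which is where the clutter comes from.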

There are still some licensing problems with freebayes.  First, we need
to get vcflib in Guix, for which the following needs to be resolved:
- fastahack:  No free/open source license.
- smithwaterman: No free/open source license.
- tabixpp: No free/open source license.

For the other dependencies, I sent packages to the list.  Some have already
made it in upstream (filevercmp), and others are still in review.  For
the three packages mentioned above we must first resolve the licensing
issues.

I sent Erik an e-mail a week ago asking to add licenses to these
projects, and he told me he will look into this soon.  Feel free to keep
reminding him to look into this :).

Kind regards,
Roel Janssen


* Re: [PATCH] gnu: Add freebayes
  2016-05-03  7:45     ` Roel Janssen
@ 2016-05-03  7:52       ` Rob Syme
  2016-05-03 12:34         ` Pjotr Prins
  0 siblings, 1 reply; 13+ messages in thread
From: Rob Syme @ 2016-05-03  7:52 UTC (permalink / raw)
  To: Roel Janssen; +Cc: guix-devel

Ah. Good point Roel. Until the licencing is resolved, any discussion about
how to package freebayes is of no practical value. I'll ping Erik about the
licencing as well :)
-r

P.S. Erik's latest biorxiv preprint is worth a read (A Graph Extension of
the Positional Burrows-Wheeler Transform and its Applications):
http://biorxiv.org/content/early/2016/05/02/051409

* Re: [PATCH] gnu: Add freebayes
  2016-05-03  7:52       ` Rob Syme
@ 2016-05-03 12:34         ` Pjotr Prins
  0 siblings, 0 replies; 13+ messages in thread
From: Pjotr Prins @ 2016-05-03 12:34 UTC (permalink / raw)
  To: Rob Syme; +Cc: guix-devel

On Tue, May 03, 2016 at 07:52:51AM +0000, Rob Syme wrote:
>    Ah. Good point Roel. Until the licencing is resolved, any discussion about
>    how to package freebayes is of no practical value. I'll ping Erik about
>    the licencing as well :)

To be taken as a serious alternative to GATK, freebayes should be
really free :). I also raised an issue on non-deterministic output:

  https://github.com/ekg/freebayes/issues/256

Pj.

end of thread, other threads:[~2016-05-03 12:37 UTC | newest]

Thread overview: 13+ messages
2016-05-02  9:25 [PATCH] gnu: Add freebayes Rob Syme
2016-05-02 15:21 ` Ricardo Wurmus
2016-05-03  7:32   ` Rob Syme
2016-05-03  7:45     ` Roel Janssen
2016-05-03  7:52       ` Rob Syme
2016-05-03 12:34         ` Pjotr Prins
  -- strict thread matches above, loose matches on Subject: below --
2016-03-08 15:44 Roel Janssen
2016-03-08 23:55 ` Leo Famulari
2016-03-09  6:44   ` Pjotr Prins
2016-03-09  6:53 ` Pjotr Prins
2016-03-09  7:31   ` Leo Famulari
2016-03-10  9:56     ` Roel Janssen
2016-03-09 10:17 ` Ricardo Wurmus