From mboxrd@z Thu Jan 1 00:00:00 1970 From: Roel Janssen Subject: [PATCH] Add vcflib. Date: Tue, 22 Mar 2016 16:24:49 +0100 Message-ID: <878u1aa62m.fsf@gnu.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" List-Id: "Development of GNU Guix and the GNU System distribution." To: guix-devel@gnu.org --=-=-= Content-Type: text/x-patch Content-Disposition: attachment; filename=0001-gnu-Add-tabixpp.patch >From dfc9b373bbce0f36882407cec47440a0a44c78d1 Mon Sep 17 00:00:00 2001 From: Roel Janssen Date: Tue, 22 Mar 2016 14:59:05 +0100 Subject: [PATCH 1/8] gnu: Add tabixpp. * gnu/packages/bioinformatics.scm (tabixpp): New variable. 
--- gnu/packages/bioinformatics.scm | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm index 281bd1f..6792be9 100644 --- a/gnu/packages/bioinformatics.scm +++ b/gnu/packages/bioinformatics.scm @@ -4791,3 +4791,42 @@ negative binomial distribution to model the read counts among the samples in the same group, and look for consistent differences between ChIP and control group or two ChIP groups run under different conditions.") (license license:gpl3+))) + +(define-public tabixpp + (package + (name "tabixpp") + (version "1.0.0") + (source (origin + (method url-fetch) + (uri (string-append "https://github.com/ekg/tabixpp/archive/v" + version ".tar.gz")) + (file-name (string-append name "-" version ".tar.gz")) + (sha256 + (base32 "1s0lgks7qlvlhvcjhi2wm18nnza1bwcnic44ij7z8wfg88h4ivwn")))) + (build-system gnu-build-system) + (inputs + `(("htslib" ,htslib) + ("zlib" ,zlib))) + (arguments + `(#:tests? #f ; There are no tests to run. + #:phases + (modify-phases %standard-phases + (delete 'configure) ; There is no configure phase. + ;; Modify the build phase to use the separately packaged htslib. + (replace 'build + (lambda* (#:key inputs #:allow-other-keys) + (let ((htslib-ref (assoc-ref inputs "htslib"))) + (zero? + (system* "make" + (string-append "HTS_LIB=" htslib-ref "/lib/libhts.a") + "HTS_HEADERS=" ; Do not check for local htslib headers. + (string-append "LIBPATH=-L. 
-L" htslib-ref "/include")))))) + (replace 'install + (lambda* (#:key outputs #:allow-other-keys) + (let ((bin (string-append (assoc-ref outputs "out") "/bin"))) + (install-file "tabix++" bin))))))) + (home-page "https://github.com/ekg/tabixpp") + (synopsis "C++ wrapper around tabix project") + (description "This is a C++ wrapper around the Tabix project which abstracts +some of the details of opening and jumping in tabix-indexed files.") + (license license:expat))) -- 2.5.5 --=-=-= Content-Type: text/x-patch Content-Disposition: attachment; filename=0002-gnu-Add-smithwaterman.patch >From e75aa388931c92657336c8a15f88b6a0273f5e02 Mon Sep 17 00:00:00 2001 From: Roel Janssen Date: Tue, 22 Mar 2016 15:01:37 +0100 Subject: [PATCH 2/8] gnu: Add smithwaterman. * gnu/packages/bioinformatics.scm (smithwaterman): New variable. --- gnu/packages/bioinformatics.scm | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm index 6792be9..fa7ba24 100644 --- a/gnu/packages/bioinformatics.scm +++ b/gnu/packages/bioinformatics.scm @@ -4830,3 +4830,34 @@ group or two ChIP groups run under different conditions.") (description "This is a C++ wrapper around the Tabix project which abstracts some of the details of opening and jumping in tabix-indexed files.") (license license:expat))) + +(define-public smithwaterman + (let ((commit "203218b47d45ac56ef234716f1bd4c741b289be1")) + (package + (name "smithwaterman") + (version (string-append "0-1." (string-take commit 7))) + (source (origin + (method url-fetch) + (uri (string-append "https://github.com/ekg/smithwaterman/archive/" + commit ".tar.gz")) + (file-name (string-append name "-" version "-checkout.tar.gz")) + (sha256 + (base32 "1lkxy4xkjn96l70jdbsrlm687jhisgw4il0xr2dm33qwcclzzm3b")))) + (build-system gnu-build-system) + (arguments + `(#:tests? #f ; There are no tests to run. 
+ #:phases + (modify-phases %standard-phases + (delete 'configure) ; There is no configure phase. + (replace 'install + (lambda* (#:key outputs #:allow-other-keys) + (let ((bin (string-append (assoc-ref outputs "out") "/bin"))) + (install-file "smithwaterman" bin))))))) + (home-page "https://github.com/ekg/smithwaterman") + (synopsis "Implementation of the Smith-Waterman algorithm") + (description "This package provides an implementation of the Smith-Waterman +algorithm.") + ;; libdisorder is licensed GPLv2. The parent project (vcflib), of which + ;; this program is a submodule, is licensed MIT, which is the same as + ;; the Expat license. + (license (list license:gpl2 license:expat))))) -- 2.5.5 --=-=-= Content-Type: text/x-patch Content-Disposition: attachment; filename=0003-gnu-Add-multichoose.patch >From edcf3132dca6c3e86439710892870285377adbb2 Mon Sep 17 00:00:00 2001 From: Roel Janssen Date: Tue, 22 Mar 2016 15:07:47 +0100 Subject: [PATCH 3/8] gnu: Add multichoose. * gnu/packages/bioinformatics.scm (multichoose): New variable. --- gnu/packages/bioinformatics.scm | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm index fa7ba24..9465d56 100644 --- a/gnu/packages/bioinformatics.scm +++ b/gnu/packages/bioinformatics.scm @@ -4861,3 +4861,38 @@ algorithm.") ;; this program is a submodule, is licensed MIT, which is the same as ;; the Expat license. (license (list license:gpl2 license:expat))))) + +(define-public multichoose + (package + (name "multichoose") + (version "1.0.3") + (source (origin + (method url-fetch) + (uri (string-append "https://github.com/ekg/multichoose/archive/v" + version ".tar.gz")) + (file-name (string-append name "-" version ".tar.gz")) + (sha256 + (base32 "0xy86vvr3qrs4l81qis7ia1q2hnqv0xcb4a1n60smxbhqqis5w3l")))) + (build-system gnu-build-system) + (native-inputs + `(("python" ,python-2) + ("node" ,node))) + (arguments + `(#:tests? 
#f ; There are no tests to run. + #:phases + (modify-phases %standard-phases + (delete 'configure) ; There is no configure phase. + (replace 'install + (lambda* (#:key outputs #:allow-other-keys) + (let ((bin (string-append (assoc-ref outputs "out") "/bin"))) + (install-file "multichoose" bin) + (install-file "multipermute" bin))))))) + (home-page "https://github.com/ekg/multichoose") + (synopsis "Library for efficient loopless multiset combination generation +algorithm") + (description "This library implements an efficient loopless multiset +combination generation algorithm which is (approximately) described in +\"Loopless algorithms for generating permutations, combinations, and other +combinatorial configurations.\" G Ehrlich - Journal of the ACM (JACM), +1973. (Algorithm 7.)") + (license license:expat))) -- 2.5.5 --=-=-= Content-Type: text/x-patch Content-Disposition: attachment; filename=0004-gnu-Add-fsom.patch >From ec8d80b1be2d4271b6a1583a1c1a6264d37ccbeb Mon Sep 17 00:00:00 2001 From: Roel Janssen Date: Tue, 22 Mar 2016 15:10:53 +0100 Subject: [PATCH 4/8] gnu: Add fsom. * gnu/packages/bioinformatics.scm (fsom): New variable. --- gnu/packages/bioinformatics.scm | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm index 9465d56..0abccbc 100644 --- a/gnu/packages/bioinformatics.scm +++ b/gnu/packages/bioinformatics.scm @@ -4896,3 +4896,30 @@ combination generation algorithm which is (approximately) described in combinatorial configurations.\" G Ehrlich - Journal of the ACM (JACM), 1973. (Algorithm 7.)") (license license:expat))) + +(define-public fsom + (let ((commit "a6ef318fbd347c53189384aef7f670c0e6ce89a3")) + (package + (name "fsom") + (version (string-append "0-1." 
(string-take commit 7))) + (source (origin + (method url-fetch) + (uri (string-append "https://github.com/ekg/fsom/archive/" + commit ".tar.gz")) + (file-name (string-append name "-" version "-checkout.tar.gz")) + (sha256 + (base32 "0q6b57ppxfvsm5cqmmbfmjpn5qvx2zi5pamvp3yh8gpmmz8cfbl3")))) + (build-system gnu-build-system) + (arguments + `(#:tests? #f ; There are no tests to run. + #:phases + (modify-phases %standard-phases + (delete 'configure) ; There is no configure phase. + (replace 'install + (lambda* (#:key outputs #:allow-other-keys) + (let ((bin (string-append (assoc-ref outputs "out") "/bin"))) + (install-file "fsom" bin))))))) + (home-page "https://github.com/ekg/fsom") + (synopsis "Program for managing SOM (Self-Organizing Maps) neural networks") + (description "Program for managing SOM (Self-Organizing Maps) neural networks.") + (license license:gpl3)))) -- 2.5.5 --=-=-= Content-Type: text/x-patch Content-Disposition: attachment; filename=0005-gnu-Add-filevercmp.patch >From f0cb8c476e902e3351988f7c278b3682837c8cce Mon Sep 17 00:00:00 2001 From: Roel Janssen Date: Tue, 22 Mar 2016 15:47:06 +0100 Subject: [PATCH 5/8] gnu: Add filevercmp. * gnu/packages/bioinformatics.scm (filevercmp): New variable. --- gnu/packages/bioinformatics.scm | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm index 0abccbc..261660f 100644 --- a/gnu/packages/bioinformatics.scm +++ b/gnu/packages/bioinformatics.scm @@ -4923,3 +4923,31 @@ combinatorial configurations.\" G Ehrlich - Journal of the ACM (JACM), (synopsis "Program for managing SOM (Self-Organizing Maps) neural networks") (description "Program for managing SOM (Self-Organizing Maps) neural networks.") (license license:gpl3)))) + +(define-public filevercmp + (let ((commit "1a9b779b93d0b244040274794d402106907b71b7")) + (package + (name "filevercmp") + (version (string-append "0-1." 
(string-take commit 7))) + (source (origin + (method url-fetch) + (uri (string-append "https://github.com/ekg/filevercmp/archive/" + commit ".tar.gz")) + (file-name "filevercmp-src.tar.gz") + (sha256 + (base32 "0yp5jswf5j2pqc6517x277s4s6h1ss99v57kxw9gy0jkfl3yh450")))) + (build-system gnu-build-system) + (arguments + `(#:tests? #f ; There are no tests to run. + #:phases + (modify-phases %standard-phases + (delete 'configure) ; There is no configure phase. + (replace 'install + (lambda* (#:key outputs #:allow-other-keys) + (let ((bin (string-append (assoc-ref outputs "out") "/bin"))) + (install-file "filevercmp" bin))))))) + (home-page "https://github.com/ekg/filevercmp") + (synopsis "Program to compare version strings") + (description "A program to compare version strings. It intends to be a +replacement for strverscmp.") + (license license:gpl3+)))) -- 2.5.5 --=-=-= Content-Type: text/x-patch Content-Disposition: attachment; filename=0006-gnu-Add-fastahack.patch >From c934cee6c84a39de36ecb4c3a85340ff025b1343 Mon Sep 17 00:00:00 2001 From: Roel Janssen Date: Tue, 22 Mar 2016 15:50:12 +0100 Subject: [PATCH 6/8] gnu: Add fastahack. * gnu/packages/bioinformatics.scm (fastahack): New variable. --- gnu/packages/bioinformatics.scm | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm index 261660f..9cbde46 100644 --- a/gnu/packages/bioinformatics.scm +++ b/gnu/packages/bioinformatics.scm @@ -4951,3 +4951,37 @@ combinatorial configurations.\" G Ehrlich - Journal of the ACM (JACM), (description "A program to compare version strings. It intends to be a replacement for strverscmp.") (license license:gpl3+)))) + +(define-public fastahack + (let ((commit "c68cebb4f2e5d5d2b70cf08fbdf1944e9ab2c2dd")) + (package + (name "fastahack") + (version (string-append "0-1." 
(string-take commit 7))) + (source (origin + (method url-fetch) + (uri (string-append "https://github.com/ekg/fastahack/archive/" + commit ".tar.gz")) + (file-name (string-append name "-" version "-checkout.tar.gz")) + (sha256 + (base32 "0j25lcl3jk1kls66zzxjfyq5ir6sfcvqrdwfcva61y3ajc9ssay2")))) + (build-system gnu-build-system) + (arguments + `(#:tests? #f ; There are no tests to run. + #:phases + (modify-phases %standard-phases + (delete 'configure) ; There is no configure phase. + (replace 'install + (lambda* (#:key outputs #:allow-other-keys) + (let ((bin (string-append (assoc-ref outputs "out") "/bin"))) + (install-file "fastahack" bin))))))) + (home-page "https://github.com/ekg/fastahack") + (synopsis "Program for indexing and sequence extraction from FASTA files") + (description "Fastahack is a small application for indexing and extracting +sequences and subsequences from FASTA files. The included Fasta.cpp library +provides a FASTA reader and indexer that can be embedded into applications which +would benefit from directly reading subsequences from FASTA files. The library +automatically handles index file generation and use.") + ;; libdisorder is licensed GPLv2. The parent project (vcflib), of which + ;; this program is a submodule, is licensed MIT, which is the same as + ;; the Expat license. + (license (list license:gpl2 license:expat))))) -- 2.5.5 --=-=-= Content-Type: text/x-patch Content-Disposition: attachment; filename=0007-gnu-Add-intervaltree.patch >From 3ed14719121a952fca48a8ad3426588ebb58a130 Mon Sep 17 00:00:00 2001 From: Roel Janssen Date: Tue, 22 Mar 2016 15:57:33 +0100 Subject: [PATCH 7/8] gnu: Add intervaltree. * gnu/packages/bioinformatics.scm (intervaltree): New variable. 
--- gnu/packages/bioinformatics.scm | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm index 9cbde46..5a0eb16 100644 --- a/gnu/packages/bioinformatics.scm +++ b/gnu/packages/bioinformatics.scm @@ -4985,3 +4985,35 @@ automatically handles index file generation and use.") ;; this program is a submodule, is licensed MIT, which is the same as ;; the Expat license. (license (list license:gpl2 license:expat))))) + +(define-public intervaltree + (let ((commit "dbb4c513d1ad3baac516fc1484c995daf9b42838")) + (package + (name "intervaltree") + (version (string-append "0-1." (string-take commit 7))) + (source (origin + (method url-fetch) + (uri (string-append + "https://github.com/ekg/intervaltree/archive/" commit ".tar.gz")) + (file-name (string-append name "-" version ".tar.gz")) + (sha256 + (base32 "19prwpn2wxsrijp5svfqvfcxl5nj7zdhm3jycd5kqhl9nifpmcks")))) + (build-system gnu-build-system) + (arguments + `(#:phases + (modify-phases %standard-phases + (delete 'configure) ; There is no configure phase. + (replace 'check + (lambda _ + (zero? (system* "./interval_tree_test")))) + (replace 'install + (lambda* (#:key outputs #:allow-other-keys) + (let ((include (string-append (assoc-ref outputs "out") + "/include/intervaltree"))) + (install-file "IntervalTree.h" include))))))) + (home-page "https://github.com/ekg/intervaltree/") + (synopsis "Minimal C++ interval tree implementation") + (description "This library provides a basic implementation of an interval +tree using C++ templates, allowing the insertion of arbitrary types into the +tree.") + (license license:expat)))) -- 2.5.5 --=-=-= Content-Type: text/x-patch Content-Disposition: attachment; filename=0008-gnu-Add-vcflib.patch >From 07f041559f7f023c56c23f81ac7e90441c44f91b Mon Sep 17 00:00:00 2001 From: Roel Janssen Date: Tue, 22 Mar 2016 16:06:45 +0100 Subject: [PATCH 8/8] gnu: Add vcflib. 
* gnu/packages/bioinformatics.scm (tabixpp-vcflib): New variable. * gnu/packages/bioinformatics.scm (vcflib): New variable. --- gnu/packages/bioinformatics.scm | 96 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 96 insertions(+) diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm index 5a0eb16..34aadab 100644 --- a/gnu/packages/bioinformatics.scm +++ b/gnu/packages/bioinformatics.scm @@ -4831,6 +4831,22 @@ group or two ChIP groups run under different conditions.") some of the details of opening and jumping in tabix-indexed files.") (license license:expat))) +;; This version works with FreeBayes while the released version doesn't. The +;; release creates a variable with the name "vcf" somewhere, which is also the +;; name of a namespace in vcflib. +(define-public tabixpp-vcflib + (let ((commit "bbc63a49acc52212199f92e9e3b8fba0a593e3f7")) + (package (inherit tabixpp) + (name "tabixpp-vcflib") + (version (string-append "0-1." (string-take commit 7))) + (source (origin + (method url-fetch) + (uri (string-append "https://github.com/ekg/tabixpp/archive/" + commit ".tar.gz")) + (file-name (string-append name "-" version "-checkout.tar.gz")) + (sha256 + (base32 "1s06wmpgj4my4pik5kp2lc42hzzazbp5ism2y4i2ajp2y1c68g77"))))))) + (define-public smithwaterman (let ((commit "203218b47d45ac56ef234716f1bd4c741b289be1")) (package @@ -5017,4 +5033,82 @@ automatically handles index file generation and use.") tree using C++ templates, allowing the insertion of arbitrary types into the tree.") (license license:expat)))) + +(define-public vcflib + (let ((commit "5ac091365fdc716cc47cc5410bb97ee5dc2a2c92")) + (package + (name "vcflib") + (version (string-append "0-1." 
(string-take commit 7))) + (source (origin + (method url-fetch) + (uri (string-append "https://github.com/vcflib/vcflib/archive/" + commit ".tar.gz")) + (file-name (string-append name "-" version ".tar.gz")) + (sha256 + (base32 "0ywshwpif059z5h0g7zzrdfzzdj2gr8xvwlwcsdxrms3p9iy35h8")))) + (build-system gnu-build-system) + (inputs + `(("intervaltree" ,intervaltree) + ("htslib" ,htslib) + ("zlib" ,zlib))) + (native-inputs + `(("python" ,python-2) + ("perl" ,perl) + ("r" ,r) + ("node" ,node) + ("tabixpp-src" ,(package-source tabixpp-vcflib)) + ("smithwaterman-src" ,(package-source smithwaterman)) + ("multichoose-src" ,(package-source multichoose)) + ("fsom-src" ,(package-source fsom)) + ("filevercmp-src" ,(package-source filevercmp)) + ("fastahack-src" ,(package-source fastahack)))) + (arguments + `(#:tests? #f ; There are no tests to run. + #:phases + (modify-phases %standard-phases + (delete 'configure) + (add-after 'unpack 'unpack-submodule-sources + (lambda* (#:key inputs #:allow-other-keys) + (let ((unpack (lambda (source target) + (with-directory-excursion target + (zero? (system* "tar" "xvf" + (assoc-ref inputs source) + "--strip-components=1")))))) + (and + (unpack "fastahack-src" "fastahack") + (unpack "filevercmp-src" "filevercmp") + (unpack "fsom-src" "fsom") + (unpack "multichoose-src" "multichoose") + (unpack "smithwaterman-src" "smithwaterman") + (unpack "tabixpp-src" "tabixpp"))))) + (add-after 'unpack-submodule-sources 'fix-makefile + (lambda* (#:key inputs #:allow-other-keys) + (substitute* '("Makefile") + (("^GIT_VERSION.*") "GIT_VERSION = v1.0.0")))) + (replace 'build + (lambda* (#:key inputs make-flags #:allow-other-keys) + (and (with-directory-excursion "tabixpp" + (zero? (system* "make"))) + (zero? (system* "make" "CC=gcc" + (string-append + "CFLAGS=\"" "-Itabixpp " + "-I" (assoc-ref inputs "htslib") "/include " + "-I" (assoc-ref inputs "intervaltree") "/include " + "\"") "all"))))) + (replace 'install + (lambda* (#:key outputs #:allow-other-keys) + (let* ((out (assoc-ref outputs "out")) + (bin (string-append out "/bin")) + (lib (string-append out "/lib"))) + (for-each (lambda (file) + (install-file file bin)) + (find-files "bin" ".*")) + (install-file "libvcflib.a" lib))))))) + (home-page "https://github.com/vcflib/vcflib/") + (synopsis "Library for parsing and manipulating VCF files") + (description "Vcflib provides methods to manipulate and interpret +sequence variation as it can be described by VCF. It is both an API for parsing +and operating on records of genomic variation as it can be described by the VCF +format, and a collection of command-line utilities for executing complex +manipulations on VCF files.") + (license license:expat)))) -- 2.5.5
--=-=-=
Content-Type: text/plain

Dear Guix,

In an effort to package freebayes, I would first like to add vcflib and its dependencies. Therefore, I have attached eight patches.

I've attempted to decouple the dependencies from vcflib, which worked for intervaltree and htslib. The following dependencies are actually tightly coupled in the build process:
- tabixpp
- smithwaterman
- multichoose
- fsom
- filevercmp
- fastahack

To decouple these, we would need to include header files in the package output that aren't part of the public interface. In addition to that, we would need to patch the build system to not look for .o files, but instead add the right directives to the linker.

I don't think that is a desirable approach, because that would cause these packages to provide header files that should only be used internally. Therefore, I use the source of these packages in vcflib, and unpack them in the vcflib project root, to avoid confusion about interfaces and fiddling with the build system.

Thank you for your time.

Kind regards,
Roel Janssen

--=-=-=--