From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ricardo Wurmus Subject: Re: [PATCH] Add fraggenescan. Date: Sun, 20 Dec 2015 14:03:49 +0100 Message-ID: <87egehl1ai.fsf@elephly.net> References: <5676A0C1.4000004@uq.edu.au> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:54016) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aAdef-0003l1-Sz for guix-devel@gnu.org; Sun, 20 Dec 2015 08:04:07 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aAdea-0006rt-Ok for guix-devel@gnu.org; Sun, 20 Dec 2015 08:04:05 -0500 Received: from sender163-mail.zoho.com ([74.201.84.163]:25590) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aAdea-0006rC-He for guix-devel@gnu.org; Sun, 20 Dec 2015 08:04:00 -0500 In-reply-to: <5676A0C1.4000004@uq.edu.au> List-Id: "Development of GNU Guix and the GNU System distribution." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org Sender: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org To: Ben Woodcroft Cc: "guix-devel@gnu.org" Hi Ben, thanks for your patch! > From f10c4178cef360a8472c988c9ea8aace15daa954 Mon Sep 17 00:00:00 2001 > From: Ben Woodcroft > Date: Sun, 20 Dec 2015 22:23:17 +1000 > Subject: [PATCH] gnu: Add fraggenescan. > * gnu/packages/bioinformatics.scm (fraggenescan): New variable. > --- > gnu/packages/bioinformatics.scm | 80 +++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 80 insertions(+) [...] > +(define-public fraggenescan > + (package > + (name "fraggenescan") > + (version "1.20") > + (source > + (origin > + (method url-fetch) > + (uri > + (string-append "mirror://sourceforge/fraggenescan/" > + "FragGeneScan" version ".tar.gz")) > + (sha256 > + (base32 "1zzigqmvqvjyqv4945kv6nc5ah2xxm1nxgrlsnbzav3f5c0n0pyj")))) > + (build-system gnu-build-system) > + (arguments > + `(#:phases > + (modify-phases %standard-phases > + (delete 'configure) > + (add-before 'build 'patch-run-script This phase does more than just patch the run script (it changes paths in two scripts and a C source file). Could you find a better name? > + (lambda* (#:key outputs #:allow-other-keys) > + (let* ((out (string-append (assoc-ref outputs "out"))) > + (share (string-append out "/share/fraggenescan/"))) > + (substitute* "run_FragGeneScan.pl" > + (("system\\(\"rm") > + (string-append "system(\"" (which "rm"))) > + (("system\\(\"mv") > + (string-append "system(\"" (which "mv"))) > + ;; This script and other programs expect the training files > + ;; to be in the non-standard location bin/train/XXX. Change > + ;; this to be share/fraggenescan/train/XXX instead. > + (("^\\$train.file = \\$dir.\\\"train/\\\".\\$FGS_train_file;") > + (string-append "$train_file = \"" > + share > + "train/\".$FGS_train_file;"))) I might look clearer if you captured a part of the matches, (("(^\\$train\\.file = )\\$dir\\.\\\"(train/\\\"\\.\\$FGS_train_file;)" _ pre post) (string-append pre "\"" share post)) Or since there’s really just one line beginning with “$train.file” maybe you could do this: (("(^\\$train\\.file = ).*" _ prefix) (string-append prefix "\"" share "train/\".$FGS_train_file;")) The regular expressions above look quite scary, so maybe the latter proposal is best here. > + (substitute* "run_hmm.c" > + (("^ strcat\\(train_dir, \\\"train/\\\"\\);") > + (string-append " strcpy(train_dir, \"" share "/train/\");"))) Why do you replace “strcat” with “strcpy” here? > + (substitute* "post_process.pl" > + (("^my \\$dir = substr\\(\\$0, 0, length\\(\\$0)-15\\);") > + (string-append "my $dir = \"" share "\";")))) I would also suggest simplifying the regular expression here. There is only one line matching “^my \\$dir =”. > + #t)) > + (replace 'build > + (lambda _ (and (zero? (system* "make" "clean")) > + (zero? (system* "make" "fgs"))))) Why must “make clean” be run first? I know the README says so, but is it really required? If it is not you could just use the default build phase, possibly specifying “fgs” as the target. > + (delete 'check) How about “#:tests? #f” instead? > + (replace 'install > + (lambda* (#:key outputs #:allow-other-keys) > + (let* ((out (string-append (assoc-ref outputs "out"))) > + (bin (string-append out "/bin/")) > + (share (string-append out "/share/fraggenescan/train"))) > + (install-file "run_FragGeneScan.pl" bin) > + (install-file "FragGeneScan" bin) > + (install-file "FGS_gff.py" bin) > + (install-file "post_process.pl" bin) > + (copy-recursively "train" share)))) > + (add-after 'install 'post-install-check > + ;; In lieu of 'make check', run one of the examples and check the > + ;; output files gets created. Oh, I see. Maybe you could delete the “check” phase right before this, so that it’s obvious you are moving it after the “install” phase. > + (lambda* (#:key outputs #:allow-other-keys) > + (let* ((out (string-append (assoc-ref outputs "out"))) > + (bin (string-append out "/bin/"))) > + (and (zero? (system* (string-append bin "run_FragGeneScan.pl") > + "-genome=./example/NC_000913.fna" > + "-out=./test2" > + "-complete=1" > + "-train=complete")) > + (file-exists? "test2.faa") > + (file-exists? "test2.ffn") > + (file-exists? "test2.gff") > + (file-exists? "test2.out")))))))) > + (inputs > + `(("perl" ,perl) > + ("python" ,python-2))) ;not compatible with python 3. > + (home-page "https://sourceforge.net/projects/fraggenescan/") > + (synopsis "Finds potentially fragmented genes in short reads") > + (description > + "FragGeneScan is a program for predicting bacterial and archaeal genes in > +short and error-prone DNA sequencing reads. It can also be applied to predict > +genes in incomplete assemblies or complete genomes.") > + (license license:gpl1))) I didn’t see any mention of a particular GPL version. The README says this: License ============ Copyright (C) 2010 Mina Rho, Yuzhen Ye and Haixu Tang. You may redistribute this software under the terms of the GNU General Public License. This looks like it could be any version of the GPL (as it is implied when R packages just declare “GPL” as a license). I would do this, but I’m not sure that’s okay: ;; Released under any version of the GPL (license license:gpl3+) ~~ Ricardo