From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ben Woodcroft
Subject: Re: [PATCH] Add fraggenescan.
Date: Mon, 28 Dec 2015 07:53:27 +1000
Message-ID: <56805DD7.8010900@uq.edu.au>
References: <5676A0C1.4000004@uq.edu.au> <87egehl1ai.fsf@elephly.net>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------010901010200000100080207"
Return-path:
Received: from eggs.gnu.org ([2001:4830:134:3::10]:45125) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aDJG3-000201-PQ for guix-devel@gnu.org; Sun, 27 Dec 2015 16:53:45 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aDJG0-0002X3-Gv for guix-devel@gnu.org; Sun, 27 Dec 2015 16:53:43 -0500
Received: from mailhub1.soe.uq.edu.au ([130.102.132.208]:54348 helo=newmailhub.uq.edu.au) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aDJFz-0002UG-TX for guix-devel@gnu.org; Sun, 27 Dec 2015 16:53:40 -0500
In-Reply-To: <87egehl1ai.fsf@elephly.net>
List-Id: "Development of GNU Guix and the GNU System distribution."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org
Sender: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org
To: Ricardo Wurmus
Cc: "guix-devel@gnu.org"

This is a multi-part message in MIME format.
--------------010901010200000100080207
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by newmailhub.uq.edu.au id tBRLrT64004425

Heya,

On 20/12/15 23:03, Ricardo Wurmus wrote:
> thanks for your patch!
and thank you.
>> +(define-public fraggenescan
>> +  (package
>> +    (name "fraggenescan")
>> +    (version "1.20")
>> +    (source
>> +     (origin
>> +       (method url-fetch)
>> +       (uri
>> +        (string-append "mirror://sourceforge/fraggenescan/"
>> +                       "FragGeneScan" version ".tar.gz"))
>> +       (sha256
>> +        (base32 "1zzigqmvqvjyqv4945kv6nc5ah2xxm1nxgrlsnbzav3f5c0n0pyj"))))
>> +    (build-system gnu-build-system)
>> +    (arguments
>> +     `(#:phases
>> +       (modify-phases %standard-phases
>> +         (delete 'configure)
>> +         (add-before 'build 'patch-run-script
> This phase does more than just patch the run script (it changes paths in
> two scripts and a C source file). Could you find a better name?

ok, done.

>> +           (lambda* (#:key outputs #:allow-other-keys)
>> +             (let* ((out (string-append (assoc-ref outputs "out")))
>> +                    (share (string-append out "/share/fraggenescan/")))
>> +               (substitute* "run_FragGeneScan.pl"
>> +                 (("system\\(\"rm")
>> +                  (string-append "system(\"" (which "rm")))
>> +                 (("system\\(\"mv")
>> +                  (string-append "system(\"" (which "mv")))
>> +                 ;; This script and other programs expect the training files
>> +                 ;; to be in the non-standard location bin/train/XXX. Change
>> +                 ;; this to be share/fraggenescan/train/XXX instead.
>> +                 (("^\\$train.file = \\$dir.\\\"train/\\\".\\$FGS_train_file;")
>> +                  (string-append "$train_file = \""
>> +                                 share
>> +                                 "train/\".$FGS_train_file;")))
> It might look clearer if you captured a part of the matches,
>
>   (("(^\\$train\\.file = )\\$dir\\.\\\"(train/\\\"\\.\\$FGS_train_file;)" _ pre post)
>    (string-append pre "\"" share post))
>
> Or since there’s really just one line beginning with “$train.file” maybe
> you could do this:
>
>   (("(^\\$train\\.file = ).*" _ prefix)
>    (string-append prefix "\"" share "train/\".$FGS_train_file;"))
>
> The regular expressions above look quite scary, so maybe the latter
> proposal is best here.

Indeed they are scary, and writing regexes for Guix rarely fails to trip me up.
I'd actually prefer a less powerful alternative (just a regular
string replacement), and better yet one that stops the build process
when it matches 0 or 2 or more times. There's nothing like that around
is there?

But for now I just took your suggestions about shortening the regexes.

>
>> +               (substitute* "run_hmm.c"
>> +                 (("^ strcat\\(train_dir, \\\"train/\\\"\\);")
>> +                  (string-append " strcpy(train_dir, \"" share "/train/\");")))
> Why do you replace “strcat” with “strcpy” here?

The line above does a strcpy we don't want, so strcat would keep that. I
could remove the line above if you want, but this effectively makes no
difference to the running of the program.

[..]
>> +             #t))
>> +         (replace 'build
>> +           (lambda _ (and (zero? (system* "make" "clean"))
>> +                          (zero? (system* "make" "fgs")))))
> Why must “make clean” be run first? I know the README says so, but is
> it really required? If it is not you could just use the default build
> phase, possibly specifying “fgs” as the target.

Yeh the tarball comes with compiled files. I've added a comment.

>
>> +         (delete 'check)
> How about “#:tests? #f” instead?
>
>> +         (replace 'install
>> +           (lambda* (#:key outputs #:allow-other-keys)
>> +             (let* ((out (string-append (assoc-ref outputs "out")))
>> +                    (bin (string-append out "/bin/"))
>> +                    (share (string-append out "/share/fraggenescan/train")))
>> +               (install-file "run_FragGeneScan.pl" bin)
>> +               (install-file "FragGeneScan" bin)
>> +               (install-file "FGS_gff.py" bin)
>> +               (install-file "post_process.pl" bin)
>> +               (copy-recursively "train" share))))
>> +         (add-after 'install 'post-install-check
>> +           ;; In lieu of 'make check', run one of the examples and check the
>> +           ;; output files gets created.
> Oh, I see. Maybe you could delete the “check” phase right before this,
> so that it’s obvious you are moving it after the “install” phase.
ok, done.

>
>> +           (lambda* (#:key outputs #:allow-other-keys)
>> +             (let* ((out (string-append (assoc-ref outputs "out")))
>> +                    (bin (string-append out "/bin/")))
>> +               (and (zero? (system* (string-append bin "run_FragGeneScan.pl")
>> +                                    "-genome=./example/NC_000913.fna"
>> +                                    "-out=./test2"
>> +                                    "-complete=1"
>> +                                    "-train=complete"))
>> +                    (file-exists? "test2.faa")
>> +                    (file-exists? "test2.ffn")
>> +                    (file-exists? "test2.gff")
>> +                    (file-exists? "test2.out"))))))))
>> +    (inputs
>> +     `(("perl" ,perl)
>> +       ("python" ,python-2))) ;not compatible with python 3.
>> +    (home-page "https://sourceforge.net/projects/fraggenescan/")
>> +    (synopsis "Finds potentially fragmented genes in short reads")
>> +    (description
>> +     "FragGeneScan is a program for predicting bacterial and archaeal genes in
>> +short and error-prone DNA sequencing reads. It can also be applied to predict
>> +genes in incomplete assemblies or complete genomes.")
>> +    (license license:gpl1)))
> I didn’t see any mention of a particular GPL version. The README says
> this:
>
>     License
>     ============
>     Copyright (C) 2010 Mina Rho, Yuzhen Ye and Haixu Tang.
>     You may redistribute this software under the terms of the GNU General Public License.
>
> This looks like it could be any version of the GPL (as it is implied
> when R packages just declare “GPL” as a license). I would do this, but
> I’m not sure that’s okay:
>
>     ;; Released under any version of the GPL
>     (license license:gpl3+)

It seems your interpretation was better than mine. The authors said
gpl3+ over email.
Thanks,
ben

--------------010901010200000100080207
Content-Type: text/x-patch; name="0001-gnu-Add-fraggenescan.patch"
Content-Disposition: attachment; filename="0001-gnu-Add-fraggenescan.patch"
Content-Transfer-Encoding: 7bit

>From a140b0816f095a7a13d6bdd4dcbddbae1d020fbb Mon Sep 17 00:00:00 2001
From: Ben Woodcroft
Date: Sun, 20 Dec 2015 22:23:17 +1000
Subject: [PATCH] gnu: Add fraggenescan.

* gnu/packages/bioinformatics.scm (fraggenescan): New variable.
---
 gnu/packages/bioinformatics.scm | 81 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index 7c573e1..afa32f2 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -1354,6 +1354,87 @@ supports next-generation sequencing data in fasta/q and csfasta/q format from
 Illumina, Roche 454, and the SOLiD platform.")
    (license license:gpl3)))
 
+(define-public fraggenescan
+  (package
+    (name "fraggenescan")
+    (version "1.20")
+    (source
+     (origin
+       (method url-fetch)
+       (uri
+        (string-append "mirror://sourceforge/fraggenescan/"
+                       "FragGeneScan" version ".tar.gz"))
+       (sha256
+        (base32 "1zzigqmvqvjyqv4945kv6nc5ah2xxm1nxgrlsnbzav3f5c0n0pyj"))))
+    (build-system gnu-build-system)
+    (arguments
+     `(#:phases
+       (modify-phases %standard-phases
+         (delete 'configure)
+         (add-before 'build 'patch-paths
+           (lambda* (#:key outputs #:allow-other-keys)
+             (let* ((out (string-append (assoc-ref outputs "out")))
+                    (share (string-append out "/share/fraggenescan/")))
+               (substitute* "run_FragGeneScan.pl"
+                 (("system\\(\"rm")
+                  (string-append "system(\"" (which "rm")))
+                 (("system\\(\"mv")
+                  (string-append "system(\"" (which "mv")))
+                 ;; This script and other programs expect the training files
+                 ;; to be in the non-standard location bin/train/XXX. Change
+                 ;; this to be share/fraggenescan/train/XXX instead.
+                 (("^\\$train.file = \\$dir.*")
+                  (string-append "$train_file = \""
+                                 share
+                                 "train/\".$FGS_train_file;")))
+               (substitute* "run_hmm.c"
+                 (("^ strcat\\(train_dir, \\\"train/\\\"\\);")
+                  (string-append " strcpy(train_dir, \"" share "/train/\");")))
+               (substitute* "post_process.pl"
+                 (("^my \\$dir = substr.*")
+                  (string-append "my $dir = \"" share "\";"))))
+             #t))
+         (replace 'build
+           (lambda _ (and (zero? (system* "make" "clean"))
+                          (zero? (system* "make" "fgs")))))
+         (replace 'install
+           (lambda* (#:key outputs #:allow-other-keys)
+             (let* ((out (string-append (assoc-ref outputs "out")))
+                    (bin (string-append out "/bin/"))
+                    (share (string-append out "/share/fraggenescan/train")))
+               (install-file "run_FragGeneScan.pl" bin)
+               (install-file "FragGeneScan" bin)
+               (install-file "FGS_gff.py" bin)
+               (install-file "post_process.pl" bin)
+               (copy-recursively "train" share))))
+         (delete 'check)
+         (add-after 'install 'post-install-check
+           ;; In lieu of 'make check', run one of the examples and check the
+           ;; output files gets created.
+           (lambda* (#:key outputs #:allow-other-keys)
+             (let* ((out (string-append (assoc-ref outputs "out")))
+                    (bin (string-append out "/bin/")))
+               (and (zero? (system* (string-append bin "run_FragGeneScan.pl")
+                                    "-genome=./example/NC_000913.fna"
+                                    "-out=./test2"
+                                    "-complete=1"
+                                    "-train=complete"))
+                    (file-exists? "test2.faa")
+                    (file-exists? "test2.ffn")
+                    (file-exists? "test2.gff")
+                    (file-exists? "test2.out"))))))))
+    (inputs
+     `(("perl" ,perl)
+       ("python" ,python-2))) ;not compatible with python 3.
+    (home-page "https://sourceforge.net/projects/fraggenescan/")
+    (synopsis "Finds potentially fragmented genes in short reads")
+    (description
+     "FragGeneScan is a program for predicting bacterial and archaeal genes in
+short and error-prone DNA sequencing reads. It can also be applied to predict
+genes in incomplete assemblies or complete genomes.")
+    ;; GPL3+ according to private correspondence with the authors.
+    (license license:gpl3+)))
+
 (define-public grit
   (package
     (name "grit")
-- 
2.5.0

--------------010901010200000100080207--
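
On the wish in the reply above for a plain string replacement that stops the build unless it matches exactly once: a minimal sketch of such a helper in Guile Scheme could look like the following. The names `count-occurrences` and `replace-exactly-once` are hypothetical and not part of Guix. The sketch works on a string rather than on a file, and this thread does not establish whether Guix already offers something equivalent.

```scheme
;; Hypothetical helpers, NOT part of Guix: literal (non-regex) replacement
;; that signals an error unless PATTERN occurs exactly once in TEXT.

(define (count-occurrences text pattern)
  ;; Count non-overlapping occurrences of the literal string PATTERN.
  (when (string-null? pattern)
    (error "pattern must not be empty"))
  (let loop ((start 0) (count 0))
    (let ((index (string-contains text pattern start)))
      (if index
          (loop (+ index (string-length pattern)) (+ count 1))
          count))))

(define (replace-exactly-once text pattern replacement)
  ;; Return TEXT with its single occurrence of PATTERN replaced.
  ;; Zero matches, or two or more, raise an error instead.
  (let ((count (count-occurrences text pattern)))
    (unless (= count 1)
      (error "expected exactly one match" pattern count))
    (let ((index (string-contains text pattern)))
      (string-append (substring text 0 index)
                     replacement
                     (substring text (+ index (string-length pattern)))))))

;; Usage: swap the call name in a line like the one patched in run_hmm.c.
(display (replace-exactly-once "strcat(train_dir, \"train/\");"
                               "strcat(" "strcpy("))
(newline)
;; prints: strcpy(train_dir, "train/");
```

Inside a build phase, an error raised this way would abort the phase, which is the "stop on 0 or 2+ matches" behaviour asked about. Applying it to a file would still need the contents read into a string and written back.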