From: ludo@gnu.org (Ludovic Courtès)
To: Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
Cc: guix-devel <guix-devel@gnu.org>
Subject: Re: [PATCH]: Rewrite CRAN importer.
Date: Fri, 04 Dec 2015 15:23:37 +0100
Message-ID: <871tb2l2c6.fsf@gnu.org>
In-Reply-To: <idj1tb3a6wc.fsf@bimsb-sys02.mdc-berlin.net> (Ricardo Wurmus's message of "Thu, 3 Dec 2015 16:28:19 +0100")

Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> skribis:

> So I rewrote the CRAN importer to do this:
>
>   * download the DESCRIPTION file for a given package
>   * break it up into a simple alist
>   * transform the alist into a package expression
>
> This is much simpler than the sxml hackery we did before and the code
> can be reused to write an importer for Bioconductor, a popular,
> versioned R package repository for bioinformatics packages.[1]

Sounds great!
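
To make the last step concrete, here is a minimal sketch of turning such an alist into a package expression.  It is not the code from the patch: ‘alist->package-sexp’ is a hypothetical name, the "r-" prefix is an assumption about naming, and only the "Package", "Version" and "Title" fields are looked at.

```scheme
;; Sketch only: build a (package ...) S-expression from a parsed
;; DESCRIPTION alist.  The procedure name and the "r-" prefix are
;; assumptions made for illustration.
(define (alist->package-sexp meta)
  "Return a package S-expression for META, or #f if the name or version
is missing."
  (let ((name    (assoc-ref meta "Package"))
        (version (assoc-ref meta "Version"))
        (title   (assoc-ref meta "Title")))
    (and name version
         `(package
            (name ,(string-append "r-" (string-downcase name)))
            (version ,version)
            (synopsis ,(or title ""))))))

;; Example input; the values are made up.
(define example-meta
  '(("Package" . "Example")
    ("Version" . "1.0")
    ("Title"   . "An example package")))

(alist->package-sexp example-meta)
```

Returning #f when a mandatory field is missing lets the caller decide how to report the problem.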

> From f2455d19461d50c775360aafb60c104281f83483 Mon Sep 17 00:00:00 2001
> From: Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
> Date: Thu, 3 Dec 2015 16:12:09 +0100
> Subject: [PATCH] import: cran: Parse DESCRIPTION instead of HTML.
>
> * guix/import/cran.scm (description->alist, safe-car, listify,
>   beautify-description, description->package): New procedures.
> (table-datum, downloads->url, nodes->text, cran-sxml->sexp): Remove
> procedures.
> (latest-release): Use parsed DESCRIPTION instead of SXML.
> * tests/cran.scm: Rewrite to match importer.
> ---
>  guix/import/cran.scm | 270 +++++++++++++++++++++++++--------------------------
>  tests/cran.scm       | 189 +++++++++++++++---------------------
>  2 files changed, 214 insertions(+), 245 deletions(-)

Bonus points for making it shorter.  :-)

> +(define (safe-car maybe-pair)

Does it have airbags?

> +  (if (or (null? maybe-pair)
> +          (not maybe-pair))
> +      #f
> +      (car maybe-pair)))

Seriously, I think this is the wrong way to deal with that.

> +(define (package->cran-name package)
> +  "Given a Guix PACKAGE value, return the name of the R package on CRAN."
> +  (let* ((source-url (and=> (package-source package)
> +                            (compose safe-car origin-uri)))

I would change it to:

  (match (package-source package)
    ((? origin? origin)
     (match (origin-uri origin)
       ((url rest ...)
        (let ((end (string-rindex url #\_)) …) …))
       (_ #f)))
    (_ #f))

This is more verbose, but it makes the intent clearer IMO, and avoids
“wrong-type-arg #f” errors that would arise with the proposed code.
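
For what it’s worth, here is a self-contained sketch of the same shape, written against plain URI values so that it runs without Guix.  ‘uri->cran-name’ and ‘url->cran-name’ are made-up names, and the assumption that the tarball is called NAME_VERSION.tar.gz is mine.

```scheme
(use-modules (ice-9 match))

;; Sketch only.  Assumes the file name part of URL looks like
;; "NAME_VERSION.tar.gz".
(define (url->cran-name url)
  "Return the package name embedded in URL, or #f if URL does not have
the expected shape."
  (let ((start (string-rindex url #\/))
        (end   (string-rindex url #\_)))
    (and start end (< start end)
         (substring url (+ start 1) end))))

(define (uri->cran-name uri)
  "URI may be a string, a list of strings, or anything else; only the
first two cases can yield a name."
  (match uri
    ((? string? url) (url->cran-name url))
    (((? string? url) rest ...) (url->cran-name url))
    (_ #f)))

(uri->cran-name '("mirror://cran/src/contrib/ggplot2_1.0.1.tar.gz"))
;; => "ggplot2"
(uri->cran-name #f)
;; => #f
```

Every unexpected input falls through to a ‘_’ clause and yields #f, so nothing ever calls a string procedure on #f.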

> +  (let ((url (string-append %cran-url name "/DESCRIPTION")))
> +    (call-with-temporary-output-file
> +     (lambda (temp port)
> +       (and (url-fetch url temp)
> +            (call-with-input-file temp
> +              (compose description->alist read-string)))))))

I think it’s best to use ‘http-fetch’ or ‘http-fetch/cached’ from (guix
http-client); neither requires creating a temporary file.

> +(define (beautify-description description)
> +  "Improve the package DESCRIPTION by turning a beginning sentence fragment
> +into a proper sentence and by using two spaces between sentences."

Excellent.  :-)

> +URL: http://gnu.org/s/my-example
> +Description: This is a long description
> +spanning multiple lines: and it could confuse the parser that
> +there is a colon : on the lines.
> +  And: this line continues the description.
> +biocViews: 0
> +SystemRequirements: Cairo (>= 0)
> +Depends: A C++11 compiler. Version 4.6.* of g++ (as
> +	currently in Rtools) is insufficient; versions 4.8.*, 4.9.* or
> +	later will be fine.

Too bad that this format is almost, but not quite, recutils.
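
A rough sketch of a parser that copes with a sample like the one above, assuming that a key never contains blanks and is followed by a colon and then a blank or the end of the line.  ‘description->alist*’ is a hypothetical name used for illustration, not the procedure from the patch.

```scheme
(use-modules (ice-9 regex)
             (srfi srfi-1))

;; Assumption: a key has no blanks or colons and is followed by ":" and
;; then a blank or the end of the line.
(define %key-rx
  (make-regexp "^([A-Za-z][^ \t:]*):([ \t]|$)"))

(define (description->alist* text)
  "Sketch: parse TEXT into an alist of (KEY . VALUE) strings."
  (reverse
   (fold (lambda (line acc)
           (let ((m (regexp-exec %key-rx line)))
             (cond ((string-null? (string-trim-both line))
                    acc)                          ;skip blank lines
                   (m                             ;new key/value pair
                    (cons (cons (match:substring m 1)
                                (string-trim-both (match:suffix m)))
                          acc))
                   ((null? acc)
                    acc)                          ;stray text before any key
                   (else                          ;continuation line
                    (cons (cons (car (car acc))
                                (string-trim-both
                                 (string-append (cdr (car acc)) " "
                                                (string-trim-both line))))
                          (cdr acc))))))
         '()
         (string-split text #\newline))))

(define sample
  (string-append
   "URL: http://gnu.org/s/my-example\n"
   "Description: This is a long description\n"
   "spanning multiple lines: and it could confuse the parser that\n"
   "there is a colon : on the lines.\n"
   "  And: this line continues the description.\n"
   "biocViews: 0\n"))

(assoc-ref (description->alist* sample) "biocViews")
;; => "0"
```

A continuation line that happens to start with a single word followed by a colon would still be mistaken for a new key; the sketch does not try to solve that.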

The rest LGTM!

Can you send an updated patch?

Thank you!

Ludo’.
