unofficial mirror of guix-devel@gnu.org 
From: Amirouche Boubekki <amirouche@hypermove.net>
To: Cyril Roelandt <tipecaml@gmail.com>
Cc: guix-devel@gnu.org, guix-devel-bounces+amirouche=hypermove.net@gnu.org
Subject: Re: [PATCH] import: pypi: Detect inputs.
Date: Thu, 18 Jun 2015 12:45:41 +0200	[thread overview]
Message-ID: <dcefa440e96b445bc3185f21b7f03629@hypermove.net> (raw)
In-Reply-To: <1434331554-13170-1-git-send-email-tipecaml@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 12655 bytes --]

Héllo,


If I'm not mistaken, this patch relies only on the presence of
requirements.txt. That file is not required in Python packaging, so
we miss a lot of dependencies with this method. I think the best way
to detect them would be to:

- download the package and extract it
- create an environment (#)
- create a virtual env with access to the system site-packages of the
environment (#)
- enter the venv and install the package
- use `pip freeze -l` to retrieve the full set of dependencies
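
The steps above could be sketched roughly as follows. This is only an
illustration under assumptions: the function names are hypothetical, the
venv layout is assumed to be POSIX (bin/pip), and installing a package
this way needs network access and runs the package's own setup code.

```python
import subprocess
import tempfile
import venv
from pathlib import Path


def parse_freeze(output):
    """Turn `pip freeze` output into a sorted list of project names."""
    names = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, comments and option lines such as "-e git+...".
        if not line or line.startswith(("#", "-")):
            continue
        # Handle both "name==version" and "name @ url" entries.
        for separator in ("==", " @ "):
            if separator in line:
                names.add(line.split(separator, 1)[0].strip().lower())
                break
    return sorted(names)


def frozen_dependencies(source_dir):
    """Install SOURCE_DIR in a throwaway venv and list what pip reports."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "venv"
        # The venv can see the site-packages of the surrounding environment.
        venv.EnvBuilder(system_site_packages=True, with_pip=True).create(env_dir)
        pip = env_dir / "bin" / "pip"  # assumption: POSIX venv layout
        subprocess.run([str(pip), "install", str(source_dir)], check=True)
        result = subprocess.run([str(pip), "freeze", "-l"], check=True,
                                capture_output=True, text=True)
        return parse_freeze(result.stdout)
```

The list returned by frozen_dependencies would also contain the package
itself, so the importer would have to filter that entry out.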

If that fails (because of missing system dependencies), fall back to
parsing setup.py (with guile-log?) and a plain requirements.txt. It
would also be nice to be able to drop into a guix environment (#) when
the first option fails, to inspect the failure and install the missing
system dependencies manually.

Maybe [1] can be helpful. I attached both the data and a script to
extract it. The dataset is incomplete and needs cleanup. It helped me
see that *a lot* of django packages are missing the django dependency
on PyPI.
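
A check of that kind could look like the sketch below. It is a
hypothetical helper, not part of the attached script: it assumes records
shaped like the (name, version, dependencies) tuples the script prints,
and it uses the crude heuristic that a package with "django" in its name
should declare django.

```python
import re

# Leading project name of a dependency string such as "Django>=1.4".
NAME = re.compile(r"^([\w.-]+)")


def declared_names(dependencies):
    """Return the lower-cased project names of a list of dependencies."""
    names = set()
    for dependency in dependencies:
        match = NAME.match(dependency.strip().lower())
        if match:
            names.add(match.group(1))
    return names


def missing_framework(records, framework="django"):
    """List packages named after FRAMEWORK that do not declare it."""
    return [name
            for name, _version, dependencies in records
            if framework in name.lower()
            and name.lower() != framework
            and framework not in declared_names(dependencies)]
```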

WDYT?

[1] https://ogirardot.wordpress.com/2013/01/31/sharing-pypimaven-dependency-data/


On 2015-06-15 03:25, Cyril Roelandt wrote:
> * guix/import/pypi.scm (python->package-name, maybe-inputs, compute-inputs,
>   guess-requirements): New procedures.
> * guix/import/pypi.scm (guix-hash-url): Now takes a filename instead of an
>   URL as input.
> * guix/import/pypi.scm (make-pypi-sexp): Now tries to generate the inputs
>   automagically.
> * tests/pypi.scm: Update the test.
> ---
>  guix/import/pypi.scm | 160 +++++++++++++++++++++++++++++++++++++++++----------
>  tests/pypi.scm       |  42 +++++++++-----
>  2 files changed, 158 insertions(+), 44 deletions(-)
> 
> diff --git a/guix/import/pypi.scm b/guix/import/pypi.scm
> index 8567cad..cf0a7bb 100644
> --- a/guix/import/pypi.scm
> +++ b/guix/import/pypi.scm
> @@ -21,10 +21,13 @@
>    #:use-module (ice-9 match)
>    #:use-module (ice-9 pretty-print)
>    #:use-module (ice-9 regex)
> +  #:use-module ((ice-9 rdelim) #:select (read-line))
>    #:use-module (srfi srfi-1)
> +  #:use-module (srfi srfi-26)
>    #:use-module (rnrs bytevectors)
>    #:use-module (json)
>    #:use-module (web uri)
> +  #:use-module (guix ui)
>    #:use-module (guix utils)
>    #:use-module (guix import utils)
>    #:use-module (guix import json)
> @@ -77,42 +80,137 @@ or #f on failure."
>  with dashes."
>    (string-join (string-split (string-downcase str) #\_) "-"))
> 
> -(define (guix-hash-url url)
> -  "Download the resource at URL and return the hash in nix-base32 format."
> -  (call-with-temporary-output-file
> -   (lambda (temp port)
> -     (and (url-fetch url temp)
> -          (bytevector->nix-base32-string
> -           (call-with-input-file temp port-sha256))))))
> +(define (guix-hash-url filename)
> +  "Return the hash of FILENAME in nix-base32 format."
> +  (bytevector->nix-base32-string  (file-sha256 filename)))
> +
> +(define (python->package-name name)
> +  "Given the NAME of a package on PyPI, return a Guix-compliant name for the
> +package."
> +  (if (string-prefix? "python-" name)
> +      (snake-case name)
> +      (string-append "python-" (snake-case name))))
> +
> +(define (maybe-inputs package-inputs)
> +  "Given a list of PACKAGE-INPUTS, tries to generate the 'inputs' field of a
> +package definition."
> +  (match package-inputs
> +    (()
> +     '())
> +    ((package-inputs ...)
> +     `((inputs (,'quasiquote ,package-inputs))))))
> +
> +(define (guess-requirements source-url tarball)
> +  "Given SOURCE-URL and a TARBALL of the package, return a list of the required
> +packages specified in the requirements.txt file. TARBALL will be extracted in
> +the current directory, and will be deleted."
> +
> +  (define (tarball-directory url)
> +    ;; Given the URL of the package's tarball, return the name of the directory
> +    ;; that will be created upon decompressing it. If the filetype is not
> +    ;; supported, return #f.
> +    ;; TODO: Support more archive formats.
> +    (let ((basename (substring url (+ 1 (string-rindex url #\/)))))
> +      (cond
> +       ((string-suffix? ".tar.gz" basename)
> +        (string-drop-right basename 7))
> +       ((string-suffix? ".tar.bz2" basename)
> +        (string-drop-right basename 8))
> +       (else
> +        (begin
> +          (warning (_ "Unsupported archive format: \
> +cannot determine package dependencies"))
> +          #f)))))
> +
> +  (define (clean-requirement s)
> +    ;; Given a requirement LINE, as can be found in a Python requirements.txt
> +    ;; file, remove everything other than the actual name of the required
> +    ;; package, and return it.
> +    (string-take s
> +     (or (string-index s #\space)
> +         (string-length s))))
> +
> +  (define (comment? line)
> +    ;; Return #t if the given LINE is a comment, #f otherwise.
> +    (eq? (string-ref (string-trim line) 0) #\#))
> +
> +  (define (read-requirements requirements-file)
> +    ;; Given REQUIREMENTS-FILE, a Python requirements.txt file, return a list
> +    ;; of name/variable pairs describing the requirements.
> +    (call-with-input-file requirements-file
> +      (lambda (port)
> +        (let loop ((result '()))
> +          (let ((line (read-line port)))
> +            (if (eof-object? line)
> +                result
> +                (cond
> +                 ((or (string-null? line) (comment? line))
> +                  (loop result))
> +                 (else
> +                  (loop (cons (python->package-name (clean-requirement line))
> +                              result))))))))))
> +
> +  (let ((dirname (tarball-directory source-url)))
> +    (if (string? dirname)
> +        (let* ((req-file (string-append dirname "/requirements.txt"))
> +               (exit-code (system* "tar" "xf" tarball req-file)))
> +          ;; TODO: support more formats.
> +          (if (zero? exit-code)
> +              (dynamic-wind
> +                (const #t)
> +                (lambda ()
> +                  (read-requirements req-file))
> +                (lambda ()
> +                  (delete-file req-file)
> +                  (rmdir dirname)))
> +              (begin
> +                (warning (_ "tar xf failed with exit code ~a") exit-code)
> +                '())))
> +        '())))
> +
> +(define (compute-inputs source-url tarball)
> +  "Given the SOURCE-URL of an already downloaded TARBALL, return a list of
> +name/variable pairs describing the required inputs of this package."
> +  (sort
> +    (map (lambda (input)
> +           (list input (list 'unquote (string->symbol input))))
> +         (append '("python-setuptools")
> +                 ;; Argparse has been part of Python since 2.7.
> +                 (remove (cut string=? "python-argparse" <>)
> +                         (guess-requirements source-url tarball))))
> +    (lambda args
> +      (match args
> +        (((a _ ...) (b _ ...))
> +         (string-ci<? a b))))))
> 
>  (define (make-pypi-sexp name version source-url home-page synopsis
>                          description license)
>    "Return the `package' s-expression for a python package with the given NAME,
>  VERSION, SOURCE-URL, HOME-PAGE, SYNOPSIS, DESCRIPTION, and LICENSE."
> -  `(package
> -     (name ,(if (string-prefix? "python-" name)
> -                (snake-case name)
> -                (string-append "python-" (snake-case name))))
> -     (version ,version)
> -     (source (origin
> -               (method url-fetch)
> -               (uri (string-append ,@(factorize-uri source-url version)))
> -               (sha256
> -                (base32
> -                 ,(guix-hash-url source-url)))))
> -     (build-system python-build-system)
> -     (inputs
> -      `(("python-setuptools" ,python-setuptools)))
> -     (home-page ,home-page)
> -     (synopsis ,synopsis)
> -     (description ,description)
> -     (license ,(assoc-ref `((,lgpl2.0 . lgpl2.0)
> -                            (,gpl3 . gpl3)
> -                            (,bsd-3 . bsd-3)
> -                            (,expat . expat)
> -                            (,public-domain . public-domain)
> -                            (,asl2.0 . asl2.0))
> -                          license))))
> +  (call-with-temporary-output-file
> +   (lambda (temp port)
> +     (and (url-fetch source-url temp)
> +          `(package
> +             (name ,(python->package-name name))
> +             (version ,version)
> +             (source (origin
> +                       (method url-fetch)
> +                       (uri (string-append ,@(factorize-uri source-url version)))
> +                       (sha256
> +                        (base32
> +                         ,(guix-hash-url temp)))))
> +             (build-system python-build-system)
> +             ,@(maybe-inputs (compute-inputs source-url temp))
> +             (home-page ,home-page)
> +             (synopsis ,synopsis)
> +             (description ,description)
> +             (license ,(assoc-ref `((,lgpl2.0 . lgpl2.0)
> +                                    (,gpl3 . gpl3)
> +                                    (,bsd-3 . bsd-3)
> +                                    (,expat . expat)
> +                                    (,public-domain . public-domain)
> +                                    (,asl2.0 . asl2.0))
> +                                  license)))))))
> 
>  (define (pypi->guix-package package-name)
>    "Fetch the metadata for PACKAGE-NAME from pypi.python.org, and return the
> diff --git a/tests/pypi.scm b/tests/pypi.scm
> index 45cf7ca..c772474 100644
> --- a/tests/pypi.scm
> +++ b/tests/pypi.scm
> @@ -21,6 +21,7 @@
>    #:use-module (guix base32)
>    #:use-module (guix hash)
>    #:use-module (guix tests)
> +  #:use-module ((guix build utils) #:select (delete-file-recursively))
>    #:use-module (srfi srfi-64)
>    #:use-module (ice-9 match))
> 
> @@ -46,8 +47,14 @@
>    }
>  }")
> 
> -(define test-source
> -  "foobar")
> +(define test-source-hash
> +  "")
> +
> +(define test-requirements
> +"# A comment
> + # A comment after a space
> +bar
> +baz > 13.37")
> 
>  (test-begin "pypi")
> 
> @@ -55,15 +62,22 @@
>    ;; Replace network resources with sample data.
>    (mock ((guix import utils) url-fetch
>           (lambda (url file-name)
> -           (with-output-to-file file-name
> -             (lambda ()
> -               (display
> -                (match url
> -                  ("https://pypi.python.org/pypi/foo/json"
> -                   test-json)
> -                  ("https://example.com/foo-1.0.0.tar.gz"
> -                   test-source)
> -                  (_ (error "Unexpected URL: " url))))))))
> +           (match url
> +             ("https://pypi.python.org/pypi/foo/json"
> +              (with-output-to-file file-name
> +                (lambda ()
> +                  (display test-json))))
> +             ("https://example.com/foo-1.0.0.tar.gz"
> +               (begin
> +                 (mkdir "foo-1.0.0")
> +                 (with-output-to-file "foo-1.0.0/requirements.txt"
> +                   (lambda ()
> +                     (display test-requirements)))
> +                 (system* "tar" "czvf" file-name "foo-1.0.0/")
> +                 (delete-file-recursively "foo-1.0.0")
> +                 (set! test-source-hash
> +                       (call-with-input-file file-name port-sha256))))
> +             (_ (error "Unexpected URL: " url)))))
>      (match (pypi->guix-package "foo")
>        (('package
>           ('name "python-foo")
> @@ -78,13 +92,15 @@
>           ('build-system 'python-build-system)
>           ('inputs
>            ('quasiquote
> -           (("python-setuptools" ('unquote 'python-setuptools)))))
> +           (("python-bar" ('unquote 'python-bar))
> +            ("python-baz" ('unquote 'python-baz))
> +            ("python-setuptools" ('unquote 'python-setuptools)))))
>           ('home-page "http://example.com")
>           ('synopsis "summary")
>           ('description "summary")
>           ('license 'lgpl2.0))
>         (string=? (bytevector->nix-base32-string
> -                  (call-with-input-string test-source port-sha256))
> +                  test-source-hash)
>                   hash))
>        (x
>         (pk 'fail x #f)))))

-- 
Amirouche ~ amz3 ~ http://www.hyperdev.fr

[-- Attachment #2: extract.py --]
[-- Type: text/x-python; name=extract.py, Size: 755 bytes --]

from re import compile
from json import loads
from pathlib import Path
from base64 import b64decode


# matches the leading package name of a requirement (not used yet)
NAME = compile(r'^([\w\.]+)')

# the pypi dataset was built by O. Girardot
# https://ogirardot.wordpress.com/2013/01/31/sharing-pypimaven-dependency-data/
with Path('./import.log').open('w') as log:
    with Path('./pypi-deps.csv').resolve().open() as f:
        lines = f.read().split('\n')
        count = len(lines)
        for num, line in enumerate(lines):
            if not line:
                # skip blank lines, e.g. the one after the final newline
                continue
            # each row is: name <TAB> version <TAB> dependencies
            name, version, dependencies = line.split('\t')
            # the dependencies column is decoded from base64, then from JSON
            dependencies = loads(b64decode(dependencies).decode('utf-8'))
            dependencies = [x.strip() for x in dependencies]
            print(name, version, dependencies)

[-- Attachment #3: pypi-deps.csv.gz --]
[-- Type: application/x-gzip, Size: 223837 bytes --]

