all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
To: Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
Cc: 24450@debbugs.gnu.org
Subject: bug#24450: [PATCHv2] Re: pypi importer outputs strange character series in optional dependency case.
Date: Wed, 12 Jun 2019 12:00:50 +0900	[thread overview]
Message-ID: <87imtb33tp.fsf@gmail.com> (raw)
In-Reply-To: <87h99fipj1.fsf@we.make.ritual.n0.is>

Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> writes:

> Next up: Seven of Nine, tertiary adjunct of unimatrix zero one:

Ehe! I had to look up the reference; I'm not much of a Star Trek fan
obviously :-P.

>> From 37e499d5d5d5f690aa0a065c730e13f6a31dd30d Mon Sep 17 00:00:00 2001
>> From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
>> Date: Thu, 28 Mar 2019 23:12:26 -0400
>> Subject: [PATCH 7/9] import: pypi: Include optional test inputs as
>>  native-inputs.
>>
>> * guix/import/pypi.scm (maybe-inputs): Add INPUT-TYPE argument, and use it.
>> (test-section?): New predicate.
>> (parse-requires.txt): Collect the optional test inputs, and return them as the
>> second element of the returned list.
>
> AFAICT parse-requires.txt now returns a list of pairs, but used to
> return a plain list before.  Is this correct?

Right, a list of two lists to be technically correct.

>>  (define (parse-requires.txt requires.txt)
>> -  "Given REQUIRES.TXT, a Setuptools requires.txt file, return a list of
>> -requirement names."
>> -  ;; This is a very incomplete parser, which job is to select the non-optional
>> -  ;; dependencies and strip them out of any version information.
>> +  "Given REQUIRES.TXT, a Setuptools requires.txt file, return a pair of requirements.
>> +
>> +The first element of the pair contains the required dependencies while the
>> +second the optional test dependencies.  Note that currently, optional,
>> +non-test dependencies are omitted since these can be difficult or expensive to
>> +satisfy."
>> +
>> +  ;; This is a very incomplete parser, which job is to read in the requirement
>> +  ;; specification lines, and strip them out of any version information.
>>    ;; Alternatively, we could implement a PEG parser with the (ice-9 peg)
>>    ;; library and the requirements grammar defined by PEP-0508
>>    ;; (https://www.python.org/dev/peps/pep-0508/).
>
> Does it really return a pair?  Or a list of pairs?  Or is it a
> two-element list of lists?

The latter! I've fixed the docstring accordingly.

>>    (call-with-input-file requires.txt
>>      (lambda (port)
>> -      (let loop ((result '()))
>> +      (let loop ((required-deps '())
>> +                 (test-deps '())
>> +                 (inside-test-section? #f)
>> +                 (optional? #f))
>>          (let ((line (read-line port)))
>> -          ;; Stop when a section is encountered, as sections contains optional
>> -          ;; (extra) requirements.  Non-optional requirements must appear
>> -          ;; before any section is defined.
>> -          (if (or (eof-object? line) (section-header? line))
>> +          (if (eof-object? line)
>>                ;; Duplicates can occur, since the same requirement can be
>>                ;; listed multiple times with different conditional markers, e.g.
>>                ;; pytest >= 3 ; python_version >= "3.3"
>>                ;; pytest < 3 ; python_version < "3.3"
>> -              (reverse (delete-duplicates result))
>> +              (map (compose reverse delete-duplicates)
>> +                   (list required-deps test-deps))
>
> Looks like a list of lists to me.  “delete-duplicates” now won’t delete
> a name that is in both “required-deps” as well as in “test-deps”.  Is
> this acceptable?

It is acceptable, as this corner case cannot exist given the current
code (a requirement can exist in either required-deps or test-deps, but
never in both). It also doesn't make sense that a run time requirement
would also be listed as a test requirement, so that corner case is not
likely to exist in the future either.

> Personally, I’m not a fan of using data structures for returning
> multiple values, because we can simply return multiple values.

I thought the Guile supported multiple values return value would be
great here as well, but I've found that for this specific case here, a
list of lists worked better, since the two lists contain requirements to
be processed the same, which "map" can readily do (i.e. less ceremony is
required).

> Or we could have more than just strings.  The meaning of these strings
> is provided by the bin into which they are thrown — either
> “required-deps” or “test-deps”.  It could be an option to collect tagged
> values instead and have the caller deal with filtering.

Sounds neat, but I'd rather punt on this one for now.

>>  (define (parse-wheel-metadata metadata)
>> -  "Given METADATA, a Wheel metadata file, return a list of requirement names."
>> +  "Given METADATA, a Wheel metadata file, return a pair of requirements.
>> +
>> +The first element of the pair contains the required dependencies while the second the optional
>> +test dependencies.  Note that currently, optional, non-test dependencies are
>> +omitted since these can be difficult or expensive to satisfy."
>>    ;; METADATA is a RFC-2822-like, header based file.
>
> This sounds like this is going to duplicate the previous procedures.

Instead of duplicating the docstring, I'm now referring to that of
PARSE-REQUIRES.TXT for PARSE-WHEEL-METADATA. The two procedures share
the same interface, but implement a different parser, which consist of
mostly a loop + conditional branches. IMHO, it's not worth, or even,
desirable to try to merge those two parsers into one; I prefer to keep
the logic of two distinct parsers separate.

>>    (define (requires-dist-header? line)
>>      ;; Return #t if the given LINE is a Requires-Dist header.
>> -    (regexp-match? (string-match "^Requires-Dist: " line)))
>> +    (string-match "^Requires-Dist: " line))
>>
>>    (define (requires-dist-value line)
>>      (string-drop line (string-length "Requires-Dist: ")))
>>
>>    (define (extra? line)
>>      ;; Return #t if the given LINE is an "extra" requirement.
>> -    (regexp-match? (string-match "extra == " line)))
>> +    (string-match "extra == '(.*)'" line))
>
> These hunks should be part of the previous patch where they were
> introduced.  (See my comments there about “regexp-match?”.)

Done.

>> +  (define (test-requirement? line)
>> +    (let ((extra-label (match:substring (extra? line) 1)))
>> +      (and extra-label (test-section? extra-label))))
>
> You can use “and=>” instead of binding a name:
>
>     (and=> (match:substring (extra? line) 1) test-section?)

Neat! I still don't have the reflex to use "and=>", thanks for reminding
me about it!

>>    (call-with-input-file metadata
>>      (lambda (port)
>> -      (let loop ((requirements '()))
>> +      (let loop ((required-deps '())
>> +                 (test-deps '()))
>>          (let ((line (read-line port)))
>> -          ;; Stop at the first 'Provides-Extra' section: the non-optional
>> -          ;; requirements appear before the optional ones.
>>            (if (eof-object? line)
>> -              (reverse (delete-duplicates requirements))
>> +              (map (compose reverse delete-duplicates)
>> +                   (list required-deps test-deps))
>>                (cond
>>                 ((and (requires-dist-header? line) (not (extra? line)))
>>                  (loop (cons (specification->requirement-name
>>                               (requires-dist-value line))
>> -                            requirements)))
>> +                            required-deps)
>> +                      test-deps))
>> +               ((and (requires-dist-header? line) (test-requirement? line))
>> +                (loop required-deps
>> +                      (cons (specification->requirement-name (requires-dist-value line))
>> +                            test-deps)))
>>                 (else
>> -                (loop requirements)))))))))
>> +                (loop required-deps test-deps))))))))) ;skip line
>
> Could you refactor this so that the other parser can be reused?  If not,
> the same comments about “if” and “cond” and the use of pairs/lists
> instead of “values” applies here.

Done w.r.t. using only "cond" rather than "if" + "cond". As explained
above, the docstring's body now refer to the documentation of
PARSE-REQUIRES.TXT; otherwise I prefer to keep the parsers separate.

>>  (define (guess-requirements source-url wheel-url archive)
>> -  "Given SOURCE-URL, WHEEL-URL and a ARCHIVE of the package, return a list
>> +  "Given SOURCE-URL, WHEEL-URL and an ARCHIVE of the package, return a list
>>  of the required packages specified in the requirements.txt file.  ARCHIVE will
>>  be extracted in a temporary directory."
>>
>> @@ -244,7 +289,10 @@ cannot determine package dependencies") (file-extension url))
>>             (metadata (string-append dirname "/METADATA")))
>>        (call-with-temporary-directory
>>         (lambda (dir)
>> -         (if (zero? (system* "unzip" "-q" wheel-archive "-d" dir metadata))
>> +         (if (zero?
>> +              (parameterize ((current-error-port (%make-void-port "rw+"))
>> +                             (current-output-port (%make-void-port "rw+")))
>> +                (system* "unzip" wheel-archive "-d" dir metadata)))
>>               (parse-wheel-metadata (string-append dir "/" metadata))
>>               (begin
>>                 (warning
>> @@ -283,32 +331,41 @@ cannot determine package dependencies") (file-extension url))
>>                       (warning
>>                        (G_ "Failed to extract file: ~a from source.~%")
>>                        requires.txt)
>> -                     '())))))
>> -          '())))
>> +                     (list '() '()))))))
>> +          (list '() '()))))
>
> I would like to see cosmetic changes like these three hunks in separate
> commits.

Done for the first two hunks; the last one isn't cosmetic; it changes
the default return from an empty list to a list of two empty lists.

>>  (define (compute-inputs source-url wheel-url archive)
>> -  "Given the SOURCE-URL of an already downloaded ARCHIVE, return a list of
>> -name/variable pairs describing the required inputs of this package.  Also
>> +  "Given the SOURCE-URL and WHEEL-URL of an already downloaded ARCHIVE, return
>> +a pair of lists, each consisting of a list of name/variable pairs, for the
>> +propagated inputs and the native inputs, respectively.  Also
>>  return the unaltered list of upstream dependency names."
>> -  (let ((dependencies
>> -         (remove (cut string=? "argparse" <>)
>> -                 (guess-requirements source-url wheel-url archive))))
>> -    (values (sort
>> -             (map (lambda (input)
>> -                    (let ((guix-name (python->package-name input)))
>> -                      (list guix-name (list 'unquote (string->symbol guix-name)))))
>> -                  dependencies)
>> -             (lambda args
>> -               (match args
>> -                 (((a _ ...) (b _ ...))
>> -                  (string-ci<? a b)))))
>> -            dependencies)))
>> +
>> +  (define (strip-argparse deps)
>> +    (remove (cut string=? "argparse" <>) deps))
>> +
>> +  (define (requirement->package-name/sort deps)
>> +    (sort
>> +     (map (lambda (input)
>> +            (let ((guix-name (python->package-name input)))
>> +              (list guix-name (list 'unquote (string->symbol guix-name)))))
>> +          deps)
>> +     (lambda args
>> +       (match args
>> +         (((a _ ...) (b _ ...))
>> +          (string-ci<? a b))))))
>> +
>> +  (define process-requirements
>> +    (compose requirement->package-name/sort strip-argparse))
>> +
>> +  (let ((dependencies (guess-requirements source-url wheel-url archive)))
>> +    (values (map process-requirements dependencies)
>> +            (concatenate dependencies))))
>
> Giving names to these processing steps is fine and improves clarity, but
> I’m not so comfortable about returning ad-hoc pairs *and* multiple
> values (like above).

Using "values" is required by the API of the recursive importer.

>> +            (match guix-dependencies
>> +              ((required-inputs test-inputs)
>> +               (values
>> +                `(package
>> +                   (name ,(python->package-name name))
>> +                   (version ,version)
>> +                   (source (origin
>> +                             (method url-fetch)
>> +                             ;; Sometimes 'pypi-uri' doesn't quite work due to mixed
>> +                             ;; cases in NAME, for instance, as is the case with
>> +                             ;; "uwsgi".  In that case, fall back to a full URL.
>> +                             (uri (pypi-uri ,(string-downcase name) version))
>> +                             (sha256
>> +                              (base32
>> +                               ,(guix-hash-url temp)))))
>> +                   (build-system python-build-system)
>> +                   ,@(maybe-inputs required-inputs 'propagated-inputs)
>> +                   ,@(maybe-inputs test-inputs 'native-inputs)
>> +                   (home-page ,home-page)
>> +                   (synopsis ,synopsis)
>> +                   (description ,description)
>> +                   (license ,(license->symbol license)))
>> +                ;; Flatten the nested lists and return the upstream
>> +                ;; dependencies.
>> +                upstream-dependencies))))))))
>
> I don’t see anything being flattened here?

Good catch! Seems a remnant of something that is now gone :-).

Thanks for the review.

Maxim

  parent reply	other threads:[~2019-06-12  3:15 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-16 20:00 bug#24450: pypi importer outputs strange character series in optional dependency case ng0
2019-03-29  4:24 ` Maxim Cournoyer
2019-06-16 17:02   ` ng0
2019-06-26  4:12     ` Maxim Cournoyer
2019-03-29  4:34 ` bug#24450: [PATCH] " Maxim Cournoyer
2019-03-30  2:12   ` bug#24450: [PATCHv2] " T460s laptop
2019-03-31 14:40     ` bug#24450: [PATCH] " Maxim Cournoyer
2019-04-01 15:28     ` bug#24450: [PATCHv2] " Ludovic Courtès
2019-05-15 11:06 ` Ricardo Wurmus
2019-05-20  4:05   ` bug#24450: [PATCHv2] " Maxim Cournoyer
2019-05-20 15:05     ` Ludovic Courtès
2019-05-22  1:13       ` Maxim Cournoyer
2019-05-27 14:48     ` Ricardo Wurmus
2019-06-10  2:10       ` Maxim Cournoyer
2019-05-27 15:11     ` Ricardo Wurmus
2019-06-10  3:30       ` Maxim Cournoyer
2019-06-10  9:23         ` Ricardo Wurmus
2019-06-16 14:11           ` Maxim Cournoyer
2019-06-17  1:41             ` Ricardo Wurmus
2019-05-27 15:54     ` Ricardo Wurmus
2019-06-10  8:32       ` Maxim Cournoyer
2019-06-10  9:12         ` Ricardo Wurmus
2019-06-16  6:05           ` Maxim Cournoyer
2019-05-27 15:58     ` Ricardo Wurmus
2019-05-28 10:23     ` Ricardo Wurmus
2019-06-10 13:30       ` Maxim Cournoyer
2019-06-10 20:13         ` Ricardo Wurmus
2019-05-28 11:04     ` Ricardo Wurmus
2019-06-11  0:39       ` Maxim Cournoyer
2019-06-11 11:56         ` Ricardo Wurmus
2019-05-28 13:21     ` Ricardo Wurmus
2019-05-28 14:48     ` Ricardo Wurmus
2019-06-16  5:10       ` Maxim Cournoyer
2019-05-28 14:53     ` Ricardo Wurmus
2019-05-30  2:24       ` Maxim Cournoyer
2019-06-16  5:53       ` Maxim Cournoyer
2019-06-12  3:00 ` Maxim Cournoyer [this message]
2019-06-12  6:39   ` Ricardo Wurmus
2019-06-16 14:29     ` Maxim Cournoyer
2019-06-16 14:36       ` bug#24450: [PATCHv3] " Maxim Cournoyer
2019-07-02  1:54         ` Maxim Cournoyer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87imtb33tp.fsf@gmail.com \
    --to=maxim.cournoyer@gmail.com \
    --cc=24450@debbugs.gnu.org \
    --cc=ricardo.wurmus@mdc-berlin.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.