From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
To: Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
Cc: 24450@debbugs.gnu.org
Subject: bug#24450: [PATCHv2] Re: pypi importer outputs strange character series in optional dependency case.
Date: Wed, 12 Jun 2019 12:00:50 +0900 [thread overview]
Message-ID: <87imtb33tp.fsf@gmail.com> (raw)
In-Reply-To: <87h99fipj1.fsf@we.make.ritual.n0.is>
Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> writes:
> Next up: Seven of Nine, tertiary adjunct of unimatrix zero one:
Ehe! I had to look up the reference; I'm not much of a Star Trek fan
obviously :-P.
>> From 37e499d5d5d5f690aa0a065c730e13f6a31dd30d Mon Sep 17 00:00:00 2001
>> From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
>> Date: Thu, 28 Mar 2019 23:12:26 -0400
>> Subject: [PATCH 7/9] import: pypi: Include optional test inputs as
>> native-inputs.
>>
>> * guix/import/pypi.scm (maybe-inputs): Add INPUT-TYPE argument, and use it.
>> (test-section?): New predicate.
>> (parse-requires.txt): Collect the optional test inputs, and return them as the
>> second element of the returned list.
>
> AFAICT parse-requires.txt now returns a list of pairs, but used to
> return a plain list before. Is this correct?
Right, a list of two lists to be technically correct.
>> (define (parse-requires.txt requires.txt)
>> - "Given REQUIRES.TXT, a Setuptools requires.txt file, return a list of
>> -requirement names."
>> - ;; This is a very incomplete parser, which job is to select the non-optional
>> - ;; dependencies and strip them out of any version information.
>> + "Given REQUIRES.TXT, a Setuptools requires.txt file, return a pair of requirements.
>> +
>> +The first element of the pair contains the required dependencies while the
>> +second the optional test dependencies. Note that currently, optional,
>> +non-test dependencies are omitted since these can be difficult or expensive to
>> +satisfy."
>> +
>> + ;; This is a very incomplete parser, which job is to read in the requirement
>> + ;; specification lines, and strip them out of any version information.
>> ;; Alternatively, we could implement a PEG parser with the (ice-9 peg)
>> ;; library and the requirements grammar defined by PEP-0508
>> ;; (https://www.python.org/dev/peps/pep-0508/).
>
> Does it really return a pair? Or a list of pairs? Or is it a
> two-element list of lists?
The latter! I've fixed the docstring accordingly.
>> (call-with-input-file requires.txt
>> (lambda (port)
>> - (let loop ((result '()))
>> + (let loop ((required-deps '())
>> + (test-deps '())
>> + (inside-test-section? #f)
>> + (optional? #f))
>> (let ((line (read-line port)))
>> - ;; Stop when a section is encountered, as sections contains optional
>> - ;; (extra) requirements. Non-optional requirements must appear
>> - ;; before any section is defined.
>> - (if (or (eof-object? line) (section-header? line))
>> + (if (eof-object? line)
>> ;; Duplicates can occur, since the same requirement can be
>> ;; listed multiple times with different conditional markers, e.g.
>> ;; pytest >= 3 ; python_version >= "3.3"
>> ;; pytest < 3 ; python_version < "3.3"
>> - (reverse (delete-duplicates result))
>> + (map (compose reverse delete-duplicates)
>> + (list required-deps test-deps))
>
> Looks like a list of lists to me. “delete-duplicates” now won’t delete
> a name that is in both “required-deps” as well as in “test-deps”. Is
> this acceptable?
It is acceptable, as this corner case cannot exist given the current
code (a requirement can exist in either required-deps or test-deps, but
never in both). It also doesn't make sense that a run time requirement
would also be listed as a test requirement, so that corner case is not
likely to exist in the future either.
> Personally, I’m not a fan of using data structures for returning
> multiple values, because we can simply return multiple values.
I thought the Guile supported multiple values return value would be
great here as well, but I've found that for this specific case here, a
list of lists worked better, since the two lists contain requirements to
be processed the same, which "map" can readily do (i.e. less ceremony is
required).
> Or we could have more than just strings. The meaning of these strings
> is provided by the bin into which they are thrown — either
> “required-deps” or “test-deps”. It could be an option to collect tagged
> values instead and have the caller deal with filtering.
Sounds neat, but I'd rather punt on this one for now.
>> (define (parse-wheel-metadata metadata)
>> - "Given METADATA, a Wheel metadata file, return a list of requirement names."
>> + "Given METADATA, a Wheel metadata file, return a pair of requirements.
>> +
>> +The first element of the pair contains the required dependencies while the second the optional
>> +test dependencies. Note that currently, optional, non-test dependencies are
>> +omitted since these can be difficult or expensive to satisfy."
>> ;; METADATA is a RFC-2822-like, header based file.
>
> This sounds like this is going to duplicate the previous procedures.
Instead of duplicating the docstring, I'm now referring to that of
PARSE-REQUIRES.TXT for PARSE-WHEEL-METADATA. The two procedures share
the same interface, but implement a different parser, which consist of
mostly a loop + conditional branches. IMHO, it's not worth, or even,
desirable to try to merge those two parsers into one; I prefer to keep
the logic of two distinct parsers separate.
>> (define (requires-dist-header? line)
>> ;; Return #t if the given LINE is a Requires-Dist header.
>> - (regexp-match? (string-match "^Requires-Dist: " line)))
>> + (string-match "^Requires-Dist: " line))
>>
>> (define (requires-dist-value line)
>> (string-drop line (string-length "Requires-Dist: ")))
>>
>> (define (extra? line)
>> ;; Return #t if the given LINE is an "extra" requirement.
>> - (regexp-match? (string-match "extra == " line)))
>> + (string-match "extra == '(.*)'" line))
>
> These hunks should be part of the previous patch where they were
> introduced. (See my comments there about “regexp-match?”.)
Done.
>> + (define (test-requirement? line)
>> + (let ((extra-label (match:substring (extra? line) 1)))
>> + (and extra-label (test-section? extra-label))))
>
> You can use “and=>” instead of binding a name:
>
> (and=> (match:substring (extra? line) 1) test-section?)
Neat! I still don't have the reflex to use "and=>", thanks for reminding
me about it!
>> (call-with-input-file metadata
>> (lambda (port)
>> - (let loop ((requirements '()))
>> + (let loop ((required-deps '())
>> + (test-deps '()))
>> (let ((line (read-line port)))
>> - ;; Stop at the first 'Provides-Extra' section: the non-optional
>> - ;; requirements appear before the optional ones.
>> (if (eof-object? line)
>> - (reverse (delete-duplicates requirements))
>> + (map (compose reverse delete-duplicates)
>> + (list required-deps test-deps))
>> (cond
>> ((and (requires-dist-header? line) (not (extra? line)))
>> (loop (cons (specification->requirement-name
>> (requires-dist-value line))
>> - requirements)))
>> + required-deps)
>> + test-deps))
>> + ((and (requires-dist-header? line) (test-requirement? line))
>> + (loop required-deps
>> + (cons (specification->requirement-name (requires-dist-value line))
>> + test-deps)))
>> (else
>> - (loop requirements)))))))))
>> + (loop required-deps test-deps))))))))) ;skip line
>
> Could you refactor this so that the other parser can be reused? If not,
> the same comments about “if” and “cond” and the use of pairs/lists
> instead of “values” applies here.
Done w.r.t. using only "cond" rather than "if" + "cond". As explained
above, the docstring's body now refer to the documentation of
PARSE-REQUIRES.TXT; otherwise I prefer to keep the parsers separate.
>> (define (guess-requirements source-url wheel-url archive)
>> - "Given SOURCE-URL, WHEEL-URL and a ARCHIVE of the package, return a list
>> + "Given SOURCE-URL, WHEEL-URL and an ARCHIVE of the package, return a list
>> of the required packages specified in the requirements.txt file. ARCHIVE will
>> be extracted in a temporary directory."
>>
>> @@ -244,7 +289,10 @@ cannot determine package dependencies") (file-extension url))
>> (metadata (string-append dirname "/METADATA")))
>> (call-with-temporary-directory
>> (lambda (dir)
>> - (if (zero? (system* "unzip" "-q" wheel-archive "-d" dir metadata))
>> + (if (zero?
>> + (parameterize ((current-error-port (%make-void-port "rw+"))
>> + (current-output-port (%make-void-port "rw+")))
>> + (system* "unzip" wheel-archive "-d" dir metadata)))
>> (parse-wheel-metadata (string-append dir "/" metadata))
>> (begin
>> (warning
>> @@ -283,32 +331,41 @@ cannot determine package dependencies") (file-extension url))
>> (warning
>> (G_ "Failed to extract file: ~a from source.~%")
>> requires.txt)
>> - '())))))
>> - '())))
>> + (list '() '()))))))
>> + (list '() '()))))
>
> I would like to see cosmetic changes like these three hunks in separate
> commits.
Done for the first two hunks; the last one isn't cosmetic; it changes
the default return from an empty list to a list of two empty lists.
>> (define (compute-inputs source-url wheel-url archive)
>> - "Given the SOURCE-URL of an already downloaded ARCHIVE, return a list of
>> -name/variable pairs describing the required inputs of this package. Also
>> + "Given the SOURCE-URL and WHEEL-URL of an already downloaded ARCHIVE, return
>> +a pair of lists, each consisting of a list of name/variable pairs, for the
>> +propagated inputs and the native inputs, respectively. Also
>> return the unaltered list of upstream dependency names."
>> - (let ((dependencies
>> - (remove (cut string=? "argparse" <>)
>> - (guess-requirements source-url wheel-url archive))))
>> - (values (sort
>> - (map (lambda (input)
>> - (let ((guix-name (python->package-name input)))
>> - (list guix-name (list 'unquote (string->symbol guix-name)))))
>> - dependencies)
>> - (lambda args
>> - (match args
>> - (((a _ ...) (b _ ...))
>> - (string-ci<? a b)))))
>> - dependencies)))
>> +
>> + (define (strip-argparse deps)
>> + (remove (cut string=? "argparse" <>) deps))
>> +
>> + (define (requirement->package-name/sort deps)
>> + (sort
>> + (map (lambda (input)
>> + (let ((guix-name (python->package-name input)))
>> + (list guix-name (list 'unquote (string->symbol guix-name)))))
>> + deps)
>> + (lambda args
>> + (match args
>> + (((a _ ...) (b _ ...))
>> + (string-ci<? a b))))))
>> +
>> + (define process-requirements
>> + (compose requirement->package-name/sort strip-argparse))
>> +
>> + (let ((dependencies (guess-requirements source-url wheel-url archive)))
>> + (values (map process-requirements dependencies)
>> + (concatenate dependencies))))
>
> Giving names to these processing steps is fine and improves clarity, but
> I’m not so comfortable about returning ad-hoc pairs *and* multiple
> values (like above).
Using "values" is required by the API of the recursive importer.
>> + (match guix-dependencies
>> + ((required-inputs test-inputs)
>> + (values
>> + `(package
>> + (name ,(python->package-name name))
>> + (version ,version)
>> + (source (origin
>> + (method url-fetch)
>> + ;; Sometimes 'pypi-uri' doesn't quite work due to mixed
>> + ;; cases in NAME, for instance, as is the case with
>> + ;; "uwsgi". In that case, fall back to a full URL.
>> + (uri (pypi-uri ,(string-downcase name) version))
>> + (sha256
>> + (base32
>> + ,(guix-hash-url temp)))))
>> + (build-system python-build-system)
>> + ,@(maybe-inputs required-inputs 'propagated-inputs)
>> + ,@(maybe-inputs test-inputs 'native-inputs)
>> + (home-page ,home-page)
>> + (synopsis ,synopsis)
>> + (description ,description)
>> + (license ,(license->symbol license)))
>> + ;; Flatten the nested lists and return the upstream
>> + ;; dependencies.
>> + upstream-dependencies))))))))
>
> I don’t see anything being flattened here?
Good catch! Seems a remnant of something that is now gone :-).
Thanks for the review.
Maxim
next prev parent reply other threads:[~2019-06-12 3:15 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-16 20:00 bug#24450: pypi importer outputs strange character series in optional dependency case ng0
2019-03-29 4:24 ` Maxim Cournoyer
2019-06-16 17:02 ` ng0
2019-06-26 4:12 ` Maxim Cournoyer
2019-03-29 4:34 ` bug#24450: [PATCH] " Maxim Cournoyer
2019-03-30 2:12 ` bug#24450: [PATCHv2] " T460s laptop
2019-03-31 14:40 ` bug#24450: [PATCH] " Maxim Cournoyer
2019-04-01 15:28 ` bug#24450: [PATCHv2] " Ludovic Courtès
2019-05-15 11:06 ` Ricardo Wurmus
2019-05-20 4:05 ` bug#24450: [PATCHv2] " Maxim Cournoyer
2019-05-20 15:05 ` Ludovic Courtès
2019-05-22 1:13 ` Maxim Cournoyer
2019-05-27 14:48 ` Ricardo Wurmus
2019-06-10 2:10 ` Maxim Cournoyer
2019-05-27 15:11 ` Ricardo Wurmus
2019-06-10 3:30 ` Maxim Cournoyer
2019-06-10 9:23 ` Ricardo Wurmus
2019-06-16 14:11 ` Maxim Cournoyer
2019-06-17 1:41 ` Ricardo Wurmus
2019-05-27 15:54 ` Ricardo Wurmus
2019-06-10 8:32 ` Maxim Cournoyer
2019-06-10 9:12 ` Ricardo Wurmus
2019-06-16 6:05 ` Maxim Cournoyer
2019-05-27 15:58 ` Ricardo Wurmus
2019-05-28 10:23 ` Ricardo Wurmus
2019-06-10 13:30 ` Maxim Cournoyer
2019-06-10 20:13 ` Ricardo Wurmus
2019-05-28 11:04 ` Ricardo Wurmus
2019-06-11 0:39 ` Maxim Cournoyer
2019-06-11 11:56 ` Ricardo Wurmus
2019-05-28 13:21 ` Ricardo Wurmus
2019-05-28 14:48 ` Ricardo Wurmus
2019-06-16 5:10 ` Maxim Cournoyer
2019-05-28 14:53 ` Ricardo Wurmus
2019-05-30 2:24 ` Maxim Cournoyer
2019-06-16 5:53 ` Maxim Cournoyer
2019-06-12 3:00 ` Maxim Cournoyer [this message]
2019-06-12 6:39 ` Ricardo Wurmus
2019-06-16 14:29 ` Maxim Cournoyer
2019-06-16 14:36 ` bug#24450: [PATCHv3] " Maxim Cournoyer
2019-07-02 1:54 ` Maxim Cournoyer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87imtb33tp.fsf@gmail.com \
--to=maxim.cournoyer@gmail.com \
--cc=24450@debbugs.gnu.org \
--cc=ricardo.wurmus@mdc-berlin.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/guix.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.