From mboxrd@z Thu Jan 1 00:00:00 1970 From: Maxim Cournoyer Subject: bug#24450: [PATCHv2] Re: pypi importer outputs strange character series in optional dependency case. Date: Wed, 12 Jun 2019 12:00:50 +0900 Message-ID: <87imtb33tp.fsf@gmail.com> References: <87h99fipj1.fsf@we.make.ritual.n0.is> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Return-path: Received: from eggs.gnu.org ([2001:470:142:3::10]:38849) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hatjk-0002w5-Dg for bug-guix@gnu.org; Tue, 11 Jun 2019 23:15:46 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hatWV-0001LO-DD for bug-guix@gnu.org; Tue, 11 Jun 2019 23:02:05 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:46934) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1hatWU-0001Kp-LW for bug-guix@gnu.org; Tue, 11 Jun 2019 23:02:03 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1hatWU-0001Gs-FD for bug-guix@gnu.org; Tue, 11 Jun 2019 23:02:02 -0400 In-Reply-To: <87h99fipj1.fsf@we.make.ritual.n0.is> Sender: "Debbugs-submit" Resent-Message-ID: List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guix-bounces+gcggb-bug-guix=m.gmane.org@gnu.org Sender: "bug-Guix" To: Ricardo Wurmus Cc: 24450@debbugs.gnu.org Ricardo Wurmus writes: > Next up: Seven of Nine, tertiary adjunct of unimatrix zero one: Ehe! I had to look up the reference; I'm not much of a Star Trek fan obviously :-P. >> From 37e499d5d5d5f690aa0a065c730e13f6a31dd30d Mon Sep 17 00:00:00 2001 >> From: Maxim Cournoyer >> Date: Thu, 28 Mar 2019 23:12:26 -0400 >> Subject: [PATCH 7/9] import: pypi: Include optional test inputs as >> native-inputs. >> >> * guix/import/pypi.scm (maybe-inputs): Add INPUT-TYPE argument, and use = it. >> (test-section?): New predicate. >> (parse-requires.txt): Collect the optional test inputs, and return them = as the >> second element of the returned list. > > AFAICT parse-requires.txt now returns a list of pairs, but used to > return a plain list before. Is this correct? Right, a list of two lists to be technically correct. >> (define (parse-requires.txt requires.txt) >> - "Given REQUIRES.TXT, a Setuptools requires.txt file, return a list of >> -requirement names." >> - ;; This is a very incomplete parser, which job is to select the non-o= ptional >> - ;; dependencies and strip them out of any version information. >> + "Given REQUIRES.TXT, a Setuptools requires.txt file, return a pair of= requirements. >> + >> +The first element of the pair contains the required dependencies while = the >> +second the optional test dependencies. Note that currently, optional, >> +non-test dependencies are omitted since these can be difficult or expen= sive to >> +satisfy." >> + >> + ;; This is a very incomplete parser, which job is to read in the requ= irement >> + ;; specification lines, and strip them out of any version information. >> ;; Alternatively, we could implement a PEG parser with the (ice-9 peg) >> ;; library and the requirements grammar defined by PEP-0508 >> ;; (https://www.python.org/dev/peps/pep-0508/). > > Does it really return a pair? Or a list of pairs? Or is it a > two-element list of lists? The latter! I've fixed the docstring accordingly. >> (call-with-input-file requires.txt >> (lambda (port) >> - (let loop ((result '())) >> + (let loop ((required-deps '()) >> + (test-deps '()) >> + (inside-test-section? #f) >> + (optional? #f)) >> (let ((line (read-line port))) >> - ;; Stop when a section is encountered, as sections contains o= ptional >> - ;; (extra) requirements. Non-optional requirements must appe= ar >> - ;; before any section is defined. >> - (if (or (eof-object? line) (section-header? line)) >> + (if (eof-object? line) >> ;; Duplicates can occur, since the same requirement can be >> ;; listed multiple times with different conditional marke= rs, e.g. >> ;; pytest >=3D 3 ; python_version >=3D "3.3" >> ;; pytest < 3 ; python_version < "3.3" >> - (reverse (delete-duplicates result)) >> + (map (compose reverse delete-duplicates) >> + (list required-deps test-deps)) > > Looks like a list of lists to me. =E2=80=9Cdelete-duplicates=E2=80=9D no= w won=E2=80=99t delete > a name that is in both =E2=80=9Crequired-deps=E2=80=9D as well as in =E2= =80=9Ctest-deps=E2=80=9D. Is > this acceptable? It is acceptable, as this corner case cannot exist given the current code (a requirement can exist in either required-deps or test-deps, but never in both). It also doesn't make sense that a run time requirement would also be listed as a test requirement, so that corner case is not likely to exist in the future either. > Personally, I=E2=80=99m not a fan of using data structures for returning > multiple values, because we can simply return multiple values. I thought the Guile supported multiple values return value would be great here as well, but I've found that for this specific case here, a list of lists worked better, since the two lists contain requirements to be processed the same, which "map" can readily do (i.e. less ceremony is required). > Or we could have more than just strings. The meaning of these strings > is provided by the bin into which they are thrown =E2=80=94 either > =E2=80=9Crequired-deps=E2=80=9D or =E2=80=9Ctest-deps=E2=80=9D. It could= be an option to collect tagged > values instead and have the caller deal with filtering. Sounds neat, but I'd rather punt on this one for now. >> (define (parse-wheel-metadata metadata) >> - "Given METADATA, a Wheel metadata file, return a list of requirement = names." >> + "Given METADATA, a Wheel metadata file, return a pair of requirements. >> + >> +The first element of the pair contains the required dependencies while = the second the optional >> +test dependencies. Note that currently, optional, non-test dependencie= s are >> +omitted since these can be difficult or expensive to satisfy." >> ;; METADATA is a RFC-2822-like, header based file. > > This sounds like this is going to duplicate the previous procedures. Instead of duplicating the docstring, I'm now referring to that of PARSE-REQUIRES.TXT for PARSE-WHEEL-METADATA. The two procedures share the same interface, but implement a different parser, which consist of mostly a loop + conditional branches. IMHO, it's not worth, or even, desirable to try to merge those two parsers into one; I prefer to keep the logic of two distinct parsers separate. >> (define (requires-dist-header? line) >> ;; Return #t if the given LINE is a Requires-Dist header. >> - (regexp-match? (string-match "^Requires-Dist: " line))) >> + (string-match "^Requires-Dist: " line)) >> >> (define (requires-dist-value line) >> (string-drop line (string-length "Requires-Dist: "))) >> >> (define (extra? line) >> ;; Return #t if the given LINE is an "extra" requirement. >> - (regexp-match? (string-match "extra =3D=3D " line))) >> + (string-match "extra =3D=3D '(.*)'" line)) > > These hunks should be part of the previous patch where they were > introduced. (See my comments there about =E2=80=9Cregexp-match?=E2=80=9D= .) Done. >> + (define (test-requirement? line) >> + (let ((extra-label (match:substring (extra? line) 1))) >> + (and extra-label (test-section? extra-label)))) > > You can use =E2=80=9Cand=3D>=E2=80=9D instead of binding a name: > > (and=3D> (match:substring (extra? line) 1) test-section?) Neat! I still don't have the reflex to use "and=3D>", thanks for reminding me about it! >> (call-with-input-file metadata >> (lambda (port) >> - (let loop ((requirements '())) >> + (let loop ((required-deps '()) >> + (test-deps '())) >> (let ((line (read-line port))) >> - ;; Stop at the first 'Provides-Extra' section: the non-option= al >> - ;; requirements appear before the optional ones. >> (if (eof-object? line) >> - (reverse (delete-duplicates requirements)) >> + (map (compose reverse delete-duplicates) >> + (list required-deps test-deps)) >> (cond >> ((and (requires-dist-header? line) (not (extra? line))) >> (loop (cons (specification->requirement-name >> (requires-dist-value line)) >> - requirements))) >> + required-deps) >> + test-deps)) >> + ((and (requires-dist-header? line) (test-requirement? li= ne)) >> + (loop required-deps >> + (cons (specification->requirement-name (requires-= dist-value line)) >> + test-deps))) >> (else >> - (loop requirements))))))))) >> + (loop required-deps test-deps))))))))) ;skip line > > Could you refactor this so that the other parser can be reused? If not, > the same comments about =E2=80=9Cif=E2=80=9D and =E2=80=9Ccond=E2=80=9D a= nd the use of pairs/lists > instead of =E2=80=9Cvalues=E2=80=9D applies here. Done w.r.t. using only "cond" rather than "if" + "cond". As explained above, the docstring's body now refer to the documentation of PARSE-REQUIRES.TXT; otherwise I prefer to keep the parsers separate. >> (define (guess-requirements source-url wheel-url archive) >> - "Given SOURCE-URL, WHEEL-URL and a ARCHIVE of the package, return a l= ist >> + "Given SOURCE-URL, WHEEL-URL and an ARCHIVE of the package, return a = list >> of the required packages specified in the requirements.txt file. ARCHI= VE will >> be extracted in a temporary directory." >> >> @@ -244,7 +289,10 @@ cannot determine package dependencies") (file-exten= sion url)) >> (metadata (string-append dirname "/METADATA"))) >> (call-with-temporary-directory >> (lambda (dir) >> - (if (zero? (system* "unzip" "-q" wheel-archive "-d" dir metada= ta)) >> + (if (zero? >> + (parameterize ((current-error-port (%make-void-port "rw+"= )) >> + (current-output-port (%make-void-port "rw+= "))) >> + (system* "unzip" wheel-archive "-d" dir metadata))) >> (parse-wheel-metadata (string-append dir "/" metadata)) >> (begin >> (warning >> @@ -283,32 +331,41 @@ cannot determine package dependencies") (file-exte= nsion url)) >> (warning >> (G_ "Failed to extract file: ~a from source.~%") >> requires.txt) >> - '()))))) >> - '()))) >> + (list '() '())))))) >> + (list '() '())))) > > I would like to see cosmetic changes like these three hunks in separate > commits. Done for the first two hunks; the last one isn't cosmetic; it changes the default return from an empty list to a list of two empty lists. >> (define (compute-inputs source-url wheel-url archive) >> - "Given the SOURCE-URL of an already downloaded ARCHIVE, return a list= of >> -name/variable pairs describing the required inputs of this package. Al= so >> + "Given the SOURCE-URL and WHEEL-URL of an already downloaded ARCHIVE,= return >> +a pair of lists, each consisting of a list of name/variable pairs, for = the >> +propagated inputs and the native inputs, respectively. Also >> return the unaltered list of upstream dependency names." >> - (let ((dependencies >> - (remove (cut string=3D? "argparse" <>) >> - (guess-requirements source-url wheel-url archive)))) >> - (values (sort >> - (map (lambda (input) >> - (let ((guix-name (python->package-name input))) >> - (list guix-name (list 'unquote (string->symbol gu= ix-name))))) >> - dependencies) >> - (lambda args >> - (match args >> - (((a _ ...) (b _ ...)) >> - (string-ci> - dependencies))) >> + >> + (define (strip-argparse deps) >> + (remove (cut string=3D? "argparse" <>) deps)) >> + >> + (define (requirement->package-name/sort deps) >> + (sort >> + (map (lambda (input) >> + (let ((guix-name (python->package-name input))) >> + (list guix-name (list 'unquote (string->symbol guix-name)= )))) >> + deps) >> + (lambda args >> + (match args >> + (((a _ ...) (b _ ...)) >> + (string-ci> + >> + (define process-requirements >> + (compose requirement->package-name/sort strip-argparse)) >> + >> + (let ((dependencies (guess-requirements source-url wheel-url archive)= )) >> + (values (map process-requirements dependencies) >> + (concatenate dependencies)))) > > Giving names to these processing steps is fine and improves clarity, but > I=E2=80=99m not so comfortable about returning ad-hoc pairs *and* multiple > values (like above). Using "values" is required by the API of the recursive importer. >> + (match guix-dependencies >> + ((required-inputs test-inputs) >> + (values >> + `(package >> + (name ,(python->package-name name)) >> + (version ,version) >> + (source (origin >> + (method url-fetch) >> + ;; Sometimes 'pypi-uri' doesn't quite work= due to mixed >> + ;; cases in NAME, for instance, as is the = case with >> + ;; "uwsgi". In that case, fall back to a = full URL. >> + (uri (pypi-uri ,(string-downcase name) ver= sion)) >> + (sha256 >> + (base32 >> + ,(guix-hash-url temp))))) >> + (build-system python-build-system) >> + ,@(maybe-inputs required-inputs 'propagated-inputs) >> + ,@(maybe-inputs test-inputs 'native-inputs) >> + (home-page ,home-page) >> + (synopsis ,synopsis) >> + (description ,description) >> + (license ,(license->symbol license))) >> + ;; Flatten the nested lists and return the upstream >> + ;; dependencies. >> + upstream-dependencies)))))))) > > I don=E2=80=99t see anything being flattened here? Good catch! Seems a remnant of something that is now gone :-). Thanks for the review. Maxim