From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ricardo Wurmus Subject: bug#24450: [PATCHv2] Re: pypi importer outputs strange character series in optional dependency case. Date: Tue, 28 May 2019 15:21:56 +0200 Message-ID: References: <87pnod7ot4.fsf@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Return-path: Received: from eggs.gnu.org ([209.51.188.92]:34175) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hVc4G-0000bo-Rb for bug-guix@gnu.org; Tue, 28 May 2019 09:23:07 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hVc4E-0003FJ-NE for bug-guix@gnu.org; Tue, 28 May 2019 09:23:04 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:41764) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1hVc4E-0003FC-Ih for bug-guix@gnu.org; Tue, 28 May 2019 09:23:02 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1hVc4E-0005Ky-8a for bug-guix@gnu.org; Tue, 28 May 2019 09:23:02 -0400 Sender: "Debbugs-submit" Resent-Message-ID: In-Reply-To: <87pnod7ot4.fsf@gmail.com> List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guix-bounces+gcggb-bug-guix=m.gmane.org@gnu.org Sender: "bug-Guix" To: Maxim Cournoyer Cc: 24450@debbugs.gnu.org Next up: Seven of Nine, tertiary adjunct of unimatrix zero one: > From 37e499d5d5d5f690aa0a065c730e13f6a31dd30d Mon Sep 17 00:00:00 2001 > From: Maxim Cournoyer > Date: Thu, 28 Mar 2019 23:12:26 -0400 > Subject: [PATCH 7/9] import: pypi: Include optional test inputs as > native-inputs. > > * guix/import/pypi.scm (maybe-inputs): Add INPUT-TYPE argument, and use i= t. > (test-section?): New predicate. > (parse-requires.txt): Collect the optional test inputs, and return them a= s the > second element of the returned list. AFAICT parse-requires.txt now returns a list of pairs, but used to return a plain list before. Is this correct? > (define (parse-requires.txt requires.txt) > - "Given REQUIRES.TXT, a Setuptools requires.txt file, return a list of > -requirement names." > - ;; This is a very incomplete parser, which job is to select the non-op= tional > - ;; dependencies and strip them out of any version information. > + "Given REQUIRES.TXT, a Setuptools requires.txt file, return a pair of = requirements. > + > +The first element of the pair contains the required dependencies while t= he > +second the optional test dependencies. Note that currently, optional, > +non-test dependencies are omitted since these can be difficult or expens= ive to > +satisfy." > + > + ;; This is a very incomplete parser, which job is to read in the requi= rement > + ;; specification lines, and strip them out of any version information. > ;; Alternatively, we could implement a PEG parser with the (ice-9 peg) > ;; library and the requirements grammar defined by PEP-0508 > ;; (https://www.python.org/dev/peps/pep-0508/). Does it really return a pair? Or a list of pairs? Or is it a two-element list of lists? > (call-with-input-file requires.txt > (lambda (port) > - (let loop ((result '())) > + (let loop ((required-deps '()) > + (test-deps '()) > + (inside-test-section? #f) > + (optional? #f)) > (let ((line (read-line port))) > - ;; Stop when a section is encountered, as sections contains op= tional > - ;; (extra) requirements. Non-optional requirements must appear > - ;; before any section is defined. > - (if (or (eof-object? line) (section-header? line)) > + (if (eof-object? line) > ;; Duplicates can occur, since the same requirement can be > ;; listed multiple times with different conditional marker= s, e.g. > ;; pytest >=3D 3 ; python_version >=3D "3.3" > ;; pytest < 3 ; python_version < "3.3" > - (reverse (delete-duplicates result)) > + (map (compose reverse delete-duplicates) > + (list required-deps test-deps)) Looks like a list of lists to me. =E2=80=9Cdelete-duplicates=E2=80=9D now = won=E2=80=99t delete a name that is in both =E2=80=9Crequired-deps=E2=80=9D as well as in =E2=80= =9Ctest-deps=E2=80=9D. Is this acceptable? Personally, I=E2=80=99m not a fan of using data structures for returning multiple values, because we can simply return multiple values. Or we could have more than just strings. The meaning of these strings is provided by the bin into which they are thrown =E2=80=94 either =E2=80=9Crequired-deps=E2=80=9D or =E2=80=9Ctest-deps=E2=80=9D. It could b= e an option to collect tagged values instead and have the caller deal with filtering. > (define (parse-wheel-metadata metadata) > - "Given METADATA, a Wheel metadata file, return a list of requirement n= ames." > + "Given METADATA, a Wheel metadata file, return a pair of requirements. > + > +The first element of the pair contains the required dependencies while t= he second the optional > +test dependencies. Note that currently, optional, non-test dependencies= are > +omitted since these can be difficult or expensive to satisfy." > ;; METADATA is a RFC-2822-like, header based file. This sounds like this is going to duplicate the previous procedures. > (define (requires-dist-header? line) > ;; Return #t if the given LINE is a Requires-Dist header. > - (regexp-match? (string-match "^Requires-Dist: " line))) > + (string-match "^Requires-Dist: " line)) > > (define (requires-dist-value line) > (string-drop line (string-length "Requires-Dist: "))) > > (define (extra? line) > ;; Return #t if the given LINE is an "extra" requirement. > - (regexp-match? (string-match "extra =3D=3D " line))) > + (string-match "extra =3D=3D '(.*)'" line)) These hunks should be part of the previous patch where they were introduced. (See my comments there about =E2=80=9Cregexp-match?=E2=80=9D.) > + (define (test-requirement? line) > + (let ((extra-label (match:substring (extra? line) 1))) > + (and extra-label (test-section? extra-label)))) You can use =E2=80=9Cand=3D>=E2=80=9D instead of binding a name: (and=3D> (match:substring (extra? line) 1) test-section?) > (call-with-input-file metadata > (lambda (port) > - (let loop ((requirements '())) > + (let loop ((required-deps '()) > + (test-deps '())) > (let ((line (read-line port))) > - ;; Stop at the first 'Provides-Extra' section: the non-optional > - ;; requirements appear before the optional ones. > (if (eof-object? line) > - (reverse (delete-duplicates requirements)) > + (map (compose reverse delete-duplicates) > + (list required-deps test-deps)) > (cond > ((and (requires-dist-header? line) (not (extra? line))) > (loop (cons (specification->requirement-name > (requires-dist-value line)) > - requirements))) > + required-deps) > + test-deps)) > + ((and (requires-dist-header? line) (test-requirement? lin= e)) > + (loop required-deps > + (cons (specification->requirement-name (requires-d= ist-value line)) > + test-deps))) > (else > - (loop requirements))))))))) > + (loop required-deps test-deps))))))))) ;skip line Could you refactor this so that the other parser can be reused? If not, the same comments about =E2=80=9Cif=E2=80=9D and =E2=80=9Ccond=E2=80=9D and= the use of pairs/lists instead of =E2=80=9Cvalues=E2=80=9D applies here. > (define (guess-requirements source-url wheel-url archive) > - "Given SOURCE-URL, WHEEL-URL and a ARCHIVE of the package, return a li= st > + "Given SOURCE-URL, WHEEL-URL and an ARCHIVE of the package, return a l= ist > of the required packages specified in the requirements.txt file. ARCHIV= E will > be extracted in a temporary directory." > > @@ -244,7 +289,10 @@ cannot determine package dependencies") (file-extens= ion url)) > (metadata (string-append dirname "/METADATA"))) > (call-with-temporary-directory > (lambda (dir) > - (if (zero? (system* "unzip" "-q" wheel-archive "-d" dir metadat= a)) > + (if (zero? > + (parameterize ((current-error-port (%make-void-port "rw+")) > + (current-output-port (%make-void-port "rw+"= ))) > + (system* "unzip" wheel-archive "-d" dir metadata))) > (parse-wheel-metadata (string-append dir "/" metadata)) > (begin > (warning > @@ -283,32 +331,41 @@ cannot determine package dependencies") (file-exten= sion url)) > (warning > (G_ "Failed to extract file: ~a from source.~%") > requires.txt) > - '()))))) > - '()))) > + (list '() '())))))) > + (list '() '())))) I would like to see cosmetic changes like these three hunks in separate commits. > (define (compute-inputs source-url wheel-url archive) > - "Given the SOURCE-URL of an already downloaded ARCHIVE, return a list = of > -name/variable pairs describing the required inputs of this package. Also > + "Given the SOURCE-URL and WHEEL-URL of an already downloaded ARCHIVE, = return > +a pair of lists, each consisting of a list of name/variable pairs, for t= he > +propagated inputs and the native inputs, respectively. Also > return the unaltered list of upstream dependency names." > - (let ((dependencies > - (remove (cut string=3D? "argparse" <>) > - (guess-requirements source-url wheel-url archive)))) > - (values (sort > - (map (lambda (input) > - (let ((guix-name (python->package-name input))) > - (list guix-name (list 'unquote (string->symbol gui= x-name))))) > - dependencies) > - (lambda args > - (match args > - (((a _ ...) (b _ ...)) > - (string-ci - dependencies))) > + > + (define (strip-argparse deps) > + (remove (cut string=3D? "argparse" <>) deps)) > + > + (define (requirement->package-name/sort deps) > + (sort > + (map (lambda (input) > + (let ((guix-name (python->package-name input))) > + (list guix-name (list 'unquote (string->symbol guix-name))= ))) > + deps) > + (lambda args > + (match args > + (((a _ ...) (b _ ...)) > + (string-ci + > + (define process-requirements > + (compose requirement->package-name/sort strip-argparse)) > + > + (let ((dependencies (guess-requirements source-url wheel-url archive))) > + (values (map process-requirements dependencies) > + (concatenate dependencies)))) Giving names to these processing steps is fine and improves clarity, but I=E2=80=99m not so comfortable about returning ad-hoc pairs *and* multiple values (like above). > + (match guix-dependencies > + ((required-inputs test-inputs) > + (values > + `(package > + (name ,(python->package-name name)) > + (version ,version) > + (source (origin > + (method url-fetch) > + ;; Sometimes 'pypi-uri' doesn't quite work = due to mixed > + ;; cases in NAME, for instance, as is the c= ase with > + ;; "uwsgi". In that case, fall back to a f= ull URL. > + (uri (pypi-uri ,(string-downcase name) vers= ion)) > + (sha256 > + (base32 > + ,(guix-hash-url temp))))) > + (build-system python-build-system) > + ,@(maybe-inputs required-inputs 'propagated-inputs) > + ,@(maybe-inputs test-inputs 'native-inputs) > + (home-page ,home-page) > + (synopsis ,synopsis) > + (description ,description) > + (license ,(license->symbol license))) > + ;; Flatten the nested lists and return the upstream > + ;; dependencies. > + upstream-dependencies)))))))) I don=E2=80=99t see anything being flattened here? -- Ricardo