From: Ricardo Wurmus
Subject: bug#24450: [PATCHv2] Re: pypi importer outputs strange character series in optional dependency case.
Date: Mon, 27 May 2019 17:54:39 +0200
To: Maxim Cournoyer
Cc: 24450@debbugs.gnu.org

Patch number 3!

> From 0c62b541a3e8925b5ca31fe55dbe7536cf95151f Mon Sep 17 00:00:00 2001
> From: Maxim Cournoyer
> Date: Thu, 28 Mar 2019 00:26:01 -0400
> Subject: [PATCH 3/9] import: pypi: Improve parsing of requirement
>  specifications.
>
> The previous solution was fragile and could leave unwanted characters in a
> requirement name, such as '[' or ']'.

Wouldn’t it be sufficient to add [ and ] to the list of forbidden
characters?
The tests pass with this implementation of clean-requirements:

  (define (clean-requirements s)
    (cond
     ((string-index s (char-set #\space #\> #\= #\< #\[ #\])) => (cut string-take s <>))
     (else s)))

> +(define %requirement-name-regexp
> +  ;; Regexp to match the requirement name in a requirement specification.
> +
> +  ;; Some grammar, taken from PEP-0508 (see:
> +  ;; https://www.python.org/dev/peps/pep-0508/).
> +
> +  ;; The unified rule can be expressed as:
> +  ;; specification = wsp* ( url_req | name_req ) wsp*
> +
> +  ;; where url_req is:
> +  ;; url_req = name wsp* extras? wsp* urlspec wsp+ quoted_marker?
> +
> +  ;; and where name_req is:
> +  ;; name_req = name wsp* extras? wsp* versionspec? wsp* quoted_marker?
> +
> +  ;; Thus, we need only matching NAME, which is expressed as:
> +  ;; identifer_end = letterOrDigit | (('-' | '_' | '.' )* letterOrDigit)
> +  ;; identifier = letterOrDigit identifier_end*
> +  ;; name = identifier
> +  (let* ((letter-or-digit "[A-Za-z0-9]")
> +         (identifier-end (string-append "(" letter-or-digit "|"
> +                                        "[-_.]*" letter-or-digit ")"))
> +         (identifier (string-append "^" letter-or-digit identifier-end "*"))
> +         (name identifier))
> +    (make-regexp name)))

This seems a little too complicated.  Translating a grammar into a
regexp is probably not a good idea in general.  Since we don’t care
about anything other than the name it seems easier to just chop off the
string tail as soon as we find an invalid character.

-- 
Ricardo
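
To make the "chop off the tail" idea concrete, here is a minimal sketch that runs the clean-requirements definition from above on a few requirement strings.  The sample strings are made up for illustration and are not taken from the test suite; the only thing added to the definition is the (srfi srfi-26) import that `cut' needs.

```scheme
;; Sketch only: the sample requirement strings below are hypothetical.
(use-modules (srfi srfi-26))            ; provides `cut'

(define (clean-requirements s)
  ;; Keep everything before the first character that cannot be part of
  ;; the name; return S unchanged when no such character occurs.
  (cond
   ((string-index s (char-set #\space #\> #\= #\< #\[ #\]))
    => (cut string-take s <>))
   (else s)))

;; Print each sample next to its cleaned-up name.
(for-each (lambda (spec)
            (format #t "~s -> ~s~%" spec (clean-requirements spec)))
          '("requests[security]>=2.0" "pytest >= 3.0" "six"))
```

With this character set, a specification such as "foo~=1.0" or "foo!=1.0" would still keep the '~' or '!' in the name, since the cut happens at the '='.  Whether such input ever reaches this procedure is not something the sketch establishes.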