From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
Subject: bug#24450: [PATCHv2] Re: pypi importer outputs strange character
	series in optional dependency case.
Date: Mon, 27 May 2019 17:54:39 +0200
Message-ID: <idj7eabao5c.fsf@bimsb-sys02.mdc-berlin.net>
References: <idj4l5w9dtu.fsf@bimsb-sys02.mdc-berlin.net>
	<87pnod7ot4.fsf@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Return-path: <bug-guix-bounces+gcggb-bug-guix=m.gmane.org@gnu.org>
Received: from eggs.gnu.org ([209.51.188.92]:58832)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1hVHxo-00081L-EU
	for bug-guix@gnu.org; Mon, 27 May 2019 11:55:05 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1hVHxn-00013I-9H
	for bug-guix@gnu.org; Mon, 27 May 2019 11:55:04 -0400
Received: from debbugs.gnu.org ([209.51.188.43]:40294)
	by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
	(Exim 4.71) (envelope-from <Debian-debbugs@debbugs.gnu.org>)
	id 1hVHxn-000134-74
	for bug-guix@gnu.org; Mon, 27 May 2019 11:55:03 -0400
Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1hVHxm-0001dQ-So
	for bug-guix@gnu.org; Mon, 27 May 2019 11:55:02 -0400
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
Resent-Message-ID: <handler.24450.B24450.15589724916240@debbugs.gnu.org>
In-Reply-To: <87pnod7ot4.fsf@gmail.com>
List-Id: Bug reports for GNU Guix <bug-guix.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/bug-guix>,
	<mailto:bug-guix-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/bug-guix/>
List-Post: <mailto:bug-guix@gnu.org>
List-Help: <mailto:bug-guix-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/bug-guix>,
	<mailto:bug-guix-request@gnu.org?subject=subscribe>
Errors-To: bug-guix-bounces+gcggb-bug-guix=m.gmane.org@gnu.org
Sender: "bug-Guix" <bug-guix-bounces+gcggb-bug-guix=m.gmane.org@gnu.org>
To: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Cc: 24450@debbugs.gnu.org

Patch number 3!

> From 0c62b541a3e8925b5ca31fe55dbe7536cf95151f Mon Sep 17 00:00:00 2001
> From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
> Date: Thu, 28 Mar 2019 00:26:01 -0400
> Subject: [PATCH 3/9] import: pypi: Improve parsing of requirement
>  specifications.
>
> The previous solution was fragile and could leave unwanted characters in a
> requirement name, such as '[' or ']'.

Wouldn’t it be sufficient to add [ and ] to the list of forbidden
characters?  The tests pass with this implementation of
clean-requirements:

(define (clean-requirements s)
  (cond
   ((string-index s (char-set #\space #\> #\= #\< #\[ #\]))
    => (cut string-take s <>))
   (else s)))
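
To illustrate, here is a small self-contained sketch of that definition
together with a few calls.  The sample requirement strings are made up for
this example and are not taken from the test suite; the only addition is the
SRFI-26 import that `cut' needs.

```scheme
(use-modules (srfi srfi-26))            ; provides 'cut'

(define (clean-requirements s)
  ;; Truncate S at the first character that cannot be part of the name.
  (cond
   ((string-index s (char-set #\space #\> #\= #\< #\[ #\]))
    => (cut string-take s <>))          ; index found: keep the prefix
   (else s)))                           ; nothing to strip

;; Illustrative inputs (assumptions, not from the Guix tests):
(clean-requirements "requests[security]>=2.0")  ; => "requests"
(clean-requirements "pytest >= 3.0")            ; => "pytest"
(clean-requirements "six")                      ; => "six"
```

Note that an operator such as != would still leave the "!" behind with this
character list, since only "=" is listed.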

> +(define %requirement-name-regexp
> +  ;; Regexp to match the requirement name in a requirement specification.
> +
> +  ;; Some grammar, taken from PEP-0508 (see:
> +  ;; https://www.python.org/dev/peps/pep-0508/).
> +
> +  ;; The unified rule can be expressed as:
> +  ;; specification = wsp* ( url_req | name_req ) wsp*
> +
> +  ;; where url_req is:
> +  ;; url_req = name wsp* extras? wsp* urlspec wsp+ quoted_marker?
> +
> +  ;; and where name_req is:
> +  ;; name_req = name wsp* extras? wsp* versionspec? wsp* quoted_marker?
> +
> +  ;; Thus, we need only matching NAME, which is expressed as:
> +  ;; identifer_end = letterOrDigit | (('-' | '_' | '.' )* letterOrDigit)
> +  ;; identifier    = letterOrDigit identifier_end*
> +  ;; name          = identifier
> +  (let* ((letter-or-digit "[A-Za-z0-9]")
> +         (identifier-end (string-append "(" letter-or-digit "|"
> +                                        "[-_.]*" letter-or-digit ")"))
> +         (identifier (string-append "^" letter-or-digit identifier-end "*"))
> +         (name identifier))
> +    (make-regexp name)))
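
For comparison, this is roughly how the quoted regexp could be exercised on
its own.  It is only a sketch: the helper name `requirement-name/regexp' and
the sample strings are hypothetical and do not come from the patch.

```scheme
(use-modules (ice-9 regex))             ; provides 'match:substring'

(define %requirement-name-regexp
  ;; Same construction as in the quoted patch.
  (let* ((letter-or-digit "[A-Za-z0-9]")
         (identifier-end (string-append "(" letter-or-digit "|"
                                        "[-_.]*" letter-or-digit ")"))
         (identifier (string-append "^" letter-or-digit identifier-end "*"))
         (name identifier))
    (make-regexp name)))

;; Hypothetical helper: return the matched name, or #f when SPEC does not
;; start with a valid name.
(define (requirement-name/regexp spec)
  (let ((m (regexp-exec %requirement-name-regexp spec)))
    (and m (match:substring m))))

(requirement-name/regexp "requests[security]>=2.0")  ; => "requests"
(requirement-name/regexp "python-dateutil>=2.0")     ; => "python-dateutil"
(requirement-name/regexp "[bad]")                    ; => #f
```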

This seems a little too complicated.  Translating a grammar into a
regexp is probably not a good idea in general.  Since we don’t care
about anything other than the name, it seems easier to just chop off
the string tail as soon as we find an invalid character.
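
One possible way to spell "chop at the first invalid character" is to list
the characters that are valid in a name and cut at the first one outside that
set.  This is a sketch under assumptions, not a proposal for the final code:
`%name-chars' and `requirement-name' are hypothetical names, and the set of
valid characters is taken from the PEP 508 grammar quoted above.

```scheme
(define %name-chars
  ;; Characters that may appear in a name according to the quoted grammar:
  ;; ASCII letters, digits, '-', '_' and '.'.
  (char-set-union
   (string->char-set "abcdefghijklmnopqrstuvwxyz")
   (string->char-set "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
   (string->char-set "0123456789")
   (char-set #\- #\_ #\.)))

(define (requirement-name spec)
  ;; Drop leading whitespace, then keep everything up to the first
  ;; character that is not allowed in a name.
  (let* ((s   (string-trim spec))
         (end (string-index s (char-set-complement %name-chars))))
    (if end (string-take s end) s)))

;; Illustrative inputs (made up for this example):
(requirement-name "requests[security]>=2.0")               ; => "requests"
(requirement-name "foo!=1.0")                              ; => "foo"
(requirement-name "zope.interface; python_version < '3'")  ; => "zope.interface"
```

Unlike the grammar, this does not insist that the name end in a letter or
digit, so a trailing "-", "_" or "." would be kept.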

-- 
Ricardo