all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
To: Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
Cc: 24450@debbugs.gnu.org
Subject: bug#24450: [PATCHv2] Re: pypi importer outputs strange character series in optional dependency case.
Date: Mon, 10 Jun 2019 17:32:01 +0900	[thread overview]
Message-ID: <87imtd6dtq.fsf@gmail.com> (raw)
In-Reply-To: <idj7eabao5c.fsf@bimsb-sys02.mdc-berlin.net> (Ricardo Wurmus's message of "Mon, 27 May 2019 17:54:39 +0200")

Hello!

Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> writes:

> Patch number 3!

Yay!

>> From 0c62b541a3e8925b5ca31fe55dbe7536cf95151f Mon Sep 17 00:00:00 2001
>> From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
>> Date: Thu, 28 Mar 2019 00:26:01 -0400
>> Subject: [PATCH 3/9] import: pypi: Improve parsing of requirement
>>  specifications.
>>
>> The previous solution was fragile and could leave unwanted characters in a
>> requirement name, such as '[' or ']'.
>
> Wouldn’t it be sufficient to add [ and ] to the list of forbidden
> characters?  The tests pass with this implementation of
> clean-requirements:
>
> (define (clean-requirements s)
>  (cond
>   ((string-index s (char-set #\space #\> #\= #\< #\[ #\])) => (cut string-take s <>))
>   (else s)))

Indeed this would be sufficient to make the tests pass, but the tests
don't cover all the cases; as an example, consider:

--8<---------------cut here---------------start------------->8---
argparse;python_version<"2.7"
--8<---------------cut here---------------end--------------->8---

While we could make it work with the current logic by adding more
invalid characters (such as ';' here) to the character set, it seems
less error prone to use the upstream provided regex to match a package
name.  [0]

>> +(define %requirement-name-regexp
>> +  ;; Regexp to match the requirement name in a requirement specification.
>> +
>> +  ;; Some grammar, taken from PEP-0508 (see:
>> +  ;; https://www.python.org/dev/peps/pep-0508/).
>> +
>> +  ;; The unified rule can be expressed as:
>> +  ;; specification = wsp* ( url_req | name_req ) wsp*
>> +
>> +  ;; where url_req is:
>> +  ;; url_req = name wsp* extras? wsp* urlspec wsp+ quoted_marker?
>> +
>> +  ;; and where name_req is:
>> +  ;; name_req = name wsp* extras? wsp* versionspec? wsp* quoted_marker?
>> +
>> +  ;; Thus, we need only matching NAME, which is expressed as:
>> +  ;; identifer_end = letterOrDigit | (('-' | '_' | '.' )* letterOrDigit)
>> +  ;; identifier    = letterOrDigit identifier_end*
>> +  ;; name          = identifier
>> +  (let* ((letter-or-digit "[A-Za-z0-9]")
>> +         (identifier-end (string-append "(" letter-or-digit "|"
>> +                                        "[-_.]*" letter-or-digit ")"))
>> +         (identifier (string-append "^" letter-or-digit identifier-end "*"))
>> +         (name identifier))
>> +    (make-regexp name)))
>
> This seems a little too complicated.  Translating a grammar into a
> regexp is probably not a good idea in general.  Since we don’t care
> about anything other than the name it seems easier to just chop off
> the string tail as soon as we find an invalid character.

While I agree that a regexp is a bigger hammer than basic string
manipulation, I see some merit to it here:

1) We can be assured of conformance with upstream, again, per PEP-0508.
2) It is easier to extend; we might want to add parsing for the version
spec in order to disregard dependencies specified for Python < 3, for
example.

The use of the PEP-0508 grammar to define the regexp is useful to detail
in a more human-friendly language the components of the regexp.  We
could have otherwise used the more cryptic regexp for Python
distribution names:

--8<---------------cut here---------------start------------->8---
^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$
--8<---------------cut here---------------end--------------->8---

So I guess that what I'm saying is that I prefer this approach to using
string-index with invalid characters, for the reasons above.

[0]  https://www.python.org/dev/peps/pep-0508/

Thanks!

Maxim

  reply	other threads:[~2019-06-10  8:33 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-16 20:00 bug#24450: pypi importer outputs strange character series in optional dependency case ng0
2019-03-29  4:24 ` Maxim Cournoyer
2019-06-16 17:02   ` ng0
2019-06-26  4:12     ` Maxim Cournoyer
2019-03-29  4:34 ` bug#24450: [PATCH] " Maxim Cournoyer
2019-03-30  2:12   ` bug#24450: [PATCHv2] " T460s laptop
2019-03-31 14:40     ` bug#24450: [PATCH] " Maxim Cournoyer
2019-04-01 15:28     ` bug#24450: [PATCHv2] " Ludovic Courtès
2019-05-15 11:06 ` Ricardo Wurmus
2019-05-20  4:05   ` bug#24450: [PATCHv2] " Maxim Cournoyer
2019-05-20 15:05     ` Ludovic Courtès
2019-05-22  1:13       ` Maxim Cournoyer
2019-05-27 14:48     ` Ricardo Wurmus
2019-06-10  2:10       ` Maxim Cournoyer
2019-05-27 15:11     ` Ricardo Wurmus
2019-06-10  3:30       ` Maxim Cournoyer
2019-06-10  9:23         ` Ricardo Wurmus
2019-06-16 14:11           ` Maxim Cournoyer
2019-06-17  1:41             ` Ricardo Wurmus
2019-05-27 15:54     ` Ricardo Wurmus
2019-06-10  8:32       ` Maxim Cournoyer [this message]
2019-06-10  9:12         ` Ricardo Wurmus
2019-06-16  6:05           ` Maxim Cournoyer
2019-05-27 15:58     ` Ricardo Wurmus
2019-05-28 10:23     ` Ricardo Wurmus
2019-06-10 13:30       ` Maxim Cournoyer
2019-06-10 20:13         ` Ricardo Wurmus
2019-05-28 11:04     ` Ricardo Wurmus
2019-06-11  0:39       ` Maxim Cournoyer
2019-06-11 11:56         ` Ricardo Wurmus
2019-05-28 13:21     ` Ricardo Wurmus
2019-05-28 14:48     ` Ricardo Wurmus
2019-06-16  5:10       ` Maxim Cournoyer
2019-05-28 14:53     ` Ricardo Wurmus
2019-05-30  2:24       ` Maxim Cournoyer
2019-06-16  5:53       ` Maxim Cournoyer
2019-06-12  3:00 ` Maxim Cournoyer
2019-06-12  6:39   ` Ricardo Wurmus
2019-06-16 14:29     ` Maxim Cournoyer
2019-06-16 14:36       ` bug#24450: [PATCHv3] " Maxim Cournoyer
2019-07-02  1:54         ` Maxim Cournoyer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87imtd6dtq.fsf@gmail.com \
    --to=maxim.cournoyer@gmail.com \
    --cc=24450@debbugs.gnu.org \
    --cc=ricardo.wurmus@mdc-berlin.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.