From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
To: Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
Cc: 24450@debbugs.gnu.org
Subject: bug#24450: [PATCHv2] Re: pypi importer outputs strange character series in optional dependency case.
Date: Mon, 10 Jun 2019 17:32:01 +0900
Message-ID: <87imtd6dtq.fsf@gmail.com>
In-Reply-To: <idj7eabao5c.fsf@bimsb-sys02.mdc-berlin.net> (Ricardo Wurmus's message of "Mon, 27 May 2019 17:54:39 +0200")
Hello!
Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> writes:
> Patch number 3!
Yay!
>> From 0c62b541a3e8925b5ca31fe55dbe7536cf95151f Mon Sep 17 00:00:00 2001
>> From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
>> Date: Thu, 28 Mar 2019 00:26:01 -0400
>> Subject: [PATCH 3/9] import: pypi: Improve parsing of requirement
>> specifications.
>>
>> The previous solution was fragile and could leave unwanted characters in a
>> requirement name, such as '[' or ']'.
>
> Wouldn’t it be sufficient to add [ and ] to the list of forbidden
> characters? The tests pass with this implementation of
> clean-requirements:
>
> (define (clean-requirements s)
>   (cond
>    ((string-index s (char-set #\space #\> #\= #\< #\[ #\])) => (cut string-take s <>))
>    (else s)))
Indeed this would be sufficient to make the tests pass, but the tests
don't cover all the cases; as an example, consider:
--8<---------------cut here---------------start------------->8---
argparse;python_version<"2.7"
--8<---------------cut here---------------end--------------->8---
While we could make it work with the current logic by adding more
invalid characters (such as ';' here) to the character set, it seems
less error-prone to match the package name with the regexp that
upstream provides. [0]
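To make the failure concrete, here is a minimal sketch that applies the
character-set definition quoted above to that specification. Only the
quoted `clean-requirements' is used; the second call is an extra input
chosen for illustration.

```scheme
(use-modules (srfi srfi-26))            ;for 'cut'

;; The character-set based definition quoted above.
(define (clean-requirements s)
  (cond
   ((string-index s (char-set #\space #\> #\= #\< #\[ #\]))
    => (cut string-take s <>))
   (else s)))

;; ';' is not in the set, so the string is only cut at '<' and the name
;; of the environment marker leaks into the result.
(clean-requirements "argparse;python_version<\"2.7\"")
;; => "argparse;python_version"

;; The bracket case covered by the tests works as expected.
(clean-requirements "requests[security]>=2.0")
;; => "requests"
```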
>> +(define %requirement-name-regexp
>> +  ;; Regexp to match the requirement name in a requirement specification.
>> +
>> +  ;; Some grammar, taken from PEP-0508 (see:
>> +  ;; https://www.python.org/dev/peps/pep-0508/).
>> +
>> +  ;; The unified rule can be expressed as:
>> +  ;; specification = wsp* ( url_req | name_req ) wsp*
>> +
>> +  ;; where url_req is:
>> +  ;; url_req = name wsp* extras? wsp* urlspec wsp+ quoted_marker?
>> +
>> +  ;; and where name_req is:
>> +  ;; name_req = name wsp* extras? wsp* versionspec? wsp* quoted_marker?
>> +
>> +  ;; Thus, we only need to match NAME, which is expressed as:
>> +  ;; identifier_end = letterOrDigit | (('-' | '_' | '.' )* letterOrDigit)
>> +  ;; identifier = letterOrDigit identifier_end*
>> +  ;; name = identifier
>> +  (let* ((letter-or-digit "[A-Za-z0-9]")
>> +         (identifier-end (string-append "(" letter-or-digit "|"
>> +                                        "[-_.]*" letter-or-digit ")"))
>> +         (identifier (string-append "^" letter-or-digit identifier-end "*"))
>> +         (name identifier))
>> +    (make-regexp name)))
>
> This seems a little too complicated. Translating a grammar into a
> regexp is probably not a good idea in general. Since we don’t care
> about anything other than the name it seems easier to just chop off
> the string tail as soon as we find an invalid character.
While I agree that a regexp is a bigger hammer than basic string
manipulation, I see some merit to it here:
1) We can be assured of conformance with upstream, again, per PEP-0508.
2) It is easier to extend; we might want to add parsing for the version
spec in order to disregard dependencies specified for Python < 3, for
example.
Building the regexp from the PEP-0508 grammar spells out its components
in a more human-friendly way. We could otherwise have used the more
cryptic regexp that PEP-0508 gives for Python distribution names (to be
matched case-insensitively):
--8<---------------cut here---------------start------------->8---
^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$
--8<---------------cut here---------------end--------------->8---
So I guess that what I'm saying is that I prefer this approach to using
string-index with invalid characters, for the reasons above.
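As a rough sketch of how the regexp from the patch behaves on a few
specifications: `requirement-name' below is a hypothetical helper written
for this illustration, not something the patch defines, and the sample
inputs are my own. It assumes the specification has no leading
whitespace, since the regexp is anchored at the start of the string.

```scheme
(use-modules (ice-9 regex))             ;for 'match:substring'

;; Same construction as in the patch: a letter or digit, followed by any
;; number of letters or digits, each optionally preceded by '-', '_' or '.'.
(define %requirement-name-regexp
  (let* ((letter-or-digit "[A-Za-z0-9]")
         (identifier-end (string-append "(" letter-or-digit "|"
                                        "[-_.]*" letter-or-digit ")"))
         (identifier (string-append "^" letter-or-digit identifier-end "*")))
    (make-regexp identifier)))

;; Hypothetical helper, for illustration only.
(define (requirement-name spec)
  "Return the package name at the start of SPEC, or #f if there is none."
  (and=> (regexp-exec %requirement-name-regexp spec)
         match:substring))

(requirement-name "argparse;python_version<\"2.7\"") ;=> "argparse"
(requirement-name "requests[security]>=2.0")         ;=> "requests"
(requirement-name "zope.interface==4.6.0")           ;=> "zope.interface"
```

No list of forbidden characters is needed: the match stops at the first
character that cannot be part of a name, whether it is ';', '[' or
something the tests have not covered yet.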
[0] https://www.python.org/dev/peps/pep-0508/
Thanks!
Maxim