unofficial mirror of guile-devel@gnu.org 
From: Nathan via "Developers list for Guile, the GNU extensibility library" <guile-devel@gnu.org>
To: Vivien Kraus <vivien@planete-kraus.eu>
Cc: Nathan <nathan_mail@nborghese.com>, guile-devel@gnu.org
Subject: Re: [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
Date: Fri, 03 Nov 2023 13:49:37 -0400
Message-ID: <87o7ga905j.fsf@nborghese.com>
In-Reply-To: <92c8a960b091300107e523126887fd1c246a0fda.camel@planete-kraus.eu>


Hi Vivien,

> This pushes the limits of my understanding of URIs, as I did not know
> we had to consider '%2E%2E' the same as '..'. However, the RFC is not
> very clear:

I wasn't able to find anything that MANDATED any normalization at all, either before or after Relative Resolution. It is possible that treating %2E as a literal dot in resolve-relative-reference could count as unwanted normalization. But it's a safe operation in terms of URI equivalence* and I think users would be less confused to have %2E%2E disappear than to have it remain.

Also, what if the resolve-relative-reference procedure didn't treat %2E as a dot?
There isn't a uri-normalize procedure users can call afterwards to fix that.
And there isn't a version of uri-decode that allows selectively decoding JUST the dot characters.
Users would have to write a lot of code themselves to get proper relative-resolution, so we should do it for them.


- Nathan

*References for the claim that treating %2E as a literal dot is always okay:
- Section 2.3: percent-encoded unreserved characters are always equivalent to decoded ones.
- Section 2.4: unreserved characters can be percent-decoded at any time.
- Section 6.2.2.3: dot-segments should be removed during normalization even if found outside of a relative-reference.
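
An illustrative sketch of the behaviour argued for above: dot-segment removal that recognises %2E as a dot without decoding any other octet. It is written in Python rather than Guile so that it stays independent of the patch under review. The names remove_dot_segments and _as_dots are hypothetical, the function assumes an absolute path (one starting with "/"), and it is a segment-based simplification of RFC 3986 section 5.2.4, not the code from the patch.

```python
import re


def _as_dots(segment):
    """Return the segment with %2E/%2e read as '.', for comparison only."""
    return re.sub(r"%2[eE]", ".", segment)


def remove_dot_segments(path):
    """Remove '.' and '..' segments from an absolute path.

    Percent-encoded dots count as dots when classifying a segment, but
    segments that survive are returned exactly as they were written, so
    escaped sub-delims such as %21 stay escaped.
    """
    segments = path.split("/")
    output = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        kind = _as_dots(segment)
        if kind == ".":
            # A trailing "." leaves a trailing slash behind.
            if is_last:
                output.append("")
        elif kind == "..":
            # Never pop the leading empty segment (the root).
            if len(output) > 1:
                output.pop()
            if is_last:
                output.append("")
        else:
            output.append(segment)
    return "/".join(output)


print(remove_dot_segments("/a/%2E%2E/b"))  # escaped ".." is honoured
print(remove_dot_segments("/a!a!%21!"))    # nothing is re-encoded
```

With this approach the path from the "excitement" example keeps its mix of literal and escaped "!" characters, because no segment is ever decoded and re-encoded.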

Vivien Kraus <vivien@planete-kraus.eu> writes:

> Hello Nathan!
>
> On Thursday, 02 November 2023 at 16:00 -0400, Nathan wrote:
>> There is a problem and I fixed it by rewriting a bunch of code myself
>> because I need similar code.
>
> Thank you!
>
>> remove-dot-segments:
>> You cannot split-and-decode-uri-path and then
>> encode-and-join-uri-path.
>> Those are terrible functions that don't work on all URIs.
>> URI schemes are allowed to specify that certain reserved characters
>> (sub-delims) are special.
>> In that case, a sub-delim that IS escaped is different from a
>> sub-delim that IS NOT escaped.
>> 
>> Example input to your remove-dot-segments:
>> (resolve-relative-reference (string->uri-reference "/")
>>   (string->uri-reference "excitement://a.com/a!a!%21!"))
>> Your wrong output:
>> excitement://a.com/a%21a%21%21%21
>
> I see.
>
>> 
>> One solution would be to only percent-decode dots. Because dot is
>> unreserved, that solution doesn't have any URI equivalence issues.
>> But I still think decoding dots automatically is a bad, unexpected
>> side-effect to have.
>> I rewrote this function so that it:
>> - works on both escaped and unescaped dots
>> - doesn't unescape any unnecessary characters
>
> This pushes the limits of my understanding of URIs, as I did not know
> we had to consider '%2E%2E' the same as '..'. However, the RFC is not
> very clear:
>
> 2.3: Unreserved Characters:
>    For consistency, percent-encoded octets in the ranges of ALPHA
>    (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
>    underscore (%5F), or tilde (%7E) should not be created by URI
>    producers and, when found in a URI, should be decoded to their
>    corresponding unreserved characters by URI normalizers.
>
> 5.2.1: Pre-parse the Base URI:
>    Normalization of the base URI, as described in Sections 6.2.2 and
>    6.2.3, is optional.  A URI reference must be transformed to its
>    target URI before it can be normalized.
>
> Did you find something more precise than that?  In any case, decoding
> the dots is probably the least unsafe thing to do.
>
>> 
>> The test suite no longer needs to check for incorrect output either:
>> > ;; The test suite checks for ';' characters, but Guile escapes
>> > ;; them in URIs. Same for '='.
>> 
>> ----
>> 
>> resolve-relative-reference:
>> I rewrote this procedure so it is shorter.
>> I also added #:strict? to toggle "strict parser" as mentioned in the
>> RFC.
>
> As far as I understand, your code is correct. The tests pass.
>
> Thank you again!
>
> Best regards,
>
> Vivien
