From: Nathan via "Developers list for Guile, the GNU extensibility library"
Newsgroups: gmane.lisp.guile.devel
Subject: Re: [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
Date: Fri, 03 Nov 2023 13:49:37 -0400
Message-ID: <87o7ga905j.fsf@nborghese.com>
References: <87fs1n53de.fsf@nborghese.com> <92c8a960b091300107e523126887fd1c246a0fda.camel@planete-kraus.eu>
In-reply-to: <92c8a960b091300107e523126887fd1c246a0fda.camel@planete-kraus.eu>
Reply-To: Nathan
Cc: Nathan, guile-devel@gnu.org
To: Vivien Kraus

Hi Vivien,

> This pushes the limits of my understanding of URIs, as I did not know
> we had to consider '%2E%2E' the same as '..'. However, the RFC is not
> very clear:

I wasn't able to find anything that MANDATED any normalization at all, either before or after Relative Resolution. It is possible that treating %2E as a literal dot in resolve-relative-reference could count as unwanted normalization. But it's a safe operation in terms of URI equivalence* and I think users would be less confused to have %2E%2E disappear than to have it remain.

Also, what if the resolve-relative-reference procedure didn't treat %2E as a dot?
There isn't a uri-normalize procedure users can call afterwards to fix that.
And there isn't a version of uri-decode that allows selectively decoding JUST the dot characters.
Users would have to write a lot of code themselves to get proper relative-resolution, so we should do it for them.

- Nathan

*References for the claim that treating %2E as a literal dot is always okay:
- Section 2.3: percent-encoded unreserved characters are always equivalent to decoded ones.
- Section 2.4: unreserved characters can be percent-decoded at any time.
- Section 6.2.2.3: dot-segments should be removed during normalization even if found outside of a relative-reference.

Vivien Kraus writes:

> Hello Nathan!
>
> On Thursday, 2 November 2023 at 16:00 -0400, Nathan wrote:
>> There is a problem and I fixed it by rewriting a bunch of code myself
>> because I need similar code.
>
> Thank you!
>
>> remove-dot-segments:
>> You cannot split-and-decode-uri-path and then encode-and-join-uri-path.
>> Those are terrible functions that don't work on all URIs.
>> URI schemes are allowed to specify that certain reserved characters
>> (sub-delims) are special.
>> In that case, a sub-delim that IS escaped is different from a
>> sub-delim that IS NOT escaped.
>>
>> Example input to your remove-dot-segments:
>> (resolve-relative-reference (string->uri-reference "/") (string->uri-reference "excitement://a.com/a!a!%21!"))
>> Your wrong output:
>> excitement://a.com/a%21a%21%21%21
>
> I see.
>
>>
>> One solution would be to only percent-decode dots. Because dot is
>> unreserved, that solution doesn't have any URI equivalence issues.
>> But I still think decoding dots automatically is a bad, unexpected
>> side-effect to have.
>> I rewrote this function so that it:
>> - works on both escaped and unescaped dots
>> - doesn't unescape any unnecessary characters
>
> This pushes the limits of my understanding of URIs, as I did not know
> we had to consider '%2E%2E' the same as '..'.
> However, the RFC is not very clear:
>
> 2.3: Unreserved Characters:
> For consistency, percent-encoded octets in the ranges of ALPHA
> (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
> underscore (%5F), or tilde (%7E) should not be created by URI
> producers and, when found in a URI, should be decoded to their
> corresponding unreserved characters by URI normalizers.
>
> 5.2.1: Pre-parse the Base URI:
> Normalization of the base URI, as described in Sections 6.2.2 and
> 6.2.3, is optional. A URI reference must be transformed to its
> target URI before it can be normalized.
>
> Did you find something more precise than that? In any case, decoding
> the dots is probably the least unsafe thing to do.
>
>>
>> The test suite no longer needs to check for incorrect output either:
>> > ;; The test suite checks for ';' characters, but Guile escapes
>> > ;; them in URIs. Same for '='.
>>
>> ----
>>
>> resolve-relative-reference:
>> I rewrote this procedure so it is shorter.
>> I also added #:strict? to toggle "strict parser" as mentioned in the
>> RFC.
>
> As far as I understand, your code is correct. The tests pass.
>
> Thank you again!
>
> Best regards,
>
> Vivien
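
For illustration only, here is a minimal sketch of the dot-segment handling discussed in this thread: a segment counts as "." or ".." whether its dots are literal or written as %2E, and every other segment is passed through untouched, so sub-delims such as "!" and escapes such as "%21" stay distinct. This is not the code from the patch. The names remove-dot-segments/sketch, single-dot? and double-dot? are hypothetical, and the sketch assumes its input is an absolute path (one that starts with "/"), which is a narrower case than RFC 3986 section 5.2.4 covers.

```scheme
;; Hypothetical helpers, not part of (web uri).

(define (single-dot? seg)
  ;; "." written literally or as %2E (hex digits are case-insensitive).
  (member (string-downcase seg) '("." "%2e")))

(define (double-dot? seg)
  ;; ".." with either dot optionally percent-encoded.
  (member (string-downcase seg) '(".." ".%2e" "%2e." "%2e%2e")))

(define (remove-dot-segments/sketch path)
  ;; Assumption: PATH is absolute, i.e. it starts with "/".
  ;; Segments are compared, never decoded or re-encoded.
  (let loop ((in (cdr (string-split path #\/))) ; drop the "" before the leading "/"
             (out '()))                         ; kept segments, newest first
    (if (null? in)
        (string-append "/" (string-join (reverse out) "/"))
        (let ((seg (car in))
              (last? (null? (cdr in))))
          (cond
           ((single-dot? seg)
            ;; A trailing "." leaves a trailing slash behind.
            (loop (cdr in) (if last? (cons "" out) out)))
           ((double-dot? seg)
            ;; Drop the previous segment, but never climb above the root.
            (let ((popped (if (null? out) out (cdr out))))
              (loop (cdr in) (if last? (cons "" popped) popped))))
           (else
            (loop (cdr in) (cons seg out))))))))

(remove-dot-segments/sketch "/a/b/c/./../../g")  ; => "/a/g"
(remove-dot-segments/sketch "/a/%2e%2E/b!%21")   ; => "/b!%21"
```

Comparing segments instead of decoding the whole path is what keeps "!" and "%21" apart in the second call; only the dot segments themselves disappear from the result.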