From: Vivien Kraus
Newsgroups: gmane.lisp.guile.devel
Subject: Re: [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
Date: Thu, 02 Nov 2023 21:48:55 +0100
Message-ID: <92c8a960b091300107e523126887fd1c246a0fda.camel@planete-kraus.eu>
References: <87fs1n53de.fsf@nborghese.com>
In-Reply-To: <87fs1n53de.fsf@nborghese.com>
To: Nathan, guile-devel@gnu.org

Hello Nathan!

On Thursday, 2 November 2023 at 16:00 -0400, Nathan wrote:
> There is a problem and I fixed it by rewriting a bunch of code myself
> because I need similar code.

Thank you!

> remove-dot-segments:
> You cannot split-and-decode-uri-path and then
> encode-and-join-uri-path.
> Those are terrible functions that don't work on all URIs.
> URI schemes are allowed to specify that certain reserved characters
> (sub-delims) are special.
> In that case, a sub-delim that IS escaped is different from a
> sub-delim that IS NOT escaped.
>
> Example input to your remove-dot-segments:
> (resolve-relative-reference (string->uri-reference "/")
>   (string->uri-reference "excitement://a.com/a!a!%21!"))
> Your wrong output:
> excitement://a.com/a%21a%21%21%21

I see.

>
> One solution would be to only percent-decode dots. Because dot is
> unreserved, that solution doesn't have any URI equivalence issues.
> But I still think decoding dots automatically is a bad, unexpected
> side-effect to have.
> I rewrote this function so that it:
> - works on both escaped and unescaped dots
> - doesn't unescape any unnecessary characters

This pushes the limits of my understanding of URIs: I did not know
that '%2E%2E' had to be treated the same as '..'. However, the RFC is
not very clear on this point:

2.3: Unreserved Characters:

   For consistency, percent-encoded octets in the ranges of ALPHA
   (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
   underscore (%5F), or tilde (%7E) should not be created by URI
   producers and, when found in a URI, should be decoded to their
   corresponding unreserved characters by URI normalizers.

5.2.1: Pre-parse the Base URI:

   Normalization of the base URI, as described in Sections 6.2.2 and
   6.2.3, is optional. A URI reference must be transformed to its
   target URI before it can be normalized.

Did you find something more precise than that?

In any case, decoding the dots is probably the least unsafe thing to
do.

>
> The test suite no longer needs to check for incorrect output either:
> > ;; The test suite checks for ';' characters, but Guile escapes
> > ;; them in URIs. Same for '='.
>
> ----
>
> resolve-relative-reference:
> I rewrote this procedure so it is shorter.
> I also added #:strict? to toggle "strict parser" as mentioned in the
> RFC.

As far as I understand, your code is correct. The tests pass.

Thank you again!

Best regards,

Vivien