Op 25-09-2023 om 18:48 schreef Vivien Kraus: > * module/web/uri.scm (remove-dot-segments): Implement algorithm 5.2.4. > (merge-paths): Implement algorithm 5.2.3. > (resolve-relative-reference): Implement algorithm 5.2.2. > (module): Export resolve-relative-reference. > * NEWS: Reference it here. > --- > Dear Guile developers, > > When you request https://example.com/resource an URI and get redirected > to "here", you end up with 2 URI references: > > - https://example.com/resource > - here > > What should you request next? The answer is, > "https://example.com/here". It seems evident how we go from one to the > other. > > However, there are more subtle cases. What if you get redirected to > "../here", for instance? > > RFC 3986 has you covered, in section 5.2. It explains how we go from a > base URI and a URI reference to the new URI. > What do you think? > > Best regards, Sounds very useful. However, there are also some dangers on doing this thing -- the ‘external’ page https://example.com/data.json could redirect to http://localhost/unsecured-secret-but-its-localhost-only-so-it-is-safe. I forgot the name of this attack, but there is probably a page somewhere that documents the danger and how to mitigate it (I think I read some Firefox documentation somewhere?). Perhaps there exists an informative RFC about it? I think it put a warning about this somewhere in the documentation. (Another related problem is that example.com could have IP address ::1, but that's a different problem.) > > Vivien > > NEWS | 7 ++ > module/web/uri.scm | 152 +++++++++++++++++++++++++++++++++- > test-suite/tests/web-uri.test | 68 +++++++++++++++ You forgot modifying the non-docstring documentation to properly document the new procedure. > 3 files changed, 226 insertions(+), 1 deletion(-) Given the existence of resolve-relative-reference, it is easy to expect http-request to do redirection. I think it would be best to adjust to the documentation of http-request / http-get / ... to mention whether automatic redirection is done or not. > +(define (resolve-relative-reference base relative) > + "Resolve @var{relative} on top of @var{base}, as RFC3986, section 5.2." I don't like the mutation, but it's a completely deterministic procedure (ignoring memory allocation) so it can't cause problems and hence I suppose it's not worth rewriting (unless you or someone else really wants to rewrite it, I suppose). I suppose it also avoids the risk of accidentally deviating from the RFC it is supposed to implement, which is major advantage of sticking with the mutation. I like that you say _which_ resolution method you are using instead of saying or implying that this is the always the _right_ way of resolving relative references, because some URI schemes are rather quirky. (I don't know any quirkiness w.r.t. relative references, but wouldn't be surprised if it exists.) Also I think it's worth stating that both base and relative are URIs -- with the current docstring, I find (resolve-... uri "./whatever") a reasonable thing to do. IIUC, there currently is nothing preventing (resolve-... (uri object for "https://example.com/a") (uri object for "https://example.com/b")). IIUC, this is supposed to return (uri object for "https://example.com/b"), but that could be more explicit with a change of variable name. (define (resolve-... base maybe-relative) [...]) > +(with-test-prefix "resolve relative reference" > + ;; Test suite in RFC3986, section 5.4. > + (let ((base (string->uri "http://a/b/c/d;p?q")) > + (equal/encoded? > + ;; The test suite checks for ';' characters, but Guile escapes > + ;; them in URIs. Same for '='. IIUC, that's a bug! 6.2.2.2. Percent-Encoding Normalization The percent-encoding mechanism (Section 2.1) is a frequent source of variance among otherwise identical URIs. In addition to the case normalization issue noted above, some URI producers percent-encode octets that do not require percent-encoding, resulting in URIs that are equivalent to their non-encoded counterparts. __These URIs __should be normalized by decoding any percent-encoded octet that corresponds to an unreserved character, as described in Section 2.3.__ (Emphasis added.) I am assuming here that ; is an unreserved character, if it isn't, there isn't a bug here. However, I sense a lack of normalisation in resolve-relative-reference, so unless Guile already does normalisation elsewhere (perhaps it does during the URI object construction?), there might be a bug here -- ok, technically perhaps not a bug because the docstring only mentions ‘implements section 5.2 and 5.2. doesn't seem to mention section 6 (and section 6 says ‘should’, not ‘shall/must’, but some people use ‘should’ as ‘shall/must’, so dunno), but in that case that's still a potential pitfall that could be mentioned in the docstring. Could be as simple as "No normalisation is performed.". I guess it shouldn't do normalisation, but guesswork seems better to be avoided/confirmed or disconfirmed when possible. Best regards, Maxime Devos.