From: Vivien Kraus via "Bug reports for GUILE, GNU's Ubiquitous Extension Language" <bug-guile@gnu.org>
To: 49085@debbugs.gnu.org
Subject: bug#49085: Handle more unicode characters in uri-decode
Date: Fri, 18 Jun 2021 12:27:46 +0200
Message-ID: <103527ae65edab29b1ff860c214499104729d00c.camel@planete-kraus.eu>
[-- Attachment #1: Type: text/plain, Size: 180 bytes --]
Dear Guilers,
uri-decode should accept Unicode characters in its input. This is
tricky to implement in a way that still supports encodings other than
UTF-8, but here is my solution.
Best regards,
Vivien
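To make the difficulty concrete, here is a small sketch (not part of the patch) using string->bytevector and bytevector->string from (ice-9 iconv), the procedures the patch relies on. The helper fails? is hypothetical and catches any exception, whereas the patch itself catches the specific keys 'encoding-error and 'decoding-error. It shows that the same character yields different bytes in different encodings, and that each of the two failure cases described in the patch surfaces as an exception.

```scheme
(use-modules (ice-9 iconv))  ; string->bytevector, bytevector->string

;; The same non-ASCII character becomes different bytes in different
;; encodings, so uri-decode cannot copy it into its byte buffer blindly.
(define e-acute-utf8   (string->bytevector "é" "UTF-8"))       ; #vu8(195 169)
(define e-acute-latin1 (string->bytevector "é" "ISO-8859-1"))  ; #vu8(233)

;; Hypothetical helper for this example: #t if calling THUNK raises
;; any exception, #f otherwise.
(define (fails? thunk)
  (catch #t
    (lambda () (thunk) #f)
    (lambda args #t)))

;; Failure 1: a character above U+00FF has no ISO-8859-1 encoding.
(define star-in-latin1-fails?
  (fails? (lambda () (string->bytevector "✫" "ISO-8859-1"))))

;; Failure 2: the byte #xFF (what "%FF" decodes to) is never valid UTF-8.
(define ff-in-utf8-fails?
  (fails? (lambda () (bytevector->string #vu8(255) "UTF-8"))))
```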
[-- Attachment #2: 0001-Let-uri-decode-handle-more-cases-with-unicode-input.patch --]
[-- Type: text/x-patch, Size: 2865 bytes --]
From 7adb0e333624c617f7c95796053df60d0a504ec9 Fri Jun 18 11:57:13 2021
From: Vivien Kraus <vivien@planete-kraus.eu>
Date: Fri, 18 Jun 2021 11:57:13 +0200
Subject: [PATCH] Let uri-decode handle more cases with unicode input.
Unicode characters can appear in URIs (strictly speaking, in IRIs).
However, since the caller of uri-decode may choose an encoding other
than UTF-8, decoding can fail in two different ways:
1. a character in the URI cannot be represented in the specified
encoding, for instance a character above U+00FF with ISO-8859-1;
2. a percent-encoded sequence cannot be decoded, for instance %FF
alone in a UTF-8 URI.
---
module/web/uri.scm | 17 ++++++++++++++---
test-suite/tests/web-uri.test | 11 +++++++++++
2 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/module/web/uri.scm b/module/web/uri.scm
index 8e0b9bee7..bb3573fab 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -424,10 +424,21 @@ ENCODING was ‘#f’."
(put-u8 port (char->integer ch))
(lp (1+ i)))
(else
- (uri-error "Invalid character in encoded URI ~a: ~s"
- str ch))))))))))
+ (let ((encoded
+ (catch 'encoding-error
+ (lambda ()
+ (string->bytevector (string ch) encoding))
+ (lambda error
+ (uri-error "Invalid character in encoded URI ~a: ~s"
+ str ch)))))
+ (put-bytevector port encoded)
+ (lp (1+ i))))))))))))
(if encoding
- (bytevector->string bv encoding)
+ (catch 'decoding-error
+ (lambda ()
+ (bytevector->string bv encoding))
+ (lambda error
+ (uri-error "Invalid character in encoded URI ~a" str)))
;; Otherwise return raw bytevector
bv)))
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 95fd82f16..dc1da79e6 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -686,6 +686,17 @@
(pass-if "foo+bar"
(equal? "foo bar" (uri-decode "foo+bar")))
+ (pass-if "foo✫bar"
+ (equal? "foo✫bar" (uri-decode "foo✫bar")))
+
+ (pass-if-uri-exception "foo✫bar in non-utf8 encoding"
+ "Invalid character in encoded URI ~a: ~s"
+ (uri-decode "foo✫bar" #:encoding "ISO-8859-1"))
+
+ (pass-if-uri-exception "foo%FFbar in utf-8"
+ "Invalid character in encoded URI ~a"
+ (uri-decode "foo%FFbar"))
+
(pass-if "foo+bar"
(equal? '("foo+bar") (split-and-decode-uri-path "foo+bar"))))
--
2.32.0
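For readers who want to try the idea without patching Guile, here is a minimal standalone sketch of the strategy the patch implements, assuming Guile's (ice-9 iconv) and (ice-9 binary-ports) modules: percent escapes and ASCII characters are written as single bytes, any other character is encoded with the requested encoding, and the collected bytes are decoded at the end. The names percent-decode-sketch and hex-char? are hypothetical. Unlike uri-decode, the sketch does not turn "+" into a space, signals plain errors instead of uri-error, and requires ENCODING to be a string (there is no raw-bytevector mode).

```scheme
(use-modules (ice-9 iconv)          ; string->bytevector, bytevector->string
             (ice-9 binary-ports))  ; open-bytevector-output-port, put-u8, put-bytevector

;; Hypothetical helper: is C an ASCII hexadecimal digit?
(define (hex-char? c)
  (or (char<=? #\0 c #\9)
      (char<=? #\a c #\f)
      (char<=? #\A c #\F)))

;; Hypothetical, simplified model of the patched uri-decode.
(define (percent-decode-sketch str encoding)
  (let ((len (string-length str)))
    (call-with-values open-bytevector-output-port
      (lambda (port get-bytes)
        (let lp ((i 0))
          (when (< i len)
            (let ((ch (string-ref str i)))
              (cond
               ;; "%XY" escape: write the byte #xXY unchanged.
               ((and (eqv? ch #\%)
                     (< (+ i 2) len)
                     (hex-char? (string-ref str (+ i 1)))
                     (hex-char? (string-ref str (+ i 2))))
                (put-u8 port (string->number (substring str (+ i 1) (+ i 3)) 16))
                (lp (+ i 3)))
               ;; ASCII character: one byte, as before the patch.
               ((< (char->integer ch) 128)
                (put-u8 port (char->integer ch))
                (lp (+ i 1)))
               ;; Any other character: encode it with ENCODING (the patch's
               ;; first hunk); unrepresentable characters are an error.
               (else
                (put-bytevector
                 port
                 (catch 'encoding-error
                   (lambda () (string->bytevector (string ch) encoding))
                   (lambda args
                     (error "Character not representable in encoding:" ch encoding))))
                (lp (+ i 1)))))))
        ;; Decode all collected bytes at once (the patch's second hunk);
        ;; invalid byte sequences are an error.
        (catch 'decoding-error
          (lambda () (bytevector->string (get-bytes) encoding))
          (lambda args
            (error "Percent-decoded bytes are invalid in encoding:" str encoding)))))))
```

With this sketch, "foo✫bar" decodes to itself in UTF-8, while "foo✫bar" in ISO-8859-1 and "foo%FFbar" in UTF-8 both raise an error, mirroring the new test cases. Encoding each character separately keeps literal characters and percent escapes in one byte stream, so a URI that mixes both still decodes as a single string.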