* bug#62290: Error when handling invalid unicode with suspendable ports
@ 2023-03-20 9:09 Christopher Baines
From: Christopher Baines @ 2023-03-20 9:09 UTC
To: 62290
Here's a simple reproducer:
(use-modules (ice-9 binary-ports)
             (ice-9 suspendable-ports)
             (rnrs bytevectors))

(define (test)
  (let* ((sequence
          '(#xf4 #xa4 #xbd #xa4))
         (p (open-bytevector-input-port
             (u8-list->bytevector sequence))))
    (set-port-encoding! p "UTF-8")
    (set-port-conversion-strategy! p 'substitute)
    (peek (read-char p))))

(test)

(install-suspendable-ports!)

(test)
If you run it, the first call prints #\� as expected, but the second
call, made after installing suspendable ports, raises an exception. Both
calls should behave the same way.
;;; (#\�)
Backtrace:
In ice-9/boot-9.scm:
  1752:10  8 (with-exception-handler _ _ #:unwind? _ # _)
In unknown file:
           7 (apply-smob/0 #<thunk 7f3f09cbe300>)
In ice-9/boot-9.scm:
    724:2  6 (call-with-prompt ("prompt") #<procedure 7f3f09ccb320 …> …)
In ice-9/eval.scm:
    619:8  5 (_ #(#(#<directory (guile-user) 7f3f09cc1c80>)))
In ice-9/boot-9.scm:
   2836:4  4 (save-module-excursion #<procedure 7f3f09cb2300 at ice-…>)
  4388:12  3 (_)
In /home/chris/Projects/Guile/guile/bad-unicode.scm:
    12:10  2 (test)
In ice-9/suspendable-ports.scm:
   591:33  1 (read-char _)
   499:12  0 (peek-char-and-next-cur/utf8 _ _ _ _)
ice-9/suspendable-ports.scm:499:12: In procedure peek-char-and-next-cur/utf8:
In procedure integer->char: Argument 1 out of range: 1199972
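The out-of-range value in the error can be reproduced by hand. The sketch below applies the standard UTF-8 four-byte bit layout to the bytes from the reproducer; `decode-4-bytes` is a hypothetical helper name used only for illustration, not a procedure from suspendable-ports.scm.

```scheme
;; Illustrative only: the standard four-byte UTF-8 bit layout, applied
;; without any validity check.  `decode-4-bytes` is a hypothetical name.
(define (decode-4-bytes u8_0 u8_1 u8_2 u8_3)
  (logior (ash (logand u8_0 #x07) 18)   ; 3 payload bits from the lead byte
          (ash (logand u8_1 #x3f) 12)   ; 6 payload bits from each
          (ash (logand u8_2 #x3f) 6)    ; continuation byte
          (logand u8_3 #x3f)))

(define code-point (decode-4-bytes #xf4 #xa4 #xbd #xa4))

(display code-point) (newline)               ; prints 1199972
(display (> code-point #x10ffff)) (newline)  ; prints #t
```

The result, 1199972 (#x124f64), is above #x10ffff, the largest Unicode code point, which is why integer->char rejects it once the byte sequence has been wrongly accepted as valid.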
* bug#62290: [PATCH] Fix some invalid unicode handling issues with suspendable ports.
@ 2023-03-20 9:15 ` Christopher Baines
From: Christopher Baines @ 2023-03-20 9:15 UTC
To: 62290
Based on the implementation in ports.c. I don't understand what this
code is really doing, but the suspendable ports implementation differs
from the similar C code for a couple of inequalities.
* module/ice-9/suspendable-ports.scm (decode-utf8, bad-utf8-len): Flip a
couple of inequalities.
* test-suite/tests/ports.test ("string ports"): Add additional invalid
UTF-8 test case.
---
module/ice-9/suspendable-ports.scm | 8 ++++----
test-suite/tests/ports.test | 7 +++++++
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/module/ice-9/suspendable-ports.scm b/module/ice-9/suspendable-ports.scm
index a823f1d37..9fac1df62 100644
--- a/module/ice-9/suspendable-ports.scm
+++ b/module/ice-9/suspendable-ports.scm
@@ -419,7 +419,7 @@
(= (logand u8_2 #xc0) #x80)
(case u8_0
((#xe0) (>= u8_1 #xa0))
- ((#xed) (>= u8_1 #x9f))
+ ((#xed) (<= u8_1 #x9f))
(else #t)))
(kt (integer->char
(logior (ash (logand u8_0 #x0f) 12)
@@ -436,7 +436,7 @@
(= (logand u8_3 #xc0) #x80)
(case u8_0
((#xf0) (>= u8_1 #x90))
- ((#xf4) (>= u8_1 #x8f))
+ ((#xf4) (<= u8_1 #x8f))
(else #t)))
(kt (integer->char
(logior (ash (logand u8_0 #x07) 18)
@@ -462,7 +462,7 @@
((< buffering 2) 1)
((not (= (logand (ref 1) #xc0) #x80)) 1)
((and (eq? first-byte #xe0) (< (ref 1) #xa0)) 1)
- ((and (eq? first-byte #xed) (< (ref 1) #x9f)) 1)
+ ((and (eq? first-byte #xed) (> (ref 1) #x9f)) 1)
((< buffering 3) 2)
((not (= (logand (ref 2) #xc0) #x80)) 2)
(else 0)))
@@ -471,7 +471,7 @@
((< buffering 2) 1)
((not (= (logand (ref 1) #xc0) #x80)) 1)
((and (eq? first-byte #xf0) (< (ref 1) #x90)) 1)
- ((and (eq? first-byte #xf4) (< (ref 1) #x8f)) 1)
+ ((and (eq? first-byte #xf4) (> (ref 1) #x8f)) 1)
((< buffering 3) 2)
((not (= (logand (ref 2) #xc0) #x80)) 2)
((< buffering 4) 3)
diff --git a/test-suite/tests/ports.test b/test-suite/tests/ports.test
index 66e10e3dd..1b30e1a68 100644
--- a/test-suite/tests/ports.test
+++ b/test-suite/tests/ports.test
@@ -1059,6 +1059,13 @@
eof))
(test-decoding-error (#xf0 #x88 #x88 #x88) "UTF-8"
+ (error ;; 2nd byte should be in the 90..BF range
+ error ;; 88: not a valid starting byte
+ error ;; 88: not a valid starting byte
+ error ;; 88: not a valid starting byte
+ eof))
+
+ (test-decoding-error (#xf4 #xa4 #xbd #xa4) "UTF-8"
(error ;; 2nd byte should be in the 90..BF range
error ;; 88: not a valid starting byte
error ;; 88: not a valid starting byte
--
2.39.1
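For readers wondering what the flipped inequalities encode: UTF-8 restricts the second byte of a sequence for four particular lead bytes. The sketch below restates those ranges as a standalone predicate. It is a hypothetical helper, not code from Guile, and it assumes the lead byte has already been identified as a valid multi-byte lead (#xc2 to #xf4).

```scheme
;; Hypothetical helper, not part of Guile: returns #t when NEXT is an
;; acceptable second byte after the multi-byte lead byte LEAD.
(define (valid-second-byte? lead next)
  (case lead
    ((#xe0) (<= #xa0 next #xbf))    ; rules out overlong 3-byte forms
    ((#xed) (<= #x80 next #x9f))    ; rules out surrogates U+D800..U+DFFF
    ((#xf0) (<= #x90 next #xbf))    ; rules out overlong 4-byte forms
    ((#xf4) (<= #x80 next #x8f))    ; keeps the result at or below U+10FFFF
    (else (<= #x80 next #xbf))))    ; ordinary continuation byte

(display (valid-second-byte? #xf4 #xa4)) (newline)  ; prints #f
(display (valid-second-byte? #xf4 #x8f)) (newline)  ; prints #t
(display (valid-second-byte? #xed #xa0)) (newline)  ; prints #f
```

For #xed and #xf4 the limit is an upper bound on the second byte, while for #xe0 and #xf0 it is a lower bound, which is consistent with the patch changing only the #xed and #xf4 comparisons.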
* bug#62290: Error when handling invalid unicode with suspendable ports
@ 2023-03-20 22:27 ` Ludovic Courtès
From: Ludovic Courtès @ 2023-03-20 22:27 UTC
To: Christopher Baines; +Cc: 62290-done
Hello,
Christopher Baines <mail@cbaines.net> skribis:
> Based on the implementation in ports.c. I don't understand what this
> code is really doing, but the suspendable ports implementation differs
> from the similar C code for a couple of inequalities.
>
> * module/ice-9/suspendable-ports.scm (decode-utf8, bad-utf8-len): Flip a
> couple of inequalities.
> * test-suite/tests/ports.test ("string ports"): Add additional invalid
> UTF-8 test case.
Pushed as cba2e7e3fec3c781230570f5d1ef070625eeeda8.
Thanks for documenting the problem and providing a perfect patch!
Ludo’.