From mboxrd@z Thu Jan 1 00:00:00 1970
From: taylanbayirli@gmail.com ("Taylan Ulrich Bayırlı/Kammer")
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#26058: utf16->string and utf32->string don't conform to R6RS
Date: Sat, 11 Mar 2017 13:19:44 +0100
Message-ID: <87o9x83t0f.fsf@gmail.com>
To: 26058@debbugs.gnu.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

See the R6RS Libraries document page 10.  The differences:

- R6RS supports reading a BOM.

- R6RS mandates an endianness argument to specify the behavior in the
  absence of a BOM.

- R6RS allows an optional third argument 'endianness-mandatory' to
  explicitly ignore any possible BOM.

Here's a quick patch on top of master.  I didn't test it thoroughly...

===File /home/taylan/src/guile/guile-master/0001-Fix-R6RS-utf16-string-and-utf32-string.patch===
>From f51cd1d4884caafb1ed0072cd77c0e3145f34576 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Taylan=20Ulrich=20Bay=C4=B1rl=C4=B1/Kammer?=
Date: Fri, 10 Mar 2017 22:36:55 +0100
Subject: [PATCH] Fix R6RS utf16->string and utf32->string.

* module/rnrs/bytevectors.scm (read-bom16, read-bom32): New procedures.
(r6rs-utf16->string, r6rs-utf32->string): Ditto.
---
 module/rnrs/bytevectors.scm | 52 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/module/rnrs/bytevectors.scm b/module/rnrs/bytevectors.scm
index 9744359f0..997a8c9cb 100644
--- a/module/rnrs/bytevectors.scm
+++ b/module/rnrs/bytevectors.scm
@@ -69,7 +69,9 @@
            bytevector-ieee-double-native-set!
 
            string->utf8 string->utf16 string->utf32
-           utf8->string utf16->string utf32->string))
+           utf8->string
+           (r6rs-utf16->string . utf16->string)
+           (r6rs-utf32->string . utf32->string)))
 
 
 (load-extension (string-append "libguile-" (effective-version))
@@ -80,4 +82,52 @@
       `(quote ,sym)
       (error "unsupported endianness" sym)))
 
+(define (read-bom16 bv)
+  (let ((c0 (bytevector-u8-ref bv 0))
+        (c1 (bytevector-u8-ref bv 1)))
+    (cond
+     ((and (= c0 #xFE) (= c1 #xFF))
+      'big)
+     ((and (= c0 #xFF) (= c1 #xFE))
+      'little)
+     (else
+      #f))))
+
+(define r6rs-utf16->string
+  (case-lambda
+    ((bv default-endianness)
+     (let ((bom-endianness (read-bom16 bv)))
+       (if (not bom-endianness)
+           (utf16->string bv default-endianness)
+           (substring/shared (utf16->string bv bom-endianness) 1))))
+    ((bv endianness endianness-mandatory?)
+     (if endianness-mandatory?
+         (utf16->string bv endianness)
+         (r6rs-utf16->string bv endianness)))))
+
+(define (read-bom32 bv)
+  (let ((c0 (bytevector-u8-ref bv 0))
+        (c1 (bytevector-u8-ref bv 1))
+        (c2 (bytevector-u8-ref bv 2))
+        (c3 (bytevector-u8-ref bv 3)))
+    (cond
+     ((and (= c0 #x00) (= c1 #x00) (= c2 #xFE) (= c3 #xFF))
+      'big)
+     ((and (= c0 #xFF) (= c1 #xFE) (= c2 #x00) (= c3 #x00))
+      'little)
+     (else
+      #f))))
+
+(define r6rs-utf32->string
+  (case-lambda
+    ((bv default-endianness)
+     (let ((bom-endianness (read-bom32 bv)))
+       (if (not bom-endianness)
+           (utf32->string bv default-endianness)
+           (substring/shared (utf32->string bv bom-endianness) 1))))
+    ((bv endianness endianness-mandatory?)
+     (if endianness-mandatory?
+         (utf32->string bv endianness)
+         (r6rs-utf32->string bv endianness)))))
+
 ;;; bytevector.scm ends here
-- 
2.11.0
============================================================
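
One case the quick patch probably wants covered before it goes in: read-bom16 and read-bom32 index bytes 0..1 and 0..3 unconditionally, so a bytevector shorter than a BOM (including the empty one) would presumably raise an out-of-range error instead of falling back to the default endianness. Below is a minimal sketch of a length-checked BOM probe. The names bytevector-prefix?, safe-read-bom16 and safe-read-bom32 are hypothetical and not part of the patch; only bytevector-length and bytevector-u8-ref from (rnrs bytevectors) are assumed.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper (not in the patch): #t if BV begins with the bytes
;; listed in PREFIX, #f if it does not or if BV is shorter than PREFIX.
(define (bytevector-prefix? bv prefix)
  (and (>= (bytevector-length bv) (length prefix))
       (let loop ((i 0) (rest prefix))
         (or (null? rest)
             (and (= (bytevector-u8-ref bv i) (car rest))
                  (loop (+ i 1) (cdr rest)))))))

;; Length-checked counterpart of read-bom16: returns 'big, 'little or #f.
(define (safe-read-bom16 bv)
  (cond ((bytevector-prefix? bv '(#xFE #xFF)) 'big)
        ((bytevector-prefix? bv '(#xFF #xFE)) 'little)
        (else #f)))

;; Length-checked counterpart of read-bom32: returns 'big, 'little or #f.
(define (safe-read-bom32 bv)
  (cond ((bytevector-prefix? bv '(#x00 #x00 #xFE #xFF)) 'big)
        ((bytevector-prefix? bv '(#xFF #xFE #x00 #x00)) 'little)
        (else #f)))

(safe-read-bom16 #vu8())                  ; => #f, no error on empty input
(safe-read-bom16 #vu8(#xFE #xFF #x00 65)) ; => big
(safe-read-bom32 #vu8(#xFF #xFE))         ; => #f, too short for a UTF-32 BOM
```

With probes like these, the two-argument branches of r6rs-utf16->string and r6rs-utf32->string could stay as they are; only the BOM readers would change.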