From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: taylanbayirli@gmail.com ("Taylan Ulrich Bayırlı/Kammer")
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#26059: utf16->string and utf32->string don't conform to R6RS
Date: Sat, 11 Mar 2017 17:26:42 +0100
Message-ID: <87bmt74w59.fsf@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
To: 26059@debbugs.gnu.org
X-GNU-PR-Message: report 26059
X-GNU-PR-Package: guile
Precedence: list
List-Id: "Bug reports for GUILE, GNU's Ubiquitous Extension Language"
Xref: news.gmane.org gmane.lisp.guile.bugs:8670

See the R6RS Libraries document page 10.  The differences:

- R6RS supports reading a BOM.

- R6RS mandates an endianness argument to specify the behavior in the
  absence of a BOM.

- R6RS allows an optional third argument 'endianness-mandatory' to
  explicitly ignore any possible BOM.

Here's a quick patch on top of master to implement the R6RS procedures
in terms of the Guile procedures and export them with a rename from
(rnrs bytevectors).

===File /home/taylan/src/guile/guile-master/0001-Fix-R6RS-utf16-string-and-utf32-string.patch===
From f51cd1d4884caafb1ed0072cd77c0e3145f34576 Mon Sep 17 00:00:00 2001
From: Taylan Ulrich Bayırlı/Kammer
Date: Fri, 10 Mar 2017 22:36:55 +0100
Subject: [PATCH] Fix R6RS utf16->string and utf32->string.

* module/rnrs/bytevectors.scm (read-bom16, read-bom32): New procedures.
(r6rs-utf16->string, r6rs-utf32->string): Ditto.
---
 module/rnrs/bytevectors.scm | 52 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/module/rnrs/bytevectors.scm b/module/rnrs/bytevectors.scm
index 9744359f0..997a8c9cb 100644
--- a/module/rnrs/bytevectors.scm
+++ b/module/rnrs/bytevectors.scm
@@ -69,7 +69,9 @@
            bytevector-ieee-double-native-set!
 
            string->utf8 string->utf16 string->utf32
-           utf8->string utf16->string utf32->string))
+           utf8->string
+           (r6rs-utf16->string . utf16->string)
+           (r6rs-utf32->string . utf32->string)))
 
 
 (load-extension (string-append "libguile-" (effective-version))
@@ -80,4 +82,52 @@
         `(quote ,sym)
         (error "unsupported endianness" sym)))
 
+(define (read-bom16 bv)
+  (let ((c0 (bytevector-u8-ref bv 0))
+        (c1 (bytevector-u8-ref bv 1)))
+    (cond
+     ((and (= c0 #xFE) (= c1 #xFF))
+      'big)
+     ((and (= c0 #xFF) (= c1 #xFE))
+      'little)
+     (else
+      #f))))
+
+(define r6rs-utf16->string
+  (case-lambda
+    ((bv default-endianness)
+     (let ((bom-endianness (read-bom16 bv)))
+       (if (not bom-endianness)
+           (utf16->string bv default-endianness)
+           (substring/shared (utf16->string bv bom-endianness) 1))))
+    ((bv endianness endianness-mandatory?)
+     (if endianness-mandatory?
+         (utf16->string bv endianness)
+         (r6rs-utf16->string bv endianness)))))
+
+(define (read-bom32 bv)
+  (let ((c0 (bytevector-u8-ref bv 0))
+        (c1 (bytevector-u8-ref bv 1))
+        (c2 (bytevector-u8-ref bv 2))
+        (c3 (bytevector-u8-ref bv 3)))
+    (cond
+     ((and (= c0 #x00) (= c1 #x00) (= c2 #xFE) (= c3 #xFF))
+      'big)
+     ((and (= c0 #xFF) (= c1 #xFE) (= c2 #x00) (= c3 #x00))
+      'little)
+     (else
+      #f))))
+
+(define r6rs-utf32->string
+  (case-lambda
+    ((bv default-endianness)
+     (let ((bom-endianness (read-bom32 bv)))
+       (if (not bom-endianness)
+           (utf32->string bv default-endianness)
+           (substring/shared (utf32->string bv bom-endianness) 1))))
+    ((bv endianness endianness-mandatory?)
+     (if endianness-mandatory?
+         (utf32->string bv endianness)
+         (r6rs-utf32->string bv endianness)))))
+
 ;;; bytevector.scm ends here
-- 
2.11.0

============================================================
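
For illustration only, here is a minimal sketch of how the three rules listed at the top of this report pick the endianness for UTF-16. The names `utf16-bom` and `effective-endianness` are hypothetical and are not part of the patch. The sketch also assumes a length guard that the patch's `read-bom16` does not have, so that an empty or one-byte bytevector is treated as having no BOM instead of raising an out-of-range error. It only selects the endianness; it does no decoding.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper (not in the patch): return 'big or 'little when BV
;; starts with a UTF-16 BOM, otherwise #f.  The length check is an added
;; assumption so that short bytevectors count as "no BOM".
(define (utf16-bom bv)
  (and (>= (bytevector-length bv) 2)
       (let ((c0 (bytevector-u8-ref bv 0))
             (c1 (bytevector-u8-ref bv 1)))
         (cond ((and (= c0 #xFE) (= c1 #xFF)) 'big)
               ((and (= c0 #xFF) (= c1 #xFE)) 'little)
               (else #f)))))

;; Hypothetical helper: the endianness a decoder following the three rules
;; would use.  A BOM wins over DEFAULT unless MANDATORY? is true.
(define* (effective-endianness bv default #:optional (mandatory? #f))
  (if mandatory?
      default
      (or (utf16-bom bv) default)))

;; BOM present, two-argument form: the BOM decides.
(effective-endianness #vu8(#xFF #xFE #x61 #x00) 'big)     ; => little
;; No BOM: the endianness argument decides.
(effective-endianness #vu8(#x00 #x61) 'big)               ; => big
;; endianness-mandatory is true: any BOM is ignored.
(effective-endianness #vu8(#xFF #xFE #x61 #x00) 'big #t)  ; => big
;; Too short to hold a BOM: falls back to the argument.
(effective-endianness #vu8() 'little)                     ; => little
```

The UTF-32 case would follow the same shape with a four-byte check.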