From: John Cowan
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#33044: Guile misbehaves in the "ja_JP.sjis" locale
Date: Tue, 16 Oct 2018 08:52:59 -0400
To: Mark H Weaver
Cc: 33044@debbugs.gnu.org, tdevries@suse.de
In-Reply-To: <87tvlmo4mw.fsf@netris.org>


> At this point, I'm inclined to believe that Shift_JIS is not suitable as
> a locale encoding on POSIX systems, and that we should not try to
> support it in Guile.
>
> What do you think?
>
> Can you tell me how backslash and tilde are represented in Shift JIS?

They aren't: iconv is right. Japanese Windows users are used to seeing Windows pathnames that look like "C:¥foo¥bar" and, when writing C, strings like "first line¥nsecond line". So what is happening is that the character at #\x5C is *functionally* a backslash that is *displayed* as a yen sign. This is reinforced by the fact that Shift_JIS #\x5C maps round-trip to U+005C REVERSE SOLIDUS (backslash), whereas U+00A5 YEN SIGN is mapped only from Unicode (or other encodings) to Shift_JIS, never the other way around.
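A minimal Python sketch of the asymmetry just described, restricted to bytes below 0x80: byte 0x5C decodes round-trip to U+005C, while U+00A5 YEN SIGN appears only in an encode-only table that maps it to 0x5C. The names DECODE, ENCODE, ENCODE_ONLY, decode_ascii_range and encode_ascii_range are hypothetical; this models the behavior described here, not iconv's or Guile's actual code.

```python
# Hypothetical model of the ASCII-range half of a Shift_JIS converter,
# illustrating round-trip vs. one-way mappings.
# NOT iconv's or Guile's implementation; only bytes < 0x80 are covered.

# Round-trip table: each byte below 0x80 decodes to the same code point,
# so byte 0x5C decodes to U+005C REVERSE SOLIDUS (backslash).
DECODE = {b: chr(b) for b in range(0x80)}
ENCODE = {ch: b for b, ch in DECODE.items()}

# One-way (encode-only) table: U+00A5 YEN SIGN is accepted when encoding,
# but no byte ever decodes back to it.
ENCODE_ONLY = {"\u00a5": 0x5C}


def decode_ascii_range(data: bytes) -> str:
    """Decode bytes below 0x80; raises KeyError for anything else."""
    return "".join(DECODE[b] for b in data)


def encode_ascii_range(text: str) -> bytes:
    """Encode text, falling back to the one-way table (KeyError if unmappable)."""
    out = bytearray()
    for ch in text:
        out.append(ENCODE[ch] if ch in ENCODE else ENCODE_ONLY[ch])
    return bytes(out)


if __name__ == "__main__":
    raw = encode_ascii_range("C:\u00a5foo\u00a5bar")  # text typed with yen signs
    print(raw)                      # b'C:\\foo\\bar'
    print(decode_ascii_range(raw))  # C:\foo\bar  (comes back as backslashes)
```

A yen sign that goes in comes back out as a backslash, which is the "functionally a backslash, displayed as a yen sign" behavior in miniature.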

This is the last survivor of the "national characters" concept of ISO 646, whereby certain 7-bit characters were interpreted differently in different countries. For Scandinavian programmers, for example, blocks in C began with æ and ended with å rather than { and } respectively, and C's logical OR operator || was written øø. In the same way, British and Irish programmers used £ instead of # at the beginning of comments in awk and shell programs. With the arrival of Latin-{1,2,3,4} this concept was eventually abandoned, and all systems converged on ISO-646-IRV (the same as US-ASCII) *except* Japanese systems.
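The "national characters" idea can be sketched the same way: the 7-bit bytes never change, only the glyph displayed for a few code positions does. The tables below are illustrative and deliberately partial (only the positions mentioned in this message), with the Danish/Norwegian variant standing in for "Scandinavian" and JIS X 0201 Roman for Japan; VARIANTS and render are made-up names.

```python
# Illustrative, deliberately partial ISO 646 national-variant tables:
# only the code positions mentioned in the message are listed.
VARIANTS = {
    "IRV": {},                                   # ISO 646 IRV (same as US-ASCII here)
    "DK/NO": {0x7B: "æ", 0x7C: "ø", 0x7D: "å"},  # Danish/Norwegian
    "GB": {0x23: "£"},                           # British
    "JP": {0x5C: "¥"},                           # JIS X 0201 Roman
}


def render(data: bytes, variant: str) -> str:
    """Show how 7-bit bytes would be *displayed* under a national variant.

    The bytes themselves never change; only the glyph for a few positions does.
    """
    table = VARIANTS[variant]
    return "".join(table.get(b, chr(b)) for b in data)


if __name__ == "__main__":
    c_block = b"{ if (a || b) x(); }"
    print(render(c_block, "IRV"))          # { if (a || b) x(); }
    print(render(c_block, "DK/NO"))        # æ if (a øø b) x(); å
    print(render(b"# comment", "GB"))      # £ comment
    print(render(b"C:\\foo\\bar", "JP"))   # C:¥foo¥bar
```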

So I recommend that you do what everyone else does and ignore the issue in JIS-based encodings, of which Shift_JIS is the only one in practical use (and it _is_ heavily used in Japan, where it is almost the only encoding for documents on desktops). Dropping support for the encoding altogether is not an option in Japan: see the comments by Joel Rees, Norman Diamond, and Ryan Thompson at the bug you pointed to.
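One practical reading of "ignore the issue" is to let the converter treat byte 0x5C as a plain backslash, so path and escape handling keep working even though Japanese users see a yen sign. A small check of that with Python's own shift_jis codec, assuming a standard CPython build (which decodes bytes below 0x80 as ASCII); the stdlib ntpath module is used only to split a Windows-style path.

```python
import ntpath

# Bytes as stored; a Japanese Windows UI would display them as C:¥foo¥bar.
raw = b"C:\\foo\\bar"

# CPython's shift_jis codec decodes 0x5C as U+005C, i.e. a real backslash.
text = raw.decode("shift_jis")
print(text)                # C:\foo\bar
print(ntpath.split(text))  # ('C:\\foo', 'bar'): path splitting still works

# Round trip: encoding back yields the original bytes.
assert text.encode("shift_jis") == raw
```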

-- 
John Cowan          http://vrici.lojban.org/~cowan        cowan@ccil.org
In might the Feanorians / that swore the unforgotten oath
brought war into Arvernien / with burning and with broken troth.
and Elwing from her fastness dim / then cast her in the waters wide,
but like a mew was swiftly borne, / uplifted o'er the roaring tide.
        --the Earendillinwe
