From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark H Weaver
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#33044: Guile misbehaves in the "ja_JP.sjis" locale
Date: Tue, 16 Oct 2018 01:13:43 -0400
Message-ID: <87tvlmo4mw.fsf@netris.org>
References: <469f2345-5e76-1fc5-1105-f1d508611140@suse.de>
 <8a6a308f-a981-fd46-93d5-c2d2870f4eb4@suse.de>
 <87y3ayodqp.fsf_-_@netris.org>
In-Reply-To: <87y3ayodqp.fsf_-_@netris.org> (Mark H. Weaver's message of
 "Mon, 15 Oct 2018 21:57:02 -0400")
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
To: Tom de Vries
Cc: 33044@debbugs.gnu.org
List-Id: "Bug reports for GUILE, GNU's Ubiquitous Extension Language"

Mark H Weaver writes:

> Shift_JIS is _mostly_ ASCII-compatible, except that code points 0x5C and
> 0x7E, which represent backslash (\) and tilde (~) in ASCII, are mapped
> to the Yen sign (¥) and overline (‾) in Shift_JIS.  Backslash (\) and
> tilde (~) are multibyte characters in Shift_JIS.

Although I wrote above that "Backslash (\) and tilde (~) are multibyte
characters in Shift_JIS", that was admittedly my assumption, based on
the absence of those characters in the "First byte" map shown here:

  https://en.wikipedia.org/wiki/Shift_JIS#As_defined_in_JIS_X_0208:1997

However, now I'm unsure.  I've spent some time attempting to find the
Shift_JIS encodings for backslash and tilde, but I've not yet found an
answer.

I've asked Emacs 26 to write a file containing backslashes and Yen signs
using the "shift_jis" encoding, and both characters seem to be mapped to
the same code: 0x5C.

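The same comparison can be sketched with a third implementation.  This
is only an illustration, assuming CPython's bundled "shift_jis" codec,
which need not agree with Emacs or GNU libc; the names SAMPLES and
sjis_hex are made up for this example.  It prints the bytes produced for
each of the four characters in question, and what the single byte 0x5C
decodes back to:

```python
# Illustrative sketch only. Assumption: CPython's bundled "shift_jis"
# codec, which may treat these characters differently from glibc or Emacs.

# The four characters discussed in this thread (hypothetical table name).
SAMPLES = {
    "backslash": "\\",
    "yen sign": "\u00a5",
    "tilde": "~",
    "overline": "\u203e",
}


def sjis_hex(text):
    """Return the Shift_JIS bytes of text as a hex string, or None if
    this codec refuses to encode it."""
    try:
        return text.encode("shift_jis").hex()
    except UnicodeEncodeError:
        return None


encoded = {name: sjis_hex(ch) for name, ch in SAMPLES.items()}
for name, hexbytes in encoded.items():
    print(f"{name:10} -> {hexbytes}")

# A byte can decode to only one character, so if two characters share a
# code, at least one of them cannot survive a round trip.
decoded_5c = bytes([0x5C]).decode("shift_jis")
print("0x5c decodes to", ascii(decoded_5c))
```

If two of the printed hex strings are equal, this codec also collapses
those two characters into one code.
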
I've also used the 'iconv' utility from GNU libc to convert backslashes
and Yen signs to Shift_JIS, and it also maps these two characters to the
same codes:

--8<---------------cut here---------------start------------->8---
mhw@jojen ~$ echo '\\¥¥' | iconv -f UTF-8 -t SHIFT-JIS > Shift_JIS_test.txt
mhw@jojen ~$ hexdump -C Shift_JIS_test.txt
00000000  5c 5c 5c 5c 0a                                    |\\\\.|
00000005
--8<---------------cut here---------------end--------------->8---

While investigating, I found this bug for GNU libc asking to add an SJIS
locale, and the developers were strongly opposed:

  https://bugzilla.redhat.com/show_bug.cgi?id=136290

At this point, I'm inclined to believe that Shift_JIS is not suitable as
a locale encoding on POSIX systems, and that we should not try to
support it in Guile.

What do you think?  Can you tell me how backslash and tilde are
represented in Shift JIS?

     Regards,
       Mark