From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Aidan Kehoe <kehoea@parhasard.net>
Newsgroups: gmane.emacs.devel,gmane.emacs.pretest.bugs
Subject: Re: [PATCH] Unicode Lisp reader escapes.
Date: Thu, 15 Jun 2006 20:38:06 +0200
Message-ID: <17553.43278.718379.863167@parhasard.net>
References: <17491.34779.959316.484740@parhasard.net>
	<E1Fa2EV-0003pC-1U@fencepost.gnu.org>
	<17492.29148.246942.842300@parhasard.net>
	<E1FaIvT-0001ST-D8@fencepost.gnu.org> <8764kkawsf.fsf@jurta.org>
	<E1FcWJv-0002xl-QM@fencepost.gnu.org> <87vesi6nh1.fsf@jurta.org>
	<E1Fe26F-0007nN-SB@fencepost.gnu.org> <878xp8g2a9.fsf@jurta.org>
	<E1FeP3a-0002Mf-J6@fencepost.gnu.org>
	<17537.54719.354843.89030@parhasard.net> <ufyieqj0v.fsf@gnu.org>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Trace: sea.gmane.org 1150396529 15287 80.91.229.2 (15 Jun 2006 18:35:29 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Thu, 15 Jun 2006 18:35:29 +0000 (UTC)
Cc: emacs-pretest-bug@gnu.org, emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Jun 15 20:35:26 2006
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by ciao.gmane.org with esmtp (Exim 4.43)
	id 1FqwgT-00064O-5j
	for ged-emacs-devel@m.gmane.org; Thu, 15 Jun 2006 20:35:13 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1FqwgS-0006PM-Pe
	for ged-emacs-devel@m.gmane.org; Thu, 15 Jun 2006 14:35:12 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1FqwgI-0006Mz-3Q
	for emacs-devel@gnu.org; Thu, 15 Jun 2006 14:35:02 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1FqwgF-0006Lm-AW
	for emacs-devel@gnu.org; Thu, 15 Jun 2006 14:35:00 -0400
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1FqwgF-0006Lj-50; Thu, 15 Jun 2006 14:34:59 -0400
Original-Received: from [66.111.49.30] (helo=icarus.asclepian.ie)
	by monty-python.gnu.org with esmtp (Exim 4.52)
	id 1Fqwpe-0003jV-4F; Thu, 15 Jun 2006 14:44:42 -0400
Original-Received: by icarus.asclepian.ie (Postfix, from userid 1003)
	id EC4AB8008C; Thu, 15 Jun 2006 19:34:53 +0100 (IST)
Original-To: Eli Zaretskii <eliz@gnu.org>
In-Reply-To: <ufyieqj0v.fsf@gnu.org>
X-Mailer: VM 7.17 under 21.5  (beta26) "endive" (+CVS-20060512) XEmacs Lucid
X-NS5-file-as-sent: t
X-Echelon-distraction: Peking Ft. Meade Cocaine bomb jihad $400 million in
	gold bullion 
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:55916 gmane.emacs.pretest.bugs:12525
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/55916>


 > 	if (EQ(Qnil, lisp_char))
 > 	  {
 > 	    /* This is ugly and horrible and trashes the user's data. */
 > 	    XSETFASTINT (i, MAKE_CHAR (charset_katakana_jisx0201,
 > 				       34 + 128, 46 + 128));
 >             return i;
 > 	  }
 >
 > What is this special Katakana character, and why are we producing it?

Firstly, thank you for posing the question; the character intended was not a
member of JISX0201 at all, rather of JISX0208. I yanked the wrong charset
identifier from charset.h when porting the code from XEmacs. The patch below
addresses this.

(make-char 'japanese-jisx0208 34 46) gives U+3013 GETA MARK, a character in
JISX 0208 that is used to represent unknown or corrupted data. The
Unicode-specific equivalent is U+FFFD REPLACEMENT CHARACTER. I used the GETA
MARK because I was certain it would be available in Mule and it is
equivalent. It turns out that (make-char 'mule-unicode-e000-ffff 117 61)
gives U+FFFD, so it might be worthwhile to use that instead.
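
As a rough check of the two make-char calls above, here is a minimal C
sketch of the position-code arithmetic. It assumes that
mule-unicode-e000-ffff is laid out as a 96x96 charset whose position codes
run from 32 and whose cells count upwards from U+E000, and that the
japanese-jisx0208 position codes are the JIS row and cell numbers plus 32.
The function names are hypothetical; none of this is taken from the Emacs
sources.

```c
#include <stdint.h>

/* Assumption: in a 96x96 charset each position code starts at 32, so
   (c1, c2) selects cell (c1 - 32) * 96 + (c2 - 32).  */
int
cell_index_96x96 (int c1, int c2)
{
  return (c1 - 32) * 96 + (c2 - 32);
}

/* Hypothetical helper: the Unicode code point for a
   mule-unicode-e000-ffff position, assuming the cells are numbered
   consecutively from U+E000.  */
uint32_t
mule_unicode_e000_ffff_to_ucs (int c1, int c2)
{
  return 0xE000u + (uint32_t) cell_index_96x96 (c1, c2);
}

/* Hypothetical helper: JIS X 0208 row and cell numbers for a pair of
   position codes, assuming each is the row or cell number plus 32.  */
void
jisx0208_to_row_cell (int c1, int c2, int *row, int *cell)
{
  *row = c1 - 32;
  *cell = c2 - 32;
}
```

Under those assumptions, (117, 61) works out to 0xE000 + 85 * 96 + 29 =
0xFFFD, and (34, 46) is row 2, cell 14 of JIS X 0208, which is where the
GETA MARK lives.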

 > Is it to trigger an "Invalid character" message, or is something else
 > going on here?

It doesn’t actually trigger a message; it displays a character to be
interpreted as “the character couldn’t be interpreted.”
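
The substitution can be pictured with a small C sketch. This is not the
read_escape code: toy_lookup is a hypothetical stand-in for the real
translation step, and its rule for which code points are representable
(anything in the BMP that is not a surrogate) is purely an assumption made
so the example has something to reject.

```c
#include <stdint.h>

#define UCS_GETA_MARK 0x3013u

/* Hypothetical stand-in for the real translation: returns the code
   point if it is "representable", or -1 if it is not.  The rule used
   here is an illustrative assumption only.  */
long
toy_lookup (uint32_t ucs)
{
  if (ucs > 0xFFFFu || (ucs >= 0xD800u && ucs <= 0xDFFFu))
    return -1;
  return (long) ucs;
}

/* Return the translated character, or the GETA MARK when the lookup
   fails.  No error is signalled; the substitute is simply returned.  */
uint32_t
char_or_substitute (uint32_t ucs)
{
  long c = toy_lookup (ucs);
  return c < 0 ? UCS_GETA_MARK : (uint32_t) c;
}
```

The point of the design is that a failed lookup is silent: the caller gets
a visible marker character back rather than an error.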

My feeling is that the syntax should be close in its behaviour to what the
coding systems do, and when the coding systems see a code point that is
valid but that they can’t interpret, they trash the user’s data. (Or do
something totally mad like transform invalid UTF-16 to invalid UTF-8!?)

src/ChangeLog addition:

2006-06-14  Aidan Kehoe  <kehoea@parhasard.net>

	* lread.c (read_escape):
	Change charset_katakana_jisx0201 to charset_jisx0208 as it should
	have been in the first place, since we intended U+3013 GETA MARK.

GNU Emacs Trunk source patch:
Diff command:   cvs -q diff -u
Files affected: src/lread.c

Index: src/lread.c
===================================================================
RCS file: /sources/emacs/emacs/src/lread.c,v
retrieving revision 1.353
diff -u -u -r1.353 lread.c
--- src/lread.c	9 Jun 2006 18:22:30 -0000	1.353
+++ src/lread.c	14 Jun 2006 06:57:49 -0000
@@ -1967,7 +1967,7 @@
 	if (EQ(Qnil, lisp_char))
 	  {
 	    /* This is ugly and horrible and trashes the user's data.  */
-	    XSETFASTINT (i, MAKE_CHAR (charset_katakana_jisx0201,
+	    XSETFASTINT (i, MAKE_CHAR (charset_jisx0208,
 				       34 + 128, 46 + 128));
             return i;
 	  }


-- 
Aidan Kehoe, http://www.parhasard.net/