From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: "Stephen J. Turnbull" <stephen@xemacs.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Case mapping of sharp s
Date: Sun, 22 Nov 2009 02:40:09 +0900
Message-ID: <87zl6fdbeu.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <4B05A11F.5000700@gmx.de> <jwvskcai43z.fsf-monnier+emacs@gnu.org>
	<87iqd6gmpk.fsf@lola.goethe.zz>
	<87tywp7tir.fsf@uwakimon.sk.tsukuba.ac.jp>
	<jwvpr7d96bt.fsf-monnier+emacs@gnu.org>
	<87aayheki7.fsf@uwakimon.sk.tsukuba.ac.jp>
	<E1NBdQq-0002ca-2m@fencepost.gnu.org>
	<87fx88aw6a.fsf@lola.goethe.zz>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1258841480 31604 80.91.229.12 (21 Nov 2009 22:11:20 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 21 Nov 2009 22:11:20 +0000 (UTC)
Cc: emacs-devel@gnu.org, rms@gnu.org, monnier@iro.umontreal.ca
To: David Kastrup <dak@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Nov 21 23:11:12 2009
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1NByAW-0003SB-Pa
	for ged-emacs-devel@m.gmane.org; Sat, 21 Nov 2009 23:11:00 +0100
Original-Received: from localhost ([127.0.0.1]:51986 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1NBtqd-0002WG-BO
	for ged-emacs-devel@m.gmane.org; Sat, 21 Nov 2009 12:34:11 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1NBtqX-0002Vl-Hg
	for emacs-devel@gnu.org; Sat, 21 Nov 2009 12:34:05 -0500
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1NBtqS-0002UV-Ax
	for emacs-devel@gnu.org; Sat, 21 Nov 2009 12:34:04 -0500
Original-Received: from [199.232.76.173] (port=41174 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1NBtqS-0002UQ-5P
	for emacs-devel@gnu.org; Sat, 21 Nov 2009 12:34:00 -0500
Original-Received: from mtps01.sk.tsukuba.ac.jp ([130.158.97.223]:53571)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <stephen@xemacs.org>)
	id 1NBtqG-0002wk-Lm; Sat, 21 Nov 2009 12:33:49 -0500
Original-Received: from uwakimon.sk.tsukuba.ac.jp (uwakimon.sk.tsukuba.ac.jp
	[130.158.99.156])
	by mtps01.sk.tsukuba.ac.jp (Postfix) with ESMTP id 907961537B4;
	Sun, 22 Nov 2009 02:33:44 +0900 (JST)
Original-Received: by uwakimon.sk.tsukuba.ac.jp (Postfix, from userid 1000)
	id C98501A28C6; Sun, 22 Nov 2009 02:40:09 +0900 (JST)
In-Reply-To: <87fx88aw6a.fsf@lola.goethe.zz>
X-Mailer: VM 8.0.12-devo-585 under 21.5 (beta29) "garbanzo" d20e0a45a4b2
	XEmacs Lucid (x86_64-unknown-linux)
X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6,
	seldom 2.4 (older, 4)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:117458
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/117458>

David Kastrup writes:
 > Richard Stallman <rms@gnu.org> writes:
 > 
 > > I don't think the design of MULE was an error in the 1990s.

Of course it was, at least as applied to the ISO 8859 family of
scripts.  In fact the ISO 8859 standard makes plain that characters
with the same name are identical across the ISO 8859 family.
Distinguishing (make-char 'latin-iso8859-1 32) from (make-char
'latin-iso8859-15 32) was a mistake, and it caused a lot of pain for
users and developers.

I agree that in Japan the design was plausible in the early 90s.  In
hindsight, I think it was an unfortunate choice, though.  It would
have been better for the Mule Lab (which has a fair amount of prestige
in this country) to lead the way toward open, universal standards by
working out the difficulties of dealing with multilingual text written
in a Unihan script (i.e., Unicode).  In the end, internationalized
encodings based on ISO 2022 extension techniques (such as TRON code
and Mule code) are all dead (except for ISO-2022-JP, still commonly
used in email), but Shift JIS remains in wide use, with only Unicode
gaining share.

 > I think that the design of utf-8 that makes character starts
 > immediately recognizable without the need for rescanning or
 > synchronization has been an excellent idea.  MULE coding lacks this
 > feature.

It does not lack that feature: C0 and GL codes are ASCII (one-byte
characters), C1 codes are leading bytes, and GR codes are trailing
bytes.  I.e., all bytes less than 160 (0xA0) are character starters.  AFAIK,
Mule code developed this feature at about the same time that FSS-UTF
was invented (Mule development started in mid-1991, and the earliest
reference I can find to FSS-UTF is Ken Thompson's fss-utf.c dated
1992).  You'd have to ask Ken'ichi Handa for the exact date and
whether he was aware of FSS-UTF and such techniques when the Mule
encoding was designed.
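
To make the resynchronization point concrete, here is a minimal sketch
in Python of the byte classes described above.  It is not the actual
Mule or Emacs code; the function names and the sample bytes are
illustrative assumptions only.

```python
# Sketch of the byte classes described above (an illustration, not the
# real Mule implementation; all names here are hypothetical).

def is_char_start(byte):
    """C0/GL bytes (0x00-0x7F) are one-byte ASCII characters and C1
    bytes (0x80-0x9F) are leading bytes, so both start a character.
    GR bytes (0xA0-0xFF) are trailing bytes."""
    return byte < 0xA0


def char_start(buf, pos):
    """Back up from an arbitrary byte offset to the first byte of the
    character that contains it.  No rescan from the buffer start is
    needed: we only skip trailing bytes."""
    while pos > 0 and not is_char_start(buf[pos]):
        pos -= 1
    return pos


def split_chars(buf):
    """Split a byte buffer into one byte sequence per character."""
    chars = []
    start = 0
    for i in range(1, len(buf)):
        if is_char_start(buf[i]):
            chars.append(buf[start:i])
            start = i
    if buf:
        chars.append(buf[start:])
    return chars


# Illustrative buffer: ASCII 'a', one leading byte followed by one
# trailing byte, then ASCII 'b'.
sample = b"a\x81\xe9b"
```

With that sample, char_start(sample, 2) backs up one byte to offset 1,
which is the same kind of local resynchronization that UTF-8 offers.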

UTF-8 doesn't really have any algorithmic string-processing advantages
over Mule code.  Even the fact that you can compute the length of a
character algorithmically from a UTF-8 leading byte is unimportant,
since it's much more efficient to use a table lookup for that.  The
big advantage of UTF-8 is that it's based on Unicode, so characters
that never should have been distinguished in the first place don't
have to be reidentified in Lisp.  Not to mention all of the useful
character data and the bidi algorithm, etc.
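
For the table-lookup remark, a rough sketch in Python of both
approaches follows.  It assumes the modern four-byte limit on UTF-8
sequences (FSS-UTF originally allowed longer ones), classifies leading
bytes by bit pattern only, and uses hypothetical names throughout.  It
shows the shape of the technique and is not a benchmark.

```python
# Sketch: UTF-8 sequence length from the leading byte, computed versus
# looked up.  Names are hypothetical; validity checks such as rejecting
# overlong forms are deliberately left out.

def utf8_len_computed(lead):
    """Length implied by the high bits of a leading byte.  Returns 0
    for continuation bytes and for bytes that cannot start a sequence
    under the four-byte limit."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if lead < 0xC0:
        return 0  # 10xxxxxx: continuation byte, not a starter
    if lead < 0xE0:
        return 2  # 110xxxxx
    if lead < 0xF0:
        return 3  # 1110xxxx
    if lead < 0xF8:
        return 4  # 11110xxx
    return 0


# Build the 256-entry table once; afterwards each query is one index
# operation instead of a chain of comparisons.
UTF8_LEN = bytes(utf8_len_computed(b) for b in range(256))


def utf8_len(lead):
    """Table-lookup version of utf8_len_computed."""
    return UTF8_LEN[lead]
```

The same table approach works for any encoding whose character length
is determined by the leading byte, which is why the computability of
the UTF-8 length is not much of an advantage in itself.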