From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Re: Case mapping of sharp s
Date: Sun, 22 Nov 2009 02:40:09 +0900
Message-ID: <87zl6fdbeu.fsf@uwakimon.sk.tsukuba.ac.jp>
To: David Kastrup
Cc: emacs-devel@gnu.org, rms@gnu.org, monnier@iro.umontreal.ca

David Kastrup writes:

> Richard Stallman writes:
>
> > I don't think the design of MULE was an error in the 1990s.

Of course it was, at least as applied to the ISO 8859 family of scripts. In fact, the ISO 8859 standard makes plain that characters with the same name are identical across the ISO 8859 family. Distinguishing (make-char 'latin-iso8859-1 32) from (make-char 'latin-iso8859-15 32) was a mistake, and it caused a lot of pain for users and developers.

I agree that in Japan the design was plausible in the early 90s. In hindsight, I think it was an unfortunate choice, though. It would have been better for the Mule Lab (which has a fair amount of prestige in this country) to lead the way toward open, universal standards by working out the difficulties of dealing with multilingual text written in a Unihan script (i.e., Unicode).
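To make the ISO 8859 point concrete, here is a small Python sketch (an illustration, not part of Emacs or MULE) that decodes the same GR byte under ISO 8859-1 and ISO 8859-15 and compares the resulting Unicode code points. It assumes that a 96-character-set position code p, as passed to make-char, corresponds to byte p + 128, so position 32 is byte 0xA0. The helper names gr_byte, same_character, and differing_positions are hypothetical.

```python
"""Sketch: which ISO 8859-1 and ISO 8859-15 positions name the same character?

Assumption: a 96-character-set position code p (32..127), as passed to
make-char, corresponds to the GR byte p + 128 (160..255).
"""


def gr_byte(position: int) -> bytes:
    """Return the single GR byte for a 96-set position code (32..127)."""
    if not 32 <= position <= 127:
        raise ValueError("position must be in 32..127")
    return bytes([position + 128])


def same_character(position: int) -> bool:
    """True if both charsets decode this position to the same Unicode code point."""
    b = gr_byte(position)
    return b.decode("latin-1") == b.decode("iso8859-15")


def differing_positions() -> list[int]:
    """Positions where ISO 8859-15 assigns a different character than ISO 8859-1."""
    return [p for p in range(32, 128) if not same_character(p)]


if __name__ == "__main__":
    # Position 32 is byte 0xA0, NO-BREAK SPACE in both charsets.
    print(same_character(32))
    for p in differing_positions():
        b = gr_byte(p)
        old, new = b.decode("latin-1"), b.decode("iso8859-15")
        print(p, hex(b[0]), f"U+{ord(old):04X} -> U+{ord(new):04X}")
```

Once both charsets are mapped through Unicode, 88 of the 96 positions decode to the same code point, and only the 8 positions that ISO 8859-15 reassigned (for example byte 0xA4, CURRENCY SIGN versus EURO SIGN) stay distinct.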

In the end, internationalized encodings based on ISO 2022 extension techniques (such as TRON code and Mule code) are all dead, except for ISO-2022-JP, which is still commonly used in email. Shift JIS remains in wide use, and only Unicode is gaining share.

> I think that the design of utf-8 that makes character starts
> immediately recognizable without the need for rescanning or
> synchronization has been an excellent idea. MULE coding lacks this
> feature.

It does not lack that feature: C0 and GL codes are ASCII (one-byte characters), C1 codes are leading bytes, and GR codes are trailing bytes. I.e., all bytes less than 160 (0xA0) are character starters.

AFAIK, Mule code developed this feature at about the same time that FSS-UTF was invented (Mule development started in mid-1991, and the earliest reference I can find to FSS-UTF is Ken Thompson's fss-utf.c, dated 1992). You'd have to ask Ken'ichi Handa for the exact date, and whether he was aware of FSS-UTF and similar techniques when the Mule encoding was designed.

UTF-8 doesn't really have any algorithmic string-processing advantages over Mule code. Even the fact that you can compute the byte length of a character algorithmically from its UTF-8 leading byte is unimportant, since it's much more efficient to use a table lookup for that.

The big advantage of UTF-8 is that it's based on Unicode, so characters that never should have been distinguished in the first place don't have to be re-identified in Lisp. Not to mention all of the useful character data, the bidi algorithm, etc.
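The byte classification described above is what lets a scanner resynchronize in either encoding. The following Python sketch (an illustration, not Emacs's actual implementation) backs up from an arbitrary byte offset to the start of the enclosing character, first under the stated Mule rule (bytes 0xA0..0xFF are trailing bytes) and then under UTF-8's 10xxxxxx continuation rule. It also computes a UTF-8 sequence length both from bit tests on the leading byte and from a precomputed 256-entry table. The function names and the example Mule bytes (leading byte 0x81, trailing byte 0xE9) are hypothetical.

```python
"""Sketch: finding character starts in Mule-style and UTF-8 byte strings,
and computing UTF-8 sequence length by bit tests vs. a 256-entry table.

Assumption: in Mule code, bytes below 0xA0 (ASCII and C1 leading bytes)
start a character and bytes 0xA0..0xFF are trailing bytes.
The example byte values are hypothetical.
"""


def mule_char_start(buf: bytes, i: int) -> int:
    """Back up from index i to the start of the Mule character containing it."""
    while i > 0 and buf[i] >= 0xA0:
        i -= 1
    return i


def utf8_char_start(buf: bytes, i: int) -> int:
    """Back up over UTF-8 continuation bytes, which have the form 10xxxxxx."""
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


def utf8_len_by_bits(lead: int) -> int:
    """Sequence length from the leading byte's high bits; 0 for a non-lead byte.

    Only the bit pattern is checked, so leads such as 0xC0 or 0xF5, which strict
    UTF-8 forbids, are not rejected here.
    """
    if lead < 0x80:
        return 1
    if (lead & 0xE0) == 0xC0:
        return 2
    if (lead & 0xF0) == 0xE0:
        return 3
    if (lead & 0xF8) == 0xF0:
        return 4
    return 0


# Precomputed once; afterwards each length query is a single index operation.
UTF8_LEN_TABLE = bytes(utf8_len_by_bits(b) for b in range(256))


def utf8_len_by_table(lead: int) -> int:
    return UTF8_LEN_TABLE[lead]


if __name__ == "__main__":
    mule = b"a" + bytes([0x81, 0xE9]) + b"b"  # hypothetical two-byte Mule char
    print(mule_char_start(mule, 2))  # 1
    utf8 = "a\u00e9\u20ac".encode("utf-8")  # b'a\xc3\xa9\xe2\x82\xac'
    print(utf8_char_start(utf8, 5))  # 3
    print(utf8_len_by_table(utf8[3]))  # 3
```

Each backward scan is one byte-range test per step, which is the point being made about Mule code: a character start can be found without rescanning from the beginning of the string. The table and the bit tests give identical answers; this sketch only shows that they are interchangeable and makes no performance claim of its own.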