From: "Stephen J. Turnbull" <stephen@xemacs.org>
To: David Kastrup <dak@gnu.org>
Cc: emacs-devel@gnu.org, rms@gnu.org, monnier@iro.umontreal.ca
Subject: Re: Case mapping of sharp s
Date: Sun, 22 Nov 2009 02:40:09 +0900
Message-ID: <87zl6fdbeu.fsf@uwakimon.sk.tsukuba.ac.jp>
In-Reply-To: <87fx88aw6a.fsf@lola.goethe.zz>

David Kastrup writes:
 > Richard Stallman <rms@gnu.org> writes:
 > 
 > > I don't think the design of MULE was an error in the 1990s.

Of course it was, at least as applied to the ISO 8859 family of
scripts.  In fact the ISO 8859 standard makes plain that characters
with the same name are identical across the ISO 8859 family.
Distinguishing (make-char 'latin-iso8859-1 32) from (make-char
'latin-iso8859-15 32) was a mistake, and it caused a lot of pain for
users and developers.

I agree that in Japan the design was plausible in the early 90s.  In
hindsight, I think it was an unfortunate choice, though.  It would
have been better for the Mule Lab (which has a fair amount of prestige
in this country) to lead the way toward open, universal standards by
working out the difficulties of dealing with multilingual text written
in a Unihan script (i.e., Unicode).  In the end, internationalized
encodings based on ISO 2022 extension techniques (such as TRON code
and Mule code) are all dead (except for ISO-2022-JP, still commonly
used in email), but Shift JIS remains in wide use, with only Unicode
gaining share.

 > I think that the design of utf-8 that makes character starts
 > immediately recognizable without the need for rescanning or
 > synchronization has been an excellent idea.  MULE coding lacks this
 > feature.

It does not lack that feature: C0 and GL codes are ASCII (one-byte
characters), C1 codes are leading bytes, and GR codes are trailing
bytes.  I.e., all bytes less than 160 (0xA0) are character starters.  AFAIK,
Mule code developed this feature at about the same time that FSS-UTF
was invented (Mule development started in mid-1991, and the earliest
reference I can find to FSS-UTF is Ken Thompson's fss-utf.c dated
1992).  You'd have to ask Ken'ichi Handa for the exact date and
whether he was aware of FSS-UTF and such techniques when the Mule
encoding was designed.
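
A minimal sketch in C of the resynchronization property discussed above,
assuming the byte classes are exactly as stated (bytes below 0xA0 start a
character, GR bytes 0xA0-0xFF are trailing bytes).  The function names are
hypothetical and are not taken from the Emacs or XEmacs sources; the UTF-8
predicate is shown alongside for comparison.

```c
#include <stddef.h>

/* Mule-style internal code, as described in this message (an assumption
   of this sketch): C0, GL and C1 bytes (0x00-0x9F) begin a character,
   GR bytes (0xA0-0xFF) are trailing bytes. */
static int mule_is_char_start(unsigned char b)
{
    return b < 0xA0;
}

/* UTF-8: continuation bytes have the form 10xxxxxx; every other byte
   begins a character. */
static int utf8_is_char_start(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}

/* Back up from an arbitrary byte offset to the first byte of the
   character containing it.  No rescanning from the start of the buffer
   is needed in either encoding; only the predicate differs. */
static size_t sync_back(const unsigned char *buf, size_t pos,
                        int (*is_start)(unsigned char))
{
    while (pos > 0 && !is_start(buf[pos]))
        pos--;
    return pos;
}
```

Given a buffer holding an ASCII byte, a C1 leading byte, a GR trailing
byte, and another ASCII byte, calling sync_back at offset 2 with
mule_is_char_start returns offset 1, the leading byte.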

UTF-8 doesn't really have any algorithmic string-processing advantages
over Mule code.  Even the fact that you can compute the length of a
character algorithmically from a UTF-8 leading byte is unimportant,
since it's much more efficient to use a table lookup for that.  The
big advantage of UTF-8 is that it's based on Unicode, so characters
that never should have been distinguished in the first place don't
have to be reidentified in Lisp.  Not to mention all of the useful
character data and the bidi algorithm, etc.
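
To illustrate the two ways of getting a character's length from a UTF-8
leading byte, here is a rough sketch in C.  The function and table names
are hypothetical, the sketch covers sequences of up to four bytes only,
and it does not validate the input (for example, it does not reject
overlong forms).  It makes no claim about which variant is faster on any
given machine.

```c
/* Length computed from the leading byte by testing its high bits.
   Returns 0 for a continuation byte or an invalid leading byte. */
static int utf8_len_computed(unsigned char b)
{
    if (b < 0x80)           return 1;   /* 0xxxxxxx */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 0;
}

/* The same information as a 32-entry table indexed by the top five
   bits of the leading byte. */
static const unsigned char utf8_len_table[32] = {
    1, 1, 1, 1, 1, 1, 1, 1,   /* 0x00-0x3F: ASCII */
    1, 1, 1, 1, 1, 1, 1, 1,   /* 0x40-0x7F: ASCII */
    0, 0, 0, 0, 0, 0, 0, 0,   /* 0x80-0xBF: continuation bytes */
    2, 2, 2, 2,               /* 0xC0-0xDF: two-byte leaders */
    3, 3,                     /* 0xE0-0xEF: three-byte leaders */
    4,                        /* 0xF0-0xF7: four-byte leaders */
    0                         /* 0xF8-0xFF: not valid here */
};

static int utf8_len_lookup(unsigned char b)
{
    return utf8_len_table[b >> 3];
}
```

Both functions return the same value for every byte; the table form
replaces the chain of comparisons with a single indexed load.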
