unofficial mirror of emacs-devel@gnu.org 
From: "Stephen J. Turnbull" <stephen@xemacs.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: lekktu@gmail.com, jasonr@gnu.org, emacs-devel@gnu.org,
	handa@m17n.org, Miles Bader <miles@gnu.org>
Subject: Re: Choice of fonts displaying etc/HELLO
Date: Fri, 08 Aug 2008 04:30:59 +0900
Message-ID: <87y738oc5o.fsf@uwakimon.sk.tsukuba.ac.jp>
In-Reply-To: <uhc9wk8he.fsf@gnu.org>

Eli Zaretskii writes:
 > > From: Miles Bader <miles.bader@necel.com>
 > > Eli Zaretskii <eliz@gnu.org> writes:
 > > > I meant would it break something if "\\cj" matched only the Katakana
 > > > and Hiragana characters instead of what it matches today?
 > > 
 > > I don't know what it would break, but that doesn't seem like
 > > particularly intuitive behavior.
 > 
 > ??? Why not?

Because although Katakana and Hiragana are the only uniquely Japanese
word constituents, the written form of the Japanese language also uses
a set of ideographs (Kanji) borrowed from Chinese, as well as an
idiosyncratic set of symbols (e.g., precomposed Roman numerals and
precomposed multiletter units such as "mm" and "kg").  Since the
admissible set of ideographs is defined by Ministry of Education
standards, the Japanese *set* of Kanji is not the same as the Chinese
*set*, and therefore needs a category of its own.  So the Japanese
category should include, at least, Hiragana, Katakana, (Japanese)
Kanji, and the idiosyncratic symbol set.

 > > I think emacs' concept of characters belonging to multiple language
 > > categories is pretty neat actually.
 > 
 > Maybe I'm missing something, but I don't see how the fact that, say,
 > Cyrillic characters are claimed to belong to Japanese category could
 > be considered ``neat''.

It's not considered "neat" that Cyrillic is (in old Mule) considered
to be Japanese, at least not by me.  However, I do think it's useful,
at least, that the Hanzi (several varieties of Chinese) overlap the
Kanji (Japanese versions of same) and Hanja (Korean version).
Similarly for the accented characters that are used by Spanish and
French alike (although they don't use the same set, there is some
overlap), etc, etc.  I suppose that's what Miles meant?

Now, that inclusion of Cyrillic in Japanese is due to the fact that,
with a character set size of nearly 10,000 and an official list of
about 6000 characters needed for daily use, the Japanese decided that
a more or less universal character set would be a good idea.  So they
added Cyrillic, Greek, and a number of math symbols, as well as a
bunch of other scripts and "stuff".  In the old Mule encoding I
suppose the \cX categories were implemented basically by looking at
the leading byte, and so if Cyrillic were encoded according to the JIS
standard it would get included in \cj; if it were encoded according to
ISO 8859-5, it would not be included in \cj.  (That's true for XEmacs;
Handa-san is of course authoritative for Emacs.)
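
To make the leading-byte point concrete, here is a toy model in Python
(not Mule's actual implementation; the names TaggedChar,
JAPANESE_CHARSETS, decode_tagged and in_japanese_category are all
hypothetical).  It tags each character with the charset it was decoded
from and decides category membership from that tag alone, so the same
Cyrillic letter lands in or out of the "Japanese" category depending
on how it was encoded.

```python
from typing import List, NamedTuple


class TaggedChar(NamedTuple):
    """A character plus the charset it was decoded from (toy model)."""
    char: str
    charset: str


# Assumption for this sketch: only JIS X 0208 counts as "Japanese".
JAPANESE_CHARSETS = {"jisx0208"}


def decode_tagged(data: bytes, codec: str, charset: str) -> List[TaggedChar]:
    """Decode bytes with a Python codec and tag every character."""
    return [TaggedChar(c, charset) for c in data.decode(codec)]


def in_japanese_category(tc: TaggedChar) -> bool:
    """Category test that looks only at the charset tag, not the script."""
    return tc.charset in JAPANESE_CHARSETS


ya = "\u042f"  # CYRILLIC CAPITAL LETTER YA

# The same letter, once via a JIS-based encoding and once via ISO 8859-5.
from_jis = decode_tagged(ya.encode("euc_jp"), "euc_jp", "jisx0208")
from_8859 = decode_tagged(ya.encode("iso8859_5"), "iso8859_5", "iso8859-5")

print(in_japanese_category(from_jis[0]))   # tagged as JIS
print(in_japanese_category(from_8859[0]))  # tagged as ISO 8859-5
```

The two decoded characters compare equal as Unicode text; only the
tag differs, which is the distinction a charset-based category relies
on.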

While I think it is worth the pain to clean up this inelegant
inclusion of Greek, Cyrillic, etc. in Japanese (among other things,
"native" fonts can be used instead of typically ugly fonts designed by
foreigners), it probably will break user applications.  E.g., I can
imagine an MUA that does things like check for \([[:ascii:]]\|\cj\)*
to see if a message could be encoded in MIME charset ISO-2022-JP.  (I
don't know if any of the mainstream MUAs do that, though.)
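
For comparison, the same question ("could this text go out as
ISO-2022-JP?") can be asked without a category regexp by simply
attempting the encoding.  This is a minimal sketch in Python rather
than Emacs Lisp, using Python's built-in "iso2022_jp" codec; the
helper name encodable_as is hypothetical, and this is not what any
particular MUA is known to do.

```python
def encodable_as(text: str, codec: str = "iso2022_jp") -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# ASCII plus Kanji and Katakana: all covered by ASCII and JIS X 0208.
print(encodable_as("Hello, \u65e5\u672c\u8a9e \u30ab\u30bf\u30ab\u30ca"))

# Cyrillic is part of JIS X 0208, so it is accepted too.
print(encodable_as("\u041f\u0440\u0438\u0432\u0435\u0442"))

# Hangul is not part of the charsets behind plain ISO-2022-JP.
print(encodable_as("\ud55c\uad6d\uc5b4"))
```

Note that the Cyrillic string is accepted, which is exactly the
behaviour a narrowed \cj would stop matching in the regexp version of
the check.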





