From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: "Stephen J. Turnbull" <stephen@xemacs.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Choice of fonts displaying etc/HELLO
Date: Fri, 08 Aug 2008 04:30:59 +0900
Message-ID: <87y738oc5o.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <f7ccd24b0807291349r5c16ca53gee0a4229ac9f9b8e@mail.gmail.com>
	<48900ED2.2000703@gnu.org>
	<f7ccd24b0807300448m1a883ebem11d808bbd871f6c3@mail.gmail.com>
	<4890670C.9000009@gnu.org> <48906865.4000808@gnu.org>
	<f7ccd24b0807300703p26fe2abof3135eed19e4fff9@mail.gmail.com>
	<48907856.6040308@gnu.org> <E1KOuBN-0007Cu-F7@etlken.m17n.org>
	<48930CE4.5080305@gnu.org> <uvdykn8rl.fsf@gnu.org>
	<E1KQH2Y-00024H-6o@etlken.m17n.org> <uzlnrjpp5.fsf@gnu.org>
	<E1KQbbD-0005D0-LW@etlken.m17n.org> <usktijact.fsf@gnu.org>
	<E1KQu5d-00086S-0S@etlken.m17n.org> <uiqudjyqg.fsf@gnu.org>
	<buo7iath1b4.fsf@dhapc248.dev.necel.com> <uhc9wk8he.fsf@gnu.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1218137584 27373 80.91.229.12 (7 Aug 2008 19:33:04 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 7 Aug 2008 19:33:04 +0000 (UTC)
Cc: lekktu@gmail.com, jasonr@gnu.org, emacs-devel@gnu.org, handa@m17n.org,
	Miles Bader <miles@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Aug 07 21:33:54 2008
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1KRBF0-0007MK-Qo
	for ged-emacs-devel@m.gmane.org; Thu, 07 Aug 2008 21:33:43 +0200
Original-Received: from localhost ([127.0.0.1]:60637 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1KRBE5-0007mU-4p
	for ged-emacs-devel@m.gmane.org; Thu, 07 Aug 2008 15:32:45 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1KRBDB-000798-9K
	for emacs-devel@gnu.org; Thu, 07 Aug 2008 15:31:49 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1KRBD9-00077u-Bo
	for emacs-devel@gnu.org; Thu, 07 Aug 2008 15:31:48 -0400
Original-Received: from [199.232.76.173] (port=50820 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1KRBD9-00077j-0M
	for emacs-devel@gnu.org; Thu, 07 Aug 2008 15:31:47 -0400
Original-Received: from mtps02.sk.tsukuba.ac.jp ([130.158.97.224]:49344)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <stephen@xemacs.org>)
	id 1KRBCo-00035a-5j; Thu, 07 Aug 2008 15:31:27 -0400
Original-Received: from uwakimon.sk.tsukuba.ac.jp (uwakimon.sk.tsukuba.ac.jp
	[130.158.99.156])
	by mtps02.sk.tsukuba.ac.jp (Postfix) with ESMTP id 027587FFA;
	Fri,  8 Aug 2008 04:31:12 +0900 (JST)
Original-Received: by uwakimon.sk.tsukuba.ac.jp (Postfix, from userid 1000)
	id DC5B01A25C3; Fri,  8 Aug 2008 04:30:59 +0900 (JST)
In-Reply-To: <uhc9wk8he.fsf@gnu.org>
X-Mailer: VM ?bug? under XEmacs 21.5.21 (x86_64-unknown-linux)
X-detected-kernel: by monty-python.gnu.org: Linux 2.6, seldom 2.4 (older, 4)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:102176
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/102176>

Eli Zaretskii writes:
 > > From: Miles Bader <miles.bader@necel.com>
 > > Eli Zaretskii <eliz@gnu.org> writes:
 > > > I meant would it break something if "\\cj" matched only the Katakana
 > > > and Hiragana characters instead of what it matches today?
 > > 
 > > I don't know what it would break, but that doesn't seem like
 > > particularly intuitive behavior.
 > 
 > ??? Why not?

Because although Katakana and Hiragana are the only uniquely Japanese
word constituents, the written form of the Japanese language also uses
a set of ideographs (Kanji) borrowed from Chinese, as well as an
idiosyncratic set of symbols (eg, precomposed Roman numerals,
precomposed multiletter units such as "mm" and "kg").  Since the
admissible set of ideographs is defined by Ministry of Education
standards, the Japanese *set* of Kanji is not the same as the Chinese
*set*, and therefore needs a category of its own.  So the Japanese
category should include, at least, Hiragana, Katakana, (Japanese)
Kanji, and the idiosyncratic symbol set.

 > > I think emacs' concept of characters belonging to multiple language
 > > categories is pretty neat actually.
 > 
 > Maybe I'm missing something, but I don't see how the fact that, say,
 > Cyrillic characters are claimed to belong to Japanese category could
 > be considered ``neat''.

It's not considered "neat" that Cyrillic is (in old Mule) considered
to be Japanese, at least not by me.  However, I do think it's useful,
at least, that the Hanzi (several varieties of Chinese) overlap the
Kanji (Japanese versions of same) and Hanja (Korean version).
Similarly for the accented characters that are used by Spanish and
French alike (although they don't use the same set, there is some
overlap), etc, etc.  I suppose that's what Miles meant?

Now, that inclusion of Cyrillic in Japanese is due to the fact that,
with a character set size of nearly 10,000 and an official list of
about 6000 characters needed for daily use, the Japanese decided that
a more or less universal character set would be a good idea, so they
added Cyrillic, Greek, and a number of math symbols, as well as a
bunch of other scripts and "stuff".  In the old Mule encoding I
suppose the \cX categories were implemented basically by looking at
the leading byte, and so if Cyrillic were encoded according to the JIS
standard it would get included in \cj; if it were encoded according to
ISO 8859/5, it would not be included in \cj.  (That's true for XEmacs,
Handa-san is of course authoritative for Emacs.)
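
To make the leading-byte point concrete, here is a minimal sketch in
Python, not the actual Mule or XEmacs implementation.  The class
`MuleChar`, the table `CATEGORIES_BY_CHARSET`, and the function
`has_category` are all hypothetical names invented for illustration;
the code positions are illustrative as well.  The only thing it is
meant to show is that when category membership is keyed on the charset
alone, the same letter lands in or out of \cj depending on how it was
encoded.

```python
from typing import NamedTuple


class MuleChar(NamedTuple):
    """Hypothetical stand-in for a character in an old Mule-style buffer."""
    charset: str  # charset selected by the leading byte
    code: int     # position within that charset (illustrative values)


# Assumed mapping from charset to category letters; a real table
# would be much larger and is not reproduced here.
CATEGORIES_BY_CHARSET = {
    "ascii": {"a"},
    "japanese-jisx0208": {"j"},
    "cyrillic-iso8859-5": {"y"},
}


def has_category(ch: MuleChar, category: str) -> bool:
    """Decide category membership from the charset only, ignoring the code."""
    return category in CATEGORIES_BY_CHARSET.get(ch.charset, set())


# The same Cyrillic letter, reached through two different charsets.
ya_jis = MuleChar("japanese-jisx0208", 0x2741)
ya_iso = MuleChar("cyrillic-iso8859-5", 0xCF)

print(has_category(ya_jis, "j"))  # membership follows the JIS charset
print(has_category(ya_iso, "j"))  # membership follows the ISO 8859/5 charset
```

Under these assumptions the JIS-encoded letter matches the Japanese
category and the ISO 8859/5-encoded one does not, which is the
behaviour described above.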

While I think it is worth the pain to clean up this inelegant
inclusion of Greek, Cyrillic, etc in Japanese (among other things,
"native" fonts can be used instead of typically ugly fonts designed by
foreigners), it probably will break user applications.  Eg, I can
imagine an MUA that does things like check for \([[:ascii:]]\|\cj\)*
to see if a message could be encoded in MIME charset ISO-2022-JP.  (I
don't know if any of the mainstream MUAs do that, though.)
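
For comparison, a rough sketch of the kind of check such an MUA is
after, written in Python rather than with an Emacs regexp.  It is not
taken from any real MUA; the function name `fits_iso_2022_jp` is made
up here.  It relies on Python's standard `iso2022_jp` codec and simply
asks whether the whole text can be encoded.

```python
def fits_iso_2022_jp(text: str) -> bool:
    """Return True if TEXT can be encoded in MIME charset ISO-2022-JP."""
    try:
        text.encode("iso2022_jp")
    except UnicodeEncodeError:
        return False
    return True


# Japanese text, Cyrillic text, and Korean Hangul, respectively.
for sample in ("Hello, こんにちは", "Привет", "한국어"):
    print(sample, fits_iso_2022_jp(sample))
```

Cyrillic passes this check because the JIS character set contains
those letters, so a regexp test that relies on \cj would stop agreeing
with the encoder if Cyrillic were taken out of the Japanese category.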