From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Re: Choice of fonts displaying etc/HELLO
Date: Fri, 08 Aug 2008 04:30:59 +0900
Message-ID: <87y738oc5o.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <48900ED2.2000703@gnu.org> <4890670C.9000009@gnu.org> <48906865.4000808@gnu.org> <48907856.6040308@gnu.org> <48930CE4.5080305@gnu.org>
To: Eli Zaretskii
Cc: lekktu@gmail.com, jasonr@gnu.org, emacs-devel@gnu.org, handa@m17n.org, Miles Bader

Eli Zaretskii writes:
> > From: Miles Bader
> > Eli Zaretskii writes:
> > > I meant would it break something if "\\cj" matched only the Katakana
> > > and Hiragana characters instead of what it matches today?
> > I don't know what it would break, but that doesn't seem like
> > particularly intuitive behavior.
>
> ??? Why not?

Because although Katakana and Hiragana are the only uniquely Japanese word constituents, written Japanese also uses a set of ideographs (Kanji) borrowed from Chinese, as well as an idiosyncratic set of symbols (e.g., precomposed Roman numerals and precomposed multiletter units such as "mm" and "kg"). Since the admissible set of ideographs is defined by Ministry of Education standards, the Japanese *set* of Kanji is not the same as the Chinese *set*, and therefore needs a category of its own. So the Japanese category should include, at least, Hiragana, Katakana, (Japanese) Kanji, and the idiosyncratic symbol set.

> > I think emacs' concept of characters belonging to multiple language
> > categories is pretty neat actually.
>
> Maybe I'm missing something, but I don't see how the fact that, say,
> Cyrillic characters are claimed to belong to Japanese category could
> be considered ``neat''.

I, for one, don't consider it "neat" that Cyrillic is (in old Mule) treated as Japanese. However, I do think it's at least useful that the Hanzi (several varieties of Chinese) overlap the Kanji (Japanese versions of the same) and the Hanja (Korean versions). Similarly for the accented characters that Spanish and French both use (although they don't use the same set, there is some overlap), etc. I suppose that's what Miles meant?

Now, the inclusion of Cyrillic in Japanese comes about because, with a character set of nearly 10,000 positions and an official list of only about 6000 characters needed for daily use, there was room to spare, and the Japanese decided that a more or less universal character set would be a good idea. So they added Cyrillic, Greek, and a number of math symbols, as well as a bunch of other scripts and "stuff".

In the old Mule encoding, I suppose the \cX categories were implemented basically by looking at the leading byte. So if Cyrillic was encoded according to the JIS standard it would be included in \cj, whereas if it was encoded according to ISO 8859-5 it would not. (That's true for XEmacs; Handa-san is of course authoritative for Emacs.)

While I think it is worth the pain to clean up this inelegant inclusion of Greek, Cyrillic, etc. in Japanese (among other things, "native" fonts could then be used instead of the typically ugly ones designed by foreigners), it will probably break user applications. E.g., I can imagine an MUA that does things like check for \([[:ascii:]]\|\cj\)* to see if a message could be encoded in the MIME charset ISO-2022-JP. (I don't know if any of the mainstream MUAs do that, though.)
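The leading-byte explanation in this message can be illustrated with a small Python model. This is only a sketch under stated assumptions: `MuleChar`, `CATEGORIES_BY_CHARSET`, and `has_category` are hypothetical names, not Emacs or XEmacs internals, and the category table is cut down to two charsets. What it shows is that the same letter (Cyrillic capital ZHE, U+0416) lands in \cj or not depending only on which charset it was decoded into.

```python
from dataclasses import dataclass


# Hypothetical model of an old-Mule character: `charset` plays the role of
# the leading byte, and `code` is the position within that charset.
@dataclass(frozen=True)
class MuleChar:
    charset: str
    code: int


# Hypothetical category table keyed by charset, mimicking a lookup on the
# leading byte. Real Mule tables are much larger; this is an illustration.
CATEGORIES_BY_CHARSET = {
    "japanese-jisx0208": {"j"},
    "cyrillic-iso8859-5": {"y"},
}


def has_category(ch: MuleChar, cat: str) -> bool:
    """True if CH's charset contributes category CAT (like Emacs's \\cX)."""
    return cat in CATEGORIES_BY_CHARSET.get(ch.charset, set())


def from_jisx0208(s: str) -> MuleChar:
    # EUC-JP stores a JIS X 0208 code as two bytes with the high bit set,
    # so masking off that bit recovers the JIS row/cell code.
    b = s.encode("euc_jp")
    return MuleChar("japanese-jisx0208", ((b[0] & 0x7F) << 8) | (b[1] & 0x7F))


def from_iso8859_5(s: str) -> MuleChar:
    # ISO 8859-5 is a single-byte Cyrillic charset.
    return MuleChar("cyrillic-iso8859-5", s.encode("iso8859_5")[0])


zhe_jis = from_jisx0208("\u0416")   # Cyrillic ZHE read via JIS X 0208
zhe_iso = from_iso8859_5("\u0416")  # the same letter read via ISO 8859-5

print(has_category(zhe_jis, "j"))  # True: JIS-encoded Cyrillic counts as Japanese
print(has_category(zhe_iso, "j"))  # False: same letter, different charset
```

The design choice being modeled is that membership is a property of the charset, not of the abstract letter, which is why the cleanup discussed in the message would change what \cj matches.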
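To make the breakage concrete, here is a Python sketch contrasting a category-style test of the kind an MUA might use with a direct encodability check. Assumptions, stated loudly: `narrow_cj` is a hypothetical stand-in for a \cj that matches only kana and ideographs, approximated with Unicode block ranges (which over-approximates the Japanese Kanji set described in the message), and Python's standard `iso2022_jp` codec stands in for the MIME charset ISO-2022-JP.

```python
# Hypothetical narrowed \cj: kana plus CJK ideographs only. Unicode block
# ranges are an approximation; they do not encode the official Kanji list.
NARROW_CJ_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]


def narrow_cj(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NARROW_CJ_RANGES)


def looks_like_iso2022jp(text: str) -> bool:
    """Category-style test: every char is ASCII or (narrowed) \\cj."""
    return all(ch.isascii() or narrow_cj(ch) for ch in text)


def encodable_as_iso2022jp(text: str) -> bool:
    """Ground truth: ask the codec whether the text fits the charset."""
    try:
        text.encode("iso2022_jp")
    except UnicodeEncodeError:
        return False
    return True


samples = [
    "hello",
    "\u65e5\u672c\u8a9e",              # Japanese ideographs
    "\u041f\u0440\u0438\u0432\u0435\u0442",  # Cyrillic, present in JIS X 0208
    "caf\u00e9",                       # e-acute, not in JIS X 0208
]
for sample in samples:
    print(repr(sample), looks_like_iso2022jp(sample), encodable_as_iso2022jp(sample))
```

Under these assumptions the Cyrillic sample passes the codec check but fails the narrowed category check, so an MUA relying on \cj would start refusing (or re-encoding) messages that ISO-2022-JP can in fact carry.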