From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: inputting characters by hexadigit
Date: Sun, 20 Jul 2008 10:23:57 +0900
To: Juri Linkov
Cc: tzz@lifelogs.com, emacs-devel@gnu.org
In-reply-to: <87sku5if8t.fsf_-_@jurta.org> (message from Juri Linkov on Sun, 20 Jul 2008 03:29:14 +0300)

In article <87sku5if8t.fsf_-_@jurta.org>, Juri Linkov writes:

> > I think it is better to skip these ranges:
> >   #x3400..#x4dbf -- CJK Ideograph Extension A
> >   #x4e00..#x9fff -- CJK Ideograph
> >   #xd800..#xfaFF -- surrogate-pair, private use, CJK COMPATIBILITY IDEOGRAPH
> >   #x20000..#x2ffff -- CJK Ideograph Extension B
> > and end the loop at #xeffff (#xf0000.. are for private use)

> Actually there are no Unicode names in these ranges in UnicodeData.txt.
> It has only lines for the first and the last character in these ranges:

Yes.
But, for CJK chars:

	(get-char-code-property CHAR 'name)

returns a valid name, something like "CJK IDEOGRAPH-3400"(*), because
get-char-code-property not only looks up UnicodeData.txt but also
computes a proper value if necessary.

> If it would be possible to loop over names instead of loop over all
> characters to check for their names, then this code would be more fast,
> but I don't see how it would be possible to loop over all defined names
> in UnicodeData.txt.

> If this is not possible then we could optimize the loop over all
> characters in the chartable to skip these useless ranges.

I think it doesn't work because Hangul syllabic character names must
also be computed algorithmically(*).  I think just doing something
like this is good:

  (dotimes (c #xEFFFF)
    (unless (CHAR-IS-IN-A-RANGE-TO-SKIP-P c)
      ...))

(*): "The Unicode Standard 5.1" has this section.

    4.8 Name—Normative
    [...]
    Ideographs and Hangul Syllables. Names for ideographs and Hangul
    syllables are derived algorithmically. Unified CJK ideographs are
    named CJK UNIFIED IDEOGRAPH-x, where x is replaced with the
    hexadecimal Unicode code point—for example, cjk unified
    ideograph-4E00. Similarly, compatibility CJK ideographs are named
    “CJK COMPATIBILITY IDEOGRAPH-x”. The names of Hangul syllables are
    generated as described in “Hangul Syllable Names” in Section 3.12,
    Conjoining Jamo Behavior.

---
Kenichi Handa
handa@ni.aist.go.jp
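
For illustration only, the loop sketched above could be filled in roughly
as follows.  This is a minimal sketch, not code from Emacs: the names
my-unicode-name-skip-ranges, my-char-in-skip-range-p and
my-collect-char-names are hypothetical, and the ranges are simply the
ones listed at the top of this message.  Whether Hangul syllables should
also be skipped is left open here.

```elisp
;; Hypothetical names throughout; the ranges are an assumption copied
;; from the list quoted in this thread.
(defconst my-unicode-name-skip-ranges
  '((#x3400 . #x4DBF)     ; CJK Ideograph Extension A
    (#x4E00 . #x9FFF)     ; CJK Ideograph
    (#xD800 . #xFAFF)     ; surrogates, private use, CJK compatibility
    (#x20000 . #x2FFFF))  ; CJK Ideograph Extension B
  "Code point ranges (FROM . TO), inclusive, to skip when collecting names.")

(defun my-char-in-skip-range-p (c)
  "Return non-nil if code point C lies in `my-unicode-name-skip-ranges'."
  (let ((found nil))
    (dolist (range my-unicode-name-skip-ranges found)
      (when (and (>= c (car range)) (<= c (cdr range)))
        (setq found t)))))

(defun my-collect-char-names ()
  "Return an alist of (NAME . CODE) for named characters outside skipped ranges."
  (let ((names nil))
    ;; #xF0000 as the count makes the loop visit 0 .. #xEFFFF inclusive.
    (dotimes (c #xF0000)
      (unless (my-char-in-skip-range-p c)
        (let ((name (get-char-code-property c 'name)))
          ;; Unassigned code points have no name; ignore them.
          (when name
            (push (cons name c) names)))))
    (nreverse names)))
```

With this predicate, Hangul syllables (for example #xAC00) are still
visited, so their algorithmically computed names are collected through
get-char-code-property like any other name.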