From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: inputting characters by hexadigit
Date: Sun, 20 Jul 2008 10:23:57 +0900
Message-ID: <E1KKNeX-0003FP-7C@etlken.m17n.org>
References: <868ww3vydn.fsf@lifelogs.com>
	<87myki6fqp.fsf@jurta.org>	<E1KIwlf-00036S-7H@etlken.m17n.org>
	<87mykhz6tf.fsf@jurta.org>	<E1KJIG0-0001ru-ME@etlken.m17n.org>
	<87tzeokrku.fsf@jurta.org>	<E1KJdzm-0006K8-IP@etlken.m17n.org>
	<87od4wgg8p.fsf@catnip.gol.com>	<86od4vmi5i.fsf@lifelogs.com>
	<873am6n21q.fsf@jurta.org>	<E1KK0yx-0006kT-Kg@etlken.m17n.org>
	<87sku5if8t.fsf_-_@jurta.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Trace: ger.gmane.org 1216517058 26845 80.91.229.12 (20 Jul 2008 01:24:18 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 20 Jul 2008 01:24:18 +0000 (UTC)
Cc: tzz@lifelogs.com, emacs-devel@gnu.org
To: Juri Linkov <juri@jurta.org>
In-reply-to: <87sku5if8t.fsf_-_@jurta.org> (message from Juri Linkov on Sun,
	20 Jul 2008 03:29:14 +0300)
Xref: news.gmane.org gmane.emacs.devel:101002
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/101002>

In article <87sku5if8t.fsf_-_@jurta.org>, Juri Linkov <juri@jurta.org> writes:

> > I think it is better to skip these ranges:
> >   #x3400..#x4dbf   -- CJK Ideograph Extension A
> >   #x4e00..#x9fff   -- CJK Ideograph
> >   #xd800..#xfaff   -- surrogate pairs, private use, CJK COMPATIBILITY IDEOGRAPH
> >   #x20000..#x2ffff -- CJK Ideograph Extension B
> > and end the loop at #xeffff (#xf0000.. are for private use)

> Actually there are no Unicode names in these ranges in UnicodeData.txt.
> It has only lines for the first and the last character in these ranges:

Yes.  But, for CJK chars:

   (get-char-code-property CHAR 'name)

returns a valid name, something like "CJK IDEOGRAPH-3400"(*),
because get-char-code-property not only looks up
UnicodeData.txt but also computes a proper value when
necessary.
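For illustration, a rough sketch of such a computation for CJK
ideographs, following the naming rule from the Unicode Standard
quoted at the end of this message.  The function name
my-cjk-ideograph-name is hypothetical (this is not Emacs's internal
code), the ranges are whole block boundaries rather than the exact
assigned ranges, and the name format follows the standard, which may
differ from what get-char-code-property actually returns.

```elisp
;; Hypothetical helper: derive a CJK ideograph name from its code
;; point, as described in Section 4.8 of the Unicode Standard.
;; Block boundaries approximate the assigned ranges.
(defun my-cjk-ideograph-name (c)
  "Return the algorithmically derived Unicode name of C, or nil."
  (cond ((or (and (>= c #x3400) (<= c #x4DBF))     ; Extension A block
             (and (>= c #x4E00) (<= c #x9FFF))     ; main CJK block
             (and (>= c #x20000) (<= c #x2A6DF)))  ; Extension B block
         (format "CJK UNIFIED IDEOGRAPH-%04X" c))
        ((and (>= c #xF900) (<= c #xFAFF))         ; compatibility block
         (format "CJK COMPATIBILITY IDEOGRAPH-%04X" c))))
```

Anything outside these blocks yields nil, so a caller would fall back
to the names listed in UnicodeData.txt.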

> If it were possible to loop over names instead of looping over all
> characters and checking their names, this code would be faster,
> but I don't see how to loop over all names defined
> in UnicodeData.txt.

> If this is not possible, then we could optimize the loop over all
> characters in the char-table to skip these useless ranges.

I don't think that works, because the names of Hangul syllables
must also be computed algorithmically(*); looping over the names
listed in UnicodeData.txt would miss them.  I think just doing
something like this is good:

 (dotimes (c #xF0000)          ; c runs from 0 up to #xEFFFF
   (unless (CHAR-IS-IN-A-RANGE-TO-SKIP-P c)
     ...))
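A self-contained version of that loop might look like the following.
It is only a sketch: the names my-name-skip-ranges,
my-char-in-skip-range-p and my-collect-char-names are hypothetical,
the ranges are the ones quoted above, and the optional LIMIT argument
exists only so the function can be tried quickly on a small range.

```elisp
;; Hypothetical table of the ranges quoted above.
(defconst my-name-skip-ranges
  '((#x3400 . #x4DBF)      ; CJK Ideograph Extension A
    (#x4E00 . #x9FFF)      ; CJK Ideograph
    (#xD800 . #xFAFF)      ; surrogates, private use, CJK compatibility
    (#x20000 . #x2FFFF))   ; CJK Ideograph Extension B
  "Inclusive (FROM . TO) ranges whose names are not worth collecting.")

(defun my-char-in-skip-range-p (c)
  "Return non-nil if C lies in one of `my-name-skip-ranges'."
  (let ((found nil))
    (dolist (range my-name-skip-ranges found)
      (when (and (>= c (car range)) (<= c (cdr range)))
        (setq found t)))))

(defun my-collect-char-names (&optional limit)
  "Return an alist of (NAME . CHAR) for characters below LIMIT.
LIMIT defaults to #xF0000, so the loop ends at #xEFFFF."
  (let ((result nil))
    (dotimes (c (or limit #xF0000))
      (unless (my-char-in-skip-range-p c)
        ;; get-char-code-property also computes names that are not
        ;; listed in UnicodeData.txt, e.g. Hangul syllables.
        (let ((name (get-char-code-property c 'name)))
          (when (and name (not (string= name "")))
            (push (cons name c) result)))))
    (nreverse result)))
```

Hangul syllables (#xAC00..#xD7A3) are deliberately not in the skip
list, since their names exist only through computation.  Called with
no argument, the function queries get-char-code-property for nearly
900,000 code points.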


(*): "The Unicode Standard 5.1" has this section.

4.8 Name - Normative
[...]
Ideographs and Hangul Syllables. Names for ideographs and
Hangul syllables are derived algorithmically. Unified CJK
ideographs are named CJK UNIFIED IDEOGRAPH-x, where x is
replaced with the hexadecimal Unicode code point, for
example, CJK UNIFIED IDEOGRAPH-4E00. Similarly,
compatibility CJK ideographs are named "CJK COMPATIBILITY
IDEOGRAPH-x". The names of Hangul syllables are generated as
described in "Hangul Syllable Names" in Section 3.12,
Conjoining Jamo Behavior.
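The Hangul part of that rule is short enough to sketch.  Below is a
minimal version of the name generation described in Section 3.12,
using its constants (SBase #xAC00, 19 leading consonants, 21 vowels,
28 trailing consonants including "none") and its jamo short-name
tables; the function and variable names are hypothetical.

```elisp
;; Jamo short names from Unicode Section 3.12 (hypothetical variable names).
(defconst my-hangul-l-names
  ["G" "GG" "N" "D" "DD" "R" "M" "B" "BB" "S" "SS" "" "J" "JJ"
   "C" "K" "T" "P" "H"])
(defconst my-hangul-v-names
  ["A" "AE" "YA" "YAE" "EO" "E" "YEO" "YE" "O" "WA" "WAE" "OE" "YO"
   "U" "WEO" "WE" "WI" "YU" "EU" "YI" "I"])
(defconst my-hangul-t-names
  ["" "G" "GG" "GS" "N" "NJ" "NH" "D" "L" "LG" "LM" "LB" "LS" "LT"
   "LP" "LH" "M" "B" "BS" "S" "SS" "NG" "J" "C" "K" "T" "P" "H"])

(defun my-hangul-syllable-name (c)
  "Return the Unicode name of precomposed Hangul syllable C, or nil."
  (let ((s-index (- c #xAC00)))                    ; SBase = #xAC00
    (when (and (>= s-index 0) (< s-index 11172))   ; SCount = 19 * 21 * 28
      (let ((l-index (/ s-index 588))              ; NCount = 21 * 28
            (v-index (/ (% s-index 588) 28))       ; TCount = 28
            (t-index (% s-index 28)))
        (concat "HANGUL SYLLABLE "
                (aref my-hangul-l-names l-index)
                (aref my-hangul-v-names v-index)
                (aref my-hangul-t-names t-index))))))
```

For example, #xD4DB gives "HANGUL SYLLABLE PWILH", and the last
syllable, #xD7A3, gives "HANGUL SYLLABLE HIH".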

---
Kenichi Handa
handa@ni.aist.go.jp