From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs 23 character code space
Date: Fri, 07 Nov 2008 20:52:37 +0900
To: Eli Zaretskii
Cc: emacs-devel@gnu.org
In-reply-to: (message from Eli Zaretskii on Fri, 07 Nov 2008 12:27:04 +0200)

Eli Zaretskii writes:

> Thanks.  But I'm still a bit in the dark, as for how to describe this
> correctly and concisely.  Do we actually use the range of codes
> between 0x110000 and 0x3FFF7F?  If so, for what characters?  If we do
> not use them now, are there some plans for using them in the future?

I wrote about the usage of that area in the thread "size of emacs
executable after unicode merge" as below.

> (0) At first, Emacs assigns a unique linear character code
>     space in the upper Unicode area (#x110000-) to each big
>     character set (e.g. GB, JIS, KSC) (*see the note at the
>     tail).  The decoding of a character of a specific
>     charset into this area is quite fast (done just by a few
>     steps of arithmetic calculation).  Encoding is the same
>     too.
[...]
> *Note:
>
> The reason Emacs assigns those linear areas is that such
> big charsets tend to have their own private use area, and we
> must keep a unique character code for them.  Those private
> characters are decoded and encoded without being mapped to
> the Unicode area.

For example, the charset `japanese-jisx0208' is defined as below.

  (define-charset 'japanese-jisx0208
    "JISX0208.1983/1990 Japanese Kanji: ISO-IR-87"
    :short-name "JISX0208"
    :long-name "JISX0208.1983/1990 (Japanese): ISO-IR-87"
    :iso-final-char ?B
    :emacs-mule-id 146
    :code-space [33 126 33 126]
    :code-offset #x140000
    :unify-map "JISX0208")

So, for that charset, the code space for 8836 (=94x94) characters is
reserved linearly in that upper area, starting at #x140000.  Most of
the characters are then mapped into the Unicode area by the map
"JISX0208".  The remaining characters stay in this upper area.

---
Kenichi Handa
handa@ni.aist.go.jp
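
The "few steps of arithmetic" for `japanese-jisx0208' can be sketched
roughly as follows.  This is only an illustration in Python, not the
actual Emacs C code: the function names are hypothetical, the order of
the :code-space entries (low byte first) is an assumption that does not
matter here because both ranges are 33..126, and UNIFY_MAP is a
one-entry stand-in for the real "JISX0208" map.

```python
# Sketch of the linear code-space arithmetic for japanese-jisx0208.
# CODE_SPACE and CODE_OFFSET are copied from the define-charset form;
# every function name below is hypothetical, not an Emacs API.

CODE_SPACE = (33, 126, 33, 126)  # assumed: low-byte min/max, then high-byte min/max
CODE_OFFSET = 0x140000

# Tiny stand-in for the "JISX0208" unify map (the real map is far larger).
UNIFY_MAP = {0x3021: 0x4E9C}


def decode_linear(code_point: int) -> int:
    """Map a two-byte charset code point into the upper linear area."""
    lo_min, lo_max, hi_min, hi_max = CODE_SPACE
    hi, lo = code_point >> 8, code_point & 0xFF
    if not (hi_min <= hi <= hi_max and lo_min <= lo <= lo_max):
        raise ValueError(f"code point {code_point:#x} is outside the code space")
    width = lo_max - lo_min + 1  # 94 cells per row
    return CODE_OFFSET + (hi - hi_min) * width + (lo - lo_min)


def encode_linear(char: int) -> int:
    """Inverse of decode_linear: recover the two-byte code point."""
    lo_min, lo_max, hi_min, hi_max = CODE_SPACE
    width = lo_max - lo_min + 1
    rows = hi_max - hi_min + 1
    index = char - CODE_OFFSET
    if not 0 <= index < rows * width:  # 94 * 94 = 8836 slots
        raise ValueError(f"character {char:#x} is outside the linear area")
    row, cell = divmod(index, width)
    return ((row + hi_min) << 8) | (cell + lo_min)


def decode(code_point: int) -> int:
    """Prefer the unified (Unicode) code; otherwise stay in the upper area."""
    linear = decode_linear(code_point)  # also validates the code point
    return UNIFY_MAP.get(code_point, linear)
```

With these definitions, decode_linear(0x2121) gives #x140000, the first
of the 8836 slots.  A code point that has an entry in the map comes
back as its Unicode value from decode, while one without an entry keeps
its code in the upper area.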