From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs 23 character code space
Date: Fri, 07 Nov 2008 20:52:37 +0900
Message-ID: <E1KyPtF-0005Qa-6p@etlken.m17n.org>
References: <u63n7wmri.fsf@gnu.org> <E1KyLeO-0001xo-7t@etlken.m17n.org>
	<uy6zvsufb.fsf@gnu.org>
NNTP-Posting-Host: lo.gmane.org
X-Trace: ger.gmane.org 1226059092 18713 80.91.229.12 (7 Nov 2008 11:58:12 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Fri, 7 Nov 2008 11:58:12 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Fri Nov 07 12:59:14 2008
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1KyPze-0004nB-5y
	for ged-emacs-devel@m.gmane.org; Fri, 07 Nov 2008 12:59:14 +0100
Original-Received: from localhost ([127.0.0.1]:46072 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1KyPyW-0004YG-JJ
	for ged-emacs-devel@m.gmane.org; Fri, 07 Nov 2008 06:58:04 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1KyPtO-0001Jq-Ej
	for emacs-devel@gnu.org; Fri, 07 Nov 2008 06:52:46 -0500
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1KyPtM-0001Id-Gp
	for emacs-devel@gnu.org; Fri, 07 Nov 2008 06:52:45 -0500
Original-Received: from [199.232.76.173] (port=52772 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1KyPtM-0001IT-9d
	for emacs-devel@gnu.org; Fri, 07 Nov 2008 06:52:44 -0500
Original-Received: from mx1.aist.go.jp ([150.29.246.133]:44229)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <handa@m17n.org>)
	id 1KyPtJ-0001Sl-1J; Fri, 07 Nov 2008 06:52:41 -0500
Original-Received: from rqsmtp1.aist.go.jp (rqsmtp1.aist.go.jp [150.29.254.115])
	by mx1.aist.go.jp  with ESMTP id mA7BqbU3019179;
	Fri, 7 Nov 2008 20:52:37 +0900 (JST) env-from (handa@m17n.org)
Original-Received: from smtp1.aist.go.jp
	by rqsmtp1.aist.go.jp  with ESMTP id mA7Bqbq1023318;
	Fri, 7 Nov 2008 20:52:37 +0900 (JST) env-from (handa@m17n.org)
Original-Received: by smtp1.aist.go.jp  with ESMTP id mA7BqbKM002899;
	Fri, 7 Nov 2008 20:52:37 +0900 (JST) env-from (handa@m17n.org)
Original-Received: from handa by etlken.m17n.org with local (Exim 4.69)
	(envelope-from <handa@m17n.org>)
	id 1KyPtF-0005Qa-6p; Fri, 07 Nov 2008 20:52:37 +0900
In-reply-to: <uy6zvsufb.fsf@gnu.org> (message from Eli Zaretskii on Fri, 07
	Nov 2008 12:27:04 +0200)
X-detected-operating-system: by monty-python.gnu.org: Solaris 9
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:105437
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/105437>

In article <uy6zvsufb.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> Thanks.  But I'm still a bit in the dark, as for how to describe this
> correctly and concisely.  Do we actually use the range of codes
> between 0x110000 and 0x3FFF7F?  If so, for what characters?  If we do
> not use them now, are there some plans for using them in the future?

I described how that area is used in the thread "size of
emacs executable after unicode merge", as follows.

> (0) At first, Emacs assigns a unique linear character code
>     space in upper Unicode area (#x110000-) to each big
>     character set (e.g. GB, JIS, KSC) (*see the note at the
>     tail).  The decoding of a character of a specific
>     charset into this area is quite fast (done just by a few
>     steps of arithmetic calculation).  Encoding is the same
>     too.
[...]
> *Note:
> 
> The reason Emacs assigns those linear areas is that such
> big charsets tend to have their own private use area, and we
> must keep a unique character code for them.  Those private
> characters are decoded and encoded without being mapped to
> the Unicode area.

For example, the charset `japanese-jisx0208' is defined as
below.

(define-charset 'japanese-jisx0208
  "JISX0208.1983/1990 Japanese Kanji: ISO-IR-87"
  :short-name "JISX0208"
  :long-name "JISX0208.1983/1990 (Japanese): ISO-IR-87"
  :iso-final-char ?B
  :emacs-mule-id 146
  :code-space [33 126 33 126]
  :code-offset #x140000
  :unify-map "JISX0208")

So, for that charset, a code space for 8836 (=94x94)
characters is reserved in that upper area, linearly from
#x140000.  Most of those characters are then mapped into
the Unicode area by the map "JISX0208".  The remaining
characters stay in this upper area.
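To illustrate the "few steps of arithmetic" for this charset, here
is a minimal sketch in Python.  It is not the Emacs C code: the
function names are made up for illustration, and it assumes a
row-major layout where the first byte selects a row of 94 cells.
It covers only the linear placement from #x140000 and ignores the
later unification by the "JISX0208" map.

```python
# Hypothetical sketch of linear charset decoding/encoding.
# Values taken from the japanese-jisx0208 definition above.
CODE_SPACE = (33, 126, 33, 126)   # :code-space, both bytes range 33..126
CODE_OFFSET = 0x140000            # :code-offset

def decode(byte1, byte2):
    """Map a two-byte code point to its character code in the upper area.

    Assumes (for illustration) that byte1 is the row and byte2 the cell.
    """
    min1, max1, min2, max2 = CODE_SPACE
    if not (min1 <= byte1 <= max1 and min2 <= byte2 <= max2):
        raise ValueError("code point outside the charset's code space")
    cells_per_row = max2 - min2 + 1          # 94
    index = (byte1 - min1) * cells_per_row + (byte2 - min2)
    return CODE_OFFSET + index

def encode(char):
    """Inverse of decode: recover the two bytes from a character code."""
    min1, max1, min2, max2 = CODE_SPACE
    cells_per_row = max2 - min2 + 1
    rows = max1 - min1 + 1
    index = char - CODE_OFFSET
    if not (0 <= index < rows * cells_per_row):
        raise ValueError("character outside this charset's linear area")
    row, cell = divmod(index, cells_per_row)
    return (min1 + row, min2 + cell)

if __name__ == "__main__":
    c = decode(0x30, 0x21)
    print(hex(c))                 # 0x140582
    print(encode(c) == (0x30, 0x21))
```

Under these assumptions the first code point (33, 33) lands on
#x140000 and the last one (126, 126) on #x140000 + 8835, so the
charset occupies exactly 8836 consecutive codes.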

---
Kenichi Handa
handa@ni.aist.go.jp