From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: size of emacs executable after unicode merge
Date: Mon, 10 Nov 2008 10:59:27 +0900
To: Chong Yidong
Cc: rms@gnu.org, emanuele.giaquinta@gmail.com, emacs-devel@gnu.org,
    dann@ics.uci.edu, monnier@iro.umontreal.ca,
    evilborisnet@netscape.net, jasonr@gnu.org
In-reply-to: <87iqqwk672.fsf@cyd.mit.edu> (message from Chong Yidong on
    Sun, 09 Nov 2008 15:14:25 -0500)
Xref: news.gmane.org gmane.emacs.devel:105521

In article <87iqqwk672.fsf@cyd.mit.edu>, Chong Yidong writes:

> Kenichi Handa writes:

> > The problem is that lisp/international/characters.el sets up
> > syntax-table and category-table for many characters by
> > map-charset-chars.
> >
> > Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)
> >
> > To know which (Unicode) characters belong to
> > chinese-gb2312, Emacs has to load a mapping table.

> Could you try to describe what needs to be done in more detail?  That
> way, even if you don't have time to implement this, someone else might
> be able to take a stab at it.

map-charset-chars calls FUNCTION (modify-category-entry in the above
case) on all characters in CHARSET.  But to know which characters
belong to CHARSET (chinese-gb2312 in the above case), we must consult
"etc/charsets/GB2312.map".  Its contents look something like this:

0x2121-0x2123 0x3000
0x2124 0x30FB
0x2125 0x02C9
[...]

From this file, we know that #x3000, #x3001, #x3002, #x30FB, #x02C9,
... belong to chinese-gb2312.

We must find a way to make map-charset-chars work without loading that
map into a char-table.
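As an illustration of the map format shown above, here is a minimal
sketch in C of reading one line of such a file.  The names
(struct map_entry, parse_map_line) are hypothetical and are not taken
from the Emacs sources; the sketch assumes only the two line shapes
that appear in the excerpt, and that a range maps consecutive codes to
consecutive Unicode characters, as the excerpt implies.

```c
#include <stdio.h>

/* One entry of a charset map file.  count == 0 marks a line that did
   not parse.  Hypothetical names, for illustration only.  */
struct map_entry {
    unsigned code_first;     /* first code point in the charset */
    unsigned unicode_first;  /* Unicode character it maps to */
    unsigned count;          /* number of consecutive code points */
};

/* Parse "0xAAAA-0xBBBB 0xUUUU" (a range) or "0xAAAA 0xUUUU" (a single
   code).  Anything else yields an entry with count == 0.  */
struct map_entry parse_map_line(const char *line)
{
    struct map_entry e = {0, 0, 0};
    unsigned first, last, unicode;
    int n = sscanf(line, "%x-%x %x", &first, &last, &unicode);

    if (n == 3) {
        /* Range form: reject a reversed range.  */
        if (last >= first) {
            e.code_first = first;
            e.unicode_first = unicode;
            e.count = last - first + 1;
        }
    } else if (n == 1 && sscanf(line, "%x %x", &first, &unicode) == 2) {
        /* Single-code form.  */
        e.code_first = first;
        e.unicode_first = unicode;
        e.count = 1;
    }
    return e;
}
```

For the first line of the excerpt this gives unicode_first = 0x3000 and
count = 3, that is, #x3000, #x3001 and #x3002.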
One idea is to have a single boolean vector of size #x110000 (139264
bytes) and to set it up for CHARSET each time map-charset-chars is
called for a different charset.  In that vector, only the bits for
#x3000, #x3001, #x3002, etc. are 1 for chinese-gb2312.  Then
map-charset-chars can tell for which characters FUNCTION must be
called.

---
Kenichi Handa
handa@ni.aist.go.jp
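The boolean-vector idea above might look roughly like the following in
C.  This is only a sketch under stated assumptions: every name here
(charset_bits, charset_bits_add, map_marked_chars, ...) is hypothetical,
and it does not show how Emacs itself would store the vector or call
FUNCTION.

```c
#include <string.h>

#define UNICODE_LIMIT 0x110000u            /* vector size in bits */
#define VECTOR_BYTES  (UNICODE_LIMIT / 8)  /* 139264 bytes */

/* The single shared vector; hypothetical name.  */
static unsigned char charset_bits[VECTOR_BYTES];

/* Clear the vector before setting it up for a different charset.  */
void charset_bits_reset(void)
{
    memset(charset_bits, 0, sizeof charset_bits);
}

/* Mark COUNT consecutive characters starting at FIRST.  */
void charset_bits_add(unsigned first, unsigned count)
{
    if (first >= UNICODE_LIMIT)
        return;
    if (count > UNICODE_LIMIT - first)
        count = UNICODE_LIMIT - first;
    for (unsigned i = 0; i < count; i++) {
        unsigned c = first + i;
        charset_bits[c / 8] |= (unsigned char)(1u << (c % 8));
    }
}

/* Return nonzero if character C is marked.  */
int charset_bits_test(unsigned c)
{
    return c < UNICODE_LIMIT && ((charset_bits[c / 8] >> (c % 8)) & 1);
}

/* Call FUNCTION on every marked character and return how many were
   found.  FUNCTION may be a null pointer, in which case the marked
   characters are only counted.  */
unsigned long map_marked_chars(void (*function)(unsigned c, void *arg),
                               void *arg)
{
    unsigned long found = 0;
    for (unsigned c = 0; c < UNICODE_LIMIT; c++) {
        if (charset_bits_test(c)) {
            if (function)
                function(c, arg);
            found++;
        }
    }
    return found;
}
```

A caller would reset the vector, add each entry read from the map file
(for example charset_bits_add(0x3000, 3) for the first line of
GB2312.map), and then walk the marked characters.  The walk visits all
#x110000 positions on every call, which is the price of not keeping a
per-charset table.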