From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dan Nicolaescu
Newsgroups: gmane.emacs.devel
Subject: Re: size of emacs executable after unicode merge
Date: Fri, 31 Oct 2008 08:07:14 -0700 (PDT)
Message-ID: <200810311507.m9VF7EAl022755@mothra.ics.uci.edu>
References: <200805140351.m4E3pQuE004549@sallyv1.ics.uci.edu> <200805141652.m4EGqikr018644@sallyv1.ics.uci.edu> <200805151529.m4FFTlF1004684@sallyv1.ics.uci.edu> <482D8435.6060407@gnu.org> <20081030101819.GA15223@orion.lan>
In-Reply-To: (Kenichi Handa's message of "Fri, 31 Oct 2008 14:29:28 +0900")
To: Kenichi Handa
Cc: evilborisnet@netscape.net, jasonr@gnu.org, rms@gnu.org, emanuele.giaquinta@gmail.com, emacs-devel@gnu.org

Kenichi Handa writes:

> In article , "Richard M. Stallman" writes:
>
> > If I comment the load_charset_map_from_file call in unify_charset the
> > data segment size is back to normal.
>
> > Although these are loaded "on demand", perhaps something "demands" them
> > at build time.
>
> It's not that simple.  This is the strategy of the charset
> map loading mechanism.  I took that approach expecting that
> char-tables that are garbage-collected before dumping are
> not in the dumped file.
>
> (0) At first, Emacs assigns a unique linear character code
>     space in the upper Unicode area (#x110000-) to each big
>     character set (e.g. GB, JIS, KSC) (*see the note at the
>     tail).  The decoding of a character of a specific
>     charset into this area is quite fast (done just by a few
>     steps of arithmetic calculation).  Encoding is the same
>     too.
>
> (1) While building Emacs, when unify-charset is called, we
>     update two char-tables, Vchar_unify_table and
>     Vchar_unified_charset_table.  The former maps a
>     character in the above upper area to the Unicode area, and
>     the latter maps the character to a charset symbol.
>     Unify-charset also builds a deunifier char-table for each
>     character set that maps a character in the Unicode area to
>     the upper area that is unique to each charset.
>
>     So at this time, the full maps are built.
>
> (2) Just before dumping, clear-charset-maps is called.  This
>     function sets all char-tables built in (1) (except for
>     Vchar_unified_charset_table) to nil.  Then it sets
>     Vchar_unify_table to Vchar_unified_charset_table, and
>     sets Vchar_unified_charset_table to nil.
>
>     Then, garbage-collect is called.  After that, the only living
>     char-table is Vchar_unify_table, and its contents
>     are not that big because it maps upper area characters to
>     charsets, and each charset has a linear upper area, thus
>     most successive characters have the same value.

For the allocator to be able to release pages back to the system after
their contents have been garbage collected, you have to be sure that
absolutely ALL the data allocated on those pages can be garbage
collected (and even then you depend on the quirks of the
platform-specific malloc implementation to do it).

From the description above, it sounds like the data in
Vchar_unify_table is allocated while the charset data is being read,
and it is not released when the charset data is.  So the allocator
cannot release all the pages...
[note: this is speculation based solely on your description above]

> (3) When the dumped Emacs runs, at the time of
>     decoding/encoding charsets that are unified as above, by
>     checking if the value of Vchar_unify_table for a
>     character is a symbol or not, Emacs knows whether it has
>     to load the mapping table again or not.
>
> So, that way, Emacs loads maps on demand.

So it sounds like your goal is to build Vchar_unify_table, and that it
is built from static data in emacs/etc/charsets/*.  In that case, can't
the data in Vchar_unify_table be a C data structure that is built
offline and just compiled into emacs?
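
To make that suggestion a bit more concrete, here is a minimal sketch of
what such a compiled-in table could look like.  Everything in it is an
assumption for illustration: the names unify_entry, unify_table,
unify_table_check_sorted and unify_lookup are hypothetical, the entries
are placeholder values rather than real charset data, and a sorted array
with binary search is just one possible layout, not how Emacs does it.
A real table would be emitted by an offline generator reading the map
files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: a code in the private upper area (#x110000-)
   and the Unicode code point it is unified with.  */
struct unify_entry {
    uint32_t upper;    /* code in the upper area */
    uint32_t unicode;  /* unified Unicode code point */
};

/* Placeholder values only, NOT real charset data.  Sorted by `upper'.
   Being static const, the array is typically placed in read-only data
   by the compiler, so nothing is malloc'ed or garbage collected.  */
static const struct unify_entry unify_table[] = {
    { 0x110000, 0x3000 },
    { 0x110001, 0x3001 },
    { 0x110002, 0x3002 },
    { 0x110010, 0x4E00 },
};

#define UNIFY_TABLE_LEN (sizeof unify_table / sizeof unify_table[0])

/* Debug check that the generator kept the table strictly sorted,
   which the binary search below relies on.  */
void unify_table_check_sorted(void)
{
    size_t i;
    for (i = 1; i < UNIFY_TABLE_LEN; i++)
        assert(unify_table[i - 1].upper < unify_table[i].upper);
}

/* Look up UPPER by binary search.  On success store the Unicode code
   point in *UNICODE_OUT and return 1; return 0 if there is no entry.  */
int unify_lookup(uint32_t upper, uint32_t *unicode_out)
{
    size_t lo = 0, hi = UNIFY_TABLE_LEN;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (unify_table[mid].upper < upper)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < UNIFY_TABLE_LEN && unify_table[lo].upper == upper) {
        *unicode_out = unify_table[lo].unicode;
        return 1;
    }
    return 0;
}
```

With something along these lines, a caller would do
unify_lookup (c, &u) and fall back to the existing path when it returns
0.  The point is only that the data never passes through the Lisp
allocator, so the dumped data segment would not depend on what the
garbage collector and malloc manage to give back.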