size of emacs executable after unicode merge

unofficial mirror of emacs-devel@gnu.org 

* size of emacs executable after unicode merge
@ 2008-05-14  3:13 Evil Boris
  2008-05-14  3:51 ` Dan Nicolaescu
  0 siblings, 1 reply; 86+ messages in thread
From: Evil Boris @ 2008-05-14  3:13 UTC (permalink / raw)
  To: emacs-devel


I see Emacs executable size grew by more than a factor of 2 after
Unicode merge.  (This is on Solaris 7 "sparc-sun-solaris2.7, X
toolkit", with ancient gcc 3.1.)

Does everyone see that, too?

     --BA

  /local/2/software/emacs-cvs/emacs/src:
  wildcard emacs-*
  -rwxr-xr-x   2 aronov    9519135 Feb  1 10:54 emacs-23.0.50.22
  -rwxr-xr-x   1 aronov    9519135 Feb  1 10:49 emacs-23.0.50.21
  -rwxr-xr-x   1 aronov    9518818 Jan 28 11:17 emacs-23.0.50.20
  -rwxr-xr-x   1 aronov    9527010 Jan 28 10:55 emacs-23.0.50.19
  -rwxr-xr-x   1 aronov    9517283 Jan  3 12:13 emacs-23.0.50.18
  -rwxr-xr-x   1 aronov    9517283 Jan  3 10:57 emacs-23.0.50.17
  ...

  /local/2/software/emacs-cvs/emacs-after-unicode2-merge/emacs/src:
  wildcard emacs*
  -rwxr-xr-x   2 aronov   22570304 Mar 21 11:23 emacs-23.0.60.36
  -rwxr-xr-x   1 aronov   22571511 Mar 16 11:18 emacs-23.0.60.35
  -rwxr-xr-x   1 aronov   22579387 Mar 15 13:09 emacs-23.0.60.33
  -rwxr-xr-x   1 aronov   22620408 Mar 12 10:18 emacs-23.0.60.31
  -rwxr-xr-x   1 aronov   22620408 Mar 10 12:06 emacs-23.0.60.29
  -rwxr-xr-x   1 aronov   22767972 Mar  6 15:18 emacs-23.0.60.27
  -rwxr-xr-x   1 aronov   22767972 Mar  5 13:56 emacs-23.0.60.25
  ...
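
The growth factor implied by the two listings can be worked out directly.
A small Python sketch, using the newest build from each directory (sizes are
copied from the listings above; the variable names are only illustrative):

```python
# Sizes in bytes, copied from the directory listings above.
before = 9519135    # emacs-23.0.50.22, before the Unicode merge
after = 22570304    # emacs-23.0.60.36, after the Unicode merge

growth_bytes = after - before
growth_factor = after / before

print(f"grew by {growth_bytes} bytes, a factor of {growth_factor:.2f}")
```

The other builds in each directory differ from these by well under 1%, so
the choice of build does not change the picture.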





^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-05-14  3:13 size of emacs executable after unicode merge Evil Boris
@ 2008-05-14  3:51 ` Dan Nicolaescu
  2008-05-14 16:39   ` Richard M Stallman
  0 siblings, 1 reply; 86+ messages in thread
From: Dan Nicolaescu @ 2008-05-14  3:51 UTC (permalink / raw)
  To: Evil Boris; +Cc: emacs-devel

Evil Boris <evilborisnet@netscape.net> writes:

  > I see Emacs executable size grew by more than a factor of 2 after
  > Unicode merge.  (This is on Solaris 7 "sparc-sun-solaris2.7, X
  > toolkit", with ancient gcc 3.1.)
  > 
  > Does everyone see that, too?

Yes, it's noted in admin/FOR-RELEASE







* Re: size of emacs executable after unicode merge
  2008-05-14  3:51 ` Dan Nicolaescu
@ 2008-05-14 16:39   ` Richard M Stallman
  2008-05-14 16:52     ` Dan Nicolaescu
  0 siblings, 1 reply; 86+ messages in thread
From: Richard M Stallman @ 2008-05-14 16:39 UTC (permalink / raw)
  To: Dan Nicolaescu; +Cc: emacs-devel, evilborisnet

      > I see Emacs executable size grew by more than a factor of 2 after
      > Unicode merge.  (This is on Solaris 7 "sparc-sun-solaris2.7, X
      > toolkit", with ancient gcc 3.1.)
      > 
      > Does everyone see that, too?

    Yes, it's noted in admin/FOR-RELEASE

What caused it to grow so much?





* Re: size of emacs executable after unicode merge
  2008-05-14 16:39   ` Richard M Stallman
@ 2008-05-14 16:52     ` Dan Nicolaescu
  2008-05-15 14:18       ` Richard M Stallman
  0 siblings, 1 reply; 86+ messages in thread
From: Dan Nicolaescu @ 2008-05-14 16:52 UTC (permalink / raw)
  To: rms; +Cc: evilborisnet, emacs-devel

Richard M Stallman <rms@gnu.org> writes:

  >       > I see Emacs executable size grew by more than a factor of 2 after
  >       > Unicode merge.  (This is on Solaris 7 "sparc-sun-solaris2.7, X
  >       > toolkit", with ancient gcc 3.1.)
  >       > 
  >       > Does everyone see that, too?
  > 
  >     Yes, it's noted in admin/FOR-RELEASE
  > 
  > What caused it to grow so much?

Here's what Handa-san said about that:

One possibility is that the difference is because of big CJK
charset maps loaded while creating emacs from temacs.  Just
before dumping, loadup.el calls `clear-charset-maps' which
sets internal vectors and char-tables for charsets to nil,
but the following call of `garbage-collect' doesn't return
that freed memory.

I'll try to convert all encodings of language/*.el files to
utf-8 so that Emacs doesn't have to load charset maps in
loadup.el.


* Re: size of emacs executable after unicode merge
  2008-05-14 16:52     ` Dan Nicolaescu
@ 2008-05-15 14:18       ` Richard M Stallman
  2008-05-15 15:29         ` Dan Nicolaescu
  0 siblings, 1 reply; 86+ messages in thread
From: Richard M Stallman @ 2008-05-15 14:18 UTC (permalink / raw)
  To: Dan Nicolaescu; +Cc: emacs-devel, evilborisnet

    I'll try to convert all encodings of language/*.el files to
    utf-8 so that Emacs doesn't have to load charset maps in
    loadup.el.

I think that is important, but please let's see what Handa says
about this proposed change.





* Re: size of emacs executable after unicode merge
  2008-05-15 14:18       ` Richard M Stallman
@ 2008-05-15 15:29         ` Dan Nicolaescu
  2008-05-16 11:31           ` Richard M Stallman
  0 siblings, 1 reply; 86+ messages in thread
From: Dan Nicolaescu @ 2008-05-15 15:29 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel, evilborisnet

Richard M Stallman <rms@gnu.org> writes:

  >     I'll try to convert all encodings of language/*.el files to
  >     utf-8 so that Emacs doesn't have to load charset maps in
  >     loadup.el.
  > 
  > I think that is important, but please let's see what Handa says
  > about this proposed change.

You misunderstood, what you cited above is part of what he said...





* Re: size of emacs executable after unicode merge
  2008-05-15 15:29         ` Dan Nicolaescu
@ 2008-05-16 11:31           ` Richard M Stallman
  2008-05-16 12:06             ` Dan Nicolaescu
  2008-05-16 12:32             ` Kenichi Handa
  0 siblings, 2 replies; 86+ messages in thread
From: Richard M Stallman @ 2008-05-16 11:31 UTC (permalink / raw)
  To: Dan Nicolaescu; +Cc: emacs-devel, evilborisnet

      >     I'll try to convert all encodings of language/*.el files to
      >     utf-8 so that Emacs doesn't have to load charset maps in
      >     loadup.el.
      > 
      > I think that is important, but please let's see what Handa says
      > about this proposed change.

    You misunderstood, what you cited above is part of what he said...

I mean, let's see what he says about the change you propose to make.
It looked like he described the current situation and
your proposal was a response to that situation.





* Re: size of emacs executable after unicode merge
  2008-05-16 11:31           ` Richard M Stallman
@ 2008-05-16 12:06             ` Dan Nicolaescu
  2008-05-16 12:32             ` Kenichi Handa
  1 sibling, 0 replies; 86+ messages in thread
From: Dan Nicolaescu @ 2008-05-16 12:06 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel, evilborisnet

Richard M Stallman <rms@gnu.org> writes:

  >       >     I'll try to convert all encodings of language/*.el files to
  >       >     utf-8 so that Emacs doesn't have to load charset maps in
  >       >     loadup.el.
  >       > 
  >       > I think that is important, but please let's see what Handa says
  >       > about this proposed change.
  > 
  >     You misunderstood, what you cited above is part of what he said...
  > 
  > I mean, let's see what he says about the change you propose to make.
  > It looked like he described the current situation and
  > your proposal was a response to that situation.

You misunderstand again, I didn't propose anything, I just pasted his
message word for word.  The proposal is his...





* Re: size of emacs executable after unicode merge
  2008-05-16 11:31           ` Richard M Stallman
  2008-05-16 12:06             ` Dan Nicolaescu
@ 2008-05-16 12:32             ` Kenichi Handa
  2008-05-16 12:55               ` Jason Rumney
  1 sibling, 1 reply; 86+ messages in thread
From: Kenichi Handa @ 2008-05-16 12:32 UTC (permalink / raw)
  To: rms; +Cc: dann, evilborisnet, emacs-devel

In article <E1Jwy9X-00016Z-4m@fencepost.gnu.org>, Richard M Stallman <rms@gnu.org> writes:

> I'll try to convert all encodings of language/*.el files to
> utf-8 so that Emacs doesn't have to load charset maps in
> loadup.el.
> 
> I think that is important, but please let's see what Handa says
> about this proposed change.

>     You misunderstood, what you cited above is part of what he said...

> I mean, let's see what he says about the change you propose to make.
> It looked like he described the current situation and
> your proposal was a response to that situation.

I did the experiment of changing all files containing a
Japanese character that are not byte-compiled to utf-8, but
the resulting size is not different.  That means that my
guess above is not correct.  :-(

---
Kenichi Handa
handa@ni.aist.go.jp





* Re: size of emacs executable after unicode merge
  2008-05-16 12:32             ` Kenichi Handa
@ 2008-05-16 12:55               ` Jason Rumney
  2008-05-16 15:59                 ` Thomas Lord
  2008-05-17  0:56                 ` Kenichi Handa
  0 siblings, 2 replies; 86+ messages in thread
From: Jason Rumney @ 2008-05-16 12:55 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: dann, emacs-devel, rms, evilborisnet

Kenichi Handa wrote:
> I did the experiment of changing all files containing a
> Japanese character that are not byte-compiled to utf-8, but
> the resulting size is not different.  That means that my
> guess above is not correct.  :-(
>   

How big are the data structures holding all the unicode character info 
and translation tables for encodings? At one point you said you wanted 
to make the unicode tables be loaded on demand but AFAIK they are 
currently dumped into the emacs executable. Could that account for a 
large proportion of the size increase?


* Re: size of emacs executable after unicode merge
  2008-05-16 12:55               ` Jason Rumney
@ 2008-05-16 15:59                 ` Thomas Lord
  2008-05-16 22:07                   ` Stephen J. Turnbull
  2008-05-17  0:56                 ` Kenichi Handa
  1 sibling, 1 reply; 86+ messages in thread
From: Thomas Lord @ 2008-05-16 15:59 UTC (permalink / raw)
  To: Jason Rumney; +Cc: dann, emacs-devel, rms, evilborisnet, Kenichi Handa

Jason Rumney wrote:
> How big are the data structures holding all the unicode character info 
> and translation tables for encodings? 

If that turns out to be the problem, will someone please contact me
directly?  (I ask that because I mostly just skim this list and so miss
things.)

Several years back I devoted a pretty decent number of hours to working
out good ways to compress the run-time representation of such tables
without sacrificing much performance on accesses.   If it would be helpful,
I can take a little bit of time to refresh my memory of how I did it and
report that to someone working on the problem in Emacs, thus hopefully
saving time (and squeaking a little bit more "payout" from my initial
investment).

-t


* Re: size of emacs executable after unicode merge
  2008-05-16 15:59                 ` Thomas Lord
@ 2008-05-16 22:07                   ` Stephen J. Turnbull
  2008-05-16 23:01                     ` Thomas Lord
  0 siblings, 1 reply; 86+ messages in thread
From: Stephen J. Turnbull @ 2008-05-16 22:07 UTC (permalink / raw)
  To: Thomas Lord
  Cc: rms, Kenichi Handa, emacs-devel, dann, evilborisnet, Jason Rumney

Thomas Lord writes:

 > Jason Rumney wrote:
 > > How big are the data structures holding all the unicode character info 
 > > and translation tables for encodings? 

Is it possible that the whole Unicode range (17*2^16 code points) is
being dumped?  That would lead to about the size change observed,
extrapolating from my "naive estimate" for XEmacs implementation of
the BMP given below.  But surely no characters outside of the BMP are
needed to dump Emacs.

 > If that turns out to be the problem, will someone please contact me 
 > directly?
 > (I ask that because I mostly just skim this list and so miss things.)
 > 
 > Several years back I devoted a pretty decent number of hours to working
 > out good ways to compress the run-time representation of such tables
 > without sacrificing much performance on accesses.

Loading on demand is generally a better solution, as most non-Asians
use less than 500 characters, highly localized to about 3 ranges that
can be loaded individually.

Nor do you really need "good solutions", as half of the BMP is hanzi
and Hangul which are basically constant ranges for the character info
tables, and another 10% is private space and surrogates, leading to
approximately 60% savings by using ranges and appropriate defaults for
these four classes.  The non-BMP planes surely can be loaded on-demand.

 > If it would be helpful,

Did you do much better than 60% savings?  If not, it's probably not
really worth much effort given an efficient range table representation
already available.  In any case, something else is going on here
besides naive representation (assuming we're restricted to the BMP).

In XEmacs, where all coding tables for the BMP are loaded by default,
much more naive strategies than those outlined above give 891800 bytes
total for the to-unicode and from-unicode tables.  I think we're
missing a couple of charsets that Emacs Mule provides, but they're
minor.  We don't currently implement the Unidata base, but most (all?) 
of the character properties can be compactly represented as a small
number of Booleans each, so a table of bitvectors for the BMP "should"
only be about 256KB or maybe 512KB.  IIRC XEmacs/UTF-2000 implemented
the BMP Unidata as a Lisp array of Lisp bitvectors in about 1MB (most
of which is Lisp object overhead).

In other words, even with a naive strategy, the Unicode BMP database
should only add about 1.1MB to 1.4MB, ie, about 10% of the size
increase seen here, if coded compactly but straightforwardly in C.

A few straightforward optimizations can probably get that down to
500KB to 700KB, and for an on-demand setup, most Western users should
only see a footprint of about 10-15KB for Unicode data, if that.
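
The estimates above can be checked with a few lines of arithmetic.  This is
only a sketch: the 891800-byte figure comes from this message and the
executable sizes from the first message in the thread, while the assumption
that a "small number of Booleans" fits in one 32-bit or 64-bit word per code
point is an illustration, not something stated here.

```python
unicode_code_points = 17 * 2**16   # all 17 planes
bmp_code_points = 2**16            # Basic Multilingual Plane only

# XEmacs to-unicode and from-unicode tables, in bytes (from the message).
coding_tables = 891800

# Assumption: one 4-byte or 8-byte word of Boolean properties per BMP code point.
props_low = bmp_code_points * 4    # 256 KiB
props_high = bmp_code_points * 8   # 512 KiB

total_low = coding_tables + props_low
total_high = coding_tables + props_high

# Observed growth of the executable, from the sizes in the first message.
observed_growth = 22570304 - 9519135

share_low = total_low / observed_growth
share_high = total_high / observed_growth

print(f"code points: {unicode_code_points} total, {bmp_code_points} in the BMP")
print(f"naive BMP database: {total_low} to {total_high} bytes")
print(f"share of observed growth: {share_low:.1%} to {share_high:.1%}")
```

Under these assumptions the naive database comes out between roughly 1.15MB
and 1.42MB, which is in line with the "about 10%" figure.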


* Re: size of emacs executable after unicode merge
  2008-05-16 22:07                   ` Stephen J. Turnbull
@ 2008-05-16 23:01                     ` Thomas Lord
  0 siblings, 0 replies; 86+ messages in thread
From: Thomas Lord @ 2008-05-16 23:01 UTC (permalink / raw)
  To: Stephen J. Turnbull
  Cc: rms, Kenichi Handa, emacs-devel, dann, evilborisnet, Jason Rumney

Stephen J. Turnbull wrote:
>  > If it would be helpful,
>
> Did you do much better than 60% savings? 

As I recall, I did considerably better, though I'm not clear whether or not
we're talking about the same tables.   I could be mistaken, hence the passive
request for prompting to indicate whether or not it's worth really refreshing
my memory here.

You are on the right track to observe that the density of stuff that matters
is the key to optimization.

Trie-based sparse-array approaches seem to work very well.    The trick
is to do some off-line computation to work out a roughly optimal breadth
and depth.   I found it worked well to vary the breadth according to depth.
That's, in a nutshell, what I'm talking about.

You talk about range encoding.  Ick.  Too many tests and branches, in my
experience.  A simple trie will do -- just take care to get its shape correct.
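
To make the idea concrete, here is a minimal Python sketch of a two-level
trie over a dense table, with identical blocks stored only once, plus an
off-line search for the block width that stores the fewest cells.  It is not
the code referred to above; the function names, the toy table, and the
fixed two-level shape are all assumptions chosen for illustration.

```python
def build_two_stage(values, block_bits):
    """Split a dense table into blocks and store each distinct block once."""
    block_size = 1 << block_bits
    assert len(values) % block_size == 0
    blocks = []   # distinct blocks, each a tuple of block_size values
    seen = {}     # block contents -> position in `blocks`
    index = []    # one entry per block of the original table
    for start in range(0, len(values), block_size):
        block = tuple(values[start:start + block_size])
        if block not in seen:
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks


def lookup(index, blocks, code, block_bits):
    """Two array accesses, with no range tests or branches."""
    return blocks[index[code >> block_bits]][code & ((1 << block_bits) - 1)]


def stored_cells(values, block_bits):
    """Total number of cells kept for a given block width."""
    index, blocks = build_two_stage(values, block_bits)
    return len(index) + len(blocks) * (1 << block_bits)


# Toy property table for the BMP: mostly a default value, one large
# constant range standing in for ideographs, and one isolated exception.
table = [0] * 0x10000
for cp in range(0x4E00, 0xA000):
    table[cp] = 1
table[0x41] = 2

# Off-line search for the block width that stores the fewest cells.
best_bits = min(range(4, 13), key=lambda bits: stored_cells(table, bits))
index, blocks = build_two_stage(table, best_bits)
print(best_bits, len(blocks), stored_cells(table, best_bits), len(table))
```

A deeper trie with a different width at each level follows the same
pattern; the search then runs over combinations of widths instead of a
single one.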

> In other words, even with a naive strategy, the Unicode BMP database
> should only add about 1.1MB to 1.4MB, ie, about 10% of the size
> increase seen here, if coded compactly but straightforwardly in C.
>
>   

I'm not talking about boatloads of code and, if done right, it has other
applications as well.

It's no big deal either way.  I don't mean to argue.  I just thought it
might be helpful.  I'm just a patzer or kibitzer here, take your pick.

As an aside: virtual memory hardware sucks and is pointless.   Segmentation
rocks, on the other hand.   But, that's a topic for a day a ways in the
future, unfortunately.

-t


* Re: size of emacs executable after unicode merge
  2008-05-16 12:55               ` Jason Rumney
  2008-05-16 15:59                 ` Thomas Lord
@ 2008-05-17  0:56                 ` Kenichi Handa
  2008-05-17  1:52                   ` YAMAMOTO Mitsuharu
  2008-10-30 10:18                   ` Emanuele Giaquinta
  1 sibling, 2 replies; 86+ messages in thread
From: Kenichi Handa @ 2008-05-17  0:56 UTC (permalink / raw)
  To: Jason Rumney; +Cc: dann, rms, evilborisnet, emacs-devel

In article <482D8435.6060407@gnu.org>, Jason Rumney <jasonr@gnu.org> writes:

> How big are the data structures holding all the unicode character info 
> and translation tables for encodings?

I have not yet calculated them.  Each CJK charset defined by
a map needs a char-table for encoding and a vector for
decoding.

> At one point you said you wanted 
> to make the unicode tables be loaded on demand but AFAIK they are 
> currently dumped into the emacs executable. Could that account for a 
> large proportion of the size increase?

They are surely loaded on demand.  And, at the end of the
dumping process, by calling clear-charset-maps, all loaded
maps are set to nil, and then garbage-collect is called.  So
I think they are not included in the dumped Emacs.  Perhaps
some other char-table is the culprit.  Maybe
Vchar_unify_table.  I must re-read src/charset.c.

---
Kenichi Handa
handa@ni.aist.go.jp


* Re: size of emacs executable after unicode merge
  2008-05-17  0:56                 ` Kenichi Handa
@ 2008-05-17  1:52                   ` YAMAMOTO Mitsuharu
  2008-05-19  1:45                     ` Kenichi Handa
  2008-10-30 10:18                   ` Emanuele Giaquinta
  1 sibling, 1 reply; 86+ messages in thread
From: YAMAMOTO Mitsuharu @ 2008-05-17  1:52 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: dann, emacs-devel, rms, evilborisnet, Jason Rumney

>>>>> On Sat, 17 May 2008 09:56:58 +0900, Kenichi Handa <handa@m17n.org> said:

> They are surely loaded on demand.  And, at the end of the dumping
> process, by calling clear-charset-maps, all loaded maps are set to
> nil, and then garbage-collect is called.  So I think they are not
> included in the dumped Emacs.  Perhaps some other char-table is the
> culprit.  Maybe Vchar_unify_table.  I must re-read src/charset.c.

I tried taking some heap statistics during loadup on Mac OS X.  It
would be interesting to focus on the changes in the numbers of
1024-byte blocks and 48-byte blocks as they are major factors of the
increase/decrease.

The major increases in the used heap size are observed at the load of
international/mule-conf (564KB -> 8897KB) and
international/characters (9627KB -> 14685KB).

  Before loading international/mule-conf:
  Overall size: 9359KB; 1512 nodes malloced for 564KB (6% of capacity); largest unused: [0x1855e00-7848KB]
  # of 1024-byte blocks: 193	# of 48-byte blocks: 959

  After loading international/mule-conf:
  Overall size: 9623KB; 18028 nodes malloced for 8897KB (92% of capacity); largest unused: [0xe89920-473KB]
  # of 1024-byte blocks: 7858	# of 48-byte blocks: 9035


  Before loading international/characters:
  Overall size: 18123KB; 19500 nodes malloced for 9627KB (53% of capacity); largest unused: [0x2041e00-7928KB]
  # of 1024-byte blocks: 8106	# of 48-byte blocks: 9969

  After loading international/characters:
  Overall size: 23243KB; 102539 nodes malloced for 14685KB (63% of capacity); largest unused: [0x2155800-6825KB]
  # of 1024-byte blocks: 9358	# of 48-byte blocks: 91721

The clear-charset-maps call followed by garbage-collect surely reduces
the size of the used area (15958KB -> 9188KB), but the unused area is
not returned to the system at least on Mac OS X (i.e., "Overall size"
does not change).

  Before clear-charset-maps:
  Overall size: 23703KB; 105317 nodes malloced for 15958KB (67% of capacity); largest unused: [0x222c000-5967KB]
  # of 1024-byte blocks: 9865	# of 48-byte blocks: 92577

  After clear-charset-maps with garbage-collect:
  Overall size: 23703KB; 91717 nodes malloced for 9188KB (38% of capacity); largest unused: [0x222c000-5967KB]
  # of 1024-byte blocks: 3484	# of 48-byte blocks: 85777


FYI, below is the final result of Emacs 22.2.50.

  Overall size: 10023KB; 18828 nodes malloced for 3319KB (33% of capacity); largest unused: [0x19ae000-6471KB]
  # of 1024-byte blocks: 1245	# of 48-byte blocks: 15510
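
The key numbers can be pulled out of these reports mechanically.  The sketch
below is only an illustration: it parses three of the "Overall size" lines
quoted above (shortened to the part the pattern needs) and computes how much
of the heap is allocated from the system but not in use.  The dictionary
keys and field names are made up for the example.

```python
import re

# Report lines copied from the statistics above (trailing fields omitted).
REPORTS = {
    "before clear": "Overall size: 23703KB; 105317 nodes malloced for 15958KB (67% of capacity)",
    "after clear+gc": "Overall size: 23703KB; 91717 nodes malloced for 9188KB (38% of capacity)",
    "emacs 22.2.50": "Overall size: 10023KB; 18828 nodes malloced for 3319KB (33% of capacity)",
}

PATTERN = re.compile(r"Overall size: (\d+)KB; (\d+) nodes malloced for (\d+)KB")


def parse(line):
    """Extract zone size, node count and used size; derive the idle part."""
    overall, nodes, used = map(int, PATTERN.search(line).groups())
    return {"overall_kb": overall, "nodes": nodes,
            "used_kb": used, "idle_kb": overall - used}


stats = {name: parse(line) for name, line in REPORTS.items()}
for name, s in stats.items():
    print(f"{name:15} overall {s['overall_kb']:6}KB  "
          f"used {s['used_kb']:6}KB  idle {s['idle_kb']:6}KB")

# Extra heap the Emacs 23 build holds compared with Emacs 22.2.50.
extra_kb = stats["after clear+gc"]["overall_kb"] - stats["emacs 22.2.50"]["overall_kb"]
print(f"extra heap held compared with Emacs 22.2.50: {extra_kb}KB")
```

The extra 13680KB is of the same order as the growth reported for the
Solaris executable at the start of the thread, although the two
measurements come from different platforms and are not directly comparable.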

				     YAMAMOTO Mitsuharu
				mituharu@math.s.chiba-u.ac.jp





* Re: size of emacs executable after unicode merge
  2008-05-17  1:52                   ` YAMAMOTO Mitsuharu
@ 2008-05-19  1:45                     ` Kenichi Handa
  0 siblings, 0 replies; 86+ messages in thread
From: Kenichi Handa @ 2008-05-19  1:45 UTC (permalink / raw)
  To: YAMAMOTO Mitsuharu; +Cc: dann, jasonr, rms, evilborisnet, emacs-devel

In article <wlabip66ji.wl%mituharu@math.s.chiba-u.ac.jp>, YAMAMOTO Mitsuharu <mituharu@math.s.chiba-u.ac.jp> writes:

> The clear-charset-maps call followed by garbage-collect surely reduces
> the size of the used area (15958KB -> 9188KB), but the unused area is
> not returned to the system at least on Mac OS X (i.e., "Overall size"
> does not change).

>   Before clear-charset-maps:
>   Overall size: 23703KB; 105317 nodes malloced for 15958KB (67% of capacity); largest unused: [0x222c000-5967KB]
>   # of 1024-byte blocks: 9865	# of 48-byte blocks: 92577

>   After clear-charset-maps with garbage-collect:
>   Overall size: 23703KB; 91717 nodes malloced for 9188KB (38% of capacity); largest unused: [0x222c000-5967KB]
>   # of 1024-byte blocks: 3484	# of 48-byte blocks: 85777

Could someone tell me how to do the same check on GNU/Linux system?

---
Kenichi Handa
handa@ni.aist.go.jp





* Re: size of emacs executable after unicode merge
  2008-05-17  0:56                 ` Kenichi Handa
  2008-05-17  1:52                   ` YAMAMOTO Mitsuharu
@ 2008-10-30 10:18                   ` Emanuele Giaquinta
  2008-10-30 21:22                     ` Eli Zaretskii
  2008-10-31  3:55                     ` Richard M. Stallman
  1 sibling, 2 replies; 86+ messages in thread
From: Emanuele Giaquinta @ 2008-10-30 10:18 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: dann, emacs-devel, rms, evilborisnet, Jason Rumney

On Sat, May 17, 2008 at 09:56:58AM +0900, Kenichi Handa wrote:

> In article <482D8435.6060407@gnu.org>, Jason Rumney <jasonr@gnu.org> writes:
> 
> > How big are the data structures holding all the unicode character info 
> > and translation tables for encodings?
> 
> I have not yet calculated them.  Each CJK charset defined by
> a map needs a char-table for encoding and a vector for
> decoding.
> 
> > At one point you said you wanted 
> > to make the unicode tables be loaded on demand but AFAIK they are 
> > currently dumped into the emacs executable. Could that account for a 
> > large proportion of the size increase?
> 
> They are surely loaded on demand.  And, at the end of the
> dumping process, by calling clear-charset-maps, all loaded
> maps are set to nil, and then garbage-collect is called.  So
> I think they are not included in the dumped Emacs.  Perhaps
> some other char-table is the culprit.  Maybe
> Vchar_unify_table.  I must re-read src/charset.c.

If I comment the load_charset_map_from_file call in unify_charset the
data segment size is back to normal.

Emanuele





* Re: size of emacs executable after unicode merge
  2008-10-30 10:18                   ` Emanuele Giaquinta
@ 2008-10-30 21:22                     ` Eli Zaretskii
  2008-10-30 21:42                       ` Stefan Monnier
  2008-10-31  3:55                     ` Richard M. Stallman
  1 sibling, 1 reply; 86+ messages in thread
From: Eli Zaretskii @ 2008-10-30 21:22 UTC (permalink / raw)
  To: Emanuele Giaquinta; +Cc: rms, handa, jasonr, dann, evilborisnet, emacs-devel

> Date: Thu, 30 Oct 2008 11:18:19 +0100
> From: Emanuele Giaquinta <emanuele.giaquinta@gmail.com>
> Cc: dann@ics.uci.edu, emacs-devel@gnu.org, rms@gnu.org,
> 	evilborisnet@netscape.net, Jason Rumney <jasonr@gnu.org>
> 
> > They are surely loaded on demand.  And, at the end of the
> > dumping process, by calling clear-charset-maps, all loaded
> > maps are set to nil, and then garbage-collect is called.  So
> > I think they are not included in the dumped Emacs.  Perhaps
> > some other char-table is the culprit.  Maybe
> > Vchar_unify_table.  I must re-read src/charset.c.
> 
> If I comment the load_charset_map_from_file call in unify_charset the
> data segment size is back to normal.

Are we sure GC after setting the maps to nil always resets the data
segment size back to what it was before loading those maps?





* Re: size of emacs executable after unicode merge
  2008-10-30 21:22                     ` Eli Zaretskii
@ 2008-10-30 21:42                       ` Stefan Monnier
  0 siblings, 0 replies; 86+ messages in thread
From: Stefan Monnier @ 2008-10-30 21:42 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: rms, Emanuele Giaquinta, handa, jasonr, dann, evilborisnet,
	emacs-devel

>> > They are surely loaded on demand.  And, at the end of the
>> > dumping process, by calling clear-charset-maps, all loaded
>> > maps are set to nil, and then garbage-collect is called.  So
>> > I think they are not included in the dumped Emacs.  Perhaps
>> > some other char-table is the culprit.  Maybe
>> > Vchar_unify_table.  I must re-read src/charset.c.
>> 
>> If I comment the load_charset_map_from_file call in unify_charset the
>> data segment size is back to normal.

> Are we sure GC after setting the maps to nil always resets the data
> segment size back to what it was before loading those maps?

Given the above evidence, I'd say we're pretty sure it doesn't.
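
One common way this can happen, offered here only as a possible
explanation and not as a diagnosis of Emacs's allocator, is that a
contiguous heap can only shrink from its top.  The toy model below is a
deliberately simplified sketch (the class and its behaviour are invented for
illustration and it never reuses holes): a single small object allocated
after the large temporary maps keeps the whole extent in place even after
the maps are freed.

```python
class ToyArena:
    """A contiguous heap whose end can only be lowered past free top blocks."""

    def __init__(self):
        self.blocks = []  # [size, live] pairs in address order

    def alloc(self, size):
        # Simplification: always grow at the top, never reuse a hole.
        self.blocks.append([size, True])
        return len(self.blocks) - 1

    def free(self, handle):
        self.blocks[handle][1] = False

    def trim(self):
        # Only free blocks at the very top can be given back.
        while self.blocks and not self.blocks[-1][1]:
            self.blocks.pop()

    @property
    def extent(self):
        return sum(size for size, _ in self.blocks)

    @property
    def live(self):
        return sum(size for size, is_live in self.blocks if is_live)


arena = ToyArena()
maps = [arena.alloc(1024) for _ in range(6000)]  # large temporary tables
survivor = arena.alloc(48)                       # small object that stays alive

for handle in maps:
    arena.free(handle)
arena.trim()

print(f"live: {arena.live} bytes, extent still held: {arena.extent} bytes")
```

If the dump writes out the whole extent rather than only the live part, the
freed space ends up in the executable.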


        Stefan





* Re: size of emacs executable after unicode merge
  2008-10-30 10:18                   ` Emanuele Giaquinta
  2008-10-30 21:22                     ` Eli Zaretskii
@ 2008-10-31  3:55                     ` Richard M. Stallman
  2008-10-31  5:29                       ` Kenichi Handa
  1 sibling, 1 reply; 86+ messages in thread
From: Richard M. Stallman @ 2008-10-31  3:55 UTC (permalink / raw)
  To: Emanuele Giaquinta; +Cc: jasonr, dann, emacs-devel, evilborisnet, handa

    If I comment the load_charset_map_from_file call in unify_charset the
    data segment size is back to normal.

Although these are loaded "on demand", perhaps something "demands" them
at build time.

How about if you try running temacs
(to build and dump emacs)
under GDB, with a breakpoint there?





* Re: size of emacs executable after unicode merge
  2008-10-31  3:55                     ` Richard M. Stallman
@ 2008-10-31  5:29                       ` Kenichi Handa
  2008-10-31  6:32                         ` Chong Yidong
                                           ` (3 more replies)
  0 siblings, 4 replies; 86+ messages in thread
From: Kenichi Handa @ 2008-10-31  5:29 UTC (permalink / raw)
  To: rms; +Cc: jasonr, dann, evilborisnet, emanuele.giaquinta, emacs-devel

In article <E1Kvl71-0002pv-Pt@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:

>     If I comment the load_charset_map_from_file call in unify_charset the
>     data segment size is back to normal.

> Although these are loaded "on demand", perhaps something "demands" them
> at build time.

It's not that simple.  This is the strategy of the charset
map loading mechanism.  I took that approach expecting that
char-tables that are garbage-collected before dumping are
not in the dumped file.

(0) At first, Emacs assigns a unique linear character code
    space in upper Unicode area (#x110000-) to each big
    character set (e.g. GB, JIS, KSC) (*see the note at the
    tail).  The decoding of a character of a specific
    charset into this area is quite fast (done just by a few
    steps of arithmetic calculation).  Encoding is the same
    too.

(1) While building Emacs, when unify-charset is called, we
    update two char-tables Vchar_unify_table, and
    Vchar_unified_charset_table.  The former maps a
    character in the above upper area to Unicode area, and
    the latter maps the character to charset symbol.
    Unify-charset also builds a deunifier char-table for each
    character set that maps a character in Unicode area to
    the upper area that is unique to each charset.

    So at this time, the full maps are built.

(2) Just before dumping, clear-charset-maps is called.  This
    function sets all char-tables built in (1) (except for
    Vchar_unified_charset_table) to nil.  Then set
    Vchar_unify_table to Vchar_unified_charset_table, and
    set Vchar_unified_charset_table to nil.

    Then, garbage-collect is called.  After that, the living
    char-table is Vchar_unify_table only, and the contents
    is not that big because it maps upper area characters to
    charset, and each charset has linear upper area, thus
    most succeeding characters have the same value.

(3) When the dumped Emacs runs, at the time of
    decoding/encoding charsets that are unified as above, by
    checking if the value of Vchar_unify_table for a
    character is a symbol or not, Emacs knows whether it has
    to load the mapping table again or not.

    So, that way, Emacs loads maps on demand.
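
Step (3) can be sketched in a few lines of Python.  This is a simplified
model, not the code in src/charset.c: the table is a plain dictionary, a
string stands in for the charset symbol, and `MAP_FILES`,
`load_charset_map` and `unify` are hypothetical names with made-up data.

```python
# Hypothetical stand-in for the charset map files on disk.
MAP_FILES = {
    "toy-charset": {0x110000: 0x3042, 0x110001: 0x3044},
}
loaded = []  # records every (simulated) map load


def load_charset_map(charset):
    loaded.append(charset)
    return MAP_FILES[charset]


# State after dumping: each code in the charset's upper area maps only to
# the charset symbol, so the table stays small.
unify_table = {code: "toy-charset" for code in MAP_FILES["toy-charset"]}


def unify(code):
    """Return the unified code, loading the charset map on first use."""
    entry = unify_table.get(code, code)
    if isinstance(entry, str):          # still a symbol: map not loaded yet
        unify_table.update(load_charset_map(entry))
        entry = unify_table[code]
    return entry


print(hex(unify(0x110000)), hex(unify(0x110001)), loaded)
```

The second lookup finds an integer in the table, so the map is read only
once per charset.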


*Note:

The reason Emacs assigns those linear area is because such
big charsets tend to have their own private use area, and we
must keep a unique character code for them.  Those private
characters are decoded and encoded without being mapped to
the Unicode area.
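
The "few steps of arithmetic" in (0) might look like the following for a
94x94 double-byte charset.  Only the starting point #x110000 comes from the
description above; the 94x94 shape, the 0x21 byte offset, the packing of
charsets one after another, and the function names are assumptions made for
this sketch.

```python
UPPER_AREA_START = 0x110000  # first code above the Unicode range
ROWS = COLS = 94             # assumption: a 94x94 double-byte charset
FIRST_BYTE = 0x21            # assumption: bytes run from 0x21 to 0x7E


def decode(base, b1, b2):
    """Map a two-byte code to its slot in the charset's linear area."""
    return base + (b1 - FIRST_BYTE) * COLS + (b2 - FIRST_BYTE)


def encode(base, code):
    """Inverse of decode: recover the two bytes from the linear code."""
    row, col = divmod(code - base, COLS)
    return row + FIRST_BYTE, col + FIRST_BYTE


# Hypothetical layout: each charset gets its own non-overlapping block.
base_a = UPPER_AREA_START
base_b = base_a + ROWS * COLS

code = decode(base_a, 0x30, 0x21)
print(hex(code), [hex(b) for b in encode(base_a, code)])
```

Because every position in the charset has its own slot, private-use
positions keep a unique code even when they have no Unicode equivalent.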

---
Kenichi Handa
handa@ni.aist.go.jp





* Re: size of emacs executable after unicode merge
  2008-10-31  5:29                       ` Kenichi Handa
@ 2008-10-31  6:32                         ` Chong Yidong
  2008-10-31  7:32                           ` Kenichi Handa
  2008-10-31 10:41                           ` YAMAMOTO Mitsuharu
  2008-10-31 15:07                         ` Dan Nicolaescu
                                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 86+ messages in thread
From: Chong Yidong @ 2008-10-31  6:32 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, emacs-devel, dann, evilborisnet, jasonr

Kenichi Handa <handa@m17n.org> writes:

> (2) Just before dumping, clear-charset-maps is called.  This
>     function sets all char-tables built in (1) (except for
>     Vchar_unified_charset_table) to nil.  Then set
>     Vchar_unify_table to Vchar_unified_charset_table, and
>     set Vchar_unified_charset_table to nil.
>
>     Then, garbage-collect is called.  After that, the living
>     char-table is Vchar_unify_table only, and the contents
>     is not that big because it maps upper area characters to
>     charset, and each charset has linear upper area, thus
>     most succeeding characters have the same value.

Is it possible to verify whether the garbage collection indeed reaps
this data?





* Re: size of emacs executable after unicode merge
  2008-10-31  6:32                         ` Chong Yidong
@ 2008-10-31  7:32                           ` Kenichi Handa
  2008-10-31 10:09                             ` Eli Zaretskii
  2008-10-31 10:41                           ` YAMAMOTO Mitsuharu
  1 sibling, 1 reply; 86+ messages in thread
From: Kenichi Handa @ 2008-10-31  7:32 UTC (permalink / raw)
  To: Chong Yidong
  Cc: rms, emanuele.giaquinta, jasonr, dann, evilborisnet, emacs-devel

In article <87zlklff5a.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes:

> Kenichi Handa <handa@m17n.org> writes:
> > (2) Just before dumping, clear-charset-maps is called.  This
> >     function sets all char-tables built in (1) (except for
> >     Vchar_unified_charset_table) to nil.  Then set
> >     Vchar_unify_table to Vchar_unified_charset_table, and
> >     set Vchar_unified_charset_table to nil.
> >
> >     Then, garbage-collect is called.  After that, the living
> >     char-table is Vchar_unify_table only, and the contents
> >     is not that big because it maps upper area characters to
> >     charset, and each charset has linear upper area, thus
> >     most succeeding characters have the same value.

> Is it possible to verify whether the garbage collection indeed reaps
> this data?

To verify that, I tried to run temacs under gdb, but I got
this error.

(gdb) run -batch -l loadup dump
Starting program: /usr/local/work/emacs/src/temacs -batch -l loadup dump
[Thread debugging using libthread_db enabled]
[New process 10730]
Executing new program: /usr/local/work/emacs/src/temacs
warning: Cannot initialize thread debugging library: generic error
warning: Cannot initialize thread debugging library: generic error
[New process 10730]
[Thread debugging using libthread_db enabled]
Debugger segmentation fault

Do you know about this error?  Is this a bug of gdb?  A few
months ago, I didn't see this error on running temacs under
gdb.

---
Kenichi Handa
handa@ni.aist.go.jp





* Re: size of emacs executable after unicode merge
  2008-10-31  7:32                           ` Kenichi Handa
@ 2008-10-31 10:09                             ` Eli Zaretskii
  2008-10-31 12:33                               ` gdb error [Re: size of emacs executable after unicode merge] Kenichi Handa
  2008-10-31 12:35                               ` size of emacs executable after unicode merge Stephen Berman
  0 siblings, 2 replies; 86+ messages in thread
From: Eli Zaretskii @ 2008-10-31 10:09 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, cyd, jasonr, dann, evilborisnet,
	emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Fri, 31 Oct 2008 16:32:26 +0900
> Cc: rms@gnu.org, emanuele.giaquinta@gmail.com, jasonr@gnu.org, dann@ics.uci.edu,
> 	evilborisnet@netscape.net, emacs-devel@gnu.org
> 
> > Is it possible to verify whether the garbage collection indeed reaps
> > this data?
> 
> To verify that, I tried to run temacs under gdb, but I got
> this error.
> 
> (gdb) run -batch -l loadup dump
> Starting program: /usr/local/work/emacs/src/temacs -batch -l loadup dump
> [Thread debugging using libthread_db enabled]
> [New process 10730]
> Executing new program: /usr/local/work/emacs/src/temacs
> warning: Cannot initialize thread debugging library: generic error
> warning: Cannot initialize thread debugging library: generic error
> [New process 10730]
> [Thread debugging using libthread_db enabled]
> Debugger segmentation fault
> 
> Do you know about this error?  Is this a bug of gdb?  A few
> months ago, I didn't see this error on running temacs under
> gdb.

On a GNU/Linux system, with today's CVS configured as shown below, and
with GDB 6.7.1, I don't get this problem.

Why is libthread_db being used?  Does your build use GTK+ or some other
library that uses multithreading?

Here's how my CVS Emacs was configured:

    ./configure --with-jpeg=no --with-png=no --with-gif=no --with-tiff=no

   [...]

   Configured for `x86_64-unknown-linux-gnu'.

     Where should the build process find the source code?    [...]/eliz/emacs.cvs/emacs
     What operating system and machine description files should Emacs use?
	   `s/gnu-linux.h' and `m/amdx86-64.h'
     What compiler should emacs be built with?               gcc -g -O2 -Wno-pointer-sign
     Should Emacs use the GNU version of malloc?             yes
	 (Using Doug Lea's new malloc from the GNU C Library.)
     Should Emacs use a relocating allocator for buffers?    yes
     Should Emacs use mmap(2) for buffer allocation?         no
     What window system should Emacs use?                    x11
     What toolkit should Emacs use?                          LUCID
     Where do we find X Windows header files?                Standard dirs
     Where do we find X Windows libraries?                   /usr/X11R6/lib
     Does Emacs use -lXaw3d?                                 no
     Does Emacs use -lXpm?                                   yes
     Does Emacs use -ljpeg?                                  no
     Does Emacs use -ltiff?                                  no
     Does Emacs use a gif library?                           no
     Does Emacs use -lpng?                                   no
     Does Emacs use -lrsvg-2?                                no
     Does Emacs use -lgpm?                                   yes
     Does Emacs use -ldbus?                                  no
     Does Emacs use -lfreetype?                              no
     Does Emacs use -lm17n-flt?                              no
     Does Emacs use -lotf?                                   no
     Does Emacs use -lxft?                                   no
     Does Emacs use toolkit scroll bars?                     no





* Re: size of emacs executable after unicode merge
  2008-10-31  6:32                         ` Chong Yidong
  2008-10-31  7:32                           ` Kenichi Handa
@ 2008-10-31 10:41                           ` YAMAMOTO Mitsuharu
  1 sibling, 0 replies; 86+ messages in thread
From: YAMAMOTO Mitsuharu @ 2008-10-31 10:41 UTC (permalink / raw)
  To: Chong Yidong
  Cc: rms, emanuele.giaquinta, Kenichi Handa, emacs-devel, dann,
	evilborisnet, jasonr

>>>>> On Fri, 31 Oct 2008 02:32:49 -0400, Chong Yidong <cyd@stupidchicken.com> said:

> Is it possible to verify whether the garbage collection indeed reaps
> this data?

Put it in some weak hash table and check if it survives GC?

Anyway, an experiment on Mac OS X (*1) shows that clear-charset-maps
followed by GC actually collects some amount of data in the heap (~7MB),
but it is not returned to the system, at least with its malloc
implementation.

*1: http://lists.gnu.org/archive/html/emacs-devel/2008-05/msg01049.html

				     YAMAMOTO Mitsuharu
				mituharu@math.s.chiba-u.ac.jp





* gdb error [Re: size of emacs executable after unicode merge]
  2008-10-31 10:09                             ` Eli Zaretskii
@ 2008-10-31 12:33                               ` Kenichi Handa
  2008-10-31 14:28                                 ` Eli Zaretskii
  2008-10-31 12:35                               ` size of emacs executable after unicode merge Stephen Berman
  1 sibling, 1 reply; 86+ messages in thread
From: Kenichi Handa @ 2008-10-31 12:33 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: rms, emanuele.giaquinta, cyd, jasonr, dann, evilborisnet,
	emacs-devel

In article <umyglvzy8.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > (gdb) run -batch -l loadup dump
> > Starting program: /usr/local/work/emacs/src/temacs -batch -l loadup dump
> > [Thread debugging using libthread_db enabled]
> > [New process 10730]
> > Executing new program: /usr/local/work/emacs/src/temacs
> > warning: Cannot initialize thread debugging library: generic error
> > warning: Cannot initialize thread debugging library: generic error
> > [New process 10730]
> > [Thread debugging using libthread_db enabled]
> > Debugger segmentation fault
> > 
> > Do you know about this error?  Is this a bug of gdb?  A few
> > months ago, I didn't see this error on running temacs under
> > gdb.

> On a GNU/Linux system, with today's CVS configured as shown below, and
> with GDB 6.7.1, I don't get this problem.

My gdb is "GNU gdb 6.8-debian", and my system is debian
testing (kernel is 2.6.26-1-686).  Emacs is configured with
no option as below:

Configured for `i686-pc-linux-gnu'.

  Where should the build process find the source code?    /usr/local/work/emacs
  What operating system and machine description files should Emacs use?
        `s/gnu-linux.h' and `m/intel386.h'
  What compiler should emacs be built with?               gcc -g -O2 -Wno-pointer-sign 
  Should Emacs use the GNU version of malloc?             yes
      (Using Doug Lea's new malloc from the GNU C Library.)
  Should Emacs use a relocating allocator for buffers?    yes
  Should Emacs use mmap(2) for buffer allocation?         no
  What window system should Emacs use?                    x11
  What toolkit should Emacs use?                          GTK
  Where do we find X Windows header files?                Standard dirs
  Where do we find X Windows libraries?                   Standard dirs
  Does Emacs use -lXaw3d?                                 no
  Does Emacs use -lXpm?                                   yes
  Does Emacs use -ljpeg?                                  yes
  Does Emacs use -ltiff?                                  yes
  Does Emacs use a gif library?                           yes -lgif
  Does Emacs use -lpng?                                   yes
  Does Emacs use -lrsvg-2?                                no
  Does Emacs use -lgpm?                                   no
  Does Emacs use -ldbus?                                  yes
  Does Emacs use -lfreetype?                              yes
  Does Emacs use -lm17n-flt?                              yes
  Does Emacs use -lotf?                                   yes
  Does Emacs use -lxft?                                   yes
  Does Emacs use toolkit scroll bars?                     yes

> Why is libthread_db being used?  Does your build use GTK+ or some other
> library that uses multithreading?

As shown above, yes my build uses GTK+.

---
Kenichi Handa
handa@ni.aist.go.jp





* Re: size of emacs executable after unicode merge
  2008-10-31 10:09                             ` Eli Zaretskii
  2008-10-31 12:33                               ` gdb error [Re: size of emacs executable after unicode merge] Kenichi Handa
@ 2008-10-31 12:35                               ` Stephen Berman
  2008-11-21 12:32                                 ` Kenichi Handa
  1 sibling, 1 reply; 86+ messages in thread
From: Stephen Berman @ 2008-10-31 12:35 UTC (permalink / raw)
  To: emacs-devel

On Fri, 31 Oct 2008 12:09:03 +0200 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Kenichi Handa <handa@m17n.org>
>> Date: Fri, 31 Oct 2008 16:32:26 +0900
>> Cc: rms@gnu.org, emanuele.giaquinta@gmail.com, jasonr@gnu.org, dann@ics.uci.edu,
>> 	evilborisnet@netscape.net, emacs-devel@gnu.org
>> 
>> > Is it possible to verify whether the garbage collection indeed reaps
>> > this data?
>> 
>> To verify that, I tried to run temacs under gdb, but I got
>> this error.
>> 
>> (gdb) run -batch -l loadup dump
>> Starting program: /usr/local/work/emacs/src/temacs -batch -l loadup dump
>> [Thread debugging using libthread_db enabled]
>> [New process 10730]
>> Executing new program: /usr/local/work/emacs/src/temacs
>> warning: Cannot initialize thread debugging library: generic error
>> warning: Cannot initialize thread debugging library: generic error
>> [New process 10730]
>> [Thread debugging using libthread_db enabled]
>> Debugger segmentation fault
>> 
>> Do you know about this error?  Is this a bug of gdb?  A few
>> months ago, I didn't see this error on running temacs under
>> gdb.
>
> On a GNU/Linux system, with today's CVS configured as shown below, and
> with GDB 6.7.1, I don't get this problem.
>
> Why is libthread_db being used?  Does your build use GTK+ or some other
> library that uses multithreading?

I also get the same gdb segfault as Handa-san with temacs from GNU Emacs
23.0.60.12 (i686-pc-linux-gnu, GTK+ Version 2.12.9) of 2008-10-25 on
escher, built with the default configuration.  My gdb is version 6.8.

Steve Berman






* Re: gdb error [Re: size of emacs executable after unicode merge]
  2008-10-31 12:33                               ` gdb error [Re: size of emacs executable after unicode merge] Kenichi Handa
@ 2008-10-31 14:28                                 ` Eli Zaretskii
  0 siblings, 0 replies; 86+ messages in thread
From: Eli Zaretskii @ 2008-10-31 14:28 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, cyd, jasonr, dann, evilborisnet,
	emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> CC: rms@gnu.org, emanuele.giaquinta@gmail.com, cyd@stupidchicken.com,
>         jasonr@gnu.org, dann@ics.uci.edu, evilborisnet@netscape.net,
>         emacs-devel@gnu.org
> Date: Fri, 31 Oct 2008 21:33:20 +0900
> 
> > Why is libthread_db being used?  Does your build use GTK+ or some other
> > library that uses multithreading?
> 
> As shown above, yes my build uses GTK+.

So you should at least be able to reconfigure without GTK, rebuild,
and then use GDB to debug the original problem (which I think is
unrelated to GTK).

As for the GDB crash, perhaps this should be taken up with the GDB
maintainers on gdb@sources.redhat.com.





* Re: size of emacs executable after unicode merge
  2008-10-31  5:29                       ` Kenichi Handa
  2008-10-31  6:32                         ` Chong Yidong
@ 2008-10-31 15:07                         ` Dan Nicolaescu
  2008-10-31 16:44                           ` Stefan Monnier
  2008-10-31 19:30                         ` Richard M. Stallman
  2008-11-09 22:43                         ` Chong Yidong
  3 siblings, 1 reply; 86+ messages in thread
From: Dan Nicolaescu @ 2008-10-31 15:07 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: evilborisnet, jasonr, rms, emanuele.giaquinta, emacs-devel

Kenichi Handa <handa@m17n.org> writes:

  > In article <E1Kvl71-0002pv-Pt@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:
  > 
  > >     If I comment the load_charset_map_from_file call in unify_charset the
  > >     data segment size is back to normal.
  > 
  > > Although these are loaded "on demand", perhaps something "demands" them
  > > at build time.
  > 
  > It's not that simple.  This is the strategy of the charset
  > map loading mechanism.  I took that approach expecting that
  > char-tables that are garbage-collected before dumping are
  > not in the dumped file.
  > 
  > (0) At first, Emacs assigns a unique linear character code
  >     space in upper Unicode area (#x110000-) to each big
  >     character set (e.g. GB, JIS, KSC) (*see the note at the
  >     tail).  The decoding of a character of a specific
  >     charset into this area is quite fast (done just by a few
  >     steps of arithmetic calculation).  Encoding is the same
  >     too.
  > 
  > (1) While building Emacs, when unify-charset is called, we
  >     update two char-tables Vchar_unify_table, and
  >     Vchar_unified_charset_table.  The former maps a
  >     character in the above upper area to Unicode area, and
  >     the latter maps the character to charset symbol.
  >     Unify-charset also builds deunifier char-table for each
  >     character set that maps a character in Unicode area to
  >     the upper area that is unique to each charset.
  > 
  >     So at this time, the full maps are built.
  > 
  > (2) Just before dumping, clear-charset-maps is called.  This
  >     function sets all char-tables built in (1) (except for
  >     Vchar_unified_charset_table) to nil.  Then set
  >     Vchar_unify_table to Vchar_unified_charset_table, and
  >     set Vchar_unified_charset_table to nil.
  >
  >     Then, garbage-collect is called.  After that, the living
  >     char-table is Vchar_unify_table only, and the contents
  >     is not that big because it maps upper area characters to
  >     charset, and each charset has linear upper area, thus
  >     most succeeding characters have the same value.

For the allocator to be able to release pages back to the system after
they have been garbage collected, you have to be sure that absolutely ALL
the data allocated can be garbage collected (and even then you depend on
the quirks of the platform-specific malloc implementation to do it).

From the description above, it sounds like the data in
Vchar_unify_table is allocated while reading the charset data, and it is
not released after the charset data is.  So the allocator cannot release
all the pages...
[note: this speculation is based solely on your description above]

  > (3) When the dumped Emacs runs, at the time of
  >     decoding/encoding charsets that are unified as above, by
  >     checking if the value of Vchar_unify_table for a
  >     character is symbol or not, Emacs knows whether it has
  >     to load the mapping table again or not.
  > 
  >     So, that way, Emacs loads maps on demand.

So it sounds like your goal is to build Vchar_unify_table, and it is
built from static data in emacs/etc/charsets/*.  In that case, can't the
data in Vchar_unify_table be a C data structure that is built offline,
and just compiled into emacs?
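
A rough sketch of what such a compiled-in structure might look like,
assuming the table can be flattened into runs of consecutive character
codes that share one value (which the "linear upper area" description
suggests).  The range boundaries and charset ids below are made up for
illustration, and no offline generator of this kind is known to exist in
the tree.

```c
#include <assert.h>
#include <stddef.h>

/* One run of consecutive character codes that share a value. */
struct unify_range {
    int first;       /* first character code in the run */
    int last;        /* last character code in the run (inclusive) */
    int charset_id;  /* hypothetical id standing in for a charset symbol */
};

/* Would be emitted by an offline generator; these values are invented.
   The runs are sorted and do not overlap. */
static const struct unify_range unify_ranges[] = {
    { 0x110000, 0x111FFF, 1 },
    { 0x112000, 0x1147FF, 2 },
    { 0x114800, 0x116FFF, 3 },
};

/* Binary search over the sorted runs.  Returns the charset id that
   covers C, or -1 if no run covers it. */
int unify_lookup(int c)
{
    size_t lo = 0;
    size_t hi = sizeof unify_ranges / sizeof unify_ranges[0];

    assert(c >= 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (c < unify_ranges[mid].first)
            hi = mid;
        else if (c > unify_ranges[mid].last)
            lo = mid + 1;
        else
            return unify_ranges[mid].charset_id;
    }
    return -1;
}
```

Because the array is const static data, it lives in the read-only part
of the executable and never passes through the Lisp heap, so it cannot
leave garbage behind at dump time.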






* Re: size of emacs executable after unicode merge
  2008-10-31 15:07                         ` Dan Nicolaescu
@ 2008-10-31 16:44                           ` Stefan Monnier
  2008-11-04 23:09                             ` Chong Yidong
  0 siblings, 1 reply; 86+ messages in thread
From: Stefan Monnier @ 2008-10-31 16:44 UTC (permalink / raw)
  To: Dan Nicolaescu
  Cc: rms, emanuele.giaquinta, Kenichi Handa, emacs-devel, evilborisnet,
	jasonr

> For the allocator to be able to release pages back to the system after
> they have been garbage collected, you have to be sure that absolutely ALL
> the data allocated can be garbage collected (and even then you depend on
> the quirks of the platform-specific malloc implementation to do it).

Indeed.  Some malloc implementations just never return memory to the
system, and all others only do it in some cases but not all.  And in our
case we may not even be able to call `free' because some live data is
interspersed within the sea of dead data.
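
To illustrate the point about interspersed live data, here is a small
sketch under simplifying assumptions: the heap is a sequence of
fixed-size blocks, each holding SLOTS_PER_BLOCK objects, and a block can
be handed back only when every slot in it is dead.  The names and the
block size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { SLOTS_PER_BLOCK = 4 };

/* LIVE[i] is nonzero if object i survived GC.  N_SLOTS must be a
   multiple of SLOTS_PER_BLOCK.  Returns the number of blocks that
   contain no live object and could therefore be freed. */
size_t releasable_blocks(const unsigned char *live, size_t n_slots)
{
    size_t count = 0;

    assert(n_slots % SLOTS_PER_BLOCK == 0);
    for (size_t b = 0; b < n_slots; b += SLOTS_PER_BLOCK) {
        int any_live = 0;

        for (size_t s = 0; s < SLOTS_PER_BLOCK; s++)
            if (live[b + s])
                any_live = 1;
        if (!any_live)
            count++;
    }
    return count;
}
```

With three survivors packed into one block, two of three blocks can be
released; with the same three survivors spread one per block, none can,
although the amount of live data is identical.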

> So it sounds like your goal is to build Vchar_unify_table, and it is
> built from static data in emacs/etc/charsets/*.  In that case, can't the
> data in Vchar_unify_table be a C data structure that is built offline,
> and just compiled into emacs?

Maybe simpler would be to build this table as we do now, then print it
into a file.  Then the "dump" doesn't need to build the table, it can
just read it from the file.


        Stefan





* Re: size of emacs executable after unicode merge
  2008-10-31  5:29                       ` Kenichi Handa
  2008-10-31  6:32                         ` Chong Yidong
  2008-10-31 15:07                         ` Dan Nicolaescu
@ 2008-10-31 19:30                         ` Richard M. Stallman
  2008-11-09 22:43                         ` Chong Yidong
  3 siblings, 0 replies; 86+ messages in thread
From: Richard M. Stallman @ 2008-10-31 19:30 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: jasonr, dann, evilborisnet, emanuele.giaquinta, emacs-devel

	Then, garbage-collect is called.  After that, the living
	char-table is Vchar_unify_table only, and the contents
	is not that big because it maps upper area characters to
	charset, and each charset has linear upper area, thus
	most succeeding characters have the same value.

GC ought to free the space that was occupied by those tables.
However, freeing the space does not necessarily reduce the size
of the Emacs memory.  It just puts freed objects in the free list.
For instance, if the maps use cons cells, they will go on the
free list but the memory will still be in the Emacs image.
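
A minimal sketch of that behavior, assuming a fixed pool of cells with
an intrusive free list.  This is not the Emacs allocator; the pool size
and all names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { POOL_CELLS = 8 };

struct cell {
    struct cell *next_free;  /* link used only while the cell is free */
    int in_use;
};

struct pool {
    struct cell cells[POOL_CELLS];  /* fixed storage: part of the image */
    struct cell *free_list;
};

void pool_init(struct pool *p)
{
    p->free_list = NULL;
    for (int i = POOL_CELLS - 1; i >= 0; i--) {
        p->cells[i].in_use = 0;
        p->cells[i].next_free = p->free_list;
        p->free_list = &p->cells[i];
    }
}

struct cell *pool_alloc(struct pool *p)
{
    struct cell *c = p->free_list;

    if (c == NULL)
        return NULL;
    p->free_list = c->next_free;
    c->in_use = 1;
    return c;
}

/* What GC does with a dead cell: push it back on the free list. */
void pool_reclaim(struct pool *p, struct cell *c)
{
    assert(c->in_use);
    c->in_use = 0;
    c->next_free = p->free_list;
    p->free_list = c;
}

size_t pool_free_count(const struct pool *p)
{
    size_t n = 0;

    for (const struct cell *c = p->free_list; c != NULL; c = c->next_free)
        n++;
    return n;
}

/* Bytes occupied by the pool, regardless of how many cells are free. */
size_t pool_footprint(const struct pool *p)
{
    return sizeof p->cells;
}
```

Reclaiming every cell brings the free count back to POOL_CELLS, yet the
footprint is the same before and after, which is the effect seen in the
dumped image.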

Maybe what is needed is to run a separate process to build the desired
value for char-unify-table and store it in a file, and load just that
file during building `emacs'.


* Re: size of emacs executable after unicode merge
  2008-10-31 16:44                           ` Stefan Monnier
@ 2008-11-04 23:09                             ` Chong Yidong
  2008-11-05  4:17                               ` Kenichi Handa
  0 siblings, 1 reply; 86+ messages in thread
From: Chong Yidong @ 2008-11-04 23:09 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, emacs-devel, Dan Nicolaescu,
	Stefan Monnier, evilborisnet, jasonr

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Maybe simpler would be to build this table as we do now, then print it
> into a file.  Then the "dump" doesn't need to build the table, it can
> just read it from the file.

Handa-san, could you comment on this?

Suppose we have a char-table on file that has the correct precomputed
values for Vchar_unify_table.  At which point should Emacs load it?  In
place of mule-conf.el in loadup.el?  Or would we need to rewrite
mule-conf.el?


* Re: size of emacs executable after unicode merge
  2008-11-04 23:09                             ` Chong Yidong
@ 2008-11-05  4:17                               ` Kenichi Handa
  2008-11-05 15:50                                 ` Stefan Monnier
  2008-11-05 22:30                                 ` Richard M. Stallman
  0 siblings, 2 replies; 86+ messages in thread
From: Kenichi Handa @ 2008-11-05  4:17 UTC (permalink / raw)
  To: Chong Yidong
  Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet,
	jasonr

In article <873ai7t7fx.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > Maybe simpler would be to build this table as we do now, then print it
> > into a file.  Then the "dump" doesn't need to build the table, it can
> > just read it from the file.

> Handa-san, could you comment on this?

I think it doesn't work.  Emacs needs that table and the other
mapping char-tables to decode non-ascii characters in files
that are loaded before dumping.

> Suppose we have a char-table on file that has the correct precomputed
> values for Vchar_unify_table.  At which point should Emacs load it?  In
> place of mule-conf.el in loadup.el?  Or would we need to rewrite
> mule-conf.el?

Once Emacs loads it before dumping, it occupies Emacs memory
and whether it is freed or not before dumping is
unpredictable.

If it is impossible to exclude garbage-collected data
(especially char-tables) from the dumped file, it seems
that the only way is not to build those char-tables.  But that
requires rather heavy changes to files loaded before dumping.

Another way to avoid this problem is, I think, to have a
portable dumper.

---
Kenichi Handa
handa@ni.aist.go.jp


* Re: size of emacs executable after unicode merge
  2008-11-05  4:17                               ` Kenichi Handa
@ 2008-11-05 15:50                                 ` Stefan Monnier
  2008-11-06  7:56                                   ` Kenichi Handa
  2008-11-05 22:30                                 ` Richard M. Stallman
  1 sibling, 1 reply; 86+ messages in thread
From: Stefan Monnier @ 2008-11-05 15:50 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, Chong Yidong, emacs-devel, dann,
	evilborisnet, jasonr

>> > Maybe simpler would be to build this table as we do now, then print it
>> > into a file.  Then the "dump" doesn't need to build the table, it can
>> > just read it from the file.
>> Handa-san, could you comment on this?
> I think it doesn't work.  Emacs needs that table and the other
> mapping char-tables to decode non-ascii characters in files
> that are loaded before dumping.

IIUC Emacs only needs those tables when loading files that use some
particular encodings.  E.g. it doesn't need it to read files using
Unicode encodings.  So couldn't we make sure all the files read during
the dump are using utf-8 or utf-8-emacs?  Isn't that already the case
for all *.elc files?

> If it is impossible to exclude garbage-collected data (especially
> char-tables) from the dumped file, it seems that the only way is not
> to build those char-tables.  But that requires rather heavy changes to
> files loaded before dumping.

What kind of changes?


        Stefan





* Re: size of emacs executable after unicode merge
  2008-11-05  4:17                               ` Kenichi Handa
  2008-11-05 15:50                                 ` Stefan Monnier
@ 2008-11-05 22:30                                 ` Richard M. Stallman
  2008-11-06 11:58                                   ` Kenichi Handa
  1 sibling, 1 reply; 86+ messages in thread
From: Richard M. Stallman @ 2008-11-05 22:30 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet,
	jasonr

    > > Maybe simpler would be to build this table as we do now, then print it
    > > into a file.  Then the "dump" doesn't need to build the table, it can
    > > just read it from the file.

    > Handa-san, could you comment on this?

    I think it doesn't work.  Emacs needs that table and the other
    mapping char-tables to decode non-ascii characters in files
    that are loaded before dumping.

Any table which is needed by Emacs to read those files
will inevitably be included in the dumped Emacs.

Can we arrange to autoload (later) the files which need these tables?

    Another way to avoid this problem is, I think, to have a
    portable dumper.

Do you mean a portable replacement for unexelf and friends?
How would that help this?
I don't see the connection.





* Re: size of emacs executable after unicode merge
  2008-11-05 15:50                                 ` Stefan Monnier
@ 2008-11-06  7:56                                   ` Kenichi Handa
  2008-11-08  2:42                                     ` Stefan Monnier
  2008-11-09 20:14                                     ` Chong Yidong
  0 siblings, 2 replies; 86+ messages in thread
From: Kenichi Handa @ 2008-11-06  7:56 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: rms, emanuele.giaquinta, cyd, emacs-devel, dann, evilborisnet,
	jasonr

In article <jwvskq6w4yi.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > I think it doesn't work.  Emacs needs that table and the other
> > mapping char-tables to decode non-ascii characters in files
> > that are loaded before dumping.

> IIUC Emacs only needs those tables when loading files that use some
> particular encodings.  E.g. it doesn't need it to read files using
> Unicode encodings.

Yes.

> So couldn't we make sure all the files read during
> the dump are using utf-8 or utf-8-emacs?  Isn't that already the case
> for all *.elc files?

Not yet, but such a change of encoding is easy.  The problem
is that lisp/international/characters.el sets up the syntax-table
and category-table for many characters by map-charset-chars.

Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)

To know which (Unicode) characters belong to
chinese-gb2312, Emacs has to load a mapping table.

> > If it is impossible to exclude garbage-collected data (especially
> > char-tables) from the dumped file, it seems that the only way is not
> > to build those char-tables.  But that requires rather heavy changes to
> > files loaded before dumping.

> What kind of changes?

The biggest one is for characters.el to solve the above
problem.  The others are for files that use decode-char and
encode-char (e.g. japanese.el).  We can't use them in
pre-loaded files.

---
Kenichi Handa
handa@ni.aist.go.jp





* Re: size of emacs executable after unicode merge
  2008-11-05 22:30                                 ` Richard M. Stallman
@ 2008-11-06 11:58                                   ` Kenichi Handa
  2008-11-07 12:39                                     ` Richard M. Stallman
  0 siblings, 1 reply; 86+ messages in thread
From: Kenichi Handa @ 2008-11-06 11:58 UTC (permalink / raw)
  To: rms
  Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet,
	jasonr

In article <E1Kxqtd-0002y0-Hq@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:

>     Another way to avoid this problem is, I think, to have a
>     portable dumper.

> Do you mean a portable replacement for unexelf and friends?

Yes.

> How would that help this?
> I don't see the connection.

To my understanding, the portable dumper implemented for
Emacs relocates Lisp objects to make the dump file (the file
containing Lisp objects that are reloaded at startup time)
as small as possible.  That means the garbage data are
excluded from the dump file.
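
A toy sketch of that idea, assuming objects live in an array and refer
to each other by index.  It only illustrates "copy the live objects and
rewrite references"; it does not describe how any existing portable
dumper is implemented, and every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct object {
    int live;    /* nonzero if the object survived GC */
    int value;   /* payload */
    int ref;     /* index of a referenced object, or -1 for none */
};

/* Copy the live objects of HEAP[0..N) into OUT, packed contiguously,
   and rewrite their references to the new positions.  FORWARD must
   have room for N entries.  Returns the number of objects written. */
size_t dump_live(const struct object *heap, size_t n,
                 struct object *out, int *forward)
{
    size_t count = 0;

    /* Pass 1: assign each live object its slot in the dump. */
    for (size_t i = 0; i < n; i++)
        forward[i] = heap[i].live ? (int) count++ : -1;

    /* Pass 2: copy live objects and relocate their references. */
    for (size_t i = 0; i < n; i++) {
        if (!heap[i].live)
            continue;
        struct object o = heap[i];
        if (o.ref >= 0) {
            assert(heap[o.ref].live);  /* live data never points at garbage */
            o.ref = forward[o.ref];
        }
        out[forward[i]] = o;
    }
    return count;
}
```

The output holds only the survivors, so its size depends on the amount
of live data rather than on where that data happened to sit in the heap.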

---
Kenichi Handa
handa@ni.aist.go.jp


* Re: size of emacs executable after unicode merge
  2008-11-06 11:58                                   ` Kenichi Handa
@ 2008-11-07 12:39                                     ` Richard M. Stallman
  2008-11-07 13:29                                       ` Stephen J. Turnbull
  0 siblings, 1 reply; 86+ messages in thread
From: Richard M. Stallman @ 2008-11-07 12:39 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet,
	jasonr

    To my understanding, the portable dumper implemented for
    Emacs relocates Lisp objects to make the dump file (the file
    containing Lisp objects that are reloaded at startup time)
    as small as possible.  That means the garbage data are
    excluded from the dump file.

I see.  You mean a relocating dumper.  Yes, that could help.
But that alone does not guarantee success.

The dumper also needs to dump the non-Lisp memory blocks that
Emacs allocates.  They are scattered among the memory blocks that
contain Lisp objects, and they can't be relocated in this way.
Thus, the total memory size can only be reduced down to the
last non-Lisp memory block.  That is likely to eliminate
most of the benefit.

Perhaps with some other special scheme to segregate Lisp blocks
and non-Lisp blocks into two special heaps, this problem could
be avoided.
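
A small model of this argument, assuming memory is a row of equal-size
blocks in which Lisp blocks can be moved but non-Lisp blocks are pinned.
The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum block_kind { BLOCK_FREE, BLOCK_LISP, BLOCK_OTHER };

/* Number of leading blocks a dump must cover when Lisp blocks may be
   relocated into any hole but non-Lisp blocks must stay where they are. */
size_t dump_extent(const enum block_kind *heap, size_t n)
{
    size_t n_lisp = 0, n_other = 0, end_other = 0;

    assert(heap != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (heap[i] == BLOCK_LISP) {
            n_lisp++;
        } else if (heap[i] == BLOCK_OTHER) {
            n_other++;
            end_other = i + 1;  /* one past the last pinned block */
        }
    }

    /* Lisp blocks first fill the holes below the last pinned block;
       only the overflow extends the dump beyond it. */
    size_t packed = n_other + n_lisp;
    return end_other > packed ? end_other : packed;
}
```

With a pinned block at each end of an eight-block heap and a single
Lisp block, the dump still spans all eight blocks.  If the two pinned
blocks are kept together at the start, the same contents need only
three.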


* Re: size of emacs executable after unicode merge
  2008-11-07 12:39                                     ` Richard M. Stallman
@ 2008-11-07 13:29                                       ` Stephen J. Turnbull
  2008-11-07 21:15                                         ` Richard M. Stallman
  0 siblings, 1 reply; 86+ messages in thread
From: Stephen J. Turnbull @ 2008-11-07 13:29 UTC (permalink / raw)
  To: rms
  Cc: emanuele.giaquinta, Kenichi Handa, cyd, emacs-devel, dann,
	monnier, evilborisnet, jasonr

Richard M. Stallman writes:

 > The dumper also needs to dump the non-Lisp memory blocks that
 > Emacs allocates.  They are scattered among the memory blocks that
 > contain Lisp objects, and they can't be relocated in this way.

Maybe not at present in Emacs, but as far as I know those blocks *are*
relocated by XEmacs's portable dumper.  The people to ask are Olivier
Galibert (olivier.galibert@xemacs.org) and Marcus Crestani
(crestani@xemacs.org).  Ben Wing and Kyle Jones also have done a lot
of work related to the portable dumper in XEmacs, but they've been
inactive for years.


* Re: size of emacs executable after unicode merge
  2008-11-07 13:29                                       ` Stephen J. Turnbull
@ 2008-11-07 21:15                                         ` Richard M. Stallman
  2008-11-08  4:00                                           ` Stephen J. Turnbull
  2008-11-08  4:19                                           ` Stefan Monnier
  0 siblings, 2 replies; 86+ messages in thread
From: Richard M. Stallman @ 2008-11-07 21:15 UTC (permalink / raw)
  To: Stephen J. Turnbull
  Cc: emanuele.giaquinta, handa, cyd, emacs-devel, dann, monnier,
	evilborisnet, jasonr

    Maybe not at present in Emacs, but as far as I know those blocks *are*
    relocated by XEmacs's portable dumper.

How is it possible to relocate all blocks of memory?
The C library allocates some, and Emacs can't tell where the pointers
to them are kept.





* Re: size of emacs executable after unicode merge
  2008-11-06  7:56                                   ` Kenichi Handa
@ 2008-11-08  2:42                                     ` Stefan Monnier
  2008-11-08  4:10                                       ` Chong Yidong
  2008-11-08 10:30                                       ` Dan Nicolaescu
  2008-11-09 20:14                                     ` Chong Yidong
  1 sibling, 2 replies; 86+ messages in thread
From: Stefan Monnier @ 2008-11-08  2:42 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, cyd, emacs-devel, dann, evilborisnet,
	jasonr

>> So couldn't we make sure all the files read during the dump are using
>> utf-8 or utf-8-emacs?  Isn't that already the case for all
>> *.elc files?

> Not yet, but such a change of encoding is easy.  The problem
> is that lisp/international/characters.el sets up syntax-table
> and category-table for many characters by map-charset-chars.

> Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)

> To know which (Unicode) characters belong to
> chinese-gb2312, Emacs has to load a mapping table.

I guess that means we need to precompute the category table and other
tables populated in that way, so we can just load them during the dump
without needing to populate the other maps.



        Stefan





* Re: size of emacs executable after unicode merge
  2008-11-07 21:15                                         ` Richard M. Stallman
@ 2008-11-08  4:00                                           ` Stephen J. Turnbull
  2008-11-08  4:19                                           ` Stefan Monnier
  1 sibling, 0 replies; 86+ messages in thread
From: Stephen J. Turnbull @ 2008-11-08  4:00 UTC (permalink / raw)
  To: rms
  Cc: emanuele.giaquinta, handa, cyd, emacs-devel, dann, monnier,
	evilborisnet, jasonr

Richard M. Stallman writes:
 >     Maybe not at present in Emacs, but as far as I know those blocks *are*
 >     relocated by XEmacs's portable dumper.
 > 
 > The C library allocates some, and Emacs can't tell where the
 > pointers to them are kept.

So what?  The XEmacs portable dumper does not unexec, so those are
lost anyway AIUI.

 > How is it possible to relocate all blocks of memory?

Why ask me?  I already said I don't know, and gave names and addresses
of those who should be able to give useful answers about whether the
XEmacs technology leads to recovery of memory in practice, or could.

Please ask the experts.





* Re: size of emacs executable after unicode merge
  2008-11-08  2:42                                     ` Stefan Monnier
@ 2008-11-08  4:10                                       ` Chong Yidong
  2008-11-08  9:19                                         ` Eli Zaretskii
  2008-11-09  0:27                                         ` Richard M. Stallman
  2008-11-08 10:30                                       ` Dan Nicolaescu
  1 sibling, 2 replies; 86+ messages in thread
From: Chong Yidong @ 2008-11-08  4:10 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: rms, emanuele.giaquinta, Kenichi Handa, emacs-devel, dann,
	evilborisnet, jasonr

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> The problem is that lisp/international/characters.el sets up
>> syntax-table and category-table for many characters by
>> map-charset-chars.
>>
>> Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)
>
>> To know which (Unicode) characters belong to
>> chinese-gb2312, Emacs has to load a mapping table.
>
> I guess that means we need to precompute the category table and other
> tables populated in that way, so we can just load them during the dump
> without needing to populate the other maps.

That doesn't sound like it can be done within the timeframe for the 23.1
release.





* Re: size of emacs executable after unicode merge
  2008-11-07 21:15                                         ` Richard M. Stallman
  2008-11-08  4:00                                           ` Stephen J. Turnbull
@ 2008-11-08  4:19                                           ` Stefan Monnier
  1 sibling, 0 replies; 86+ messages in thread
From: Stefan Monnier @ 2008-11-08  4:19 UTC (permalink / raw)
  To: rms
  Cc: emanuele.giaquinta, handa, cyd, emacs-devel, dann, evilborisnet,
	Stephen J. Turnbull, jasonr

>     Maybe not at present in Emacs, but as far as I know those blocks *are*
>     relocated by XEmacs's portable dumper.

> How is it possible to relocate all blocks of memory?
> The C library allocates some, and Emacs can't tell where the pointers
> to them are kept.

Maybe the source code needs to be changed to keep track of them.
Kind of like the GCPRO thingies, maybe?


        Stefan





* Re: size of emacs executable after unicode merge
  2008-11-08  4:10                                       ` Chong Yidong
@ 2008-11-08  9:19                                         ` Eli Zaretskii
  2008-11-09  0:27                                         ` Richard M. Stallman
  1 sibling, 0 replies; 86+ messages in thread
From: Eli Zaretskii @ 2008-11-08  9:19 UTC (permalink / raw)
  To: Chong Yidong; +Cc: emacs-devel, monnier, evilborisnet, handa

> From: Chong Yidong <cyd@stupidchicken.com>
> Date: Fri, 07 Nov 2008 23:10:06 -0500
> Cc: rms@gnu.org, emanuele.giaquinta@gmail.com, Kenichi Handa <handa@m17n.org>,
> 	emacs-devel@gnu.org, dann@ics.uci.edu, evilborisnet@netscape.net,
> 	jasonr@gnu.org
> 
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
> 
> >> The problem is that lisp/international/characters.el sets up
> >> syntax-table and category-table for many characters by
> >> map-charset-chars.
> >>
> >> Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)
> >
> >> To know which (Unicode) characters belong to
> >> chinese-gb2312, Emacs has to load a mapping table.
> >
> > I guess that means we need to precompute the category table and other
> > tables populated in that way, so we can just load them during the dump
> > without needing to populate the other maps.
> 
> That doesn't sound like it can be done within the timeframe for the 23.1
> release.

I'm not sure; it's not like the 23.1 release date is tomorrow or next
week.

In any case, if we decide not to solve this for Emacs 23.1, we should
have an entry in PROBLEMS about it.





* Re: size of emacs executable after unicode merge
  2008-11-08  2:42                                     ` Stefan Monnier
  2008-11-08  4:10                                       ` Chong Yidong
@ 2008-11-08 10:30                                       ` Dan Nicolaescu
  1 sibling, 0 replies; 86+ messages in thread
From: Dan Nicolaescu @ 2008-11-08 10:30 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: rms, emanuele.giaquinta, Kenichi Handa, cyd, emacs-devel,
	evilborisnet, jasonr

Stefan Monnier <monnier@iro.umontreal.ca> writes:

  > >> So couldn't we make sure all the files read during the dump are using
  > >> utf-8 or utf-8-emacs?  Isn't that already the case for all
  > >> *.elc files?
  > 
  > > Not yet, but such a change of encoding is easy.  The problem
  > > is that lisp/international/characters.el sets up syntax-table
  > > and category-table for many characters by map-charset-chars.
  > 
  > > Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)
  > 
  > > To know which (Unicode) characters belong to
  > > chinese-gb2312, Emacs has to load a mapping table.
  > 
  > I guess that means we need to precompute the category table and other
  > tables populated in that way, so we can just load them during the dump
  > without needing to populate the other maps.

And precomputing should be easy: just prin1 the current value of
Vchar_unify_table, put that value in a new file, check the new file in
CVS and add code to load it at the right time.

Then just disable the current code that computes Vchar_unify_table.

Is there any reason that the above would not work?





* Re: size of emacs executable after unicode merge
  2008-11-08  4:10                                       ` Chong Yidong
  2008-11-08  9:19                                         ` Eli Zaretskii
@ 2008-11-09  0:27                                         ` Richard M. Stallman
  2008-11-09  6:29                                           ` Dan Nicolaescu
  1 sibling, 1 reply; 86+ messages in thread
From: Richard M. Stallman @ 2008-11-09  0:27 UTC (permalink / raw)
  To: Chong Yidong
  Cc: emanuele.giaquinta, handa, emacs-devel, dann, monnier,
	evilborisnet, jasonr

    > I guess that means we need to precompute the category table and other
    > tables populated in that way, so we can just load them during the dump
    > without needing to populate the other maps.

    That doesn't sound like it can be done within the timeframe for the 23.1
    release.

How big is the excess size caused by this problem?





* Re: size of emacs executable after unicode merge
  2008-11-09  0:27                                         ` Richard M. Stallman
@ 2008-11-09  6:29                                           ` Dan Nicolaescu
  2008-11-09 17:11                                             ` Richard M. Stallman
  0 siblings, 1 reply; 86+ messages in thread
From: Dan Nicolaescu @ 2008-11-09  6:29 UTC (permalink / raw)
  To: rms
  Cc: emanuele.giaquinta, handa, Chong Yidong, emacs-devel, monnier,
	evilborisnet, jasonr

"Richard M. Stallman" <rms@gnu.org> writes:

  >     > I guess that means we need to precompute the category table and other
  >     > tables populated in that way, so we can just load them during the dump
  >     > without needing to populate the other maps.
  > 
  >     That doesn't sound like it can be done within the timeframe for the 23.1
  >     release.
  > 
  > How big is the excess size caused by this problem?

The data segment size of the emacs binary went from 4MB before the
unicode-2 merge to 11MB after the merge.





* Re: size of emacs executable after unicode merge
  2008-11-09  6:29                                           ` Dan Nicolaescu
@ 2008-11-09 17:11                                             ` Richard M. Stallman
  2008-11-10  1:24                                               ` Stefan Monnier
  0 siblings, 1 reply; 86+ messages in thread
From: Richard M. Stallman @ 2008-11-09 17:11 UTC (permalink / raw)
  To: Dan Nicolaescu
  Cc: emanuele.giaquinta, handa, cyd, emacs-devel, monnier,
	evilborisnet, jasonr

    The data segment size of the emacs binary went from 4MB before the
    unicode-2 merge to 11MB after the merge.

Fixing that is worth a delay.





* Re: size of emacs executable after unicode merge
  2008-11-06  7:56                                   ` Kenichi Handa
  2008-11-08  2:42                                     ` Stefan Monnier
@ 2008-11-09 20:14                                     ` Chong Yidong
  2008-11-10  1:59                                       ` Kenichi Handa
  1 sibling, 1 reply; 86+ messages in thread
From: Chong Yidong @ 2008-11-09 20:14 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, emacs-devel, dann, Stefan Monnier,
	evilborisnet, jasonr

Kenichi Handa <handa@m17n.org> writes:

> The problem is that lisp/international/characters.el sets up
> syntax-table and category-table for many characters by
> map-charset-chars.
>
> Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)
>
> To know which (Unicode) characters belong to
> chinese-gb2312, Emacs has to load a mapping table.

Could you try to describe what needs to be done in more detail?  That
way, even if you don't have time to implement this, someone else might
be able to take a stab at it.





* Re: size of emacs executable after unicode merge
  2008-10-31  5:29                       ` Kenichi Handa
                                           ` (2 preceding siblings ...)
  2008-10-31 19:30                         ` Richard M. Stallman
@ 2008-11-09 22:43                         ` Chong Yidong
  2008-11-09 22:57                           ` Chong Yidong
  2008-11-10  1:26                           ` Kenichi Handa
  3 siblings, 2 replies; 86+ messages in thread
From: Chong Yidong @ 2008-11-09 22:43 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, emacs-devel, dann, evilborisnet, jasonr

Kenichi Handa <handa@m17n.org> writes:

> (2) Just before dumping, clear-charset-maps is called.  This
>     function sets all char-tables built in (1) (except for
>     Vchar_unified_charset_table) to nil.  Then set
>     Vchar_unify_table to Vchar_unified_charset_table, and
>     set Vchar_unified_charset_table to nil.
>
>     Then, garbage-collect is called.  After that, the only
>     live char-table is Vchar_unify_table, and its contents
>     are not that big because it maps upper-area characters to
>     charsets, and each charset has a linear upper area, so
>     most successive characters have the same value.
>
> (3) When the dumped Emacs runs, at the time of
>     decoding/encoding charsets that are unified as above, by
>     checking if the value of Vchar_unify_table for a
>     character is a symbol or not, Emacs knows whether it has
>     to load the mapping table again or not.

I was looking through charset.c and character.c, and it seems to me that
Vchar_unify_table is not used by anything.  Its contents are altered at
several places in the code, but it looks like it is never consulted for
anything.  What am I missing?





* Re: size of emacs executable after unicode merge
  2008-11-09 22:43                         ` Chong Yidong
@ 2008-11-09 22:57                           ` Chong Yidong
  2008-11-10  1:28                             ` Kenichi Handa
  2008-11-10  1:26                           ` Kenichi Handa
  1 sibling, 1 reply; 86+ messages in thread
From: Chong Yidong @ 2008-11-09 22:57 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, jasonr, dann, evilborisnet, emacs-devel

Chong Yidong <cyd@stupidchicken.com> writes:

> I was looking through charset.c and character.c, and it seems to me that
> Vchar_unify_table is not used by anything.  Its contents are altered at
> several places in the code, but it looks like it is never consulted for
> anything.  What am I missing?

Similarly, Vchar_unified_charset_table seems to be not consulted
anywhere in the code.  Removing these two variables from the C code
seems to make no difference to Emacs as far as I can tell (other than
shrinking the size of the executable by 2M).

Could it be that this part:

>> (3) When the dumped Emacs runs, at the time of
>>     decoding/encoding charsets that are unified as above, by
>>     checking if the value of Vchar_unify_table for a
>>     character is a symbol or not, Emacs knows whether it has
>>     to load the mapping table again or not.

has not been implemented?





* Re: size of emacs executable after unicode merge
  2008-11-09 17:11                                             ` Richard M. Stallman
@ 2008-11-10  1:24                                               ` Stefan Monnier
  2008-11-10  1:55                                                 ` Thomas Lord
  0 siblings, 1 reply; 86+ messages in thread
From: Stefan Monnier @ 2008-11-10  1:24 UTC (permalink / raw)
  To: rms
  Cc: emanuele.giaquinta, handa, cyd, emacs-devel, Dan Nicolaescu,
	evilborisnet, jasonr

>     The data segment size of the emacs binary went from 4MB before the
>     unicode-2 merge to 11MB after the merge.
> Fixing that is worth a delay.

Note: it's not clear how much of those 7MB we can recover.  Some of them
are undoubtedly actively needed.  But, yes, clearly there are several
MBs worth of "unnecessary data", some of it is apparently not only dead
but even garbage collected.


        Stefan





* Re: size of emacs executable after unicode merge
  2008-11-09 22:43                         ` Chong Yidong
  2008-11-09 22:57                           ` Chong Yidong
@ 2008-11-10  1:26                           ` Kenichi Handa
  1 sibling, 0 replies; 86+ messages in thread
From: Kenichi Handa @ 2008-11-10  1:26 UTC (permalink / raw)
  To: Chong Yidong
  Cc: rms, emanuele.giaquinta, emacs-devel, dann, evilborisnet, jasonr

In article <871vxkfrkk.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes:

> I was looking through charset.c and character.c, and it seems to me that
> Vchar_unify_table is not used by anything.

It's used in the macro MAYBE_UNIFY_CHAR (in character.h) to
check if the upper area characters (#x110000-) must be
unified or not.  But, as I wrote, the memory consumed by
that char-table is rather small.  The bigger ones are
mapping tables of character sets.

---
Kenichi Handa
handa@ni.aist.go.jp


* Re: size of emacs executable after unicode merge
  2008-11-09 22:57                           ` Chong Yidong
@ 2008-11-10  1:28                             ` Kenichi Handa
  2008-11-10 19:29                               ` Richard M. Stallman
  0 siblings, 1 reply; 86+ messages in thread
From: Kenichi Handa @ 2008-11-10  1:28 UTC (permalink / raw)
  To: Chong Yidong
  Cc: rms, emanuele.giaquinta, emacs-devel, dann, evilborisnet, jasonr

In article <87wsfco6cx.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes:

> Similarly, Vchar_unified_charset_table seems to be not consulted
> anywhere in the code.  Removing these two variables from the C code
> seems to make no difference to Emacs as far as I can tell (other than
> shrinking the size of the executable by 2M).

> Could it be that this part:

>>> (3) When the dumped Emacs runs, at the time of
>>> decoding/encoding charsets that are unified as above, by
>>> checking if the value of Vchar_unify_table for a
>>> character is a symbol or not, Emacs knows whether it has
>>> to load the mapping table again or not.

> has not been implemented?

It's implemented in MAYBE_UNIFY_CHAR (in character.h).

---
Kenichi Handa
handa@ni.aist.go.jp







* Re: size of emacs executable after unicode merge
  2008-11-10  1:24                                               ` Stefan Monnier
@ 2008-11-10  1:55                                                 ` Thomas Lord
  2008-11-11  4:37                                                   ` Chong Yidong
  0 siblings, 1 reply; 86+ messages in thread
From: Thomas Lord @ 2008-11-10  1:55 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: rms, emanuele.giaquinta, handa, cyd, emacs-devel, Dan Nicolaescu,
	evilborisnet, jasonr

On Sun, 2008-11-09 at 20:24 -0500, Stefan Monnier wrote:
> >     The data segment size of the emacs binary went from 4MB before the
> >     unicode-2 merge to 11MB after the merge.
> > Fixing that is worth a delay.
> 
> Note: it's not clear how much of those 7MB we can recover.  Some of them
> are undoubtedly actively needed.  But, yes, clearly there are several
> MBs worth of "unnecessary data", some of it is apparently not only dead
> but even garbage collected.

Are the tables in question represented as flat arrays that are "sparse"
in the sense of mostly containing homogeneous sets of values at roughly
page-sized chunks?   If so, why not just compress them a bit (e.g., make
them shallow trees).   I've had very good results taking that approach
in a past project...   (In particular, I chose the maximum depth of the
tree I was willing to tolerate then did a search to discover optimal (by
my particular metric) branching factors and arranged for the sharing of
identical sub-trees.)
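
A rough sketch of that kind of compression, assuming a fixed
two-level layout instead of the searched-for branching factors
described above.  The names sparse_table, sparse_get and sparse_set
are made up for illustration and this is not the Emacs char-table
code.  A NULL leaf stands for the one shared "all default" subtree,
which is the simplest case of sharing identical sub-trees.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCK_BITS 12				/* 4096 entries per leaf */
#define BLOCK_SIZE (1u << BLOCK_BITS)
#define NUM_BLOCKS (0x110000u >> BLOCK_BITS)	/* 272 leaves in total */

struct sparse_table
{
  unsigned short *leaf[NUM_BLOCKS];	/* NULL: every entry is 0 */
  unsigned allocated;			/* number of private leaves */
};

/* Return the value stored for character C (0 if never set).  */
unsigned short
sparse_get (const struct sparse_table *t, unsigned c)
{
  assert (c < 0x110000u);
  const unsigned short *leaf = t->leaf[c >> BLOCK_BITS];
  return leaf ? leaf[c & (BLOCK_SIZE - 1)] : 0;
}

/* Store VALUE for character C, allocating a private leaf only when a
   homogeneous block stops being homogeneous.  Return 0 if out of
   memory, 1 otherwise.  */
int
sparse_set (struct sparse_table *t, unsigned c, unsigned short value)
{
  assert (c < 0x110000u);
  unsigned short **slot = &t->leaf[c >> BLOCK_BITS];
  if (!*slot)
    {
      if (value == 0)
	return 1;		/* still the default, nothing to do */
      *slot = calloc (BLOCK_SIZE, sizeof **slot);
      if (!*slot)
	return 0;
      t->allocated++;
    }
  (*slot)[c & (BLOCK_SIZE - 1)] = value;
  return 1;
}
```

A table that only touches a few blocks then costs the top-level
array plus one leaf per touched block, not one entry per character.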

-t



* Re: size of emacs executable after unicode merge
  2008-11-09 20:14                                     ` Chong Yidong
@ 2008-11-10  1:59                                       ` Kenichi Handa
  2008-11-10 15:18                                         ` Chong Yidong
                                                           ` (2 more replies)
  0 siblings, 3 replies; 86+ messages in thread
From: Kenichi Handa @ 2008-11-10  1:59 UTC (permalink / raw)
  To: Chong Yidong
  Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet,
	jasonr

In article <87iqqwk672.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes:

> Kenichi Handa <handa@m17n.org> writes:
> > The problem is that lisp/international/characters.el sets up
> > syntax-table and category-table for many characters by
> > map-charset-chars.
> >
> > Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)
> >
> > To know which (Unicode) characters belong to
> > chinese-gb2312, Emacs has to load a mapping table.

> Could you try to describe what needs to be done in more detail?  That
> way, even if you don't have time to implement this, someone else might
> be able to take a stab at it.

map-charset-chars calls FUNCTION (modify-category-entry in
the above case) on all characters in CHARSET.  But to know
which characters belong to CHARSET (chinese-gb2312 in the
above case), we must consult
"etc/charsets/GB2312.map".  Its contents look something like
this:

0x2121-0x2123 0x3000
0x2124 0x30FB
0x2125 0x02C9
[...]

From this file, we know that #x3000, #x3001, #x3002, #x30FB,
#x02C9, ... belong to chinese-gb2312.

We must find a way to make map-charset-chars work without
loading that map into a char-table.

One idea is to have a single boolean vector of size #x110000
(139264 bytes) and set it up for CHARSET each time we call
map-charset-chars for a different charset.  In that vector,
only the bits for #x3000, #x3001, #x3002, etc. are 1 for
chinese-gb2312.  Then map-charset-chars knows for which
characters FUNCTION must be called.
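
A minimal sketch of that vector, under assumed names: charset_bits,
mark_charset_range and next_charset_char are hypothetical and not
existing Emacs functions.  One bit per character in #x0..#x10FFFF
gives exactly 139264 bytes, and the same storage is cleared and
refilled for each charset.

```c
#include <assert.h>
#include <string.h>

#define MAX_CHARS 0x110000		/* characters #x0..#x10FFFF */

/* The single shared vector: one bit per character, 139264 bytes.  */
unsigned char charset_bits[MAX_CHARS / 8];

/* Forget the previous charset before setting up the next one.  */
void
clear_charset_bits (void)
{
  memset (charset_bits, 0, sizeof charset_bits);
}

/* Mark COUNT consecutive characters starting at UNICODE as members.  */
void
mark_charset_range (unsigned unicode, unsigned count)
{
  assert (unicode < MAX_CHARS && count <= MAX_CHARS - unicode);
  for (unsigned c = unicode; c < unicode + count; c++)
    charset_bits[c >> 3] |= (unsigned char) (1u << (c & 7));
}

/* Return nonzero if C belongs to the charset currently set up.  */
int
char_in_charset_p (unsigned c)
{
  return c < MAX_CHARS && ((charset_bits[c >> 3] >> (c & 7)) & 1);
}

/* Return the first member character >= START, or -1 if there is none.
   The mapping loop would call FUNCTION on each character returned.  */
long
next_charset_char (unsigned start)
{
  for (unsigned c = start; c < MAX_CHARS; c++)
    if (char_in_charset_p (c))
      return (long) c;
  return -1;
}
```

For the GB2312.map lines above, the setup would be
mark_charset_range (0x3000, 3), mark_charset_range (0x30FB, 1) and
mark_charset_range (0x02C9, 1), after which a loop over
next_charset_char visits #x02C9, #x3000, #x3001, #x3002, #x30FB.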

---
Kenichi Handa
handa@ni.aist.go.jp


* Re: size of emacs executable after unicode merge
  2008-11-10  1:59                                       ` Kenichi Handa
@ 2008-11-10 15:18                                         ` Chong Yidong
  2008-11-10 23:18                                         ` Chong Yidong
  2008-11-11 18:17                                         ` Chong Yidong
  2 siblings, 0 replies; 86+ messages in thread
From: Chong Yidong @ 2008-11-10 15:18 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet,
	jasonr

Kenichi Handa <handa@m17n.org> writes:

> map-charset-chars calls FUNCTION (modify-category-entry in
> the above case) on all characters in CHARSET.  But to know
> which characters belong to CHARSET (chinese-gb2312 in the
> above case), we must consult
> "etc/charsets/GB2312.map".  Its contents look something like
> this:
>
> 0x2121-0x2123 0x3000
> 0x2124 0x30FB
> 0x2125 0x02C9
> [...]
>
> From this file, we know that #x3000, #x3001, #x3002, #x30FB,
> #x02C9, ... belong to chinese-gb2312.
>
> We must find a way to make map-charset-chars work without
> loading that map into a char-table.
>
> One idea is to have a single boolean vector of size #x110000
> (139264 bytes) and set it up for CHARSET each time we call
> map-charset-chars for a different charset.  In that vector,
> only the bits for #x3000, #x3001, #x3002, etc. are 1 for
> chinese-gb2312.  Then map-charset-chars knows for which
> characters FUNCTION must be called.

Good idea.  But is this sufficient?  Do you know whether
map-charset-chars is the only thing during loadup that loads these
char-tables?





* Re: size of emacs executable after unicode merge
  2008-11-10  1:28                             ` Kenichi Handa
@ 2008-11-10 19:29                               ` Richard M. Stallman
  0 siblings, 0 replies; 86+ messages in thread
From: Richard M. Stallman @ 2008-11-10 19:29 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: emanuele.giaquinta, cyd, jasonr, dann, evilborisnet, emacs-devel

    >>> (3) When the dumped Emacs runs, at the time of
    >>> decoding/encoding charsets that are unified as above, by
    >>> checking if the value of Vchar_unify_table for a
    >>> character is a symbol or not, Emacs knows whether it has
    >>> to load the mapping table again or not.

    > has not been implemented?

    It's implemented in MAYBE_UNIFY_CHAR (in character.h).

If it is hard to find the code which implements that feature,
then I think there should be a comment near Vchar_unify_table's
declaration saying to look at that macro for the implementation of
the feature.





* Re: size of emacs executable after unicode merge
  2008-11-10  1:59                                       ` Kenichi Handa
  2008-11-10 15:18                                         ` Chong Yidong
@ 2008-11-10 23:18                                         ` Chong Yidong
  2008-11-11 18:17                                         ` Chong Yidong
  2 siblings, 0 replies; 86+ messages in thread
From: Chong Yidong @ 2008-11-10 23:18 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet,
	jasonr

One thing I've noticed from playing around with the code: a lot of the
size increase comes from this code portion in map_charset_chars:

  if (CHARSET_UNIFIED_P (charset)
      && CHAR_TABLE_P (CHARSET_DEUNIFIER (charset)))
    {
      map_char_table_for_charset (c_function, function,
				  CHARSET_DEUNIFIER (charset), arg,
				  partial ? charset : NULL, from, to);
    }

Commenting this out reduces the size of the Emacs executable by 6M.  But
AFAICT map_char_table_for_charset doesn't create any new char-tables, so
maybe our understanding of the problem isn't correct.





* Re: size of emacs executable after unicode merge
  2008-11-10  1:55                                                 ` Thomas Lord
@ 2008-11-11  4:37                                                   ` Chong Yidong
  0 siblings, 0 replies; 86+ messages in thread
From: Chong Yidong @ 2008-11-11  4:37 UTC (permalink / raw)
  To: Thomas Lord
  Cc: rms, emanuele.giaquinta, handa, emacs-devel, Dan Nicolaescu,
	Stefan Monnier, evilborisnet, jasonr

Thomas Lord <lord@emf.net> writes:

> Are the tables in question represented as flat arrays that are "sparse"
> in the sense of mostly containing homogeneous sets of values at roughly
> page-sized chunks?   If so, why not just compress them a bit (e.g., make
> them shallow trees).

We do employ a strategy of this sort, I think.





* Re: size of emacs executable after unicode merge
  2008-11-10  1:59                                       ` Kenichi Handa
  2008-11-10 15:18                                         ` Chong Yidong
  2008-11-10 23:18                                         ` Chong Yidong
@ 2008-11-11 18:17                                         ` Chong Yidong
  2008-11-12  6:26                                           ` Kenichi Handa
  2 siblings, 1 reply; 86+ messages in thread
From: Chong Yidong @ 2008-11-11 18:17 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet,
	jasonr

Okay, I did a bit more digging.

I think the increase in the size of the Emacs executable is not due to
the allocation of char-tables.  In particular, I've tried this
suggestion:

> One idea is to have a single boolean vector of size #x110000
> (139264 bytes) and set it up for CHARSET each time we call
> map-charset-chars for a different charset.  In that vector,
> only the bits for #x3000, #x3001, #x3002, etc. are 1 for
> chinese-gb2312.  Then map-charset-chars knows for which
> characters FUNCTION must be called.

but it appears to free a negligible amount of memory.

My current hypothesis is that it's the calls to modify-category-entry
and modify-syntax-entry themselves that are responsible for the increase
in space.  The extra memory is in the standard-category-table and
standard-syntax-table data structures.

OTOH, I don't see any easy way to reduce this data size.  The only
possibility I can think of is to set up a new mechanism to defer loading
parts of the syntax and category tables, which doesn't seem feasible for
the immediate future.

Any ideas?


* Re: size of emacs executable after unicode merge
  2008-11-11 18:17                                         ` Chong Yidong
@ 2008-11-12  6:26                                           ` Kenichi Handa
  2008-11-13 16:33                                             ` Chong Yidong
  0 siblings, 1 reply; 86+ messages in thread
From: Kenichi Handa @ 2008-11-12  6:26 UTC (permalink / raw)
  To: Chong Yidong
  Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet,
	jasonr

In article <873ahym8ji.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes:

> Okay, I did a bit more digging.
> I think the increase in the size of the Emacs executable is not due to
> the allocation of char-tables.  In particular, I've tried this
> suggestion:

> > One idea is to have a single boolean vector of size #x110000
> > (139264 bytes) and set it up for CHARSET each time we call
> > map-charset-chars for a different charset.  In that vector,
> > only the bits for #x3000, #x3001, #x3002, etc. are 1 for
> > chinese-gb2312.  Then map-charset-chars knows for which
> > characters FUNCTION must be called.

> but it appears to free a negligible amount of memory.

But that contradicts this report from Yamamoto-san:

> Anyway, an experiment on Mac OS X (*1) shows that clear-charset-maps
> followed by GC actually collects some amount of data in heap (~7MB),
> but they are not returned to the system, at least with its malloc
> implementation.

Did you comment out the calls to unify-charset in
mule-conf.el and change the encoding of all preloaded *.el
files to utf-8?

---
Kenichi Handa
handa@ni.aist.go.jp

PS.  Yidong, it seems that a few mails I wrote in the last
24 hours didn't go out correctly.  Did you get a mail about
the comment on map_char_table_for_charset?





* Re: size of emacs executable after unicode merge
  2008-11-12  6:26                                           ` Kenichi Handa
@ 2008-11-13 16:33                                             ` Chong Yidong
  2008-11-14  0:48                                               ` Kenichi Handa
  2008-11-27 11:20                                               ` Kenichi Handa
  0 siblings, 2 replies; 86+ messages in thread
From: Chong Yidong @ 2008-11-13 16:33 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet,
	jasonr

Kenichi Handa <handa@m17n.org> writes:

>> > One idea is to have a single boolean vector of size #x110000
>> > (139264 bytes), set it up for CHARSET every time we call
>> > map-charset-chars for a different charset.  In that
>> > vector, only the bits for #x3000, #x3001, #x3002, etc. are 1
>> > for chinese-gb2312.  Then map-charset-chars can know for
>> > which characters FUNCTION must be called.
>
>> but it appears to free a negligible amount of memory.
>
> Did you comment out the calls of unify-charset in
> mule-conf.el and change the encoding of all preloaded *.el
> files to utf-8?

Commenting out the calls to unify-charset does reduce the memory by
several megabytes.

But I'm not sure I understand your idea correctly.  Here's the approach
I tried, do you think it is correct?

- introduce a variable called inhibit-load-charsets-map, defaulting to
  nil

- In loadup.el, let-bind inhibit-load-charsets-map to t when loading
  characters.el.

- In map_charset_chars, when inhibit-load-charsets-map is non-nil, and
  the charset method is deferred, don't call load_charset and
  map_char_table_for_charset.  Instead, call a new function
  map_charset_chars_from_file.

- The new function map_charset_chars_from_file reads the code points and
  character codes from the charset file and applies the function
  argument passed to map_charset_chars on those codes directly.

Would this work?
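
To make the last step concrete, here is a minimal sketch of reading a map file and applying a function to each pair directly, without building any Lisp objects.  It is an assumption-laden illustration, not the code I tried: the function name map_charset_chars_from_stream, the callback type map_fn and the example callback remember_pair are hypothetical, and the line format ("0xCODE 0xCHAR", "0xFROM-0xTO 0xCHAR", "#" comments) is assumed rather than taken from the real parser.

```c
#include <assert.h>
#include <stdio.h>

typedef void (*map_fn) (unsigned code, unsigned ch, void *arg);

/* Read mappings from STREAM and call FN once per
   (code point, character) pair.  Return the number of pairs.  */
int
map_charset_chars_from_stream (FILE *stream, map_fn fn, void *arg)
{
  char line[256];
  int count = 0;

  assert (stream != NULL && fn != NULL);
  while (fgets (line, sizeof line, stream))
    {
      unsigned from, to, ch;

      if (sscanf (line, "%x-%x %x", &from, &to, &ch) == 3)
        ;                       /* range entry */
      else if (sscanf (line, "%x %x", &from, &ch) == 2)
        to = from;              /* single entry */
      else
        continue;               /* comment, blank or malformed line */

      /* Consecutive code points map to consecutive characters.  */
      for (; from <= to; from++, ch++, count++)
        fn (from, ch, arg);
    }
  return count;
}

/* Example callback: remember the last pair seen.  */
struct last_pair { unsigned code, ch; };

void
remember_pair (unsigned code, unsigned ch, void *arg)
{
  struct last_pair *p = arg;
  p->code = code;
  p->ch = ch;
}
```

Nothing here outlives the call, so nothing from the map would end up in the dumped image.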




^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-13 16:33                                             ` Chong Yidong
@ 2008-11-14  0:48                                               ` Kenichi Handa
  2008-11-27 11:20                                               ` Kenichi Handa
  1 sibling, 0 replies; 86+ messages in thread
From: Kenichi Handa @ 2008-11-14  0:48 UTC (permalink / raw)
  To: Chong Yidong
  Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet,
	jasonr

In article <87r65flh5n.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes:

> Commenting out the calls to unify-charset does reduce the memory by
> several megabytes.

> But I'm not sure I understand your idea correctly.  Here's the approach
> I tried, do you think it is correct?

> - introduce a variable called inhibit-load-charsets-map, defaulting to
>   nil

> - In loadup.el, let-bind inhibit-load-charsets-map to t when loading
>   characters.el.

> - In map_charset_chars, when inhibit-load-charsets-map is non-nil, and
>   the charset method is deferred, don't call load_charset and
>   map_char_table_for_charset.  Instead, call a new function
>   map_charset_chars_from_file.

> - The new function map_charset_chars_from_file reads the code points and
>   character codes from the charset file and applies the function
>   argument passed to map_charset_chars on those codes directly.

> Would this work?

I think so, but perhaps it is better that we just set
inhibit-load-charsets-map to t before loading mule-conf, and
set it to nil just before dumping.  And signal an error when
it turns out that loading a charset map is necessary while
inhibit-load-charsets-map is non-nil.

We also have to modify `unify-charset' not to call
load_charset_map_from_file but just to set up
Vchar_unify_table so that all characters in the upper space that
must be unified with Unicode have the value `t'.

How about making a branch, say "reduce-dump-size", and
commit your change so that I can check the code?

---
Kenichi Handa
handa@ni.aist.go.jp

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-10-31 12:35                               ` size of emacs executable after unicode merge Stephen Berman
@ 2008-11-21 12:32                                 ` Kenichi Handa
  2008-11-21 14:18                                   ` Ulrich Mueller
  0 siblings, 1 reply; 86+ messages in thread
From: Kenichi Handa @ 2008-11-21 12:32 UTC (permalink / raw)
  To: Stephen Berman; +Cc: emacs-devel

In article <87skqd6iyx.fsf@escher.local.home>, Stephen Berman <stephen.berman@gmx.net> writes:

>>> From: Kenichi Handa <handa@m17n.org>
[...]
>>> (gdb) run -batch -l loadup dump
>>> Starting program: /usr/local/work/emacs/src/temacs -batch -l loadup dump
>>> [Thread debugging using libthread_db enabled]
>>> [New process 10730]
>>> Executing new program: /usr/local/work/emacs/src/temacs
>>> warning: Cannot initialize thread debugging library: generic error
>>> warning: Cannot initialize thread debugging library: generic error
>>> [New process 10730]
>>> [Thread debugging using libthread_db enabled]
>>> Debugger segmentation fault
>>> 
>>> Do you know about this error?  Is this a bug of gdb?  A few
>>> months ago, I didn't see this error on running temacs
>>> under gdb.
> >
> > On a GNU/Linux system, with today's CVS configured as shown below, and
> > with GDB 6.7.1, I don't get this problem.
> >
> > Why is libthread_db being used? does your build use GTK+ or some other
> > library that uses multithreading?

> I also get the same gdb segfault as Handa-san with temacs from GNU Emacs
> 23.0.60.12 (i686-pc-linux-gnu, GTK+ Version 2.12.9) of 2008-10-25 on
> escher, built with the default configuration.  My gdb is version 6.8.

It seems that the above error happens when temacs is run
under gdb, and execvp is called in the following hunk of
code in main (emacs.c).

#ifdef HAVE_PERSONALITY_LINUX32
  if (!initialized
      && (strcmp (argv[argc-1], "dump") == 0
          || strcmp (argv[argc-1], "bootstrap") == 0)
      && ! getenv ("EMACS_HEAP_EXEC"))
    {
      /* Set this so we only do this once.  */
      putenv("EMACS_HEAP_EXEC=true");

      /* A flag to turn off address randomization which is introduced
         in linux kernel shipped with fedora core 4 */
#define ADD_NO_RANDOMIZE 0x0040000
      personality (PER_LINUX32 | ADD_NO_RANDOMIZE);
#undef  ADD_NO_RANDOMIZE

      execvp (argv[0], argv);

      /* If the exec fails, try to dump anyway.  */
      perror ("execvp");
    }
#endif /* HAVE_PERSONALITY_LINUX32 */

Currently, I'm running temacs under gdb with this workaround:
(gdb) set env EMACS_HEAP_EXEC=1
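
The reason the workaround helps is that the guard in main skips the re-exec as soon as EMACS_HEAP_EXEC is set to anything.  A stand-alone restatement of that condition is sketched below; the helper name need_reexec_for_dump and its parameter list are hypothetical and exist only to make the test visible in isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if a not-yet-dumped process whose last argument is
   "dump" or "bootstrap" should re-exec itself.  HEAP_EXEC_ENV is
   the value of EMACS_HEAP_EXEC, or NULL if it is unset.  */
int
need_reexec_for_dump (int initialized, int argc, char **argv,
                      const char *heap_exec_env)
{
  const char *last;

  assert (argc >= 1 && argv != NULL);
  last = argv[argc - 1];
  return (!initialized
          && (strcmp (last, "dump") == 0
              || strcmp (last, "bootstrap") == 0)
          && heap_exec_env == NULL);
}
```

A caller would pass getenv ("EMACS_HEAP_EXEC") as the last argument; with the variable set from gdb the function returns 0, so execvp is never reached.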

With this, temacs runs until Fdump_emacs is called (and thus
I can at least debug loadup.el).  In the call of
Fdump_emacs, this message is printed:

**************************************************
Warning: Your system has a gap between BSS and the
heap (9475964 bytes).  This usually means that exec-shield
or something similar is in effect.  The dump may
fail because of this.  See the section about
exec-shield in etc/PROBLEMS for more information.
**************************************************

and temacs crashes by SIGSEGV.

Although etc/PROBLEMS has this statement:

You can check the Exec-shield state like this:
    cat /proc/sys/kernel/exec-shield

my system (debian testing) doesn't have that file.

---
Kenichi Handa
handa@ni.aist.go.jp




^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-21 12:32                                 ` Kenichi Handa
@ 2008-11-21 14:18                                   ` Ulrich Mueller
  0 siblings, 0 replies; 86+ messages in thread
From: Ulrich Mueller @ 2008-11-21 14:18 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: Stephen Berman, emacs-devel

>>>>> Kenichi Handa wrote:

> Currently, I'm running temacs under gdb with this workaround:
> (gdb) set env EMACS_HEAP_EXEC=1

> With this, temacs runs until Fdump_emacs is called (and thus
> I can at least debug loadup.el).  In the call of
> Fdump_emacs, this message is printed:

> **************************************************
> Warning: Your system has a gap between BSS and the
> heap (9475964 bytes).  This usually means that exec-shield
> or something similar is in effect.  The dump may
> fail because of this.  See the section about
> exec-shield in etc/PROBLEMS for more information.
> **************************************************

> and temacs crashes by SIGSEGV.

> Although etc/PROBLEMS has this statement:

> You can check the Exec-shield state like this:
>     cat /proc/sys/kernel/exec-shield

> my system (debian testing) doesn't have that file.

As far as I know, exec-shield is a feature specific to some RH/Fedora
versions. However, Linux versions 2.6.25 and later implement heap
randomisation, see <http://lkml.org/lkml/2007/10/23/435>.

A possible workaround is to turn heap randomisation off:
,----
| echo -n 1 >/proc/sys/kernel/randomize_va_space
`----
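
For anyone who wants to check the current setting from a program, a minimal sketch follows.  It assumes the usual meaning of the values (0: no randomisation, 1: stack and mmap only, 2: heap as well); the function names read_randomize_va_space and heap_randomized_p are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Return the integer stored in PATH, or -1 if it cannot be read.  */
int
read_randomize_va_space (const char *path)
{
  FILE *f;
  int value = -1;

  assert (path != NULL);
  f = fopen (path, "r");
  if (f == NULL)
    return -1;
  if (fscanf (f, "%d", &value) != 1)
    value = -1;
  fclose (f);
  return value;
}

/* Assumption: the heap (brk area) is randomised only for value 2.  */
int
heap_randomized_p (int value)
{
  return value == 2;
}
```

Typical use would be heap_randomized_p (read_randomize_va_space ("/proc/sys/kernel/randomize_va_space")); a result of -1 from the reader just means the file is missing or unreadable.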

Ulrich

P.S. Gentoo bug 236579 <http://bugs.gentoo.org/236579> contains some
additional information about this.




^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-13 16:33                                             ` Chong Yidong
  2008-11-14  0:48                                               ` Kenichi Handa
@ 2008-11-27 11:20                                               ` Kenichi Handa
  2008-11-27 16:07                                                 ` Chong Yidong
                                                                   ` (4 more replies)
  1 sibling, 5 replies; 86+ messages in thread
From: Kenichi Handa @ 2008-11-27 11:20 UTC (permalink / raw)
  To: Chong Yidong
  Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet,
	jasonr

In article <87r65flh5n.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes:

> Kenichi Handa <handa@m17n.org> writes:
>>> > One idea is to have a single boolean vector of size #x110000
>>> > (139264 bytes), set it up for CHARSET every time we call
>>> > map-charset-chars for a different charset.  In that
>>> > vector, only the bits for #x3000, #x3001, #x3002, etc. are 1
>>> > for chinese-gb2312.  Then map-charset-chars can know for
>>> > which characters FUNCTION must be called.
> >
>>> but it appears to free a negligible amount of memory.
> >
> > Did you comment out the calls of unify-charset in
> > mule-conf.el and change the encoding of all preloaded *.el
> > files to utf-8?

> Commenting out the calls to unify-charset does reduce the memory by
> several megabytes.

After taking over Chong's experiment, I could reduce the
size of the Emacs executable by about 7M bytes.  About 4M bytes
were due to the charset mapping tables; that was
reduced by setting up the C structure temp_charset_work
(see charset.c for details) instead of making many Lisp
objects (char-tables and vectors).  Another 3M bytes were
due to the big standard category table; that was reduced
by hashing the table entries (see hash_get_category_set in
category.c for details).

As a result, the executable is now 10,671,313 bytes on
GNU/Linux.  It's still 1.6M bytes larger than Emacs 22, but
I'm not sure it's worth making more effort to reduce it.
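
The saving from hashing comes from sharing: many characters have identical category sets, so each distinct set needs to be stored only once.  The sketch below shows the general technique only and is not the code in category.c; the 16-byte set size, the fixed-size pool, the FNV-1a hash and the name intern_category_set are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SET_BYTES  16           /* 128 category bits (assumed) */
#define TABLE_SIZE 64           /* power of two; small for illustration */

static unsigned char pool[TABLE_SIZE][SET_BYTES];
static int used[TABLE_SIZE];

static unsigned
hash_set (const unsigned char *set)
{
  unsigned h = 2166136261u;     /* FNV-1a offset basis */
  int i;

  for (i = 0; i < SET_BYTES; i++)
    h = (h ^ set[i]) * 16777619u;
  return h;
}

/* Return the shared copy of SET, adding it to the pool on first
   use.  Equal sets always yield the same pointer.  */
const unsigned char *
intern_category_set (const unsigned char *set)
{
  unsigned i;
  int probes;

  assert (set != NULL);
  i = hash_set (set) & (TABLE_SIZE - 1);
  for (probes = 0; probes < TABLE_SIZE; probes++)
    {
      if (!used[i])
        {
          memcpy (pool[i], set, SET_BYTES);
          used[i] = 1;
          return pool[i];
        }
      if (memcmp (pool[i], set, SET_BYTES) == 0)
        return pool[i];
      i = (i + 1) & (TABLE_SIZE - 1);   /* linear probing */
    }
  assert (!"intern pool is full");
  return NULL;
}
```

With this arrangement the table entries for two characters of the same categories point at one object instead of two equal copies.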

---
Kenichi Handa
handa@ni.aist.go.jp

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-27 11:20                                               ` Kenichi Handa
@ 2008-11-27 16:07                                                 ` Chong Yidong
  2008-11-27 16:12                                                 ` Dan Nicolaescu
                                                                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 86+ messages in thread
From: Chong Yidong @ 2008-11-27 16:07 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet,
	jasonr

Kenichi Handa <handa@m17n.org> writes:

> After taking over Chong's experiment, I could reduce the
> size of Emacs executables by about 7M bytes.  About 4M bytes
> were actually because of charset mapping tables, and it
> could be reduced by setting up C structure temp_charset_work
> (see charset.c for the detail) instead of making many Lisp
> objects (char-table and vector).  Another 3M bytes were
> because of big standard category table.  It could be reduced
> by hashing the table entries (see hash_get_category_set in
> category.c for the detail).
>
> As a result, now the executable is 10,671,313 bytes on
> GNU/Linux.  It's still 1.6M bytes larger than Emacs 22, but
> I'm not sure it's worth making more effort to reduce it.

Thank you.  No problems here that I can see.




^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-27 11:20                                               ` Kenichi Handa
  2008-11-27 16:07                                                 ` Chong Yidong
@ 2008-11-27 16:12                                                 ` Dan Nicolaescu
  2008-11-28  1:02                                                   ` Kenichi Handa
  2008-11-27 16:31                                                 ` Stefan Monnier
                                                                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 86+ messages in thread
From: Dan Nicolaescu @ 2008-11-27 16:12 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, Chong Yidong, emacs-devel, monnier,
	evilborisnet, jasonr

Kenichi Handa <handa@m17n.org> writes:

  > In article <87r65flh5n.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes:
  > 
  > > Kenichi Handa <handa@m17n.org> writes:
  > >>> > One idea is to have a single boolean vector of size #x110000
  > >>> > (139264 bytes), set it up for CHARSET every time we call
  > >>> > map-charset-chars for a different charset.  In that
  > >>> > vector, only the bits for #x3000, #x3001, #x3002, etc. are 1
  > >>> > for chinese-gb2312.  Then map-charset-chars can know for
  > >>> > which characters FUNCTION must be called.
  > > >
  > >>> but it appears to free a negligible amount of memory.
  > > >
  > > > Did you comment out the calls of unify-charset in
  > > > mule-conf.el and change the encoding of all preloaded *.el
  > > > files to utf-8?
  > 
  > > Commenting out the calls to unify-charset does reduce the memory by
  > > several megabytes.
  > 
  > After taking over Chong's experiment, I could reduce the
  > size of Emacs executables by about 7M bytes.  About 4M bytes
  > were actually because of charset mapping tables, and it
  > could be reduced by setting up C structure temp_charset_work
  > (see charset.c for the detail) instead of making many Lisp
  > objects (char-table and vector).  Another 3M bytes were
  > because of big standard category table.  It could be reduced
  > by hashing the table entries (see hash_get_category_set in
  > category.c for the detail).
  > 
  > As a result, now the executable is 10,671,313 bytes on
  > GNU/Linux.  

Thanks for doing this!

  > It's still 1.6M bytes larger than Emacs 22, but I'm not sure it's
  > worth making more effort to reduce it.

In that case that size increase might be with us for ever and ever,
which is not ideal.

Also a related question: could the data in the .map files in
emacs/etc/charsets be transformed into elisp?  That way the normal
loading mechanism could be used for them, and no parser + other code
would be needed...




^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-27 11:20                                               ` Kenichi Handa
  2008-11-27 16:07                                                 ` Chong Yidong
  2008-11-27 16:12                                                 ` Dan Nicolaescu
@ 2008-11-27 16:31                                                 ` Stefan Monnier
  2008-11-27 20:17                                                 ` Richard M Stallman
  2008-11-28 16:11                                                 ` Juanma Barranquero
  4 siblings, 0 replies; 86+ messages in thread
From: Stefan Monnier @ 2008-11-27 16:31 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: rms, emanuele.giaquinta, Chong Yidong, emacs-devel, dann,
	evilborisnet, jasonr

> After taking over Chong's experiment, I could reduce the
> size of Emacs executables by about 7M bytes.  About 4M bytes
> were actually because of charset mapping tables, and it
> could be reduced by setting up C structure temp_charset_work
> (see charset.c for the detail) instead of making many Lisp
> objects (char-table and vector).  Another 3M bytes were
> because of big standard category table.  It could be reduced
> by hashing the table entries (see hash_get_category_set in
> category.c for the detail).

> As a result, now the executable is 10,671,313 bytes on
> GNU/Linux.  It's still 1.6M bytes larger than Emacs 22, but
> I'm not sure it's worth making more effort to reduce it.

That's great to hear.  1.6MB of growth is not a feature, but I guess we
can live with it for now.


        Stefan




^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-27 11:20                                               ` Kenichi Handa
                                                                   ` (2 preceding siblings ...)
  2008-11-27 16:31                                                 ` Stefan Monnier
@ 2008-11-27 20:17                                                 ` Richard M Stallman
  2008-11-27 20:42                                                   ` Eli Zaretskii
  2008-11-28  1:47                                                   ` Kenichi Handa
  2008-11-28 16:11                                                 ` Juanma Barranquero
  4 siblings, 2 replies; 86+ messages in thread
From: Richard M Stallman @ 2008-11-27 20:17 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet,
	jasonr

    As a result, now the executable is 10,671,313 bytes on
    GNU/Linux.  It's still 1.6M bytes larger than Emacs 22, but
    I'm not sure it's worth making more effort to reduce it.

Thanks for doing the work to shrink it.

It is no disaster to be 1.6 meg bigger.  But I think it is worth some
effort to try to find where that 1.6 meg goes in order to see if there
is an easy idea for substantially more savings.  If there is an easy way,
it will be worth doing.






^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-27 20:17                                                 ` Richard M Stallman
@ 2008-11-27 20:42                                                   ` Eli Zaretskii
  2008-11-28  1:47                                                   ` Kenichi Handa
  1 sibling, 0 replies; 86+ messages in thread
From: Eli Zaretskii @ 2008-11-27 20:42 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel

> From: Richard M Stallman <rms@gnu.org>
> Date: Thu, 27 Nov 2008 15:17:49 -0500
> Cc: emanuele.giaquinta@gmail.com, cyd@stupidchicken.com, emacs-devel@gnu.org,
> 	dann@ics.uci.edu, monnier@iro.umontreal.ca,
> 	evilborisnet@netscape.net, jasonr@gnu.org
> 
>     As a result, now the executable is 10,671,313 bytes on
>     GNU/Linux.  It's still 1.6M bytes larger than Emacs 22, but
>     I'm not sure it's worth making more effort to reduce it.
> 
> Thanks for doing the work to shrink it.
> 
> It is no disaster to be 1.6 meg bigger.

Btw, I think we should be comparing the sizes of .data section (and
maybe some other similar sections), rather than the size of the
executable file on disk.  The latter typically includes debug info and
also the code, which typically grows in later releases.




^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-27 16:12                                                 ` Dan Nicolaescu
@ 2008-11-28  1:02                                                   ` Kenichi Handa
  0 siblings, 0 replies; 86+ messages in thread
From: Kenichi Handa @ 2008-11-28  1:02 UTC (permalink / raw)
  To: Dan Nicolaescu
  Cc: rms, emanuele.giaquinta, cyd, emacs-devel, monnier, evilborisnet,
	jasonr

In article <200811271612.mARGCT3f021393@mothra.ics.uci.edu>, Dan Nicolaescu <dann@ics.uci.edu> writes:

> Also a related question: could the data in the .map files in
> emacs/etc/charsets be transformed into elisp?  That way the normal
> loading mechanism could be used for them, and no parser + other code
> would be needed...

No.  The key point of my change is not to make many/huge
Lisp objects while temacs is running for dumping.  And the
parser code for the current *.map format is trivial.

---
Kenichi Handa
handa@ni.aist.go.jp




^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-27 20:17                                                 ` Richard M Stallman
  2008-11-27 20:42                                                   ` Eli Zaretskii
@ 2008-11-28  1:47                                                   ` Kenichi Handa
  2008-11-28 15:38                                                     ` Richard M Stallman
  1 sibling, 1 reply; 86+ messages in thread
From: Kenichi Handa @ 2008-11-28  1:47 UTC (permalink / raw)
  To: rms
  Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet,
	jasonr

In article <E1L5nJ7-0001ge-I1@fencepost.gnu.org>, Richard M Stallman <rms@gnu.org> writes:

>     As a result, now the executable is 10,671,313 bytes on
>     GNU/Linux.  It's still 1.6M bytes larger than Emacs 22, but
>     I'm not sure it's worth making more effort to reduce it.

> Thanks for doing the work to shrink it.

> It is no disaster to be 1.6 meg bigger.  But I think it is worth some
> effort to try to find where that 1.6 meg goes in order to see if there
> is an easy idea for substantially more savings.  If there is an easy way,
> it will be worth doing.

At least, temacs itself is 680K bytes larger.  The new
charset.c mallocs 260K bytes for temporary work.  It is
freed just before dumping, but it is uncertain whether it is
actually returned to the system or not.  In addition, many of
the preloaded Lisp files have grown.

I guess the growth is due not to a single change but
to the sum of many small changes.

Of course, anyone who has the time to investigate it is
welcome.

---
Kenichi Handa
handa@ni.aist.go.jp

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-28  1:47                                                   ` Kenichi Handa
@ 2008-11-28 15:38                                                     ` Richard M Stallman
  2008-11-29  1:52                                                       ` Kenichi Handa
  0 siblings, 1 reply; 86+ messages in thread
From: Richard M Stallman @ 2008-11-28 15:38 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet,
	jasonr

    I guess the growth is not because of single change but
    because of summing up of many small changes.

Maybe that is true, or maybe one large change is responsible for .5
meg (say).  This is an empirical question, and rather than guess,
I am suggesting we find out the answer.

If indeed this growth is due to many small changes then probably there
is nothing to be done.  However, if one thing has eaten .5 meg, maybe
a few hours of work can eliminate that, and it would be worth trying.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-27 11:20                                               ` Kenichi Handa
                                                                   ` (3 preceding siblings ...)
  2008-11-27 20:17                                                 ` Richard M Stallman
@ 2008-11-28 16:11                                                 ` Juanma Barranquero
  2008-11-29  1:47                                                   ` Kenichi Handa
  4 siblings, 1 reply; 86+ messages in thread
From: Juanma Barranquero @ 2008-11-28 16:11 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: Emacs developers

On Thu, Nov 27, 2008 at 12:20, Kenichi Handa <handa@m17n.org> wrote:

> After taking over Chong's experiment, I could reduce the
> size of Emacs executables by about 7M bytes.  About 4M bytes
> were actually because of charset mapping tables, and it
> could be reduced by setting up C structure temp_charset_work

Your big patch of 2008-11-27 has caused a change when decoding some
filenames. I've detected it because I have a fix for bug#1433 that
works before your changes, but does not work afterwards.

I'm testing by adding the following line to Fw32_short_file_name:

  filename = ENCODE_FILE (filename);

before the call to GetShortPathName.

Before your changes,

1783      filename = ENCODE_FILE (filename);
(gdb) p filename
$5 = 52759907
(gdb) xstring
$6 = (struct Lisp_String *) 0x3250d60
"c:/acci\303\263n/"
(gdb) next
1786      if (GetShortPathName (SDATA (filename), shortname, MAX_PATH) == 0)
(gdb) p filename
$7 = 52759971
(gdb) xstring
$8 = (struct Lisp_String *) 0x3250da0
"c:/acci\363n/"

After your changes,

1783      filename = ENCODE_FILE (filename);
(gdb) p filename
$5 = 55776275
(gdb) xstring
$6 = (struct Lisp_String *) 0x3531410
"c:/acci\303\263n/"
(gdb) next
1786      if (GetShortPathName (SDATA (filename), shortname, MAX_PATH) == 0)
(gdb) p filename
$7 = 55776339
(gdb) xstring
$8 = (struct Lisp_String *) 0x3531450
"c:/acci n/"
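
For reference, the "before" trace is what one would expect if the file-name coding system is Latin-1-like: the two UTF-8 bytes \303\263 become the single byte \363.  A small stand-alone sketch of that conversion is below; it is not how ENCODE_FILE works internally, and the function name utf8_to_latin1 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Convert UTF-8 SRC to Latin-1 in DST, which has room for DSTSIZE
   bytes including the terminating NUL.  Characters above U+00FF
   and malformed bytes become '?'.  Return the output length.  */
size_t
utf8_to_latin1 (const char *src, char *dst, size_t dstsize)
{
  const unsigned char *s = (const unsigned char *) src;
  size_t n = 0;

  assert (src != NULL && dst != NULL && dstsize > 0);
  while (*s && n + 1 < dstsize)
    {
      unsigned char c = *s++;

      if (c < 0x80)
        dst[n++] = (char) c;    /* ASCII passes through */
      else if ((c == 0xC2 || c == 0xC3) && (*s & 0xC0) == 0x80)
        /* Two-byte sequence for U+0080..U+00FF.  */
        dst[n++] = (char) (((c & 0x03) << 6) | (*s++ & 0x3F));
      else
        {
          /* Not representable: skip its continuation bytes.  */
          while ((*s & 0xC0) == 0x80)
            s++;
          dst[n++] = '?';
        }
    }
  dst[n] = '\0';
  return n;
}
```

Applied to "c:/acci\303\263n/" this yields the ten bytes shown in the first trace, with \363 in place of the two-byte sequence.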

             Juanma




^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-28 16:11                                                 ` Juanma Barranquero
@ 2008-11-29  1:47                                                   ` Kenichi Handa
  2008-11-29 11:13                                                     ` Juanma Barranquero
  0 siblings, 1 reply; 86+ messages in thread
From: Kenichi Handa @ 2008-11-29  1:47 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: emacs-devel

In article <f7ccd24b0811280811k5996b24dla2a1692af64da2c9@mail.gmail.com>, "Juanma Barranquero" <lekktu@gmail.com> writes:

> Your big patch of 2008-11-27 has caused a change when decoding some
> filenames. I've detected it because I have a fix for bug#1433 that
> works before your changes, but does not work afterwards.

I've just installed fixes for my silly mistakes.  Could you
please try again?

---
Kenichi Handa
handa@ni.aist.go.jp

> I'm testing by adding the following line to Fw32_short_file_name:

>   filename = ENCODE_FILE (filename);

> before the call to GetShortPathName.

> Before your changes,

> 1783      filename = ENCODE_FILE (filename);
> (gdb) p filename
> $5 = 52759907
> (gdb) xstring
> $6 = (struct Lisp_String *) 0x3250d60
> "c:/acci\303\263n/"
> (gdb) next
> 1786      if (GetShortPathName (SDATA (filename), shortname, MAX_PATH) == 0)
> (gdb) p filename
> $7 = 52759971
> (gdb) xstring
> $8 = (struct Lisp_String *) 0x3250da0
> "c:/acci\363n/"

> After your changes,

> 1783      filename = ENCODE_FILE (filename);
> (gdb) p filename
> $5 = 55776275
> (gdb) xstring
> $6 = (struct Lisp_String *) 0x3531410
> "c:/acci\303\263n/"
> (gdb) next
> 1786      if (GetShortPathName (SDATA (filename), shortname, MAX_PATH) == 0)
> (gdb) p filename
> $7 = 55776339
> (gdb) xstring
> $8 = (struct Lisp_String *) 0x3531450
> "c:/acci n/"

>              Juanma







^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-28 15:38                                                     ` Richard M Stallman
@ 2008-11-29  1:52                                                       ` Kenichi Handa
  2008-11-29 10:47                                                         ` Eli Zaretskii
  2008-11-29 19:43                                                         ` Richard M Stallman
  0 siblings, 2 replies; 86+ messages in thread
From: Kenichi Handa @ 2008-11-29  1:52 UTC (permalink / raw)
  To: rms
  Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet,
	jasonr

In article <E1L65QO-0004ZH-Ht@fencepost.gnu.org>, Richard M Stallman <rms@gnu.org> writes:

>     I guess the growth is not because of single change but
>     because of summing up of many small changes.

> Maybe that is true, or maybe one large change is responsible for .5
> meg (say).  This is an empirical question, and rather than guess,
> I am suggesting we find out the answer.

> If indeed this growth is due to many small changes then probably there
> is nothing to be done.  However, if one thing has eaten .5 meg, maybe
> a few hours of work can eliminate that, and it would be worth trying.

If one finds what is eating .5 meg, it may require just
a few hours to fix it.  But to find that cause, I expect
one has to spend many hours.  At least, I've already spent
many hours, and have no idea how many more hours it would take to
find one big cause (or to decide there's no such thing).

---
Kenichi Handa
handa@ni.aist.go.jp




^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-29  1:52                                                       ` Kenichi Handa
@ 2008-11-29 10:47                                                         ` Eli Zaretskii
  2008-11-29 19:43                                                         ` Richard M Stallman
  1 sibling, 0 replies; 86+ messages in thread
From: Eli Zaretskii @ 2008-11-29 10:47 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Sat, 29 Nov 2008 10:52:38 +0900
> Cc: emanuele.giaquinta@gmail.com, cyd@stupidchicken.com, emacs-devel@gnu.org,
> 	dann@ics.uci.edu, monnier@iro.umontreal.ca,
> 	evilborisnet@netscape.net, jasonr@gnu.org
> 
> If one finds a reason of .5 meg eating, it may require just
> a few hours to fix it.  But, to find that reason, I expect
> one has to spend many hours.  At least, I've already spent
> many hours, and have no idea how many more hours to take to
> find one big reason (or to decide there's not such thing).

How about the following approach: run "temacs -batch -l loadup dump"
under strace (or a similar tool), and make a report of all the calls
to memory-allocation services.  Then explore all those calls that
allocate significant amounts of memory that is never released later.

(To ease the job of correlating memory allocation with packages being
loaded, look at system calls that open and read *.el and *.elc files,
immediately before memory allocation calls.)

Would this approach allow us to quickly zero in on the worst offenders?
If so, perhaps someone here who is proficient with strace and system
calls could work on preparing such a report, and then post it here?




^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-29  1:47                                                   ` Kenichi Handa
@ 2008-11-29 11:13                                                     ` Juanma Barranquero
  2008-11-29 12:17                                                       ` Juanma Barranquero
  0 siblings, 1 reply; 86+ messages in thread
From: Juanma Barranquero @ 2008-11-29 11:13 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

On Sat, Nov 29, 2008 at 02:47, Kenichi Handa <handa@m17n.org> wrote:

> I've just installed fixes for my silly mistakes.  Could you
> please try again?

Yes, it works now.

Thanks,

             Juanma




^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: size of emacs executable after unicode merge
  2008-11-29 11:13                                                     ` Juanma Barranquero
@ 2008-11-29 12:17                                                       ` Juanma Barranquero
  2008-11-29 13:50                                                         ` Kenichi Handa
  0 siblings, 1 reply; 86+ messages in thread
From: Juanma Barranquero @ 2008-11-29 12:17 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

On Sat, Nov 29, 2008 at 12:13, Juanma Barranquero <lekktu@gmail.com> wrote:

> Yes, it works now.

I spoke too soon.

Your patch fixes the problem, but now I'm seeing an assertion
failure when writing some files (most notably, ChangeLogs). That is
100% repeatable: I try to write src/ChangeLog, for example, and get
the assertion failure every time.

The CHAR_TABLE_P call at the start of char_table_ref is getting a null `table'.

See the attached debug trace at the point where `table' is Qnil.

             Juanma


Breakpoint 16, char_table_ref (table=48941057, c=228) at chartab.c:197
197       struct Lisp_Char_Table *tbl = XCHAR_TABLE (table);
(gdb) p table
$43 = 48941057
(gdb) pr
nil
(gdb) bt
#0  char_table_ref (table=48941057, c=228) at chartab.c:197
#1  0x01174a88 in char_encodable_p (c=228, attrs=53025028) at coding.c:8068
#2  0x0117195d in Ffind_coding_systems_region_internal (start=8, end=4687040, exclude=48941057) at coding.c:8164
#3  0x0103f05b in Ffuncall (nargs=3, args=0x82dfb0) at eval.c:3050
#4  0x012173ad in Fbyte_code (bytestr=20863691, vector=20863748, maxdepth=32) at bytecode.c:678
#5  0x0103fc28 in funcall_lambda (fun=20863644, nargs=2, arg_vector=0x82e2c4) at eval.c:3231
#6  0x0103f484 in Ffuncall (nargs=3, args=0x82e2c0) at eval.c:3090
#7  0x012173ad in Fbyte_code (bytestr=20867659, vector=20867956, maxdepth=72) at bytecode.c:678
#8  0x0103fc28 in funcall_lambda (fun=20867580, nargs=5, arg_vector=0x82e604) at eval.c:3231
#9  0x0103f484 in Ffuncall (nargs=6, args=0x82e600) at eval.c:3090
#10 0x0103e626 in call5 (fn=49529857, arg1=8, arg2=4687040, arg3=49221305, arg4=48941057, arg5=53565683) at eval.c:2919
#11 0x010cafe0 in choose_write_coding_system (start=8, end=4687040, filename=53565683, append=48941057, visit=48941105, lockname=53566403, coding=0x82e6f0) at fileio.c:4348
#12 0x010cb392 in Fwrite_region (start=8, end=4687040, filename=53565683, append=48941057, visit=48941105, lockname=53566403, mustbenew=48941057) at fileio.c:4520
#13 0x0103f33c in Ffuncall (nargs=7, args=0x82eb80) at eval.c:3068
#14 0x012173ad in Fbyte_code (bytestr=20586915, vector=20587124, maxdepth=56) at bytecode.c:678
#15 0x0103fc28 in funcall_lambda (fun=20586892, nargs=0, arg_vector=0x82eea4) at eval.c:3231
#16 0x0103f484 in Ffuncall (nargs=1, args=0x82eea0) at eval.c:3090
#17 0x012173ad in Fbyte_code (bytestr=20586731, vector=20586780, maxdepth=16) at bytecode.c:678
#18 0x0103fc28 in funcall_lambda (fun=20586708, nargs=0, arg_vector=0x82f1b4) at eval.c:3231
#19 0x0103f484 in Ffuncall (nargs=1, args=0x82f1b0) at eval.c:3090
#20 0x012173ad in Fbyte_code (bytestr=20585923, vector=20586180, maxdepth=40) at bytecode.c:678
#21 0x0103fc28 in funcall_lambda (fun=20585892, nargs=0, arg_vector=0x82f4d4) at eval.c:3231
#22 0x0103f484 in Ffuncall (nargs=1, args=0x82f4d0) at eval.c:3090
#23 0x012173ad in Fbyte_code (bytestr=20585267, vector=20585364, maxdepth=40) at bytecode.c:678
#24 0x0103fc28 in funcall_lambda (fun=20585220, nargs=1, arg_vector=0x82f834) at eval.c:3231
#25 0x0103f484 in Ffuncall (nargs=2, args=0x82f830) at eval.c:3090
#26 0x0121ce62 in Fcall_interactively (function=49156481, record_flag=48941057, keys=48974596) at callint.c:857
#27 0x0103f05b in Ffuncall (nargs=4, args=0x82faf8) at eval.c:3050
#28 0x0103e598 in call3 (fn=49132665, arg1=49156481, arg2=48941057, arg3=48941057) at eval.c:2870
#29 0x01024ca1 in Fcommand_execute (cmd=49156481, record_flag=48941057, keys=48941057, special=48941057) at keyboard.c:10333
#30 0x0100a93c in command_loop_1 () at keyboard.c:1880
#31 0x0103b136 in internal_condition_case (bfun=0x1007235 <command_loop_1>, handlers=49004809, hfun=0x10069b4 <cmd_error>) at eval.c:1511
#32 0x01006e44 in command_loop_2 () at keyboard.c:1338
#33 0x0103ab8f in internal_catch (tag=49000881, func=0x1006e24 <command_loop_2>, arg=48941057) at eval.c:1247
#34 0x01006dfb in command_loop () at keyboard.c:1317
#35 0x0100610b in recursive_edit_1 () at keyboard.c:942
#36 0x010065e0 in Frecursive_edit () at keyboard.c:1004
#37 0x01002a71 in main (argc=2, argv=0xa92748) at emacs.c:1777

Lisp Backtrace:
"find-coding-systems-region-internal" (0x82dfb4)
"find-coding-systems-region" (0x82e2c4)
"select-safe-coding-system" (0x82e604)
"write-region" (0x82eb84)
"basic-save-buffer-2" (0x82eea4)
"basic-save-buffer-1" (0x82f1b4)
"basic-save-buffer" (0x82f4d4)
"save-buffer" (0x82f834)
"call-interactively" (0x82fafc)
(gdb) frame 1
#1  0x01174a88 in char_encodable_p (c=228, attrs=53025028) at coding.c:8068
8068          if (CHAR_CHARSET_P (c, charset))
(gdb) p attrs
$44 = 53025028
(gdb) pr
[chinese-gb18030 nil 99 charset (0 157 158 159 160 161) t nil nil nil
nil nil 32 nil (:ascii-compatible-p t :category coding-categ
ory-charset :name chinese-gb18030 :docstring "GB18030 encoding for
Chinese (MIME:GB18030)." :coding-type charset :mnemonic 99 :cha
rset-list (ascii gb18030-2-byte gb18030-4-byte-bmp gb18030-4-byte-smp
gb18030-4-byte-ext-1 gb18030-4-byte-ext-2) :mime-charset gb1
8030) 14 "
                +
                                           " [0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 nil (157 158) (157 158) (157
158) (157 158 160) (157 160) (157 160) (157 160) (157 160)
(157 160) (157 160) (157 160) (157 160) (157 160) (157 160) (157 160)
(157 159) (157 159) (157 159) (157 159) (157 159) (157 159)
(157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159)
(157 159) (157 159) (157 159) (157 159) (157 159) (157 159)
(157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159)
(157 159) (157 159) (157 159) (157 159) (157 159) (157 159)
(157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159)
(157 159) (157 159) (157 159) (157 159) (157 159) (157 159)
(157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159)
(157 159) (157 159) (157 159) (157 159) (157 159) (157 159)
(157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159)
(157 159) (157 159) (157 159) (157 159) (157 159) (157 159)
(157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159)
(157 159) (157 159) (157 159) (157 159) (157 159) (157 159 1
61) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157
161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 1
61) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157
161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 1
61) (157 161) nil] nil nil nil nil nil nil nil nil nil nil]





* Re: size of emacs executable after unicode merge
  2008-11-29 12:17                                                       ` Juanma Barranquero
@ 2008-11-29 13:50                                                         ` Kenichi Handa
  2008-11-29 15:05                                                           ` Juanma Barranquero
  0 siblings, 1 reply; 86+ messages in thread
From: Kenichi Handa @ 2008-11-29 13:50 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: emacs-devel

In article <f7ccd24b0811290417q289c695en132c5916631eccec@mail.gmail.com>, "Juanma Barranquero" <lekktu@gmail.com> writes:

> Your patch fixes the problem, but now I'm seeing an assertion
> failure when writing some files (most notably, ChangeLogs). That is
> 100% repeatable: I try to write src/ChangeLog, for example, and get
> the assertion failure every time.

> The CHAR_TABLE_P call at the start of char_table_ref is getting a null `table'.

> See the attached debug trace at the point where `table' is Qnil.

I've just installed a fix.  Please try again.

---
Kenichi Handa
handa@ni.aist.go.jp





* Re: size of emacs executable after unicode merge
  2008-11-29 13:50                                                         ` Kenichi Handa
@ 2008-11-29 15:05                                                           ` Juanma Barranquero
  0 siblings, 0 replies; 86+ messages in thread
From: Juanma Barranquero @ 2008-11-29 15:05 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

On Sat, Nov 29, 2008 at 14:50, Kenichi Handa <handa@m17n.org> wrote:

> I've just installed a fix.  Please try again.

It works now.

Thanks,

  Juanma





* Re: size of emacs executable after unicode merge
  2008-11-29  1:52                                                       ` Kenichi Handa
  2008-11-29 10:47                                                         ` Eli Zaretskii
@ 2008-11-29 19:43                                                         ` Richard M Stallman
  2008-11-30  4:50                                                           ` Chetan Pandya
  1 sibling, 1 reply; 86+ messages in thread
From: Richard M Stallman @ 2008-11-29 19:43 UTC (permalink / raw)
  To: Kenichi Handa
  Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet,
	jasonr

    If one finds the cause of a .5 meg increase, it may take just
    a few hours to fix it.  But to find that cause, I expect
    one has to spend many hours.  At least, I've already spent
    many hours, and have no idea how many more it would take to
    find one big cause (or to decide there's no such thing).

It is impossible to predetermine an answer to that.
I think it is worth a few hours to LOOK for a single cause
that has eaten much space.





* Re: size of emacs executable after unicode merge
  2008-11-29 19:43                                                         ` Richard M Stallman
@ 2008-11-30  4:50                                                           ` Chetan Pandya
  0 siblings, 0 replies; 86+ messages in thread
From: Chetan Pandya @ 2008-11-30  4:50 UTC (permalink / raw)
  To: emacs-devel

While not directly on this subject: looking at the isearch code, there
appears to be some dead code.  Would it perhaps help to remove it?

Chetan

Here is a diff against the 22.3 isearch.el function isearch-other-meta-char:
diff -u "c:/src/emacs-22.3/lisp/isearch.org.el" "c:/src/emacs-22.3/lisp/isearch.el"
--- c:/src/emacs-22.3/lisp/isearch.org.el	Tue Feb 05 10:49:45 2008
+++ c:/src/emacs-22.3/lisp/isearch.el	Sun Nov 30 04:26:23 2008
@@ -1733,40 +1733,11 @@
                (goto-char isearch-point)))
            (isearch-update))
 	  (search-exit-option
-	   (let (window)
-             (isearch-unread-key-sequence keylist)
-             (setq main-event (car unread-command-events))
-
-	     ;; If we got a mouse click event, that event contains the
-	     ;; window clicked on. maybe it was read with the buffer
-	     ;; it was clicked on.  If so, that buffer, not the current one,
-	     ;; is in isearch mode.  So end the search in that buffer.
-
-	     ;; ??? I have no idea what this if checks for, but it's
-	     ;; obviously wrong for the case that a down-mouse event
-	     ;; on another window invokes this function.  The event
-	     ;; will contain the window clicked on and that window's
-	     ;; buffer is certainly not always in Isearch mode.
-	     ;;
-	     ;; Leave the code in, but check for current buffer not
-	     ;; being in Isearch mode for now, until someone tells
-	     ;; what it's really supposed to do.
-	     ;;
-	     ;; --gerd 2001-08-10.
-
-	     (if (and (not isearch-mode)
-		      (listp main-event)
-		      (setq window (posn-window (event-start main-event)))
-		      (windowp window)
-		      (or (> (minibuffer-depth) 0)
-			  (not (window-minibuffer-p window))))
-		 (save-excursion
-		   (set-buffer (window-buffer window))
-		   (isearch-done)
-		   (isearch-clean-overlays))
-	       (isearch-done)
-	       (isearch-clean-overlays)
-               (setq prefix-arg arg))))
+	   (isearch-unread-key-sequence keylist)
+	   (setq main-event (car unread-command-events))
+	   (isearch-done)
+	   (isearch-clean-overlays)
+	   (setq prefix-arg arg))
           (t;; otherwise nil
 	   (isearch-process-search-string key key)))))
 

Diff finished.  Sat Nov 29 20:34:35 2008






Thread overview: 86+ messages
2008-05-14  3:13 size of emacs executable after unicode merge Evil Boris
2008-05-14  3:51 ` Dan Nicolaescu
2008-05-14 16:39   ` Richard M Stallman
2008-05-14 16:52     ` Dan Nicolaescu
2008-05-15 14:18       ` Richard M Stallman
2008-05-15 15:29         ` Dan Nicolaescu
2008-05-16 11:31           ` Richard M Stallman
2008-05-16 12:06             ` Dan Nicolaescu
2008-05-16 12:32             ` Kenichi Handa
2008-05-16 12:55               ` Jason Rumney
2008-05-16 15:59                 ` Thomas Lord
2008-05-16 22:07                   ` Stephen J. Turnbull
2008-05-16 23:01                     ` Thomas Lord
2008-05-17  0:56                 ` Kenichi Handa
2008-05-17  1:52                   ` YAMAMOTO Mitsuharu
2008-05-19  1:45                     ` Kenichi Handa
2008-10-30 10:18                   ` Emanuele Giaquinta
2008-10-30 21:22                     ` Eli Zaretskii
2008-10-30 21:42                       ` Stefan Monnier
2008-10-31  3:55                     ` Richard M. Stallman
2008-10-31  5:29                       ` Kenichi Handa
2008-10-31  6:32                         ` Chong Yidong
2008-10-31  7:32                           ` Kenichi Handa
2008-10-31 10:09                             ` Eli Zaretskii
2008-10-31 12:33                               ` gdb error [Re: size of emacs executable after unicode merge] Kenichi Handa
2008-10-31 14:28                                 ` Eli Zaretskii
2008-10-31 12:35                               ` size of emacs executable after unicode merge Stephen Berman
2008-11-21 12:32                                 ` Kenichi Handa
2008-11-21 14:18                                   ` Ulrich Mueller
2008-10-31 10:41                           ` YAMAMOTO Mitsuharu
2008-10-31 15:07                         ` Dan Nicolaescu
2008-10-31 16:44                           ` Stefan Monnier
2008-11-04 23:09                             ` Chong Yidong
2008-11-05  4:17                               ` Kenichi Handa
2008-11-05 15:50                                 ` Stefan Monnier
2008-11-06  7:56                                   ` Kenichi Handa
2008-11-08  2:42                                     ` Stefan Monnier
2008-11-08  4:10                                       ` Chong Yidong
2008-11-08  9:19                                         ` Eli Zaretskii
2008-11-09  0:27                                         ` Richard M. Stallman
2008-11-09  6:29                                           ` Dan Nicolaescu
2008-11-09 17:11                                             ` Richard M. Stallman
2008-11-10  1:24                                               ` Stefan Monnier
2008-11-10  1:55                                                 ` Thomas Lord
2008-11-11  4:37                                                   ` Chong Yidong
2008-11-08 10:30                                       ` Dan Nicolaescu
2008-11-09 20:14                                     ` Chong Yidong
2008-11-10  1:59                                       ` Kenichi Handa
2008-11-10 15:18                                         ` Chong Yidong
2008-11-10 23:18                                         ` Chong Yidong
2008-11-11 18:17                                         ` Chong Yidong
2008-11-12  6:26                                           ` Kenichi Handa
2008-11-13 16:33                                             ` Chong Yidong
2008-11-14  0:48                                               ` Kenichi Handa
2008-11-27 11:20                                               ` Kenichi Handa
2008-11-27 16:07                                                 ` Chong Yidong
2008-11-27 16:12                                                 ` Dan Nicolaescu
2008-11-28  1:02                                                   ` Kenichi Handa
2008-11-27 16:31                                                 ` Stefan Monnier
2008-11-27 20:17                                                 ` Richard M Stallman
2008-11-27 20:42                                                   ` Eli Zaretskii
2008-11-28  1:47                                                   ` Kenichi Handa
2008-11-28 15:38                                                     ` Richard M Stallman
2008-11-29  1:52                                                       ` Kenichi Handa
2008-11-29 10:47                                                         ` Eli Zaretskii
2008-11-29 19:43                                                         ` Richard M Stallman
2008-11-30  4:50                                                           ` Chetan Pandya
2008-11-28 16:11                                                 ` Juanma Barranquero
2008-11-29  1:47                                                   ` Kenichi Handa
2008-11-29 11:13                                                     ` Juanma Barranquero
2008-11-29 12:17                                                       ` Juanma Barranquero
2008-11-29 13:50                                                         ` Kenichi Handa
2008-11-29 15:05                                                           ` Juanma Barranquero
2008-11-05 22:30                                 ` Richard M. Stallman
2008-11-06 11:58                                   ` Kenichi Handa
2008-11-07 12:39                                     ` Richard M. Stallman
2008-11-07 13:29                                       ` Stephen J. Turnbull
2008-11-07 21:15                                         ` Richard M. Stallman
2008-11-08  4:00                                           ` Stephen J. Turnbull
2008-11-08  4:19                                           ` Stefan Monnier
2008-10-31 19:30                         ` Richard M. Stallman
2008-11-09 22:43                         ` Chong Yidong
2008-11-09 22:57                           ` Chong Yidong
2008-11-10  1:28                             ` Kenichi Handa
2008-11-10 19:29                               ` Richard M. Stallman
2008-11-10  1:26                           ` Kenichi Handa
