* size of emacs executable after unicode merge
@ 2008-05-14 3:13 Evil Boris
  2008-05-14 3:51 ` Dan Nicolaescu
  0 siblings, 1 reply; 86+ messages in thread
From: Evil Boris @ 2008-05-14 3:13 UTC (permalink / raw)
  To: emacs-devel

I see Emacs executable size grew by more than a factor of 2 after
Unicode merge. (This is on Solaris 7 "sparc-sun-solaris2.7, X
toolkit", with ancient gcc 3.1.)

Does everyone see that, too?

--BA

/local/2/software/emacs-cvs/emacs/src:
wildcard emacs-*
  -rwxr-xr-x 2 aronov 9519135 Feb 1 10:54 emacs-23.0.50.22
  -rwxr-xr-x 1 aronov 9519135 Feb 1 10:49 emacs-23.0.50.21
  -rwxr-xr-x 1 aronov 9518818 Jan 28 11:17 emacs-23.0.50.20
  -rwxr-xr-x 1 aronov 9527010 Jan 28 10:55 emacs-23.0.50.19
  -rwxr-xr-x 1 aronov 9517283 Jan 3 12:13 emacs-23.0.50.18
  -rwxr-xr-x 1 aronov 9517283 Jan 3 10:57 emacs-23.0.50.17
...

/local/2/software/emacs-cvs/emacs-after-unicode2-merge/emacs/src:
wildcard emacs*
  -rwxr-xr-x 2 aronov 22570304 Mar 21 11:23 emacs-23.0.60.36
  -rwxr-xr-x 1 aronov 22571511 Mar 16 11:18 emacs-23.0.60.35
  -rwxr-xr-x 1 aronov 22579387 Mar 15 13:09 emacs-23.0.60.33
  -rwxr-xr-x 1 aronov 22620408 Mar 12 10:18 emacs-23.0.60.31
  -rwxr-xr-x 1 aronov 22620408 Mar 10 12:06 emacs-23.0.60.29
  -rwxr-xr-x 1 aronov 22767972 Mar 6 15:18 emacs-23.0.60.27
  -rwxr-xr-x 1 aronov 22767972 Mar 5 13:56 emacs-23.0.60.25
...

^ permalink raw reply [flat|nested] 86+ messages in thread
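As a rough check of the "more than a factor of 2" claim, the two most recent sizes from the listings above can be compared directly. This is only arithmetic on the quoted byte counts; the function name is made up for illustration.

```python
# Sizes in bytes, copied from the directory listings in the message above.
BEFORE_MERGE = 9519135    # emacs-23.0.50.22
AFTER_MERGE = 22570304    # emacs-23.0.60.36


def growth_factor(old_size: int, new_size: int) -> float:
    """Return how many times larger new_size is than old_size."""
    return new_size / old_size


ratio = growth_factor(BEFORE_MERGE, AFTER_MERGE)
extra_bytes = AFTER_MERGE - BEFORE_MERGE
extra_mib = extra_bytes / (1024 * 1024)

print(f"growth factor: {ratio:.2f}")
print(f"extra size: {extra_bytes} bytes ({extra_mib:.1f} MiB)")
```

So the question in the rest of the thread is where roughly 13 million extra bytes in the dumped executable come from.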
* Re: size of emacs executable after unicode merge 2008-05-14 3:13 size of emacs executable after unicode merge Evil Boris @ 2008-05-14 3:51 ` Dan Nicolaescu 2008-05-14 16:39 ` Richard M Stallman 0 siblings, 1 reply; 86+ messages in thread From: Dan Nicolaescu @ 2008-05-14 3:51 UTC (permalink / raw) To: Evil Boris; +Cc: emacs-devel Evil Boris <evilborisnet@netscape.net> writes: > I see Emacs executable size grew by more than a factor of 2 after > Unicode merge. (This is on Solaris 7 "sparc-sun-solaris2.7, X > toolkit", with ancient gcc 3.1.) > > Does everyone see that, too? Yes, it's noted in admin/FOR-RELEASE ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-05-14 3:51 ` Dan Nicolaescu @ 2008-05-14 16:39 ` Richard M Stallman 2008-05-14 16:52 ` Dan Nicolaescu 0 siblings, 1 reply; 86+ messages in thread From: Richard M Stallman @ 2008-05-14 16:39 UTC (permalink / raw) To: Dan Nicolaescu; +Cc: emacs-devel, evilborisnet > I see Emacs executable size grew by more than a factor of 2 after > Unicode merge. (This is on Solaris 7 "sparc-sun-solaris2.7, X > toolkit", with ancient gcc 3.1.) > > Does everyone see that, too? Yes, it's noted in admin/FOR-RELEASE What caused it to grow so much? ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-05-14 16:39 ` Richard M Stallman @ 2008-05-14 16:52 ` Dan Nicolaescu 2008-05-15 14:18 ` Richard M Stallman 0 siblings, 1 reply; 86+ messages in thread From: Dan Nicolaescu @ 2008-05-14 16:52 UTC (permalink / raw) To: rms; +Cc: evilborisnet, emacs-devel Richard M Stallman <rms@gnu.org> writes: > > I see Emacs executable size grew by more than a factor of 2 after > > Unicode merge. (This is on Solaris 7 "sparc-sun-solaris2.7, X > > toolkit", with ancient gcc 3.1.) > > > > Does everyone see that, too? > > Yes, it's noted in admin/FOR-RELEASE > > What caused it to grow so much? Here's what Handa-san said about that: One possibility is that the difference is because of big CJK charset maps loaded while creating emacs from temacs. Just before dumping, loadup.el calls `clear-charset-maps' which sets internal vectors and char-tables for charsets to nil, but the following call of `garbage-collect' doesn't return those free memory. I'll try to convert all encodings of language/*.el files to utf-8 so that Emacs doesn't have to load charset maps in loadup.el. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-05-14 16:52 ` Dan Nicolaescu @ 2008-05-15 14:18 ` Richard M Stallman 2008-05-15 15:29 ` Dan Nicolaescu 0 siblings, 1 reply; 86+ messages in thread From: Richard M Stallman @ 2008-05-15 14:18 UTC (permalink / raw) To: Dan Nicolaescu; +Cc: emacs-devel, evilborisnet I'll try to convert all encodings of language/*.el files to utf-8 so that Emacs doesn't have to load charset maps in loadup.el. I think that is important, but please let's see what Handa says about this proposed change. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-05-15 14:18 ` Richard M Stallman @ 2008-05-15 15:29 ` Dan Nicolaescu 2008-05-16 11:31 ` Richard M Stallman 0 siblings, 1 reply; 86+ messages in thread From: Dan Nicolaescu @ 2008-05-15 15:29 UTC (permalink / raw) To: rms; +Cc: emacs-devel, evilborisnet Richard M Stallman <rms@gnu.org> writes: > I'll try to convert all encodings of language/*.el files to > utf-8 so that Emacs doesn't have to load charset maps in > loadup.el. > > I think that is important, but please let's see what Handa says > about this proposed change. You misunderstood, what you cited above is part of what he said... ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-05-15 15:29 ` Dan Nicolaescu @ 2008-05-16 11:31 ` Richard M Stallman 2008-05-16 12:06 ` Dan Nicolaescu 2008-05-16 12:32 ` Kenichi Handa 0 siblings, 2 replies; 86+ messages in thread From: Richard M Stallman @ 2008-05-16 11:31 UTC (permalink / raw) To: Dan Nicolaescu; +Cc: emacs-devel, evilborisnet > I'll try to convert all encodings of language/*.el files to > utf-8 so that Emacs doesn't have to load charset maps in > loadup.el. > > I think that is important, but please let's see what Handa says > about this proposed change. You misunderstood, what you cited above is part of what he said... I mean, let's see what he says about the change you propose to make. It looked like he described the current situation and your proposal was a response to that situation. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-05-16 11:31 ` Richard M Stallman @ 2008-05-16 12:06 ` Dan Nicolaescu 2008-05-16 12:32 ` Kenichi Handa 1 sibling, 0 replies; 86+ messages in thread From: Dan Nicolaescu @ 2008-05-16 12:06 UTC (permalink / raw) To: rms; +Cc: emacs-devel, evilborisnet Richard M Stallman <rms@gnu.org> writes: > > I'll try to convert all encodings of language/*.el files to > > utf-8 so that Emacs doesn't have to load charset maps in > > loadup.el. > > > > I think that is important, but please let's see what Handa says > > about this proposed change. > > You misunderstood, what you cited above is part of what he said... > > I mean, let's see what he says about the change you propose to make. > It looked like he described the current situation and > your proposal was a response to that situation. You misunderstand again, I didn't propose anything, I just pasted his message word for word. The proposal is his... ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-05-16 11:31 ` Richard M Stallman 2008-05-16 12:06 ` Dan Nicolaescu @ 2008-05-16 12:32 ` Kenichi Handa 2008-05-16 12:55 ` Jason Rumney 1 sibling, 1 reply; 86+ messages in thread From: Kenichi Handa @ 2008-05-16 12:32 UTC (permalink / raw) To: rms; +Cc: dann, evilborisnet, emacs-devel In article <E1Jwy9X-00016Z-4m@fencepost.gnu.org>, Richard M Stallman <rms@gnu.org> writes: > I'll try to convert all encodings of language/*.el files to > utf-8 so that Emacs doesn't have to load charset maps in > loadup.el. > > I think that is important, but please let's see what Handa says > about this proposed change. > You misunderstood, what you cited above is part of what he said... > I mean, let's see what he says about the change you propose to make. > It looked like he described the current situation and > your proposal was a response to that situation. I did the experiment of changing all files containing a Japanese character that are not byte-compiled to utf-8, but the resulting size is not different. That means that my guess above is not correct. :-( --- Kenichi Handa handa@ni.aist.go.jp ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-05-16 12:32 ` Kenichi Handa @ 2008-05-16 12:55 ` Jason Rumney 2008-05-16 15:59 ` Thomas Lord 2008-05-17 0:56 ` Kenichi Handa 0 siblings, 2 replies; 86+ messages in thread From: Jason Rumney @ 2008-05-16 12:55 UTC (permalink / raw) To: Kenichi Handa; +Cc: dann, emacs-devel, rms, evilborisnet Kenichi Handa wrote: > I did the experiment of changing all files containing a > Japanese character that are not byte-compiled to utf-8, but > the resulting size is not different. That means that my > guess above is not correct. :-( > How big are the data structures holding all the unicode character info and translation tables for encodings? At one point you said you wanted to make the unicode tables be loaded on demand but AFAIK they are currently dumped into the emacs executable. Could that account for a large proportion of the size increase? ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-05-16 12:55 ` Jason Rumney @ 2008-05-16 15:59 ` Thomas Lord 2008-05-16 22:07 ` Stephen J. Turnbull 2008-05-17 0:56 ` Kenichi Handa 1 sibling, 1 reply; 86+ messages in thread From: Thomas Lord @ 2008-05-16 15:59 UTC (permalink / raw) To: Jason Rumney; +Cc: dann, emacs-devel, rms, evilborisnet, Kenichi Handa Jason Rumney wrote: > How big are the data structures holding all the unicode character info > and translation tables for encodings? If that turns out to be the problem, will someone please contact me directly? (I ask that because I mostly just skim this list and so miss things.) Several years back I devoted a pretty decent number of hours to working out good ways to compress the run-time representation of such tables without sacrificing much performance on accesses. If it would be helpful, I can take a little bit of time to refresh my memory of how I did it and report that to someone working on the problem in Emacs, thus hopefully saving time (and squeaking a little bit more "payout" from my initial investment). -t ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-05-16 15:59 ` Thomas Lord @ 2008-05-16 22:07 ` Stephen J. Turnbull 2008-05-16 23:01 ` Thomas Lord 0 siblings, 1 reply; 86+ messages in thread From: Stephen J. Turnbull @ 2008-05-16 22:07 UTC (permalink / raw) To: Thomas Lord Cc: rms, Kenichi Handa, emacs-devel, dann, evilborisnet, Jason Rumney Thomas Lord writes: > Jason Rumney wrote: > > How big are the data structures holding all the unicode character info > > and translation tables for encodings? Is it possible that the whole Unicode range (17*2^16 code points) is being dumped? That would lead to about the size change observed, extrapolating from my "naive estimate" for XEmacs implementation of the BMP given below. But surely no characters outside of the BMP are needed to dump Emacs. > If that turns out to be the problem, will someone please contact me > directly? > (I ask that because I mostly just skim this list and so miss things.) > > Several years back I devoted a pretty decent number of hours to working > out good ways to compress the run-time representation of such tables > without sacrificing much performance on accesses. Loading on demand is generally a better solution, as most non-Asians use less than 500 characters, highly localized to about 3 ranges that can be loaded individually. Nor do you really need "good solutions", as half of the BMP is hanzi and Hangul which are basically constant ranges for the character info tables, and another 10% is private space and surrogates, leading to approximately 60% savings by using ranges and appropriate defaults for these four classes. The non-BMP planes surely can be loaded on-demand. > If it would be helpful, Did you do much better than 60% savings? If not, it's probably not really worth much effort given an efficient range table representation already available. In any case, something else is going on here besides naive representation (assuming we're restricted to the BMP). 
In XEmacs, where all coding tables for the BMP are loaded by default, much more naive strategies than those outlined above give 891800 bytes total for the to-unicode and from-unicode tables. I think we're missing a couple of charsets that Emacs Mule provides, but they're minor. We don't currently implement the Unidata base, but most (all?) of the character properties can be compactly represented as a small number of Booleans each, so a table of bitvectors for the BMP "should" only be about 256KB or maybe 512KB. IIRC XEmacs/UTF-2000 implemented the BMP Unidata as a Lisp array of Lisp bitvectors in about 1MB (most of which is Lisp object overhead). In other words, even with a naive strategy, the Unicode BMP database should only add about 1.1MB to 1.4MB, ie, about 10% of the size increase seen here, if coded compactly but straightforwardly in C. A few straightforward optimizations can probably get that down to 500KB to 700KB, and for an on-demand setup, most Western users should only see a footprint of about 10-15KB for Unicode data, if that. ^ permalink raw reply [flat|nested] 86+ messages in thread
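The estimate above can be reproduced with a few lines of arithmetic. This is a sketch under assumptions: the 891800-byte figure is the one quoted for the XEmacs coding tables, while the choice of 32 or 64 Boolean properties per character is an assumption made here only because it yields the quoted 256KB and 512KB bitvector tables.

```python
BMP_CODE_POINTS = 2 ** 16               # one plane
UNICODE_CODE_POINTS = 17 * 2 ** 16      # the whole Unicode range
XEMACS_CODING_TABLES = 891800           # bytes, figure quoted in the message
OBSERVED_INCREASE = 22570304 - 9519135  # bytes, from the first message


def bitvector_table_bytes(code_points: int, bits_per_char: int) -> int:
    """Size of a flat table holding bits_per_char Booleans per code point."""
    return code_points * bits_per_char // 8


# Assumed property widths: 32 bits gives 256KB, 64 bits gives 512KB.
low = XEMACS_CODING_TABLES + bitvector_table_bytes(BMP_CODE_POINTS, 32)
high = XEMACS_CODING_TABLES + bitvector_table_bytes(BMP_CODE_POINTS, 64)

print(f"naive BMP database: {low} to {high} bytes")
print(f"share of observed increase: "
      f"{100 * low / OBSERVED_INCREASE:.0f}% to "
      f"{100 * high / OBSERVED_INCREASE:.0f}%")
```

With these inputs the total lands at about 1.15 to 1.42 million bytes, which is consistent with the "1.1MB to 1.4MB" and "about 10%" figures in the message.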
* Re: size of emacs executable after unicode merge 2008-05-16 22:07 ` Stephen J. Turnbull @ 2008-05-16 23:01 ` Thomas Lord 0 siblings, 0 replies; 86+ messages in thread From: Thomas Lord @ 2008-05-16 23:01 UTC (permalink / raw) To: Stephen J. Turnbull Cc: rms, Kenichi Handa, emacs-devel, dann, evilborisnet, Jason Rumney Stephen J. Turnbull wrote: > > If it would be helpful, > > Did you do much better than 60% savings? As I recall, I did considerably better, though I'm not clear whether or not we're talking about the same tables. I could be mistaken, hence the passive request for prompting to indicate whether or not it's worth really refreshing my memory here. You are on the right track to observe that the density of stuff that matters is the key to optimization. Trie-based sparse-away approaches seem to work very well. The trick is to do some off-line computation to work out a roughly optimal breadth and depth. I found it worked well to vary the breadth according to depth. That's, in a nutshell, what I'm talking about. You talk about range encoding. Ick. Too many tests and branches, in my experience. A simple trie will do -- just take care to get its shape correct. > In other words, even with a naive strategy, the Unicode BMP database > should only add about 1.1MB to 1.4MB, ie, about 10% of the size > increase seen here, if coded compactly but straightforwardly in C. > > I'm not talking about boatloads of code and, if done right, it has other applications as well. It's no big deal either way. I don't mean to argue. I just thought it might be helpful. I'm just a patzer or kibbitzer here, take yr pick. As an aside: virtual memory hardware sucks and is pointless. Segmentation rocks, on the other hand. But, that's a topic for a day a ways in the future, unfortunately. -t ^ permalink raw reply [flat|nested] 86+ messages in thread
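The trie idea described above can be illustrated with a fixed two-level table in which identical leaf blocks are stored only once. This is a simplified sketch, not the scheme from the message (which varied breadth with depth and tuned the shape off-line); the function names, the 256-entry block size, and the toy property ranges are all assumptions chosen for illustration.

```python
def build_two_stage(values, block_bits=8):
    """Split a flat table into blocks and share identical blocks."""
    block_size = 1 << block_bits
    assert len(values) % block_size == 0
    blocks = []        # unique leaf blocks
    block_index = {}   # block contents -> position in `blocks`
    stage1 = []        # one entry per block of code points
    for start in range(0, len(values), block_size):
        block = tuple(values[start:start + block_size])
        if block not in block_index:
            block_index[block] = len(blocks)
            blocks.append(block)
        stage1.append(block_index[block])
    return stage1, blocks


def lookup(stage1, blocks, code_point, block_bits=8):
    """Two array reads and no range tests or branches."""
    mask = (1 << block_bits) - 1
    return blocks[stage1[code_point >> block_bits]][code_point & mask]


# Toy property table for the BMP: 1 for an ideograph range, 2 for Hangul
# syllables, 0 elsewhere. The ranges are illustrative only.
flat = [0] * 0x10000
for cp in range(0x4E00, 0xA000):
    flat[cp] = 1
for cp in range(0xAC00, 0xD7A4):
    flat[cp] = 2

stage1, blocks = build_two_stage(flat)
print(f"flat entries: {len(flat)}")
print(f"two-stage entries: {len(stage1) + sum(len(b) for b in blocks)}")
```

On this toy table only four distinct blocks survive (all 0, all 1, all 2, and the partly filled last Hangul block), so the large constant ranges cost almost nothing while lookups stay branch-free.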
* Re: size of emacs executable after unicode merge 2008-05-16 12:55 ` Jason Rumney 2008-05-16 15:59 ` Thomas Lord @ 2008-05-17 0:56 ` Kenichi Handa 2008-05-17 1:52 ` YAMAMOTO Mitsuharu 2008-10-30 10:18 ` Emanuele Giaquinta 1 sibling, 2 replies; 86+ messages in thread From: Kenichi Handa @ 2008-05-17 0:56 UTC (permalink / raw) To: Jason Rumney; +Cc: dann, rms, evilborisnet, emacs-devel In article <482D8435.6060407@gnu.org>, Jason Rumney <jasonr@gnu.org> writes: > How big are the data structures holding all the unicode character info > and translation tables for encodings? I have not yet calculated them. Each CJK charset defined by a map need a char-table of encoding and a vector for decoding. > At one point you said you wanted > to make the unicode tables be loaded on demand but AFAIK they are > currently dumped into the emacs executable. Could that account for a > large proportion of the size increase? They are surely loaded on demand. And, at the end of the dumping process, by calling clear-charset-maps, all loaded maps are set to nil, and then garbage-collect is called. So I think they are not included in the dumped Emacs. Perhaps some other char-table is the culprit. Maybe Vchar_unify_table. I must re-read src/charset.c. --- Kenichi Handa handa@ni.aist.go.jp ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-05-17 0:56 ` Kenichi Handa @ 2008-05-17 1:52 ` YAMAMOTO Mitsuharu 2008-05-19 1:45 ` Kenichi Handa 2008-10-30 10:18 ` Emanuele Giaquinta 1 sibling, 1 reply; 86+ messages in thread From: YAMAMOTO Mitsuharu @ 2008-05-17 1:52 UTC (permalink / raw) To: Kenichi Handa; +Cc: dann, emacs-devel, rms, evilborisnet, Jason Rumney >>>>> On Sat, 17 May 2008 09:56:58 +0900, Kenichi Handa <handa@m17n.org> said: > They are surely loaded on demand. And, at the end of the dumping > process, by calling clear-charset-maps, all loaded maps are set to > nil, and then garbage-collect is called. So I think they are not > included in the dumped Emacs. Perhaps some other char-table is the > culprit. Maybe Vchar_unify_table. I must re-read src/charset.c. I tried taking some heap statistics during loadup on Mac OS X. It would be interesting to focus on the changes in the numbers of 1024-byte blocks and 48-byte blocks as they are major factors of the increase/decrease. The major increases in the used heap size are observed at the load of international/mule-conf (564KB -> 8897KB) and international/characters (9627KB -> 14685KB). 
Before loading international/mule-conf:
  Overall size: 9359KB; 1512 nodes malloced for 564KB (6% of capacity); largest unused: [0x1855e00-7848KB]
  # of 1024-byte blocks: 193
  # of 48-byte blocks: 959

After loading international/mule-conf:
  Overall size: 9623KB; 18028 nodes malloced for 8897KB (92% of capacity); largest unused: [0xe89920-473KB]
  # of 1024-byte blocks: 7858
  # of 48-byte blocks: 9035

Before loading international/characters:
  Overall size: 18123KB; 19500 nodes malloced for 9627KB (53% of capacity); largest unused: [0x2041e00-7928KB]
  # of 1024-byte blocks: 8106
  # of 48-byte blocks: 9969

After loading international/characters:
  Overall size: 23243KB; 102539 nodes malloced for 14685KB (63% of capacity); largest unused: [0x2155800-6825KB]
  # of 1024-byte blocks: 9358
  # of 48-byte blocks: 91721

The clear-charset-maps call followed by garbage-collect surely reduces
the size of the used area (15958KB -> 9188KB), but the unused area is
not returned to the system at least on Mac OS X (i.e., "Overall size"
does not change).

Before clear-charset-maps:
  Overall size: 23703KB; 105317 nodes malloced for 15958KB (67% of capacity); largest unused: [0x222c000-5967KB]
  # of 1024-byte blocks: 9865
  # of 48-byte blocks: 92577

After clear-charset-maps with garbage-collect:
  Overall size: 23703KB; 91717 nodes malloced for 9188KB (38% of capacity); largest unused: [0x222c000-5967KB]
  # of 1024-byte blocks: 3484
  # of 48-byte blocks: 85777

FYI, below is the final result of Emacs 22.2.50.
  Overall size: 10023KB; 18828 nodes malloced for 3319KB (33% of capacity); largest unused: [0x19ae000-6471KB]
  # of 1024-byte blocks: 1245
  # of 48-byte blocks: 15510

YAMAMOTO Mitsuharu
mituharu@math.s.chiba-u.ac.jp

^ permalink raw reply [flat|nested] 86+ messages in thread
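The heap statistics above are easier to read as differences between snapshots. The sketch below only re-derives numbers from the figures in the report; the table layout and the helper name are made up here.

```python
# (overall KB, used KB, 1024-byte blocks, 48-byte blocks), copied from the report.
SAMPLES = {
    "before mule-conf": (9359, 564, 193, 959),
    "after mule-conf": (9623, 8897, 7858, 9035),
    "before characters": (18123, 9627, 8106, 9969),
    "after characters": (23243, 14685, 9358, 91721),
    "before clear": (23703, 15958, 9865, 92577),
    "after clear + gc": (23703, 9188, 3484, 85777),
    "emacs 22.2.50 final": (10023, 3319, 1245, 15510),
}


def delta(before: str, after: str) -> tuple:
    """Field-by-field change between two snapshots."""
    return tuple(a - b for b, a in zip(SAMPLES[before], SAMPLES[after]))


mule = delta("before mule-conf", "after mule-conf")
chars = delta("before characters", "after characters")
gc = delta("before clear", "after clear + gc")

print(f"mule-conf: +{mule[1]}KB used, +{mule[2]} blocks of 1024 bytes")
print(f"characters: +{chars[1]}KB used, +{chars[3]} blocks of 48 bytes "
      f"(~{chars[3] * 48 / 1024:.0f}KB)")
print(f"clear + gc: {gc[1]}KB used, {gc[0]}KB overall")

overall_ratio = SAMPLES["after clear + gc"][0] / SAMPLES["emacs 22.2.50 final"][0]
print(f"overall size vs Emacs 22.2.50: {overall_ratio:.2f}x")
```

Two things stand out: garbage collection frees 6770KB of used space while the overall size stays at 23703KB, and the resulting overall size is about 2.4 times that of Emacs 22.2.50, close to the growth of the executable reported at the start of the thread.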
* Re: size of emacs executable after unicode merge 2008-05-17 1:52 ` YAMAMOTO Mitsuharu @ 2008-05-19 1:45 ` Kenichi Handa 0 siblings, 0 replies; 86+ messages in thread From: Kenichi Handa @ 2008-05-19 1:45 UTC (permalink / raw) To: YAMAMOTO Mitsuharu; +Cc: dann, jasonr, rms, evilborisnet, emacs-devel In article <wlabip66ji.wl%mituharu@math.s.chiba-u.ac.jp>, YAMAMOTO Mitsuharu <mituharu@math.s.chiba-u.ac.jp> writes: > The clear-charset-maps call followed by garbage-collect surely reduces > the size of the used area (15958KB -> 9188KB), but the unused area is > not returned to the system at least on Mac OS X (i.e., "Overall size" > does not change). > Before clear-charset-maps: > Overall size: 23703KB; 105317 nodes malloced for 15958KB (67% of capacity); largest unused: [0x222c000-5967KB] > # of 1024-byte blocks: 9865 # of 48-byte blocks: 92577 > After clear-charset-maps with garbage-collect: > Overall size: 23703KB; 91717 nodes malloced for 9188KB (38% of capacity); largest unused: [0x222c000-5967KB] > # of 1024-byte blocks: 3484 # of 48-byte blocks: 85777 Could someone tell me how to do the same check on GNU/Linux system? --- Kenichi Handa handa@ni.aist.go.jp ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-05-17 0:56 ` Kenichi Handa 2008-05-17 1:52 ` YAMAMOTO Mitsuharu @ 2008-10-30 10:18 ` Emanuele Giaquinta 2008-10-30 21:22 ` Eli Zaretskii 2008-10-31 3:55 ` Richard M. Stallman 1 sibling, 2 replies; 86+ messages in thread From: Emanuele Giaquinta @ 2008-10-30 10:18 UTC (permalink / raw) To: Kenichi Handa; +Cc: dann, emacs-devel, rms, evilborisnet, Jason Rumney On Sat, May 17, 2008 at 09:56:58AM +0900, Kenichi Handa wrote: > In article <482D8435.6060407@gnu.org>, Jason Rumney <jasonr@gnu.org> writes: > > > How big are the data structures holding all the unicode character info > > and translation tables for encodings? > > I have not yet calculated them. Each CJK charset defined by > a map need a char-table of encoding and a vector for > decoding. > > > At one point you said you wanted > > to make the unicode tables be loaded on demand but AFAIK they are > > currently dumped into the emacs executable. Could that account for a > > large proportion of the size increase? > > They are surely loaded on demand. And, at the end of the > dumping process, by calling clear-charset-maps, all loaded > maps are set to nil, and then garbage-collect is called. So > I think they are not included in the dumped Emacs. Perhaps > some other char-table is the culprit. Maybe > Vchar_unify_table. I must re-read src/charset.c. If I comment the load_charset_map_from_file call in unify_charset the data segment size is back to normal. Emanuele ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-10-30 10:18 ` Emanuele Giaquinta @ 2008-10-30 21:22 ` Eli Zaretskii 2008-10-30 21:42 ` Stefan Monnier 2008-10-31 3:55 ` Richard M. Stallman 1 sibling, 1 reply; 86+ messages in thread From: Eli Zaretskii @ 2008-10-30 21:22 UTC (permalink / raw) To: Emanuele Giaquinta; +Cc: rms, handa, jasonr, dann, evilborisnet, emacs-devel > Date: Thu, 30 Oct 2008 11:18:19 +0100 > From: Emanuele Giaquinta <emanuele.giaquinta@gmail.com> > Cc: dann@ics.uci.edu, emacs-devel@gnu.org, rms@gnu.org, > evilborisnet@netscape.net, Jason Rumney <jasonr@gnu.org> > > > They are surely loaded on demand. And, at the end of the > > dumping process, by calling clear-charset-maps, all loaded > > maps are set to nil, and then garbage-collect is called. So > > I think they are not included in the dumped Emacs. Perhaps > > some other char-table is the culprit. Maybe > > Vchar_unify_table. I must re-read src/charset.c. > > If I comment the load_charset_map_from_file call in unify_charset the > data segment size is back to normal. Are we sure GC after setting the maps to nil always resets the data segment size back to what it was before loading those maps? ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-10-30 21:22 ` Eli Zaretskii @ 2008-10-30 21:42 ` Stefan Monnier 0 siblings, 0 replies; 86+ messages in thread From: Stefan Monnier @ 2008-10-30 21:42 UTC (permalink / raw) To: Eli Zaretskii Cc: rms, Emanuele Giaquinta, handa, jasonr, dann, evilborisnet, emacs-devel >> > They are surely loaded on demand. And, at the end of the >> > dumping process, by calling clear-charset-maps, all loaded >> > maps are set to nil, and then garbage-collect is called. So >> > I think they are not included in the dumped Emacs. Perhaps >> > some other char-table is the culprit. Maybe >> > Vchar_unify_table. I must re-read src/charset.c. >> >> If I comment the load_charset_map_from_file call in unify_charset the >> data segment size is back to normal. > Are we sure GC after setting the maps to nil always resets the data > segment size back to what it was before loading those maps? Given the above evidence, I'd say we're pretty sure it doesn't. Stefan ^ permalink raw reply [flat|nested] 86+ messages in thread
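One way to picture why freeing the maps need not shrink the data segment is a toy heap that can only give memory back from its top end. This is a deliberately crude model, not Emacs's allocator or its dumping code; the class and the sizes are hypothetical, with 6770 borrowed from the KB freed in the Mac OS X measurements earlier in the thread.

```python
class ToyHeap:
    """Linear heap: grows at the top, shrinks only when the top chunk is free."""

    def __init__(self):
        self.chunks = []  # [size, in_use] pairs in address order

    @property
    def end(self) -> int:
        """Total extent of the heap, free holes included."""
        return sum(size for size, _ in self.chunks)

    @property
    def used(self) -> int:
        return sum(size for size, in_use in self.chunks if in_use)

    def alloc(self, size: int) -> int:
        # Simplification: always extend the heap, never reuse a hole.
        self.chunks.append([size, True])
        return len(self.chunks) - 1

    def free(self, handle: int) -> None:
        self.chunks[handle][1] = False
        # Only free chunks at the very top can be trimmed away.
        while self.chunks and not self.chunks[-1][1]:
            self.chunks.pop()


# Case 1: a long-lived object is allocated after the temporary maps.
pinned = ToyHeap()
maps = pinned.alloc(6770)
survivor = pinned.alloc(10)
pinned.free(maps)
print(f"pinned: used={pinned.used}, end={pinned.end}")

# Case 2: the temporary maps are the last thing on the heap.
trimmed = ToyHeap()
survivor = trimmed.alloc(10)
maps = trimmed.alloc(6770)
trimmed.free(maps)
print(f"trimmed: used={trimmed.used}, end={trimmed.end}")
```

In the first case the heap still spans the freed maps even though almost nothing in it is live, which would match the observation that the used area drops while the overall size does not.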
* Re: size of emacs executable after unicode merge 2008-10-30 10:18 ` Emanuele Giaquinta 2008-10-30 21:22 ` Eli Zaretskii @ 2008-10-31 3:55 ` Richard M. Stallman 2008-10-31 5:29 ` Kenichi Handa 1 sibling, 1 reply; 86+ messages in thread From: Richard M. Stallman @ 2008-10-31 3:55 UTC (permalink / raw) To: Emanuele Giaquinta; +Cc: jasonr, dann, emacs-devel, evilborisnet, handa If I comment the load_charset_map_from_file call in unify_charset the data segment size is back to normal. Although these are loaded "on demand", perhaps something "demands" them at build time. How about if you try running temacs (to build and dump emacs) under GDB, with a breakpoint there? ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-10-31 3:55 ` Richard M. Stallman @ 2008-10-31 5:29 ` Kenichi Handa 2008-10-31 6:32 ` Chong Yidong ` (3 more replies) 0 siblings, 4 replies; 86+ messages in thread From: Kenichi Handa @ 2008-10-31 5:29 UTC (permalink / raw) To: rms; +Cc: jasonr, dann, evilborisnet, emanuele.giaquinta, emacs-devel

In article <E1Kvl71-0002pv-Pt@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:

>     If I comment the load_charset_map_from_file call in unify_charset the
>     data segment size is back to normal.

> Although these are loaded "on demand", perhaps something "demands" them
> at build time.

It's not that simple. This is the strategy of the charset map loading mechanism. I took that approach expecting that char-tables that are garbage-collected before dumping are not in the dumped file.

(0) At first, Emacs assigns a unique linear character code space in the upper Unicode area (#x110000-) to each big character set (e.g. GB, JIS, KSC) (*see the note at the end). Decoding a character of a specific charset into this area is quite fast (done in just a few steps of arithmetic). Encoding is fast in the same way.

(1) While building Emacs, when unify-charset is called, we update two char-tables, Vchar_unify_table and Vchar_unified_charset_table. The former maps a character in the above upper area to the Unicode area, and the latter maps the character to a charset symbol. unify-charset also builds a deunifier char-table for each character set that maps a character in the Unicode area to the upper area that is unique to each charset. So at this time, the full maps are built.

(2) Just before dumping, clear-charset-maps is called. This function sets all char-tables built in (1) (except for Vchar_unified_charset_table) to nil. Then it sets Vchar_unify_table to Vchar_unified_charset_table, and sets Vchar_unified_charset_table to nil. Then, garbage-collect is called. After that, the only living char-table is Vchar_unify_table, and its contents are not that big because it maps upper-area characters to charsets, and each charset has a linear upper area, thus most consecutive characters have the same value.

(3) When the dumped Emacs runs, at the time of decoding/encoding charsets that are unified as above, by checking whether the value of Vchar_unify_table for a character is a symbol or not, Emacs knows whether it has to load the mapping table again or not.

So, that way, Emacs loads maps on demand.

*Note: The reason Emacs assigns those linear areas is that such big charsets tend to have their own private use area, and we must keep a unique character code for them. Those private characters are decoded and encoded without being mapped to the Unicode area.

---
Kenichi Handa
handa@ni.aist.go.jp

^ permalink raw reply [flat|nested] 86+ messages in thread
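Steps (0) and (3) of the strategy described above can be sketched as follows. This is a toy model in Python, not the code in src/charset.c: the 94x94 layout, the class, the loader, and the single mapping entry are assumptions made for illustration, and a plain string stands in for the charset symbol left in the table after dumping.

```python
UPPER_AREA_START = 0x110000  # first code above the Unicode range


def decode_linear(base, b1, b2, lo=0x21, width=94):
    """Map a two-byte charset code to its slot in the linear upper area."""
    return base + (b1 - lo) * width + (b2 - lo)


def encode_linear(base, code, lo=0x21, width=94):
    """Inverse of decode_linear."""
    row, col = divmod(code - base, width)
    return row + lo, col + lo


class LazyUnifyTable:
    """Holds a charset name until the first lookup, then the loaded map."""

    def __init__(self, loader):
        self._loader = loader
        self._ranges = []  # [start, end, value]; value is a name or a dict
        self.loads = 0

    def add_charset(self, name, start, end):
        self._ranges.append([start, end, name])

    def to_unicode(self, code):
        for entry in self._ranges:
            start, end, value = entry
            if start <= code < end:
                if isinstance(value, str):   # still a "symbol": load the map now
                    value = self._loader(value)
                    entry[2] = value
                    self.loads += 1
                # Characters absent from the map keep their upper-area code.
                return value.get(code, code)
        return code


def toy_loader(name):
    # Hypothetical one-entry map standing in for a real charset map file.
    return {decode_linear(UPPER_AREA_START, 0x30, 0x21): 0x4E9C}


table = LazyUnifyTable(toy_loader)
table.add_charset("toy-jis", UPPER_AREA_START, UPPER_AREA_START + 94 * 94)

code = decode_linear(UPPER_AREA_START, 0x30, 0x21)
print(hex(code), hex(table.to_unicode(code)), table.loads)
```

The point of the design is that only the cheap range-to-name entries need to survive dumping; the large per-character maps are rebuilt the first time a charset is actually used.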
* Re: size of emacs executable after unicode merge 2008-10-31 5:29 ` Kenichi Handa @ 2008-10-31 6:32 ` Chong Yidong 2008-10-31 7:32 ` Kenichi Handa 2008-10-31 10:41 ` YAMAMOTO Mitsuharu 2008-10-31 15:07 ` Dan Nicolaescu ` (2 subsequent siblings) 3 siblings, 2 replies; 86+ messages in thread From: Chong Yidong @ 2008-10-31 6:32 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, emacs-devel, dann, evilborisnet, jasonr Kenichi Handa <handa@m17n.org> writes: > (2) Just before dumping, clear-charset-maps is called. This > function sets all char-tables built in (1) (except for > Vchar_unified_charset_table) to nil. Then set > Vchar_unify_table to Vchar_unified_charset_table, and > set Vchar_unified_charset_table to nil. > > Then, garbage-collect is called. After that, the living > char-table is Vchar_unify_table only, and the contents > is not that big because it maps upper area characters to > charset, and each charset has linear upper area, thus > most succeeding charaters have the same value. Is it possible to verify whether the garbage collection indeed reaps this data? ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-10-31 6:32 ` Chong Yidong @ 2008-10-31 7:32 ` Kenichi Handa 2008-10-31 10:09 ` Eli Zaretskii 2008-10-31 10:41 ` YAMAMOTO Mitsuharu 1 sibling, 1 reply; 86+ messages in thread From: Kenichi Handa @ 2008-10-31 7:32 UTC (permalink / raw) To: Chong Yidong Cc: rms, emanuele.giaquinta, jasonr, dann, evilborisnet, emacs-devel In article <87zlklff5a.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes: > Kenichi Handa <handa@m17n.org> writes: > > (2) Just before dumping, clear-charset-maps is called. This > > function sets all char-tables built in (1) (except for > > Vchar_unified_charset_table) to nil. Then set > > Vchar_unify_table to Vchar_unified_charset_table, and > > set Vchar_unified_charset_table to nil. > > > > Then, garbage-collect is called. After that, the living > > char-table is Vchar_unify_table only, and the contents > > is not that big because it maps upper area characters to > > charset, and each charset has linear upper area, thus > > most succeeding charaters have the same value. > Is it possible to verify whether the garbage collection indeed reaps > this data? To verify that, I tried to run temacs under gdb, but I got this error. (gdb) run -batch -l loadup dump Starting program: /usr/local/work/emacs/src/temacs -batch -l loadup dump [Thread debugging using libthread_db enabled] [New process 10730] Executing new program: /usr/local/work/emacs/src/temacs warning: Cannot initialize thread debugging library: generic error warning: Cannot initialize thread debugging library: generic error [New process 10730] [Thread debugging using libthread_db enabled] Debugger segmentation fault Do you know about this error? Is this a bug of gdb? A few months ago, I didn't see this error on running temacs under gdb. --- Kenichi Handa handa@ni.aist.go.jp ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-10-31 7:32 ` Kenichi Handa @ 2008-10-31 10:09 ` Eli Zaretskii 2008-10-31 12:33 ` gdb error [Re: size of emacs executable after unicode merge] Kenichi Handa 2008-10-31 12:35 ` size of emacs executable after unicode merge Stephen Berman 0 siblings, 2 replies; 86+ messages in thread From: Eli Zaretskii @ 2008-10-31 10:09 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, cyd, jasonr, dann, evilborisnet, emacs-devel > From: Kenichi Handa <handa@m17n.org> > Date: Fri, 31 Oct 2008 16:32:26 +0900 > Cc: rms@gnu.org, emanuele.giaquinta@gmail.com, jasonr@gnu.org, dann@ics.uci.edu, > evilborisnet@netscape.net, emacs-devel@gnu.org > > > Is it possible to verify whether the garbage collection indeed reaps > > this data? > > To verify that, I tried to run temacs under gdb, but I got > this error. > > (gdb) run -batch -l loadup dump > Starting program: /usr/local/work/emacs/src/temacs -batch -l loadup dump > [Thread debugging using libthread_db enabled] > [New process 10730] > Executing new program: /usr/local/work/emacs/src/temacs > warning: Cannot initialize thread debugging library: generic error > warning: Cannot initialize thread debugging library: generic error > [New process 10730] > [Thread debugging using libthread_db enabled] > Debugger segmentation fault > > Do you know about this error? Is this a bug of gdb? A few > months ago, I didn't see this error on running temacs under > gdb. On a GNU/Linux system, with today's CVS configured as shown below, and with GDB 6.7.1, I don't get this problem. Why is libthread_db being used? does your build use GTK+ or some other library that uses multithreading? Here's how my CVS Emacs was configured: ./configure --with-jpeg=no --with-png=no --with-gif=no --with-tiff=no [...] Configured for `x86_64-unknown-linux-gnu'. Where should the build process find the source code? 
[...]/eliz/emacs.cvs/emacs What operating system and machine description files should Emacs use? `s/gnu-linux.h' and `m/amdx86-64.h' What compiler should emacs be built with? gcc -g -O2 -Wno-pointer-sign Should Emacs use the GNU version of malloc? yes (Using Doug Lea's new malloc from the GNU C Library.) Should Emacs use a relocating allocator for buffers? yes Should Emacs use mmap(2) for buffer allocation? no What window system should Emacs use? x11 What toolkit should Emacs use? LUCID Where do we find X Windows header files? Standard dirs Where do we find X Windows libraries? /usr/X11R6/lib Does Emacs use -lXaw3d? no Does Emacs use -lXpm? yes Does Emacs use -ljpeg? no Does Emacs use -ltiff? no Does Emacs use a gif library? no Does Emacs use -lpng? no Does Emacs use -lrsvg-2? no Does Emacs use -lgpm? yes Does Emacs use -ldbus? no Does Emacs use -lfreetype? no Does Emacs use -lm17n-flt? no Does Emacs use -lotf? no Does Emacs use -lxft? no Does Emacs use toolkit scroll bars? no ^ permalink raw reply [flat|nested] 86+ messages in thread
* gdb error [Re: size of emacs executable after unicode merge] 2008-10-31 10:09 ` Eli Zaretskii @ 2008-10-31 12:33 ` Kenichi Handa 2008-10-31 14:28 ` Eli Zaretskii 2008-10-31 12:35 ` size of emacs executable after unicode merge Stephen Berman 1 sibling, 1 reply; 86+ messages in thread From: Kenichi Handa @ 2008-10-31 12:33 UTC (permalink / raw) To: Eli Zaretskii Cc: rms, emanuele.giaquinta, cyd, jasonr, dann, evilborisnet, emacs-devel In article <umyglvzy8.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > (gdb) run -batch -l loadup dump > > Starting program: /usr/local/work/emacs/src/temacs -batch -l loadup dump > > [Thread debugging using libthread_db enabled] > > [New process 10730] > > Executing new program: /usr/local/work/emacs/src/temacs > > warning: Cannot initialize thread debugging library: generic error > > warning: Cannot initialize thread debugging library: generic error > > [New process 10730] > > [Thread debugging using libthread_db enabled] > > Debugger segmentation fault > > > > Do you know about this error? Is this a bug of gdb? A few > > months ago, I didn't see this error on running temacs under > > gdb. > On a GNU/Linux system, with today's CVS configured as shown below, and > with GDB 6.7.1, I don't get this problem. My gdb is "GNU gdb 6.8-debian", and my system is debian testing (kernel is 2.6.26-1-686). Emacs is configured with no option as below: Configured for `i686-pc-linux-gnu'. Where should the build process find the source code? /usr/local/work/emacs What operating system and machine description files should Emacs use? `s/gnu-linux.h' and `m/intel386.h' What compiler should emacs be built with? gcc -g -O2 -Wno-pointer-sign Should Emacs use the GNU version of malloc? yes (Using Doug Lea's new malloc from the GNU C Library.) Should Emacs use a relocating allocator for buffers? yes Should Emacs use mmap(2) for buffer allocation? no What window system should Emacs use? x11 What toolkit should Emacs use? 
GTK Where do we find X Windows header files? Standard dirs Where do we find X Windows libraries? Standard dirs Does Emacs use -lXaw3d? no Does Emacs use -lXpm? yes Does Emacs use -ljpeg? yes Does Emacs use -ltiff? yes Does Emacs use a gif library? yes -lgif Does Emacs use -lpng? yes Does Emacs use -lrsvg-2? no Does Emacs use -lgpm? no Does Emacs use -ldbus? yes Does Emacs use -lfreetype? yes Does Emacs use -lm17n-flt? yes Does Emacs use -lotf? yes Does Emacs use -lxft? yes Does Emacs use toolkit scroll bars? yes > Why is libthread_db being used? does your build use GTK+ or some other > library that uses multithreading? As shown above, yes my build uses GTK+. --- Kenichi Handa handa@ni.aist.go.jp ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: gdb error [Re: size of emacs executable after unicode merge] 2008-10-31 12:33 ` gdb error [Re: size of emacs executable after unicode merge] Kenichi Handa @ 2008-10-31 14:28 ` Eli Zaretskii 0 siblings, 0 replies; 86+ messages in thread From: Eli Zaretskii @ 2008-10-31 14:28 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, cyd, jasonr, dann, evilborisnet, emacs-devel > From: Kenichi Handa <handa@m17n.org> > CC: rms@gnu.org, emanuele.giaquinta@gmail.com, cyd@stupidchicken.com, > jasonr@gnu.org, dann@ics.uci.edu, evilborisnet@netscape.net, > emacs-devel@gnu.org > Date: Fri, 31 Oct 2008 21:33:20 +0900 > > > Why is libthread_db being used? does your build use GTK+ or some other > > library that uses multithreading? > > As shown above, yes my build uses GTK+. So you should at least be able to reconfigure without GTK, rebuild, and then use GDB to debug the original problem (which I think is unrelated to GTK). As for the GDB crash, perhaps this should be taken up with the GDB maintainers on gdb@sources.redhat.com. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-10-31 10:09 ` Eli Zaretskii 2008-10-31 12:33 ` gdb error [Re: size of emacs executable after unicode merge] Kenichi Handa @ 2008-10-31 12:35 ` Stephen Berman 2008-11-21 12:32 ` Kenichi Handa 1 sibling, 1 reply; 86+ messages in thread From: Stephen Berman @ 2008-10-31 12:35 UTC (permalink / raw) To: emacs-devel On Fri, 31 Oct 2008 12:09:03 +0200 Eli Zaretskii <eliz@gnu.org> wrote: >> From: Kenichi Handa <handa@m17n.org> >> Date: Fri, 31 Oct 2008 16:32:26 +0900 >> Cc: rms@gnu.org, emanuele.giaquinta@gmail.com, jasonr@gnu.org, dann@ics.uci.edu, >> evilborisnet@netscape.net, emacs-devel@gnu.org >> >> > Is it possible to verify whether the garbage collection indeed reaps >> > this data? >> >> To verify that, I tried to run temacs under gdb, but I got >> this error. >> >> (gdb) run -batch -l loadup dump >> Starting program: /usr/local/work/emacs/src/temacs -batch -l loadup dump >> [Thread debugging using libthread_db enabled] >> [New process 10730] >> Executing new program: /usr/local/work/emacs/src/temacs >> warning: Cannot initialize thread debugging library: generic error >> warning: Cannot initialize thread debugging library: generic error >> [New process 10730] >> [Thread debugging using libthread_db enabled] >> Debugger segmentation fault >> >> Do you know about this error? Is this a bug of gdb? A few >> months ago, I didn't see this error on running temacs under >> gdb. > > On a GNU/Linux system, with today's CVS configured as shown below, and > with GDB 6.7.1, I don't get this problem. > > Why is libthread_db being used? does your build use GTK+ or some other > library that uses multithreading? I also get the same gdb segfault as Handa-san with temacs from GNU Emacs 23.0.60.12 (i686-pc-linux-gnu, GTK+ Version 2.12.9) of 2008-10-25 on escher, built with the default configuration. My gdb is version 6.8. Steve Berman ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-10-31 12:35 ` size of emacs executable after unicode merge Stephen Berman @ 2008-11-21 12:32 ` Kenichi Handa 2008-11-21 14:18 ` Ulrich Mueller 0 siblings, 1 reply; 86+ messages in thread From: Kenichi Handa @ 2008-11-21 12:32 UTC (permalink / raw) To: Stephen Berman; +Cc: emacs-devel

In article <87skqd6iyx.fsf@escher.local.home>, Stephen Berman <stephen.berman@gmx.net> writes:

>>> From: Kenichi Handa <handa@m17n.org>
[...]
>>> (gdb) run -batch -l loadup dump
>>> Starting program: /usr/local/work/emacs/src/temacs -batch -l loadup dump
>>> [Thread debugging using libthread_db enabled]
>>> [New process 10730]
>>> Executing new program: /usr/local/work/emacs/src/temacs
>>> warning: Cannot initialize thread debugging library: generic error
>>> warning: Cannot initialize thread debugging library: generic error
>>> [New process 10730]
>>> [Thread debugging using libthread_db enabled]
>>> Debugger segmentation fault
>>>
>>> Do you know about this error? Is this a bug of gdb? A few
>>> months ago, I didn't see this error on running temacs
>>> under gdb.
> >
> > On a GNU/Linux system, with today's CVS configured as shown below, and
> > with GDB 6.7.1, I don't get this problem.
> >
> > Why is libthread_db being used? does your build use GTK+ or some other
> > library that uses multithreading?

> I also get the same gdb segfault as Handa-san with temacs from GNU Emacs
> 23.0.60.12 (i686-pc-linux-gnu, GTK+ Version 2.12.9) of 2008-10-25 on
> escher, built with the default configuration. My gdb is version 6.8.

It seems that the above error happens when temacs is run under gdb
and execvp is called in the following hunk of code in main
(emacs.c).

#ifdef HAVE_PERSONALITY_LINUX32
  if (!initialized
      && (strcmp (argv[argc-1], "dump") == 0
          || strcmp (argv[argc-1], "bootstrap") == 0)
      && ! getenv ("EMACS_HEAP_EXEC"))
    {
      /* Set this so we only do this once.  */
      putenv("EMACS_HEAP_EXEC=true");

      /* A flag to turn off address randomization which is introduced
         in linux kernel shipped with fedora core 4 */
#define ADD_NO_RANDOMIZE 0x0040000
      personality (PER_LINUX32 | ADD_NO_RANDOMIZE);
#undef  ADD_NO_RANDOMIZE

      execvp (argv[0], argv);

      /* If the exec fails, try to dump anyway.  */
      perror ("execvp");
    }
#endif /* HAVE_PERSONALITY_LINUX32 */

Currently, I'm running temacs under gdb with this workaround:

(gdb) set env EMACS_HEAP_EXEC=1

With this, temacs runs until Fdump_emacs is called (and thus
I can at least debug loadup.el). In the call of
Fdump_emacs, this message is printed:

**************************************************
Warning: Your system has a gap between BSS and the
heap (9475964 bytes). This usually means that exec-shield
or something similar is in effect. The dump may
fail because of this. See the section about
exec-shield in etc/PROBLEMS for more information.
**************************************************

and temacs crashes with SIGSEGV.

etc/PROBLEMS has this statement:

    You can check the Exec-shield state like this:

      cat /proc/sys/kernel/exec-shield

but my system (debian testing) doesn't have that file.

---
Kenichi Handa
handa@ni.aist.go.jp

^ permalink raw reply [flat|nested] 86+ messages in thread
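The re-exec pattern in the quoted hunk can be sketched in isolation. This is only an illustration under stated assumptions, not the Emacs code: the marker variable DEMO_HEAP_EXEC and both function names are hypothetical, and unlike the hunk (which sets PER_LINUX32 together with the flag) this sketch keeps the current persona and only adds the glibc constant ADDR_NO_RANDOMIZE from <sys/personality.h>. It is Linux-specific.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/personality.h>

/* Hypothetical marker variable, playing the role of EMACS_HEAP_EXEC.  */
#define REEXEC_MARKER "DEMO_HEAP_EXEC"

/* Return 1 if the process should re-exec itself with address
   randomization disabled: only when the last argument is "dump" or
   "bootstrap", and only if the marker is not yet in the environment.  */
int
should_reexec (int argc, char **argv)
{
  if (argc < 2)
    return 0;
  const char *last = argv[argc - 1];
  if (strcmp (last, "dump") != 0 && strcmp (last, "bootstrap") != 0)
    return 0;
  return getenv (REEXEC_MARKER) == NULL;
}

/* Re-exec argv[0] with ADDR_NO_RANDOMIZE added to the current persona.
   This function returns only if the exec failed.  */
void
reexec_without_randomization (char **argv)
{
  /* Set the marker first so that the new image does not loop.  */
  setenv (REEXEC_MARKER, "true", 1);

  /* 0xffffffff queries the current persona without changing it.  */
  int persona = personality (0xffffffff);
  if (persona != -1)
    personality ((unsigned long) persona | ADDR_NO_RANDOMIZE);

  execvp (argv[0], argv);
  perror ("execvp");            /* the caller may carry on regardless */
}
```

A caller would test should_reexec (argc, argv) early in main and, if it returns 1, call reexec_without_randomization (argv). Setting the marker by hand, as in the gdb workaround above, makes the predicate return 0 and so skips the exec entirely.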
* Re: size of emacs executable after unicode merge 2008-11-21 12:32 ` Kenichi Handa @ 2008-11-21 14:18 ` Ulrich Mueller 0 siblings, 0 replies; 86+ messages in thread From: Ulrich Mueller @ 2008-11-21 14:18 UTC (permalink / raw) To: Kenichi Handa; +Cc: Stephen Berman, emacs-devel >>>>> Kenichi Handa wrote: > Currently, I'm running temacs under gdb with this workaround: > (gdb) set env EMACS_HEAP_EXEC=1 > With this, temacs runs until Fdump_emacs is called (and thus > I can at least debug loadup.el). In the call of > Fdump_emacs, this message is printed: > ************************************************** > Warning: Your system has a gap between BSS and the > heap (9475964 bytes). This usually means that exec-shield > or something similar is in effect. The dump may > fail because of this. See the section about > exec-shield in etc/PROBLEMS for more information. > ************************************************** > and temacs crashes by SIGSEGV. > Although etc/PROBLEM has this statement: > You can check the Exec-shield state like this: > cat /proc/sys/kernel/exec-shield > but, my system (debian testing) doesn't have that file. As far as I know, exec-shield is a feature specific to some RH/Fedora versions. However, Linux versions 2.6.25 and later implement heap randomisation, see <http://lkml.org/lkml/2007/10/23/435>. A possible workaround is to turn heap randomisation off: ,---- | echo -n 1 >/proc/sys/kernel/randomize_va_space `---- Ulrich P.S. Gentoo bug 236579 <http://bugs.gentoo.org/236579> contains some additional information about this. ^ permalink raw reply [flat|nested] 86+ messages in thread
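The setting mentioned above can also be inspected from a program. The following is a small sketch, assuming the usual Linux meaning of /proc/sys/kernel/randomize_va_space (0: no randomization; 1: stack, mmap base and VDSO are randomized; 2: the brk heap is randomized as well). Both function names are made up for illustration.

```c
#include <stdio.h>

/* Read an integer sysctl value from PATH.
   Return -1 if the file is missing or cannot be parsed.  */
int
read_randomize_va_space (const char *path)
{
  FILE *f = fopen (path, "r");
  if (!f)
    return -1;

  int value = -1;
  if (fscanf (f, "%d", &value) != 1)
    value = -1;
  fclose (f);
  return value;
}

/* Return 1 if VALUE means that the brk heap is randomized,
   0 if it is not, and -1 if VALUE is not a known setting.  */
int
heap_is_randomized (int value)
{
  if (value < 0 || value > 2)
    return -1;
  return value == 2;
}
```

On a system where read_randomize_va_space ("/proc/sys/kernel/randomize_va_space") returns 2, writing 1 to that file (as root) leaves stack and mmap randomization on while turning heap randomization off.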
* Re: size of emacs executable after unicode merge 2008-10-31 6:32 ` Chong Yidong 2008-10-31 7:32 ` Kenichi Handa @ 2008-10-31 10:41 ` YAMAMOTO Mitsuharu 1 sibling, 0 replies; 86+ messages in thread From: YAMAMOTO Mitsuharu @ 2008-10-31 10:41 UTC (permalink / raw) To: Chong Yidong Cc: rms, emanuele.giaquinta, Kenichi Handa, emacs-devel, dann, evilborisnet, jasonr >>>>> On Fri, 31 Oct 2008 02:32:49 -0400, Chong Yidong <cyd@stupidchicken.com> said: > Is it possible to verify whether the garbage collection indeed reaps > this data? Put it in some weak hash table and check if it survives GC? Anyway, an experiment on Mac OS X (*1) shows that clear-charset-maps followed by GC actually collects some amount of data in heap (~7MB), but they are not returned to the system, at least with its malloc implementation. *1: http://lists.gnu.org/archive/html/emacs-devel/2008-05/msg01049.html YAMAMOTO Mitsuharu mituharu@math.s.chiba-u.ac.jp ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-10-31 5:29 ` Kenichi Handa 2008-10-31 6:32 ` Chong Yidong @ 2008-10-31 15:07 ` Dan Nicolaescu 2008-10-31 16:44 ` Stefan Monnier 2008-10-31 19:30 ` Richard M. Stallman 2008-11-09 22:43 ` Chong Yidong 3 siblings, 1 reply; 86+ messages in thread From: Dan Nicolaescu @ 2008-10-31 15:07 UTC (permalink / raw) To: Kenichi Handa; +Cc: evilborisnet, jasonr, rms, emanuele.giaquinta, emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <E1Kvl71-0002pv-Pt@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:
>
> > If I comment the load_charset_map_from_file call in unify_charset the
> > data segment size is back to normal.
>
> > Although these are loaded "on demand", perhaps something "demands" them
> > at build time.
>
> It's not that simple. This is the strategy of the charset
> map loading mechanism. I took that approach expecting that
> char-tables that are garbage-collected before dumping are
> not in the dumped file.
>
> (0) At first, Emacs assigns a unique linear character code
>     space in upper Unicode area (#x110000-) to each big
>     character set (e.g. GB, JIS, KSC) (*see the note at the
>     tail). The decoding of a character of a specific
>     charset into this area is quite fast (done just by a few
>     steps of arithmetic calculation). Encoding is the same
>     too.
>
> (1) While building Emacs, when unify-charset is called, we
>     update two char-tables Vchar_unify_table, and
>     Vchar_unified_charset_table. The former maps a
>     character in the above upper area to Unicode area, and
>     the latter maps the character to charset symbol.
>     Unify-charset also builds deunifier char-table for each
>     character set that maps a character in Unicode area to
>     the upper area that is unique to each charset.
>
>     So at this time, the full maps are built.
>
> (2) Just before dumping, clear-charset-maps is called. This
>     function sets all char-tables built in (1) (except for
>     Vchar_unified_charset_table) to nil. Then set
>     Vchar_unify_table to Vchar_unified_charset_table, and
>     set Vchar_unified_charset_table to nil.
>
>     Then, garbage-collect is called. After that, the living
>     char-table is Vchar_unify_table only, and the contents
>     is not that big because it maps upper area characters to
>     charset, and each charset has linear upper area, thus
>     most succeeding characters have the same value.

For the allocator to be able to release pages back to the system
after they have been garbage collected, you have to be sure that
absolutely ALL the data allocated can be garbage collected. (and even
then you depend on the quirks of the platform specific malloc
implementation to do it).

From the sound of the description above, it sounds like the data in
Vchar_unify_table is allocated while reading the charset data, and it
is not released after the charset data is. So the allocator cannot
release all the pages...
[note: this speculation is based solely on your description above]

> (3) When the dumped Emacs runs, at the time of
>     decoding/encoding charsets that are unified as above, by
>     checking if the value of Vchar_unify_table for a
>     character is symbol or not, Emacs knows whether it has
>     to load the mapping table again or not.
>
> So, that way, Emacs loads maps on demand.

So it sounds like your goal is to build Vchar_unify_table, and it is
built from static data in emacs/etc/charsets/*. In that case, can't
the data in Vchar_unify_table be a C data structure that is built
offline, and just compiled into emacs?

^ permalink raw reply [flat|nested] 86+ messages in thread
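The "C data structure built offline and compiled in" idea might look roughly like the sketch below. It relies on the property described in step (2), namely that long runs of consecutive characters share one value, so a sorted array of ranges plus a binary search is enough. The struct, the function name, the range boundaries and the charset ids are all invented for illustration and do not come from the Emacs sources.

```c
#include <stddef.h>

/* One run of consecutive character codes that map to the same charset.  */
struct code_range
{
  unsigned int first;   /* first code in the run (inclusive) */
  unsigned int last;    /* last code in the run (inclusive) */
  int charset_id;       /* hypothetical small integer id */
};

/* Made-up example data: sorted by FIRST and non-overlapping.  A real
   table would be generated offline from the charset map files.  */
const struct code_range demo_ranges[] = {
  { 0x110000, 0x1121FF, 1 },
  { 0x112200, 0x1143FF, 2 },
  { 0x114400, 0x1165FF, 3 },
};

/* Binary search for the range containing C.
   Return its charset id, or -1 if C is not covered by any range.  */
int
lookup_charset (const struct code_range *ranges, size_t n, unsigned int c)
{
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (c < ranges[mid].first)
        hi = mid;
      else if (c > ranges[mid].last)
        lo = mid + 1;
      else
        return ranges[mid].charset_id;
    }
  return -1;
}
```

Because such an array is constant data in the executable, it never passes through the Lisp heap and so cannot leave garbage behind in the dumped image. Whether it could replace a char-table in practice is a separate question that this sketch does not address.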
* Re: size of emacs executable after unicode merge 2008-10-31 15:07 ` Dan Nicolaescu @ 2008-10-31 16:44 ` Stefan Monnier 2008-11-04 23:09 ` Chong Yidong 0 siblings, 1 reply; 86+ messages in thread From: Stefan Monnier @ 2008-10-31 16:44 UTC (permalink / raw) To: Dan Nicolaescu Cc: rms, emanuele.giaquinta, Kenichi Handa, emacs-devel, evilborisnet, jasonr

> For the allocator to be able to release pages back to the system
> after they have been garbage collected, you have to be sure that
> absolutely ALL the data allocated can be garbage collected. (and even
> then you depend on the quirks of the platform specific malloc
> implementation to do it).

Indeed. Some malloc implementations just never return memory to the
system, and all others only do it in some cases but not all. And in our
case we may not even be able to call `free' because some live data is
interspersed within the sea of dead data.

> So it sounds like your goal is to build Vchar_unify_table, and it is
> built from static data in emacs/etc/charsets/*. In that case, can't
> the data in Vchar_unify_table be a C data structure that is built
> offline, and just compiled into emacs?

Maybe simpler would be to build this table as we do now, then print it
into a file. Then the "dump" doesn't need to build the table, it can
just read it from the file.

Stefan

^ permalink raw reply [flat|nested] 86+ messages in thread
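The point about live data interspersed within dead data can be made concrete with a toy model. The sketch below treats the heap as an array of equal-sized slots grouped into pages and counts the pages that hold no live slot at all, since only those could possibly be handed back to the system. The function name and the model are purely illustrative; real allocators are considerably more complicated.

```c
#include <stdbool.h>
#include <stddef.h>

/* LIVE[i] says whether slot i still holds a live object.  Slots are
   grouped into pages of SLOTS_PER_PAGE consecutive slots.  Return the
   number of complete pages on which every slot is dead.  */
size_t
count_releasable_pages (const bool *live, size_t n_slots,
                        size_t slots_per_page)
{
  if (slots_per_page == 0)
    return 0;

  size_t releasable = 0;
  for (size_t start = 0; start + slots_per_page <= n_slots;
       start += slots_per_page)
    {
      bool any_live = false;
      for (size_t i = 0; i < slots_per_page; i++)
        if (live[start + i])
          {
            any_live = true;
            break;
          }
      if (!any_live)
        releasable++;
    }
  return releasable;
}
```

With eight slots on two pages, two live objects at slots 3 and 4 pin both pages, while the same two objects at slots 0 and 1 leave the second page free. The amount of garbage is identical in both cases; only the placement of the survivors differs.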
* Re: size of emacs executable after unicode merge 2008-10-31 16:44 ` Stefan Monnier @ 2008-11-04 23:09 ` Chong Yidong 2008-11-05 4:17 ` Kenichi Handa 0 siblings, 1 reply; 86+ messages in thread From: Chong Yidong @ 2008-11-04 23:09 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, emacs-devel, Dan Nicolaescu, Stefan Monnier, evilborisnet, jasonr Stefan Monnier <monnier@iro.umontreal.ca> writes: > Maybe simpler would be to build this table as we do now, then print it > into a file. Then the "dump" doesn't need to build the table, it can > just read it from the file. Handa-san, could you comment on this? Suppose we have a char-table on file that has the correct precomputed values for Vchar_unify_table. At which point should Emacs load it? In place of mule-conf.el in loadup.el? Or would we need to rewrite mule-conf.el? ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-04 23:09 ` Chong Yidong @ 2008-11-05 4:17 ` Kenichi Handa 2008-11-05 15:50 ` Stefan Monnier 2008-11-05 22:30 ` Richard M. Stallman 0 siblings, 2 replies; 86+ messages in thread From: Kenichi Handa @ 2008-11-05 4:17 UTC (permalink / raw) To: Chong Yidong Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet, jasonr

In article <873ai7t7fx.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > Maybe simpler would be to build this table as we do now, then print it
> > into a file. Then the "dump" doesn't need to build the table, it can
> > just read it from the file.

> Handa-san, could you comment on this?

I think it doesn't work. Emacs needs that table and the other
mapping char-tables to decode non-ascii characters in files
that are loaded before dumping.

> Suppose we have a char-table on file that has the correct precomputed
> values for Vchar_unify_table. At which point should Emacs load it? In
> place of mule-conf.el in loadup.el? Or would we need to rewrite
> mule-conf.el?

Once Emacs loads it before dumping, it occupies Emacs memory,
and whether it is freed or not before dumping is unpredictable.

If it is impossible to exclude garbage-collected data
(especially char-tables) from the dumped file, it seems that
the only way is not to build those char-tables. But it
requires rather heavy changes to files loaded before dumping.

Another way to avoid this problem is, I think, to have a
portable dumper.

---
Kenichi Handa
handa@ni.aist.go.jp

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-05 4:17 ` Kenichi Handa @ 2008-11-05 15:50 ` Stefan Monnier 2008-11-06 7:56 ` Kenichi Handa 2008-11-05 22:30 ` Richard M. Stallman 1 sibling, 1 reply; 86+ messages in thread From: Stefan Monnier @ 2008-11-05 15:50 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, Chong Yidong, emacs-devel, dann, evilborisnet, jasonr

>> > Maybe simpler would be to build this table as we do now, then print it
>> > into a file. Then the "dump" doesn't need to build the table, it can
>> > just read it from the file.
>> Handa-san, could you comment on this?
> I think it doesn't work. Emacs needs that table and the other
> mapping char-tables to decode non-ascii characters in files
> that are loaded before dumping.

IIUC Emacs only needs those tables when loading files that use some
particular encodings. E.g. it doesn't need it to read files using
Unicode encodings. So couldn't we make sure all the files read during
the dump are using utf-8 or utf-8-emacs? Isn't that already the case
for all *.elc files?

> If it is impossible to exclude garbage-collected data (especially
> char-tables) from the dumped file, it seems that the only way is not
> to build those char-tables. But it requires rather heavy changes to
> files loaded before dumping.

What kind of changes?

Stefan

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-05 15:50 ` Stefan Monnier @ 2008-11-06 7:56 ` Kenichi Handa 2008-11-08 2:42 ` Stefan Monnier 2008-11-09 20:14 ` Chong Yidong 0 siblings, 2 replies; 86+ messages in thread From: Kenichi Handa @ 2008-11-06 7:56 UTC (permalink / raw) To: Stefan Monnier Cc: rms, emanuele.giaquinta, cyd, emacs-devel, dann, evilborisnet, jasonr

In article <jwvskq6w4yi.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > I think it doesn't work. Emacs needs that table and the other
> > mapping char-tables to decode non-ascii characters in files
> > that are loaded before dumping.

> IIUC Emacs only needs those tables when loading files that use some
> particular encodings. E.g. it doesn't need it to read files using
> Unicode encodings.

Yes.

> So couldn't we make sure all the files read during
> the dump are using utf-8 or utf-8-emacs? Isn't that already the case
> for all *.elc files?

Not yet, but such a change of encoding is easy. The problem
is that lisp/international/characters.el sets up the syntax-table
and category-table for many characters with map-charset-chars.

Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)

To know which (Unicode) characters belong to
chinese-gb2312, Emacs has to load a mapping table.

> > If it is impossible to exclude garbage-collected data (especially
> > char-tables) from the dumped file, it seems that the only way is not
> > to build those char-tables. But it requires rather heavy changes to
> > files loaded before dumping.

> What kind of changes?

The biggest one is for characters.el, to solve the above
problem. The others are for files that use decode-char and
encode-char (e.g. japanese.el). We can't use them in
pre-loaded files.

---
Kenichi Handa
handa@ni.aist.go.jp

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-06 7:56 ` Kenichi Handa @ 2008-11-08 2:42 ` Stefan Monnier 2008-11-08 4:10 ` Chong Yidong 2008-11-08 10:30 ` Dan Nicolaescu 2008-11-09 20:14 ` Chong Yidong 1 sibling, 2 replies; 86+ messages in thread From: Stefan Monnier @ 2008-11-08 2:42 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, cyd, emacs-devel, dann, evilborisnet, jasonr >> So couldn't we make sure all the files read during the dump are using >> Utf-8 or utf-8-emacs? Isn't that already the case for all >> *.elc files? > Not yet, but such a change of encoding is easy. The problem > is that lisp/international/characters.el setups syntax-table > and category-table for many characters by map-charset-chars. > Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c) > To know which (Unicode) characters belongs to > chinese-gb2312, Emacs has to load a mapping table. I guess that means we need to precompute the category table and other tables populated in that way, so we can just load them during the dump without needing to populate the other maps. Stefan ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-08 2:42 ` Stefan Monnier @ 2008-11-08 4:10 ` Chong Yidong 2008-11-08 9:19 ` Eli Zaretskii 2008-11-09 0:27 ` Richard M. Stallman 2008-11-08 10:30 ` Dan Nicolaescu 1 sibling, 2 replies; 86+ messages in thread From: Chong Yidong @ 2008-11-08 4:10 UTC (permalink / raw) To: Stefan Monnier Cc: rms, emanuele.giaquinta, Kenichi Handa, emacs-devel, dann, evilborisnet, jasonr Stefan Monnier <monnier@iro.umontreal.ca> writes: >> The problem is that lisp/international/characters.el setups >> syntax-table and category-table for many characters by >> map-charset-chars. >> >> Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c) > >> To know which (Unicode) characters belongs to >> chinese-gb2312, Emacs has to load a mapping table. > > I guess that means we need to precompute the category table and other > tables populated in that way, so we can just load them during the dump > without needing to populate the other maps. That doesn't sound like it can be done within the timeframe for the 23.1 release. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-08 4:10 ` Chong Yidong @ 2008-11-08 9:19 ` Eli Zaretskii 2008-11-09 0:27 ` Richard M. Stallman 1 sibling, 0 replies; 86+ messages in thread From: Eli Zaretskii @ 2008-11-08 9:19 UTC (permalink / raw) To: Chong Yidong; +Cc: emacs-devel, monnier, evilborisnet, handa > From: Chong Yidong <cyd@stupidchicken.com> > Date: Fri, 07 Nov 2008 23:10:06 -0500 > Cc: rms@gnu.org, emanuele.giaquinta@gmail.com, Kenichi Handa <handa@m17n.org>, > emacs-devel@gnu.org, dann@ics.uci.edu, evilborisnet@netscape.net, > jasonr@gnu.org > > Stefan Monnier <monnier@iro.umontreal.ca> writes: > > >> The problem is that lisp/international/characters.el setups > >> syntax-table and category-table for many characters by > >> map-charset-chars. > >> > >> Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c) > > > >> To know which (Unicode) characters belongs to > >> chinese-gb2312, Emacs has to load a mapping table. > > > > I guess that means we need to precompute the category table and other > > tables populated in that way, so we can just load them during the dump > > without needing to populate the other maps. > > That doesn't sound like it can be done within the timeframe for the 23.1 > release. I'm not sure; it's not like 23.1 release date is tomorrow or next week. In any case, if we decide not to solve this for Emacs 23.1, we should have an entry in PROBLEMS about it. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-08 4:10 ` Chong Yidong 2008-11-08 9:19 ` Eli Zaretskii @ 2008-11-09 0:27 ` Richard M. Stallman 2008-11-09 6:29 ` Dan Nicolaescu 1 sibling, 1 reply; 86+ messages in thread From: Richard M. Stallman @ 2008-11-09 0:27 UTC (permalink / raw) To: Chong Yidong Cc: emanuele.giaquinta, handa, emacs-devel, dann, monnier, evilborisnet, jasonr > I guess that means we need to precompute the category table and other > tables populated in that way, so we can just load them during the dump > without needing to populate the other maps. That doesn't sound like it can be done within the timeframe for the 23.1 release. How big is the excess size caused by this problem? ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-09 0:27 ` Richard M. Stallman @ 2008-11-09 6:29 ` Dan Nicolaescu 2008-11-09 17:11 ` Richard M. Stallman 0 siblings, 1 reply; 86+ messages in thread From: Dan Nicolaescu @ 2008-11-09 6:29 UTC (permalink / raw) To: rms Cc: emanuele.giaquinta, handa, Chong Yidong, emacs-devel, monnier, evilborisnet, jasonr "Richard M. Stallman" <rms@gnu.org> writes: > > I guess that means we need to precompute the category table and other > > tables populated in that way, so we can just load them during the dump > > without needing to populate the other maps. > > That doesn't sound like it can be done within the timeframe for the 23.1 > release. > > How big is the excess size caused by this problem? The data segment size of the emacs binary went from 4MB before the unicode-2 merge to 11MB after the merge. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-09 6:29 ` Dan Nicolaescu @ 2008-11-09 17:11 ` Richard M. Stallman 2008-11-10 1:24 ` Stefan Monnier 0 siblings, 1 reply; 86+ messages in thread From: Richard M. Stallman @ 2008-11-09 17:11 UTC (permalink / raw) To: Dan Nicolaescu Cc: emanuele.giaquinta, handa, cyd, emacs-devel, monnier, evilborisnet, jasonr The data segment size of the emacs binary went from 4MB before the unicode-2 merge to 11MB after the merge. Fixing that is worth a delay. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-09 17:11 ` Richard M. Stallman @ 2008-11-10 1:24 ` Stefan Monnier 2008-11-10 1:55 ` Thomas Lord 0 siblings, 1 reply; 86+ messages in thread From: Stefan Monnier @ 2008-11-10 1:24 UTC (permalink / raw) To: rms Cc: emanuele.giaquinta, handa, cyd, emacs-devel, Dan Nicolaescu, evilborisnet, jasonr > The data segment size of the emacs binary went from 4MB before the > unicode-2 merge to 11MB after the merge. > Fixing that is worth a delay. Note: it's not clear how much of those 7MB we can recover. Some of them are undoubtedly actively needed. But, yes, clearly there are several MBs worth of "unnecessary data", some of it is apparently not only dead but even garbage collected. Stefan ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-10 1:24 ` Stefan Monnier @ 2008-11-10 1:55 ` Thomas Lord 2008-11-11 4:37 ` Chong Yidong 0 siblings, 1 reply; 86+ messages in thread From: Thomas Lord @ 2008-11-10 1:55 UTC (permalink / raw) To: Stefan Monnier Cc: rms, emanuele.giaquinta, handa, cyd, emacs-devel, Dan Nicolaescu, evilborisnet, jasonr On Sun, 2008-11-09 at 20:24 -0500, Stefan Monnier wrote: > > The data segment size of the emacs binary went from 4MB before the > > unicode-2 merge to 11MB after the merge. > > Fixing that is worth a delay. > > Note: it's not clear how much of those 7MB we can recover. Some of them > are undoubtedly actively needed. But, yes, clearly there are several > MBs worth of "unnecessary data", some of it is apparently not only dead > but even garbage collected. Are the tables in question represented as flat arrays that are "sparse" in the sense of mostly containing homogeneous sets of values at roughly page-sized chunks? If so, why not just compress them a bit (e.g., make them shallow trees). I've had very good results taking that approach in a past project... (In particular, I chose the maximum depth of the tree I was willing to tolerate then did a search to discover optimal (by my particular metric) branching factors and arranged for the sharing of identical sub-trees.) -t > > > Stefan > > ^ permalink raw reply [flat|nested] 86+ messages in thread
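The "shallow tree with shared identical sub-trees" suggestion can be illustrated with a two-level table: a top-level index whose entries point at fixed-size leaf blocks, with identical blocks stored only once. This is a generic sketch under stated assumptions, not a description of how Emacs char-tables are implemented. BLOCK_SIZE, the struct and all function names are hypothetical, and the duplicate search is a plain linear scan chosen for brevity.

```c
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 16   /* hypothetical leaf size; a real table would tune it */

struct two_level
{
  size_t n_blocks;   /* number of top-level entries */
  size_t n_unique;   /* number of distinct leaf blocks actually stored */
  size_t *index;     /* index[b] = which stored leaf covers block b */
  int *leaves;       /* n_unique * BLOCK_SIZE values */
};

/* Build T from FLAT, which holds N values; N must be a positive
   multiple of BLOCK_SIZE.  Return 0 on success, -1 on failure.  */
int
two_level_build (struct two_level *t, const int *flat, size_t n)
{
  if (n == 0 || n % BLOCK_SIZE != 0)
    return -1;

  t->n_blocks = n / BLOCK_SIZE;
  t->n_unique = 0;
  t->index = malloc (t->n_blocks * sizeof *t->index);
  t->leaves = malloc (n * sizeof *t->leaves);   /* worst case: no sharing */
  if (!t->index || !t->leaves)
    {
      free (t->index);
      free (t->leaves);
      return -1;
    }

  for (size_t b = 0; b < t->n_blocks; b++)
    {
      const int *src = flat + b * BLOCK_SIZE;
      size_t u;

      /* Look for an already stored leaf with the same contents.  */
      for (u = 0; u < t->n_unique; u++)
        if (memcmp (t->leaves + u * BLOCK_SIZE, src,
                    BLOCK_SIZE * sizeof *src) == 0)
          break;

      if (u == t->n_unique)
        {
          /* Not seen before: store a new leaf.  */
          memcpy (t->leaves + u * BLOCK_SIZE, src, BLOCK_SIZE * sizeof *src);
          t->n_unique++;
        }
      t->index[b] = u;
    }
  return 0;
}

/* Return the value stored for position I (I must be below the
   number of values the table was built from).  */
int
two_level_get (const struct two_level *t, size_t i)
{
  return t->leaves[t->index[i / BLOCK_SIZE] * BLOCK_SIZE + i % BLOCK_SIZE];
}

void
two_level_free (struct two_level *t)
{
  free (t->index);
  free (t->leaves);
}
```

For a table in which most blocks are filled with one and the same value, n_unique stays small, so the space actually needed is roughly the index plus a handful of leaves. The leaves buffer above is allocated for the worst case and could be shrunk with realloc once building is finished.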
* Re: size of emacs executable after unicode merge 2008-11-10 1:55 ` Thomas Lord @ 2008-11-11 4:37 ` Chong Yidong 0 siblings, 0 replies; 86+ messages in thread From: Chong Yidong @ 2008-11-11 4:37 UTC (permalink / raw) To: Thomas Lord Cc: rms, emanuele.giaquinta, handa, emacs-devel, Dan Nicolaescu, Stefan Monnier, evilborisnet, jasonr Thomas Lord <lord@emf.net> writes: > Are the tables in question represented as flat arrays that are "sparse" > in the sense of mostly containing homogeneous sets of values at roughly > page-sized chunks? If so, why not just compress them a bit (e.g., make > them shallow trees). We do employ a strategy of this sort, I think. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-08 2:42 ` Stefan Monnier 2008-11-08 4:10 ` Chong Yidong @ 2008-11-08 10:30 ` Dan Nicolaescu 1 sibling, 0 replies; 86+ messages in thread From: Dan Nicolaescu @ 2008-11-08 10:30 UTC (permalink / raw) To: Stefan Monnier Cc: rms, emanuele.giaquinta, Kenichi Handa, cyd, emacs-devel, evilborisnet, jasonr

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> >> So couldn't we make sure all the files read during the dump are using
> >> Utf-8 or utf-8-emacs? Isn't that already the case for all
> >> *.elc files?
>
> > Not yet, but such a change of encoding is easy. The problem
> > is that lisp/international/characters.el setups syntax-table
> > and category-table for many characters by map-charset-chars.
>
> > Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)
>
> > To know which (Unicode) characters belongs to
> > chinese-gb2312, Emacs has to load a mapping table.
>
> I guess that means we need to precompute the category table and other
> tables populated in that way, so we can just load them during the dump
> without needing to populate the other maps.

And precomputing should be easy: just prin1 the current value of Vchar_unify_table, put that value in a new file, check the new file in to CVS, and add code to load it at the right time. Then just disable the current code that computes Vchar_unify_table.

Is there any reason that the above would not work?

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-06 7:56 ` Kenichi Handa 2008-11-08 2:42 ` Stefan Monnier @ 2008-11-09 20:14 ` Chong Yidong 2008-11-10 1:59 ` Kenichi Handa 1 sibling, 1 reply; 86+ messages in thread From: Chong Yidong @ 2008-11-09 20:14 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, emacs-devel, dann, Stefan Monnier, evilborisnet, jasonr Kenichi Handa <handa@m17n.org> writes: > The problem is that lisp/international/characters.el setups > syntax-table and category-table for many characters by > map-charset-chars. > > Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c) > > To know which (Unicode) characters belongs to > chinese-gb2312, Emacs has to load a mapping table. Could you try to describe what needs to be done in more detail? That way, even if you don't have time to implement this, someone else might be able to take a stab at it. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-09 20:14 ` Chong Yidong @ 2008-11-10 1:59 ` Kenichi Handa 2008-11-10 15:18 ` Chong Yidong ` (2 more replies) 0 siblings, 3 replies; 86+ messages in thread From: Kenichi Handa @ 2008-11-10 1:59 UTC (permalink / raw) To: Chong Yidong Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet, jasonr

In article <87iqqwk672.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes:

> Kenichi Handa <handa@m17n.org> writes:
> > The problem is that lisp/international/characters.el setups
> > syntax-table and category-table for many characters by
> > map-charset-chars.
> >
> > Ex: (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)
> >
> > To know which (Unicode) characters belongs to
> > chinese-gb2312, Emacs has to load a mapping table.

> Could you try to describe what needs to be done in more detail? That
> way, even if you don't have time to implement this, someone else might
> be able to take a stab at it.

map-charset-chars calls FUNCTION (modify-category-entry in the above case) on all characters in CHARSET. But to know which characters belong to CHARSET (chinese-gb2312 in the above case), we must consult "etc/charsets/GB2312.map". Its contents are something like this:

0x2121-0x2123 0x3000
0x2124 0x30FB
0x2125 0x02C9
[...]

From this file, we know that #x3000, #x3001, #x3002, #x30FB, #x02C9, ... belong to chinese-gb2312.

We must find a way to make map-charset-chars work without loading that map into a char-table.

One idea is to have a single boolean vector of size #x110000 (139264 bytes) and set it up for CHARSET every time we call map-charset-chars for a different charset. In that vector, only the bits for #x3000, #x3001, #x3002, etc. are 1 for chinese-gb2312. Then map-charset-chars can know for which characters FUNCTION must be called.

---
Kenichi Handa
handa@ni.aist.go.jp

^ permalink raw reply [flat|nested] 86+ messages in thread
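The boolean-vector idea above can be sketched as follows. This is a minimal illustration, assuming the map has already been parsed into entries; `struct map_entry`, `charset_bits_setup`, `charset_bits_member`, `map_members` and `count_char` are hypothetical names, not functions from charset.c. One bit per character gives 0x110000 / 8 = 139264 bytes, matching the figure in the message.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

enum { MAX_CHAR = 0x110000 };

/* One bit per character: 0x110000 / 8 = 139264 bytes, reused for
   each charset in turn.  */
unsigned char charset_bits[MAX_CHAR / CHAR_BIT];

/* Hypothetical parsed map entry: code points FROM..TO map to
   consecutive characters starting at FIRST_CHAR.  */
struct map_entry
{
  unsigned from, to, first_char;
};

/* Clear the vector, then set the bit of every character in MAP.  */
void
charset_bits_setup (const struct map_entry *map, size_t n)
{
  memset (charset_bits, 0, sizeof charset_bits);
  for (size_t i = 0; i < n; i++)
    for (unsigned code = map[i].from; code <= map[i].to; code++)
      {
        unsigned c = map[i].first_char + (code - map[i].from);
        assert (c < MAX_CHAR);
        charset_bits[c / CHAR_BIT] |= 1u << (c % CHAR_BIT);
      }
}

/* Return nonzero if C belongs to the charset last set up.  */
int
charset_bits_member (unsigned c)
{
  return c < MAX_CHAR && ((charset_bits[c / CHAR_BIT] >> (c % CHAR_BIT)) & 1);
}

/* Call FUNC on every member, in character order.  */
void
map_members (void (*func) (unsigned c, void *arg), void *arg)
{
  for (unsigned c = 0; c < MAX_CHAR; c++)
    if (charset_bits_member (c))
      func (c, arg);
}

/* Example callback: count the characters visited.  */
void
count_char (unsigned c, void *arg)
{
  (void) c;
  ++*(unsigned *) arg;
}
```

Because the vector is cleared and refilled per charset, its cost stays fixed no matter how many charsets are mapped during loadup.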
* Re: size of emacs executable after unicode merge 2008-11-10 1:59 ` Kenichi Handa @ 2008-11-10 15:18 ` Chong Yidong 2008-11-10 23:18 ` Chong Yidong 2008-11-11 18:17 ` Chong Yidong 2 siblings, 0 replies; 86+ messages in thread From: Chong Yidong @ 2008-11-10 15:18 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet, jasonr

Kenichi Handa <handa@m17n.org> writes:

> map-charset-chars calls FUNCTION (modify-category-entry in
> the above case) on all characters in CHARSET. But, to know
> which characters belongs to CHARSET (chinese-gb2312 in the
> above case), we must consult with
> "etc/charsets/GB2312.map". The contents is something like
> this:
>
> 0x2121-0x2123 0x3000
> 0x2124 0x30FB
> 0x2125 0x02C9
> [...]
>
> From this file, we know that #x3000, #x3001, #x3002, #x30FB,
> #x02C9, ... belong to chinese-gb2312.
>
> We must find a way to make map-charset-chars work without
> loading that map into a char-table.
>
> One idea is to have a single boolean vector of size #x110000
> (139264 bytes), setup it for CHARSET everytime when we call
> map-charset-chars for the different charset. In that
> vector, only the bit for #x3000, #x3001, #x3002, etc are 1
> for chinese-gb2312. Then map-charset-chars can know for
> which characters FUNCTION must be called.

Good idea. But is this sufficient? Do you know whether map-charset-chars is the only thing during loadup that loads these char-tables?

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-10 1:59 ` Kenichi Handa 2008-11-10 15:18 ` Chong Yidong @ 2008-11-10 23:18 ` Chong Yidong 2008-11-11 18:17 ` Chong Yidong 2 siblings, 0 replies; 86+ messages in thread From: Chong Yidong @ 2008-11-10 23:18 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet, jasonr

One thing I've noticed from playing around with the code: a lot of the size increase comes from this code portion in map_charset_chars:

  if (CHARSET_UNIFIED_P (charset)
      && CHAR_TABLE_P (CHARSET_DEUNIFIER (charset)))
    {
      map_char_table_for_charset (c_function, function,
                                  CHARSET_DEUNIFIER (charset), arg,
                                  partial ? charset : NULL, from, to);
    }

Commenting this out reduces the size of the Emacs executable by 6M. But AFAICT map_char_table_for_charset doesn't create any new char-tables, so maybe our understanding of the problem isn't correct.

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-10 1:59 ` Kenichi Handa 2008-11-10 15:18 ` Chong Yidong 2008-11-10 23:18 ` Chong Yidong @ 2008-11-11 18:17 ` Chong Yidong 2008-11-12 6:26 ` Kenichi Handa 2 siblings, 1 reply; 86+ messages in thread From: Chong Yidong @ 2008-11-11 18:17 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet, jasonr

Okay, I did a bit more digging.

I think the increase in the size of the Emacs executable is not due to the allocation of char-tables. In particular, I've tried this suggestion:

> One idea is to have a single boolean vector of size #x110000
> (139264 bytes), setup it for CHARSET everytime when we call
> map-charset-chars for the different charset. In that
> vector, only the bit for #x3000, #x3001, #x3002, etc are 1
> for chinese-gb2312. Then map-charset-chars can know for
> which characters FUNCTION must be called.

but it appears to free a negligible amount of memory.

My current hypothesis is that it's the calls to modify-category-entry and modify-syntax-entry themselves that are responsible for the increase in space. The extra memory is in the standard-category-table and standard-syntax-table data structures.

OTOH, I don't see any easy way to reduce this data size. The only possibility I can think of is to set up a new mechanism to load-defer parts of the syntax and category tables, which doesn't seem feasible for the immediate future.

Any ideas?

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-11 18:17 ` Chong Yidong @ 2008-11-12 6:26 ` Kenichi Handa 2008-11-13 16:33 ` Chong Yidong 0 siblings, 1 reply; 86+ messages in thread From: Kenichi Handa @ 2008-11-12 6:26 UTC (permalink / raw) To: Chong Yidong Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet, jasonr

In article <873ahym8ji.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes:

> Okay, I did a bit more digging.

> I think the increase in the size of the Emacs executable is not due to
> the allocation of char-tables. In particular, I've tried this
> suggestion:

> > One idea is to have a single boolean vector of size #x110000
> > (139264 bytes), setup it for CHARSET everytime when we call
> > map-charset-chars for the different charset. In that
> > vector, only the bit for #x3000, #x3001, #x3002, etc are 1
> > for chinese-gb2312. Then map-charset-chars can know for
> > which characters FUNCTION must be called.

> but it appears to free a negligible amount of memory.

But that contradicts this report from Yamamoto-san:

> Anyway, an experiment on Mac OS X (*1) shows that clear-charset-maps
> followed by GC actually collects some amount of data in heap (~7MB),
> but they are not returned to the system, at least with its malloc
> implementation.

Did you comment out the calls to unify-charset in mule-conf.el and change the encoding of all preloaded *.el files to utf-8?

---
Kenichi Handa
handa@ni.aist.go.jp

PS. Yidong, it seems that a few mails I wrote in the last 24 hours didn't go out correctly. Did you get a mail about the comment on map_char_table_for_charset?

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-12 6:26 ` Kenichi Handa @ 2008-11-13 16:33 ` Chong Yidong 2008-11-14 0:48 ` Kenichi Handa 2008-11-27 11:20 ` Kenichi Handa 0 siblings, 2 replies; 86+ messages in thread From: Chong Yidong @ 2008-11-13 16:33 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet, jasonr

Kenichi Handa <handa@m17n.org> writes:

>> > One idea is to have a single boolean vector of size #x110000
>> > (139264 bytes), setup it for CHARSET everytime when we call
>> > map-charset-chars for the different charset. In that
>> > vector, only the bit for #x3000, #x3001, #x3002, etc are 1
>> > for chinese-gb2312. Then map-charset-chars can know for
>> > which characters FUNCTION must be called.
>
>> but it appears to free a negligible amount of memory.
>
> Did you comment out the calls of unify-charset in
> mule-conf.el and change the encoding of all preloaded *.el
> files to utf-8?

Commenting out the calls to unify-charset does reduce the memory by several megabytes.

But I'm not sure I understand your idea correctly. Here's the approach I tried. Do you think it is correct?

- Introduce a variable called inhibit-load-charsets-map, defaulting to nil.

- In loadup.el, let-bind inhibit-load-charsets-map to t when loading characters.el.

- In map_charset_chars, when inhibit-load-charsets-map is non-nil and the charset method is deferred, don't call load_charset and map_char_table_for_charset. Instead, call a new function map_charset_chars_from_file.

- The new function map_charset_chars_from_file reads the code points and character codes from the charset file and applies the function argument passed to map_charset_chars on those codes directly.

Would this work?

^ permalink raw reply [flat|nested] 86+ messages in thread
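The last step of the approach above (reading code points from the charset file and applying the function directly, without building a char-table) might look roughly like this. It is a sketch under assumptions, not the actual map_charset_chars_from_file: the names `map_line_apply`, `char_fn`, `struct seen` and `remember_char` are hypothetical, and the line shapes handled are only the two shown earlier in the thread ("0xFROM-0xTO 0xCHAR" and "0xCODE 0xCHAR").

```c
#include <assert.h>
#include <stdio.h>

typedef void (*char_fn) (unsigned c, void *arg);

/* Parse one map line of the shape "0xFROM-0xTO 0xCHAR" or
   "0xCODE 0xCHAR" and call FUNC on each resulting character.
   Return the number of characters visited, or -1 if the line
   does not parse.  */
int
map_line_apply (const char *line, char_fn func, void *arg)
{
  unsigned from, to, first;

  assert (line && func);
  if (sscanf (line, "%x-%x %x", &from, &to, &first) != 3)
    {
      if (sscanf (line, "%x %x", &from, &first) != 2)
        return -1;
      to = from;
    }
  /* Reject reversed or absurdly large ranges.  */
  if (to < from || to - from > 0x10FFFF)
    return -1;

  for (unsigned code = from; code <= to; code++)
    func (first + (code - from), arg);
  return (int) (to - from + 1);
}

/* Example callback: remember the first few characters visited.  */
struct seen
{
  unsigned chars[16];
  int n;
};

void
remember_char (unsigned c, void *arg)
{
  struct seen *s = arg;
  if (s->n < 16)
    s->chars[s->n++] = c;
}
```

Nothing is retained between lines, so no per-charset table ends up in the dumped image; the cost is re-reading the file each time the charset is mapped.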
* Re: size of emacs executable after unicode merge 2008-11-13 16:33 ` Chong Yidong @ 2008-11-14 0:48 ` Kenichi Handa 2008-11-27 11:20 ` Kenichi Handa 1 sibling, 0 replies; 86+ messages in thread From: Kenichi Handa @ 2008-11-14 0:48 UTC (permalink / raw) To: Chong Yidong Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet, jasonr

In article <87r65flh5n.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes:

> Commenting out the calls to unify-charset does reduce the memory by
> several megabytes.

> But I'm not sure I understand your idea correctly. Here's the approach
> I tried, do you think it is correct?

> - introduce a variable called inhibit-load-charsets-map, defaulting to
> nil

> - In loadup.el, let-bind inhibit-load-charsets-map to t when loading
> characters.el.

> - In map_charset_chars, when inhibit-load-charsets-map is non-nil, and
> the charset method is deferred, don't call load_charset and
> map_char_table_for_charset. Instead, call a new function
> map_charset_chars_from_file.

> - The new function map_charset_chars_from_file reads the code points and
> character codes from the charset file and applies the function
> argument passed to map_charset_chars on those codes directly.

> Would this work?

I think so, but perhaps it is better that we just set inhibit-load-charsets-map to t before loading mule-conf, and set it to nil just before dumping. And signal an error when it is found that loading a charset map is necessary while inhibit-load-charsets-map is non-nil.

And we also have to modify `unify-charset' not to call load_charset_map_from_file but just to set up Vchar_unify_table so that all characters in upper space that must be unified with Unicode have value `t'.

How about making a branch, say "reduce-dump-size", and committing your change so that I can check the code?

---
Kenichi Handa
handa@ni.aist.go.jp

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-13 16:33 ` Chong Yidong 2008-11-14 0:48 ` Kenichi Handa @ 2008-11-27 11:20 ` Kenichi Handa 2008-11-27 16:07 ` Chong Yidong ` (4 more replies) 1 sibling, 5 replies; 86+ messages in thread From: Kenichi Handa @ 2008-11-27 11:20 UTC (permalink / raw) To: Chong Yidong Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet, jasonr

In article <87r65flh5n.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes:

> Kenichi Handa <handa@m17n.org> writes:
>>> > One idea is to have a single boolean vector of size #x110000
>>> > (139264 bytes), setup it for CHARSET everytime when we call
>>> > map-charset-chars for the different charset. In that
>>> > vector, only the bit for #x3000, #x3001, #x3002, etc are 1
>>> > for chinese-gb2312. Then map-charset-chars can know for
>>> > which characters FUNCTION must be called.
> >
>>> but it appears to free a negligible amount of memory.
> >
> > Did you comment out the calls of unify-charset in
> > mule-conf.el and change the encoding of all preloaded *.el
> > files to utf-8?

> Commenting out the calls to unify-charset does reduce the memory by
> several megabytes.

After taking over Chong's experiment, I could reduce the size of the Emacs executable by about 7M bytes. About 4M bytes were actually because of charset mapping tables, and that could be reduced by setting up a C structure temp_charset_work (see charset.c for the details) instead of making many Lisp objects (char-tables and vectors). Another 3M bytes were because of the big standard category table. That could be reduced by hashing the table entries (see hash_get_category_set in category.c for the details).

As a result, the executable is now 10,671,313 bytes on GNU/Linux. It's still 1.6M bytes larger than Emacs 22, but I'm not sure it's worth making more effort to reduce it.

---
Kenichi Handa
handa@ni.aist.go.jp

^ permalink raw reply [flat|nested] 86+ messages in thread
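The category-table saving described above comes from hashing entries so that equal category sets are stored once. The general technique can be sketched as below. This is not the code of hash_get_category_set; `struct category_set`, `struct set_pool`, `hash_set` and `pool_intern` are hypothetical names, a set is assumed to fit in 128 bits, and the pool is a fixed-size open-addressing table that must start out zeroed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum
{
  SET_BYTES = 16,      /* assumed: 128 category bits per set */
  POOL_SLOTS = 1024    /* must be a power of two */
};

struct category_set
{
  uint8_t bits[SET_BYTES];
};

/* Hypothetical pool of canonical sets; zero-initialize before use.  */
struct set_pool
{
  struct category_set *slots[POOL_SLOTS];
  size_t count;
};

/* FNV-1a hash over the bytes of S.  */
static uint32_t
hash_set (const struct category_set *s)
{
  uint32_t h = 2166136261u;
  for (int i = 0; i < SET_BYTES; i++)
    h = (h ^ s->bits[i]) * 16777619u;
  return h;
}

/* Return the pool's canonical copy of SET, adding one if absent.
   Return NULL if the pool is full or memory runs out.  */
const struct category_set *
pool_intern (struct set_pool *pool, const struct category_set *set)
{
  assert (pool && set);
  uint32_t i = hash_set (set) & (POOL_SLOTS - 1);

  for (size_t probes = 0; probes < POOL_SLOTS; probes++)
    {
      if (!pool->slots[i])
        {
          struct category_set *copy = malloc (sizeof *copy);
          if (!copy)
            return NULL;
          *copy = *set;
          pool->slots[i] = copy;
          pool->count++;
          return copy;
        }
      if (memcmp (pool->slots[i]->bits, set->bits, SET_BYTES) == 0)
        return pool->slots[i];
      i = (i + 1) & (POOL_SLOTS - 1);   /* linear probing */
    }
  return NULL;
}
```

If table entries store the interned pointer rather than a private copy, thousands of characters with the same categories cost one set instead of thousands.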
* Re: size of emacs executable after unicode merge 2008-11-27 11:20 ` Kenichi Handa @ 2008-11-27 16:07 ` Chong Yidong 2008-11-27 16:12 ` Dan Nicolaescu ` (3 subsequent siblings) 4 siblings, 0 replies; 86+ messages in thread From: Chong Yidong @ 2008-11-27 16:07 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, emacs-devel, dann, monnier, evilborisnet, jasonr Kenichi Handa <handa@m17n.org> writes: > After taking over Chong's experiment, I could reduce the > size of Emacs executables about 7M bytes. About 4M bytes > were actually because of charset mapping tables, and it > could be reduced by setting up C structure temp_charset_work > (see charset.c for the detail) instead of making many Lisp > objects (char-table and vector). Another 3M bytes were > because of big standard category table. It could be reduced > by hashing the table entries (see hash_get_category_set in > category.c for the detail). > > As a result, now the executable is 10,671,313 bytes on > GNU/Linux. It's still 1.6M bytes larger than Emacs 22, but > I'm not sure it's worth making more effort to reduce it. Thank you. No problems here that I can see. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-27 11:20 ` Kenichi Handa 2008-11-27 16:07 ` Chong Yidong @ 2008-11-27 16:12 ` Dan Nicolaescu 2008-11-28 1:02 ` Kenichi Handa 2008-11-27 16:31 ` Stefan Monnier ` (2 subsequent siblings) 4 siblings, 1 reply; 86+ messages in thread From: Dan Nicolaescu @ 2008-11-27 16:12 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, Chong Yidong, emacs-devel, monnier, evilborisnet, jasonr Kenichi Handa <handa@m17n.org> writes: > In article <87r65flh5n.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes: > > > Kenichi Handa <handa@m17n.org> writes: > >>> > One idea is to have a single boolean vector of size #x110000 > >>> > (139264 bytes), setup it for CHARSET everytime when we call > >>> > map-charset-chars for the different charset. In that > >>> > vector, only the bit for #x3000, #x3001, #x3002, etc are 1 > >>> > for chinese-gb2312. Then map-charset-chars can know for > >>> > which characters FUNCTION must be called. > > > > >>> but it appears to free a negligible about of memory. > > > > > > Did you comment out the calls of unify-charset in > > > mule-conf.el and change the encoding of all preloaded *.el > > > files to utf-8? > > > Commenting out the calls to unify-charset does reduce the memory by > > several megabytes. > > After taking over Chong's experiment, I could reduce the > size of Emacs executables about 7M bytes. About 4M bytes > were actually because of charset mapping tables, and it > could be reduced by setting up C structure temp_charset_work > (see charset.c for the detail) instead of making many Lisp > objects (char-table and vector). Another 3M bytes were > because of big standard category table. It could be reduced > by hashing the table entries (see hash_get_category_set in > category.c for the detail). > > As a result, now the executable is 10,671,313 bytes on > GNU/Linux. Thanks for doing this! 
> It's still 1.6M bytes larger than Emacs 22, but I'm not sure it's > worth making more effort to reduce it. In that case that size increase might be with us for ever and ever, which is not ideal. Also a related question: could the data in the .map files in emacs/etc/charsets be transformed into elisp? That way the normal loading mechanism could be used for them, and no parser + other code would be needed... ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-27 16:12 ` Dan Nicolaescu @ 2008-11-28 1:02 ` Kenichi Handa 0 siblings, 0 replies; 86+ messages in thread From: Kenichi Handa @ 2008-11-28 1:02 UTC (permalink / raw) To: Dan Nicolaescu Cc: rms, emanuele.giaquinta, cyd, emacs-devel, monnier, evilborisnet, jasonr In article <200811271612.mARGCT3f021393@mothra.ics.uci.edu>, Dan Nicolaescu <dann@ics.uci.edu> writes: > Also a related question: the data in the .map files in > emacs/etc/charsets be transformed into elisp? That way the normal > loading mechanism could be used for them, and no parser + other code > would be needed... No. The key point of my change is not to make many/huge Lisp objects while temacs is running for dumping. And the parser code for the current *.map format is trivial. --- Kenichi Handa handa@ni.aist.go.jp ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-27 11:20 ` Kenichi Handa 2008-11-27 16:07 ` Chong Yidong 2008-11-27 16:12 ` Dan Nicolaescu @ 2008-11-27 16:31 ` Stefan Monnier 2008-11-27 20:17 ` Richard M Stallman 2008-11-28 16:11 ` Juanma Barranquero 4 siblings, 0 replies; 86+ messages in thread From: Stefan Monnier @ 2008-11-27 16:31 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, Chong Yidong, emacs-devel, dann, evilborisnet, jasonr > After taking over Chong's experiment, I could reduce the > size of Emacs executables about 7M bytes. About 4M bytes > were actually because of charset mapping tables, and it > could be reduced by setting up C structure temp_charset_work > (see charset.c for the detail) instead of making many Lisp > objects (char-table and vector). Another 3M bytes were > because of big standard category table. It could be reduced > by hashing the table entries (see hash_get_category_set in > category.c for the detail). > As a result, now the executable is 10,671,313 bytes on > GNU/Linux. It's still 1.6M bytes larger than Emacs 22, but > I'm not sure it's worth making more effort to reduce it. That's great to hear. 1.6MB of growth is not a feature, but I guess we can live with it for now. Stefan ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-27 11:20 ` Kenichi Handa ` (2 preceding siblings ...) 2008-11-27 16:31 ` Stefan Monnier @ 2008-11-27 20:17 ` Richard M Stallman 2008-11-27 20:42 ` Eli Zaretskii 2008-11-28 1:47 ` Kenichi Handa 2008-11-28 16:11 ` Juanma Barranquero 4 siblings, 2 replies; 86+ messages in thread From: Richard M Stallman @ 2008-11-27 20:17 UTC (permalink / raw) To: Kenichi Handa Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet, jasonr

    As a result, now the executable is 10,671,313 bytes on
    GNU/Linux. It's still 1.6M bytes larger than Emacs 22, but
    I'm not sure it's worth making more effort to reduce it.

Thanks for doing the work to shrink it.

It is no disaster to be 1.6 meg bigger. But I think it is worth some effort to try to find where that 1.6 meg goes, in order to see if there is an easy idea for substantially more savings. If there is an easy way, it will be worth doing.

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-27 20:17 ` Richard M Stallman @ 2008-11-27 20:42 ` Eli Zaretskii 2008-11-28 1:47 ` Kenichi Handa 1 sibling, 0 replies; 86+ messages in thread From: Eli Zaretskii @ 2008-11-27 20:42 UTC (permalink / raw) To: rms; +Cc: emacs-devel > From: Richard M Stallman <rms@gnu.org> > Date: Thu, 27 Nov 2008 15:17:49 -0500 > Cc: emanuele.giaquinta@gmail.com, cyd@stupidchicken.com, emacs-devel@gnu.org, > dann@ics.uci.edu, monnier@iro.umontreal.ca, > evilborisnet@netscape.net, jasonr@gnu.org > > As a result, now the executable is 10,671,313 bytes on > GNU/Linux. It's still 1.6M bytes larger than Emacs 22, but > I'm not sure it's worth making more effort to reduce it. > > Thanks forn doing the work to shrink it. > > It is no disaster to be 1.6 meg bigger. Btw, I think we should be comparing the sizes of .data section (and maybe some other similar sections), rather than the size of the executable file on disk. The latter typically includes debug info and also the code, which typically grows in later releases. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-27 20:17 ` Richard M Stallman 2008-11-27 20:42 ` Eli Zaretskii @ 2008-11-28 1:47 ` Kenichi Handa 2008-11-28 15:38 ` Richard M Stallman 1 sibling, 1 reply; 86+ messages in thread From: Kenichi Handa @ 2008-11-28 1:47 UTC (permalink / raw) To: rms Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet, jasonr

In article <E1L5nJ7-0001ge-I1@fencepost.gnu.org>, Richard M Stallman <rms@gnu.org> writes:

> As a result, now the executable is 10,671,313 bytes on
> GNU/Linux. It's still 1.6M bytes larger than Emacs 22, but
> I'm not sure it's worth making more effort to reduce it.

> Thanks for doing the work to shrink it.

> It is no disaster to be 1.6 meg bigger. But I think is worth some
> effort to try to find where that 1.6 meg goes in order to see if there
> is an easy idea for substantial more savings. If there is an easy way,
> it will be worth doing.

At least, temacs itself is 680K bytes larger. The new charset.c mallocs 260K bytes for temporary work. It is freed just before dumping, but it is uncertain whether it is actually returned to the system. In addition, many of the preloaded Lisp files have grown.

I guess the growth is not because of a single change but because of the summing up of many small changes.

Of course, anyone who has time to investigate it is welcome to.

---
Kenichi Handa
handa@ni.aist.go.jp

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-28 1:47 ` Kenichi Handa @ 2008-11-28 15:38 ` Richard M Stallman 2008-11-29 1:52 ` Kenichi Handa 0 siblings, 1 reply; 86+ messages in thread From: Richard M Stallman @ 2008-11-28 15:38 UTC (permalink / raw) To: Kenichi Handa Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet, jasonr I guess the growth is not because of single change but because of summing up of many small changes. Maybe that is true, or maybe one large change is responsible for .5 meg (say). This is an empirical question, and rather than guess, I am suggesting we find out the answer. If indeed this growth is due to many small changes then probably there is nothing to be done. However, if one thing has eaten .5 meg, maybe a few hours of work can eliminate that, and it would be worth trying. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-28 15:38 ` Richard M Stallman @ 2008-11-29 1:52 ` Kenichi Handa 2008-11-29 10:47 ` Eli Zaretskii 2008-11-29 19:43 ` Richard M Stallman 0 siblings, 2 replies; 86+ messages in thread From: Kenichi Handa @ 2008-11-29 1:52 UTC (permalink / raw) To: rms Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet, jasonr

In article <E1L65QO-0004ZH-Ht@fencepost.gnu.org>, Richard M Stallman <rms@gnu.org> writes:

> I guess the growth is not because of single change but
> because of summing up of many small changes.

> Maybe that is true, or maybe one large change is responsible for .5
> meg (say). This is an empirical question, and rather than guess,
> I am suggesting we find out the answer.

> If indeed this growth is due to many small changes then probably there
> is nothing to be done. However, if one thing has eaten .5 meg, maybe
> a few hours of work can eliminate that, and it would be worth trying.

If one finds a single cause that eats .5 meg, it may require just a few hours to fix it. But to find that cause, I expect one has to spend many hours. At least, I've already spent many hours, and have no idea how many more hours it will take to find one big cause (or to decide there's no such thing).

---
Kenichi Handa
handa@ni.aist.go.jp

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-29 1:52 ` Kenichi Handa @ 2008-11-29 10:47 ` Eli Zaretskii 2008-11-29 19:43 ` Richard M Stallman 1 sibling, 0 replies; 86+ messages in thread From: Eli Zaretskii @ 2008-11-29 10:47 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Sat, 29 Nov 2008 10:52:38 +0900
> Cc: emanuele.giaquinta@gmail.com, cyd@stupidchicken.com, emacs-devel@gnu.org,
> dann@ics.uci.edu, monnier@iro.umontreal.ca,
> evilborisnet@netscape.net, jasonr@gnu.org
>
> If one finds a reason of .5 meg eating, it may require just
> a few hours to fix it. But, to find that reason, I expect
> one has to spend many hours. At least, I've already spent
> many hours, and have no idea how many more hours to take to
> find one big reason (or to decide there's not such thing).

How about the following approach: run "temacs -batch -loadup dump" under strace (or a similar tool), and make a report of all the calls to memory-allocation services. Then explore all those calls that allocate significant amounts of memory that is never released later. (To ease the job of correlating memory allocation with packages being loaded, look at system calls that open and read *.el and *.elc files, immediately before memory allocation calls.)

Would this approach allow us to quickly zero in on the worst offenders? If so, perhaps someone here who is proficient with strace and system calls could work on preparing such a report, and then post it here?

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-29 1:52 ` Kenichi Handa 2008-11-29 10:47 ` Eli Zaretskii @ 2008-11-29 19:43 ` Richard M Stallman 2008-11-30 4:50 ` Chetan Pandya 1 sibling, 1 reply; 86+ messages in thread From: Richard M Stallman @ 2008-11-29 19:43 UTC (permalink / raw) To: Kenichi Handa Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet, jasonr If one finds a reason of .5 meg eating, it may require just a few hours to fix it. But, to find that reason, I expect one has to spend many hours. At least, I've already spent many hours, and have no idea how many more hours to take to find one big reason (or to decide there's not such thing). It is impossible to predetermine an answer to that. I think it is worth a few hours to LOOK for a single cause that has eaten much space. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-29 19:43 ` Richard M Stallman @ 2008-11-30 4:50 ` Chetan Pandya 0 siblings, 0 replies; 86+ messages in thread From: Chetan Pandya @ 2008-11-30 4:50 UTC (permalink / raw) To: emacs-devel

While not directly on this subject: looking at the isearch code, it looks like there is some dead code. Would it perhaps help to remove such code?

Chetan

Here is a diff against the 22.3 isearch.el, function isearch-other-meta-char:

diff -u "c:/src/emacs-22.3/lisp/isearch.org.el" "c:/src/emacs-22.3/lisp/isearch.el"
--- c:/src/emacs-22.3/lisp/isearch.org.el Tue Feb 05 10:49:45 2008
+++ c:/src/emacs-22.3/lisp/isearch.el Sun Nov 30 04:26:23 2008
@@ -1733,40 +1733,11 @@
                (goto-char isearch-point)))
            (isearch-update))
           (search-exit-option
-           (let (window)
-             (isearch-unread-key-sequence keylist)
-             (setq main-event (car unread-command-events))
-
-             ;; If we got a mouse click event, that event contains the
-             ;; window clicked on. maybe it was read with the buffer
-             ;; it was clicked on. If so, that buffer, not the current one,
-             ;; is in isearch mode. So end the search in that buffer.
-
-             ;; ??? I have no idea what this if checks for, but it's
-             ;; obviously wrong for the case that a down-mouse event
-             ;; on another window invokes this function. The event
-             ;; will contain the window clicked on and that window's
-             ;; buffer is certainly not always in Isearch mode.
-             ;;
-             ;; Leave the code in, but check for current buffer not
-             ;; being in Isearch mode for now, until someone tells
-             ;; what it's really supposed to do.
-             ;;
-             ;; --gerd 2001-08-10.
-
-             (if (and (not isearch-mode)
-                      (listp main-event)
-                      (setq window (posn-window (event-start main-event)))
-                      (windowp window)
-                      (or (> (minibuffer-depth) 0)
-                          (not (window-minibuffer-p window))))
-                 (save-excursion
-                   (set-buffer (window-buffer window))
-                   (isearch-done)
-                   (isearch-clean-overlays))
-               (isearch-done)
-               (isearch-clean-overlays)
-               (setq prefix-arg arg))))
+           (isearch-unread-key-sequence keylist)
+           (setq main-event (car unread-command-events))
+           (isearch-done)
+           (isearch-clean-overlays)
+           (setq prefix-arg arg))
           (t;; otherwise nil
            (isearch-process-search-string key key)))))

Diff finished. Sat Nov 29 20:34:35 2008

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-27 11:20 ` Kenichi Handa ` (3 preceding siblings ...) 2008-11-27 20:17 ` Richard M Stallman @ 2008-11-28 16:11 ` Juanma Barranquero 2008-11-29 1:47 ` Kenichi Handa 4 siblings, 1 reply; 86+ messages in thread From: Juanma Barranquero @ 2008-11-28 16:11 UTC (permalink / raw) To: Kenichi Handa; +Cc: Emacs developers On Thu, Nov 27, 2008 at 12:20, Kenichi Handa <handa@m17n.org> wrote: > After taking over Chong's experiment, I could reduce the > size of Emacs executables about 7M bytes. About 4M bytes > were actually because of charset mapping tables, and it > could be reduced by setting up C structure temp_charset_work Your big patch of 2008-11-27 has caused a change when decoding some filenames. I've detected it because I have a fix for bug#1433 that works before your changes, but does not work afterwards. I'm testing by adding the following line to Fw32_short_file_name: filename = ENCODE_FILE (filename); before the call to GetShortPathName. Before your changes, 1783 filename = ENCODE_FILE (filename); (gdb) p filename $5 = 52759907 (gdb) xstring $6 = (struct Lisp_String *) 0x3250d60 "c:/acci\303\263n/" (gdb) next 1786 if (GetShortPathName (SDATA (filename), shortname, MAX_PATH) == 0) (gdb) p filename $7 = 52759971 (gdb) xstring $8 = (struct Lisp_String *) 0x3250da0 "c:/acci\363n/" After your changes, 1783 filename = ENCODE_FILE (filename); (gdb) p filename $5 = 55776275 (gdb) xstring $6 = (struct Lisp_String *) 0x3531410 "c:/acci\303\263n/" (gdb) next 1786 if (GetShortPathName (SDATA (filename), shortname, MAX_PATH) == 0) (gdb) p filename $7 = 55776339 (gdb) xstring $8 = (struct Lisp_String *) 0x3531450 "c:/acci n/" Juanma ^ permalink raw reply [flat|nested] 86+ messages in thread
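Editorial note on the gdb session above: before ENCODE_FILE the string "c:/acci\303\263n/" holds the accented letter as the two UTF-8 bytes \303\263, and in the working build the encoded result "c:/acci\363n/" holds it as the single byte \363. The sketch below reproduces only that byte arithmetic. It assumes a Latin-1-style file-name coding system, which the thread does not state, and `utf8_to_latin1` is a hypothetical helper written for illustration, not a function from the Emacs sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not Emacs code: convert a NUL-terminated UTF-8
   string to Latin-1.  Only 1-byte and 2-byte UTF-8 sequences that decode
   to U+0000..U+00FF are accepted.  Returns the number of bytes written
   (excluding the terminating NUL), or -1 if the input cannot be
   represented or the destination buffer is too small.  */
int utf8_to_latin1(const unsigned char *src, unsigned char *dst, size_t dst_size)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (dst_size == 0)
        return -1;

    while (*src) {
        unsigned int cp;

        if (*src < 0x80) {
            /* Plain ASCII byte.  */
            cp = *src++;
        } else if ((*src & 0xE0) == 0xC0 && (src[1] & 0xC0) == 0x80) {
            /* Two-byte sequence: 110xxxxx 10yyyyyy -> xxxxxyyyyyy.  */
            cp = ((unsigned int)(*src & 0x1F) << 6) | (src[1] & 0x3F);
            src += 2;
        } else {
            return -1;              /* longer or malformed sequence */
        }

        /* Leave room for the terminating NUL.  */
        if (cp > 0xFF || out + 1 >= dst_size)
            return -1;
        dst[out++] = (unsigned char)cp;
    }
    dst[out] = '\0';
    return (int)out;
}
```

Calling it on the bytes shown by `xstring` turns \303\263 (0xC3 0xB3) into \363 (0xF3), which matches the "before your changes" output; the "c:/acci n/" result in the second session is what the regression produced instead.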
* Re: size of emacs executable after unicode merge 2008-11-28 16:11 ` Juanma Barranquero @ 2008-11-29 1:47 ` Kenichi Handa 2008-11-29 11:13 ` Juanma Barranquero 0 siblings, 1 reply; 86+ messages in thread From: Kenichi Handa @ 2008-11-29 1:47 UTC (permalink / raw) To: Juanma Barranquero; +Cc: emacs-devel In article <f7ccd24b0811280811k5996b24dla2a1692af64da2c9@mail.gmail.com>, "Juanma Barranquero" <lekktu@gmail.com> writes: > Your big patch of 2008-11-27 has caused a change when decoding some > filenames. I've detected it because I have a fix for bug#1433 that > works before your changes, but does not work afterwards. I've just installed fixes for my silly mistakes. Could you please try again? --- Kenichi Handa handa@ni.aist.go.jp > I'm testing by adding the following line to Fw32_short_file_name: > filename = ENCODE_FILE (filename); > before the call to GetShortPathName. > Before your changes, > 1783 filename = ENCODE_FILE (filename); > (gdb) p filename > $5 = 52759907 > (gdb) xstring > $6 = (struct Lisp_String *) 0x3250d60 > "c:/acci\303\263n/" > (gdb) next > 1786 if (GetShortPathName (SDATA (filename), shortname, MAX_PATH) == 0) > (gdb) p filename > $7 = 52759971 > (gdb) xstring > $8 = (struct Lisp_String *) 0x3250da0 > "c:/acci\363n/" > After your changes, > 1783 filename = ENCODE_FILE (filename); > (gdb) p filename > $5 = 55776275 > (gdb) xstring > $6 = (struct Lisp_String *) 0x3531410 > "c:/acci\303\263n/" > (gdb) next > 1786 if (GetShortPathName (SDATA (filename), shortname, MAX_PATH) == 0) > (gdb) p filename > $7 = 55776339 > (gdb) xstring > $8 = (struct Lisp_String *) 0x3531450 > "c:/acci n/" > Juanma ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-29 1:47 ` Kenichi Handa @ 2008-11-29 11:13 ` Juanma Barranquero 2008-11-29 12:17 ` Juanma Barranquero 0 siblings, 1 reply; 86+ messages in thread From: Juanma Barranquero @ 2008-11-29 11:13 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel On Sat, Nov 29, 2008 at 02:47, Kenichi Handa <handa@m17n.org> wrote: > I've just installed fixes for my silly mistakes. Could you > please try again? Yes, it works now. Thanks, Juanma ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-29 11:13 ` Juanma Barranquero @ 2008-11-29 12:17 ` Juanma Barranquero 2008-11-29 13:50 ` Kenichi Handa 0 siblings, 1 reply; 86+ messages in thread From: Juanma Barranquero @ 2008-11-29 12:17 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel On Sat, Nov 29, 2008 at 12:13, Juanma Barranquero <lekktu@gmail.com> wrote: > Yes, it works now. I spoke too soon. Your patch fixes the problem, but I now I'm seeing an assertion failure when writing some files (most notably, ChangeLogs). That is 100% repeatable: I try to write src/ChangeLog, for example, and get the assertion failure every time. The CHAR_TABLE_P call at the start of char_table_ref is getting a null `table'. See the attached debug trace at the point where `table' is Qnil. Juanma Breakpoint 16, char_table_ref (table=48941057, c=228) at chartab.c:197 197 struct Lisp_Char_Table *tbl = XCHAR_TABLE (table); (gdb) p table $43 = 48941057 (gdb) pr nil (gdb) bt #0 char_table_ref (table=48941057, c=228) at chartab.c:197 #1 0x01174a88 in char_encodable_p (c=228, attrs=53025028) at coding.c:8068 #2 0x0117195d in Ffind_coding_systems_region_internal (start=8, end=4687040, exclude=48941057) at coding.c:8164 #3 0x0103f05b in Ffuncall (nargs=3, args=0x82dfb0) at eval.c:3050 #4 0x012173ad in Fbyte_code (bytestr=20863691, vector=20863748, maxdepth=32) at bytecode.c:678 #5 0x0103fc28 in funcall_lambda (fun=20863644, nargs=2, arg_vector=0x82e2c4) at eval.c:3231 #6 0x0103f484 in Ffuncall (nargs=3, args=0x82e2c0) at eval.c:3090 #7 0x012173ad in Fbyte_code (bytestr=20867659, vector=20867956, maxdepth=72) at bytecode.c:678 #8 0x0103fc28 in funcall_lambda (fun=20867580, nargs=5, arg_vector=0x82e604) at eval.c:3231 #9 0x0103f484 in Ffuncall (nargs=6, args=0x82e600) at eval.c:3090 #10 0x0103e626 in call5 (fn=49529857, arg1=8, arg2=4687040, arg3=49221305, arg4=48941057, arg5=53565683) at eval.c:2919 #11 0x010cafe0 in choose_write_coding_system (start=8, end=4687040, 
filename=53565683, append=48941057, visit=48941105, lockname=53566403, coding=0x82e6f0) at fileio.c:4348 #12 0x010cb392 in Fwrite_region (start=8, end=4687040, filename=53565683, append=48941057, visit=48941105, lockname=53566403, mustbenew=48941057) at fileio.c:4520 #13 0x0103f33c in Ffuncall (nargs=7, args=0x82eb80) at eval.c:3068 #14 0x012173ad in Fbyte_code (bytestr=20586915, vector=20587124, maxdepth=56) at bytecode.c:678 #15 0x0103fc28 in funcall_lambda (fun=20586892, nargs=0, arg_vector=0x82eea4) at eval.c:3231 #16 0x0103f484 in Ffuncall (nargs=1, args=0x82eea0) at eval.c:3090 #17 0x012173ad in Fbyte_code (bytestr=20586731, vector=20586780, maxdepth=16) at bytecode.c:678 #18 0x0103fc28 in funcall_lambda (fun=20586708, nargs=0, arg_vector=0x82f1b4) at eval.c:3231 #19 0x0103f484 in Ffuncall (nargs=1, args=0x82f1b0) at eval.c:3090 #20 0x012173ad in Fbyte_code (bytestr=20585923, vector=20586180, maxdepth=40) at bytecode.c:678 #21 0x0103fc28 in funcall_lambda (fun=20585892, nargs=0, arg_vector=0x82f4d4) at eval.c:3231 #22 0x0103f484 in Ffuncall (nargs=1, args=0x82f4d0) at eval.c:3090 #23 0x012173ad in Fbyte_code (bytestr=20585267, vector=20585364, maxdepth=40) at bytecode.c:678 #24 0x0103fc28 in funcall_lambda (fun=20585220, nargs=1, arg_vector=0x82f834) at eval.c:3231 #25 0x0103f484 in Ffuncall (nargs=2, args=0x82f830) at eval.c:3090 #26 0x0121ce62 in Fcall_interactively (function=49156481, record_flag=48941057, keys=48974596) at callint.c:857 #27 0x0103f05b in Ffuncall (nargs=4, args=0x82faf8) at eval.c:3050 #28 0x0103e598 in call3 (fn=49132665, arg1=49156481, arg2=48941057, arg3=48941057) at eval.c:2870 #29 0x01024ca1 in Fcommand_execute (cmd=49156481, record_flag=48941057, keys=48941057, special=48941057) at keyboard.c:10333 #30 0x0100a93c in command_loop_1 () at keyboard.c:1880 #31 0x0103b136 in internal_condition_case (bfun=0x1007235 <command_loop_1>, handlers=49004809, hfun=0x10069b4 <cmd_error>) at eval.c:1511 #32 0x01006e44 in command_loop_2 () at 
keyboard.c:1338 #33 0x0103ab8f in internal_catch (tag=49000881, func=0x1006e24 <command_loop_2>, arg=48941057) at eval.c:1247 #34 0x01006dfb in command_loop () at keyboard.c:1317 #35 0x0100610b in recursive_edit_1 () at keyboard.c:942 #36 0x010065e0 in Frecursive_edit () at keyboard.c:1004 #37 0x01002a71 in main (argc=2, argv=0xa92748) at emacs.c:1777 Lisp Backtrace: "find-coding-systems-region-internal" (0x82dfb4) "find-coding-systems-region" (0x82e2c4) "select-safe-coding-system" (0x82e604) "write-region" (0x82eb84) "basic-save-buffer-2" (0x82eea4) "basic-save-buffer-1" (0x82f1b4) "basic-save-buffer" (0x82f4d4) "save-buffer" (0x82f834) "call-interactively" (0x82fafc) (gdb) frame 1 #1 0x01174a88 in char_encodable_p (c=228, attrs=53025028) at coding.c:8068 8068 if (CHAR_CHARSET_P (c, charset)) (gdb) p attrs $44 = 53025028 (gdb) pr [chinese-gb18030 nil 99 charset (0 157 158 159 160 161) t nil nil nil nil nil 32 nil (:ascii-compatible-p t :category coding-category-charset :name chinese-gb18030 :docstring "GB18030 encoding for Chinese (MIME:GB18030)." 
:coding-type charset :mnemonic 99 :cha rset-list (ascii gb18030-2-byte gb18030-4-byte-bmp gb18030-4-byte-smp gb18030-4-byte-ext-1 gb18030-4-byte-ext-2) :mime-charset gb1 8030) 14 " + " [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 nil (157 158) (157 158) (157 158) (157 158 160) (157 160) (157 160) (157 160) (157 160) (157 160) (157 160) (157 160) (157 160) (157 160) (157 160) (157 160) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159) (157 159 1 61) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 1 61) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 161) (157 1 61) (157 161) nil] nil nil nil nil nil nil nil nil nil nil] ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-29 12:17 ` Juanma Barranquero @ 2008-11-29 13:50 ` Kenichi Handa 2008-11-29 15:05 ` Juanma Barranquero 0 siblings, 1 reply; 86+ messages in thread From: Kenichi Handa @ 2008-11-29 13:50 UTC (permalink / raw) To: Juanma Barranquero; +Cc: emacs-devel In article <f7ccd24b0811290417q289c695en132c5916631eccec@mail.gmail.com>, "Juanma Barranquero" <lekktu@gmail.com> writes: > Your patch fixes the problem, but I now I'm seeing an assertion > failure when writing some files (most notably, ChangeLogs). That is > 100% repeatable: I try to write src/ChangeLog, for example, and get > the assertion failure every time. > The CHAR_TABLE_P call at the start of char_table_ref is getting a null `table'. > See the attached debug trace at the point where `table' is Qnil. I've just installed a fix. Please try again. --- Kenichi Handa handa@ni.aist.go.jp ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-29 13:50 ` Kenichi Handa @ 2008-11-29 15:05 ` Juanma Barranquero 0 siblings, 0 replies; 86+ messages in thread From: Juanma Barranquero @ 2008-11-29 15:05 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel On Sat, Nov 29, 2008 at 14:50, Kenichi Handa <handa@m17n.org> wrote: > I've just installed a fix. Please try again. It works now. Thanks, Juanma ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-05 4:17 ` Kenichi Handa 2008-11-05 15:50 ` Stefan Monnier @ 2008-11-05 22:30 ` Richard M. Stallman 2008-11-06 11:58 ` Kenichi Handa 1 sibling, 1 reply; 86+ messages in thread From: Richard M. Stallman @ 2008-11-05 22:30 UTC (permalink / raw) To: Kenichi Handa Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet, jasonr > > Maybe simpler would be to build this table as we do now, then print it > > into a file. Then the "dump" doesn't need to build the table, it can > > just read it from the file. > Handa-san, could you comment on this? I think it doesn't work. Emacs needs that table and the other mapping char-tables to decode non-ascii characters in files that are loaded before dumping. Any table which is needed by Emacs to read those files will inevitably be included in the dumped Emacs. Can we arrange to autoload (later) the files which need these tables? Another way to avoid this problem is, I think, to have a portable dumper. Do you mean a portable replacement for unexelf and friends? How would that help this? I don't see the connection. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-05 22:30 ` Richard M. Stallman @ 2008-11-06 11:58 ` Kenichi Handa 2008-11-07 12:39 ` Richard M. Stallman 0 siblings, 1 reply; 86+ messages in thread From: Kenichi Handa @ 2008-11-06 11:58 UTC (permalink / raw) To: rms Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet, jasonr In article <E1Kxqtd-0002y0-Hq@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes: > Another way to avoid this problem is, I think, to have a > portable dumper. > Do you mean a portable replacement for unexelf and friends? Yes. > How would that help this? > I don't see the connection. To my understanding, the portable dumper implemented for Emacs relocates Lisp objects to make the dump file (the file containing Lisp objects that are reloaded at startup time) as small as possible. That means the garbage data are excluded from the dump file. --- Kenichi Handa handa@ni.aist.go.jp ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-06 11:58 ` Kenichi Handa @ 2008-11-07 12:39 ` Richard M. Stallman 2008-11-07 13:29 ` Stephen J. Turnbull 0 siblings, 1 reply; 86+ messages in thread From: Richard M. Stallman @ 2008-11-07 12:39 UTC (permalink / raw) To: Kenichi Handa Cc: emanuele.giaquinta, cyd, emacs-devel, dann, monnier, evilborisnet, jasonr To my understanding, the portable dumper implemented for Emacs relocates Lisp objects to make the dump file (the file containing Lisp objects that are reloaded at startup time) as small as possible. That means the garbage data are excluded from the dump file. I see. You mean a relocating dumper. Yes, that could help. But that alone does not guarantee success. The dumper also needs to dump the non-Lisp memory blocks that Emacs allocates. They are scattered among the memory blocks that contain Lisp objects, and they can't be relocated in this way. Thus, the total memory size can only be reduced down to the last non-Lisp memory block. That is likely to eliminate most of the benefit. Perhaps with some other special scheme to segregate Lisp blocks and non-Lisp blocks into two special heaps, this problem could be avoided. ^ permalink raw reply [flat|nested] 86+ messages in thread
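Editorial note: the limit described in the message above can be pictured with a toy heap layout. The sketch below is not Emacs code; `struct block`, `live_bytes`, and `pinned_floor` are hypothetical names, and the model assumes simply that Lisp blocks can be moved while non-Lisp blocks must stay at their original offsets. Under that assumption the image cannot end before the end of the last non-Lisp block, however little live data there is.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a heap as a sequence of adjacent blocks.  */
enum block_kind { BLOCK_LISP, BLOCK_NON_LISP, BLOCK_FREE };

struct block {
    enum block_kind kind;
    size_t size;            /* bytes occupied by this block */
};

/* Total bytes that actually have to be kept (Lisp plus non-Lisp).  */
size_t live_bytes(const struct block *blocks, size_t n)
{
    size_t total = 0;

    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (blocks[i].kind != BLOCK_FREE)
            total += blocks[i].size;
    return total;
}

/* End offset of the last non-Lisp block.  If such blocks cannot be
   relocated, the dumped image cannot be shorter than this.  */
size_t pinned_floor(const struct block *blocks, size_t n)
{
    size_t offset = 0, floor = 0;

    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        offset += blocks[i].size;
        if (blocks[i].kind == BLOCK_NON_LISP)
            floor = offset;
    }
    return floor;
}
```

For a layout of 100 bytes of Lisp data, 400 free bytes, a 10-byte non-Lisp block, 300 free bytes, and 50 more bytes of Lisp data, `live_bytes` gives 160 while `pinned_floor` gives 510. Segregating the two kinds of block into separate heaps, as suggested at the end of the message, would in this model put the non-Lisp block first and bring the floor down to its own size.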
* Re: size of emacs executable after unicode merge 2008-11-07 12:39 ` Richard M. Stallman @ 2008-11-07 13:29 ` Stephen J. Turnbull 2008-11-07 21:15 ` Richard M. Stallman 0 siblings, 1 reply; 86+ messages in thread From: Stephen J. Turnbull @ 2008-11-07 13:29 UTC (permalink / raw) To: rms Cc: emanuele.giaquinta, Kenichi Handa, cyd, emacs-devel, dann, monnier, evilborisnet, jasonr Richard M. Stallman writes: > The dumper also needs to dump the non-Lisp memory blocks that > Emacs allocates. They are scattered among the memory blocks that > contain Lisp objects, and they can't be relocated in this way. Maybe not at present in Emacs, but as far as I know those blocks *are* relocated by XEmacs's portable dumper. The people to ask are Olivier Galibert (olivier.galibert@xemacs.org) and Marcus Crestani (crestani@xemacs.org). Ben Wing and Kyle Jones also have done a lot of work related to the portable dumper in XEmacs, but they've been inactive for years. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-07 13:29 ` Stephen J. Turnbull @ 2008-11-07 21:15 ` Richard M. Stallman 2008-11-08 4:00 ` Stephen J. Turnbull 2008-11-08 4:19 ` Stefan Monnier 0 siblings, 2 replies; 86+ messages in thread From: Richard M. Stallman @ 2008-11-07 21:15 UTC (permalink / raw) To: Stephen J. Turnbull Cc: emanuele.giaquinta, handa, cyd, emacs-devel, dann, monnier, evilborisnet, jasonr Maybe not at present in Emacs, but as far as I know those blocks *are* relocated by XEmacs's portable dumper. How is it possible to relocate all blocks of memory? The C library allocates some, and Emacs can't tell where the pointers to them are kept. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-07 21:15 ` Richard M. Stallman @ 2008-11-08 4:00 ` Stephen J. Turnbull 2008-11-08 4:19 ` Stefan Monnier 1 sibling, 0 replies; 86+ messages in thread From: Stephen J. Turnbull @ 2008-11-08 4:00 UTC (permalink / raw) To: rms Cc: emanuele.giaquinta, handa, cyd, emacs-devel, dann, monnier, evilborisnet, jasonr Richard M. Stallman writes: > Maybe not at present in Emacs, but as far as I know those blocks *are* > relocated by XEmacs's portable dumper. > > The C library allocates some, and Emacs can't tell where the > pointers to them are kept. So what? The XEmacs portable dumper does not unexec, so those are lost anyway AIUI. > How is it possible to relocate all blocks of memory? Why ask me? I already said I don't know, and gave names and addresses of those who should be able to give useful answers about whether the XEmacs technology leads to recovery of memory in practice, or could. Please ask the experts. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-07 21:15 ` Richard M. Stallman 2008-11-08 4:00 ` Stephen J. Turnbull @ 2008-11-08 4:19 ` Stefan Monnier 1 sibling, 0 replies; 86+ messages in thread From: Stefan Monnier @ 2008-11-08 4:19 UTC (permalink / raw) To: rms Cc: emanuele.giaquinta, handa, cyd, emacs-devel, dann, evilborisnet, Stephen J. Turnbull, jasonr > Maybe not at present in Emacs, but as far as I know those blocks *are* > relocated by XEmacs's portable dumper. > How is it possible to relocate all blocks of memory? > The C library allocates some, and Emacs can't tell where the pointers > to them are kept. Maybe the source code needs to be changed to keep track of them. Kind of like the GCPRO thingies, maybe? Stefan ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-10-31 5:29 ` Kenichi Handa 2008-10-31 6:32 ` Chong Yidong 2008-10-31 15:07 ` Dan Nicolaescu @ 2008-10-31 19:30 ` Richard M. Stallman 2008-11-09 22:43 ` Chong Yidong 3 siblings, 0 replies; 86+ messages in thread From: Richard M. Stallman @ 2008-10-31 19:30 UTC (permalink / raw) To: Kenichi Handa; +Cc: jasonr, dann, evilborisnet, emanuele.giaquinta, emacs-devel Then, garbage-collect is called. After that, the living char-table is Vchar_unify_table only, and the contents is not that big because it maps upper area characters to charset, and each charset has linear upper area, thus most succeeding characters have the same value. GC ought to free the space that was occupied by those tables. However, freeing the space does not necessarily reduce the size of the Emacs memory. It just puts freed objects in the free list. For instance, if the maps use cons cells, they will go on the free list but the memory will still be in the Emacs image. Maybe what is needed is to run a separate process to build the desired value for char-unify-table and store it in a file, and load just that file during building `emacs'. ^ permalink raw reply [flat|nested] 86+ messages in thread
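Editorial note: the point about free lists can be shown with a very small allocator. This is a sketch under stated assumptions, not the Emacs allocator: `struct heap`, `heap_alloc`, `heap_free`, and `CELLS_PER_BLOCK` are hypothetical names, and the only behavior modeled is that freed cells go back on a free list while the blocks obtained from `malloc` are never returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CELLS_PER_BLOCK 64      /* hypothetical block granularity */

/* A cell is only a free-list link in this toy; real cons cells would
   carry car and cdr fields.  */
struct cell {
    struct cell *next;
};

struct heap {
    struct cell *free_list;     /* cells available for reuse */
    size_t bytes_reserved;      /* memory obtained from malloc so far */
    size_t cells_in_use;        /* cells currently handed out */
};

/* Take a cell from the free list, growing the heap by one block when
   the list is empty.  Returns NULL if malloc fails.  */
struct cell *heap_alloc(struct heap *h)
{
    assert(h != NULL);
    if (h->free_list == NULL) {
        struct cell *block = malloc(CELLS_PER_BLOCK * sizeof *block);
        if (block == NULL)
            return NULL;
        for (size_t i = 0; i < CELLS_PER_BLOCK; i++) {
            block[i].next = h->free_list;
            h->free_list = &block[i];
        }
        h->bytes_reserved += CELLS_PER_BLOCK * sizeof *block;
    }
    struct cell *c = h->free_list;
    h->free_list = c->next;
    h->cells_in_use++;
    return c;
}

/* "Freeing" a cell only pushes it onto the free list.  The block it
   lives in stays reserved, so bytes_reserved never goes down.  */
void heap_free(struct heap *h, struct cell *c)
{
    assert(h != NULL && c != NULL);
    c->next = h->free_list;
    h->free_list = c;
    h->cells_in_use--;
}
```

Allocating many cells and then freeing all of them leaves `cells_in_use` at zero but `bytes_reserved` at its peak, which is the situation the message describes for an image dumped after garbage collection.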
* Re: size of emacs executable after unicode merge 2008-10-31 5:29 ` Kenichi Handa ` (2 preceding siblings ...) 2008-10-31 19:30 ` Richard M. Stallman @ 2008-11-09 22:43 ` Chong Yidong 2008-11-09 22:57 ` Chong Yidong 2008-11-10 1:26 ` Kenichi Handa 3 siblings, 2 replies; 86+ messages in thread From: Chong Yidong @ 2008-11-09 22:43 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, emacs-devel, dann, evilborisnet, jasonr Kenichi Handa <handa@m17n.org> writes: > (2) Just before dumping, clear-charset-maps is called. This > function sets all char-tables built in (1) (except for > Vchar_unified_charset_table) to nil. Then set > Vchar_unify_table to Vchar_unified_charset_table, and > set Vchar_unified_charset_table to nil. > > Then, garbage-collect is called. After that, the living > char-table is Vchar_unify_table only, and the contents > is not that big because it maps upper area characters to > charset, and each charset has linear upper area, thus > most succeeding characters have the same value. > > (3) When the dumped Emacs runs, at the time of > decoding/encoding charsets that are unified as above, by > checking if the value of Vchar_unify_table for a > character is symbol or not, Emacs knows whether it has > to load the mapping table again or not. I was looking through charset.c and character.c, and it seems to me that Vchar_unify_table is not used by anything. Its contents are altered at several places in the code, but it looks like it is never consulted for anything. What am I missing? ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-09 22:43 ` Chong Yidong @ 2008-11-09 22:57 ` Chong Yidong 2008-11-10 1:28 ` Kenichi Handa 2008-11-10 1:26 ` Kenichi Handa 1 sibling, 1 reply; 86+ messages in thread From: Chong Yidong @ 2008-11-09 22:57 UTC (permalink / raw) To: Kenichi Handa Cc: rms, emanuele.giaquinta, jasonr, dann, evilborisnet, emacs-devel Chong Yidong <cyd@stupidchicken.com> writes: > I was looking through charset.c and character.c, and it seems to me that > Vchar_unify_table is not used by anything. Its contents are altered at > several places in the code, but it looks like it is never consulted for > anything. What am I missing? Similarly, Vchar_unified_charset_table seems to be not consulted anywhere in the code. Removing these two variables from the C code seems to make no difference to Emacs as far as I can tell (other than shrinking the size of the executable by 2M). Could it be that this part: >> (3) When the dumped Emacs runs, at the time of >> decoding/encoding charsets that are unified as above, by >> checking if the value of Vchar_unify_table for a >> character is symbol or not, Emacs knows whether it has >> to load the mapping table again or not. has not been implemented? ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-09 22:57 ` Chong Yidong @ 2008-11-10 1:28 ` Kenichi Handa 2008-11-10 19:29 ` Richard M. Stallman 0 siblings, 1 reply; 86+ messages in thread From: Kenichi Handa @ 2008-11-10 1:28 UTC (permalink / raw) To: Chong Yidong Cc: rms, emanuele.giaquinta, emacs-devel, dann, evilborisnet, jasonr In article <87wsfco6cx.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes: > Similarly, Vchar_unified_charset_table seems to be not consulted > anywhere in the code. Removing these two variables from the C code > seems to make no difference to Emacs as far as I can tell (other than > shrinking the size of the executable by 2M). > Could it be that this part: >>> (3) When the dumped Emacs runs, at the time of >>> decoding/encoding charsets that are unified as above, by >>> checking if the value of Vchar_unify_table for a >>> character is symbol or not, Emacs knows whether it has >>> to load the mapping table again or not. > has not been implemented? It's implemented in MAYBE_UNIFY_CHAR (in character.h). --- Kenichi Handa handa@ni.aist.go.jp ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-10 1:28 ` Kenichi Handa @ 2008-11-10 19:29 ` Richard M. Stallman 0 siblings, 0 replies; 86+ messages in thread From: Richard M. Stallman @ 2008-11-10 19:29 UTC (permalink / raw) To: Kenichi Handa Cc: emanuele.giaquinta, cyd, jasonr, dann, evilborisnet, emacs-devel >>> (3) When the dumped Emacs runs, at the time of >>> decoding/encoding charsets that are unified as above, by >>> checking if the value of Vchar_unify_table for a >>> character is symbol or not, Emacs knows whether it has >>> to load the mapping table again or not. > has not been implemented? It's implemented in MAYBE_UNIFY_CHAR (in character.h). If it is hard to find the code which implements that feature then I think there should be a comment near Vchar_unify_table's declaration saying to look at that macro for the implementation of the feature. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: size of emacs executable after unicode merge 2008-11-09 22:43 ` Chong Yidong 2008-11-09 22:57 ` Chong Yidong @ 2008-11-10 1:26 ` Kenichi Handa 1 sibling, 0 replies; 86+ messages in thread From: Kenichi Handa @ 2008-11-10 1:26 UTC (permalink / raw) To: Chong Yidong Cc: rms, emanuele.giaquinta, emacs-devel, dann, evilborisnet, jasonr In article <871vxkfrkk.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes: > I was looking through charset.c and character.c, and it seems to me that > Vchar_unify_table is not used by anything. It's used in the macro MAYBE_UNIFY_CHAR (in character.h) to check if the upper area characters (#x110000-) must be unified or not. But, as I wrote, the memory consumed by that char-table is rather small. The bigger ones are mapping tables of character sets. --- Kenichi Handa handa@ni.aist.go.jp ^ permalink raw reply [flat|nested] 86+ messages in thread
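Editorial note: the lazy-loading scheme discussed in the last few messages (a table entry that still names a charset means "the mapping table has to be loaded again") can be sketched as follows. This is only an illustration of the idea as described in the thread, not the actual MAYBE_UNIFY_CHAR macro. The table size, the two toy charsets, the made-up mapping to 0x4E00 and up, and the names `unify_table`, `load_charset_map`, and `maybe_unify_char` are all assumptions; only the #x110000 start of the upper area comes from the thread.

```c
#include <assert.h>

/* From the thread: upper area characters start at #x110000.  */
#define UPPER_AREA_START 0x110000
#define UPPER_AREA_SIZE  8          /* toy size, hypothetical */

struct unify_entry {
    int needs_load;   /* 1 plays the role of "the value is a charset symbol" */
    int charset;      /* which toy charset owns this slot */
    int unified;      /* unified code point, valid once the map is loaded */
};

struct unify_entry unify_table[UPPER_AREA_SIZE];
int map_loads;        /* how many times a charset map was loaded */

/* Reset the table to its "just dumped" state: every slot only records
   its charset, and no mapping data is present.  */
void unify_table_init(void)
{
    for (int i = 0; i < UPPER_AREA_SIZE; i++) {
        unify_table[i].needs_load = 1;
        unify_table[i].charset = i < 4 ? 1 : 2;
        unify_table[i].unified = -1;
    }
    map_loads = 0;
}

/* Stand-in for reading a charset's map file; the mapping is made up.  */
void load_charset_map(int charset)
{
    assert(charset == 1 || charset == 2);
    map_loads++;
    for (int i = 0; i < UPPER_AREA_SIZE; i++) {
        if (unify_table[i].charset == charset) {
            unify_table[i].unified = 0x4E00 + i;
            unify_table[i].needs_load = 0;
        }
    }
}

/* Characters below the upper area pass through unchanged.  For upper
   area characters, load the owning charset's map on first use only.  */
int maybe_unify_char(int c)
{
    if (c < UPPER_AREA_START || c >= UPPER_AREA_START + UPPER_AREA_SIZE)
        return c;

    struct unify_entry *e = &unify_table[c - UPPER_AREA_START];
    if (e->needs_load)
        load_charset_map(e->charset);
    return e->unified;
}
```

In this model a lookup of an ordinary character triggers no load at all, the first lookup in a charset's range loads that charset's map once, and later lookups in the same range reuse it, which is why the dumped table itself can stay small.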