* Re: maximum buffer size exceeded
From: Kim F. Storm @ 2007-09-05 12:37 UTC
To: Eli Zaretskii; +Cc: help-gnu-emacs, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> I think the current view in Emacs development is that 64-bit platforms
> solve this problem so easily that its solution for 32-bit machines is
> much less important than working on other Emacs features.

Actually, I think a small trick could increase the buffer size to 1 GB
on 32 bit machines at the cost of a little(?) wasted memory.

[Note: Assuming USE_LSB_TAG is defined]

Currently, we have the lowest 3 bits reserved for the Lisp Type,
meaning that the largest positive Emacs integer is 2^28-1 (256MB).

Now, consider if we reserve 4 bits for the Lisp Type, but
in such a way the Lisp_Int == 0, while the other Lisp types
are odd numbers 1,3,5,7,...

In this setup, an integer can be recognized by looking at the lowest
bit alone (== 0), while the other Lisp types are recognized using the
current methods (looking at all 4 type bits).

The only drawback I can see is that Lisp_Objects have to be allocated
on 16 byte boundaries rather than the current 8 byte boundary, so a
little space may be wasted (and maybe not...).

I haven't tried this, but given that Lisp_Objects are usually accessed
via suitable macros, it looks quite doable.

--
Kim F. Storm <storm@cua.dk> http://www.cua.dk
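
A minimal sketch of the tagging scheme proposed above, for illustration only. All names here (`lisp_word`, `make_int`, `make_tagged`, the `TAG_*` values) are hypothetical and do not reflect the definitions in Emacs's lisp.h; object addresses are simulated as 32-bit offsets so the example behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit Lisp_Object with the type tag in the low bits. */
typedef uint32_t lisp_word;

enum { TAG_BITS = 4, TAG_MASK = (1 << TAG_BITS) - 1 };

/* Non-integer types get odd tag values, so bit 0 is set for all of them.
   The specific numbers are an assumption made for this sketch. */
enum lisp_tag { TAG_SYMBOL = 1, TAG_STRING = 3, TAG_CONS = 5, TAG_FLOAT = 7 };

/* With one tag bit, integers keep 31 bits: largest positive value 2^30 - 1. */
enum { MOST_POSITIVE_INT = (1 << 30) - 1 };

/* An integer is any word whose lowest bit is 0. */
static int is_int(lisp_word w) { return (w & 1u) == 0; }

/* Shift the value left by one; the unsigned cast keeps the shift well defined. */
static lisp_word make_int(int32_t n) { return (lisp_word) n << 1; }

/* Recover the value. Right-shifting a negative signed value is
   implementation-defined in ISO C, but arithmetic on common compilers. */
static int32_t int_value(lisp_word w) { return (int32_t) w >> 1; }

/* Tag an address; it must be 16-byte aligned so the low 4 bits are free. */
static lisp_word make_tagged(uint32_t addr, enum lisp_tag tag)
{
    assert((addr & TAG_MASK) == 0);
    return addr | (uint32_t) tag;
}

/* Non-integers are still classified by looking at all 4 tag bits. */
static enum lisp_tag tag_of(lisp_word w) { return (enum lisp_tag) (w & TAG_MASK); }

static uint32_t addr_of(lisp_word w) { return w & ~(uint32_t) TAG_MASK; }
```

Under these assumptions, `is_int` needs only a single bit test, while `tag_of` still masks all four bits, which is where the 16-byte alignment requirement comes from.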

* Re: maximum buffer size exceeded
From: Stefan Monnier @ 2007-09-05 15:00 UTC
To: Kim F. Storm; +Cc: Eli Zaretskii, help-gnu-emacs, emacs-devel

>> I think the current view in Emacs development is that 64-bit platforms
>> solve this problem so easily that its solution for 32-bit machines is
>> much less important than working on other Emacs features.

> Actually, I think a small trick could increase the buffer size to 1 GB
> on 32 bit machines at the cost of a little(?) wasted memory.

> [Note: Assuming USE_LSB_TAG is defined]

> Currently, we have the lowest 3 bits reserved for the Lisp Type,
> meaning that the largest positive Emacs integer is 2^28-1 (256MB).

> Now, consider if we reserve 4 bits for the Lisp Type, but
> in such a way the Lisp_Int == 0, while the other Lisp types
> are odd numbers 1,3,5,7,...

> In this setup, an integer can be recognized by looking at the lowest
> bit alone (== 0), while the other Lisp types are recognized using the
> current methods (looking at all 4 type bits).

> The only drawback I can see is that Lisp_Objects have to be allocated
> on 16 byte boundaries rather than the current 8 byte boundary, so a
> little space may be wasted (and maybe not...).

> I haven't tried this, but given that Lisp_Objects are usually accessed
> via suitable macros, it looks quite doable.

Increasing from 8 to 16 bytes alignment may be a non-trivial problem:
1 - cons cells use 8 bytes right now, so you'd waste a lot of space for them.
2 - same for floats.
3 - in many places, we rely on malloc to align objects on multiple of 8, so
    we'd have to use some other approach.

Numbers 1 and 2 can be solved by giving two tags to cons and floats, so they
only need alignment on multiple of 8. Number 3 is more work.
But this work may be the same as the one needed to allow us to use
USE_LSB_TAG everywhere (even on machines where malloc and static-vars do not
guarantee mult-of-8 alignment).

We currently have 7 different types (of the 8 possible tags we only use 7).
My own local Emacs build uses the trick you suggest but on the 3 bits of tags,
so I gave 2 tags to integers to allow them to grow up to 2^29 (i.e. max
buffer size = 512MB). That's a very simple change.

What you suggest would be to use 4 bits i.e. 16 possible tags:
- 8 tags for integers (i.e. 8 tags left for the 6 other types)
- 2 tags for cons cells (6 tags left for the 5 other types)
- 2 tags for floats
- one tag each for the remaining 4 types (arrays, symbols, strings, misc).

One other problem: currently `misc' objects need 5 32bit words, which
USE_LSB_TAG forces us to round up to 6 32bit words, and symbols use 6 32bit
words. So rounding up to mult-of-16 would round them both up to 8 32bit
words. The two subtypes of misc which use up 5 words are markers and
overlays. So with your rounding up, an overlay would use up 3*8=24 words (3
because there's the overlay object plus the two associated marker objects)
instead of 15 (without USE_LSB_TAG) or 18.

I had plans to try and squeeze `misc' objects down to 4 words (and hence
overlays down to 12 words), but this is a non-trivial change. One possible
approach is to replace the linked lists of overlays and markers by arrays
(managed just like buffer text: with a gap).

Another option is to remove the `symbol' and `string' tags and make symbols
and strings subtypes of `misc'. Then we could keep 3 tag bits and give 4 of
the 8 tags to integers. This would simplify the alloc.c code but would also
waste more memory (6 words for string objects) and slow down SYMBOLP and
STRINGP slightly.

Still, the fundamental problem remains the same: files larger than 256MB are
most likely not generated manually. So they may very likely grow to more
than 4GB tomorrow.
Bumping the limit to 512MB or 1GB (or even 4GB for that matter) is
only going to help in some fraction of the cases.

I think a better approach to handle this problem is to create a special
package to visit arbitrarily large files which would work by loading only
parts of the file at a time and do manual "swapping". This would not work
as smoothly, but then again manipulating 256MB files in Emacs is currently
not that smooth either.

        Stefan

PS: You can supposedly open >4GB files in Emacs with 64bit systems, but
looking at the C code, it's clear that you'll bump into bugs where we cast
EMACS_INT values to and from `int' (which on many 64bit systems are only
32bit). I tend to fix those bugs when I bump into them, but they're
everywhere and I've fixed only a tiny fraction of them.
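
The "load only parts of the file at a time" idea could be sketched roughly as below. This is an assumption-laden illustration, not an existing Emacs package: `file_window`, `load_window`, `byte_at` and `WINDOW_SIZE` are hypothetical names, and it uses standard C `fseek`/`fread`, whose `long` offset is itself only 32 bits on some platforms, so a real implementation would need a 64-bit seek interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum { WINDOW_SIZE = 4096 };    /* hypothetical window size */

/* The part of the file currently held in memory. */
struct file_window {
    long start;                         /* file offset of data[0] */
    size_t len;                         /* bytes actually loaded */
    unsigned char data[WINDOW_SIZE];
};

/* Load the window containing file offset POS.
   Returns 0 on success, -1 on a seek or read error. */
static int load_window(FILE *f, long pos, struct file_window *w)
{
    assert(pos >= 0);
    long start = pos - pos % WINDOW_SIZE;   /* align down to a window boundary */
    if (fseek(f, start, SEEK_SET) != 0)
        return -1;
    w->start = start;
    w->len = fread(w->data, 1, WINDOW_SIZE, f);
    return ferror(f) ? -1 : 0;
}

/* Return the byte at POS, swapping in another window when POS is outside
   the loaded one. Returns -1 past end of file or on error. */
static int byte_at(FILE *f, long pos, struct file_window *w)
{
    if (pos < w->start || pos >= w->start + (long) w->len) {
        if (load_window(f, pos, w) != 0
            || pos >= w->start + (long) w->len)
            return -1;
    }
    return w->data[pos - w->start];
}
```

A caller would start with a zero-initialized `struct file_window` and call `byte_at` with arbitrary offsets; memory use stays at one window regardless of file size, at the cost of a reload whenever access moves to another window.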

* Re: maximum buffer size exceeded
From: Jason Rumney @ 2007-09-05 15:14 UTC
To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel, help-gnu-emacs, Kim F. Storm

Stefan Monnier wrote:
> PS: You can supposedly open >4GB files in Emacs with 64bit systems, but
> looking at the C code, it's clear that you'll bump into bugs where we cast
> EMACS_INT values to and from `int' (which on many 64bit systems are only
> 32bit). I tend to fix those bugs when I bump into them, but they're
> everywhere and I've fixed only a tiny fraction of them.
>

long is also 32 bits on 64bit versions of Windows, so avoid simply
replacing int with long.

* Re: maximum buffer size exceeded
From: Stefan Monnier @ 2007-09-05 16:08 UTC
To: Jason Rumney; +Cc: Eli Zaretskii, Kim F. Storm, help-gnu-emacs, emacs-devel

>> PS: You can supposedly open >4GB files in Emacs with 64bit systems, but
>> looking at the C code, it's clear that you'll bump into bugs where we cast
>> EMACS_INT values to and from `int' (which on many 64bit systems are only
>> 32bit). I tend to fix those bugs when I bump into them, but they're
>> everywhere and I've fixed only a tiny fraction of them.
>>
> long is also 32 bits on 64bit versions of Windows, so avoid simply
> replacing int with long.

Of course, it should be replaced with EMACS_INT or EMACS_UINT.

        Stefan
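
To illustrate the class of bug discussed in this sub-thread, here is a small sketch of a guarded narrowing conversion. The typedef `emacs_int` is a stand-in chosen for this example (assumed to be 64 bits wide), not Emacs's actual EMACS_INT definition, and `fits_in_int` and `checked_int` are hypothetical helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-in for EMACS_INT on a 64-bit build; an assumption for this sketch.
   A fixed-width type avoids relying on `long', which is 32 bits on 64-bit
   Windows. */
typedef int64_t emacs_int;

/* Return 1 if POS can be stored in an `int' without loss, 0 otherwise. */
static int fits_in_int(emacs_int pos)
{
    return pos >= INT_MIN && pos <= INT_MAX;
}

/* Narrow POS to `int' only when that is lossless; abort instead of
   silently wrapping when it is not. */
static int checked_int(emacs_int pos)
{
    assert(fits_in_int(pos));
    return (int) pos;
}
```

A buffer position just above `INT_MAX` fails `fits_in_int`, which is the situation an unguarded `(int)` cast would silently mangle; keeping such values in the wide type throughout avoids the problem entirely.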

* Re: maximum buffer size exceeded
From: Richard Stallman @ 2007-09-06 4:59 UTC
To: Kim F. Storm; +Cc: help-gnu-emacs, emacs-devel

    The only drawback I can see is that Lisp_Objects have to be allocated
    on 16 byte boundaries rather than the current 8 byte boundary, so a
    little space may be wasted (and maybe not...).

For cons cells and floats, it would mean half the space is wasted.
Markers and symbols and miscs will also waste space, but a smaller
fraction.

It would be useful to calculate the expected amount of waste in some
real Emacs jobs, and compare that with the total memory usage.
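
The per-object padding behind these estimates can be worked out with a few lines of C. This is only a sketch: the sizes are the ones quoted in this thread for a 32-bit build (8 bytes for cons cells and floats, 5 words for `misc', 6 words for symbols), the names are hypothetical, and the object counts are left as inputs to be measured in a real session.

```c
#include <assert.h>
#include <stddef.h>

/* Object sizes in bytes as quoted in the thread, with 4-byte words. */
enum {
    WORD = 4,
    CONS_SIZE = 2 * WORD,
    FLOAT_SIZE = 8,
    MISC_SIZE = 5 * WORD,
    SYMBOL_SIZE = 6 * WORD
};

/* Round SIZE up to the next multiple of ALIGN, which must be a power of two. */
static size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Padding bytes added to one object of SIZE bytes under ALIGN. */
static size_t padding(size_t size, size_t align)
{
    return round_up(size, align) - size;
}

/* One object type: its unpadded size and how many live objects exist. */
struct type_usage {
    size_t size;
    size_t count;   /* to be measured; no real figures are assumed here */
};

/* Total padding bytes over N object types under alignment ALIGN. */
static size_t total_padding(const struct type_usage *u, size_t n, size_t align)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += u[i].count * padding(u[i].size, align);
    return total;
}
```

With these sizes, a cons cell or float grows from 8 to 16 bytes under 16-byte alignment (half wasted), while a 20-byte `misc' object and a 24-byte symbol both grow to 32 bytes, a smaller fraction. Comparing `total_padding` for alignments 8 and 16 against total memory use would give the figure asked for above.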

* Re: maximum buffer size exceeded
From: David Kastrup @ 2007-09-06 5:44 UTC
To: rms; +Cc: eliz, emacs-devel, help-gnu-emacs, Kim F. Storm

Richard Stallman <rms@gnu.org> writes:

>     The only drawback I can see is that Lisp_Objects have to be allocated
>     on 16 byte boundaries rather than the current 8 byte boundary, so a
>     little space may be wasted (and maybe not...).
>
> For cons cells and floats, it would mean half the space is wasted.

Maybe we'll need one higher order bit after all. Or we let every cons
cell have a siamese twin float in its second half.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum