unofficial mirror of emacs-devel@gnu.org 
* Re: maximum buffer size exceeded
       [not found]           ` <utzqatc3x.fsf@gnu.org>
@ 2007-09-05 12:37             ` Kim F. Storm
  2007-09-05 15:00               ` Stefan Monnier
  2007-09-06  4:59               ` Richard Stallman
  0 siblings, 2 replies; 6+ messages in thread
From: Kim F. Storm @ 2007-09-05 12:37 UTC
  To: Eli Zaretskii; +Cc: help-gnu-emacs, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> I think the current view in Emacs development is that 64-bit platforms
> solve this problem so easily that its solution for 32-bit machines is
> much less important than working on other Emacs features.

Actually, I think a small trick could increase the buffer size to 1 GB
on 32 bit machines at the cost of a little(?) wasted memory.

[Note: Assuming USE_LSB_TAG is defined]

Currently, we have the lowest 3 bits reserved for the Lisp Type,
meaning that the largest positive Emacs integer is 2^28-1 (256MB).

Now, consider what happens if we reserve 4 bits for the Lisp Type, but
in such a way that Lisp_Int == 0, while the other Lisp types
are odd numbers 1,3,5,7,...

In this setup, an integer can be recognized by looking at the lowest
bit alone (== 0), while the other Lisp types are recognized using the
current methods (looking at all 4 type bits).
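A minimal sketch of the encoding described above, assuming a 32-bit word.
All names here (lisp_word, make_int, make_tagged, ...) are hypothetical
and are not the macros in lisp.h; addresses are modelled as plain 32-bit
numbers so the sketch also compiles on a 64-bit host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit Lisp_Object.  */
typedef uint32_t lisp_word;

enum { TAG_BITS = 4, TAG_MASK = (1 << TAG_BITS) - 1 };

/* Integers: lowest bit is 0, the value lives in the upper 31 bits.  */
static int is_int(lisp_word w) { return (w & 1u) == 0; }

static lisp_word make_int(int32_t n)
{
  /* 31 value bits, so the range is -2^30 .. 2^30-1.  */
  assert(n >= -(1 << 30) && n <= (1 << 30) - 1);
  return (lisp_word) n << 1;   /* shift as unsigned to avoid signed overflow */
}

static int32_t int_value(lisp_word w)
{
  /* Assumes the usual two's-complement conversion and arithmetic
     right shift; both are implementation-defined in ISO C.  */
  return (int32_t) w >> 1;
}

/* Other types: an odd tag in the low 4 bits, so the address itself
   must be a multiple of 16.  */
static lisp_word make_tagged(uint32_t addr, unsigned tag)
{
  assert((addr & TAG_MASK) == 0 && (tag & 1u) == 1 && tag <= TAG_MASK);
  return addr | tag;
}

static unsigned tag_of(lisp_word w) { return w & TAG_MASK; }
static uint32_t address_of(lisp_word w) { return w & ~(uint32_t) TAG_MASK; }
```

With this layout the largest positive integer is 2^30-1, which is where
the 1 GB figure comes from.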

The only drawback I can see is that Lisp_Objects have to be allocated
on 16 byte boundaries rather than the current 8 byte boundary, so a
little space may be wasted (and maybe not...).

I haven't tried this, but given that Lisp_Objects are usually accessed
via suitable macros, it looks quite doable.

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk


* Re: maximum buffer size exceeded
  2007-09-05 12:37             ` maximum buffer size exceeded Kim F. Storm
@ 2007-09-05 15:00               ` Stefan Monnier
  2007-09-05 15:14                 ` Jason Rumney
  2007-09-06  4:59               ` Richard Stallman
  1 sibling, 1 reply; 6+ messages in thread
From: Stefan Monnier @ 2007-09-05 15:00 UTC
  To: Kim F. Storm; +Cc: Eli Zaretskii, help-gnu-emacs, emacs-devel

>> I think the current view in Emacs development is that 64-bit platforms
>> solve this problem so easily that its solution for 32-bit machines is
>> much less important than working on other Emacs features.

> Actually, I think a small trick could increase the buffer size to 1 GB
> on 32 bit machines at the cost of a little(?) wasted memory.

> [Note: Assuming USE_LSB_TAG is defined]

> Currently, we have the lowest 3 bits reserved for the Lisp Type,
> meaning that the largest positive Emacs integer is 2^28-1 (256MB).

> Now, consider what happens if we reserve 4 bits for the Lisp Type, but
> in such a way that Lisp_Int == 0, while the other Lisp types
> are odd numbers 1,3,5,7,...

> In this setup, an integer can be recognized by looking at the lowest
> bit alone (== 0), while the other Lisp types are recognized using the
> current methods (looking at all 4 type bits).

> The only drawback I can see is that Lisp_Objects have to be allocated
> on 16 byte boundaries rather than the current 8 byte boundary, so a
> little space may be wasted (and maybe not...).

> I haven't tried this, but given that Lisp_Objects are usually accessed
> via suitable macros, it looks quite doable.

Increasing the alignment from 8 to 16 bytes may be a non-trivial problem:
1 - cons cells use 8 bytes right now, so you'd waste a lot of space for them.
2 - same for floats.
3 - in many places, we rely on malloc to align objects on a multiple of 8, so
    we'd have to use some other approach.

Numbers 1 and 2 can be solved by giving two tags each to conses and floats,
so they only need alignment on a multiple of 8.

Number 3 is more work.  But this work may be the same as the one needed to
allow us to use USE_LSB_TAG everywhere (even on machines where malloc and
static-vars do not guarantee mult-of-8 alignment).

We currently have 7 different types (of the 8 possible tags we only use 7).

My own local Emacs build uses the trick you suggest but with the 3 tag
bits: I gave 2 tags to integers, which lets them grow up to 2^29
(i.e. max buffer size = 512MB).  That's a very simple change.
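A rough sketch of that 3-bit variant, assuming integers take the two
tags whose low two bits are zero (0 and 4).  The tag numbers and the
names below are assumptions made for illustration, not taken from the
actual build.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit Lisp_Object.  */
typedef uint32_t lisp_word3;

/* Integers own tags 0 and 4, so checking the low 2 bits is enough.  */
static int is_int3(lisp_word3 w) { return (w & 3u) == 0; }

static lisp_word3 make_int3(int32_t n)
{
  /* 30 value bits, so the range is -2^29 .. 2^29-1.  */
  assert(n >= -(1 << 29) && n <= (1 << 29) - 1);
  return (lisp_word3) n << 2;
}

static int32_t int_value3(lisp_word3 w)
{
  /* Assumes two's-complement conversion and arithmetic right shift.  */
  return (int32_t) w >> 2;
}

/* The full 3-bit tag, as the other six types would see it.  */
static unsigned tag_of3(lisp_word3 w) { return w & 7u; }
```

An odd integer lands on tag 4 and an even one on tag 0, which leaves
the six tags 1, 2, 3, 5, 6 and 7 for the six non-integer types.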

What you suggest would be to use 4 bits i.e. 16 possible tags:
- 8 tags for integers (i.e. 8 tags left for the 6 other types)
- 2 tags for cons cells (6 tags left for the 5 other types)
- 2 tags for floats
- one tag each for the remaining 4 types (arrays, symbols, strings, misc).
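To illustrate why two tags let conses stay on 8-byte alignment: if a
type owns tags t and t+8, then bit 3 of the word is really an address
bit, and only the low 3 bits need to be compared.  The tag value chosen
below (1, and therefore also 9) and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit Lisp_Object.  */
typedef uint32_t lisp_word4;

/* Assumed assignment: conses own tags 1 and 9 out of the 16.  */
enum { CONS_TAG = 1 };

static lisp_word4 tag_cons(uint32_t addr)
{
  assert((addr & 7u) == 0);        /* only 8-byte alignment required */
  return addr | CONS_TAG;
}

/* Compare just the low 3 bits; bit 3 belongs to the address.  */
static int is_cons(lisp_word4 w) { return (w & 7u) == CONS_TAG; }

static uint32_t cons_address(lisp_word4 w) { return w & ~(uint32_t) 7; }
```

A cons at address 0x1008 then carries 9 in its low four bits, and one
at 0x1010 carries 1; both pass the same three-bit test.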

One other problem: currently `misc' objects need 5 32bit words, which
USE_LSB_TAG forces us to round up to 6 32bit words, and symbols use 6 32bit
words.  So rounding up to mult-of-16 would round them both up to
8 32bit words.

The two subtypes of misc which use up 5 words are markers and overlays.
So with your rounding up, an overlay would use up 3*8=24 words (3 because
there's the overlay object plus the two associated marker objects) instead
of 15 (without USE_LSB_TAG) or 18.
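The word counts above follow from rounding each object's size up to the
required alignment.  A small sketch of that arithmetic, assuming 4-byte
words and hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

enum { WORD_BYTES = 4 };   /* 32-bit words, as in the discussion */

/* Round n up to the next multiple of align (align must be non-zero).  */
static size_t round_up(size_t n, size_t align)
{
  assert(align != 0);
  return (n + align - 1) / align * align;
}

/* Size in words of an object needing `words' words once it is padded
   to `align_bytes' alignment.  */
static size_t padded_words(size_t words, size_t align_bytes)
{
  return round_up(words * WORD_BYTES, align_bytes) / WORD_BYTES;
}

/* An overlay is one 5-word misc object plus two 5-word markers.  */
static size_t overlay_words(size_t align_bytes)
{
  return 3 * padded_words(5, align_bytes);
}
```

Calling overlay_words with 4, 8 and 16 reproduces the three cases in
the text: no extra alignment, USE_LSB_TAG as it is now, and the
proposed 16-byte alignment.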

I had plans to try and squeeze `misc' objects down to 4 words (and hence
overlays down to 12 words), but this is a non-trivial change.  One possible
approach is to replace the linked lists of overlays and markers by arrays
(managed just like buffer text: with a gap).

Another option is to remove the `symbol' and `string' tags and make symbols
and strings subtypes of `misc'.  Then we could keep 3 tag bits and give 4 of
the 8 tags to integers.  This would simplify the alloc.c code but would also
waste more memory (6 words for string objects) and slow down SYMBOLP and
STRINGP slightly.

Still, the fundamental problem remains the same: files larger than 256MB
are most likely not generated manually.  So they may very likely grow to
more than 4GB tomorrow.  Bumping the limit to 512MB or 1GB (or even 4GB for
that matter) is only going to help in some fraction of the cases.

I think a better approach to handle this problem is to create a special
package to visit arbitrarily large files which would work by loading only
parts of the file at a time and do manual "swapping".  This would not work
as smoothly, but then again manipulating 256MB files in Emacs is currently
not that smooth either.
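As a very rough illustration of the "load only part of the file" idea,
the core primitive is reading one window at a given offset.  This is
only a sketch in standard C with a hypothetical function name; it says
nothing about how such a package would manage edits or swapping.  Note
that fseek takes a `long' offset, so a real implementation would want
a 64-bit offset interface such as POSIX fseeko.

```c
#include <assert.h>
#include <stdio.h>

/* Read up to len bytes of F starting at OFFSET into BUF.
   Returns the number of bytes actually read, which is smaller than
   len near the end of the file and 0 if the seek fails.  */
static size_t load_window(FILE *f, long offset, char *buf, size_t len)
{
  assert(f != NULL && buf != NULL && offset >= 0);
  if (fseek(f, offset, SEEK_SET) != 0)
    return 0;
  return fread(buf, 1, len, f);
}
```

A caller would keep only the current window in a buffer and call
load_window again whenever point moves outside it.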


        Stefan


PS: You can supposedly open >4GB files in Emacs with 64bit systems, but
looking at the C code, it's clear that you'll bump into bugs where we cast
EMACS_INT values to and from `int' (which on many 64bit systems is only
32bit).  I tend to fix those bugs when I bump into them, but they're
everywhere and I've fixed only a tiny fraction of them.
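A small sketch of the kind of bug meant here, using int64_t as a
stand-in for EMACS_INT on a 64-bit build.  The type and function names
are hypothetical, and the example assumes `int' is narrower than 64
bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for EMACS_INT on a 64-bit system.  */
typedef int64_t emacs_int_model;

/* Non-zero if n survives a round trip through `int'.  */
static int fits_in_int(emacs_int_model n)
{
  return n >= INT_MIN && n <= INT_MAX;
}

/* The buggy pattern: a buffer position passes through an `int'
   temporary.  For out-of-range values the converted result is
   implementation-defined and no longer equals the original.  */
static emacs_int_model position_via_int(emacs_int_model pos)
{
  int tmp = (int) pos;
  return tmp;
}
```

Small positions pass through unchanged, which is why such casts go
unnoticed until a buffer grows beyond INT_MAX.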


* Re: maximum buffer size exceeded
  2007-09-05 15:00               ` Stefan Monnier
@ 2007-09-05 15:14                 ` Jason Rumney
  2007-09-05 16:08                   ` Stefan Monnier
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Rumney @ 2007-09-05 15:14 UTC
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel, help-gnu-emacs, Kim F. Storm

Stefan Monnier wrote:
> PS: You can supposedly open >4GB files in Emacs with 64bit systems, but
> looking at the C code, it's clear that you'll bump into bugs where we cast
> EMACS_INT values to and from `int' (which on many 64bit systems is only
> 32bit).  I tend to fix those bugs when I bump into them, but they're
> everywhere and I've fixed only a tiny fraction of them.
>   
long is also 32 bits on 64bit versions of Windows, so avoid simply
replacing int with long.


* Re: maximum buffer size exceeded
  2007-09-05 15:14                 ` Jason Rumney
@ 2007-09-05 16:08                   ` Stefan Monnier
  0 siblings, 0 replies; 6+ messages in thread
From: Stefan Monnier @ 2007-09-05 16:08 UTC
  To: Jason Rumney; +Cc: Eli Zaretskii, Kim F. Storm, help-gnu-emacs, emacs-devel

>> PS: You can supposedly open >4GB files in Emacs with 64bit systems, but
>> looking at the C code, it's clear that you'll bump into bugs where we cast
>> EMACS_INT values to and from `int' (which on many 64bit systems is only
>> 32bit).  I tend to fix those bugs when I bump into them, but they're
>> everywhere and I've fixed only a tiny fraction of them.
>> 
> long is also 32 bits on 64bit versions of Windows, so avoid simply
> replacing int with long.

Of course, it should be replaced with EMACS_INT or EMACS_UINT.


        Stefan


* Re: maximum buffer size exceeded
  2007-09-05 12:37             ` maximum buffer size exceeded Kim F. Storm
  2007-09-05 15:00               ` Stefan Monnier
@ 2007-09-06  4:59               ` Richard Stallman
  2007-09-06  5:44                 ` David Kastrup
  1 sibling, 1 reply; 6+ messages in thread
From: Richard Stallman @ 2007-09-06  4:59 UTC
  To: Kim F. Storm; +Cc: help-gnu-emacs, emacs-devel

    The only drawback I can see is that Lisp_Objects have to be allocated
    on 16 byte boundaries rather than the current 8 byte boundary, so a
    little space may be wasted (and maybe not...).

For cons cells and floats, it would mean half the space is wasted.
Markers and symbols and miscs will also waste space, but a smaller
fraction.

It would be useful to calculate the expected amount of waste in some
real Emacs jobs, and compare that with the total memory usage.


* Re: maximum buffer size exceeded
  2007-09-06  4:59               ` Richard Stallman
@ 2007-09-06  5:44                 ` David Kastrup
  0 siblings, 0 replies; 6+ messages in thread
From: David Kastrup @ 2007-09-06  5:44 UTC
  To: rms; +Cc: eliz, emacs-devel, help-gnu-emacs, Kim F. Storm

Richard Stallman <rms@gnu.org> writes:

>     The only drawback I can see is that Lisp_Objects have to be allocated
>     on 16 byte boundaries rather than the current 8 byte boundary, so a
>     little space may be wasted (and maybe not...).
>
> For cons cells and floats, it would mean half the space is wasted.

Maybe we'll need one higher order bit after all.  Or we let every cons
cell have a siamese twin float in its second half.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


end of thread

Thread overview: 6+ messages
     [not found] <uk5r9in1f.fsf@yahoo.com.br>
     [not found] ` <ct4pidqnvf.fsf@freenet.de>
     [not found]   ` <u1wdgn3wq.fsf@yahoo.com.br>
     [not found]     ` <87ejhgdux0.fsf@lion.rapttech.com.au>
     [not found]       ` <mailman.254.1188847491.18990.help-gnu-emacs@gnu.org>
     [not found]         ` <87veaqr5l2.fsf@kobe.laptop>
     [not found]           ` <utzqatc3x.fsf@gnu.org>
2007-09-05 12:37             ` maximum buffer size exceeded Kim F. Storm
2007-09-05 15:00               ` Stefan Monnier
2007-09-05 15:14                 ` Jason Rumney
2007-09-05 16:08                   ` Stefan Monnier
2007-09-06  4:59               ` Richard Stallman
2007-09-06  5:44                 ` David Kastrup
