unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
@ 2011-04-20 21:04 Evans Winner
  2011-04-21  5:52 ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Evans Winner @ 2011-04-20 21:04 UTC (permalink / raw)
  To: 8528

My understanding is that a 32-bit GNU Emacs should be able
to open files up to 512 M.  If I am wrong about that, please
let me know.  I have compiled Emacs trunk from source
several times in the last couple of months and somewhere in
the last month or so it seems that the limit on my machine
has become 128 M.  My math could be off, but on the
assumption that 128 Mebibytes = 2^27 bytes = 1024 * 131072
bytes, and starting with emacs -Q I tried:

    $ dd if=/dev/zero of=testfile bs=1024 count=131072

and tried to open the file, and got: "Maximum buffer size
exceeded".  Then I tried one K less:

    $ dd if=/dev/zero of=testfile bs=1024 count=131071

and the buffer opened.

I have verified using the `top' command that there is
sufficient free memory for the files.

Also, for what it's worth:

    ELISP> most-positive-fixnum ==> 536870911

I discovered this as a result of not being able to open a
large (~160Mb) .pdf file that I had earlier been able to
open.  Please let me know if there is any other information
I can provide, or if there is something simple I am doing
wrong.


In GNU Emacs 24.0.50.1 (i686-pc-linux-gnu, GTK+ Version 3.0.8)
 of 2011-04-19 on braintron
Windowing system distributor `The X.Org Foundation', version 11.0.11001000
configured using `configure  '--with-x-toolkit=gtk3''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
  2011-04-20 21:04 bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit Evans Winner
@ 2011-04-21  5:52 ` Eli Zaretskii
  2011-04-21  6:18   ` Paul Eggert
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2011-04-21  5:52 UTC (permalink / raw)
  To: Evans Winner, Paul Eggert; +Cc: 8528

> From: Evans Winner <ego111@gmail.com>
> Date: Wed, 20 Apr 2011 15:04:06 -0600
> 
> My understanding is that a 32-bit GNU Emacs should be able
> to open files up to 512 M.  If I am wrong about that, please
> let me know.  I have compiled Emacs trunk from source
> several times in the last couple of months and somewhere in
> the last month or so it seems that the limit on my machine
> has become 128 M.  My math could be off, but on the
> assumption that 128 Mebibytes = 2^27 bytes = 1024 * 131072
> bytes, and starting with emacs -Q I tried:
> 
>     $ dd if=/dev/zero of=testfile bs=1024 count=131072
> 
> and tried to open the file, and got: "Maximum buffer size
> exceeded".

This happens because of the following test in insert-file-contents:

  /* Arithmetic overflow can occur if an Emacs integer cannot represent the
     file size, or if the calculations below overflow.  The calculations below
     double the file size twice, so check that it can be multiplied by 4
     safely.

     Also check whether the size is negative, which can happen on a platform
     that allows file sizes greater than the maximum off_t value.  */
  if (! not_regular
      && ! (0 <= st.st_size && st.st_size <= MOST_POSITIVE_FIXNUM / 4))
    error ("Maximum buffer size exceeded");

This test was commented out for the last 2 years, but lately it was
uncommented by Paul Eggert in revision 103841 on the trunk.

Paul, could you please tell where do you see twice doubling of the
file size in insert-file-contents?  Back in 1999, when this test was
first introduced, there was indeed such doubling.  But even then it
was only when the REPLACE argument was non-nil (according to my
reading of the code).  In any case, that part of code was completely
rewritten since then, and I don't believe we double the file size even
once.

By disabling that test, I was able to visit a 260-MB file on a 32-bit
machine.  So it seems like this test could be removed, if I'm not
missing anything.





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
  2011-04-21  5:52 ` Eli Zaretskii
@ 2011-04-21  6:18   ` Paul Eggert
  2011-04-21  6:40     ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Eggert @ 2011-04-21  6:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 8528, Evans Winner

On 04/20/11 22:52, Eli Zaretskii wrote:
> Paul, could you please tell where do you see twice doubling of the
> file size in insert-file-contents?

I assumed that it was because the internal buffers contain an
Emacs-encoded version of the file, which could be as long as four
times the actual file size, because a single byte in the file
might expand to 4 bytes inside Emacs in some cases.

That would explain the behavior that you saw: if your file's
internal encoding was the same as the external, you wouldn't observe any
problem.  The problem would be exhibited only with files containing
many characters that bloat when read into memory.  However,
I didn't investigate the matter thoroughly; perhaps someone
who's more expert on how Emacs encodes things internally could
speak up.





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
  2011-04-21  6:18   ` Paul Eggert
@ 2011-04-21  6:40     ` Eli Zaretskii
  2011-04-21  6:58       ` Paul Eggert
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2011-04-21  6:40 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8528, ego111

> Date: Wed, 20 Apr 2011 23:18:55 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: Evans Winner <ego111@gmail.com>, 8528@debbugs.gnu.org
> 
> On 04/20/11 22:52, Eli Zaretskii wrote:
> > Paul, could you please tell where do you see twice doubling of the
> > file size in insert-file-contents?
> 
> I assumed that it was because the internal buffers contain an
> Emacs-encoded version of the file, which could be as long as four
> times the actual file size, because a single byte in the file
> might expand to 4 bytes inside Emacs in some cases.

Actually, it could potentially expand even 5-fold (because Emacs
extends UTF-8 to codepoints as large as 0x3FFFFF).  But we test the
buffer size and avoid overflowing it in many other places, both
further down in insert-file-contents and in insdel.c.  If those are
not enough, we could add more such tests, particularly after decoding
the file's contents, where we know the full buffer size in bytes.

So I think artificially limiting the maximum size of a file that can
be visited in that particular place in insert-file-contents is too
harsh.

> That would explain the behavior that you saw: if your file's
> internal encoding was the same as the external, you wouldn't observe any
> problem.  The problem would be exhibited only with files containing
> many characters that bloat when read into memory.

Right, but wouldn't you agree that such a limitation is too stringent?
E.g., I should be able to use find-file-literally to visit a 512MB
file, but currently I cannot.





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
  2011-04-21  6:40     ` Eli Zaretskii
@ 2011-04-21  6:58       ` Paul Eggert
  2011-04-21 13:20         ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Eggert @ 2011-04-21  6:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 8528, ego111

On 04/20/11 23:40, Eli Zaretskii wrote:
> Right, but wouldn't you agree that such a limitation is too stringent?

Yes, absolutely, the limit should be removed if possible.

In a brief look at the code, it appeared to me that there were
places where the it does not check for integer overflow
in size calculations when converting external to internal form.  So
it could well be that this preliminary check may be needed to
avoid catastrophe later.  I have not checked this out carefully,
though, and I could be wrong.  (One way to find out would be
to test it with a worst-case-bloat file, but I haven't had time to
do that.)

> E.g., I should be able to use find-file-literally to visit a 512MB
> file, but currently I cannot.

If we know that byte bloat cannot occur, which is the case with
find-file-literally, then the divide-by-4 limit should not be needed.
That case should be easy, in that it shouldn't require a lot
of analysis to fix that case safely.





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
  2011-04-21  6:58       ` Paul Eggert
@ 2011-04-21 13:20         ` Eli Zaretskii
  2011-04-29 19:49           ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2011-04-21 13:20 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8528, ego111

> Date: Wed, 20 Apr 2011 23:58:36 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: ego111@gmail.com, 8528@debbugs.gnu.org
> 
> On 04/20/11 23:40, Eli Zaretskii wrote:
> > Right, but wouldn't you agree that such a limitation is too stringent?
> 
> Yes, absolutely, the limit should be removed if possible.
> 
> In a brief look at the code, it appeared to me that there were
> places where the it does not check for integer overflow
> in size calculations when converting external to internal form.  So
> it could well be that this preliminary check may be needed to
> avoid catastrophe later.  I have not checked this out carefully,
> though, and I could be wrong.

I have now reviewed the code involved in this, and I think the limit
can be lifted.

In general, there could be two ways for us to insert text into the
buffer as result of calling insert-file-contents: (a) directly, by
reading from the file into its buffer (or some temporary buffer used
as part of processing); or (b) indirectly, by decoding inserted text
through various functions in coding.c, which write the decoded text
into the destination buffer.

I found that both of these ways include tests for potential overflows
of the buffer size.  Inserting text directly is protected because it
enlarges the buffer's gap before inserting text, and make_gap which
does that errors out if the new size will overflow (actually, it
errors out 2000 bytes too early, because it wants some extra space).

Insertion by decoding text is also protected because it makes sure the
destination buffer has enough space before it writes another chunk of
decoded text into it.  It assures that by enlarging the gap, which
again goes through make_gap.

I found only one place where we were not protected from overflowing
MOST_POSITIVE_FIXNUM (not sure if it is relevant to
insert-file-contents), and one other place where I wasn't sure we were
protected, so I added a suitable protection in both those places.  See
the proposed patch below.

If no one objects, I will commit these changes in a week or so.

> > E.g., I should be able to use find-file-literally to visit a 512MB
> > file, but currently I cannot.
> 
> If we know that byte bloat cannot occur, which is the case with
> find-file-literally, then the divide-by-4 limit should not be needed.
> That case should be easy, in that it shouldn't require a lot
> of analysis to fix that case safely.

Yes, definitely.  But I think the patch below solves this problem as
well, so there's no need for special treatment for unibyte or pure
ASCII files.

Here's the proposed patch.  Evans, I'd appreciate if you could try it
and see if it solves the original problem for you.

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2011-04-19 10:48:30 +0000
+++ src/ChangeLog	2011-04-21 12:35:30 +0000
@@ -1,3 +1,16 @@
+2011-04-21  Eli Zaretskii  <eliz@gnu.org>
+
+	* coding.c (coding_alloc_by_realloc): Error out if destination
+	will grow beyond MOST_POSITIVE_FIXNUM.
+	(decode_coding_emacs_mule): Abort if there isn't enough place in
+	charbuf for the composition carryover bytes.  Reserve an extra
+	space for up to 2 characters produced in a loop.
+	(decode_coding_iso_2022): Abort if there isn't enough place in
+	charbuf for the composition carryover bytes.
+
+	* fileio.c (Finsert_file_contents): Don't limit file size to 1/4
+	of MOST_POSITIVE_FIXNUM.
+
 2011-04-19  Eli Zaretskii  <eliz@gnu.org>
 
 	* syntax.h (SETUP_SYNTAX_TABLE_FOR_OBJECT): Fix setting of

=== modified file 'src/coding.c'
--- src/coding.c	2011-04-14 05:04:02 +0000
+++ src/coding.c	2011-04-21 12:35:33 +0000
@@ -1071,6 +1071,8 @@ coding_set_destination (struct coding_sy
 static void
 coding_alloc_by_realloc (struct coding_system *coding, EMACS_INT bytes)
 {
+  if (coding->dst_bytes > MOST_POSITIVE_FIXNUM - bytes)
+    error ("Maximum size of buffer or string exceeded");
   coding->destination = (unsigned char *) xrealloc (coding->destination,
 						    coding->dst_bytes + bytes);
   coding->dst_bytes += bytes;
@@ -2333,7 +2335,9 @@ decode_coding_emacs_mule (struct coding_
   /* We may produce two annotations (charset and composition) in one
      loop and one more charset annotation at the end.  */
   int *charbuf_end
-    = coding->charbuf + coding->charbuf_size - (MAX_ANNOTATION_LENGTH * 3);
+    = coding->charbuf + coding->charbuf_size - (MAX_ANNOTATION_LENGTH * 3)
+      /* We can produce up to 2 characters in a loop.  */
+      - 1;
   EMACS_INT consumed_chars = 0, consumed_chars_base;
   int multibytep = coding->src_multibyte;
   EMACS_INT char_offset = coding->produced_char;
@@ -2348,6 +2352,8 @@ decode_coding_emacs_mule (struct coding_
     {
       int i;
 
+      if (charbuf_end - charbuf < cmp_status->length)
+	abort ();
       for (i = 0; i < cmp_status->length; i++)
 	*charbuf++ = cmp_status->carryover[i];
       coding->annotated = 1;
@@ -3479,6 +3485,8 @@ decode_coding_iso_2022 (struct coding_sy
 
   if (cmp_status->state != COMPOSING_NO)
     {
+      if (charbuf_end - charbuf < cmp_status->length)
+	abort ();
       for (i = 0; i < cmp_status->length; i++)
 	*charbuf++ = cmp_status->carryover[i];
       coding->annotated = 1;

=== modified file 'src/fileio.c'
--- src/fileio.c	2011-04-14 20:20:17 +0000
+++ src/fileio.c	2011-04-21 12:07:44 +0000
@@ -3245,15 +3245,10 @@ variable `last-coding-system-used' to th
   record_unwind_protect (close_file_unwind, make_number (fd));
 
 
-  /* Arithmetic overflow can occur if an Emacs integer cannot represent the
-     file size, or if the calculations below overflow.  The calculations below
-     double the file size twice, so check that it can be multiplied by 4
-     safely.
-
-     Also check whether the size is negative, which can happen on a platform
-     that allows file sizes greater than the maximum off_t value.  */
+  /* Check whether the size is too large or negative, which can happen on a
+     platform that allows file sizes greater than the maximum off_t value.  */
   if (! not_regular
-      && ! (0 <= st.st_size && st.st_size <= MOST_POSITIVE_FIXNUM / 4))
+      && ! (0 <= st.st_size && st.st_size <= MOST_POSITIVE_FIXNUM))
     error ("Maximum buffer size exceeded");
 
   /* Prevent redisplay optimizations.  */






^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
  2011-04-21 13:20         ` Eli Zaretskii
@ 2011-04-29 19:49           ` Eli Zaretskii
  2011-05-02 14:53             ` Stefan Monnier
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2011-04-29 19:49 UTC (permalink / raw)
  To: eggert, ego111; +Cc: 8528-done

> Date: Thu, 21 Apr 2011 16:20:54 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 8528@debbugs.gnu.org, ego111@gmail.com
> 
> If no one objects, I will commit these changes in a week or so.

No one objected, so I installed this.





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
  2011-04-29 19:49           ` Eli Zaretskii
@ 2011-05-02 14:53             ` Stefan Monnier
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Monnier @ 2011-05-02 14:53 UTC (permalink / raw)
  To: 8528

> No one objected, so I installed this.

Thanks,


        Stefan





^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-05-02 14:53 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-20 21:04 bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit Evans Winner
2011-04-21  5:52 ` Eli Zaretskii
2011-04-21  6:18   ` Paul Eggert
2011-04-21  6:40     ` Eli Zaretskii
2011-04-21  6:58       ` Paul Eggert
2011-04-21 13:20         ` Eli Zaretskii
2011-04-29 19:49           ` Eli Zaretskii
2011-05-02 14:53             ` Stefan Monnier

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).