unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Eli Zaretskii <eliz@gnu.org>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: 8528@debbugs.gnu.org, ego111@gmail.com
Subject: bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
Date: Thu, 21 Apr 2011 16:20:54 +0300	[thread overview]
Message-ID: <838vv33fm1.fsf@gnu.org> (raw)
In-Reply-To: <4DAFD59C.5090602@cs.ucla.edu>

> Date: Wed, 20 Apr 2011 23:58:36 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: ego111@gmail.com, 8528@debbugs.gnu.org
> 
> On 04/20/11 23:40, Eli Zaretskii wrote:
> > Right, but wouldn't you agree that such a limitation is too stringent?
> 
> Yes, absolutely, the limit should be removed if possible.
> 
> In a brief look at the code, it appeared to me that there were
> places where the it does not check for integer overflow
> in size calculations when converting external to internal form.  So
> it could well be that this preliminary check may be needed to
> avoid catastrophe later.  I have not checked this out carefully,
> though, and I could be wrong.

I have now reviewed the code involved in this, and I think the limit
can be lifted.

In general, there could be two ways for us to insert text into the
buffer as result of calling insert-file-contents: (a) directly, by
reading from the file into its buffer (or some temporary buffer used
as part of processing); or (b) indirectly, by decoding inserted text
through various functions in coding.c, which write the decoded text
into the destination buffer.

I found that both of these ways include tests for potential overflows
of the buffer size.  Inserting text directly is protected because it
enlarges the buffer's gap before inserting text, and make_gap which
does that errors out if the new size will overflow (actually, it
errors out 2000 bytes too early, because it wants some extra space).

Insertion by decoding text is also protected because it makes sure the
destination buffer has enough space before it writes another chunk of
decoded text into it.  It assures that by enlarging the gap, which
again goes through make_gap.

I found only one place where we were not protected from overflowing
MOST_POSITIVE_FIXNUM (not sure if it is relevant to
insert-file-contents), and one other place where I wasn't sure we were
protected, so I added a suitable protection in both those places.  See
the proposed patch below.

If no one objects, I will commit these changes in a week or so.

> > E.g., I should be able to use find-file-literally to visit a 512MB
> > file, but currently I cannot.
> 
> If we know that byte bloat cannot occur, which is the case with
> find-file-literally, then the divide-by-4 limit should not be needed.
> That case should be easy, in that it shouldn't require a lot
> of analysis to fix that case safely.

Yes, definitely.  But I think the patch below solves this problem as
well, so there's no need for special treatment for unibyte or pure
ASCII files.

Here's the proposed patch.  Evans, I'd appreciate if you could try it
and see if it solves the original problem for you.

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2011-04-19 10:48:30 +0000
+++ src/ChangeLog	2011-04-21 12:35:30 +0000
@@ -1,3 +1,16 @@
+2011-04-21  Eli Zaretskii  <eliz@gnu.org>
+
+	* coding.c (coding_alloc_by_realloc): Error out if destination
+	will grow beyond MOST_POSITIVE_FIXNUM.
+	(decode_coding_emacs_mule): Abort if there isn't enough place in
+	charbuf for the composition carryover bytes.  Reserve an extra
+	space for up to 2 characters produced in a loop.
+	(decode_coding_iso_2022): Abort if there isn't enough place in
+	charbuf for the composition carryover bytes.
+
+	* fileio.c (Finsert_file_contents): Don't limit file size to 1/4
+	of MOST_POSITIVE_FIXNUM.
+
 2011-04-19  Eli Zaretskii  <eliz@gnu.org>
 
 	* syntax.h (SETUP_SYNTAX_TABLE_FOR_OBJECT): Fix setting of

=== modified file 'src/coding.c'
--- src/coding.c	2011-04-14 05:04:02 +0000
+++ src/coding.c	2011-04-21 12:35:33 +0000
@@ -1071,6 +1071,8 @@ coding_set_destination (struct coding_sy
 static void
 coding_alloc_by_realloc (struct coding_system *coding, EMACS_INT bytes)
 {
+  if (coding->dst_bytes > MOST_POSITIVE_FIXNUM - bytes)
+    error ("Maximum size of buffer or string exceeded");
   coding->destination = (unsigned char *) xrealloc (coding->destination,
 						    coding->dst_bytes + bytes);
   coding->dst_bytes += bytes;
@@ -2333,7 +2335,9 @@ decode_coding_emacs_mule (struct coding_
   /* We may produce two annotations (charset and composition) in one
      loop and one more charset annotation at the end.  */
   int *charbuf_end
-    = coding->charbuf + coding->charbuf_size - (MAX_ANNOTATION_LENGTH * 3);
+    = coding->charbuf + coding->charbuf_size - (MAX_ANNOTATION_LENGTH * 3)
+      /* We can produce up to 2 characters in a loop.  */
+      - 1;
   EMACS_INT consumed_chars = 0, consumed_chars_base;
   int multibytep = coding->src_multibyte;
   EMACS_INT char_offset = coding->produced_char;
@@ -2348,6 +2352,8 @@ decode_coding_emacs_mule (struct coding_
     {
       int i;
 
+      if (charbuf_end - charbuf < cmp_status->length)
+	abort ();
       for (i = 0; i < cmp_status->length; i++)
 	*charbuf++ = cmp_status->carryover[i];
       coding->annotated = 1;
@@ -3479,6 +3485,8 @@ decode_coding_iso_2022 (struct coding_sy
 
   if (cmp_status->state != COMPOSING_NO)
     {
+      if (charbuf_end - charbuf < cmp_status->length)
+	abort ();
       for (i = 0; i < cmp_status->length; i++)
 	*charbuf++ = cmp_status->carryover[i];
       coding->annotated = 1;

=== modified file 'src/fileio.c'
--- src/fileio.c	2011-04-14 20:20:17 +0000
+++ src/fileio.c	2011-04-21 12:07:44 +0000
@@ -3245,15 +3245,10 @@ variable `last-coding-system-used' to th
   record_unwind_protect (close_file_unwind, make_number (fd));
 
 
-  /* Arithmetic overflow can occur if an Emacs integer cannot represent the
-     file size, or if the calculations below overflow.  The calculations below
-     double the file size twice, so check that it can be multiplied by 4
-     safely.
-
-     Also check whether the size is negative, which can happen on a platform
-     that allows file sizes greater than the maximum off_t value.  */
+  /* Check whether the size is too large or negative, which can happen on a
+     platform that allows file sizes greater than the maximum off_t value.  */
   if (! not_regular
-      && ! (0 <= st.st_size && st.st_size <= MOST_POSITIVE_FIXNUM / 4))
+      && ! (0 <= st.st_size && st.st_size <= MOST_POSITIVE_FIXNUM))
     error ("Maximum buffer size exceeded");
 
   /* Prevent redisplay optimizations.  */






  reply	other threads:[~2011-04-21 13:20 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-20 21:04 bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit Evans Winner
2011-04-21  5:52 ` Eli Zaretskii
2011-04-21  6:18   ` Paul Eggert
2011-04-21  6:40     ` Eli Zaretskii
2011-04-21  6:58       ` Paul Eggert
2011-04-21 13:20         ` Eli Zaretskii [this message]
2011-04-29 19:49           ` Eli Zaretskii
2011-05-02 14:53             ` Stefan Monnier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=838vv33fm1.fsf@gnu.org \
    --to=eliz@gnu.org \
    --cc=8528@debbugs.gnu.org \
    --cc=eggert@cs.ucla.edu \
    --cc=ego111@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).