* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit @ 2011-04-20 21:04 Evans Winner 2011-04-21 5:52 ` Eli Zaretskii 0 siblings, 1 reply; 8+ messages in thread From: Evans Winner @ 2011-04-20 21:04 UTC (permalink / raw) To: 8528 My understanding is that a 32-bit GNU Emacs should be able to open files up to 512 M. If I am wrong about that, please let me know. I have compiled Emacs trunk from source several times in the last couple of months and somewhere in the last month or so it seems that the limit on my machine has become 128 M. My math could be off, but on the assumption that 128 Mebibytes = 2^27 bytes = 1024 * 131072 bytes, and starting with emacs -Q I tried: $ dd if=/dev/zero of=testfile bs=1024 count=131072 and tried to open the file, and got: "Maximum buffer size exceeded". Then I tried one K less: $ dd if=/dev/zero of=testfile bs=1024 count=131071 and the buffer opened. I have verified using the `top' command that there is sufficient free memory for the files. Also, for what it's worth: ELISP> most-positive-fixnum ==> 536870911 I discovered this as a result of not being able to open a large (~160Mb) .pdf file that I had earlier been able to open. Please let me know if there is any other information I can provide, or if there is something simple I am doing wrong. In GNU Emacs 24.0.50.1 (i686-pc-linux-gnu, GTK+ Version 3.0.8) of 2011-04-19 on braintron Windowing system distributor `The X.Org Foundation', version 11.0.11001000 configured using `configure '--with-x-toolkit=gtk3'' Important settings: value of $LC_ALL: nil value of $LC_COLLATE: nil value of $LC_CTYPE: nil value of $LC_MESSAGES: nil value of $LC_MONETARY: nil value of $LC_NUMERIC: nil value of $LC_TIME: nil value of $LANG: en_US.UTF-8 value of $XMODIFIERS: nil locale-coding-system: utf-8-unix default enable-multibyte-characters: t ^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit 2011-04-20 21:04 bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit Evans Winner @ 2011-04-21 5:52 ` Eli Zaretskii 2011-04-21 6:18 ` Paul Eggert 0 siblings, 1 reply; 8+ messages in thread From: Eli Zaretskii @ 2011-04-21 5:52 UTC (permalink / raw) To: Evans Winner, Paul Eggert; +Cc: 8528 > From: Evans Winner <ego111@gmail.com> > Date: Wed, 20 Apr 2011 15:04:06 -0600 > > My understanding is that a 32-bit GNU Emacs should be able > to open files up to 512 M. If I am wrong about that, please > let me know. I have compiled Emacs trunk from source > several times in the last couple of months and somewhere in > the last month or so it seems that the limit on my machine > has become 128 M. My math could be off, but on the > assumption that 128 Mebibytes = 2^27 bytes = 1024 * 131072 > bytes, and starting with emacs -Q I tried: > > $ dd if=/dev/zero of=testfile bs=1024 count=131072 > > and tried to open the file, and got: "Maximum buffer size > exceeded". This happens because of the following test in insert-file-contents: /* Arithmetic overflow can occur if an Emacs integer cannot represent the file size, or if the calculations below overflow. The calculations below double the file size twice, so check that it can be multiplied by 4 safely. Also check whether the size is negative, which can happen on a platform that allows file sizes greater than the maximum off_t value. */ if (! not_regular && ! (0 <= st.st_size && st.st_size <= MOST_POSITIVE_FIXNUM / 4)) error ("Maximum buffer size exceeded"); This test was commented out for the last 2 years, but lately it was uncommented by Paul Eggert in revision 103841 on the trunk. Paul, could you please tell where do you see twice doubling of the file size in insert-file-contents? Back in 1999, when this test was first introduced, there was indeed such doubling. But even then it was only when the REPLACE argument was non-nil (according to my reading of the code). In any case, that part of code was completely rewritten since then, and I don't believe we double the file size even once. By disabling that test, I was able to visit a 260-MB file on a 32-bit machine. So it seems like this test could be removed, if I'm not missing anything. ^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit 2011-04-21 5:52 ` Eli Zaretskii @ 2011-04-21 6:18 ` Paul Eggert 2011-04-21 6:40 ` Eli Zaretskii 0 siblings, 1 reply; 8+ messages in thread From: Paul Eggert @ 2011-04-21 6:18 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 8528, Evans Winner On 04/20/11 22:52, Eli Zaretskii wrote: > Paul, could you please tell where do you see twice doubling of the > file size in insert-file-contents? I assumed that it was because the internal buffers contain an Emacs-encoded version of the file, which could be as long as four times the actual file size, because a single byte in the file might expand to 4 bytes inside Emacs in some cases. That would explain the behavior that you saw: if your file's internal encoding was the same as the external, you wouldn't observe any problem. The problem would be exhibited only with files containing many characters that bloat when read into memory. However, I didn't investigate the matter thoroughly; perhaps someone who's more expert on how Emacs encodes things internally could speak up. ^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit 2011-04-21 6:18 ` Paul Eggert @ 2011-04-21 6:40 ` Eli Zaretskii 2011-04-21 6:58 ` Paul Eggert 0 siblings, 1 reply; 8+ messages in thread From: Eli Zaretskii @ 2011-04-21 6:40 UTC (permalink / raw) To: Paul Eggert; +Cc: 8528, ego111 > Date: Wed, 20 Apr 2011 23:18:55 -0700 > From: Paul Eggert <eggert@cs.ucla.edu> > CC: Evans Winner <ego111@gmail.com>, 8528@debbugs.gnu.org > > On 04/20/11 22:52, Eli Zaretskii wrote: > > Paul, could you please tell where do you see twice doubling of the > > file size in insert-file-contents? > > I assumed that it was because the internal buffers contain an > Emacs-encoded version of the file, which could be as long as four > times the actual file size, because a single byte in the file > might expand to 4 bytes inside Emacs in some cases. Actually, it could potentially expand even 5-fold (because Emacs extends UTF-8 to codepoints as large as 0x3FFFFF). But we test the buffer size and avoid overflowing it in many other places, both further down in insert-file-contents and in insdel.c. If those are not enough, we could add more such tests, particularly after decoding the file's contents, where we know the full buffer size in bytes. So I think artificially limiting the maximum size of a file that can be visited in that particular place in insert-file-contents is too harsh. > That would explain the behavior that you saw: if your file's > internal encoding was the same as the external, you wouldn't observe any > problem. The problem would be exhibited only with files containing > many characters that bloat when read into memory. Right, but wouldn't you agree that such a limitation is too stringent? E.g., I should be able to use find-file-literally to visit a 512MB file, but currently I cannot. ^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit 2011-04-21 6:40 ` Eli Zaretskii @ 2011-04-21 6:58 ` Paul Eggert 2011-04-21 13:20 ` Eli Zaretskii 0 siblings, 1 reply; 8+ messages in thread From: Paul Eggert @ 2011-04-21 6:58 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 8528, ego111 On 04/20/11 23:40, Eli Zaretskii wrote: > Right, but wouldn't you agree that such a limitation is too stringent? Yes, absolutely, the limit should be removed if possible. In a brief look at the code, it appeared to me that there were places where the it does not check for integer overflow in size calculations when converting external to internal form. So it could well be that this preliminary check may be needed to avoid catastrophe later. I have not checked this out carefully, though, and I could be wrong. (One way to find out would be to test it with a worst-case-bloat file, but I haven't had time to do that.) > E.g., I should be able to use find-file-literally to visit a 512MB > file, but currently I cannot. If we know that byte bloat cannot occur, which is the case with find-file-literally, then the divide-by-4 limit should not be needed. That case should be easy, in that it shouldn't require a lot of analysis to fix that case safely. ^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit 2011-04-21 6:58 ` Paul Eggert @ 2011-04-21 13:20 ` Eli Zaretskii 2011-04-29 19:49 ` Eli Zaretskii 0 siblings, 1 reply; 8+ messages in thread From: Eli Zaretskii @ 2011-04-21 13:20 UTC (permalink / raw) To: Paul Eggert; +Cc: 8528, ego111 > Date: Wed, 20 Apr 2011 23:58:36 -0700 > From: Paul Eggert <eggert@cs.ucla.edu> > CC: ego111@gmail.com, 8528@debbugs.gnu.org > > On 04/20/11 23:40, Eli Zaretskii wrote: > > Right, but wouldn't you agree that such a limitation is too stringent? > > Yes, absolutely, the limit should be removed if possible. > > In a brief look at the code, it appeared to me that there were > places where the it does not check for integer overflow > in size calculations when converting external to internal form. So > it could well be that this preliminary check may be needed to > avoid catastrophe later. I have not checked this out carefully, > though, and I could be wrong. I have now reviewed the code involved in this, and I think the limit can be lifted. In general, there could be two ways for us to insert text into the buffer as result of calling insert-file-contents: (a) directly, by reading from the file into its buffer (or some temporary buffer used as part of processing); or (b) indirectly, by decoding inserted text through various functions in coding.c, which write the decoded text into the destination buffer. I found that both of these ways include tests for potential overflows of the buffer size. Inserting text directly is protected because it enlarges the buffer's gap before inserting text, and make_gap which does that errors out if the new size will overflow (actually, it errors out 2000 bytes too early, because it wants some extra space). Insertion by decoding text is also protected because it makes sure the destination buffer has enough space before it writes another chunk of decoded text into it. It assures that by enlarging the gap, which again goes through make_gap. I found only one place where we were not protected from overflowing MOST_POSITIVE_FIXNUM (not sure if it is relevant to insert-file-contents), and one other place where I wasn't sure we were protected, so I added a suitable protection in both those places. See the proposed patch below. If no one objects, I will commit these changes in a week or so. > > E.g., I should be able to use find-file-literally to visit a 512MB > > file, but currently I cannot. > > If we know that byte bloat cannot occur, which is the case with > find-file-literally, then the divide-by-4 limit should not be needed. > That case should be easy, in that it shouldn't require a lot > of analysis to fix that case safely. Yes, definitely. But I think the patch below solves this problem as well, so there's no need for special treatment for unibyte or pure ASCII files. Here's the proposed patch. Evans, I'd appreciate if you could try it and see if it solves the original problem for you. === modified file 'src/ChangeLog' --- src/ChangeLog 2011-04-19 10:48:30 +0000 +++ src/ChangeLog 2011-04-21 12:35:30 +0000 @@ -1,3 +1,16 @@ +2011-04-21 Eli Zaretskii <eliz@gnu.org> + + * coding.c (coding_alloc_by_realloc): Error out if destination + will grow beyond MOST_POSITIVE_FIXNUM. + (decode_coding_emacs_mule): Abort if there isn't enough place in + charbuf for the composition carryover bytes. Reserve an extra + space for up to 2 characters produced in a loop. + (decode_coding_iso_2022): Abort if there isn't enough place in + charbuf for the composition carryover bytes. + + * fileio.c (Finsert_file_contents): Don't limit file size to 1/4 + of MOST_POSITIVE_FIXNUM. + 2011-04-19 Eli Zaretskii <eliz@gnu.org> * syntax.h (SETUP_SYNTAX_TABLE_FOR_OBJECT): Fix setting of === modified file 'src/coding.c' --- src/coding.c 2011-04-14 05:04:02 +0000 +++ src/coding.c 2011-04-21 12:35:33 +0000 @@ -1071,6 +1071,8 @@ coding_set_destination (struct coding_sy static void coding_alloc_by_realloc (struct coding_system *coding, EMACS_INT bytes) { + if (coding->dst_bytes > MOST_POSITIVE_FIXNUM - bytes) + error ("Maximum size of buffer or string exceeded"); coding->destination = (unsigned char *) xrealloc (coding->destination, coding->dst_bytes + bytes); coding->dst_bytes += bytes; @@ -2333,7 +2335,9 @@ decode_coding_emacs_mule (struct coding_ /* We may produce two annotations (charset and composition) in one loop and one more charset annotation at the end. */ int *charbuf_end - = coding->charbuf + coding->charbuf_size - (MAX_ANNOTATION_LENGTH * 3); + = coding->charbuf + coding->charbuf_size - (MAX_ANNOTATION_LENGTH * 3) + /* We can produce up to 2 characters in a loop. */ + - 1; EMACS_INT consumed_chars = 0, consumed_chars_base; int multibytep = coding->src_multibyte; EMACS_INT char_offset = coding->produced_char; @@ -2348,6 +2352,8 @@ decode_coding_emacs_mule (struct coding_ { int i; + if (charbuf_end - charbuf < cmp_status->length) + abort (); for (i = 0; i < cmp_status->length; i++) *charbuf++ = cmp_status->carryover[i]; coding->annotated = 1; @@ -3479,6 +3485,8 @@ decode_coding_iso_2022 (struct coding_sy if (cmp_status->state != COMPOSING_NO) { + if (charbuf_end - charbuf < cmp_status->length) + abort (); for (i = 0; i < cmp_status->length; i++) *charbuf++ = cmp_status->carryover[i]; coding->annotated = 1; === modified file 'src/fileio.c' --- src/fileio.c 2011-04-14 20:20:17 +0000 +++ src/fileio.c 2011-04-21 12:07:44 +0000 @@ -3245,15 +3245,10 @@ variable `last-coding-system-used' to th record_unwind_protect (close_file_unwind, make_number (fd)); - /* Arithmetic overflow can occur if an Emacs integer cannot represent the - file size, or if the calculations below overflow. The calculations below - double the file size twice, so check that it can be multiplied by 4 - safely. - - Also check whether the size is negative, which can happen on a platform - that allows file sizes greater than the maximum off_t value. */ + /* Check whether the size is too large or negative, which can happen on a + platform that allows file sizes greater than the maximum off_t value. */ if (! not_regular - && ! (0 <= st.st_size && st.st_size <= MOST_POSITIVE_FIXNUM / 4)) + && ! (0 <= st.st_size && st.st_size <= MOST_POSITIVE_FIXNUM)) error ("Maximum buffer size exceeded"); /* Prevent redisplay optimizations. */ ^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit 2011-04-21 13:20 ` Eli Zaretskii @ 2011-04-29 19:49 ` Eli Zaretskii 2011-05-02 14:53 ` Stefan Monnier 0 siblings, 1 reply; 8+ messages in thread From: Eli Zaretskii @ 2011-04-29 19:49 UTC (permalink / raw) To: eggert, ego111; +Cc: 8528-done > Date: Thu, 21 Apr 2011 16:20:54 +0300 > From: Eli Zaretskii <eliz@gnu.org> > Cc: 8528@debbugs.gnu.org, ego111@gmail.com > > If no one objects, I will commit these changes in a week or so. No one objected, so I installed this. ^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit 2011-04-29 19:49 ` Eli Zaretskii @ 2011-05-02 14:53 ` Stefan Monnier 0 siblings, 0 replies; 8+ messages in thread From: Stefan Monnier @ 2011-05-02 14:53 UTC (permalink / raw) To: 8528 > No one objected, so I installed this. Thanks, Stefan ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2011-05-02 14:53 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-04-20 21:04 bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit Evans Winner 2011-04-21 5:52 ` Eli Zaretskii 2011-04-21 6:18 ` Paul Eggert 2011-04-21 6:40 ` Eli Zaretskii 2011-04-21 6:58 ` Paul Eggert 2011-04-21 13:20 ` Eli Zaretskii 2011-04-29 19:49 ` Eli Zaretskii 2011-05-02 14:53 ` Stefan Monnier
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).