unofficial mirror of bug-gnu-emacs@gnu.org 
From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 36431@debbugs.gnu.org
Subject: bug#36431: Crash in marker.c:337
Date: Tue, 02 Jul 2019 15:44:07 -0400
Message-ID: <jwvr278i6ap.fsf-monnier+emacs@gnu.org>
In-Reply-To: <83sgrocmws.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 02 Jul 2019 21:27:15 +0300")

>> - we insert the new bytes at the beginning of the gap, in order to have
>>   room to grow if there are more bytes than expected, and also in case
>>   there are fewer bytes than expected (in which case we'd otherwise
>>   have to move the bytes we just read so they properly end at the end
>>   of the gap).
>
> Also, you will see in insert-file-contents that it supports quitting
> while reading a huge file, and also the REPLACE argument, where we
> detect the same contents at beginning and end of the file and the
> buffer.

Right, tho the end result is the same (e.g. when we quit, we can either
abort the whole operation and throw away the bytes we read, or we can
keep going with the bytes we did read which is simply another case of
reading less than expected).
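
To illustrate why a short read (or a quit) needs no special handling when
we fill the gap from its beginning, here is a minimal standalone sketch.
It is not code from fileio.c: `struct source`, `source_read`,
`fill_gap_start` and `move_to_gap_end` are hypothetical names, and the
"file" is just a memory area that hands out small chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a file: delivers at most CHUNK bytes per
   call, and possibly fewer bytes overall than the caller expected.  */
struct source
{
  const unsigned char *data;
  size_t len, pos, chunk;
};

static size_t
source_read (struct source *s, unsigned char *dst, size_t want)
{
  size_t avail = s->len - s->pos;
  size_t n = want < avail ? want : avail;
  if (n > s->chunk)
    n = s->chunk;
  memcpy (dst, s->data + s->pos, n);
  s->pos += n;
  return n;
}

/* Read into the beginning of GAP until it is full or the source is
   exhausted.  Return the number of bytes actually read, which may be
   smaller than the caller hoped for.  */
static size_t
fill_gap_start (struct source *s, unsigned char *gap, size_t gap_size)
{
  size_t total = 0;
  while (total < gap_size)
    {
      size_t n = source_read (s, gap + total, gap_size - total);
      if (n == 0)
        break;
      total += n;
    }
  return total;
}

/* Slide the NREAD bytes at the start of GAP so they end exactly at the
   end of GAP.  The regions may overlap, hence memmove.  */
static void
move_to_gap_end (unsigned char *gap, size_t gap_size, size_t nread)
{
  assert (nread <= gap_size);
  memmove (gap + gap_size - nread, gap, nread);
}
```

Whatever count `fill_gap_start` returns, a single `move_to_gap_end` puts
the bytes where the decoder wants them; stopping early only makes that
count smaller.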

>> - decode_coding_gap wants the new input bytes to be at the end of the
>>   gap, so that we can put the decoded chars at the beginning of the gap
>>   and as one grows the other shrinks, so we don't need space for "IN +
>>   OUT" bytes but only for "OUT" bytes.  Is that right (I'm trying to
>>   find some comment or other evidence that this is the case, but
>>   haven't found it yet).
>
> That's right.  The comment you are looking for (well, at least part of
> it) is in the commentary before decode_coding, where it explains the
> semantics of CODING->src_pos.  You will see at the beginning of
> decode_coding_gap how it sets things up according to that hairy
> protocol.

IIUC you're referring to this comment:

    Decode the data at CODING->src_object into CODING->dst_object.
    CODING->src_object is a buffer, a string, or nil.
    CODING->dst_object is a buffer.
 
    If CODING->src_object is a buffer, it must be the current buffer.
    In this case, if CODING->src_pos is positive, it is a position of
    the source text in the buffer, otherwise, the source text is in the
    gap area of the buffer, and CODING->src_pos specifies the offset of
    the text from GPT (which must be the same as PT).  If this is the
    same buffer as CODING->dst_object, CODING->src_pos must be
    negative.
 
    If CODING->src_object is a string, CODING->src_pos is an index to
    that string.
 
    If CODING->src_object is nil, CODING->source must already point to
    the non-relocatable memory area.  In this case, CODING->src_pos is
    an offset from CODING->source.
 
    The decoded data is inserted at the current point of the buffer
    CODING->dst_object.

but this doesn't say if the bytes are to be found originally at the
beginning of the gap or its end, nor whether they finish at the beginning or
the end, nor what happens in the middle and why it's been designed this way.
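
To make the question concrete, here is how I picture the scheme, as a
minimal standalone sketch rather than anything taken from coding.c.  It
uses Latin-1 to UTF-8 as a stand-in decoder, and the names `utf8_size`
and `decode_in_gap` are made up for illustration.  The input sits in the
last IN bytes of the gap and the output is written from the start of the
gap; since every input byte yields at least one output byte here, the
write position never passes the read position as long as the gap holds
max(IN, OUT) bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of UTF-8 bytes needed to hold the N
   Latin-1 bytes at SRC.  */
static size_t
utf8_size (const unsigned char *src, size_t n)
{
  size_t out = 0;
  for (size_t i = 0; i < n; i++)
    out += src[i] < 0x80 ? 1 : 2;
  return out;
}

/* Decode the IN Latin-1 bytes found at the end of GAP into UTF-8
   starting at the beginning of GAP, and return the number of bytes
   produced.  The caller must make GAP_SIZE at least max (IN, OUT).  */
static size_t
decode_in_gap (unsigned char *gap, size_t gap_size, size_t in)
{
  assert (in <= gap_size);
  size_t r = gap_size - in;     /* Next input byte to read.  */
  size_t w = 0;                 /* Next output byte to write.  */
  while (r < gap_size)
    {
      /* Fetch the byte before writing: the write may land on the
         slot we have just consumed.  */
      unsigned char c = gap[r++];
      if (c < 0x80)
        gap[w++] = c;
      else
        {
          gap[w++] = 0xC0 | (c >> 6);
          gap[w++] = 0x80 | (c & 0x3F);
        }
    }
  return w;
}
```

For the four Latin-1 bytes of "café", a five-byte gap is enough, whereas
separate input and output areas would take nine bytes.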

Is the patch below correct?

>> IOW, it should be possible to optimize the common case by reading the
>> new bytes into the end of the gap to avoid moving everything in the
>> common case (if the number of bytes read is different from originally
>> expected, we'll have to do extra work, but for the common case where we
>> know the file size upfront and it doesn't change while we read it, this
>> will save us some work).
>> 
>> But the effort is probably not worth the trouble: a memmove of a few
>> gigabytes costs relatively little compared to the cost of actually
>> decoding those same gigabytes.
>
> Right.  Also, there are the other subtle issues with quitting, the
> REPLACE argument, special files, etc.

I think the crash-example I sent can probably be made less esoteric by
making it use "quit" instead of catch/throw.  I'm beginning to think
that when we quit (or signal an error) from within
set-auto-coding-function, we simply shouldn't revert the buffer
to multibyte.


        Stefan


diff --git a/src/coding.c b/src/coding.c
index 5b9bfa17dd..218d69e2e7 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -7322,11 +7322,16 @@ produce_annotation (struct coding_system *coding, ptrdiff_t pos)
 
    If CODING->src_object is a buffer, it must be the current buffer.
    In this case, if CODING->src_pos is positive, it is a position of
-   the source text in the buffer, otherwise, the source text is in the
-   gap area of the buffer, and CODING->src_pos specifies the offset of
-   the text from GPT (which must be the same as PT).  If this is the
-   same buffer as CODING->dst_object, CODING->src_pos must be
-   negative.
+   the source text in the buffer, otherwise, the source text is at the
+   end of the gap area of the buffer, and CODING->src_pos specifies the
+   offset of the text from the end of the gap (which must be at PT).
+   If this is the same buffer as CODING->dst_object, CODING->src_pos must
+   be negative.
+   
+   When the text is taken from the gap, it needs to be at the end of
+   the gap so that we can produce the decoded text at the beginning of
+   the gap: this way, as the output grows, the input shrinks, so we only
+   need to allocate enough space for `max(IN, OUT)` instead of `IN + OUT`.
 
    If CODING->src_object is a string, CODING->src_pos is an index to
    that string.






