From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Monnier
Newsgroups: gmane.emacs.bugs
Subject: bug#36431: Crash in marker.c:337
Date: Wed, 03 Jul 2019 00:21:54 -0400
References: <20190629.131734.877718102639559715.wl@gnu.org>
 <831rzch9nd.fsf@gnu.org> <83zhm0fuqg.fsf@gnu.org> <83ftnrf87e.fsf@gnu.org>
In-Reply-To: <83ftnrf87e.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 30 Jun 2019 17:39:49 +0300")
Mime-Version: 1.0
Content-Type: text/plain
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)
To: Eli Zaretskii
Cc: 36431@debbugs.gnu.org
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:162003

> AFAICT, this patch moves the call to move_gap_both from a fragment
> where we must decode the inserted text to a fragment where such a
> decoding might not be necessary. If I'm right, then this makes
> insert-file-contents slower in some cases, because moving the gap
> might be very expensive with large buffers.

Here's an alternative patch which doesn't suffer from this problem but
also eliminates the transiently-inconsistent multibyte buffer situation.


        Stefan


diff --git a/src/fileio.c b/src/fileio.c
index 2825c1b54c..9ed1fcf8ca 100644
--- a/src/fileio.c
+++ b/src/fileio.c
@@ -3705,6 +3705,7 @@ because (1) it preserves some marker positions and (2) it puts less data
           CHECK_CODING_SYSTEM (Vcoding_system_for_read);
           Fset (Qbuffer_file_coding_system, Vcoding_system_for_read);
         }
+      eassert (inserted == 0);
       goto notfound;
     }
 
@@ -3731,7 +3732,10 @@ because (1) it preserves some marker positions and (2) it puts less data
       not_regular = 1;
 
       if (! NILP (visit))
-        goto notfound;
+        {
+          eassert (inserted == 0);
+          goto notfound;
+        }
 
       if (! NILP (replace) || ! NILP (beg) || ! NILP (end))
         xsignal2 (Qfile_error,
@@ -4399,10 +4403,10 @@ because (1) it preserves some marker positions and (2) it puts less data
   if (how_much < 0)
     report_file_error ("Read error", orig_filename);
 
-  /* Make the text read part of the buffer.  */
-  insert_from_gap_1 (inserted, inserted, false);
-
- notfound:
+ notfound: ;
+  Lisp_Object multibyte
+    = BVAR (current_buffer, enable_multibyte_characters);
+  bool ingap = true; /* Bytes are currently in the gap.  */
 
   if (NILP (coding_system))
     {
@@ -4411,6 +4415,7 @@ because (1) it preserves some marker positions and (2) it puts less data
 
          Note that we can get here only if the buffer was empty
          before the insertion.  */
+      eassert (Z == BEG);
 
       if (!NILP (Vcoding_system_for_read))
         coding_system = Vcoding_system_for_read;
@@ -4421,8 +4426,6 @@ because (1) it preserves some marker positions and (2) it puts less data
              enable-multibyte-characters directly here without taking
              care of marker adjustment.  By this way, we can run Lisp
              program safely before decoding the inserted text.  */
-          Lisp_Object multibyte
-            = BVAR (current_buffer, enable_multibyte_characters);
           Lisp_Object undo_list = BVAR (current_buffer, undo_list);
           ptrdiff_t count1 = SPECPDL_INDEX ();
 
@@ -4430,6 +4433,10 @@ because (1) it preserves some marker positions and (2) it puts less data
           bset_undo_list (current_buffer, Qt);
           record_unwind_protect (restore_buffer, Fcurrent_buffer ());
 
+          /* Make the text read part of the buffer.  */
+          insert_from_gap_1 (inserted, inserted, false);
+          ingap = false;
+
           if (inserted > 0 && ! NILP (Vset_auto_coding_function))
             {
               coding_system = call2 (Vset_auto_coding_function,
@@ -4455,15 +4462,10 @@ because (1) it preserves some marker positions and (2) it puts less data
           adjust_overlays_for_delete (BEG, Z - BEG);
           set_buffer_intervals (current_buffer, NULL);
           TEMP_SET_PT_BOTH (BEG, BEG_BYTE);
-
-          /* Change the buffer's multibyteness directly.  We used to do this
-             from within unbind_to, but it was unsafe since the bytes
-             may contain invalid sequences for a multibyte buffer (which is OK
-             here since we'll decode them before anyone else gets to see
-             them, but is dangerous when we're doing a non-local exit).  */
-          bset_enable_multibyte_characters (current_buffer, multibyte);
           bset_undo_list (current_buffer, undo_list);
           inserted = Z_BYTE - BEG_BYTE;
+          /* The bytes may be invalid for a multibyte buffer, so we can't
+             restore the multibyteness yet.  */
         }
 
       if (NILP (coding_system))
@@ -4471,7 +4473,7 @@ because (1) it preserves some marker positions and (2) it puts less data
       else
         CHECK_CODING_SYSTEM (coding_system);
 
-      if (NILP (BVAR (current_buffer, enable_multibyte_characters)))
+      if (NILP (multibyte))
         /* We must suppress all character code conversion except for
            end-of-line conversion.  */
         coding_system = raw_text_coding_system (coding_system);
@@ -4490,33 +4492,51 @@ because (1) it preserves some marker positions and (2) it puts less data
         {
           /* Visiting a file with these coding system makes the buffer
              unibyte.  */
-          if (inserted > 0)
+          if (!ingap)
+            multibyte = Qnil;
+          else if (inserted > 0)
             bset_enable_multibyte_characters (current_buffer, Qnil);
-          else
+          else
             Fset_buffer_multibyte (Qnil);
         }
     }
 
-  coding.dst_multibyte = ! NILP (BVAR (current_buffer, enable_multibyte_characters));
+  coding.dst_multibyte = !NILP (multibyte);
   if (CODING_MAY_REQUIRE_DECODING (&coding)
       && (inserted > 0 || CODING_REQUIRE_FLUSHING (&coding)))
     {
-      move_gap_both (PT, PT_BYTE);
-      GAP_SIZE += inserted;
-      ZV_BYTE -= inserted;
-      Z_BYTE -= inserted;
-      ZV -= inserted;
-      Z -= inserted;
+      if (ingap)
+        { /* Text is at beginning of gap, move it to the end.  */
+          memmove (GAP_END_ADDR - inserted, GPT_ADDR, inserted);
+        }
+      else
+        { /* Text is inside the buffer; move it to end of the gap.  */
+          move_gap_both (PT, PT_BYTE);
+          eassert (inserted == Z_BYTE - BEG_BYTE);
+          GAP_SIZE += inserted;
+          ZV = Z = GPT = BEG;
+          ZV_BYTE = Z_BYTE = GPT_BYTE = BEG_BYTE;
+          /* Now we are safe to change the buffer's multibyteness directly.  */
+          bset_enable_multibyte_characters (current_buffer, multibyte);
+        }
+
       decode_coding_gap (&coding, inserted);
       inserted = coding.produced_char;
       coding_system = CODING_ID_NAME (coding.id);
     }
-  else if (inserted > 0)
+  else if (inserted > 0 && ingap)
     {
+      /* Make the text read part of the buffer.  */
+      eassert (NILP (BVAR (current_buffer, enable_multibyte_characters)));
+      insert_from_gap_1 (inserted, inserted, false);
       invalidate_buffer_caches (current_buffer, PT, PT + inserted);
       adjust_after_insert (PT, PT_BYTE, PT + inserted, PT_BYTE + inserted,
                            inserted);
     }
+  else if (!ingap)
+    { /* Apparently, no decoding needed, so just set the bytenesss.  */
+      bset_enable_multibyte_characters (current_buffer, multibyte);
+    }
 
   /* Call after-change hooks for the inserted text, aside from the case
      of normal visiting (not with REPLACE), which is done in a new buffer