From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: Stefan Monnier Newsgroups: gmane.emacs.bugs Subject: bug#36431: Crash in marker.c:337 Date: Tue, 02 Jul 2019 15:44:07 -0400 Message-ID: References: <20190629.131734.877718102639559715.wl@gnu.org> <831rzch9nd.fsf@gnu.org> <83zhm0fuqg.fsf@gnu.org> <83ftnrf87e.fsf@gnu.org> <83v9wkcp3z.fsf@gnu.org> <83sgrocmws.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="200606"; mail-complaints-to="usenet@blaine.gmane.org" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux) Cc: 36431@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Tue Jul 02 21:58:51 2019 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([209.51.188.17]) by blaine.gmane.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1hiOvS-000q25-CT for geb-bug-gnu-emacs@m.gmane.org; Tue, 02 Jul 2019 21:58:50 +0200 Original-Received: from localhost ([::1]:56888 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hiOvR-00005q-CQ for geb-bug-gnu-emacs@m.gmane.org; Tue, 02 Jul 2019 15:58:49 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:44429) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hiOiC-0000LS-A8 for bug-gnu-emacs@gnu.org; Tue, 02 Jul 2019 15:45:10 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hiOi9-0006vB-7Z for bug-gnu-emacs@gnu.org; Tue, 02 Jul 2019 15:45:07 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]:39180) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1hiOi6-0006rT-6W for bug-gnu-emacs@gnu.org; Tue, 02 Jul 2019 15:45:03 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1hiOi6-0006zp-1I for bug-gnu-emacs@gnu.org; Tue, 02 Jul 2019 15:45:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Stefan Monnier Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Tue, 02 Jul 2019 19:45:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 36431 X-GNU-PR-Package: emacs Original-Received: via spool by 36431-submit@debbugs.gnu.org id=B36431.156209666026827 (code B ref 36431); Tue, 02 Jul 2019 19:45:01 +0000 Original-Received: (at 36431) by debbugs.gnu.org; 2 Jul 2019 19:44:20 +0000 Original-Received: from localhost ([127.0.0.1]:48001 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1hiOhP-0006yc-Gz for submit@debbugs.gnu.org; Tue, 02 Jul 2019 15:44:19 -0400 Original-Received: from mailscanner.iro.umontreal.ca ([132.204.25.50]:11048) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1hiOhN-0006yO-PF for 36431@debbugs.gnu.org; Tue, 02 Jul 2019 15:44:18 -0400 Original-Received: from pmg2.iro.umontreal.ca (localhost.localdomain [127.0.0.1]) by pmg2.iro.umontreal.ca (Proxmox) with ESMTP id 9B78A81176; Tue, 2 Jul 2019 15:44:10 -0400 (EDT) Original-Received: from mail01.iro.umontreal.ca (unknown [172.31.2.1]) by pmg2.iro.umontreal.ca (Proxmox) with ESMTP id 1C8D380B8C; Tue, 2 Jul 2019 15:44:09 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=iro.umontreal.ca; s=mail; t=1562096649; bh=Cuc8CapowAddY23zo6kPbHsg1GGCkP8lOE3/Oqe2c6M=; h=From:To:Cc:Subject:References:Date:In-Reply-To:From; b=ewRPjei+FONtmJYkImeu4HfsCmpKIIihZupKJlALK+53qcPfU19DDalrJ0vb/8AZc 6Q+kfiVj72NnqvjjIBcV8MI8pS8sndNk9pKmk73CqkjJEATbew6DQJW+qwWImkAAzk SGoSdGd7vVXkkXc7V5SlvlcacPd+XZfxo2nqBi32Tm2K8Fv+ys7fRvpfN18VtABaAZ ezzJpbWLJVeb8XkmVSbgDF8ruXRsqfP/UzdbkndE/bMg7CZl7h7KVlRoQ9R7E1RjUA JaueepOe2ko5DiXLMw1p31GfpgVXOgAgXNDHZ40YZ4xhrzXdGsj7EC16DrKKeKtz4H 6BCIfM80Z4Y8A== Original-Received: from pastel (76-10-151-214.dsl.teksavvy.com [76.10.151.214]) by mail01.iro.umontreal.ca (Postfix) with ESMTPSA id B8834120B88; Tue, 2 Jul 2019 15:44:08 -0400 (EDT) In-Reply-To: <83sgrocmws.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 02 Jul 2019 21:27:15 +0300") X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.51.188.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.org gmane.emacs.bugs:161983 Archived-At: >> - we insert the new bytes at the beginning of the gap, in order to have >> room to grow if there are more bytes than expected, and also in case >> there are fewer bytes than expected (in which case we'd otherwise >> have to move the bytes we just read so they properly end at the end >> of the gap). > > Also, you will see in insert-file-contents that it supports quitting > while reading a huge file, and also the REPLACE argument, where we > detect the same contents at beginning and end of the file and the > buffer. Right, tho the end result is the same (e.g. when we quit, we can either abort the whole operation and trow away the bytes we read, or we can keep going with the bytes we did read which is simply another case of reading less than expected). >> - decode_coding_gap wants the new input bytes to be at the end of the >> gap, so that we can put the decoded chars at the beginning of the gap >> and as one grows the other shrinks, so we don't need space for "IN + >> OUT" bytes but only for "OUT" bytes. Is that right (I'm trying to >> find some comment or other evidence that this is the case, but >> haven't found it yet). > > That's right. The comment you are looking for (well, at least part of > it) is in the commentary before decode_coding, where it explains the > semantics of CODING->src_pos. You will see at the beginning of > decode_coding_gap how it sets things up according to that hairy > protocol. IIUC you're referring to this comment: Decode the data at CODING->src_object into CODING->dst_object. CODING->src_object is a buffer, a string, or nil. CODING->dst_object is a buffer. If CODING->src_object is a buffer, it must be the current buffer. In this case, if CODING->src_pos is positive, it is a position of the source text in the buffer, otherwise, the source text is in the gap area of the buffer, and CODING->src_pos specifies the offset of the text from GPT (which must be the same as PT). If this is the same buffer as CODING->dst_object, CODING->src_pos must be negative. If CODING->src_object is a string, CODING->src_pos is an index to that string. If CODING->src_object is nil, CODING->source must already point to the non-relocatable memory area. In this case, CODING->src_pos is an offset from CODING->source. The decoded data is inserted at the current point of the buffer CODING->dst_object. but this doesn't say if the bytes are to be found originally at the beginning of the gap or its end, nor whether they finish at the beginning or the end, nor what happens in the middle and why it's been designed this way. Is the patch below correct? >> IOW, it should be possible to optimize the common case by reading the >> new bytes into the end of the gap to avoid moving everything in the >> common case (if the number of bytes read is different from originally >> expected, we'll have to do extra work, but for the common case where we >> know the file size upfront and it doesn't change while we read it, this >> will save us some work). >> >> But the effort is probably not worth the trouble: a memmove of a few >> gigabytes costs relatively little compared to the cost of actually >> decoding those same gigabytes. > > Right. Also, there are the other subtle issues with quitting, the > REPLACE argument, special files, etc. I think the crash-example I sent can probably be made less esoteric by making it use "quit" instead of catch/throw. I'm beginning to think that when we quit (or signal an error) from within set-auto-coding-function, we simply shouldn't revert the buffer to multibyte. Stefan diff --git a/src/coding.c b/src/coding.c index 5b9bfa17dd..218d69e2e7 100644 --- a/src/coding.c +++ b/src/coding.c @@ -7322,11 +7322,16 @@ produce_annotation (struct coding_system *coding, ptrdiff_t pos) If CODING->src_object is a buffer, it must be the current buffer. In this case, if CODING->src_pos is positive, it is a position of - the source text in the buffer, otherwise, the source text is in the - gap area of the buffer, and CODING->src_pos specifies the offset of - the text from GPT (which must be the same as PT). If this is the - same buffer as CODING->dst_object, CODING->src_pos must be - negative. + the source text in the buffer, otherwise, the source text is at the + end of the gap area of the buffer, and CODING->src_pos specifies the + offset of the text from the end of the gap (which must be the at PT). + If this is the same buffer as CODING->dst_object, CODING->src_pos must + be negative. + + When the text is taken from the gap, it needs to be at the end of + the gap so that we can produce the decoded text at the beginning of + the gap: this way, as the output grows, the input shrinks, so we only + need to allocate enough space for `max(IN, OUT)` instead of `IN + OUT`. If CODING->src_object is a string, CODING->src_pos is an index to that string.