From: Alan Mackenzie
Newsgroups: gmane.emacs.bugs
Subject: bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands]
Date: Sun, 3 Oct 2021 12:10:19 +0000
To: Eli Zaretskii
Cc: 50946@debbugs.gnu.org, joaotavora@gmail.com

On Sat, Oct 02, 2021 at 18:00:38 +0300, Eli Zaretskii wrote:
> > Date: Sat, 2 Oct 2021 14:45:52 +0000
> > Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> > From: Alan Mackenzie

[ .... ]

> > Have you checked that things work if the first byte in your temporary
> > buffer isn't at the start of a character?

> I don't see why this matter, can you explain?

Yes, it does matter. Because....

Create a file ~/utf8-chars.txt as follows. All the non-ASCII characters
are 2-byte German UTF-8 characters. Only the Q is an ASCII character.
There is a LF at the end:

ÄäÖöQÜüß

Now, in an empty buffer,

    M-: (insert-file-contents "~/utf8-chars.txt" nil 3 15)

The first character of this buffer is now the Emacs encoding of the raw
byte 0xa4. Now do

    M-: (insert-file-contents "~/utf8-chars.txt" nil 0 3)

The entire buffer, apart from the Q and the LF, now consists of raw
bytes, and the buffer is now 16 characters long. (Is this a bug?) Note
that the Q is now further back from the end of the buffer than it should
be. Using insert-file-contents-literally instead doesn't help.

So insert-file-contents corrupts the buffer when BEG or END is not at a
character boundary. This matters for hack-elisp-shorthands, because this
corruption could push the start of the Local Variables: section further
back than 3000 characters from the end of the buffer. Other problems
could possibly happen, too.

My opinion, for what it's worth, is that using insert-file-contents in
hack-elisp-shorthands is a Bad Thing. Even if it is possible to get it
working rigorously, it is surely not worth the trouble. Why not simply
visit the file in a buffer, and check for buffer local variables in the
normal fashion?

#########################################################################

There are bugs in the documentation of insert-file-contents in the elisp
manual. It confuses bytes with characters, and it fails to mention the
need to keep BEG and END at character boundaries.
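To make the byte arithmetic in the recipe above concrete, here is a minimal, language-neutral sketch in Python. It is not Emacs code and says nothing about how Emacs decodes files internally; it only lays out the 16 bytes of the test file and shows why offset 3 falls inside a character. It assumes the file is UTF-8 encoded, and the helper names is_char_start and snap_back are hypothetical, made up for this illustration.

```python
# Illustration only: plain Python, not Emacs internals.
# The bytes of utf8-chars.txt as described above: seven 2-byte
# characters, one ASCII Q, and a trailing LF.
data = "ÄäÖöQÜüß\n".encode("utf-8")


def is_char_start(buf: bytes, offset: int) -> bool:
    """Return True if OFFSET is a UTF-8 character boundary in BUF.

    The end of the data counts as a boundary.  Any other offset is a
    boundary unless it points at a continuation byte (bits 10xxxxxx).
    """
    if offset < 0:
        return False
    if offset >= len(buf):
        return True
    return (buf[offset] & 0xC0) != 0x80


def snap_back(buf: bytes, offset: int) -> int:
    """Return the largest character boundary that is <= OFFSET."""
    offset = max(0, min(offset, len(buf)))
    while offset > 0 and not is_char_start(buf, offset):
        offset -= 1
    return offset


print(len(data))                # total size of the file in bytes
print(hex(data[3]))             # the byte found at offset 3
print(is_char_start(data, 3))   # offset 3 is inside a character
print(is_char_start(data, 15))  # offset 15 (the LF) is a boundary
print(snap_back(data, 3))       # nearest boundary at or before 3
```

Under these assumptions, the range 3..15 used in the first call starts on the second byte of the second character, and the range 0..3 used in the second call ends there, so each call sees half of that character. Moving the shared offset back to a boundary first (here, from 3 to 2) gives two ranges that each decode cleanly.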
I propose installing the following patch to the release branch:


diff --git a/doc/lispref/files.texi b/doc/lispref/files.texi
index 2dc808e694..c344e18c2b 100644
--- a/doc/lispref/files.texi
+++ b/doc/lispref/files.texi
@@ -556,6 +556,9 @@ Reading from Files

 If @var{beg} and @var{end} are non-@code{nil}, they should be numbers
 that are byte offsets specifying the portion of the file to insert.
+Be careful to ensure that these byte positions are at character
+boundaries. Otherwise, Emacs's input functions will corrupt the
+buffer.
 In this case, @var{visit} must be @code{nil}. For example,

 @example
@@ -563,7 +566,7 @@ Reading from Files
 @end example

 @noindent
-inserts the first 500 characters of a file.
+inserts the characters coded by the first 500 bytes of a file.

 If the argument @var{replace} is non-@code{nil}, it means to replace the
 contents of the buffer (actually, just the accessible portion) with the
@@ -580,7 +583,8 @@ Reading from Files
 This function works like @code{insert-file-contents} except that it
 does not run @code{after-insert-file-functions}, and does not do format
 decoding, character code conversion, automatic uncompression,
-and so on.
+and so on. @var{beg} and @var{end}, if non-@code{nil}, should be at
+character boundaries, as in @code{insert-file-contents}.
 @end defun

 If you want to pass a file name to another process so that another


The doc strings of insert-file-contents\(-literally\)? will also need to
be updated.

-- 
Alan Mackenzie (Nuremberg, Germany).