From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alan Mackenzie
Newsgroups: gmane.emacs.bugs
Subject: bug#50946: insert-file-contents can corrupt buffers.
Date: Sun, 3 Oct 2021 17:21:35 +0000
References: <83pmsnbnci.fsf@gnu.org> <83k0ivbjbu.fsf@gnu.org> <83czonbhex.fsf@gnu.org> <83lf3a8eo7.fsf@gnu.org> <83czom870a.fsf@gnu.org>
In-Reply-To: <83czom870a.fsf@gnu.org>
To: Eli Zaretskii
Cc: 50946@debbugs.gnu.org, joaotavora@gmail.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii

Hello, Eli.

On Sun, Oct 03, 2021 at 18:25:57 +0300, Eli Zaretskii wrote:
> > Date: Sun, 3 Oct 2021 15:04:27 +0000
> > Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> > From: Alan Mackenzie

> > Here is an updated patch, superseding my patch from midday. I have
> > amended the descriptions of the two functions, replacing "corruption" of
> > the buffer by "inserting raw-text characters" in the first function, and
> > added explanation to the second.

> Thanks, see below some comments.

> > I wasn't able to find a suitable target for a cross-reference explaining
> > "raw-text".

> I think "Coding System Basics" is where we describe that encoding.

OK.

> > --- a/doc/lispref/files.texi
> > +++ b/doc/lispref/files.texi
> > @@ -556,14 +556,18 @@ Reading from Files
> >  If @var{beg} and @var{end} are non-@code{nil}, they should be numbers
> >  that are byte offsets specifying the portion of the file to insert.
> > -In this case, @var{visit} must be @code{nil}. For example,
> > +In this case, @var{visit} must be @code{nil}. Be careful to ensure
> > +that these byte positions are at character boundaries. Otherwise,
> > +Emacs's character code conversion will insert one or more raw-text
> > +characters into the buffer, which is probably not what you want. For

> This isn't the whole story. The problem is mainly with the
> autodetection of encoding: it can go awry if you give it only a
> portion of the file. But if you bind coding-system-for-read, that
> problem goes away, and the only effect of using BEG and END arguments
> is limited to the first character/byte read. In particular, if you
> read a file in chunks, the character at the boundary could end up as 2
> or more raw bytes -- but as long as you bind coding-system-for-read,
> no other parts are supposed to be affected. And the problematic
> sequence of raw bytes can then be converted back to the original
> character with very simple Lisp.

OK, I've learnt something new. Thanks!

> So the text you propose is too "frightening", in that it basically
> says "don't use that". Which is too tough, because valid use cases to
> use that feature do exist, and if the programmer knows what he/she is
> doing it doesn't have to produce garbled buffers. For the manual, we
> need more informative text, which mentions coding-system-for-read.

OK, how about this third version of my patch?


diff --git a/doc/lispref/files.texi b/doc/lispref/files.texi
index 2dc808e694..e73f53b040 100644
--- a/doc/lispref/files.texi
+++ b/doc/lispref/files.texi
@@ -563,7 +563,17 @@ Reading from Files
 @end example
 
 @noindent
-inserts the first 500 characters of a file.
+inserts the characters coded by the first 500 bytes of a file.
+
+If @var{beg} or @var{end} fails to be at a character boundary, Emacs's
+character code conversion will insert one or more raw-text characters
+(@pxref{Coding System Basics}) into the buffer. If you want to read
+part of a file this way, you are recommended to bind
+@code{coding-system-for-read} to a suitable value around the call to
+this function (@pxref{Specifying Coding Systems}), and to write Lisp
+code which will check for raw-text characters at the boundaries, read
+the rest of these characters from the file, and convert them back to
+valid characters.
 
 If the argument @var{replace} is non-@code{nil}, it means to replace the
 contents of the buffer (actually, just the accessible portion) with the
@@ -577,10 +587,11 @@ Reading from Files
 @end defun
 
 @defun insert-file-contents-literally filename &optional visit beg end replace
-This function works like @code{insert-file-contents} except that it
-does not run @code{after-insert-file-functions}, and does not do
-format decoding, character code conversion, automatic uncompression,
-and so on.
+This function works like @code{insert-file-contents} except that each
+byte in the file is handled separately, being converted into a
+raw-text character if needed. It does not run
+@code{after-insert-file-functions}, and does not do format decoding,
+character code conversion, automatic uncompression, and so on.
 @end defun
 
 If you want to pass a file name to another process so that another

-- 
Alan Mackenzie (Nuremberg, Germany).