unofficial mirror of bug-gnu-emacs@gnu.org 
From: Alan Mackenzie <acm@muc.de>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 50946@debbugs.gnu.org, joaotavora@gmail.com
Subject: bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands]
Date: Sun, 3 Oct 2021 12:10:19 +0000
Message-ID: <YVmdq7FIzSBZefV1@ACM>
In-Reply-To: <83czonbhex.fsf@gnu.org>

On Sat, Oct 02, 2021 at 18:00:38 +0300, Eli Zaretskii wrote:
> > Date: Sat, 2 Oct 2021 14:45:52 +0000
> > Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> > From: Alan Mackenzie <acm@muc.de>

[ .... ]

> > Have you checked that things work if the first byte in your temporary
> > buffer isn't at the start of a character?

> I don't see why this matters, can you explain?

Yes, it does matter.  Because....

Create a file ~/utf8-chars.txt as follows.  All the non-ascii characters
are 2-byte German UTF8 characters.  Only the Q is an ascii character.
There is a LF at the end:

ÄäÖöQÜüß

Now, in an empty buffer,

   M-: (insert-file-contents "~/utf8-chars.txt" nil 3 15)

The first character of this buffer is now the Emacs encoding of the
raw byte 0xa4, because byte offset 3 falls in the middle of the two-byte
sequence for ä (0xc3 0xa4).
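
For illustration only, the byte layout of the sample line can be checked
from Lisp.  This is a minimal sketch that encodes the same text in memory
rather than reading the file, and assumes the file was saved as UTF-8:

```elisp
;; Encode the sample line as UTF-8 and look at the bytes around offset 3.
(let ((bytes (encode-coding-string "ÄäÖöQÜüß\n" 'utf-8)))
  (list (length bytes)                    ; size of the file in bytes
        (format "#x%x" (aref bytes 2))    ; lead byte of ä
        (format "#x%x" (aref bytes 3))))  ; continuation byte of ä
;; => (15 "#xc3" "#xa4")
```

So a BEG of 3 starts reading at the second byte of ä, which cannot be
decoded on its own.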

Now do

   M-: (insert-file-contents "~/utf8-chars.txt" nil 0 3)

The entire buffer, apart from the Q and the LF, now consists of raw
bytes, and the buffer is now 16 characters long.  (Is this a bug?).
Note that the Q is now further back from the end of the buffer than it
should be.

Using insert-file-contents-literally instead doesn't help.

So insert-file-contents corrupts the buffer when BEG or END is not at a
character boundary.  This matters for hack-elisp-shorthands, because this
corruption could push the Local Variables: start further back than 3000
characters.  Possibly other problems could happen, too.
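
A caller that must pass BEG or END could first check that the offset is at
a character boundary.  The following is a rough sketch only, assuming the
file is valid UTF-8; `my-utf8-boundary-p' is a hypothetical name, not an
existing Emacs function:

```elisp
(defun my-utf8-boundary-p (file offset)
  "Return non-nil if byte OFFSET in FILE is at a UTF-8 character boundary.
Hypothetical helper; assumes FILE is valid UTF-8."
  (with-temp-buffer
    ;; Work in a unibyte buffer so each buffer position holds one raw byte.
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file nil offset (1+ offset))
    (or (= (buffer-size) 0)    ; nothing read: OFFSET is at end of file
        ;; UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        (/= (logand (char-after (point-min)) #xC0) #x80))))

;; With the sample file from above:
;; (my-utf8-boundary-p "~/utf8-chars.txt" 3)  ; => nil, inside ä
;; (my-utf8-boundary-p "~/utf8-chars.txt" 4)  ; => t, start of Ö
```

This only covers UTF-8; other multibyte encodings would need their own
boundary test.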

My opinion, for what it's worth, is that using insert-file-contents in
hack-elisp-shorthands is a Bad Thing.  Even if it is possible to get it
working rigorously, it is surely not worth the trouble.  Why not simply
visit the file in a buffer, and check for buffer local variables in the
normal fashion?

#########################################################################

There are bugs in the documentation of insert-file-contents in the elisp
manual.  It confuses bytes with characters, and it fails to mention the
need to keep BEG and END at character boundaries.  I propose installing
the following patch to the release branch:



diff --git a/doc/lispref/files.texi b/doc/lispref/files.texi
index 2dc808e694..c344e18c2b 100644
--- a/doc/lispref/files.texi
+++ b/doc/lispref/files.texi
@@ -556,6 +556,9 @@ Reading from Files
 
 If @var{beg} and @var{end} are non-@code{nil}, they should be numbers
 that are byte offsets specifying the portion of the file to insert.
+Be careful to ensure that these byte positions are at character
+boundaries.  Otherwise, Emacs's input functions will corrupt the
+buffer.
 In this case, @var{visit} must be @code{nil}.  For example,
 
 @example
@@ -563,7 +566,7 @@ Reading from Files
 @end example
 
 @noindent
-inserts the first 500 characters of a file.
+inserts the characters coded by the first 500 bytes of a file.
 
 If the argument @var{replace} is non-@code{nil}, it means to replace the
 contents of the buffer (actually, just the accessible portion) with the
@@ -580,7 +583,8 @@ Reading from Files
 This function works like @code{insert-file-contents} except that it
 does not run @code{after-insert-file-functions}, and does not do
 format decoding, character code conversion, automatic uncompression,
-and so on.
+and so on.  @var{beg} and @var{end}, if non-@code{nil}, should be at
+character boundaries, as in @code{insert-file-contents}.
 @end defun
 
 If you want to pass a file name to another process so that another


The doc strings of insert-file-contents\(-literally\)? will also need to
be updated.

-- 
Alan Mackenzie (Nuremberg, Germany).




