From: Alan Mackenzie <acm@muc.de>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 50946@debbugs.gnu.org, joaotavora@gmail.com
Subject: bug#50946: insert-file-contents can corrupt buffers.
Date: Sun, 3 Oct 2021 17:21:35 +0000
Message-ID: <YVnmn1k7RTvXtNTI@ACM>
In-Reply-To: <83czom870a.fsf@gnu.org>
Hello, Eli.
On Sun, Oct 03, 2021 at 18:25:57 +0300, Eli Zaretskii wrote:
> > Date: Sun, 3 Oct 2021 15:04:27 +0000
> > Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> > From: Alan Mackenzie <acm@muc.de>
> > Here is an updated patch, superseding my patch from midday. I have
> > amended the descriptions of the two functions, replacing "corruption" of
> > the buffer by "inserting raw-text characters" in the first function, and
> > added explanation to the second.
> Thanks, see below some comments.
> > I wasn't able to find a suitable target for a cross-reference explaining
> > "raw-text".
> I think "Coding System Basics" is where we describe that encoding.
OK.
> > --- a/doc/lispref/files.texi
> > +++ b/doc/lispref/files.texi
> > @@ -556,14 +556,18 @@ Reading from Files
> > If @var{beg} and @var{end} are non-@code{nil}, they should be numbers
> > that are byte offsets specifying the portion of the file to insert.
> > -In this case, @var{visit} must be @code{nil}. For example,
> > +In this case, @var{visit} must be @code{nil}. Be careful to ensure
> > +that these byte positions are at character boundaries. Otherwise,
> > +Emacs's character code conversion will insert one or more raw-text
> > +characters into the buffer, which is probably not what you want. For
> This isn't the whole story. The problem is mainly with the
> autodetection of encoding: it can go awry if you give it only a
> portion of the file. But if you bind coding-system-for-read, that
> problem goes away, and the only effect of using BEG and END arguments
> is limited to the first character/byte read. In particular, if you
> read a file in chunks, the character at the boundary could end up as 2
> or more raw bytes -- but as long as you bind coding-system-for-read,
> no other parts are supposed to be affected. And the problematic
> sequence of raw bytes can then be converted back to the original
> character with very simple Lisp.
OK, I've learnt something new. Thanks!
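To make the chunked-reading case concrete, here is a minimal sketch of what I understand Eli to be describing. It assumes the file is known to be UTF-8, and the my-demo-* names are hypothetical helpers invented for this illustration; they are not part of Emacs or of the patch. It reads a three-byte file in two chunks that split a two-byte character, then re-decodes the run of raw bytes left at the boundary.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical helpers (my-demo-*), for illustration only.
;; Assumption: the file is encoded in UTF-8.

(defun my-demo-raw-byte-p (ch)
  "Return non-nil if CH is a raw-byte (eight-bit) character."
  (and ch (>= ch #x3FFF80)))

(defun my-demo-insert-bytes (file beg end)
  "Insert bytes BEG..END of FILE at point, decoded as UTF-8.
Binding `coding-system-for-read' avoids encoding autodetection."
  (let ((coding-system-for-read 'utf-8))
    (insert-file-contents file nil beg end)))

(defun my-demo-repair-raw-bytes (coding)
  "Re-decode each run of raw bytes in the current buffer using CODING."
  (save-excursion
    (goto-char (point-min))
    (while (not (eobp))
      (if (my-demo-raw-byte-p (char-after))
          (let ((start (point)))
            ;; Move over the whole run of raw bytes.
            (while (and (not (eobp)) (my-demo-raw-byte-p (char-after)))
              (forward-char 1))
            (let ((text (decode-coding-string
                         (string-to-unibyte
                          (buffer-substring-no-properties start (point)))
                         coding)))
              (delete-region start (point))
              ;; Point ends up after the inserted text, so the loop
              ;; always makes progress.
              (insert text)))
        (forward-char 1)))))

(defun my-demo-chunked-read ()
  "Read a 3-byte UTF-8 file in two chunks split inside a character.
Return a list (BEFORE AFTER): the buffer text before and after repair."
  (let ((file (make-temp-file "my-demo-")))
    (unwind-protect
        (progn
          ;; "a" followed by e-acute: bytes 61 C3 A9.
          (let ((coding-system-for-write 'utf-8))
            (write-region "a\u00e9" nil file nil 'silent))
          (with-temp-buffer
            (my-demo-insert-bytes file 0 2) ; ends inside the 2-byte char
            ;; `insert-file-contents' leaves point before the new text.
            (goto-char (point-max))
            (my-demo-insert-bytes file 2 3) ; starts inside it
            (let ((before (buffer-string)))
              (my-demo-repair-raw-bytes 'utf-8)
              (list before (buffer-string)))))
      (delete-file file))))
```

Under those assumptions, (my-demo-chunked-read) should return a three-character string ("a" plus two raw bytes) followed by the repaired two-character string. A real caller would only need to look at the chunk boundaries rather than scan the whole buffer; the full scan just keeps the sketch short.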
> So the text you propose is too "frightening", in that it basically
> says "don't use that". Which is too tough, because valid use cases to
> use that feature do exist, and if the programmer knows what he/she is
> doing it doesn't have to produce garbled buffers. For the manual, we
> need more informative text, which mentions coding-system-for-read.
OK, how about this third version of my patch?
diff --git a/doc/lispref/files.texi b/doc/lispref/files.texi
index 2dc808e694..e73f53b040 100644
--- a/doc/lispref/files.texi
+++ b/doc/lispref/files.texi
@@ -563,7 +563,17 @@ Reading from Files
@end example
@noindent
-inserts the first 500 characters of a file.
+inserts the characters coded by the first 500 bytes of a file.
+
+If @var{beg} or @var{end} fails to be at a character boundary, Emacs's
+character code conversion will insert one or more raw-text characters
+(@pxref{Coding System Basics}) into the buffer. If you want to read
+part of a file this way, you are recommended to bind
+@code{coding-system-for-read} to a suitable value around the call to
+this function (@pxref{Specifying Coding Systems}), and to write Lisp
+code which will check for raw-text characters at the boundaries, read
+the rest of these characters from the file, and convert them back to
+valid characters.
If the argument @var{replace} is non-@code{nil}, it means to replace the
contents of the buffer (actually, just the accessible portion) with the
@@ -577,10 +587,11 @@ Reading from Files
@end defun
@defun insert-file-contents-literally filename &optional visit beg end replace
-This function works like @code{insert-file-contents} except that it
-does not run @code{after-insert-file-functions}, and does not do
-format decoding, character code conversion, automatic uncompression,
-and so on.
+This function works like @code{insert-file-contents} except that each
+byte in the file is handled separately, being converted into a
+raw-text character if needed. It does not run
+@code{after-insert-file-functions}, and does not do format decoding,
+character code conversion, automatic uncompression, and so on.
@end defun
If you want to pass a file name to another process so that another
--
Alan Mackenzie (Nuremberg, Germany).
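
As an aside on the second hunk, here is a small sketch of the difference the new wording for insert-file-contents-literally is trying to convey. It assumes a UTF-8 file and a multibyte buffer; my-demo-file-chars and my-demo-compare are hypothetical names used only for this illustration.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical helpers (my-demo-*), for illustration only.
;; Assumptions: the file is UTF-8 and the temporary buffer is multibyte.

(defun my-demo-file-chars (file literally)
  "Return the characters of FILE as a list of character codes.
If LITERALLY is non-nil, use `insert-file-contents-literally';
otherwise decode as UTF-8 by binding `coding-system-for-read'."
  (with-temp-buffer
    (if literally
        (insert-file-contents-literally file)
      (let ((coding-system-for-read 'utf-8))
        (insert-file-contents file)))
    (append (buffer-string) nil)))

(defun my-demo-compare ()
  "Return (LITERAL DECODED) character lists for a 3-byte UTF-8 file."
  (let ((file (make-temp-file "my-demo-")))
    (unwind-protect
        (progn
          ;; "a" followed by e-acute: bytes 61 C3 A9.
          (let ((coding-system-for-write 'utf-8))
            (write-region "a\u00e9" nil file nil 'silent))
          (list (my-demo-file-chars file t)
                (my-demo-file-chars file nil)))
      (delete-file file))))
```

Here the literal read should give three characters, one per byte, with the two non-ASCII bytes as raw-byte characters, while the decoded read gives the two characters "a" and e-acute.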