unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Paul Eggert <eggert@cs.ucla.edu>
To: "Arsen Arsenović" <arsen@aarsen.me>
Cc: rms@gnu.org, 9800-done@debbugs.gnu.org
Subject: bug#9800: Incomplete truncated file buffers from the /proc filesystem
Date: Mon, 13 Feb 2023 12:47:33 -0800	[thread overview]
Message-ID: <f711f43f-6892-f849-21ec-26376c351f11@cs.ucla.edu> (raw)
In-Reply-To: <861qmvcglp.fsf@aarsen.me>

[-- Attachment #1: Type: text/plain, Size: 1425 bytes --]

On 2023-02-12 02:21, Arsen Arsenović wrote:
> ... or such.  This approach is robust and general, and I suspect it'd
> even work for named pipes.

Although indeed robust and general and it will work with named pipes in 
some cases, it still has a problem if the other side of the named pipe 
outputs data very slowly: Emacs will still seem to hang until you type C-g.

That being said, the approach is an improvement and it fixes the 
original bug report so I installed the attached and am boldly closing 
the bug report (we can reopen it if I'm wrong). The last patch in the 
attached series is the actual fix: the others are minor cleanups of this 
messy area, which I discovered while looking into the fix.

This patch does not address the abovementioned issue of named pipes, nor 
the issue of inserting very large files: the code should behave roughly 
the same as before in those two areas. These issues can be raised in 
separate bug reports if needed.

PS. I was surprised to see that Emacs master currently has several test 
case failures on GNU/Linux (specifically the latest Fedora and Ubuntu 
releases). I hope these are known and that people are working on them.

1 files did not finish:
   lisp/server-tests.log
4 files contained unexpected results:
   src/lread-tests.log
   lisp/international/mule-tests.log
   lisp/emacs-lisp/map-tests.log
   lisp/emacs-lisp/bytecomp-tests.log

[-- Attachment #2: 0001-Improve-insert-file-contents-checking.patch --]
[-- Type: text/x-patch, Size: 2041 bytes --]

From 5284af27ee5250c631ff4ee2f3d8682f0c5df8bc Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 12 Feb 2023 17:30:46 -0800
Subject: [PATCH 1/5] Improve insert-file-contents checking

* src/fileio.c (Finsert_file_contents): Check BEG, END,
REPLACE for validity before launching into opening files etc.
---
 src/fileio.c | 29 ++++++++++++-----------------
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/src/fileio.c b/src/fileio.c
index c672e0f7baf..64337abdaef 100644
--- a/src/fileio.c
+++ b/src/fileio.c
@@ -3908,7 +3908,6 @@ because (1) it preserves some marker positions (in unchanged portions
   int fd;
   ptrdiff_t inserted = 0;
   ptrdiff_t how_much;
-  off_t beg_offset, end_offset;
   int unprocessed;
   specpdl_ref count = SPECPDL_INDEX ();
   Lisp_Object handler, val, insval, orig_filename, old_undo;
@@ -3970,6 +3969,17 @@ because (1) it preserves some marker positions (in unchanged portions
       goto handled;
     }
 
+  if (!NILP (visit))
+    {
+      if (!NILP (beg) || !NILP (end))
+	error ("Attempt to visit less than an entire file");
+      if (BEG < Z && NILP (replace))
+	error ("Cannot do file visiting in a non-empty buffer");
+    }
+
+  off_t beg_offset = !NILP (beg) ? file_offset (beg) : 0;
+  off_t end_offset = !NILP (end) ? file_offset (end) : -1;
+
   orig_filename = filename;
   filename = ENCODE_FILE (filename);
 
@@ -4030,22 +4040,7 @@ because (1) it preserves some marker positions (in unchanged portions
 		  build_string ("not a regular file"), orig_filename);
     }
 
-  if (!NILP (visit))
-    {
-      if (!NILP (beg) || !NILP (end))
-	error ("Attempt to visit less than an entire file");
-      if (BEG < Z && NILP (replace))
-	error ("Cannot do file visiting in a non-empty buffer");
-    }
-
-  if (!NILP (beg))
-    beg_offset = file_offset (beg);
-  else
-    beg_offset = 0;
-
-  if (!NILP (end))
-    end_offset = file_offset (end);
-  else
+  if (end_offset < 0)
     {
       if (!regular)
 	end_offset = TYPE_MAXIMUM (off_t);
-- 
2.37.2


[-- Attachment #3: 0002-Improve-insert-file-contents-on-non-regular-files.patch --]
[-- Type: text/x-patch, Size: 1521 bytes --]

From b0842671e750be08356425e2fc38251e7b08d5d7 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 12 Feb 2023 17:52:46 -0800
Subject: [PATCH 2/5] Improve insert-file-contents on non-regular files
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/fileio.c (Finsert_file_contents):
If the file is not a regular file, check REPLACE and VISIT
before doing further syscalls that won’t matter in this case.
---
 src/fileio.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/fileio.c b/src/fileio.c
index 64337abdaef..751b8ec573c 100644
--- a/src/fileio.c
+++ b/src/fileio.c
@@ -4022,7 +4022,6 @@ because (1) it preserves some marker positions (in unchanged portions
   if (!S_ISREG (st.st_mode))
     {
       regular = false;
-      seekable = lseek (fd, 0, SEEK_CUR) < 0;
 
       if (! NILP (visit))
         {
@@ -4030,14 +4029,15 @@ because (1) it preserves some marker positions (in unchanged portions
 	  goto notfound;
         }
 
+      if (!NILP (replace))
+	xsignal2 (Qfile_error,
+		  build_string ("not a regular file"), orig_filename);
+
+      seekable = lseek (fd, 0, SEEK_CUR) < 0;
       if (!NILP (beg) && !seekable)
 	xsignal2 (Qfile_error,
 		  build_string ("cannot use a start position in a non-seekable file/device"),
 		  orig_filename);
-
-      if (!NILP (replace))
-	xsignal2 (Qfile_error,
-		  build_string ("not a regular file"), orig_filename);
     }
 
   if (end_offset < 0)
-- 
2.37.2


[-- Attachment #4: 0003-Don-t-scan-text-twice-to-guess-coding-system.patch --]
[-- Type: text/x-patch, Size: 1195 bytes --]

From ccc092115172f15c9135771f90d0000f8bf21614 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 13 Feb 2023 08:51:45 -0800
Subject: [PATCH 3/5] =?UTF-8?q?Don=E2=80=99t=20scan=20text=20twice=20to=20?=
 =?UTF-8?q?guess=20coding=20system?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/fileio.c (Finsert_file_contents): If the file shrank
below 4 KiB, don’t read duplicate text into READ_BUF.
This also removes a use of SEEK_END, which Linux /proc
file systems do not support (not that we should get here
with /proc).
---
 src/fileio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/fileio.c b/src/fileio.c
index 751b8ec573c..47177be0f4d 100644
--- a/src/fileio.c
+++ b/src/fileio.c
@@ -4119,7 +4119,7 @@ because (1) it preserves some marker positions (in unchanged portions
 		  if (nread == 1024)
 		    {
 		      int ntail;
-		      if (lseek (fd, - (1024 * 3), SEEK_END) < 0)
+		      if (lseek (fd, st.st_size - 1024 * 3, SEEK_CUR) < 0)
 			report_file_error ("Setting file position",
 					   orig_filename);
 		      ntail = emacs_read_quit (fd, read_buf + nread, 1024 * 3);
-- 
2.37.2


[-- Attachment #5: 0004-Fix-src-fileio.c-comment.patch --]
[-- Type: text/x-patch, Size: 1022 bytes --]

From bae5fa5d9a8ef8c41fbb3408eea441a2ee14d1db Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 13 Feb 2023 08:53:52 -0800
Subject: [PATCH 4/5] Fix src/fileio.c comment
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/fileio.c (Finsert_file_contents): Fix comment.
Since the code relies on st_size, it’s limited to
regular files, not to seekable files.
---
 src/fileio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/fileio.c b/src/fileio.c
index 47177be0f4d..ee30db8b49b 100644
--- a/src/fileio.c
+++ b/src/fileio.c
@@ -4101,7 +4101,7 @@ because (1) it preserves some marker positions (in unchanged portions
       else
 	{
 	  /* Don't try looking inside a file for a coding system
-	     specification if it is not seekable.  */
+	     specification if it is not a regular file.  */
 	  if (regular && !NILP (Vset_auto_coding_function))
 	    {
 	      /* Find a coding system specified in the heading two
-- 
2.37.2


[-- Attachment #6: 0005-Fix-insert-file-contents-on-proc-files.patch --]
[-- Type: text/x-patch, Size: 4914 bytes --]

From b950b46f514989442fdd9937a0e96d53a3affa88 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 13 Feb 2023 12:32:11 -0800
Subject: [PATCH 5/5] Fix insert-file-contents on /proc files

This should fix Bug#9800 (2011-10-19).
* src/fileio.c (Finsert_file_contents):
Do not trust st_size even on regular files, as the file might
be a Linux /proc file, or it might be growing.
Instead, always read to EOF when END is nil.
---
 src/fileio.c | 57 +++++++++++++++++++++++-----------------------------
 1 file changed, 25 insertions(+), 32 deletions(-)

diff --git a/src/fileio.c b/src/fileio.c
index ee30db8b49b..b80f8d61de4 100644
--- a/src/fileio.c
+++ b/src/fileio.c
@@ -3907,7 +3907,6 @@ because (1) it preserves some marker positions (in unchanged portions
   struct timespec mtime;
   int fd;
   ptrdiff_t inserted = 0;
-  ptrdiff_t how_much;
   int unprocessed;
   specpdl_ref count = SPECPDL_INDEX ();
   Lisp_Object handler, val, insval, orig_filename, old_undo;
@@ -3920,7 +3919,8 @@ because (1) it preserves some marker positions (in unchanged portions
   bool replace_handled = false;
   bool set_coding_system = false;
   Lisp_Object coding_system;
-  bool read_quit = false;
+  /* Negative if read error, 0 if OK so far, positive if quit.  */
+  ptrdiff_t read_quit = 0;
   /* If the undo log only contains the insertion, there's no point
      keeping it.  It's typically when we first fill a file-buffer.  */
   bool empty_undo_list_p
@@ -4404,7 +4404,7 @@ because (1) it preserves some marker positions (in unchanged portions
       ptrdiff_t bufpos;
       unsigned char *decoded;
       ptrdiff_t temp;
-      ptrdiff_t this = 0;
+      ptrdiff_t this;
       specpdl_ref this_count = SPECPDL_INDEX ();
       bool multibyte
 	= ! NILP (BVAR (current_buffer, enable_multibyte_characters));
@@ -4580,8 +4580,12 @@ because (1) it preserves some marker positions (in unchanged portions
     }
 
   move_gap_both (PT, PT_BYTE);
-  if (GAP_SIZE < total)
-    make_gap (total - GAP_SIZE);
+
+  /* Ensure the gap is at least one byte larger than needed for the
+     estimated file size, so that in the usual case we read to EOF
+     without reallocating.  */
+  if (GAP_SIZE <= total)
+    make_gap (total - GAP_SIZE + 1);
 
   if (beg_offset != 0 || !NILP (replace))
     {
@@ -4589,12 +4593,6 @@ because (1) it preserves some marker positions (in unchanged portions
 	report_file_error ("Setting file position", orig_filename);
     }
 
-  /* In the following loop, HOW_MUCH contains the total bytes read so
-     far for a regular file, and not changed for a special file.  But,
-     before exiting the loop, it is set to a negative value if I/O
-     error occurs.  */
-  how_much = 0;
-
   /* Total bytes inserted.  */
   inserted = 0;
 
@@ -4603,23 +4601,26 @@ because (1) it preserves some marker positions (in unchanged portions
   {
     ptrdiff_t gap_size = GAP_SIZE;
 
-    while (how_much < total)
+    while (NILP (end) || inserted < total)
       {
-	/* `try' is reserved in some compilers (Microsoft C).  */
-	ptrdiff_t trytry = min (total - how_much, READ_BUF_SIZE);
 	ptrdiff_t this;
 
+	if (gap_size == 0)
+	  {
+	    /* The size estimate was wrong.  Make the gap 50% larger.  */
+	    make_gap (GAP_SIZE >> 1);
+	    gap_size = GAP_SIZE - inserted;
+	  }
+
+	/* 'try' is reserved in some compilers (Microsoft C).  */
+	ptrdiff_t trytry = min (gap_size, READ_BUF_SIZE);
+	if (!NILP (end))
+	  trytry = min (trytry, total - inserted);
+
 	if (!seekable && NILP (end))
 	  {
 	    Lisp_Object nbytes;
 
-	    /* Maybe make more room.  */
-	    if (gap_size < trytry)
-	      {
-		make_gap (trytry - gap_size);
-		gap_size = GAP_SIZE - inserted;
-	      }
-
 	    /* Read from the file, capturing `quit'.  When an
 	       error occurs, end the loop, and arrange for a quit
 	       to be signaled after decoding the text we read.  */
@@ -4630,7 +4631,7 @@ because (1) it preserves some marker positions (in unchanged portions
 
 	    if (NILP (nbytes))
 	      {
-		read_quit = true;
+		read_quit = 1;
 		break;
 	      }
 
@@ -4649,19 +4650,11 @@ because (1) it preserves some marker positions (in unchanged portions
 
 	if (this <= 0)
 	  {
-	    how_much = this;
+	    read_quit = this;
 	    break;
 	  }
 
 	gap_size -= this;
-
-	/* For a regular file, where TOTAL is the real size,
-	   count HOW_MUCH to compare with it.
-	   For a special file, where TOTAL is just a buffer size,
-	   so don't bother counting in HOW_MUCH.
-	   (INSERTED is where we count the number of characters inserted.)  */
-	if (seekable || !NILP (end))
-	  how_much += this;
 	inserted += this;
       }
   }
@@ -4682,7 +4675,7 @@ because (1) it preserves some marker positions (in unchanged portions
   emacs_close (fd);
   clear_unwind_protect (fd_index);
 
-  if (how_much < 0)
+  if (read_quit < 0)
     report_file_error ("Read error", orig_filename);
 
  notfound:
-- 
2.37.2


  reply	other threads:[~2023-02-13 20:47 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-19 22:59 bug#9800: Incomplete truncated file buffers from the /proc filesystem Juri Linkov
2011-10-20  8:22 ` Eli Zaretskii
2011-10-20  8:44   ` Andreas Schwab
2023-02-12  7:38     ` Eli Zaretskii
2023-02-12  9:24       ` Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2011-10-24  2:53 ` Paul Eggert
2011-10-24 21:50   ` Richard Stallman
2011-10-24 22:02     ` Paul Eggert
2023-02-12 10:21       ` Arsen Arsenović via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-13 20:47         ` Paul Eggert [this message]
2011-11-03 20:32   ` Lars Magne Ingebrigtsen
2011-11-04  9:36     ` Juri Linkov
2011-11-04 10:54       ` Eli Zaretskii
2022-02-07  0:10 ` Lars Ingebrigtsen
2022-02-07 19:41   ` Juri Linkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f711f43f-6892-f849-21ec-26376c351f11@cs.ucla.edu \
    --to=eggert@cs.ucla.edu \
    --cc=9800-done@debbugs.gnu.org \
    --cc=arsen@aarsen.me \
    --cc=rms@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).