On 21/09/2023 17:37, Dmitry Gutov wrote:
> We could look into improving that part specifically: for example, 
> reading from the process multiple times into 'chars' right away while 
> there is still pending output present (either looping inside 
> read_process_output, or calling it in a loop in 
> wait_reading_process_output, at least until the process' buffered output 
> is exhausted). That could reduce reactivity, however (can we find out 
> how much is already buffered in advance, and only loop until we exhaust 
> that length?)

Hmm, the naive patch below offers some improvement for the value 4096, 
but still not comparable to raising the buffer size: 0.76 -> 0.72.

diff --git a/src/process.c b/src/process.c
index 2376d0f288d..a550e223f78 100644
--- a/src/process.c
+++ b/src/process.c
@@ -5893,7 +5893,7 @@ wait_reading_process_output (intmax_t time_limit, 
int nsecs, int read_kbd,
  	      && ((fd_callback_info[channel].flags & (KEYBOARD_FD | PROCESS_FD))
  		  == PROCESS_FD))
  	    {
-	      int nread;
+	      int nread = 0, nnread;

  	      /* If waiting for this channel, arrange to return as
  		 soon as no more input to be processed.  No more
@@ -5912,7 +5912,13 @@ wait_reading_process_output (intmax_t time_limit, 
int nsecs, int read_kbd,
  	      /* Read data from the process, starting with our
  		 buffered-ahead character if we have one.  */

-	      nread = read_process_output (proc, channel);
+	      do
+		{
+		  nnread = read_process_output (proc, channel);
+		  nread += nnread;
+		}
+	      while (nnread >= 4096);
+
  	      if ((!wait_proc || wait_proc == XPROCESS (proc))
  		  && got_some_output < nread)
  		got_some_output = nread;


And "unlocking" the pipe size on the external process takes the 
performance further up a notch (by default it's much larger): 0.72 -> 0.65.

diff --git a/src/process.c b/src/process.c
index 2376d0f288d..85fc1b4d0c8 100644
--- a/src/process.c
+++ b/src/process.c
@@ -2206,10 +2206,10 @@ create_process (Lisp_Object process, char 
**new_argv, Lisp_Object current_dir)
        inchannel = p->open_fd[READ_FROM_SUBPROCESS];
        forkout = p->open_fd[SUBPROCESS_STDOUT];

-#if (defined (GNU_LINUX) || defined __ANDROID__)	\
-  && defined (F_SETPIPE_SZ)
-      fcntl (inchannel, F_SETPIPE_SZ, read_process_output_max);
-#endif /* (GNU_LINUX || __ANDROID__) && F_SETPIPE_SZ */
+/* #if (defined (GNU_LINUX) || defined __ANDROID__)	\ */
+/*   && defined (F_SETPIPE_SZ) */
+/*       fcntl (inchannel, F_SETPIPE_SZ, read_process_output_max); */
+/* #endif /\* (GNU_LINUX || __ANDROID__) && F_SETPIPE_SZ *\/ */
      }

    if (!NILP (p->stderrproc))

Apparently the patch from bug#55737 also made things a little worse by 
default, by limiting concurrency (the external process has to wait while 
the pipe is blocked, and by default Linux's pipe is larger). Just 
commenting it out makes performance a little better as well, though not 
as much as the two patches together.

Note that both changes above are just PoC (e.g. the hardcoded 4096, and 
probably other details like carryover).

I've tried to make a more nuanced loop inside read_process_output 
instead (as replacement for the first patch above), and so far it 
performs worse that the baseline. If anyone can see when I'm doing wrong 
(see attachment), comments are very welcome.