process.c: read_process_output: hard coded 4096 bytes read limit

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* process.c: read_process_output: hard coded 4096 bytes read limit
@ 2013-06-22 10:05 Miguel Guedes
  2013-06-22 17:27 ` Paul Eggert
  2013-06-23  9:28 ` Andreas Schwab
  0 siblings, 2 replies; 8+ messages in thread
From: Miguel Guedes @ 2013-06-22 10:05 UTC (permalink / raw)
  To: emacs-devel

Hello List,

In process.c, function read_process_output, all reads are limited to a 
maximum of 4096 bytes.  What is the rationale behind imposing a hardcoded 
limit of 4096 bytes per read from a process channel? (why isn't the 
user allowed to specify this read limit when creating a process/
connection?)

A minor mode I've written that sometimes receives 500-800KB of data as a 
response from an async network connection (opened via make-network-
process) takes several seconds to fully receive the data due to the hard 
coded limit mentioned above and the fact that emacs waits between reads  
(probably in wait_reading_process_output upon receiving a signal in 
sigchld_handler).

-- 
Miguel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: process.c: read_process_output: hard coded 4096 bytes read limit
  2013-06-22 10:05 process.c: read_process_output: hard coded 4096 bytes read limit Miguel Guedes
@ 2013-06-22 17:27 ` Paul Eggert
  2013-06-23  8:59   ` Miguel Guedes
  2013-06-23  9:28 ` Andreas Schwab
  1 sibling, 1 reply; 8+ messages in thread
From: Paul Eggert @ 2013-06-22 17:27 UTC (permalink / raw)
  To: Miguel Guedes; +Cc: emacs-devel

On 06/22/13 03:05, Miguel Guedes wrote:

> In process.c, function read_process_output, all reads are limited to a 
> maximum of 4096 bytes.  What is the rationale behind imposing a hardcoded 
> limit of 4096 bytes per read from a process channel? (why isn't the 
> user allowed to specify this read limit when creating a process/
> connection?)

I don't know, and that limit has bothered me in the past, too.

It's a simple matter to increase it to 16 K bytes; does the
patch below help your performance?  If so, I can install it.
Letting the user specify a larger limit
might make sense, but would take a bit of thought and hacking.

=== modified file 'src/coding.h'
--- src/coding.h	2013-05-22 14:53:21 +0000
+++ src/coding.h	2013-06-22 16:53:36 +0000
@@ -399,6 +399,7 @@
   int rejected;
 };
 
+enum { CARRYOVER_BYTES_MAX = 64 };
 
 struct coding_system
 {
@@ -491,7 +492,7 @@
   /* Set to 1 if charbuf contains an annotation.  */
   unsigned annotated : 1;
 
-  unsigned char carryover[64];
+  unsigned char carryover[CARRYOVER_BYTES_MAX];
   int carryover_bytes;
 
   int default_char;

=== modified file 'src/process.c'
--- src/process.c	2013-06-22 16:43:39 +0000
+++ src/process.c	2013-06-22 17:08:46 +0000
@@ -4949,7 +4949,7 @@
    starting with our buffered-ahead character if we have one.
    Yield number of decoded characters read.
 
-   This function reads at most 4096 characters.
+   This function reads at most MAX_ALLOCA bytes.
    If you want to read all available subprocess output,
    you must call it repeatedly until it returns zero.
 
@@ -4959,16 +4959,16 @@
 static int
 read_process_output (Lisp_Object proc, register int channel)
 {
-  register ssize_t nbytes;
-  char *chars;
-  register struct Lisp_Process *p = XPROCESS (proc);
+  ssize_t nbytes;
+  char chars[MAX_ALLOCA];
+  struct Lisp_Process *p = XPROCESS (proc);
   struct coding_system *coding = proc_decode_coding_system[channel];
   int carryover = p->decoding_carryover;
-  int readmax = 4096;
+  verify (CARRYOVER_BYTES_MAX < sizeof chars);
+  int readmax = sizeof chars - carryover;
   ptrdiff_t count = SPECPDL_INDEX ();
   Lisp_Object odeactivate;
 
-  chars = alloca (carryover + readmax);
   if (carryover)
     /* See the comment above.  */
     memcpy (chars, SDATA (p->decoding_buf), carryover);





^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: process.c: read_process_output: hard coded 4096 bytes read limit
  2013-06-22 17:27 ` Paul Eggert
@ 2013-06-23  8:59   ` Miguel Guedes
  0 siblings, 0 replies; 8+ messages in thread
From: Miguel Guedes @ 2013-06-23  8:59 UTC (permalink / raw)
  To: emacs-devel

On Sat, 22 Jun 2013 10:27:55 -0700, Paul Eggert wrote:

> I don't know, and that limit has bothered me in the past, too.
> 
> It's a simple matter to increase it to 16 K bytes; does the patch below
> help your performance?  If so, I can install it. Letting the user
> specify a larger limit might make sense, but would take a bit of thought
> and hacking.
> 

Thank you for your reply, Paul!

I am unable to apply and test the patch at the moment but I'm sure it 
will help with performance, though not entirely fix it.  I have this 
minor mode that I'm developing that receives code completion data 
asynchronously from a server bound to localhost.  The actual server side 
processing of code completion always takes less than 100ms and the 
results are sent to the client in full.  It is then ironic (as well as 
frustrating) that the bottleneck (latency lasting many seconds) I'm 
experiencing is being caused by emacs when just receiving the raw data 
before even any processing takes place in the client side (which is very 
fast anyway)!

There are a couple of ways I can use to mitigate the latency issues 
mentioned above so I already have a fix, however IMHO there is a problem 
in the way emacs reads data from process channels that needs to be fixed, 
which I wouldn't mind contributing to.  I assume you're an active emacs 
developer (my own knowledge of emacs' internal code and architecture is 
extremely limited), how would you tackle this, Paul?  

Apart from raising the hardcoded limit, one (perhaps simpler) way would 
be to allow the user to specify the buffer size when spawning a process/
network connection, however that still leaves open the problem of the 
waits between reads.  So perhaps the best strategy to fix this, though 
probably more complicated, would be to tackle the waits between reads 
head on?  How problematic with this be?

Apologies for not understanding but by "I can install it", did you mean 
you can commit it?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: process.c: read_process_output: hard coded 4096 bytes read limit
  2013-06-22 10:05 process.c: read_process_output: hard coded 4096 bytes read limit Miguel Guedes
  2013-06-22 17:27 ` Paul Eggert
@ 2013-06-23  9:28 ` Andreas Schwab
  2013-06-25  8:20   ` Miguel Guedes
  1 sibling, 1 reply; 8+ messages in thread
From: Andreas Schwab @ 2013-06-23  9:28 UTC (permalink / raw)
  To: Miguel Guedes; +Cc: emacs-devel

Miguel Guedes <miguel.a.guedes@gmail.com> writes:

> A minor mode I've written that sometimes receives 500-800KB of data as a 
> response from an async network connection (opened via make-network-
> process) takes several seconds to fully receive the data due to the hard 
> coded limit mentioned above and the fact that emacs waits between reads  
> (probably in wait_reading_process_output upon receiving a signal in 
> sigchld_handler).

If you call accept-process-output with this process it should read
whatever is available without delay.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: process.c: read_process_output: hard coded 4096 bytes read limit
  2013-06-23  9:28 ` Andreas Schwab
@ 2013-06-25  8:20   ` Miguel Guedes
  2013-06-26 11:57     ` Thien-Thi Nguyen
  0 siblings, 1 reply; 8+ messages in thread
From: Miguel Guedes @ 2013-06-25  8:20 UTC (permalink / raw)
  To: emacs-devel

On Sun, 23 Jun 2013 11:28:38 +0200, Andreas Schwab wrote:

> If you call accept-process-output with this process it should read
> whatever is available without delay.

Thanks, Andreas.

How would you use accept-process-output with an asynchronous process 
channel?  E.g. when would you call it? How do you know when there is data 
available?  The only way (that I see) of knowing when there is data 
available is when the filter is first called but can accept-process-
output be called from within the filter; wouldn't it lead to potentially 
massive recursiveness as the filter is called again and again?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: process.c: read_process_output: hard coded 4096 bytes read limit
  2013-06-25  8:20   ` Miguel Guedes
@ 2013-06-26 11:57     ` Thien-Thi Nguyen
  2013-06-26 13:16       ` Stefan Monnier
  0 siblings, 1 reply; 8+ messages in thread
From: Thien-Thi Nguyen @ 2013-06-26 11:57 UTC (permalink / raw)
  To: Miguel Guedes; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2548 bytes --]

() Miguel Guedes <miguel.a.guedes@gmail.com>
() Tue, 25 Jun 2013 08:20:40 +0000 (UTC)

   How would you use accept-process-output with an asynchronous process
   channel?  E.g. when would you call it?  How do you know when there is
   data available?

Presumably the protocol specifies a format for the received data that
includes some kind of "end of message" marker.  You know there is data
available when that marker is not present (given a well-formed message).
For example:

(defun gnugo-synchronous-send/return (message)
  "Return (TIME . STRING) where TIME is that returned by `current-time' and
STRING omits the two trailing newlines.  See also `gnugo-query'."
  (when (gnugo-get :waitingp)
    (error "Sorry, still waiting for %s to play" (gnugo-get :gnugo-color)))
  (gnugo-put :sync-return "")
  (let ((proc (gnugo-get :proc)))
    (set-process-filter
     proc (lambda (proc string)
            (let* ((so-far (gnugo-get :sync-return))
                   (start  (max 0 (- (length so-far) 2))) ; backtrack a little
                   (full   (gnugo-put :sync-return (concat so-far string))))
              (when (string-match "\n\n" full start)
                (gnugo-put :sync-return
                  (cons (current-time) (substring full 0 -2)))))))
    (gnugo-send-line message)
    (let (rv)
      ;; type change => break
      (while (stringp (setq rv (gnugo-get :sync-return)))
        (accept-process-output proc))
      (gnugo-put :sync-return "")
      rv)))

Here, the "end of message" marker is two consecutive newlines, detected
by ‘string-match’.  The (gratuitously dynamically prepared, but somewhat
convenient for copy-and-paste purposes :-D) filter func accumulates the
input and does the final (at EOM) transform, only.  The trigger for the
initial process output is from ‘gnugo-send-line’ after which it is
‘gnugo-synchronous-send/return’ that controls the looping, by monitoring
the accumulated state to detect the transform.

   The only way (that I see) of knowing when there is data available is
   when the filter is first called but can accept-process- output be
   called from within the filter; wouldn't it lead to potentially
   massive recursiveness as the filter is called again and again?

Yes.  That's why the filter func and its (patiently looping) controller
must adhere to some mini protocol defining access to shared state, and
what that state "means".  [Insert VMS mailbox mumblings, here...]

-- 
Thien-Thi Nguyen
GPG key: 4C807502

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: process.c: read_process_output: hard coded 4096 bytes read limit
  2013-06-26 11:57     ` Thien-Thi Nguyen
@ 2013-06-26 13:16       ` Stefan Monnier
  2013-06-27  6:03         ` Thien-Thi Nguyen
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Monnier @ 2013-06-26 13:16 UTC (permalink / raw)
  To: Thien-Thi Nguyen; +Cc: Miguel Guedes, emacs-devel

>    How would you use accept-process-output with an asynchronous process
>    channel?  E.g. when would you call it?  How do you know when there is
>    data available?

> Presumably the protocol specifies a format for the received data that
> includes some kind of "end of message" marker.  You know there is data
> available when that marker is not present (given a well-formed message).
> For example:

But accept-process-output is synchronous: while running it, nothing else
can happen.  It's important to make sure that the performance is good
enough with accept-process-output but it should *also* be good enough
without accept-process-output.


        Stefan



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: process.c: read_process_output: hard coded 4096 bytes read limit
  2013-06-26 13:16       ` Stefan Monnier
@ 2013-06-27  6:03         ` Thien-Thi Nguyen
  0 siblings, 0 replies; 8+ messages in thread
From: Thien-Thi Nguyen @ 2013-06-27  6:03 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 610 bytes --]

() Stefan Monnier <monnier@iro.umontreal.ca>
() Wed, 26 Jun 2013 09:16:13 -0400

   But accept-process-output is synchronous: while running it, nothing
   else can happen.  It's important to make sure that the performance is
   good enough with accept-process-output but it should *also* be good
   enough without accept-process-output.

I cannot agree or disagree w/o knowing the particulars.  In the case of
gnugo.el, message size never exceeds a few kilobytes, and performance is
not critical.  Anyway, i'm interested in hearing what choices OP makes.

-- 
Thien-Thi Nguyen
GPG key: 4C807502

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2013-06-27  6:03 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-22 10:05 process.c: read_process_output: hard coded 4096 bytes read limit Miguel Guedes
2013-06-22 17:27 ` Paul Eggert
2013-06-23  8:59   ` Miguel Guedes
2013-06-23  9:28 ` Andreas Schwab
2013-06-25  8:20   ` Miguel Guedes
2013-06-26 11:57     ` Thien-Thi Nguyen
2013-06-26 13:16       ` Stefan Monnier
2013-06-27  6:03         ` Thien-Thi Nguyen

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).