unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Simple optimization for read_avail_input()
@ 2004-01-30 15:57 Dmitry Antipov
  2004-02-03 14:17 ` Kim F. Storm
  2004-02-17 23:51 ` Kim F. Storm
  0 siblings, 2 replies; 5+ messages in thread
From: Dmitry Antipov @ 2004-01-30 15:57 UTC (permalink / raw)


Hello,

this is a top of gprof output for Emacs CVS snapshot. It was being compiled with
'-O0 -ftest-coverage -g -pg -fprofile-arcs', started and finished with C-x C-c
immediately:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 12.12      0.04     0.04     2464     0.02     0.02  ccl_driver
 12.12      0.08     0.04      546     0.07     0.07  read_avail_input
  9.09      0.11     0.03    23731     0.00     0.00  read1
  9.09      0.14     0.03     4452     0.01     0.01  mark_object
  6.06      0.16     0.02   289315     0.00     0.00  readchar
  6.06      0.18     0.02     8335     0.00     0.00  Fbyte_code
  6.06      0.20     0.02      743     0.03     0.03  Fassoc
  3.03      0.21     0.01   136877     0.00     0.00  translate_char

It's clear here that very simple function read_avail_input() wastes a lot of
CPU time. IMHO this is because it wants to zero large 'struct input_event buf'
(which is KBD_BUFFER_SIZE (4096, except old MacOSs) * sizeof (struct input_event)
(44 bytes on 32-bit systems)) every time. But we can clear all 'buf' only once
and clear only used slots next time. The following patch illustrates this idea:

--- keyboard.c.~1.761.~ 2004-01-21 23:19:41.000000000 +0300
+++ keyboard.c  2004-01-30 18:37:04.000000000 +0300
@@ -6568,6 +6568,8 @@
    Returns the number of keyboard chars read, or -1 meaning
    this is a bad time to try to read input.  */
 
+static int prev_read = KBD_BUFFER_SIZE;
+
 static int
 read_avail_input (expected)
      int expected;
@@ -6576,7 +6578,7 @@
   register int i;
   int nread;
 
-  for (i = 0; i < KBD_BUFFER_SIZE; i++)
+  for (i = 0; i < prev_read; i++)
     EVENT_INIT (buf[i]);
 
   if (read_socket_hook)
@@ -6592,12 +6594,12 @@
 
       /* Determine how many characters we should *try* to read.  */
 #ifdef WINDOWSNT
-      return 0;
+      return (prev_read = 0);
 #else /* not WINDOWSNT */
 #ifdef MSDOS
       n_to_read = dos_keysns ();
       if (n_to_read == 0)
-       return 0;
+       return (prev_read = 0);
 #else /* not MSDOS */
 #ifdef FIONREAD
       /* Find out how much input is available.  */
@@ -6615,7 +6617,7 @@
            n_to_read = 0;
        }
       if (n_to_read == 0)
-       return 0;
+       return (prev_read = 0);
       if (n_to_read > sizeof cbuf)
        n_to_read = sizeof cbuf;
 #else /* no FIONREAD */
@@ -6706,7 +6708,7 @@
        break;
     }
 
-  return nread;
+  return (prev_read = nread);
 }
 #endif /* not VMS */
 
Here is an example of gprof output with this patch applied (other conditions
are exactly the same):

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
time   seconds   seconds    calls  ms/call  ms/call  name    
 18.42      0.07     0.07     4453     0.02     0.02  mark_object
 10.53      0.11     0.04    36279     0.00     0.00  specbind
  7.89      0.14     0.03     2465     0.01     0.01  ccl_driver
  5.26      0.16     0.02     8358     0.00     0.01  Fbyte_code
  5.26      0.18     0.02     7197     0.00     0.00  re_search_2
  2.63      0.19     0.01   289315     0.00     0.00  readchar
  2.63      0.20     0.01   156900     0.00     0.00  Faref
  ...
  0.00      0.38     0.00      563     0.00     0.00  call1
  0.00      0.38     0.00      548     0.00     0.00  XTread_socket
  0.00      0.38     0.00      548     0.00     0.00  read_avail_input
  0.00      0.38     0.00      544     0.00     0.00  handle_async_input

So, after several runs, read_avail_input() goes from 1st or 2nd to > 200th
(238 here) place (by CPU usage).

What do you think about this idea ?

Dmitry

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Simple optimization for read_avail_input()
  2004-01-30 15:57 Simple optimization for read_avail_input() Dmitry Antipov
@ 2004-02-03 14:17 ` Kim F. Storm
  2004-02-17 23:51 ` Kim F. Storm
  1 sibling, 0 replies; 5+ messages in thread
From: Kim F. Storm @ 2004-02-03 14:17 UTC (permalink / raw)
  Cc: emacs-devel

Dmitry Antipov <dmitry.antipov@mail.ru> writes:

> Hello,
> 
> It's clear here that very simple function read_avail_input() wastes a lot of
> CPU time. IMHO this is because it wants to zero large 'struct input_event buf'
> (which is KBD_BUFFER_SIZE (4096, except old MacOSs) * sizeof (struct input_event)
> (44 bytes on 32-bit systems)) every time. But we can clear all 'buf' only once
> and clear only used slots next time. The following patch illustrates this idea:

Nice idea, but in its current form it fails because the buf array is allocated on 
the stack.  It may work to just declare it static though.

> 
> What do you think about this idea ?

I think it's ok, if we can guarantee that read_avail_input is never called
recursively  (I haven't checked).

But I really wonder why it is necessary to initialize the array at all.

It would seem more sensible simply to require that the functions which
stuff things into an input_event do the EVENT_INIT themselves -- if
necessary.  Of course, it is more work to find these places, but it
seems more correct to do it that way.

BTW, it definitely is unnecessary to init the whole array if
read_socket_hook is NULL; we could just init the first nread
elements in that case.

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Simple optimization for read_avail_input()
       [not found] <E1Ap8aH-000LqG-00.dmitry-antipov-mail-ru@f7.mail.ru>
@ 2004-02-07  0:12 ` Kim F. Storm
  0 siblings, 0 replies; 5+ messages in thread
From: Kim F. Storm @ 2004-02-07  0:12 UTC (permalink / raw)
  Cc: emacs-devel

"Dmitry Antipov" <dmitry.antipov@mail.ru> writes:

> Kim F. Storm wrote:
> 
> > Nice idea, but in its current form it fails because the buf array is allocated on
> > the stack. It may work to just declare it static though.
> 
> Bups. I agree. But, in any case, we should avoid to zero 40k each time when read_avail_input() is called.

I agree, this is wasteful.

> It's not quite clear for me why this array should be so large (4k), btw.
> 
> > I think it's ok, if we can guarantee that read_avail_input is never called
> > recursively (I haven't checked).
> 
> It happens at least 1 time sometimes after creating X frame. Is it the same as expected ?

I don't know.

> 
> > But I really wonder why it is necessary to initialize the array at all.
> 
>   IMHO it's necessary. For example, if read_socket_hook is NULL, buf[i].x and buf[i].y
> are untouched. But they are Lisp_Objects, and if Fgarbage_collect() happens immediately
> after kbd_buffer_store_event() (from read_avail_input()), we will got invalid Lisp_Objects
> for mark_object() from mark_kboards (). I can reproduce this situation sometimes  (when
> my fingers are blazingly fast :-)).
>   The same for X events - not all of them touches x and y.

I wasn't advocating for never initializing input_events -- I would
just postpone the initialization until we actually use one of those
input_events, i.e. also before we do kbd_buffer_store_event...

> 
> > BTW, it definitely is unnecessary to init the whole array if
> > read_socket_hook is NULL; we could just init the first nread
> > elements in that case.
> 
> Agree. For XTread_socket and others, we probably should do EVENT_INIT when really needed.

It's the same principle...

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Simple optimization for read_avail_input()
  2004-01-30 15:57 Simple optimization for read_avail_input() Dmitry Antipov
  2004-02-03 14:17 ` Kim F. Storm
@ 2004-02-17 23:51 ` Kim F. Storm
  2004-02-20  0:31   ` Kim F. Storm
  1 sibling, 1 reply; 5+ messages in thread
From: Kim F. Storm @ 2004-02-17 23:51 UTC (permalink / raw)
  Cc: emacs-devel

Dmitry Antipov <dmitry.antipov@mail.ru> writes:

> Hello,
> 
> this is a top of gprof output for Emacs CVS snapshot. It was being compiled with
> '-O0 -ftest-coverage -g -pg -fprofile-arcs', started and finished with C-x C-c
> immediately:
> 
> Flat profile:
> 
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total           time
>       seconds   seconds    calls  ms/call  ms/call  name    12.12
>       0.04     0.04     2464     0.02     0.02  ccl_driver
>  12.12      0.08     0.04      546     0.07     0.07  read_avail_input
>   9.09      0.11     0.03    23731     0.00     0.00  read1
>   9.09      0.14     0.03     4452     0.01     0.01  mark_object
>   6.06      0.16     0.02   289315     0.00     0.00  readchar
>   6.06      0.18     0.02     8335     0.00     0.00  Fbyte_code
>   6.06      0.20     0.02      743     0.03     0.03  Fassoc
>   3.03      0.21     0.01   136877     0.00     0.00  translate_char
> 
> It's clear here that very simple function read_avail_input() wastes a lot of
> CPU time. IMHO this is because it wants to zero large 'struct input_event buf'
> (which is KBD_BUFFER_SIZE (4096, except old MacOSs) * sizeof (struct input_event)
> (44 bytes on 32-bit systems)) every time. But we can clear all 'buf' only once
> and clear only used slots next time. The following patch illustrates this idea:

Hi Dmitry,

Your patch was installed in CVS yesterday, but it had some problems,
so I reworked it quite a bit.

I don't know if the intended effect on read_avail_input is still
present with my changes, so I would ask you to redo your measurements
to see if my changes are ok.  Can you do that?

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Simple optimization for read_avail_input()
  2004-02-17 23:51 ` Kim F. Storm
@ 2004-02-20  0:31   ` Kim F. Storm
  0 siblings, 0 replies; 5+ messages in thread
From: Kim F. Storm @ 2004-02-20  0:31 UTC (permalink / raw)
  Cc: emacs-devel

storm@cua.dk (Kim F. Storm) writes:

> I don't know if the intended effect on read_avail_input is still
> present with my changes, so I would ask you to redo your measurements
> to see if my changes are ok.  Can you do that?

Dmitry kindly measured the effect of my changes, and they were not
quite satisfactory, so I have now committed a simpler and more
efficient version of read_avail_input, which uses a much smaller
input_event buffer (8 instead of 4096 elements) when read_socket_hook
is used, and just a single input_event otherwise.

My own subjective feeling is that emacs is now more responsive, especially
when resizing the frame or dragging a mode line with the mouse.

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2004-02-20  0:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-01-30 15:57 Simple optimization for read_avail_input() Dmitry Antipov
2004-02-03 14:17 ` Kim F. Storm
2004-02-17 23:51 ` Kim F. Storm
2004-02-20  0:31   ` Kim F. Storm
     [not found] <E1Ap8aH-000LqG-00.dmitry-antipov-mail-ru@f7.mail.ru>
2004-02-07  0:12 ` Kim F. Storm

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).