unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* profiling emacs-23.1 vs emacs-22.3
@ 2009-08-03 20:29 Dan Nicolaescu
  2009-08-04 17:10 ` Leo
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Dan Nicolaescu @ 2009-08-03 20:29 UTC (permalink / raw)
  To: emacs-devel


I run a small test.
Take src/config.h run C-x r t on it to insert a space in front of each
line.
Then build both emacs-23.1 and 22.3 with profiling enabled (-O2 -g -pg).
Then run:
emacs -Q -nw config.h
M-: (indent-region (point-min)(point-max)) RET
C-x C-c

The 22.3 gprof results:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 41.29      7.13     7.13  4653244     0.00     0.00  forw_comment
 16.10      9.91     2.78    32473     0.09     0.31  scan_sexps_forward
  9.09     11.48     1.57    15132     0.10     0.15  re_search_2
  5.62     12.45     0.97    17286     0.06     0.15  scan_lists
  5.59     13.41     0.96 33462255     0.00     0.00  lookup_char_property
  3.36     13.99     0.58 18478958     0.00     0.00  next_interval
  2.14     14.37     0.37   129733     0.00     0.00  mark_object
  1.80     14.68     0.31   137020     0.00     0.00  skip_chars
  1.74     14.97     0.30 14459993     0.00     0.00  previous_interval
  1.62     15.26     0.28 57446793     0.00     0.00  Fcdr
  1.36     15.49     0.23  2156082     0.00     0.00  update_syntax_table
  1.27     15.71     0.22  3315615     0.00     0.00  re_match_2_internal
  1.19     15.91     0.20 23687203     0.00     0.00  Fassq
  0.87     16.07     0.15    13478     0.01     0.06  Fprevious_single_property_change
  0.75     16.20     0.13 33462255     0.00     0.00  textget
  0.75     16.32     0.13   228439     0.00     0.00  Fbyte_code
  0.52     16.41     0.09  1978403     0.00     0.00  update_interval
  0.46     16.50     0.08  1101056     0.00     0.00  find_interval
  0.29     16.55     0.05  3613429     0.00     0.00  buf_bytepos_to_charpos
  0.26     16.59     0.04 10137050     0.00     0.00  Fcar
  0.26     16.64     0.04  1154883     0.00     0.00  balance_an_interval
  0.23     16.68     0.04   579275     0.00     0.00  unbind_to



The 23.1 gprof results:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 29.60      7.03     7.03 406111658     0.00     0.00  char_table_ref
 26.72     13.37     6.34  4653244     0.00     0.00  forw_comment
 11.80     16.17     2.80    32121     0.00     0.00  scan_sexps_forward
  5.98     17.59     1.42    23271     0.00     0.00  re_search_2
  5.27     18.84     1.25    17289     0.00     0.00  scan_lists
  4.13     19.82     0.98 33439981     0.00     0.00  lookup_char_property
  1.94     20.28     0.46 18488739     0.00     0.00  next_interval
  1.73     20.69     0.41 10038210     0.00     0.00  mark_object
  1.26     20.99     0.30  2156799     0.00     0.00  update_syntax_table
  1.20     21.27     0.29                             sub_char_table_ref
  1.10     21.53     0.26 14455134     0.00     0.00  previous_interval
  0.88     21.74     0.21 57391428     0.00     0.00  Fcdr
  0.88     21.95     0.21  3318263     0.00     0.00  re_match_2_internal
  0.72     22.12     0.17 23671456     0.00     0.00  Fassq
  0.67     22.28     0.16   196373     0.00     0.00  Fbyte_code
  0.59     22.42     0.14    13478     0.00     0.00  Fprevious_single_property_change
  0.51     22.54     0.12       38     0.00     0.02  Fgarbage_collect
  0.38     22.63     0.09 33439981     0.00     0.00  textget
  0.38     22.72     0.09 10115583     0.00     0.00  Fcar
[snip]

-----------------------------------------------
                0.05    0.22     612/32121       back_comment [28]
                2.75   11.33   31509/32121       Fparse_partial_sexp [6]
[5]     60.5    2.80   11.55   32121         scan_sexps_forward [5]
                5.17    3.76 3796695/4653244     forw_comment [7]
                1.62    0.00 93379139/406111658     char_table_ref [8]
                0.15    0.86 1092890/2156799     update_syntax_table [11]
                0.00    0.00    5096/10115583     Fcar [46]
                0.00    0.00    6552/57391428     Fcdr [33]
                0.00    0.00     147/952550      Fcons [105]

It can be seen that 23.1 is quite a bit slower, and that it has a lot of
extra calls to char_table_ref.  
Are those calls necessary?




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
  2009-08-03 20:29 profiling emacs-23.1 vs emacs-22.3 Dan Nicolaescu
@ 2009-08-04 17:10 ` Leo
  2009-08-04 19:50 ` Chong Yidong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Leo @ 2009-08-04 17:10 UTC (permalink / raw)
  To: emacs-devel

On 2009-08-03 21:29 +0100, Dan Nicolaescu wrote:
> It can be seen that 23.1 is quite a bit slower, and that it has a lot
> of extra calls to char_table_ref. Are those calls necessary?

Thank you for the quantitative results. From experience, there is
slowdown in Emacs 23.

-- 
Leo's Emacs uptime: 17 hours, 6 minutes, 51 seconds





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
  2009-08-03 20:29 profiling emacs-23.1 vs emacs-22.3 Dan Nicolaescu
  2009-08-04 17:10 ` Leo
@ 2009-08-04 19:50 ` Chong Yidong
  2009-08-04 19:56   ` Dan Nicolaescu
  2009-08-05  7:47 ` Dan Nicolaescu
  2009-08-24  6:52 ` Kenichi Handa
  3 siblings, 1 reply; 17+ messages in thread
From: Chong Yidong @ 2009-08-04 19:50 UTC (permalink / raw)
  To: Dan Nicolaescu; +Cc: emacs-devel

Dan Nicolaescu <dann@ics.uci.edu> writes:

> I run a small test.
> Take src/config.h run C-x r t on it to insert a space in front of each
> line.
> Then build both emacs-23.1 and 22.3 with profiling enabled (-O2 -g -pg).
> Then run:
> emacs -Q -nw config.h
> M-: (indent-region (point-min)(point-max)) RET
> C-x C-c
>
> It can be seen that 23.1 is quite a bit slower, and that it has a lot of
> extra calls to char_table_ref.

Is there any difference in CVS trunk?




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
  2009-08-04 19:50 ` Chong Yidong
@ 2009-08-04 19:56   ` Dan Nicolaescu
  0 siblings, 0 replies; 17+ messages in thread
From: Dan Nicolaescu @ 2009-08-04 19:56 UTC (permalink / raw)
  To: Chong Yidong; +Cc: emacs-devel

Chong Yidong <cyd@stupidchicken.com> writes:

  > Dan Nicolaescu <dann@ics.uci.edu> writes:
  > 
  > > I run a small test.
  > > Take src/config.h run C-x r t on it to insert a space in front of each
  > > line.
  > > Then build both emacs-23.1 and 22.3 with profiling enabled (-O2 -g -pg).
  > > Then run:
  > > emacs -Q -nw config.h
  > > M-: (indent-region (point-min)(point-max)) RET
  > > C-x C-c
  > >
  > > It can be seen that 23.1 is quite a bit slower, and that it has a lot of
  > > extra calls to char_table_ref.
  > 
  > Is there any difference in CVS trunk?

No




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
  2009-08-03 20:29 profiling emacs-23.1 vs emacs-22.3 Dan Nicolaescu
  2009-08-04 17:10 ` Leo
  2009-08-04 19:50 ` Chong Yidong
@ 2009-08-05  7:47 ` Dan Nicolaescu
  2009-08-24  6:52 ` Kenichi Handa
  3 siblings, 0 replies; 17+ messages in thread
From: Dan Nicolaescu @ 2009-08-05  7:47 UTC (permalink / raw)
  To: emacs-devel


Here's another data point.  This time from running a process (a
recursive grep for setq in the lisp subdirectory):

emacs --batch --eval '(progn (call-process "grep" nil (get-buffer-create "tmp") t "--color=never" "-r" "setq" "/tmp/Emacs-CVS/emacs/lisp/") (set-buffer "tmp") (write-file "/tmp/23"))'

emacs-22.3:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 20.59      0.07     0.07      111     0.63     0.63  decode_eol
 14.71      0.12     0.05     5018     0.01     0.01  re_search_2
 14.71      0.17     0.05      138     0.36     1.12  decode_coding
 11.76      0.21     0.04        1    40.00    40.00  find_safe_codings
  8.82      0.24     0.03      224     0.13     0.13  detect_coding_mask
  5.88      0.26     0.02        1    20.00    20.00  unencodable_char_position
  2.94      0.27     0.01  1401155     0.00     0.00  Faref
  2.94      0.28     0.01  1396772     0.00     0.00  translate_char
  2.94      0.29     0.01     6955     0.00     0.00  Fstring_equal
  2.94      0.30     0.01     3180     0.00     0.00  allocate_vector
  2.94      0.31     0.01     2062     0.00     0.00  mark_object
  2.94      0.32     0.01      509     0.02     0.02  analyze_first
  2.94      0.33     0.01      392     0.03     0.03  buf_charpos_to_bytepos
  2.94      0.34     0.01        5     2.00     2.51  Faccessible_keymaps
  0.00      0.34     0.00    84102     0.00     0.00  readchar
  0.00      0.34     0.00    22523     0.00     0.00  re_match_2_internal
[snip]



CVS HEAD:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 18.18      0.06     0.06  2097964     0.00     0.00  mark_object
 15.15      0.11     0.05      966     0.05     0.05  consume_chars
  9.09      0.14     0.03      388     0.08     0.08  encode_coding_raw_text
  9.09      0.17     0.03      219     0.14     0.15  produce_chars
  6.06      0.19     0.02      140     0.14     0.21  detect_coding
  6.06      0.21     0.02       98     0.20     0.20  decode_coding_iso_2022
  6.06      0.23     0.02        1    20.00    20.00  Funencodable_char_position
  3.03      0.24     0.01   377383     0.00     0.00  read_hex
  3.03      0.25     0.01   219810     0.00     0.00  char_table_ref
  3.03      0.26     0.01     8391     0.00     0.00  mem_insert
  3.03      0.27     0.01     7036     0.00     0.00  Fcar
  3.03      0.28     0.01     1472     0.01     0.01  emacs_read
  3.03      0.29     0.01      467     0.02     0.02  Ftext_properties_at
  3.03      0.30     0.01      201     0.05     0.05  gap_left
  3.03      0.31     0.01       64     0.16     0.43  load_charset_map
  3.03      0.32     0.01        4     2.50     2.50  Fkill_buffer
  3.03      0.33     0.01        4     2.50     2.50  detect_coding_iso_2022
  0.00      0.33     0.00   221420     0.00     0.00  char_table_set
  0.00      0.33     0.00   106914     0.00     0.00  readchar
  0.00      0.33     0.00    82427     0.00     0.00  Faref
[snip]

so we have a lot more calls to mark_object.  They do not matter much
from the performance point of view in this particular case, but it
should be interesting to find out why we get them now.




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
  2009-08-03 20:29 profiling emacs-23.1 vs emacs-22.3 Dan Nicolaescu
                   ` (2 preceding siblings ...)
  2009-08-05  7:47 ` Dan Nicolaescu
@ 2009-08-24  6:52 ` Kenichi Handa
       [not found]   ` <200908240807.n7O87ubg024643@godzilla.ics.uci.edu>
  3 siblings, 1 reply; 17+ messages in thread
From: Kenichi Handa @ 2009-08-24  6:52 UTC (permalink / raw)
  To: Dan Nicolaescu; +Cc: emacs-devel

In article <200908032029.n73KTi9h017528@godzilla.ics.uci.edu>, Dan Nicolaescu <dann@ics.uci.edu> writes:

> It can be seen that 23.1 is quite a bit slower, and that it has a lot of
> extra calls to char_table_ref.  
> Are those calls necessary?

I found that the syntax of C is mostly defined in the parent of
CURRENT_SYNTAX_TABLE, and thus, in the call of SYNTAX_ENTRY (C),
the optimization for ASCII in this code (in lisp.h) doesn't work.

/* Almost equivalent to Faref (CT, IDX) with optimization for ASCII
   characters.  Do not check validity of CT.  */
#define CHAR_TABLE_REF(CT, IDX)						 \
  ((ASCII_CHAR_P (IDX)							 \
    && SUB_CHAR_TABLE_P (XCHAR_TABLE (CT)->ascii)			 \
    && !NILP (XSUB_CHAR_TABLE (XCHAR_TABLE (CT)->ascii)->contents[IDX])) \
   ? XSUB_CHAR_TABLE (XCHAR_TABLE (CT)->ascii)->contents[IDX]		 \
   : char_table_ref ((CT), (IDX)))

Could you try the attached patch?  If it improves the
performance, I'll commit it.

---
Kenichi Handa
handa@m17n.org

--- lisp.h.~1.661.~	2009-08-21 15:03:39.000000000 +0900
+++ lisp.h	2009-08-24 15:47:02.000000000 +0900
@@ -793,13 +793,37 @@
 #define CHAR_TABLE_EXTRA_SLOTS(CT)	\
   (((CT)->size & PSEUDOVECTOR_SIZE_MASK) - CHAR_TABLE_STANDARD_SLOTS)
 
+#ifdef __GNUC__
+
+#define CHAR_TABLE_REF_ASCII(CT, IDX)					\
+  ({struct Lisp_Char_Table *_tbl = NULL;				\
+    Lisp_Object _val;							\
+    do {								\
+      _tbl = _tbl ? XCHAR_TABLE (_tbl->parent) : XCHAR_TABLE (CT);	\
+      _val = (! SUB_CHAR_TABLE_P (_tbl->ascii) ? _tbl->ascii		\
+	      : XSUB_CHAR_TABLE (_tbl->ascii)->contents[IDX]);		\
+      if (NILP (_val))							\
+	_val = _tbl->defalt;						\
+    } while (NILP (_val) && ! NILP (_tbl->parent));			\
+    _val; })
+      
+#else  /* not __GNUC__ */
+
+#define CHAR_TABLE_REF_ASCII(CT, IDX)					  \
+  (! NILP (XCHAR_TABLE (CT)->ascii)					  \
+   ? (! SUB_CHAR_TABLE_P (XCHAR_TABLE (CT)->ascii)			  \
+      ? XCHAR_TABLE (CT)->ascii						  \
+      : ! NILP (XSUB_CHAR_TABLE (XCHAR_TABLE (CT)->ascii)->contents[IDX]) \
+      ? XSUB_CHAR_TABLE (XCHAR_TABLE (CT)->ascii)->contents[IDX]	  \
+      : char_table_ref ((CT), (IDX)))					  \
+   :  char_table_ref ((CT), (IDX)))
+
+#endif	/* not __GNUC__ */
+
 /* Almost equivalent to Faref (CT, IDX) with optimization for ASCII
    characters.  Do not check validity of CT.  */
-#define CHAR_TABLE_REF(CT, IDX)						 \
-  ((ASCII_CHAR_P (IDX)							 \
-    && SUB_CHAR_TABLE_P (XCHAR_TABLE (CT)->ascii)			 \
-    && !NILP (XSUB_CHAR_TABLE (XCHAR_TABLE (CT)->ascii)->contents[IDX])) \
-   ? XSUB_CHAR_TABLE (XCHAR_TABLE (CT)->ascii)->contents[IDX]		 \
+#define CHAR_TABLE_REF(CT, IDX)					\
+  (ASCII_CHAR_P (IDX) ? CHAR_TABLE_REF_ASCII ((CT), (IDX))	\
    : char_table_ref ((CT), (IDX)))
 
 /* Almost equivalent to Faref (CT, IDX).  However, if the result is




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
       [not found]   ` <200908240807.n7O87ubg024643@godzilla.ics.uci.edu>
@ 2009-08-24 11:39     ` Kenichi Handa
  2009-08-24 18:26       ` Dan Nicolaescu
  2009-08-24 22:18       ` Alan Mackenzie
  0 siblings, 2 replies; 17+ messages in thread
From: Kenichi Handa @ 2009-08-24 11:39 UTC (permalink / raw)
  To: Dan Nicolaescu; +Cc: emacs-devel

In article <200908240807.n7O87ubg024643@godzilla.ics.uci.edu>, Dan Nicolaescu <dann@ics.uci.edu> writes:

> Could you try the attached patch?  If it improves the

> It does improve performance:

How much in real time?

[...]
> It's still slower than 22.3 though.

> One big difference is then time/number of calls to mark_objects 
> 129733 vs 18834514, so 145 times more calls to mark_object.
> Do you know where do those come from?

> The number of Fgarbage_collect calls does not increase that much: 
> from 37 (for 22.3) to  43 (for 23.1).

It seems that c-indent-region has been changed a lot.  I'm
not sure, but perhaps that is the reason.  How do the other
people think?

---
Kenichi Handa
handa@m17n.org




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
  2009-08-24 11:39     ` Kenichi Handa
@ 2009-08-24 18:26       ` Dan Nicolaescu
  2009-08-25  6:07         ` Kenichi Handa
  2009-08-24 22:18       ` Alan Mackenzie
  1 sibling, 1 reply; 17+ messages in thread
From: Dan Nicolaescu @ 2009-08-24 18:26 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

  > In article <200908240807.n7O87ubg024643@godzilla.ics.uci.edu>, Dan Nicolaescu <dann@ics.uci.edu> writes:
  > 
  > > Could you try the attached patch?  If it improves the
  > 
  > > It does improve performance:
  > 
  > How much in real time?

from 22 seconds to 19 (vs about 16 for 22.3 )


  > [...]
  > > It's still slower than 22.3 though.
  > 
  > > One big difference is then time/number of calls to mark_objects 
  > > 129733 vs 18834514, so 145 times more calls to mark_object.
  > > Do you know where do those come from?
  > 
  > > The number of Fgarbage_collect calls does not increase that much: 
  > > from 37 (for 22.3) to  43 (for 23.1).
  > 
  > It seems that c-indent-region has been changed a lot.  I'm
  > not sure, but perhaps that is the reason.  How do the other
  > people think?

I set the load path to cc-mode from 22.3, it did not make any significant
difference in the amount of mark_object and Fgarbage_collect calls.




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
  2009-08-24 11:39     ` Kenichi Handa
  2009-08-24 18:26       ` Dan Nicolaescu
@ 2009-08-24 22:18       ` Alan Mackenzie
  1 sibling, 0 replies; 17+ messages in thread
From: Alan Mackenzie @ 2009-08-24 22:18 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: Dan Nicolaescu, emacs-devel

On Mon, Aug 24, 2009 at 08:39:47PM +0900, Kenichi Handa wrote:
> In article <200908240807.n7O87ubg024643@godzilla.ics.uci.edu>, Dan Nicolaescu <dann@ics.uci.edu> writes:

> It seems that c-indent-region has been changed a lot.  I'm not sure,
> but perhaps that is the reason [for a slowdown].  How do the other
> people think?

The change was a bug fix.  It shouldn't have slowed down indentation, at
least not by more than epsilon.

> ---
> Kenichi Handa
> handa@m17n.org

-- 
Alan Mackenzie (Nuremberg, Germany).




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
  2009-08-24 18:26       ` Dan Nicolaescu
@ 2009-08-25  6:07         ` Kenichi Handa
  2009-08-25 18:47           ` Dan Nicolaescu
  0 siblings, 1 reply; 17+ messages in thread
From: Kenichi Handa @ 2009-08-25  6:07 UTC (permalink / raw)
  To: Dan Nicolaescu; +Cc: emacs-devel

In article <200908241826.n7OIQRhS001092@godzilla.ics.uci.edu>, Dan Nicolaescu <dann@ics.uci.edu> writes:

> Kenichi Handa <handa@m17n.org> writes:
> In article <200908240807.n7O87ubg024643@godzilla.ics.uci.edu>, Dan Nicolaescu <dann@ics.uci.edu> writes:
> 
> > Could you try the attached patch?  If it improves the
> 
> > It does improve performance:
> 
> How much in real time?

> from 22 seconds to 19 (vs about 16 for 22.3 )

I see.  Then it is worth installing that patch.  I've just
done it.

> [...]
> > It's still slower than 22.3 though.
> 
> > One big difference is then time/number of calls to mark_objects 
> > 129733 vs 18834514, so 145 times more calls to mark_object.
> > Do you know where do those come from?
> 
> > The number of Fgarbage_collect calls does not increase that much: 
> > from 37 (for 22.3) to  43 (for 23.1).
> 
> It seems that c-indent-region has been changed a lot.  I'm
> not sure, but perhaps that is the reason.  How do the other
> people think?

> I set the load path to cc-mode from 22.3, it did not make any significant
> difference in the amount of mark_object and Fgarbage_collect calls.

Ok, then I suspect that the slowness is because of newly
introduced char-tables.  I've just installed the attached
change too to improve the performance of object marking in
GC.

Please try again with the latest code.

---
Kenichi Handa
handa@m17n.org

2009-08-25  Kenichi Handa  <handa@m17n.org>

	* alloc.c (mark_char_table): New function.
	(mark_object): Use mark_char_table for a char-table.

--- alloc.c.~1.448.~	2009-08-17 21:17:19.000000000 +0900
+++ alloc.c	2009-08-25 15:01:28.000000000 +0900
@@ -5371,6 +5371,34 @@
   return 1;
 }
 
+/* Like mark_vectorlike but optimized for char-tables (and
+   sub-char-tables) assuming that the contents are mostly integers or
+   symbols.  */
+
+static void
+mark_char_table (ptr)
+     struct Lisp_Vector *ptr;
+{
+  register EMACS_INT size = ptr->size & PSEUDOVECTOR_SIZE_MASK;
+  register int i;
+
+  VECTOR_MARK (ptr);
+  for (i = 0; i < size; i++)
+    {
+      Lisp_Object val = ptr->contents[i];
+
+      if (INTEGERP (val) || SYMBOLP (val) && XSYMBOL (val)->gcmarkbit)
+	continue;
+      if (SUB_CHAR_TABLE_P (val))
+	{
+	  if (! VECTOR_MARKED_P (XVECTOR (val)))
+	    mark_char_table (XVECTOR (val));
+	}
+      else
+	mark_object (val);
+    }
+}
+
 void
 mark_object (arg)
      Lisp_Object arg;
@@ -5533,6 +5561,11 @@
 		VECTOR_MARK (XVECTOR (h->key_and_value));
 	    }
 	}
+      else if (CHAR_TABLE_P (obj))
+	{
+	  if (! VECTOR_MARKED_P (XVECTOR (obj)))
+	    mark_char_table (XVECTOR (obj));
+	}
       else
 	mark_vectorlike (XVECTOR (obj));
       break;




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
  2009-08-25  6:07         ` Kenichi Handa
@ 2009-08-25 18:47           ` Dan Nicolaescu
  2009-08-26  6:01             ` Kenichi Handa
  0 siblings, 1 reply; 17+ messages in thread
From: Dan Nicolaescu @ 2009-08-25 18:47 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

  > In article <200908241826.n7OIQRhS001092@godzilla.ics.uci.edu>, Dan Nicolaescu <dann@ics.uci.edu> writes:
  > 
  > > Kenichi Handa <handa@m17n.org> writes:
  > > In article <200908240807.n7O87ubg024643@godzilla.ics.uci.edu>, Dan Nicolaescu <dann@ics.uci.edu> writes:
  > > 
  > > > Could you try the attached patch?  If it improves the
  > > 
  > > > It does improve performance:
  > > 
  > > How much in real time?
  > 
  > > from 22 seconds to 19 (vs about 16 for 22.3 )
  > 
  > I see.  Then it is worth installing that patch.  I've just
  > done it.

Thanks.

  > > [...]
  > > > It's still slower than 22.3 though.
  > > 
  > > > One big difference is then time/number of calls to mark_objects 
  > > > 129733 vs 18834514, so 145 times more calls to mark_object.
  > > > Do you know where do those come from?
  > > 
  > > > The number of Fgarbage_collect calls does not increase that much: 
  > > > from 37 (for 22.3) to  43 (for 23.1).
  > > 
  > > It seems that c-indent-region has been changed a lot.  I'm
  > > not sure, but perhaps that is the reason.  How do the other
  > > people think?
  > 
  > > I set the load path to cc-mode from 22.3, it did not make any significant
  > > difference in the amount of mark_object and Fgarbage_collect calls.
  > 
  > Ok, then I suspect that the slowness is because of newly
  > introduced char-tables.  I've just installed the attached
  > change too to improve the performance of object marking in
  > GC.
  > 
  > Please try again with the latest code.

It's better, but still not too good:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 43.03      8.55     8.55  4864529     0.00     0.00  forw_comment
 17.16     11.96     3.41    32860     0.00     0.00  scan_sexps_forward
  7.90     13.53     1.57    17306     0.00     0.00  scan_lists
  7.20     14.96     1.43    50443     0.00     0.00  re_search_2
  4.43     15.84     0.88 33382219     0.00     0.00  lookup_char_property
  2.42     16.32     0.48 18504890     0.00     0.00  next_interval
  2.21     16.76     0.44  8177518     0.00     0.00  mark_object
  1.86     17.13     0.37 14382565     0.00     0.00  previous_interval
  1.66     17.46     0.33  3329754     0.00     0.00  re_match_2_internal
  1.26     17.71     0.25  2156399     0.00     0.00  update_syntax_table
  1.16     17.94     0.23    13472     0.00     0.00  Fprevious_single_property_change
  1.01     18.14     0.20 23703332     0.00     0.00  Fassq
  1.01     18.34     0.20 57233041     0.00     0.00  Fcdr
  0.81     18.50     0.16   201989     0.00     0.00  Fbyte_code
  0.70     18.64     0.14       43     0.00     0.02  Fgarbage_collect
  0.45     18.73     0.09  3943360     0.00     0.00  mark_vectorlike
  0.45     18.82     0.09  1976191     0.00     0.00  update_interval
  0.40     18.90     0.08   103511     0.00     0.00  skip_chars
  0.40     18.98     0.08     3111     0.00     0.00  mark_char_table
  0.35     19.05     0.07 33382219     0.00     0.00  textget
  0.25     19.10     0.05  1076559     0.00     0.00  find_interval
  0.25     19.15     0.05   893966     0.00     0.00  re_iswctype
  0.25     19.20     0.05     1294     0.00     0.00  Fsetcar
  0.20     19.24     0.04    41380     0.00     0.00  re_compile_pattern
  0.15     19.27     0.03  3613707     0.00     0.00  buf_bytepos_to_charpos
  0.15     19.30     0.03   589363     0.00     0.00  set_point_both
  0.15     19.33     0.03   329641     0.00     0.00  validate_interval_range
  0.15     19.36     0.03    15287     0.00     0.00  adjust_for_invis_intang
  0.15     19.39     0.03                             interval_start_pos
  0.10     19.41     0.02   987278     0.00     0.00  Fcons
 

compare to doing the same thing in an emacs -Q -nw session:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 46.55      9.10     9.10  4864529     0.00     0.00  forw_comment
 15.40     12.11     3.01    32860     0.00     0.00  scan_sexps_forward
  7.31     13.54     1.43    17306     0.00     0.00  scan_lists
  7.16     14.94     1.40    27938     0.00     0.00  re_search_2
  4.40     15.80     0.86 33368915     0.00     0.00  lookup_char_property
  2.35     16.26     0.46 18498165     0.00     0.00  next_interval
  1.74     16.60     0.34    13472     0.00     0.00  Fprevious_single_property_change
  1.64     16.92     0.32  3319319     0.00     0.00  re_match_2_internal
  1.53     17.22     0.30  5830028     0.00     0.00  mark_object
  1.53     17.52     0.30  2155779     0.00     0.00  update_syntax_table
  1.48     17.81     0.29 14382565     0.00     0.00  previous_interval
  1.28     18.06     0.25 23687804     0.00     0.00  Fassq
  0.72     18.20     0.14 57187184     0.00     0.00  Fcdr
  0.66     18.33     0.13   198744     0.00     0.00  Fbyte_code
  0.51     18.43     0.10 33368915     0.00     0.00  textget
  0.36     18.50     0.07    15287     0.00     0.00  adjust_for_invis_intang
  0.36     18.57     0.07     2744     0.00     0.00  Frassq
  0.36     18.64     0.07       38     0.00     0.01  Fgarbage_collect
  0.31     18.70     0.06   103511     0.00     0.00  skip_chars
  0.26     18.75     0.05 10044672     0.00     0.00  Fcar
  0.26     18.80     0.05  1976191     0.00     0.00  update_interval
  0.26     18.85     0.05     1862     0.00     0.00  mark_char_table
  0.20     18.89     0.04  1066193     0.00     0.00  find_interval
  0.20     18.93     0.04    73761     0.00     0.00  Flocal_variable_p
  0.20     18.97     0.04    20489     0.00     0.00  re_compile_pattern
  0.20     19.01     0.04     9351     0.00     0.00  back_comment


the -nw -Q session has fewer charsets loaded, the -Q version loads the
charset for BIG5 (and others) at startup even though it's running in a
LANG=C environment.

It seems that the charset data structure is too GC intensive, maybe it's
time to rethink it to make it less so.




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
  2009-08-25 18:47           ` Dan Nicolaescu
@ 2009-08-26  6:01             ` Kenichi Handa
  2009-08-26  6:33               ` Dan Nicolaescu
  2009-08-27  6:27               ` Kenichi Handa
  0 siblings, 2 replies; 17+ messages in thread
From: Kenichi Handa @ 2009-08-26  6:01 UTC (permalink / raw)
  To: Dan Nicolaescu; +Cc: emacs-devel

In article <200908251847.n7PIlb4C012720@godzilla.ics.uci.edu>, Dan Nicolaescu <dann@ics.uci.edu> writes:

> the -nw -Q session has fewer charsets loaded, the -Q version loads the
> charset for BIG5 (and others) at startup even though it's running in a
> LANG=C environment.

Some CJK charsets are necessary to create the default
fontset.  I think it's possible to avoid loading unnecessary
charsets for creating the default fontset.  I'll work on it.

But, I think the main sloness of -Q version is fontsets.  A
fontset is a char-table whose elements are vectors, and thus
the just-installed optimization doesn't work.

By the way, I did this test with (1) Emacs-22, (2) trunk
(without gc optimization), and (3) trunk (with gc
optimization).

% time LANG=C emacs -Q -batch --eval '(dotimes (i 100) (garbage-collect))'

The result is:

(1) user 0m1.460s
(2) user 0m1.772s
(3) user 0m1.644s

So, (3) is still 13% slower than (1), but I think it's a
necessary cost for having the full Unicode support.  And
usually the slowness of GC doesn't direclty affect the total
execution time.

> It seems that the charset data structure is too GC intensive, maybe it's
> time to rethink it to make it less so.

For charset data, I agree that it doesn't have to be a
char-table (for encoding) nor a vector (for decoding).  I'll
consider replacing them with some C structure.

---
Kenichi Handa
handa@m17n.org




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
  2009-08-26  6:01             ` Kenichi Handa
@ 2009-08-26  6:33               ` Dan Nicolaescu
  2009-08-26  8:06                 ` Kenichi Handa
  2009-08-27  6:27               ` Kenichi Handa
  1 sibling, 1 reply; 17+ messages in thread
From: Dan Nicolaescu @ 2009-08-26  6:33 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

  > In article <200908251847.n7PIlb4C012720@godzilla.ics.uci.edu>, Dan Nicolaescu <dann@ics.uci.edu> writes:
  > 
  > > the -nw -Q session has fewer charsets loaded, the -Q version loads the
  > > charset for BIG5 (and others) at startup even though it's running in a
  > > LANG=C environment.
  > 
  > Some CJK charsets are necessary to create the default
  > fontset.  I think it's possible to avoid loading unnecessary
  > charsets for creating the default fontset.  I'll work on it.

Can the data in the default fontset be in the dumped image in pure
memory?  To not have to create it every time on start up and GC it
should be a win.

  > But, I think the main sloness of -Q version is fontsets.  A
  > fontset is a char-table whose elements are vectors, and thus
  > the just-installed optimization doesn't work.
  > 
  > By the way, I did this test with (1) Emacs-22, (2) trunk
  > (without gc optimization), and (3) trunk (with gc
  > optimization).
  > 
  > % time LANG=C emacs -Q -batch --eval '(dotimes (i 100) (garbage-collect))'
  > 
  > The result is:
  > 
  > (1) user 0m1.460s
  > (2) user 0m1.772s
  > (3) user 0m1.644s
  > 
  > So, (3) is still 13% slower than (1), but I think it's a
  > necessary cost for having the full Unicode support.  And
  > usually the slowness of GC doesn't direclty affect the total
  > execution time.

It does, see the profiling data that I sent.  The problem with GC is
that it will flush all the caches if there's too much memory to walk
over.  And the slowdown in GC speed is directly related to the big
increase in the number of GCable objects.

If you want to see a bit of this, please try this simple
instrumentation (is should apply to both 22.3 and CVS trunk):

Index: alloc.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/alloc.c,v
retrieving revision 1.449
diff -u -3 -p -u -p -r1.449 alloc.c
--- alloc.c	25 Aug 2009 06:03:09 -0000	1.449
+++ alloc.c	26 Aug 2009 06:28:36 -0000
@@ -4963,6 +4963,7 @@ inhibit_garbage_collection ()
   return count;
 }
 
+int call_count_mark_object = 0;
 
 DEFUN ("garbage-collect", Fgarbage_collect, Sgarbage_collect, 0, 0, "",
        doc: /* Reclaim storage for Lisp objects no longer needed.
@@ -4986,6 +4987,9 @@ returns nil, because real GC can't be do
   Lisp_Object total[8];
   int count = SPECPDL_INDEX ();
   EMACS_TIME t1, t2, t3;
+  int bef = call_count_mark_object;
+
+  /* fprintf (stderr, "bef gc mark_object count = %d\n", call_count_mark_object); */
 
   if (abort_on_gc)
     abort ();
@@ -5278,6 +5282,8 @@ returns nil, because real GC can't be do
 			      EMACS_USECS (t3) * 1.0e-6);
   gcs_done++;
 
+  fprintf (stderr, "after gc mark_object delta = %d\n", call_count_mark_object - bef);
+
   return Flist (sizeof total / sizeof *total, total);
 }
 
@@ -5410,6 +5416,8 @@ mark_object (arg)
 #endif
   int cdr_count = 0;
 
+  call_count_mark_object++;
+
  loop:
 
   if (PURE_POINTER_P (XPNTR (obj)))



  > > It seems that the charset data structure is too GC intensive, maybe it's
  > > time to rethink it to make it less so.
  > 
  > For charset data, I agree that it doesn't have to be a
  > char-table (for encoding) nor a vector (for decoding).  I'll
  > consider replacing them with some C structure.

Great, thanks!




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
  2009-08-26  6:33               ` Dan Nicolaescu
@ 2009-08-26  8:06                 ` Kenichi Handa
  2009-08-26 20:46                   ` Andreas Schwab
  0 siblings, 1 reply; 17+ messages in thread
From: Kenichi Handa @ 2009-08-26  8:06 UTC (permalink / raw)
  To: Dan Nicolaescu; +Cc: emacs-devel

In article <200908260633.n7Q6XRXi018187@godzilla.ics.uci.edu>, Dan Nicolaescu <dann@ics.uci.edu> writes:

> Some CJK charsets are necessary to create the default
> fontset.  I think it's possible to avoid loading unnecessary
> charsets for creating the default fontset.  I'll work on it.

> Can the data in the default fontset be in the dumped image in pure
> memory?  To not have to create it every time on start up and GC it
> should be a win.

Is it possible to put the whole char-table in pure space,
then modify some part at running time?

> So, (3) is still 13% slower than (1), but I think it's a
> necessary cost for having the full Unicode support.  And
> usually the slowness of GC doesn't direclty affect the total
> execution time.

> It does, see the profiling data that I sent.  The problem with GC is
> that it will flush all the caches if there's too much memory to walk
> over.  And the slowdown in GC speed is directly related to the big
> increase in the number of GCable objects.

If most of the execution is spent by GC, shouldn't such a
task run with bigger gc-cons-threshold?

But, I tried your test case of indent-region with different
gc-cons-threshold (gct), and the result shows that GC is not
the main culprit of the slowness.

emacs-22 (gct=400000):	16 to 17 sec
emacs-22 (gct=10000000):16 to 17 sec
trunk (gct=400000):	20 to 21 sec
trunk (gct=10000000):	19 to 20 sec

---
Kenichi Handa
handa@m17n.org




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
  2009-08-26  8:06                 ` Kenichi Handa
@ 2009-08-26 20:46                   ` Andreas Schwab
  2009-08-27  2:02                     ` Kenichi Handa
  0 siblings, 1 reply; 17+ messages in thread
From: Andreas Schwab @ 2009-08-26 20:46 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: Dan Nicolaescu, emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> Is it possible to put the whole char-table in pure space,
> then modify some part at running time?

Only by copying it.  If you try to modify it you'll get a pure write
error.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
  2009-08-26 20:46                   ` Andreas Schwab
@ 2009-08-27  2:02                     ` Kenichi Handa
  0 siblings, 0 replies; 17+ messages in thread
From: Kenichi Handa @ 2009-08-27  2:02 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: dann, emacs-devel

In article <m2fxbecngn.fsf@igel.home>, Andreas Schwab <schwab@linux-m68k.org> writes:

> Kenichi Handa <handa@m17n.org> writes:
> > Is it possible to put the whole char-table in pure space,
> > then modify some part at running time?

> Only by copying it.  If you try to modify it you'll get a pure write
> error.

If it means that we must deep-copy the whole char-table to
modify only some part of it, I'd like to find some other
way.

---
Kenichi Handa
handa@m17n.org




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: profiling emacs-23.1 vs emacs-22.3
  2009-08-26  6:01             ` Kenichi Handa
  2009-08-26  6:33               ` Dan Nicolaescu
@ 2009-08-27  6:27               ` Kenichi Handa
  1 sibling, 0 replies; 17+ messages in thread
From: Kenichi Handa @ 2009-08-27  6:27 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: dann, emacs-devel

In article <E1MgBZt-0006U4-9O@etlken>, Kenichi Handa <handa@m17n.org> writes:

> Some CJK charsets are necessary to create the default
> fontset.  I think it's possible to avoid loading unnecessary
> charsets for creating the default fontset.  I'll work on it.

I've just installed a fix for fontset.el.

---
Kenichi Handa
handa@m17n.org




^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2009-08-27  6:27 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-03 20:29 profiling emacs-23.1 vs emacs-22.3 Dan Nicolaescu
2009-08-04 17:10 ` Leo
2009-08-04 19:50 ` Chong Yidong
2009-08-04 19:56   ` Dan Nicolaescu
2009-08-05  7:47 ` Dan Nicolaescu
2009-08-24  6:52 ` Kenichi Handa
     [not found]   ` <200908240807.n7O87ubg024643@godzilla.ics.uci.edu>
2009-08-24 11:39     ` Kenichi Handa
2009-08-24 18:26       ` Dan Nicolaescu
2009-08-25  6:07         ` Kenichi Handa
2009-08-25 18:47           ` Dan Nicolaescu
2009-08-26  6:01             ` Kenichi Handa
2009-08-26  6:33               ` Dan Nicolaescu
2009-08-26  8:06                 ` Kenichi Handa
2009-08-26 20:46                   ` Andreas Schwab
2009-08-27  2:02                     ` Kenichi Handa
2009-08-27  6:27               ` Kenichi Handa
2009-08-24 22:18       ` Alan Mackenzie

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).