unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* Inlining calls to primitives
@ 2006-08-29  8:48 Ludovic Courtès
  2006-09-04 22:02 ` Neil Jerram
  0 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2006-08-29  8:48 UTC (permalink / raw)


Hi,

I conducted an experiment with the evaluator consisting in implementing
"inlining" of calls to a few primitive procedures and I'd like to get
feedback about it.

"Inlining" means that `CEVAL ()' was augmented with an internal
implementation of these primitives (in the form built-in macros), and
that the call sites of these primitives are modified to use them (i.e.,
by automatically have them use the corresponding built-in macro).
Although this is not strictly R5RS-compatible, most implementations have
a similar feature (e.g., S48's `inline-values' [0]).

I made experiments with the following "inlinables": `not', `eq?',
`null?', `list?', `cons', `pair?', `car', `cdr' (thus I added a built-in
macro for each of those: `#@not', `#@eq?', etc.).  I chose a program
likely to benefit from this optimization, one that uses `every' from
`srfi-1.scm':

  (eval-enable 'inline) ;; switch me!
  (define %lst (iota 2000000))

  (use-modules (srfi srfi-1))

  (let ((start (get-internal-run-time)))
    (every number? %lst)
    (let ((end (get-internal-run-time)))
      (format #t "time: ~a~%" (- end start))))

  (exit)

Without inlining:

  $ time ./pre-inst-guile -l ,,every.scm
  time: 1282

  real    0m24.815s
  user    0m24.002s
  sys     0m0.614s

With inlining:

  $ time ./pre-inst-guile -l ,,every.scm
  time: 922

  real    0m19.714s
  user    0m19.067s
  sys     0m0.484s

So the improvement is around 28% for the time spent in `every' itself,
and it's "only" 20% for the time spent in the whole program.  Of course,
for "real" programs (e.g., the test suite), the improvement is even less
noticeable, or even close to 0.  But still, these results might be
considered a sufficiently good incentive against, say, Kevin's
C-rewriting of SRFI-1.  ;-)

The patch is actually two-fold:

  1. It adds a new file, `built-in.i', that contains declarations of
     built-in macros.  This file is then processed by `extract-imsyms.h'
     that create declarations of the corresponding immediate symbols.
     Also, `built-in-expand.h' contains macrology (inspired by Guile-VM)
     that can extract various things from `built-in.i' for use in
     `eval.c': table of built-in macro names, table of unmemoizers, and
     dispatch tables.

     When GCC is used, dispatch tables can be produced instead of
     `switch' (using GCC's labels-as-pointers extension).  This makes
     immediate-symbol dispatching slightly faster.

  2. It introduces new built-in macros (`#@null?', etc.), code to
     actually perform the inlining in `eval.c', a new option for
     `eval-options', as well as a new snarfing macro,
     `SCM_DEFINE_INLINABLE', that is used by inlinable primitives.
     The `scm_t_subr_entry' type is modified to contain information
     relative to inlinables, allowing for the inlining itself to be O(1)
     wrt. to the number of inlinable procedures; the subr and gsubr APIs
     are also extended to make use of this.

I'm attaching the files and patch (unclean) below, but this can
otherwise be fetched using Arch:

  $ tla register-archive http://www.laas.fr/~lcourtes/software/arch-2005/
  $ tla get \
    lcourtes@laas.fr--2005-libre/guile-core--eval-o1-dispatch--1.9 \
    guile-core-with-inlining

Comments and criticism welcome!

Thanks,
Ludovic.

PS: As a side-effect, the patch also makes `eq?' strictly binary (as
    required by R5RS and documented in the manual), hence the change in
    `syntax.test'.

PS2: BTW, inling `scm_ilookup ()' also has a noticeable impact!

[0] http://s48.org/1.3/manual/manual-Z-H-4.html#node_sec_2.4

\f
The patch:

--- orig/libguile/Makefile.am
+++ mod/libguile/Makefile.am
@@ -26,7 +26,7 @@
 ## Check for headers in $(srcdir)/.., so that #include
 ## <libguile/MUMBLE.h> will find MUMBLE.h in this dir when we're
 ## building.
-INCLUDES = -I.. -I$(top_srcdir)
+INCLUDES = -I.. -I$(top_srcdir) -I$(top_builddir)/libguile
 
 ETAGS_ARGS = --regex='/SCM_\(GLOBAL_\)?\(G?PROC\|G?PROC1\|SYMBOL\|VCELL\|CONST_LONG\).*\"\([^\"]\)*\"/\3/' \
    --regex='/[ \t]*SCM_[G]?DEFINE1?[ \t]*(\([^,]*\),[^,]*/\1/'
@@ -64,6 +64,10 @@
 	rm -f scmconfig.h
 	mv scmconfig.h.tmp scmconfig.h
 
+imsyms.h: $(srcdir)/built-in.i
+	$(srcdir)/extract-imsyms.sh $^ > $@
+
+
 guile_filter_doc_snarfage_SOURCES = c-tokenize.c
 
 ## Override default rule; this should be compiled for BUILD host.
@@ -146,7 +150,7 @@
 EXTRA_DOT_DOC_FILES = @EXTRA_DOT_DOC_FILES@
 
 BUILT_SOURCES = cpp_err_symbols.c cpp_sig_symbols.c libpath.h \
-    version.h scmconfig.h \
+    version.h scmconfig.h imsyms.h \
     $(DOT_X_FILES) $(EXTRA_DOT_X_FILES)
 
 EXTRA_libguile_la_SOURCES = _scm.h		\
@@ -207,12 +211,14 @@
 # and people feel like maintaining them.  For now, this is not the case.
 noinst_SCRIPTS = guile-doc-snarf guile-snarf-docs guile-func-name-check
 
-EXTRA_DIST = ChangeLog-gh ChangeLog-scm ChangeLog-threads		\
-    ChangeLog-1996-1999 ChangeLog-2000 cpp_signal.c			\
+EXTRA_DIST = ChangeLog-gh ChangeLog-scm ChangeLog-threads	\
+    ChangeLog-1996-1999 ChangeLog-2000 cpp_signal.c		\
     cpp_errno.c cpp_err_symbols.in cpp_err_symbols.c		\
-    cpp_sig_symbols.c cpp_sig_symbols.in cpp_cnvt.awk	\
-    c-tokenize.lex version.h.in \
-    scmconfig.h.top gettext.h
+    cpp_sig_symbols.c cpp_sig_symbols.in cpp_cnvt.awk		\
+    c-tokenize.lex version.h.in					\
+    scmconfig.h.top gettext.h					\
+    expand.h built-in.i extract-imsyms.sh
+
 #    $(DOT_DOC_FILES) $(EXTRA_DOT_DOC_FILES) \
 #    guile-procedures.txt guile.texi
 


--- orig/libguile/boolean.c
+++ mod/libguile/boolean.c
@@ -24,13 +24,14 @@
 #include "libguile/boolean.h"
 #include "libguile/lang.h"
 #include "libguile/tags.h"
+#include "libguile/imsyms.h"
 
 \f
 
-
-SCM_DEFINE (scm_not, "not", 1, 0, 0, 
-            (SCM x),
-            "Return @code{#t} iff @var{x} is @code{#f}, else return @code{#f}.")
+SCM_DEFINE_INLINABLE (scm_not, "not", scm_imsym_atnot, 1, 0, 0,
+		      (SCM x),
+		      "Return @code{#t} iff @var{x} is @code{#f}, "
+		      "else return @code{#f}.")
 #define FUNC_NAME s_scm_not
 {
   return scm_from_bool (scm_is_false (x) || SCM_NILP (x));


--- orig/libguile/eq.c
+++ mod/libguile/eq.c
@@ -36,6 +36,7 @@
 
 #include "libguile/validate.h"
 #include "libguile/eq.h"
+#include "libguile/imsyms.h"
 \f
 
 #ifdef HAVE_STRING_H
@@ -43,7 +44,7 @@
 #endif
 \f
 
-SCM_DEFINE1 (scm_eq_p, "eq?", scm_tc7_rpsubr,
+SCM_DEFINE_INLINABLE (scm_eq_p, "eq?", scm_imsym_ateqp, 2, 0, 0,
              (SCM x, SCM y),
 	    "Return @code{#t} if @var{x} and @var{y} are the same object,\n"
 	    "except for numbers and characters.  For example,\n"


--- orig/libguile/eval.c
+++ mod/libguile/eval.c
@@ -98,9 +98,15 @@
 static SCM unmemoize_exprs (SCM expr, SCM env);
 static SCM canonicalize_define (SCM expr);
 static SCM *scm_lookupcar1 (SCM vloc, SCM genv, int check);
-static SCM unmemoize_builtin_macro (SCM expr, SCM env);
 static void eval_letrec_inits (SCM env, SCM init_forms, SCM **init_values_eol);
 
+typedef SCM (* scm_t_unmemoizer) (const SCM expr, const SCM env);
+
+static scm_t_unmemoizer unmemoizer_table[];
+#define unmemoize_builtin_macro(_expr, _env)				\
+  (unmemoizer_table[(ISYMNUM (SCM_CAR (_expr)))] ((_expr), (_env)))
+
+
 \f
 
 /* {Syntax Errors}
@@ -402,36 +408,19 @@
 
 #define ISYMNUM(n) 		(SCM_ITAG8_DATA (n))
 
-/* This table must agree with the list of SCM_IM_ constants in tags.h */
+/* This table must agree with the list of `scm_imsym_' constants
+   defined in `imsyms.h'.  */
 static const char *const isymnames[] =
-{
-  "#@and",
-  "#@begin",
-  "#@case",
-  "#@cond",
-  "#@do",
-  "#@if",
-  "#@lambda",
-  "#@let",
-  "#@let*",
-  "#@letrec",
-  "#@or",
-  "#@quote",
-  "#@set!",
-  "#@define",
-  "#@apply",
-  "#@call-with-current-continuation",
-  "#@dispatch",
-  "#@slot-ref",
-  "#@slot-set!",
-  "#@delay",
-  "#@future",
-  "#@call-with-values",
-  "#@else",
-  "#@arrow",
-  "#@nil-cond",
-  "#@bind"
-};
+  {
+#define EXTRACT_BUILT_IN_MACRO_NAMES
+#include "built-in-expand.h"
+#include "built-in.i"
+#undef EXTRACT_BUILT_IN_MACRO_NAMES
+  };
+
+
+#include "imsyms.h"
+
 
 void
 scm_i_print_isym (SCM isym, SCM port)
@@ -634,8 +623,8 @@
  * body) into its internal form.  The internal form of a body (<expr> ...) is
  * just the body itself, but prefixed with an ISYM that denotes to what kind
  * of outer construct this body belongs: (<ISYM> <expr> ...).  A lambda body
- * starts with SCM_IM_LAMBDA, for example, a body of a let starts with
- * SCM_IM_LET, etc.
+ * starts with scm_imsym_lambda, for example, a body of a let starts with
+ * scm_imsym_let, etc.
  *
  * It is assumed that the calling expression has already made sure that the
  * body is a proper list.  */
@@ -882,7 +871,7 @@
   res = scm_call_2 (SCM_MACRO_CODE (proc), x, env);
   
   if (scm_ilength (res) <= 0)
-    res = scm_list_2 (SCM_IM_BEGIN, res);
+    res = scm_list_2 (scm_imsym_begin, res);
 
   /* njrev: Several queries here: (1) I don't see how it can be
      correct that the SCM_SETCAR 2 lines below this comment needs
@@ -923,7 +912,7 @@
     }
   else
     {
-      SCM_SETCAR (expr, SCM_IM_AND);
+      SCM_SETCAR (expr, scm_imsym_and);
       return expr;
     }
 }
@@ -947,7 +936,7 @@
    * empty clause is OK and where it is not.  */
   ASSERT_SYNTAX (scm_ilength (cdr_expr) >= 0, s_bad_expression, expr);
 
-  SCM_SETCAR (expr, SCM_IM_BEGIN);
+  SCM_SETCAR (expr, scm_imsym_begin);
   return expr;
 }
 
@@ -1008,7 +997,7 @@
 
       /* build the new clause */
       if (scm_is_eq (labels, scm_sym_else))
-        SCM_SETCAR (clause, SCM_IM_ELSE);
+        SCM_SETCAR (clause, scm_imsym_else);
 
       clauses = SCM_CDR (clauses);
     }
@@ -1021,7 +1010,7 @@
                        s_duplicate_case_label, label, expr);
     }
 
-  SCM_SETCAR (expr, SCM_IM_CASE);
+  SCM_SETCAR (expr, scm_imsym_case);
   return expr;
 }
 
@@ -1041,7 +1030,7 @@
       const SCM exprs = SCM_CDR (clause);
 
       const SCM um_exprs = unmemoize_exprs (exprs, env);
-      const SCM um_labels = (scm_is_eq (labels, SCM_IM_ELSE))
+      const SCM um_labels = (scm_is_eq (labels, scm_imsym_else))
         ? scm_sym_else
         : scm_i_finite_list_copy (labels);
       const SCM um_clause = scm_cons (um_labels, um_exprs);
@@ -1089,7 +1078,7 @@
                            s_bad_cond_clause, clause, expr);
           ASSERT_SYNTAX_2 (last_clause_p,
                            s_misplaced_else_clause, clause, expr);
-          SCM_SETCAR (clause, SCM_IM_ELSE);
+          SCM_SETCAR (clause, scm_imsym_else);
 	}
       else if (length >= 2
                && scm_is_eq (SCM_CADR (clause), scm_sym_arrow)
@@ -1097,7 +1086,7 @@
         {
           ASSERT_SYNTAX_2 (length > 2, s_missing_recipient, clause, expr);
           ASSERT_SYNTAX_2 (length == 3, s_extra_expression, clause, expr);
-          SCM_SETCAR (SCM_CDR (clause), SCM_IM_ARROW);
+          SCM_SETCAR (SCM_CDR (clause), scm_imsym_arrow);
 	}
       /* SRFI 61 extended cond */
       else if (length >= 3
@@ -1106,11 +1095,11 @@
 	{
 	  ASSERT_SYNTAX_2 (length > 3, s_missing_recipient, clause, expr);
 	  ASSERT_SYNTAX_2 (length == 4, s_extra_expression, clause, expr);
-	  SCM_SETCAR (SCM_CDDR (clause), SCM_IM_ARROW);
+	  SCM_SETCAR (SCM_CDDR (clause), scm_imsym_arrow);
 	}
     }
 
-  SCM_SETCAR (expr, SCM_IM_COND);
+  SCM_SETCAR (expr, scm_imsym_cond);
   return expr;
 }
 
@@ -1131,13 +1120,13 @@
       SCM um_sequence;
       SCM um_clause;
 
-      if (scm_is_eq (test, SCM_IM_ELSE))
+      if (scm_is_eq (test, scm_imsym_else))
         um_test = scm_sym_else;
       else
         um_test = unmemoize_expression (test, env);
 
       if (!scm_is_null (sequence) && scm_is_eq (SCM_CAR (sequence),
-					      SCM_IM_ARROW))
+					      scm_imsym_arrow))
         {
           const SCM target = SCM_CADR (sequence);
           const SCM um_target = unmemoize_expression (target, env);
@@ -1278,7 +1267,7 @@
 scm_m_delay (SCM expr, SCM env)
 {
   const SCM new_expr = memoize_as_thunk_prototype (expr, env);
-  SCM_SETCAR (new_expr, SCM_IM_DELAY);
+  SCM_SETCAR (new_expr, scm_imsym_delay);
   return new_expr;
 }
 
@@ -1363,7 +1352,7 @@
   commands = SCM_CDR (cddr_expr);
   tail = scm_cons2 (exit_clause, commands, step_forms);
   tail = scm_cons2 (init_forms, variables, tail);
-  SCM_SETCAR (expr, SCM_IM_DO);
+  SCM_SETCAR (expr, scm_imsym_do);
   SCM_SETCDR (expr, tail);
   return expr;
 }
@@ -1415,7 +1404,7 @@
   const SCM cdr_expr = SCM_CDR (expr);
   const long length = scm_ilength (cdr_expr);
   ASSERT_SYNTAX (length == 2 || length == 3, s_expression, expr);
-  SCM_SETCAR (expr, SCM_IM_IF);
+  SCM_SETCAR (expr, scm_imsym_if);
   return expr;
 }
 
@@ -1511,9 +1500,9 @@
   cddr_expr = SCM_CDR (cdr_expr);
   documentation = (length >= 3 && scm_is_string (SCM_CAR (cddr_expr)));
   body = documentation ? SCM_CDR (cddr_expr) : cddr_expr;
-  new_body = m_body (SCM_IM_LAMBDA, body);
+  new_body = m_body (scm_imsym_lambda, body);
 
-  SCM_SETCAR (expr, SCM_IM_LAMBDA);
+  SCM_SETCAR (expr, scm_imsym_lambda);
   if (documentation)
     SCM_SETCDR (cddr_expr, new_body);
   else
@@ -1614,15 +1603,15 @@
 
   {
     const SCM let_body = SCM_CDR (cddr_expr);
-    const SCM lambda_body = m_body (SCM_IM_LET, let_body);
+    const SCM lambda_body = m_body (scm_imsym_let, let_body);
     const SCM lambda_tail = scm_cons (variables, lambda_body);
     const SCM lambda_form = scm_cons_source (expr, scm_sym_lambda, lambda_tail);
 
     const SCM rvar = scm_list_1 (name);
     const SCM init = scm_list_1 (lambda_form);
-    const SCM body = m_body (SCM_IM_LET, scm_list_1 (name));
+    const SCM body = m_body (scm_imsym_let, scm_list_1 (name));
     const SCM letrec_tail = scm_cons (rvar, scm_cons (init, body));
-    const SCM letrec_form = scm_cons_source (expr, SCM_IM_LETREC, letrec_tail);
+    const SCM letrec_form = scm_cons_source (expr, scm_imsym_letrec, letrec_tail);
     return scm_cons_source (expr, letrec_form, inits);
   }
 }
@@ -1650,7 +1639,7 @@
   if (scm_is_null (bindings) || scm_is_null (SCM_CDR (bindings)))
     {
       /* Special case: no bindings or single binding => let* is faster. */
-      const SCM body = m_body (SCM_IM_LET, SCM_CDR (cdr_expr));
+      const SCM body = m_body (scm_imsym_let, SCM_CDR (cdr_expr));
       return scm_m_letstar (scm_cons2 (SCM_CAR (expr), bindings, body), env);
     }
   else
@@ -1661,9 +1650,9 @@
       transform_bindings (bindings, expr, &rvariables, &inits);
 
       {
-        const SCM new_body = m_body (SCM_IM_LET, SCM_CDR (cdr_expr));
+        const SCM new_body = m_body (scm_imsym_let, SCM_CDR (cdr_expr));
         const SCM new_tail = scm_cons2 (rvariables, inits, new_body);
-        SCM_SETCAR (expr, SCM_IM_LET);
+        SCM_SETCAR (expr, scm_imsym_let);
         SCM_SETCDR (expr, new_tail);
         return expr;
       }
@@ -1716,7 +1705,7 @@
   if (scm_is_null (bindings))
     {
       /* no bindings, let* is executed faster */
-      SCM body = m_body (SCM_IM_LETREC, SCM_CDR (cdr_expr));
+      SCM body = m_body (scm_imsym_letrec, SCM_CDR (cdr_expr));
       return scm_m_letstar (scm_cons2 (SCM_CAR (expr), SCM_EOL, body), env);
     }
   else
@@ -1727,8 +1716,8 @@
 
       check_bindings (bindings, expr);
       transform_bindings (bindings, expr, &rvariables, &inits);
-      new_body = m_body (SCM_IM_LETREC, SCM_CDR (cdr_expr));
-      return scm_cons2 (SCM_IM_LETREC, rvariables, scm_cons (inits, new_body));
+      new_body = m_body (scm_imsym_letrec, SCM_CDR (cdr_expr));
+      return scm_cons2 (scm_imsym_letrec, rvariables, scm_cons (inits, new_body));
     }
 }
 
@@ -1790,8 +1779,8 @@
       binding_idx = cdr_binding_idx;                    /* continue with P3 */
     }
 
-  new_body = m_body (SCM_IM_LETSTAR, SCM_CDR (cdr_expr));
-  SCM_SETCAR (expr, SCM_IM_LETSTAR);
+  new_body = m_body (scm_imsym_letstar, SCM_CDR (cdr_expr));
+  SCM_SETCAR (expr, scm_imsym_letstar);
   /* the bindings have been changed in place */
   SCM_SETCDR (cdr_expr, new_body);
   return expr;
@@ -1842,7 +1831,7 @@
     }
   else
     {
-      SCM_SETCAR (expr, SCM_IM_OR);
+      SCM_SETCAR (expr, scm_imsym_or);
       return expr;
     }
 }
@@ -1936,7 +1925,7 @@
   if (is_self_quoting_p (quotee))
     return quotee;
 
-  SCM_SETCAR (expr, SCM_IM_QUOTE);
+  SCM_SETCAR (expr, scm_imsym_quote);
   SCM_SETCDR (expr, quotee);
   return expr;
 }
@@ -1971,7 +1960,7 @@
   if (SCM_UNBNDP (new_variable))
     new_variable = variable;
 
-  SCM_SETCAR (expr, SCM_IM_SET_X);
+  SCM_SETCAR (expr, scm_imsym_set_x);
   SCM_SETCAR (cdr_expr, new_variable);
   return expr;
 }
@@ -1982,7 +1971,79 @@
   return scm_cons (scm_sym_set_x, unmemoize_exprs (SCM_CDR (expr), env));
 }
 
+static const char s_null_p[] = "null?";
+SCM_GLOBAL_SYMBOL (scm_sym_null_p, s_null_p);
+
+static SCM
+unmemoize_atnullp (const SCM expr, const SCM env)
+{
+  return scm_cons (scm_sym_null_p, unmemoize_exprs (SCM_CDR (expr), env));
+}
+
+static const char s_pair_p[] = "pair?";
+SCM_GLOBAL_SYMBOL (scm_sym_pair_p, s_pair_p);
+
+static SCM
+unmemoize_atpairp (const SCM expr, const SCM env)
+{
+  return scm_cons (scm_sym_pair_p, unmemoize_exprs (SCM_CDR (expr), env));
+}
+
+static const char s_list_p[] = "list?";
+SCM_GLOBAL_SYMBOL (scm_sym_list_p, s_list_p);
+
+static SCM
+unmemoize_atlistp (const SCM expr, const SCM env)
+{
+  return scm_cons (scm_sym_list_p, unmemoize_exprs (SCM_CDR (expr), env));
+}
+
+static const char s_car[] = "car";
+SCM_GLOBAL_SYMBOL (scm_sym_car, s_car);
+
+static SCM
+unmemoize_atcar (const SCM expr, const SCM env)
+{
+  return scm_cons (scm_sym_car, unmemoize_exprs (SCM_CDR (expr), env));
+}
+
+static const char s_cdr[] = "cdr";
+SCM_GLOBAL_SYMBOL (scm_sym_cdr, s_cdr);
+
+static SCM
+unmemoize_atcdr (const SCM expr, const SCM env)
+{
+  return scm_cons (scm_sym_cdr, unmemoize_exprs (SCM_CDR (expr), env));
+}
+
+static const char s_cons[] = "cons";
+SCM_GLOBAL_SYMBOL (scm_sym_cons, s_cons);
+
+static SCM
+unmemoize_atcons (const SCM expr, const SCM env)
+{
+  return scm_cons (scm_sym_cons, unmemoize_exprs (SCM_CDR (expr), env));
+}
 
+static const char s_not[] = "not";
+SCM_GLOBAL_SYMBOL (scm_sym_not, s_not);
+
+static SCM
+unmemoize_atnot (const SCM expr, const SCM env)
+{
+  return scm_cons (scm_sym_not, unmemoize_exprs (SCM_CDR (expr), env));
+}
+
+static const char s_eq_p[] = "eq?";
+SCM_GLOBAL_SYMBOL (scm_sym_eq_p, s_eq_p);
+
+static SCM
+unmemoize_ateqp (const SCM expr, const SCM env)
+{
+  return scm_cons (scm_sym_eq_p, unmemoize_exprs (SCM_CDR (expr), env));
+}
+
+\f
 /* Start of the memoizers for non-R5RS builtin macros.  */
 
 
@@ -1997,7 +2058,7 @@
   ASSERT_SYNTAX (scm_ilength (cdr_expr) >= 0, s_bad_expression, expr);
   ASSERT_SYNTAX (scm_ilength (cdr_expr) == 2, s_missing_expression, expr);
 
-  SCM_SETCAR (expr, SCM_IM_APPLY);
+  SCM_SETCAR (expr, scm_imsym_apply);
   return expr;
 }
 
@@ -2057,7 +2118,7 @@
       SCM_SETCAR (variable_idx, new_variable);
     }
 
-  SCM_SETCAR (expr, SCM_IM_BIND);
+  SCM_SETCAR (expr, scm_imsym_bind);
   SCM_SETCAR (cdr_expr, scm_cons (rvariables, inits));
   return expr;
 }
@@ -2073,7 +2134,7 @@
   ASSERT_SYNTAX (scm_ilength (cdr_expr) >= 0, s_bad_expression, expr);
   ASSERT_SYNTAX (scm_ilength (cdr_expr) == 1, s_expression, expr);
 
-  SCM_SETCAR (expr, SCM_IM_CONT);
+  SCM_SETCAR (expr, scm_imsym_cont);
   return expr;
 }
 
@@ -2094,7 +2155,7 @@
   ASSERT_SYNTAX (scm_ilength (cdr_expr) >= 0, s_bad_expression, expr);
   ASSERT_SYNTAX (scm_ilength (cdr_expr) == 2, s_expression, expr);
 
-  SCM_SETCAR (expr, SCM_IM_CALL_WITH_VALUES);
+  SCM_SETCAR (expr, scm_imsym_call_with_values);
   return expr;
 }
 
@@ -2122,7 +2183,7 @@
 scm_m_future (SCM expr, SCM env)
 {
   const SCM new_expr = memoize_as_thunk_prototype (expr, env);
-  SCM_SETCAR (new_expr, SCM_IM_FUTURE);
+  SCM_SETCAR (new_expr, scm_imsym_future);
   return new_expr;
 }
 
@@ -2161,7 +2222,7 @@
 	 variable and we memoize to (set! <atom> ...).
       */
       exp_target = macroexp (target, env);
-      if (scm_is_eq (SCM_CAR (exp_target), SCM_IM_BEGIN)
+      if (scm_is_eq (SCM_CAR (exp_target), scm_imsym_begin)
 	  && !scm_is_null (SCM_CDR (exp_target))
 	  && scm_is_null (SCM_CDDR (exp_target)))
 	{
@@ -2169,7 +2230,7 @@
 	  ASSERT_SYNTAX_2 (scm_is_symbol (exp_target)
 			   || SCM_VARIABLEP (exp_target),
 			   s_bad_variable, exp_target, expr);
-	  return scm_cons (SCM_IM_SET_X, scm_cons (exp_target,
+	  return scm_cons (scm_imsym_set_x, scm_cons (exp_target,
 						   SCM_CDR (cdr_expr)));
 	}
       else
@@ -2208,7 +2269,7 @@
   slot_nr = SCM_CADR (cdr_expr);
   ASSERT_SYNTAX_2 (SCM_I_INUMP (slot_nr), s_bad_slot_number, slot_nr, expr);
 
-  SCM_SETCAR (expr, SCM_IM_SLOT_REF);
+  SCM_SETCAR (expr, scm_imsym_slot_ref);
   SCM_SETCDR (cdr_expr, slot_nr);
   return expr;
 }
@@ -2241,7 +2302,7 @@
   slot_nr = SCM_CADR (cdr_expr);
   ASSERT_SYNTAX_2 (SCM_I_INUMP (slot_nr), s_bad_slot_number, slot_nr, expr);
 
-  SCM_SETCAR (expr, SCM_IM_SLOT_SET_X);
+  SCM_SETCAR (expr, scm_imsym_slot_set_x);
   return expr;
 }
 
@@ -2275,7 +2336,7 @@
   ASSERT_SYNTAX (length >= 0, s_bad_expression, expr);
   ASSERT_SYNTAX (length >= 1 && (length % 2) == 1, s_expression, expr);
 
-  SCM_SETCAR (expr, SCM_IM_NIL_COND);
+  SCM_SETCAR (expr, scm_imsym_nil_cond);
   return expr;
 }
 
@@ -2325,7 +2386,7 @@
       /* Since the location does not contain a macro, the form is a procedure
        * application.  Replace `@fop' by `@apply' and transform the expression
        * including the `transformer-macro'.  */
-      SCM_SETCAR (expr, SCM_IM_APPLY);
+      SCM_SETCAR (expr, scm_imsym_apply);
       return expr;
     }
   else
@@ -2341,86 +2402,14 @@
 
 #endif /* SCM_ENABLE_ELISP */
 
+static scm_t_unmemoizer unmemoizer_table[] =
+  {
+#define EXTRACT_BUILT_IN_MACRO_UNMEMOIZERS
+#include "built-in-expand.h"
+#include "built-in.i"
+#undef EXTRACT_BUILT_IN_MACRO_UNMEMOIZERS
+  };
 
-static SCM
-unmemoize_builtin_macro (const SCM expr, const SCM env)
-{
-  switch (ISYMNUM (SCM_CAR (expr)))
-    {
-    case (ISYMNUM (SCM_IM_AND)):
-      return unmemoize_and (expr, env);
-
-    case (ISYMNUM (SCM_IM_BEGIN)):
-      return unmemoize_begin (expr, env);
-
-    case (ISYMNUM (SCM_IM_CASE)):
-      return unmemoize_case (expr, env);
-
-    case (ISYMNUM (SCM_IM_COND)):
-      return unmemoize_cond (expr, env);
-
-    case (ISYMNUM (SCM_IM_DELAY)):
-      return unmemoize_delay (expr, env);
-
-    case (ISYMNUM (SCM_IM_DO)):
-      return unmemoize_do (expr, env);
-
-    case (ISYMNUM (SCM_IM_IF)):
-      return unmemoize_if (expr, env);
-
-    case (ISYMNUM (SCM_IM_LAMBDA)):
-      return unmemoize_lambda (expr, env);
-
-    case (ISYMNUM (SCM_IM_LET)):
-      return unmemoize_let (expr, env);
-
-    case (ISYMNUM (SCM_IM_LETREC)):
-      return unmemoize_letrec (expr, env);
-
-    case (ISYMNUM (SCM_IM_LETSTAR)):
-      return unmemoize_letstar (expr, env);
-
-    case (ISYMNUM (SCM_IM_OR)):
-      return unmemoize_or (expr, env);
-
-    case (ISYMNUM (SCM_IM_QUOTE)):
-      return unmemoize_quote (expr, env);
-
-    case (ISYMNUM (SCM_IM_SET_X)):
-      return unmemoize_set_x (expr, env);
-
-    case (ISYMNUM (SCM_IM_APPLY)):
-      return unmemoize_apply (expr, env);
-
-    case (ISYMNUM (SCM_IM_BIND)):
-      return unmemoize_exprs (expr, env);  /* FIXME */
-
-    case (ISYMNUM (SCM_IM_CONT)):
-      return unmemoize_atcall_cc (expr, env);
-
-    case (ISYMNUM (SCM_IM_CALL_WITH_VALUES)):
-      return unmemoize_at_call_with_values (expr, env);
-
-#if 0
-    /* See futures.h for a comment why futures are not enabled.
-     */
-    case (ISYMNUM (SCM_IM_FUTURE)):
-      return unmemoize_future (expr, env);
-#endif
-
-    case (ISYMNUM (SCM_IM_SLOT_REF)):
-      return unmemoize_atslot_ref (expr, env);
-
-    case (ISYMNUM (SCM_IM_SLOT_SET_X)):
-      return unmemoize_atslot_set_x (expr, env);
-
-    case (ISYMNUM (SCM_IM_NIL_COND)):
-      return unmemoize_exprs (expr, env);  /* FIXME */
-
-    default:
-      return unmemoize_exprs (expr, env);  /* FIXME */
-    }
-}
 
 
 /* scm_i_unmemocopy_expr and scm_i_unmemocopy_body take a memoized expression
@@ -2687,7 +2676,7 @@
  * binding within the frame, and last? (which is extracted from the iloc using
  * the macro SCM_ICDRP) indicates whether the binding forms the binding at the
  * very end of the improper list of bindings.  */
-SCM *
+static inline SCM *
 scm_ilookup (SCM iloc, SCM env)
 {
   unsigned int frame_nr = SCM_IFRAME (iloc);
@@ -3067,7 +3056,10 @@
 long scm_eval_stack;
 
 scm_t_option scm_eval_opts[] = {
-  { SCM_OPTION_INTEGER, "stack", 22000, "Size of thread stacks (in machine words)." }
+  { SCM_OPTION_INTEGER, "stack", 22000,
+    "Size of thread stacks (in machine words)." },
+  { SCM_OPTION_BOOLEAN, "inline", 1,
+    "Inline calls to built-in procedures like `null?', `car', etc." }
 };
 
 scm_t_option scm_debug_opts[] = {
@@ -3193,6 +3185,9 @@
  */
 
 
+/* Whether inlining of built-in procedures is enabled.  */
+#define SCM_INLINE_MODE_P   (scm_eval_opts[1].val)
+
 /* Update the toplevel environment frame ENV so that it refers to the
  * current module.  */
 #define UPDATE_TOPLEVEL_ENV(env) \
@@ -3324,9 +3319,19 @@
   SCM_TICK;
   if (SCM_ISYMP (SCM_CAR (x)))
     {
-      switch (ISYMNUM (SCM_CAR (x)))
+#ifdef SCM_USE_DISPATCH_TABLES
+      static void *BUILT_IN_MACRO_DISPATCH_TABLE (builtin_dispatch)[] =
+	{
+#define EXTRACT_BUILT_IN_MACRO_DISPATCH_TABLE
+#include "built-in-expand.h"
+#include "built-in.i"
+#undef EXTRACT_BUILT_IN_MACRO_DISPATCH_TABLE
+	};
+#endif
+
+      BUILT_IN_MACRO_DISPATCH (builtin_dispatch, SCM_CAR (x))
         {
-        case (ISYMNUM (SCM_IM_AND)):
+	BUILT_IN_MACRO_CASE (builtin_dispatch, and):
           x = SCM_CDR (x);
           while (!scm_is_null (SCM_CDR (x)))
             {
@@ -3339,7 +3344,7 @@
           PREP_APPLY (SCM_UNDEFINED, SCM_EOL);
           goto carloop;
 
-        case (ISYMNUM (SCM_IM_BEGIN)):
+        BUILT_IN_MACRO_CASE (builtin_dispatch, begin):
           x = SCM_CDR (x);
           if (scm_is_null (x))
             RETURN (SCM_UNSPECIFIED);
@@ -3409,7 +3414,7 @@
           }
 
 
-        case (ISYMNUM (SCM_IM_CASE)):
+        BUILT_IN_MACRO_CASE (builtin_dispatch, case):
           x = SCM_CDR (x);
           {
             const SCM key = EVALCAR (x, env);
@@ -3418,7 +3423,7 @@
               {
                 const SCM clause = SCM_CAR (x);
                 SCM labels = SCM_CAR (clause);
-                if (scm_is_eq (labels, SCM_IM_ELSE))
+                if (scm_is_eq (labels, scm_imsym_else))
                   {
                     x = SCM_CDR (clause);
                     PREP_APPLY (SCM_UNDEFINED, SCM_EOL);
@@ -3442,12 +3447,12 @@
           RETURN (SCM_UNSPECIFIED);
 
 
-        case (ISYMNUM (SCM_IM_COND)):
+	BUILT_IN_MACRO_CASE (builtin_dispatch, cond):
           x = SCM_CDR (x);
           while (!scm_is_null (x))
             {
               const SCM clause = SCM_CAR (x);
-              if (scm_is_eq (SCM_CAR (clause), SCM_IM_ELSE))
+              if (scm_is_eq (SCM_CAR (clause), scm_imsym_else))
                 {
                   x = SCM_CDR (clause);
                   PREP_APPLY (SCM_UNDEFINED, SCM_EOL);
@@ -3459,7 +3464,7 @@
 		  /* SRFI 61 extended cond */
 		  if (!scm_is_null (SCM_CDR (clause))
 		      && !scm_is_null (SCM_CDDR (clause))
-		      && scm_is_eq (SCM_CADDR (clause), SCM_IM_ARROW))
+		      && scm_is_eq (SCM_CADDR (clause), scm_imsym_arrow))
 		    {
 		      SCM xx, guard_result;
 		      if (SCM_VALUESP (arg1))
@@ -3483,7 +3488,7 @@
                       x = SCM_CDR (clause);
                       if (scm_is_null (x))
                         RETURN (arg1);
-                      else if (!scm_is_eq (SCM_CAR (x), SCM_IM_ARROW))
+                      else if (!scm_is_eq (SCM_CAR (x), scm_imsym_arrow))
                         {
                           PREP_APPLY (SCM_UNDEFINED, SCM_EOL);
                           goto begin;
@@ -3503,7 +3508,7 @@
           RETURN (SCM_UNSPECIFIED);
 
 
-        case (ISYMNUM (SCM_IM_DO)):
+	BUILT_IN_MACRO_CASE (builtin_dispatch, do):
           x = SCM_CDR (x);
           {
             /* Compute the initialization values and the initial environment.  */
@@ -3574,7 +3579,7 @@
           goto nontoplevel_begin;
 
 
-        case (ISYMNUM (SCM_IM_IF)):
+        BUILT_IN_MACRO_CASE (builtin_dispatch, if):
           x = SCM_CDR (x);
           {
             SCM test_result = EVALCAR (x, env);
@@ -3590,7 +3595,7 @@
           goto carloop;
 
 
-        case (ISYMNUM (SCM_IM_LET)):
+        BUILT_IN_MACRO_CASE (builtin_dispatch, let):
           x = SCM_CDR (x);
           {
             SCM init_forms = SCM_CADR (x);
@@ -3608,7 +3613,7 @@
           goto nontoplevel_begin;
 
 
-        case (ISYMNUM (SCM_IM_LETREC)):
+        BUILT_IN_MACRO_CASE (builtin_dispatch, letrec):
           x = SCM_CDR (x);
           env = SCM_EXTEND_ENV (SCM_CAR (x), undefineds, env);
           x = SCM_CDR (x);
@@ -3624,7 +3629,7 @@
           goto nontoplevel_begin;
 
 
-        case (ISYMNUM (SCM_IM_LETSTAR)):
+        BUILT_IN_MACRO_CASE (builtin_dispatch, letstar):
           x = SCM_CDR (x);
           {
             SCM bindings = SCM_CAR (x);
@@ -3645,7 +3650,7 @@
           goto nontoplevel_begin;
 
 
-        case (ISYMNUM (SCM_IM_OR)):
+        BUILT_IN_MACRO_CASE (builtin_dispatch, or):
           x = SCM_CDR (x);
           while (!scm_is_null (SCM_CDR (x)))
             {
@@ -3659,15 +3664,15 @@
           goto carloop;
 
 
-        case (ISYMNUM (SCM_IM_LAMBDA)):
+        BUILT_IN_MACRO_CASE (builtin_dispatch, lambda):
           RETURN (scm_closure (SCM_CDR (x), env));
 
 
-        case (ISYMNUM (SCM_IM_QUOTE)):
+        BUILT_IN_MACRO_CASE (builtin_dispatch, quote):
           RETURN (SCM_CDR (x));
 
 
-        case (ISYMNUM (SCM_IM_SET_X)):
+        BUILT_IN_MACRO_CASE (builtin_dispatch, set_x):
           x = SCM_CDR (x);
           {
             SCM *location;
@@ -3689,7 +3694,7 @@
           RETURN (SCM_UNSPECIFIED);
 
 
-	case (ISYMNUM (SCM_IM_APPLY)):
+        BUILT_IN_MACRO_CASE (builtin_dispatch, apply):
           /* Evaluate the procedure to be applied.  */
 	  x = SCM_CDR (x);
 	  proc = EVALCAR (x, env);
@@ -3740,7 +3745,7 @@
 	    }
 
 
-	case (ISYMNUM (SCM_IM_CONT)):
+	BUILT_IN_MACRO_CASE (builtin_dispatch, cont):
 	  {
 	    int first;
 	    SCM val = scm_make_continuation (&first);
@@ -3759,19 +3764,19 @@
 	  }
 
 
-	case (ISYMNUM (SCM_IM_DELAY)):
+	BUILT_IN_MACRO_CASE (builtin_dispatch, delay):
 	  RETURN (scm_makprom (scm_closure (SCM_CDR (x), env)));
 
 #if 0
 	  /* See futures.h for a comment why futures are not enabled.
 	   */
-	case (ISYMNUM (SCM_IM_FUTURE)):
+	BUILT_IN_MACRO_CASE (builtin_dispatch, future):
 	  RETURN (scm_i_make_future (scm_closure (SCM_CDR (x), env)));
 #endif
 
-	  /* PLACEHOLDER for case (ISYMNUM (SCM_IM_DISPATCH)): The following
+	  /* PLACEHOLDER for case (ISYMNUM (scm_imsym_dispatch)): The following
 	     code (type_dispatch) is intended to be the tail of the case
-	     clause for the internal macro SCM_IM_DISPATCH.  Please don't
+	     clause for the internal macro scm_imsym_dispatch.  Please don't
 	     remove it from this location without discussing it with Mikael
 	     <djurfeldt@nada.kth.se>  */
 	  
@@ -3904,7 +3909,7 @@
 	  }
 
 
-	case (ISYMNUM (SCM_IM_SLOT_REF)):
+	BUILT_IN_MACRO_CASE (builtin_dispatch, slot_ref):
 	  x = SCM_CDR (x);
 	  {
 	    SCM instance = EVALCAR (x, env);
@@ -3913,7 +3918,7 @@
 	  }
 
 
-	case (ISYMNUM (SCM_IM_SLOT_SET_X)):
+	BUILT_IN_MACRO_CASE (builtin_dispatch, slot_set_x):
 	  x = SCM_CDR (x);
 	  {
 	    SCM instance = EVALCAR (x, env);
@@ -3925,8 +3930,8 @@
 
 
 #if SCM_ENABLE_ELISP
-	  
-	case (ISYMNUM (SCM_IM_NIL_COND)):
+
+	BUILT_IN_MACRO_CASE (builtin_dispatch, nil_cond):
 	  {
 	    SCM test_form = SCM_CDR (x);
 	    x = SCM_CDR (test_form);
@@ -3954,7 +3959,7 @@
 
 #endif /* SCM_ENABLE_ELISP */
 
-	case (ISYMNUM (SCM_IM_BIND)):
+	BUILT_IN_MACRO_CASE (builtin_dispatch, bind):
 	  {
 	    SCM vars, exps, vals;
 
@@ -3986,7 +3991,7 @@
 	  }
 
 
-	case (ISYMNUM (SCM_IM_CALL_WITH_VALUES)):
+	BUILT_IN_MACRO_CASE (builtin_dispatch, call_with_values):
 	  {
             SCM producer;
 
@@ -4009,9 +4014,74 @@
             goto apply_proc;
 	  }
 
+      /* The inlinables.  */
 
-	default:
-	  break;
+      BUILT_IN_MACRO_CASE (builtin_dispatch, atnullp):
+	{
+	  SCM operand = EVALCAR (SCM_CDR (x), env);
+	  RETURN (scm_from_bool (SCM_NULL_OR_NIL_P (operand)));
+	}
+
+      BUILT_IN_MACRO_CASE (builtin_dispatch, atpairp):
+	{
+	  SCM operand = EVALCAR (SCM_CDR (x), env);
+	  RETURN (scm_from_bool (scm_is_pair (operand)));
+	}
+
+      BUILT_IN_MACRO_CASE (builtin_dispatch, atlistp):
+	{
+	  SCM operand = EVALCAR (SCM_CDR (x), env);
+	  RETURN (scm_from_bool (scm_ilength (operand) >= 0));
+	}
+
+      BUILT_IN_MACRO_CASE (builtin_dispatch, atcar):
+	{
+	  SCM operand = EVALCAR (SCM_CDR (x), env);
+	  if (!scm_is_pair (operand))
+	    scm_wrong_type_arg ("car", 1, operand);
+	  RETURN (SCM_CAR (operand));
+	}
+
+      BUILT_IN_MACRO_CASE (builtin_dispatch, atcdr):
+	{
+	  SCM operand = EVALCAR (SCM_CDR (x), env);
+	  if (!scm_is_pair (operand))
+	    scm_wrong_type_arg ("cdr", 1, operand);
+	  RETURN (SCM_CDR (operand));
+	}
+
+      BUILT_IN_MACRO_CASE (builtin_dispatch, atcons):
+	{
+	  SCM op1, op2;
+	  op1 = EVALCAR (SCM_CDR (x), env);
+	  op2 = EVALCAR (SCM_CDDR (x), env);
+	  RETURN (scm_cell (SCM_UNPACK (op1), SCM_UNPACK (op2)));
+	}
+
+      BUILT_IN_MACRO_CASE (builtin_dispatch, atnot):
+	{
+	  SCM operand = EVALCAR (SCM_CDR (x), env);
+	  RETURN (scm_from_bool (scm_is_false (operand)
+				 || SCM_NILP (operand)));
+	}
+
+      BUILT_IN_MACRO_CASE (builtin_dispatch, ateqp):
+	{
+	  SCM op1, op2;
+	  op1 = EVALCAR (SCM_CDR (x), env);
+	  op2 = EVALCAR (SCM_CDDR (x), env);
+	  RETURN (scm_from_bool (scm_is_eq (op1, op2)));
+	}
+
+      BUILT_IN_MACRO_CASE (builtin_dispatch, arrow):
+      BUILT_IN_MACRO_CASE (builtin_dispatch, else):
+      BUILT_IN_MACRO_CASE (builtin_dispatch, dispatch):
+      BUILT_IN_MACRO_CASE (builtin_dispatch, define):
+	/* Nothing to do.  */
+	{
+	  /* XXX: Avoiding the ``label at end of compound statement'' error. */
+	  (void)0;
+	}
 	}
     }
   else
@@ -4032,7 +4102,56 @@
 		/* we have lost the race, start again. */
 		goto dispatch;
 	      }
+
 	    proc = *location;
+
+	    if ((SCM_INLINE_MODE_P) && (SCM_VARIABLEP (SCM_CAR (x))))
+	      {
+		/* Only procedures from the top-level are subject to
+		   inlining.  */
+		long subr_type = SCM_TYP7 (proc);
+
+		switch (subr_type)
+		  {
+		  case scm_tcs_subrs:
+		    if (SCM_SUBR_INLINABLEP (proc))
+		      {
+			/* If PROC is inlinable, then inline it by replacing
+			   it with the corresponding immediate symbol and
+			   jump back to `dispatch'.  */
+#if 0
+			printf ("inlining call to `%s'\n",
+				scm_i_symbol_chars (SCM_SNAME (proc)));
+#endif
+
+			/* Check the number of args passed.  */
+			switch (subr_type)
+			  {
+			  case scm_tc7_subr_0:
+			    if (!scm_is_null (SCM_CDR (x)))
+			      scm_wrong_num_args (proc);
+			    break;
+			  case scm_tc7_subr_1:
+			    if ((scm_is_null (SCM_CDR (x)))
+				|| (!scm_is_null (SCM_CDDR (x))))
+			      scm_wrong_num_args (proc);
+			    break;
+			  case scm_tc7_subr_2:
+			    if ((scm_is_null (SCM_CDR (x)))
+				|| (scm_is_null (SCM_CDDR (x)))
+				|| (!scm_is_null (SCM_CDDDR (x))))
+			      scm_wrong_num_args (proc);
+			    break;
+			  default:
+			    /* Unsupported argument count.  */
+			    abort ();
+			  }
+
+			SCM_SETCAR (x, SCM_INLINABLE_SUBR_IMSYM (proc));
+			goto dispatch;
+		      }
+		  }
+	      }
 	  }
 
 	  if (SCM_MACROP (proc))
@@ -4055,7 +4174,7 @@
 		case 3:
 		case 2:
 		  if (!scm_is_pair (arg1))
-		    arg1 = scm_list_2 (SCM_IM_BEGIN, arg1);
+		    arg1 = scm_list_2 (scm_imsym_begin, arg1);
 
                   assert (!scm_is_eq (x, SCM_CAR (arg1))
                           && !scm_is_eq (x, SCM_CDR (arg1)));
@@ -4644,40 +4763,6 @@
 
 \f
 
-/* Simple procedure calls
- */
-
-SCM
-scm_call_0 (SCM proc)
-{
-  return scm_apply (proc, SCM_EOL, SCM_EOL);
-}
-
-SCM
-scm_call_1 (SCM proc, SCM arg1)
-{
-  return scm_apply (proc, arg1, scm_listofnull);
-}
-
-SCM
-scm_call_2 (SCM proc, SCM arg1, SCM arg2)
-{
-  return scm_apply (proc, arg1, scm_cons (arg2, scm_listofnull));
-}
-
-SCM
-scm_call_3 (SCM proc, SCM arg1, SCM arg2, SCM arg3)
-{
-  return scm_apply (proc, arg1, scm_cons2 (arg2, arg3, scm_listofnull));
-}
-
-SCM
-scm_call_4 (SCM proc, SCM arg1, SCM arg2, SCM arg3, SCM arg4)
-{
-  return scm_apply (proc, arg1, scm_cons2 (arg2, arg3,
-					   scm_cons (arg4, scm_listofnull)));
-}
-
 /* Simple procedure applies
  */
 


--- orig/libguile/eval.h
+++ mod/libguile/eval.h
@@ -3,7 +3,7 @@
 #ifndef SCM_EVAL_H
 #define SCM_EVAL_H
 
-/* Copyright (C) 1995,1996,1998,1999,2000,2001,2002,2003,2004
+/* Copyright (C) 1995,1996,1998,1999,2000,2001,2002,2003,2004,2006
  * Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
@@ -27,6 +27,13 @@
 
 #include "libguile/struct.h"
 
+/* Headers needed by the inline functions below.  */
+#include "libguile/debug.h"
+#include "libguile/gc.h"
+#include "libguile/pairs.h"
+
+
+
 \f
 
 /* {Options}
@@ -35,7 +42,7 @@
 SCM_API scm_t_option scm_eval_opts[];
 
 #define SCM_EVAL_STACK         scm_eval_opts[0].val
-#define SCM_N_EVAL_OPTIONS 1
+#define SCM_N_EVAL_OPTIONS 2
 
 SCM_API long scm_eval_stack;
 
@@ -123,9 +130,9 @@
 SCM_API SCM scm_sym_set_x;
 SCM_API SCM scm_sym_args;
 
+
 \f
 
-SCM_API SCM * scm_ilookup (SCM iloc, SCM env);
 SCM_API SCM * scm_lookupcar (SCM vloc, SCM genv, int check);
 SCM_API SCM scm_eval_car (SCM pair, SCM env);
 SCM_API SCM scm_eval_body (SCM code, SCM env);
@@ -162,11 +169,6 @@
 SCM_API SCM scm_m_atdispatch (SCM xorig, SCM env);
 SCM_API SCM scm_m_at_call_with_values (SCM xorig, SCM env);
 SCM_API int scm_badargsp (SCM formals, SCM args);
-SCM_API SCM scm_call_0 (SCM proc);
-SCM_API SCM scm_call_1 (SCM proc, SCM arg1);
-SCM_API SCM scm_call_2 (SCM proc, SCM arg1, SCM arg2);
-SCM_API SCM scm_call_3 (SCM proc, SCM arg1, SCM arg2, SCM arg3);
-SCM_API SCM scm_call_4 (SCM proc, SCM arg1, SCM arg2, SCM arg3, SCM arg4);
 SCM_API SCM scm_apply_0 (SCM proc, SCM args);
 SCM_API SCM scm_apply_1 (SCM proc, SCM arg1, SCM args);
 SCM_API SCM scm_apply_2 (SCM proc, SCM arg1, SCM arg2, SCM args);
@@ -199,7 +201,62 @@
 SCM_API SCM scm_i_unmemocopy_body (SCM forms, SCM env);
 SCM_API void scm_init_eval (void);
 
+\f
+/* Simple procedure calls.  */
 
+
+static SCM_C_INLINE SCM
+scm_call_0 (SCM _proc)
+{
+  return (scm_debug_mode_p
+	  ? (scm_apply ((_proc), SCM_EOL, SCM_EOL))
+	  : ((SCM_CELL_TYPE (_proc) == scm_tc7_subr_0)
+	     ? (SCM_SUBRF (_proc) ())
+	     : scm_apply ((_proc), SCM_EOL, SCM_EOL)));
+}
+
+static SCM_C_INLINE SCM
+scm_call_1 (SCM _proc, SCM _arg)
+{
+  return (scm_debug_mode_p
+	  ? (scm_apply ((_proc), scm_cons ((_arg), SCM_EOL), SCM_EOL))
+	  : ((SCM_CELL_TYPE (_proc) == scm_tc7_subr_1)
+	     ? (SCM_SUBRF (_proc) (_arg))
+	     : scm_apply ((_proc), scm_cons ((_arg), SCM_EOL), SCM_EOL)));
+}
+
+static SCM_C_INLINE SCM
+scm_call_2 (SCM _proc, SCM _a1, SCM _a2)
+{
+  return (scm_debug_mode_p
+	  ? (scm_apply ((_proc), (_a1), scm_cons ((_a2), scm_listofnull)))
+	  : ((SCM_CELL_TYPE (_proc) == scm_tc7_subr_2)
+	     ? (SCM_SUBRF (_proc) ((_a1), (_a2)))
+	     : scm_apply ((_proc), (_a1), scm_cons ((_a2), scm_listofnull))));
+}
+
+static SCM_C_INLINE SCM
+scm_call_3 (SCM _proc, SCM _a1, SCM _a2, SCM _a3)
+{
+  return (scm_debug_mode_p
+	  ? (scm_apply ((_proc), (_a1),
+			scm_cons2 ((_a2), (_a3), scm_listofnull)))
+	  : ((SCM_CELL_TYPE (_proc) == scm_tc7_subr_3)
+	     ? (SCM_SUBRF (_proc) ((_a1), (_a2), (_a3)))
+	     : scm_apply ((_proc), (_a1),
+			  scm_cons2 ((_a2), (_a3), scm_listofnull))));
+}
+
+static SCM_C_INLINE SCM
+scm_call_4 (SCM _proc, SCM _a1, SCM _a2, SCM _a3, SCM _a4)
+{
+  return (scm_apply ((_proc), (_a1),
+		     scm_cons2 ((_a2), (_a3),
+				scm_cons ((_a4), scm_listofnull))));
+}
+
+
+\f
 #if (SCM_ENABLE_DEPRECATED == 1)
 
 SCM_API SCM scm_m_undefine (SCM x, SCM env);


--- orig/libguile/goops.c
+++ mod/libguile/goops.c
@@ -52,6 +52,11 @@
 #include "libguile/validate.h"
 #include "libguile/goops.h"
 
+
+/* Import definitions of the evaluator's immediate symbols.  */
+#include "imsyms.h"
+
+
 #define SPEC_OF(x)  SCM_SLOT (x, scm_si_specializers)
 
 #define DEFVAR(v, val) \
@@ -1152,7 +1157,7 @@
 	    "Internal GOOPS magic---don't use this function!")
 #define FUNC_NAME s_scm_sys_tag_body
 {
-  return scm_cons (SCM_IM_LAMBDA, body);
+  return scm_cons (scm_imsym_lambda, body);
 }
 #undef FUNC_NAME
 
@@ -1762,7 +1767,7 @@
 SCM
 scm_make_method_cache (SCM gf)
 {
-  return scm_list_5 (SCM_IM_DISPATCH,
+  return scm_list_5 (scm_imsym_dispatch,
 		     scm_sym_args,
 		     scm_from_int (1),
 		     scm_c_make_vector (SCM_INITIAL_MCACHE_SIZE,


--- orig/libguile/gsubr.c
+++ mod/libguile/gsubr.c
@@ -38,74 +38,120 @@
 SCM scm_f_gsubr_apply;
 
 static SCM
-create_gsubr (int define, const char *name,
-	      int req, int opt, int rst, SCM (*fcn)())
+create_gsubr (int define, int inlinable, const char *name,
+	      int req, int opt, int rst, SCM (*fcn)(), SCM imsym)
 {
   SCM subr;
 
-  switch (SCM_GSUBR_MAKTYPE (req, opt, rst))
+  if (inlinable)
     {
-    case SCM_GSUBR_MAKTYPE(0, 0, 0):
-      subr = scm_c_make_subr (name, scm_tc7_subr_0, fcn);
-      goto create_subr;
-    case SCM_GSUBR_MAKTYPE(1, 0, 0):
-      subr = scm_c_make_subr (name, scm_tc7_subr_1, fcn);
-      goto create_subr;
-    case SCM_GSUBR_MAKTYPE(0, 1, 0):
-      subr = scm_c_make_subr (name, scm_tc7_subr_1o, fcn);
-      goto create_subr;
-    case SCM_GSUBR_MAKTYPE(1, 1, 0):
-      subr = scm_c_make_subr (name, scm_tc7_subr_2o, fcn);
-      goto create_subr;
-    case SCM_GSUBR_MAKTYPE(2, 0, 0):
-      subr = scm_c_make_subr (name, scm_tc7_subr_2, fcn);
-      goto create_subr;
-    case SCM_GSUBR_MAKTYPE(3, 0, 0):
-      subr = scm_c_make_subr (name, scm_tc7_subr_3, fcn);
-      goto create_subr;
-    case SCM_GSUBR_MAKTYPE(0, 0, 1):
-      subr = scm_c_make_subr (name, scm_tc7_lsubr, fcn);
-      goto create_subr;
-    case SCM_GSUBR_MAKTYPE(2, 0, 1):
-      subr = scm_c_make_subr (name, scm_tc7_lsubr_2, fcn);
-    create_subr:
-      if (define)
-	scm_define (SCM_SUBR_ENTRY(subr).name, subr);
-      return subr;
-    default:
-      {
-	SCM cclo = scm_makcclo (scm_f_gsubr_apply, 3L);
-	SCM subr = scm_c_make_subr (name, scm_tc7_subr_0, fcn);
-	SCM sym = SCM_SUBR_ENTRY(subr).name;
-	if (SCM_GSUBR_MAX < req + opt + rst)
+      if ((opt != 0) || (rst != 0))
+	/* The common inlinables don't take optional/rest arguments.  */
+	abort ();
+
+      switch (SCM_GSUBR_MAKTYPE (req, opt, rst))
+	{
+	case SCM_GSUBR_MAKTYPE(0, 0, 0):
+	  subr = scm_c_make_inlinable_subr (name, scm_tc7_subr_0,
+					    fcn, imsym);
+	  goto create_subr;
+	case SCM_GSUBR_MAKTYPE(1, 0, 0):
+	  subr = scm_c_make_inlinable_subr (name, scm_tc7_subr_1,
+					    fcn, imsym);
+	  goto create_subr;
+	case SCM_GSUBR_MAKTYPE(2, 0, 0):
+	  subr = scm_c_make_inlinable_subr (name, scm_tc7_subr_2,
+					    fcn, imsym);
+	  goto create_subr;
+	case SCM_GSUBR_MAKTYPE(3, 0, 0):
+	  subr = scm_c_make_inlinable_subr (name, scm_tc7_subr_3,
+					    fcn, imsym);
+	  goto create_subr;
+
+	default:
+	  abort ();
+	}
+    }
+  else
+    {
+      switch (SCM_GSUBR_MAKTYPE (req, opt, rst))
+	{
+	case SCM_GSUBR_MAKTYPE(0, 0, 0):
+	  subr = scm_c_make_subr (name, scm_tc7_subr_0, fcn);
+	  goto create_subr;
+	case SCM_GSUBR_MAKTYPE(1, 0, 0):
+	  subr = scm_c_make_subr (name, scm_tc7_subr_1, fcn);
+	  goto create_subr;
+	case SCM_GSUBR_MAKTYPE(0, 1, 0):
+	  subr = scm_c_make_subr (name, scm_tc7_subr_1o, fcn);
+	  goto create_subr;
+	case SCM_GSUBR_MAKTYPE(1, 1, 0):
+	  subr = scm_c_make_subr (name, scm_tc7_subr_2o, fcn);
+	  goto create_subr;
+	case SCM_GSUBR_MAKTYPE(2, 0, 0):
+	  subr = scm_c_make_subr (name, scm_tc7_subr_2, fcn);
+	  goto create_subr;
+	case SCM_GSUBR_MAKTYPE(3, 0, 0):
+	  subr = scm_c_make_subr (name, scm_tc7_subr_3, fcn);
+	  goto create_subr;
+	case SCM_GSUBR_MAKTYPE(0, 0, 1):
+	  subr = scm_c_make_subr (name, scm_tc7_lsubr, fcn);
+	  goto create_subr;
+	case SCM_GSUBR_MAKTYPE(2, 0, 1):
+	  subr = scm_c_make_subr (name, scm_tc7_lsubr_2, fcn);
+	create_subr:
+	  if (define)
+	    scm_define (SCM_SUBR_ENTRY(subr).name, subr);
+	  return subr;
+	default:
 	  {
-            fprintf (stderr,
-                     "ERROR in scm_c_make_gsubr: too many args (%d) to %s\n",
-                     req + opt + rst, name);
-	    exit (1);
+	    SCM cclo = scm_makcclo (scm_f_gsubr_apply, 3L);
+	    SCM subr = scm_c_make_subr (name, scm_tc7_subr_0, fcn);
+	    SCM sym = SCM_SUBR_ENTRY(subr).name;
+	    if (SCM_GSUBR_MAX < req + opt + rst)
+	      {
+		fprintf (stderr,
+			 "ERROR in scm_c_make_gsubr: too many args (%d) to %s\n",
+			 req + opt + rst, name);
+		exit (1);
+	      }
+	    SCM_SET_GSUBR_PROC (cclo, subr);
+	    SCM_SET_GSUBR_TYPE (cclo,
+				scm_from_int (SCM_GSUBR_MAKTYPE (req, opt, rst)));
+	    if (SCM_REC_PROCNAMES_P)
+	      scm_set_procedure_property_x (cclo, scm_sym_name, sym);
+	    if (define)
+	      scm_define (sym, cclo);
+	    return cclo;
 	  }
-	SCM_SET_GSUBR_PROC (cclo, subr);
-	SCM_SET_GSUBR_TYPE (cclo,
-			    scm_from_int (SCM_GSUBR_MAKTYPE (req, opt, rst)));
-	if (SCM_REC_PROCNAMES_P)
-	  scm_set_procedure_property_x (cclo, scm_sym_name, sym);
-	if (define)
-	  scm_define (sym, cclo);
-      return cclo;
-      }
+	}
     }
 }
 
 SCM
 scm_c_make_gsubr (const char *name, int req, int opt, int rst, SCM (*fcn)())
 {
-  return create_gsubr (0, name, req, opt, rst, fcn);
+  return create_gsubr (0, 0, name, req, opt, rst, fcn, SCM_BOOL_F);
 }
 
 SCM
 scm_c_define_gsubr (const char *name, int req, int opt, int rst, SCM (*fcn)())
 {
-  return create_gsubr (1, name, req, opt, rst, fcn);
+  return create_gsubr (1, 0, name, req, opt, rst, fcn, SCM_BOOL_F);
+}
+
+SCM
+scm_c_make_inlinable_gsubr (const char *name, int req, int opt, int rst,
+			    SCM (*fcn) (), SCM imsym)
+{
+  return create_gsubr (1, 1, name, req, opt, rst, fcn, imsym);
+}
+
+SCM
+scm_c_define_inlinable_gsubr (const char *name, int req, int opt, int rst,
+			      SCM (*fcn) (), SCM imsym)
+{
+  return create_gsubr (1, 1, name, req, opt, rst, fcn, imsym);
 }
 
 static SCM


--- orig/libguile/gsubr.h
+++ mod/libguile/gsubr.h
@@ -41,6 +41,10 @@
 
 SCM_API SCM scm_c_make_gsubr (const char *name, 
 			      int req, int opt, int rst, SCM (*fcn) ());
+SCM_API SCM scm_c_make_inlinable_gsubr (const char *name,
+					int req, int opt, int rst,
+					SCM (*fcn) (),
+					SCM imsym);
 SCM_API SCM scm_c_make_gsubr_with_generic (const char *name,
 					   int req, int opt, int rst,
 					   SCM (*fcn) (), SCM *gf);
@@ -50,6 +54,11 @@
 					     int req, int opt, int rst,
 					     SCM (*fcn) (), SCM *gf);
 
+SCM_API SCM scm_c_define_inlinable_gsubr (const char *name,
+					  int req, int opt, int rst,
+					  SCM (*fcn) (),
+					  SCM imsym);
+
 SCM_API SCM scm_gsubr_apply (SCM args);
 SCM_API void scm_init_gsubr (void);
 


--- orig/libguile/list.c
+++ mod/libguile/list.c
@@ -26,6 +26,8 @@
 #include "libguile/list.h"
 #include "libguile/eval.h"
 
+#include "libguile/imsyms.h"
+
 #include <stdarg.h>
 
 \f
@@ -150,9 +152,10 @@
 \f
 /* general questions about lists --- null?, list?, length, etc.  */
 
-SCM_DEFINE (scm_null_p, "null?", 1, 0, 0, 
-           (SCM x),
-	    "Return @code{#t} iff @var{x} is the empty list, else @code{#f}.")
+SCM_DEFINE_INLINABLE (scm_null_p, "null?", scm_imsym_atnullp, 1, 0, 0,
+		      (SCM x),
+		      "Return @code{#t} iff @var{x} is the empty list, "
+		      "else @code{#f}.")
 #define FUNC_NAME s_scm_null_p
 {
   return scm_from_bool (SCM_NULL_OR_NIL_P (x));
@@ -160,9 +163,10 @@
 #undef FUNC_NAME
 
 
-SCM_DEFINE (scm_list_p, "list?", 1, 0, 0, 
-           (SCM x),
-	    "Return @code{#t} iff @var{x} is a proper list, else @code{#f}.")
+SCM_DEFINE_INLINABLE (scm_list_p, "list?", scm_imsym_atlistp, 1, 0, 0,
+		      (SCM x),
+		      "Return @code{#t} iff @var{x} is a proper list, "
+		      "else @code{#f}.")
 #define FUNC_NAME s_scm_list_p
 {
   return scm_from_bool (scm_ilength (x) >= 0);


--- orig/libguile/load.c
+++ mod/libguile/load.c
@@ -55,6 +55,7 @@
 #define R_OK 4
 #endif
 
+
 \f
 /* Loading a file, given an absolute filename.  */
 


--- orig/libguile/pairs.c
+++ mod/libguile/pairs.c
@@ -22,6 +22,8 @@
 #include "libguile/validate.h"
 
 #include "libguile/pairs.h"
+#include "libguile/imsyms.h"
+
 
 \f
 
@@ -49,11 +51,12 @@
 
 #endif
 
-SCM_DEFINE (scm_cons, "cons", 2, 0, 0,
-	    (SCM x, SCM y),
-	    "Return a newly allocated pair whose car is @var{x} and whose\n"
-	    "cdr is @var{y}.  The pair is guaranteed to be different (in the\n"
-	    "sense of @code{eq?}) from every previously existing object.")
+SCM_DEFINE_INLINABLE (scm_cons, "cons", scm_imsym_atcons, 2, 0, 0,
+		      (SCM x, SCM y),
+		      "Return a newly allocated pair whose car is @var{x} "
+		      "and whose cdr is @var{y}.  The pair is guaranteed "
+		      "to be different (in the sense of @code{eq?}) from "
+		      "every previously existing object.")
 #define FUNC_NAME s_scm_cons
 {
   return scm_cell (SCM_UNPACK (x), SCM_UNPACK (y));
@@ -68,31 +71,37 @@
 }
 
 
-SCM_DEFINE (scm_pair_p, "pair?", 1, 0, 0, 
-            (SCM x),
-	    "Return @code{#t} if @var{x} is a pair; otherwise return\n"
-	    "@code{#f}.")
+SCM_DEFINE_INLINABLE (scm_pair_p, "pair?", scm_imsym_atpairp, 1, 0, 0,
+		      (SCM x),
+		      "Return @code{#t} if @var{x} is a pair; "
+		      "otherwise return @code{#f}.")
 #define FUNC_NAME s_scm_pair_p
 {
   return scm_from_bool (scm_is_pair (x));
 }
 #undef FUNC_NAME
 
-SCM
-scm_car (SCM pair)
+SCM_DEFINE_INLINABLE (scm_car, "car", scm_imsym_atcar, 1, 0, 0,
+		      (SCM pair),
+		      "Return the @code{car} of @var{pair}.")
+#define FUNC_NAME s_scm_car
 {
   if (!scm_is_pair (pair))
     scm_wrong_type_arg_msg (NULL, 0, pair, "pair");
   return SCM_CAR (pair);
 }
+#undef FUNC_NAME
 
-SCM
-scm_cdr (SCM pair)
+SCM_DEFINE_INLINABLE (scm_cdr, "cdr", scm_imsym_atcdr, 1, 0, 0,
+		      (SCM pair),
+		      "Return the @code{cdr} of @var{pair}.")
+#define FUNC_NAME s_scm_cdr
 {
   if (!scm_is_pair (pair))
     scm_wrong_type_arg_msg (NULL, 0, pair, "pair");
   return SCM_CDR (pair);
 }
+#undef FUNC_NAME
 
 SCM
 scm_i_chase_pairs (SCM tree, scm_t_uint32 pattern)
@@ -188,7 +197,9 @@
 {
   unsigned int subnr = 0;
 
-  for (subnr = 0; cxrs[subnr].name; subnr++)
+  /* We start after CAR and CDR which are treated as inlinable primitive
+     procedures.  */
+  for (subnr = 2; cxrs[subnr].name; subnr++)
     {
       SCM (*pattern) () = (SCM (*) ()) (scm_t_bits) cxrs[subnr].pattern;
       scm_c_define_subr (cxrs[subnr].name, scm_tc7_cxr, pattern);


--- orig/libguile/procs.c
+++ mod/libguile/procs.c
@@ -66,12 +66,36 @@
   scm_subr_table[entry].name = scm_from_locale_symbol (name);
   scm_subr_table[entry].generic = 0;
   scm_subr_table[entry].properties = SCM_EOL;
+
+  /* Information used by the inlinable mechanism.  */
+  scm_subr_table[entry].inlinable = 0;
+  scm_subr_table[entry].inlined_builtin = SCM_UNSPECIFIED;
+
   scm_subr_table_size++;
-  
+
   return z;
 }
 
 SCM
+scm_c_make_inlinable_subr (const char *name, long type,
+			   SCM (*fcn) (), SCM imsym)
+{
+  register SCM subr;
+  register long entry;
+
+  if (!SCM_ISYMP (imsym))
+    scm_wrong_type_arg (__FUNCTION__, 4, imsym);
+
+  subr = scm_c_make_subr (name, type, fcn);
+
+  entry = SCM_SUBRNUM (subr);
+  scm_subr_table[entry].inlinable = 1;
+  scm_subr_table[entry].inlined_builtin = imsym;
+
+  return subr;
+}
+
+SCM
 scm_c_define_subr (const char *name, long type, SCM (*fcn) ())
 {
   SCM subr = scm_c_make_subr (name, type, fcn);


--- orig/libguile/procs.h
+++ mod/libguile/procs.h
@@ -38,6 +38,10 @@
 				 * *generic == 0 until first method
 				 */
   SCM properties;		/* procedure properties */
+
+  int inlinable;                /* non-zero if this is an inlinable */
+  SCM inlined_builtin;          /* the immediate symbol of the corresponding
+				   built-in that is inlined in `eval.c' */
 } scm_t_subr_entry;
 
 #define SCM_SUBRNUM(subr) (SCM_CELL_WORD_0 (subr) >> 8)
@@ -51,6 +55,9 @@
 #define SCM_SUBR_PROPS(x) (SCM_SUBR_ENTRY (x).properties)
 #define SCM_SUBR_GENERIC(x) (SCM_SUBR_ENTRY (x).generic)
 
+#define SCM_SUBR_INLINABLEP(x)       (SCM_SUBR_ENTRY (x).inlinable)
+#define SCM_INLINABLE_SUBR_IMSYM(x)  (SCM_SUBR_ENTRY (x).inlined_builtin)
+
 #define SCM_CCLO_LENGTH(x) (SCM_CELL_WORD_0 (x) >> 8)
 #define SCM_MAKE_CCLO_TAG(v)  (((v) << 8) + scm_tc7_cclo)
 #define SCM_SET_CCLO_LENGTH(x, v) (SCM_SET_CELL_WORD_0 ((x), SCM_MAKE_CCLO_TAG(v)))
@@ -141,6 +148,9 @@
 SCM_API SCM scm_c_make_subr (const char *name, long type, SCM (*fcn)());
 SCM_API SCM scm_c_make_subr_with_generic (const char *name, long type,
 					  SCM (*fcn)(), SCM *gf);
+SCM_API SCM scm_c_make_inlinable_subr (const char *name, long type,
+				       SCM (*fcn) (),
+				       SCM imsym);
 SCM_API SCM scm_c_define_subr (const char *name, long type, SCM (*fcn)());
 SCM_API SCM scm_c_define_subr_with_generic (const char *name, long type,
 					    SCM (*fcn)(), SCM *gf);


--- orig/libguile/snarf.h
+++ mod/libguile/snarf.h
@@ -84,6 +84,18 @@
 )\
 SCM_SNARF_DOCS(primitive, FNAME, PRIMNAME, ARGLIST, REQ, OPT, VAR, DOCSTRING)
 
+#define SCM_DEFINE_INLINABLE(FNAME, PRIMNAME, IMSYM, REQ, OPT, VAR, ARGLIST, DOCSTRING) \
+SCM_SNARF_HERE(								      \
+static const char s_ ## FNAME [] = PRIMNAME;				      \
+SCM FNAME ARGLIST							      \
+)									      \
+SCM_SNARF_INIT(								      \
+scm_c_define_inlinable_gsubr (s_ ## FNAME, REQ, OPT, VAR,		      \
+			      (SCM_FUNC_CAST_ARBITRARY_ARGS) FNAME,	      \
+			      IMSYM);					      \
+)									      \
+SCM_SNARF_DOCS(primitive, FNAME, PRIMNAME, ARGLIST, REQ, OPT, VAR, DOCSTRING)
+
 #define SCM_PRIMITIVE_GENERIC(FNAME, PRIMNAME, REQ, OPT, VAR, ARGLIST, DOCSTRING) \
 SCM_SNARF_HERE(\
 static const char s_ ## FNAME [] = PRIMNAME; \


--- orig/libguile/tags.h
+++ mod/libguile/tags.h
@@ -542,32 +542,6 @@
 #define SCM_ISYMP(n) 		(SCM_ITAG8 (n) == scm_tc8_isym)
 #define SCM_MAKISYM(n) 		SCM_MAKE_ITAG8 ((n), scm_tc8_isym)
 
-#define SCM_IM_AND              SCM_MAKISYM (0)
-#define SCM_IM_BEGIN            SCM_MAKISYM (1)
-#define SCM_IM_CASE             SCM_MAKISYM (2)
-#define SCM_IM_COND             SCM_MAKISYM (3)
-#define SCM_IM_DO               SCM_MAKISYM (4)
-#define SCM_IM_IF               SCM_MAKISYM (5)
-#define SCM_IM_LAMBDA           SCM_MAKISYM (6)
-#define SCM_IM_LET              SCM_MAKISYM (7)
-#define SCM_IM_LETSTAR          SCM_MAKISYM (8)
-#define SCM_IM_LETREC           SCM_MAKISYM (9)
-#define SCM_IM_OR               SCM_MAKISYM (10)
-#define SCM_IM_QUOTE            SCM_MAKISYM (11)
-#define SCM_IM_SET_X            SCM_MAKISYM (12)
-#define SCM_IM_DEFINE           SCM_MAKISYM (13)
-#define SCM_IM_APPLY		SCM_MAKISYM (14)
-#define SCM_IM_CONT		SCM_MAKISYM (15)
-#define SCM_IM_DISPATCH		SCM_MAKISYM (16)
-#define SCM_IM_SLOT_REF		SCM_MAKISYM (17)
-#define SCM_IM_SLOT_SET_X	SCM_MAKISYM (18)
-#define SCM_IM_DELAY		SCM_MAKISYM (19)
-#define SCM_IM_FUTURE		SCM_MAKISYM (20)
-#define SCM_IM_CALL_WITH_VALUES SCM_MAKISYM (21)
-#define SCM_IM_ELSE             SCM_MAKISYM (22)
-#define SCM_IM_ARROW            SCM_MAKISYM (23)
-#define SCM_IM_NIL_COND         SCM_MAKISYM (24)  /* Multi-language support */
-#define SCM_IM_BIND             SCM_MAKISYM (25)  /* Multi-language support */
 
 \f
 

--- orig/test-suite/tests/syntax.test
+++ mod/test-suite/tests/syntax.test
@@ -775,9 +775,9 @@
              (define (a x) (if (= x 0) 'a (b (- x 1))))
              (define (b x) (if (= x 0) 'b (c (- x 1))))
              (define (c x) (if (= x 0) 'c (a (- x 1))))
-             (and (eq? 'a (a 0) (a 3))
-                  (eq? 'b (a 1) (a 4))
-                  (eq? 'c (a 2) (a 5))))
+             (and (eq? 'a (a 0)) (eq? 'a (a 3))
+                  (eq? 'b (a 1)) (eq? 'b (a 4))
+                  (eq? 'c (a 2)) (eq? 'c (a 5))))
           (interaction-environment)))
 
   (pass-if "internal defines with begin"
@@ -787,9 +787,9 @@
               (begin
                 (define (b x) (if (= x 0) 'b (c (- x 1)))))
               (define (c x) (if (= x 0) 'c (a (- x 1))))
-              (and (eq? 'a (a 0) (a 3))
-                   (eq? 'b (a 1) (a 4))
-                   (eq? 'c (a 2) (a 5))))
+              (and (eq? 'a (a 0)) (eq? 'a (a 3))
+                   (eq? 'b (a 1)) (eq? 'b (a 4))
+                   (eq? 'c (a 2)) (eq? 'c (a 5))))
            (interaction-environment))))
 
   (pass-if "internal defines with empty begin"
@@ -799,9 +799,9 @@
               (begin)
               (define (b x) (if (= x 0) 'b (c (- x 1))))
               (define (c x) (if (= x 0) 'c (a (- x 1))))
-              (and (eq? 'a (a 0) (a 3))
-                   (eq? 'b (a 1) (a 4))
-                   (eq? 'c (a 2) (a 5))))
+              (and (eq? 'a (a 0)) (eq? 'a (a 3))
+                   (eq? 'b (a 1)) (eq? 'b (a 4))
+                   (eq? 'c (a 2)) (eq? 'c (a 5))))
            (interaction-environment))))
 
   (pass-if "internal defines with macro application"
@@ -813,9 +813,9 @@
                 (define (a x) (if (= x 0) 'a (b (- x 1))))
                 (my-define (b x) (if (= x 0) 'b (c (- x 1))))
                 (define (c x) (if (= x 0) 'c (a (- x 1))))
-                (and (eq? 'a (a 0) (a 3))
-                     (eq? 'b (a 1) (a 4))
-                     (eq? 'c (a 2) (a 5)))))
+                (and (eq? 'a (a 0)) (eq? 'a (a 3))
+                     (eq? 'b (a 1)) (eq? 'b (a 4))
+                     (eq? 'c (a 2)) (eq? 'c (a 5)))))
            (interaction-environment))))
 
   (pass-if-exception "missing body expression"


\f
built-in.i

/* -*- C -*-

   Definitions of the various built-in macros and their unmemoizers.  This
   file is meant to be processed by `expand.h' and by `extract-imsyms.sh'.  */

SCM_DEFINE_BUILT_IN_MACRO ("#@and", and, unmemoize_and)
SCM_DEFINE_BUILT_IN_MACRO ("#@begin", begin, unmemoize_begin)
SCM_DEFINE_BUILT_IN_MACRO ("#@case", case, unmemoize_case)
SCM_DEFINE_BUILT_IN_MACRO ("#@cond", cond, unmemoize_cond)
SCM_DEFINE_BUILT_IN_MACRO ("#@do", do, unmemoize_do)
SCM_DEFINE_BUILT_IN_MACRO ("#@if", if, unmemoize_if)
SCM_DEFINE_BUILT_IN_MACRO ("#@lambda", lambda, unmemoize_lambda)
SCM_DEFINE_BUILT_IN_MACRO ("#@let", let, unmemoize_let)
SCM_DEFINE_BUILT_IN_MACRO ("#@letstar", letstar, unmemoize_letstar)
SCM_DEFINE_BUILT_IN_MACRO ("#@letrec", letrec, unmemoize_letrec)
SCM_DEFINE_BUILT_IN_MACRO ("#@or", or, unmemoize_or)
SCM_DEFINE_BUILT_IN_MACRO ("#@quote", quote, unmemoize_quote)
SCM_DEFINE_BUILT_IN_MACRO ("#@set!", set_x, unmemoize_set_x)
SCM_DEFINE_BUILT_IN_MACRO ("#@define", define, unmemoize_exprs)
SCM_DEFINE_BUILT_IN_MACRO ("#@apply", apply, unmemoize_apply)
SCM_DEFINE_BUILT_IN_MACRO ("#@call-with-current-continuation", cont, unmemoize_atcall_cc)
SCM_DEFINE_BUILT_IN_MACRO ("#@dispatch", dispatch, unmemoize_exprs)
SCM_DEFINE_BUILT_IN_MACRO ("#@slot-ref", slot_ref, unmemoize_atslot_ref)
SCM_DEFINE_BUILT_IN_MACRO ("#@slot-set!", slot_set_x, unmemoize_atslot_set_x)
SCM_DEFINE_BUILT_IN_MACRO ("#@delay", delay, unmemoize_delay)
SCM_DEFINE_BUILT_IN_MACRO ("#@call-with-values", call_with_values, unmemoize_at_call_with_values)
SCM_DEFINE_BUILT_IN_MACRO ("#@else", else, unmemoize_exprs)
SCM_DEFINE_BUILT_IN_MACRO ("#@arrow", arrow, unmemoize_exprs)
SCM_DEFINE_BUILT_IN_MACRO ("#@nil-cond", nil_cond, unmemoize_exprs)  /* Multi-language support */
SCM_DEFINE_BUILT_IN_MACRO ("#@bind", bind, unmemoize_exprs)  /* Multi-language support */
/* SCM_DEFINE_BUILT_IN_MACRO ("#@future", future, unmemoize_future) */

/* Inlineables (see `inlinable.i').  */
SCM_DEFINE_BUILT_IN_MACRO ("#@null?", atnullp, unmemoize_atnullp)
SCM_DEFINE_BUILT_IN_MACRO ("#@pair?", atpairp, unmemoize_atpairp)
SCM_DEFINE_BUILT_IN_MACRO ("#@list?", atlistp, unmemoize_atlistp)
SCM_DEFINE_BUILT_IN_MACRO ("#@car", atcar, unmemoize_atcar)
SCM_DEFINE_BUILT_IN_MACRO ("#@cdr", atcdr, unmemoize_atcdr)
SCM_DEFINE_BUILT_IN_MACRO ("#@cons", atcons, unmemoize_atcons)

SCM_DEFINE_BUILT_IN_MACRO ("#@not", atnot, unmemoize_atnot)

SCM_DEFINE_BUILT_IN_MACRO ("#@eq?", ateqp, unmemoize_ateqp)


\f
built-in-expand.h

/* Expanding built-in macro definitions from `built-in.i'.  */

#ifndef BUILT_IN_MACRO_DISPATCH_LOGIC

# ifdef __GNUC__
/* We're relying on the ``labels as pointers'' GNU C extension.  */
#  define SCM_USE_DISPATCH_TABLES
# endif

# ifdef SCM_USE_DISPATCH_TABLES
#   define BUILT_IN_MACRO_DISPATCH_TABLE(_prefix) \
      isym_jump_table_ ## _prefix
#   define BUILT_IN_MACRO_DISPATCH(_prefix, _isym)		\
      goto *((isym_jump_table_ ## _prefix)[ISYMNUM (_isym)]);
#   define BUILT_IN_MACRO_CASE(_prefix, _cname) \
      isym_handle_ ## _cname
# else
#   define BUILT_IN_MACRO_DISPATCH_TABLE(_prefix) \
      unused_isym_jump_table_ ## _prefix
#   define BUILT_IN_MACRO_DISPATCH(_prefix, _isym) \
      switch (ISYMNUM (_isym))
#   define BUILT_IN_MACRO_CASE(_prefix, _cname) \
      case ISYMNUM (scm_imsym_ ## _cname)
# endif
#endif

#undef SCM_DEFINE_BUILT_IN_MACRO

#if (defined EXTRACT_BUILT_IN_MACRO_UNMEMOIZERS)
# define SCM_DEFINE_BUILT_IN_MACRO(_name, _cname, _unmemoizer) \
    (_unmemoizer),
#elif (defined EXTRACT_BUILT_IN_MACRO_NAMES)
# define SCM_DEFINE_BUILT_IN_MACRO(_name, _cname, _unmemoizer) \
    (_name),
#elif (defined EXTRACT_BUILT_IN_MACRO_DISPATCH_TABLE)
# define SCM_DEFINE_BUILT_IN_MACRO(_name, _cname, _unmemoizer) \
    && isym_handle_ ## _cname,
#endif


\f
extract-imsyms.sh

#!/bin/sh
# Copyright (C) 2006 Free Software Foundation, Inc.
# Written by Ludovic Courtès <ludo@chbouib.org>
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

#
# Extracts built-in macro (immediate symbol) information and issue
# immediate symbol definitions (C macros) for use in `eval.c'
#

macro_count=0

grep '^SCM_DEFINE_BUILT_IN_MACRO' "$1" | \
sed -es'/^SCM_DEFINE_BUILT_IN_MACRO ("[a-z#@!?-]\+", \([A-Za-z_]\+\),.*$/\1/g' | \
while true
do
  read macro_name
  if test "x$macro_name" = "x"
  then
      break
  else
      echo "#define scm_imsym_$macro_name  SCM_MAKISYM ($macro_count)"
      macro_count="`expr $macro_count + 1`"
  fi
done


_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Inlining calls to primitives
  2006-08-29  8:48 Inlining calls to primitives Ludovic Courtès
@ 2006-09-04 22:02 ` Neil Jerram
  2006-09-05 16:20   ` Ludovic Courtès
  2006-09-05 16:21   ` Ludovic Courtès
  0 siblings, 2 replies; 9+ messages in thread
From: Neil Jerram @ 2006-09-04 22:02 UTC (permalink / raw)


ludovic.courtes@laas.fr (Ludovic Courtès) writes:

> Hi,
>
> I conducted an experiment with the evaluator consisting in implementing
> "inlining" of calls to a few primitive procedures and I'd like to get
> feedback about it.

Interesting piece of work.

It seems to me, though, that there are 3 things going on here.

1. Memoization of global variable references that yield one of a
   particular subset of common procedures.  (I call this part
   memoization because it seems similar to the memoization that we
   already do for syntax like let, begin, and, etc.)

2. Inlining of the code for these procedures within CEVAL.

3. Changing IM_SYMs to be dynamic instead of fixed constants, plus the
   macrology and GCC jump table stuff.

Do you know what the relative contributions of these 3 changes are?

It would be nice if most of the benefit was coming from (1), because
we could provide this in a very general way for all global variables,
and without needing the code duplication that (2) brings.  But I'd be
surprised if (1) on its own could give us 20% performance improvement,
because we already memoize from <global variable name> to <variable
object>, so the delta is just that from <variable object> to <variable
value>, which is a pointer dereference.  I'd also be surprised,
however, if (2) was a significant contribution, because I didn't think
a function call was a significant overhead these days.

Regards,
     Neil



_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Inlining calls to primitives
  2006-09-04 22:02 ` Neil Jerram
@ 2006-09-05 16:20   ` Ludovic Courtès
  2006-09-15 10:29     ` Neil Jerram
  2006-09-05 16:21   ` Ludovic Courtès
  1 sibling, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2006-09-05 16:20 UTC (permalink / raw)
  Cc: Guile-Devel

Hi Neil,

Neil Jerram <neil@ossau.uklinux.net> writes:

> Interesting piece of work.
>
> It seems to me, though, that there are 3 things going on here.
>
> 1. Memoization of global variable references that yield one of a
>    particular subset of common procedures.  (I call this part
>    memoization because it seems similar to the memoization that we
>    already do for syntax like let, begin, and, etc.)
>
> 2. Inlining of the code for these procedures within CEVAL.
>
> 3. Changing IM_SYMs to be dynamic instead of fixed constants, plus the
>    macrology and GCC jump table stuff.
>
> Do you know what the relative contributions of these 3 changes are?

Thanks Neil for clarifying this.  The measurements you propose are
indeed a good idea and the results are not exactly as I was expecting
(which confirms that I'm not very good at predicting performance ;-)).
BTW, imsyms are not assigned dynamically: they are assigned statically
by the `extract-imsyms.sh' script.

I made a series of measurements with Guile compiled with `-pg -O0'.
Then I tried different configurations switching on and off each of these
3 features.  The first table below summarizes the execution time
improvement, looking at the execution time of `every' itself as well as
the execution time of the whole program.

			               `every'     overall
------------------------------------+----------------------
jump table vs. switch		    |    0.8%      -1.4% (worse!)
inlining in `CEVAL ()' vs. funcall  |   11.0%       4.7%

The second table shows improvement compared to the non-memoizing + jump
table version (i.e., with `(eval-disable 'inline)':

memoization + jt + inline	| 32.4%	     22.1%
memoization + switch + inline	| 31.9%	     23.2%
memoization + jt + funcall	| 24.0%	     18.3%

(Beware: I only run each test case 3 times or so so these figures should
not be considered as an ultimate benchmark!  I'm attaching the whole
results for the record.)

In short, the outcome of using a jump table is negligible in this
context (it's really a microoptimization compared to the two other
things).

Function call overhead, however, _is_ important, though only the second
source of improvement.  Repeatedly using function calls to execute a
handful of instructions is costly.  Plus it probably increases cache
misses, things like that.

Now, if we generalized the memoization thing, as you suggested, so that
any procedure could be memoized (based on user annotations), then things
may be a bit different because we would be using indirect function calls
(i.e., like `SCM (*func) () = xxx; return (func (arg));') while in my
measurements I was using immediate function calls (as in `scm_car
(op)').  I should compare indirect and immediate function calls, but I
presume that there is a slight performance difference.

Finally, memoization does indeed play an important role.  I suspect that
it's mostly because, for instance, argument count is only checked at
memoization time, and not when the "inlinable" is actually executed.
Plus the memoization code is pretty local (unlike when `CEVAL ()' has to
go through `evap0', then `evalp1', etc.).

I'm afraid this is kind of a dirty report, but I hope it sheds some
light on the issue.


Also, Rob mentioned on IRC that he was concerned about the global
switch.  I believe this can be fixed using fluids or something like that
so that inlining can be enabled/disabled on a per-module basis (as we
did with `current-reader').  But that will be the topic of another
thread maybe.  ;-)

Thanks,
Ludovic.



_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Inlining calls to primitives
  2006-09-04 22:02 ` Neil Jerram
  2006-09-05 16:20   ` Ludovic Courtès
@ 2006-09-05 16:21   ` Ludovic Courtès
  2006-09-05 22:53     ` Kevin Ryde
  2006-09-06  1:20     ` Rob Browning
  1 sibling, 2 replies; 9+ messages in thread
From: Ludovic Courtès @ 2006-09-05 16:21 UTC (permalink / raw)
  Cc: Guile-Devel

[-- Attachment #1: Type: text/plain, Size: 3536 bytes --]

Hi Neil,

Neil Jerram <neil@ossau.uklinux.net> writes:

> Interesting piece of work.
>
> It seems to me, though, that there are 3 things going on here.
>
> 1. Memoization of global variable references that yield one of a
>    particular subset of common procedures.  (I call this part
>    memoization because it seems similar to the memoization that we
>    already do for syntax like let, begin, and, etc.)
>
> 2. Inlining of the code for these procedures within CEVAL.
>
> 3. Changing IM_SYMs to be dynamic instead of fixed constants, plus the
>    macrology and GCC jump table stuff.
>
> Do you know what the relative contributions of these 3 changes are?

Thanks Neil for clarifying this.  The measurements you propose are
indeed a good idea and the results are not exactly as I was expecting
(which confirms that I'm not very good at predicting performance ;-)).
BTW, imsyms are not assigned dynamically: they are assigned statically
by the `extract-imsyms.sh' script.

I made a series of measurements with Guile compiled with `-pg -O0'.
Then I tried different configurations switching on and off each of these
3 features.  The first table below summarizes the execution time
improvement, looking at the execution time of `every' itself as well as
the execution time of the whole program.

			               `every'     overall
------------------------------------+----------------------
jump table vs. switch		    |    0.8%      -1.4% (worse!)
inlining in `CEVAL ()' vs. funcall  |   11.0%       4.7%

The second table shows improvement compared to the non-memoizing + jump
table version (i.e., with `(eval-disable 'inline)':

memoization + jt + inline	| 32.4%	     22.1%
memoization + switch + inline	| 31.9%	     23.2%
memoization + jt + funcall	| 24.0%	     18.3%

(Beware: I only run each test case 3 times or so so these figures should
not be considered as an ultimate benchmark!  I'm attaching the whole
results for the record.)

In short, the outcome of using a jump table is negligible in this
context (it's really a microoptimization compared to the two other
things).

Function call overhead, however, _is_ important, though only the second
source of improvement.  Repeatedly using function calls to execute a
handful of instructions is costly.  Plus it probably increases cache
misses, things like that.

Now, if we generalized the memoization thing, as you suggested, so that
any procedure could be memoized (based on user annotations), then things
may be a bit different because we would be using indirect function calls
(i.e., like `SCM (*func) () = xxx; return (func (arg));') while in my
measurements I was using immediate function calls (as in `scm_car
(op)').  I should compare indirect and immediate function calls, but I
presume that there is a slight performance difference.

Finally, memoization does indeed play an important role.  I suspect that
it's mostly because, for instance, argument count is only checked at
memoization time, and not when the "inlinable" is actually executed.
Plus the memoization code is pretty local (unlike when `CEVAL ()' has to
go through `evap0', then `evalp1', etc.).

I'm afraid this is kind of a dirty report, but I hope it sheds some
light on the issue.


Also, Rob mentioned on IRC that he was concerned about the global
switch.  I believe this can be fixed using fluids or something like that
so that inlining can be enabled/disabled on a per-module basis (as we
did with `current-reader').  But that will be the topic of another
thread maybe.  ;-)

Thanks,
Ludovic.



[-- Attachment #2: Measurements --]
[-- Type: text/plain, Size: 4139 bytes --]

-*- Outline -*-

* Summary

			               `every'     overall
------------------------------------+----------------------
jump table vs. switch		    |    0.8%      -1.4% (worse!)
inlining in `CEVAL ()' vs. funcall  |   11.0%       4.7%


Compared to no-memoization + jump table (521 ; 10.16):

memoization + jt + inline	| 32.4%	     22.1%
memoization + switch + inline	| 31.9%	     23.2%
memoization + jt + funcall	| 24.0%	     18.3%


* memoization + jump table + inlined in `CEVAL ()'

** Raw Execution Time

$ time ./pre-inst-guile -l ,,every.scm
ready.
time taken: 352.0

real    0m7.910s
user    0m7.720s
sys     0m0.028s

** Gprof Top

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 46.95      3.54     3.54     1205     0.00     0.01  deval
 11.67      4.42     0.88    30436     0.00     0.00  scm_i_sweep_card
  7.96      5.02     0.60 16794945     0.00     0.00  scm_is_pair
  7.82      5.61     0.59  7114789     0.00     0.00  scm_cell
  7.43      6.17     0.56   185286     0.00     0.00  scm_gc_mark_dependencies
  6.50      6.66     0.49  5530481     0.00     0.00  scm_ilookup
  3.71      6.94     0.28     8737     0.00     0.00  scm_i_init_card_freelist
  1.33      7.04     0.10      723     0.00     0.00  scm_i_mark_weak_vector_non_weaks


* memoization + switch + inlined in `CEVAL ()'

** Raw Execution Time

$ time ./pre-inst-guile -l ,,every.scm
ready.
time taken: 355.0

real    0m7.803s
user    0m7.724s
sys     0m0.044s

** Gprof Top

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 47.08      3.71     3.71     1205     0.00     0.01  deval
 12.56      4.70     0.99    30436     0.00     0.00  scm_i_sweep_card
  8.50      5.37     0.67 16792815     0.00     0.00  scm_is_pair
  7.30      5.95     0.57  5530481     0.00     0.00  scm_ilookup
  7.23      6.51     0.57   185466     0.00     0.00  scm_gc_mark_dependencies
  6.60      7.04     0.52  7114789     0.00     0.00  scm_cell
  3.55      7.32     0.28     8737     0.00     0.00  scm_i_init_card_freelist
  1.02      7.39     0.08      723     0.00     0.00  scm_i_mark_weak_vector_non
_weaks


* memoization + jump table + no inlining in `CEVAL ()'

** Raw Execution Time

$ time ./pre-inst-guile -l ,,every.scm
ready.
time taken: 396.0

real    0m8.299s
user    0m8.233s
sys     0m0.032s

** Gprof Top

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 44.90      3.44     3.44     1205     0.00     0.01  deval
  9.80      4.18     0.75    30436     0.00     0.00  scm_i_sweep_card
  8.76      4.86     0.67 18293669     0.00     0.00  scm_is_pair
  8.04      5.47     0.61   185273     0.00     0.00  scm_gc_mark_dependencies
  7.32      6.03     0.56  7114789     0.00     0.00  scm_cell
  6.34      6.51     0.48  5530481     0.00     0.00  scm_ilookup
  2.61      6.71     0.20     8737     0.00     0.00  scm_i_init_card_freelist
  1.44      6.83     0.11  2519996     0.00     0.00  scm_list_1

* no memoization + jump table

** Raw Execution Time

$ time ./pre-inst-guile -l ,,every.scm
ready.
time taken: 521.0

real    0m10.163s
user    0m10.097s
sys     0m0.020s

** Gprof Top

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 42.06      4.00     4.00     1206     0.00     0.01  deval
 12.41      5.18     1.18    44288     0.00     0.00  scm_i_sweep_card
  8.62      6.00     0.82 11115394     0.00     0.00  scm_cell
  8.31      6.79     0.79   197701     0.00     0.00  scm_gc_mark_dependencies
  7.47      7.50     0.71 20362535     0.00     0.00  scm_is_pair
  7.41      8.21     0.70  5530482     0.00     0.00  scm_ilookup
  3.15      8.51     0.30    13609     0.00     0.00  scm_i_init_card_freelist
  2.10      8.71     0.20  5520322     0.00     0.00  scm_list_1

[-- Attachment #3: Type: text/plain, Size: 143 bytes --]

_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Inlining calls to primitives
  2006-09-05 16:21   ` Ludovic Courtès
@ 2006-09-05 22:53     ` Kevin Ryde
  2006-09-06  1:20     ` Rob Browning
  1 sibling, 0 replies; 9+ messages in thread
From: Kevin Ryde @ 2006-09-05 22:53 UTC (permalink / raw)


ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>
> I made a series of measurements with Guile compiled with `-pg -O0'.

-O0 will come out wildly different from an actual build, it stops even
the most basic optimizations -- for instance all variables are put in
actual stack slots (rather than held in registers).

-finstrument-functions is a good alternative to -pg if you want to
know what functions are using up time, though you need a separate
library and whatnot to use it.  fnccheck at

	http://sourceforge.net/projects/fnccheck

worked for me last time I tried it.


_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Inlining calls to primitives
  2006-09-05 16:21   ` Ludovic Courtès
  2006-09-05 22:53     ` Kevin Ryde
@ 2006-09-06  1:20     ` Rob Browning
  2006-09-06  8:21       ` Ludovic Courtès
  1 sibling, 1 reply; 9+ messages in thread
From: Rob Browning @ 2006-09-06  1:20 UTC (permalink / raw)
  Cc: Guile-Devel

ludovic.courtes@laas.fr (Ludovic Courtès) writes:

> memoization + jt + inline	| 32.4%	     22.1%
> memoization + switch + inline	| 31.9%	     23.2%
> memoization + jt + funcall	| 24.0%	     18.3%

> Function call overhead, however, _is_ important, though only the
> second source of improvement.  Repeatedly using function calls to
> execute a handful of instructions is costly.  Plus it probably
> increases cache misses, things like that.

First off, I'd like to say that I'm glad you're looking in to these
issues.

In addition, I have some comments:

  - Just as a general concern, I would personally lean toward only
    adopting changes to the interpreter that affect language semantics
    (especially if they change things to be non-RnRS) if the changes
    are optional, and more importantly, if they produce a fairly
    substantial increase in performance.

    What's substantial?  I'm not sure, but as an example, I don't
    think I'd want to give up the ability to redefine globals and have
    all the code see those changes for say only a %5-10 performance
    increase.  In part, that's also because I'd really rather see the
    same effort put toward resurrecting a VM or Hobbit, or pursuing
    some other approach that we hope will produce much larger
    improvements.

  - With respect to cache misses, at least in the general case, I
    would expect the reverse -- that inlining will tend to bloat the
    object code, and make cache overflows more likely.  I would also
    expect that inlining may put more pressure on RAM and register
    allocation.  Of course I haven't looked carefully at what you're
    doing in this particular case yet, so I'm just speaking broadly.

> Also, Rob mentioned on IRC that he was concerned about the global
> switch.  I believe this can be fixed using fluids or something like
> that so that inlining can be enabled/disabled on a per-module basis
> (as we did with `current-reader').

Right.  If it affects the semantics of the language, then I think it
should probably be optional, so that you can just turn it on for code
that was written with the changed semantics in mind.

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592  F9A0 25C8 D377 8C7E 73A4


_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Inlining calls to primitives
  2006-09-06  1:20     ` Rob Browning
@ 2006-09-06  8:21       ` Ludovic Courtès
  0 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2006-09-06  8:21 UTC (permalink / raw)
  Cc: Guile-Devel, Neil Jerram

Hi,

Rob Browning <rlb@defaultvalue.org> writes:

> In addition, I have some comments:
>
>   - Just as a general concern, I would personally lean toward only
>     adopting changes to the interpreter that affect language semantics
>     (especially if they change things to be non-RnRS) if the changes
>     are optional, and more importantly, if they produce a fairly
>     substantial increase in performance.
>
>     What's substantial?  I'm not sure, but as an example, I don't
>     think I'd want to give up the ability to redefine globals and have
>     all the code see those changes for say only a %5-10 performance
>     increase.  In part, that's also because I'd really rather see the
>     same effort put toward resurrecting a VM or Hobbit, or pursuing
>     some other approach that we hope will produce much larger
>     improvements.

I think we agree on the general, long-term goal.

That said, this "inlining" thing is kind of a "pragmatic", incremental
approach.   Reviving Guile-VM or Hobbit may be a much longer process.

In the meantime, we find ourselves rewriting Scheme code (e.g., SRFI-1)
into C to get slight performance improvements.  If we can achieve
reasonable (although perhaps not as good) improvements by adding simple
annotations to Scheme code, then that will make things easier.

In designing those annotations, we should make sure that they are
"compiler-friendly", so that we can eventually make use of them in the
revived compiler(s).  (In hindsight, I think we did a mistake with
`current-reader' in that respect.)  This also rules out the
`eval-options' approach that I used in this prototype.

Note that C-rewriting has the same effect on semantics as this
"inlining" thing, since all the macros and C functions used in the C
implementation are not subject to redefinition.

>   - With respect to cache misses, at least in the general case, I
>     would expect the reverse -- that inlining will tend to bloat the
>     object code, and make cache overflows more likely.  I would also
>     expect that inlining may put more pressure on RAM and register
>     allocation.  Of course I haven't looked carefully at what you're
>     doing in this particular case yet, so I'm just speaking broadly.

Cachegrind would tell us more.  In any case, it turns out that the
overhead of inlining few-instruction functions like `scm_car ()' is
ridiculous compared to function call overhead.

BTW, did you try inlining `scm_ilookup ()' as I mentioned in a previous
message?  Although that function is quite large, inlining it has a
noticeable impact on performance time.

Thanks,
Ludovic.


_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Inlining calls to primitives
  2006-09-05 16:20   ` Ludovic Courtès
@ 2006-09-15 10:29     ` Neil Jerram
  2006-09-15 11:57       ` Ludovic Courtès
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Jerram @ 2006-09-15 10:29 UTC (permalink / raw)


ludovic.courtes@laas.fr (Ludovic Courtès) writes:

> I made a series of measurements with Guile compiled with `-pg
> -O0'. [...]

Hi Ludovic,

This is a slightly rushed reply, I'm afraid, because I'm going away
for a week shortly.  (But I have had it in mind for a while, so
hopefully there's some sense in the following.)

My overall feeling about this, based on weighing the performance gains
against maintainability costs, is that I'd be happy for now to see the
memoization of globals in Guile core, but not the code inlining.  I
also think, however, that it would be great if we can keep all the
details of your research around, so that we can decide to formalize
other parts of it in the future, based on judgments at that time.

Therefore...

- For the memoization of globals, I'd like to see what a patch looks
  like that contains only this.  We would need some kind of
  declaration to say "memoize globals in the following code", and it
  feels obvious to me that this declaration should have lexical scope
  - in other words, the declaration affects the code that it looks
  like it affects.  I think this means that we have to use a source
  property and mark expressions as they are read .... but I leave the
  details of this to you.

  (One probably obvious point: only globals in car position should be
  memoized, of course.)

- To make sure that we don't lose your research, I suggest you create
  a new directory "performance" in the CVS "workbook" module (parallel
  to guile-core), and store your discussion results (per your emails)
  in a suitably-named file there.

Finally, can I ask whether you are doing this work because you are
trying to meet a specific performance target?  If so, what is the
target and how far away from it are you at the moment?

Regards,
     Neil



_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Inlining calls to primitives
  2006-09-15 10:29     ` Neil Jerram
@ 2006-09-15 11:57       ` Ludovic Courtès
  0 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2006-09-15 11:57 UTC (permalink / raw)
  Cc: Guile-Devel

Hi Neil,

Neil Jerram <neil@ossau.uklinux.net> writes:

> - For the memoization of globals, I'd like to see what a patch looks
>   like that contains only this.  We would need some kind of
>   declaration to say "memoize globals in the following code", and it
>   feels obvious to me that this declaration should have lexical scope
>   - in other words, the declaration affects the code that it looks
>   like it affects.  I think this means that we have to use a source
>   property and mark expressions as they are read .... but I leave the
>   details of this to you.

Yes, that's probably the best thing we can do, although it complicates a
bit the compilation/memoization phase (namely, instead of
`if (SCM_INLINE_MODE_P)', a more complex and potentially costly test
has to be performed).

> - To make sure that we don't lose your research, I suggest you create
>   a new directory "performance" in the CVS "workbook" module (parallel
>   to guile-core), and store your discussion results (per your emails)
>   in a suitably-named file there.

Ok, I'll try to look at it (but beware: I'm a bit lazy).

> Finally, can I ask whether you are doing this work because you are
> trying to meet a specific performance target?  If so, what is the
> target and how far away from it are you at the moment?

I do not have any specific performance target in fact: I'm just trying
to learn and play with Guile to see how it can be made faster.  :-)

My main motivation is that Guile is very slow, and I believe that this
can significantly restrict the applications where people would consider
using it.  By very slow, I mean compared to other interpreters, even
compared to sister implementations like SCM and...  Guile 1.4.  It's
also an order of magnitude slower that Bigloo's interpreter, for
instance.  One can notice this by running simple programs like loops,
etc.

Initially, after spending some time playing with Guile's GC, I managed
to (wrongfully?) convince myself that the GC may be responsible for an
important part of Guile's relative slowness.  Alas, my first experiments
with Boehm GC were not conclusive performance-wise (although this still
needs to be studied in more details).

The motivation for this inlining/memoization thing was that I believed
that calls to those small built-in procedures incurred significant
overhead (several C function calls to execute just a few instructions)
and could be quite easily avoided this way.  At the same time, I thought
it would be nice to have this kind of optimization rather than rewrite
Scheme code in C.


Thanks for your analysis and advice.  It's always pleasant to get
detailed feedback like this.

Ludovic.


_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2006-09-15 11:57 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-29  8:48 Inlining calls to primitives Ludovic Courtès
2006-09-04 22:02 ` Neil Jerram
2006-09-05 16:20   ` Ludovic Courtès
2006-09-15 10:29     ` Neil Jerram
2006-09-15 11:57       ` Ludovic Courtès
2006-09-05 16:21   ` Ludovic Courtès
2006-09-05 22:53     ` Kevin Ryde
2006-09-06  1:20     ` Rob Browning
2006-09-06  8:21       ` Ludovic Courtès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).