unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* [PATCH] Improved `scm_from_locale_symbol ()' + `scm_take_locale_symbol ()'
@ 2005-12-19 17:23 Ludovic Courtès
  2006-01-11  8:39 ` Ludovic Courtès
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Ludovic Courtès @ 2005-12-19 17:23 UTC (permalink / raw)


Hi,

The patch below does two things:

1.  It introduces `scm_take_locale_symbol ()'.

2.  It modifies `scm_from_locale_symbol ()' so that it doesn't create a
    Scheme string to do the job.

This second modification has a nice effect: it can significantly reduce
the number of objects created at load-time.  Unfortunately, Guile's
built-in reader always produces Scheme strings (in `scm_read_token ()')
so it cannot benefit from this optimization.

Using a slightly modified version of `guile-reader' which does not
create Scheme strings when reading a symbol, I tried to measure the
improvement compared to Guile's built-in reader.  Basically, I had Guile
load a program that defines 20000 variables[*], first with Guile's
built-in reader, then with `guile-reader'.

With Guile's built-in reader:

   %   cumulative   self              self     total           
  time   seconds   seconds    calls   s/call   s/call  name    
  20.85     17.86    17.86    41385     0.00     0.00  ceval
  14.05     29.89    12.03    49963     0.00     0.00  scm_i_sweep_card
  12.50     40.60    10.71  2191128     0.00     0.00  scm_gc_mark_dependencies
   8.49     47.87     7.27     3149     0.00     0.01  scm_i_mark_weak_vector_non_weaks
   5.50     52.58     4.72  4729157     0.00     0.00  scm_cell
   5.43     57.23     4.65  5710240     0.00     0.00  scm_gc_mark
  ...
   0.08     83.46     0.07    20687     0.00     0.00  scm_i_make_string

With `guile-reader':

   %   cumulative   self              self     total           
  time   seconds   seconds    calls   s/call   s/call  name    
  23.59     17.66    17.66    41385     0.00     0.00  ceval
  14.34     28.39    10.73    46707     0.00     0.00  scm_i_sweep_card
  11.24     36.80     8.41  1810700     0.00     0.00  scm_gc_mark_dependencies
   7.99     42.78     5.98     2712     0.00     0.01  scm_i_mark_weak_vector_non_weaks
   6.32     47.51     4.73  4729153     0.00     0.00  scm_cell
   5.75     51.81     4.31  4767765     0.00     0.00  scm_gc_mark
  ...
   0.00     74.83     0.00      687     0.00     0.00  scm_i_make_string

The timings observed are around 15 s. (w/ Guile's built-in reader)
vs. 13 s. on my 500 MHz G4.

Clearly, the mark phase is much quicker as fewer strings were created in
the second case.  Of course, it would be nice if the built-in reader
could benefit from this as well, but this requires a fair amount of
(tedious) work.

Besides, `scm_take_locale_symbol ()' could be beneficial to application
writers as well.

Thanks,
Ludovic.

[*] Produced by:

    (with-output-to-file "t.scm"
      (lambda ()
        (for-each (lambda (x)
                    (format #t "(define sym~a ~a)~%" x x))
                  (iota 20000))))


libguile:

2005-12-19  Ludovic Courtès  <ludovic.courtes@laas.fr>

	* strings.c (scm_i_take_stringbufn): New.
	(scm_i_c_take_symbol): New.
	(scm_take_locale_stringn): Use `scm_i_take_stringbufn ()'.

	* strings.h (scm_i_c_take_symbol): New.
	(scm_i_take_stringbufn): New.

	* symbols.c  (lookup_interned_symbol): New function.
	(scm_i_c_mem2symbol): New function.
	(scm_i_mem2symbol): Use `lookup_symbol ()'.
	(scm_from_locale_symbol): Use `scm_i_c_mem2symbol ()'.  This avoids
	creating a new Scheme string.
	(scm_from_locale_symboln): Likewise.
	(scm_take_locale_symbol): New.
	(scm_take_locale_symboln): New.

	* symbols.h (scm_take_locale_symbol): New.
	(scm_take_locale_symboln): New.


doc/ref:

2005-12-19  Ludovic Courtès  <ludovic.courtes@laas.fr>

	* api-data.texi (Operations Related to Symbols):
	Documented `scm_take_locale_symbol ()'.

\f
--- orig/doc/ref/api-data.texi
+++ mod/doc/ref/api-data.texi
@@ -4551,6 +4551,16 @@
 specified explicitly by @var{len}.
 @end deffn
 
+@deftypefn  {C Function} SCM scm_take_locale_symbol (char *str)
+@deftypefnx {C Function} SCM scm_take_locale_symboln (char *str, size_t len)
+Like @code{scm_from_locale_symbol} and @code{scm_from_locale_symboln},
+respectively, but also frees @var{str} with @code{free} eventually.
+Thus, you can use this function when you would free @var{str} anyway
+immediately after creating the Scheme string.  In certain cases, Guile
+can then use @var{str} directly as its internal representation.
+@end deftypefn
+
+
 Finally, some applications, especially those that generate new Scheme
 code dynamically, need to generate symbols for use in the generated
 code.  The @code{gensym} primitive meets this need:


--- orig/libguile/strings.c
+++ mod/libguile/strings.c
@@ -122,6 +122,17 @@
     }
 }
 
+/* Return a new stringbuf whose underlying storage consists of the LEN octets
+   pointed to by STR.  */
+SCM_C_INLINE SCM
+scm_i_take_stringbufn (char *str, size_t len)
+{
+  scm_gc_register_collectable_memory (str, len, "stringbuf");
+
+  return scm_double_cell (STRINGBUF_TAG, (scm_t_bits) str,
+			  (scm_t_bits) len, (scm_t_bits) 0);
+}
+
 SCM
 scm_i_stringbuf_mark (SCM buf)
 {
@@ -412,6 +423,29 @@
 			  (scm_t_bits) hash, SCM_UNPACK (props));
 }
 
+SCM
+scm_i_c_make_symbol (const char *name, size_t len,
+		     scm_t_bits flags, unsigned long hash, SCM props)
+{
+  SCM buf = make_stringbuf (len);
+  memcpy (STRINGBUF_CHARS (buf), name, len);
+
+  return scm_double_cell (scm_tc7_symbol | flags, SCM_UNPACK (buf),
+			  (scm_t_bits) hash, SCM_UNPACK (props));
+}
+
+/* Return a new symbol that uses the LEN bytes pointed to by NAME as its
+   underlying storage.  */
+SCM
+scm_i_c_take_symbol (char *name, size_t len,
+		     scm_t_bits flags, unsigned long hash, SCM props)
+{
+  SCM buf = scm_i_take_stringbufn (name, len);
+
+  return scm_double_cell (scm_tc7_symbol | flags, SCM_UNPACK (buf),
+			  (scm_t_bits) hash, SCM_UNPACK (props));
+}
+
 size_t
 scm_i_symbol_length (SCM sym)
 {
@@ -842,12 +876,10 @@
       str[len] = '\0';
     }
 
-  buf = scm_double_cell (STRINGBUF_TAG, (scm_t_bits) str,
-                         (scm_t_bits) len, (scm_t_bits) 0);
+  buf = scm_i_take_stringbufn (str, len);
   res = scm_double_cell (STRING_TAG,
                          SCM_UNPACK (buf),
                          (scm_t_bits) 0, (scm_t_bits) len);
-  scm_gc_register_collectable_memory (str, len+1, "string");
   return res;
 }
 


--- orig/libguile/strings.h
+++ mod/libguile/strings.h
@@ -124,6 +124,12 @@
 
 SCM_API SCM scm_i_make_symbol (SCM name, scm_t_bits flags, 
 			       unsigned long hash, SCM props);
+SCM_API SCM
+scm_i_c_make_symbol (const char *name, size_t len,
+		     scm_t_bits flags, unsigned long hash, SCM props);
+SCM_API SCM
+scm_i_c_take_symbol (char *name, size_t len,
+		     scm_t_bits flags, unsigned long hash, SCM props);
 SCM_API const char *scm_i_symbol_chars (SCM sym);
 SCM_API size_t scm_i_symbol_length (SCM sym);
 SCM_API SCM scm_i_symbol_substring (SCM sym, size_t start, size_t end);
@@ -144,6 +150,7 @@
 SCM_API void scm_i_get_substring_spec (size_t len,
 				       SCM start, size_t *cstart,
 				       SCM end, size_t *cend);
+SCM_API SCM scm_i_take_stringbufn (char *str, size_t len);
 
 /* deprecated stuff */
 


--- orig/libguile/symbols.c
+++ mod/libguile/symbols.c
@@ -85,43 +85,79 @@
 }
 
 static SCM
-scm_i_mem2symbol (SCM str)
+lookup_interned_symbol (const char *name, size_t len,
+			unsigned long raw_hash)
 {
-  const char *name = scm_i_string_chars (str);
-  size_t len = scm_i_string_length (str);
+  /* Try to find the symbol in the symbols table */
+  SCM l;
+  unsigned long hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (symbols);
+
+  for (l = SCM_HASHTABLE_BUCKET (symbols, hash);
+       !scm_is_null (l);
+       l = SCM_CDR (l))
+    {
+      SCM sym = SCM_CAAR (l);
+      if (scm_i_symbol_hash (sym) == raw_hash
+	  && scm_i_symbol_length (sym) == len)
+	{
+	  const char *chrs = scm_i_symbol_chars (sym);
+	  size_t i = len;
+
+	  while (i != 0)
+	    {
+	      --i;
+	      if (name[i] != chrs[i])
+		goto next_symbol;
+	    }
+
+	  return sym;
+	}
+    next_symbol:
+      ;
+    }
+
+  return SCM_BOOL_F;
+}
 
+static SCM
+scm_i_c_mem2symbol (const char *name, size_t len)
+{
+  SCM symbol;
   size_t raw_hash = scm_string_hash ((const unsigned char *) name, len);
   size_t hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (symbols);
 
+  symbol = lookup_interned_symbol (name, len, raw_hash);
+  if (symbol != SCM_BOOL_F)
+    return symbol;
+
   {
-    /* Try to find the symbol in the symbols table */
+    /* The symbol was not found - create it. */
+    SCM symbol = scm_i_c_make_symbol (name, len, 0, raw_hash,
+				      scm_cons (SCM_BOOL_F, SCM_EOL));
 
-    SCM l;
+    SCM slot = SCM_HASHTABLE_BUCKET (symbols, hash);
+    SCM cell = scm_cons (symbol, SCM_UNDEFINED);
+    SCM_SET_HASHTABLE_BUCKET (symbols, hash, scm_cons (cell, slot));
+    SCM_HASHTABLE_INCREMENT (symbols);
+    if (SCM_HASHTABLE_N_ITEMS (symbols) > SCM_HASHTABLE_UPPER (symbols))
+      scm_i_rehash (symbols, scm_i_hash_symbol, 0, "scm_mem2symbol");
 
-    for (l = SCM_HASHTABLE_BUCKET (symbols, hash);
-	 !scm_is_null (l);
-	 l = SCM_CDR (l))
-      {
-	SCM sym = SCM_CAAR (l);
-	if (scm_i_symbol_hash (sym) == raw_hash
-	    && scm_i_symbol_length (sym) == len)
-	  {
-	    const char *chrs = scm_i_symbol_chars (sym);
-	    size_t i = len;
-
-	    while (i != 0)
-	      {
-		--i;
-		if (name[i] != chrs[i])
-		  goto next_symbol;
-	      }
-
-	    return sym;
-	  }
-      next_symbol:
-	;
-      }
+    return symbol;
   }
+}
+
+static SCM
+scm_i_mem2symbol (SCM str)
+{
+  SCM symbol;
+  const char *name = scm_i_string_chars (str);
+  size_t len = scm_i_string_length (str);
+  size_t raw_hash = scm_string_hash ((const unsigned char *) name, len);
+  size_t hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (symbols);
+
+  symbol = lookup_interned_symbol (name, len, raw_hash);
+  if (symbol != SCM_BOOL_F)
+    return symbol;
 
   {
     /* The symbol was not found - create it. */
@@ -139,6 +175,7 @@
   }
 }
 
+
 static SCM
 scm_i_mem2uninterned_symbol (SCM str)
 {
@@ -348,13 +385,50 @@
 SCM
 scm_from_locale_symbol (const char *sym)
 {
-  return scm_string_to_symbol (scm_from_locale_string (sym));
+  return scm_i_c_mem2symbol (sym, strlen (sym));
 }
 
 SCM
 scm_from_locale_symboln (const char *sym, size_t len)
 {
-  return scm_string_to_symbol (scm_from_locale_stringn (sym, len));
+  return scm_i_c_mem2symbol (sym, len);
+}
+
+SCM
+scm_take_locale_symboln (char *sym, size_t len)
+{
+  SCM res;
+  unsigned long raw_hash;
+
+  if (len == (size_t)-1)
+    len = strlen (sym);
+  else
+    {
+      /* Ensure STR is null terminated.  A realloc for 1 extra byte should
+         often be satisfied from the alignment padding after the block, with
+         no actual data movement.  */
+      sym = scm_realloc (sym, len+1);
+      sym[len] = '\0';
+    }
+
+  raw_hash = scm_string_hash ((unsigned char *)sym, len);
+  res = lookup_interned_symbol (sym, len, raw_hash);
+  if (res != SCM_BOOL_F)
+    {
+      free (sym);
+      return res;
+    }
+
+  res = scm_i_c_take_symbol (sym, len, 0, raw_hash,
+			     scm_cons (SCM_BOOL_F, SCM_EOL));
+
+  return res;
+}
+
+SCM
+scm_take_locale_symbol (char *sym)
+{
+  return scm_take_locale_symboln (sym, (size_t)-1);
 }
 
 void


--- orig/libguile/symbols.h
+++ mod/libguile/symbols.h
@@ -56,6 +56,8 @@
 
 SCM_API SCM scm_from_locale_symbol (const char *str);
 SCM_API SCM scm_from_locale_symboln (const char *str, size_t len);
+SCM_API SCM scm_take_locale_symbol (char *sym);
+SCM_API SCM scm_take_locale_symboln (char *sym, size_t len);
 
 /* internal functions. */
 



_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2006-01-25 18:04 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-12-19 17:23 [PATCH] Improved `scm_from_locale_symbol ()' + `scm_take_locale_symbol ()' Ludovic Courtès
2006-01-11  8:39 ` Ludovic Courtès
2006-01-24  8:14 ` Ludovic Courtès
2006-01-24 11:02   ` Han-Wen Nienhuys
2006-01-24 14:17     ` Ludovic Courtès
2006-01-24 16:18       ` Han-Wen Nienhuys
2006-01-24 17:42         ` Ludovic Courtès
2006-01-24 21:06           ` Han-Wen Nienhuys
2006-01-24 23:27             ` Ken Raeburn
2006-01-25  9:45               ` Ludovic Courtès
2006-01-25 11:18                 ` Ken Raeburn
2006-01-25 13:29                   ` Distributed revision control, etc Ludovic Courtès
2006-01-25 18:04                     ` Ken Raeburn
2006-01-24 11:11 ` [PATCH] Improved `scm_from_locale_symbol ()' + `scm_take_locale_symbol ()' Han-Wen Nienhuys
2006-01-24 14:22   ` Ludovic Courtès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).