unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: Mark H Weaver <mhw@netris.org>
To: ludo@gnu.org (Ludovic Courtès)
Cc: guile-devel@gnu.org
Subject: Re: [PATCH] Universally-unique gensyms
Date: Wed, 18 Jan 2012 19:58:46 -0500	[thread overview]
Message-ID: <8762g8y0c9.fsf@netris.org> (raw)
In-Reply-To: <87d3agu1rt.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Wed, 18 Jan 2012 22:41:26 +0100")

[-- Attachment #1: Type: text/plain, Size: 1013 bytes --]

Hi Ludovic,

Thanks for the review!  Attached below are updated patches, which I
believe incorporate all of your suggestions.

> This looks cool, though I must confess I don’t see why sizeof (int) is
> not enough for everyone, nor whether this warrants adding all this code.

For some applications, it is important for gensyms to be unique across
multiple sessions.  `local-eval' is one such application, but there are
many others.

> Apparently this doesn’t break the ABI, right?

Good question!  I investigated this, and AFAICT there should be no
problem with changing the scm_i_thread structure.  I see no public
macros to access the structure members, no public APIs to retrieve a
pointer to this structure, and nothing for a user to do with such a
structure if he created one himself.

Indeed, this structure definition should probably be put within:

  #ifdef BUILDING_LIBGUILE

Anyway, here's the new patch.  What do you think?  Okay to push? :)

    Thanks!
      Mark



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: [PATCH 1/2] Add `random-state-from-platform' and `scm_i_random_bytes_from_platform' --]
[-- Type: text/x-patch, Size: 5998 bytes --]

From 55714abaf6c55f380bb68f456fdc6af923bae6f0 Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Wed, 18 Jan 2012 02:36:17 -0500
Subject: [PATCH 1/2] Add `random-state-from-platform' and
 `scm_i_random_bytes_from_platform'

* libguile/random.c (scm_random_state_from_platform): New procedure.
  (scm_i_random_bytes_from_platform): New internal function.

* libguile/random.h (scm_random_state_from_platform,
  scm_i_random_bytes_from_platform): Add prototypes.

* doc/ref/api-data.texi (Random): Add documentation.
---
 doc/ref/api-data.texi |   30 ++++++++--------------
 libguile/random.c     |   66 +++++++++++++++++++++++++++++++++++++++++++++++++
 libguile/random.h     |    3 ++
 3 files changed, 80 insertions(+), 19 deletions(-)

diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi
index f2450ce..e165559 100644
--- a/doc/ref/api-data.texi
+++ b/doc/ref/api-data.texi
@@ -1865,6 +1865,14 @@ Return a datum representation of @var{state} that may be written out and
 read back with the Scheme reader.
 @end deffn
 
+@deffn {Scheme Procedure} random-state-from-platform
+@deffnx {C Function} scm_random_state_from_platform ()
+Construct a new random state seeded from a platform-specific source of
+entropy, appropriate for use in non-security-critical applications.
+Currently @file{/dev/urandom} is tried first, or else the seed is based
+on the time, date, process ID, and a high-resolution timer if available.
+@end deffn
+
 @defvar *random-state*
 The global random state used by the above functions when the
 @var{state} parameter is not given.
@@ -1887,29 +1895,13 @@ Guile started up, will always give:
 (0 1 1 2 2 2 1 2 6 7 10 0 5 3 12 5 5 12)
 @end lisp
 
-To use the time of day as the random seed, you can use code like this:
-
-@lisp
-(let ((time (gettimeofday)))
-  (set! *random-state*
-        (seed->random-state (+ (car time)
-                               (cdr time)))))
-@end lisp
-
-@noindent
-And then (depending on the time of day, of course):
+To seed the random state in a sensible way for non-security-critical
+applications, do this during initialization of your program:
 
 @lisp
-(map random (cdr (iota 19)))
-@result{}
-(0 0 1 0 2 4 5 4 5 5 9 3 10 1 8 3 14 17)
+(set! *random-state* (random-state-from-platform))
 @end lisp
 
-For security applications, such as password generation, you should use
-more bits of seed.  Otherwise an open source password generator could
-be attacked by guessing the seed@dots{} but that's a subject for
-another manual.
-
 
 @node Characters
 @subsection Characters
diff --git a/libguile/random.c b/libguile/random.c
index 8bc0d87..5c7983a 100644
--- a/libguile/random.c
+++ b/libguile/random.c
@@ -653,6 +653,72 @@ SCM_DEFINE (scm_random_exp, "random:exp", 0, 1, 0,
 }
 #undef FUNC_NAME
 
+/* Return a new random-state seeded from the time, date, process ID,
+   and a high-resolution timer if available.  This is only to be used as
+   a last resort, when no better source of entropy is available. */
+static SCM
+random_state_of_last_resort (void)
+{
+  SCM time_of_day = scm_gettimeofday ();
+  return scm_seed_to_random_state
+    (scm_sum
+     (scm_get_internal_real_time (),
+      scm_ash (scm_sum (scm_getpid (),
+                        scm_ash (scm_sum (scm_cdr (time_of_day),
+                                          scm_ash (scm_car (time_of_day),
+                                                   SCM_I_MAKINUM (20))),
+                                 SCM_I_MAKINUM (20))),
+               SCM_I_MAKINUM (64))));
+}
+
+/* Attempt to fill buffer with random bytes from /dev/urandom.
+   Return 1 if successful, else return 0. */
+static int
+read_dev_urandom (unsigned char *buf, size_t len)
+{
+  size_t res = 0;
+  FILE *f = fopen ("/dev/urandom", "r");
+  if (f)
+    {
+      res = fread(buf, 1, len, f);
+      fclose (f);
+    }
+  return (res == len);
+}
+
+/* Fill a buffer with random bytes seeded from a platform-specific
+   source of entropy.  /dev/urandom is used if available.  Note that
+   this function provides no guarantees about the amount of entropy
+   present in the returned bytes. */
+void
+scm_i_random_bytes_from_platform (unsigned char *buf, size_t len)
+{
+  if (read_dev_urandom (buf, len))
+    return;
+  else  /* FIXME: support other platform sources */
+    {
+      /* When all else fails, use this (rather weak) fallback */
+      SCM random_state = random_state_of_last_resort ();
+      int i;
+      for (i = len-1; i >= 0; --i)
+        buf[i] = scm_to_int (scm_random (SCM_I_MAKINUM (256), random_state));
+    }
+}
+
+SCM_DEFINE (scm_random_state_from_platform, "random-state-from-platform", 0, 0, 0,
+            (void),
+            "Construct a new random state seeded from a platform-specific\n\
+source of entropy, appropriate for use in non-security-critical applications.")
+#define FUNC_NAME s_scm_random_state_from_platform
+{
+  unsigned char buf[32];
+  if (read_dev_urandom (buf, sizeof(buf)))
+    return make_rstate (scm_c_make_rstate ((char *) buf, sizeof(buf)));
+  else
+    return random_state_of_last_resort ();
+}
+#undef FUNC_NAME
+
 void
 scm_init_random ()
 {
diff --git a/libguile/random.h b/libguile/random.h
index 2f1f0f6..109969e 100644
--- a/libguile/random.h
+++ b/libguile/random.h
@@ -86,6 +86,7 @@ SCM_API SCM scm_copy_random_state (SCM state);
 SCM_API SCM scm_seed_to_random_state (SCM seed);
 SCM_API SCM scm_datum_to_random_state (SCM datum);
 SCM_API SCM scm_random_state_to_datum (SCM state);
+SCM_API SCM scm_random_state_from_platform (void);
 SCM_API SCM scm_random_uniform (SCM state);
 SCM_API SCM scm_random_solid_sphere_x (SCM v, SCM state);
 SCM_API SCM scm_random_hollow_sphere_x (SCM v, SCM state);
@@ -94,6 +95,8 @@ SCM_API SCM scm_random_normal_vector_x (SCM v, SCM state);
 SCM_API SCM scm_random_exp (SCM state);
 SCM_INTERNAL void scm_init_random (void);
 
+SCM_INTERNAL void scm_i_random_bytes_from_platform (unsigned char *buf, size_t len);
+
 #endif  /* SCM_RANDOM_H */
 
 /*
-- 
1.7.5.4


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: [PATCH 2/2] Universally-unique gensyms --]
[-- Type: text/x-patch, Size: 4495 bytes --]

From 0b7603c90d17d5296a94ef7d22849c536e73b49a Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Wed, 18 Jan 2012 02:53:05 -0500
Subject: [PATCH 2/2] Universally-unique gensyms

* libguile/symbols.c (scm_gensym): Make the gensym counter a 128-bit
  thread-local, initialized to a random number upon the first call to
  `gensym' within a given thread.  This counter is rendered as a 22 byte
  suffix of mostly base64 digits.

* libguile/threads.h (scm_i_thread): Add a thread-local gensym_counter.

* libguile/threads.c (guilify_self_1): Initialize gensym_counter to NULL.
---
 libguile/symbols.c |   50 +++++++++++++++++++++++++++++++++++++++-----------
 libguile/threads.c |    1 +
 libguile/threads.h |    4 ++++
 3 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/libguile/symbols.c b/libguile/symbols.c
index 59aca00..840a458 100644
--- a/libguile/symbols.c
+++ b/libguile/symbols.c
@@ -31,6 +31,7 @@
 #include "libguile/variable.h"
 #include "libguile/alist.h"
 #include "libguile/fluids.h"
+#include "libguile/threads.h"
 #include "libguile/strings.h"
 #include "libguile/vectors.h"
 #include "libguile/hashtab.h"
@@ -340,7 +341,9 @@ SCM_DEFINE (scm_string_ci_to_symbol, "string-ci->symbol", 1, 0, 0,
 /* The default prefix for `gensym'd symbols.  */
 static SCM default_gensym_prefix;
 
-#define MAX_PREFIX_LENGTH 30
+#define GENSYM_LENGTH      22  /* bytes */
+#define GENSYM_RADIX_BITS  6
+#define GENSYM_RADIX       (1 << (GENSYM_RADIX_BITS))
 
 SCM_DEFINE (scm_gensym, "gensym", 0, 1, 0,
             (SCM prefix),
@@ -351,22 +354,47 @@ SCM_DEFINE (scm_gensym, "gensym", 0, 1, 0,
 	    "resetting the counter.")
 #define FUNC_NAME s_scm_gensym
 {
-  static int gensym_counter = 0;
-  
+  static const char base64[GENSYM_RADIX] =
+    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789$@";
+  static const char base4[4] = "_.-~";
+
+  unsigned char *digit_buf = SCM_I_CURRENT_THREAD->gensym_counter;
+  char char_buf[GENSYM_LENGTH];
   SCM suffix, name;
-  int n, n_digits;
-  char buf[SCM_INTBUFLEN];
+  int i;
 
   if (SCM_UNBNDP (prefix))
     prefix = default_gensym_prefix;
 
-  /* mutex in case another thread looks and incs at the exact same moment */
-  scm_i_scm_pthread_mutex_lock (&scm_i_misc_mutex);
-  n = gensym_counter++;
-  scm_i_pthread_mutex_unlock (&scm_i_misc_mutex);
+  if (SCM_UNLIKELY (digit_buf == NULL))
+    {
+      /* This is the first time gensym has been called in this thread.
+         Allocate and randomize our new thread-local gensym counter */
+      digit_buf = (unsigned char *)
+        scm_gc_malloc_pointerless (GENSYM_LENGTH, "gensym-counter");
+      scm_i_random_bytes_from_platform (digit_buf, GENSYM_LENGTH);
+      for (i = (GENSYM_LENGTH - 1); i >= 0; --i)
+        digit_buf[i] &= (GENSYM_RADIX - 1);
+      SCM_I_CURRENT_THREAD->gensym_counter = digit_buf;
+    }
+
+  /* Increment our thread-local gensym_counter. */
+  for (i = (GENSYM_LENGTH - 1); i >= 0; --i)
+    {
+      if (SCM_LIKELY (++(digit_buf[i]) < GENSYM_RADIX))
+        break;
+      else
+        digit_buf[i] = 0;
+    }
+
+  /* Encode digit_buf as base64, except for the first character where we
+     use the sparse glyphs "_.-~" (base 4) to provide some visual
+     separation between the prefix and the dense base64 block. */
+  for (i = (GENSYM_LENGTH - 1); i > 0; --i)
+    char_buf[i] = base64[digit_buf[i]];
+  char_buf[0] = base4[digit_buf[0] & 3];
 
-  n_digits = scm_iint2str (n, 10, buf);
-  suffix = scm_from_latin1_stringn (buf, n_digits);
+  suffix = scm_from_latin1_stringn (char_buf, GENSYM_LENGTH);
   name = scm_string_append (scm_list_2 (prefix, suffix));
   return scm_string_to_symbol (name);
 }
diff --git a/libguile/threads.c b/libguile/threads.c
index 5a13e5c..67834ff 100644
--- a/libguile/threads.c
+++ b/libguile/threads.c
@@ -545,6 +545,7 @@ guilify_self_1 (struct GC_stack_base *base)
   t.join_queue = SCM_EOL;
   t.dynamic_state = SCM_BOOL_F;
   t.dynwinds = SCM_EOL;
+  t.gensym_counter = NULL;
   t.active_asyncs = SCM_EOL;
   t.block_asyncs = 1;
   t.pending_asyncs = 1;
diff --git a/libguile/threads.h b/libguile/threads.h
index ec129bc..3660a58 100644
--- a/libguile/threads.h
+++ b/libguile/threads.h
@@ -81,6 +81,10 @@ typedef struct scm_i_thread {
   SCM dynamic_state;
   SCM dynwinds;
 
+  /* Thread-local gensym counter.
+   */
+  unsigned char *gensym_counter;
+
   /* For system asyncs.
    */
   SCM active_asyncs;            /* The thunks to be run at the next
-- 
1.7.5.4


  parent reply	other threads:[~2012-01-19  0:58 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-17 13:27 [PATCH] Universally-unique gensyms Mark H Weaver
2012-01-17 13:57 ` Andy Wingo
2012-01-17 14:06   ` Andy Wingo
2012-01-18  8:04   ` Mark H Weaver
2012-01-18  9:35     ` Mark H Weaver
2012-01-18 21:41     ` Ludovic Courtès
2012-01-18 22:23       ` Andy Wingo
2012-01-18 22:43         ` Ludovic Courtès
2012-01-18 23:30         ` Mark H Weaver
2012-01-19  1:21           ` David Kastrup
2012-01-19  4:50             ` Mark H Weaver
2012-01-19  6:25               ` David Kastrup
2012-01-19 10:23                 ` Mark H Weaver
2012-01-20  0:50                   ` Mark H Weaver
2012-01-19  0:58       ` Mark H Weaver [this message]
2012-01-19 23:26         ` Ludovic Courtès
2012-01-18 21:42     ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8762g8y0c9.fsf@netris.org \
    --to=mhw@netris.org \
    --cc=guile-devel@gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).