* [PATCH] Universally-unique gensyms @ 2012-01-17 13:27 Mark H Weaver 2012-01-17 13:57 ` Andy Wingo 0 siblings, 1 reply; 17+ messages in thread From: Mark H Weaver @ 2012-01-17 13:27 UTC (permalink / raw) To: guile-devel [-- Attachment #1: Type: text/plain, Size: 80 bytes --] This patch makes our gensyms universally-unique. Comments welcome. Mark [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: [PATCH] Universally-unique gensyms --] [-- Type: text/x-patch, Size: 3611 bytes --] From 18616afc6a3d5aca48da2e0819d08d7ff98de214 Mon Sep 17 00:00:00 2001 From: Mark H Weaver <mhw@netris.org> Date: Tue, 17 Jan 2012 08:15:10 -0500 Subject: [PATCH] Universally-unique gensyms * libguile/symbols.c (scm_gensym): The gensym counter is now 128 bits, initialized to a random number at the beginning of each session. This counter is rendered as a 22 byte suffix of mostly base64 digits. --- libguile/symbols.c | 69 +++++++++++++++++++++++++++++++++++++++++++++------- 1 files changed, 60 insertions(+), 9 deletions(-) diff --git a/libguile/symbols.c b/libguile/symbols.c index 59aca00..cf82518 100644 --- a/libguile/symbols.c +++ b/libguile/symbols.c @@ -340,7 +340,9 @@ SCM_DEFINE (scm_string_ci_to_symbol, "string-ci->symbol", 1, 0, 0, /* The default prefix for `gensym'd symbols. */ static SCM default_gensym_prefix; -#define MAX_PREFIX_LENGTH 30 +#define GENSYM_LENGTH 22 /* bytes */ +#define GENSYM_RADIX_BITS 6 +#define GENSYM_RADIX (1 << (GENSYM_RADIX_BITS)) SCM_DEFINE (scm_gensym, "gensym", 0, 1, 0, (SCM prefix), @@ -351,22 +353,71 @@ SCM_DEFINE (scm_gensym, "gensym", 0, 1, 0, "resetting the counter.") #define FUNC_NAME s_scm_gensym { - static int gensym_counter = 0; - + static int initialized = 0; + static unsigned char digit_buf[GENSYM_LENGTH]; + static char base64[GENSYM_RADIX] = + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789$@"; + static char base4[4] = "_.-~"; + + char char_buf[GENSYM_LENGTH]; SCM suffix, name; - int n, n_digits; - char buf[SCM_INTBUFLEN]; + int i; + + if (SCM_UNLIKELY (!initialized)) + { + int res = 0; + FILE *f = fopen ("/dev/urandom", "r"); + if (f) + { + res = fread(digit_buf, 1, GENSYM_LENGTH, f); + fclose (f); + } + if (res == GENSYM_LENGTH) + { + for (i = (GENSYM_LENGTH - 1); i >= 0; --i) + digit_buf[i] &= (GENSYM_RADIX - 1); + } + else + { + /* This is a fallback in case /dev/urandom fails */ + SCM tm = scm_gettimeofday (); + SCM seed = scm_sum (scm_product (scm_car (tm), + scm_from_int (1000000)), + scm_cdr (tm)); + SCM random_state = scm_seed_to_random_state (seed); + SCM radix = scm_from_int (GENSYM_RADIX); + for (i = (GENSYM_LENGTH - 1); i >= 0; --i) + digit_buf[i] = scm_to_int (scm_random (radix, random_state)); + } + initialized = 1; + } if (SCM_UNBNDP (prefix)) prefix = default_gensym_prefix; - /* mutex in case another thread looks and incs at the exact same moment */ + /* ENHANCE-ME: make digit_buf thread-local to avoid the mutex */ + + /* lock the mutex */ scm_i_scm_pthread_mutex_lock (&scm_i_misc_mutex); - n = gensym_counter++; + + /* increment digit_buf */ + for (i = (GENSYM_LENGTH - 1); i >= 0; --i) + { + if (SCM_LIKELY (++(digit_buf[i]) < GENSYM_RADIX)) + break; + else + digit_buf[i] = 0; + } + + /* convert digit_buf to the base64 char_buf */ + for (i = (GENSYM_LENGTH - 1); i > 0; --i) + char_buf[i] = base64[digit_buf[i]]; + char_buf[0] = base4[digit_buf[0] & 3]; + + /* unlock the mutex */ scm_i_pthread_mutex_unlock (&scm_i_misc_mutex); - n_digits = scm_iint2str (n, 10, buf); - suffix = scm_from_latin1_stringn (buf, n_digits); + suffix = scm_from_latin1_stringn (char_buf, GENSYM_LENGTH); name = scm_string_append (scm_list_2 (prefix, suffix)); return scm_string_to_symbol (name); } -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-17 13:27 [PATCH] Universally-unique gensyms Mark H Weaver @ 2012-01-17 13:57 ` Andy Wingo 2012-01-17 14:06 ` Andy Wingo 2012-01-18 8:04 ` Mark H Weaver 0 siblings, 2 replies; 17+ messages in thread From: Andy Wingo @ 2012-01-17 13:57 UTC (permalink / raw) To: Mark H Weaver; +Cc: guile-devel Hi Mark, Excellent! On Tue 17 Jan 2012 14:27, Mark H Weaver <mhw@netris.org> writes: > This patch makes our gensyms universally-unique. Comments welcome. I think the lazy initialization needs to happen within the lock. Also it would be nice to factor out the initialization of the random seed would be a nice helper to have in random.[ch]. You probably don't need to have a flag to support /dev/random (rather than urandom), because after all it's only seeding a PRNG (and not a cryptographically safe one, at that). Otherwise looks great to me! Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-17 13:57 ` Andy Wingo @ 2012-01-17 14:06 ` Andy Wingo 2012-01-18 8:04 ` Mark H Weaver 1 sibling, 0 replies; 17+ messages in thread From: Andy Wingo @ 2012-01-17 14:06 UTC (permalink / raw) To: Mark H Weaver; +Cc: guile-devel On Tue 17 Jan 2012 14:57, Andy Wingo <wingo@pobox.com> writes: > I think the lazy initialization needs to happen within the lock. Also > it would be nice to factor out the initialization of the random seed > would be a nice helper to have in random.[ch]. Sigh, mind racing ahead of the fingers: "Also it would be nice to factor out the initialialization of the random seed to a helper in random.[ch]" A -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-17 13:57 ` Andy Wingo 2012-01-17 14:06 ` Andy Wingo @ 2012-01-18 8:04 ` Mark H Weaver 2012-01-18 9:35 ` Mark H Weaver ` (2 more replies) 1 sibling, 3 replies; 17+ messages in thread From: Mark H Weaver @ 2012-01-18 8:04 UTC (permalink / raw) To: Andy Wingo; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 218 bytes --] Hi Andy, Here's an improved implementation, incorporating your excellent feedback and also implementing thread-local gensym counters, thus eliminating the mutex altogether. What do you think? Thanks! Mark [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: [PATCH 1/2] Add `random-state-from-platform' and `scm_i_random_bytes_from_platform' --] [-- Type: text/x-patch, Size: 5744 bytes --] From 9090dfeb58846d637150f5f88e344c7d980efdf2 Mon Sep 17 00:00:00 2001 From: Mark H Weaver <mhw@netris.org> Date: Wed, 18 Jan 2012 02:36:17 -0500 Subject: [PATCH 1/2] Add `random-state-from-platform' and `scm_i_random_bytes_from_platform' * libguile/random.c (scm_random_state_from_platform): New procedure to construct a new random state seeded from a platform-specific source of randomness. (scm_i_random_bytes_from_platform): New internal function to fill a byte buffer with random bytes from a platform-specific source of randomness. (read_dev_urandom, random_state_of_last_resort): New internal static helper functions. * libguile/random.h (scm_random_state_from_platform, scm_i_random_bytes_from_platform): Add prototypes. * doc/ref/api-data.texi (Random): Add documentation. --- doc/ref/api-data.texi | 30 +++++++++---------------- libguile/random.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++ libguile/random.h | 3 ++ 3 files changed, 72 insertions(+), 19 deletions(-) diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi index f2450ce..e165559 100644 --- a/doc/ref/api-data.texi +++ b/doc/ref/api-data.texi @@ -1865,6 +1865,14 @@ Return a datum representation of @var{state} that may be written out and read back with the Scheme reader. @end deffn +@deffn {Scheme Procedure} random-state-from-platform +@deffnx {C Function} scm_random_state_from_platform () +Construct a new random state seeded from a platform-specific source of +entropy, appropriate for use in non-security-critical applications. +Currently @file{/dev/urandom} is tried first, or else the seed is based +on the time, date, process ID, and a high-resolution timer if available. +@end deffn + @defvar *random-state* The global random state used by the above functions when the @var{state} parameter is not given. @@ -1887,29 +1895,13 @@ Guile started up, will always give: (0 1 1 2 2 2 1 2 6 7 10 0 5 3 12 5 5 12) @end lisp -To use the time of day as the random seed, you can use code like this: - -@lisp -(let ((time (gettimeofday))) - (set! *random-state* - (seed->random-state (+ (car time) - (cdr time))))) -@end lisp - -@noindent -And then (depending on the time of day, of course): +To seed the random state in a sensible way for non-security-critical +applications, do this during initialization of your program: @lisp -(map random (cdr (iota 19))) -@result{} -(0 0 1 0 2 4 5 4 5 5 9 3 10 1 8 3 14 17) +(set! *random-state* (random-state-from-platform)) @end lisp -For security applications, such as password generation, you should use -more bits of seed. Otherwise an open source password generator could -be attacked by guessing the seed@dots{} but that's a subject for -another manual. - @node Characters @subsection Characters diff --git a/libguile/random.c b/libguile/random.c index 8bc0d87..a1e7abe 100644 --- a/libguile/random.c +++ b/libguile/random.c @@ -653,6 +653,64 @@ SCM_DEFINE (scm_random_exp, "random:exp", 0, 1, 0, } #undef FUNC_NAME +static SCM +random_state_of_last_resort (void) +{ + SCM time_of_day = scm_gettimeofday (); + return scm_seed_to_random_state + (scm_sum + (scm_get_internal_real_time (), + scm_ash (scm_sum (scm_getpid (), + scm_ash (scm_sum (scm_cdr (time_of_day), + scm_ash (scm_car (time_of_day), + SCM_I_MAKINUM (20))), + SCM_I_MAKINUM (20))), + SCM_I_MAKINUM (64)))); +} + +/* Returns 1 if successful, else 0 */ +static int +read_dev_urandom (unsigned char *buf, size_t len) +{ + size_t res = 0; + FILE *f = fopen ("/dev/urandom", "r"); + if (f) + { + res = fread(buf, 1, len, f); + fclose (f); + } + return (res == len); +} + +void +scm_i_random_bytes_from_platform (unsigned char *buf, size_t len) +{ + if (read_dev_urandom (buf, len)) + return; + else /* FIXME: support other platform sources */ + { + /* When all else fails, use this (rather weak) fallback */ + SCM random_state = random_state_of_last_resort (); + int i; + for (i = len-1; i >= 0; --i) + buf[i] = scm_to_int (scm_random (SCM_I_MAKINUM (256), random_state)); + } +} + +SCM_DEFINE (scm_random_state_from_platform, "random-state-from-platform", 0, 0, 0, + (void), + "Construct a new random state seeded from a platform-specific source of\n\ +entropy, appropriate for use in non-security-critical applications.") +#define FUNC_NAME s_scm_random_state_from_platform +{ + unsigned char buf[32]; + if (read_dev_urandom (buf, sizeof(buf))) + return make_rstate (scm_c_make_rstate ((char *) buf, sizeof(buf))); + else + return random_state_of_last_resort (); +} +#undef FUNC_NAME + void scm_init_random () { diff --git a/libguile/random.h b/libguile/random.h index 2f1f0f6..109969e 100644 --- a/libguile/random.h +++ b/libguile/random.h @@ -86,6 +86,7 @@ SCM_API SCM scm_copy_random_state (SCM state); SCM_API SCM scm_seed_to_random_state (SCM seed); SCM_API SCM scm_datum_to_random_state (SCM datum); SCM_API SCM scm_random_state_to_datum (SCM state); +SCM_API SCM scm_random_state_from_platform (void); SCM_API SCM scm_random_uniform (SCM state); SCM_API SCM scm_random_solid_sphere_x (SCM v, SCM state); SCM_API SCM scm_random_hollow_sphere_x (SCM v, SCM state); @@ -94,6 +95,8 @@ SCM_API SCM scm_random_normal_vector_x (SCM v, SCM state); SCM_API SCM scm_random_exp (SCM state); SCM_INTERNAL void scm_init_random (void); +SCM_INTERNAL void scm_i_random_bytes_from_platform (unsigned char *buf, size_t len); + #endif /* SCM_RANDOM_H */ /* -- 1.7.5.4 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: [PATCH 2/2] Universally-unique gensyms --] [-- Type: text/x-patch, Size: 4461 bytes --] From f7feb8c116c40be6894061dcc4474c5939f64e03 Mon Sep 17 00:00:00 2001 From: Mark H Weaver <mhw@netris.org> Date: Wed, 18 Jan 2012 02:53:05 -0500 Subject: [PATCH 2/2] Universally-unique gensyms * libguile/symbols.c (scm_gensym): Make the gensym counter a 128-bit thread-local, initialized to a random number upon the first call to `gensym' within a given thread. This counter is rendered as a 22 byte suffix of mostly base64 digits. * libguile/threads.h (scm_i_thread): Add a thread-local gensym_counter. * libguile/threads.c (guilify_self_1): Initialize gensym_counter to NULL. --- libguile/symbols.c | 49 ++++++++++++++++++++++++++++++++++++++----------- libguile/threads.c | 1 + libguile/threads.h | 4 ++++ 3 files changed, 43 insertions(+), 11 deletions(-) diff --git a/libguile/symbols.c b/libguile/symbols.c index 59aca00..0656ecb 100644 --- a/libguile/symbols.c +++ b/libguile/symbols.c @@ -31,6 +31,7 @@ #include "libguile/variable.h" #include "libguile/alist.h" #include "libguile/fluids.h" +#include "libguile/threads.h" #include "libguile/strings.h" #include "libguile/vectors.h" #include "libguile/hashtab.h" @@ -340,7 +341,9 @@ SCM_DEFINE (scm_string_ci_to_symbol, "string-ci->symbol", 1, 0, 0, /* The default prefix for `gensym'd symbols. */ static SCM default_gensym_prefix; -#define MAX_PREFIX_LENGTH 30 +#define GENSYM_LENGTH 22 /* bytes */ +#define GENSYM_RADIX_BITS 6 +#define GENSYM_RADIX (1 << (GENSYM_RADIX_BITS)) SCM_DEFINE (scm_gensym, "gensym", 0, 1, 0, (SCM prefix), @@ -351,22 +354,46 @@ SCM_DEFINE (scm_gensym, "gensym", 0, 1, 0, "resetting the counter.") #define FUNC_NAME s_scm_gensym { - static int gensym_counter = 0; - + static const char base64[GENSYM_RADIX] = + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789$@"; + static const char base4[4] = "_.-~"; + + unsigned char *digit_buf = SCM_I_CURRENT_THREAD->gensym_counter; + char char_buf[GENSYM_LENGTH]; SCM suffix, name; - int n, n_digits; - char buf[SCM_INTBUFLEN]; + int i; if (SCM_UNBNDP (prefix)) prefix = default_gensym_prefix; - /* mutex in case another thread looks and incs at the exact same moment */ - scm_i_scm_pthread_mutex_lock (&scm_i_misc_mutex); - n = gensym_counter++; - scm_i_pthread_mutex_unlock (&scm_i_misc_mutex); + if (SCM_UNLIKELY (digit_buf == NULL)) + { + /* This is the first time gensym has been called in this thread. + Allocate and randomize our new thread-local gensym counter */ + digit_buf = (unsigned char *) scm_malloc (GENSYM_LENGTH); + scm_i_random_bytes_from_platform (digit_buf, GENSYM_LENGTH); + for (i = (GENSYM_LENGTH - 1); i >= 0; --i) + digit_buf[i] &= (GENSYM_RADIX - 1); + SCM_I_CURRENT_THREAD->gensym_counter = digit_buf; + } + + /* increment our thread-local gensym_counter */ + for (i = (GENSYM_LENGTH - 1); i >= 0; --i) + { + if (SCM_LIKELY (++(digit_buf[i]) < GENSYM_RADIX)) + break; + else + digit_buf[i] = 0; + } + + /* encode digit_buf as base64, except for the first character which is + base 4 using the sparse glyphs "_.-~" to hopefully provide some + visual separation between the prefix and the dense base64 block. */ + for (i = (GENSYM_LENGTH - 1); i > 0; --i) + char_buf[i] = base64[digit_buf[i]]; + char_buf[0] = base4[digit_buf[0] & 3]; - n_digits = scm_iint2str (n, 10, buf); - suffix = scm_from_latin1_stringn (buf, n_digits); + suffix = scm_from_latin1_stringn (char_buf, GENSYM_LENGTH); name = scm_string_append (scm_list_2 (prefix, suffix)); return scm_string_to_symbol (name); } diff --git a/libguile/threads.c b/libguile/threads.c index 5a13e5c..67834ff 100644 --- a/libguile/threads.c +++ b/libguile/threads.c @@ -545,6 +545,7 @@ guilify_self_1 (struct GC_stack_base *base) t.join_queue = SCM_EOL; t.dynamic_state = SCM_BOOL_F; t.dynwinds = SCM_EOL; + t.gensym_counter = NULL; t.active_asyncs = SCM_EOL; t.block_asyncs = 1; t.pending_asyncs = 1; diff --git a/libguile/threads.h b/libguile/threads.h index ec129bc..3660a58 100644 --- a/libguile/threads.h +++ b/libguile/threads.h @@ -81,6 +81,10 @@ typedef struct scm_i_thread { SCM dynamic_state; SCM dynwinds; + /* Thread-local gensym counter. + */ + unsigned char *gensym_counter; + /* For system asyncs. */ SCM active_asyncs; /* The thunks to be run at the next -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-18 8:04 ` Mark H Weaver @ 2012-01-18 9:35 ` Mark H Weaver 2012-01-18 21:41 ` Ludovic Courtès 2012-01-18 21:42 ` Ludovic Courtès 2 siblings, 0 replies; 17+ messages in thread From: Mark H Weaver @ 2012-01-18 9:35 UTC (permalink / raw) To: Andy Wingo; +Cc: guile-devel I wrote: > + /* This is the first time gensym has been called in this thread. > + Allocate and randomize our new thread-local gensym counter */ > + digit_buf = (unsigned char *) scm_malloc (GENSYM_LENGTH); Oops, that should be `scm_gc_malloc_pointerless'. Mark ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-18 8:04 ` Mark H Weaver 2012-01-18 9:35 ` Mark H Weaver @ 2012-01-18 21:41 ` Ludovic Courtès 2012-01-18 22:23 ` Andy Wingo 2012-01-19 0:58 ` Mark H Weaver 2012-01-18 21:42 ` Ludovic Courtès 2 siblings, 2 replies; 17+ messages in thread From: Ludovic Courtès @ 2012-01-18 21:41 UTC (permalink / raw) To: guile-devel Hi Mark, Mark H Weaver <mhw@netris.org> skribis: > Here's an improved implementation, incorporating your excellent feedback > and also implementing thread-local gensym counters, thus eliminating the > mutex altogether. What do you think? This looks cool, though I must confess I don’t see why sizeof (int) is not enough for everyone, nor whether this warrants adding all this code. A few comments. > From 9090dfeb58846d637150f5f88e344c7d980efdf2 Mon Sep 17 00:00:00 2001 > From: Mark H Weaver <mhw@netris.org> > Date: Wed, 18 Jan 2012 02:36:17 -0500 > Subject: [PATCH 1/2] Add `random-state-from-platform' and > `scm_i_random_bytes_from_platform' > > * libguile/random.c (scm_random_state_from_platform): New procedure to > construct a new random state seeded from a platform-specific source of > randomness. > > (scm_i_random_bytes_from_platform): New internal function to fill a > byte buffer with random bytes from a platform-specific source of > randomness. > > (read_dev_urandom, random_state_of_last_resort): New internal static > helper functions. Could you stick to GNU-style change logs, describing the change (for example, “New function”), and not the rationale, function, etc.? (Andy might disagree with me, but don’t listen to him. ;-)) > +static SCM > +random_state_of_last_resort (void) > +{ Could you a sentence above new functions describing what they return (like you did for some of them)? > + FILE *f = fopen ("/dev/urandom", "r"); I’ve just checked it’s available on GNU, FreeBSD, Solaris, and Darwin (I thought it was less portable.) > + static const char base64[GENSYM_RADIX] = > + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789$@"; > + static const char base4[4] = "_.-~"; Could we use Gnulib’s ‘base64’ module instead? > + /* encode digit_buf as base64, except for the first character which is Please capitalize sentences. :-) > --- a/libguile/threads.h > +++ b/libguile/threads.h > @@ -81,6 +81,10 @@ typedef struct scm_i_thread { > SCM dynamic_state; > SCM dynwinds; > > + /* Thread-local gensym counter. > + */ > + unsigned char *gensym_counter; Apparently this doesn’t break the ABI, right? Thanks, Ludo’. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-18 21:41 ` Ludovic Courtès @ 2012-01-18 22:23 ` Andy Wingo 2012-01-18 22:43 ` Ludovic Courtès 2012-01-18 23:30 ` Mark H Weaver 2012-01-19 0:58 ` Mark H Weaver 1 sibling, 2 replies; 17+ messages in thread From: Andy Wingo @ 2012-01-18 22:23 UTC (permalink / raw) To: Ludovic Courtès, Mark H. Weaver; +Cc: guile-devel On Wed 18 Jan 2012 22:41, ludo@gnu.org (Ludovic Courtès) writes: > Could you stick to GNU-style change logs, describing the change (for > example, “New function”), and not the rationale, function, etc.? > > (Andy might disagree with me, but don’t listen to him. ;-)) FWIW I have grown to agree with you over time, and your chiding, while not always well-received in the moment, has made for better commit logs :) >> + static const char base64[GENSYM_RADIX] = >> + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789$@"; >> + static const char base4[4] = "_.-~"; > > Could we use Gnulib’s ‘base64’ module instead? FWIW (again) I thought the same initially, but Mark is incrementing a base64 buffer instead of continually reencoding a value. It seems OK in this instance. >> + /* Thread-local gensym counter. >> + */ >> + unsigned char *gensym_counter; > > Apparently this doesn’t break the ABI, right? Interesting. Sorry for asking a stupid question, but why is it that we want the gensym counter to be thread-local? Just to avoid the mutex? TBH I don't think it's that big of a point of contention. This risks devolution into bike-shed-landia tho... Regards Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-18 22:23 ` Andy Wingo @ 2012-01-18 22:43 ` Ludovic Courtès 2012-01-18 23:30 ` Mark H Weaver 1 sibling, 0 replies; 17+ messages in thread From: Ludovic Courtès @ 2012-01-18 22:43 UTC (permalink / raw) To: Andy Wingo; +Cc: Mark H. Weaver, guile-devel Andy Wingo <wingo@pobox.com> skribis: > On Wed 18 Jan 2012 22:41, ludo@gnu.org (Ludovic Courtès) writes: > >> Could you stick to GNU-style change logs, describing the change (for >> example, “New function”), and not the rationale, function, etc.? >> >> (Andy might disagree with me, but don’t listen to him. ;-)) > > FWIW I have grown to agree with you over time, and your chiding, while > not always well-received in the moment, has made for better commit logs > :) Heh. :-) Apologies if this came out as chiding, though. >>> + static const char base64[GENSYM_RADIX] = >>> + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789$@"; >>> + static const char base4[4] = "_.-~"; >> >> Could we use Gnulib’s ‘base64’ module instead? > > FWIW (again) I thought the same initially, but Mark is incrementing a > base64 buffer instead of continually reencoding a value. It seems OK in > this instance. Oh, right, makes sense. Ludo’. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-18 22:23 ` Andy Wingo 2012-01-18 22:43 ` Ludovic Courtès @ 2012-01-18 23:30 ` Mark H Weaver 2012-01-19 1:21 ` David Kastrup 1 sibling, 1 reply; 17+ messages in thread From: Mark H Weaver @ 2012-01-18 23:30 UTC (permalink / raw) To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel Andy Wingo <wingo@pobox.com> writes: > Sorry for asking a stupid question, but why is it that we want the > gensym counter to be thread-local? Just to avoid the mutex? TBH I > don't think it's that big of a point of contention. This risks > devolution into bike-shed-landia tho... It's a reasonable question. I don't feel strongly about it, but I prefer lock-free programming where practical, and in this case there's really no need for coordination between threads. Indeed, these UUIDs are already designed to avoid collisions between multiple _sessions_ without coordination. So why bother with the lock? Mark ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-18 23:30 ` Mark H Weaver @ 2012-01-19 1:21 ` David Kastrup 2012-01-19 4:50 ` Mark H Weaver 0 siblings, 1 reply; 17+ messages in thread From: David Kastrup @ 2012-01-19 1:21 UTC (permalink / raw) To: guile-devel Mark H Weaver <mhw@netris.org> writes: > Andy Wingo <wingo@pobox.com> writes: > >> Sorry for asking a stupid question, but why is it that we want the >> gensym counter to be thread-local? Just to avoid the mutex? TBH I >> don't think it's that big of a point of contention. This risks >> devolution into bike-shed-landia tho... > > It's a reasonable question. I don't feel strongly about it, but I > prefer lock-free programming where practical, and in this case there's > really no need for coordination between threads. Indeed, these UUIDs > are already designed to avoid collisions between multiple _sessions_ > without coordination. So why bother with the lock? To avoid both threads reading the same seed value before generating the same number? I have not looked at the code, but that could be a reason for serializing. -- David Kastrup ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-19 1:21 ` David Kastrup @ 2012-01-19 4:50 ` Mark H Weaver 2012-01-19 6:25 ` David Kastrup 0 siblings, 1 reply; 17+ messages in thread From: Mark H Weaver @ 2012-01-19 4:50 UTC (permalink / raw) To: David Kastrup; +Cc: guile-devel David Kastrup <dak@gnu.org> writes: > Mark H Weaver <mhw@netris.org> writes: > >> Andy Wingo <wingo@pobox.com> writes: >> >>> Sorry for asking a stupid question, but why is it that we want the >>> gensym counter to be thread-local? Just to avoid the mutex? TBH I >>> don't think it's that big of a point of contention. This risks >>> devolution into bike-shed-landia tho... >> >> It's a reasonable question. I don't feel strongly about it, but I >> prefer lock-free programming where practical, and in this case there's >> really no need for coordination between threads. Indeed, these UUIDs >> are already designed to avoid collisions between multiple _sessions_ >> without coordination. So why bother with the lock? > > To avoid both threads reading the same seed value before generating the > same number? I have not looked at the code, but that could be a reason > for serializing. Hence the _thread-local_ gensym counters, which is what we're discussing. Please read before you post. Mark ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-19 4:50 ` Mark H Weaver @ 2012-01-19 6:25 ` David Kastrup 2012-01-19 10:23 ` Mark H Weaver 0 siblings, 1 reply; 17+ messages in thread From: David Kastrup @ 2012-01-19 6:25 UTC (permalink / raw) To: guile-devel Mark H Weaver <mhw@netris.org> writes: > David Kastrup <dak@gnu.org> writes: > >> Mark H Weaver <mhw@netris.org> writes: >> >>> Andy Wingo <wingo@pobox.com> writes: >>> >>>> Sorry for asking a stupid question, but why is it that we want the >>>> gensym counter to be thread-local? Just to avoid the mutex? TBH I >>>> don't think it's that big of a point of contention. This risks >>>> devolution into bike-shed-landia tho... >>> >>> It's a reasonable question. I don't feel strongly about it, but I >>> prefer lock-free programming where practical, and in this case there's >>> really no need for coordination between threads. Indeed, these UUIDs >>> are already designed to avoid collisions between multiple _sessions_ >>> without coordination. So why bother with the lock? >> >> To avoid both threads reading the same seed value before generating the >> same number? I have not looked at the code, but that could be a reason >> for serializing. > > Hence the _thread-local_ gensym counters, which is what we're > discussing. Please read before you post. And there is no problem if one thread initializes its gensym counter from the random source without locking at the same time another thread is in the middle of initializing its gensym counter. It does not appear to me that there is any locking that would prevent both ending up with the same random value. -- David Kastrup ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-19 6:25 ` David Kastrup @ 2012-01-19 10:23 ` Mark H Weaver 2012-01-20 0:50 ` Mark H Weaver 0 siblings, 1 reply; 17+ messages in thread From: Mark H Weaver @ 2012-01-19 10:23 UTC (permalink / raw) To: David Kastrup; +Cc: guile-devel David Kastrup <dak@gnu.org> writes: > It does not appear to me that there is any locking that would prevent > both ending up with the same random value. The thread-local 128-bit gensym counters are initialized from /dev/urandom. The kernel ensures that each `read' gets freshly generated random bytes, so there's no issue here. According to Wikipedia, this should cover GNU/Linux, OpenBSD, FreeBSD, NetBSD, Solaris, Tru64 UNIX, AIX, HP-UX, and MacOS X. On platforms without /dev/urandom, I have a fallback that uses the time, date, process ID, and a high-resolution timer if available. In this case, depending on the resolution of the timers, it is indeed feasible for two threads to end up with the same seed value, which would be bad. I think we can solve this problem by including the address of the scm_i_thread structure into the fallback seed. Thanks, Mark ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-19 10:23 ` Mark H Weaver @ 2012-01-20 0:50 ` Mark H Weaver 0 siblings, 0 replies; 17+ messages in thread From: Mark H Weaver @ 2012-01-20 0:50 UTC (permalink / raw) To: David Kastrup; +Cc: guile-devel Replying to myself... > David Kastrup <dak@gnu.org> writes: >> It does not appear to me that there is any locking that would prevent >> both ending up with the same random value. > > The thread-local 128-bit gensym counters are initialized from > /dev/urandom. The kernel ensures that each `read' gets freshly > generated random bytes, so there's no issue here. According to > Wikipedia, this should cover GNU/Linux, OpenBSD, FreeBSD, NetBSD, > Solaris, Tru64 UNIX, AIX, HP-UX, and MacOS X. > > On platforms without /dev/urandom, I have a fallback that uses the time, > date, process ID, and a high-resolution timer if available. In this > case, depending on the resolution of the timers, it is indeed feasible > for two threads to end up with the same seed value, which would be bad. > > I think we can solve this problem by including the address of the > scm_i_thread structure into the fallback seed. Better yet, I now include two memory addresses in the fallback seed calculation: an address of a freshly allocated heap cell, and an address from the local stack frame. The latter is guaranteed to be different for different threads, since each thread has its own stack. This fallback is still not ideal, but it's probably good enough as a last resort. Thankfully, almost all systems now have /dev/urandom. We can add support for additional platform-specific entropy sources as needed. Mark ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-18 21:41 ` Ludovic Courtès 2012-01-18 22:23 ` Andy Wingo @ 2012-01-19 0:58 ` Mark H Weaver 2012-01-19 23:26 ` Ludovic Courtès 1 sibling, 1 reply; 17+ messages in thread From: Mark H Weaver @ 2012-01-19 0:58 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 1013 bytes --] Hi Ludovic, Thanks for the review! Attached below are updated patches, which I believe incorporate all of your suggestions. > This looks cool, though I must confess I don’t see why sizeof (int) is > not enough for everyone, nor whether this warrants adding all this code. For some applications, it is important for gensyms to be unique across multiple sessions. `local-eval' is one such application, but there are many others. > Apparently this doesn’t break the ABI, right? Good question! I investigated this, and AFAICT there should be no problem with changing the scm_i_thread structure. I see no public macros to access the structure members, no public APIs to retrieve a pointer to this structure, and nothing for a user to do with such a structure if he created one himself. Indeed, this structure definition should probably be put within: #ifdef BUILDING_LIBGUILE Anyway, here's the new patch. What do you think? Okay to push? :) Thanks! Mark [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: [PATCH 1/2] Add `random-state-from-platform' and `scm_i_random_bytes_from_platform' --] [-- Type: text/x-patch, Size: 5998 bytes --] From 55714abaf6c55f380bb68f456fdc6af923bae6f0 Mon Sep 17 00:00:00 2001 From: Mark H Weaver <mhw@netris.org> Date: Wed, 18 Jan 2012 02:36:17 -0500 Subject: [PATCH 1/2] Add `random-state-from-platform' and `scm_i_random_bytes_from_platform' * libguile/random.c (scm_random_state_from_platform): New procedure. (scm_i_random_bytes_from_platform): New internal function. * libguile/random.h (scm_random_state_from_platform, scm_i_random_bytes_from_platform): Add prototypes. * doc/ref/api-data.texi (Random): Add documentation. --- doc/ref/api-data.texi | 30 ++++++++-------------- libguile/random.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++ libguile/random.h | 3 ++ 3 files changed, 80 insertions(+), 19 deletions(-) diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi index f2450ce..e165559 100644 --- a/doc/ref/api-data.texi +++ b/doc/ref/api-data.texi @@ -1865,6 +1865,14 @@ Return a datum representation of @var{state} that may be written out and read back with the Scheme reader. @end deffn +@deffn {Scheme Procedure} random-state-from-platform +@deffnx {C Function} scm_random_state_from_platform () +Construct a new random state seeded from a platform-specific source of +entropy, appropriate for use in non-security-critical applications. +Currently @file{/dev/urandom} is tried first, or else the seed is based +on the time, date, process ID, and a high-resolution timer if available. +@end deffn + @defvar *random-state* The global random state used by the above functions when the @var{state} parameter is not given. @@ -1887,29 +1895,13 @@ Guile started up, will always give: (0 1 1 2 2 2 1 2 6 7 10 0 5 3 12 5 5 12) @end lisp -To use the time of day as the random seed, you can use code like this: - -@lisp -(let ((time (gettimeofday))) - (set! *random-state* - (seed->random-state (+ (car time) - (cdr time))))) -@end lisp - -@noindent -And then (depending on the time of day, of course): +To seed the random state in a sensible way for non-security-critical +applications, do this during initialization of your program: @lisp -(map random (cdr (iota 19))) -@result{} -(0 0 1 0 2 4 5 4 5 5 9 3 10 1 8 3 14 17) +(set! *random-state* (random-state-from-platform)) @end lisp -For security applications, such as password generation, you should use -more bits of seed. Otherwise an open source password generator could -be attacked by guessing the seed@dots{} but that's a subject for -another manual. - @node Characters @subsection Characters diff --git a/libguile/random.c b/libguile/random.c index 8bc0d87..5c7983a 100644 --- a/libguile/random.c +++ b/libguile/random.c @@ -653,6 +653,72 @@ SCM_DEFINE (scm_random_exp, "random:exp", 0, 1, 0, } #undef FUNC_NAME +/* Return a new random-state seeded from the time, date, process ID, + and a high-resolution timer if available. This is only to be used as + a last resort, when no better source of entropy is available. */ +static SCM +random_state_of_last_resort (void) +{ + SCM time_of_day = scm_gettimeofday (); + return scm_seed_to_random_state + (scm_sum + (scm_get_internal_real_time (), + scm_ash (scm_sum (scm_getpid (), + scm_ash (scm_sum (scm_cdr (time_of_day), + scm_ash (scm_car (time_of_day), + SCM_I_MAKINUM (20))), + SCM_I_MAKINUM (20))), + SCM_I_MAKINUM (64)))); +} + +/* Attempt to fill buffer with random bytes from /dev/urandom. + Return 1 if successful, else return 0. */ +static int +read_dev_urandom (unsigned char *buf, size_t len) +{ + size_t res = 0; + FILE *f = fopen ("/dev/urandom", "r"); + if (f) + { + res = fread(buf, 1, len, f); + fclose (f); + } + return (res == len); +} + +/* Fill a buffer with random bytes seeded from a platform-specific + source of entropy. /dev/urandom is used if available. Note that + this function provides no guarantees about the amount of entropy + present in the returned bytes. */ +void +scm_i_random_bytes_from_platform (unsigned char *buf, size_t len) +{ + if (read_dev_urandom (buf, len)) + return; + else /* FIXME: support other platform sources */ + { + /* When all else fails, use this (rather weak) fallback */ + SCM random_state = random_state_of_last_resort (); + int i; + for (i = len-1; i >= 0; --i) + buf[i] = scm_to_int (scm_random (SCM_I_MAKINUM (256), random_state)); + } +} + +SCM_DEFINE (scm_random_state_from_platform, "random-state-from-platform", 0, 0, 0, + (void), + "Construct a new random state seeded from a platform-specific\n\ +source of entropy, appropriate for use in non-security-critical applications.") +#define FUNC_NAME s_scm_random_state_from_platform +{ + unsigned char buf[32]; + if (read_dev_urandom (buf, sizeof(buf))) + return make_rstate (scm_c_make_rstate ((char *) buf, sizeof(buf))); + else + return random_state_of_last_resort (); +} +#undef FUNC_NAME + void scm_init_random () { diff --git a/libguile/random.h b/libguile/random.h index 2f1f0f6..109969e 100644 --- a/libguile/random.h +++ b/libguile/random.h @@ -86,6 +86,7 @@ SCM_API SCM scm_copy_random_state (SCM state); SCM_API SCM scm_seed_to_random_state (SCM seed); SCM_API SCM scm_datum_to_random_state (SCM datum); SCM_API SCM scm_random_state_to_datum (SCM state); +SCM_API SCM scm_random_state_from_platform (void); SCM_API SCM scm_random_uniform (SCM state); SCM_API SCM scm_random_solid_sphere_x (SCM v, SCM state); SCM_API SCM scm_random_hollow_sphere_x (SCM v, SCM state); @@ -94,6 +95,8 @@ SCM_API SCM scm_random_normal_vector_x (SCM v, SCM state); SCM_API SCM scm_random_exp (SCM state); SCM_INTERNAL void scm_init_random (void); +SCM_INTERNAL void scm_i_random_bytes_from_platform (unsigned char *buf, size_t len); + #endif /* SCM_RANDOM_H */ /* -- 1.7.5.4 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: [PATCH 2/2] Universally-unique gensyms --] [-- Type: text/x-patch, Size: 4495 bytes --] From 0b7603c90d17d5296a94ef7d22849c536e73b49a Mon Sep 17 00:00:00 2001 From: Mark H Weaver <mhw@netris.org> Date: Wed, 18 Jan 2012 02:53:05 -0500 Subject: [PATCH 2/2] Universally-unique gensyms * libguile/symbols.c (scm_gensym): Make the gensym counter a 128-bit thread-local, initialized to a random number upon the first call to `gensym' within a given thread. This counter is rendered as a 22 byte suffix of mostly base64 digits. * libguile/threads.h (scm_i_thread): Add a thread-local gensym_counter. * libguile/threads.c (guilify_self_1): Initialize gensym_counter to NULL. --- libguile/symbols.c | 50 +++++++++++++++++++++++++++++++++++++++----------- libguile/threads.c | 1 + libguile/threads.h | 4 ++++ 3 files changed, 44 insertions(+), 11 deletions(-) diff --git a/libguile/symbols.c b/libguile/symbols.c index 59aca00..840a458 100644 --- a/libguile/symbols.c +++ b/libguile/symbols.c @@ -31,6 +31,7 @@ #include "libguile/variable.h" #include "libguile/alist.h" #include "libguile/fluids.h" +#include "libguile/threads.h" #include "libguile/strings.h" #include "libguile/vectors.h" #include "libguile/hashtab.h" @@ -340,7 +341,9 @@ SCM_DEFINE (scm_string_ci_to_symbol, "string-ci->symbol", 1, 0, 0, /* The default prefix for `gensym'd symbols. */ static SCM default_gensym_prefix; -#define MAX_PREFIX_LENGTH 30 +#define GENSYM_LENGTH 22 /* bytes */ +#define GENSYM_RADIX_BITS 6 +#define GENSYM_RADIX (1 << (GENSYM_RADIX_BITS)) SCM_DEFINE (scm_gensym, "gensym", 0, 1, 0, (SCM prefix), @@ -351,22 +354,47 @@ SCM_DEFINE (scm_gensym, "gensym", 0, 1, 0, "resetting the counter.") #define FUNC_NAME s_scm_gensym { - static int gensym_counter = 0; - + static const char base64[GENSYM_RADIX] = + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789$@"; + static const char base4[4] = "_.-~"; + + unsigned char *digit_buf = SCM_I_CURRENT_THREAD->gensym_counter; + char char_buf[GENSYM_LENGTH]; SCM suffix, name; - int n, n_digits; - char buf[SCM_INTBUFLEN]; + int i; if (SCM_UNBNDP (prefix)) prefix = default_gensym_prefix; - /* mutex in case another thread looks and incs at the exact same moment */ - scm_i_scm_pthread_mutex_lock (&scm_i_misc_mutex); - n = gensym_counter++; - scm_i_pthread_mutex_unlock (&scm_i_misc_mutex); + if (SCM_UNLIKELY (digit_buf == NULL)) + { + /* This is the first time gensym has been called in this thread. + Allocate and randomize our new thread-local gensym counter */ + digit_buf = (unsigned char *) + scm_gc_malloc_pointerless (GENSYM_LENGTH, "gensym-counter"); + scm_i_random_bytes_from_platform (digit_buf, GENSYM_LENGTH); + for (i = (GENSYM_LENGTH - 1); i >= 0; --i) + digit_buf[i] &= (GENSYM_RADIX - 1); + SCM_I_CURRENT_THREAD->gensym_counter = digit_buf; + } + + /* Increment our thread-local gensym_counter. */ + for (i = (GENSYM_LENGTH - 1); i >= 0; --i) + { + if (SCM_LIKELY (++(digit_buf[i]) < GENSYM_RADIX)) + break; + else + digit_buf[i] = 0; + } + + /* Encode digit_buf as base64, except for the first character where we + use the sparse glyphs "_.-~" (base 4) to provide some visual + separation between the prefix and the dense base64 block. */ + for (i = (GENSYM_LENGTH - 1); i > 0; --i) + char_buf[i] = base64[digit_buf[i]]; + char_buf[0] = base4[digit_buf[0] & 3]; - n_digits = scm_iint2str (n, 10, buf); - suffix = scm_from_latin1_stringn (buf, n_digits); + suffix = scm_from_latin1_stringn (char_buf, GENSYM_LENGTH); name = scm_string_append (scm_list_2 (prefix, suffix)); return scm_string_to_symbol (name); } diff --git a/libguile/threads.c b/libguile/threads.c index 5a13e5c..67834ff 100644 --- a/libguile/threads.c +++ b/libguile/threads.c @@ -545,6 +545,7 @@ guilify_self_1 (struct GC_stack_base *base) t.join_queue = SCM_EOL; t.dynamic_state = SCM_BOOL_F; t.dynwinds = SCM_EOL; + t.gensym_counter = NULL; t.active_asyncs = SCM_EOL; t.block_asyncs = 1; t.pending_asyncs = 1; diff --git a/libguile/threads.h b/libguile/threads.h index ec129bc..3660a58 100644 --- a/libguile/threads.h +++ b/libguile/threads.h @@ -81,6 +81,10 @@ typedef struct scm_i_thread { SCM dynamic_state; SCM dynwinds; + /* Thread-local gensym counter. + */ + unsigned char *gensym_counter; + /* For system asyncs. */ SCM active_asyncs; /* The thunks to be run at the next -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-19 0:58 ` Mark H Weaver @ 2012-01-19 23:26 ` Ludovic Courtès 0 siblings, 0 replies; 17+ messages in thread From: Ludovic Courtès @ 2012-01-19 23:26 UTC (permalink / raw) To: Mark H Weaver; +Cc: guile-devel Hi! Mark H Weaver <mhw@netris.org> skribis: > Anyway, here's the new patch. What do you think? Okay to push? :) OK to push. Thank you! Ludo’. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Universally-unique gensyms 2012-01-18 8:04 ` Mark H Weaver 2012-01-18 9:35 ` Mark H Weaver 2012-01-18 21:41 ` Ludovic Courtès @ 2012-01-18 21:42 ` Ludovic Courtès 2 siblings, 0 replies; 17+ messages in thread From: Ludovic Courtès @ 2012-01-18 21:42 UTC (permalink / raw) To: guile-devel Oh, and they are (still) statistically unique, not “universally-unique.” Ludo’. ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2012-01-20 0:50 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-01-17 13:27 [PATCH] Universally-unique gensyms Mark H Weaver 2012-01-17 13:57 ` Andy Wingo 2012-01-17 14:06 ` Andy Wingo 2012-01-18 8:04 ` Mark H Weaver 2012-01-18 9:35 ` Mark H Weaver 2012-01-18 21:41 ` Ludovic Courtès 2012-01-18 22:23 ` Andy Wingo 2012-01-18 22:43 ` Ludovic Courtès 2012-01-18 23:30 ` Mark H Weaver 2012-01-19 1:21 ` David Kastrup 2012-01-19 4:50 ` Mark H Weaver 2012-01-19 6:25 ` David Kastrup 2012-01-19 10:23 ` Mark H Weaver 2012-01-20 0:50 ` Mark H Weaver 2012-01-19 0:58 ` Mark H Weaver 2012-01-19 23:26 ` Ludovic Courtès 2012-01-18 21:42 ` Ludovic Courtès
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).