* bug#20087: 'gensym' is not guaranteed to return a fresh symbol @ 2015-03-11 17:15 Ludovic Courtès 2016-03-18 17:03 ` bug#20087: gensym rain1 0 siblings, 1 reply; 11+ messages in thread From: Ludovic Courtès @ 2015-03-11 17:15 UTC (permalink / raw) To: 20087 ‘gensym’ returns interned symbols, but the algorithm to determine the new symbol is simplistic and predictable. Thus, one can arrange to produce a symbol before ‘gensym’ does, leading ‘gensym’ to return a symbol that’s not fresh (in terms of ‘eq?’), as is the case with the second call to ‘gensym’ here: --8<---------------cut here---------------start------------->8--- scheme@(guile-user)> (gensym "x") $1 = x379 scheme@(guile-user)> 'x405 $2 = x405 scheme@(guile-user)> (gensym "x") $3 = x405 --8<---------------cut here---------------end--------------->8--- Should we worry about it? I think it may have hard to anticipate security implications. Ludo’. ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#20087: gensym 2015-03-11 17:15 bug#20087: 'gensym' is not guaranteed to return a fresh symbol Ludovic Courtès @ 2016-03-18 17:03 ` rain1 2016-03-22 5:24 ` Mark H Weaver 0 siblings, 1 reply; 11+ messages in thread From: rain1 @ 2016-03-18 17:03 UTC (permalink / raw) To: 20087 [-- Attachment #1: Type: text/plain, Size: 257 bytes --] Hello I agree, this goes against the main assumption people have about gensym. I was able to reproduce the bug. Here's a patch to libguile/symbol.c which fixes this behavior by incrementing the gensym counter in a loop until it creates a fresh symbol. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: guile-gensym-fix.diff --] [-- Type: text/x-diff; name=guile-gensym-fix.diff, Size: 1396 bytes --] diff --git a/libguile/symbols.c b/libguile/symbols.c index 71d9827..2df7433 100644 --- a/libguile/symbols.c +++ b/libguile/symbols.c @@ -394,19 +394,31 @@ SCM_DEFINE (scm_gensym, "gensym", 0, 1, 0, SCM suffix, name; int n, n_digits; char buf[SCM_INTBUFLEN]; - + SCM symbol; + if (SCM_UNBNDP (prefix)) prefix = default_gensym_prefix; - /* mutex in case another thread looks and incs at the exact same moment */ - scm_i_scm_pthread_mutex_lock (&scm_i_misc_mutex); - n = gensym_counter++; - scm_i_pthread_mutex_unlock (&scm_i_misc_mutex); - - n_digits = scm_iint2str (n, 10, buf); - suffix = scm_from_latin1_stringn (buf, n_digits); - name = scm_string_append (scm_list_2 (prefix, suffix)); - return scm_string_to_symbol (name); + while(1) { + /* mutex in case another thread looks and incs at the exact same moment */ + scm_i_scm_pthread_mutex_lock (&scm_i_misc_mutex); + n = gensym_counter++; + scm_i_pthread_mutex_unlock (&scm_i_misc_mutex); + + n_digits = scm_iint2str (n, 10, buf); + suffix = scm_from_latin1_stringn (buf, n_digits); + name = scm_string_append (scm_list_2 (prefix, suffix)); + + symbol = lookup_interned_symbol (name, scm_i_string_hash (name)); + if (scm_is_true (symbol)) + { + continue; + } + else + { + return scm_string_to_symbol (name); + } + } } #undef FUNC_NAME ^ permalink raw reply related [flat|nested] 11+ messages in thread
* bug#20087: gensym 2016-03-18 17:03 ` bug#20087: gensym rain1 @ 2016-03-22 5:24 ` Mark H Weaver 2016-03-22 7:58 ` Ludovic Courtès 2016-03-22 11:21 ` rain1 0 siblings, 2 replies; 11+ messages in thread From: Mark H Weaver @ 2016-03-22 5:24 UTC (permalink / raw) To: rain1; +Cc: 20087, Ludovic Courtès ludo@gnu.org (Ludovic Courtès) writes: > ‘gensym’ returns interned symbols, but the algorithm to determine the > new symbol is simplistic and predictable. > > Thus, one can arrange to produce a symbol before ‘gensym’ does, leading > ‘gensym’ to return a symbol that’s not fresh (in terms of ‘eq?’), as is > the case with the second call to ‘gensym’ here: rain1@openmailbox.org writes: > I agree, this goes against the main assumption people have about > gensym. I was able to reproduce the bug. > > Here's a patch to libguile/symbol.c which fixes this behavior by > incrementing the gensym counter in a loop until it creates a fresh > symbol. I've considered this idea in the past, but it only avoids collisions with symbols that have been interned before the gensym. It does not avoid collisions with symbols interned *after* the gensym. Obviously, there's no way to avoid such collisions. Therefore, we must unfortunately live with the possibility of collisions. Furthermore, if we add a requirement for deterministic behavior (which I think we must), then we must live with the fact that intentional collisions can be trivially achieved. With this in mind, I'm not sure it makes sense to add code that attempts, but fails, to eliminate the possibility of collisions. If we cannot eliminate the possibility of collisions, and we cannot avoid intentional collisions, what can we do? I think the best we can hope for is to significantly reduce the probability of _unintentional_ collisions, perhaps by starting the gensym counter at a large number. The other thing we can do is to clearly document these inherent problems with gensym, so that they will not be misused for jobs for which they are not appropriate. What do you think? Mark ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#20087: gensym 2016-03-22 5:24 ` Mark H Weaver @ 2016-03-22 7:58 ` Ludovic Courtès 2016-03-23 17:55 ` Mark H Weaver 2016-03-22 11:21 ` rain1 1 sibling, 1 reply; 11+ messages in thread From: Ludovic Courtès @ 2016-03-22 7:58 UTC (permalink / raw) To: Mark H Weaver; +Cc: rain1, 20087 Mark H Weaver <mhw@netris.org> skribis: > I've considered this idea in the past, but it only avoids collisions > with symbols that have been interned before the gensym. It does not > avoid collisions with symbols interned *after* the gensym. Obviously, > there's no way to avoid such collisions. Yeah, good point. > If we cannot eliminate the possibility of collisions, and we cannot > avoid intentional collisions, what can we do? I think the best we can > hope for is to significantly reduce the probability of _unintentional_ > collisions, perhaps by starting the gensym counter at a large number. I’m not sure if that would help. One thing that could help avoid unintentional collisions is to automatically add whitespace before the number, such that: (gensym "x") => #{x 123}# (This is already the case when called with no arguments.) > The other thing we can do is to clearly document these inherent problems > with gensym, so that they will not be misused for jobs for which they > are not appropriate. I think we should add a sentence to that effect in the manual. Thanks, Ludo’. ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#20087: gensym 2016-03-22 7:58 ` Ludovic Courtès @ 2016-03-23 17:55 ` Mark H Weaver 2016-03-24 8:45 ` Ludovic Courtès 0 siblings, 1 reply; 11+ messages in thread From: Mark H Weaver @ 2016-03-23 17:55 UTC (permalink / raw) To: Ludovic Courtès; +Cc: rain1, 20087 ludo@gnu.org (Ludovic Courtès) writes: > Mark H Weaver <mhw@netris.org> skribis: > >> If we cannot eliminate the possibility of collisions, and we cannot >> avoid intentional collisions, what can we do? I think the best we can >> hope for is to significantly reduce the probability of _unintentional_ >> collisions, perhaps by starting the gensym counter at a large number. > > I’m not sure if that would help. > > One thing that could help avoid unintentional collisions is to > automatically add whitespace before the number, such that: > > (gensym "x") => #{x 123}# I think this is a good idea. >> The other thing we can do is to clearly document these inherent problems >> with gensym, so that they will not be misused for jobs for which they >> are not appropriate. > > I think we should add a sentence to that effect in the manual. It turns out the manual already has the following text in the ‘gensym’ entry, which I think is sufficient. The symbols generated by ‘gensym’ are _likely_ to be unique, since their names begin with a space and it is only otherwise possible to generate such symbols if a programmer goes out of their way to do so. Uniqueness can be guaranteed by instead using uninterned symbols (*noteSymbol Uninterned::), though they can’t be usefully written out and read back in. What do you think? Mark ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#20087: gensym 2016-03-23 17:55 ` Mark H Weaver @ 2016-03-24 8:45 ` Ludovic Courtès 2016-06-23 13:48 ` Andy Wingo 0 siblings, 1 reply; 11+ messages in thread From: Ludovic Courtès @ 2016-03-24 8:45 UTC (permalink / raw) To: Mark H Weaver; +Cc: rain1, 20087 Mark H Weaver <mhw@netris.org> skribis: > ludo@gnu.org (Ludovic Courtès) writes: > >> Mark H Weaver <mhw@netris.org> skribis: >> >>> If we cannot eliminate the possibility of collisions, and we cannot >>> avoid intentional collisions, what can we do? I think the best we can >>> hope for is to significantly reduce the probability of _unintentional_ >>> collisions, perhaps by starting the gensym counter at a large number. >> >> I’m not sure if that would help. >> >> One thing that could help avoid unintentional collisions is to >> automatically add whitespace before the number, such that: >> >> (gensym "x") => #{x 123}# > > I think this is a good idea. > >>> The other thing we can do is to clearly document these inherent problems >>> with gensym, so that they will not be misused for jobs for which they >>> are not appropriate. >> >> I think we should add a sentence to that effect in the manual. > > It turns out the manual already has the following text in the ‘gensym’ > entry, which I think is sufficient. > > The symbols generated by ‘gensym’ are _likely_ to be unique, since > their names begin with a space and it is only otherwise possible to > generate such symbols if a programmer goes out of their way to do so. > Uniqueness can be guaranteed by instead using uninterned symbols > (*noteSymbol Uninterned::), though they can’t be usefully written out > and read back in. > > What do you think? Oh indeed, I guess I had overlooked that. Ludo’. ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#20087: gensym 2016-03-24 8:45 ` Ludovic Courtès @ 2016-06-23 13:48 ` Andy Wingo 2016-06-23 14:13 ` Ludovic Courtès 0 siblings, 1 reply; 11+ messages in thread From: Andy Wingo @ 2016-06-23 13:48 UTC (permalink / raw) To: Ludovic Courtès; +Cc: 20087, rain1 On Thu 24 Mar 2016 09:45, ludo@gnu.org (Ludovic Courtès) writes: > Mark H Weaver <mhw@netris.org> skribis: > >> It turns out the manual already has the following text in the ‘gensym’ >> entry, which I think is sufficient. >> >> The symbols generated by ‘gensym’ are _likely_ to be unique, since >> their names begin with a space and it is only otherwise possible to >> generate such symbols if a programmer goes out of their way to do so. >> Uniqueness can be guaranteed by instead using uninterned symbols >> (*noteSymbol Uninterned::), though they can’t be usefully written out >> and read back in. >> >> What do you think? > > Oh indeed, I guess I had overlooked that. I just pushed something to master to error when serializing an uninterned symbol. Otherwise compiling an uninterned symbol effectively interns it! I am not sure that we can apply such a fix in 2.0 though as who knows, maybe someone is compiling something with symbols made with make-symbol. WDYT? If you agree we can close this bug. Andy ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#20087: gensym 2016-06-23 13:48 ` Andy Wingo @ 2016-06-23 14:13 ` Ludovic Courtès 2016-06-23 16:05 ` Andy Wingo 0 siblings, 1 reply; 11+ messages in thread From: Ludovic Courtès @ 2016-06-23 14:13 UTC (permalink / raw) To: Andy Wingo; +Cc: 20087, rain1 Andy Wingo <wingo@pobox.com> skribis: > On Thu 24 Mar 2016 09:45, ludo@gnu.org (Ludovic Courtès) writes: > >> Mark H Weaver <mhw@netris.org> skribis: >> >>> It turns out the manual already has the following text in the ‘gensym’ >>> entry, which I think is sufficient. >>> >>> The symbols generated by ‘gensym’ are _likely_ to be unique, since >>> their names begin with a space and it is only otherwise possible to >>> generate such symbols if a programmer goes out of their way to do so. >>> Uniqueness can be guaranteed by instead using uninterned symbols >>> (*noteSymbol Uninterned::), though they can’t be usefully written out >>> and read back in. >>> >>> What do you think? >> >> Oh indeed, I guess I had overlooked that. > > I just pushed something to master to error when serializing an > uninterned symbol. Otherwise compiling an uninterned symbol effectively > interns it! I am not sure that we can apply such a fix in 2.0 though as > who knows, maybe someone is compiling something with symbols made with > make-symbol. WDYT? If you agree we can close this bug. That makes sense to me. Thanks! Ludo’. ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#20087: gensym 2016-06-23 14:13 ` Ludovic Courtès @ 2016-06-23 16:05 ` Andy Wingo 0 siblings, 0 replies; 11+ messages in thread From: Andy Wingo @ 2016-06-23 16:05 UTC (permalink / raw) To: Ludovic Courtès; +Cc: 20087-done, rain1 On Thu 23 Jun 2016 16:13, ludo@gnu.org (Ludovic Courtès) writes: > Andy Wingo <wingo@pobox.com> skribis: > >> I just pushed something to master to error when serializing an >> uninterned symbol. Otherwise compiling an uninterned symbol effectively >> interns it! I am not sure that we can apply such a fix in 2.0 though as >> who knows, maybe someone is compiling something with symbols made with >> make-symbol. WDYT? If you agree we can close this bug. > > That makes sense to me. Closing then. Thanks for the review :) Andy ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#20087: gensym 2016-03-22 5:24 ` Mark H Weaver 2016-03-22 7:58 ` Ludovic Courtès @ 2016-03-22 11:21 ` rain1 2016-03-22 18:06 ` Mark H Weaver 1 sibling, 1 reply; 11+ messages in thread From: rain1 @ 2016-03-22 11:21 UTC (permalink / raw) To: 20087 On 2016-03-22 05:24, Mark H Weaver wrote: > ludo@gnu.org (Ludovic Courtès) writes: >> ‘gensym’ returns interned symbols, but the algorithm to determine the >> new symbol is simplistic and predictable. >> >> Thus, one can arrange to produce a symbol before ‘gensym’ does, >> leading >> ‘gensym’ to return a symbol that’s not fresh (in terms of ‘eq?’), as >> is >> the case with the second call to ‘gensym’ here: > > rain1@openmailbox.org writes: >> I agree, this goes against the main assumption people have about >> gensym. I was able to reproduce the bug. >> >> Here's a patch to libguile/symbol.c which fixes this behavior by >> incrementing the gensym counter in a loop until it creates a fresh >> symbol. > > I've considered this idea in the past, but it only avoids collisions > with symbols that have been interned before the gensym. It does not > avoid collisions with symbols interned *after* the gensym. Obviously, > there's no way to avoid such collisions. Thanks for looking over the patch I sent! One expects of gensym to create a fresh symbol, something not EQ? to any symbol that already exists. It is an important property to be able to rely on and this patch achieves that. About symbols interned after, would that refer to something like this: ------------------------ scheme@(guile-user)> (define a (gensym "x")) scheme@(guile-user)> a $1 = x280 scheme@(guile-user)> (eq? a (string->symbol "x280")) $2 = #t ------------------------ In most lisps gensym creates an uninterned symbol. I think that would stop the previous giving #t. I could write a patch for this if wanted. ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#20087: gensym 2016-03-22 11:21 ` rain1 @ 2016-03-22 18:06 ` Mark H Weaver 0 siblings, 0 replies; 11+ messages in thread From: Mark H Weaver @ 2016-03-22 18:06 UTC (permalink / raw) To: rain1; +Cc: 20087 rain1@openmailbox.org writes: > On 2016-03-22 05:24, Mark H Weaver wrote: >> ludo@gnu.org (Ludovic Courtès) writes: >>> ‘gensym’ returns interned symbols, but the algorithm to determine the >>> new symbol is simplistic and predictable. >>> >>> Thus, one can arrange to produce a symbol before ‘gensym’ does, >>> leading >>> ‘gensym’ to return a symbol that’s not fresh (in terms of ‘eq?’), >>> as is >>> the case with the second call to ‘gensym’ here: >> >> rain1@openmailbox.org writes: >>> I agree, this goes against the main assumption people have about >>> gensym. I was able to reproduce the bug. >>> >>> Here's a patch to libguile/symbol.c which fixes this behavior by >>> incrementing the gensym counter in a loop until it creates a fresh >>> symbol. >> >> I've considered this idea in the past, but it only avoids collisions >> with symbols that have been interned before the gensym. It does not >> avoid collisions with symbols interned *after* the gensym. Obviously, >> there's no way to avoid such collisions. > > Thanks for looking over the patch I sent! > > One expects of gensym to create a fresh symbol, something not EQ? to > any symbol that already exists. It is an important property to be able > to rely on and this patch achieves that. Can you give a (non-contrived) example of an application that requires the property you stated above, but does not rely on avoiding collisions with symbols interned after the gensym? I’m open to the idea that such applications exist, but at the moment I cannot think of one :) > About symbols interned after, would that refer to something like this: > > ------------------------ > scheme@(guile-user)> (define a (gensym "x")) > scheme@(guile-user)> a > $1 = x280 > scheme@(guile-user)> (eq? a (string->symbol "x280")) > $2 = #t > ------------------------ Right. Another example would be using ‘read’ after the gensym, on input that contains a symbol of the same name. > In most lisps gensym creates an uninterned symbol. I think that would > stop the previous giving #t. Indeed, it would solve this problem, but we cannot change the behavior of Guile's ‘gensym’ in this way, since it would break a lot of existing code. By the way, I looked at our manual entry for ‘gensym’, and it includes the following text: The symbols generated by ‘gensym’ are _likely_ to be unique, since their names begin with a space and it is only otherwise possible to generate such symbols if a programmer goes out of their way to do so. Uniqueness can be guaranteed by instead using uninterned symbols (*noteSymbol Uninterned::), though they can’t be usefully written out and read back in. We have ‘make-symbol’ for creating uninterned symbols, although you must provide the exact name of the returned symbol. > I could write a patch for this if wanted. It would be nice to have another procedure, maybe ‘uninterned-gensym’ (I’m not sure what to call it, names are hard :) which would be like ‘gensym’ but would return an uninterned symbol, and thus reliably avoid collisions. If you’d like to contribute such a procedure, that would be welcome. It is our policy to ask contributors to assign copyright to the Free Software Foundation. Would you be willing to do this? Mark ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2016-06-23 16:05 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-03-11 17:15 bug#20087: 'gensym' is not guaranteed to return a fresh symbol Ludovic Courtès 2016-03-18 17:03 ` bug#20087: gensym rain1 2016-03-22 5:24 ` Mark H Weaver 2016-03-22 7:58 ` Ludovic Courtès 2016-03-23 17:55 ` Mark H Weaver 2016-03-24 8:45 ` Ludovic Courtès 2016-06-23 13:48 ` Andy Wingo 2016-06-23 14:13 ` Ludovic Courtès 2016-06-23 16:05 ` Andy Wingo 2016-03-22 11:21 ` rain1 2016-03-22 18:06 ` Mark H Weaver
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).