unofficial mirror of bug-guile@gnu.org 
 help / color / mirror / Atom feed
* bug#20087: 'gensym' is not guaranteed to return a fresh symbol
@ 2015-03-11 17:15 Ludovic Courtès
  2016-03-18 17:03 ` bug#20087: gensym rain1
  0 siblings, 1 reply; 11+ messages in thread
From: Ludovic Courtès @ 2015-03-11 17:15 UTC (permalink / raw)
  To: 20087

‘gensym’ returns interned symbols, but the algorithm to determine the
new symbol is simplistic and predictable.

Thus, one can arrange to produce a symbol before ‘gensym’ does, leading
‘gensym’ to return a symbol that’s not fresh (in terms of ‘eq?’), as is
the case with the second call to ‘gensym’ here:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (gensym "x")
$1 = x379
scheme@(guile-user)> 'x405
$2 = x405
scheme@(guile-user)> (gensym "x")
$3 = x405
--8<---------------cut here---------------end--------------->8---

Should we worry about it?  I think it may have hard to anticipate
security implications.

Ludo’.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#20087: gensym
  2015-03-11 17:15 bug#20087: 'gensym' is not guaranteed to return a fresh symbol Ludovic Courtès
@ 2016-03-18 17:03 ` rain1
  2016-03-22  5:24   ` Mark H Weaver
  0 siblings, 1 reply; 11+ messages in thread
From: rain1 @ 2016-03-18 17:03 UTC (permalink / raw)
  To: 20087

[-- Attachment #1: Type: text/plain, Size: 257 bytes --]

Hello

I agree, this goes against the main assumption people have about gensym. 
I was able to reproduce the bug.

Here's a patch to libguile/symbol.c which fixes this behavior by 
incrementing the gensym counter in a loop until it creates a fresh 
symbol.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: guile-gensym-fix.diff --]
[-- Type: text/x-diff; name=guile-gensym-fix.diff, Size: 1396 bytes --]

diff --git a/libguile/symbols.c b/libguile/symbols.c
index 71d9827..2df7433 100644
--- a/libguile/symbols.c
+++ b/libguile/symbols.c
@@ -394,19 +394,31 @@ SCM_DEFINE (scm_gensym, "gensym", 0, 1, 0,
   SCM suffix, name;
   int n, n_digits;
   char buf[SCM_INTBUFLEN];
-
+  SCM symbol;
+  
   if (SCM_UNBNDP (prefix))
     prefix = default_gensym_prefix;
 
-  /* mutex in case another thread looks and incs at the exact same moment */
-  scm_i_scm_pthread_mutex_lock (&scm_i_misc_mutex);
-  n = gensym_counter++;
-  scm_i_pthread_mutex_unlock (&scm_i_misc_mutex);
-
-  n_digits = scm_iint2str (n, 10, buf);
-  suffix = scm_from_latin1_stringn (buf, n_digits);
-  name = scm_string_append (scm_list_2 (prefix, suffix));
-  return scm_string_to_symbol (name);
+  while(1) {
+    /* mutex in case another thread looks and incs at the exact same moment */
+    scm_i_scm_pthread_mutex_lock (&scm_i_misc_mutex);
+    n = gensym_counter++;
+    scm_i_pthread_mutex_unlock (&scm_i_misc_mutex);
+    
+    n_digits = scm_iint2str (n, 10, buf);
+    suffix = scm_from_latin1_stringn (buf, n_digits);
+    name = scm_string_append (scm_list_2 (prefix, suffix));
+    
+    symbol = lookup_interned_symbol (name, scm_i_string_hash (name));
+    if (scm_is_true (symbol))
+      {
+        continue;
+      }
+    else
+      {
+        return scm_string_to_symbol (name);
+      }
+  }
 }
 #undef FUNC_NAME
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* bug#20087: gensym
  2016-03-18 17:03 ` bug#20087: gensym rain1
@ 2016-03-22  5:24   ` Mark H Weaver
  2016-03-22  7:58     ` Ludovic Courtès
  2016-03-22 11:21     ` rain1
  0 siblings, 2 replies; 11+ messages in thread
From: Mark H Weaver @ 2016-03-22  5:24 UTC (permalink / raw)
  To: rain1; +Cc: 20087, Ludovic Courtès

ludo@gnu.org (Ludovic Courtès) writes:
> ‘gensym’ returns interned symbols, but the algorithm to determine the
> new symbol is simplistic and predictable.
> 
> Thus, one can arrange to produce a symbol before ‘gensym’ does, leading
> ‘gensym’ to return a symbol that’s not fresh (in terms of ‘eq?’), as is
> the case with the second call to ‘gensym’ here:

rain1@openmailbox.org writes:
> I agree, this goes against the main assumption people have about
> gensym. I was able to reproduce the bug.
>
> Here's a patch to libguile/symbol.c which fixes this behavior by
> incrementing the gensym counter in a loop until it creates a fresh
> symbol.

I've considered this idea in the past, but it only avoids collisions
with symbols that have been interned before the gensym.  It does not
avoid collisions with symbols interned *after* the gensym.  Obviously,
there's no way to avoid such collisions.

Therefore, we must unfortunately live with the possibility of
collisions.  Furthermore, if we add a requirement for deterministic
behavior (which I think we must), then we must live with the fact that
intentional collisions can be trivially achieved.

With this in mind, I'm not sure it makes sense to add code that
attempts, but fails, to eliminate the possibility of collisions.

If we cannot eliminate the possibility of collisions, and we cannot
avoid intentional collisions, what can we do?  I think the best we can
hope for is to significantly reduce the probability of _unintentional_
collisions, perhaps by starting the gensym counter at a large number.

The other thing we can do is to clearly document these inherent problems
with gensym, so that they will not be misused for jobs for which they
are not appropriate.

What do you think?

      Mark





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#20087: gensym
  2016-03-22  5:24   ` Mark H Weaver
@ 2016-03-22  7:58     ` Ludovic Courtès
  2016-03-23 17:55       ` Mark H Weaver
  2016-03-22 11:21     ` rain1
  1 sibling, 1 reply; 11+ messages in thread
From: Ludovic Courtès @ 2016-03-22  7:58 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: rain1, 20087

Mark H Weaver <mhw@netris.org> skribis:

> I've considered this idea in the past, but it only avoids collisions
> with symbols that have been interned before the gensym.  It does not
> avoid collisions with symbols interned *after* the gensym.  Obviously,
> there's no way to avoid such collisions.

Yeah, good point.

> If we cannot eliminate the possibility of collisions, and we cannot
> avoid intentional collisions, what can we do?  I think the best we can
> hope for is to significantly reduce the probability of _unintentional_
> collisions, perhaps by starting the gensym counter at a large number.

I’m not sure if that would help.

One thing that could help avoid unintentional collisions is to
automatically add whitespace before the number, such that:

  (gensym "x") => #{x 123}#

(This is already the case when called with no arguments.)

> The other thing we can do is to clearly document these inherent problems
> with gensym, so that they will not be misused for jobs for which they
> are not appropriate.

I think we should add a sentence to that effect in the manual.

Thanks,
Ludo’.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#20087: gensym
  2016-03-22  5:24   ` Mark H Weaver
  2016-03-22  7:58     ` Ludovic Courtès
@ 2016-03-22 11:21     ` rain1
  2016-03-22 18:06       ` Mark H Weaver
  1 sibling, 1 reply; 11+ messages in thread
From: rain1 @ 2016-03-22 11:21 UTC (permalink / raw)
  To: 20087

On 2016-03-22 05:24, Mark H Weaver wrote:
> ludo@gnu.org (Ludovic Courtès) writes:
>> ‘gensym’ returns interned symbols, but the algorithm to determine the
>> new symbol is simplistic and predictable.
>> 
>> Thus, one can arrange to produce a symbol before ‘gensym’ does, 
>> leading
>> ‘gensym’ to return a symbol that’s not fresh (in terms of ‘eq?’), as 
>> is
>> the case with the second call to ‘gensym’ here:
> 
> rain1@openmailbox.org writes:
>> I agree, this goes against the main assumption people have about
>> gensym. I was able to reproduce the bug.
>> 
>> Here's a patch to libguile/symbol.c which fixes this behavior by
>> incrementing the gensym counter in a loop until it creates a fresh
>> symbol.
> 
> I've considered this idea in the past, but it only avoids collisions
> with symbols that have been interned before the gensym.  It does not
> avoid collisions with symbols interned *after* the gensym.  Obviously,
> there's no way to avoid such collisions.

Thanks for looking over the patch I sent!

One expects of gensym to create a fresh symbol, something not EQ? to any 
symbol that already exists. It is an important property to be able to 
rely on and this patch achieves that.

About symbols interned after, would that refer to something like this:

------------------------
scheme@(guile-user)> (define a (gensym "x"))
scheme@(guile-user)> a
$1 = x280
scheme@(guile-user)> (eq? a (string->symbol "x280"))
$2 = #t
------------------------

In most lisps gensym creates an uninterned symbol. I think that would 
stop the previous giving #t. I could write a patch for this if wanted.






^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#20087: gensym
  2016-03-22 11:21     ` rain1
@ 2016-03-22 18:06       ` Mark H Weaver
  0 siblings, 0 replies; 11+ messages in thread
From: Mark H Weaver @ 2016-03-22 18:06 UTC (permalink / raw)
  To: rain1; +Cc: 20087

rain1@openmailbox.org writes:

> On 2016-03-22 05:24, Mark H Weaver wrote:
>> ludo@gnu.org (Ludovic Courtès) writes:
>>> ‘gensym’ returns interned symbols, but the algorithm to determine the
>>> new symbol is simplistic and predictable.
>>>
>>> Thus, one can arrange to produce a symbol before ‘gensym’ does,
>>> leading
>>> ‘gensym’ to return a symbol that’s not fresh (in terms of ‘eq?’),
>>> as is
>>> the case with the second call to ‘gensym’ here:
>>
>> rain1@openmailbox.org writes:
>>> I agree, this goes against the main assumption people have about
>>> gensym. I was able to reproduce the bug.
>>>
>>> Here's a patch to libguile/symbol.c which fixes this behavior by
>>> incrementing the gensym counter in a loop until it creates a fresh
>>> symbol.
>>
>> I've considered this idea in the past, but it only avoids collisions
>> with symbols that have been interned before the gensym.  It does not
>> avoid collisions with symbols interned *after* the gensym.  Obviously,
>> there's no way to avoid such collisions.
>
> Thanks for looking over the patch I sent!
>
> One expects of gensym to create a fresh symbol, something not EQ? to
> any symbol that already exists. It is an important property to be able
> to rely on and this patch achieves that.

Can you give a (non-contrived) example of an application that requires
the property you stated above, but does not rely on avoiding collisions
with symbols interned after the gensym?

I’m open to the idea that such applications exist, but at the moment I
cannot think of one :)

> About symbols interned after, would that refer to something like this:
>
> ------------------------
> scheme@(guile-user)> (define a (gensym "x"))
> scheme@(guile-user)> a
> $1 = x280
> scheme@(guile-user)> (eq? a (string->symbol "x280"))
> $2 = #t
> ------------------------

Right.  Another example would be using ‘read’ after the gensym, on input
that contains a symbol of the same name.

> In most lisps gensym creates an uninterned symbol. I think that would
> stop the previous giving #t.

Indeed, it would solve this problem, but we cannot change the behavior
of Guile's ‘gensym’ in this way, since it would break a lot of existing
code.

By the way, I looked at our manual entry for ‘gensym’, and it includes
the following text:

     The symbols generated by ‘gensym’ are _likely_ to be unique, since
  their names begin with a space and it is only otherwise possible to
  generate such symbols if a programmer goes out of their way to do so.
  Uniqueness can be guaranteed by instead using uninterned symbols
  (*noteSymbol Uninterned::), though they can’t be usefully written out
  and read back in.

We have ‘make-symbol’ for creating uninterned symbols, although you must
provide the exact name of the returned symbol.

> I could write a patch for this if wanted.

It would be nice to have another procedure, maybe ‘uninterned-gensym’
(I’m not sure what to call it, names are hard :) which would be like
‘gensym’ but would return an uninterned symbol, and thus reliably avoid
collisions.

If you’d like to contribute such a procedure, that would be welcome.

It is our policy to ask contributors to assign copyright to the Free
Software Foundation.  Would you be willing to do this?

      Mark





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#20087: gensym
  2016-03-22  7:58     ` Ludovic Courtès
@ 2016-03-23 17:55       ` Mark H Weaver
  2016-03-24  8:45         ` Ludovic Courtès
  0 siblings, 1 reply; 11+ messages in thread
From: Mark H Weaver @ 2016-03-23 17:55 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: rain1, 20087

ludo@gnu.org (Ludovic Courtès) writes:

> Mark H Weaver <mhw@netris.org> skribis:
>
>> If we cannot eliminate the possibility of collisions, and we cannot
>> avoid intentional collisions, what can we do?  I think the best we can
>> hope for is to significantly reduce the probability of _unintentional_
>> collisions, perhaps by starting the gensym counter at a large number.
>
> I’m not sure if that would help.
>
> One thing that could help avoid unintentional collisions is to
> automatically add whitespace before the number, such that:
>
>   (gensym "x") => #{x 123}#

I think this is a good idea.

>> The other thing we can do is to clearly document these inherent problems
>> with gensym, so that they will not be misused for jobs for which they
>> are not appropriate.
>
> I think we should add a sentence to that effect in the manual.

It turns out the manual already has the following text in the ‘gensym’
entry, which I think is sufficient.

     The symbols generated by ‘gensym’ are _likely_ to be unique, since
  their names begin with a space and it is only otherwise possible to
  generate such symbols if a programmer goes out of their way to do so.
  Uniqueness can be guaranteed by instead using uninterned symbols
  (*noteSymbol Uninterned::), though they can’t be usefully written out
  and read back in.

What do you think?

      Mark





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#20087: gensym
  2016-03-23 17:55       ` Mark H Weaver
@ 2016-03-24  8:45         ` Ludovic Courtès
  2016-06-23 13:48           ` Andy Wingo
  0 siblings, 1 reply; 11+ messages in thread
From: Ludovic Courtès @ 2016-03-24  8:45 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: rain1, 20087

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Mark H Weaver <mhw@netris.org> skribis:
>>
>>> If we cannot eliminate the possibility of collisions, and we cannot
>>> avoid intentional collisions, what can we do?  I think the best we can
>>> hope for is to significantly reduce the probability of _unintentional_
>>> collisions, perhaps by starting the gensym counter at a large number.
>>
>> I’m not sure if that would help.
>>
>> One thing that could help avoid unintentional collisions is to
>> automatically add whitespace before the number, such that:
>>
>>   (gensym "x") => #{x 123}#
>
> I think this is a good idea.
>
>>> The other thing we can do is to clearly document these inherent problems
>>> with gensym, so that they will not be misused for jobs for which they
>>> are not appropriate.
>>
>> I think we should add a sentence to that effect in the manual.
>
> It turns out the manual already has the following text in the ‘gensym’
> entry, which I think is sufficient.
>
>      The symbols generated by ‘gensym’ are _likely_ to be unique, since
>   their names begin with a space and it is only otherwise possible to
>   generate such symbols if a programmer goes out of their way to do so.
>   Uniqueness can be guaranteed by instead using uninterned symbols
>   (*noteSymbol Uninterned::), though they can’t be usefully written out
>   and read back in.
>
> What do you think?

Oh indeed, I guess I had overlooked that.

Ludo’.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#20087: gensym
  2016-03-24  8:45         ` Ludovic Courtès
@ 2016-06-23 13:48           ` Andy Wingo
  2016-06-23 14:13             ` Ludovic Courtès
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Wingo @ 2016-06-23 13:48 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 20087, rain1

On Thu 24 Mar 2016 09:45, ludo@gnu.org (Ludovic Courtès) writes:

> Mark H Weaver <mhw@netris.org> skribis:
>
>> It turns out the manual already has the following text in the ‘gensym’
>> entry, which I think is sufficient.
>>
>>      The symbols generated by ‘gensym’ are _likely_ to be unique, since
>>   their names begin with a space and it is only otherwise possible to
>>   generate such symbols if a programmer goes out of their way to do so.
>>   Uniqueness can be guaranteed by instead using uninterned symbols
>>   (*noteSymbol Uninterned::), though they can’t be usefully written out
>>   and read back in.
>>
>> What do you think?
>
> Oh indeed, I guess I had overlooked that.

I just pushed something to master to error when serializing an
uninterned symbol.  Otherwise compiling an uninterned symbol effectively
interns it!  I am not sure that we can apply such a fix in 2.0 though as
who knows, maybe someone is compiling something with symbols made with
make-symbol.  WDYT?  If you agree we can close this bug.

Andy





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#20087: gensym
  2016-06-23 13:48           ` Andy Wingo
@ 2016-06-23 14:13             ` Ludovic Courtès
  2016-06-23 16:05               ` Andy Wingo
  0 siblings, 1 reply; 11+ messages in thread
From: Ludovic Courtès @ 2016-06-23 14:13 UTC (permalink / raw)
  To: Andy Wingo; +Cc: 20087, rain1

Andy Wingo <wingo@pobox.com> skribis:

> On Thu 24 Mar 2016 09:45, ludo@gnu.org (Ludovic Courtès) writes:
>
>> Mark H Weaver <mhw@netris.org> skribis:
>>
>>> It turns out the manual already has the following text in the ‘gensym’
>>> entry, which I think is sufficient.
>>>
>>>      The symbols generated by ‘gensym’ are _likely_ to be unique, since
>>>   their names begin with a space and it is only otherwise possible to
>>>   generate such symbols if a programmer goes out of their way to do so.
>>>   Uniqueness can be guaranteed by instead using uninterned symbols
>>>   (*noteSymbol Uninterned::), though they can’t be usefully written out
>>>   and read back in.
>>>
>>> What do you think?
>>
>> Oh indeed, I guess I had overlooked that.
>
> I just pushed something to master to error when serializing an
> uninterned symbol.  Otherwise compiling an uninterned symbol effectively
> interns it!  I am not sure that we can apply such a fix in 2.0 though as
> who knows, maybe someone is compiling something with symbols made with
> make-symbol.  WDYT?  If you agree we can close this bug.

That makes sense to me.

Thanks!
Ludo’.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#20087: gensym
  2016-06-23 14:13             ` Ludovic Courtès
@ 2016-06-23 16:05               ` Andy Wingo
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Wingo @ 2016-06-23 16:05 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 20087-done, rain1

On Thu 23 Jun 2016 16:13, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> skribis:
>
>> I just pushed something to master to error when serializing an
>> uninterned symbol.  Otherwise compiling an uninterned symbol effectively
>> interns it!  I am not sure that we can apply such a fix in 2.0 though as
>> who knows, maybe someone is compiling something with symbols made with
>> make-symbol.  WDYT?  If you agree we can close this bug.
>
> That makes sense to me.

Closing then.  Thanks for the review :)

Andy





^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-06-23 16:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-11 17:15 bug#20087: 'gensym' is not guaranteed to return a fresh symbol Ludovic Courtès
2016-03-18 17:03 ` bug#20087: gensym rain1
2016-03-22  5:24   ` Mark H Weaver
2016-03-22  7:58     ` Ludovic Courtès
2016-03-23 17:55       ` Mark H Weaver
2016-03-24  8:45         ` Ludovic Courtès
2016-06-23 13:48           ` Andy Wingo
2016-06-23 14:13             ` Ludovic Courtès
2016-06-23 16:05               ` Andy Wingo
2016-03-22 11:21     ` rain1
2016-03-22 18:06       ` Mark H Weaver

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).