* Improve `seed->random-state' in stable-2.0?
@ 2012-01-20 3:28 Mark H Weaver
2012-01-20 8:37 ` David Kastrup
2012-01-20 14:54 ` Andy Wingo
0 siblings, 2 replies; 15+ messages in thread
From: Mark H Weaver @ 2012-01-20 3:28 UTC (permalink / raw)
To: guile-devel
Hello all,
`seed->random-state' is poorly implemented when passed a numeric
argument. It converts the number to a decimal string, and then
`scm_i_init_rstate' takes over and basically adds every 8th byte
together to form the 64-bit internal state.
The problem is that only about 3-bits of entropy is present in each
character of the string, and then these are added together in an aligned
fashion, which also loses a great deal of entropy. I haven't analyzed
this carefully, but I'd be surprised if we preserve much more than
32-bits of entropy from the original seed.
I know how to make this _much_ better, but here's the question: is it
okay to change the behavior of the random number generator in 2.0? Or
is it important that the same sequence of random numbers are generated
from a given seed in the entire stable-2.0 series?
Mark
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improve `seed->random-state' in stable-2.0?
2012-01-20 3:28 Improve `seed->random-state' in stable-2.0? Mark H Weaver
@ 2012-01-20 8:37 ` David Kastrup
2012-01-20 14:54 ` Andy Wingo
1 sibling, 0 replies; 15+ messages in thread
From: David Kastrup @ 2012-01-20 8:37 UTC (permalink / raw)
To: guile-devel
Mark H Weaver <mhw@netris.org> writes:
> I know how to make this _much_ better, but here's the question: is it
> okay to change the behavior of the random number generator in 2.0? Or
> is it important that the same sequence of random numbers are generated
> from a given seed in the entire stable-2.0 series?
As long as the total _sequence_ stays the same (and does not contain
duplicate values), I would not worry. If the sequence itself becomes
different, then the probability of two random subsequences having at
least one match match grows with the product rather than the sum of
their length. Of course, if the subsequences _do_ overlap, you tend to
get colliding sequences instead of just single outliers.
--
David Kastrup
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improve `seed->random-state' in stable-2.0?
2012-01-20 3:28 Improve `seed->random-state' in stable-2.0? Mark H Weaver
2012-01-20 8:37 ` David Kastrup
@ 2012-01-20 14:54 ` Andy Wingo
2012-01-20 18:52 ` Mark H Weaver
1 sibling, 1 reply; 15+ messages in thread
From: Andy Wingo @ 2012-01-20 14:54 UTC (permalink / raw)
To: Mark H Weaver; +Cc: guile-devel
On Fri 20 Jan 2012 04:28, Mark H Weaver <mhw@netris.org> writes:
> `seed->random-state' is poorly implemented when passed a numeric
> argument. It converts the number to a decimal string, and then
> `scm_i_init_rstate' takes over and basically adds every 8th byte
> together to form the 64-bit internal state.
Is that what it does?
I agree that it's a pretty bad initializer to a pretty bad PRNG, but
from what I can tell, it does use all of the bytes in the input. They
aren't very dense bytes, entropy-wise, but there are more of them, so I
think the amount of entropy added to the seed is the same.
Dunno. Am I reading that code wrong?
Regards,
Andy
--
http://wingolog.org/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improve `seed->random-state' in stable-2.0?
2012-01-20 14:54 ` Andy Wingo
@ 2012-01-20 18:52 ` Mark H Weaver
2012-01-20 20:35 ` Andy Wingo
0 siblings, 1 reply; 15+ messages in thread
From: Mark H Weaver @ 2012-01-20 18:52 UTC (permalink / raw)
To: Andy Wingo; +Cc: guile-devel
Andy Wingo <wingo@pobox.com> writes:
> On Fri 20 Jan 2012 04:28, Mark H Weaver <mhw@netris.org> writes:
>
>> `seed->random-state' is poorly implemented when passed a numeric
>> argument. It converts the number to a decimal string, and then
>> `scm_i_init_rstate' takes over and basically adds every 8th byte
>> together to form the 64-bit internal state.
>
> Is that what it does?
>
> I agree that it's a pretty bad initializer to a pretty bad PRNG, but
> from what I can tell, it does use all of the bytes in the input.
Sorry, my explanation was not clear. What I mean is that the result of
number->string (suppose it is "12345678901234567890") gets processed
like this:
0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38
0x39 0x30 0x31 0x32 0x33 0x34 0x35 0x36
0x37 0x38 0x39 0x30
(those are the ASCII values of digits 0-9)
and then the sum of each column is taken, yielding 64-bits. That's what
I meant when I wrote "adds every 8th byte together". So yes, indeed
every byte is used.
> They aren't very dense bytes, entropy-wise, but there are more of
> them, so I think the amount of entropy added to the seed is the same.
No, adding numbers does _not_ preserve their total entropy, not even
close. Suppose you start with two truly random 8-bit numbers, with all
256 values equally likely. So that's a total of 16 bits of entropy. If
you add them together, you end up with a 9-bit number whose possible
values are _not_ equally likely, so that's less than 9 bits of entropy.
This is what's happening here. We have a 64-bit PRNG, but even if you
make sure to seed it with a number that has plenty of entropy,
`seed->random-state' will only preserve around 32 bits of it.
That's not good. We should fix it if we possibly can. UUIDs really
can't afford to have only 32 bits of entropy and still be reliable. If
we cannot fix `seed->random-state', we'll have to avoid using it in UUID
initialization.
Mark
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improve `seed->random-state' in stable-2.0?
2012-01-20 18:52 ` Mark H Weaver
@ 2012-01-20 20:35 ` Andy Wingo
2012-01-20 22:45 ` Mark H Weaver
2012-01-20 23:46 ` Mike Gran
0 siblings, 2 replies; 15+ messages in thread
From: Andy Wingo @ 2012-01-20 20:35 UTC (permalink / raw)
To: Mark H Weaver; +Cc: guile-devel
Hi Mark,
Thanks for the explanations!
On Fri 20 Jan 2012 19:52, Mark H Weaver <mhw@netris.org> writes:
> We have a 64-bit PRNG, but even if you make sure to seed it with a
> number that has plenty of entropy, `seed->random-state' will only
> preserve around 32 bits of it.
Wow, that's horrible. Agreed that this needs fixing.
How about, we extend seed->random-state to operate on bytevectors, and
have that interface do the right thing. We deprecate the number
interface. At some point in the future we deprecate the string
interface as well.
I'm hesitant to make seed->random-state change the rstate that it
produces, for a given seed. It's unlikely anyone is relying on this,
but who knows...
WDYT?
Andy
--
http://wingolog.org/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improve `seed->random-state' in stable-2.0?
2012-01-20 20:35 ` Andy Wingo
@ 2012-01-20 22:45 ` Mark H Weaver
2012-01-21 7:38 ` David Kastrup
2012-01-23 9:52 ` Andy Wingo
2012-01-20 23:46 ` Mike Gran
1 sibling, 2 replies; 15+ messages in thread
From: Mark H Weaver @ 2012-01-20 22:45 UTC (permalink / raw)
To: Andy Wingo; +Cc: guile-devel
Andy Wingo <wingo@pobox.com> writes:
> On Fri 20 Jan 2012 19:52, Mark H Weaver <mhw@netris.org> writes:
>
>> We have a 64-bit PRNG, but even if you make sure to seed it with a
>> number that has plenty of entropy, `seed->random-state' will only
>> preserve around 32 bits of it.
>
> Wow, that's horrible. Agreed that this needs fixing.
>
> How about, we extend seed->random-state to operate on bytevectors, and
> have that interface do the right thing.
I agree that it would be nice for `seed->random-state' to support
bytevectors as well, but for many (most?) purposes that would be a very
awkward interface to use.
> We deprecate the number interface. At some point in the future we
> deprecate the string interface as well.
I think this would be very unfortunate. Numbers are by far the most
natural representation for seed values, and I hope they will continue to
be fully supported.
> I'm hesitant to make seed->random-state change the rstate that it
> produces, for a given seed. It's unlikely anyone is relying on this,
> but who knows...
Even if we keep a broken `seed->random-state', there's another problem:
our PRNG sucks rocks. If we constrain ourselves to produce the same
sequence of random numbers for a given seed, that means that we're stuck
with this very weak PRNG for the entire 2.0 series.
Can't we just make a clean break now? 2.0 is still not widely deployed,
so now is a great time to assert our right to change the PRNG at will.
As you say, it's unlikely that anyone is relying on this anyway.
If anyone is, wouldn't it be better to deal with that now?
Thanks,
Mark
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improve `seed->random-state' in stable-2.0?
2012-01-20 22:45 ` Mark H Weaver
@ 2012-01-21 7:38 ` David Kastrup
2012-01-21 8:20 ` Mark H Weaver
2012-01-23 9:52 ` Andy Wingo
1 sibling, 1 reply; 15+ messages in thread
From: David Kastrup @ 2012-01-21 7:38 UTC (permalink / raw)
To: guile-devel
Mark H Weaver <mhw@netris.org> writes:
> Can't we just make a clean break now? 2.0 is still not widely
> deployed, so now is a great time to assert our right to change the
> PRNG at will. As you say, it's unlikely that anyone is relying on
> this anyway. If anyone is, wouldn't it be better to deal with that
> now?
For unique identifier generation, you want a PRNG that has a period
equal to the size of the generated identifier set and generates the same
next value given the current value in order to avoid the birthday
paradoxon. That is actually different from what one considers a "good
PRNG", so it might make sense to have a home-brewn "bad PRNG" for this
purpose.
Actually, you don't need a PRNG at all. Generate a _good_ random
starting value, and count sequentially from there. The probability of
collision is the same as with a pseudo-random sequence, but if there
happens to be a problem at any time, it becomes much easier to diagnose.
--
David Kastrup
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improve `seed->random-state' in stable-2.0?
2012-01-21 7:38 ` David Kastrup
@ 2012-01-21 8:20 ` Mark H Weaver
0 siblings, 0 replies; 15+ messages in thread
From: Mark H Weaver @ 2012-01-21 8:20 UTC (permalink / raw)
To: David Kastrup; +Cc: guile-devel
David Kastrup <dak@gnu.org> writes:
> Actually, you don't need a PRNG at all. Generate a _good_ random
> starting value, and count sequentially from there.
This is _exactly_ what my patch does, on a per-thread basis.
The starting value is read directly from /dev/urandom if available.
However, if /dev/urandom cannot be read, the PRNG is used to generate
the starting value. This is a last resort, and ideally we should never
use it.
Thanks,
Mark
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improve `seed->random-state' in stable-2.0?
2012-01-20 22:45 ` Mark H Weaver
2012-01-21 7:38 ` David Kastrup
@ 2012-01-23 9:52 ` Andy Wingo
1 sibling, 0 replies; 15+ messages in thread
From: Andy Wingo @ 2012-01-23 9:52 UTC (permalink / raw)
To: Mark H Weaver; +Cc: guile-devel
Hi Mark,
On Fri 20 Jan 2012 23:45, Mark H Weaver <mhw@netris.org> writes:
> Andy Wingo <wingo@pobox.com> writes:
>
>> How about, we extend seed->random-state to operate on bytevectors, and
>> have that interface do the right thing.
>
> I agree that it would be nice for `seed->random-state' to support
> bytevectors as well, but for many (most?) purposes that would be a very
> awkward interface to use.
Really? My thought would be that if I have to initialize a PRNG, I need
some random bits. Dunno. Deprecating the number interface does have
the advantage that we can perhaps move people to use your nice new
functions, instead of initializing with (getpid) or whatever it is that
they are using.
> Even if we keep a broken `seed->random-state', there's another problem:
> our PRNG sucks rocks. If we constrain ourselves to produce the same
> sequence of random numbers for a given seed, that means that we're stuck
> with this very weak PRNG for the entire 2.0 series.
>
> Can't we just make a clean break now? 2.0 is still not widely deployed,
> so now is a great time to assert our right to change the PRNG at will.
> As you say, it's unlikely that anyone is relying on this anyway.
> If anyone is, wouldn't it be better to deal with that now?
While I agree about the badness of the PRNG -- though we shouldn't
overstate that; for being so simple, MWC does well -- but I really don't
think that we should change the default behavior now.
OTOH, we can make seed->random-state on bytevectors return an rstate
with a different implementation of scm_t_rng -- for example, we could
use GMP's mersenne twister API.
In any case, don't let stability concerns stop you from hacking on a
fix! We'll probably get the first 2.2 preview out within a a year, and
we certainly need to change the default behavior for then
Andy
--
http://wingolog.org/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improve `seed->random-state' in stable-2.0?
2012-01-20 20:35 ` Andy Wingo
2012-01-20 22:45 ` Mark H Weaver
@ 2012-01-20 23:46 ` Mike Gran
2012-01-23 9:55 ` Andy Wingo
1 sibling, 1 reply; 15+ messages in thread
From: Mike Gran @ 2012-01-20 23:46 UTC (permalink / raw)
To: Andy Wingo, Mark H Weaver; +Cc: guile-devel@gnu.org
> From: Andy Wingo wingo@pobox.com
>How about, we extend seed->random-state to operate on bytevectors, and
>have that interface do the right thing. We deprecate the number
>interface. At some point in the future we deprecate the string
>interface as well.
The number or string argument to seed->random-state is not unique to
Guile. Slib uses it, too.
The string argument always seemed strange to me and seems to be seldom
used. But a websearch says that the integer argument is quite common.
(seed->random-state (current-time)) seems to be a common idiom that
you would end up breaking.
Regards,
Mike
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improve `seed->random-state' in stable-2.0?
2012-01-20 23:46 ` Mike Gran
@ 2012-01-23 9:55 ` Andy Wingo
2012-01-23 13:06 ` Mike Gran
0 siblings, 1 reply; 15+ messages in thread
From: Andy Wingo @ 2012-01-23 9:55 UTC (permalink / raw)
To: Mike Gran; +Cc: Mark H Weaver, guile-devel@gnu.org
On Sat 21 Jan 2012 00:46, Mike Gran <spk121@yahoo.com> writes:
> (seed->random-state (current-time)) seems to be a common idiom that
> you would end up breaking.
This is a common idiom that is worth deprecating. Mark's new functions
that seed the random state from /dev/urandom are much better.
So no, no plans to break anything -- but we should help these people
write better programs.
Regards,
Andy
--
http://wingolog.org/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improve `seed->random-state' in stable-2.0?
2012-01-23 9:55 ` Andy Wingo
@ 2012-01-23 13:06 ` Mike Gran
2012-01-23 14:06 ` Andy Wingo
0 siblings, 1 reply; 15+ messages in thread
From: Mike Gran @ 2012-01-23 13:06 UTC (permalink / raw)
To: Andy Wingo; +Cc: Mark H Weaver, guile-devel@gnu.org
> From: Andy Wingo <wingo@pobox.com>
>> (seed->random-state (current-time)) seems to be a common idiom that
>> you would end up breaking.
>
>This is a common idiom that is worth deprecating. Mark's new functions
>that seed the random state from /dev/urandom are much better.
Are you suggesting that you'll break the API in the hope that when people's
code stops working, they'll reread the manual and notice that
random-state-from-platform exists?
It seems a rather unfriendly way to accomplish that task.
Regards,
Mike
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improve `seed->random-state' in stable-2.0?
2012-01-23 13:06 ` Mike Gran
@ 2012-01-23 14:06 ` Andy Wingo
2012-01-24 6:08 ` Mike Gran
0 siblings, 1 reply; 15+ messages in thread
From: Andy Wingo @ 2012-01-23 14:06 UTC (permalink / raw)
To: Mike Gran; +Cc: Mark H Weaver, guile-devel@gnu.org
On Mon 23 Jan 2012 14:06, Mike Gran <spk121@yahoo.com> writes:
>> From: Andy Wingo <wingo@pobox.com>
>>> (seed->random-state (current-time)) seems to be a common idiom that
>>> you would end up breaking.
>>
>>This is a common idiom that is worth deprecating. Mark's new functions
>>that seed the random state from /dev/urandom are much better.
>
> Are you suggesting that you'll break the API in the hope that when people's
> code stops working, they'll reread the manual and notice that
> random-state-from-platform exists?
>
> It seems a rather unfriendly way to accomplish that task.
That would indeed be a mean thing to do! It's not what I'm suggesting
though. Deprecation means causing Guile to emit warnings, at
compile-time or at runtime, indicating that a particular interface will
go away at some point, and noting the interface that should be used
instead.
I think it's fairly helpful, actually, but if you have any suggestions
for how it could be improved, they are much welcome.
Regards,
Andy
--
http://wingolog.org/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improve `seed->random-state' in stable-2.0?
2012-01-23 14:06 ` Andy Wingo
@ 2012-01-24 6:08 ` Mike Gran
2012-01-24 9:56 ` Andy Wingo
0 siblings, 1 reply; 15+ messages in thread
From: Mike Gran @ 2012-01-24 6:08 UTC (permalink / raw)
To: Andy Wingo; +Cc: Mark H Weaver, guile-devel@gnu.org
> From: Andy Wingo <wingo@pobox.com>
> That would indeed be a mean thing to do! It's not what I'm suggesting
> though. Deprecation means causing Guile to emit warnings, at
> compile-time or at runtime, indicating that a particular interface will
> go away at some point, and noting the interface that should be used
> instead.
>
> I think it's fairly helpful, actually, but if you have any suggestions
> for how it could be improved, they are much welcome.
My beef is with potentially removing the ability to use an integer seed in
seed->random-state. It is useful and common. Many other languages
and schemes do it the same way. Its strengths and limitations are
indicated in the manual.
I could make a technical argument about why this procedure's calling
structure w.r.t. integers shouldn't change: but the technical argument
would just be an attempt to justify my personal opinion that a documented
API that I have used in scheme code that currently works fine shouldn't be
broken.
This is orthogonal to the bug in the procedure, though. A user should
be able to expect the for each integer seed between 0 and 2^N,
for some value of N, that the PRNG will return a different series.
It works that way in most languages that allow integer seeds.
Thanks,
Mike
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improve `seed->random-state' in stable-2.0?
2012-01-24 6:08 ` Mike Gran
@ 2012-01-24 9:56 ` Andy Wingo
0 siblings, 0 replies; 15+ messages in thread
From: Andy Wingo @ 2012-01-24 9:56 UTC (permalink / raw)
To: Mike Gran; +Cc: Mark H Weaver, guile-devel@gnu.org
Hi Mike,
On Tue 24 Jan 2012 07:08, Mike Gran <spk121@yahoo.com> writes:
> My beef is with potentially removing the ability to use an integer seed in
> seed->random-state. It is useful and common. Many other languages
> and schemes do it the same way. Its strengths and limitations are
> indicated in the manual.
Fair points, all. It was just a suggestion :) Thanks for the feedback!
> This is orthogonal to the bug in the procedure, though.
Indeed.
Andy
--
http://wingolog.org/
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2012-01-24 9:56 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-01-20 3:28 Improve `seed->random-state' in stable-2.0? Mark H Weaver
2012-01-20 8:37 ` David Kastrup
2012-01-20 14:54 ` Andy Wingo
2012-01-20 18:52 ` Mark H Weaver
2012-01-20 20:35 ` Andy Wingo
2012-01-20 22:45 ` Mark H Weaver
2012-01-21 7:38 ` David Kastrup
2012-01-21 8:20 ` Mark H Weaver
2012-01-23 9:52 ` Andy Wingo
2012-01-20 23:46 ` Mike Gran
2012-01-23 9:55 ` Andy Wingo
2012-01-23 13:06 ` Mike Gran
2012-01-23 14:06 ` Andy Wingo
2012-01-24 6:08 ` Mike Gran
2012-01-24 9:56 ` Andy Wingo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).