From: Bruno Haible <bruno@clisp.org>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: jporterbugs@gmail.com, Eli Zaretskii <eliz@gnu.org>,
bug-gnulib@gnu.org, larsi@gnus.org, 57129@debbugs.gnu.org
Subject: bug#57129: 29.0.50; Improve behavior of conditionals in Eshell
Date: Sun, 21 Aug 2022 19:17:08 +0200 [thread overview]
Message-ID: <2758072.AiC22s8V5E@nimes> (raw)
In-Reply-To: <1903912.fIoEIV5pvu@nimes>
Paul Eggert asked:
> > Could you investigate further why mingw 64-bit fails?
Some words on the "why". You seem to have expectations regarding the
distribution quality of the resulting file names, but these expectations
are not warranted.
Donald E. Knuth wrote in TAOCP vol 2 "Seminumerical algorithms" that
an arbitrary sequence of number manipulating operations usually has a
bad quality as a pseudo-random number generator, and that in order
to get good quality pseudo-random numbers, one needs to use *specific*
pseudo-random number generators and then *prove* mathematical properties
of it. (The same is true, btw, for crypto algorithms.)
Then he started discussing linear congruential generators [1] as the
simplest example for which something could be proved.
The current source code of tempname.c generates the pseudo-random numbers
— assuming no HAS_CLOCK_ENTROPY and no ASLR — using a mix of three
operations:
(A) A linear congruential generator [2] with m = 2^64,
a = 2862933555777941757, c = 3037000493.
(B) A floor operation: v ← floor(v / 62^6)
(C) A xor operation: v ^= prev_v
There are three different questions:
(Q1) What is the expected quality inside a single gen_tempname call?
(Q2) What is the expected quality of multiple gen_tempname calls in a
single process?
(Q3) What is the expected quality when considering different processes
that call gen_tempname?
Answers:
(Q1) For just 6 'X'es, there is essentially a single (A) operation.
Therefore the quality will be good.
If someone uses more than 10 'X'es, for example, 50 'X'es, there
will be 5 (A) and 5 (B), interleaved: (A), (B), (A), (B), ...
This is *not* a linear congruential generator, therefore the
expected quality is BAD.
In order to fix this case, what I would do is to get back to
a linear congruential generator: (A), (A), ..., (A), (B).
In other words, feed into (A) exactly the output from the
previous (A). This means, do the statements
XXXXXX[i] = letters[v % 62];
v /= 62;
not on v itself, but on a copy of v.
But wait, there is also the desire to have predictability!
This means to not consume all the possible bits the
random_bits() call, but only part of it.
What I would do here, is to reduce BASE_62_DIGITS from 10 to 6.
So that in each round of the loop, 6 base-62 digits are consumed
and more than 4 base-62 digits are left in v, for predictability.
In the usual calls with 6 'X'es the loop will still end after a
single round.
(Q2) First of all, the multiple gen_tempname calls can occur in
different threads. Since no locking is involved, it is undefined
behaviour to access the 'prev_v' variable from different threads.
On machines with an IA-64 CPU, the 'prev_v' variable's value may
not be propagated from one thread to the other. [3][4][5]
The fix is simple, though: Mark 'prev_v' as 'volatile'.
Then, what the code does, is a mix of (A), (B), (C). Again, this
is *not* a linear congruential generator, therefore the expected
quality is BAD.
To get good quality, I would suggest to use a linear congruential
generator across *all* gen_tempname calls of the entire thread.
This means:
- Move out the (B) invocations out, like explained above in (Q1).
- Remove the (C) code that you added last week.
- Store the v value in a per-thread variable. Using '__thread'
on glibc systems and a TLS variable (#include "glthread/tls.h")
on the other platforms.
(Q3) Here, to force differences between different processes, I would
suggest to use a fine-grained clock value. In terms of platforms,
#if defined CLOCK_MONOTONIC && HAVE_CLOCK_GETTIME
is way too restrictive.
How about
- using CLOCK_REALTIME when CLOCK_MONOTONIC is not available,
- using gettimeofday() as fallback, especially on native Windows.
If one does (Q3), then the suggestions for (Q2) (other than the 'volatile')
may not be needed.
Bruno
[1] https://en.wikipedia.org/wiki/Linear_congruential_generator
[2] https://en.wikipedia.org/wiki/Linear_congruential_generator#c_%E2%89%A0_0
[3] https://es.cs.uni-kl.de/publications/datarsg/Geor16.pdf
[4] https://db.in.tum.de/teaching/ws1718/dataprocessing/chapter3.pdf?lang=de
page 18
[5] https://os.inf.tu-dresden.de/Studium/DOS/SS2014/04-Memory-Consistency.pdf
next prev parent reply other threads:[~2022-08-21 17:17 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-11 2:43 bug#57129: 29.0.50; Improve behavior of conditionals in Eshell Jim Porter
2022-08-11 2:46 ` Jim Porter
2022-08-12 15:22 ` Lars Ingebrigtsen
2022-08-13 5:14 ` Jim Porter
2022-08-13 7:01 ` Eli Zaretskii
2022-08-13 18:56 ` Jim Porter
2022-08-14 5:07 ` Eli Zaretskii
2022-08-14 5:37 ` Jim Porter
2022-08-14 7:30 ` Eli Zaretskii
2022-08-14 21:40 ` Jim Porter
2022-08-15 12:13 ` Eli Zaretskii
2022-08-15 16:14 ` Jim Porter
2022-08-15 16:49 ` Eli Zaretskii
2022-08-15 18:08 ` Jim Porter
2022-08-15 18:21 ` Eli Zaretskii
2022-08-15 18:30 ` Jim Porter
2022-08-15 18:58 ` Eli Zaretskii
2022-08-15 20:55 ` Paul Eggert
[not found] ` <af0af29b-2362-77db-081e-046158937808@cs.ucla.edu>
2022-08-15 21:22 ` Bruno Haible
2022-08-16 11:04 ` Eli Zaretskii
[not found] ` <835yish2l1.fsf@gnu.org>
2022-08-16 13:35 ` Bruno Haible
[not found] ` <1871347.6tgchFWduM@nimes>
2022-08-16 13:51 ` Eli Zaretskii
[not found] ` <838rnofgad.fsf@gnu.org>
2022-08-16 14:40 ` Bruno Haible
2022-08-16 16:25 ` Eli Zaretskii
[not found] ` <83wnb8dukz.fsf@gnu.org>
2022-08-16 16:54 ` Paul Eggert
[not found] ` <206e38df-2db4-a46a-e0ff-952bc8ab939c@cs.ucla.edu>
2022-08-16 17:04 ` Eli Zaretskii
[not found] ` <83sflwdsr2.fsf@gnu.org>
2022-08-16 17:30 ` Paul Eggert
2022-08-16 17:41 ` Eli Zaretskii
[not found] ` <ceeeaa86-6199-93b1-ff65-bbd3e531e235@cs.ucla.edu>
2022-08-16 17:56 ` Eli Zaretskii
2022-08-16 17:25 ` Paul Eggert
2022-08-16 17:29 ` Bruno Haible
[not found] ` <f329244a-cba7-65cd-2e5d-2630eba3e9e9@cs.ucla.edu>
2022-08-16 17:47 ` Eli Zaretskii
2022-08-16 19:11 ` Paul Eggert
[not found] ` <d95734ab-6bbc-7403-c1f8-fbf742badda4@cs.ucla.edu>
2022-08-16 20:12 ` Bruno Haible
2022-08-17 11:08 ` Eli Zaretskii
[not found] ` <83h72bdt4z.fsf@gnu.org>
2022-08-18 6:05 ` Paul Eggert
[not found] ` <57b8f10f-8e9b-5951-e5ad-8cba2a8cb569@cs.ucla.edu>
2022-08-18 6:44 ` Eli Zaretskii
[not found] ` <2594092.Isy0gbHreE@nimes>
2022-08-16 17:52 ` Eli Zaretskii
2022-08-16 20:06 ` Bruno Haible
[not found] ` <2606289.q0ZmV6gNhb@nimes>
2022-08-17 11:29 ` Eli Zaretskii
2022-08-16 19:59 ` Bruno Haible
[not found] ` <2135151.C4sosBPzcN@nimes>
2022-08-16 20:53 ` Paul Eggert
2022-08-21 16:20 ` Bruno Haible
2022-08-21 16:32 ` Eli Zaretskii
2022-08-21 17:17 ` Bruno Haible [this message]
2022-08-22 20:47 ` Paul Eggert
[not found] ` <2dc7ede0-eca7-baf5-f89a-f5d292b80808@cs.ucla.edu>
2022-08-23 0:13 ` Bruno Haible
[not found] ` <3893771.2iPT33SAM4@nimes>
2022-08-23 11:18 ` Eli Zaretskii
[not found] ` <831qt79pjj.fsf@gnu.org>
2022-08-23 14:49 ` Bruno Haible
2022-08-23 16:07 ` Eli Zaretskii
2022-08-20 18:03 ` Jim Porter
2022-08-20 18:14 ` Eli Zaretskii
2022-08-20 18:49 ` Jim Porter
2022-08-24 21:41 ` Jim Porter
2022-08-26 5:10 ` Jim Porter
2023-03-16 5:35 ` Jim Porter
2022-08-14 5:03 ` Sean Whitton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2758072.AiC22s8V5E@nimes \
--to=bruno@clisp.org \
--cc=57129@debbugs.gnu.org \
--cc=bug-gnulib@gnu.org \
--cc=eggert@cs.ucla.edu \
--cc=eliz@gnu.org \
--cc=jporterbugs@gmail.com \
--cc=larsi@gnus.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).