unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Bruno Haible <bruno@clisp.org>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: jporterbugs@gmail.com, Eli Zaretskii <eliz@gnu.org>,
	bug-gnulib@gnu.org, larsi@gnus.org, 57129@debbugs.gnu.org
Subject: bug#57129: 29.0.50; Improve behavior of conditionals in Eshell
Date: Sun, 21 Aug 2022 19:17:08 +0200	[thread overview]
Message-ID: <2758072.AiC22s8V5E@nimes> (raw)
In-Reply-To: <1903912.fIoEIV5pvu@nimes>

Paul Eggert asked:
> > Could you investigate further why mingw 64-bit fails?

Some words on the "why". You seem to have expectations regarding the
distribution quality of the resulting file names, but these expectations
are not warranted.

Donald E. Knuth wrote in TAOCP vol 2 "Seminumerical algorithms" that
an arbitrary sequence of number manipulating operations usually has a
bad quality as a pseudo-random number generator, and that in order
to get good quality pseudo-random numbers, one needs to use *specific*
pseudo-random number generators and then *prove* mathematical properties
of it. (The same is true, btw, for crypto algorithms.)
Then he started discussing linear congruential generators [1] as the
simplest example for which something could be proved.

The current source code of tempname.c generates the pseudo-random numbers
— assuming no HAS_CLOCK_ENTROPY and no ASLR — using a mix of three
operations:
  (A) A linear congruential generator [2] with m = 2^64,
      a = 2862933555777941757, c = 3037000493.
  (B) A floor operation: v ← floor(v / 62^6)
  (C) A xor operation: v ^= prev_v

There are three different questions:
(Q1) What is the expected quality inside a single gen_tempname call?
(Q2) What is the expected quality of multiple gen_tempname calls in a
     single process?
(Q3) What is the expected quality when considering different processes
     that call gen_tempname?

Answers:
(Q1) For just 6 'X'es, there is essentially a single (A) operation.
     Therefore the quality will be good.
     If someone uses more than 10 'X'es, for example, 50 'X'es, there
     will be 5 (A) and 5 (B), interleaved: (A), (B), (A), (B), ...
     This is *not* a linear congruential generator, therefore the
     expected quality is BAD.
     In order to fix this case, what I would do is to get back to
     a linear congruential generator: (A), (A), ..., (A), (B).
     In other words, feed into (A) exactly the output from the
     previous (A). This means, do the statements
          XXXXXX[i] = letters[v % 62];
          v /= 62;
     not on v itself, but on a copy of v.
     But wait, there is also the desire to have predictability!
     This means to not consume all the possible bits the
     random_bits() call, but only part of it.
     What I would do here, is to reduce BASE_62_DIGITS from 10 to 6.
     So that in each round of the loop, 6 base-62 digits are consumed
     and more than 4 base-62 digits are left in v, for predictability.
     In the usual calls with 6 'X'es the loop will still end after a
     single round.

(Q2) First of all, the multiple gen_tempname calls can occur in
     different threads. Since no locking is involved, it is undefined
     behaviour to access the 'prev_v' variable from different threads.
     On machines with an IA-64 CPU, the 'prev_v' variable's value may
     not be propagated from one thread to the other. [3][4][5]
     The fix is simple, though: Mark 'prev_v' as 'volatile'.

     Then, what the code does, is a mix of (A), (B), (C). Again, this
     is *not* a linear congruential generator, therefore the expected
     quality is BAD.

     To get good quality, I would suggest to use a linear congruential
     generator across *all* gen_tempname calls of the entire thread.
     This means:
       - Move out the (B) invocations out, like explained above in (Q1).
       - Remove the (C) code that you added last week.
       - Store the v value in a per-thread variable. Using '__thread'
         on glibc systems and a TLS variable (#include "glthread/tls.h")
         on the other platforms.

(Q3) Here, to force differences between different processes, I would
     suggest to use a fine-grained clock value. In terms of platforms,
       #if defined CLOCK_MONOTONIC && HAVE_CLOCK_GETTIME
     is way too restrictive.
     How about
       - using CLOCK_REALTIME when CLOCK_MONOTONIC is not available,
       - using gettimeofday() as fallback, especially on native Windows.

If one does (Q3), then the suggestions for (Q2) (other than the 'volatile')
may not be needed.

Bruno

[1] https://en.wikipedia.org/wiki/Linear_congruential_generator
[2] https://en.wikipedia.org/wiki/Linear_congruential_generator#c_%E2%89%A0_0
[3] https://es.cs.uni-kl.de/publications/datarsg/Geor16.pdf
[4] https://db.in.tum.de/teaching/ws1718/dataprocessing/chapter3.pdf?lang=de
    page 18
[5] https://os.inf.tu-dresden.de/Studium/DOS/SS2014/04-Memory-Consistency.pdf










  parent reply	other threads:[~2022-08-21 17:17 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-11  2:43 bug#57129: 29.0.50; Improve behavior of conditionals in Eshell Jim Porter
2022-08-11  2:46 ` Jim Porter
2022-08-12 15:22   ` Lars Ingebrigtsen
2022-08-13  5:14     ` Jim Porter
2022-08-13  7:01       ` Eli Zaretskii
2022-08-13 18:56         ` Jim Porter
2022-08-14  5:07           ` Eli Zaretskii
2022-08-14  5:37             ` Jim Porter
2022-08-14  7:30               ` Eli Zaretskii
2022-08-14 21:40                 ` Jim Porter
2022-08-15 12:13                   ` Eli Zaretskii
2022-08-15 16:14                     ` Jim Porter
2022-08-15 16:49                       ` Eli Zaretskii
2022-08-15 18:08                         ` Jim Porter
2022-08-15 18:21                           ` Eli Zaretskii
2022-08-15 18:30                             ` Jim Porter
2022-08-15 18:58                               ` Eli Zaretskii
2022-08-15 20:55                                 ` Paul Eggert
     [not found]                                 ` <af0af29b-2362-77db-081e-046158937808@cs.ucla.edu>
2022-08-15 21:22                                   ` Bruno Haible
2022-08-16 11:04                                   ` Eli Zaretskii
     [not found]                                   ` <835yish2l1.fsf@gnu.org>
2022-08-16 13:35                                     ` Bruno Haible
     [not found]                                     ` <1871347.6tgchFWduM@nimes>
2022-08-16 13:51                                       ` Eli Zaretskii
     [not found]                                       ` <838rnofgad.fsf@gnu.org>
2022-08-16 14:40                                         ` Bruno Haible
2022-08-16 16:25                                           ` Eli Zaretskii
     [not found]                                           ` <83wnb8dukz.fsf@gnu.org>
2022-08-16 16:54                                             ` Paul Eggert
     [not found]                                             ` <206e38df-2db4-a46a-e0ff-952bc8ab939c@cs.ucla.edu>
2022-08-16 17:04                                               ` Eli Zaretskii
     [not found]                                               ` <83sflwdsr2.fsf@gnu.org>
2022-08-16 17:30                                                 ` Paul Eggert
2022-08-16 17:41                                                 ` Eli Zaretskii
     [not found]                                                 ` <ceeeaa86-6199-93b1-ff65-bbd3e531e235@cs.ucla.edu>
2022-08-16 17:56                                                   ` Eli Zaretskii
2022-08-16 17:25                                             ` Paul Eggert
2022-08-16 17:29                                             ` Bruno Haible
     [not found]                                             ` <f329244a-cba7-65cd-2e5d-2630eba3e9e9@cs.ucla.edu>
2022-08-16 17:47                                               ` Eli Zaretskii
2022-08-16 19:11                                                 ` Paul Eggert
     [not found]                                                 ` <d95734ab-6bbc-7403-c1f8-fbf742badda4@cs.ucla.edu>
2022-08-16 20:12                                                   ` Bruno Haible
2022-08-17 11:08                                                   ` Eli Zaretskii
     [not found]                                                   ` <83h72bdt4z.fsf@gnu.org>
2022-08-18  6:05                                                     ` Paul Eggert
     [not found]                                                     ` <57b8f10f-8e9b-5951-e5ad-8cba2a8cb569@cs.ucla.edu>
2022-08-18  6:44                                                       ` Eli Zaretskii
     [not found]                                             ` <2594092.Isy0gbHreE@nimes>
2022-08-16 17:52                                               ` Eli Zaretskii
2022-08-16 20:06                                                 ` Bruno Haible
     [not found]                                                 ` <2606289.q0ZmV6gNhb@nimes>
2022-08-17 11:29                                                   ` Eli Zaretskii
2022-08-16 19:59                                         ` Bruno Haible
     [not found]                                         ` <2135151.C4sosBPzcN@nimes>
2022-08-16 20:53                                           ` Paul Eggert
2022-08-21 16:20                                             ` Bruno Haible
2022-08-21 16:32                                               ` Eli Zaretskii
2022-08-21 17:17                                               ` Bruno Haible [this message]
2022-08-22 20:47                                                 ` Paul Eggert
     [not found]                                                 ` <2dc7ede0-eca7-baf5-f89a-f5d292b80808@cs.ucla.edu>
2022-08-23  0:13                                                   ` Bruno Haible
     [not found]                                                   ` <3893771.2iPT33SAM4@nimes>
2022-08-23 11:18                                                     ` Eli Zaretskii
     [not found]                                                     ` <831qt79pjj.fsf@gnu.org>
2022-08-23 14:49                                                       ` Bruno Haible
2022-08-23 16:07                                                         ` Eli Zaretskii
2022-08-20 18:03                                 ` Jim Porter
2022-08-20 18:14                                   ` Eli Zaretskii
2022-08-20 18:49                                     ` Jim Porter
2022-08-24 21:41                                   ` Jim Porter
2022-08-26  5:10                                     ` Jim Porter
2023-03-16  5:35                                       ` Jim Porter
2022-08-14  5:03       ` Sean Whitton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2758072.AiC22s8V5E@nimes \
    --to=bruno@clisp.org \
    --cc=57129@debbugs.gnu.org \
    --cc=bug-gnulib@gnu.org \
    --cc=eggert@cs.ucla.edu \
    --cc=eliz@gnu.org \
    --cc=jporterbugs@gmail.com \
    --cc=larsi@gnus.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).