all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
@ 2022-10-12 16:07 Stefan Kangas
  2022-10-12 18:08 ` Paul Eggert
  2022-10-13 11:45 ` Lars Ingebrigtsen
  0 siblings, 2 replies; 32+ messages in thread
From: Stefan Kangas @ 2022-10-12 16:07 UTC (permalink / raw)
  To: 58472; +Cc: Paul Eggert

[-- Attachment #1: Type: text/plain, Size: 1527 bytes --]

This is a proposal to make `message-unique-id' less prone to collisions.

For the first 1-2 characters, it uses the return value `user-uid', which
on most single-user systems is a fixed value (for example, this gives us
"87" on almost any single-user Debian machine).[1]  It's also
unnecessarily leaky of potentially privacy sensitive information.

The next 8 characters are the current time, with some gymnastics to
emulate support for a fractional part to seconds.  This seems
unnecessary now that, AFAIU, `time-convert' can do that for us portably
(please correct me if I'm wrong, Paul).

I suggest that we instead base the left-hand side of the Message-ID on:

  1. (time-convert nil (expt 10 9))
  2. 2^N bits of pseudo random data (e.g. N=32)

We can then ignore the effective user id, while significantly decreasing
the risk of a Message-ID collision.[2]

Currently, we get values like:

    (message-unique-id)
    => "87o7uhi3at.fsf"            ; length 10

With the attached patch, we have instead:

    (message-unique-id)
    => "cnk29wgg1a4nvrpqcy.fsf"   ; length ~22

Note also that `message-number-base36' uses a Caesar cipher for some
reason:

    (message-number-base36 5 -1)
    => "u"    ; expect "5"
    (message-number-base36 (expt 36 3) -1)
    => "yzzz" ; expect "1000"

The patch fixes this also.

I don't know if this change should be in NEWS or not.

Footnotes:
[1]  Just for fun, you can search for 87 on
     https://en.wikipedia.org/wiki/Message-ID

[2]  See also: https://www.jwz.org/doc/mid.html

[-- Attachment #2: 0001-Make-message-unique-id-less-prone-to-collisions.patch --]
[-- Type: text/x-diff, Size: 3772 bytes --]

From 5eedf88bb85e5b4a8a1bb556d856324aa9333042 Mon Sep 17 00:00:00 2001
From: Stefan Kangas <stefankangas@gmail.com>
Date: Wed, 12 Oct 2022 17:29:04 +0200
Subject: [PATCH] Make `message-unique-id' less prone to collisions

* lisp/gnus/message.el (message-unique-id): Make less prone to
collisions.
(message-unique-id-char): Make unused variable obsolete.
* test/lisp/gnus/message-tests.el (message-unique-id-test)
(message-number-base36-test): New tests.
---
 lisp/gnus/message.el            | 42 +++++++--------------------------
 test/lisp/gnus/message-tests.el |  9 +++++++
 2 files changed, 18 insertions(+), 33 deletions(-)

diff --git a/lisp/gnus/message.el b/lisp/gnus/message.el
index 67ec0531fa..bdc9a5b2ad 100644
--- a/lisp/gnus/message.el
+++ b/lisp/gnus/message.el
@@ -5889,41 +5889,17 @@ message-make-message-id
 		"_-_" ""))
 	  "@" (message-make-fqdn) ">"))
 
+(make-obsolete-variable 'message-unique-id-char nil "29.1")
 (defvar message-unique-id-char nil)
 
-;; If you ever change this function, make sure the new version
-;; cannot generate IDs that the old version could.
-;; You might for example insert a "." somewhere (not next to another dot
-;; or string boundary), or modify the "fsf" string.
 (defun message-unique-id ()
-  ;; Don't use fractional seconds from timestamp; they may be unsupported.
-  ;; Instead we use this randomly inited counter.
-  (setq message-unique-id-char
-	;; 2^16 * 25 just fits into 4 digits i base 36.
-	(let ((base (* 25 25)))
-	  (if message-unique-id-char
-	      (% (1+ message-unique-id-char) base)
-	    (random base))))
-  (let ((tm (time-convert nil 'integer)))
-    (concat
-     (if (or (eq system-type 'ms-dos)
-	     ;; message-number-base36 doesn't handle bigints.
-	     (floatp (user-uid)))
-	 (let ((user (downcase (user-login-name))))
-	   (while (string-match "[^a-z0-9_]" user)
-	     (aset user (match-beginning 0) ?_))
-	   user)
-       (message-number-base36 (user-uid) -1))
-     (message-number-base36 (+ (ash tm -16)
-			       (ash (% message-unique-id-char 25) 16))
-			    4)
-     (message-number-base36 (+ (logand tm #xffff)
-			       (ash (/ message-unique-id-char 25) 16))
-			    4)
-     ;; Append a given name, because while the generated ID is unique
-     ;; to this newsreader, other newsreaders might otherwise generate
-     ;; the same ID via another algorithm.
-     ".fsf")))
+  (concat
+   (message-number-base36 (car (time-convert nil (expt 10 9))) -1)
+   (message-number-base36 (random (expt 2 32)) -1)
+   ;; Append a given name, because while the generated ID is unique
+   ;; to this newsreader, other newsreaders might otherwise generate
+   ;; the same ID via another algorithm.
+   ".fsf"))
 
 (defun message-number-base36 (num len)
   (if (if (< len 0)
@@ -5931,7 +5907,7 @@ message-number-base36
 	(= len 0))
       ""
     (concat (message-number-base36 (/ num 36) (1- len))
-	    (char-to-string (aref "zyxwvutsrqponmlkjihgfedcba9876543210"
+            (char-to-string (aref "0123456789abcdefghijklmnopqrstuvwxyz"
 				  (% num 36))))))
 
 (defun message-make-organization ()
diff --git a/test/lisp/gnus/message-tests.el b/test/lisp/gnus/message-tests.el
index a724428ecb..e43fb1c1c4 100644
--- a/test/lisp/gnus/message-tests.el
+++ b/test/lisp/gnus/message-tests.el
@@ -31,6 +31,15 @@
 
 (require 'cl-lib)
 
+(ert-deftest message-unique-id-test ()
+  (should (stringp (message-unique-id)))
+  (should (< 10 (length (message-unique-id)))))
+
+(ert-deftest message-number-base36-test ()
+  (should (equal (message-number-base36 10 -1) "a"))
+  (should (equal (message-number-base36 1 -1) "1"))
+  (should (equal (message-number-base36 (expt 36 5) -1) "100000")))
+
 (ert-deftest message-mode-propertize ()
   (with-temp-buffer
     (unwind-protect
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-12 16:07 bug#58472: [PATCH] Make `message-unique-id' less prone to collisions Stefan Kangas
@ 2022-10-12 18:08 ` Paul Eggert
  2022-10-13  2:46   ` Stefan Kangas
  2022-10-13 11:45 ` Lars Ingebrigtsen
  1 sibling, 1 reply; 32+ messages in thread
From: Paul Eggert @ 2022-10-12 18:08 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: 58472

On 2022-10-12 09:07, Stefan Kangas wrote:

> This seems
> unnecessary now that, AFAIU, `time-convert' can do that for us portably
> (please correct me if I'm wrong, Paul).
> 
> I suggest that we instead base the left-hand side of the Message-ID on:
> 
>    1. (time-convert nil (expt 10 9))
>    2. 2^N bits of pseudo random data (e.g. N=32)

(time-convert nil t) would be a bit more efficient.

If the goal is to avoid collisions, though, wouldn't it be better to 
avoid the time entirely? time is not very random in the high order bits.

Also, the comment about ".fsf" and other newsreaders is wrong. If we're 
generating IDs at random it doesn't matter whether we append ".fsf" as 
the ".fsf" itself could be randomly generated by another newsreader. The 
".fsf" is just a comment or advertisement or debugging aid or what have 
you. And if we're changing the algorithm perhaps we should change ".fsf" 
to something else.

Something like this, perhaps, where you can choose LEN as you like:

   (concat
      (let ((len 18))
        ;; Pass LEN, not -1, to message-number-base36 so that it never
        ;; returns "" which would make the message-ID nonconforming.
        (message-number-base36 (random (expt 36 len)) len))
      ;; Append ".gnu" to advertise that we're GNU Emacs.
      ".gnu")






^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-12 18:08 ` Paul Eggert
@ 2022-10-13  2:46   ` Stefan Kangas
  2022-10-13  4:53     ` Matt Armstrong
  2022-10-13 16:21     ` Paul Eggert
  0 siblings, 2 replies; 32+ messages in thread
From: Stefan Kangas @ 2022-10-13  2:46 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 58472

[-- Attachment #1: Type: text/plain, Size: 1505 bytes --]

Paul Eggert <eggert@cs.ucla.edu> writes:

> If the goal is to avoid collisions, though, wouldn't it be better to
> avoid the time entirely? time is not very random in the high order bits.

The main goal is to avoid collisions, but using the time also gives an
idea of when the message was sent, which is kind of nice.  Time also
guarantees a somewhat unique value even if the user has happened to set
the random seed.

If we limit the length of the time string to 12 characters, and the
total length to 25 characters (including the ".gnu" part), we still have
a guaranteed 9 characters of random data, or 46 bits of entropy.

    ⌊log₂ 36⁹⌋ = 46

If we want to guarantee 62 bits of entropy, we could do a total length
of 28 characters instead, but that might be overkill.

> Also, the comment about ".fsf" and other newsreaders is wrong [...]
> And if we're changing the algorithm perhaps we should change ".fsf" to
> something else.

Changing it to ".gnu" makes sense to me.

> Something like this, perhaps, where you can choose LEN as you like:
>
>    (concat
>       (let ((len 18))
>         ;; Pass LEN, not -1, to message-number-base36 so that it never
>         ;; returns "" which would make the message-ID nonconforming.
>         (message-number-base36 (random (expt 36 len)) len))
>       ;; Append ".gnu" to advertise that we're GNU Emacs.
>       ".gnu")

I think making sure it is always the same length is a good idea.

How about the attached?

[-- Attachment #2: 0001-Make-message-unique-id-less-prone-to-collisions.patch --]
[-- Type: text/x-diff, Size: 4030 bytes --]

From 889667e332fe76fc670cb1d4620c47525d9a70bf Mon Sep 17 00:00:00 2001
From: Stefan Kangas <stefankangas@gmail.com>
Date: Wed, 12 Oct 2022 17:29:04 +0200
Subject: [PATCH] Make `message-unique-id' less prone to collisions

* lisp/gnus/message.el (message-unique-id): Make less prone to
collisions.  (Bug#58472)
(message-unique-id-char): Make unused variable obsolete.
* test/lisp/gnus/message-tests.el (message-unique-id-test)
(message-number-base36-test): New tests.
---
 lisp/gnus/message.el            | 43 ++++++++-------------------------
 test/lisp/gnus/message-tests.el |  9 +++++++
 2 files changed, 19 insertions(+), 33 deletions(-)

diff --git a/lisp/gnus/message.el b/lisp/gnus/message.el
index 67ec0531fa..9b85f1c5fd 100644
--- a/lisp/gnus/message.el
+++ b/lisp/gnus/message.el
@@ -47,7 +47,7 @@
 (require 'rfc2047)
 (require 'puny)
 (require 'rmc)                          ; read-multiple-choice
-(require 'subr-x)
+(require 'subr-x)                       ; string-limit
 (require 'yank-media)
 (require 'mailcap)
 (require 'sendmail)
@@ -5889,41 +5889,18 @@ message-make-message-id
 		"_-_" ""))
 	  "@" (message-make-fqdn) ">"))
 
+(make-obsolete-variable 'message-unique-id-char nil "29.1")
 (defvar message-unique-id-char nil)
 
-;; If you ever change this function, make sure the new version
-;; cannot generate IDs that the old version could.
-;; You might for example insert a "." somewhere (not next to another dot
-;; or string boundary), or modify the "fsf" string.
 (defun message-unique-id ()
-  ;; Don't use fractional seconds from timestamp; they may be unsupported.
-  ;; Instead we use this randomly inited counter.
-  (setq message-unique-id-char
-	;; 2^16 * 25 just fits into 4 digits i base 36.
-	(let ((base (* 25 25)))
-	  (if message-unique-id-char
-	      (% (1+ message-unique-id-char) base)
-	    (random base))))
-  (let ((tm (time-convert nil 'integer)))
+  (let* ((time (car (time-convert nil t)))
+         (time-string (string-limit (message-number-base36 time -1) 12))
+         (len (- 21 (length time-string))))
     (concat
-     (if (or (eq system-type 'ms-dos)
-	     ;; message-number-base36 doesn't handle bigints.
-	     (floatp (user-uid)))
-	 (let ((user (downcase (user-login-name))))
-	   (while (string-match "[^a-z0-9_]" user)
-	     (aset user (match-beginning 0) ?_))
-	   user)
-       (message-number-base36 (user-uid) -1))
-     (message-number-base36 (+ (ash tm -16)
-			       (ash (% message-unique-id-char 25) 16))
-			    4)
-     (message-number-base36 (+ (logand tm #xffff)
-			       (ash (/ message-unique-id-char 25) 16))
-			    4)
-     ;; Append a given name, because while the generated ID is unique
-     ;; to this newsreader, other newsreaders might otherwise generate
-     ;; the same ID via another algorithm.
-     ".fsf")))
+     time-string
+     (message-number-base36 (random (expt 36 len)) len)
+     ;; Append ".gnu" to advertise that we're GNU Emacs.
+     ".gnu")))
 
 (defun message-number-base36 (num len)
   (if (if (< len 0)
@@ -5931,7 +5908,7 @@ message-number-base36
 	(= len 0))
       ""
     (concat (message-number-base36 (/ num 36) (1- len))
-	    (char-to-string (aref "zyxwvutsrqponmlkjihgfedcba9876543210"
+            (char-to-string (aref "0123456789abcdefghijklmnopqrstuvwxyz"
 				  (% num 36))))))
 
 (defun message-make-organization ()
diff --git a/test/lisp/gnus/message-tests.el b/test/lisp/gnus/message-tests.el
index a724428ecb..e8c0655543 100644
--- a/test/lisp/gnus/message-tests.el
+++ b/test/lisp/gnus/message-tests.el
@@ -31,6 +31,15 @@
 
 (require 'cl-lib)
 
+(ert-deftest message-unique-id-test ()
+  (should (stringp (message-unique-id)))
+  (should (= (length (message-unique-id)) 25)))
+
+(ert-deftest message-number-base36-test ()
+  (should (equal (message-number-base36 10 -1) "a"))
+  (should (equal (message-number-base36 1 -1) "1"))
+  (should (equal (message-number-base36 (expt 36 5) -1) "100000")))
+
 (ert-deftest message-mode-propertize ()
   (with-temp-buffer
     (unwind-protect
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-13  2:46   ` Stefan Kangas
@ 2022-10-13  4:53     ` Matt Armstrong
  2022-10-13 12:10       ` Stefan Kangas
  2022-10-13 16:21     ` Paul Eggert
  1 sibling, 1 reply; 32+ messages in thread
From: Matt Armstrong @ 2022-10-13  4:53 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: 58472, Paul Eggert

Stefan Kangas <stefankangas@gmail.com> writes:

> Paul Eggert <eggert@cs.ucla.edu> writes:
>
>> If the goal is to avoid collisions, though, wouldn't it be better to
>> avoid the time entirely? time is not very random in the high order bits.
>
> The main goal is to avoid collisions, but using the time also gives an
> idea of when the message was sent, which is kind of nice.  Time also
> guarantees a somewhat unique value even if the user has happened to set
> the random seed.

Most email I get today to use a UUID or UUID-like Message-ID, like this:

Message-ID: <736d10a6-001f-4a29-a1d4-554f58733b69@dfw1s10mta1086.xt.local>
Message-ID: <1815053947.8446619.1665544925708@lor1-app45123.prod.linkedin.com>
Message-ID: <01000183b9eaa6f8-411d1f4c-b573-472d-b45f-47b0c4eb6ace-000000@email.amazonses.com>
Message-ID: <CABqZ1wa8MxrieVKZ11adZUV2qB_CnpMJoFEn-U3d5CQ7z7smWw@mail.gmail.com>


> If we limit the length of the time string to 12 characters, and the
> total length to 25 characters (including the ".gnu" part), we still have
> a guaranteed 9 characters of random data, or 46 bits of entropy.
>
>     ⌊log₂ 36⁹⌋ = 46
>
> If we want to guarantee 62 bits of entropy, we could do a total length
> of 28 characters instead, but that might be overkill.

I suspect that most mailers use more randomness than that.

Some of the SHA hash algorithms are in the public domain.  Could they be
added to Emacs and used for UUID generation here?


-- 
matt (sent from an Emacs running the feature/noverlay branch)





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-12 16:07 bug#58472: [PATCH] Make `message-unique-id' less prone to collisions Stefan Kangas
  2022-10-12 18:08 ` Paul Eggert
@ 2022-10-13 11:45 ` Lars Ingebrigtsen
  2022-10-13 12:10   ` Stefan Kangas
  1 sibling, 1 reply; 32+ messages in thread
From: Lars Ingebrigtsen @ 2022-10-13 11:45 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: 58472, Paul Eggert

Stefan Kangas <stefankangas@gmail.com> writes:

> For the first 1-2 characters, it uses the return value `user-uid', which
> on most single-user systems is a fixed value (for example, this gives us
> "87" on almost any single-user Debian machine).[1]  It's also
> unnecessarily leaky of potentially privacy sensitive information.

Well, since it is "87" on most machines, it's not very leaky.  😀

It's documented to be this way, though?  That is, we've got tricks like
being able to score on References based on your Message-ID to score up
threads you've been a part of, etc.






^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-13 11:45 ` Lars Ingebrigtsen
@ 2022-10-13 12:10   ` Stefan Kangas
  2022-10-13 19:15     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 32+ messages in thread
From: Stefan Kangas @ 2022-10-13 12:10 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 58472, Paul Eggert

Lars Ingebrigtsen <larsi@gnus.org> writes:

> It's documented to be this way, though?

Hmm, right (from `(message) News Headers'):

    ‘Message-ID’
         This required header will be generated by Message.  A unique ID
         will be created based on the date, time, user name (for the local
         part) and the domain part.

Actually, the user name is currently only used on MS-DOS, AFAICT.
So I guess the documentation is already wrong?

> Well, since it is "87" on most machines, it's not very leaky.  😀

Until your name is "88", of course.  ;-(

> That is, we've got tricks like being able to score on References based
> on your Message-ID to score up threads you've been a part of, etc.

It seems flaky to use it for such scoring purposes though, as almost
everyone is named "87"...  Do we really have code that does that?

Should we care?





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-13  4:53     ` Matt Armstrong
@ 2022-10-13 12:10       ` Stefan Kangas
  2022-10-13 16:35         ` Matt Armstrong
  0 siblings, 1 reply; 32+ messages in thread
From: Stefan Kangas @ 2022-10-13 12:10 UTC (permalink / raw)
  To: Matt Armstrong; +Cc: 58472, Paul Eggert

Matt Armstrong <matt@rfc20.org> writes:

> Most email I get today to use a UUID or UUID-like Message-ID, like this:
>
> Message-ID: <736d10a6-001f-4a29-a1d4-554f58733b69@dfw1s10mta1086.xt.local>
> Message-ID: <1815053947.8446619.1665544925708@lor1-app45123.prod.linkedin.com>
> Message-ID: <01000183b9eaa6f8-411d1f4c-b573-472d-b45f-47b0c4eb6ace-000000@email.amazonses.com>
> Message-ID: <CABqZ1wa8MxrieVKZ11adZUV2qB_CnpMJoFEn-U3d5CQ7z7smWw@mail.gmail.com>

Those are 30-51 characters in length.  I also note that Gmail uses both
lower case and upper case characters.

>> If we limit the length of the time string to 12 characters, and the
>> total length to 25 characters (including the ".gnu" part), we still have
>> a guaranteed 9 characters of random data, or 46 bits of entropy.
>
> I suspect that most mailers use more randomness than that.

So I guess we might as well bump this up to 30 characters in total,
which gives us 72 bits.  The Message-IDs would look like:

    cnkrs75yamag1k7x8rnt3y50za.gnu@stefankangas.se
    cnkrifkirauwuwfkzs3rcit8cq.gnu@stefankangas.se

We could go longer, but it's also nice to have something which is not an
absolute abomination to look at.

If we add in upper case characters too, we can encode the time with one
less character.  So we end up with 89 bits of randomness and this:

    1Z2KnqE1t2bSgUWkcu53M34Y4y.gnu@stefankangas.se
    1Z2KbUgleGoe0WRJ3jbiM0mE7W.gnu@stefankangas.se

If we don't want to always start the Message-ID with the same characters
(which makes them more distinct, at a glance), we could just reverse the
time string:

    QlRXPpmK2Z1kUklxIpMNZpChOu.gnu@stefankangas.se
    Z59YikmK2Z1FSmYj172SAdPpuX.gnu@stefankangas.se

> Some of the SHA hash algorithms are in the public domain.  Could they be
> added to Emacs and used for UUID generation here?

We have `secure-hash'.  Is that what you mean?  Or do you mean to use a
proper RFC 4122 UUID?

All we need is for the Message-ID to be unique though, so some ad hoc
solution is probably fine.





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-13  2:46   ` Stefan Kangas
  2022-10-13  4:53     ` Matt Armstrong
@ 2022-10-13 16:21     ` Paul Eggert
  2022-10-14  9:22       ` Stefan Kangas
  2022-10-16 15:19       ` Matt Armstrong
  1 sibling, 2 replies; 32+ messages in thread
From: Paul Eggert @ 2022-10-13 16:21 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: 58472

On 2022-10-12 19:46, Stefan Kangas wrote:

> The main goal is to avoid collisions, but using the time also gives an
> idea of when the message was sent, which is kind of nice.

That info is in the Date: line, along with zillions of other Received: 
lines. There should be no need to repeat it in the Message-ID line.

> Time also
> guarantees a somewhat unique value even if the user has happened to set
> the random seed.

If that's a concern, we should be using more-random data, e.g., with

    (base64-encode-string
     (secure-hash 'md5 'iv-auto 128 nil t))

if we want 128 bits of randomness (this yields a string like 
"B8a3usyu5QSE/rTLu0nIHg==").

As an aside, it's weird that there's no easy way to ask Emacs for an 
N-bit random integer, where the randomness is taken from system entropy. 
Shouldn't we extend Emacs to support that? E.g., (make-string 128 
'iv-auto) could give you an N-byte entropy-derived random string, or 
(random -N) could give you an N-bit entropy-derived random nonnegative 
integer, or something like that. Then we could write something like this:

   (base64-encode-string (make-string 16 'iv-auto))

to get a Message-ID component with 16 bytes (128 bits) of entropy.






^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-13 12:10       ` Stefan Kangas
@ 2022-10-13 16:35         ` Matt Armstrong
  2022-10-13 16:38           ` Paul Eggert
  2022-10-14  9:22           ` Stefan Kangas
  0 siblings, 2 replies; 32+ messages in thread
From: Matt Armstrong @ 2022-10-13 16:35 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: 58472, Paul Eggert

Stefan Kangas <stefankangas@gmail.com> writes:

> Matt Armstrong <matt@rfc20.org> writes:
>
>> Most email I get today to use a UUID or UUID-like Message-ID, like this:
>>
>> Message-ID: <736d10a6-001f-4a29-a1d4-554f58733b69@dfw1s10mta1086.xt.local>
>> Message-ID: <1815053947.8446619.1665544925708@lor1-app45123.prod.linkedin.com>
>> Message-ID: <01000183b9eaa6f8-411d1f4c-b573-472d-b45f-47b0c4eb6ace-000000@email.amazonses.com>
>> Message-ID: <CABqZ1wa8MxrieVKZ11adZUV2qB_CnpMJoFEn-U3d5CQ7z7smWw@mail.gmail.com>
>
> Those are 30-51 characters in length.  I also note that Gmail uses both
> lower case and upper case characters.

Yes, the Google example is probably a crypto secure hash base64url
encoded.


>>> If we limit the length of the time string to 12 characters, and the
>>> total length to 25 characters (including the ".gnu" part), we still have
>>> a guaranteed 9 characters of random data, or 46 bits of entropy.
>>
>> I suspect that most mailers use more randomness than that.
>
> So I guess we might as well bump this up to 30 characters in total,
> which gives us 72 bits.  The Message-IDs would look like:
>
>     cnkrs75yamag1k7x8rnt3y50za.gnu@stefankangas.se
>     cnkrifkirauwuwfkzs3rcit8cq.gnu@stefankangas.se

You said you like the timestamp prefix.  How many times would that
actually be useful, and for what?

I am not seeing the utility myself, but I might be missing something.


> We could go longer, but it's also nice to have something which is not an
> absolute abomination to look at.
>
> If we add in upper case characters too, we can encode the time with one
> less character.  So we end up with 89 bits of randomness and this:
>
>     1Z2KnqE1t2bSgUWkcu53M34Y4y.gnu@stefankangas.se
>     1Z2KbUgleGoe0WRJ3jbiM0mE7W.gnu@stefankangas.se
>
> If we don't want to always start the Message-ID with the same characters
> (which makes them more distinct, at a glance), we could just reverse the
> time string:
>
>     QlRXPpmK2Z1kUklxIpMNZpChOu.gnu@stefankangas.se
>     Z59YikmK2Z1FSmYj172SAdPpuX.gnu@stefankangas.se

Maybe use (base64url-encode-string str t) is an option?


>> Some of the SHA hash algorithms are in the public domain.  Could they be
>> added to Emacs and used for UUID generation here?
>
> We have `secure-hash'.  Is that what you mean?  Or do you mean to use a
> proper RFC 4122 UUID?
>
> All we need is for the Message-ID to be unique though, so some ad hoc
> solution is probably fine.

I am no security expert, so I'll end with this and then be quiet on this
issue.  :-)

I agree that globally unique (with very high probability) is pretty easy
to design in ad hoc ways within a private namespace.  But Message-ID is
not that.  The Message-ID space consists of all Message-ID generated by
all programs past, present and future.  An impossible goal, given that
the RFC basically says "go make something up, have fun with that", so it
is a free for all.

Given that, timestamp + short-rng + ".gnu" doesn't feel good enough to
me, but I won't lose sleep if you use that solution.

In these situations I usually feel better going with what seems like an
existing practice: encode a crypto hash sourced from (possibly security
sensitive) entropy available on the local machine.

Inspired by https://nullprogram.com/blog/2010/05/11/ and the code linked
there (https://nullprogram.com/download/uuid.el) I came up with this:

;; Written while discussing Emacs bug#58472
(defun my-message-unique-id()
  (let* ((object (format "%S|%s|%s|%s"
                         (time-convert nil t)
                         (random (expt 2 512))
                         (emacs-uptime)
                         (emacs-pid)))
         (hash (secure-hash 'sha512 object nil nil t))
         (encoded-hash (base64url-encode-string hash t))
         (id (concat encoded-hash ".gnu")))
    ;; Replace leading and trailing non-alnum with ?x.  This is not
    ;; necessary to create strictly conforming Message-Id, but it makes
    ;; them less confusing in practice.
    (dolist (index `(0 ,(1- (length id))))
      (let ((char (aref id index)))
        (when (or (eq char ?-)
                  (eq char ?_))
          (aset id index ?x))))
    id))

This gives IDs like:

   lzD9K4s2GkiGmZhmP46M4ezv37sGEQO2tU2GX2mx-rs.gnu@stefankangas.se
   wEUY87gazgDLnsARlJZ5MEY4pgFZmDhxy9IZhbDLpfA.gnu@stefankangas.se
   VbUzrNkjJiCnZYiDzMdcqz2BqibzismlaR8ZkcU6O7E.gnu@stefankangas.se
   XX8f7gOrT1Iv-NLdZBjHgz8s4u4EF2uE7o-UttFgF0U.gnu@stefankangas.se
   tp9f3Up7ilsrH8XqtmQT4_evPkOO-EteN127d9bt85s.gnu@stefankangas.se

One advantage of this approach is that it is possible to add more
sources of entropy, or grow the hash function, without necessarily
requiring a "format" version change, because we're relying on the hash
for uniqueness in a sort of brute force way, not a schema and a hope
that only Emacs uses it.

One question: I'm not sure that (random (exp 2 512)) actually gives more
entropy than a simple call to (random).  It depends on Emacs' RNG.  Last
I checked there was code still calling something like rand(), which
generally not intended for this purpose.

P.S. if you go with your original approach, I think you want (expt 2 X)
instead of (exp X).

-- 
matt (sent from an Emacs running the feature/noverlay branch)





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-13 16:35         ` Matt Armstrong
@ 2022-10-13 16:38           ` Paul Eggert
  2022-10-14  9:22           ` Stefan Kangas
  1 sibling, 0 replies; 32+ messages in thread
From: Paul Eggert @ 2022-10-13 16:38 UTC (permalink / raw)
  To: Matt Armstrong, Stefan Kangas; +Cc: 58472

On 2022-10-13 09:35, Matt Armstrong wrote:
> One question: I'm not sure that (random (exp 2 512)) actually gives more
> entropy than a simple call to (random).

It does not. (Please see my previous email for how to get more entropy 
out of Emacs; it's kinda ugly now.)





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-13 12:10   ` Stefan Kangas
@ 2022-10-13 19:15     ` Lars Ingebrigtsen
  0 siblings, 0 replies; 32+ messages in thread
From: Lars Ingebrigtsen @ 2022-10-13 19:15 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: 58472, Paul Eggert

Stefan Kangas <stefankangas@gmail.com> writes:

> Hmm, right (from `(message) News Headers'):
> 
>     ‘Message-ID’
>          This required header will be generated by Message.  A unique ID
>          will be created based on the date, time, user name (for the local
>          part) and the domain part.
> 
> Actually, the user name is currently only used on MS-DOS, AFAICT.
> So I guess the documentation is already wrong?

Is says "based on", not that the user name is actually in the string.
(And it uses the uid instead, but same same.)

>> That is, we've got tricks like being able to score on References based
>> on your Message-ID to score up threads you've been a part of, etc.
>
> It seems flaky to use it for such scoring purposes though, as almost
> everyone is named "87"...  Do we really have code that does that?
>
> Should we care?

I thought I remembered it being a part of the scoring section of the
manual, so either it's been removed or I misremember.

But that's the reason the algo is the way it is now -- it allows you do
do stuff based on <ID.*@big-domain.com> in References/In-reply-to etc.
If there's only one UID per domain (which is the case these days),
that's less of a thing, of course.





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-13 16:35         ` Matt Armstrong
  2022-10-13 16:38           ` Paul Eggert
@ 2022-10-14  9:22           ` Stefan Kangas
  1 sibling, 0 replies; 32+ messages in thread
From: Stefan Kangas @ 2022-10-14  9:22 UTC (permalink / raw)
  To: Matt Armstrong; +Cc: 58472, Paul Eggert

Matt Armstrong <matt@rfc20.org> writes:

> Inspired by https://nullprogram.com/blog/2010/05/11/ and the code linked
> there (https://nullprogram.com/download/uuid.el) I came up with this:

That's pretty neat!

> One question: I'm not sure that (random (exp 2 512)) actually gives more
> entropy than a simple call to (random).  It depends on Emacs' RNG.  Last
> I checked there was code still calling something like rand(), which
> generally not intended for this purpose.

Ouch, indeed.  I think we should use the stuff Paul pointed to, which
AFAICT just punts to Gnulib.





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-13 16:21     ` Paul Eggert
@ 2022-10-14  9:22       ` Stefan Kangas
  2022-10-16  7:32         ` Stefan Kangas
  2022-10-16 15:19       ` Matt Armstrong
  1 sibling, 1 reply; 32+ messages in thread
From: Stefan Kangas @ 2022-10-14  9:22 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 58472, Matt Armstrong

Paul Eggert <eggert@cs.ucla.edu> writes:

> That info is in the Date: line, along with zillions of other Received:
> lines. There should be no need to repeat it in the Message-ID line.

OK, let's use only random data.

> If that's a concern, we should be using more-random data, e.g., with
>
>     (base64-encode-string
>      (secure-hash 'md5 'iv-auto 128 nil t))
>
> if we want 128 bits of randomness (this yields a string like
> "B8a3usyu5QSE/rTLu0nIHg==").

Sounds good to me, but:

The only reference I find to `iv-auto' is in (info "(elisp) Format of
GnuTLS Cryptography Inputs").  The `secure-hash' OBJECT parameter is
documented like this:

    The argument OBJECT should be a buffer or a string.

Is the documentation incomplete?

> As an aside, it's weird that there's no easy way to ask Emacs for an
> N-bit random integer, where the randomness is taken from system entropy.
> Shouldn't we extend Emacs to support that? E.g., (make-string 128
> 'iv-auto) could give you an N-byte entropy-derived random string, or
> (random -N) could give you an N-bit entropy-derived random nonnegative
> integer, or something like that. Then we could write something like this:
>
>    (base64-encode-string (make-string 16 'iv-auto))
>
> to get a Message-ID component with 16 bytes (128 bits) of entropy.

Yes, we should find a better interface here.

We also have `cl-random', which I think support float values.  Perhaps
`random' should do the same?  (See also Bug#9103.)

Oh, and whatever we do, we should probably also document it in `(elisp)
Random Numbers'.





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-14  9:22       ` Stefan Kangas
@ 2022-10-16  7:32         ` Stefan Kangas
  2022-10-16 17:05           ` Stefan Kangas
  0 siblings, 1 reply; 32+ messages in thread
From: Stefan Kangas @ 2022-10-16  7:32 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 58472, Matt Armstrong

Stefan Kangas <stefankangas@gmail.com> writes:

>> If that's a concern, we should be using more-random data, e.g., with
>>
>>     (base64-encode-string
>>      (secure-hash 'md5 'iv-auto 128 nil t))
>>
>> if we want 128 bits of randomness (this yields a string like
>> "B8a3usyu5QSE/rTLu0nIHg==").
>
> Sounds good to me, but:

Looking at the implementation of `secure-hash', doesn't this run the
random number through the MD5 function?  I guess that will reduce
entropy, as AFAIU hash functions like MD5 are not really bijective
functions f: ℕ → ℕ.  (In other words, it does not give you a permutation
of the set of all natural numbers.)

I don't think it matters for generating a Message-ID, but I thought it
was worth pointing out.  So if I'm right, you might not want to use this
particular method for generating cryptographic keys, if anyone happens
to be doing stuff like that in ELisp.

In any case, I do think we should add an easy way of directly accessing
the getrandom(2) syscall [or equivalent].

>> As an aside, it's weird that there's no easy way to ask Emacs for an
>> N-bit random integer, where the randomness is taken from system entropy.
>> Shouldn't we extend Emacs to support that? E.g., (make-string 128
>> 'iv-auto) could give you an N-byte entropy-derived random string, or
>> (random -N) could give you an N-bit entropy-derived random nonnegative
>> integer, or something like that. Then we could write something like this:
>>
>>    (base64-encode-string (make-string 16 'iv-auto))
>>
>> to get a Message-ID component with 16 bytes (128 bits) of entropy.
>
> Yes, we should find a better interface here.

How about `secure-random' (named in analogy with `secure-hash')?

    (secure-random 8)
    => <8 byte number>

This is the same interface as the Linux getrandom(2) system call, and
the portable function in Gnulib, so it's easy enough to implement.

BTW, we currently call getrandom with flags 0 in
extract_data_from_object.  Should we consider using GRND_RANDOM?  The
Linux kernel uses the same code path for /dev/random and /dev/urandom
these days, IIUC, so maybe it doesn't matter.





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-13 16:21     ` Paul Eggert
  2022-10-14  9:22       ` Stefan Kangas
@ 2022-10-16 15:19       ` Matt Armstrong
  2022-10-16 16:49         ` Stefan Kangas
  1 sibling, 1 reply; 32+ messages in thread
From: Matt Armstrong @ 2022-10-16 15:19 UTC (permalink / raw)
  To: Paul Eggert, Stefan Kangas; +Cc: 58472

Paul Eggert <eggert@cs.ucla.edu> writes:

> On 2022-10-12 19:46, Stefan Kangas wrote:
>
>> The main goal is to avoid collisions, but using the time also gives an
>> idea of when the message was sent, which is kind of nice.
>
> That info is in the Date: line, along with zillions of other Received: 
> lines. There should be no need to repeat it in the Message-ID line.
>
>> Time also
>> guarantees a somewhat unique value even if the user has happened to set
>> the random seed.
>
> If that's a concern, we should be using more-random data, e.g., with
>
>     (base64-encode-string
>      (secure-hash 'md5 'iv-auto 128 nil t))
>
> if we want 128 bits of randomness (this yields a string like 
> "B8a3usyu5QSE/rTLu0nIHg==").

Small suggestion: `base64-url-encode-string` avoids any trailing `===`
and uses slightly preferable non-alnum chars (I think).

But both can generate Message-Id with "command line switch" chars:
either ?- or ?/.  Since some tools expect users to work directly with
Message-ID at times (https://notmuchmail.org/) it might be nice to avoid
leading non-alnum chars, and possibly avoid '-' to avoid any confusion
with a UUID (in the sense of the schema defined by the RFC standard).

Maybe a base 62 encoder could be written just for this, as Emacs'
version of this doesn't need to be fast.  Can a string be turned into a
non-negative bignum integer in (simple) elisp?

-- 
matt (sent from an Emacs running the feature/noverlay branch)





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-16 15:19       ` Matt Armstrong
@ 2022-10-16 16:49         ` Stefan Kangas
  2022-10-17  6:17           ` Matt Armstrong
  2022-10-17  7:30           ` Paul Eggert
  0 siblings, 2 replies; 32+ messages in thread
From: Stefan Kangas @ 2022-10-16 16:49 UTC (permalink / raw)
  To: Matt Armstrong, Paul Eggert; +Cc: 58472

Matt Armstrong <matt@rfc20.org> writes:

> Since some tools expect users to work directly with Message-ID at
> times (https://notmuchmail.org/) it might be nice to avoid leading
> non-alnum chars,

I agree that it'd be nice to just use alpha-numeric characters.

> Maybe a base 62 encoder could be written just for this, as Emacs'
> version of this doesn't need to be fast.

A base62 encoder is just `message-number-base36' with the A-Z range
added.  I think I included that in a previous patch.

> Can a string be turned into a non-negative bignum integer in
> (simple) elisp?

Does this look reasonable?

    (seq-reduce (lambda (a i) (+ (ash a 8) i))
                (secure-hash 'md5 'iv-auto 128 nil t) 0)





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-16  7:32         ` Stefan Kangas
@ 2022-10-16 17:05           ` Stefan Kangas
  0 siblings, 0 replies; 32+ messages in thread
From: Stefan Kangas @ 2022-10-16 17:05 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 58472, Matt Armstrong

Stefan Kangas <stefankangas@gmail.com> writes:

> BTW, we currently call getrandom with flags 0 in
> extract_data_from_object.  Should we consider using GRND_RANDOM?

linux.git/include/uapi/linux/random.h says that:

/*
 * Flags for getrandom(2)
 *
 * GRND_NONBLOCK	Don't block and return EAGAIN instead
 * GRND_RANDOM		No effect
 * GRND_INSECURE	Return non-cryptographic random bytes
 */

But the story is probably not the same on other systems, according to
the glibc documentation, so we should probably not change this.





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-16 16:49         ` Stefan Kangas
@ 2022-10-17  6:17           ` Matt Armstrong
  2022-10-17  7:30           ` Paul Eggert
  1 sibling, 0 replies; 32+ messages in thread
From: Matt Armstrong @ 2022-10-17  6:17 UTC (permalink / raw)
  To: Stefan Kangas, Paul Eggert; +Cc: 58472

Stefan Kangas <stefankangas@gmail.com> writes:

> Matt Armstrong <matt@rfc20.org> writes:
>
>> Can a string be turned into a non-negative bignum integer in
>> (simple) elisp?
>
> Does this look reasonable?
>
>     (seq-reduce (lambda (a i) (+ (ash a 8) i))
>                 (secure-hash 'md5 'iv-auto 128 nil t) 0)

Yep, not too bad, thanks.

-- 
matt (sent from an Emacs running the feature/noverlay branch)





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-16 16:49         ` Stefan Kangas
  2022-10-17  6:17           ` Matt Armstrong
@ 2022-10-17  7:30           ` Paul Eggert
  2022-10-17  8:14             ` Stefan Kangas
                               ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: Paul Eggert @ 2022-10-17  7:30 UTC (permalink / raw)
  To: Stefan Kangas, Matt Armstrong; +Cc: 58472

[-- Attachment #1: Type: text/plain, Size: 354 bytes --]

I've been looking into this and have several patches along these lines. 
None of them address message-unique-id directly yet (I plan to tackle 
this soon) but they do address the general problem area. The basic idea 
is to use a new make-nonce primitive.

I thought I'd email the patches now to see what others think about the 
direction they're headed.

[-- Attachment #2: 0001-No-need-to-call-w32_init_random.patch --]
[-- Type: text/x-patch, Size: 1273 bytes --]

From 80c18cb40c3c9ddd56e88019387a3827d0f6add0 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 16 Oct 2022 10:48:29 -0700
Subject: [PATCH 01/10] No need to call w32_init_random
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/sysdep.c (init_random): Just use getrandom,
since we’re already using it unconditionally elsewhere.
---
 src/sysdep.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/src/sysdep.c b/src/sysdep.c
index 736723bdf3..4786c8fa4f 100644
--- a/src/sysdep.c
+++ b/src/sysdep.c
@@ -2163,17 +2163,11 @@ seed_random (void *seed, ptrdiff_t seed_size)
 init_random (void)
 {
   random_seed v;
-  bool success = false;
 
   /* First, try seeding the PRNG from the operating system's entropy
      source.  This approach is both fast and secure.  */
-#ifdef WINDOWSNT
-  /* FIXME: Perhaps getrandom can be used here too?  */
-  success = w32_init_random (&v, sizeof v) == 0;
-#else
   verify (sizeof v <= 256);
-  success = getrandom (&v, sizeof v, 0) == sizeof v;
-#endif
+  bool success = getrandom (&v, sizeof v, 0) == sizeof v;
 
   /* If that didn't work, just use the current time value and PID.
      It's at least better than XKCD 221.  */
-- 
2.34.1


[-- Attachment #3: 0002-Simplify-feedmail-Message-ID-generation.patch --]
[-- Type: text/x-patch, Size: 953 bytes --]

From a793ed61e6ed216d15a012deb980301d0e61fd43 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 16 Oct 2022 10:48:29 -0700
Subject: [PATCH 02/10] Simplify feedmail Message-ID generation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* lisp/mail/feedmail.el (feedmail-default-message-id-generator):
Avoid an unnecessary call to ‘mod’.
---
 lisp/mail/feedmail.el | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lisp/mail/feedmail.el b/lisp/mail/feedmail.el
index 2ae916e3ac..047c8f404e 100644
--- a/lisp/mail/feedmail.el
+++ b/lisp/mail/feedmail.el
@@ -2819,7 +2819,7 @@ feedmail-default-message-id-generator
 	(setq date-time (file-attribute-modification-time
 			 (file-attributes maybe-file))))
     (format "<%d-%s%s>"
-	    (mod (random) 10000)
+	    (random 10000)
 	    (format-time-string "%a%d%b%Y%H%M%S%z" date-time)
 	    end-stuff))
   )
-- 
2.34.1


[-- Attachment #4: 0003-Simplify-calc-comb-by-using-random.patch --]
[-- Type: text/x-patch, Size: 7366 bytes --]

From 318c88aaf1033d452a5449eba45aa382988412f7 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 16 Oct 2022 23:35:21 -0700
Subject: [PATCH 03/10] =?UTF-8?q?Simplify=20calc-comb=20by=20using=20?=
 =?UTF-8?q?=E2=80=98random=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* lisp/calc/calc-comb.el (math-random-table, math-last-RandSeed)
(math-random-ptr1, math-random-ptr2, math-random-shift)
(var-RandSeed, math-random-cache, math-init-random-base)
(math-random-base, math-random-last)
(math-random-three-digit-number):
Now obsolete, as we can assume that ‘random’ is good enough.
(math-random-digits): Simplify by using ‘random’.
---
 lisp/calc/calc-comb.el  | 96 ++++++++---------------------------------
 lisp/calc/calc-ext.el   |  6 ++-
 lisp/calc/calc-stuff.el |  4 +-
 3 files changed, 23 insertions(+), 83 deletions(-)

diff --git a/lisp/calc/calc-comb.el b/lisp/calc/calc-comb.el
index c1352fa324..522a7ac6c0 100644
--- a/lisp/calc/calc-comb.el
+++ b/lisp/calc/calc-comb.el
@@ -546,105 +546,42 @@ math-stirling-2
 		      (math-mul m (math-stirling-2 (1- n) m))))))
 
 (defvar math-random-table nil)
+(make-obsolete-variable 'math-random-table nil "29.1")
 (defvar math-last-RandSeed nil)
+(make-obsolete-variable 'math-last-RandSeed nil "29.1")
 (defvar math-random-ptr1 nil)
+(make-obsolete-variable 'math-random-ptr1 nil "29.1")
 (defvar math-random-ptr2 nil)
-(defvar math-random-shift nil)
+(make-obsolete-variable 'math-random-ptr2 nil "29.1")
+(defvar math-random-shift -4)	; assume RAND_MAX >= 16383
+(make-obsolete-variable 'math-random-shift nil "29.1")
 
-;;; Produce a random 10-bit integer, with (random) if no seed provided,
-;;; or else with Numerical Recipes algorithm ran3 / Knuth 3.2.2-A.
+;;; Produce a random 10-bit integer.
 
 (defvar var-RandSeed)
+(make-obsolete-variable 'var-RandSeed nil "29.1")
 (defvar math-random-cache nil)
-(defvar math-gaussian-cache nil)
+(make-obsolete-variable 'math-random-cache nil "29.1")
 
 (defun math-init-random-base ()
-  (if (and (boundp 'var-RandSeed) var-RandSeed)
-      (if (eq (car-safe var-RandSeed) 'vec)
-	  nil
-	(if (Math-integerp var-RandSeed)
-	    (let* ((seed (math-sub 161803 var-RandSeed))
-		   (mj (1+ (math-mod seed 1000000)))
-		   (mk (1+ (math-mod (math-quotient seed 1000000)
-                                     1000000)))
-		   (i 0))
-	      (setq math-random-table (cons 'vec (make-list 55 mj)))
-	      (while (<= (setq i (1+ i)) 54)
-		(let* ((ii (% (* i 21) 55))
-		       (p (nthcdr ii math-random-table)))
-		  (setcar p mk)
-		  (setq mk (- mj mk)
-			mj (car p)))))
-	  (math-reject-arg var-RandSeed "*RandSeed must be an integer"))
-	(setq var-RandSeed (list 'vec var-RandSeed)
-	      math-random-ptr1 math-random-table
-	      math-random-cache nil
-	      math-random-ptr2 (nthcdr 31 math-random-table))
-	(let ((i 200))
-	  (while (> (setq i (1- i)) 0)
-	    (math-random-base))))
-    (setq var-RandSeed nil
-	  math-random-cache nil
-	  math-random-shift -4)  ; assume RAND_MAX >= 16383
-    ;; This exercises the random number generator and also helps
-    ;; deduce a better value for RAND_MAX.
-    (let ((i 0))
-      (while (< (setq i (1+ i)) 30)
-        (if (> (ash (math-abs (random)) math-random-shift) 4095)
-            (setq math-random-shift (1- math-random-shift))))))
-  (setq math-last-RandSeed var-RandSeed
-	math-gaussian-cache nil))
+  (declare (obsolete nil "29.1")))
 
 (defun math-random-base ()
-  (if var-RandSeed
-      (progn
-	(setq math-random-ptr1 (or (cdr math-random-ptr1)
-				   (cdr math-random-table))
-	      math-random-ptr2 (or (cdr math-random-ptr2)
-				   (cdr math-random-table)))
-	(logand (ash (setcar math-random-ptr1
-			     (logand (- (car math-random-ptr1)
-					(car math-random-ptr2)) 524287))
-		     -6) 1023))
-    (logand (ash (random) math-random-shift) 1023)))
-
+  (declare (obsolete 'random "29.1"))
+  (random 1024))
 
 ;;; Produce a random digit in the range 0..999.
-;;; Avoid various pitfalls that may lurk in the built-in (random) function!
-;;; Shuffling algorithm from Numerical Recipes, section 7.1.
 (defvar math-random-last)
+(make-obsolete-variable 'math-random-last nil "29.1")
 (defun math-random-three-digit-number ()
   "Return a random three digit number."
-  (let (i)
-    (or (and (boundp 'var-RandSeed) (eq var-RandSeed math-last-RandSeed))
-	(math-init-random-base))
-    (or math-random-cache
-	(progn
-	  (setq math-random-last (math-random-base)
-		math-random-cache (make-vector 13 nil)
-		i -1)
-	  (while (< (setq i (1+ i)) 13)
-	    (aset math-random-cache i (math-random-base)))))
-    (while (progn
-	     (setq i (/ math-random-last 79)   ; 0 <= i < 13
-		   math-random-last (aref math-random-cache i))
-	     (aset math-random-cache i (math-random-base))
-	     (>= math-random-last 1000)))
-    math-random-last))
+  (declare (obsolete 'random "29.1"))
+  (random 1000))
 
 ;;; Produce an N-digit random integer.
 (defun math-random-digits (n)
   "Produce a random N digit integer."
-  (let* ((slop (% (- 3 (% n 3)) 3))
-         (i (/ (+ n slop) 3))
-         (rnum 0))
-    (while (> i 0)
-      (setq rnum
-            (math-add
-             (math-random-three-digit-number)
-             (math-mul rnum 1000)))
-      (setq i (1- i)))
-    (math-normalize (math-scale-right rnum slop))))
+  (random (expt 10 n)))
 
 ;;; Produce a uniformly-distributed random float 0 <= N < 1.
 (defun math-random-float ()
@@ -652,6 +589,7 @@ math-random-float
 		   (- calc-internal-prec)))
 
 ;;; Produce a Gaussian-distributed random float with mean=0, sigma=1.
+(defvar math-gaussian-cache nil)
 (defun math-gaussian-float ()
   (math-with-extra-prec 2
     (if (and math-gaussian-cache
diff --git a/lisp/calc/calc-ext.el b/lisp/calc/calc-ext.el
index 7ee73d100a..55808d28b1 100644
--- a/lisp/calc/calc-ext.el
+++ b/lisp/calc/calc-ext.el
@@ -785,8 +785,10 @@ calc-init-extensions
 calcFunc-gcd calcFunc-lcm calcFunc-moebius calcFunc-nextprime
 calcFunc-perm calcFunc-prevprime calcFunc-prfac calcFunc-prime
 calcFunc-random calcFunc-shuffle calcFunc-stir1 calcFunc-stir2
-calcFunc-totient math-init-random-base math-member math-prime-test
-math-random-base)
+calcFunc-totient
+math-init-random-base ; obsolescent as of 29.1
+math-random-base ; obsolescent as of 29.1
+math-member math-prime-test)
 
  ("calccomp" calcFunc-cascent calcFunc-cdescent
 calcFunc-cheight calcFunc-cwidth math-comp-ascent math-comp-descent
diff --git a/lisp/calc/calc-stuff.el b/lisp/calc/calc-stuff.el
index 758b920184..87a7eb6e65 100644
--- a/lisp/calc/calc-stuff.el
+++ b/lisp/calc/calc-stuff.el
@@ -155,7 +155,7 @@ math-lud-cache
 (defvar math-log2-cache) ; calc-bin.el
 (defvar math-radix-digits-cache) ; calc-bin.el
 (defvar math-radix-float-cache-tag) ; calc-bin.el
-(defvar math-random-cache) ; calc-comb.el
+(defvar math-random-cache) ; calc-comb.el; obsolescent as of 29.1
 (defvar math-max-digits-cache) ; calc-bin.el
 (defvar math-integral-cache) ; calcalg2.el
 (defvar math-units-table) ; calc-units.el
@@ -170,7 +170,7 @@ calc-flush-caches
 	 math-log2-cache nil
 	 math-radix-digits-cache nil
 	 math-radix-float-cache-tag nil
-	 math-random-cache nil
+	 math-random-cache nil ; obsolescent as of 29.1
 	 math-max-digits-cache nil
 	 math-integral-cache nil
 	 math-units-table nil
-- 
2.34.1


[-- Attachment #5: 0004-Simplify-calc-prog-by-using-random.patch --]
[-- Type: text/x-patch, Size: 1323 bytes --]

From 573d09bfa3b472e31795add03ae6a58c7ad347d1 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 16 Oct 2022 23:37:07 -0700
Subject: [PATCH 04/10] =?UTF-8?q?Simplify=20calc-prog=20by=20using=20?=
 =?UTF-8?q?=E2=80=98random=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* lisp/calc/calc-prog.el (calc-user-define-formula):
Simplify by using (random N).
---
 lisp/calc/calc-prog.el | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lisp/calc/calc-prog.el b/lisp/calc/calc-prog.el
index 127f6340a1..0409ef8027 100644
--- a/lisp/calc/calc-prog.el
+++ b/lisp/calc/calc-prog.el
@@ -200,7 +200,7 @@ calc-user-define-formula
 			  (format "%03d" key)))
 	   odef (assq key (calc-user-key-map)))
      (unless keyname
-       (setq keyname (format "%05d" (abs (% (random) 10000)))))
+       (setq keyname (format "%05d" (random 10000))))
      (while
 	 (progn
 	   (setq cmd-base-default (concat "User-" keyname))
@@ -268,7 +268,7 @@ calc-user-define-formula
 	 (setq func (intern (concat "calcFunc-User"
 				    (or keyname
 					(and cmd (symbol-name cmd))
-					(format "%05d" (% (random) 10000)))))))
+					(format "%05d" (random 10000)))))))
 
      (if is-lambda
 	 (setq calc-user-formula-alist math-arglist)
-- 
2.34.1


[-- Attachment #6: 0005-doc-lispref-numbers.texi-Improve-random-doc.patch --]
[-- Type: text/x-patch, Size: 2317 bytes --]

From b111f941e13bd627870f7e812b8be15b7fee4548 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 16 Oct 2022 21:35:47 -0700
Subject: [PATCH 05/10] =?UTF-8?q?*=20doc/lispref/numbers.texi:=20Improve?=
 =?UTF-8?q?=20=E2=80=98random=E2=80=99=20doc.?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 doc/lispref/numbers.texi | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index fdcda328d8..c1a1349d1a 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -1238,6 +1238,9 @@ Random Numbers
 sequence of numbers.  By default, Emacs initializes the random seed at
 startup, in such a way that the sequence of values of @code{random}
 (with overwhelming likelihood) differs in each Emacs run.
+The random seed is typically initialized from system entropy;
+however, on obsolescent platforms lacking entropy pools,
+the seed is taken from less-random volatile data such as the current time.
 
   Sometimes you want the random number sequence to be repeatable.  For
 example, when debugging a program whose behavior depends on the random
@@ -1256,12 +1259,17 @@ Random Numbers
 any fixnum, i.e., any integer from @code{most-negative-fixnum} through
 @code{most-positive-fixnum} (@pxref{Integer Basics}).
 
+If @var{limit} is a string, it means to choose a new seed based on the
+string's contents.  This causes later calls to @code{random} to return
+a reproducible sequence of results.
+
 If @var{limit} is @code{t}, it means to choose a new seed as if Emacs
-were restarting, typically from the system entropy.  On systems
-lacking entropy pools, choose the seed from less-random volatile data
-such as the current time.
+were restarting.  This causes later calls to @code{random} to return
+an unpredictable sequence of results.
 
-If @var{limit} is a string, it means to choose a new seed based on the
-string's contents.
+If you need a random nonce for cryptographic purposes, @code{(random
+t)} is typically not the best approach, as it can adversely affect other
+parts of your program that benefit from reproducible results, and it can
+leave information about the nonce scattered about Emacs's internal state.
 
 @end defun
-- 
2.34.1


[-- Attachment #7: 0006-New-function-make-nonce.patch --]
[-- Type: text/x-patch, Size: 6997 bytes --]

From 7113ce5ab4a265db7f2870c6614da88d09407604 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 16 Oct 2022 16:33:05 -0700
Subject: [PATCH 06/10] New function make-nonce

* src/alloc.c (clear_nonce, Fmake_nonce): New functions.
* src/fns.c: Do not include <sys/random.h>.
(extract_data_from_object): Simplify by calling get_entropy.
* src/sysdep.c (get_entropy): New function, taken from
the old extract_data_from_object.
---
 doc/lispref/numbers.texi |  2 ++
 doc/lispref/strings.texi | 12 ++++++++++++
 etc/NEWS                 |  4 ++++
 src/alloc.c              | 36 ++++++++++++++++++++++++++++++++++++
 src/fns.c                | 12 +-----------
 src/lisp.h               |  3 ++-
 src/sysdep.c             | 16 ++++++++++++++++
 7 files changed, 73 insertions(+), 12 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index c1a1349d1a..079f7bda9c 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -1271,5 +1271,7 @@ Random Numbers
 t)} is typically not the best approach, as it can adversely affect other
 parts of your program that benefit from reproducible results, and it can
 leave information about the nonce scattered about Emacs's internal state.
+For nonces, it is typically better to use @code{make-nonce}
+(@pxref{Creating Strings}).
 
 @end defun
diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index cf961e9e7c..0f3e0ae213 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -455,6 +455,18 @@ Creating Strings
 Remove the final newline, if any, from @var{string}.
 @end defun
 
+@defun make-nonce length &optional function
+Return a newly created random string of length @var{length}.
+The string is unibyte, with bytes taken from system entropy,
+and with each string value equally likely.
+
+If @var{function} is given, call it with the newly created string as
+an argument and return the value that @var{function} returns.
+When the function exits, overwrite the string's random contents with
+unspecified bytes, to lessen information leakage in insecure code.
+The string's contents are therefore valid only during the function call.
+@end defun
+
 @node Modifying Strings
 @section Modifying Strings
 @cindex modifying strings
diff --git a/etc/NEWS b/etc/NEWS
index 041fe0bdbd..2938d6acaf 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -3422,6 +3422,10 @@ string, and a callback is called when the user types 'C-c C-c'.
 This is a modal version of 'string-edit', and can be used as an
 alternative to 'read-string'.
 
++++
+** New function 'make-nonce'.
+This creates a random string, useful for cryptographic nonces.
+
 +++
 ** The return value of 'clear-message-function' is not ignored anymore.
 If the function returns 'dont-clear-message', then the message is not
diff --git a/src/alloc.c b/src/alloc.c
index 419c5e558b..f8fd954ff6 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -2266,6 +2266,41 @@ DEFUN ("make-string", Fmake_string, Smake_string, 2, 3, 0,
   return val;
 }
 
+/* Set all the bytes of NONCE (a unibyte string) to unspecifed values,
+   as securely as possible.  */
+
+static void
+clear_nonce (Lisp_Object nonce)
+{
+  explicit_bzero (SDATA (nonce), SCHARS (nonce));
+}
+
+DEFUN ("make-nonce", Fmake_nonce, Smake_nonce,
+       1, 2, 0,
+       doc: /* Return a random string of length LENGTH.
+The string is unibyte, with bytes taken from system entropy.
+
+If FUNCTION is given, call it with with newly created string as an
+argument and return the value that FUNCTION returns.  When the
+function exits, overwrite the newly created string's contents with
+unspecified bytes, for security.  */)
+  (Lisp_Object length, Lisp_Object function)
+{
+  CHECK_FIXNAT (length);
+  EMACS_INT nbytes = XFIXNAT (length);
+  Lisp_Object nonce = make_uninit_string (nbytes);
+  get_entropy (SDATA (nonce), XFIXNAT (length));
+  if (NILP (function))
+    return nonce;
+
+  /* Pin the string, and clear it after FUNCTION exits, to lessen
+     information leakage if Emacs is buggy elsewhere.  */
+  pin_string (nonce);
+  specpdl_ref count = SPECPDL_INDEX ();
+  record_unwind_protect (clear_nonce, nonce);
+  return unbind_to (count, call1 (function, nonce));
+}
+
 /* Fill A with 1 bits if INIT is non-nil, and with 0 bits otherwise.
    Return A.  */
 
@@ -7862,6 +7897,7 @@ syms_of_alloc (void)
   defsubr (&Smake_vector);
   defsubr (&Smake_record);
   defsubr (&Smake_string);
+  defsubr (&Smake_nonce);
   defsubr (&Smake_bool_vector);
   defsubr (&Smake_symbol);
   defsubr (&Smake_marker);
diff --git a/src/fns.c b/src/fns.c
index 4055792382..3d2c8c88ab 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -21,7 +21,6 @@ Copyright (C) 1985-1987, 1993-1995, 1997-2022 Free Software Foundation,
 #include <config.h>
 
 #include <stdlib.h>
-#include <sys/random.h>
 #include <unistd.h>
 #include <filevercmp.h>
 #include <intprops.h>
@@ -5683,16 +5682,7 @@ extract_data_from_object (Lisp_Object spec,
         {
 	  EMACS_INT start_hold = XFIXNAT (start);
           object = make_uninit_string (start_hold);
-	  char *lim = SSDATA (object) + start_hold;
-	  for (char *p = SSDATA (object); p < lim; p++)
-	    {
-	      ssize_t gotten = getrandom (p, lim - p, 0);
-	      if (0 <= gotten)
-		p += gotten;
-	      else if (errno != EINTR)
-		report_file_error ("Getting random data", Qnil);
-	    }
-
+	  get_entropy (SDATA (object), start_hold);
           *start_byte = 0;
           *end_byte = start_hold;
         }
diff --git a/src/lisp.h b/src/lisp.h
index 5f6721595c..053725b798 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -1581,7 +1581,7 @@ CDR_SAFE (Lisp_Object c)
 	 If negative, the string is unibyte:
 	 -1 for data normally allocated
 	 -2 for data in rodata (C string constants)
-	 -3 for data that must be immovable (used for bytecode)  */
+	 -3 for data that must be immovable (used for bytecode and nonces)  */
       ptrdiff_t size_byte;
 
       INTERVAL intervals;	/* Text properties in this string.  */
@@ -5042,6 +5042,7 @@ maybe_disable_address_randomization (int argc, char **argv)
 extern unsigned long int get_random_ulong (void);
 extern void seed_random (void *, ptrdiff_t);
 extern void init_random (void);
+extern void get_entropy (void *, ptrdiff_t);
 extern void emacs_backtrace (int);
 extern AVOID emacs_abort (void) NO_INLINE;
 extern int emacs_fstatat (int, char const *, void *, int);
diff --git a/src/sysdep.c b/src/sysdep.c
index 4786c8fa4f..5117460fc0 100644
--- a/src/sysdep.c
+++ b/src/sysdep.c
@@ -2159,6 +2159,22 @@ seed_random (void *seed, ptrdiff_t seed_size)
   set_random_seed (arg);
 }
 
+/* Set BUF, of size BUFSIZE, to random data derived from system entropy.  */
+
+void
+get_entropy (void *buf, ptrdiff_t bufsize)
+{
+  char *p = buf, *lim = p + bufsize;
+  while (p < lim)
+    {
+      ssize_t gotten = getrandom (p, lim - p, 0);
+      if (0 <= gotten)
+	p += gotten;
+      else if (errno != EINTR)
+	report_file_error ("Getting random data", Qnil);
+    }
+}
+
 void
 init_random (void)
 {
-- 
2.34.1


[-- Attachment #8: 0007-Simplify-auth-source-obfuscate-via-make-nonce.patch --]
[-- Type: text/x-patch, Size: 1281 bytes --]

From 45efe70ce2fff4211013b688911b76a6400fb0f5 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 17 Oct 2022 00:05:31 -0700
Subject: [PATCH 07/10] =?UTF-8?q?Simplify=20auth-source--obfuscate=20via?=
 =?UTF-8?q?=20=E2=80=98make-nonce=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* lisp/auth-source.el (auth-source--obfuscate):
Simplify by using make-nonce.  This also improves the nonce by using
all the bits in each byte.
---
 lisp/auth-source.el | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lisp/auth-source.el b/lisp/auth-source.el
index feefd391a8..cba7324ea9 100644
--- a/lisp/auth-source.el
+++ b/lisp/auth-source.el
@@ -1216,9 +1216,7 @@ auth-source--obfuscate
   ;; useful information is leaked.  If you reset the nonce, you also
   ;; have to call `auth-source-forget-all-cached'.
   (unless auth-source--session-nonce
-    (setq auth-source--session-nonce
-          (apply #'string (cl-loop repeat 16
-                                   collect (random 128)))))
+    (setq auth-source--session-nonce (make-nonce 16)))
   (if (and (fboundp 'gnutls-symmetric-encrypt)
            (gnutls-available-p))
       (let ((cdata (car (last (gnutls-ciphers)))))
-- 
2.34.1


[-- Attachment #9: 0008-Simplify-ntml-by-using-make-nonce.patch --]
[-- Type: text/x-patch, Size: 952 bytes --]

From e5066b0b790f3d809b016967b222c4eafe8e7b88 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 16 Oct 2022 23:37:58 -0700
Subject: [PATCH 08/10] Simplify ntml by using make-nonce

* lisp/net/ntlm.el (ntlm-generate-nonce): Simplify.
---
 lisp/net/ntlm.el | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lisp/net/ntlm.el b/lisp/net/ntlm.el
index b58f0abb56..03a6c37ff3 100644
--- a/lisp/net/ntlm.el
+++ b/lisp/net/ntlm.el
@@ -221,9 +221,7 @@ ntlm-compute-timestamp
 (defun ntlm-generate-nonce ()
   "Generate a random nonce, not to be used more than once.
 Return a random eight byte unibyte string."
-  (unibyte-string
-   (random 256) (random 256) (random 256) (random 256)
-   (random 256) (random 256) (random 256) (random 256)))
+  (make-nonce 8))
 
 (defun ntlm-build-auth-response (challenge user password-hashes)
   "Return the response string to a challenge string CHALLENGE given by
-- 
2.34.1


[-- Attachment #10: 0009-Improve-randomness-of-server.el.patch --]
[-- Type: text/x-patch, Size: 1861 bytes --]

From 124636f5202a41c8feb46330f3c8fcd9f4eed9e8 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 16 Oct 2022 23:40:14 -0700
Subject: [PATCH 09/10] Improve randomness of server.el

* lisp/server.el (server-generate-key): Generate from system
entropy, rather than from (random) which is lower quality.
---
 lisp/server.el | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/lisp/server.el b/lisp/server.el
index 90d97c1538..b2bc245c02 100644
--- a/lisp/server.el
+++ b/lisp/server.el
@@ -590,10 +590,25 @@ server-generate-key
 The key is a 64-byte string of random chars in the range `!'..`~'.
 If called interactively, also inserts it into current buffer."
   (interactive)
-  (let ((auth-key
-	 (cl-loop repeat 64
-                  collect (+ 33 (random 94)) into auth
-                  finally return (concat auth))))
+  (let* ((base 94) (auth-key-len 64)
+	 (auth-key (make-string auth-key-len 0))
+	 ;; 1+ because we divide by BASE first, before taking the remainder.
+	 ;; The division is first because if we took the remainder first
+	 ;; the first remainder would not be entirely random.
+	 (nonce-length (1+ (ceiling (logb (expt base auth-key-len)) 8))))
+    ;; Use make-nonce with a function arg, to clear the nonce.
+    ;; auth-key and the bignum n still have the nonce's info, though.
+    (make-nonce
+     nonce-length
+     #'(lambda (nonce)
+	 (let ((n 0))
+	   ;; Set N to be all the nonce's bits, concatenated.
+	   (cl-loop for i below nonce-length
+		    do (setq n (+ (* 256 n) (aref nonce i))))
+	   ;; Repeatedly divide and remainder to compute each byte.
+	   (cl-loop for i below auth-key-len
+		    do (setq n (/ n base))
+		       (aset auth-key i (+ 33 (% n base)))))))
     (if (called-interactively-p 'interactive)
 	(insert auth-key))
     auth-key))
-- 
2.34.1


[-- Attachment #11: 0010-Improve-randomness-of-message-canlock-generate.patch --]
[-- Type: text/x-patch, Size: 1057 bytes --]

From 54a2d5a65b2e868ce3b18c47f50886d07414adec Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 17 Oct 2022 00:00:44 -0700
Subject: [PATCH 10/10] Improve randomness of message-canlock-generate

* lisp/gnus/message.el (message-canlock-generate):
Simplify by using make-nonce.
---
 lisp/gnus/message.el | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/lisp/gnus/message.el b/lisp/gnus/message.el
index 5e4e9854a6..670844703d 100644
--- a/lisp/gnus/message.el
+++ b/lisp/gnus/message.el
@@ -5192,10 +5192,7 @@ message-send-mail-with-mailclient
 (defun message-canlock-generate ()
   "Return a string that is non-trivial to guess.
 Do not use this for anything important, it is cryptographically weak."
-  (sha1 (concat (message-unique-id)
-                (format "%x%x%x" (random) (random) (random))
-                (prin1-to-string (recent-keys))
-                (prin1-to-string (garbage-collect)))))
+  (make-nonce 20 #'sha1))
 
 (defvar canlock-password)
 (defvar canlock-password-for-verify)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-17  7:30           ` Paul Eggert
@ 2022-10-17  8:14             ` Stefan Kangas
  2022-10-17  8:23               ` Eli Zaretskii
  2022-10-17 18:47               ` Matt Armstrong
  2022-10-17  8:16             ` Eli Zaretskii
  2022-10-17 18:40             ` Matt Armstrong
  2 siblings, 2 replies; 32+ messages in thread
From: Stefan Kangas @ 2022-10-17  8:14 UTC (permalink / raw)
  To: Paul Eggert, Matt Armstrong; +Cc: 58472

[-- Attachment #1: Type: text/plain, Size: 1680 bytes --]

Paul Eggert <eggert@cs.ucla.edu> writes:

> I've been looking into this and have several patches along these lines.
> None of them address message-unique-id directly yet (I plan to tackle
> this soon) but they do address the general problem area. The basic idea
> is to use a new make-nonce primitive.

Thanks!  I have read your patchset, which looks good to me.

I have also attached my latest patch for `message-unique-id', but I'm
not married to it if you have something better in mind.  It could easily
be updated to use `make-nonce' though.

>  (defun math-init-random-base ()
[...snip...]
> +  (declare (obsolete nil "29.1")))

This is a nit, but perhaps this could be simplified to:

    (declare-obsolete-function-alias 'math-init-random-base
       #'ignore "29.1)

> diff --git a/src/sysdep.c b/src/sysdep.c
> index 4786c8fa4f..5117460fc0 100644
> --- a/src/sysdep.c
> +++ b/src/sysdep.c
> @@ -2159,6 +2159,22 @@ seed_random (void *seed, ptrdiff_t seed_size)
>    set_random_seed (arg);
>  }
>
> +/* Set BUF, of size BUFSIZE, to random data derived from system entropy.  */
> +
> +void
> +get_entropy (void *buf, ptrdiff_t bufsize)
> +{
> +  char *p = buf, *lim = p + bufsize;
> +  while (p < lim)
> +    {
> +      ssize_t gotten = getrandom (p, lim - p, 0);
> +      if (0 <= gotten)
> +	p += gotten;
> +      else if (errno != EINTR)
> +	report_file_error ("Getting random data", Qnil);
> +    }
> +}

If we claim that the random data is suitable for cryptographic purposes,
should we be using the GRND_RANDOM flag here?

On Linux, flags 0 and GRND_RANDOM are equivalent (I read the most recent
kernel code to verify this).  But I have no idea about other platforms.

[-- Attachment #2: 0001-Make-message-unique-id-less-prone-to-collisions.patch --]
[-- Type: text/x-diff, Size: 5465 bytes --]

From 811c7e70b55e7aeace792ffbf19203ef36f30bd2 Mon Sep 17 00:00:00 2001
From: Stefan Kangas <stefankangas@gmail.com>
Date: Wed, 12 Oct 2022 17:29:04 +0200
Subject: [PATCH] Make `message-unique-id' less prone to collisions

* lisp/gnus/message.el (message-unique-id): Make less prone to
collisions.  (Bug#58472)
(message-number-base36): Make obsolete in favor of...
(message--number-base62): ...this new function.
(message-unique-id-char): Make unused variable obsolete.
* test/lisp/gnus/message-tests.el (message-unique-id-test)
(message-number-base62-test): New tests.
---
 lisp/gnus/message.el            | 79 ++++++++++++++++-----------------
 test/lisp/gnus/message-tests.el | 10 +++++
 2 files changed, 49 insertions(+), 40 deletions(-)

diff --git a/lisp/gnus/message.el b/lisp/gnus/message.el
index a714e31876..1590a8b289 100644
--- a/lisp/gnus/message.el
+++ b/lisp/gnus/message.el
@@ -5886,50 +5886,36 @@ message-make-message-id
 		"_-_" ""))
 	  "@" (message-make-fqdn) ">"))
 
-(defvar message-unique-id-char nil)
-
-;; If you ever change this function, make sure the new version
-;; cannot generate IDs that the old version could.
-;; You might for example insert a "." somewhere (not next to another dot
-;; or string boundary), or modify the "fsf" string.
 (defun message-unique-id ()
-  ;; Don't use fractional seconds from timestamp; they may be unsupported.
-  ;; Instead we use this randomly inited counter.
-  (setq message-unique-id-char
-	;; 2^16 * 25 just fits into 4 digits i base 36.
-	(let ((base (* 25 25)))
-	  (if message-unique-id-char
-	      (% (1+ message-unique-id-char) base)
-	    (random base))))
-  (let ((tm (time-convert nil 'integer)))
+  "Return a generated string suitable for using as a unique Message-ID."
+  (let ((random-num
+         (seq-reduce (lambda (a i) (+ (ash a 8) i))
+                     (secure-hash 'md5 'iv-auto 128 nil t) 0)))
+    ;; Use the PRNG to set an extra couple of bits, to avoid having
+    ;; every Message-Id start with "A-F".
+    (setq random-num (logior random-num (ash (random 16) 128)))
     (concat
-     (if (or (eq system-type 'ms-dos)
-	     ;; message-number-base36 doesn't handle bigints.
-	     (floatp (user-uid)))
-	 (let ((user (downcase (user-login-name))))
-	   (while (string-match "[^a-z0-9_]" user)
-	     (aset user (match-beginning 0) ?_))
-	   user)
-       (message-number-base36 (user-uid) -1))
-     (message-number-base36 (+ (ash tm -16)
-			       (ash (% message-unique-id-char 25) 16))
-			    4)
-     (message-number-base36 (+ (logand tm #xffff)
-			       (ash (/ message-unique-id-char 25) 16))
-			    4)
-     ;; Append a given name, because while the generated ID is unique
-     ;; to this newsreader, other newsreaders might otherwise generate
-     ;; the same ID via another algorithm.
-     ".fsf")))
-
-(defun message-number-base36 (num len)
+     ;; We used to base this on time, euid, etc., but now just use 128
+     ;; bits of random data.  The main requirements are that collisions
+     ;; are unlikely, and that the alphabet is limited.  Note that the
+     ;; random data will be securely generated by getrandom() in Gnulib.
+     (message--number-base62 random-num 22)
+     ;; Append ".gnu" to advertise that we're GNU Emacs.
+     ".gnu")))
+
+(defun message--number-base62 (num len)
+  "Return a string similar to Base64, but include only alpha-numerics.
+NUM is the number to convert.
+LEN is the length of the string (or -1 for no limit)."
   (if (if (< len 0)
-	  (<= num 0)
-	(= len 0))
+          (<= num 0)
+        (= len 0))
       ""
-    (concat (message-number-base36 (/ num 36) (1- len))
-	    (char-to-string (aref "zyxwvutsrqponmlkjihgfedcba9876543210"
-				  (% num 36))))))
+    (concat (message--number-base62 (/ num 62) (1- len))
+            (char-to-string (aref (concat "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+                                          "abcdefghijklmnopqrstuvwxyz"
+                                          "0123456789")
+                                  (% num 62))))))
 
 (defun message-make-organization ()
   "Make an Organization header."
@@ -8998,6 +8984,19 @@ message-mailto-1
 	(message-goto-body)
       (message-goto-subject))))
 
+(make-obsolete-variable 'message-unique-id-char nil "29.1")
+(defvar message-unique-id-char nil)
+
+(defun message-number-base36 (num len)
+  (declare (obsolete message--number-base62 "29.1"))
+  (if (if (< len 0)
+          (<= num 0)
+        (= len 0))
+      ""
+    (concat (message-number-base36 (/ num 36) (1- len))
+            (char-to-string (aref "0123456789abcdefghijklmnopqrstuvwxyz"
+                                  (% num 36))))))
+
 (provide 'message)
 
 (make-obsolete-variable 'message-load-hook
diff --git a/test/lisp/gnus/message-tests.el b/test/lisp/gnus/message-tests.el
index a724428ecb..851e07b7ae 100644
--- a/test/lisp/gnus/message-tests.el
+++ b/test/lisp/gnus/message-tests.el
@@ -31,6 +31,16 @@
 
 (require 'cl-lib)
 
+(ert-deftest message-unique-id-test ()
+  (should (stringp (message-unique-id)))
+  (should (= (length (message-unique-id)) 26))
+  (should (string-match (rx ".gnu" eos) (message-unique-id))))
+
+(ert-deftest message--number-base62-test ()
+  (should (equal (message--number-base62 10 -1) "K"))
+  (should (equal (message--number-base62 1 2) "AB"))
+  (should (equal (message--number-base62 (expt 62 5) -1) "BAAAAA")))
+
 (ert-deftest message-mode-propertize ()
   (with-temp-buffer
     (unwind-protect
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-17  7:30           ` Paul Eggert
  2022-10-17  8:14             ` Stefan Kangas
@ 2022-10-17  8:16             ` Eli Zaretskii
  2022-10-17  8:29               ` Lars Ingebrigtsen
  2022-10-17 18:40             ` Matt Armstrong
  2 siblings, 1 reply; 32+ messages in thread
From: Eli Zaretskii @ 2022-10-17  8:16 UTC (permalink / raw)
  To: Paul Eggert; +Cc: matt, stefankangas, 58472

> Cc: 58472@debbugs.gnu.org
> Date: Mon, 17 Oct 2022 00:30:49 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> 
> I've been looking into this and have several patches along these lines. 
> None of them address message-unique-id directly yet (I plan to tackle 
> this soon) but they do address the general problem area. The basic idea 
> is to use a new make-nonce primitive.
> 
> I thought I'd email the patches now to see what others think about the 
> direction they're headed.

IMNSHO, this thread has long passed the point of being reasonable.
There's nothing particularly wrong with the current message-id we use,
and as mentioned several times, its exact form and contents are not
very important anyway.

So I'm objected to any of these wide-sweeping changes for a reason
that is so minor it IMO shouldn't have been brought up in the first
place.  I regret I didn't stop this discussion back then, because it
has now snowballed into a monster.  But better late than never.

> --- a/src/sysdep.c
> +++ b/src/sysdep.c
> @@ -2163,17 +2163,11 @@ seed_random (void *seed, ptrdiff_t seed_size)
>  init_random (void)
>  {
>    random_seed v;
> -  bool success = false;
>  
>    /* First, try seeding the PRNG from the operating system's entropy
>       source.  This approach is both fast and secure.  */
> -#ifdef WINDOWSNT
> -  /* FIXME: Perhaps getrandom can be used here too?  */
> -  success = w32_init_random (&v, sizeof v) == 0;
> -#else
>    verify (sizeof v <= 256);
> -  success = getrandom (&v, sizeof v, 0) == sizeof v;
> -#endif
> +  bool success = getrandom (&v, sizeof v, 0) == sizeof v;
>  
>    /* If that didn't work, just use the current time value and PID.
>       It's at least better than XKCD 221.  */

Please never replace w32-specific code with Gnulib without auditing.
Gnulib doesn't support old versions of Windows which we still do, and
so its replacement break Emacs on those old platforms.

> * lisp/calc/calc-comb.el (math-random-table, math-last-RandSeed)
> (math-random-ptr1, math-random-ptr2, math-random-shift)
> (var-RandSeed, math-random-cache, math-init-random-base)
> (math-random-base, math-random-last)
> (math-random-three-digit-number):
> Now obsolete, as we can assume that ‘random’ is good enough.
> (math-random-digits): Simplify by using ‘random’.

Why do we need to touch Calc, for crying out loud?!

> From 7113ce5ab4a265db7f2870c6614da88d09407604 Mon Sep 17 00:00:00 2001
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sun, 16 Oct 2022 16:33:05 -0700
> Subject: [PATCH 06/10] New function make-nonce
> 
> * src/alloc.c (clear_nonce, Fmake_nonce): New functions.
> * src/fns.c: Do not include <sys/random.h>.
> (extract_data_from_object): Simplify by calling get_entropy.
> * src/sysdep.c (get_entropy): New function, taken from
> the old extract_data_from_object.

I don't want this new function in Emacs, with all the code churn and
other strings with which it comes attached.  Please leave our random
functions alone, they do their job just fine!

Bottom line: please don't install any of this, certainly not so close
to cutting a release branch, and hopefully not ever.  There were much
easier and smaller changes proposed for message-id; let's use one of
those, or even leave the original message-id intact, as there's
nothing particularly wrong with it.  We have much more important jobs
to do.

TIA.






^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-17  8:14             ` Stefan Kangas
@ 2022-10-17  8:23               ` Eli Zaretskii
  2022-10-17 18:47               ` Matt Armstrong
  1 sibling, 0 replies; 32+ messages in thread
From: Eli Zaretskii @ 2022-10-17  8:23 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: matt, eggert, 58472

> Cc: 58472@debbugs.gnu.org
> From: Stefan Kangas <stefankangas@gmail.com>
> Date: Mon, 17 Oct 2022 08:14:03 +0000
> 
> Paul Eggert <eggert@cs.ucla.edu> writes:
> 
> > I've been looking into this and have several patches along these lines.
> > None of them address message-unique-id directly yet (I plan to tackle
> > this soon) but they do address the general problem area. The basic idea
> > is to use a new make-nonce primitive.
> 
> Thanks!  I have read your patchset, which looks good to me.

It doesn't look good to me at all, and I'm against installing any of
that stuff, certainly at this time, hopefully never.

Please stop pushing this issue, as I will not agree to installing
anything that complex.  The only changes I'm willing to consider wrt
this issue are local changes in message.el that affect only the
message-id.  Please drop any wider and more general changes, as I will
not agree to them.

TIA





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-17  8:16             ` Eli Zaretskii
@ 2022-10-17  8:29               ` Lars Ingebrigtsen
  2022-10-17  8:34                 ` Eli Zaretskii
  2022-10-17  9:30                 ` Stefan Kangas
  0 siblings, 2 replies; 32+ messages in thread
From: Lars Ingebrigtsen @ 2022-10-17  8:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: matt, Paul Eggert, stefankangas, 58472

Eli Zaretskii <eliz@gnu.org> writes:

> So I'm objected to any of these wide-sweeping changes for a reason
> that is so minor it IMO shouldn't have been brought up in the first
> place.  I regret I didn't stop this discussion back then, because it
> has now snowballed into a monster.  But better late than never.

I sort of agree with you here, but not totally -- I think a `make-nonce'
function would be useful in general, because this is an area that's
genuinely difficult to get right, and having a function that does this
for you -- correctly -- is good.

But, like you, I'm not sure about the proposed changes otherwise.

And, like I've said before, there's a reason the Message-ID is on the
format it's on now -- it has information that allows users to do work on
it (so changing it will break some use cases), and it's short (which
makes it efficient in many algos), and it's obviously "good enough" --
it's been this way for decades without any problems.

So I'd prefer not to change `message-make-id', but adding a `make-nonce'
function would be nice anyway.





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-17  8:29               ` Lars Ingebrigtsen
@ 2022-10-17  8:34                 ` Eli Zaretskii
  2022-10-17  9:30                 ` Stefan Kangas
  1 sibling, 0 replies; 32+ messages in thread
From: Eli Zaretskii @ 2022-10-17  8:34 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: matt, eggert, stefankangas, 58472

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Paul Eggert <eggert@cs.ucla.edu>,  matt@rfc20.org,
>   stefankangas@gmail.com,  58472@debbugs.gnu.org
> Date: Mon, 17 Oct 2022 10:29:07 +0200
> 
> I sort of agree with you here, but not totally -- I think a `make-nonce'
> function would be useful in general, because this is an area that's
> genuinely difficult to get right, and having a function that does this
> for you -- correctly -- is good.
> 
> But, like you, I'm not sure about the proposed changes otherwise.
> 
> And, like I've said before, there's a reason the Message-ID is on the
> format it's on now -- it has information that allows users to do work on
> it (so changing it will break some use cases), and it's short (which
> makes it efficient in many algos), and it's obviously "good enough" --
> it's been this way for decades without any problems.

Agreed.

> So I'd prefer not to change `message-make-id', but adding a `make-nonce'
> function would be nice anyway.

If we want a make-nonce function for unrelated reasons, by all means
let's discuss that -- but in a separate thread, and with the reasons
and use cases spelled out.  Doing it as a side effect of what was a
wishlist bug report for a minor feature to begin with doesn't sound
right to me.





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-17  8:29               ` Lars Ingebrigtsen
  2022-10-17  8:34                 ` Eli Zaretskii
@ 2022-10-17  9:30                 ` Stefan Kangas
  2022-10-17 11:22                   ` Lars Ingebrigtsen
  1 sibling, 1 reply; 32+ messages in thread
From: Stefan Kangas @ 2022-10-17  9:30 UTC (permalink / raw)
  To: Lars Ingebrigtsen, Eli Zaretskii; +Cc: matt, Paul Eggert, 58472

Lars Ingebrigtsen <larsi@gnus.org> writes:

> And, like I've said before, there's a reason the Message-ID is on the
> format it's on now -- it has information that allows users to do work on
> it (so changing it will break some use cases), and it's short (which
> makes it efficient in many algos), and it's obviously "good enough" --
> it's been this way for decades without any problems.

It still has the privacy issues I've indicated.  Leaking the euid for no
good reason leaves you vulnerable to fingerprinting (or even attack, in
the worst case scenario).

I also don't think the kind of processing you propose on the Message-ID
is useful, as most people end up with euid 1000 these days.  You have
other headers that are more suitable for that.

> So I'd prefer not to change `message-make-id',

If my most recent patch is not acceptable, could you agree with any of
the earlier ones?  We can make it as short as you want to, or even keep
it at 10 characters.  The important part is to not include the euid.





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-17  9:30                 ` Stefan Kangas
@ 2022-10-17 11:22                   ` Lars Ingebrigtsen
  2022-10-17 15:40                     ` Stefan Kangas
  0 siblings, 1 reply; 32+ messages in thread
From: Lars Ingebrigtsen @ 2022-10-17 11:22 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: matt, Eli Zaretskii, Paul Eggert, 58472

Stefan Kangas <stefankangas@gmail.com> writes:

> It still has the privacy issues I've indicated.  Leaking the euid for no
> good reason leaves you vulnerable to fingerprinting (or even attack, in
> the worst case scenario).

I don't think the privacy issues here are compelling.  And it's not for
"no good reason", as previously explained:

> I also don't think the kind of processing you propose on the Message-ID
> is useful, as most people end up with euid 1000 these days.  You have
> other headers that are more suitable for that.

No, matching on References/In-reply-to is the only way to get at that
functionality.






^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-17 11:22                   ` Lars Ingebrigtsen
@ 2022-10-17 15:40                     ` Stefan Kangas
  2022-11-25  1:26                       ` Stefan Kangas
  0 siblings, 1 reply; 32+ messages in thread
From: Stefan Kangas @ 2022-10-17 15:40 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: matt, Eli Zaretskii, Paul Eggert, 58472

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Stefan Kangas <stefankangas@gmail.com> writes:
>
>> It still has the privacy issues I've indicated.  Leaking the euid for no
>> good reason leaves you vulnerable to fingerprinting (or even attack, in
>> the worst case scenario).
>
> I don't think the privacy issues here are compelling.

Do you find it problematic that it's very easy for us to have
collisions?  We will have a 1 in 625 chance for a _partial_ Message-ID
collision every time two users:

1. send an email the same second, and
2. have the same euid (e.g. 1000 on Ubuntu, or 501 or whatever it is on
   macOS, etc.).

Just try this:

    (let ((tim (time-convert nil 'integer))
          (i 0) ids)
      (while t
        (cl-incf i)
        (cl-flet ((time-convert (lambda (_ _) tim)))
          (let ((id (message-unique-id)))
            (if (member id ids)
                (error "oops after %d tries" i)
              (push id ids))))))

We will have a _full_ Message-ID collision if they also:

3. have the same host (e.g. it's misconfigured [a not insignificant
   number of desktops, mind you, so it says "tickle-me" or whatever
   non-hilarious thing we use now], or they are on the same big domain
   like eecs.mit.edu).

Is that really "good enough"?  We could do drastically better here, with
very small means, so I'm not sure why we wouldn't.

> No, matching on References/In-reply-to is the only way to get at that
> functionality.

I still don't know which functionality that is.  Getting an In-reply-to
for your highly unique euid 1000?  What's wrong with just checking if
your email address is in To/Cc?

If this use-case is important, wouldn't it be much better to use a
defcustom that you could at least set yourself to something somewhat
unique to you?  We could just set it to 1000 or whatever by default (so
we're not worse off than today, but also not leaking information by
default), and then users could set that to whatever they like (even to
their euid).





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-17  7:30           ` Paul Eggert
  2022-10-17  8:14             ` Stefan Kangas
  2022-10-17  8:16             ` Eli Zaretskii
@ 2022-10-17 18:40             ` Matt Armstrong
  2022-10-18  1:38               ` Paul Eggert
  2 siblings, 1 reply; 32+ messages in thread
From: Matt Armstrong @ 2022-10-17 18:40 UTC (permalink / raw)
  To: Paul Eggert, Stefan Kangas; +Cc: 58472

Paul Eggert <eggert@cs.ucla.edu> writes:

> I've been looking into this and have several patches along these lines.
> None of them address message-unique-id directly yet (I plan to tackle
> this soon) but they do address the general problem area. The basic idea
> is to use a new make-nonce primitive.

I like it.

> ---
>  doc/lispref/numbers.texi | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
> index fdcda328d8..c1a1349d1a 100644
> --- a/doc/lispref/numbers.texi
> +++ b/doc/lispref/numbers.texi
>
> -If @var{limit} is a string, it means to choose a new seed based on the
> -string's contents.
> +If you need a random nonce for cryptographic purposes, @code{(random
> +t)} is typically not the best approach, as it can adversely affect other
> +parts of your program that benefit from reproducible results, and it can
> +leave information about the nonce scattered about Emacs's internal state.

Mention the new `make-nonce'?

With respect to "cryptographic purposes" how about mentioning that
`random' itself is potentially seeded from a cryptographically weak
source and makes no promise to use a PRNG suitable for cryptography?  If
I'm right about those two assertions, I think they are important to
mention.

> diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
> index cf961e9e7c..0f3e0ae213 100644
> --- a/doc/lispref/strings.texi
> +++ b/doc/lispref/strings.texi
> @@ -455,6 +455,18 @@ Creating Strings
>  Remove the final newline, if any, from @var{string}.
>  @end defun
> 
> +@defun make-nonce length &optional function
> +Return a newly created random string of length @var{length}.
> +The string is unibyte, with bytes taken from system entropy,
> +and with each string value equally likely.
> +
> +If @var{function} is given, call it with the newly created string as
> +an argument and return the value that @var{function} returns.
> +When the function exits, overwrite the string's random contents with
> +unspecified bytes, to lessen information leakage in insecure code.
> +The string's contents are therefore valid only during the function call.
> +@end defun

First question I'll have as a reader: what happens if the system has low
entropy?  Does this block?  Signal an error?





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-17  8:14             ` Stefan Kangas
  2022-10-17  8:23               ` Eli Zaretskii
@ 2022-10-17 18:47               ` Matt Armstrong
  1 sibling, 0 replies; 32+ messages in thread
From: Matt Armstrong @ 2022-10-17 18:47 UTC (permalink / raw)
  To: Stefan Kangas, Paul Eggert; +Cc: 58472

Stefan Kangas <stefankangas@gmail.com> writes:

> On Linux, flags 0 and GRND_RANDOM are equivalent (I read the most
> recent kernel code to verify this).  But I have no idea about other
> platforms.

That is interesting, since they are documented quite differently with no
hint that they are the same:

https://www.kernel.org/doc/man-pages/ ->
https://man7.org/linux/man-pages/man2/getrandom.2.html

But, yet:

https://lore.kernel.org/lkml/531cfcd62151916cc7fbade2ecd0311fbafc02a9.1567126741.git.luto@kernel.org/

...would not be the first time a developer changed code without updating
the documentation.





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-17 18:40             ` Matt Armstrong
@ 2022-10-18  1:38               ` Paul Eggert
  2022-10-18 14:05                 ` Eli Zaretskii
  0 siblings, 1 reply; 32+ messages in thread
From: Paul Eggert @ 2022-10-18  1:38 UTC (permalink / raw)
  To: Matt Armstrong; +Cc: 58472, Stefan Kangas

[-- Attachment #1: Type: text/plain, Size: 718 bytes --]

On 10/17/22 11:40, Matt Armstrong wrote:

> I like it.

Eli doesn't, so I'll drop the idea for now. I didn't realize we were 
close to releasing 29.1, and I agree with Eli that adding a make-nonce 
primitive is not something to do close to a release.


> With respect to "cryptographic purposes" how about mentioning that
> `random' itself is potentially seeded from a cryptographically weak
> source and makes no promise to use a PRNG suitable for cryptography?  If
> I'm right about those two assertions, I think they are important to
> mention.

Good point. This can be done in the documentation now: this doesn't hurt 
anything release-relevant, as it's simply documenting what we have. I 
installed the attached.

[-- Attachment #2: 0001-Improve-random-doc-re-nonces.patch --]
[-- Type: text/x-patch, Size: 3307 bytes --]

From f4442d49f6490cb754bad66dd34a182d5eae06d9 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 16 Oct 2022 21:35:47 -0700
Subject: [PATCH] =?UTF-8?q?Improve=20=E2=80=98random=E2=80=99=20doc=20re?=
 =?UTF-8?q?=20nonces?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* doc/lispref/numbers.texi (Random Numbers): Improve coverage of
random seed, entropy pools, and why one shouldn’t use ‘random’ for
nonces.  See Bug#58472.
---
 doc/lispref/numbers.texi | 48 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index fdcda328d8..2c7a1d3266 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -1238,6 +1238,9 @@ Random Numbers
 sequence of numbers.  By default, Emacs initializes the random seed at
 startup, in such a way that the sequence of values of @code{random}
 (with overwhelming likelihood) differs in each Emacs run.
+The random seed is typically initialized from system entropy;
+however, on obsolescent platforms lacking entropy pools,
+the seed is taken from less-random volatile data such as the current time.
 
   Sometimes you want the random number sequence to be repeatable.  For
 example, when debugging a program whose behavior depends on the random
@@ -1256,12 +1259,45 @@ Random Numbers
 any fixnum, i.e., any integer from @code{most-negative-fixnum} through
 @code{most-positive-fixnum} (@pxref{Integer Basics}).
 
-If @var{limit} is @code{t}, it means to choose a new seed as if Emacs
-were restarting, typically from the system entropy.  On systems
-lacking entropy pools, choose the seed from less-random volatile data
-such as the current time.
-
 If @var{limit} is a string, it means to choose a new seed based on the
-string's contents.
+string's contents.  This causes later calls to @code{random} to return
+a reproducible sequence of results.
+
+If @var{limit} is @code{t}, it means to choose a new seed as if Emacs
+were restarting.  This causes later calls to @code{random} to return
+an unpredictable sequence of results.
 
 @end defun
+
+If you need a random nonce for cryptographic purposes, using
+@code{random} is typically not the best approach, for several reasons:
+
+@itemize @bullet
+@item
+Although you can use @code{(random t)} to consult system entropy,
+doing so can adversely affect other parts of your program that benefit
+from reproducible results.
+
+@item
+The system-dependent pseudo-random number generator (PRNG) used by
+@code{random} is not necessarily suitable for cryptography.
+
+@item
+A call to @code{(random t)} does not give direct access to system
+entropy; the entropy is passed through the system-dependent PRNG, thus
+possibly biasing the results.
+
+@item
+On typical platforms the random seed contains only 32 bits, which is
+typically narrower than an Emacs fixnum, and is not nearly enough for
+cryptographic purposes.
+
+@item
+A @code{(random t)} call leaves information about the nonce scattered
+about Emacs's internal state, increasing the size of the internal
+attack surface.
+
+@item
+On obsolescent platforms lacking entropy pools, @code{(random t)} is
+seeded from a cryptographically weak source.
+@end itemize
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-18  1:38               ` Paul Eggert
@ 2022-10-18 14:05                 ` Eli Zaretskii
  0 siblings, 0 replies; 32+ messages in thread
From: Eli Zaretskii @ 2022-10-18 14:05 UTC (permalink / raw)
  To: Paul Eggert; +Cc: matt, stefankangas, 58472

> Cc: 58472@debbugs.gnu.org, Stefan Kangas <stefankangas@gmail.com>
> Date: Mon, 17 Oct 2022 18:38:48 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> 
> > I like it.
> 
> Eli doesn't, so I'll drop the idea for now. I didn't realize we were 
> close to releasing 29.1, and I agree with Eli that adding a make-nonce 
> primitive is not something to do close to a release.

I actually don't mind adding a new primitive, and Lars says it could
be useful.  A new primitive cannot do any harm; all I'm asking is not
to start using it right away in places where we have solid code that
worked for years.  And the particular feature for which make-nonce was
intended in this case definitely doesn't need it.

However, I would like to have at least a short discussion of the
potential needs for make-nonce, and to have it in a separate thread,
so that we could be all on the same page about its need, use, and
semantics.





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
  2022-10-17 15:40                     ` Stefan Kangas
@ 2022-11-25  1:26                       ` Stefan Kangas
  0 siblings, 0 replies; 32+ messages in thread
From: Stefan Kangas @ 2022-11-25  1:26 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: matt, Eli Zaretskii, Paul Eggert, 58472-done

I'm guessing that none of this is happening, so I'm closing this bug
report.





^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2022-11-25  1:26 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-12 16:07 bug#58472: [PATCH] Make `message-unique-id' less prone to collisions Stefan Kangas
2022-10-12 18:08 ` Paul Eggert
2022-10-13  2:46   ` Stefan Kangas
2022-10-13  4:53     ` Matt Armstrong
2022-10-13 12:10       ` Stefan Kangas
2022-10-13 16:35         ` Matt Armstrong
2022-10-13 16:38           ` Paul Eggert
2022-10-14  9:22           ` Stefan Kangas
2022-10-13 16:21     ` Paul Eggert
2022-10-14  9:22       ` Stefan Kangas
2022-10-16  7:32         ` Stefan Kangas
2022-10-16 17:05           ` Stefan Kangas
2022-10-16 15:19       ` Matt Armstrong
2022-10-16 16:49         ` Stefan Kangas
2022-10-17  6:17           ` Matt Armstrong
2022-10-17  7:30           ` Paul Eggert
2022-10-17  8:14             ` Stefan Kangas
2022-10-17  8:23               ` Eli Zaretskii
2022-10-17 18:47               ` Matt Armstrong
2022-10-17  8:16             ` Eli Zaretskii
2022-10-17  8:29               ` Lars Ingebrigtsen
2022-10-17  8:34                 ` Eli Zaretskii
2022-10-17  9:30                 ` Stefan Kangas
2022-10-17 11:22                   ` Lars Ingebrigtsen
2022-10-17 15:40                     ` Stefan Kangas
2022-11-25  1:26                       ` Stefan Kangas
2022-10-17 18:40             ` Matt Armstrong
2022-10-18  1:38               ` Paul Eggert
2022-10-18 14:05                 ` Eli Zaretskii
2022-10-13 11:45 ` Lars Ingebrigtsen
2022-10-13 12:10   ` Stefan Kangas
2022-10-13 19:15     ` Lars Ingebrigtsen

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.