all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Matt Armstrong <matt@rfc20.org>
To: Stefan Kangas <stefankangas@gmail.com>
Cc: 58472@debbugs.gnu.org, Paul Eggert <eggert@cs.ucla.edu>
Subject: bug#58472: [PATCH] Make `message-unique-id' less prone to collisions
Date: Thu, 13 Oct 2022 09:35:06 -0700	[thread overview]
Message-ID: <87h707r8at.fsf@rfc20.org> (raw)
In-Reply-To: <CADwFkmm1Ki5S4-Bo=SN3n1qVnqh11FuOCckTuN8V07L+E+5=bA@mail.gmail.com>

Stefan Kangas <stefankangas@gmail.com> writes:

> Matt Armstrong <matt@rfc20.org> writes:
>
>> Most email I get today to use a UUID or UUID-like Message-ID, like this:
>>
>> Message-ID: <736d10a6-001f-4a29-a1d4-554f58733b69@dfw1s10mta1086.xt.local>
>> Message-ID: <1815053947.8446619.1665544925708@lor1-app45123.prod.linkedin.com>
>> Message-ID: <01000183b9eaa6f8-411d1f4c-b573-472d-b45f-47b0c4eb6ace-000000@email.amazonses.com>
>> Message-ID: <CABqZ1wa8MxrieVKZ11adZUV2qB_CnpMJoFEn-U3d5CQ7z7smWw@mail.gmail.com>
>
> Those are 30-51 characters in length.  I also note that Gmail uses both
> lower case and upper case characters.

Yes, the Google example is probably a crypto secure hash base64url
encoded.


>>> If we limit the length of the time string to 12 characters, and the
>>> total length to 25 characters (including the ".gnu" part), we still have
>>> a guaranteed 9 characters of random data, or 46 bits of entropy.
>>
>> I suspect that most mailers use more randomness than that.
>
> So I guess we might as well bump this up to 30 characters in total,
> which gives us 72 bits.  The Message-IDs would look like:
>
>     cnkrs75yamag1k7x8rnt3y50za.gnu@stefankangas.se
>     cnkrifkirauwuwfkzs3rcit8cq.gnu@stefankangas.se

You said you like the timestamp prefix.  How many times would that
actually be useful, and for what?

I am not seeing the utility myself, but I might be missing something.


> We could go longer, but it's also nice to have something which is not an
> absolute abomination to look at.
>
> If we add in upper case characters too, we can encode the time with one
> less character.  So we end up with 89 bits of randomness and this:
>
>     1Z2KnqE1t2bSgUWkcu53M34Y4y.gnu@stefankangas.se
>     1Z2KbUgleGoe0WRJ3jbiM0mE7W.gnu@stefankangas.se
>
> If we don't want to always start the Message-ID with the same characters
> (which makes them more distinct, at a glance), we could just reverse the
> time string:
>
>     QlRXPpmK2Z1kUklxIpMNZpChOu.gnu@stefankangas.se
>     Z59YikmK2Z1FSmYj172SAdPpuX.gnu@stefankangas.se

Maybe use (base64url-encode-string str t) is an option?


>> Some of the SHA hash algorithms are in the public domain.  Could they be
>> added to Emacs and used for UUID generation here?
>
> We have `secure-hash'.  Is that what you mean?  Or do you mean to use a
> proper RFC 4122 UUID?
>
> All we need is for the Message-ID to be unique though, so some ad hoc
> solution is probably fine.

I am no security expert, so I'll end with this and then be quiet on this
issue.  :-)

I agree that globally unique (with very high probability) is pretty easy
to design in ad hoc ways within a private namespace.  But Message-ID is
not that.  The Message-ID space consists of all Message-ID generated by
all programs past, present and future.  An impossible goal, given that
the RFC basically says "go make something up, have fun with that", so it
is a free for all.

Given that, timestamp + short-rng + ".gnu" doesn't feel good enough to
me, but I won't lose sleep if you use that solution.

In these situations I usually feel better going with what seems like an
existing practice: encode a crypto hash sourced from (possibly security
sensitive) entropy available on the local machine.

Inspired by https://nullprogram.com/blog/2010/05/11/ and the code linked
there (https://nullprogram.com/download/uuid.el) I came up with this:

;; Written while discussing Emacs bug#58472
(defun my-message-unique-id()
  (let* ((object (format "%S|%s|%s|%s"
                         (time-convert nil t)
                         (random (expt 2 512))
                         (emacs-uptime)
                         (emacs-pid)))
         (hash (secure-hash 'sha512 object nil nil t))
         (encoded-hash (base64url-encode-string hash t))
         (id (concat encoded-hash ".gnu")))
    ;; Replace leading and trailing non-alnum with ?x.  This is not
    ;; necessary to create strictly conforming Message-Id, but it makes
    ;; them less confusing in practice.
    (dolist (index `(0 ,(1- (length id))))
      (let ((char (aref id index)))
        (when (or (eq char ?-)
                  (eq char ?_))
          (aset id index ?x))))
    id))

This gives IDs like:

   lzD9K4s2GkiGmZhmP46M4ezv37sGEQO2tU2GX2mx-rs.gnu@stefankangas.se
   wEUY87gazgDLnsARlJZ5MEY4pgFZmDhxy9IZhbDLpfA.gnu@stefankangas.se
   VbUzrNkjJiCnZYiDzMdcqz2BqibzismlaR8ZkcU6O7E.gnu@stefankangas.se
   XX8f7gOrT1Iv-NLdZBjHgz8s4u4EF2uE7o-UttFgF0U.gnu@stefankangas.se
   tp9f3Up7ilsrH8XqtmQT4_evPkOO-EteN127d9bt85s.gnu@stefankangas.se

One advantage of this approach is that it is possible to add more
sources of entropy, or grow the hash function, without necessarily
requiring a "format" version change, because we're relying on the hash
for uniqueness in a sort of brute force way, not a schema and a hope
that only Emacs uses it.

One question: I'm not sure that (random (exp 2 512)) actually gives more
entropy than a simple call to (random).  It depends on Emacs' RNG.  Last
I checked there was code still calling something like rand(), which
generally not intended for this purpose.

P.S. if you go with your original approach, I think you want (expt 2 X)
instead of (exp X).

-- 
matt (sent from an Emacs running the feature/noverlay branch)





  reply	other threads:[~2022-10-13 16:35 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-12 16:07 bug#58472: [PATCH] Make `message-unique-id' less prone to collisions Stefan Kangas
2022-10-12 18:08 ` Paul Eggert
2022-10-13  2:46   ` Stefan Kangas
2022-10-13  4:53     ` Matt Armstrong
2022-10-13 12:10       ` Stefan Kangas
2022-10-13 16:35         ` Matt Armstrong [this message]
2022-10-13 16:38           ` Paul Eggert
2022-10-14  9:22           ` Stefan Kangas
2022-10-13 16:21     ` Paul Eggert
2022-10-14  9:22       ` Stefan Kangas
2022-10-16  7:32         ` Stefan Kangas
2022-10-16 17:05           ` Stefan Kangas
2022-10-16 15:19       ` Matt Armstrong
2022-10-16 16:49         ` Stefan Kangas
2022-10-17  6:17           ` Matt Armstrong
2022-10-17  7:30           ` Paul Eggert
2022-10-17  8:14             ` Stefan Kangas
2022-10-17  8:23               ` Eli Zaretskii
2022-10-17 18:47               ` Matt Armstrong
2022-10-17  8:16             ` Eli Zaretskii
2022-10-17  8:29               ` Lars Ingebrigtsen
2022-10-17  8:34                 ` Eli Zaretskii
2022-10-17  9:30                 ` Stefan Kangas
2022-10-17 11:22                   ` Lars Ingebrigtsen
2022-10-17 15:40                     ` Stefan Kangas
2022-11-25  1:26                       ` Stefan Kangas
2022-10-17 18:40             ` Matt Armstrong
2022-10-18  1:38               ` Paul Eggert
2022-10-18 14:05                 ` Eli Zaretskii
2022-10-13 11:45 ` Lars Ingebrigtsen
2022-10-13 12:10   ` Stefan Kangas
2022-10-13 19:15     ` Lars Ingebrigtsen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h707r8at.fsf@rfc20.org \
    --to=matt@rfc20.org \
    --cc=58472@debbugs.gnu.org \
    --cc=eggert@cs.ucla.edu \
    --cc=stefankangas@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.