From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Matt Armstrong Newsgroups: gmane.emacs.bugs Subject: bug#58472: [PATCH] Make `message-unique-id' less prone to collisions Date: Thu, 13 Oct 2022 09:35:06 -0700 Message-ID: <87h707r8at.fsf@rfc20.org> References: <87k054qq7b.fsf@rfc20.org> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="39736"; mail-complaints-to="usenet@ciao.gmane.io" Cc: 58472@debbugs.gnu.org, Paul Eggert To: Stefan Kangas Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Thu Oct 13 18:41:11 2022 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1oj1Gh-000A5R-4F for geb-bug-gnu-emacs@m.gmane-mx.org; Thu, 13 Oct 2022 18:41:11 +0200 Original-Received: from localhost ([::1]:45068 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oj1Gf-0007og-VK for geb-bug-gnu-emacs@m.gmane-mx.org; Thu, 13 Oct 2022 12:41:09 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:59212) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oj1Bn-0002Y2-Nm for bug-gnu-emacs@gnu.org; Thu, 13 Oct 2022 12:36:08 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]:35690) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1oj1Bi-0006yk-OG for bug-gnu-emacs@gnu.org; Thu, 13 Oct 2022 12:36:06 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1oj1Bi-0006ko-DW for bug-gnu-emacs@gnu.org; Thu, 13 Oct 2022 12:36:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Matt Armstrong Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Thu, 13 Oct 2022 16:36:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 58472 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: patch Original-Received: via spool by 58472-submit@debbugs.gnu.org id=B58472.166567892225913 (code B ref 58472); Thu, 13 Oct 2022 16:36:02 +0000 Original-Received: (at 58472) by debbugs.gnu.org; 13 Oct 2022 16:35:22 +0000 Original-Received: from localhost ([127.0.0.1]:34768 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oj1B4-0006js-8p for submit@debbugs.gnu.org; Thu, 13 Oct 2022 12:35:22 -0400 Original-Received: from relay7-d.mail.gandi.net ([217.70.183.200]:40661) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oj1B1-0006jZ-J7 for 58472@debbugs.gnu.org; Thu, 13 Oct 2022 12:35:20 -0400 Original-Received: (Authenticated sender: matt@rfc20.org) by mail.gandi.net (Postfix) with ESMTPSA id A1A4E20010; Thu, 13 Oct 2022 16:35:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rfc20.org; s=gm1; t=1665678911; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=BCvP1yk/dKYCBWlLbVYUlVn2ZwEwvHe3zhsqJ7yIzfo=; b=Zh4HCLErrsFxX9JLWLJjrWhMB2t6KVx8eo++MQ2SLVQZ/1loZ2v8D8NCNhOS9AISwzJ14q 0oe/N9LsSeia62MGLeuY8Y87ukHpU7eyLB7Y9yYVxwfCNycW3MSOofew3nA8nN6FekXe8V hxiGxYuX4Fu9V4kGLSkVy+fzpkDI2Tt6LQr9nHEyfKKoS55Z4hC9fCTuLz7Dx41DvCXMS8 K+4Uz0uOS4b8cBvTCSqBwjFxRlrhV+PqwtoMP3TIUNVE6eEMkbpwGkYllhbh8MBF8VQm09 6gIociesAf3Uucg8/zY6fVYIGaDZrXPXLY7xNABZcOQJ8prrKe1u9fVCqkemug== Original-Received: from matt by naz with local (Exim 4.96) (envelope-from ) id 1oj1Ao-000A9a-0f; Thu, 13 Oct 2022 09:35:06 -0700 In-Reply-To: X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.io gmane.emacs.bugs:245328 Archived-At: Stefan Kangas writes: > Matt Armstrong writes: > >> Most email I get today to use a UUID or UUID-like Message-ID, like this: >> >> Message-ID: <736d10a6-001f-4a29-a1d4-554f58733b69@dfw1s10mta1086.xt.local> >> Message-ID: <1815053947.8446619.1665544925708@lor1-app45123.prod.linkedin.com> >> Message-ID: <01000183b9eaa6f8-411d1f4c-b573-472d-b45f-47b0c4eb6ace-000000@email.amazonses.com> >> Message-ID: > > Those are 30-51 characters in length. I also note that Gmail uses both > lower case and upper case characters. Yes, the Google example is probably a crypto secure hash base64url encoded. >>> If we limit the length of the time string to 12 characters, and the >>> total length to 25 characters (including the ".gnu" part), we still have >>> a guaranteed 9 characters of random data, or 46 bits of entropy. >> >> I suspect that most mailers use more randomness than that. > > So I guess we might as well bump this up to 30 characters in total, > which gives us 72 bits. The Message-IDs would look like: > > cnkrs75yamag1k7x8rnt3y50za.gnu@stefankangas.se > cnkrifkirauwuwfkzs3rcit8cq.gnu@stefankangas.se You said you like the timestamp prefix. How many times would that actually be useful, and for what? I am not seeing the utility myself, but I might be missing something. > We could go longer, but it's also nice to have something which is not an > absolute abomination to look at. > > If we add in upper case characters too, we can encode the time with one > less character. So we end up with 89 bits of randomness and this: > > 1Z2KnqE1t2bSgUWkcu53M34Y4y.gnu@stefankangas.se > 1Z2KbUgleGoe0WRJ3jbiM0mE7W.gnu@stefankangas.se > > If we don't want to always start the Message-ID with the same characters > (which makes them more distinct, at a glance), we could just reverse the > time string: > > QlRXPpmK2Z1kUklxIpMNZpChOu.gnu@stefankangas.se > Z59YikmK2Z1FSmYj172SAdPpuX.gnu@stefankangas.se Maybe use (base64url-encode-string str t) is an option? >> Some of the SHA hash algorithms are in the public domain. Could they be >> added to Emacs and used for UUID generation here? > > We have `secure-hash'. Is that what you mean? Or do you mean to use a > proper RFC 4122 UUID? > > All we need is for the Message-ID to be unique though, so some ad hoc > solution is probably fine. I am no security expert, so I'll end with this and then be quiet on this issue. :-) I agree that globally unique (with very high probability) is pretty easy to design in ad hoc ways within a private namespace. But Message-ID is not that. The Message-ID space consists of all Message-ID generated by all programs past, present and future. An impossible goal, given that the RFC basically says "go make something up, have fun with that", so it is a free for all. Given that, timestamp + short-rng + ".gnu" doesn't feel good enough to me, but I won't lose sleep if you use that solution. In these situations I usually feel better going with what seems like an existing practice: encode a crypto hash sourced from (possibly security sensitive) entropy available on the local machine. Inspired by https://nullprogram.com/blog/2010/05/11/ and the code linked there (https://nullprogram.com/download/uuid.el) I came up with this: ;; Written while discussing Emacs bug#58472 (defun my-message-unique-id() (let* ((object (format "%S|%s|%s|%s" (time-convert nil t) (random (expt 2 512)) (emacs-uptime) (emacs-pid))) (hash (secure-hash 'sha512 object nil nil t)) (encoded-hash (base64url-encode-string hash t)) (id (concat encoded-hash ".gnu"))) ;; Replace leading and trailing non-alnum with ?x. This is not ;; necessary to create strictly conforming Message-Id, but it makes ;; them less confusing in practice. (dolist (index `(0 ,(1- (length id)))) (let ((char (aref id index))) (when (or (eq char ?-) (eq char ?_)) (aset id index ?x)))) id)) This gives IDs like: lzD9K4s2GkiGmZhmP46M4ezv37sGEQO2tU2GX2mx-rs.gnu@stefankangas.se wEUY87gazgDLnsARlJZ5MEY4pgFZmDhxy9IZhbDLpfA.gnu@stefankangas.se VbUzrNkjJiCnZYiDzMdcqz2BqibzismlaR8ZkcU6O7E.gnu@stefankangas.se XX8f7gOrT1Iv-NLdZBjHgz8s4u4EF2uE7o-UttFgF0U.gnu@stefankangas.se tp9f3Up7ilsrH8XqtmQT4_evPkOO-EteN127d9bt85s.gnu@stefankangas.se One advantage of this approach is that it is possible to add more sources of entropy, or grow the hash function, without necessarily requiring a "format" version change, because we're relying on the hash for uniqueness in a sort of brute force way, not a schema and a hope that only Emacs uses it. One question: I'm not sure that (random (exp 2 512)) actually gives more entropy than a simple call to (random). It depends on Emacs' RNG. Last I checked there was code still calling something like rand(), which generally not intended for this purpose. P.S. if you go with your original approach, I think you want (expt 2 X) instead of (exp X). -- matt (sent from an Emacs running the feature/noverlay branch)