From: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 15/27] import: switch to URL-safe Base64 for Message-IDs
Date: Mon, 19 Mar 2018 08:14:47 +0000
Message-ID: <20180319081459.10645-16-e@80x24.org>
In-Reply-To: <20180319081459.10645-1-e@80x24.org>
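Hexdigests are too long, and shorter Message-IDs are easier
to deal with. URL-safe Base64 (RFC 4648) encodes the same digest
in fewer characters and keeps '+', '/' and '=' out of our own URLs.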
---
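A minimal standalone sketch of the encoding change, for readers who
want to try it outside the tree. It is not the code in Import.pm:
digest2mid_sketch is a hypothetical stand-in for digest2mid, the use
of Digest::SHA with SHA-256 is an assumption about what the caller
passes in, and the sample Message-ID is made up. It also shows why
the test regexp needs '-' added to \w, since \w already covers '_'.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical stand-in for PublicInbox::Import::digest2mid.
# Accepts any Digest::base-compatible object.
sub digest2mid_sketch {
	my ($dig) = @_;
	# clone first: b64digest resets the object it is called on
	my $b64 = $dig->clone->b64digest;
	# RFC 4648 URL-safe alphabet: '+' => '-', '/' => '_';
	# the /d modifier deletes '=' since it has no replacement
	$b64 =~ tr!+/=!-_!d;
	$b64 . '@localhost';
}

# Assumption: a SHA-256 digest object, fed an arbitrary body
my $dig = Digest::SHA->new(256);
$dig->add("example message body\n");

my $hex_mid = $dig->clone->hexdigest . '@localhost';
my $b64_mid = digest2mid_sketch($dig);
printf "hex: %d chars\nb64: %d chars\n",
	length($hex_mid), length($b64_mid);

# The old test pattern rejects '-', the new one accepts it
my $old = qr/\A<\w+\@localhost>\z/;
my $new = qr/\A<[\w\-]+\@localhost>\z/;
my $sample = '<ab-cd_ef@localhost>'; # made-up Message-ID
printf "old matches: %s\nnew matches: %s\n",
	($sample =~ $old ? 'yes' : 'no'),
	($sample =~ $new ? 'yes' : 'no');
```

The digest object is cloned before encoding so that the caller can
keep adding data to it or encode it again afterwards.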
lib/PublicInbox/Import.pm | 11 ++++++++++-
t/v2writable.t | 10 ++++++----
2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 4c007b6..77e74c1 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -401,7 +401,16 @@ sub atfork_child {
sub digest2mid ($) {
my ($dig) = @_;
- $dig->clone->hexdigest . '@localhost';
+ my $b64 = $dig->clone->b64digest;
+ # Make our own URLs nicer:
+ # See "Base 64 Encoding with URL and Filename Safe Alphabet" in RFC4648
+ $b64 =~ tr!+/=!-_!d;
+
+ # We can make this more meaningful with a date prefix or other things,
+ # but this is only needed for crap that fails to generate a Message-ID
+ # or reuses one. In other words, it's usually spammers who hit this
+ # so they don't deserve nice Message-IDs :P
+ $b64 . '@localhost';
}
1;
diff --git a/t/v2writable.t b/t/v2writable.t
index c6bcefd..bbe6d14 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -68,6 +68,7 @@ if ('ensure git configs are correct') {
[ $mime->header_obj->header_raw('Message-Id') ],
'no new Message-Id added');
+ my $sane_mid = qr/\A<[\w\-]+\@localhost>\z/;
@warn = ();
$mime->header_set('Message-Id', '<a-mid@b>');
$mime->body_set('different');
@@ -75,13 +76,14 @@ if ('ensure git configs are correct') {
like(join(' ', @warn), qr/reused/, 'warned about reused MID');
my @mids = $mime->header_obj->header_raw('Message-Id');
is($mids[1], '<a-mid@b>', 'original mid not changed');
- like($mids[0], qr/\A<\w+\@localhost>\z/, 'new MID added');
+ like($mids[0], $sane_mid, 'new MID added');
is(scalar(@mids), 2, 'only one new MID added');
@warn = ();
$mime->header_set('Message-Id', '<a-mid@b>');
$mime->body_set('this one needs a random mid');
- my $gen = content_digest($mime)->hexdigest . '@localhost';
+ my $gen = PublicInbox::Import::digest2mid(content_digest($mime));
+ unlike($gen, qr![\+/=]!, 'no URL-unfriendly chars in Message-Id');
my $fake = PublicInbox::MIME->new($mime->as_string);
$fake->header_set('Message-Id', $gen);
ok($im->add($fake), 'fake added easily');
@@ -90,14 +92,14 @@ if ('ensure git configs are correct') {
like(join(' ', @warn), qr/using random/, 'warned about using random');
@mids = $mime->header_obj->header_raw('Message-Id');
is($mids[1], '<a-mid@b>', 'original mid not changed');
- like($mids[0], qr/\A<\w+\@localhost>\z/, 'new MID added');
+ like($mids[0], $sane_mid, 'new MID added');
is(scalar(@mids), 2, 'only one new MID added');
@warn = ();
$mime->header_set('Message-Id');
ok($im->add($mime), 'random MID made for MID free message');
@mids = $mime->header_obj->header_raw('Message-Id');
- like($mids[0], qr/\A<\w+\@localhost>\z/, 'mid was generated');
+ like($mids[0], $sane_mid, 'mid was generated');
is(scalar(@mids), 1, 'new generated');
}
--
EW
Thread overview: 29+ messages
2018-03-19 8:14 [PATCH 00/27] v2 public-inbox-watch support Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 01/27] content_id: use Sender header if From is not available Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 02/27] v2writable: support "barrier" operation to avoid reforking Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 03/27] use string ref for Email::Simple->new Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 04/27] v2writable: remove unnecessary idx_init call Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 05/27] searchidx: do not delete documents while iterating Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 06/27] search: allow ->reopen to be chainable Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 07/27] v2writable: implement remove correctly Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 08/27] skeleton: barrier init requires a lock Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 09/27] import: (v2) delete writes the blob into history in subdir Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 10/27] import: (v2): write deletes to a separate '_' subdirectory Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 11/27] import: implement barrier operation for v1 repos Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 12/27] mid: mid_mime uses v2-compatible mids function Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 13/27] watchmaildir: use content_digest to generate Message-Id Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 14/27] import: force Message-ID generation for v1 here Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` Eric Wong (Contractor, The Linux Foundation) [this message]
2018-03-19 8:14 ` [PATCH 16/27] v2writable: test for idempotent removals Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 17/27] import: enable locking under v2 Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 18/27] index: s/GIT_DIR/REPO_DIR/ Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 19/27] Lock: new base class for writable lockers Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 20/27] t/watch_maildir: note the reason for FIFO creation Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 21/27] v2writable: ensure ->done is idempotent Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 22/27] watchmaildir: support v2 repositories Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 23/27] searchidxpart: s/barrier/remote_barrier/ Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 24/27] v2writable: allow disabling parallelization Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 25/27] scripts/import_vger_from_mbox: filter out same headers as MDA Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 26/27] v2writable: add DEBUG_DIFF env support Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:14 ` [PATCH 27/27] v2writable: remove "resent" message for duplicate Message-IDs Eric Wong (Contractor, The Linux Foundation)
2018-03-19 8:18 ` SQUASH: " Eric Wong