From: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 01/13] content_id: do not take Message-Id into account
Date: Thu, 22 Mar 2018 09:40:03 +0000
Message-ID: <20180322094015.14422-2-e@80x24.org>
In-Reply-To: <20180322094015.14422-1-e@80x24.org>
If we need to fall back to content_id, we have already given up
on Message-Id as a differentiator. Leaving Message-Id out of the
digest prevents duplicates from showing up repeatedly with
public-inbox-watch when Message-Ids are reused and we generate
new Message-Ids to disambiguate.
---
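The idea in the commit message can be illustrated with a minimal,
standalone sketch. It is not the code in ContentId.pm: the function
name sketch_content_id, the hashref header representation, the header
subset and the use of SHA-256 are all assumptions made for
illustration. It only shows that two copies of a message which differ
solely in Message-Id hash to the same value once Message-Id is left
out of the digest.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical stand-in for content_digest(): hashes a few headers and
# the body, but deliberately leaves Message-Id out of the digest.
sub sketch_content_id {
	my ($hdr, $body) = @_; # $hdr: hashref of header name => value
	my $dig = Digest::SHA->new(256); # digest algorithm is an assumption
	foreach my $h (qw(From To Cc Subject)) { # assumed header subset
		my $v = $hdr->{$h};
		next unless defined $v;
		$dig->add("$h: $v\0");
	}
	$dig->add($body);
	return $dig->hexdigest;
}

my %a = ('Message-Id' => '<a-mid@b>', From => 'a@example.com',
	Subject => 'hi');
# same message, but with a freshly generated Message-Id
my %b = (%a, 'Message-Id' => '<generated@localhost>');
my $body = "hello\n";

my $same = sketch_content_id(\%a, $body) eq sketch_content_id(\%b, $body);
my $diff = sketch_content_id(\%a, $body) ne sketch_content_id(\%a, "changed\n");
print "Message-Id changed only: ", ($same ? "same id" : "new id"), "\n";
print "body changed: ", ($diff ? "new id" : "same id"), "\n";
```

With this property, a message that was re-imported under a generated
Message-Id can still be recognized as a duplicate of the original.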
 lib/PublicInbox/ContentId.pm |  3 ++-
 t/v2writable.t               | 10 +++++++---
 2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/lib/PublicInbox/ContentId.pm b/lib/PublicInbox/ContentId.pm
index 9082b76..279eec0 100644
--- a/lib/PublicInbox/ContentId.pm
+++ b/lib/PublicInbox/ContentId.pm
@@ -21,7 +21,8 @@ sub content_digest ($) {
 	# in SearchIdx, so treat them the same for this:
 	my %seen;
 	foreach my $mid (@{mids($hdr)}) {
-		$dig->add('mid: '.$mid);
+		# do NOT consider the Message-ID as part of the content_id
+		# if we got here, we've already got Message-ID reuse
 		$seen{$mid} = 1;
 	}
 	foreach my $mid (@{references($hdr)}) {
diff --git a/t/v2writable.t b/t/v2writable.t
index 85b48d2..6cabf0d 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -61,11 +61,15 @@ if ('ensure git configs are correct') {
 	@warn = ();
 	$mime->header_set('Message-Id', '<a-mid@b>', '<c@d>');
-	ok($im->add($mime), 'secondary MID used');
+	is($im->add($mime), undef, 'secondary MID ignored if first matches');
+	my $sec = PublicInbox::MIME->new($mime->as_string);
+	$sec->header_set('Date');
+	$sec->header_set('Message-Id', '<a-mid@b>', '<c@d>');
+	ok($im->add($sec), 'secondary MID used if data is different');
 	like(join(' ', @warn), qr/mismatched/, 'warned about mismatch');
 	like(join(' ', @warn), qr/alternative/, 'warned about alternative');
 	is_deeply([ '<a-mid@b>', '<c@d>' ],
-		[ $mime->header_obj->header_raw('Message-Id') ],
+		[ $sec->header_obj->header_raw('Message-Id') ],
 		'no new Message-Id added');
 	my $sane_mid = qr/\A<[\w\-]+\@localhost>\z/;
@@ -85,7 +89,7 @@ if ('ensure git configs are correct') {
 	my $gen = PublicInbox::Import::digest2mid(content_digest($mime));
 	unlike($gen, qr![\+/=]!, 'no URL-unfriendly chars in Message-Id');
 	my $fake = PublicInbox::MIME->new($mime->as_string);
-	$fake->header_set('Message-Id', $gen);
+	$fake->header_set('Message-Id', "<$gen>");
 	ok($im->add($fake), 'fake added easily');
 	is_deeply(\@warn, [], 'no warnings from a faker');
 	ok($im->add($mime), 'random MID made');
--
EW
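
The last hunk wraps the generated value in angle brackets before
setting the header, which implies that digest2mid returns a bare
identifier. A rough sketch of that relationship follows. The helper
sketch_digest2mid is hypothetical and is not the implementation in
PublicInbox::Import; the SHA-256 digest and the "localhost" domain
(taken from the $sane_mid pattern in the test) are assumptions.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical helper: turn a digest object into a bare Message-Id
# (no angle brackets) that avoids the URL-unfriendly characters
# '+', '/' and '='.
sub sketch_digest2mid {
	my ($dig) = @_; # a Digest::SHA object
	# b64digest resets the object, so work on a clone
	my $b64 = $dig->clone->b64digest;
	# map '+' to '-', '/' to '_', and delete any '='
	$b64 =~ tr!+/=!-_!d;
	return "$b64\@localhost";
}

my $dig = Digest::SHA->new(256);
$dig->add("example content\n");
my $gen = sketch_digest2mid($dig);

# The header value needs the surrounding angle brackets; the bare
# identifier alone is not a well-formed Message-Id header value.
my $header_value = "<$gen>";
print "$header_value\n";
```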
Thread overview: 15+ messages
2018-03-22 9:40 [PATCH 00/13] reindexing, feeds, date fixes Eric Wong (Contractor, The Linux Foundation)
2018-03-22 9:40 ` Eric Wong (Contractor, The Linux Foundation) [this message]
2018-03-22 9:40 ` [PATCH 02/13] introduce InboxWritable class Eric Wong (Contractor, The Linux Foundation)
2018-03-22 9:40 ` [PATCH 03/13] import: discard all the same headers as MDA Eric Wong (Contractor, The Linux Foundation)
2018-03-22 9:40 ` [PATCH 04/13] InboxWritable: add mbox/maildir parsing + import logic Eric Wong (Contractor, The Linux Foundation)
2018-03-22 9:40 ` [PATCH 05/13] use both Date: and Received: times Eric Wong (Contractor, The Linux Foundation)
2018-03-22 9:40 ` [PATCH 06/13] msgmap: add tmp_clone to create an anonymous copy Eric Wong (Contractor, The Linux Foundation)
2018-03-22 9:40 ` [PATCH 07/13] fix syntax warnings Eric Wong (Contractor, The Linux Foundation)
2018-03-22 9:40 ` [PATCH 08/13] v2writable: support reindexing Xapian Eric Wong (Contractor, The Linux Foundation)
2018-03-26 20:08 ` Eric Wong
2018-03-22 9:40 ` [PATCH 09/13] t/altid.t: extra tests for mid_set Eric Wong (Contractor, The Linux Foundation)
2018-03-22 9:40 ` [PATCH 10/13] v2writable: add NNTP article number regeneration support Eric Wong (Contractor, The Linux Foundation)
2018-03-22 9:40 ` [PATCH 11/13] v2writable: clarify header cleanups Eric Wong (Contractor, The Linux Foundation)
2018-03-22 9:40 ` [PATCH 12/13] v2writable: DEBUG_DIFF respects $TMPDIR Eric Wong (Contractor, The Linux Foundation)
2018-03-22 9:40 ` [PATCH 13/13] feed: $INBOX/new.atom endpoint supports v2 inboxes Eric Wong (Contractor, The Linux Foundation)