From: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 01/13] content_id: do not take Message-Id into account
Date: Thu, 22 Mar 2018 09:40:03 +0000
Message-ID: <20180322094015.14422-2-e@80x24.org>
In-Reply-To: <20180322094015.14422-1-e@80x24.org>

If we need to use content_id, we've already lost hope
in relying on Message-Id as a differentiator, so do not
feed it into the digest.  This prevents duplicates from
showing up repeatedly with public-inbox-watch when
Message-Ids are reused and we generate new Message-Ids
to disambiguate.
---
 lib/PublicInbox/ContentId.pm |  3 ++-
 t/v2writable.t               | 10 +++++++---
 2 files changed, 9 insertions(+), 4 deletions(-)
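
A minimal sketch of the reasoning above, not the real
PublicInbox::ContentId code.  toy_digest() and its message hashes are
hypothetical stand-ins, and the scenario (the same message seen again
after it was stored under a generated Message-Id) is one reading of
the commit message.  It only shows that a digest which includes the
Message-Id cannot recognize such a copy, while a digest which skips
it can.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical stand-in for content_digest(); the real function
# hashes more headers and works on a MIME object.
sub toy_digest {
	my ($msg, $include_mid) = @_;
	my $dig = Digest::SHA->new(256);
	# behavior before this patch: Message-Id is part of the digest
	$dig->add('mid: '.$msg->{mid}) if $include_mid;
	foreach my $h (qw(from subject)) {
		$dig->add("$h: ".$msg->{$h});
	}
	$dig->add('b: '.$msg->{body});
	return $dig->hexdigest;
}

my %orig = (
	mid => 'a-mid@b',
	from => 'a@example.com',
	subject => 'hello',
	body => "hi\n",
);
# same content, but stored under a generated Message-Id (assumed value)
my %again = (%orig, mid => 'generated@localhost');

my $old_same = toy_digest(\%orig, 1) eq toy_digest(\%again, 1);
my $new_same = toy_digest(\%orig, 0) eq toy_digest(\%again, 0);
printf("with Message-Id:    %s\n", $old_same ? 'same' : 'different');
printf("without Message-Id: %s\n", $new_same ? 'same' : 'different');
```

With the Message-Id hashed, the two copies get different digests and
would be imported again; without it, the digests match and the copy
can be recognized as a duplicate.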

diff --git a/lib/PublicInbox/ContentId.pm b/lib/PublicInbox/ContentId.pm
index 9082b76..279eec0 100644
--- a/lib/PublicInbox/ContentId.pm
+++ b/lib/PublicInbox/ContentId.pm
@@ -21,7 +21,8 @@ sub content_digest ($) {
 	# in SearchIdx, so treat them the same for this:
 	my %seen;
 	foreach my $mid (@{mids($hdr)}) {
-		$dig->add('mid: '.$mid);
+		# do NOT consider the Message-ID as part of the content_id
+		# if we got here, we've already got Message-ID reuse
 		$seen{$mid} = 1;
 	}
 	foreach my $mid (@{references($hdr)}) {
diff --git a/t/v2writable.t b/t/v2writable.t
index 85b48d2..6cabf0d 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -61,11 +61,15 @@ if ('ensure git configs are correct') {
 
 	@warn = ();
 	$mime->header_set('Message-Id', '<a-mid@b>', '<c@d>');
-	ok($im->add($mime), 'secondary MID used');
+	is($im->add($mime), undef, 'secondary MID ignored if first matches');
+	my $sec = PublicInbox::MIME->new($mime->as_string);
+	$sec->header_set('Date');
+	$sec->header_set('Message-Id', '<a-mid@b>', '<c@d>');
+	ok($im->add($sec), 'secondary MID used if data is different');
 	like(join(' ', @warn), qr/mismatched/, 'warned about mismatch');
 	like(join(' ', @warn), qr/alternative/, 'warned about alternative');
 	is_deeply([ '<a-mid@b>', '<c@d>' ],
-		[ $mime->header_obj->header_raw('Message-Id') ],
+		[ $sec->header_obj->header_raw('Message-Id') ],
 		'no new Message-Id added');
 
 	my $sane_mid = qr/\A<[\w\-]+\@localhost>\z/;
@@ -85,7 +89,7 @@ if ('ensure git configs are correct') {
 	my $gen = PublicInbox::Import::digest2mid(content_digest($mime));
 	unlike($gen, qr![\+/=]!, 'no URL-unfriendly chars in Message-Id');
 	my $fake = PublicInbox::MIME->new($mime->as_string);
-	$fake->header_set('Message-Id', $gen);
+	$fake->header_set('Message-Id', "<$gen>");
 	ok($im->add($fake), 'fake added easily');
 	is_deeply(\@warn, [], 'no warnings from a faker');
 	ok($im->add($mime), 'random MID made');
-- 
EW


Thread overview: 15+ messages
2018-03-22  9:40 [PATCH 00/13] reindexing, feeds, date fixes Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` Eric Wong (Contractor, The Linux Foundation) [this message]
2018-03-22  9:40 ` [PATCH 02/13] introduce InboxWritable class Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 03/13] import: discard all the same headers as MDA Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 04/13] InboxWritable: add mbox/maildir parsing + import logic Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 05/13] use both Date: and Received: times Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 06/13] msgmap: add tmp_clone to create an anonymous copy Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 07/13] fix syntax warnings Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 08/13] v2writable: support reindexing Xapian Eric Wong (Contractor, The Linux Foundation)
2018-03-26 20:08   ` Eric Wong
2018-03-22  9:40 ` [PATCH 09/13] t/altid.t: extra tests for mid_set Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 10/13] v2writable: add NNTP article number regeneration support Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 11/13] v2writable: clarify header cleanups Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 12/13] v2writable: DEBUG_DIFF respects $TMPDIR Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 13/13] feed: $INBOX/new.atom endpoint supports v2 inboxes Eric Wong (Contractor, The Linux Foundation)
