From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id ABF511FAE2
	for ; Thu, 22 Mar 2018 09:40:15 +0000 (UTC)
From: "Eric Wong (Contractor, The Linux Foundation)"
To: meta@public-inbox.org
Subject: [PATCH 01/13] content_id: do not take Message-Id into account
Date: Thu, 22 Mar 2018 09:40:03 +0000
Message-Id: <20180322094015.14422-2-e@80x24.org>
In-Reply-To: <20180322094015.14422-1-e@80x24.org>
References: <20180322094015.14422-1-e@80x24.org>
List-Id:

If we need to use content_id, we've already lost hope in relying
on Message-Id as a differentiator.  This prevents duplicates
from showing up repeatedly with -watch when Message-Ids are
reused and we generate new Message-Ids to disambiguate.
---
 lib/PublicInbox/ContentId.pm |  3 ++-
 t/v2writable.t               | 10 +++++++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/ContentId.pm b/lib/PublicInbox/ContentId.pm
index 9082b76..279eec0 100644
--- a/lib/PublicInbox/ContentId.pm
+++ b/lib/PublicInbox/ContentId.pm
@@ -21,7 +21,8 @@ sub content_digest ($) {
 	# in SearchIdx, so treat them the same for this:
 	my %seen;
 	foreach my $mid (@{mids($hdr)}) {
-		$dig->add('mid: '.$mid);
+		# do NOT consider the Message-ID as part of the content_id
+		# if we got here, we've already got Message-ID reuse
 		$seen{$mid} = 1;
 	}
 	foreach my $mid (@{references($hdr)}) {
diff --git a/t/v2writable.t b/t/v2writable.t
index 85b48d2..6cabf0d 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -61,11 +61,15 @@ if ('ensure git configs are correct') {
 
 	@warn = ();
 	$mime->header_set('Message-Id', '', '');
-	ok($im->add($mime), 'secondary MID used');
+	is($im->add($mime), undef, 'secondary MID ignored if first matches');
+	my $sec = PublicInbox::MIME->new($mime->as_string);
+	$sec->header_set('Date');
+	$sec->header_set('Message-Id', '', '');
+	ok($im->add($sec), 'secondary MID used if data is different');
 	like(join(' ', @warn), qr/mismatched/, 'warned about mismatch');
 	like(join(' ', @warn), qr/alternative/, 'warned about alternative');
 	is_deeply([ '', '' ],
-		[ $mime->header_obj->header_raw('Message-Id') ],
+		[ $sec->header_obj->header_raw('Message-Id') ],
 		'no new Message-Id added');
 
 	my $sane_mid = qr/\A<[\w\-]+\@localhost>\z/;
@@ -85,7 +89,7 @@ if ('ensure git configs are correct') {
 	my $gen = PublicInbox::Import::digest2mid(content_digest($mime));
 	unlike($gen, qr![\+/=]!, 'no URL-unfriendly chars in Message-Id');
 	my $fake = PublicInbox::MIME->new($mime->as_string);
-	$fake->header_set('Message-Id', $gen);
+	$fake->header_set('Message-Id', "<$gen>");
 	ok($im->add($fake), 'fake added easily');
 	is_deeply(\@warn, [], 'no warnings from a faker');
 	ok($im->add($mime), 'random MID made');
-- 
EW
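
The effect described in the commit message can be illustrated with a
minimal, standalone sketch.  It is not the code from
lib/PublicInbox/ContentId.pm: sketch_content_digest() and the
plain-hashref message layout are hypothetical, the set of hashed fields
is simplified, and skipping references that repeat the Message-Id is an
assumption about what %seen is used for.  The only point is that two
copies with identical content but different Message-Ids now hash the
same, so a copy that was given a generated Message-Id is still
recognized as a duplicate.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical, simplified stand-in for content_digest().
# A message is a plain hashref here:
#   { mid => ..., refs => [...], subject => ..., body => ... }
sub sketch_content_digest {
	my ($msg) = @_;
	my $dig = Digest::SHA->new(256);

	# The Message-Id is only remembered so that a reference repeating
	# it can be skipped (assumed behaviour); it is never hashed.
	my %seen = ($msg->{mid} => 1);
	foreach my $ref (@{$msg->{refs} || []}) {
		next if $seen{$ref}++;
		$dig->add('ref: '.$ref);
	}
	$dig->add('subject: '.($msg->{subject} // ''));
	$dig->add('body: '.($msg->{body} // ''));
	return $dig->hexdigest;
}

my %base = (refs => ['parent@example.com'], subject => 'hello',
	body => "hi\n");
my $orig = sketch_content_digest({ %base, mid => 'reused@example.com' });
my $regen = sketch_content_digest({ %base, mid => 'generated@localhost' });
my $other = sketch_content_digest({ %base, mid => 'reused@example.com',
	body => "bye\n" });

# Same content under a different Message-Id: treated as a duplicate.
print "new Message-Id, same content: ",
	($orig eq $regen ? "duplicate" : "distinct"), "\n";
# Same Message-Id with different content: still told apart.
print "same Message-Id, new body: ",
	($orig eq $other ? "duplicate" : "distinct"), "\n";
```

With the Message-Id left out of the digest, the first comparison
reports "duplicate" and the second reports "distinct", which matches
the two cases the updated t/v2writable.t checks.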