From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 67CDF1F4C0;
	Tue, 22 Oct 2019 08:09:45 +0000 (UTC)
Date: Tue, 22 Oct 2019 08:09:45 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: [RFC/HELP] search: multiple From/To/Cc/Subject (what about Date?)
Message-ID: <20191022080945.GA4457@dcvr>
References: <20191016211415.GA6084@dcvr>
	<20191017112215.GA13175@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20191017112215.GA13175@dcvr>
List-Id:

We can easily support searching on messages with multiple
From/To/Cc/Subject headers.  Now, the display part in the thread
skeleton could be trickier, but not impossible...

Now, how are we supposed to support searching date ranges when
messages have multiple Date: headers?  e.g.:

  https://lore.kernel.org/linux-renesas-soc/20160524.143422.552507610109476444.davem@davemloft.net/raw

OTOH, I'm pretty sure that's just a mangled message somebody
passed on because all 3 Message-IDs appear in standalone forms
elsewhere (e.g. lkml).

But, I also fully expect future messages will contain
horribleness in headers unless anti-spam/abuse rules start
blocking that before it hits public-inbox...
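To illustrate the SearchMsg.pm change in the diff that follows, here is
a minimal standalone sketch of the joining step.  join_hdr_values is a
hypothetical name used only for illustration and is not part of
public-inbox.  The sketch takes the header values as a plain list so it
runs without a MIME parser.  In the patch itself that list comes from
calling $mime->header($field) in list context, which, as far as I know,
returns every value of a repeated header.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: collapse every value of a
# repeated header into one string, as the patched __hdr does.
sub join_hdr_values {
	my (@raw) = @_;             # zero or more values of one header field
	my $val = join(', ', @raw); # an empty list gives '', never undef
	$val =~ tr/\t\n/ /;         # tabs and newlines become plain spaces
	$val =~ tr/\r//d;           # drop stray carriage returns
	return $val;
}

# Two Subject: values, as in the mangled message from the test case
print join_hdr_values('Re: [PATCH] dos', '[PATCH 12/13] tres'), "\n";
```

Because join() on an empty list yields an empty string, the old
"$val = '' unless defined $val" fallback is no longer needed.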
---
 lib/PublicInbox/SearchMsg.pm |  4 ++--
 t/v2reindex.t                | 16 ++++++++++++----
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/SearchMsg.pm b/lib/PublicInbox/SearchMsg.pm
index adadf92e..7561e7f2 100644
--- a/lib/PublicInbox/SearchMsg.pm
+++ b/lib/PublicInbox/SearchMsg.pm
@@ -107,8 +107,8 @@ sub __hdr ($$) {
 	return $val if defined $val;
 
 	my $mime = $self->{mime} or return;
-	$val = $mime->header($field);
-	$val = '' unless defined $val;
+	my @raw = $mime->header($field);
+	$val = join(', ', @raw);
 	$val =~ tr/\t\n/ /;
 	$val =~ tr/\r//d;
 	$self->{$field} = $val;
diff --git a/t/v2reindex.t b/t/v2reindex.t
index 52711f8f..3e56ddfa 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -439,7 +439,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
 	my %config = %$ibx_config;
-	$config{indexlevel} = 'basic';
+	$config{indexlevel} = 'medium';
 	my $ibx = PublicInbox::Inbox->new(\%config);
 	my $im = PublicInbox::V2Writable->new($ibx);
 	my $m3 = PublicInbox::MIME->new(<<'EOF');
@@ -447,7 +447,7 @@ Date: Tue, 24 May 2016 14:34:22 -0700 (PDT)
 Message-Id: <20160524.143422.552507610109476444.d@example.com>
 To: t@example.com
 Cc: c@example.com
-Subject: Re: [PATCH v2 2/2]
+Subject: Re: [PATCH v2 2/2] uno
 From:
 In-Reply-To: <1463825855-7363-2-git-send-email-y@example.com>
 References: <1463825855-7363-1-git-send-email-y@example.com>
@@ -456,14 +456,14 @@ Date: Wed, 25 May 2016 10:01:51 +0900
 From: h@example.com
 To: g@example.com
 Cc: m@example.com
-Subject: Re: [PATCH]
+Subject: Re: [PATCH] dos
 Message-ID: <20160525010150.GD7292@example.com>
 References: <1463498133-23918-1-git-send-email-g+r@example.com>
 In-Reply-To: <1463498133-23918-1-git-send-email-g+r@example.com>
 From: s@example.com
 To: h@example.com
 Cc: m@example.com
-Subject: [PATCH 12/13]
+Subject: [PATCH 12/13] tres
 Date: Wed, 01 Jun 2016 01:32:35 +0300
 Message-ID: <1923946.Jvi0TDUXFC@wasted.example.com>
 In-Reply-To: <13205049.n7pM8utpHF@wasted.example.com>
@@ -495,6 +495,14 @@ EOF
 	eval { $im->index_sync({reindex=>1}) };
 	is($@, '', 'no error from reindexing after reused Message-ID (x3)');
 	is_deeply(\@warn, [], 'no warnings on reindex');
+
+	my %uniq;
+	for my $s (qw(uno dos tres)) {
+		my $msgs = $ibx->search->query("s:$s");
+		is(scalar(@$msgs), 1, "only one result for `$s'");
+		$uniq{$msgs->[0]->{num}}++;
+	}
+	is_deeply([values %uniq], [3], 'search on different subjects');
 }
 
 done_testing();