From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 23EED1F9FE;
	Sat, 13 Mar 2021 22:43:43 +0000 (UTC)
Date: Sat, 13 Mar 2021 18:43:42 -0400
From: Eric Wong
To: meta@public-inbox.org
Subject: Re: [PATCH] searchidx: fix -Lmedium for IDs and filenames
Message-ID:
References: <20210313154027.GA27788@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20210313154027.GA27788@dcvr>
List-Id:

Eric Wong wrote:
>  sub index_headers ($$) {
>  	my ($self, $smsg) = @_;
> -	my @x = (from => 'A', # Author
> -		subject => 'S', to => 'XTO', cc => 'XCC');
> +	my @x = (from => 'A', to => 'XTO', cc => 'XCC'); # A: Author
> +	while (my ($field, $pfx) = splice(@x, 0, 2)) {
> +		my $val = $smsg->{$field};
> +		next if $val eq '';
> +		# include "(comments)" after the address, too, so not using
> +		# PublicInbox::Address::names or pairs
> +		index_text($self, $val, 1, $pfx);
> +
> +		# we need positional info for email addresses since they
> +		# can be considered phrases
> +		if ($self->{indexlevel} eq 'medium') {
> +			for my $addr (PublicInbox::Address::emails($val)) {
> +				index_phrase($self, $addr, 1, $pfx);
> +			}
> +		}
> +	}

I forgot to note email addresses are also handled as phrases
unconditionally.

In any case, pushed as commit 64b557420689476493d752968d99ab8ae62bad9a

searchidx: fix -Lmedium for IDs and filenames

This fixes "m:", "l:", "f:", "t:", "c:", "dfn:", and "n:" search
prefixes under indexlevel=medium when mixed with indexlevel=full
inboxish.

We need positional data for Message-IDs, List-Id, email addresses
and filenames for exact matches, though we still want to support
wildcards.
Fortunately, the storage cost is still small, as these prefixes tend
to be small compared to message bodies. These are NOT boolean terms
since wildcard support and partial matching are desired.
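
To illustrate why positional data matters for exact matches, here is a
toy sketch in plain Perl. It is not Xapian and not public-inbox code:
toy_index, toy_terms, match_all_terms and match_phrase are hypothetical
names, and the "every term must be present" fallback is only an
assumption standing in for a lookup without positions. The real Xapian
behavior for phrase queries on terms lacking positions may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy inverted index for illustration only (hypothetical helpers).
# %$idx maps term => { docid => [ positions ] }.

# split text into lowercase alphanumeric terms
sub toy_terms { grep { length } split(/[^a-z0-9]+/, lc $_[0]) }

# record every term of $text along with its position in the document
sub toy_index {
	my ($idx, $docid, $text) = @_;
	my $pos = 0;
	push @{$idx->{$_}->{$docid}}, $pos++ for toy_terms($text);
}

# lookup ignoring positions: every query term must occur in the doc
sub match_all_terms {
	my ($idx, $query) = @_;
	my @terms = toy_terms($query);
	return () unless @terms;
	my %seen;
	for my $t (@terms) {
		$seen{$_}++ for keys %{$idx->{$t} // {}};
	}
	my @hits = grep { $seen{$_} == @terms } keys %seen;
	my @sorted = sort @hits;
	return @sorted;
}

# lookup using positions: query terms must occur consecutively
sub match_phrase {
	my ($idx, $query) = @_;
	my @terms = toy_terms($query);
	return () unless @terms;
	my $first = $idx->{$terms[0]} // {};
	my @hits;
	for my $docid (keys %$first) {
		for my $start (@{$first->{$docid}}) {
			my $ok = 1;
			for my $i (1 .. $#terms) {
				my $plist = ($idx->{$terms[$i]} // {})->{$docid} // [];
				unless (grep { $_ == $start + $i } @$plist) {
					$ok = 0;
					last;
				}
			}
			if ($ok) {
				push @hits, $docid;
				last;
			}
		}
	}
	my @sorted = sort @hits;
	return @sorted;
}

my %idx;
toy_index(\%idx, 1, 'alice@example.com');
toy_index(\%idx, 2, 'example@alice.com');

# without positions, both addresses look the same to the index
print join(',', match_all_terms(\%idx, 'alice@example.com')), "\n";
# with positions, only the exact address matches
print join(',', match_phrase(\%idx, 'alice@example.com')), "\n";
```

Both toy documents contain the terms "alice", "example" and "com", so
only the positional lookup can tell the two addresses apart. Message-IDs
and filenames split into several terms in the same way.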