unofficial mirror of meta@public-inbox.org
* [PATCH 00/11] indexheader + altid enhancements
@ 2024-08-10  9:00 Eric Wong
  2024-08-10  9:00 ` [PATCH 01/11] search: support per-inbox indexheader directive Eric Wong
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: Eric Wong @ 2024-08-10  9:00 UTC (permalink / raw)
  To: meta

The indexheader feature allows arbitrary headers to be indexed.
This can be done with both per-inbox Xapian indices and extindex
spanning multiple inboxes.

Furthermore, the old altid feature now works with -extindex
even if it's configured per-inbox.

Eric Wong (11):
  search: support per-inbox indexheader directive
  indexheader: deduplicate common values
  search: help: avoid ':' in user prefixes
  search: move QueryParser mappings to xh_args
  www_text: show indexheader contents in help
  www: don't memoize ->user_help contents
  extindex: avoid branch in ->index_eml
  t/extsearch: use autodie to detect chmod failures
  t/extsearch: use xsys_e to detect errors
  extindex: support extindex.*.indexheader
  extindex: support per-inbox indexheader+altid

 Documentation/common.perl             |  22 +----
 Documentation/public-inbox-config.pod |  62 ++++++++++++
 MANIFEST                              |   2 +
 lib/PublicInbox/AltId.pm              |  60 ++++++------
 lib/PublicInbox/Config.pm             |   5 +-
 lib/PublicInbox/ExtSearchIdx.pm       |  24 ++++-
 lib/PublicInbox/IndexHeader.pm        |  79 ++++++++++++++++
 lib/PublicInbox/Isearch.pm            |  16 +++-
 lib/PublicInbox/Search.pm             | 131 ++++++++++++++------------
 lib/PublicInbox/SearchIdx.pm          |  51 ++++++----
 lib/PublicInbox/SearchIdxShard.pm     |   3 +-
 lib/PublicInbox/WwwText.pm            |  29 +-----
 t/extsearch.t                         | 127 +++++++++++++++++++++++--
 t/watch_indexheader.t                 |  92 ++++++++++++++++++
 t/www_altid.t                         |   3 +
 15 files changed, 537 insertions(+), 169 deletions(-)
 create mode 100644 lib/PublicInbox/IndexHeader.pm
 create mode 100644 t/watch_indexheader.t

* [PATCH 01/11] search: support per-inbox indexheader directive
  2024-08-10  9:00 [PATCH 00/11] indexheader + altid enhancements Eric Wong
@ 2024-08-10  9:00 ` Eric Wong
  2024-08-10  9:00 ` [PATCH 02/11] indexheader: deduplicate common values Eric Wong
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2024-08-10  9:00 UTC (permalink / raw)
  To: meta

This allows arbitrary headers to be indexed, either as boolean
terms for filtering or with the existing text indexing rules.
Disabling RFC 2047 decoding is supported as well.
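
For illustration, the TYPE:USER_PREFIX:MAIL_HEADER:PARAMS spec could
be parsed roughly as below.  This is a sketch, not the code in
IndexHeader.pm: parse_indexheader_spec is a hypothetical name, it uses
a minimal inline stand-in for URI::Escape::uri_unescape, and it leaves
out the index_prefix override.

```perl
use strict;
use warnings;

# types accepted by the patch (mirrors the keys of %T2IDX)
my %TYPES = (boolean_term => 1, text => 1, phrase => 1);

# hypothetical helper: TYPE:USER_PREFIX:MAIL_HEADER[:PARAMS]
sub parse_indexheader_spec {
	my ($spec) = @_;
	my ($type, $pfx, $header, $query) = split /:/, $spec, 4;
	die "E: `$spec' has no user prefix\n" unless defined $pfx;
	die "E: `$spec' has no mail header\n" unless defined $header;
	die "E: `$type' not supported in $spec\n" unless $TYPES{$type};
	my %params = map {
		my $kv = $_;
		$kv =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge; # minimal unescape
		my ($k, $v) = split /=/, $kv, 2;
		($k, $v // '');
	} split /[&;]/, $query // '';
	my $raw = delete $params{raw}; # raw=1 skips RFC 2047 decoding
	return {
		type => $type,
		prefix => $pfx,
		xprefix => "X\U$pfx", # default Xapian term prefix
		header => $header,
		hdr_method => $raw ? 'header_raw' : 'header',
		unknown => [ sort keys %params ], # would trigger a warning
	};
}

my $x = parse_indexheader_spec('boolean_term:xlabel:X-Label:raw=1');
print "$x->{xprefix} $x->{hdr_method}\n"; # XXLABEL header_raw
```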

This also refactors AltId support to rely on the same mechanisms
as the IndexHeader class for indexing, user help, and
Xapian::QueryParser setup (via both the Perl bindings and the
external XapHelper process) to avoid adding complexity to
Search.pm and SearchIdx.pm.
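
The shared mechanism is duck typing: each extra indexer exposes
index_extra, user_help and query_parser_method, and the callers loop
over them.  The sketch below shows that dispatch in isolation.
FakeSidx, FakeQP and FakeHeader are hypothetical stand-ins, and the
message is a plain hash of header => [values] instead of a
PublicInbox::Eml, so this is not the real API.

```perl
use strict;
use warnings;

package FakeSidx; # hypothetical stand-in for SearchIdx: records terms
sub new { bless { terms => [] }, shift }
sub index_boolean_term {
	my ($self, $pfx, $term) = @_;
	push @{$self->{terms}}, $pfx . $term;
}

package FakeQP; # hypothetical stand-in for Xapian::QueryParser
sub new { bless { map => [] }, shift }
sub add_boolean_prefix { my ($s, $u, $x) = @_; push @{$s->{map}}, "$u=$x" }
sub add_prefix { my ($s, $u, $x) = @_; push @{$s->{map}}, "$u:$x" }

package FakeHeader; # hypothetical indexer with the same 3 methods
sub new { my ($cls, %args) = @_; bless { %args }, $cls }
sub index_extra {
	my ($self, $sidx, $hdrs) = @_;
	$sidx->index_boolean_term($self->{xprefix}, $_)
		for @{$hdrs->{$self->{header}} // []};
}
sub user_help { ($_[0]->{prefix}, "the `$_[0]->{header}' mail header") }
sub query_parser_method { 'add_boolean_prefix' }

package main;
my @extra = (FakeHeader->new(prefix => 'xlabel', xprefix => 'XXLABEL',
			header => 'X-Label'));
my $sidx = FakeSidx->new;
my $qp = FakeQP->new;
my @help;
for my $x (@extra) {
	$x->index_extra($sidx, { 'X-Label' => [ 'stable' ] }); # indexing
	push @help, $x->user_help; # help text
	my $m = $x->query_parser_method; # query parsing
	$qp->$m(@$x{qw(prefix xprefix)});
}
print "@{$sidx->{terms}}\n"; # XXLABELstable
print "@{$qp->{map}}\n"; # xlabel=XXLABEL
```

Adding another indexer type then only requires another class with
those three methods; neither caller needs to change.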

We'll finally document altid support in public-inbox-config(5)
while we're in the area, since it has been a stable feature for
many years now.
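
As documented, a relative file= path in an altid spec is resolved
against INBOXDIR for v2 inboxes or INBOXDIR/public-inbox for v1.  A
minimal sketch of that rule follows; resolve_altid is a hypothetical
name, and it skips the URI unescaping and unknown-parameter warnings
that AltId->new performs.

```perl
use strict;
use warnings;

# hypothetical helper: "serial:PREFIX:file=PATH" => (prefix, xprefix, file)
sub resolve_altid {
	my ($spec, $inboxdir, $version) = @_;
	my ($type, $pfx, $query) = split /:/, $spec, 3;
	die "E: non-serial not supported ($spec)\n" unless $type eq 'serial';
	my %p = map { my ($k, $v) = split /=/, $_, 2; ($k, $v // '') }
		split /[&;]/, $query // '';
	my $f = $p{file} or die "E: file= required in $spec\n";
	if (index($f, '/') != 0) { # relative path
		$f = $version == 1 ? "$inboxdir/public-inbox/$f"
				: "$inboxdir/$f";
	}
	return ($pfx, "X\U$pfx", $f);
}

print join(' ', resolve_altid('serial:gmane:file=gmane.sqlite3',
				'/srv/foo', 2)), "\n";
```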
---
 Documentation/public-inbox-config.pod | 62 ++++++++++++++++++
 MANIFEST                              |  2 +
 lib/PublicInbox/AltId.pm              | 60 +++++++++--------
 lib/PublicInbox/Config.pm             |  2 +-
 lib/PublicInbox/IndexHeader.pm        | 73 +++++++++++++++++++++
 lib/PublicInbox/Search.pm             | 43 +++++++------
 lib/PublicInbox/SearchIdx.pm          | 34 +++++-----
 t/watch_indexheader.t                 | 92 +++++++++++++++++++++++++++
 8 files changed, 306 insertions(+), 62 deletions(-)
 create mode 100644 lib/PublicInbox/IndexHeader.pm
 create mode 100644 t/watch_indexheader.t

diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod
index b4a1d94d..50746b21 100644
--- a/Documentation/public-inbox-config.pod
+++ b/Documentation/public-inbox-config.pod
@@ -172,6 +172,68 @@ link to the line numbers of blobs.
 
 Default: none
 
+=item publicinbox.<name>.altid
+
+Index by an alternative ID mechanism as a Xapian search prefix e.g.
+C<gmane:1234>.  This is useful to allow looking up legacy serial IDs
+(e.g. gmane article numbers).
+
+It must be specified in the form of
+C<serial:$USER_PREFIX:file=$SQLITE_FILENAME> where C<$USER_PREFIX> is a
+lowercase prefix like C<gmane> for search queries, and
+C<$SQLITE_FILENAME> points to an SQLite DB.  C<$SQLITE_FILENAME> may
+be an absolute path or a path relative to C<INBOXDIR> for v2 inboxes or
+C<INBOXDIR/public-inbox> for v1 inboxes.
+
+The schema of C<$SQLITE_FILENAME> should be the same as a
+C<msgmap.sqlite3>.  See C<scripts/xhdr-num2mid> in the public-inbox
+source tree for an example of how to generate such a mapping
+via NNTP.
+
+This is a no-op with C<indexlevel=basic>.
+
+Default: none
+
+=item publicinbox.<name>.indexheader
+
+Supports indexing of arbitrary mail headers in Xapian.
+
+It must be specified in the form of
+C<$TYPE:$USER_PREFIX:$MAIL_HEADER:$PARAMS>
+where C<$TYPE> determines how it's indexed and queried,
+C<$USER_PREFIX> is a lowercase prefix for search queries,
+C<$MAIL_HEADER> is the header to index (e.g. C<X-Label>), and
+C<$PARAMS> is a URL-style query string for optional parameters.
+
+Valid C<$TYPE> values (in ascending order of storage cost) are as follows:
+
+* C<boolean_term> - index for simple filtering (not sortable by relevance)
+
+* C<text> - add frequency information to allow sorting by relevance
+
+* C<phrase> - add positional information to match sentences or phrases
+
+In other words: C<phrase> forces indexing of a particular header to
+behave like it used C<indexlevel=full>; while C<text> indexes as if
+that header used C<indexlevel=medium>.
+
+Valid keys in C<$PARAMS> include:
+
+* raw - do not perform RFC2047 decoding of headers
+
+Example:
+
+	[publicinbox "foo"]
+		indexheader = boolean_term:xlabel:X-Label:raw=1
+
+Support for other parameters is not finalized and subject to change.
+
+This is a no-op with C<indexlevel=basic>.
+
+New in public-inbox 2.0.0 (PENDING)
+
+Default: none
+
 =item publicinbox.<name>.replyto
 
 May be used to control how reply instructions in the PSGI
diff --git a/MANIFEST b/MANIFEST
index af65a86e..34d3ef14 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -228,6 +228,7 @@ lib/PublicInbox/In3Watch.pm
 lib/PublicInbox/Inbox.pm
 lib/PublicInbox/InboxIdle.pm
 lib/PublicInbox/InboxWritable.pm
+lib/PublicInbox/IndexHeader.pm
 lib/PublicInbox/Inotify.pm
 lib/PublicInbox/Inotify3.pm
 lib/PublicInbox/InputPipe.pm
@@ -630,6 +631,7 @@ t/v2writable.t
 t/view.t
 t/watch_filter_rubylang.t
 t/watch_imap.t
+t/watch_indexheader.t
 t/watch_maildir.t
 t/watch_maildir_v2.t
 t/watch_mh.t
diff --git a/lib/PublicInbox/AltId.pm b/lib/PublicInbox/AltId.pm
index 80757ceb..bd6cf973 100644
--- a/lib/PublicInbox/AltId.pm
+++ b/lib/PublicInbox/AltId.pm
@@ -1,4 +1,4 @@
-# Copyright (C) 2016-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 # Used for giving serial numbers to messages.  This can be tied to
@@ -10,25 +10,20 @@
 # it leads to reliance on centralization.  However, being able
 # to use existing serial numbers is beneficial.
 package PublicInbox::AltId;
-use strict;
-use warnings;
-use URI::Escape qw(uri_unescape);
-use PublicInbox::Msgmap;
+use v5.12;
+use parent qw(PublicInbox::IndexHeader);
 
 # spec: TYPE:PREFIX:param1=value1&param2=value2&...
 # The PREFIX will be a searchable boolean prefix in Xapian
 # Example: serial:gmane:file=/path/to/altmsgmap.sqlite3
 sub new {
 	my ($class, $ibx, $spec, $writable) = @_;
-	my ($type, $prefix, $query) = split(/:/, $spec, 3);
-	$type eq 'serial' or die "non-serial not supported, yet\n";
-	$prefix =~ /\A\w+\z/ or warn "non-word prefix not searchable\n";
-	my %params = map {
-		my ($k, $v) = split(/=/, uri_unescape($_), 2);
-		$v = '' unless defined $v;
-		($k, $v);
-	} split(/[&;]/, $query);
-	my $f = $params{file} or die "file: required for $type spec $spec\n";
+	my ($type, $pfx, $query) = split /:/, $spec, 3;
+	$type eq 'serial' or die "E: non-serial not supported, yet ($spec)\n";
+	my $self = bless {}, $class;
+	my $params = $self->extra_indexer_new_common($spec, $pfx, $query);
+	my $f = delete $params->{file} or
+		die "E: file= required for $type spec $spec\n";
 	unless (index($f, '/') == 0) {
 		if ($ibx->version == 1) {
 			$f = "$ibx->{inboxdir}/public-inbox/$f";
@@ -36,26 +31,37 @@ sub new {
 			$f = "$ibx->{inboxdir}/$f";
 		}
 	}
-	bless {
-		filename => $f,
-		writable => $writable,
-		prefix => $prefix,
-		xprefix => 'X'.uc($prefix),
-	}, $class;
+	my @k = keys %$params;
+	warn "W: unknown params in `$spec': ", join(', ', @k), "\n" if @k;
+	$self->{filename} = $f;
+	$self->{writable} = $writable if $writable;
+	$self;
 }
 
-sub mm_alt {
+sub mm_alt ($) {
 	my ($self) = @_;
 	$self->{mm_alt} ||= eval {
-		my $f = $self->{filename};
-		my $writable = $self->{writable};
-		PublicInbox::Msgmap->new_file($f, $writable);
+		require PublicInbox::Msgmap;
+		PublicInbox::Msgmap->new_file(@$self{qw(filename writable)});
 	};
 }
 
-sub mid2alt {
-	my ($self, $mid) = @_;
-	$self->mm_alt->num_for($mid);
+sub index_extra { # for PublicInbox::SearchIdx
+	my ($self, $sidx, $eml, $mids) = @_;
+	for my $mid (@$mids) {
+		my $id = mm_alt($self)->num_for($mid) // next;
+		$sidx->index_boolean_term($self->{xprefix}, $id);
+	}
 }
 
+sub user_help { # for PublicInbox::Search
+	my ($self) = @_;
+	("$self->{prefix}:", <<EOF);
+alternate serial number  e.g. $self->{prefix}:12345 (boolean)
+EOF
+}
+
+# callback for PublicInbox::Search
+sub query_parser_method { 'add_boolean_prefix' }
+
 1;
diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 998fc25e..3af5f23c 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -481,7 +481,7 @@ sub _fill_ibx {
 	# more things to encourage decentralization
 	for my $k (qw(address altid nntpmirror imapmirror
 			coderepo hide listid url
-			infourl watchheader
+			infourl watchheader indexheader
 			nntpserver imapserver pop3server)) {
 		my $v = $self->{"$pfx.$k"} // next;
 		$ibx->{$k} = _array($v);
diff --git a/lib/PublicInbox/IndexHeader.pm b/lib/PublicInbox/IndexHeader.pm
new file mode 100644
index 00000000..53e9373b
--- /dev/null
+++ b/lib/PublicInbox/IndexHeader.pm
@@ -0,0 +1,73 @@
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# allow searching on arbitrary headers as text
+package PublicInbox::IndexHeader;
+use v5.12;
+use URI::Escape qw(uri_unescape);
+
+my %T2IDX = ( # map to PublicInbox::SearchIdx methods
+	phrase => 'index_phrase1',
+	boolean_term => 'index_boolean_term',
+	text => 'index_text1',
+);
+
+# also called by AltId->new
+sub extra_indexer_new_common ($$$$) {
+	my ($self, $spec, $pfx, $query) = @_;
+	$pfx =~ /\A[a-z][a-z0-9]*\z/ or
+		warn "W: non-word prefix in `$spec' not searchable\n";
+	$self->{prefix} = $pfx;
+	my %params = map {
+		my ($k, $v) = split /=/, uri_unescape($_), 2;
+		($k, $v // '');
+	} split /[&;]/, $query // '';
+	my $xpfx = delete($params{index_prefix}) // "X\U$pfx";
+	$xpfx =~ /\A[A-Z][A-Z0-9]*\z/ or die
+		die "E: `index_prefix' in `$spec' must be ALL CAPS\n";
+	$self->{xprefix} = $xpfx;
+	\%params;
+}
+
+sub new {
+	my ($cls, $ibx, $spec) = @_;
+	my ($type, $pfx, $header, $query) = split /:/, $spec, 4;
+	$pfx // die "E: `$spec' has no user prefix\n";
+	$header // die "E: `$spec' has no mail header\n";
+	my $self = bless { header => $header, type => $type }, $cls;
+	my $params = extra_indexer_new_common $self, $spec, $pfx, $query;
+	$self->{hdr_method} = delete $params->{raw} ? 'header_raw' : 'header';
+	my @k = keys %$params;
+	warn "W: unknown params in `$spec': ", join(', ', @k), "\n" if @k;
+	$T2IDX{$type} // die
+		"E: `$type' not supported in $spec, must be one of: ",
+		join(', ', sort keys %T2IDX), "\n";
+	$self;
+}
+
+sub index_extra { # for PublicInbox::SearchIdx
+	my ($self, $sidx, $eml, $mids) = @_;
+	my $idx_method = $self->{-idx_method} //= $T2IDX{$self->{type}};
+	my $hdr_method = $self->{hdr_method};
+	for my $val ($eml->$hdr_method($self->{header})) {
+		$sidx->$idx_method($self->{xprefix}, $val);
+	}
+}
+
+sub user_help { # for PublicInbox::Search
+	my ($self) = @_;
+	("$self->{prefix}:", <<EOF);
+the `$self->{header}' mail header  e.g. $self->{prefix}:stable
+EOF
+}
+
+my %TYPE_2_QPMETHOD = (
+	phrase => 'add_prefix',
+	boolean_term => 'add_boolean_prefix',
+	text => 'add_prefix',
+);
+
+# callback for PublicInbox::Search
+sub query_parser_method { $TYPE_2_QPMETHOD{$_[0]->{type}} }
+
+1;
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index 649157be..6a0bdb0f 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -292,13 +292,25 @@ sub xdb ($) {
 	};
 }
 
+sub load_extra_indexers ($$) {
+	my ($self, $ibx) = @_;
+	my @extra;
+	for my $f (qw(IndexHeader AltId)) {
+		my $specs = $ibx->{lc $f} // next;
+		my $cls = "PublicInbox::$f";
+		eval "require $cls" or die $@;
+		push @extra, map { $cls->new($ibx, $_) } @$specs;
+	}
+	$self->{-extra} = \@extra if @extra;
+}
+
 sub new {
 	my ($class, $ibx) = @_;
 	ref $ibx or die "BUG: expected PublicInbox::Inbox object: $ibx";
 	my $xap = $ibx->version > 1 ? 'xap' : 'public-inbox/xapian';
 	my $xpfx = "$ibx->{inboxdir}/$xap".SCHEMA_VERSION;
 	my $self = bless { xpfx => $xpfx }, $class;
-	$self->{altid} = $ibx->{altid} if defined($ibx->{altid});
+	$self->load_extra_indexers($ibx);
 	$self;
 }
 
@@ -439,6 +451,8 @@ sub xhc_start_maybe (@) {
 	$xhc;
 }
 
+my %QPMETHOD_2_SYM = (add_prefix => ':', add_boolean_prefix => '=');
+
 sub xh_opt ($$) {
 	my ($self, $opt) = @_;
 	my $lim = $opt->{limit} || 50;
@@ -464,9 +478,9 @@ sub xh_opt ($$) {
 	push @ret, '-O', $opt->{eidx_key} if defined $opt->{eidx_key};
 	my $apfx = $self->{-alt_pfx} //= do {
 		my @tmp;
-		for (grep /\Aserial:/, @{$self->{altid} // []}) {
-			my (undef, $pfx) = split /:/, $_;
-			push @tmp, '-Q', "$pfx=X\U$pfx";
+		for my $x (@{$self->{-extra} // []}) {
+			my $sym = $QPMETHOD_2_SYM{$x->query_parser_method};
+			push @tmp, '-Q', $x->{prefix}.$sym.$x->{xprefix};
 		}
 		# TODO: arbitrary header indexing goes here
 		\@tmp;
@@ -593,21 +607,12 @@ sub qparse_new {
 		$qp->add_boolean_prefix($name, $_) foreach split(/ /, $prefix);
 	}
 
-	# we do not actually create AltId objects,
-	# just parse the spec to avoid the extra DB handles for now.
-	if (my $altid = $self->{altid}) {
+	if (my $extra = $self->{-extra}) {
 		my $user_pfx = $self->{-user_pfx} = [];
-		for (@$altid) {
-			# $_ = 'serial:gmane:/path/to/gmane.msgmap.sqlite3'
-			# note: Xapian supports multibyte UTF-8, /^[0-9]+$/,
-			# and '_' with prefixes matching \w+
-			/\Aserial:(\w+):/ or next;
-			my $pfx = $1;
-			push @$user_pfx, "$pfx:", <<EOF;
-alternate serial number  e.g. $pfx:12345 (boolean)
-EOF
-			# gmane => XGMANE
-			$qp->add_boolean_prefix($pfx, 'X'.uc($pfx));
+		for my $x (@$extra) {
+			push @$user_pfx, $x->user_help;
+			my $m = $x->query_parser_method;
+			$qp->$m(@$x{qw(prefix xprefix)});
 		}
 		chomp @$user_pfx;
 	}
@@ -654,7 +659,7 @@ EOM
 
 sub help {
 	my ($self) = @_;
-	$self->{qp} // $self->qparse_new; # parse altids
+	$self->{qp} // $self->qparse_new; # parse altids + indexheaders
 	my @ret = @HELP;
 	if (my $user_pfx = $self->{-user_pfx}) {
 		push @ret, @$user_pfx;
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 4fd493d9..b2576e52 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -52,11 +52,6 @@ sub new {
 	my $inboxdir = $ibx->{inboxdir};
 	my $version = $ibx->version;
 	my $indexlevel = 'full';
-	my $altid = $ibx->{altid};
-	if ($altid) {
-		require PublicInbox::AltId;
-		$altid = [ map { PublicInbox::AltId->new($ibx, $_); } @$altid ];
-	}
 	if ($ibx->{indexlevel}) {
 		if ($ibx->{indexlevel} =~ $INDEXLEVELS) {
 			$indexlevel = $ibx->{indexlevel};
@@ -69,7 +64,7 @@ sub new {
 	my $self = PublicInbox::Search->new($ibx);
 	bless $self, $class;
 	$self->{ibx} = $ibx;
-	$self->{-altid} = $altid;
+	$self->load_extra_indexers($ibx);
 	$self->{indexlevel} = $indexlevel;
 	$self->{-set_indexlevel_once} = 1 if $indexlevel eq 'medium';
 	if ($ibx->{-skip_docdata}) {
@@ -184,6 +179,22 @@ sub index_phrase ($$$$) {
 	$self->{term_generator}->increase_termpos;
 }
 
+sub index_phrase1 { # called by various ->index_extra
+	my ($self, $pfx, $text) = @_;
+	index_phrase $self, $text, 1, $pfx;
+}
+
+sub index_text1 { # called by various ->index_extra
+	my ($self, $pfx, $text) = @_;
+	$self->{term_generator}->index_text_without_positions($text, 1, $pfx);
+}
+
+sub index_boolean_term { # called by various ->index_extra
+	my ($self, $pfx, $term) = @_;
+	my $doc = $self->{term_generator}->get_document;
+	$doc->add_boolean_term($pfx.$term);
+}
+
 sub index_text ($$$$) {
 	my ($self, $text, $wdf_inc, $prefix) = @_;
 
@@ -481,15 +492,8 @@ sub eml2doc ($$$;$) {
 		$doc->set_data($data);
 	}
 
-	if (my $altid = $self->{-altid}) {
-		foreach my $alt (@$altid) {
-			my $pfx = $alt->{xprefix};
-			foreach my $mid (@$mids) {
-				my $id = $alt->mid2alt($mid);
-				next unless defined $id;
-				$doc->add_boolean_term($pfx . $id);
-			}
-		}
+	for my $extra (@{$self->{-extra} // []}) {
+		$extra->index_extra($self, $eml, $mids);
 	}
 	$doc;
 }
diff --git a/t/watch_indexheader.t b/t/watch_indexheader.t
new file mode 100644
index 00000000..e815fca9
--- /dev/null
+++ b/t/watch_indexheader.t
@@ -0,0 +1,92 @@
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use v5.12;
+use autodie;
+use PublicInbox::TestCommon;
+use PublicInbox::Eml;
+use PublicInbox::Emergency;
+use PublicInbox::IO qw(write_file);
+use PublicInbox::InboxIdle;
+use PublicInbox::Inbox;
+use PublicInbox::DS;
+use PublicInbox::Config;
+require_mods(qw(DBD::SQLite Xapian));
+my $tmpdir = tmpdir;
+my $config = "$tmpdir/pi_config";
+local $ENV{PI_CONFIG} = $config;
+delete local $ENV{PI_DIR};
+my @V = (1);
+my @creat_opt = (indexlevel => 'medium', sub {});
+my $v1 = create_inbox 'v1', tmpdir => "$tmpdir/v1", @creat_opt;
+my $fh = write_file '>', $config, <<EOM;
+[publicinbox "v1"]
+	inboxdir = $v1->{inboxdir}
+	address = v1\@example.com
+	watch = maildir:$tmpdir/v1-md
+	indexheader = boolean_term:xarchiveshash:X-Archives-Hash
+EOM
+
+SKIP: {
+	require_git(v2.6, 1);
+	push @V, 2;
+	my $v2 = create_inbox 'v2', tmpdir => "$tmpdir/v2", @creat_opt;
+	print $fh <<EOM;
+[publicinbox "v2"]
+	inboxdir = $tmpdir/v2
+	address = v2\@example.com
+	watch = maildir:$tmpdir/v2-md
+	indexheader = boolean_term:xarchiveshash:X-Archives-Hash
+EOM
+}
+close $fh;
+my $cfg = PublicInbox::Config->new;
+for my $v (@V) { for ('', qw(cur new tmp)) { mkdir "$tmpdir/v$v-md/$_" } }
+my $wm = start_script([qw(-watch)]);
+my $h1 = 'deadbeef' x 4;
+my @em = map {
+	my $v = $_;
+	my $em = PublicInbox::Emergency->new("$tmpdir/v$v-md");
+	$em->prepare(\(PublicInbox::Eml->new(<<EOM)->as_string));
+From: x\@example.com
+Message-ID: <i-1$v\@example.com>
+To: <v$v\@example.com>
+Date: Sat, 02 Oct 2010 00:00:00 +0000
+X-Archives-Hash: $h1
+
+EOM
+	$em;
+} @V;
+
+my $delivered = 0;
+my $cb = sub {
+	diag "message delivered to `$_[0]->{name}'";
+	++$delivered;
+};
+PublicInbox::DS->Reset;
+my $ii = PublicInbox::InboxIdle->new($cfg);
+my $obj = bless \$cb, 'PublicInbox::TestCommon::InboxWakeup';
+$cfg->each_inbox(sub { $_[0]->subscribe_unlock('ident', $obj) });
+local @PublicInbox::DS::post_loop_do = (sub { $delivered != @V });
+$_->commit for @em;
+diag 'waiting for -watch to import new message(s)';
+PublicInbox::DS::event_loop();
+$wm->join('TERM');
+$ii->close;
+
+$cfg->each_inbox(sub {
+	my ($ibx) = @_;
+	my $srch = $ibx->search;
+	my $mset = $srch->mset('xarchiveshash:miss');
+	is($mset->size, 0, 'got xarchiveshash:miss non-result');
+	$mset = $srch->mset("xarchiveshash:$h1");
+	is($mset->size, 1, 'got xarchiveshash: hit result') or return;
+	my $num = $srch->mset_to_artnums($mset);
+	my $eml = $ibx->smsg_eml($ibx->over->get_art($num->[0]));
+	is($eml->header_raw('X-Archives-Hash'), $h1,
+		'stored message with X-Archives-Hash');
+	my @opt = $srch->xh_opt;
+	is $opt[-2], '-Q', 'xap_helper -Q switch';
+	is $opt[-1], 'xarchiveshash=XXARCHIVESHASH', 'xap_helper -Q arg';
+});
+
+done_testing;

* [PATCH 02/11] indexheader: deduplicate common values
  2024-08-10  9:00 [PATCH 00/11] indexheader + altid enhancements Eric Wong
  2024-08-10  9:00 ` [PATCH 01/11] search: support per-inbox indexheader directive Eric Wong
@ 2024-08-10  9:00 ` Eric Wong
  2024-08-10  9:00 ` [PATCH 03/11] search: help: avoid ':' in user prefixes Eric Wong
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2024-08-10  9:00 UTC (permalink / raw)
  To: meta

Since we plan on sharing IndexHeader across multiple inboxes for
large installations with thousands of inboxes, it makes sense to
deduplicate the values to save some memory at the cost of
increased startup time.
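
The deduplication relies on a perl implementation detail: hash keys
are kept in a shared string table, and a copy taken from keys() can
share that buffer rather than allocating a new one.  That saving
cannot be observed from plain Perl, so the sketch below (dedupe_str
is a hypothetical name) only shows the idiom and that values come
back unchanged.

```perl
use strict;
use warnings;

# same idiom as `my %dedupe = ($x => undef); ($x) = keys %dedupe;'
sub dedupe_str {
	my ($str) = @_;
	my %tmp = ($str => undef);
	my ($copy) = keys %tmp; # copy taken from the hash key
	return $copy;
}

# e.g. many per-inbox objects built from identical config values
my @objs = map {
	+{ prefix => dedupe_str('xlabel'), xprefix => dedupe_str('XXLABEL') }
} 1 .. 3;
print scalar(@objs), " $objs[0]{prefix} $objs[2]{xprefix}\n";
```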
---
 lib/PublicInbox/IndexHeader.pm | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/IndexHeader.pm b/lib/PublicInbox/IndexHeader.pm
index 53e9373b..07827959 100644
--- a/lib/PublicInbox/IndexHeader.pm
+++ b/lib/PublicInbox/IndexHeader.pm
@@ -17,7 +17,8 @@ sub extra_indexer_new_common ($$$$) {
 	my ($self, $spec, $pfx, $query) = @_;
 	$pfx =~ /\A[a-z][a-z0-9]*\z/ or
 		warn "W: non-word prefix in `$spec' not searchable\n";
-	$self->{prefix} = $pfx;
+	my %dedupe = ($pfx => undef);
+	($self->{prefix}) = keys %dedupe;
 	my %params = map {
 		my ($k, $v) = split /=/, uri_unescape($_), 2;
 		($k, $v // '');
@@ -25,7 +26,8 @@ sub extra_indexer_new_common ($$$$) {
 	my $xpfx = delete($params{index_prefix}) // "X\U$pfx";
 	$xpfx =~ /\A[A-Z][A-Z0-9]*\z/ or die
 		die "E: `index_prefix' in `$spec' must be ALL CAPS\n";
-	$self->{xprefix} = $xpfx;
+	%dedupe = ($xpfx => undef);
+	($self->{xprefix}) = keys %dedupe;
 	\%params;
 }
 
@@ -34,14 +36,18 @@ sub new {
 	my ($type, $pfx, $header, $query) = split /:/, $spec, 4;
 	$pfx // die "E: `$spec' has no user prefix\n";
 	$header // die "E: `$spec' has no mail header\n";
+	$T2IDX{$type} // die
+		"E: `$type' not supported in $spec, must be one of: ",
+		join(', ', sort keys %T2IDX), "\n";
+	my %dedupe = ($type => undef);
+	($type) = keys %dedupe;
+	%dedupe = ($header => undef);
+	($header) = keys %dedupe;
 	my $self = bless { header => $header, type => $type }, $cls;
 	my $params = extra_indexer_new_common $self, $spec, $pfx, $query;
 	$self->{hdr_method} = delete $params->{raw} ? 'header_raw' : 'header';
 	my @k = keys %$params;
 	warn "W: unknown params in `$spec': ", join(', ', @k), "\n" if @k;
-	$T2IDX{$type} // die
-		"E: `$type' not supported in $spec, must be one of: ",
-		join(', ', sort keys %T2IDX), "\n";
 	$self;
 }
 

* [PATCH 03/11] search: help: avoid ':' in user prefixes
  2024-08-10  9:00 [PATCH 00/11] indexheader + altid enhancements Eric Wong
  2024-08-10  9:00 ` [PATCH 01/11] search: support per-inbox indexheader directive Eric Wong
  2024-08-10  9:00 ` [PATCH 02/11] indexheader: deduplicate common values Eric Wong
@ 2024-08-10  9:00 ` Eric Wong
  2024-08-10  9:00 ` [PATCH 04/11] search: move QueryParser mappings to xh_args Eric Wong
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2024-08-10  9:00 UTC (permalink / raw)
  To: meta

The non-':'-suffixed variation of the string is already used as
hash keys and literals elsewhere.  Theoretically, a Perl
implementation can save some allocations this way (though Perl 5
currently doesn't).

In any case, we'll introduce a help2txt function to share code
between the callers in WwwText and Documentation/common.perl.
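
help2txt takes flat (prefix, description) pairs, appends ':' to each
prefix, and aligns descriptions in one column.  The sketch below is a
sprintf-based rewrite for illustration (help_to_text is a hypothetical
name, not the regex passes used in the patch); for simple inputs it
should produce the same layout.

```perl
use strict;
use warnings;

sub help_to_text {
	my @pairs = @_;
	my $w = 0;
	for (my $i = 0; $i < @pairs; $i += 2) {
		my $n = length($pairs[$i]) + 1; # +1 for the ':' suffix
		$w = $n if $n > $w;
	}
	$w += 2; # gap between "prefix:" and its description
	my $txt = '';
	while (my ($pfx, $desc) = splice @pairs, 0, 2) {
		my @lines = split /\n/, $desc;
		$txt .= '    ' . sprintf('%-*s', $w, "$pfx:") .
			(shift(@lines) // '') . "\n";
		# continuation lines line up with the description column
		$txt .= (' ' x ($w + 4)) . "$_\n" for @lines;
	}
	$txt;
}

print help_to_text(s => 'match within Subject',
		dfn => 'match filename from diff',
		d => "line one\nline two\n");
```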
---
 Documentation/common.perl      | 22 ++---------
 lib/PublicInbox/AltId.pm       |  2 +-
 lib/PublicInbox/IndexHeader.pm |  2 +-
 lib/PublicInbox/Isearch.pm     |  2 +-
 lib/PublicInbox/Search.pm      | 71 ++++++++++++++++++++--------------
 lib/PublicInbox/WwwText.pm     | 25 +-----------
 6 files changed, 49 insertions(+), 75 deletions(-)

diff --git a/Documentation/common.perl b/Documentation/common.perl
index 3a6617c4..53bae495 100755
--- a/Documentation/common.perl
+++ b/Documentation/common.perl
@@ -3,7 +3,7 @@
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict;
 use Fcntl qw(SEEK_SET);
-my $have_search = eval { require PublicInbox::Search; 1 };
+use PublicInbox::Search;
 my $addr = 'meta@public-inbox.org';
 for my $pod (@ARGV) {
 	open my $fh, '+<', $pod or die "open($pod): $!";
@@ -27,7 +27,7 @@ L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>
 
 =head1$1
 		!ms;
-	$have_search and $s =~ s!^=for\scomment\n
+	$s =~ s!^=for\scomment\n
 			^AUTO-GENERATED-SEARCH-TERMS-BEGIN\n
 			.+?
 			^=for\scomment\n
@@ -46,23 +46,7 @@ L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>
 }
 
 sub search_terms {
-	my $help = eval('\@PublicInbox::Search::HELP');
-	my $s = '';
-	my $pad = 0;
-	my $i;
-	for ($i = 0; $i < @$help; $i += 2) {
-		my $pfx = $help->[$i];
-		my $n = length($pfx);
-		$pad = $n if $n > $pad;
-		$s .= $pfx . "\0";
-		$s .= $help->[$i + 1];
-		$s .= "\f\n";
-	}
-	$pad += 2;
-	my $padding = ' ' x ($pad + 4);
-	$s =~ s/^/$padding/gms;
-	$s =~ s/^$padding(\S+)\0/"    $1".(' ' x ($pad - length($1)))/egms;
-	$s =~ s/\f\n/\n/gs;
+	my $s = PublicInbox::Search::help2txt(@PublicInbox::Search::HELP);
 	$s =~ s/^  //gms;
 	substr($s, 0, 0, "=for comment\nAUTO-GENERATED-SEARCH-TERMS-BEGIN\n\n");
 	$s .= "\n=for comment\nAUTO-GENERATED-SEARCH-TERMS-END\n";
diff --git a/lib/PublicInbox/AltId.pm b/lib/PublicInbox/AltId.pm
index bd6cf973..76dc23e6 100644
--- a/lib/PublicInbox/AltId.pm
+++ b/lib/PublicInbox/AltId.pm
@@ -56,7 +56,7 @@ sub index_extra { # for PublicInbox::SearchIdx
 
 sub user_help { # for PublicInbox::Search
 	my ($self) = @_;
-	("$self->{prefix}:", <<EOF);
+	($self->{prefix}, <<EOF);
 alternate serial number  e.g. $self->{prefix}:12345 (boolean)
 EOF
 }
diff --git a/lib/PublicInbox/IndexHeader.pm b/lib/PublicInbox/IndexHeader.pm
index 07827959..a67080f9 100644
--- a/lib/PublicInbox/IndexHeader.pm
+++ b/lib/PublicInbox/IndexHeader.pm
@@ -62,7 +62,7 @@ sub index_extra { # for PublicInbox::SearchIdx
 
 sub user_help { # for PublicInbox::Search
 	my ($self) = @_;
-	("$self->{prefix}:", <<EOF);
+	($self->{prefix}, <<EOF);
 the `$self->{header}' mail header  e.g. $self->{prefix}:stable
 EOF
 }
diff --git a/lib/PublicInbox/Isearch.pm b/lib/PublicInbox/Isearch.pm
index 20808d6d..9566f710 100644
--- a/lib/PublicInbox/Isearch.pm
+++ b/lib/PublicInbox/Isearch.pm
@@ -131,7 +131,7 @@ sub mset_to_smsg {
 
 sub has_threadid { 1 }
 
-sub help { $_[0]->{es}->help }
+sub help_txt { $_[0]->{es}->help_txt }
 
 sub xh_args { # prep getopt args to feed to xap_helper.h socket
 	my ($self, $opt) = @_; # TODO uid_range
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index 6a0bdb0f..bdf5591c 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -190,33 +190,33 @@ my %prob_prefix = (
 # especially since we don't offer boolean searches for To/Cc/From
 # headers, either
 our @HELP = (
-	's:' => 'match within Subject  e.g. s:"a quick brown fox"',
-	'd:' => <<EOF,
+	s => 'match within Subject  e.g. s:"a quick brown fox"',
+	d => <<EOF,
 match date-time range, git "approxidate" formats supported
 Open-ended ranges such as `d:last.week..' and
 `d:..2.days.ago' are supported
 EOF
-	'b:' => 'match within message body, including text attachments',
-	'nq:' => 'match non-quoted text within message body',
-	'q:' => 'match quoted text within message body',
-	'n:' => 'match filename of attachment(s)',
-	't:' => 'match within the To header',
-	'c:' => 'match within the Cc header',
-	'f:' => 'match within the From header',
-	'a:' => 'match within the To, Cc, and From headers',
-	'tc:' => 'match within the To and Cc headers',
-	'l:' => 'match contents of the List-Id header',
-	'bs:' => 'match within the Subject and body',
-	'dfn:' => 'match filename from diff',
-	'dfa:' => 'match diff removed (-) lines',
-	'dfb:' => 'match diff added (+) lines',
-	'dfhh:' => 'match diff hunk header context (usually a function name)',
-	'dfctx:' => 'match diff context lines',
-	'dfpre:' => 'match pre-image git blob ID',
-	'dfpost:' => 'match post-image git blob ID',
-	'dfblob:' => 'match either pre or post-image git blob ID',
-	'patchid:' => "match `git patch-id --stable' output",
-	'rt:' => <<EOF,
+	b => 'match within message body, including text attachments',
+	nq => 'match non-quoted text within message body',
+	q => 'match quoted text within message body',
+	n => 'match filename of attachment(s)',
+	t => 'match within the To header',
+	c => 'match within the Cc header',
+	f => 'match within the From header',
+	a => 'match within the To, Cc, and From headers',
+	tc => 'match within the To and Cc headers',
+	l => 'match contents of the List-Id header',
+	bs => 'match within the Subject and body',
+	dfn => 'match filename from diff',
+	dfa => 'match diff removed (-) lines',
+	dfb => 'match diff added (+) lines',
+	dfhh => 'match diff hunk header context (usually a function name)',
+	dfctx => 'match diff context lines',
+	dfpre => 'match pre-image git blob ID',
+	dfpost => 'match post-image git blob ID',
+	dfblob => 'match either pre or post-image git blob ID',
+	patchid => "match `git patch-id --stable' output",
+	rt => <<EOF,
 match received time, like `d:' if sender's clock was correct
 EOF
 );
@@ -657,14 +657,27 @@ EOM
 	$ret .= "}\n";
 }
 
-sub help {
+sub help2txt (@) { # also used by Documentation/common.perl
+	my @help = @_;
+	my $pad = 0;
+	my $htxt = '';
+	while (defined(my $pfx = shift @help)) {
+		my $n = length($pfx) + 1;
+		$pad = $n if $n > $pad;
+		$htxt .= $pfx . ":\0" . shift(@help) . "\f\n";
+	}
+	$pad += 2;
+	my $padding = ' ' x ($pad + 4);
+	$htxt =~ s/^/$padding/gms;
+	$htxt =~ s/^$padding(\S+)\0/"    $1".(' ' x ($pad - length($1)))/egms;
+	$htxt =~ s/\f\n/\n/gs;
+	$htxt;
+}
+
+sub help_txt {
 	my ($self) = @_;
 	$self->{qp} // $self->qparse_new; # parse altids + indexheaders
-	my @ret = @HELP;
-	if (my $user_pfx = $self->{-user_pfx}) {
-		push @ret, @$user_pfx;
-	}
-	\@ret;
+	help2txt(@HELP, @{$self->{-user_pfx} // []});
 }
 
 # always returns a scalar value
diff --git a/lib/PublicInbox/WwwText.pm b/lib/PublicInbox/WwwText.pm
index 8279591a..d39083b6 100644
--- a/lib/PublicInbox/WwwText.pm
+++ b/lib/PublicInbox/WwwText.pm
@@ -75,29 +75,6 @@ sub get_text {
 	PublicInbox::WwwStream::html_oneshot($ctx, $code, $txt);
 }
 
-sub _srch_prefix ($$) {
-	my ($ibx, $txt) = @_;
-	my $pad = 0;
-	my $htxt = '';
-	my $help = $ibx->isrch->help;
-	my $i;
-	for ($i = 0; $i < @$help; $i += 2) {
-		my $pfx = $help->[$i];
-		my $n = length($pfx);
-		$pad = $n if $n > $pad;
-		$htxt .= $pfx . "\0";
-		$htxt .= $help->[$i + 1];
-		$htxt .= "\f\n";
-	}
-	$pad += 2;
-	my $padding = ' ' x ($pad + 4);
-	$htxt =~ s/^/$padding/gms;
-	$htxt =~ s/^$padding(\S+)\0/"    $1".(' ' x ($pad - length($1)))/egms;
-	$htxt =~ s/\f\n/\n/gs;
-	$$txt .= $htxt;
-	1;
-}
-
 sub _colors_help ($$) {
 	my ($ctx, $txt) = @_;
 	my $ibx = $ctx->{ibx};
@@ -461,7 +438,7 @@ search
   Prefixes supported in this installation include:
 
 EOF
-		_srch_prefix($ibx, $txt);
+		$$txt .= $ibx->isrch->help_txt;
 		$$txt .= <<EOF;
 
   Most prefixes are probabilistic, meaning they support stemming

* [PATCH 04/11] search: move QueryParser mappings to xh_args
  2024-08-10  9:00 [PATCH 00/11] indexheader + altid enhancements Eric Wong
                   ` (2 preceding siblings ...)
  2024-08-10  9:00 ` [PATCH 03/11] search: help: avoid ':' in user prefixes Eric Wong
@ 2024-08-10  9:00 ` Eric Wong
  2024-08-10  9:00 ` [PATCH 05/11] www_text: show indexheader contents in help Eric Wong
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2024-08-10  9:00 UTC (permalink / raw)
  To: meta

These mappings are stable per search instance.  We'll also
deduplicate the strings in case multiple inboxes share the same
mappings.
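
A minimal Perl sketch of that deduplication follows. It assumes each
mapping is a hypothetical hash with `prefix`, `sym` and `xprefix`
fields; only the `prefix=XPREFIX` string form (as checked in
t/watch_indexheader.t) comes from the patch, and `foo`/`XFOO` are
made up:

```perl
use strict;
use warnings;

# Collapse identical prefix mappings (e.g. the same indexheader configured
# for several inboxes) into one "-Q" argument each.  The hash layout is
# hypothetical; sym '=' matches the boolean form seen in the test.
sub q_args {
	my @in = @_;
	my %dedupe;
	$dedupe{$_->{prefix} . $_->{sym} . $_->{xprefix}} = undef for @in;
	# sort keys so the argument order is stable across runs
	map { ('-Q', $_) } sort keys %dedupe;
}

my @maps = (
	{ prefix => 'xarchiveshash', sym => '=', xprefix => 'XXARCHIVESHASH' },
	{ prefix => 'xarchiveshash', sym => '=', xprefix => 'XXARCHIVESHASH' },
	{ prefix => 'foo', sym => '=', xprefix => 'XFOO' }, # hypothetical
);
print join(' ', q_args(@maps)), "\n";
# prints: -Q foo=XFOO -Q xarchiveshash=XXARCHIVESHASH
```

Using hash keys as a set keeps the duplicate mapping from being passed
to xap_helper twice, and sorting makes the generated arguments
deterministic.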
---
 lib/PublicInbox/Search.pm | 23 ++++++++++++-----------
 t/watch_indexheader.t     |  2 +-
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index bdf5591c..dfe0271b 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -476,16 +476,7 @@ sub xh_opt ($$) {
 	push @ret, '-t' if $opt->{threads};
 	push @ret, '-T', $opt->{threadid} if defined $opt->{threadid};
 	push @ret, '-O', $opt->{eidx_key} if defined $opt->{eidx_key};
-	my $apfx = $self->{-alt_pfx} //= do {
-		my @tmp;
-		for my $x (@{$self->{-extra} // []}) {
-			my $sym = $QPMETHOD_2_SYM{$x->query_parser_method};
-			push @tmp, '-Q', $x->{prefix}.$sym.$x->{xprefix};
-		}
-		# TODO: arbitrary header indexing goes here
-		\@tmp;
-	};
-	(@ret, @$apfx);
+	@ret;
 }
 
 # returns a true value if actually handled asynchronously,
@@ -730,7 +721,17 @@ sub all_terms {
 }
 
 sub xh_args { # prep getopt args to feed to xap_helper.h socket
-	map { ('-d', $_) } shard_dirs($_[0]);
+	my ($self) = @_;
+	my $apfx = $self->{-alt_pfx} //= do {
+		my %dedupe;
+		for my $x (@{$self->{-extra} // []}) {
+			my $sym = $QPMETHOD_2_SYM{$x->query_parser_method};
+			$dedupe{$x->{prefix}.$sym.$x->{xprefix}} = undef;
+		}
+		# TODO: arbitrary header indexing goes here
+		[ sort keys %dedupe ];
+	};
+	((map { ('-d', $_) } shard_dirs($self)), map { ('-Q', $_) } @$apfx);
 }
 
 sub docids_by_postlist ($$) {
diff --git a/t/watch_indexheader.t b/t/watch_indexheader.t
index e815fca9..623698e7 100644
--- a/t/watch_indexheader.t
+++ b/t/watch_indexheader.t
@@ -84,7 +84,7 @@ $cfg->each_inbox(sub {
 	my $eml = $ibx->smsg_eml($ibx->over->get_art($num->[0]));
 	is($eml->header_raw('X-Archives-Hash'), $h1,
 		'stored message with X-Archives-Hash');
-	my @opt = $srch->xh_opt;
+	my @opt = $srch->xh_args;
 	is $opt[-2], '-Q', 'xap_helper -Q switch';
 	is $opt[-1], 'xarchiveshash=XXARCHIVESHASH', 'xap_helper -Q arg';
 });

* [PATCH 05/11] www_text: show indexheader contents in help
  2024-08-10  9:00 [PATCH 00/11] indexheader + altid enhancements Eric Wong
                   ` (3 preceding siblings ...)
  2024-08-10  9:00 ` [PATCH 04/11] search: move QueryParser mappings to xh_args Eric Wong
@ 2024-08-10  9:00 ` Eric Wong
  2024-08-10  9:00 ` [PATCH 06/11] www: don't memoize ->user_help contents Eric Wong
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2024-08-10  9:00 UTC (permalink / raw)
  To: meta

This allows mirror operators to see which headers get indexed
and replicate the results on their end.
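
For illustration, here is a sketch of the patched loop run against a
hypothetical inbox hash (the address and indexheader values are
borrowed from t/extsearch.t). It shows the extra `indexheader = ...`
line a mirror operator would now see:

```perl
use strict;
use warnings;

# Hypothetical inbox: multi-value keys hold array refs; keys which are
# not configured are simply absent from the hash.
my $ibx = {
	address => [ 'b@example.com' ],
	indexheader => [ 'phrase:organization:Organization' ],
};
my $txt = '';
for my $k (qw(address listid infourl watchheader indexheader)) {
	my $v = $ibx->{$k} // next; # skip keys which are not set
	$txt .= "\t$k = $_\n" for @$v;
}
print $txt;
```

The `// next` form skips unset keys in one expression, replacing the
older `defined(...) or next` spelling.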
---
 lib/PublicInbox/WwwText.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/WwwText.pm b/lib/PublicInbox/WwwText.pm
index d39083b6..20b22136 100644
--- a/lib/PublicInbox/WwwText.pm
+++ b/lib/PublicInbox/WwwText.pm
@@ -168,8 +168,8 @@ sub inbox_config ($$) {
 	url = https://example.com/$name/
 	url = http://example.onion/$name/
 EOS
-	for my $k (qw(address listid infourl watchheader)) {
-		defined(my $v = $ibx->{$k}) or next;
+	for my $k (qw(address listid infourl watchheader indexheader)) {
+		my $v = $ibx->{$k} // next;
 		$$txt .= "\t$k = $_\n" for @$v;
 	}
 	if (my $altid = $ibx->{altid}) {

* [PATCH 06/11] www: don't memoize ->user_help contents
  2024-08-10  9:00 [PATCH 00/11] indexheader + altid enhancements Eric Wong
                   ` (4 preceding siblings ...)
  2024-08-10  9:00 ` [PATCH 05/11] www_text: show indexheader contents in help Eric Wong
@ 2024-08-10  9:00 ` Eric Wong
  2024-08-10  9:00 ` [PATCH 07/11] extindex: avoid branch in ->index_eml Eric Wong
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2024-08-10  9:00 UTC (permalink / raw)
  To: meta

Generating it is cheap enough that memoizing isn't worth the
extra memory and long-lived allocations.  We also no longer need
to allocate a Xapian::QueryParser object here, which avoids
wasting memory for users of the external xap_helper process.
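
A sketch of building the help text on demand, without keeping a cached
copy in the search object. `My::Extra` and its `(prefix, description)`
return value are hypothetical. The column alignment here uses
`sprintf` with a `*` width rather than the NUL/form-feed substitutions
help2txt performs:

```perl
use strict;
use warnings;

# Hypothetical stand-in for an altid/indexheader object.  Only the
# user_help method name comes from the patch; returning a
# (prefix, description) pair is an assumption made for this sketch.
package My::Extra;
sub new { my ($class, %h) = @_; bless { %h }, $class }
sub user_help { ($_[0]->{prefix}, $_[0]->{desc}) }

package main;

# Rebuild the help text on every call rather than caching it in $srch.
sub help_txt {
	my ($srch) = @_;
	my @help = map { $_->user_help } @{$srch->{-extra} // []};
	my ($w, @rows) = (0);
	while (my ($pfx, $desc) = splice(@help, 0, 2)) {
		push @rows, [ "$pfx:", $desc ];
		$w = length("$pfx:") if length("$pfx:") > $w;
	}
	# left-align prefixes in a column two spaces wider than the longest
	join '', map { sprintf("    %-*s%s\n", $w + 2, @$_) } @rows;
}

my $srch = { -extra => [
	My::Extra->new(prefix => 'gmane', desc => 'gmane article number'),
	My::Extra->new(prefix => 'organization', desc => 'Organization header'),
] };
print help_txt($srch);
```

Nothing is stored between calls, so the only long-lived data is the
list of indexer objects that already exists.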
---
 lib/PublicInbox/Search.pm | 16 ++++------------
 t/www_altid.t             |  3 +++
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index dfe0271b..784e3b0a 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -598,16 +598,10 @@ sub qparse_new {
 		$qp->add_boolean_prefix($name, $_) foreach split(/ /, $prefix);
 	}
 
-	if (my $extra = $self->{-extra}) {
-		my $user_pfx = $self->{-user_pfx} = [];
-		for my $x (@$extra) {
-			push @$user_pfx, $x->user_help;
-			my $m = $x->query_parser_method;
-			$qp->$m(@$x{qw(prefix xprefix)});
-		}
-		chomp @$user_pfx;
+	for my $x (@{$self->{-extra} // []}) {
+		my $m = $x->query_parser_method;
+		$qp->$m(@$x{qw(prefix xprefix)});
 	}
-
 	while (my ($name, $prefix) = each %prob_prefix) {
 		$qp->add_prefix($name, $_) foreach split(/ /, $prefix);
 	}
@@ -666,9 +660,7 @@ sub help2txt (@) { # also used by Documentation/common.perl
 }
 
 sub help_txt {
-	my ($self) = @_;
-	$self->{qp} // $self->qparse_new; # parse altids + indexheaders
-	help2txt(@HELP, @{$self->{-user_pfx} // []});
+	help2txt(@HELP, map { $_->user_help } @{$_[0]->{-extra} // []});
 }
 
 # always returns a scalar value
diff --git a/t/www_altid.t b/t/www_altid.t
index 7ad4a1d2..6f0f0c61 100644
--- a/t/www_altid.t
+++ b/t/www_altid.t
@@ -62,6 +62,9 @@ my $client = sub {
 	is $res->code, 200, 'altid search hit';
 	$res = $cb->(GET('/test/?q=xyz:10'));
 	is $res->code, 404, 'altid search miss';
+	$res = $cb->(GET('/test/_/text/help/'));
+	is $res->code, 200, 'altid help hit';
+	like $res->content, qr/\b$aid:/, 'altid shown in help';
 };
 test_psgi(sub { $www->call(@_) }, $client);
 SKIP: {

* [PATCH 07/11] extindex: avoid branch in ->index_eml
  2024-08-10  9:00 [PATCH 00/11] indexheader + altid enhancements Eric Wong
                   ` (5 preceding siblings ...)
  2024-08-10  9:00 ` [PATCH 06/11] www: don't memoize ->user_help contents Eric Wong
@ 2024-08-10  9:00 ` Eric Wong
  2024-08-10  9:00 ` [PATCH 08/11] t/extsearch: use autodie to detect chmod failures Eric Wong
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2024-08-10  9:00 UTC (permalink / raw)
  To: meta

We'll probably be stuffing more extindex-specific fields into
$smsg, so having callers set {eidx_key} there instead of passing
an extra argument simplifies our internal API.
---
 lib/PublicInbox/ExtSearchIdx.pm   | 6 ++++--
 lib/PublicInbox/SearchIdxShard.pm | 3 +--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 934197c0..68700c8b 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -245,7 +245,8 @@ sub index_unseen ($) {
 	my $oid = $new_smsg->{blob};
 	my $ibx = delete $req->{ibx} or die 'BUG: {ibx} unset';
 	$self->{oidx}->add_xref3($docid, $req->{xnum}, $oid, $ibx->eidx_key);
-	$idx->index_eml($eml, $new_smsg, $ibx->eidx_key);
+	$new_smsg->{eidx_key} = $ibx->eidx_key;
+	$idx->index_eml($eml, $new_smsg);
 	check_batch_limit($req);
 }
 
@@ -578,7 +579,8 @@ sub _reindex_finalize ($$$) {
 	my $top_smsg = pop @$stable;
 	$top_smsg == $smsg or die 'BUG: top_smsg != smsg';
 	my $ibx = _ibx_for($self, $sync, $smsg);
-	$idx->index_eml($eml, $smsg, $ibx->eidx_key);
+	$smsg->{eidx_key} = $ibx->eidx_key;
+	$idx->index_eml($eml, $smsg);
 	for my $x (reverse @$stable) {
 		$ibx = _ibx_for($self, $sync, $x);
 		my $hdr = delete $x->{hdr} // die 'BUG: no {hdr}';
diff --git a/lib/PublicInbox/SearchIdxShard.pm b/lib/PublicInbox/SearchIdxShard.pm
index ea261bda..7ee8a121 100644
--- a/lib/PublicInbox/SearchIdxShard.pm
+++ b/lib/PublicInbox/SearchIdxShard.pm
@@ -49,8 +49,7 @@ sub ipc_atfork_child { # called automatically before ipc_worker_loop
 }
 
 sub index_eml {
-	my ($self, $eml, $smsg, $eidx_key) = @_;
-	$smsg->{eidx_key} = $eidx_key if defined $eidx_key;
+	my ($self, $eml, $smsg) = @_;
 	$self->ipc_do('add_xapian', $eml, $smsg);
 }
 

* [PATCH 08/11] t/extsearch: use autodie to detect chmod failures
  2024-08-10  9:00 [PATCH 00/11] indexheader + altid enhancements Eric Wong
                   ` (6 preceding siblings ...)
  2024-08-10  9:00 ` [PATCH 07/11] extindex: avoid branch in ->index_eml Eric Wong
@ 2024-08-10  9:00 ` Eric Wong
  2024-08-10  9:00 ` [PATCH 09/11] t/extsearch: use xsys_e to detect errors Eric Wong
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2024-08-10  9:00 UTC (permalink / raw)
  To: meta

We'll also drop a commented-out call to chmod while we're at it.
---
 t/extsearch.t | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/t/extsearch.t b/t/extsearch.t
index 797aa8f5..37e1f024 100644
--- a/t/extsearch.t
+++ b/t/extsearch.t
@@ -7,7 +7,7 @@ use PublicInbox::Config;
 use PublicInbox::InboxWritable;
 require_git(2.6);
 require_mods(qw(json DBD::SQLite Xapian));
-use autodie qw(open rename truncate unlink);
+use autodie qw(chmod open rename truncate unlink);
 require PublicInbox::Search;
 use_ok 'PublicInbox::ExtSearch';
 use_ok 'PublicInbox::ExtSearchIdx';
@@ -416,7 +416,6 @@ if ('dedupe + dry-run') {
 		'--dry-run alone fails');
 }
 
-# chmod 0755, $home or xbail "chmod: $!";
 for my $j (1, 3, 6) {
 	my $o = { 2 => \(my $err = '') };
 	my $d = "$home/extindex-j$j";
@@ -437,7 +436,7 @@ SKIP: {
 	is($nshards1, 1, 'correct shard count');
 
 	my @ei_dir = glob("$d/ei*/");
-	chmod 0755, $ei_dir[0] or xbail "chmod: $!";
+	chmod 0755, $ei_dir[0];
 	my $mode = sprintf('%04o', 07777 & (stat($ei_dir[0]))[2]);
 	is($mode, '0755', 'mode set on ei*/ dir');
 	my $o = { 2 => \(my $err = '') };

* [PATCH 09/11] t/extsearch: use xsys_e to detect errors
  2024-08-10  9:00 [PATCH 00/11] indexheader + altid enhancements Eric Wong
                   ` (7 preceding siblings ...)
  2024-08-10  9:00 ` [PATCH 08/11] t/extsearch: use autodie to detect chmod failures Eric Wong
@ 2024-08-10  9:00 ` Eric Wong
  2024-08-10  9:00 ` [PATCH 10/11] extindex: support extindex.*.indexheader Eric Wong
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2024-08-10  9:00 UTC (permalink / raw)
  To: meta

More error detection is better in case we test on overloaded or
broken systems.
---
 t/extsearch.t | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/t/extsearch.t b/t/extsearch.t
index 37e1f024..16d75f63 100644
--- a/t/extsearch.t
+++ b/t/extsearch.t
@@ -51,7 +51,7 @@ ok(run_script([qw(-extindex --dangerous --all), "$home/extindex"]),
 }
 
 if ('with boost') {
-	xsys([qw(git config publicinbox.v1test.boost), 10],
+	xsys_e([qw(git config publicinbox.v1test.boost), 10],
 		{ GIT_CONFIG => $cfg_path });
 	ok(run_script([qw(-extindex --all), "$home/extindex-b"]),
 		'extindex init with boost');
@@ -66,7 +66,7 @@ if ('with boost') {
 	is(scalar(@$xref3), 2, 'only to entries');
 	undef $es;
 
-	xsys([qw(git config publicinbox.v2test.boost), 20],
+	xsys_e([qw(git config publicinbox.v2test.boost), 20],
 		{ GIT_CONFIG => $cfg_path });
 	ok(run_script([qw(-extindex --all --reindex), "$home/extindex-b"]),
 		'extindex --reindex with altered boost');
@@ -88,9 +88,9 @@ if ('with boost') {
 	like($v2[0], qr/\Av2\.example.*?\b\Q$smsg->{blob}\E\b/,
 		'smsg->{blob} respected boost across 2 index runs');
 
-	xsys([qw(git config --unset publicinbox.v1test.boost)],
+	xsys_e([qw(git config --unset publicinbox.v1test.boost)],
 		{ GIT_CONFIG => $cfg_path });
-	xsys([qw(git config --unset publicinbox.v2test.boost)],
+	xsys_e([qw(git config --unset publicinbox.v2test.boost)],
 		{ GIT_CONFIG => $cfg_path });
 }
 
@@ -392,7 +392,7 @@ SELECT MIN(tid) FROM over WHERE num > 0
 }
 
 if ('remove v1test and test gc') {
-	xsys([qw(git config --unset publicinbox.v1test.inboxdir)],
+	xsys_e([qw(git config --unset publicinbox.v1test.inboxdir)],
 		{ GIT_CONFIG => $cfg_path });
 	my $opt = { 2 => \(my $err = '') };
 	ok(run_script([qw(-extindex --gc), "$home/extindex"], undef, $opt),

* [PATCH 10/11] extindex: support extindex.*.indexheader
  2024-08-10  9:00 [PATCH 00/11] indexheader + altid enhancements Eric Wong
                   ` (8 preceding siblings ...)
  2024-08-10  9:00 ` [PATCH 09/11] t/extsearch: use xsys_e to detect errors Eric Wong
@ 2024-08-10  9:00 ` Eric Wong
  2024-08-10  9:00 ` [PATCH 11/11] extindex: support per-inbox indexheader+altid Eric Wong
  2024-08-12 13:55 ` [PATCH 00/11] indexheader + altid enhancements Konstantin Ryabitsev
  11 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2024-08-10  9:00 UTC (permalink / raw)
  To: meta

Per-inbox (publicinbox.*.indexheader) is next...
---
 lib/PublicInbox/Config.pm       |  3 ++-
 lib/PublicInbox/ExtSearchIdx.pm | 17 ++++++++++++++++-
 lib/PublicInbox/SearchIdx.pm    |  1 +
 t/extsearch.t                   | 34 ++++++++++++++++++++++++++++++++-
 4 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 3af5f23c..b40e96f1 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -565,12 +565,13 @@ sub _fill_ei ($$) {
 		my $v = get_1($self, "$pfx.$k") // next;
 		$es->{$k} = $v;
 	}
-	for my $k (qw(coderepo hide url infourl)) {
+	for my $k (qw(coderepo hide url infourl indexheader altid)) {
 		my $v = $self->{"$pfx.$k"} // next;
 		$es->{$k} = _array($v);
 	}
 	return unless valid_foo_name($name, 'extindex');
 	$es->{name} = $name;
+	$es->load_extra_indexers($es);
 	$es;
 }
 
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 68700c8b..094821a3 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -35,6 +35,7 @@ use PublicInbox::Eml;
 use PublicInbox::DS qw(now add_timer);
 use DBI qw(:sql_types); # SQL_BLOB
 use PublicInbox::Admin qw(fmt_localtime);
+use PublicInbox::Config qw(rel2abs_collapsed);
 
 sub new {
 	my (undef, $dir, $opt) = @_;
@@ -86,7 +87,21 @@ sub _ibx_attach { # each_inbox callback
 sub attach_config {
 	my ($self, $cfg, $ibxs) = @_;
 	$self->{cfg} = $cfg;
-	my $types;
+	my ($types, $ro);
+
+	# lookup extindex.$NAME.<indexheader|altid>
+	my $eidx_dir = rel2abs_collapsed($self->{topdir});
+	for my $k (grep(/\Aextindex\.(?:.+)\.topdir\z/, keys %$cfg)) {
+		next if rel2abs_collapsed($cfg->{$k}) ne $eidx_dir;
+		my $n = substr($k, length('extindex.'), -length('.topdir'));
+		$ro = $cfg->lookup_ei($n) and last;
+	}
+
+	# and copy from read-only to our read-write $self
+	for my $f (qw(altid indexheader)) {
+		$self->{$f} = $ro->{$f} if defined $ro->{$f};
+	}
+
 	if ($ibxs) {
 		for my $ibx (@$ibxs) {
 			$self->{ibx_map}->{$ibx->eidx_key} //= do {
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index b2576e52..53c16e55 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -1169,6 +1169,7 @@ sub eidx_shard_new {
 		creat => 1,
 	}, $class;
 	$self->{-set_indexlevel_once} = 1 if $self->{indexlevel} eq 'medium';
+	$self->load_extra_indexers($eidx);
 	$self;
 }
 
diff --git a/t/extsearch.t b/t/extsearch.t
index 16d75f63..0ea5bc5b 100644
--- a/t/extsearch.t
+++ b/t/extsearch.t
@@ -8,7 +8,7 @@ use PublicInbox::InboxWritable;
 require_git(2.6);
 require_mods(qw(json DBD::SQLite Xapian));
 use autodie qw(chmod open rename truncate unlink);
-require PublicInbox::Search;
+use PublicInbox::Search;
 use_ok 'PublicInbox::ExtSearch';
 use_ok 'PublicInbox::ExtSearchIdx';
 use_ok 'PublicInbox::OverIdx';
@@ -26,6 +26,7 @@ ok(run_script([qw(-init -Lbasic -V2 v2test --newsgroup v2.example),
 	"$home/v2test", 'http://example.com/v2test', $v2addr ]), 'v2test init');
 my $env = { ORIGINAL_RECIPIENT => $v2addr };
 my $eml = eml_load('t/utf8.eml');
+my $eidxdir = "$home/extindex";
 
 $eml->header_set('List-Id', '<v2.example.com>');
 
@@ -592,4 +593,35 @@ test_lei(sub {
 		'noted unindexed extindex is unsupported');
 });
 
+if ('indexheader support') {
+	xsys_e [qw(git config extindex.all.indexheader
+		boolean_term:xarchiveshash:X-Archives-Hash)],
+		{ GIT_CONFIG => $cfg_path };
+	my $eml = eml_load('t/plack-qp.eml');
+	$eml->header_set('X-Archives-Hash', 'deadbeefcafe');
+	$in = \($eml->as_string);
+	$env->{ORIGINAL_RECIPIENT} = $v2addr;
+	run_script([qw(-mda --no-precheck)], $env, { 0 => $in }) or
+		xbail '-mda';
+	ok run_script([qw(-extindex --all -vvv), $eidxdir]),
+		'extindex update';
+	$es = PublicInbox::Config->new($cfg_path)->ALL;
+	my $mset = $es->mset('xarchiveshash:deadbeefcafe');
+	is $mset->size, 1, 'extindex.*.indexheader works';
+	local $PublicInbox::Search::XHC = eval {
+		require PublicInbox::XhcMset;
+		PublicInbox::XapClient::start_helper('-j0');
+	} or xbail "no XHC: $@";
+	my @args;
+	$es->async_mset('xarchiveshash:deadbeefcafe', {} , sub { @args = @_ });
+	is scalar(@args), 2, 'no extra args on hit';
+	is $args[0]->size, 1, 'async mset hit works';
+	ok !$args[1], 'no error on hit';
+	@args = ();
+	$es->async_mset('xarchiveshash:cafebeefdead', {} , sub { @args = @_ });
+	is scalar(@args), 2, 'no extra args on miss';
+	is $args[0]->size, 0, 'async mset miss works';
+	ok !$args[1], 'no error on miss';
+}
+
 done_testing;

* [PATCH 11/11] extindex: support per-inbox indexheader+altid
  2024-08-10  9:00 [PATCH 00/11] indexheader + altid enhancements Eric Wong
                   ` (9 preceding siblings ...)
  2024-08-10  9:00 ` [PATCH 10/11] extindex: support extindex.*.indexheader Eric Wong
@ 2024-08-10  9:00 ` Eric Wong
  2024-08-12 13:55 ` [PATCH 00/11] indexheader + altid enhancements Konstantin Ryabitsev
  11 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2024-08-10  9:00 UTC (permalink / raw)
  To: meta

This allows the venerable altid (e.g. gmane:1234) to finally
work for extindex users.  The newer indexheader directive works
here, too, so a multi-inbox extindex can fully emulate the
capabilities of per-inbox Xapian indices.

For now, per-inbox indexheader and altid DO NOT work when
searching the extindex directly.  In other words, gmane:1234
might work on the /git/ inbox, but not on the /all/ extindex
virtual inbox.  This may remain the case since altid is
typically per-inbox only, while headers such as X-Archives-Hash
can be indexed globally via extindex.*.indexheader.
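
The per-message lookup this relies on can be sketched in Perl as
follows: per-inbox indexers are merged with the extindex-wide ones,
then selected for each message by its eidx_key. The indexer names and
eidx_key values are hypothetical strings, and a separate hash stands
in for the tab-joined "-extra\t$eidx_key" keys SearchIdx stores:

```perl
use strict;
use warnings;

# Per-inbox indexer lists come first, followed by the extindex-wide ones,
# in the same order eidx_shard_new uses.  Indexers are plain strings
# here; in public-inbox they are objects with an index_extra method.
sub build_extra_map {
	my ($all, $per_ibx) = @_; # $all: arrayref or undef
	my %map;
	for my $key (keys %$per_ibx) {
		my $list = $per_ibx->{$key};
		$map{$key} = $all ? [ @$list, @$all ] : $list;
	}
	\%map;
}

# Choose the indexers for one message: the per-inbox list if the
# message's eidx_key has one, else the extindex-wide list.
sub extras_for {
	my ($map, $all, $ekey) = @_;
	my $x = defined $ekey ? $map->{$ekey} : undef;
	$x // $all // [];
}

my $all = [ 'indexheader:xarchiveshash' ];
my $map = build_extra_map($all, { 'inbox-a' => [ 'altid:gmane' ] });
print "@{extras_for($map, $all, 'inbox-a')}\n"; # per-inbox + global
print "@{extras_for($map, $all, 'inbox-b')}\n"; # global only
```

Precomputing the merged lists once per shard keeps the per-message
cost to a single hash lookup plus a fallback.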
---
 lib/PublicInbox/Config.pm       |  2 +-
 lib/PublicInbox/ExtSearchIdx.pm |  1 +
 lib/PublicInbox/Isearch.pm      | 14 ++++-
 lib/PublicInbox/SearchIdx.pm    | 20 +++++--
 t/extsearch.t                   | 98 +++++++++++++++++++++++++++++----
 5 files changed, 117 insertions(+), 18 deletions(-)

diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index b40e96f1..cda3045e 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -571,7 +571,7 @@ sub _fill_ei ($$) {
 	}
 	return unless valid_foo_name($name, 'extindex');
 	$es->{name} = $name;
-	$es->load_extra_indexers($es);
+	$es->load_extra_indexers($es); # extindex.*.{altid,indexheader}
 	$es;
 }
 
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 094821a3..cead0f8a 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -21,6 +21,7 @@ use Carp qw(croak carp);
 use Scalar::Util qw(blessed);
 use Sys::Hostname qw(hostname);
 use File::Glob qw(bsd_glob GLOB_NOSORT);
+use PublicInbox::Isearch;
 use PublicInbox::MultiGit;
 use PublicInbox::Spawn ();
 use PublicInbox::Search;
diff --git a/lib/PublicInbox/Isearch.pm b/lib/PublicInbox/Isearch.pm
index 9566f710..5f22c2f2 100644
--- a/lib/PublicInbox/Isearch.pm
+++ b/lib/PublicInbox/Isearch.pm
@@ -11,7 +11,11 @@ use PublicInbox::Search;
 
 sub new {
 	my (undef, $ibx, $es) = @_;
-	bless { es => $es, eidx_key => $ibx->eidx_key }, __PACKAGE__;
+	my $self = bless { es => $es, eidx_key => $ibx->eidx_key }, __PACKAGE__;
+	# load publicinbox.*.{altid,indexheader}
+	PublicInbox::Search::load_extra_indexers($self, $ibx);
+	push @{$self->{-extra}}, @{$es->{-extra} // []} if $self->{-extra};
+	$self;
 }
 
 sub _ibx_id ($) {
@@ -55,14 +59,22 @@ SELECT MAX(docid) FROM xref3 WHERE ibx_id = ? AND xnum >= ? AND xnum <= ?
 	\%opt;
 }
 
+sub _isrch_qparse ($) {
+	my ($self) = @_;
+	local $self->{es}->{-extra} = $self->{-extra};
+	$self->{es}->qparse_new; # XXX worth memoizing?
+}
+
 sub mset {
 	my ($self, $str, $opt) = @_;
+	local $self->{es}->{qp} = _isrch_qparse($self) if $self->{-extra};
 	$self->{es}->mset($str, eidx_mset_prep $self, $opt);
 }
 
 sub async_mset {
 	my ($self, $str, $opt, $cb, @args) = @_;
 	$opt = eidx_mset_prep $self, $opt;
+	local $self->{es}->{-extra} = $self->{-extra} if $self->{-extra};
 	$self->{es}->async_mset($str, $opt, $cb, @args);
 }
 
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 53c16e55..7829c7d4 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -475,9 +475,8 @@ sub eml2doc ($$$;$) {
 	term_generator($self)->set_document($doc);
 	index_headers($self, $smsg);
 
-	if (defined(my $eidx_key = $smsg->{eidx_key})) {
-		$doc->add_boolean_term('O'.$eidx_key) if $eidx_key ne '.';
-	}
+	my $ekey = $smsg->{eidx_key};
+	$doc->add_boolean_term('O'.$ekey) if ($ekey // '.') ne '.';
 	msg_iter($eml, \&index_xapian, [ $self, $doc ]);
 	index_ids($self, $doc, $eml, $mids);
 
@@ -491,9 +490,10 @@ sub eml2doc ($$$;$) {
 		my $data = $smsg->to_doc_data;
 		$doc->set_data($data);
 	}
-
-	for my $extra (@{$self->{-extra} // []}) {
-		$extra->index_extra($self, $eml, $mids);
+	my $xtra = defined $ekey ? $self->{"-extra\t$ekey"} : undef;
+	$xtra //= $self->{-extra};
+	for my $e (@$xtra) {
+		$e->index_extra($self, $eml, $mids);
 	}
 	$doc;
 }
@@ -1170,6 +1170,14 @@ sub eidx_shard_new {
 	}, $class;
 	$self->{-set_indexlevel_once} = 1 if $self->{indexlevel} eq 'medium';
 	$self->load_extra_indexers($eidx);
+	require PublicInbox::Isearch;
+	my $all = $self->{-extra};
+	for my $ibx (@{$eidx->{ibx_active} // []}) {
+		my $isrch = PublicInbox::Isearch->new($ibx);
+		my $per_ibx = $isrch->{-extra} // next;
+		$self->{"-extra\t$isrch->{eidx_key}"} =
+					$all ? [ @$per_ibx, @$all ] : $per_ibx;
+	}
 	$self;
 }
 
diff --git a/t/extsearch.t b/t/extsearch.t
index 0ea5bc5b..28c43763 100644
--- a/t/extsearch.t
+++ b/t/extsearch.t
@@ -593,6 +593,7 @@ test_lei(sub {
 		'noted unindexed extindex is unsupported');
 });
 
+require PublicInbox::XhcMset;
 if ('indexheader support') {
 	xsys_e [qw(git config extindex.all.indexheader
 		boolean_term:xarchiveshash:X-Archives-Hash)],
@@ -608,20 +609,97 @@ if ('indexheader support') {
 	$es = PublicInbox::Config->new($cfg_path)->ALL;
 	my $mset = $es->mset('xarchiveshash:deadbeefcafe');
 	is $mset->size, 1, 'extindex.*.indexheader works';
-	local $PublicInbox::Search::XHC = eval {
-		require PublicInbox::XhcMset;
-		PublicInbox::XapClient::start_helper('-j0');
-	} or xbail "no XHC: $@";
+	local $PublicInbox::Search::XHC =
+			PublicInbox::XapClient::start_helper('-j0') or
+			xbail "no XHC: $@";
 	my @args;
 	$es->async_mset('xarchiveshash:deadbeefcafe', {} , sub { @args = @_ });
-	is scalar(@args), 2, 'no extra args on hit';
-	is $args[0]->size, 1, 'async mset hit works';
-	ok !$args[1], 'no error on hit';
+	is scalar(@args), 2, 'no extra args on xarchiveshash hit';
+	is $args[0]->size, 1, 'async mset xarchiveshash hit works';
+	ok !$args[1], 'no error on xarchiveshash hit';
 	@args = ();
 	$es->async_mset('xarchiveshash:cafebeefdead', {} , sub { @args = @_ });
-	is scalar(@args), 2, 'no extra args on miss';
-	is $args[0]->size, 0, 'async mset miss works';
-	ok !$args[1], 'no error on miss';
+	is scalar(@args), 2, 'no extra args on xarchiveshash miss';
+	is $args[0]->size, 0, 'async mset xarchiveshash miss works';
+	ok !$args[1], 'no error on xarchiveshash miss';
+}
+
+if ('per-inbox altid w/ extindex') {
+	my $another = 'another-nntp.sqlite3';
+	my $altid = [ "serial:gmane:file=$another" ];
+	my $aibx = create_inbox 'v2', version => 2, indexlevel => 'basic',
+				altid => $altid, sub {
+		my ($im, $ibx) = @_;
+		my $mm = PublicInbox::Msgmap->new_file(
+					"$ibx->{inboxdir}/$another", 2);
+		$mm->mid_set(1234, 'a@example.com') == 1 or xbail 'mid_set';
+		$im->add(PublicInbox::Eml->new(<<'EOF')) or BAIL_OUT;
+From: a@example.com
+To: b@example.com
+Subject: boo!
+Message-ID: <a@example.com>
+X-Archives-Hash: dadfad
+Organization: felonious feline family
+
+hello world gmane:666
+EOF
+	};
+	PublicInbox::IO::write_file '>>', $cfg_path, <<EOF;
+[publicinbox "altid-test"]
+	inboxdir = $aibx->{inboxdir}
+	address = b\@example.com
+	altid = $altid->[0]
+	indexheader = phrase:organization:Organization
+EOF
+	ok run_script([qw(-extindex --all -vvv), $eidxdir]),
+		'extindex update w/ altid';
+	local $PublicInbox::Search::XHC =
+			PublicInbox::XapClient::start_helper('-j0') or
+			xbail "no XHC: $@";
+	my @args;
+	my $pi_cfg = PublicInbox::Config->new($cfg_path);
+	my $ibx = $pi_cfg->lookup('b@example.com');
+	my $mset = $ibx->isrch->mset('gmane:1234');
+
+	is $mset->size, 1, 'isrch->mset altid hit';
+	$ibx->isrch->async_mset('gmane:1234', {} , sub { @args = @_ });
+	is scalar(@args), 2, 'no extra args on altid hit';
+	is $args[0]->size, 1, 'isrch->async_mset altid hit';
+
+	$mset = $ibx->isrch->mset('organization:felonious');
+	is $mset->size, 1, 'isrch->mset indexheader hit';
+	@args = ();
+	$ibx->isrch->async_mset('organization:felonious', {} , sub { @args = @_ });
+	is scalar(@args), 2, 'no extra args on indexheader hit';
+	is $args[0]->size, 1, 'isrch->async_mset indexheader hit';
+
+	$mset = $ibx->isrch->mset('organization:world');
+	is $mset->size, 0, 'isrch->mset indexheader miss';
+	@args = ();
+	$ibx->isrch->async_mset('organization:world', {} , sub { @args = @_ });
+	is scalar(@args), 2, 'no extra args on indexheader miss';
+	is $args[0]->size, 0, 'isrch->async_mset indexheader miss';
+
+	$mset = $ibx->isrch->mset('xarchiveshash:deadbeefcafe');
+	is $mset->size, 0, 'isrch->mset does not cross inbox on indexheader';
+	$mset = $ibx->isrch->mset('xarchiveshash:dadfad');
+	is $mset->size, 1, 'isrch->mset hits global indexheader';
+
+	$es = $pi_cfg->ALL;
+	$mset = $es->mset('xarchiveshash:dadfad');
+	is $mset->size, 1, 'esrch->mset global indexheader hit';
+	$mset = $es->mset('gmane:1234');
+	is $mset->size, 1, '->mset altid hit works globally';
+
+	$mset = $es->mset('gmane:666');
+	is $mset->size, 0, 'global ->mset altid miss';
+	$mset = $ibx->isrch->mset('gmane:666');
+	is $mset->size, 0, 'isrch->mset altid miss works';
+
+	@args = ();
+	$ibx->isrch->async_mset('gmane:666', {} , sub { @args = @_ });
+	is scalar(@args), 2, 'no extra args on altid miss';
+	is $args[0]->size, 0, 'isrch->async_mset altid miss works';
 }
 
 done_testing;

* Re: [PATCH 00/11] indexheader + altid enhancements
  2024-08-10  9:00 [PATCH 00/11] indexheader + altid enhancements Eric Wong
                   ` (10 preceding siblings ...)
  2024-08-10  9:00 ` [PATCH 11/11] extindex: support per-inbox indexheader+altid Eric Wong
@ 2024-08-12 13:55 ` Konstantin Ryabitsev
  11 siblings, 0 replies; 13+ messages in thread
From: Konstantin Ryabitsev @ 2024-08-12 13:55 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Sat, Aug 10, 2024 at 09:00:01AM GMT, Eric Wong wrote:
> The indexheader feature allows arbitrary headers to be indexed.
> This can be done with both per-inbox Xapian indices and extindex
> spanning multiple inboxes.

Nice! I will try this out shortly and let you know how it's looking.

-K

end of thread, other threads:[~2024-08-12 14:03 UTC | newest]

Thread overview: 13+ messages
2024-08-10  9:00 [PATCH 00/11] indexheader + altid enhancements Eric Wong
2024-08-10  9:00 ` [PATCH 01/11] search: support per-inbox indexheader directive Eric Wong
2024-08-10  9:00 ` [PATCH 02/11] indexheader: deduplicate common values Eric Wong
2024-08-10  9:00 ` [PATCH 03/11] search: help: avoid ':' in user prefixes Eric Wong
2024-08-10  9:00 ` [PATCH 04/11] search: move QueryParser mappings to xh_args Eric Wong
2024-08-10  9:00 ` [PATCH 05/11] www_text: show indexheader contents in help Eric Wong
2024-08-10  9:00 ` [PATCH 06/11] www: don't memoize ->user_help contents Eric Wong
2024-08-10  9:00 ` [PATCH 07/11] extindex: avoid branch in ->index_eml Eric Wong
2024-08-10  9:00 ` [PATCH 08/11] t/extsearch: use autodie to detect chmod failures Eric Wong
2024-08-10  9:00 ` [PATCH 09/11] t/extsearch: use xsys_e to detect errors Eric Wong
2024-08-10  9:00 ` [PATCH 10/11] extindex: support extindex.*.indexheader Eric Wong
2024-08-10  9:00 ` [PATCH 11/11] extindex: support per-inbox indexheader+altid Eric Wong
2024-08-12 13:55 ` [PATCH 00/11] indexheader + altid enhancements Konstantin Ryabitsev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).