unofficial mirror of meta@public-inbox.org
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 06/12] miscidx: put grokmirror manifest entries in Xapian docdata
Date: Mon, 23 Nov 2020 07:05:56 +0000
Message-ID: <20201123070602.9698-7-e@80x24.org>
In-Reply-To: <20201123070602.9698-1-e@80x24.org>

This should make it possible for us to quickly generate
manifest.js.gz files with less random I/O and less process
spawning in the WWW code.
---
 lib/PublicInbox/MiscIdx.pm   | 15 +++++++++++++++
 script/public-inbox-extindex |  1 +
 t/extsearch.t                |  7 ++++++-
 t/miscsearch.t               |  3 +++
 4 files changed, 25 insertions(+), 1 deletion(-)
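
For readers skimming the diff, here is a rough standalone sketch of
the docdata layout this patch stores per inbox. It is not code from
the patch: docdata_for() is a hypothetical helper, the entry fields
and values are placeholders modeled on grokmirror manifest keys, and
core JSON::PP stands in for whatever PublicInbox::Config::json()
returns. Unlike the patch, it takes precomputed entries instead of
calling ->manifest_entry on each git repository.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper (not in the patch): map manifest entries to the
# keys used for docdata.  $max_epoch is undef for a v1 inbox.
sub docdata_for {
	my ($name, $max_epoch, $entries) = @_;
	my $data = {};
	if (defined $max_epoch) { # v2: one key per epoch
		for my $epoch (0..$max_epoch) {
			my $ent = $entries->[$epoch] or next;
			$data->{"/$name/git/$epoch.git"} = $ent;
		}
	} elsif (my $ent = $entries->[0]) { # v1: a single repository
		$data->{"/$name"} = $ent;
	}
	$data;
}

# Placeholder entry; real values come from the git repository.
my $ent = {
	fingerprint => 'placeholder-fingerprint',
	modified => 1606115156,
	reference => undef,
	owner => 'placeholder owner',
	description => 'placeholder description',
};

# canonical(1) only sorts keys to make the output stable for reading
my $json = JSON::PP->new->canonical(1);
my $v2 = $json->encode(docdata_for('v2test', 1, [ $ent, $ent ]));
my $v1 = $json->encode(docdata_for('v1test', undef, [ $ent ]));
my $empty = $json->encode(docdata_for('hope', undef, []));
print "$v2\n$v1\n$empty\n";
```

An inbox with no manifest entries ends up with '{}' as its docdata,
which is what the new t/miscsearch.t assertion checks.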

diff --git a/lib/PublicInbox/MiscIdx.pm b/lib/PublicInbox/MiscIdx.pm
index edc70f9b..9dcc96b7 100644
--- a/lib/PublicInbox/MiscIdx.pm
+++ b/lib/PublicInbox/MiscIdx.pm
@@ -20,6 +20,7 @@ use PublicInbox::Spawn qw(nodatacow_dir);
 use Carp qw(croak);
 use File::Path ();
 use PublicInbox::MiscSearch;
+use PublicInbox::Config;
 
 sub new {
 	my ($class, $eidx) = @_;
@@ -97,6 +98,20 @@ EOF
 		}
 	}
 	index_text($self, $ibx->{name}, 1, 'XNAME');
+	my $data = {};
+	if (defined(my $max = $ibx->max_git_epoch)) { # v2
+		my $desc = $ibx->description;
+		my $pfx = "/$ibx->{name}/git/";
+		for my $epoch (0..$max) {
+			my $git = $ibx->git_epoch($epoch) or return;
+			if (my $ent = $git->manifest_entry($epoch, $desc)) {
+				$data->{"$pfx$epoch.git"} = $ent;
+			}
+		}
+	} elsif (my $ent = $ibx->git->manifest_entry) { # v1
+		$data->{"/$ibx->{name}"} = $ent;
+	}
+	$doc->set_data(PublicInbox::Config::json()->encode($data));
 	if (defined $docid) {
 		$xdb->replace_document($docid, $doc);
 	} else {
diff --git a/script/public-inbox-extindex b/script/public-inbox-extindex
index 78d6d9d9..20a0737c 100644
--- a/script/public-inbox-extindex
+++ b/script/public-inbox-extindex
@@ -38,6 +38,7 @@ require PublicInbox::Admin;
 my $cfg = PublicInbox::Config->new;
 my @ibxs = PublicInbox::Admin::resolve_inboxes(\@ARGV, $opt, $cfg);
 PublicInbox::Admin::require_or_die(qw(-search));
+PublicInbox::Config::json() or die "Cpanel::JSON::XS or similar missing\n";
 PublicInbox::Admin::progress_prepare($opt);
 my $env = PublicInbox::Admin::index_prepare($opt, $cfg);
 local %ENV = (%ENV, %$env) if $env;
diff --git a/t/extsearch.t b/t/extsearch.t
index e28e2f71..dc825bf4 100644
--- a/t/extsearch.t
+++ b/t/extsearch.t
@@ -4,7 +4,9 @@
 use strict;
 use Test::More;
 use PublicInbox::TestCommon;
+use PublicInbox::Config;
 use Fcntl qw(:seek);
+my $json = PublicInbox::Config::json() or plan skip_all => 'JSON missing';
 require_git(2.6);
 require_mods(qw(DBD::SQLite Search::Xapian));
 use_ok 'PublicInbox::ExtSearch';
@@ -73,6 +75,9 @@ my $es = PublicInbox::ExtSearch->new("$home/eindex");
 }
 
 my $misc = $es->misc;
-is(scalar($misc->mset('')->items), 2, 'two inboxes');
+my @it = $misc->mset('')->items;
+is(scalar(@it), 2, 'two inboxes');
+like($it[0]->get_document->get_data, qr/v2test/, 'docdata matched v2');
+like($it[1]->get_document->get_data, qr/v1test/, 'docdata matched v1');
 
 done_testing;
diff --git a/t/miscsearch.t b/t/miscsearch.t
index 45a19da9..0ba79194 100644
--- a/t/miscsearch.t
+++ b/t/miscsearch.t
@@ -50,5 +50,8 @@ is(scalar($mset->items), 1, 'match partial address');
 
 $mset = $ms->mset('hope');
 is(scalar($mset->items), 1, 'match name');
+my $mi = ($mset->items)[0];
+my $doc = $mi->get_document;
+is($doc->get_data, '{}', 'stored empty data');
 
 done_testing;


Thread overview: 13+ messages
2020-11-23  7:05 [PATCH 00/12] extindex: speed up manifest.js.gz generation Eric Wong
2020-11-23  7:05 ` [PATCH 01/12] miscsearch: a new Xapian sub-DB for extindex Eric Wong
2020-11-23  7:05 ` [PATCH 02/12] move JSON module portability into PublicInbox::Config Eric Wong
2020-11-23  7:05 ` [PATCH 03/12] git: add manifest_entry method Eric Wong
2020-11-23  7:05 ` [PATCH 04/12] manifest: use ibx->git_epoch method for v2 Eric Wong
2020-11-23  7:05 ` [PATCH 05/12] inbox: git_epoch: remove ->version check Eric Wong
2020-11-23  7:05 ` Eric Wong [this message]
2020-11-23  7:05 ` [PATCH 07/12] extsearch: fix remaining "eindex" references Eric Wong
2020-11-23  7:05 ` [PATCH 08/12] miscidx: cleanup git processes after manifest indexing Eric Wong
2020-11-23  7:05 ` [PATCH 09/12] miscidx: store absolute git_dir of each epoch in docdata Eric Wong
2020-11-23  7:06 ` [PATCH 10/12] extsearchidx: do not short-circuit MiscIdx on no-op v2 prepare Eric Wong
2020-11-23  7:06 ` [PATCH 11/12] manifest: support faster generation via [extindex "all"] Eric Wong
2020-11-23  7:06 ` [PATCH 12/12] *search: simplify retry_reopen users Eric Wong
