unofficial mirror of meta@public-inbox.org
* [PATCH/RFC 0/7] lei - Local Email Interface skeleton
@ 2020-12-15 11:47 Eric Wong
  2020-12-15 11:47 ` [PATCH 1/7] daemon: support --daemonize without Net::Server::Daemonize Eric Wong
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Eric Wong @ 2020-12-15 11:47 UTC (permalink / raw)
  To: meta

Patches 1 and 2 are boring cleanups.

The most important is 4/7 which features data structures
for a proposed command set.  Hopefully the command-names
and 1-line descriptions are helpful.

Comments from (potential) users appreciated, especially about 4/7.


I decided to take care of patch 3/7 (FD-passing) early on
because startup latency sucks.

I never used notmuch, but this will feature saved searches (aka
"named queries").  Otherwise, the query subcommand will probably
operate like mairix and dump the results to a
Maildir/mbox/etc...

Patch 5/7 adds read/write (but not query) support for keywords
(e.g. `seen', `draft', ...).

And a couple more cleanups.

lei will have its own writable git storage on top of extindex,
but will be able to do read-only queries against extinbox
(publicinbox || extindex) sources.

Eric Wong (7):
  daemon: support --daemonize without Net::Server::Daemonize
  daemon: simplify fork() failure checks
  lei: FD-passing and IPC basics
  lei: proposed command-listing and options
  lei_store: local storage for Local Email Interface
  tests: more common JSON module loading
  lei: use spawn (vfork + execve) for lazy start

 MANIFEST                          |   6 +
 lib/PublicInbox/Daemon.pm         |  26 +-
 lib/PublicInbox/ExtSearch.pm      |   4 +-
 lib/PublicInbox/ExtSearchIdx.pm   |  35 ++-
 lib/PublicInbox/Import.pm         |   4 +
 lib/PublicInbox/LeiDaemon.pm      | 449 ++++++++++++++++++++++++++++++
 lib/PublicInbox/LeiSearch.pm      |  40 +++
 lib/PublicInbox/LeiStore.pm       | 197 +++++++++++++
 lib/PublicInbox/ManifestJsGz.pm   |   2 +-
 lib/PublicInbox/OverIdx.pm        |  10 +
 lib/PublicInbox/SearchIdx.pm      |  47 +++-
 lib/PublicInbox/SearchIdxShard.pm |  33 +++
 lib/PublicInbox/TestCommon.pm     |   4 +
 lib/PublicInbox/V2Writable.pm     |   2 +-
 script/lei                        |  64 +++++
 t/extsearch.t                     |   3 +-
 t/lei.t                           |  79 ++++++
 t/lei_store.t                     |  74 +++++
 t/www_listing.t                   |   8 +-
 19 files changed, 1055 insertions(+), 32 deletions(-)
 create mode 100644 lib/PublicInbox/LeiDaemon.pm
 create mode 100644 lib/PublicInbox/LeiSearch.pm
 create mode 100644 lib/PublicInbox/LeiStore.pm
 create mode 100755 script/lei
 create mode 100644 t/lei.t
 create mode 100644 t/lei_store.t



* [PATCH 1/7] daemon: support --daemonize without Net::Server::Daemonize
  2020-12-15 11:47 [PATCH/RFC 0/7] lei - Local Email Interface skeleton Eric Wong
@ 2020-12-15 11:47 ` Eric Wong
  2020-12-15 11:47 ` [PATCH 2/7] daemon: simplify fork() failure checks Eric Wong
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2020-12-15 11:47 UTC (permalink / raw)
  To: meta

We don't actually need Net::Server::Daemonize to support
the --daemonize flag, since the daemonize() sub provided
by N::S::D doesn't exactly do the things we want.
---
 lib/PublicInbox/Daemon.pm | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm
index 155707e1..fdedaee7 100644
--- a/lib/PublicInbox/Daemon.pm
+++ b/lib/PublicInbox/Daemon.pm
@@ -213,16 +213,12 @@ sub daemonize () {
 
 		chdir '/' or die "chdir failed: $!";
 	}
-
-	return unless (defined $pid_file || defined $group || defined $user
-			|| $daemonize);
-
-	eval { require Net::Server::Daemonize };
-	if ($@) {
-		die
-"Net::Server required for --pid-file, --group, --user, and --daemonize\n$@\n";
+	if (defined($pid_file) || defined($group) || defined($user)) {
+		eval { require Net::Server::Daemonize; 1 } // die <<EOF;
+Net::Server required for --pid-file, --group, --user
+$@
+EOF
 	}
-
 	Net::Server::Daemonize::check_pid_file($pid_file) if defined $pid_file;
 	$uid = Net::Server::Daemonize::get_uid($user) if defined $user;
 	if (defined $group) {


* [PATCH 2/7] daemon: simplify fork() failure checks
  2020-12-15 11:47 [PATCH/RFC 0/7] lei - Local Email Interface skeleton Eric Wong
  2020-12-15 11:47 ` [PATCH 1/7] daemon: support --daemonize without Net::Server::Daemonize Eric Wong
@ 2020-12-15 11:47 ` Eric Wong
  2020-12-15 11:47 ` [RFC 3/7] lei: FD-passing and IPC basics Eric Wong
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2020-12-15 11:47 UTC (permalink / raw)
  To: meta

The defined-or `//' operator in 5.10 allows us to golf down
our code slightly.
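
A minimal sketch of why `//' (and not `||') is the right operator
for this kind of check.  fork() returns undef on failure but 0 in
the child, so only definedness may be tested.  maybe_pid() is a
hypothetical stand-in for fork() so the example runs without
actually forking:

```perl
use strict;
use warnings;
use v5.10.1; # `//' (defined-or) needs Perl 5.10+

# Hypothetical helper: returns undef on failure, like fork() does.
sub maybe_pid { my ($ok, $pid) = @_; $ok ? $pid : undef }

# Old style: two statements.
my $pid = maybe_pid(1, 0);
die "could not get pid\n" unless defined $pid;

# New style: `//' only falls through to die() when the left side is
# undef, so a legitimate 0 (what fork() returns in the child) is kept.
my $child = maybe_pid(1, 0) // die "maybe_pid: failed\n";

# `||' would be wrong here: 0 is false, so it would die in the child.
my $wrong = eval { maybe_pid(1, 0) || die "treated 0 as failure\n" };
my $err = $@;
```

After this runs, $child is 0 while $wrong is undef and $err holds
the message from the mistaken die().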
---
 lib/PublicInbox/Daemon.pm | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm
index fdedaee7..a2171535 100644
--- a/lib/PublicInbox/Daemon.pm
+++ b/lib/PublicInbox/Daemon.pm
@@ -237,8 +237,7 @@ EOF
 	};
 
 	if ($daemonize) {
-		my $pid = fork;
-		die "could not fork: $!\n" unless defined $pid;
+		my $pid = fork // die "fork: $!";
 		exit if $pid;
 
 		open(STDIN, '+<', '/dev/null') or
@@ -246,8 +245,7 @@ EOF
 		open STDOUT, '>&STDIN' or die "redirect stdout failed: $!\n";
 		open STDERR, '>&STDIN' or die "redirect stderr failed: $!\n";
 		POSIX::setsid();
-		$pid = fork;
-		die "could not fork: $!\n" unless defined $pid;
+		$pid = fork // die "fork: $!";
 		exit if $pid;
 	}
 	return unless defined $pid_file;


* [RFC 3/7] lei: FD-passing and IPC basics
  2020-12-15 11:47 [PATCH/RFC 0/7] lei - Local Email Interface skeleton Eric Wong
  2020-12-15 11:47 ` [PATCH 1/7] daemon: support --daemonize without Net::Server::Daemonize Eric Wong
  2020-12-15 11:47 ` [PATCH 2/7] daemon: simplify fork() failure checks Eric Wong
@ 2020-12-15 11:47 ` Eric Wong
  2020-12-15 11:47 ` [RFC 4/7] lei: proposed command-listing and options Eric Wong
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2020-12-15 11:47 UTC (permalink / raw)
  To: meta

The start of lei, a Local Email Interface.  It'll support a
daemon via FD passing to avoid startup time penalties if
IO::FDPass is installed, but fall back to a slow one-shot mode
if not.

Compared to a traditional socket daemon, FD passing should allow
us to eventually do stuff like run "git show" and still have
proper terminal support for pager and color.
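
Once the three standard FDs are passed, the client sends its PID,
argv and environment as a single NUL-delimited record.  The sketch
below mirrors that framing on plain strings, without any socket.
encode_req() and decode_req() are hypothetical names used only for
illustration, and the environment is assumed to be non-empty (the
client always sets PWD):

```perl
use strict;
use warnings;
use v5.10.1;

# Client side: serialize PID, argv and environment into one record.
sub encode_req {
	my ($pid, $argv, $env) = @_;
	my $buf = "$pid\0\0>" . join("]\0[", @$argv) . "\0\0>";
	$buf .= "$_=$env->{$_}\0" for sort keys %$env;
	$buf . "\0\0"; # last "k=v\0" + "\0\0" => 3 NULs end the record
}

# Daemon side: undo the framing above.
sub decode_req {
	my ($buf) = @_;
	$buf =~ s/\0\0\0\z//s or die "record not terminated by 3 NULs\n";
	my ($pid, $argv_str, $env_str) = split(/\0\0>/, $buf, 3);
	my @argv = split(/\]\0\[/, $argv_str);
	my %env = map { split(/=/, $_, 2) } split(/\0/, $env_str);
	($pid, \@argv, \%env);
}

my ($pid, $argv, $env) = decode_req(
	encode_req($$, [qw(daemon-pid --help)], { PWD => '/tmp', A => 'b=c' }));
```

Splitting each environment entry with a limit of 2 keeps values
that themselves contain `=' intact.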
---
 MANIFEST                     |   3 +
 lib/PublicInbox/Daemon.pm    |   6 +-
 lib/PublicInbox/LeiDaemon.pm | 303 +++++++++++++++++++++++++++++++++++
 script/lei                   |  58 +++++++
 t/lei.t                      |  80 +++++++++
 5 files changed, 448 insertions(+), 2 deletions(-)
 create mode 100644 lib/PublicInbox/LeiDaemon.pm
 create mode 100755 script/lei
 create mode 100644 t/lei.t

diff --git a/MANIFEST b/MANIFEST
index ac442606..7536b7c2 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -159,6 +159,7 @@ lib/PublicInbox/InboxIdle.pm
 lib/PublicInbox/InboxWritable.pm
 lib/PublicInbox/Isearch.pm
 lib/PublicInbox/KQNotify.pm
+lib/PublicInbox/LeiDaemon.pm
 lib/PublicInbox/Linkify.pm
 lib/PublicInbox/Listener.pm
 lib/PublicInbox/Lock.pm
@@ -226,6 +227,7 @@ sa_config/Makefile
 sa_config/README
 sa_config/root/etc/spamassassin/public-inbox.pre
 sa_config/user/.spamassassin/user_prefs
+script/lei
 script/public-inbox-compact
 script/public-inbox-convert
 script/public-inbox-edit
@@ -316,6 +318,7 @@ t/indexlevels-mirror.t
 t/init.t
 t/iso-2202-jp.eml
 t/kqnotify.t
+t/lei.t
 t/linkify.t
 t/main-bin/spamc
 t/mda-mime.eml
diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm
index a2171535..6b92b60d 100644
--- a/lib/PublicInbox/Daemon.pm
+++ b/lib/PublicInbox/Daemon.pm
@@ -1,7 +1,9 @@
 # Copyright (C) 2015-2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-# contains common daemon code for the httpd, imapd, and nntpd servers.
-# This may be used for read-only IMAP server if we decide to implement it.
+#
+# Contains common daemon code for the httpd, imapd, and nntpd servers
+# and designed for handling thousands of untrusted clients over slow
+# and/or lossy connections.
 package PublicInbox::Daemon;
 use strict;
 use warnings;
diff --git a/lib/PublicInbox/LeiDaemon.pm b/lib/PublicInbox/LeiDaemon.pm
new file mode 100644
index 00000000..ae40b3a6
--- /dev/null
+++ b/lib/PublicInbox/LeiDaemon.pm
@@ -0,0 +1,303 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# Backend for `lei' (local email interface).  Unlike the C10K-oriented
+# PublicInbox::Daemon, this is designed exclusively to handle trusted
+# local clients with read/write access to the FS and use as many
+# system resources as the local user has access to.
+package PublicInbox::LeiDaemon;
+use strict;
+use v5.10.1;
+use parent qw(PublicInbox::DS);
+use Getopt::Long ();
+use Errno qw(EAGAIN ECONNREFUSED ENOENT);
+use POSIX qw(setsid);
+use IO::Socket::UNIX;
+use IO::Handle ();
+use Sys::Syslog qw(syslog openlog);
+use PublicInbox::Syscall qw($SFD_NONBLOCK EPOLLIN EPOLLONESHOT);
+use PublicInbox::Sigfd;
+use PublicInbox::DS qw(now);
+use PublicInbox::Spawn qw(spawn);
+our $quit = sub { exit(shift // 0) };
+my $glp = Getopt::Long::Parser->new;
+$glp->configure(qw(gnu_getopt no_ignore_case auto_abbrev));
+
+sub x_it ($$) { # pronounced "exit"
+	my ($client, $code) = @_;
+	if (my $sig = ($code & 127)) {
+		kill($sig, $client->{pid} // $$);
+	} else {
+		$code >>= 8;
+		if (my $sock = $client->{sock}) {
+			say $sock "exit=$code";
+		} else { # for oneshot
+			$quit->($code);
+		}
+	}
+}
+
+sub emit ($$$) {
+	my ($client, $channel, $buf) = @_;
+	print { $client->{$channel} } $buf or warn "print FD[$channel]: $!";
+}
+
+sub fail ($$;$) {
+	my ($client, $buf, $exit_code) = @_;
+	$buf .= "\n" unless $buf =~ /\n\z/s;
+	emit($client, 2, $buf);
+	x_it($client, ($exit_code // 1) << 8);
+	undef;
+}
+
+sub _help ($;$) {
+	my ($client, $channel) = @_;
+	emit($client, $channel //= 1, <<EOF);
+usage: lei COMMAND [OPTIONS]
+
+...
+EOF
+	x_it($client, $channel == 2 ? 1 << 8 : 0); # stderr => failure
+}
+
+sub assert_args ($$$;$@) {
+	my ($client, $argv, $proto, $opt, @spec) = @_;
+	$opt //= {};
+	push @spec, qw(help|h);
+	$glp->getoptionsfromarray($argv, $opt, @spec) or
+		return fail($client, 'bad arguments or options');
+	if ($opt->{help}) {
+		_help($client);
+		undef;
+	} else {
+		my ($nreq, $rest) = split(/;/, $proto);
+		$nreq = (($nreq // '') =~ tr/$/$/);
+		my $argc = scalar(@$argv);
+		my $tot = ($rest // '') eq '@' ? $argc : ($proto =~ tr/$/$/);
+		return 1 if $argc <= $tot && $argc >= $nreq;
+		_help($client, 2);
+		undef
+	}
+}
+
+sub dispatch {
+	my ($client, $cmd, @argv) = @_;
+	local $SIG{__WARN__} = sub { emit($client, 2, "@_") };
+	local $SIG{__DIE__} = 'DEFAULT';
+	if (defined $cmd) {
+		my $func = "lei_$cmd";
+		$func =~ tr/-/_/;
+		if (my $cb = __PACKAGE__->can($func)) {
+			$client->{cmd} = $cmd;
+			$cb->($client, \@argv);
+		} elsif (grep(/\A-/, $cmd, @argv)) {
+			assert_args($client, [ $cmd, @argv ], '');
+		} else {
+			fail($client, "`$cmd' is not an lei command");
+		}
+	} else {
+		_help($client, 2);
+	}
+}
+
+sub lei_daemon_pid {
+	my ($client, $argv) = @_;
+	assert_args($client, $argv, '') and emit($client, 1, "$$\n");
+}
+
+sub lei_DBG_pwd {
+	my ($client, $argv) = @_;
+	assert_args($client, $argv, '') and
+		emit($client, 1, "$client->{env}->{PWD}\n");
+}
+
+sub lei_DBG_cwd {
+	my ($client, $argv) = @_;
+	require Cwd;
+	assert_args($client, $argv, '') and emit($client, 1, Cwd::cwd()."\n");
+}
+
+sub lei_DBG_false { x_it($_[0], 1 << 8) }
+
+sub lei_daemon_stop {
+	my ($client, $argv) = @_;
+	assert_args($client, $argv, '') and $quit->(0);
+}
+
+sub lei_help { _help($_[0]) }
+
+sub reap_exec { # dwaitpid callback
+	my ($client, $pid) = @_;
+	x_it($client, $?);
+}
+
+sub lei_git { # support passing through random git commands
+	my ($client, $argv) = @_;
+	my %opt = map { $_ => $client->{$_} } (0..2);
+	my $pid = spawn(['git', @$argv], $client->{env}, \%opt);
+	PublicInbox::DS::dwaitpid($pid, \&reap_exec, $client);
+}
+
+sub accept_dispatch { # Listener {post_accept} callback
+	my ($sock) = @_; # ignore other
+	$sock->blocking(1);
+	$sock->autoflush(1);
+	my $client = { sock => $sock };
+	vec(my $rin = '', fileno($sock), 1) = 1;
+	# `say $sock' triggers "die" in lei(1)
+	for my $i (0..2) {
+		if (select(my $rout = $rin, undef, undef, 1)) {
+			my $fd = IO::FDPass::recv(fileno($sock));
+			if ($fd >= 0) {
+				my $rdr = ($fd == 0 ? '<&=' : '>&=');
+				if (open(my $fh, $rdr, $fd)) {
+					$client->{$i} = $fh;
+				} else {
+					say $sock "open($rdr$fd) (FD=$i): $!";
+					return;
+				}
+			} else {
+				say $sock "recv FD=$i: $!";
+				return;
+			}
+		} else {
+			say $sock "timed out waiting to recv FD=$i";
+			return;
+		}
+	}
+	# $ARGV_STR = join("]\0[", @ARGV);
+	# $ENV_STR = join('', map { "$_=$ENV{$_}\0" } keys %ENV);
+	# $line = "$$\0\0>$ARGV_STR\0\0>$ENV_STR\0\0";
+	my ($client_pid, $argv, $env) = do {
+		local $/ = "\0\0\0"; # yes, 3 NULs at EOL, not 2
+		chomp(my $line = <$sock>);
+		split(/\0\0>/, $line, 3);
+	};
+	my %env = map { split(/=/, $_, 2) } split(/\0/, $env);
+	if (chdir($env{PWD})) {
+		$client->{env} = \%env;
+		$client->{pid} = $client_pid;
+		eval { dispatch($client, split(/\]\0\[/, $argv)) };
+		say $sock $@ if $@;
+	} else {
+		say $sock "chdir($env{PWD}): $!"; # implicit close
+	}
+}
+
+sub noop {}
+
+# lei(1) calls this when it can't connect
+sub lazy_start ($$) {
+	my ($path, $err) = @_;
+	if ($err == ECONNREFUSED) {
+		unlink($path) or die "unlink($path): $!";
+	} elsif ($err != ENOENT) {
+		die "connect($path): $!";
+	}
+	my $umask = umask(077) // die("umask(077): $!");
+	my $l = IO::Socket::UNIX->new(Local => $path,
+					Listen => 1024,
+					Type => SOCK_STREAM) or
+		$err = $!;
+	umask($umask) or die("umask(restore): $!");
+	$l or return $err;
+	my @st = stat($path) or die "stat($path): $!";
+	my $dev_ino_expect = pack('dd', $st[0], $st[1]); # dev+ino
+	pipe(my ($eof_r, $eof_w)) or die "pipe: $!";
+	my $oldset = PublicInbox::Sigfd::block_signals();
+	my $pid = fork // die "fork: $!";
+	if ($pid) {
+		PublicInbox::Sigfd::sig_setmask($oldset);
+		return; # client will connect to $path
+	}
+	openlog($path, 'pid', 'user');
+	local $SIG{__DIE__} = sub {
+		syslog('crit', "@_");
+		exit $! if $!;
+		exit $? >> 8 if $? >> 8;
+		exit 255;
+	};
+	local $SIG{__WARN__} = sub { syslog('warning', "@_") };
+	open(STDIN, '+<', '/dev/null') or die "redirect stdin failed: $!\n";
+	open STDOUT, '>&STDIN' or die "redirect stdout failed: $!\n";
+	open STDERR, '>&STDIN' or die "redirect stderr failed: $!\n";
+	setsid();
+	$pid = fork // die "fork: $!";
+	exit if $pid;
+	$0 = "lei-daemon $path";
+	require PublicInbox::Listener;
+	require PublicInbox::EOFpipe;
+	$l->blocking(0);
+	$eof_w->blocking(0);
+	$eof_r->blocking(0);
+	my $listener = PublicInbox::Listener->new($l, \&accept_dispatch, $l);
+	my $exit_code;
+	local $quit = sub {
+		$exit_code //= shift;
+		my $tmp = $listener or exit($exit_code);
+		unlink($path) if defined($path);
+		syswrite($eof_w, '.');
+		$l = $listener = $path = undef;
+		$tmp->close if $tmp; # DS::close
+		PublicInbox::DS->SetLoopTimeout(1000);
+	};
+	PublicInbox::EOFpipe->new($eof_r, sub {}, undef);
+	my $sig = {
+		CHLD => \&PublicInbox::DS::enqueue_reap,
+		QUIT => $quit,
+		INT => $quit,
+		TERM => $quit,
+		HUP => \&noop,
+		USR1 => \&noop,
+		USR2 => \&noop,
+	};
+	my $sigfd = PublicInbox::Sigfd->new($sig, $SFD_NONBLOCK);
+	local %SIG = (%SIG, %$sig) if !$sigfd;
+	if ($sigfd) { # TODO: use inotify/kqueue to detect unlinked sockets
+		PublicInbox::DS->SetLoopTimeout(5000);
+	} else {
+		# wake up every second to accept signals if we don't
+		# have signalfd or IO::KQueue:
+		PublicInbox::Sigfd::sig_setmask($oldset);
+		PublicInbox::DS->SetLoopTimeout(1000);
+	}
+	PublicInbox::DS->SetPostLoopCallback(sub {
+		my ($dmap, undef) = @_;
+		if (@st = defined($path) ? stat($path) : ()) {
+			if ($dev_ino_expect ne pack('dd', $st[0], $st[1])) {
+				warn "$path dev/ino changed, quitting\n";
+				$path = undef;
+			}
+		} elsif (defined($path)) {
+			warn "stat($path): $!, quitting ...\n";
+			undef $path; # don't unlink
+			$quit->();
+		}
+		return 1 if defined($path);
+		my $now = now();
+		my $n = 0;
+		for my $s (values %$dmap) {
+			$s->can('busy') or next;
+			if ($s->busy($now)) {
+				++$n;
+			} else {
+				$s->close;
+			}
+		}
+		$n; # true: continue, false: stop
+	});
+	PublicInbox::DS->EventLoop;
+	exit($exit_code // 0);
+}
+
+# for users w/o IO::FDPass
+sub oneshot {
+	dispatch({
+		0 => *STDIN{IO},
+		1 => *STDOUT{IO},
+		2 => *STDERR{IO},
+		env => \%ENV
+	}, @ARGV);
+}
+
+1;
diff --git a/script/lei b/script/lei
new file mode 100755
index 00000000..1b5af3a1
--- /dev/null
+++ b/script/lei
@@ -0,0 +1,58 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use v5.10.1;
+use Cwd qw(cwd);
+use IO::Socket::UNIX;
+
+if (eval { require IO::FDPass; 1 }) { # use daemon to reduce load time
+	my $path = do {
+		my $runtime_dir = ($ENV{XDG_RUNTIME_DIR} // '') . '/lei';
+		if ($runtime_dir eq '/lei') {
+			require File::Spec;
+			$runtime_dir = File::Spec->tmpdir."/lei-$<";
+		}
+		unless (-d $runtime_dir && -w _) {
+			require File::Path;
+			File::Path::mkpath($runtime_dir, 0, 0700);
+		}
+		"$runtime_dir/sock";
+	};
+	my $sock = IO::Socket::UNIX->new(Peer => $path, Type => SOCK_STREAM);
+	unless ($sock) { # start the daemon if not started
+		my $err = $!;
+		require PublicInbox::LeiDaemon;
+		$err = PublicInbox::LeiDaemon::lazy_start($path, $err);
+		# try connecting again anyways, unlink+bind may be racy
+		$sock = IO::Socket::UNIX->new(Peer => $path,
+						Type => SOCK_STREAM) // die
+			"connect($path): $! (bind($path): $err)";
+	}
+	my $pwd = $ENV{PWD};
+	my $cwd = cwd();
+	if ($pwd) { # prefer ENV{PWD} if it's a symlink to real cwd
+		my @st_cwd = stat($cwd) or die "stat(cwd=$cwd): $!\n";
+		my @st_pwd = stat($pwd);
+		# make sure st_dev/st_ino match for {PWD} to be valid
+		$pwd = $cwd if (!@st_pwd || $st_pwd[1] != $st_cwd[1] ||
+					$st_pwd[0] != $st_cwd[0]);
+	} else {
+		$pwd = $cwd;
+	}
+	local $ENV{PWD} = $pwd;
+	$sock->autoflush(1);
+	IO::FDPass::send(fileno($sock), $_) for (0..2);
+	my $buf = "$$\0\0>" . join("]\0[", @ARGV) . "\0\0>";
+	while (my ($k, $v) = each %ENV) { $buf .= "$k=$v\0" }
+	$buf .= "\0\0";
+	print $sock $buf or die "print(sock, buf): $!";
+	local $/ = "\n";
+	while (my $line = <$sock>) {
+		$line =~ /\Aexit=([0-9]+)\n\z/ and exit($1 + 0);
+		die $line;
+	}
+} else { # for systems lacking IO::FDPass
+	require PublicInbox::LeiDaemon;
+	PublicInbox::LeiDaemon::oneshot();
+}
diff --git a/t/lei.t b/t/lei.t
new file mode 100644
index 00000000..feee9270
--- /dev/null
+++ b/t/lei.t
@@ -0,0 +1,80 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use v5.10.1;
+use Test::More;
+use PublicInbox::TestCommon;
+use PublicInbox::Config;
+my $json = PublicInbox::Config::json() or plan skip_all => 'JSON missing';
+require_mods(qw(DBD::SQLite Search::Xapian));
+my ($home, $for_destroy) = tmpdir();
+my $opt = { 1 => \(my $out = ''), 2 => \(my $err = '') };
+
+SKIP: {
+	require_mods('IO::FDPass', 51);
+	local $ENV{XDG_RUNTIME_DIR} = "$home/xdg_run";
+	mkdir "$home/xdg_run", 0700 or BAIL_OUT "mkdir: $!";
+	my $sock = "$ENV{XDG_RUNTIME_DIR}/lei/sock";
+
+	ok(run_script([qw(lei daemon-pid)], undef, $opt), 'daemon-pid');
+	is($err, '', 'no error from daemon-pid');
+	like($out, qr/\A[0-9]+\n\z/s, 'pid returned') or BAIL_OUT;
+	chomp(my $pid = $out);
+	ok(kill(0, $pid), 'pid is valid');
+	ok(-S $sock, 'sock created');
+
+	ok(!run_script([qw(lei)], undef, $opt), 'no args fails');
+	is($? >> 8, 1, '$? is 1');
+	is($out, '', 'nothing in stdout');
+	like($err, qr/^usage:/sm, 'usage in stderr');
+
+	for my $arg (['-h'], ['--help'], ['help'], [qw(daemon-pid --help)]) {
+		$out = $err = '';
+		ok(run_script(['lei', @$arg], undef, $opt), "lei @$arg");
+		like($out, qr/^usage:/sm, "usage in stdout (@$arg)");
+		is($err, '', "nothing in stderr (@$arg)");
+	}
+
+	ok(!run_script([qw(lei DBG-false)], undef, $opt), 'false(1) emulation');
+	is($? >> 8, 1, '$? set correctly');
+	is($err, '', 'no error from false(1) emulation');
+
+	for my $arg ([''], ['--halp'], ['halp'], [qw(daemon-pid --halp)]) {
+		$out = $err = '';
+		ok(!run_script(['lei', @$arg], undef, $opt), "lei @$arg");
+		is($? >> 8, 1, '$? set correctly');
+		isnt($err, '', 'something in stderr');
+		is($out, '', 'nothing in stdout');
+	}
+
+	$out = '';
+	ok(run_script([qw(lei daemon-pid)], undef, $opt), 'daemon-pid');
+	chomp(my $pid_again = $out);
+	is($pid, $pid_again, 'daemon-pid idempotent');
+
+	ok(run_script([qw(lei daemon-stop)], undef, $opt), 'daemon-stop');
+	is($out, '', 'no output from daemon-stop');
+	is($err, '', 'no error from daemon-stop');
+	for (0..100) {
+		kill(0, $pid) or last;
+		tick();
+	}
+	ok(!-S $sock, 'sock gone');
+	ok(!kill(0, $pid), 'pid gone after stop');
+
+	ok(run_script([qw(lei daemon-pid)], undef, $opt), 'daemon-pid');
+	chomp(my $new_pid = $out);
+	ok(kill(0, $new_pid), 'new pid is running');
+	ok(-S $sock, 'sock exists again');
+	unlink $sock or BAIL_OUT "unlink $!";
+	for (0..100) {
+		kill('CHLD', $new_pid) or last;
+		tick();
+	}
+	ok(!kill(0, $new_pid), 'daemon exits after unlink');
+};
+
+require_ok 'PublicInbox::LeiDaemon';
+
+done_testing;


* [RFC 4/7] lei: proposed command-listing and options
  2020-12-15 11:47 [PATCH/RFC 0/7] lei - Local Email Interface skeleton Eric Wong
                   ` (2 preceding siblings ...)
  2020-12-15 11:47 ` [RFC 3/7] lei: FD-passing and IPC basics Eric Wong
@ 2020-12-15 11:47 ` Eric Wong
  2020-12-26 11:26   ` "extinbox" term - was: [RFC 4/7] lei: proposed command-listing Eric Wong
  2020-12-15 11:47 ` [RFC 5/7] lei_store: local storage for Local Email Interface Eric Wong
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Eric Wong @ 2020-12-15 11:47 UTC (permalink / raw)
  To: meta

In an attempt to ensure a coherent UI/UX, we'll try to document
all proposed commands and options in one place for easy reference.
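
As a rough sketch of how such a table could eventually drive both
option parsing and help output: the two-entry %CMD below is a
hypothetical subset, and parse_cmd() and usage_line() are
illustrative names, not part of this series.

```perl
use strict;
use warnings;
use v5.10.1;
use Getopt::Long ();

# Hypothetical two-entry subset in the same shape as the proposed table:
# command => [ positional_args, 1-line description, Getopt::Long specs ]
my %CMD = (
'query' => [ 'SEARCH-TERMS...', 'search for messages matching terms',
	qw(limit|n=i thread|t output|o=s) ],
'rm-query' => [ 'QUERY_NAME', 'remove a saved search' ],
);

my $glp = Getopt::Long::Parser->new;
$glp->configure(qw(gnu_getopt no_ignore_case auto_abbrev));

# Parse @$argv for $cmd using only the specs listed in %CMD.
# Returns ($opt, $rest) on success, or nothing for unknown
# commands and bad options.
sub parse_cmd {
	my ($cmd, $argv) = @_;
	my $ent = $CMD{$cmd} or return;
	my (undef, undef, @spec) = @$ent;
	my @rest = @$argv; # getoptionsfromarray modifies its array
	my %opt;
	local $SIG{__WARN__} = sub {}; # keep Getopt::Long diagnostics quiet
	$glp->getoptionsfromarray(\@rest, \%opt, @spec, 'help|h') or return;
	(\%opt, \@rest);
}

sub usage_line { # one help line per command
	my ($cmd) = @_;
	my ($args, $desc) = @{$CMD{$cmd}};
	"lei $cmd $args\t$desc";
}

my ($opt, $rest) = parse_cmd('query', [qw(-n 5 --thread s:lei)]);
```

Keeping the specs next to the descriptions means a typo in one
place shows up in both parsing and help, which is the point of
documenting everything in a single structure.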
---
 lib/PublicInbox/LeiDaemon.pm | 148 +++++++++++++++++++++++++++++++++++
 1 file changed, 148 insertions(+)

diff --git a/lib/PublicInbox/LeiDaemon.pm b/lib/PublicInbox/LeiDaemon.pm
index ae40b3a6..89434cb8 100644
--- a/lib/PublicInbox/LeiDaemon.pm
+++ b/lib/PublicInbox/LeiDaemon.pm
@@ -23,6 +23,149 @@ our $quit = sub { exit(shift // 0) };
 my $glp = Getopt::Long::Parser->new;
 $glp->configure(qw(gnu_getopt no_ignore_case auto_abbrev));
 
+# TBD: this is a documentation mechanism to show a subcommand
+# (may) pass options through to another command:
+sub pass_through { () }
+
+# TODO: generate shell completion + help using %CMD and %OPTDESC
+# command => [ positional_args, 1-line description, Getopt::Long option spec ]
+our %CMD = ( # sorted in order of importance/use:
+'query' => [ 'SEARCH-TERMS...', 'search for messages matching terms', qw(
+	save-as=s output|o=s format|f=s dedupe|d=s thread|t augment|a
+	limit|n=i sort|s=s reverse|r offset=i remote local! extinbox!
+	since|after=s until|before=s) ],
+
+'show' => [ '{MID|OID}', 'show a given object (Message-ID or object ID)',
+	qw(type=s solve! format|f=s dedupe|d=s thread|t remote local!),
+	pass_through('git show') ],
+
+'add-extinbox' => [ 'URL-OR-PATHNAME',
+	'add/set priority of a publicinbox|extindex for extra matches',
+	qw(prio=i) ],
+'ls-extinbox' => [ '[FILTER]', 'list publicinbox|extindex sources',
+	qw(format|f=s z local remote) ],
+'forget-extinbox' => [ '{URL-OR-PATHNAME|--prune}',
+	'exclude further results from a publicinbox|extindex',
+	qw(prune) ],
+
+'ls-query' => [ '[FILTER]', 'list saved search queries',
+		qw(name-only format|f=s z) ],
+'rm-query' => [ 'QUERY_NAME', 'remove a saved search' ],
+'mv-query' => [ qw(OLD_NAME NEW_NAME), 'rename a saved search' ],
+
+'plonk' => [ '{--thread|--from=IDENT}',
+	'exclude mail matching From: or thread from non-Message-ID searches',
+	qw(thread|t from|f=s mid=s oid=s) ],
+'mark' => [ 'MESSAGE-FLAGS', 'set/unset flags on message(s) from stdin',
+	qw(stdin| oid=s exact by-mid|mid:s) ],
+'forget' => [ '--stdin', 'exclude message(s) on stdin from query results',
+	qw(stdin| oid=s  exact by-mid|mid:s) ],
+
+'purge-mailsource' => [ '{URL-OR-PATHNAME|--all}',
+	'remove imported messages from IMAP, Maildirs, and MH',
+	qw(exact! all jobs:i indexed) ],
+
+# code repos are used for `show' to solve blobs from patch mails
+'add-coderepo' => [ 'PATHNAME', 'add or set priority of a git code repo',
+	qw(prio=i) ],
+'ls-coderepo' => [ '[FILTER]', 'list known code repos', qw(format|f=s z) ],
+'forget-coderepo' => [ 'PATHNAME',
+	'stop using repo to solve blobs from patches',
+	qw(prune) ],
+
+'add-watch' => [ '[URL_OR_PATHNAME]',
+		'watch for new messages and flag changes',
+	qw(import! flags! interval=s recursive|r exclude=s include=s) ],
+'ls-watch' => [ '[FILTER]', 'list active watches with numbers and status',
+		qw(format|f=s z) ],
+'pause-watch' => [ '[WATCH_NUMBER_OR_FILTER]', qw(all local remote) ],
+'resume-watch' => [ '[WATCH_NUMBER_OR_FILTER]', qw(all local remote) ],
+'forget-watch' => [ '{WATCH_NUMBER|--prune}', 'stop and forget a watch',
+	qw(prune) ],
+
+'import' => [ '{URL_OR_PATHNAME|--stdin}',
+	'one-shot import/update from URL or filesystem',
+	qw(stdin| limit|n=i offset=i recursive|r exclude=s include=s flags!),
+	],
+
+'config' => [ '[ANYTHING...]',
+		'git-config(1) wrapper for ~/.config/lei/config',
+		pass_through('git config') ],
+'init' => [ '[PATHNAME]',
+	'initialize storage, default: ~/.local/share/lei/store',
+	qw(quiet|q) ],
+'daemon-stop' => [ undef, 'stop the lei-daemon' ],
+'daemon-pid' => [ undef, 'show the PID of the lei-daemon' ],
+'help' => [ '[SUBCOMMAND]', 'show help' ],
+
+# XXX do we need this?
+# 'git' => [ '[ANYTHING...]', 'git(1) wrapper', pass_through('git') ],
+
+'reorder-local-store-and-break-history' => [ '[REFNAME]',
+	'rewrite git history in an attempt to improve compression',
+	'gc!' ]
+); # %CMD
+
+# switch descriptions, try to keep consistent across commands
+# $spec: Getopt::Long option specification
+# $spec => [@ALLOWED_VALUES (default is first), $description],
+# $spec => $description
+# "$SUB_COMMAND TAB $spec" => as above
+my $stdin_formats = [ qw(auto raw mboxrd mboxcl2 mboxcl mboxo),
+		'specify message input format' ];
+my $ls_format = [ qw(plain json null), 'listing output format' ];
+
+my %OPTDESC = (
+'quiet|q' => 'be quiet',
+'solve!' => 'do not attempt to reconstruct blobs from emails',
+'save-as=s' => ['NAME', 'save search terms under the given name'],
+
+'type=s' => [qw(any mid git), 'disambiguate type' ],
+
+'dedupe|d=s' => [qw(content oid mid), 'deduplication strategy'],
+'show	thread|t' => 'display entire thread a message belongs to',
+'query	thread|t' =>
+	'return message in the same thread as the actual match(es)',
+'augment|a' => 'augment --output destination instead of clobbering',
+
+'output|o=s' => "destination (e.g. `/path/to/Maildir', or `-' for stdout)",
+
+'show	format|f=s' => [ qw(plain raw html mboxrd mboxcl2 mboxcl),
+			'message/object output format' ],
+'mark	format|f=s' => $stdin_formats,
+'forget	format|f=s' => $stdin_formats,
+'query	format|f=s' => [qw(maildir mboxrd mboxcl2 mboxcl html oid),
+		'specify output format, default: depends on --output'],
+'ls-query	format|f=s' => $ls_format,
+'ls-extinbox	format|f=s' => $ls_format,
+
+'limit|n=i' => 'integer limit on number of matches (default: 10000)',
+'offset=i' => 'search result offset (default: 0)',
+
+'sort|s=s@' => [qw(internaldate date relevance docid),
+		"order of results (`--output'-dependent)"],
+
+'prio=i' => 'priority of query source',
+
+'local' => 'limit operations to the local filesystem',
+'local!' => 'exclude results from the local filesystem',
+'remote' => 'limit operations to those requiring network access',
+'remote!' => 'prevent operations requiring network access',
+
+'mid=s' => 'specify the Message-ID of a message',
+'oid=s' => 'specify the git object ID of a message',
+
+'recursive|r' => 'scan directories/mailboxes/newsgroups recursively',
+'exclude=s' => 'exclude mailboxes/newsgroups based on pattern',
+'include=s' => 'include mailboxes/newsgroups based on pattern',
+
+'exact' => 'operate on exact header matches only',
+'exact!' => 'rely on content match instead of exact header matches',
+
+'by-mid|mid:s' => 'match only by Message-ID, ignoring contents',
+'jobs:i' => 'set parallelism level',
+); # %OPTDESC
+
 sub x_it ($$) { # pronounced "exit"
 	my ($client, $code) = @_;
 	if (my $sig = ($code & 127)) {
@@ -100,6 +243,11 @@ sub dispatch {
 	}
 }
 
+sub lei_init {
+	my ($client, $argv) = @_;
+	assert_args($client, $argv, '') and emit($client, 1, "hi\n");
+}
+
 sub lei_daemon_pid {
 	my ($client, $argv) = @_;
 	assert_args($client, $argv, '') and emit($client, 1, "$$\n");


* [RFC 5/7] lei_store: local storage for Local Email Interface
  2020-12-15 11:47 [PATCH/RFC 0/7] lei - Local Email Interface skeleton Eric Wong
                   ` (3 preceding siblings ...)
  2020-12-15 11:47 ` [RFC 4/7] lei: proposed command-listing and options Eric Wong
@ 2020-12-15 11:47 ` Eric Wong
  2020-12-15 11:47 ` [RFC 6/7] tests: more common JSON module loading Eric Wong
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2020-12-15 11:47 UTC (permalink / raw)
  To: meta

Still unstable, this builds off the equally unstable extindex :P

This will be used for caching/memoization of traditional mail
stores (IMAP, Maildir, etc) while providing indexing via Xapian,
along with compression, and checksumming from git.

Most notably, this adds the ability to add/remove per-message
keywords (draft, seen, flagged, answered) as described in the
JMAP specification (RFC 8621 section 4.1.1).

We'll use `.' (a single period) as an $eidx_key since it's an
invalid {inboxdir} or {newsgroup} name.
---
 MANIFEST                          |   3 +
 lib/PublicInbox/ExtSearch.pm      |   4 +-
 lib/PublicInbox/ExtSearchIdx.pm   |  35 +++++-
 lib/PublicInbox/Import.pm         |   4 +
 lib/PublicInbox/LeiDaemon.pm      |   2 +-
 lib/PublicInbox/LeiSearch.pm      |  40 ++++++
 lib/PublicInbox/LeiStore.pm       | 197 ++++++++++++++++++++++++++++++
 lib/PublicInbox/OverIdx.pm        |  10 ++
 lib/PublicInbox/SearchIdx.pm      |  47 ++++++-
 lib/PublicInbox/SearchIdxShard.pm |  33 +++++
 lib/PublicInbox/V2Writable.pm     |   2 +-
 t/lei_store.t                     |  74 +++++++++++
 12 files changed, 441 insertions(+), 10 deletions(-)
 create mode 100644 lib/PublicInbox/LeiSearch.pm
 create mode 100644 lib/PublicInbox/LeiStore.pm
 create mode 100644 t/lei_store.t

diff --git a/MANIFEST b/MANIFEST
index 7536b7c2..9eb97d14 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -160,6 +160,8 @@ lib/PublicInbox/InboxWritable.pm
 lib/PublicInbox/Isearch.pm
 lib/PublicInbox/KQNotify.pm
 lib/PublicInbox/LeiDaemon.pm
+lib/PublicInbox/LeiSearch.pm
+lib/PublicInbox/LeiStore.pm
 lib/PublicInbox/Linkify.pm
 lib/PublicInbox/Listener.pm
 lib/PublicInbox/Lock.pm
@@ -319,6 +321,7 @@ t/init.t
 t/iso-2202-jp.eml
 t/kqnotify.t
 t/lei.t
+t/lei_store.t
 t/linkify.t
 t/main-bin/spamc
 t/mda-mime.eml
diff --git a/lib/PublicInbox/ExtSearch.pm b/lib/PublicInbox/ExtSearch.pm
index 2a560935..410ae958 100644
--- a/lib/PublicInbox/ExtSearch.pm
+++ b/lib/PublicInbox/ExtSearch.pm
@@ -17,13 +17,13 @@ use DBI qw(:sql_types); # SQL_BLOB
 use parent qw(PublicInbox::Search);
 
 sub new {
-	my (undef, $topdir) = @_;
+	my ($class, $topdir) = @_;
 	$topdir = File::Spec->canonpath($topdir);
 	bless {
 		topdir => $topdir,
 		# xpfx => 'ei15'
 		xpfx => "$topdir/ei".PublicInbox::Search::SCHEMA_VERSION
-	}, __PACKAGE__;
+	}, $class;
 }
 
 sub misc {
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index b5024823..cdd1621d 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -812,18 +812,31 @@ sub idx_init { # similar to V2Writable
 	return if $self->{idx_shards};
 
 	$self->git->cleanup;
-
+	my $mode = 0644;
 	my $ALL = $self->git->{git_dir}; # ALL.git
-	PublicInbox::Import::init_bare($ALL) unless -d $ALL;
+	my $old = -d $ALL;
+	if ($opt->{-private}) { # LeiStore
+		$mode = 0600;
+		if (!$old) {
+			umask 077; # don't bother restoring
+			PublicInbox::Import::init_bare($ALL);
+			$self->git->qx(qw(config core.sharedRepository 0600));
+		}
+	} else {
+		PublicInbox::Import::init_bare($ALL) unless $old;
+	}
 	my $info_dir = "$ALL/objects/info";
 	my $alt = "$info_dir/alternates";
-	my $mode = 0644;
 	my (@old, @new, %seen); # seen: st_dev + st_ino
 	if (-e $alt) {
 		open(my $fh, '<', $alt) or die "open $alt: $!";
 		$mode = (stat($fh))[2] & 07777;
 		while (my $line = <$fh>) {
 			chomp(my $d = $line);
+
+			# expand relative path (/local/ stuff)
+			substr($d, 0, 3) eq '../' and
+				$d = "$ALL/objects/$d";
 			if (my @st = stat($d)) {
 				next if $seen{"$st[0]\0$st[1]"}++;
 			} else {
@@ -833,6 +846,22 @@ sub idx_init { # similar to V2Writable
 			push @old, $line;
 		}
 	}
+
+	# for LeiStore, and possibly some mirror-only state
+	if (opendir(my $dh, my $local = "$self->{topdir}/local")) {
+		# highest numbered epoch first
+		for my $n (sort { $b <=> $a } map { substr($_, 0, -4) + 0 }
+				grep(/\A[0-9]+\.git\z/, readdir($dh))) {
+			my $d = "$local/$n.git/objects"; # absolute path
+			if (my @st = stat($d)) {
+				next if $seen{"$st[0]\0$st[1]"}++;
+				# favor relative paths for rename-friendliness
+				push @new, "../../local/$n.git/objects\n";
+			} else {
+				warn "W: stat($d) failed: $!\n";
+			}
+		}
+	}
 	for my $ibx (@{$self->{ibx_list}}) {
 		my $line = $ibx->git->{git_dir} . "/objects\n";
 		chomp(my $d = $line);
diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 1a226cc7..07c7baf8 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -403,6 +403,10 @@ sub add {
 	if ($smsg) {
 		$smsg->{blob} = $self->get_mark(":$blob");
 		$smsg->{raw_bytes} = $n;
+		if (my $oidx = delete $smsg->{-oidx}) { # used by LeiStore
+			return if $oidx->blob_exists($smsg->{blob});
+		}
+		# XXX do we need this? it's in git at this point
 		$smsg->{-raw_email} = \$raw_email;
 	}
 	my $ref = $self->{ref};
diff --git a/lib/PublicInbox/LeiDaemon.pm b/lib/PublicInbox/LeiDaemon.pm
index 89434cb8..20ff0758 100644
--- a/lib/PublicInbox/LeiDaemon.pm
+++ b/lib/PublicInbox/LeiDaemon.pm
@@ -42,7 +42,7 @@ our %CMD = ( # sorted in order of importance/use:
 'add-extinbox' => [ 'URL-OR-PATHNAME',
 	'add/set priority of a publicinbox|extindex for extra matches',
 	qw(prio=i) ],
-'ls-extinbox' => [ '[FILTER]', 'list publicinbox|extindex sources',
+'ls-extinbox' => [ '[FILTER]', 'list publicinbox|extindex locations',
 	qw(format|f=s z local remote) ],
 'forget-extinbox' => [ '{URL-OR-PATHNAME|--prune}',
 	'exclude further results from a publicinbox|extindex',
diff --git a/lib/PublicInbox/LeiSearch.pm b/lib/PublicInbox/LeiSearch.pm
new file mode 100644
index 00000000..c59e2e55
--- /dev/null
+++ b/lib/PublicInbox/LeiSearch.pm
@@ -0,0 +1,40 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+package PublicInbox::LeiSearch;
+use strict;
+use v5.10.1;
+use parent qw(PublicInbox::ExtSearch);
+use PublicInbox::Search;
+
+sub combined_docid ($$) {
+	my ($self, $num) = @_;
+	my $nshard = ($self->{nshard} // 1);
+	($num - 1) * $nshard  + 1;
+}
+
+sub msg_keywords {
+	my ($self, $num) = @_; # num_or_mitem
+	my $xdb = $self->xdb; # set {nshard};
+	my $docid = ref($num) ? $num->get_docid : do {
+		# get combined docid from over.num:
+		# (not generic Xapian, only works with our sharding scheme)
+		my $nshard = $self->{nshard} // 1;
+		($num - 1) * $nshard + $num % $nshard + 1;
+	};
+	my %kw;
+	eval {
+		my $end = $xdb->termlist_end($docid);
+		for (my $cur = $xdb->termlist_begin($docid);
+				$cur != $end; $cur++) {
+			$cur->skip_to('K');
+			last if $cur == $end;
+			my $kw = $cur->get_termname;
+			$kw =~ s/\AK//s and $kw{$kw} = undef;
+		}
+	};
+	warn "E: #$docid ($num): $@\n" if $@;
+	wantarray ? sort(keys(%kw)) : \%kw;
+}
+
+1;
diff --git a/lib/PublicInbox/LeiStore.pm b/lib/PublicInbox/LeiStore.pm
new file mode 100644
index 00000000..56f668b8
--- /dev/null
+++ b/lib/PublicInbox/LeiStore.pm
@@ -0,0 +1,197 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# Local storage (cache/memo) for lei(1), suitable for personal/private
+# mail iff on encrypted device/FS.  Based on v2, but only deduplicates
+# based on git OID.
+#
+# for xref3, the following are constant: $eidx_key = '.', $xnum = -1
+package PublicInbox::LeiStore;
+use strict;
+use v5.10.1;
+use parent qw(PublicInbox::Lock);
+use PublicInbox::SearchIdx qw(crlf_adjust);
+use PublicInbox::ExtSearchIdx;
+use PublicInbox::Import;
+use PublicInbox::InboxWritable;
+use PublicInbox::V2Writable;
+use PublicInbox::ContentHash qw(content_hash);
+use PublicInbox::MID qw(mids);
+use PublicInbox::LeiSearch;
+
+sub new {
+	my (undef, $dir, $opt) = @_;
+	my $eidx = PublicInbox::ExtSearchIdx->new($dir, $opt);
+	bless { priv_eidx => $eidx }, __PACKAGE__;
+}
+
+sub git { $_[0]->{priv_eidx}->git } # read-only
+
+sub packing_factor { $PublicInbox::V2Writable::PACKING_FACTOR }
+
+sub rotate_bytes {
+	$_[0]->{rotate_bytes} // ((1024 * 1024 * 1024) / $_[0]->packing_factor)
+}
+
+sub git_pfx { "$_[0]->{priv_eidx}->{topdir}/local" };
+
+sub git_epoch_max  {
+	my ($self) = @_;
+	my $pfx = $self->git_pfx;
+	my $max = 0;
+	return $max unless -d $pfx ;
+	opendir my $dh, $pfx or die "opendir $pfx: $!\n";
+	while (defined(my $git_dir = readdir($dh))) {
+		$git_dir =~ m!\A([0-9]+)\.git\z! or next;
+		$max = $1 + 0 if $1 > $max;
+	}
+	$max;
+}
+
+sub importer {
+	my ($self) = @_;
+	my $max;
+	my $im = $self->{im};
+	if ($im) {
+		return $im if $im->{bytes_added} < $self->rotate_bytes;
+
+		delete $self->{im};
+		$im->done;
+		undef $im;
+		$self->checkpoint;
+		$max = $self->git_epoch_max + 1;
+	}
+	my $pfx = $self->git_pfx;
+	$max //= $self->git_epoch_max;
+	while (1) {
+		my $latest = "$pfx/$max.git";
+		my $old = -e $latest;
+		my $git = PublicInbox::Git->new($latest);
+		PublicInbox::Import::init_bare({ git => $git });
+		$git->qx(qw(config core.sharedRepository 0600)) if !$old;
+		my $packed_bytes = $git->packed_bytes;
+		my $unpacked_bytes = $packed_bytes / $self->packing_factor;
+		if ($unpacked_bytes >= $self->rotate_bytes) {
+			$max++;
+			next;
+		}
+		chomp(my $i = $git->qx(qw(var GIT_COMMITTER_IDENT)));
+		die "$git->{git_dir} GIT_COMMITTER_IDENT failed\n" if $?;
+		my ($n, $e) = ($i =~ /\A(.+) <([^>]+)> [0-9]+ [-\+]?[0-9]+$/g)
+			or die "could not extract name/email from `$i'\n";
+		$self->{im} = $im = PublicInbox::Import->new($git, $n, $e);
+		$im->{bytes_added} = int($packed_bytes / $self->packing_factor);
+		$im->{lock_path} = undef;
+		$im->{path_type} = 'v2';
+		return $im;
+	}
+}
+
+sub search {
+	PublicInbox::LeiSearch->new($_[0]->{priv_eidx}->{topdir});
+}
+
+sub eidx_init {
+	my ($self) = @_;
+	my $eidx = $self->{priv_eidx};
+	$eidx->idx_init({-private => 1});
+	$eidx;
+}
+
+sub _docids_for ($$) {
+	my ($self, $eml) = @_;
+	my %docids;
+	my $chash = content_hash($eml);
+	my $eidx = eidx_init($self);
+	my $oidx = $eidx->{oidx};
+	my $im = $self->{im};
+	for my $mid (@{mids($eml)}) {
+		my ($id, $prev);
+		while (my $cur = $oidx->next_by_mid($mid, \$id, \$prev)) {
+			my $oid = $cur->{blob};
+			my $docid = $cur->{num};
+			my $bref = $im ? $im->cat_blob($oid) : undef;
+			$bref //= $eidx->git->cat_file($oid) // do {
+				warn "W: $oid (#$docid) <$mid> not found\n";
+				next;
+			};
+			local $self->{current_info} = $oid;
+			my $x = PublicInbox::Eml->new($bref);
+			$docids{$docid} = $docid if content_hash($x) eq $chash;
+		}
+	}
+	sort { $a <=> $b } values %docids;
+}
+
+sub set_eml_keywords {
+	my ($self, $eml, @kw) = @_;
+	my $eidx = eidx_init($self);
+	my @docids = _docids_for($self, $eml);
+	for my $docid (@docids) {
+		$eidx->idx_shard($docid)->shard_set_keywords($docid, @kw);
+	}
+	\@docids;
+}
+
+sub add_eml_keywords {
+	my ($self, $eml, @kw) = @_;
+	my $eidx = eidx_init($self);
+	my @docids = _docids_for($self, $eml);
+	for my $docid (@docids) {
+		$eidx->idx_shard($docid)->shard_add_keywords($docid, @kw);
+	}
+	\@docids;
+}
+
+sub remove_eml_keywords {
+	my ($self, $eml, @kw) = @_;
+	my $eidx = eidx_init($self);
+	my @docids = _docids_for($self, $eml);
+	for my $docid (@docids) {
+		$eidx->idx_shard($docid)->shard_remove_keywords($docid, @kw);
+	}
+	\@docids;
+}
+
+sub add_eml {
+	my ($self, $eml) = @_;
+	my $eidx = eidx_init($self);
+	my $oidx = $eidx->{oidx};
+	my $smsg = bless { -oidx => $oidx }, 'PublicInbox::Smsg';
+	my $im = $self->importer;
+	$im->add($eml, undef, $smsg) or return; # duplicate returns undef
+	my $msgref = delete $smsg->{-raw_email};
+	$smsg->{bytes} = $smsg->{raw_bytes} + crlf_adjust($$msgref);
+
+	local $self->{current_info} = $smsg->{blob};
+	if (my @docids = _docids_for($self, $eml)) {
+		for my $docid (@docids) {
+			my $idx = $eidx->idx_shard($docid);
+			$oidx->add_xref3($docid, -1, $smsg->{blob}, '.');
+			$idx->shard_add_eidx_info($docid, '.', $eml); # List-Id
+		}
+	} else {
+		$smsg->{num} = $oidx->adj_counter('eidx_docid', '+');
+		$oidx->add_overview($eml, $smsg);
+		$oidx->add_xref3($smsg->{num}, -1, $smsg->{blob}, '.');
+		my $idx = $eidx->idx_shard($smsg->{num});
+		$idx->index_raw($msgref, $eml, $smsg);
+	}
+	$smsg->{blob}
+}
+
+sub done {
+	my ($self) = @_;
+	my $err = '';
+	if (my $im = delete($self->{im})) {
+		eval { $im->done };
+		if ($@) {
+			$err .= "import done: $@\n";
+			warn $err;
+		}
+	}
+	$self->{priv_eidx}->done;
+	die $err if $err;
+}
+
+1;
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index 4a39bf53..c8630ddb 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -684,4 +684,14 @@ DELETE FROM eidxq WHERE docid = ?
 
 }
 
+sub blob_exists {
+	my ($self, $oidhex) = @_;
+	my $sth = $self->dbh->prepare_cached(<<'', undef, 1);
+SELECT COUNT(*) FROM xref3 WHERE oidbin = ?
+
+	$sth->bind_param(1, pack('H*', $oidhex), SQL_BLOB);
+	$sth->execute;
+	$sth->fetchrow_array;
+}
+
 1;
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index c6d2a0e8..ad71bc13 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -1,6 +1,6 @@
 # Copyright (C) 2015-2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-# based on notmuch, but with no concept of folders, files or flags
+# based on notmuch, but with no concept of folders, files
 #
 # Indexes mail with Xapian and our (SQLite-based) ::Msgmap for use
 # with the web and NNTP interfaces.  This index maintains thread
@@ -371,7 +371,7 @@ sub eml2doc ($$$;$) {
 	index_headers($self, $smsg);
 
 	if (defined(my $eidx_key = $smsg->{eidx_key})) {
-		$doc->add_boolean_term('O'.$eidx_key);
+		$doc->add_boolean_term('O'.$eidx_key) if $eidx_key ne '.';
 	}
 	msg_iter($eml, \&index_xapian, [ $self, $doc ]);
 	index_ids($self, $doc, $eml, $mids);
@@ -467,7 +467,7 @@ sub add_eidx_info {
 	begin_txn_lazy($self);
 	my $doc = _get_doc($self, $docid) or return;
 	term_generator($self)->set_document($doc);
-	$doc->add_boolean_term('O'.$eidx_key);
+	$doc->add_boolean_term('O'.$eidx_key) if $eidx_key ne '.';
 	index_list_id($self, $doc, $eml);
 	$self->{xdb}->replace_document($docid, $doc);
 }
@@ -501,6 +501,47 @@ sub remove_eidx_info {
 	$self->{xdb}->replace_document($docid, $doc);
 }
 
+sub set_keywords {
+	my ($self, $docid, @kw) = @_;
+	begin_txn_lazy($self);
+	my $doc = _get_doc($self, $docid) or return;
+	my %keep = map { $_ => 1 } @kw;
+	my %add = %keep;
+	my @rm;
+	my $end = $doc->termlist_end;
+	for (my $cur = $doc->termlist_begin; $cur != $end; $cur++) {
+		$cur->skip_to('K');
+		last if $cur == $end;
+		my $kw = $cur->get_termname;
+		$kw =~ s/\AK//s or next;
+		$keep{$kw} ? delete($add{$kw}) : push(@rm, $kw);
+	}
+	return unless (scalar(@rm) + scalar(keys %add));
+	$doc->remove_term('K'.$_) for @rm;
+	$doc->add_boolean_term('K'.$_) for (keys %add);
+	$self->{xdb}->replace_document($docid, $doc);
+}
+
+sub add_keywords {
+	my ($self, $docid, @kw) = @_;
+	begin_txn_lazy($self);
+	my $doc = _get_doc($self, $docid) or return;
+	$doc->add_boolean_term('K'.$_) for @kw;
+	$self->{xdb}->replace_document($docid, $doc);
+}
+
+sub remove_keywords {
+	my ($self, $docid, @kw) = @_;
+	begin_txn_lazy($self);
+	my $doc = _get_doc($self, $docid) or return;
+	my $replace;
+	eval {
+		$doc->remove_term('K'.$_);
+		$replace = 1
+	} for @kw;
+	$self->{xdb}->replace_document($docid, $doc) if $replace;
+}
+
 sub get_val ($$) {
 	my ($doc, $col) = @_;
 	sortable_unserialise($doc->get_value($col));
diff --git a/lib/PublicInbox/SearchIdxShard.pm b/lib/PublicInbox/SearchIdxShard.pm
index 2e654769..87b0bad6 100644
--- a/lib/PublicInbox/SearchIdxShard.pm
+++ b/lib/PublicInbox/SearchIdxShard.pm
@@ -89,6 +89,12 @@ sub shard_worker_loop ($$$$$) {
 			my ($len, $docid, $eidx_key) = split(/ /, $line, 3);
 			$self->remove_eidx_info($docid, $eidx_key,
 							eml($r, $len));
+		} elsif ($line =~ s/\A=K (\d+) //) {
+			$self->set_keywords($1 + 0, split(/ /, $line));
+		} elsif ($line =~ s/\A-K (\d+) //) {
+			$self->remove_keywords($1 + 0, split(/ /, $line));
+		} elsif ($line =~ s/\A\+K (\d+) //) {
+			$self->add_keywords($1 + 0, split(/ /, $line));
 		} elsif ($line =~ s/\AO ([^\n]+)//) {
 			my $over_fn = $1;
 			$over_fn =~ tr/\0/\n/;
@@ -210,6 +216,33 @@ sub shard_remove {
 	}
 }
 
+sub shard_set_keywords {
+	my ($self, $docid, @kw) = @_;
+	if (my $w = $self->{w}) { # triggers set_keywords in a shard child
+		print $w "=K $docid @kw\n" or die "failed to write: $!";
+	} else { # same process
+		$self->set_keywords($docid, @kw);
+	}
+}
+
+sub shard_remove_keywords {
+	my ($self, $docid, @kw) = @_;
+	if (my $w = $self->{w}) { # triggers remove_keywords in a shard child
+		print $w "-K $docid @kw\n" or die "failed to write: $!";
+	} else { # same process
+		$self->remove_keywords($docid, @kw);
+	}
+}
+
+sub shard_add_keywords {
+	my ($self, $docid, @kw) = @_;
+	if (my $w = $self->{w}) { # triggers add_keywords in a shard child
+		print $w "+K $docid @kw\n" or die "failed to write: $!";
+	} else { # same process
+		$self->add_keywords($docid, @kw);
+	}
+}
+
 sub shard_over_check {
 	my ($self, $over) = @_;
 	if (my $w = $self->{w}) { # triggers remove_by_docid in a shard child
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 992305c5..b98b4695 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -24,7 +24,7 @@ use File::Temp ();
 
 my $OID = qr/[a-f0-9]{40,}/;
 # an estimate of the post-packed size to the raw uncompressed size
-my $PACKING_FACTOR = 0.4;
+our $PACKING_FACTOR = 0.4;
 
 # SATA storage lags behind what CPUs are capable of, so relying on
 # nproc(1) can be misleading and having extra Xapian shards is a
diff --git a/t/lei_store.t b/t/lei_store.t
new file mode 100644
index 00000000..c18a9620
--- /dev/null
+++ b/t/lei_store.t
@@ -0,0 +1,74 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use v5.10.1;
+use Test::More;
+use PublicInbox::TestCommon;
+require_mods(qw(DBD::SQLite Search::Xapian));
+require_git 2.6;
+require_ok 'PublicInbox::LeiStore';
+require_ok 'PublicInbox::ExtSearch';
+my ($home, $for_destroy) = tmpdir();
+my $opt = { 1 => \(my $out = ''), 2 => \(my $err = '') };
+my $store_dir = "$home/lst";
+my $lst = PublicInbox::LeiStore->new($store_dir, { creat => 1 });
+ok($lst, '->new');
+my $oid = $lst->add_eml(eml_load('t/data/0001.patch'));
+like($oid, qr/\A[0-9a-f]+\z/, 'add returned OID');
+my $eml = eml_load('t/data/0001.patch');
+is($lst->add_eml($eml), undef, 'idempotent');
+$lst->done;
+{
+	my $es = $lst->search;
+	my $msgs = $es->over->query_xover(0, 1000);
+	is(scalar(@$msgs), 1, 'one message');
+	is($msgs->[0]->{blob}, $oid, 'blob matches');
+	my $mset = $es->mset("mid:$msgs->[0]->{mid}");
+	is($mset->size, 1, 'search works');
+	is_deeply($es->mset_to_artnums($mset), [ $msgs->[0]->{num} ],
+		'mset_to_artnums');
+	my @kw = $es->msg_keywords(($mset->items)[0]);
+	is_deeply(\@kw, [], 'no flags');
+}
+
+for my $parallel (0, 1) {
+	$lst->{priv_eidx}->{parallel} = $parallel;
+	my $docids = $lst->set_eml_keywords($eml, qw(seen draft));
+	is(scalar @$docids, 1, 'set keywords on one doc');
+	$lst->done;
+	my @kw = $lst->search->msg_keywords($docids->[0]);
+	is_deeply(\@kw, [qw(draft seen)], 'kw matches');
+
+	$docids = $lst->add_eml_keywords($eml, qw(seen draft));
+	$lst->done;
+	is(scalar @$docids, 1, 'idempotently added keywords to doc');
+	@kw = $lst->search->msg_keywords($docids->[0]);
+	is_deeply(\@kw, [qw(draft seen)], 'kw matches after noop');
+
+	$docids = $lst->remove_eml_keywords($eml, qw(seen draft));
+	is(scalar @$docids, 1, 'removed from one doc');
+	$lst->done;
+	@kw = $lst->search->msg_keywords($docids->[0]);
+	is_deeply(\@kw, [], 'kw matches after remove');
+
+	$docids = $lst->remove_eml_keywords($eml, qw(answered));
+	is(scalar @$docids, 1, 'removed from one doc (idempotently)');
+	$lst->done;
+	@kw = $lst->search->msg_keywords($docids->[0]);
+	is_deeply(\@kw, [], 'kw matches after remove (idempotent)');
+
+	$docids = $lst->add_eml_keywords($eml, qw(answered));
+	is(scalar @$docids, 1, 'added to empty doc');
+	$lst->done;
+	@kw = $lst->search->msg_keywords($docids->[0]);
+	is_deeply(\@kw, ['answered'], 'kw matches after add');
+
+	$docids = $lst->set_eml_keywords($eml);
+	is(scalar @$docids, 1, 'set to clobber');
+	$lst->done;
+	@kw = $lst->search->msg_keywords($docids->[0]);
+	is_deeply(\@kw, [], 'set clobbers all');
+}
+
+done_testing;
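
Reader's note on the docid arithmetic in msg_keywords: as far as I
understand Xapian, a combined (multi-shard) database interleaves
documents, so shard-local docid d in 0-based shard s of n shards shows
up as (d - 1) * n + s + 1.  The inline formula in msg_keywords matches
that if a message lives in shard `num % nshard' under a shard-local
docid equal to its over.num.  The sketch below only replays that
arithmetic; the sub names are hypothetical.

```perl
use strict;
use warnings;

# Same arithmetic as the inline computation in msg_keywords
sub num2combined {
	my ($num, $nshard) = @_;
	($num - 1) * $nshard + $num % $nshard + 1;
}

# Inverse, assuming the interleaving described in the note above
sub combined2shard_local {
	my ($docid, $nshard) = @_;
	my $shard = ($docid - 1) % $nshard;
	my $local = int(($docid - 1) / $nshard) + 1;
	($shard, $local);
}

for my $num (1..4) {
	my $docid = num2combined($num, 3);
	my ($shard, $local) = combined2shard_local($docid, 3);
	print "num=$num docid=$docid shard=$shard local=$local\n";
}
```

With three shards, num=4 maps to docid 11 and back to shard 1, local 4.
Note that combined_docid as posted in LeiSearch.pm omits the
`$num % $nshard' term, so the two computations agree only when that
remainder is zero (always the case with a single shard).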

* [RFC 6/7] tests: more common JSON module loading
  2020-12-15 11:47 [PATCH/RFC 0/7] lei - Local Email Interface skeleton Eric Wong
                   ` (4 preceding siblings ...)
  2020-12-15 11:47 ` [RFC 5/7] lei_store: local storage for Local Email Interface Eric Wong
@ 2020-12-15 11:47 ` Eric Wong
  2020-12-15 11:47 ` [RFC 7/7] lei: use spawn (vfork + execve) for lazy start Eric Wong
  2020-12-15 12:05 ` more considerations in UI/UX Eric Wong
  7 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2020-12-15 11:47 UTC (permalink / raw)
  To: meta

We'll probably be using JSON more in the future, so make
it easier to require in tests.
---
 lib/PublicInbox/ManifestJsGz.pm | 2 +-
 lib/PublicInbox/TestCommon.pm   | 4 ++++
 t/extsearch.t                   | 3 +--
 t/lei.t                         | 3 +--
 t/www_listing.t                 | 8 +++-----
 5 files changed, 10 insertions(+), 10 deletions(-)
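
Reader's note: the `json' alias expands to an `||'-separated list of
candidate modules.  How require_mods resolves such a list is outside
the hunk shown here, so the following is only a sketch of the general
"first module that loads wins" idea, with a hypothetical helper name
(first_loadable).

```perl
use strict;
use warnings;

# Hypothetical helper: return the first module in an '||'-separated
# spec that can be loaded, or undef if none can.
sub first_loadable {
	my ($spec) = @_;
	for my $mod (split(/\|\|/, $spec)) {
		(my $file = "$mod.pm") =~ s!::!/!g;
		return $mod if eval { require $file; 1 };
	}
	undef;
}

my $spec = 'Cpanel::JSON::XS||JSON::MaybeXS||JSON||JSON::PP';
my $mod = first_loadable($spec) // die "no JSON module available\n";
print "using $mod\n";
print $mod->new->canonical->encode({ ok => 1 }), "\n";
```

JSON::PP ships with recent Perl releases, so the last alternative acts
as a fallback when none of the faster modules are installed.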

diff --git a/lib/PublicInbox/ManifestJsGz.pm b/lib/PublicInbox/ManifestJsGz.pm
index 6d5b57ee..33df020a 100644
--- a/lib/PublicInbox/ManifestJsGz.pm
+++ b/lib/PublicInbox/ManifestJsGz.pm
@@ -11,7 +11,7 @@ use PublicInbox::Config;
 use IO::Compress::Gzip qw(gzip);
 use HTTP::Date qw(time2str);
 
-our $json = PublicInbox::Config::json();
+my $json = PublicInbox::Config::json();
 
 # called by WwwListing
 sub url_regexp {
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index 299b9c6a..2116575b 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -75,6 +75,10 @@ sub require_mods {
 	my $maybe = pop @mods if $mods[-1] =~ /\A[0-9]+\z/;
 	my @need;
 	while (my $mod = shift(@mods)) {
+		if ($mod eq 'json') {
+			$mod = 'Cpanel::JSON::XS||JSON::MaybeXS||'.
+				'JSON||JSON::PP'
+		}
 		if ($mod eq 'Search::Xapian') {
 			if (eval { require PublicInbox::Search } &&
 				PublicInbox::Search::load_xapian()) {
diff --git a/t/extsearch.t b/t/extsearch.t
index fb31b0ab..ffbc10e2 100644
--- a/t/extsearch.t
+++ b/t/extsearch.t
@@ -8,9 +8,8 @@ use PublicInbox::Config;
 use PublicInbox::Search;
 use PublicInbox::InboxWritable;
 use Fcntl qw(:seek);
-my $json = PublicInbox::Config::json() or plan skip_all => 'JSON missing';
 require_git(2.6);
-require_mods(qw(DBD::SQLite Search::Xapian));
+require_mods(qw(json DBD::SQLite Search::Xapian));
 use_ok 'PublicInbox::ExtSearch';
 use_ok 'PublicInbox::ExtSearchIdx';
 use_ok 'PublicInbox::OverIdx';
diff --git a/t/lei.t b/t/lei.t
index feee9270..02f21322 100644
--- a/t/lei.t
+++ b/t/lei.t
@@ -6,8 +6,7 @@ use v5.10.1;
 use Test::More;
 use PublicInbox::TestCommon;
 use PublicInbox::Config;
-my $json = PublicInbox::Config::json() or plan skip_all => 'JSON missing';
-require_mods(qw(DBD::SQLite Search::Xapian));
+require_mods(qw(json DBD::SQLite Search::Xapian));
 my ($home, $for_destroy) = tmpdir();
 my $opt = { 1 => \(my $out = ''), 2 => \(my $err = '') };
 
diff --git a/t/www_listing.t b/t/www_listing.t
index 63613371..94c1e5bb 100644
--- a/t/www_listing.t
+++ b/t/www_listing.t
@@ -7,14 +7,12 @@ use Test::More;
 use PublicInbox::Spawn qw(which);
 use PublicInbox::TestCommon;
 use PublicInbox::Import;
-require_mods(qw(URI::Escape Plack::Builder Digest::SHA
+require_mods(qw(json URI::Escape Plack::Builder Digest::SHA
 		IO::Compress::Gzip IO::Uncompress::Gunzip HTTP::Tiny));
 require PublicInbox::WwwListing;
 require PublicInbox::ManifestJsGz;
-my $json = do {
-	no warnings 'once';
-	$PublicInbox::ManifestJsGz::json;
-} or plan skip_all => "JSON module missing";
+use PublicInbox::Config;
+my $json = PublicInbox::Config::json();
 
 use_ok 'PublicInbox::Git';
 

* [RFC 7/7] lei: use spawn (vfork + execve) for lazy start
  2020-12-15 11:47 [PATCH/RFC 0/7] lei - Local Email Interface skeleton Eric Wong
                   ` (5 preceding siblings ...)
  2020-12-15 11:47 ` [RFC 6/7] tests: more common JSON module loading Eric Wong
@ 2020-12-15 11:47 ` Eric Wong
  2020-12-15 12:05 ` more considerations in UI/UX Eric Wong
  7 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2020-12-15 11:47 UTC (permalink / raw)
  To: meta

This allows us to rely on FD_CLOEXEC being set on pipes
from prove(1), so forgetting `daemon-stop' won't cause
tests to hang.

Unfortunately, daemon tests will be slower with this.
---
 lib/PublicInbox/LeiDaemon.pm | 12 +++++-------
 script/lei                   | 14 ++++++++++----
 2 files changed, 15 insertions(+), 11 deletions(-)
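
Reader's note: a plain fork() leaves every inherited descriptor open in
the child, whereas descriptors flagged FD_CLOEXEC are closed once the
child calls execve.  The sketch below only inspects that flag on a
freshly created pipe; the helper name is_cloexec is hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFD FD_CLOEXEC);

# True if $fh would be closed automatically across execve
sub is_cloexec {
	my ($fh) = @_;
	my $flags = fcntl($fh, F_GETFD, 0) // die "fcntl(F_GETFD): $!";
	($flags & FD_CLOEXEC) ? 1 : 0;
}

# Perl sets FD_CLOEXEC on new descriptors numbered above $^F (default 2)
pipe(my ($r, $w)) or die "pipe: $!";
printf("fd=%d cloexec=%d\n", fileno($_), is_cloexec($_)) for ($r, $w);
```

When stdin, stdout and stderr are already open, both pipe ends land
above $^F and should report cloexec=1.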

diff --git a/lib/PublicInbox/LeiDaemon.pm b/lib/PublicInbox/LeiDaemon.pm
index 20ff0758..2f614ba4 100644
--- a/lib/PublicInbox/LeiDaemon.pm
+++ b/lib/PublicInbox/LeiDaemon.pm
@@ -335,29 +335,27 @@ sub accept_dispatch { # Listener {post_accept} callback
 sub noop {}
 
 # lei(1) calls this when it can't connect
-sub lazy_start ($$) {
+sub lazy_start {
 	my ($path, $err) = @_;
 	if ($err == ECONNREFUSED) {
 		unlink($path) or die "unlink($path): $!";
 	} elsif ($err != ENOENT) {
 		die "connect($path): $!";
 	}
+	require IO::FDPass;
 	my $umask = umask(077) // die("umask(077): $!");
 	my $l = IO::Socket::UNIX->new(Local => $path,
 					Listen => 1024,
 					Type => SOCK_STREAM) or
 		$err = $!;
 	umask($umask) or die("umask(restore): $!");
-	$l or return $err;
+	$l or return die "bind($path): $err";
 	my @st = stat($path) or die "stat($path): $!";
 	my $dev_ino_expect = pack('dd', $st[0], $st[1]); # dev+ino
 	pipe(my ($eof_r, $eof_w)) or die "pipe: $!";
 	my $oldset = PublicInbox::Sigfd::block_signals();
 	my $pid = fork // die "fork: $!";
-	if ($pid) {
-		PublicInbox::Sigfd::sig_setmask($oldset);
-		return; # client will connect to $path
-	}
+	return if $pid;
 	openlog($path, 'pid', 'user');
 	local $SIG{__DIE__} = sub {
 		syslog('crit', "@_");
@@ -371,7 +369,7 @@ sub lazy_start ($$) {
 	open STDERR, '>&STDIN' or die "redirect stderr failed: $!\n";
 	setsid();
 	$pid = fork // die "fork: $!";
-	exit if $pid;
+	return if $pid;
 	$0 = "lei-daemon $path";
 	require PublicInbox::Listener;
 	require PublicInbox::EOFpipe;
diff --git a/script/lei b/script/lei
index 1b5af3a1..637c1951 100755
--- a/script/lei
+++ b/script/lei
@@ -21,13 +21,19 @@ if (eval { require IO::FDPass; 1 }) { # use daemon to reduce load time
 	};
 	my $sock = IO::Socket::UNIX->new(Peer => $path, Type => SOCK_STREAM);
 	unless ($sock) { # start the daemon if not started
-		my $err = $!;
-		require PublicInbox::LeiDaemon;
-		$err = PublicInbox::LeiDaemon::lazy_start($path, $err);
+		my $err = $! + 0;
+		my $env = { PERL5LIB => join(':', @INC) };
+		my $cmd = [ $^X, qw[-MPublicInbox::LeiDaemon
+			-E PublicInbox::LeiDaemon::lazy_start(@ARGV)],
+			$path, $err ];
+		require PublicInbox::Spawn;
+		waitpid(PublicInbox::Spawn::spawn($cmd, $env), 0);
+		warn "lei-daemon exited with \$?=$?\n" if $?;
+
 		# try connecting again anyways, unlink+bind may be racy
 		$sock = IO::Socket::UNIX->new(Peer => $path,
 						Type => SOCK_STREAM) // die
-			"connect($path): $! (bind($path): $err)";
+			"connect($path): $! (after attempted daemon start)";
 	}
 	my $pwd = $ENV{PWD};
 	my $cwd = cwd();

* more considerations in UI/UX...
  2020-12-15 11:47 [PATCH/RFC 0/7] lei - Local Email Interface skeleton Eric Wong
                   ` (6 preceding siblings ...)
  2020-12-15 11:47 ` [RFC 7/7] lei: use spawn (vfork + execve) for lazy start Eric Wong
@ 2020-12-15 12:05 ` Eric Wong
  2020-12-23  5:42   ` Kyle Meyer
  7 siblings, 1 reply; 17+ messages in thread
From: Eric Wong @ 2020-12-15 12:05 UTC (permalink / raw)
  To: meta

some rambling, haven't been able to sleep well all year :<

* latency - startup time hurts, especially in Perl.
  There's also DB opens and disk seeks regardless of
  language.  libgit2 has some built-in caching so a
  persistent daemon may help, here.

* shortcuts/names; for two-handed users on QWERTY, 'lei'
  can be typed with alternate hands with 'l' as a home key
  and the 'e' and 'i' being close to home keys.

  'query' is hard-to-type and will have 'q' as a builtin alias
  (matching the 'q=' query parameter of our WWW UI),
  'show' may have 's', matching /$INBOX/$OID/s/ (solver) URLs

  ... or, can 'q' and 's' be the commands w/o a long form?
  Neither 'show' nor 'query' are search-engine friendly,
  so "lei q" and "lei s" may be better.

* consistency/familiarity - steal ideas from other software
  built-in help, auto-pager/color,
  `q=' is stolen from web search engines,
  search term prefixes (f:, t:, ...) stolen from mairix

  Stuff I don't know but know other users use: Emacs / Gnus

  notmuch - I've only read the code since Maildir can't scale

* Re: more considerations in UI/UX...
  2020-12-15 12:05 ` more considerations in UI/UX Eric Wong
@ 2020-12-23  5:42   ` Kyle Meyer
  2020-12-23  9:47     ` Eric Wong
  2020-12-26 11:13     ` [RFC] lei: rename proposed "query" command to "q", add JSON output Eric Wong
  0 siblings, 2 replies; 17+ messages in thread
From: Kyle Meyer @ 2020-12-23  5:42 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

I'm still digesting this (and the follow-up thread), but all of this
looks really great and exciting.

Eric Wong writes:

> * consistency/familiarity - steal ideas from other software
>   built-in help, auto-pager/color,
>   `q=' is stolen from web search engines,
>   search term prefixes (f:, t:, ...) stolen from mairix
>
>   Stuff I don't know but know other users use: Emacs / Gnus
>
>   notmuch - I've only read the code since Maildir can't scale

I use Gnus for NNTP (a mixture of public-inbox and Gmane stuff).  In the
context of lei, I'm not sure there's a whole lot to borrow from Gnus.

I also use notmuch (via its Emacs interface).  As someone that will
probably write an Emacs interface for lei (as part of piem), an aspect
of notmuch that I'd be grateful to see in lei is a structured output
format for easier parsing and for conveying the thread layout.  `notmuch
show' and `notmuch search' have json and S-expressions.  I wouldn't
expect to see S-expressions coming out of lei :), but perhaps json would
be on the table for `lei show' and `lei query' given that it's planned
for $ls_format.
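
For a consumer, "conveying the thread layout" mostly comes down to
walking nested lists.  Here is a rough sketch in Perl, assuming the
`notmuch show' layout: a list of threads, each a list of
[ message, [ replies... ] ] pairs.  The sample data is trimmed and the
subjects are abbreviated; the sub name walk is hypothetical.

```perl
use strict;
use warnings;
use JSON::PP ();

# Trimmed sample in the notmuch show layout (subjects abbreviated)
my $json = <<'EOF';
[[[{"id": "20201215114722.27400-1-e@80x24.org",
    "headers": {"Subject": "[PATCH/RFC 0/7] lei"}},
   [[{"id": "20201215114722.27400-2-e@80x24.org",
      "headers": {"Subject": "[PATCH 1/7] daemon"}}, []],
    [{"id": "20201215120544.GA8927@dcvr",
      "headers": {"Subject": "more considerations in UI/UX..."}}, []]]]]]
EOF

# Depth-first walk; appends "indent + subject" lines to @$out
sub walk {
	my ($nodes, $depth, $out) = @_;
	for my $node (@$nodes) {
		my ($msg, $replies) = @$node;
		push @$out, ('  ' x $depth) . $msg->{headers}->{Subject};
		walk($replies // [], $depth + 1, $out);
	}
	$out;
}

my $threads = JSON::PP->new->decode($json);
my @lines = map { @{walk($_, 0, [])} } @$threads;
print "$_\n" for @lines;
```

Each reply is printed two spaces deeper than its parent, so the thread
shape survives without any knowledge of References: headers.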

Just for reference, here's an example of (heavily edited) `notmuch show'
output for this thread:

  $ notmuch show --entire-thread=true --body=false --format=json \
    id:20201215120544.GA8927@dcvr
  [[[
    {"id": "20201215114722.27400-1-e@80x24.org",
     "match": false,
     "excluded": false,
     "filename": ["..."],
     "timestamp": 1608032835,
     "date_relative": "December 15",
     "tags": [],
     "crypto": {},
     "headers": {
       "Subject": "[PATCH/RFC 0/7] lei - Local Email Interface skeleton",
       "From": "Eric Wong <e@80x24.org>",
       "To": "meta@public-inbox.org",
       "Date": "Tue, 15 Dec 2020 11:47:15 +0000"}},
    [[
        {"id": "20201215114722.27400-2-e@80x24.org",
         "match": false,
         "excluded": false,
         "filename": ["..."],
         "timestamp": 1608032836,
         "date_relative": "December 15",
         "tags": [],
         "crypto": {},
         "headers": {
           "Subject": "[PATCH 1/7] daemon: support --daemonize without Net::Server::Daemonize",
           "From": "Eric Wong <e@80x24.org>",
           "To": "meta@public-inbox.org",
           "Date": "Tue, 15 Dec 2020 11:47:16 +0000"}},
        []
      ],
      ...
      [
        {"id": "20201215120544.GA8927@dcvr",
         "match": true,
         "excluded": false,
         "filename": ["..."],
         "timestamp": 1608033944,
         "date_relative": "December 15",
         "tags": [],
         "crypto": {},
         "headers": {
           "Subject": "more considerations in UI/UX...",
           "From": "Eric Wong <e@80x24.org>",
           "To": "meta@public-inbox.org",
           "Date": "Tue, 15 Dec 2020 12:05:44 +0000"}}]]]]]


* Re: more considerations in UI/UX...
  2020-12-23  5:42   ` Kyle Meyer
@ 2020-12-23  9:47     ` Eric Wong
  2020-12-23 15:49       ` Kyle Meyer
  2020-12-26 11:13     ` [RFC] lei: rename proposed "query" command to "q", add JSON output Eric Wong
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Wong @ 2020-12-23  9:47 UTC (permalink / raw)
  To: Kyle Meyer; +Cc: meta

Kyle Meyer <kyle@kyleam.com> wrote:
> I'm still digesting this (and the follow-up thread), but all of this
> looks really great and exciting.

Great to know; it's been floating in the back of my mind for a
few years.

> Eric Wong writes:
> 
> > * consistency/familiarity - steal ideas from other software
> >   built-in help, auto-pager/color,
> >   `q=' is stolen from web search engines,
> >   search term prefixes (f:, t:, ...) stolen from mairix
> >
> >   Stuff I don't know but know other users use: Emacs / Gnus
> >
> >   notmuch - I've only read the code since Maildir can't scale
> 
> I use Gnus for NNTP (a mixture of public-inbox and Gmane stuff).  In the
> context of lei, I'm not sure there's a whole lot to borrow from Gnus.
> 
> I also use notmuch (via its Emacs interface).  As someone that will
> probably write an Emacs interface for lei (as part of piem), an aspect
> of notmuch that I'd be grateful to see in lei is a structured output
> format for easier parsing and for conveying the thread layout.  `notmuch
> show' and `notmuch search' have json and S-expressions.  I wouldn't
> expect to see S-expressions coming out of lei :), but perhaps json would
> be on the table for `lei show' and `lei query' given that it's planned
> for $ls_format.

Yes, leaving JSON out of "lei q(uery)" output was an oversight,
and probably "show", too...  I'm also wondering if there'll need
to be multiple flavors of JSON output for compatibility with
existing tools: "nm-json", "foo-json", "bar-json", etc.

(And JMAP support will be developed in parallel, too, so
 that'll need to be taken into account).

S-expressions will probably have to wait a bit (I don't know
Lisp at all :x), there's already a lot on the plate :)

> Just for reference, here's an example of (heavily edited) `notmuch show'
> output for this thread:
> 
>   $ notmuch show --entire-thread=true --body=false --format=json \
>     id:20201215120544.GA8927@dcvr
>   [[[
>     {"id": "20201215114722.27400-1-e@80x24.org",
>      "match": false,
>      "excluded": false,
>      "filename": ["..."],

"filename" could be strange for us to implement, I suppose we
could maintain a stable filename at
/path/to/some/Maildir/cur/$GIT_OBJECT_ID:2, if needed.

From what I'm used to as a mairix user, it blows away the output
folder every search (unless --augment is used); but consumers of
nm-compatible JSON would have different lifetime expectations of
the filename...
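
A minimal Python sketch of the stable-filename idea, assuming a SHA-1
object format (git hashes a blob as "blob <length>\0" followed by the
content) and the usual Maildir ":2," info suffix.  The names
`git_blob_oid' and `stable_maildir_path' are hypothetical and not part
of lei:

```python
import hashlib
import os


def git_blob_oid(raw: bytes) -> str:
    """Return the SHA-1 git blob object ID for the raw message bytes."""
    header = b"blob %d\0" % len(raw)
    return hashlib.sha1(header + raw).hexdigest()


def stable_maildir_path(maildir: str, raw: bytes, flags: str = "") -> str:
    """Build $MAILDIR/cur/$GIT_OBJECT_ID:2,$FLAGS for a message.

    The name depends only on the message content, so repeated searches
    would map the same message to the same path (flags aside).
    """
    return os.path.join(maildir, "cur", git_blob_oid(raw) + ":2," + flags)
```

Since the object ID is derived from content alone, a consumer holding
an old filename would still find the same message as long as the file
is not removed and its flags are unchanged; flag changes rename the
file, which is the usual Maildir behavior.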

>      "timestamp": 1608032835,
>      "date_relative": "December 15",
>      "tags": [],
>      "crypto": {},

There'll need to be some configurable MIME type => text
conversion mapping to translate encrypted emails, HTML parts,
PDFs, and other random formats into something Xapian can index.

Xapian omega already has something we can steal/copy from, I
think.  This should be configurable for public stuff, too.
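
One possible shape for such a mapping, as a hedged Python sketch: a
table from MIME type to a converter that returns indexable text.
`CONVERTERS', `html_to_text' and `to_indexable_text' are hypothetical
names; converters for PDFs or encrypted parts would need external
tools and are left out here:

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect character data and ignore the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(raw: bytes) -> str:
    parser = _TextOnly()
    parser.feed(raw.decode("utf-8", errors="replace"))
    parser.close()
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join(" ".join(parser.chunks).split())


# MIME type => converter; a user-configurable table could replace this.
CONVERTERS = {
    "text/plain": lambda raw: raw.decode("utf-8", errors="replace"),
    "text/html": html_to_text,
}


def to_indexable_text(mime_type: str, raw: bytes):
    """Return text for the indexer, or None if the type has no converter."""
    converter = CONVERTERS.get(mime_type.lower())
    return converter(raw) if converter else None
```

Note that this naive HTML converter also keeps the contents of
<script> and <style> elements; a real filter would want to drop those.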

The rest of the output looks reasonable to pull out of
over.sqlite3 and "tags" from Xapian.  Thanks for the feedback.


* Re: more considerations in UI/UX...
  2020-12-23  9:47     ` Eric Wong
@ 2020-12-23 15:49       ` Kyle Meyer
  0 siblings, 0 replies; 17+ messages in thread
From: Kyle Meyer @ 2020-12-23 15:49 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong writes:

> Yes, leaving JSON out of "lei q(uery)" output was an oversight,
> and probably "show", too...  I'm also wondering if there'll need
> to be multiple flavors of JSON output for compatibility with
> existing tools: "nm-json", "foo-json", "bar-json", etc..

Ah, sorry, I should have been clearer here.  Perhaps others would have a
use for tool-specific JSON, but it's really just having some sort of
structured output that I'm interested in.  With the notmuch output, I
was trying to give a sense of what it looked like in notmuch's case.
Whatever JSON output fits and makes sense for public-inbox, I can gladly
work with :)
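
To illustrate what a consumer does with the thread layout, here is a
minimal Python sketch that flattens notmuch-style nested JSON into
(depth, message) pairs.  It assumes the layout notmuch documents for
`notmuch show': a list of threads, each thread a list of top-level
nodes, each node a [message, [replies...]] pair.  `walk', `flatten'
and `SAMPLE' are hypothetical names, and a node without a replies list
(as can happen in hand-edited output) is treated as having no replies:

```python
def walk(node, depth=0):
    """Yield (depth, message) for a [message, [replies...]] node."""
    message = node[0] if node else None
    replies = node[1] if len(node) > 1 else []
    if message is not None:  # a missing message may appear as null
        yield depth, message
    for child in replies:
        yield from walk(child, depth + 1)


def flatten(thread_set):
    """Yield (depth, message) for every message in every thread."""
    for thread in thread_set:
        for top_level_node in thread:
            yield from walk(top_level_node)


# Tiny stand-in for parsed JSON output; only "id" is filled in.
SAMPLE = [[[{"id": "a"}, [[{"id": "b"}, []], [{"id": "c"}]]]]]

for depth, message in flatten(SAMPLE):
    print("  " * depth + message["id"])
```

The depth is enough for an interface to indent replies under their
parents without re-deriving the thread from References headers.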


* [RFC] lei: rename proposed "query" command to "q", add JSON output
  2020-12-23  5:42   ` Kyle Meyer
  2020-12-23  9:47     ` Eric Wong
@ 2020-12-26 11:13     ` Eric Wong
  1 sibling, 0 replies; 17+ messages in thread
From: Eric Wong @ 2020-12-26 11:13 UTC (permalink / raw)
  To: Kyle Meyer; +Cc: meta

Kyle Meyer <kyle@kyleam.com> wrote:
> I also use notmuch (via its Emacs interface).  As someone that will
> probably write an Emacs interface for lei (as part of piem), an aspect
> of notmuch that I'd be grateful to see in lei is a structured output
> format for easier parsing and for conveying the thread layout.  `notmuch
> show' and `notmuch search' have json and S-expressions.  I wouldn't
> expect to see S-expressions coming out of lei :), but perhaps json would
> be on the table for `lei show' and `lei query' given that it's planned
> for $ls_format.

OK, before I forget, JSON is added.
And an extremely long message on why I want to type less :x
----------8<---------
Subject: [PATCH] lei: rename proposed "query" command to "q", add JSON output

Using "query" as a verb may be confusing when we'll also use it
as a noun in the "<ls|rm|mv>-query" sub-commands.  "query"
is also many characters to type without tab-completion on what I
expect to be one of the most commonly used sub-commands.

Furthermore, "q" is also the common query parameter name used by
our PSGI interface, as is the case with several major web search
engines; so there's an element of familiarity there.

The name "search" was disregarded because "show" could be a
commonly used lei sub-command, too, and typing "se" for
tab-completion may be slow since two-handed typists on QWERTY
keyboards won't be able to use alternating hands.

"f" or "find" could be a possibility here, too; but we're
currently using the term "forget" as a weaker version of
"remove" or "rm", though "ignore" could be substituted for
"forget", perhaps...

Kyle Meyer noted the lack of (proposed) JSON output support
so that's been added to the proposed UI.
---
 lib/PublicInbox/LEI.pm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index b254e2c5..7002a1f7 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -68,7 +68,7 @@ sub _config_path ($) {
 # TODO: generate shell completion + help using %CMD and %OPTDESC
 # command => [ positional_args, 1-line description, Getopt::Long option spec ]
 our %CMD = ( # sorted in order of importance/use:
-'query' => [ 'SEARCH_TERMS...', 'search for messages matching terms', qw(
+'q' => [ 'SEARCH_TERMS...', 'search for messages matching terms', qw(
 	save-as=s output|o=s format|f=s dedupe|d=s thread|t augment|a
 	sort|s=s@ reverse|r offset=i remote local! extinbox!
 	since|after=s until|before=s), opt_dash('limit|n=i', '[0-9]+') ],
@@ -98,7 +98,7 @@ our %CMD = ( # sorted in order of importance/use:
 	'set/unset flags on message(s) from stdin',
 	qw(stdin| oid=s exact by-mid|mid:s) ],
 'forget' => [ '[--stdin|--oid=OID|--by-mid=MID]',
-	'exclude message(s) on stdin from query results',
+	"exclude message(s) on stdin from `q' search results",
 	qw(stdin| oid=s exact by-mid|mid:s quiet|q) ],
 
 'purge-mailsource' => [ '{URL_OR_PATHNAME|--all}',
@@ -175,7 +175,7 @@ my %OPTDESC = (
 'dedupe|d=s' => ['STRAT|content|oid|mid',
 		'deduplication strategy'],
 'show	thread|t' => 'display entire thread a message belongs to',
-'query	thread|t' =>
+'q	thread|t' =>
 	'return all messages in the same thread as the actual match(es)',
 'augment|a' => 'augment --output destination instead of clobbering',
 
@@ -186,7 +186,7 @@ my %OPTDESC = (
 			'message/object output format' ],
 'mark	format|f=s' => $stdin_formats,
 'forget	format|f=s' => $stdin_formats,
-'query	format|f=s' => [ 'OUT|maildir|mboxrd|mboxcl2|mboxcl|html|oid',
+'q	format|f=s' => [ 'OUT|maildir|mboxrd|mboxcl2|mboxcl|html|oid|json',
 		'specify output format, default depends on --output'],
 'ls-query	format|f=s' => $ls_format,
 'ls-extinbox	format|f=s' => $ls_format,
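
The TODO in the patch context mentions generating help from %CMD.  As
a rough illustration only (in Python rather than Perl, and not the
actual lei implementation), the sketch below splits a simplified
subset of Getopt::Long-style specs into names and a type suffix and
prints a usage listing.  `parse_spec', `switch', `usage' and the
trimmed-down `CMD' table are hypothetical:

```python
import re

# names (separated by "|"), then an optional suffix such as "=s", ":s",
# "=s@", "!" or "+".  Only this simplified subset is handled.
_SPEC = re.compile(r"^(?P<names>[^=:!+]+?)(?P<suffix>[=:!+].*)?$")


def parse_spec(spec):
    """Split e.g. 'sort|s=s@' into (['sort', 's'], '=s@')."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError("unsupported option spec: %r" % spec)
    names = [name for name in match.group("names").split("|") if name]
    return names, match.group("suffix") or ""


def switch(name):
    """Single-letter names get one dash, longer names get two."""
    return ("-" if len(name) == 1 else "--") + name


def usage(cmd, entry):
    """Render a usage listing from [positional, description, specs...]."""
    positional, description, *specs = entry
    lines = ["usage: lei %s [OPTIONS] %s" % (cmd, positional), "",
             description, ""]
    for spec in specs:
        names, suffix = parse_spec(spec)
        # "=" means a required argument, ":" an optional one.
        arg = {"=": " ARG", ":": " [ARG]"}.get(suffix[:1], "")
        lines.append("  " + ", ".join(switch(n) for n in names) + arg)
    return "\n".join(lines)


# Trimmed-down stand-in for one %CMD entry.
CMD = {
    "q": ["SEARCH_TERMS...", "search for messages matching terms",
          "save-as=s", "output|o=s", "format|f=s", "thread|t", "local!"],
}

print(usage("q", CMD["q"]))
```

Keeping the option specs as data is what makes this possible: the same
table can feed option parsing, help output and shell completion.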


* "extinbox" term - was: [RFC 4/7] lei: proposed command-listing...
  2020-12-15 11:47 ` [RFC 4/7] lei: proposed command-listing and options Eric Wong
@ 2020-12-26 11:26   ` Eric Wong
  2020-12-28 15:29     ` Kyle Meyer
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Wong @ 2020-12-26 11:26 UTC (permalink / raw)
  To: meta

Eric Wong <e@80x24.org> wrote:
> +'add-extinbox' => [ 'URL-OR-PATHNAME',
> +	'add/set priority of a publicinbox|extindex for extra matches',
> +	qw(prio=i) ],
> +'ls-extinbox' => [ '[FILTER]', 'list publicinbox|extindex sources',
> +	qw(format|f=s z local remote) ],
> +'forget-extinbox' => [ '{URL-OR-PATHNAME|--prune}',
> +	'exclude further results from a publicinbox|extindex',
> +	qw(prune) ],

I'm a bit iffy on "extinbox".  It's supposed to be a short
version meaning "either external index or a public inbox".

However, it's the same length and only two middle letters
away from "extindex" (short for "external index").

Would "inboxish" be an appropriate term in place of "extinbox"?
There's precedent with git using the terms "treeish" and
"committish".

I also don't want to force a user to specify the type, since it
will support HTTP(S) URLs and not just on-filesystem storage.


* Re: "extinbox" term - was: [RFC 4/7] lei: proposed command-listing...
  2020-12-26 11:26   ` "extinbox" term - was: [RFC 4/7] lei: proposed command-listing Eric Wong
@ 2020-12-28 15:29     ` Kyle Meyer
  2020-12-28 21:55       ` Eric Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Kyle Meyer @ 2020-12-28 15:29 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong writes:

> Eric Wong <e@80x24.org> wrote:
>> +'add-extinbox' => [ 'URL-OR-PATHNAME',
>> +	'add/set priority of a publicinbox|extindex for extra matches',
>> +	qw(prio=i) ],
>> +'ls-extinbox' => [ '[FILTER]', 'list publicinbox|extindex sources',
>> +	qw(format|f=s z local remote) ],
>> +'forget-extinbox' => [ '{URL-OR-PATHNAME|--prune}',
>> +	'exclude further results from a publicinbox|extindex',
>> +	qw(prune) ],
>
> I'm a bit iffy on "extinbox"  It's supposed to be a short
> version meaning "either external index or a public inbox"
>
> However, it's the same length and only two middle letters
> away from "extindex" (short for "external index").

Fwiw my brain made the incorrect extinbox => extindex jump when first
glancing over the command names before reading the descriptions.

> Would "inboxish" be an appropriate term in place of "extinbox"?
> There's precedent with git using the terms "treeish" and
> "committish".

Yeah, that seems okay.  I think "ish" would certainly make it clear to
the reader that there is more going on while avoiding the issue above,
but I wonder if that's really much better than just using "inbox" in the
command names and making the descriptions state something to the effect
of "... or external index".  At least from the standpoint of the search
UI, it seems natural to think of an external index as an "inbox", but
perhaps such an overloading is setting things up for confusion.


* Re: "extinbox" term - was: [RFC 4/7] lei: proposed command-listing...
  2020-12-28 15:29     ` Kyle Meyer
@ 2020-12-28 21:55       ` Eric Wong
  2020-12-29  3:01         ` Kyle Meyer
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Wong @ 2020-12-28 21:55 UTC (permalink / raw)
  To: Kyle Meyer; +Cc: meta

Kyle Meyer <kyle@kyleam.com> wrote:
> Eric Wong writes:
> 
> > Eric Wong <e@80x24.org> wrote:
> >> +'add-extinbox' => [ 'URL-OR-PATHNAME',
> >> +	'add/set priority of a publicinbox|extindex for extra matches',
> >> +	qw(prio=i) ],
> >> +'ls-extinbox' => [ '[FILTER]', 'list publicinbox|extindex sources',
> >> +	qw(format|f=s z local remote) ],
> >> +'forget-extinbox' => [ '{URL-OR-PATHNAME|--prune}',
> >> +	'exclude further results from a publicinbox|extindex',
> >> +	qw(prune) ],
> >
> > I'm a bit iffy on "extinbox"  It's supposed to be a short
> > version meaning "either external index or a public inbox"
> >
> > However, it's the same length and only two middle letters
> > away from "extindex" (short for "external index").
> 
> Fwiw my brain made the incorrect extinbox => extindex jump when first
> glancing over the command names before reading the descriptions.

What about just "external"?  It could probably be extended to
handle existing IMAP, JMAP, notmuch, mairix, etc... as search
sources with query translation, even.

> > Would "inboxish" be an appropriate term in place of "extinbox"?
> > There's precedent with git using the terms "treeish" and
> > "committish".
> 
> Yeah, that seems okay.  I think "ish" would certainly make it clear to
> the reader that there is more going on while avoiding the issue above,
> but I wonder if that's really much better than just using "inbox" in the
> command names and making the descriptions state something to the effect
> of "... or external index".  At least from the standpoint of the search
> UI, it seems natural to think of an external index as an "inbox", but
> perhaps such an overloading is setting things up for confusion.

I'm using inboxish/ibxish internally, at least.  But now I'm
thinking "external" would give us more flexibility w.r.t. future
features.


* Re: "extinbox" term - was: [RFC 4/7] lei: proposed command-listing...
  2020-12-28 21:55       ` Eric Wong
@ 2020-12-29  3:01         ` Kyle Meyer
  0 siblings, 0 replies; 17+ messages in thread
From: Kyle Meyer @ 2020-12-29  3:01 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong writes:


> What about just "external"?  It could probably be extended to
> handle existing IMAP, JMAP, notmuch, mairix, etc... as search
> sources with query translation, even.

"external" sounds good to me.


Thread overview: 17+ messages
2020-12-15 11:47 [PATCH/RFC 0/7] lei - Local Email Interface skeleton Eric Wong
2020-12-15 11:47 ` [PATCH 1/7] daemon: support --daemonize without Net::Server::Daemonize Eric Wong
2020-12-15 11:47 ` [PATCH 2/7] daemon: simplify fork() failure checks Eric Wong
2020-12-15 11:47 ` [RFC 3/7] lei: FD-passing and IPC basics Eric Wong
2020-12-15 11:47 ` [RFC 4/7] lei: proposed command-listing and options Eric Wong
2020-12-26 11:26   ` "extinbox" term - was: [RFC 4/7] lei: proposed command-listing Eric Wong
2020-12-28 15:29     ` Kyle Meyer
2020-12-28 21:55       ` Eric Wong
2020-12-29  3:01         ` Kyle Meyer
2020-12-15 11:47 ` [RFC 5/7] lei_store: local storage for Local Email Interface Eric Wong
2020-12-15 11:47 ` [RFC 6/7] tests: more common JSON module loading Eric Wong
2020-12-15 11:47 ` [RFC 7/7] lei: use spawn (vfork + execve) for lazy start Eric Wong
2020-12-15 12:05 ` more considerations in UI/UX Eric Wong
2020-12-23  5:42   ` Kyle Meyer
2020-12-23  9:47     ` Eric Wong
2020-12-23 15:49       ` Kyle Meyer
2020-12-26 11:13     ` [RFC] lei: rename proposed "query" command to "q", add JSON output Eric Wong
