unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
* [PATCH 0/9] lei p2q: more capable than originally thought
@ 2021-10-26 10:35 Eric Wong
  2021-10-26 10:35 ` [PATCH 1/9] doc: tuning: additional notes for many inboxes Eric Wong
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Eric Wong @ 2021-10-26 10:35 UTC (permalink / raw)
  To: meta

I started writing documentation and managed to cobble up a p2q
example in patch 7/9 to find unapplied patches when combined
with "git log".  7/9 makes p2q use less memory, now.

Fixed a bunch of small bugs along the way and killed some
redundant code, too.

Eric Wong (9):
  doc: tuning: additional notes for many inboxes
  doc: lei-store-format: mail sync section, update IPC
  eml: keep body if no headers are found
  lei q: enable expensive Xapian flags
  lei inspect: fix atfork hook
  lei: add net getopt spec to various commands
  lei p2q: use LeiInput for multi-patch series
  lei rm|tag: drop redundant mbox+net callbacks
  input_pipe: account for undefined {sock}

 Documentation/lei-p2q.pod             |   6 ++
 Documentation/lei-store-format.pod    |  14 +++-
 Documentation/public-inbox-tuning.pod |  11 ++-
 lib/PublicInbox/Eml.pm                |   7 +-
 lib/PublicInbox/InputPipe.pm          |   2 +-
 lib/PublicInbox/LEI.pm                |  11 +--
 lib/PublicInbox/LeiInput.pm           |  30 +++++--
 lib/PublicInbox/LeiInspect.pm         |   2 +-
 lib/PublicInbox/LeiP2q.pm             | 115 +++++++++++++-------------
 lib/PublicInbox/LeiRm.pm              |  10 ---
 lib/PublicInbox/LeiSearch.pm          |   2 +
 lib/PublicInbox/LeiTag.pm             |  11 ---
 t/eml.t                               |  11 +++
 t/mbox_reader.t                       |   6 +-
 14 files changed, 134 insertions(+), 104 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/9] doc: tuning: additional notes for many inboxes
  2021-10-26 10:35 [PATCH 0/9] lei p2q: more capable than originally thought Eric Wong
@ 2021-10-26 10:35 ` Eric Wong
  2021-10-26 10:35 ` [PATCH 2/9] doc: lei-store-format: mail sync section, update IPC Eric Wong
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Eric Wong @ 2021-10-26 10:35 UTC (permalink / raw)
  To: meta

-extindex is the most important piece for dealing with many
inboxes, so note it first.  Also, frequent use of "git gc" is
important for both loose object performance and reducing memory
mappings.
---
 Documentation/public-inbox-tuning.pod | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/Documentation/public-inbox-tuning.pod b/Documentation/public-inbox-tuning.pod
index 7b18b3bc4030..53668eccb7cb 100644
--- a/Documentation/public-inbox-tuning.pod
+++ b/Documentation/public-inbox-tuning.pod
@@ -165,12 +165,15 @@ Other OSes may have similar tuning knobs (patches appreciated).
 
 =head2 Scalability to many inboxes
 
+L<public-inbox-extindex(1)> allows any number of public-inboxes
+to share the same Xapian indices.
+
 git 2.33+ startup time is orders-of-magnitude faster and uses
 less memory when dealing with thousands of alternates required
-for thousands of inboxes.
+for thousands of inboxes with L<public-inbox-extindex(1)>.
 
-L<public-inbox-extindex(1)> allows any number of public-inboxes
-to share the same Xapian indices.
+Frequent packing (via L<git-gc(1)>) both improves performance
+and reduces the need to increase C<sys.vm.max_map_count>.
 
 =head1 CONTACT
 
@@ -184,6 +187,6 @@ L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>,
 
 =head1 COPYRIGHT
 
-Copyright 2020-2021 all contributors L<mailto:meta@public-inbox.org>
+Copyright all contributors L<mailto:meta@public-inbox.org>
 
 License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/9] doc: lei-store-format: mail sync section, update IPC
  2021-10-26 10:35 [PATCH 0/9] lei p2q: more capable than originally thought Eric Wong
  2021-10-26 10:35 ` [PATCH 1/9] doc: tuning: additional notes for many inboxes Eric Wong
@ 2021-10-26 10:35 ` Eric Wong
  2021-10-26 10:35 ` [PATCH 3/9] eml: keep body if no headers are found Eric Wong
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Eric Wong @ 2021-10-26 10:35 UTC (permalink / raw)
  To: meta

mail_sync.sqlite3 needs to be documented, and brings the IPC
section up-to-date while we're in the area.
---
 Documentation/lei-store-format.pod | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/Documentation/lei-store-format.pod b/Documentation/lei-store-format.pod
index 71aa72cb96e3..625c60f436a8 100644
--- a/Documentation/lei-store-format.pod
+++ b/Documentation/lei-store-format.pod
@@ -30,7 +30,6 @@ prevent them from being accidentally treated as a v2 inbox.
   $SHARD - Integer starting with 0 based on parallelism
 
   ~/.local/share/lei/store
-  - ipc.lock                        # lock file for internal lei IPC
   - local/$EPOCH.git                # normal bare git repositories
   - mail_sync.sqlite3               # sync state IMAP, Maildir, NNTP
 
@@ -66,11 +65,18 @@ stored in Xapian indices, volatile metadata is associated with
 the Xapian document, thus it is shared across different blobs of
 the "same" message.
 
+=head2 mail_sync.sqlite3
+
+This SQLite database maintained for bidirectional mapping of
+git blobs to IMAP UIDs, Maildir file names, and NNTP article numbers.
+
+It is also used for retrieving messages from Maildirs indexed by
+L<lei-index(1)>.
+
 =head1 IPC
 
-When L<lei(1)> is run in daemon mode, L<flock(2)> is used on
-C<ipc.lock> is used to serialize writes to C<lei/store> across
-multiple internal lei workers while minimizing commits.
+L<lei-daemon(8)> communicates with the C<lei/store> process using
+L<unix(7)> C<SOCK_SEQPACKET> sockets.
 
 =head1 CAVEATS
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/9] eml: keep body if no headers are found
  2021-10-26 10:35 [PATCH 0/9] lei p2q: more capable than originally thought Eric Wong
  2021-10-26 10:35 ` [PATCH 1/9] doc: tuning: additional notes for many inboxes Eric Wong
  2021-10-26 10:35 ` [PATCH 2/9] doc: lei-store-format: mail sync section, update IPC Eric Wong
@ 2021-10-26 10:35 ` Eric Wong
  2021-10-26 10:35 ` [PATCH 4/9] lei q: enable expensive Xapian flags Eric Wong
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Eric Wong @ 2021-10-26 10:35 UTC (permalink / raw)
  To: meta

This easily allows us to treat "git diff" output as header-less
"messages" for commands such as "lei p2q".
---
 lib/PublicInbox/Eml.pm |  7 ++++---
 t/eml.t                | 11 +++++++++++
 t/mbox_reader.t        |  6 +++++-
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index 3c681ba5bc2e..485f637a3e7b 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -122,9 +122,10 @@ sub new {
 		my $hdr = substr($$ref, 0, $header_size_limit + 1);
 		hdr_truncate($hdr) if length($hdr) > $header_size_limit;
 		bless { hdr => \$hdr, crlf => $1 }, __PACKAGE__;
-	} else { # nothing useful
-		my $hdr = $$ref = '';
-		bless { hdr => \$hdr, crlf => "\n" }, __PACKAGE__;
+	} else { # just a body w/o header?
+		my $hdr = '';
+		my $eol = ($$ref =~ /(\r?\n)/) ? $1 : "\n";
+		bless { hdr => \$hdr, crlf => $eol, bdy => $ref }, __PACKAGE__;
 	}
 }
 
diff --git a/t/eml.t b/t/eml.t
index 0cf48f225a45..2d8993a51075 100644
--- a/t/eml.t
+++ b/t/eml.t
@@ -216,6 +216,17 @@ if ('one newline before headers') {
 	is($eml->body, "");
 }
 
+if ('body only') {
+	my $str = <<EOM;
+--- a/lib/PublicInbox/Eml.pm
++++ b/lib/PublicInbox/Eml.pm
+@@ -122,9 +122,10 @@ sub new {
+\x20
+EOM
+	my $eml = PublicInbox::Eml->new($str);
+	is($eml->body, $str, 'body-only accepted');
+}
+
 for my $cls (@classes) { # XXX: matching E::M, but not sure about this
 	my $s = <<EOF;
 Content-Type: multipart/mixed; boundary="b"
diff --git a/t/mbox_reader.t b/t/mbox_reader.t
index e5f57d7bcba3..87e8f397662c 100644
--- a/t/mbox_reader.t
+++ b/t/mbox_reader.t
@@ -138,7 +138,11 @@ EOM
 		PublicInbox::MboxReader->$m($fh, sub {
 			push @x, $_[0]->as_string
 		});
-		is_deeply(\@x, [], "messages in invalid $m");
+		if ($m =~ /\Amboxcl/) {
+			is_deeply(\@x, [], "messages in invalid $m");
+		} else {
+			is_deeply(\@x, [ "\n$html" ], "body-only $m");
+		}
 		is_deeply([grep(!/^W: leftover/, @w)], [],
 			"no extra warnings besides leftover ($m)");
 	}

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/9] lei q: enable expensive Xapian flags
  2021-10-26 10:35 [PATCH 0/9] lei p2q: more capable than originally thought Eric Wong
                   ` (2 preceding siblings ...)
  2021-10-26 10:35 ` [PATCH 3/9] eml: keep body if no headers are found Eric Wong
@ 2021-10-26 10:35 ` Eric Wong
  2021-10-26 10:35 ` [PATCH 5/9] lei inspect: fix atfork hook Eric Wong
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Eric Wong @ 2021-10-26 10:35 UTC (permalink / raw)
  To: meta

FLAG_PURE_NOT is too expensive for public-facing WWW use, but
lei isn't public-facing.  We'll also unconditionally enable
phrase search on old "chert" DBs since lei doesn't need to
worry about fairness across 10K users.
---
 lib/PublicInbox/LeiSearch.pm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/PublicInbox/LeiSearch.pm b/lib/PublicInbox/LeiSearch.pm
index 1fb38da1d7aa..d0ca13f0aa11 100644
--- a/lib/PublicInbox/LeiSearch.pm
+++ b/lib/PublicInbox/LeiSearch.pm
@@ -175,6 +175,8 @@ sub all_terms {
 sub qparse_new {
 	my ($self) = @_;
 	my $qp = $self->SUPER::qparse_new; # PublicInbox::Search
+	$self->{qp_flags} |= PublicInbox::Search::FLAG_PHRASE() |
+				PublicInbox::Search::FLAG_PURE_NOT();
 	$qp->add_boolean_prefix('kw', 'K');
 	$qp->add_boolean_prefix('L', 'L');
 	$qp

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/9] lei inspect: fix atfork hook
  2021-10-26 10:35 [PATCH 0/9] lei p2q: more capable than originally thought Eric Wong
                   ` (3 preceding siblings ...)
  2021-10-26 10:35 ` [PATCH 4/9] lei q: enable expensive Xapian flags Eric Wong
@ 2021-10-26 10:35 ` Eric Wong
  2021-10-26 10:35 ` [PATCH 6/9] lei: add net getopt spec to various commands Eric Wong
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Eric Wong @ 2021-10-26 10:35 UTC (permalink / raw)
  To: meta

The misnamed sub wasn't firing, but was unlikely to be
noticeable given the short lifetime of the process.

Fixes: 1f887bd51d92b0d4 ("lei inspect: add atfork hook")
---
 lib/PublicInbox/LeiInspect.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiInspect.pm b/lib/PublicInbox/LeiInspect.pm
index 5ea32ccb7e66..d7775d4b6162 100644
--- a/lib/PublicInbox/LeiInspect.pm
+++ b/lib/PublicInbox/LeiInspect.pm
@@ -294,7 +294,7 @@ sub _complete_inspect {
 	# TODO: message-ids?, blobs? could get expensive...
 }
 
-sub input_only_atfork_child {
+sub ipc_atfork_child {
 	my ($self) = @_;
 	$self->{lei}->_lei_atfork_child;
 	$self->SUPER::ipc_atfork_child;

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 6/9] lei: add net getopt spec to various commands
  2021-10-26 10:35 [PATCH 0/9] lei p2q: more capable than originally thought Eric Wong
                   ` (4 preceding siblings ...)
  2021-10-26 10:35 ` [PATCH 5/9] lei inspect: fix atfork hook Eric Wong
@ 2021-10-26 10:35 ` Eric Wong
  2021-10-26 10:35 ` [PATCH 7/9] lei p2q: use LeiInput for multi-patch series Eric Wong
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Eric Wong @ 2021-10-26 10:35 UTC (permalink / raw)
  To: meta

All of these commands should support --proxy, at least, if not
other curl options.
---
 lib/PublicInbox/LEI.pm | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 32d4b9f3b427..93a7b426de3c 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -179,7 +179,7 @@ our %CMD = ( # sorted in order of importance/use:
 
 'up' => [ 'OUTPUT...|--all', 'update saved search',
 	qw(jobs|j=s lock=s@ alert=s@ mua=s verbose|v+ exclude=s@
-	remote-fudge-time=s all:s remote! local! external!), @c_opt ],
+	remote-fudge-time=s all:s remote! local! external!), @net_opt, @c_opt ],
 
 'lcat' => [ '--stdin|MSGID_OR_URL...', 'display local copy of message(s)',
 	'stdin|', # /|\z/ must be first for lone dash
@@ -215,7 +215,7 @@ our %CMD = ( # sorted in order of importance/use:
 'ls-mail-sync' => [ '[FILTER]', 'list mail sync folders',
 		qw(z|0 globoff|g invert-match|v local remote), @c_opt ],
 'ls-mail-source' => [ 'URL', 'list IMAP or NNTP mail source folders',
-		qw(z|0 ascii l pretty url), @c_opt ],
+		qw(z|0 ascii l pretty url), @net_opt, @c_opt ],
 'forget-external' => [ 'LOCATION...|--prune',
 	'exclude further results from a publicinbox|extindex',
 	qw(prune), @c_opt ],
@@ -265,7 +265,8 @@ our %CMD = ( # sorted in order of importance/use:
 'forget-mail-sync' => [ 'LOCATION...',
 	'forget sync information for a mail folder', @c_opt ],
 'refresh-mail-sync' => [ 'LOCATION...|--all',
-	'prune dangling sync data for a mail folder', 'all:s', @c_opt ],
+	'prune dangling sync data for a mail folder', 'all:s',
+		@net_opt, @c_opt ],
 'export-kw' => [ 'LOCATION...|--all',
 	'one-time export of keywords of sync sources',
 	qw(all:s mode=s), @net_opt, @c_opt ],

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 7/9] lei p2q: use LeiInput for multi-patch series
  2021-10-26 10:35 [PATCH 0/9] lei p2q: more capable than originally thought Eric Wong
                   ` (5 preceding siblings ...)
  2021-10-26 10:35 ` [PATCH 6/9] lei: add net getopt spec to various commands Eric Wong
@ 2021-10-26 10:35 ` Eric Wong
  2021-10-26 10:35 ` [PATCH 8/9] lei rm|tag: drop redundant mbox+net callbacks Eric Wong
  2021-10-26 10:35 ` [PATCH 9/9] input_pipe: account for undefined {sock} Eric Wong
  8 siblings, 0 replies; 10+ messages in thread
From: Eric Wong @ 2021-10-26 10:35 UTC (permalink / raw)
  To: meta

The LeiInput backend now allows p2q to work like any other
command which reads .eml, .patch, mbox*, Maildir, IMAP, and NNTP
input.  Running "git format-patch --stdout -1 $COMMIT" remains
supported.

This is intended to allow lower memory use while parsing
"git log --pretty=mboxrd -p" output.  Previously, the entire
output of "git log" would be slurped into memory at once.

The intended use is to allow easy(-ish :P) searching for
unapplied patches as documented in the new example in the
manpage.
---
 Documentation/lei-p2q.pod   |   6 ++
 lib/PublicInbox/LEI.pm      |   4 +-
 lib/PublicInbox/LeiInput.pm |  30 ++++++++--
 lib/PublicInbox/LeiP2q.pm   | 115 ++++++++++++++++++------------------
 4 files changed, 89 insertions(+), 66 deletions(-)

diff --git a/Documentation/lei-p2q.pod b/Documentation/lei-p2q.pod
index 2e0b1ab676ed..4bc5d25f8ef0 100644
--- a/Documentation/lei-p2q.pod
+++ b/Documentation/lei-p2q.pod
@@ -77,6 +77,12 @@ Suppress feedback messages.
   # to view results on a remote HTTP(S) public-inbox instance
   $BROWSER https://example.com/pub-inbox/?q=$(lei p2q --uri $COMMIT_OID)
 
+  # to view unapplied patches for a given $FILE from the past year:
+  echo \( rt:last.year.. AND dfn:$FILE \) AND NOT \( \
+	$(git log -p --pretty=mboxrd --since=last.year $FILE |
+		lei p2q -F mboxrd )
+	\) | lei q -o /tmp/unapplied
+
 =head1 CONTACT
 
 Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 93a7b426de3c..f9d24f29c87d 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -274,9 +274,9 @@ our %CMD = ( # sorted in order of importance/use:
 	'one-time conversion from URL or filesystem to another format',
 	qw(stdin| in-format|F=s out-format|f=s output|mfolder|o=s lock=s@ kw!),
 	@net_opt, @c_opt ],
-'p2q' => [ 'FILE|COMMIT_OID|--stdin',
+'p2q' => [ 'LOCATION_OR_COMMIT...|--stdin',
 	"use a patch to generate a query for `lei q --stdin'",
-	qw(stdin| want|w=s@ uri debug), @c_opt ],
+	qw(stdin| in-format|F=s want|w=s@ uri debug), @net_opt, @c_opt ],
 'config' => [ '[...]', sub {
 		'git-config(1) wrapper for '._config_path($_[0]);
 	}, qw(config-file|system|global|file|f=s), # for conflict detection
diff --git a/lib/PublicInbox/LeiInput.pm b/lib/PublicInbox/LeiInput.pm
index 2621fc1f9d05..540681e3ff6b 100644
--- a/lib/PublicInbox/LeiInput.pm
+++ b/lib/PublicInbox/LeiInput.pm
@@ -64,6 +64,11 @@ sub input_mbox_cb { # base MboxReader callback
 	$self->input_eml_cb($eml);
 }
 
+sub input_net_cb { # imap_each, nntp_each cb
+	my ($url, $uid, $kw, $eml, $self) = @_;
+	$self->input_eml_cb($eml);
+}
+
 # import a single file handle of $name
 # Subclass must define ->input_eml_cb and ->input_mbox_cb
 sub input_fh {
@@ -108,10 +113,10 @@ sub handle_http_input ($$@) {
 	grep(/\A--compressed\z/, @$curl) or
 		$fh = IO::Uncompress::Gunzip->new($fh, MultiStream => 1);
 	eval { $self->input_fh('mboxrd', $fh, $url, @args) };
-	my $err = $@;
+	my @err = ($@ ? $@ : ());
 	$ar->join;
-	$? || $err and
-		$lei->child_error($?, "@$cmd failed".$err ? " $err" : '');
+	push(@err, "\$?=$?") if $?;
+	$lei->child_error($?, "@$cmd failed: @err") if @err;
 }
 
 sub input_path_url {
@@ -184,7 +189,17 @@ EOM
 						$self, @args);
 		}
 	} elsif ($self->{missing_ok} && !-e $input) { # don't ->fail
-		$self->folder_missing("$ifmt:$input");
+		if ($lei->{cmd} eq 'p2q') {
+			my $fp = [ qw(git format-patch --stdout -1), $input ];
+			my $rdr = { 2 => $lei->{2} };
+			my $fh = popen_rd($fp, undef, $rdr);
+			eval { $self->input_fh('eml', $fh, $input, @args) };
+			my @err = ($@ ? $@ : ());
+			close($fh) or push @err, "\$?=$?";
+			$lei->child_error($?, "@$fp failed: @err") if @err;
+		} else {
+			$self->folder_missing("$ifmt:$input");
+		}
 	} else {
 		$lei->fail("$ifmt_pfx$input unsupported (TODO)");
 	}
@@ -330,9 +345,12 @@ $input is `eml', not --in-format=$in_fmt
 				}
 				push @md, $input;
 			} elsif ($self->{missing_ok} && !-e $input) {
-				# for lei rm-watch
-				$may_sync and $input = 'maildir:'.
+				if ($lei->{cmd} eq 'p2q') {
+					# will run "git format-patch"
+				} elsif ($may_sync) { # for lei rm-watch
+					$input = 'maildir:'.
 						$lei->abs_path($input);
+				}
 			} else {
 				return $lei->fail("Unable to handle $input")
 			}
diff --git a/lib/PublicInbox/LeiP2q.pm b/lib/PublicInbox/LeiP2q.pm
index 08ec81c5295e..09ec0a079bb9 100644
--- a/lib/PublicInbox/LeiP2q.pm
+++ b/lib/PublicInbox/LeiP2q.pm
@@ -1,16 +1,16 @@
-# Copyright (C) 2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 # front-end for the "lei patch-to-query" sub-command
 package PublicInbox::LeiP2q;
 use strict;
 use v5.10.1;
-use parent qw(PublicInbox::IPC);
+use parent qw(PublicInbox::IPC PublicInbox::LeiInput);
 use PublicInbox::Eml;
 use PublicInbox::Smsg;
 use PublicInbox::MsgIter qw(msg_part_text);
 use PublicInbox::Git qw(git_unquote);
-use PublicInbox::Spawn qw(popen_rd);
+use PublicInbox::OnDestroy;
 use URI::Escape qw(uri_escape_utf8);
 my $FN = qr!((?:"?[^/\n]+/[^\r\n]+)|/dev/null)!;
 
@@ -28,8 +28,16 @@ sub xphrase ($) {
 	} ($s =~ m!(\w[\|=><,\./:\\\@\-\w\s]+)!g);
 }
 
+sub add_qterm ($$@) {
+	my ($self, $p, @v) = @_;
+	for (@v) {
+		$self->{qseen}->{"$p\0$_"} //=
+			push(@{$self->{qterms}->{$p}}, $_);
+	}
+}
+
 sub extract_terms { # eml->each_part callback
-	my ($p, $lei) = @_;
+	my ($p, $self) = @_;
 	my $part = $p->[0]; # ignore $depth and @idx;
 	my $ct = $part->content_type || 'text/plain';
 	my ($s, undef) = msg_part_text($part, $ct);
@@ -38,7 +46,7 @@ sub extract_terms { # eml->each_part callback
 	# TODO: b: nq: q:
 	for (split(/\n/, $s)) {
 		if ($in_diff && s/^ //) { # diff context
-			push @{$lei->{qterms}->{dfctx}}, xphrase($_);
+			add_qterm($self, 'dfctx', xphrase($_));
 		} elsif (/^-- $/) { # email signature begins
 			$in_diff = undef;
 		} elsif (m!^diff --git $FN $FN!) {
@@ -46,21 +54,21 @@ sub extract_terms { # eml->each_part callback
 			$in_diff = 1;
 		} elsif (/^index ([a-f0-9]+)\.\.([a-f0-9]+)\b/) {
 			my ($oa, $ob) = ($1, $2);
-			push @{$lei->{qterms}->{dfpre}}, $oa;
-			push @{$lei->{qterms}->{dfpost}}, $ob;
+			add_qterm($self, 'dfpre', $oa);
+			add_qterm($self, 'dfpost', $ob);
 			# who uses dfblob?
 		} elsif (m!^(?:---|\+{3}) ($FN)!) {
 			next if $1 eq '/dev/null';
 			my $fn = (split(m!/!, git_unquote($1.''), 2))[1];
-			push @{$lei->{qterms}->{dfn}}, xphrase($fn);
+			add_qterm($self, 'dfn', xphrase($fn));
 		} elsif ($in_diff && s/^\+//) { # diff added
-			push @{$lei->{qterms}->{dfb}}, xphrase($_);
+			add_qterm($self, 'dfb', xphrase($_));
 		} elsif ($in_diff && s/^-//) { # diff removed
-			push @{$lei->{qterms}->{dfa}}, xphrase($_);
+			add_qterm($self, 'dfa', xphrase($_));
 		} elsif (/^@@ (?:\S+) (?:\S+) @@\s*$/) {
 			# traditional diff w/o -p
 		} elsif (/^@@ (?:\S+) (?:\S+) @@\s*(\S+.*)/) {
-			push @{$lei->{qterms}->{dfhh}}, xphrase($1);
+			add_qterm($self, 'dfhh', xphrase($1));
 		} elsif (/^(?:dis)similarity index/ ||
 				/^(?:old|new) mode/ ||
 				/^(?:deleted|new) file mode/ ||
@@ -92,53 +100,43 @@ my %pfx2smsg = (
 	rt => [ qw(ts) ], # ditto...
 );
 
-sub do_p2q { # via wq_do
-	my ($self) = @_;
-	my $lei = $self->{lei};
-	my $want = $lei->{opt}->{want} // [ qw(dfpost7) ];
-	my @want = split(/[, ]+/, "@$want");
-	for (@want) {
-		/\A(?:(d|dt|rt):)?([0-9]+)(\.(?:day|weeks)s?)?\z/ or next;
-		my ($pfx, $n, $unit) = ($1, $2, $3);
-		$n *= 86400 * ($unit =~ /week/i ? 7 : 1);
-		$_ = [ $pfx, $n ];
-	}
-	my $smsg = bless {}, 'PublicInbox::Smsg';
-	my $in = $self->{0};
-	my @cmd;
-	unless ($in) {
-		my $input = $self->{input};
-		my $devfd = $lei->path_to_fd($input) // return;
-		if ($devfd >= 0) {
-			$in = $lei->{$devfd};
-		} elsif (-e $input) {
-			open($in, '<', $input) or
-				return $lei->fail("open < $input: $!");
-		} else {
-			@cmd = (qw(git format-patch --stdout -1), $input);
-			$in = popen_rd(\@cmd, undef, { 2 => $lei->{2} });
+sub input_eml_cb { # used by PublicInbox::LeiInput::input_fh
+	my ($self, $eml) = @_;
+	my $diff_want = $self->{diff_want} // do {
+		my $want = $self->{lei}->{opt}->{want} // [ qw(dfpost7) ];
+		my @want = split(/[, ]+/, "@$want");
+		for (@want) {
+			/\A(?:(d|dt|rt):)?([0-9]+)(\.(?:day|weeks)s?)?\z/
+				or next;
+			my ($pfx, $n, $unit) = ($1, $2, $3);
+			$n *= 86400 * ($unit =~ /week/i ? 7 : 1);
+			$_ = [ $pfx, $n ];
 		}
+		$self->{want_order} = \@want;
+		$self->{diff_want} = +{ map { $_ => 1 } @want };
 	};
-	my $str = do { local $/; <$in> };
-	@cmd && !close($in) and return $lei->fail("E: @cmd failed: $?");
-	my $eml = PublicInbox::Eml->new(\$str);
-	$lei->{diff_want} = +{ map { $_ => 1 } @want };
+	my $smsg = bless {}, 'PublicInbox::Smsg';
 	$smsg->populate($eml);
 	while (my ($pfx, $fields) = each %pfx2smsg) {
-		next unless $lei->{diff_want}->{$pfx};
+		next unless $diff_want->{$pfx};
 		for my $f (@$fields) {
 			my $v = $smsg->{$f} // next;
-			push @{$lei->{qterms}->{$pfx}}, xphrase($v);
+			add_qterm($self, $pfx, xphrase($v));
 		}
 	}
-	$eml->each_part(\&extract_terms, $lei, 1);
+	$eml->each_part(\&extract_terms, $self, 1);
+}
+
+sub emit_query {
+	my ($self) = @_;
+	my $lei = $self->{lei};
 	if ($lei->{opt}->{debug}) {
 		my $json = ref(PublicInbox::Config->json)->new;
 		$json->utf8->canonical->pretty;
-		print { $lei->{2} } $json->encode($lei->{qterms});
+		print { $lei->{2} } $json->encode($self->{qterms});
 	}
 	my (@q, %seen);
-	for my $pfx (@want) {
+	for my $pfx (@{$self->{want_order}}) {
 		if (ref($pfx) eq 'ARRAY') {
 			my ($p, $t_range) = @$pfx; # TODO
 
@@ -148,7 +146,7 @@ sub do_p2q { # via wq_do
 		} else {
 			my $plusminus = ($pfx =~ s/\A([\+\-])//) ? $1 : '';
 			my $end = ($pfx =~ s/([0-9\*]+)\z//) ? $1 : '';
-			my $x = delete($lei->{qterms}->{$pfx}) or next;
+			my $x = delete($self->{qterms}->{$pfx}) or next;
 			my $star = $end =~ tr/*//d ? '*' : '';
 			my $min_len = ($end || 0) + 0;
 
@@ -181,24 +179,25 @@ sub do_p2q { # via wq_do
 }
 
 sub lei_p2q { # the "lei patch-to-query" entry point
-	my ($lei, $input) = @_;
-	my $self = bless {}, __PACKAGE__;
-	if ($lei->{opt}->{stdin}) {
-		$self->{0} = delete $lei->{0}; # guard from _lei_atfork_child
-	} else {
-		$self->{input} = $input;
-	}
-	my ($op_c, $ops) = $lei->workers_start($self, 1);
+	my ($lei, @inputs) = @_;
+	$lei->{opt}->{'in-format'} //= 'eml' if $lei->{opt}->{stdin};
+	my $self = bless { missing_ok => 1 }, __PACKAGE__;
+	$self->prepare_inputs($lei, \@inputs) or return;
+	my $ops = {};
+	$lei->{auth}->op_merge($ops, $self, $lei) if $lei->{auth};
+	(my $op_c, $ops) = $lei->workers_start($self, 1, $ops);
 	$lei->{wq1} = $self;
-	$self->wq_io_do('do_p2q', []);
-	$self->wq_close;
+	net_merge_all_done($self) unless $lei->{auth};
 	$lei->wait_wq_events($op_c, $ops);
 }
 
 sub ipc_atfork_child {
 	my ($self) = @_;
-	$self->{lei}->_lei_atfork_child;
-	$self->SUPER::ipc_atfork_child;
+	PublicInbox::LeiInput::input_only_atfork_child($self);
+	PublicInbox::OnDestroy->new($$, \&emit_query, $self);
 }
 
+no warnings 'once';
+*net_merge_all_done = \&PublicInbox::LeiInput::input_only_net_merge_all_done;
+
 1;

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 8/9] lei rm|tag: drop redundant mbox+net callbacks
  2021-10-26 10:35 [PATCH 0/9] lei p2q: more capable than originally thought Eric Wong
                   ` (6 preceding siblings ...)
  2021-10-26 10:35 ` [PATCH 7/9] lei p2q: use LeiInput for multi-patch series Eric Wong
@ 2021-10-26 10:35 ` Eric Wong
  2021-10-26 10:35 ` [PATCH 9/9] input_pipe: account for undefined {sock} Eric Wong
  8 siblings, 0 replies; 10+ messages in thread
From: Eric Wong @ 2021-10-26 10:35 UTC (permalink / raw)
  To: meta

These are supplied by the base LeiInput class
---
 lib/PublicInbox/LeiRm.pm  | 10 ----------
 lib/PublicInbox/LeiTag.pm | 11 -----------
 2 files changed, 21 deletions(-)

diff --git a/lib/PublicInbox/LeiRm.pm b/lib/PublicInbox/LeiRm.pm
index 97b1c5c15dc4..524c178e3b47 100644
--- a/lib/PublicInbox/LeiRm.pm
+++ b/lib/PublicInbox/LeiRm.pm
@@ -13,16 +13,6 @@ sub input_eml_cb { # used by PublicInbox::LeiInput::input_fh
 	$self->{lei}->{sto}->wq_do('remove_eml', $eml);
 }
 
-sub input_mbox_cb { # MboxReader callback
-	my ($eml, $self) = @_;
-	input_eml_cb($self, $eml);
-}
-
-sub input_net_cb { # callback for ->imap_each, ->nntp_each
-	my (undef, undef, $kw, $eml, $self) = @_; # @_[0,1]: url + uid ignored
-	input_eml_cb($self, $eml);
-}
-
 sub input_maildir_cb {
 	my (undef, $kw, $eml, $self) = @_; # $_[0] $filename ignored
 	input_eml_cb($self, $eml);
diff --git a/lib/PublicInbox/LeiTag.pm b/lib/PublicInbox/LeiTag.pm
index 77654d1a2f22..d64a9f86e05a 100644
--- a/lib/PublicInbox/LeiTag.pm
+++ b/lib/PublicInbox/LeiTag.pm
@@ -19,23 +19,12 @@ sub input_eml_cb { # used by PublicInbox::LeiInput::input_fh
 	}
 }
 
-sub input_mbox_cb {
-	my ($eml, $self) = @_;
-	$eml->header_set($_) for (qw(X-Status Status));
-	input_eml_cb($self, $eml);
-}
-
 sub pmdir_cb { # called via wq_io_do from LeiPmdir->each_mdir_fn
 	my ($self, $f) = @_;
 	my $eml = eml_from_path($f) or return;
 	input_eml_cb($self, $eml);
 }
 
-sub input_net_cb { # imap_each, nntp_each cb
-	my ($url, $uid, $kw, $eml, $self) = @_;
-	input_eml_cb($self, $eml);
-}
-
 sub lei_tag { # the "lei tag" method
 	my ($lei, @argv) = @_;
 	my $sto = $lei->_lei_store(1);

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 9/9] input_pipe: account for undefined {sock}
  2021-10-26 10:35 [PATCH 0/9] lei p2q: more capable than originally thought Eric Wong
                   ` (7 preceding siblings ...)
  2021-10-26 10:35 ` [PATCH 8/9] lei rm|tag: drop redundant mbox+net callbacks Eric Wong
@ 2021-10-26 10:35 ` Eric Wong
  8 siblings, 0 replies; 10+ messages in thread
From: Eric Wong @ 2021-10-26 10:35 UTC (permalink / raw)
  To: meta

It's possible for ->event_step to fire twice due to ->requeue
with EPOLLET (but not EPOLLONESHOT).  So account for that and
avoid causing event loop errors as a result.
---
 lib/PublicInbox/InputPipe.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/InputPipe.pm b/lib/PublicInbox/InputPipe.pm
index 00813a0701b1..e1e26e20b9d2 100644
--- a/lib/PublicInbox/InputPipe.pm
+++ b/lib/PublicInbox/InputPipe.pm
@@ -18,7 +18,7 @@ sub consume {
 
 sub event_step {
 	my ($self) = @_;
-	my $r = sysread($self->{sock}, my $rbuf, 65536);
+	my $r = sysread($self->{sock} // return, my $rbuf, 65536);
 	if ($r) {
 		$self->{cb}->(@{$self->{args} // []}, $rbuf);
 		return $self->requeue; # may be regular file or pipe

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-10-26 10:35 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-26 10:35 [PATCH 0/9] lei p2q: more capable than originally thought Eric Wong
2021-10-26 10:35 ` [PATCH 1/9] doc: tuning: additional notes for many inboxes Eric Wong
2021-10-26 10:35 ` [PATCH 2/9] doc: lei-store-format: mail sync section, update IPC Eric Wong
2021-10-26 10:35 ` [PATCH 3/9] eml: keep body if no headers are found Eric Wong
2021-10-26 10:35 ` [PATCH 4/9] lei q: enable expensive Xapian flags Eric Wong
2021-10-26 10:35 ` [PATCH 5/9] lei inspect: fix atfork hook Eric Wong
2021-10-26 10:35 ` [PATCH 6/9] lei: add net getopt spec to various commands Eric Wong
2021-10-26 10:35 ` [PATCH 7/9] lei p2q: use LeiInput for multi-patch series Eric Wong
2021-10-26 10:35 ` [PATCH 8/9] lei rm|tag: drop redundant mbox+net callbacks Eric Wong
2021-10-26 10:35 ` [PATCH 9/9] input_pipe: account for undefined {sock} Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).