unofficial mirror of meta@public-inbox.org
From: Eric Wong <e@80x24.org>
To: Kyle Meyer <kyle@kyleam.com>
Cc: meta@public-inbox.org
Subject: [PATCH 4/3] lei p2q: fix /dev/null filenames, fix phrase quoting rules
Date: Mon, 1 Mar 2021 11:47:36 +0600
Message-ID: <20210301054736.GA24278@dcvr>
In-Reply-To: <87k0qrrhve.fsf@kyleam.com>

Kyle Meyer <kyle@kyleam.com> wrote:
> I noticed an unexpected term when trying dfa:
> 
>   $ curl -fSs \
>     https://public-inbox.org/meta/20210228122528.18552-2-e@80x24.org/raw >msg
>   $ lei p2q --want=dfa msg
>   dfa:my @WQ_KEYS = qw dfa:"lxs l2m imp mrr cnv" dfa:"internal workers" dfa:dev/null
> 
> So I think the upstream "--- " filename regexp needs to be adjusted to
> account for "/dev/null".

Thanks.  Also, "my @WQ_KEYS = qw" needs to be quoted, at least
(and maybe '(' and ')' too; I need to check Xapian more closely).

And I'll have to fix them in SearchIdx, too (and probably switch
to using a common parser for indexing + term generation).
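
For anyone wanting to poke at the two behaviors in isolation, here is a
rough standalone sketch.  It is not the LeiP2q.pm code: header_filename()
and maybe_quote() are hypothetical names made up for illustration, and
git_unquote() is left out so the script has no public-inbox dependency.
Only the two regexps are taken from the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Filename pattern from the patch: either a (possibly quoted)
# "prefix/path" name, or the literal /dev/null that git emits for
# added or deleted files.
my $FN = qr!((?:"?[^/\n]+/[^\r\n]+)|/dev/null)!;

# Hypothetical helper (not in LeiP2q.pm): returns the path with the
# leading "a/" or "b/" component stripped, or undef when the line is
# not a "---"/"+++" header or names /dev/null.
# NOTE: unlike the real code, quoted names are not run through
# git_unquote() here.
sub header_filename {
	my ($line) = @_;
	return undef unless $line =~ m!^(?:---|\+{3}) ($FN)!;
	my $path = $1; # copy before running another regexp
	return undef if $path eq '/dev/null';
	return (split(m!/!, $path, 2))[1];
}

# Hypothetical helper mirroring the new xphrase() condition: quote
# any string containing a character other than word characters
# and . / : \ @ -
sub maybe_quote {
	my ($s) = @_;
	return $s =~ m![^\./:\\\@\-\w]! ? qq("$s") : $s;
}

for my $l ('--- a/lib/PublicInbox/LeiP2q.pm', '--- /dev/null',
		'+++ b/t/lei-p2q.t') {
	my $fn = header_filename($l);
	printf "%-34s => %s\n", $l, defined $fn ? $fn : '(skipped)';
}
print maybe_quote($_), "\n"
	for ('my @WQ_KEYS = qw', 'dev/null', 'internal workers');
```

With that condition, the "--- /dev/null" line is skipped, the two strings
containing spaces come back wrapped in double quotes, and "dev/null"
stays bare since it only has word characters and a slash.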

On a side note: I find myself mega-confused using public-inbox
patches as test data.  I thought Perl was choking and spitting
code back out at me :x

---8<---
Subject: [PATCH] lei p2q: fix /dev/null filenames, fix phrase quoting rules

/dev/null mis-handling was reported by Kyle Meyer.

Phrase quoting rules are also refined to avoid leaving spaces
unquoted when "phrase generator" characters exist.  Also,
context-free hunk headers no longer clobber the in_diff
state of the parser, since git can still generate those.

Link: https://public-inbox.org/meta/87k0qrrhve.fsf@kyleam.com/
---
 lib/PublicInbox/LeiP2q.pm | 10 +++++++---
 t/lei-p2q.t               |  3 +++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/LeiP2q.pm b/lib/PublicInbox/LeiP2q.pm
index d1dd125e..e7ddc852 100644
--- a/lib/PublicInbox/LeiP2q.pm
+++ b/lib/PublicInbox/LeiP2q.pm
@@ -12,6 +12,7 @@ use PublicInbox::MsgIter qw(msg_part_text);
 use PublicInbox::Git qw(git_unquote);
 use PublicInbox::Spawn qw(popen_rd);
 use URI::Escape qw(uri_escape_utf8);
+my $FN = qr!((?:"?[^/\n]+/[^\r\n]+)|/dev/null)!;
 
 sub xphrase ($) {
 	my ($s) = @_;
@@ -23,7 +24,7 @@ sub xphrase ($) {
 	map {
 		s/\A\s*//;
 		s/\s+\z//;
-		/[\|=><,\sA-Z]/ && !m![\./:\\\@]! ? qq("$_") : $_;
+		m![^\./:\\\@\-\w]! ? qq("$_") : $_ ;
 	} ($s =~ m!(\w[\|=><,\./:\\\@\-\w\s]+)!g);
 }
 
@@ -40,7 +41,7 @@ sub extract_terms { # eml->each_part callback
 			push @{$lei->{qterms}->{dfctx}}, xphrase($_);
 		} elsif (/^-- $/) { # email signature begins
 			$in_diff = undef;
-		} elsif (m!^diff --git "?[^/]+/.+ "?[^/]+/.+\z!) {
+		} elsif (m!^diff --git $FN $FN!) {
 			# wait until "---" and "+++" to capture filenames
 			$in_diff = 1;
 		} elsif (/^index ([a-f0-9]+)\.\.([a-f0-9]+)\b/) {
@@ -48,13 +49,16 @@ sub extract_terms { # eml->each_part callback
 			push @{$lei->{qterms}->{dfpre}}, $oa;
 			push @{$lei->{qterms}->{dfpost}}, $ob;
 			# who uses dfblob?
-		} elsif (m!^(?:---|\+{3}) ("?[^/]+/.+)!) {
+		} elsif (m!^(?:---|\+{3}) ($FN)!) {
+			next if $1 eq '/dev/null';
 			my $fn = (split(m!/!, git_unquote($1.''), 2))[1];
 			push @{$lei->{qterms}->{dfn}}, xphrase($fn);
 		} elsif ($in_diff && s/^\+//) { # diff added
 			push @{$lei->{qterms}->{dfb}}, xphrase($_);
 		} elsif ($in_diff && s/^-//) { # diff removed
 			push @{$lei->{qterms}->{dfa}}, xphrase($_);
+		} elsif (/^@@ (?:\S+) (?:\S+) @@\s*$/) {
+			# traditional diff w/o -p
 		} elsif (/^@@ (?:\S+) (?:\S+) @@\s*(\S+.*)/) {
 			push @{$lei->{qterms}->{dfhh}}, xphrase($1);
 		} elsif (/^(?:dis)similarity index/ ||
diff --git a/t/lei-p2q.t b/t/lei-p2q.t
index 1a2c2e4f..87cf9fa7 100644
--- a/t/lei-p2q.t
+++ b/t/lei-p2q.t
@@ -25,5 +25,8 @@ test_lei(sub {
 			"dfpost:6e006fd73b OR " .
 			"dfpost:6e006fd73\n",
 		'3-byte chop');
+
+	lei_ok(qw(p2q t/data/message_embed.eml --want=dfb));
+	like($lei_out, qr/\bdfb:\S+/, 'got dfb off /dev/null file');
 });
 done_testing;
