From: Eric Wong <e@80x24.org>
To: Kyle Meyer <kyle@kyleam.com>
Cc: meta@public-inbox.org
Subject: [PATCH 4/3] lei p2q: fix /dev/null filenames, fix phrase quoting rules
Date: Mon, 1 Mar 2021 11:47:36 +0600
Message-ID: <20210301054736.GA24278@dcvr>
In-Reply-To: <87k0qrrhve.fsf@kyleam.com>
Kyle Meyer <kyle@kyleam.com> wrote:
> I noticed an unexpected term when trying dfa:
>
> $ curl -fSs \
> https://public-inbox.org/meta/20210228122528.18552-2-e@80x24.org/raw >msg
> $ lei p2q --want=dfa msg
> dfa:my @WQ_KEYS = qw dfa:"lxs l2m imp mrr cnv" dfa:"internal workers" dfa:dev/null
>
> So I think the upstream "--- " filename regexp needs to be adjusted to
> account for "/dev/null".
Thanks. Also, "my @WQ_KEYS = qw" needs to be quoted, at least
(and maybe '(' and ')', too; I need to check Xapian more closely...).
I'll also have to fix them in SearchIdx (and probably switch to
a common parser for both indexing and term generation).
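For illustration, here is a minimal Perl sketch of the revised quoting rule from the xphrase() hunk in the patch. It assumes the splitting regex and the map body visible in the diff are all that matters (the sub is only partly shown). The name xphrase_sketch is hypothetical, and the sample input is an assumed reconstruction of the line behind Kyle's query, not a copy from patch 1/3.

```perl
use strict;
use warnings;

# Hypothetical standalone copy of the splitting and (new) quoting
# logic visible in the xphrase() hunk; the real sub is only partly shown.
sub xphrase_sketch {
	my ($s) = @_;
	map {
		s/\A\s*//; # trim leading whitespace
		s/\s+\z//; # trim trailing whitespace
		# new rule: quote any chunk containing a character other than
		# ".", "/", ":", "\", "@", "-" or a word character
		m![^\./:\\\@\-\w]! ? qq("$_") : $_;
	} ($s =~ m!(\w[\|=><,\./:\\\@\-\w\s]+)!g);
}

# assumed input: a reconstruction of the line Kyle hit
my $line = 'my @WQ_KEYS = qw(lxs l2m imp mrr cnv); # internal workers';
print join(' ', map { 'dfa:' . $_ } xphrase_sketch($line)), "\n";
```

Under the old rule, the first chunk stayed unquoted because it contains "@". With the new rule, its spaces and "=" force quoting, so this prints dfa:"my @WQ_KEYS = qw" dfa:"lxs l2m imp mrr cnv" dfa:"internal workers". A plain path such as lib/PublicInbox/LeiP2q.pm still passes through unquoted.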
On a side note: I find myself mega-confused using public-inbox
patches as test data. I thought Perl was choking and spitting
code back out at me :x
---8<---
Subject: [PATCH] lei p2q: fix /dev/null filenames, fix phrase quoting rules
/dev/null mis-handling was reported by Kyle Meyer.
Phrase quoting rules are also refined to avoid leaving spaces
unquoted when "phrase generator" characters exist.  Also,
hunk headers without function context no longer clobber the
in_diff state of the parser, since git can still generate those.
Link: https://public-inbox.org/meta/87k0qrrhve.fsf@kyleam.com/
---
lib/PublicInbox/LeiP2q.pm | 10 +++++++---
t/lei-p2q.t | 3 +++
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/lib/PublicInbox/LeiP2q.pm b/lib/PublicInbox/LeiP2q.pm
index d1dd125e..e7ddc852 100644
--- a/lib/PublicInbox/LeiP2q.pm
+++ b/lib/PublicInbox/LeiP2q.pm
@@ -12,6 +12,7 @@ use PublicInbox::MsgIter qw(msg_part_text);
use PublicInbox::Git qw(git_unquote);
use PublicInbox::Spawn qw(popen_rd);
use URI::Escape qw(uri_escape_utf8);
+my $FN = qr!((?:"?[^/\n]+/[^\r\n]+)|/dev/null)!;
sub xphrase ($) {
my ($s) = @_;
@@ -23,7 +24,7 @@ sub xphrase ($) {
map {
s/\A\s*//;
s/\s+\z//;
- /[\|=><,\sA-Z]/ && !m![\./:\\\@]! ? qq("$_") : $_;
+ m![^\./:\\\@\-\w]! ? qq("$_") : $_ ;
} ($s =~ m!(\w[\|=><,\./:\\\@\-\w\s]+)!g);
}
@@ -40,7 +41,7 @@ sub extract_terms { # eml->each_part callback
push @{$lei->{qterms}->{dfctx}}, xphrase($_);
} elsif (/^-- $/) { # email signature begins
$in_diff = undef;
- } elsif (m!^diff --git "?[^/]+/.+ "?[^/]+/.+\z!) {
+ } elsif (m!^diff --git $FN $FN!) {
# wait until "---" and "+++" to capture filenames
$in_diff = 1;
} elsif (/^index ([a-f0-9]+)\.\.([a-f0-9]+)\b/) {
@@ -48,13 +49,16 @@ sub extract_terms { # eml->each_part callback
push @{$lei->{qterms}->{dfpre}}, $oa;
push @{$lei->{qterms}->{dfpost}}, $ob;
# who uses dfblob?
- } elsif (m!^(?:---|\+{3}) ("?[^/]+/.+)!) {
+ } elsif (m!^(?:---|\+{3}) ($FN)!) {
+ next if $1 eq '/dev/null';
my $fn = (split(m!/!, git_unquote($1.''), 2))[1];
push @{$lei->{qterms}->{dfn}}, xphrase($fn);
} elsif ($in_diff && s/^\+//) { # diff added
push @{$lei->{qterms}->{dfb}}, xphrase($_);
} elsif ($in_diff && s/^-//) { # diff removed
push @{$lei->{qterms}->{dfa}}, xphrase($_);
+ } elsif (/^@@ (?:\S+) (?:\S+) @@\s*$/) {
+ # traditional diff w/o -p
} elsif (/^@@ (?:\S+) (?:\S+) @@\s*(\S+.*)/) {
push @{$lei->{qterms}->{dfhh}}, xphrase($1);
} elsif (/^(?:dis)similarity index/ ||
diff --git a/t/lei-p2q.t b/t/lei-p2q.t
index 1a2c2e4f..87cf9fa7 100644
--- a/t/lei-p2q.t
+++ b/t/lei-p2q.t
@@ -25,5 +25,8 @@ test_lei(sub {
"dfpost:6e006fd73b OR " .
"dfpost:6e006fd73\n",
'3-byte chop');
+
+ lei_ok(qw(p2q t/data/message_embed.eml --want=dfb));
+ like($lei_out, qr/\bdfb:\S+/, 'got dfb off /dev/null file');
});
done_testing;
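The following is a minimal Perl sketch of the header handling in the LeiP2q.pm hunks above. $FN and the two hunk-header patterns are copied from the patch. header_path() and hunk_context() are hypothetical helpers, and git_unquote() is omitted so the script runs without public-inbox installed. The sketch also shows that the old "---" pattern never matched "--- /dev/null". That is presumably why the line was treated as a removed line and surfaced as dfa:dev/null.

```perl
use strict;
use warnings;

# $FN is copied from the patch; $old_hdr is the pattern it replaces.
my $FN = qr!((?:"?[^/\n]+/[^\r\n]+)|/dev/null)!;
my $old_hdr = qr!^(?:---|\+{3}) ("?[^/]+/.+)!;
my $new_hdr = qr!^(?:---|\+{3}) ($FN)!;

# Hypothetical helper: path minus its a/ or b/ prefix, or undef for
# /dev/null and non-headers (the real code also runs git_unquote).
sub header_path {
	my ($line) = @_;
	return unless $line =~ $new_hdr;
	return if $1 eq '/dev/null';
	return (split(m!/!, $1, 2))[1];
}

# Hypothetical helper mirroring the two hunk-header branches:
# '' when there is no function context, the context text otherwise,
# undef for lines that are not hunk headers.
sub hunk_context {
	my ($line) = @_;
	return '' if $line =~ /^@@ (?:\S+) (?:\S+) @@\s*$/;
	return $1 if $line =~ /^@@ (?:\S+) (?:\S+) @@\s*(\S+.*)/;
	return;
}

for my $line ('--- /dev/null', '+++ b/t/data/new.eml',
		'--- a/lib/PublicInbox/LeiP2q.pm') {
	my $path = header_path($line);
	printf "%-32s old:%-3s new:%s\n", $line,
		($line =~ $old_hdr ? 'yes' : 'no'), $path // '(skipped)';
}

for my $hh ('@@ -1,2 +1,3 @@', '@@ -12,6 +12,7 @@ use PublicInbox::Git;') {
	my $ctx = hunk_context($hh);
	print "$hh => ", ($ctx eq '' ? '(no function context)' : "dfhh: $ctx"), "\n";
}
```

Matching "/dev/null" as a filename and then skipping it keeps it out of both dfn: and dfa:. Giving the bare hunk header its own no-op branch means it no longer falls through to later branches.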
Thread overview: 6+ messages
2021-02-28 12:25 [PATCH 0/3] lei p2q (patch-to-query) Eric Wong
2021-02-28 12:25 ` [PATCH 1/3] lei p2q: patch-to-query generator for "lei q --stdin" Eric Wong
2021-02-28 21:40 ` Kyle Meyer
2021-03-01 5:47 ` Eric Wong [this message]
2021-02-28 12:25 ` [PATCH 2/3] lei q: fix "-" shortcut for --stdin Eric Wong
2021-02-28 12:25 ` [PATCH 3/3] lei q: improve early aborts w/ remote externals Eric Wong