From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 06/11] lei q: -I/--exclude/--only support globs and basenames
Date: Tue, 2 Feb 2021 22:11:38 -1000
Message-ID: <20210203081143.24424-7-e@80x24.org>
In-Reply-To: <20210203081143.24424-1-e@80x24.org>
We can do basename matching when it's unambiguous. Since '*?[]'
characters are rare in URLs and pathnames, we'll do glob
matching by default and support a (curl-inspired) --globoff/-g
option to disable globbing.
And fix --exclude while we're at it: it was reading
$self->{exclude} instead of $opt->{exclude}.
---
lib/PublicInbox/LEI.pm | 3 ++-
lib/PublicInbox/LeiExternal.pm | 38 +++++++++++++++++++++++++++++++++-
lib/PublicInbox/LeiQuery.pm | 14 ++++++++-----
3 files changed, 48 insertions(+), 7 deletions(-)
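For illustration, here is a minimal standalone sketch of the glob
translation this patch adds. The %patmap table and glob2pat() are
copied from the LeiExternal.pm hunk below; the @cur list and the
'mail/*' glob are hypothetical values standing in for whatever
externals_each() would return on a real setup.

```perl
use strict;
use warnings;

# Same table as the patch: '*' and '?' never cross a '/',
# while '[' and ']' pass through so character ranges keep working.
my %patmap = ('*' => '[^/]*?', '?' => '[^/]', '[' => '[', ']' => ']');

sub glob2pat {
	my ($glob) = @_;
	# any character not in %patmap is quoted so it only matches itself
	$glob =~ s!(.)!$patmap{$1} || "\Q$1"!ge;
	$glob;
}

# Hypothetical external locations (assumption, not from the patch)
my @cur = qw(
	/home/user/mail/meta
	/home/user/mail/test
	https://example.com/meta/
);

my $re = glob2pat('mail/*');
my @m = grep(m!$re!, @cur); # unanchored, as in get_externals
print "$_\n" for @m;
```

With these sample locations, only the two local paths are selected.
Since get_externals applies the pattern without anchors, a glob may
match anywhere inside a location, not only at its end.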
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 05a39cad..3cb7a327 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -104,7 +104,7 @@ our %CMD = ( # sorted in order of importance/use:
'q' => [ 'SEARCH_TERMS...', 'search for messages matching terms', qw(
save-as=s output|mfolder|o=s format|f=s dedupe|d=s thread|t augment|a
sort|s=s reverse|r offset=i remote! local! external! pretty
- include|I=s@ exclude=s@ only=s@ jobs|j=s
+ include|I=s@ exclude=s@ only=s@ jobs|j=s globoff|g
mua-cmd|mua=s no-torsocks torsocks=s verbose|v quiet|q
received-after=s received-before=s sent-after=s sent-since=s),
PublicInbox::LeiQuery::curl_opt(), opt_dash('limit|n=i', '[0-9]+') ],
@@ -201,6 +201,7 @@ my $ls_format = [ 'OUT|plain|json|null', 'listing output format' ];
my %OPTDESC = (
'help|h' => 'show this built-in help',
'quiet|q' => 'be quiet',
+'globoff|g' => "do not match locations using '*?' wildcards and '[]' ranges",
'verbose|v' => 'be more verbose',
'solve!' => 'do not attempt to reconstruct blobs from emails',
'torsocks=s' => ['auto|no|yes',
diff --git a/lib/PublicInbox/LeiExternal.pm b/lib/PublicInbox/LeiExternal.pm
index 3853cfc1..6b4c7fb0 100644
--- a/lib/PublicInbox/LeiExternal.pm
+++ b/lib/PublicInbox/LeiExternal.pm
@@ -39,7 +39,7 @@ sub lei_ls_external {
}
sub ext_canonicalize {
- my ($location) = $_[-1];
+ my ($location) = @_;
if ($location !~ m!\Ahttps?://!) {
PublicInbox::Config::rel2abs_collapsed($location);
} else {
@@ -52,6 +52,42 @@ sub ext_canonicalize {
}
}
+my %patmap = ('*' => '[^/]*?', '?' => '[^/]', '[' => '[', ']' => ']');
+sub glob2pat {
+ my ($glob) = @_;
+ $glob =~ s!(.)!$patmap{$1} || "\Q$1"!ge;
+ $glob;
+}
+
+sub get_externals {
+ my ($self, $loc, $exclude) = @_;
+ return (ext_canonicalize($loc)) if -e $loc;
+
+ my @m;
+ my @cur = externals_each($self);
+ my $do_glob = !$self->{opt}->{globoff}; # glob by default
+ if ($do_glob && ($loc =~ /[\*\?]/s || $loc =~ /\[.*\]/s)) {
+ my $re = glob2pat($loc);
+ @m = grep(m!$re!, @cur);
+ return @m if scalar(@m);
+ } elsif (index($loc, '/') < 0) { # exact basename match:
+ @m = grep(m!/\Q$loc\E/?\z!, @cur);
+ return @m if scalar(@m) == 1;
+ } elsif ($exclude) { # URL, maybe:
+ my $canon = ext_canonicalize($loc);
+ @m = grep(m!\A\Q$canon\E\z!, @cur);
+ return @m if scalar(@m) == 1;
+ } else { # URL:
+ return (ext_canonicalize($loc));
+ }
+ if (scalar(@m) == 0) {
+ $self->fail("`$loc' is unknown");
+ } else {
+ $self->fail("`$loc' is ambiguous:\n", map { "\t$_\n" } @m);
+ }
+ ();
+}
+
sub lei_add_external {
my ($self, $location) = @_;
my $cfg = $self->_lei_cfg(1);
diff --git a/lib/PublicInbox/LeiQuery.pm b/lib/PublicInbox/LeiQuery.pm
index 72a67c24..10b8d6fa 100644
--- a/lib/PublicInbox/LeiQuery.pm
+++ b/lib/PublicInbox/LeiQuery.pm
@@ -31,17 +31,21 @@ sub lei_q {
}
if (@only) {
for my $loc (@only) {
- $lxs->prepare_external($self->ext_canonicalize($loc));
+ my @loc = $self->get_externals($loc) or return;
+ $lxs->prepare_external($_) for @loc;
}
} else {
for my $loc (@{$opt->{include} // []}) {
- $lxs->prepare_external($self->ext_canonicalize($loc));
+ my @loc = $self->get_externals($loc) or return;
+ $lxs->prepare_external($_) for @loc;
}
# --external is enabled by default, but allow --no-external
if ($opt->{external} //= 1) {
- my %x = map {;
- ($self->ext_canonicalize($_), 1)
- } @{$self->{exclude} // []};
+ my %x;
+ for my $loc (@{$opt->{exclude} // []}) {
+ my @l = $self->get_externals($loc, 1) or return;
+ $x{$_} = 1 for @l;
+ }
my $ne = $self->externals_each(\&prep_ext, $lxs, \%x);
$opt->{remote} //= !($lxs->locals - $opt->{'local'});
if ($opt->{'local'}) {
Thread overview: 12+ messages
2021-02-03 8:11 [PATCH 00/11] lei q --stdin, shortcut names, etc Eric Wong
2021-02-03 8:11 ` [PATCH 01/11] lei: reduce FD pressure from lei2mail worker Eric Wong
2021-02-03 8:11 ` [PATCH 02/11] lei: further reduce lei2mail FD pressure Eric Wong
2021-02-03 8:11 ` [PATCH 03/11] pkt_op: rely on DS::in_loop global Eric Wong
2021-02-03 8:11 ` [PATCH 04/11] lei: err: avoid uninitialized variable warnings Eric Wong
2021-02-03 8:11 ` [PATCH 05/11] lei: propagate curl errors, improve internal consistency Eric Wong
2021-02-03 8:11 ` Eric Wong [this message]
2021-02-03 8:11 ` [PATCH 07/11] lei: complete basenames for include|exclude|only Eric Wong
2021-02-03 8:11 ` [PATCH 08/11] lei: help starts pager Eric Wong
2021-02-03 8:11 ` [PATCH 09/11] lei add-external: completion for existing URL basenames Eric Wong
2021-02-03 8:11 ` [PATCH 10/11] lei: use sleep(1) loop for infinite sleep Eric Wong
2021-02-03 8:11 ` [PATCH 11/11] lei q: support reading queries from stdin Eric Wong