unofficial mirror of meta@public-inbox.org
* lei: incorrect quoting on saved searches (was Re: lore+lei: getting started)
       [not found] ` <CAL_JsqJBh1O3H2-P07AHzVq0x89BoP_N6P=rT5up6=3QyF_B0Q@mail.gmail.com>
@ 2021-11-08 20:22   ` Konstantin Ryabitsev
  2021-11-08 20:49     ` Eric Wong
  2021-11-08 20:53     ` Rob Herring
  0 siblings, 2 replies; 12+ messages in thread
From: Konstantin Ryabitsev @ 2021-11-08 20:22 UTC (permalink / raw)
  To: Rob Herring; +Cc: meta

On Mon, Nov 08, 2021 at 01:49:07PM -0600, Rob Herring wrote:

Moving this to meta.

> >     lei q -I https://lore.kernel.org/all/ -o ~/Mail/floppy \
> >       --threads --dedupe=mid \
> >       '(dfn:drivers/block/floppy.c OR dfhh:floppy_* OR s:floppy \
> >       OR ((nq:bug OR nq:regression) AND nq:floppy)) \
> >       AND rt:1.month.ago..'
> 
> I tried a similar one which I had working as a bookmark:
> 
> $ lei q -I https://lore.kernel.org/all/ -o ~/Mail/my-patches
> --threads --dedupe=mid '(dfn:drivers OR dfn:arch OR dfn:Documentation
> OR dfn:include OR dfn:scripts) AND f:robh@kernel.org'
> # /home/rob/.local/share/lei/store 0/0
> # /usr/bin/curl -Sf -s -d ''
> https://lore.kernel.org/all/?x=m&t=1&q=(dfn%3A%22drivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org%22
> # 0 written to /home/rob/Mail/my-patches/ (0 matches)

It's true, I get the same thing if I omit "AND rt:" at the end.

	$ lei q -I https://lore.kernel.org/all/ -o ~/work/temp/lei/robh-patches --threads --dedupe=mid '(dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts) AND f:robh@kernel.org'
    # /home/user/.local/share/lei/store 0/0
    # /usr/bin/curl -Sf -s -d '' https://lore.kernel.org/all/?x=m&t=1&q=(dfn%3A%22drivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org%22
    # 0 written to /home/user/work/temp/lei/robh-patches/ (0 matches)
	$ lei forget-search ~/work/temp/lei/robh-patches
	$ lei q -I https://lore.kernel.org/all/ -o ~/work/temp/lei/robh-patches --threads --dedupe=mid '(dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts) AND f:robh@kernel.org AND rt:1.month.ago..'
    # /usr/bin/curl -Sf -s -d '' https://lore.kernel.org/all/?x=m&t=1&q=(dfn%3Adrivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org+AND+rt%3A1633724105..
    # /home/user/.local/share/lei/store 13/13
    # https://lore.kernel.org/all/ 65/?
    # https://lore.kernel.org/all/ 75/75
    # 45 written to /home/user/work/temp/lei/robh-patches/ (88 matches)

> It seems there is some problem in quoting. Notice the '%22' that's
> inserted in the url.

Deferring to Eric here.

> Also, the above query is a bit of a work-around as what I really want
> is just all patches from me. I haven't been able to get something to
> work. I've tried things like 'dfn:*' or 'dfn:/' or 'dfn:b/'.

I think 's:patch AND nq:diff' is a good option here.

-K


* Re: lei: incorrect quoting on saved searches (was Re: lore+lei: getting started)
  2021-11-08 20:22   ` lei: incorrect quoting on saved searches (was Re: lore+lei: getting started) Konstantin Ryabitsev
@ 2021-11-08 20:49     ` Eric Wong
  2021-11-08 21:36       ` Konstantin Ryabitsev
  2021-11-08 20:53     ` Rob Herring
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Wong @ 2021-11-08 20:49 UTC (permalink / raw)
  To: Rob Herring; +Cc: Konstantin Ryabitsev, meta, workflows

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Mon, Nov 08, 2021 at 01:49:07PM -0600, Rob Herring wrote:
> 
> Moving this to meta.

I don't think workflows should've been dropped, though.

> > >     lei q -I https://lore.kernel.org/all/ -o ~/Mail/floppy \
> > >       --threads --dedupe=mid \
> > >       '(dfn:drivers/block/floppy.c OR dfhh:floppy_* OR s:floppy \
> > >       OR ((nq:bug OR nq:regression) AND nq:floppy)) \
> > >       AND rt:1.month.ago..'
> > 
> > I tried a similar one which I had working as a bookmark:

That's actually treating the entire single-quoted section as
a phrase search for Xapian.

The correct way to use '(', ')', and '*' on the command-line for
Xapian is to shell escape them:

	lei q -I https://lore.kernel.org/all/ -o ~/Mail/floppy \
		--threads --dedupe=mid \
	\( dfn:drivers/block/floppy.c OR dfhh:floppy_\* OR s:floppy \
		OR \(\(nq:bug OR nq:regression\) AND nq:floppy\)\) \
		AND rt:1.month.ago..

Since shell escaping so many metacharacters is annoying,
stdin is supported (and is read implicitly if and only if it is a file or pipe):

	echo '(dfn:drivers/block/floppy.c OR dfhh:floppy_* OR s:floppy
		OR ((nq:bug OR nq:regression) AND nq:floppy))
		AND rt:1.month.ago..' | \
		lei q -I https://lore.kernel.org/all/ -o ~/Mail/floppy \
		--threads --dedupe=mid
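
To see why the quoted and the escaped forms behave differently, it can
help to count the arguments the shell actually hands to a command. The
following is a minimal sketch, not anything lei ships: show_args is a
hypothetical helper standing in for "lei q", and the query is shortened
for illustration.

```shell
#!/bin/sh
# show_args is a hypothetical stand-in for "lei q": it prints how many
# arguments it received, then each argument in brackets on its own line.
show_args() {
	printf '%d\n' "$#"
	for arg in "$@"; do
		printf '[%s]\n' "$arg"
	done
}

# Single quotes: the whole query arrives as ONE argument containing spaces.
show_args '(s:floppy OR nq:floppy) AND rt:1.month.ago..'

# Backslash escapes: each word arrives as its own argument.
show_args \( s:floppy OR nq:floppy \) AND rt:1.month.ago..
```

The first call reports 1 argument and the second reports 7, which is the
difference that matters when each argument is examined separately.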


* Re: lei: incorrect quoting on saved searches (was Re: lore+lei: getting started)
  2021-11-08 20:22   ` lei: incorrect quoting on saved searches (was Re: lore+lei: getting started) Konstantin Ryabitsev
  2021-11-08 20:49     ` Eric Wong
@ 2021-11-08 20:53     ` Rob Herring
  2021-11-08 21:27       ` [PATCH] searchidx: index "diff --git a/... b/..." headers Eric Wong
  1 sibling, 1 reply; 12+ messages in thread
From: Rob Herring @ 2021-11-08 20:53 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

On Mon, Nov 8, 2021 at 2:22 PM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> On Mon, Nov 08, 2021 at 01:49:07PM -0600, Rob Herring wrote:
>
> Moving this to meta.
>
> > >     lei q -I https://lore.kernel.org/all/ -o ~/Mail/floppy \
> > >       --threads --dedupe=mid \
> > >       '(dfn:drivers/block/floppy.c OR dfhh:floppy_* OR s:floppy \
> > >       OR ((nq:bug OR nq:regression) AND nq:floppy)) \
> > >       AND rt:1.month.ago..'
> >
> > I tried a similar one which I had working as a bookmark:
> >
> > $ lei q -I https://lore.kernel.org/all/ -o ~/Mail/my-patches
> > --threads --dedupe=mid '(dfn:drivers OR dfn:arch OR dfn:Documentation
> > OR dfn:include OR dfn:scripts) AND f:robh@kernel.org'
> > # /home/rob/.local/share/lei/store 0/0
> > # /usr/bin/curl -Sf -s -d ''
> > https://lore.kernel.org/all/?x=m&t=1&q=(dfn%3A%22drivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org%22
> > # 0 written to /home/rob/Mail/my-patches/ (0 matches)
>
> It's true, I get the same thing if I omit "AND rt:" at the end.
>
>         $ lei q -I https://lore.kernel.org/all/ -o ~/work/temp/lei/robh-patches --threads --dedupe=mid '(dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts) AND f:robh@kernel.org'
>     # /home/user/.local/share/lei/store 0/0
>     # /usr/bin/curl -Sf -s -d '' https://lore.kernel.org/all/?x=m&t=1&q=(dfn%3A%22drivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org%22
>     # 0 written to /home/user/work/temp/lei/robh-patches/ (0 matches)
>         $ lei forget-search ~/work/temp/lei/robh-patches
>         $ lei q -I https://lore.kernel.org/all/ -o ~/work/temp/lei/robh-patches --threads --dedupe=mid '(dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts) AND f:robh@kernel.org AND rt:1.month.ago..'
>     # /usr/bin/curl -Sf -s -d '' https://lore.kernel.org/all/?x=m&t=1&q=(dfn%3Adrivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org+AND+rt%3A1633724105..
>     # /home/user/.local/share/lei/store 13/13
>     # https://lore.kernel.org/all/ 65/?
>     # https://lore.kernel.org/all/ 75/75
>     # 45 written to /home/user/work/temp/lei/robh-patches/ (88 matches)
>
> > It seems there is some problem in quoting. Notice the '%22' that's
> > inserted in the url.
>
> Deferring to Eric here.
>
> > Also, the above query is a bit of a work-around as what I really want
> > is just all patches from me. I haven't been able to get something to
> > work. I've tried things like 'dfn:*' or 'dfn:/' or 'dfn:b/'.
>
> I think 's:patch AND nq:diff' is a good option here.

Not even close really. That mainly finds my replies with 'diff' in
them. I'm not sure why, but it misses most actual patches:

https://lore.kernel.org/all/?q=s%3Apatch+nq%3Adiff+f%3Arobh%40kernel.org

Rob


* [PATCH] searchidx: index "diff --git a/... b/..." headers
  2021-11-08 20:53     ` Rob Herring
@ 2021-11-08 21:27       ` Eric Wong
  2021-11-09  0:38         ` Rob Herring
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Wong @ 2021-11-08 21:27 UTC (permalink / raw)
  To: Rob Herring; +Cc: Konstantin Ryabitsev, meta

Rob Herring <robh@kernel.org> wrote:
> On Mon, Nov 8, 2021 at 2:22 PM Konstantin Ryabitsev
> > I think 's:patch AND nq:diff' is a good option here.
> 
> Not even close really. That mainly finds my replies with 'diff' in
> them. I'm not sure why, but it misses most actual patches:
> 
> https://lore.kernel.org/all/?q=s%3Apatch+nq%3Adiff+f%3Arobh%40kernel.org

Actually, it looks like nq:diff never works.  The diff indexer
skips right over 'diff --git a/... b/...' lines :x

The following should fix it, but reindexing is necessary.
---------8<----------
Subject: [PATCH] searchidx: index "diff --git a/... b/..." headers

While we do detailed indexing of git diffs, the header itself
was failing and queries like 'nq:diff' would not work.

Noticed-by: Rob Herring <robh@kernel.org>
---
 lib/PublicInbox/SearchIdx.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index b886ce78..6e2e614c 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -259,6 +259,7 @@ sub index_diff ($$$) {
 		} elsif (m!^diff --git "?[^/]+/.+ "?[^/]+/.+\z!) {
 			# wait until "---" and "+++" to capture filenames
 			$in_diff = 1;
+			push @xnq, $_;
 		# traditional diff:
 		} elsif (m/^diff -(.+) (\S+) (\S+)$/) {
 			my ($opt, $fa, $fb) = ($1, $2, $3);


* Re: lei: incorrect quoting on saved searches (was Re: lore+lei: getting started)
  2021-11-08 20:49     ` Eric Wong
@ 2021-11-08 21:36       ` Konstantin Ryabitsev
  2021-11-08 21:48         ` Eric Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Ryabitsev @ 2021-11-08 21:36 UTC (permalink / raw)
  To: Eric Wong; +Cc: Rob Herring, meta, workflows

On Mon, Nov 08, 2021 at 08:49:23PM +0000, Eric Wong wrote:
> Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > On Mon, Nov 08, 2021 at 01:49:07PM -0600, Rob Herring wrote:
> > 
> > Moving this to meta.
> 
> I don't think workflows should've been dropped, though.
> 
> > > >     lei q -I https://lore.kernel.org/all/ -o ~/Mail/floppy \
> > > >       --threads --dedupe=mid \
> > > >       '(dfn:drivers/block/floppy.c OR dfhh:floppy_* OR s:floppy \
> > > >       OR ((nq:bug OR nq:regression) AND nq:floppy)) \
> > > >       AND rt:1.month.ago..'
> > > 
> > > I tried a similar one which I had working as a bookmark:
> 
> That's actually treating the entire single-quoted section as
> a phrase search for Xapian.

Hmm... I noticed that when I `lei edit-search` the initial query that was
causing quoting issues, I get the following:

	[lei]
		q = (dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts) AND f:robh@kernel.org

So, the extra quotes didn't get added to the config file. Running `lei up` on
that saved search seems to do the right thing, so the erroneous quotes are
only added during the initial `lei q` call.

> The correct way to use '(', ')', and '*' on the command-line for
> Xapian is to shell escape them:

But putting them into single quotes should accomplish the same result, no? At
least, that's how I've always understood shell escaping.

-K


* Re: lei: incorrect quoting on saved searches (was Re: lore+lei: getting started)
  2021-11-08 21:36       ` Konstantin Ryabitsev
@ 2021-11-08 21:48         ` Eric Wong
  2021-11-08 22:36           ` Konstantin Ryabitsev
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Wong @ 2021-11-08 21:48 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Rob Herring, meta, workflows

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Mon, Nov 08, 2021 at 08:49:23PM +0000, Eric Wong wrote:
> > Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > > On Mon, Nov 08, 2021 at 01:49:07PM -0600, Rob Herring wrote:
> > 
> > > > >     lei q -I https://lore.kernel.org/all/ -o ~/Mail/floppy \
> > > > >       --threads --dedupe=mid \
> > > > >       '(dfn:drivers/block/floppy.c OR dfhh:floppy_* OR s:floppy \
> > > > >       OR ((nq:bug OR nq:regression) AND nq:floppy)) \
> > > > >       AND rt:1.month.ago..'
> > > > 
> > > > I tried a similar one which I had working as a bookmark:
> > 
> > That's actually treating the entire single-quoted section as
> > a phrase search for Xapian.
> 
> Hmm... I noticed that when I `lei edit-search` the initial query that was
> causing quoting issues, I get the following:
> 
> 	[lei]
> 		q = (dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts) AND f:robh@kernel.org
> 
> So, the extra quotes didn't get added to the config file. Running `lei up` on
> that saved search seems to do the right thing, so the erroneous quotes are
> only added during the initial `lei q` call.

Right, each entry in lei.q is actually an entry in argv[].
So the correct query should look something like:

[lei]
        q = (
        q = dfn:drivers/block/floppy.c
        q = OR
...

> > The correct way to use '(', ')', and '*' on the command-line for
> > Xapian is to shell escape them:
> 
> But putting them into single quotes should accomplish the same result, no? At
> least, that's how I've always understood shell escaping.

Yeah, that works, too.  As long as spaces/tabs don't show up
within each argv[] element, it won't be interpreted as a phrase.

I really wanted:	lei q s:"a quick brown fox"
to work from a shell like it would in the WWW UI;
and thus deprioritized '(' and ')' working properly :x
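
The rule described above (whitespace inside one argv[] element means
"phrase") can be illustrated with a toy model. This is only a sketch
under stated assumptions: join_query is a hypothetical function, it is
not lei's actual code, and it ignores search prefixes such as "s:" or
"dfn:" entirely.

```shell
#!/bin/sh
# join_query is a hypothetical toy model, NOT lei's real code: it joins
# its arguments with spaces and wraps any argument that contains
# whitespace in double quotes, so that argument reads as one phrase.
join_query() {
	out=
	for arg in "$@"; do
		case $arg in
		*[[:space:]]*) arg="\"$arg\"" ;;
		esac
		out="${out:+$out }$arg"
	done
	printf '%s\n' "$out"
}

# One argument with embedded spaces comes out as a quoted phrase.
join_query s:patch 'a quick brown fox'

# Separate arguments pass through without added quotes.
join_query \( s:patch OR s:rfc \) AND f:robh@kernel.org
```

Under this model, a whole boolean query passed as one single-quoted
argument would be wrapped in quotes and lose its operators, while the
same query split into separate arguments keeps them.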


* Re: lei: incorrect quoting on saved searches (was Re: lore+lei: getting started)
  2021-11-08 21:48         ` Eric Wong
@ 2021-11-08 22:36           ` Konstantin Ryabitsev
  2021-11-08 22:57             ` Eric Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Ryabitsev @ 2021-11-08 22:36 UTC (permalink / raw)
  To: Eric Wong; +Cc: Rob Herring, meta, workflows

On Mon, Nov 08, 2021 at 09:48:36PM +0000, Eric Wong wrote:
> > Hmm... I noticed that when I `lei edit-search` the initial query that was
> > causing quoting issues, I get the following:
> > 
> > 	[lei]
> > 		q = (dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts) AND f:robh@kernel.org
> > 
> > So, the extra quotes didn't get added to the config file. Running `lei up` on
> > that saved search seems to do the right thing, so the erroneous quotes are
> > only added during the initial `lei q` call.
> 
> Right, each entry in lei.q is actually an entry in argv[].
> So the correct query should look something like:

So, to be clear here... the following doesn't work because instead of multiple
query parameters to 'lei q' the single-quoted string becomes a single
parameter?

	lei q -I https://lore.kernel.org/all/ -o ~/work/temp/lei/robh-patches \
    --threads --dedupe=mid \
    '(dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts) AND f:robh@kernel.org'

Any way to make this work? I find that it's more easily readable than the
"echo | lei q" version.

For bash users, the following should work as well:

	lei q -I https://lore.kernel.org/all/ -o ~/work/temp/lei/robh-patches \
    --threads --dedupe=mid <<< \
    '(dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts) AND f:robh@kernel.org'

Suggestion, can -I accept the URL containing the query, so that the command
becomes:

    lei q -o ~/mail/foo --threads --dedupe=mid -I \
    https://lore.kernel.org/all/?q=f%3Atorvalds+AND+nq%3Agarbage

This way we pass both the location of the extindex to query AND the parameters
we should use, avoiding shell quoting problems?

-K


* Re: lei: incorrect quoting on saved searches (was Re: lore+lei: getting started)
  2021-11-08 22:36           ` Konstantin Ryabitsev
@ 2021-11-08 22:57             ` Eric Wong
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Wong @ 2021-11-08 22:57 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Rob Herring, meta, workflows

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Mon, Nov 08, 2021 at 09:48:36PM +0000, Eric Wong wrote:
> > > Hmm... I noticed that when I `lei edit-search` the initial query that was
> > > causing quoting issues, I get the following:
> > > 
> > > 	[lei]
> > > 		q = (dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts) AND f:robh@kernel.org
> > > 
> > > So, the extra quotes didn't get added to the config file. Running `lei up` on
> > > that saved search seems to do the right thing, so the erroneous quotes are
> > > only added during the initial `lei q` call.
> > 
> > Right, each entry in lei.q is actually an entry in argv[].
> > So the correct query should look something like:
> 
> So, to be clear here... the following doesn't work because instead of multiple
> query parameters to 'lei q' the single-quoted string becomes a single
> parameter?
> 
> 	lei q -I https://lore.kernel.org/all/ -o ~/work/temp/lei/robh-patches \
>     --threads --dedupe=mid \
>     '(dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts) AND f:robh@kernel.org'
> 
> Any way to make this work? I find that it's more easily readable than the
> "echo | lei q" version.

I can't think of a way to make it work w/o breaking phrase searches
(or asking users to use both single and double-quotes):

	lei q 's:"a quick brown fox"' # yuck

> For bash users, the following should work as well:
> 
> 	lei q -I https://lore.kernel.org/all/ -o ~/work/temp/lei/robh-patches \
>     --threads --dedupe=mid <<< \
>     '(dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts) AND f:robh@kernel.org'

Oh, not sure about bash and <<<; but this heredoc should work with any
POSIX sh:

	lei q -I https://lore.kernel.org/all/ \
		-o ~/work/temp/lei/robh-patches \
		--threads --dedupe=mid <<'EOM'
(dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts)
AND f:robh@kernel.org
EOM
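
For readers unsure what the quoted 'EOM' delimiter buys, here is a
small sketch. read_query is a hypothetical stand-in for any command
that reads its query from stdin; it simply copies stdin to stdout so
the exact text the command would receive is visible.

```shell
#!/bin/sh
# read_query is a hypothetical stand-in for a command that reads its
# query from stdin; here it just copies stdin to stdout unchanged.
read_query() {
	cat
}

# Quoting the delimiter ('EOM') turns off parameter expansion, command
# substitution and backslash processing inside the heredoc body, so
# '(', ')', '*' and '$' all reach the command literally.
read_query <<'EOM'
(dfn:drivers OR dfhh:floppy_*) AND f:$USER
EOM
```

With an unquoted delimiter (<<EOM) the shell would expand $USER before
the command ever saw the text, which is rarely what a search query
wants.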

> Suggestion, can -I accept the URL containing the query, so that the command
> becomes:
> 
>     lei q -o ~/mail/foo --threads --dedupe=mid -I \
>     https://lore.kernel.org/all/?q=f%3Atorvalds+AND+nq%3Agarbage
> 
> This way we pass both the location of the extindex to query AND the parameters
> we should use, avoiding shell quoting problems?

Maybe, but URI escaping bothers the heck out of me, too.
I guess the heredoc example is actually good...


* Re: [PATCH] searchidx: index "diff --git a/... b/..." headers
  2021-11-08 21:27       ` [PATCH] searchidx: index "diff --git a/... b/..." headers Eric Wong
@ 2021-11-09  0:38         ` Rob Herring
  2021-11-09  3:12           ` Eric Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Rob Herring @ 2021-11-09  0:38 UTC (permalink / raw)
  To: Eric Wong; +Cc: Konstantin Ryabitsev, meta

On Mon, Nov 8, 2021 at 3:27 PM Eric Wong <e@80x24.org> wrote:
>
> Rob Herring <robh@kernel.org> wrote:
> > On Mon, Nov 8, 2021 at 2:22 PM Konstantin Ryabitsev
> > > I think 's:patch AND nq:diff' is a good option here.
> >
> > Not even close really. That mainly finds my replies with 'diff' in
> > them. I'm not sure why, but it misses most actual patches:
> >
> > https://lore.kernel.org/all/?q=s%3Apatch+nq%3Adiff+f%3Arobh%40kernel.org
>
> Actually, it looks like nq:diff never works.  The diff indexer
> skips right over 'diff --git a/... b/...' lines :x

Never works for 'diff' being a patch? Because it works very well
finding all the other cases.

> The following should fix it, but reindexing is necessary.
> ---------8<----------
> Subject: [PATCH] searchidx: index "diff --git a/... b/..." headers
>
> While we do detailed indexing of git diffs, the header itself
> was failing and queries like 'nq:diff' would not work.

Any thoughts on supporting an 'is a patch' type query?

Rob


* Re: [PATCH] searchidx: index "diff --git a/... b/..." headers
  2021-11-09  0:38         ` Rob Herring
@ 2021-11-09  3:12           ` Eric Wong
  2021-11-09  4:03             ` Rob Herring
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Wong @ 2021-11-09  3:12 UTC (permalink / raw)
  To: Rob Herring; +Cc: Konstantin Ryabitsev, meta

Rob Herring <robh@kernel.org> wrote:
> On Mon, Nov 8, 2021 at 3:27 PM Eric Wong <e@80x24.org> wrote:
> >
> > Rob Herring <robh@kernel.org> wrote:
> > > On Mon, Nov 8, 2021 at 2:22 PM Konstantin Ryabitsev
> > > > I think 's:patch AND nq:diff' is a good option here.
> > >
> > > Not even close really. That mainly finds my replies with 'diff' in
> > > them. I'm not sure why, but it misses most actual patches:
> > >
> > > https://lore.kernel.org/all/?q=s%3Apatch+nq%3Adiff+f%3Arobh%40kernel.org
> >
> > Actually, it looks like nq:diff never works.  The diff indexer
> > skips right over 'diff --git a/... b/...' lines :x
> 
> Never works for 'diff' being a patch? Because it works very well
> finding all the other cases.

Yeah, the index_diff() code path ignored the "diff --git" phrase
before this patch.
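
As a rough illustration of which lines that branch of index_diff()
applies to, the sketch below tests sample lines against an ERE
approximation of the Perl pattern from the patch (with "$" in place of
"\z"). is_git_diff_header is a hypothetical helper written for this
example and the sample lines are made up.

```shell
#!/bin/sh
# is_git_diff_header is a hypothetical helper: it succeeds when its
# argument matches an ERE approximation of the pattern in the patch.
is_git_diff_header() {
	printf '%s\n' "$1" | grep -Eq '^diff --git "?[^/]+/.+ "?[^/]+/.+$'
}

for line in \
	'diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm' \
	'diff -u old/file.c new/file.c' \
	'I ran diff on both trees'
do
	if is_git_diff_header "$line"; then
		printf 'git header: %s\n' "$line"
	else
		printf 'other:      %s\n' "$line"
	fi
done
```

Only the first sample is classified as a git header; the traditional
"diff -u" line and the prose line fall through to other handling.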

> > The following should fix it, but reindexing is necessary.
> > ---------8<----------
> > Subject: [PATCH] searchidx: index "diff --git a/... b/..." headers
> >
> > While we do detailed indexing of git diffs, the header itself
> > was failing and queries like 'nq:diff' would not work.
> 
> Any thoughts on supporting an 'is a patch' type query?

I think 's:patch' should be sufficient, don't think there's
many false-positives on that front, actually.

With this fix, nq:"diff --git" should also be working across
https://yhbt.net/lore/ in about 40 hours (whenever reindex
finishes)

I'm not sure if there needs to be a specific term to index
patches on; maybe there is.  There's still a lot of Xapian
we're not using, yet...


* Re: [PATCH] searchidx: index "diff --git a/... b/..." headers
  2021-11-09  3:12           ` Eric Wong
@ 2021-11-09  4:03             ` Rob Herring
  2021-11-09  5:08               ` Eric Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Rob Herring @ 2021-11-09  4:03 UTC (permalink / raw)
  To: Eric Wong; +Cc: Konstantin Ryabitsev, meta

On Mon, Nov 8, 2021 at 9:12 PM Eric Wong <e@80x24.org> wrote:
>
> Rob Herring <robh@kernel.org> wrote:
> > On Mon, Nov 8, 2021 at 3:27 PM Eric Wong <e@80x24.org> wrote:
> > >
> > > Rob Herring <robh@kernel.org> wrote:
> > > > On Mon, Nov 8, 2021 at 2:22 PM Konstantin Ryabitsev
> > > > > I think 's:patch AND nq:diff' is a good option here.
> > > >
> > > > Not even close really. That mainly finds my replies with 'diff' in
> > > > them. I'm not sure why, but it misses most actual patches:
> > > >
> > > > https://lore.kernel.org/all/?q=s%3Apatch+nq%3Adiff+f%3Arobh%40kernel.org
> > >
> > > Actually, it looks like nq:diff never works.  The diff indexer
> > > skips right over 'diff --git a/... b/...' lines :x
> >
> > Never works for 'diff' being a patch? Because it works very well
> > finding all the other cases.
>
> Yeah, the index_diff() code path ignored the "diff --git" phrase
> before this patch.
>
> > > The following should fix it, but reindexing is necessary.
> > > ---------8<----------
> > > Subject: [PATCH] searchidx: index "diff --git a/... b/..." headers
> > >
> > > While we do detailed indexing of git diffs, the header itself
> > > was failing and queries like 'nq:diff' would not work.
> >
> > Any thoughts on supporting an 'is a patch' type query?
>
> I think 's:patch' should be sufficient, don't think there's
> many false-positives on that front, actually.

It's at least 's:patch OR s:rfc OR s:resend'. That catches all but the
few creative folks that come up with something else.

> With this fix, nq:"diff --git" should also be working across
> https://yhbt.net/lore/ in about 40 hours (whenever reindex
> finishes)

'diff --git' should cover probably 99.9% of patches but there are
still some non-git diffs from time to time.

> I'm not sure if there needs to be a specific term to index
> patches on; maybe there is.  There's still a lot of Xapian
> we're not using, yet...

What I'm hoping to get to is a replacement for patchwork in my
workflow. For that I want all patches which don't have either a
Reviewed/Acked tag from me or a reply from me. I think the first part
should be possible with lei, but I'd imagine the last part is some
processing on top of the lei query.

Rob


* Re: [PATCH] searchidx: index "diff --git a/... b/..." headers
  2021-11-09  4:03             ` Rob Herring
@ 2021-11-09  5:08               ` Eric Wong
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Wong @ 2021-11-09  5:08 UTC (permalink / raw)
  To: Rob Herring; +Cc: Konstantin Ryabitsev, meta

Rob Herring <robh@kernel.org> wrote:
> On Mon, Nov 8, 2021 at 9:12 PM Eric Wong <e@80x24.org> wrote:
> > I think 's:patch' should be sufficient, don't think there's
> > many false-positives on that front, actually.
> 
> It's at least 's:patch OR s:rfc OR s:resend'. That catches all but the
> few creative folks that come up with something else.
> 
> > With this fix, nq:"diff --git" should also be working across
> > https://yhbt.net/lore/ in about 40 hours (whenever reindex
> > finishes)
> 
> 'diff --git' should cover probably 99.9% of patches but there are
> still some non-git diffs from time to time.

OK, so maybe the combination of:

	s:patch OR s:rfc OR s:resend OR nq:"diff --git"

Is enough?  Maybe it would be good to support some form of alias
expansion in the Xapian query parser for "common" things like
that.  I know there's a few not-seriously-proposed patches which
may lack all of those, but perhaps they weren't meant to be
applied, either...

Fwiw, it would also be useful to have it expand .mailmap and
sendemail.aliasesfile entries, too.

Unfortunately, I think doing aliases cleanly requires dropping
down to C++ to supply custom routines to Xapian.  The current
approxidate parsing is all done via fragile string
substitutions; I'm not sure how that holds up...

> > I'm not sure if there needs to be a specific term to index
> > patches on; maybe there is.  There's still a lot of Xapian
> > we're not using, yet...
> 
> What I'm hoping to get to is a replacement for patchwork in my
> workflow. For that I want all patches which don't have either a
> Reviewed/Acked tag from me or a reply from me. I think the first part
> should be possible with lei, but I'd imagine the last part is some
> processing on top of the lei query.

Yes, exactly.  Powerful-enough local search should be able to
replace many web-based tools.  Patch indexing could take into
account git trailers, but maybe existing 'nq:' phrases are
enough *shrug*

lei is still primitive, but things like "lei p2q" can be
combined to do some patch tracking.  An example from the
lei-p2q(1) manpage:

         # to view unapplied patches for a given $FILE from the past year:
         echo \( rt:last.year.. AND dfn:$FILE \) AND NOT \( \
               $(git log -p --pretty=mboxrd --since=last.year $FILE |
                       lei p2q -F mboxrd ) \
               \) | lei q -o /tmp/unapplied

I do need some time to consider future features and non-bugfix
stuff to lei+public-inbox, though.  There's already a huge
amount of stuff in the TODO and a billion things in my life
that also need fixing :<


