From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <e@80x24.org>
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level: 
X-Spam-ASN:  
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 281CF1F9FC;
	Tue,  9 Nov 2021 05:08:27 +0000 (UTC)
Date: Tue, 9 Nov 2021 05:08:27 +0000
From: Eric Wong <e@80x24.org>
To: Rob Herring <robh@kernel.org>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>,
	meta@public-inbox.org
Subject: Re: [PATCH] searchidx: index "diff --git a/... b/..." headers
Message-ID: <20211109050827.GA28454@dcvr>
References: <lorelei.part1.202111051304.mdtebsxahljcrxak@meerkat.local>
 <CAL_JsqJBh1O3H2-P07AHzVq0x89BoP_N6P=rT5up6=3QyF_B0Q@mail.gmail.com>
 <20211108202204.q5zg6bachnvbjlnx@meerkat.local>
 <CAL_Jsq+XtqOEF7p5zbO2O2YdHPr61+ahPgdDhH7_XMwyuDuc2w@mail.gmail.com>
 <20211108212714.GA13642@dcvr>
 <CAL_Jsq+aqDmpxUHsw844xS8f6WRX3gcvt7GQhf2XB7-Lb=Yx8Q@mail.gmail.com>
 <20211109031233.GA19089@dcvr>
 <CAL_Jsq+Eqru9qGwU3QKVVbdN5o7md-0YcZHVkBB-egDD0=319g@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <CAL_Jsq+Eqru9qGwU3QKVVbdN5o7md-0YcZHVkBB-egDD0=319g@mail.gmail.com>
List-Id: <meta.public-inbox.org>

Rob Herring <robh@kernel.org> wrote:
> On Mon, Nov 8, 2021 at 9:12 PM Eric Wong <e@80x24.org> wrote:
> > I think 's:patch' should be sufficient, don't think there's
> > many false-positives on that front, actually.
> 
> It's at least 's:patch OR s:rfc OR s:resend'. That catches all but the
> few creative folks that come up with something else.
> 
> > With this fix, nq:"diff --git" should also be working across
> > https://yhbt.net/lore/ in about 40 hours (whenever reindex
> > finishes)
> 
> 'diff --git' should cover probably 99.9% of patches but there are
> still some non-git diffs from time to time.

OK, so maybe the combination of:

	s:patch OR s:rfc OR s:resend OR nq:"diff --git"

is enough?  Maybe it would be good to support some form of alias
expansion in the Xapian query parser for "common" things like
that.  I know there are a few not-seriously-proposed patches which
may lack all of those, but perhaps they weren't meant to be
applied, either...

FWIW, it would also be useful to have it expand .mailmap and
sendemail.aliasesfile entries, too.

Unfortunately, I think doing aliases cleanly requires dropping
down to C++ to supply custom routines to Xapian.  The current
approxidate parsing is all done via fragile string
substitutions; I'm not sure how that holds up...
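
As a stopgap outside of Xapian, a small shell wrapper could do
the expansion before the query ever reaches lei.  A rough
sketch only: the expand_aliases name and the "%patch" token are
both made up for illustration (neither exists in lei or
public-inbox), and it is the same kind of fragile string
substitution, so it will misfire if "%patch" shows up inside a
quoted phrase:

```shell
# hypothetical helper: expand a made-up "%patch" alias in a query string
expand_aliases() {
	printf '%s\n' "$1" |
		sed 's/%patch/( s:patch OR s:rfc OR s:resend OR nq:"diff --git" )/g'
}

# possible usage, feeding the expanded query to lei on stdin:
#   expand_aliases '%patch AND rt:last.week..' | lei q -o /tmp/patches
```

The result goes through a pipe rather than "$(...)" on the
command line so the shell doesn't word-split the quoted
nq:"diff --git" phrase.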

> > I'm not sure if there needs to be a specific term to index
> > patches on; maybe there is.  There's still a lot of Xapian
> > we're not using, yet...
> 
> What I'm hoping to get to is a replacement for patchwork in my
> workflow. For that I want all patches which don't have either a
> Reviewed/Acked tag from me or a reply from me. I think the first part
> should be possible with lei, but I'd imagine the last part is some
> processing on top of the lei query.

Yes, exactly.  Powerful-enough local search should be able to
replace many web-based tools.  Patch indexing could take into
account git trailers, but maybe existing 'nq:' phrases are
enough *shrug*
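
For the "no reply from me" part, the post-processing could be
as small as a set difference over Message-IDs.  A sketch,
assuming two text files produced by some earlier step (how they
get extracted from lei output is left out here, and "unreplied"
is a made-up name).  It only catches direct replies, not
replies further down a thread:

```shell
# hypothetical helper: print Message-IDs listed in $1 but not in $2
# $1: one Message-ID per line, for patches matched by a query
# $2: one Message-ID per line, from In-Reply-To of my own replies
unreplied() {
	# -F: fixed strings, -x: whole-line match, -v: keep non-matching lines
	grep -vxF -f "$2" "$1"
}
```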

lei is still primitive, but things like "lei p2q" can be
combined to do some patch tracking.  An example from the
lei-p2q(1) manpage:

         # to view unapplied patches for a given $FILE from the past year:
         echo \( rt:last.year.. AND dfn:$FILE \) AND NOT \( \
               $(git log -p --pretty=mboxrd --since=last.year $FILE |
                       lei p2q -F mboxrd ) \
               \) | lei q -o /tmp/unapplied

I do need some time to consider future features and non-bugfix
stuff for lei+public-inbox, though.  There's already a huge
amount of stuff in the TODO and a billion things in my life
that also need fixing :<