From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: <e@80x24.org>
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00
 shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
 by dcvr.yhbt.net (Postfix) with ESMTP id 281CF1F9FC;
 Tue, 9 Nov 2021 05:08:27 +0000 (UTC)
Date: Tue, 9 Nov 2021 05:08:27 +0000
From: Eric Wong <e@80x24.org>
To: Rob Herring <robh@kernel.org>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>,
 meta@public-inbox.org
Subject: Re: [PATCH] searchidx: index "diff --git a/... b/..." headers
Message-ID: <20211109050827.GA28454@dcvr>
References: <lorelei.part1.202111051304.mdtebsxahljcrxak@meerkat.local>
 <CAL_JsqJBh1O3H2-P07AHzVq0x89BoP_N6P=rT5up6=3QyF_B0Q@mail.gmail.com>
 <20211108202204.q5zg6bachnvbjlnx@meerkat.local>
 <CAL_Jsq+XtqOEF7p5zbO2O2YdHPr61+ahPgdDhH7_XMwyuDuc2w@mail.gmail.com>
 <20211108212714.GA13642@dcvr>
 <CAL_Jsq+aqDmpxUHsw844xS8f6WRX3gcvt7GQhf2XB7-Lb=Yx8Q@mail.gmail.com>
 <20211109031233.GA19089@dcvr>
 <CAL_Jsq+Eqru9qGwU3QKVVbdN5o7md-0YcZHVkBB-egDD0=319g@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <CAL_Jsq+Eqru9qGwU3QKVVbdN5o7md-0YcZHVkBB-egDD0=319g@mail.gmail.com>
List-Id: <meta.public-inbox.org>

Rob Herring <robh@kernel.org> wrote:
> On Mon, Nov 8, 2021 at 9:12 PM Eric Wong <e@80x24.org> wrote:
> > I think 's:patch' should be sufficient, don't think there's
> > many false-positives on that front, actually.
>
> It's at least 's:patch OR s:rfc OR s:resend'. That catches all but the
> few creative folks that come up with something else.
>
> > With this fix, nq:"diff --git" should also be working across
> > https://yhbt.net/lore/ in about 40 hours (whenever reindex
> > finishes)
>
> 'diff --git' should cover probably 99.9% of patches but there are
> still some non-git diffs from time to time.

OK, so maybe the combination of:

    s:patch OR s:rfc OR s:resend OR nq:"diff --git"

is enough? Maybe it would be good to support some form of alias
expansion in the Xapian query parser for "common" things like that.

I know there are a few not-seriously-proposed patches which may lack
all of those, but perhaps they weren't meant to be applied, either...

Fwiw, it would also be useful to have it expand .mailmap and
sendemail.aliasesfile entries, too.

Unfortunately, I think doing aliases cleanly requires dropping down
to C++ to supply custom routines to Xapian. The current approxidate
parsing is all done via fragile string substitutions; I'm not sure
how that holds up...

> > I'm not sure if there needs to be a specific term to index
> > patches on; maybe there is. There's still a lot of Xapian
> > we're not using, yet...
>
> What I'm hoping to get to is a replacement for patchwork in my
> workflow. For that I want all patches which don't have either a
> Reviewed/Acked tag from me or a reply from me. I think the first part
> should be possible with lei, but I'd imagine the last part is some
> processing on top of the lei query.

Yes, exactly. Powerful-enough local search should be able to
replace many web-based tools. Patch indexing could take into
account git trailers, but maybe existing 'nq:' phrases are
enough *shrug*

lei is still primitive, but things like "lei p2q" can be combined
to do some patch tracking.

An example from the lei-p2q(1) manpage:

    # to view unapplied patches for a given $FILE from the past year:
    echo \( rt:last.year.. AND dfn:$FILE \) AND NOT \( \
        $(git log -p --pretty=mboxrd --since=last.year $FILE |
            lei p2q -F mboxrd ) \) | lei q -o /tmp/unapplied

I do need some time to consider future features and non-bugfix
stuff to lei+public-inbox, though. There's already a huge amount
of stuff in the TODO and a billion things in my life that also
need fixing :<
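
Until something like alias expansion exists in the query parser, the
"common" combination discussed earlier in this message could be
approximated outside of lei with a small shell wrapper. This is only
a rough sketch under stated assumptions: the "is:patch" alias and the
expand_query function are made up for illustration and are not
recognized by lei or public-inbox, and the expansion is the same kind
of fragile string substitution mentioned above for approxidate.

```shell
# Hypothetical helper: expands the made-up alias "is:patch" into the
# subject/phrase combination from this thread.  "is:patch" is NOT a
# real lei or public-inbox search prefix; this is plain text
# substitution performed before the query ever reaches lei.
expand_query() {
	printf '%s\n' "$1" |
		sed 's/is:patch/( s:patch OR s:rfc OR s:resend OR nq:"diff --git" )/g'
}

# Show the expanded query for inspection:
expand_query 'is:patch AND dfn:Makefile'
```

The expanded text could then be piped into "lei q" the same way the
p2q example above pipes its query. Being a blind substitution, it
would also rewrite "is:patch" inside a quoted phrase, which is one
reason a real implementation probably belongs in the query parser.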