unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
* filtering stable patches in lore queries
@ 2024-04-27  0:28 Jason A. Donenfeld
  2024-04-27  7:19 ` Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Jason A. Donenfeld @ 2024-04-27  0:28 UTC (permalink / raw)
  To: tools, stable, meta; +Cc: sashal, gregkh, mricon, krzk

Hi,

Greg and Sasha add the "X-stable: review" to their patch bombs, with the
intention that people will be able to filter these out should they
desire to do so. For example, I usually want all threads that match code
I care about, but I don't regularly want to see thousand-patch stable
series. So this header is helpful.

However, I'm not able to formulate a query for lore (to pass to `lei q`)
that will match on negating it. The idea would be to exclude the thread
if the parent has this header. It looks like public inbox might only
index on some headers, but can't generically search all? I'm not sure
how it works, but queries only seem to half way work when searching for
that header.

In the meantime, I've been using this ugly bash script, which gets the
job done, but means I have to download everything locally first:

    #!/bin/bash
    PWD="${BASH_SOURCE[0]}"
    PWD="${PWD%/*}"
    set -e
    cd "$PWD"
    echo "[+] Syncing new mail" >&2
    lei up "$PWD"
    echo "[+] Cleaning up stable patch bombs" >&2
    mapfile -d $'\0' -t parents < <(grep -F -x -Z -r -l 'X-stable: review' cur tmp new)
    {
      [[ -f stable-message-ids ]] && cat stable-message-ids
      [[ ${#parents[@]} -gt 0 ]] && sed -n 's/^Message-ID: <\(.*\)>$/\1/p' "${parents[@]}"
    } | sort -u > stable-message-ids.new
    mv stable-message-ids.new stable-message-ids
    [[ -s stable-message-ids ]] || exit 0
    mapfile -d $'\0' -t children < <(grep -F -Z -r -l -f - cur tmp new < stable-message-ids)
    total=$(( ${#parents[@]} + ${#children[@]} ))
    [[ $total -gt 0 ]] || exit 0
    echo "# rm <...$total messages...>" >&2
    rm -f "${parents[@]}" "${children[@]}"

This results in something like:

    zx2c4@thinkpad ~/Projects/lkml $ ./update.bash
    [+] Syncing new mail
    # https://lore.kernel.org/all/ limiting ...
    # /usr/bin/curl -gSf -s -d '' https://lore.kernel.org/all/?x=m&t=1&q=(...
    [+] Cleaning up stable patch bombs
    # rm <...24593 messages...>

It works, but it'd be nice to not even download these messages in the
first place. Since I'm deleting message I don't want, I have to keep
track of the message IDs of those deleted messages with the stable
header in case replies come in later. That's some book keeping, sheesh!

Any thoughts on this workflow?

Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: filtering stable patches in lore queries
  2024-04-27  0:28 filtering stable patches in lore queries Jason A. Donenfeld
@ 2024-04-27  7:19 ` Eric Wong
  2024-04-29 14:27   ` Konstantin Ryabitsev
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2024-04-27  7:19 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: tools, stable, meta, sashal, gregkh, mricon, krzk

"Jason A. Donenfeld" <Jason@zx2c4.com> wrote:
> Hi,
> 
> Greg and Sasha add the "X-stable: review" to their patch bombs, with the
> intention that people will be able to filter these out should they
> desire to do so. For example, I usually want all threads that match code
> I care about, but I don't regularly want to see thousand-patch stable
> series. So this header is helpful.
> 
> However, I'm not able to formulate a query for lore (to pass to `lei q`)
> that will match on negating it. The idea would be to exclude the thread
> if the parent has this header. It looks like public inbox might only
> index on some headers, but can't generically search all? I'm not sure
> how it works, but queries only seem to half way work when searching for
> that header.

Correct, public-inbox currently won't index every header due to
cost, false positives, and otherwise lack of usefulness (general
gibberish from DKIM sigs, various UUIDs, etc).

So it doesn't currently know about "X-stable:"

I started working on making headers indexing configurable last
year, but didn't hear a response from the person that
potentially was interested:

https://public-inbox.org/meta/20231120032132.M610564@dcvr/

Right now, indexing new headers + validations can be maintained
as a Perl module in the public-inbox codebase.

For lore, it'd make sense to be able to configure a bunch (or
all) inboxes at once instead of the per-inbox configuration in
my proposed RFC.

At minimum, one would have to know:

1) the mail header name (e.g. `X-stable')
2) the search prefix to use (e.g. `xstable:') # can't use dash `-' AFAIK
3) the type of header value (phrase, string, sortable numeric, etc...)

I'm trying to avoid supporting sortable numeric values for this,
since supporting them will problems if columns get repurposed
with admins changing their minds.   A full reindex would fix it,
but those are crazy expensive.

So probably just supporting strings and/or phrases to start...

Validation to prevent poisoning by malicious/broken senders can
be useful in some cases (and the reason the RFC was a per use
case Perl module).  That said, I'm not sure if much validation
is necessary for X-stable: headers or if just any text is fine.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: filtering stable patches in lore queries
  2024-04-27  7:19 ` Eric Wong
@ 2024-04-29 14:27   ` Konstantin Ryabitsev
  2024-05-08 11:33     ` Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Konstantin Ryabitsev @ 2024-04-29 14:27 UTC (permalink / raw)
  To: Eric Wong; +Cc: Jason A. Donenfeld, tools, stable, meta, sashal, gregkh, krzk

On Sat, Apr 27, 2024 at 07:19:21AM GMT, Eric Wong wrote:
> Correct, public-inbox currently won't index every header due to
> cost, false positives, and otherwise lack of usefulness (general
> gibberish from DKIM sigs, various UUIDs, etc).
> 
> So it doesn't currently know about "X-stable:"
> 
> I started working on making headers indexing configurable last
> year, but didn't hear a response from the person that
> potentially was interested:
> 
> https://public-inbox.org/meta/20231120032132.M610564@dcvr/
> 
> Right now, indexing new headers + validations can be maintained
> as a Perl module in the public-inbox codebase.
> 
> For lore, it'd make sense to be able to configure a bunch (or
> all) inboxes at once instead of the per-inbox configuration in
> my proposed RFC.
> 
> At minimum, one would have to know:
> 
> 1) the mail header name (e.g. `X-stable')
> 2) the search prefix to use (e.g. `xstable:') # can't use dash `-' AFAIK
> 3) the type of header value (phrase, string, sortable numeric, etc...)

I'm whole-heartedly for this! This ties nicely to my b4 work where I'd 
like to be able to identify code-review trailers sent for a specific 
patch, even if that patch itself is not on lore. For example, this could 
be a patch that is part of a pull-request on a git forge, but we'd still 
like to be able to collect and find code-review trailers for it when a 
maintainer applies it.

Currently, I am using the following approach:

| Reviewed-by: Some Developer <some.dev@example.org>
| ---
| for-patch-id: abcd...1234

Then I can query 'nq:"for-patch-id: abcd...1234"', but this is probably 
much more heavy than if I could provide this in a custom header:

| X-For-Patch-ID: abcd...1234

and query for "xforpatchid:abcd...1234"

> I'm trying to avoid supporting sortable numeric values for this,
> since supporting them will problems if columns get repurposed
> with admins changing their minds.   A full reindex would fix it,
> but those are crazy expensive.

I'm perfectly fine with it only being a string, honestly.

> 
> So probably just supporting strings and/or phrases to start...
> 
> Validation to prevent poisoning by malicious/broken senders can
> be useful in some cases (and the reason the RFC was a per use
> case Perl module).  That said, I'm not sure if much validation
> is necessary for X-stable: headers or if just any text is fine.

I'd let the consumer clients worry about it.

-K

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: filtering stable patches in lore queries
  2024-04-29 14:27   ` Konstantin Ryabitsev
@ 2024-05-08 11:33     ` Eric Wong
  2024-05-08 17:01       ` Konstantin Ryabitsev
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2024-05-08 11:33 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: Jason A. Donenfeld, tools, stable, meta, sashal, gregkh, krzk

Konstantin Ryabitsev <mricon@kernel.org> wrote:
> On Sat, Apr 27, 2024 at 07:19:21AM GMT, Eric Wong wrote:
> > Correct, public-inbox currently won't index every header due to
> > cost, false positives, and otherwise lack of usefulness (general
> > gibberish from DKIM sigs, various UUIDs, etc).
> > 
> > So it doesn't currently know about "X-stable:"
> > 
> > I started working on making headers indexing configurable last
> > year, but didn't hear a response from the person that
> > potentially was interested:
> > 
> > https://public-inbox.org/meta/20231120032132.M610564@dcvr/
> > 
> > Right now, indexing new headers + validations can be maintained
> > as a Perl module in the public-inbox codebase.
> > 
> > For lore, it'd make sense to be able to configure a bunch (or
> > all) inboxes at once instead of the per-inbox configuration in
> > my proposed RFC.
> > 
> > At minimum, one would have to know:
> > 
> > 1) the mail header name (e.g. `X-stable')
> > 2) the search prefix to use (e.g. `xstable:') # can't use dash `-' AFAIK
> > 3) the type of header value (phrase, string, sortable numeric, etc...)
> 
> I'm whole-heartedly for this! This ties nicely to my b4 work where I'd 
> like to be able to identify code-review trailers sent for a specific 
> patch, even if that patch itself is not on lore. For example, this could 
> be a patch that is part of a pull-request on a git forge, but we'd still 
> like to be able to collect and find code-review trailers for it when a 
> maintainer applies it.

OK, a more configurable version is available on a per-inbox basis:

https://public-inbox.org/meta/20240508110957.3108196-1-e@80x24.org/

But that's a PITA to configure with hundreds of inboxes and
doesn't have extindex support, yet.

I made it share logic with the old altid code; so I'll also be
getting altid into extindex since ISTR users wanting to be able
to lookup gmane stuff via extindex.

And it also works with the new C++ xap_helper process
(which I'll use for threadid: support (still working on that...)).

> I'm perfectly fine with it only being a string, honestly.

Yeah, though there's 3 ways of indexing strings, currently :x
I've decided to keep some options open and support boolean_term,
text, and phrase for now.

boolean_term is the cheapest and probably best for exactly
matching labels/enums and such.  The others may work better
for more complex texts (comma-delimited labels, maybe).

> > So probably just supporting strings and/or phrases to start...
> > 
> > Validation to prevent poisoning by malicious/broken senders can
> > be useful in some cases (and the reason the RFC was a per use
> > case Perl module).  That said, I'm not sure if much validation
> > is necessary for X-stable: headers or if just any text is fine.
> 
> I'd let the consumer clients worry about it.

Agreed.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: filtering stable patches in lore queries
  2024-05-08 11:33     ` Eric Wong
@ 2024-05-08 17:01       ` Konstantin Ryabitsev
  2024-05-08 17:09         ` Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Konstantin Ryabitsev @ 2024-05-08 17:01 UTC (permalink / raw)
  To: Eric Wong; +Cc: Jason A. Donenfeld, tools, stable, meta, sashal, gregkh, krzk

On Wed, May 08, 2024 at 11:33:14AM GMT, Eric Wong wrote:
> > I'm whole-heartedly for this! This ties nicely to my b4 work where I'd 
> > like to be able to identify code-review trailers sent for a specific 
> > patch, even if that patch itself is not on lore. For example, this could 
> > be a patch that is part of a pull-request on a git forge, but we'd still 
> > like to be able to collect and find code-review trailers for it when a 
> > maintainer applies it.
> 
> OK, a more configurable version is available on a per-inbox basis:
> 
> https://public-inbox.org/meta/20240508110957.3108196-1-e@80x24.org/
> 
> But that's a PITA to configure with hundreds of inboxes and
> doesn't have extindex support, yet.
> 
> I made it share logic with the old altid code; so I'll also be
> getting altid into extindex since ISTR users wanting to be able
> to lookup gmane stuff via extindex.

Great, thanks for doing this. I'll wait until this has extindex support,
because I really need to be able to look across all inboxes.

> Yeah, though there's 3 ways of indexing strings, currently :x
> I've decided to keep some options open and support boolean_term,
> text, and phrase for now.

What's the difference between "text" and "phrase"?

> boolean_term is the cheapest and probably best for exactly
> matching labels/enums and such.

So, this is for "X-Ignore-Me: Yes" type of headers?

-K

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: filtering stable patches in lore queries
  2024-05-08 17:01       ` Konstantin Ryabitsev
@ 2024-05-08 17:09         ` Eric Wong
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2024-05-08 17:09 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: Jason A. Donenfeld, tools, stable, meta, sashal, gregkh, krzk

Konstantin Ryabitsev <mricon@kernel.org> wrote:
> On Wed, May 08, 2024 at 11:33:14AM GMT, Eric Wong wrote:
> > https://public-inbox.org/meta/20240508110957.3108196-1-e@80x24.org/

> > Yeah, though there's 3 ways of indexing strings, currently :x
> > I've decided to keep some options open and support boolean_term,
> > text, and phrase for now.
> 
> What's the difference between "text" and "phrase"?

text is like indexlevel=medium (case-insensitive and sortable
by relevance), while phrase is like indexlevel=full so
adds positions to allow searching phrases via "double quotesk

(also documented in the proposed public-inbox-config.pod change,
not sure how clear it was for non-Xapian-internals-aware
folks...)

> > boolean_term is the cheapest and probably best for exactly
> > matching labels/enums and such.
> 
> So, this is for "X-Ignore-Me: Yes" type of headers?

Yes.  Though I just remembered it's case-sensitive (same
treatment as Message-ID with "m:"), so I guess that needs to be
documented.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-05-08 17:09 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-27  0:28 filtering stable patches in lore queries Jason A. Donenfeld
2024-04-27  7:19 ` Eric Wong
2024-04-29 14:27   ` Konstantin Ryabitsev
2024-05-08 11:33     ` Eric Wong
2024-05-08 17:01       ` Konstantin Ryabitsev
2024-05-08 17:09         ` Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).