* filtering stable patches in lore queries @ 2024-04-27 0:28 Jason A. Donenfeld 2024-04-27 7:19 ` Eric Wong 0 siblings, 1 reply; 6+ messages in thread From: Jason A. Donenfeld @ 2024-04-27 0:28 UTC (permalink / raw) To: tools, stable, meta; +Cc: sashal, gregkh, mricon, krzk Hi, Greg and Sasha add the "X-stable: review" to their patch bombs, with the intention that people will be able to filter these out should they desire to do so. For example, I usually want all threads that match code I care about, but I don't regularly want to see thousand-patch stable series. So this header is helpful. However, I'm not able to formulate a query for lore (to pass to `lei q`) that will match on negating it. The idea would be to exclude the thread if the parent has this header. It looks like public inbox might only index on some headers, but can't generically search all? I'm not sure how it works, but queries only seem to half way work when searching for that header. In the meantime, I've been using this ugly bash script, which gets the job done, but means I have to download everything locally first: #!/bin/bash PWD="${BASH_SOURCE[0]}" PWD="${PWD%/*}" set -e cd "$PWD" echo "[+] Syncing new mail" >&2 lei up "$PWD" echo "[+] Cleaning up stable patch bombs" >&2 mapfile -d $'\0' -t parents < <(grep -F -x -Z -r -l 'X-stable: review' cur tmp new) { [[ -f stable-message-ids ]] && cat stable-message-ids [[ ${#parents[@]} -gt 0 ]] && sed -n 's/^Message-ID: <\(.*\)>$/\1/p' "${parents[@]}" } | sort -u > stable-message-ids.new mv stable-message-ids.new stable-message-ids [[ -s stable-message-ids ]] || exit 0 mapfile -d $'\0' -t children < <(grep -F -Z -r -l -f - cur tmp new < stable-message-ids) total=$(( ${#parents[@]} + ${#children[@]} )) [[ $total -gt 0 ]] || exit 0 echo "# rm <...$total messages...>" >&2 rm -f "${parents[@]}" "${children[@]}" This results in something like: zx2c4@thinkpad ~/Projects/lkml $ ./update.bash [+] Syncing new mail # https://lore.kernel.org/all/ limiting ... # /usr/bin/curl -gSf -s -d '' https://lore.kernel.org/all/?x=m&t=1&q=(... [+] Cleaning up stable patch bombs # rm <...24593 messages...> It works, but it'd be nice to not even download these messages in the first place. Since I'm deleting message I don't want, I have to keep track of the message IDs of those deleted messages with the stable header in case replies come in later. That's some book keeping, sheesh! Any thoughts on this workflow? Jason ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: filtering stable patches in lore queries 2024-04-27 0:28 filtering stable patches in lore queries Jason A. Donenfeld @ 2024-04-27 7:19 ` Eric Wong 2024-04-29 14:27 ` Konstantin Ryabitsev 0 siblings, 1 reply; 6+ messages in thread From: Eric Wong @ 2024-04-27 7:19 UTC (permalink / raw) To: Jason A. Donenfeld; +Cc: tools, stable, meta, sashal, gregkh, mricon, krzk "Jason A. Donenfeld" <Jason@zx2c4.com> wrote: > Hi, > > Greg and Sasha add the "X-stable: review" to their patch bombs, with the > intention that people will be able to filter these out should they > desire to do so. For example, I usually want all threads that match code > I care about, but I don't regularly want to see thousand-patch stable > series. So this header is helpful. > > However, I'm not able to formulate a query for lore (to pass to `lei q`) > that will match on negating it. The idea would be to exclude the thread > if the parent has this header. It looks like public inbox might only > index on some headers, but can't generically search all? I'm not sure > how it works, but queries only seem to half way work when searching for > that header. Correct, public-inbox currently won't index every header due to cost, false positives, and otherwise lack of usefulness (general gibberish from DKIM sigs, various UUIDs, etc). So it doesn't currently know about "X-stable:" I started working on making headers indexing configurable last year, but didn't hear a response from the person that potentially was interested: https://public-inbox.org/meta/20231120032132.M610564@dcvr/ Right now, indexing new headers + validations can be maintained as a Perl module in the public-inbox codebase. For lore, it'd make sense to be able to configure a bunch (or all) inboxes at once instead of the per-inbox configuration in my proposed RFC. At minimum, one would have to know: 1) the mail header name (e.g. `X-stable') 2) the search prefix to use (e.g. `xstable:') # can't use dash `-' AFAIK 3) the type of header value (phrase, string, sortable numeric, etc...) I'm trying to avoid supporting sortable numeric values for this, since supporting them will problems if columns get repurposed with admins changing their minds. A full reindex would fix it, but those are crazy expensive. So probably just supporting strings and/or phrases to start... Validation to prevent poisoning by malicious/broken senders can be useful in some cases (and the reason the RFC was a per use case Perl module). That said, I'm not sure if much validation is necessary for X-stable: headers or if just any text is fine. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: filtering stable patches in lore queries 2024-04-27 7:19 ` Eric Wong @ 2024-04-29 14:27 ` Konstantin Ryabitsev 2024-05-08 11:33 ` Eric Wong 0 siblings, 1 reply; 6+ messages in thread From: Konstantin Ryabitsev @ 2024-04-29 14:27 UTC (permalink / raw) To: Eric Wong; +Cc: Jason A. Donenfeld, tools, stable, meta, sashal, gregkh, krzk On Sat, Apr 27, 2024 at 07:19:21AM GMT, Eric Wong wrote: > Correct, public-inbox currently won't index every header due to > cost, false positives, and otherwise lack of usefulness (general > gibberish from DKIM sigs, various UUIDs, etc). > > So it doesn't currently know about "X-stable:" > > I started working on making headers indexing configurable last > year, but didn't hear a response from the person that > potentially was interested: > > https://public-inbox.org/meta/20231120032132.M610564@dcvr/ > > Right now, indexing new headers + validations can be maintained > as a Perl module in the public-inbox codebase. > > For lore, it'd make sense to be able to configure a bunch (or > all) inboxes at once instead of the per-inbox configuration in > my proposed RFC. > > At minimum, one would have to know: > > 1) the mail header name (e.g. `X-stable') > 2) the search prefix to use (e.g. `xstable:') # can't use dash `-' AFAIK > 3) the type of header value (phrase, string, sortable numeric, etc...) I'm whole-heartedly for this! This ties nicely to my b4 work where I'd like to be able to identify code-review trailers sent for a specific patch, even if that patch itself is not on lore. For example, this could be a patch that is part of a pull-request on a git forge, but we'd still like to be able to collect and find code-review trailers for it when a maintainer applies it. Currently, I am using the following approach: | Reviewed-by: Some Developer <some.dev@example.org> | --- | for-patch-id: abcd...1234 Then I can query 'nq:"for-patch-id: abcd...1234"', but this is probably much more heavy than if I could provide this in a custom header: | X-For-Patch-ID: abcd...1234 and query for "xforpatchid:abcd...1234" > I'm trying to avoid supporting sortable numeric values for this, > since supporting them will problems if columns get repurposed > with admins changing their minds. A full reindex would fix it, > but those are crazy expensive. I'm perfectly fine with it only being a string, honestly. > > So probably just supporting strings and/or phrases to start... > > Validation to prevent poisoning by malicious/broken senders can > be useful in some cases (and the reason the RFC was a per use > case Perl module). That said, I'm not sure if much validation > is necessary for X-stable: headers or if just any text is fine. I'd let the consumer clients worry about it. -K ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: filtering stable patches in lore queries 2024-04-29 14:27 ` Konstantin Ryabitsev @ 2024-05-08 11:33 ` Eric Wong 2024-05-08 17:01 ` Konstantin Ryabitsev 0 siblings, 1 reply; 6+ messages in thread From: Eric Wong @ 2024-05-08 11:33 UTC (permalink / raw) To: Konstantin Ryabitsev Cc: Jason A. Donenfeld, tools, stable, meta, sashal, gregkh, krzk Konstantin Ryabitsev <mricon@kernel.org> wrote: > On Sat, Apr 27, 2024 at 07:19:21AM GMT, Eric Wong wrote: > > Correct, public-inbox currently won't index every header due to > > cost, false positives, and otherwise lack of usefulness (general > > gibberish from DKIM sigs, various UUIDs, etc). > > > > So it doesn't currently know about "X-stable:" > > > > I started working on making headers indexing configurable last > > year, but didn't hear a response from the person that > > potentially was interested: > > > > https://public-inbox.org/meta/20231120032132.M610564@dcvr/ > > > > Right now, indexing new headers + validations can be maintained > > as a Perl module in the public-inbox codebase. > > > > For lore, it'd make sense to be able to configure a bunch (or > > all) inboxes at once instead of the per-inbox configuration in > > my proposed RFC. > > > > At minimum, one would have to know: > > > > 1) the mail header name (e.g. `X-stable') > > 2) the search prefix to use (e.g. `xstable:') # can't use dash `-' AFAIK > > 3) the type of header value (phrase, string, sortable numeric, etc...) > > I'm whole-heartedly for this! This ties nicely to my b4 work where I'd > like to be able to identify code-review trailers sent for a specific > patch, even if that patch itself is not on lore. For example, this could > be a patch that is part of a pull-request on a git forge, but we'd still > like to be able to collect and find code-review trailers for it when a > maintainer applies it. OK, a more configurable version is available on a per-inbox basis: https://public-inbox.org/meta/20240508110957.3108196-1-e@80x24.org/ But that's a PITA to configure with hundreds of inboxes and doesn't have extindex support, yet. I made it share logic with the old altid code; so I'll also be getting altid into extindex since ISTR users wanting to be able to lookup gmane stuff via extindex. And it also works with the new C++ xap_helper process (which I'll use for threadid: support (still working on that...)). > I'm perfectly fine with it only being a string, honestly. Yeah, though there's 3 ways of indexing strings, currently :x I've decided to keep some options open and support boolean_term, text, and phrase for now. boolean_term is the cheapest and probably best for exactly matching labels/enums and such. The others may work better for more complex texts (comma-delimited labels, maybe). > > So probably just supporting strings and/or phrases to start... > > > > Validation to prevent poisoning by malicious/broken senders can > > be useful in some cases (and the reason the RFC was a per use > > case Perl module). That said, I'm not sure if much validation > > is necessary for X-stable: headers or if just any text is fine. > > I'd let the consumer clients worry about it. Agreed. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: filtering stable patches in lore queries 2024-05-08 11:33 ` Eric Wong @ 2024-05-08 17:01 ` Konstantin Ryabitsev 2024-05-08 17:09 ` Eric Wong 0 siblings, 1 reply; 6+ messages in thread From: Konstantin Ryabitsev @ 2024-05-08 17:01 UTC (permalink / raw) To: Eric Wong; +Cc: Jason A. Donenfeld, tools, stable, meta, sashal, gregkh, krzk On Wed, May 08, 2024 at 11:33:14AM GMT, Eric Wong wrote: > > I'm whole-heartedly for this! This ties nicely to my b4 work where I'd > > like to be able to identify code-review trailers sent for a specific > > patch, even if that patch itself is not on lore. For example, this could > > be a patch that is part of a pull-request on a git forge, but we'd still > > like to be able to collect and find code-review trailers for it when a > > maintainer applies it. > > OK, a more configurable version is available on a per-inbox basis: > > https://public-inbox.org/meta/20240508110957.3108196-1-e@80x24.org/ > > But that's a PITA to configure with hundreds of inboxes and > doesn't have extindex support, yet. > > I made it share logic with the old altid code; so I'll also be > getting altid into extindex since ISTR users wanting to be able > to lookup gmane stuff via extindex. Great, thanks for doing this. I'll wait until this has extindex support, because I really need to be able to look across all inboxes. > Yeah, though there's 3 ways of indexing strings, currently :x > I've decided to keep some options open and support boolean_term, > text, and phrase for now. What's the difference between "text" and "phrase"? > boolean_term is the cheapest and probably best for exactly > matching labels/enums and such. So, this is for "X-Ignore-Me: Yes" type of headers? -K ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: filtering stable patches in lore queries 2024-05-08 17:01 ` Konstantin Ryabitsev @ 2024-05-08 17:09 ` Eric Wong 0 siblings, 0 replies; 6+ messages in thread From: Eric Wong @ 2024-05-08 17:09 UTC (permalink / raw) To: Konstantin Ryabitsev Cc: Jason A. Donenfeld, tools, stable, meta, sashal, gregkh, krzk Konstantin Ryabitsev <mricon@kernel.org> wrote: > On Wed, May 08, 2024 at 11:33:14AM GMT, Eric Wong wrote: > > https://public-inbox.org/meta/20240508110957.3108196-1-e@80x24.org/ > > Yeah, though there's 3 ways of indexing strings, currently :x > > I've decided to keep some options open and support boolean_term, > > text, and phrase for now. > > What's the difference between "text" and "phrase"? text is like indexlevel=medium (case-insensitive and sortable by relevance), while phrase is like indexlevel=full so adds positions to allow searching phrases via "double quotesk (also documented in the proposed public-inbox-config.pod change, not sure how clear it was for non-Xapian-internals-aware folks...) > > boolean_term is the cheapest and probably best for exactly > > matching labels/enums and such. > > So, this is for "X-Ignore-Me: Yes" type of headers? Yes. Though I just remembered it's case-sensitive (same treatment as Message-ID with "m:"), so I guess that needs to be documented. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-05-08 17:09 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-04-27 0:28 filtering stable patches in lore queries Jason A. Donenfeld 2024-04-27 7:19 ` Eric Wong 2024-04-29 14:27 ` Konstantin Ryabitsev 2024-05-08 11:33 ` Eric Wong 2024-05-08 17:01 ` Konstantin Ryabitsev 2024-05-08 17:09 ` Eric Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).