unofficial mirror of meta@public-inbox.org
* extra search flags and params? (ispatch, replycount, ...)
@ 2023-11-28  0:10 Eric Wong
  2023-11-28 15:30 ` Konstantin Ryabitsev
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2023-11-28  0:10 UTC
  To: meta; +Cc: workflows

Would they be useful?

It's not currently possible to quickly search for whether or not
a term (e.g. patchid:) is present in a Xapian document.  Having
the ability to do so would make it easier to find non-patch messages,
or easily filter down to cover letters, bot replies, etc...

Thus adding Xapian terms is probably necessary for that.

replycount would be useful for implementing search for
ANSWERED/UNANSWERED in IMAP.  However, proper indexing/reindex
could be a PITA due to out-of-order message reception.
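
A minimal sketch of the out-of-order problem, using a toy in-memory
index rather than Xapian.  The ReplyIndex class and the Message-IDs are
hypothetical and are not public-inbox code; the point is only that a
reply can be counted before its parent has been seen, which is what a
real index would have to handle by updating the parent document later.

```python
from collections import defaultdict


class ReplyIndex:
    """Toy reply counter keyed by Message-ID (hypothetical, not public-inbox code)."""

    def __init__(self):
        self.seen = set()                    # Message-IDs indexed so far
        self.reply_count = defaultdict(int)  # parent Message-ID -> replies seen

    def add(self, message_id, in_reply_to=None):
        self.seen.add(message_id)
        if in_reply_to is not None:
            # The parent may not have arrived yet; count the reply anyway so
            # the total is already correct when the parent shows up later.
            self.reply_count[in_reply_to] += 1

    def unanswered(self):
        # A message is unanswered if nothing names it in In-Reply-To.
        return sorted(m for m in self.seen if self.reply_count.get(m, 0) == 0)


idx = ReplyIndex()
idx.add("<reply@example>", in_reply_to="<root@example>")  # reply arrives first
idx.add("<root@example>")                                 # parent arrives later
idx.add("<lonely@example>")
print(idx.unanswered())
```

In a persistent index the increment would mean rewriting an
already-indexed document, or deferring the count until the parent
arrives, which is where the reindexing cost comes from.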

I don't think any of these would be required to get "lei rediff"
working on entire patchsets, though (it only does individual
messages, currently).

* Re: extra search flags and params? (ispatch, replycount, ...)
  2023-11-28  0:10 extra search flags and params? (ispatch, replycount, ...) Eric Wong
@ 2023-11-28 15:30 ` Konstantin Ryabitsev
  2023-11-28 17:35   ` Eric Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Konstantin Ryabitsev @ 2023-11-28 15:30 UTC
  To: Eric Wong; +Cc: meta, workflows

On Tue, Nov 28, 2023 at 12:10:28AM +0000, Eric Wong wrote:
> Would they be useful?
> 
> It's not currently possible to quickly search for whether or not
> a term (e.g. patchid:) is present in a Xapian document.  Having
> the ability to do so would make it easier to find non-patch messages,
> or easily filter down to cover letters, bot replies, etc...

I understand the reasoning, but I'm not sure we should be trying too hard to
make public-inbox a patch tracking platform. What makes lei great is its ability
to automatically find and retrieve entire threads -- I feel like we should
leave series tracking to other platforms that already exist (patchwork,
patchew, etc).

> I don't think any of these would be required to get "lei rediff"
> working on entire patchsets, though (it only does individual
> messages, currently).

Incidentally, I've recently discovered that relying on git-patch-id to match
commits to message archives has some important flaws. Linus was actually the
one who caused this when he recommended that maintainers switch to using the
"histogram" diff algorithm instead of the default ("myers").

This made me realize that there's actually a multitude of ways the same patch
can be represented (diff-algorithm, number of context lines, etc) that would
cause git-patch-id to return a different value for the exact same commit.

So, while I know that Linus doesn't want Link: entries in commits that just go
to the series, using the message-id remains the only mechanism to reliably
link commits to the series discussion.

-K

* Re: extra search flags and params? (ispatch, replycount, ...)
  2023-11-28 15:30 ` Konstantin Ryabitsev
@ 2023-11-28 17:35   ` Eric Wong
  2023-11-28 17:49     ` Konstantin Ryabitsev
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2023-11-28 17:35 UTC
  To: Konstantin Ryabitsev; +Cc: meta, workflows

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Tue, Nov 28, 2023 at 12:10:28AM +0000, Eric Wong wrote:
> > Would they be useful?
> > 
> > It's not currently possible to quickly search for whether or not
> > a term (e.g. patchid:) is present in a Xapian document.  Having
> > the ability to do so would make it easier to find non-patch messages,
> > or easily filter down to cover letters, bot replies, etc...
> 
> I understand the reasoning, but I'm not sure we should be trying too hard to
> make public-inbox a patch tracking platform. What makes lei great is its ability
> to automatically find and retrieve entire threads -- I feel like we should
> leave series tracking to other platforms that already exist (patchwork,
> patchew, etc).

I was thinking more along the lines of readers just trying to
find non-patch discussions.  I'm not really
interested in the tracking part, more just being able to quickly
find discussion related to a commit.

> > I don't think any of these would be required to get "lei rediff"
> > working on entire patchsets, though (it only does individual
> > messages, currently).
> 
> Incidentally, I've recently discovered that relying on git-patch-id to match
> commits to message archives has some important flaws. Linus was actually the
> one who caused this when he recommended that maintainers switch to using the
> "histogram" diff algorithm instead of the default ("myers").

Yeah, -cindex was actually built to support joins on pre- or post-image
blob OIDs, too; we just need to clamp to a 7-char hex abbreviation.
Even Subjects <=> commit titles could be made to work with the
way our indices are set up.

> This made me realize that there's actually a multitude of ways the same patch
> can be represented (diff-algorithm, number of context lines, etc) that would
> cause git-patch-id to return a different value for the exact same commit.

Yeah, post-image blob abbreviations are probably the way to go.

Fwiw, solver only uses post-image blob abbreviations and the
filename as a hint.  I rolled it out a few hours ago on yhbt.net/lore
and it seems to be solving kernel blobs just fine, but the
debug log is choosing random git URLs.

(Solver is the thing that powers `lei rediff' and the linkified
hunk headers on public-inbox.org/git since 2019, and now yhbt.net/lore)
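
A rough illustration of why a content-based key is more stable than a
hash of the diff text.  The diff_text_hash() helper below is a
simplified stand-in and is NOT git-patch-id's real algorithm; it only
shows that the same change rendered with a different number of context
lines hashes differently.  git_blob_oid() follows git's documented blob
hashing (SHA-1 over a "blob <size>\0" header plus the content), so the
post-image OID depends only on the resulting file.

```python
import difflib
import hashlib


def git_blob_oid(data: bytes) -> str:
    """SHA-1 object ID that git assigns to a blob with this content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def diff_text_hash(old: str, new: str, context: int) -> str:
    """Toy patch hash over the body of a unified diff (not git-patch-id)."""
    lines = difflib.unified_diff(
        old.splitlines(True), new.splitlines(True), n=context
    )
    # Drop the file headers and hunk headers, keep +/-/context lines.
    body = [l for l in lines if not l.startswith(("--- ", "+++ ", "@@"))]
    return hashlib.sha1("".join(body).encode()).hexdigest()


old = "".join(f"line {i}\n" for i in range(1, 11))
new = old.replace("line 5\n", "line five\n")

h3 = diff_text_hash(old, new, context=3)
h1 = diff_text_hash(old, new, context=1)
print(h3 != h1)                        # True: same change, different diff text
print(git_blob_oid(new.encode())[:7])  # 7-char post-image abbreviation
```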

* Re: extra search flags and params? (ispatch, replycount, ...)
  2023-11-28 17:35   ` Eric Wong
@ 2023-11-28 17:49     ` Konstantin Ryabitsev
  2023-11-28 18:20       ` Eric Wong
  2023-12-12 23:29       ` Rob Herring
  0 siblings, 2 replies; 8+ messages in thread
From: Konstantin Ryabitsev @ 2023-11-28 17:49 UTC
  To: Eric Wong; +Cc: meta, workflows

On Tue, Nov 28, 2023 at 05:35:09PM +0000, Eric Wong wrote:
> > I understand the reasoning, but I'm not sure we should be trying too hard to
> > make public-inbox a patch tracking platform. What makes lei great is its ability
> > to automatically find and retrieve entire threads -- I feel like we should
> > leave series tracking to other platforms that already exist (patchwork,
> > patchew, etc).
> 
> I was thinking more along the lines of readers just trying to
> find non-patch discussions.

Ah. I think here it is enough to just say "s:* AND NOT s:PATCH" without
introducing additional xapian indexing parameters. Though, perhaps the web
interface can also gain a "collapse threads" view?

> > This made me realize that there's actually a multitude of ways the same patch
> > can be represented (diff-algorithm, number of context lines, etc) that would
> > cause git-patch-id to return a different value for the exact same commit.
> 
> Yeah, post-image blob abbreviations are probably the way to go.
> 
> Fwiw, solver only uses post-image blob abbreviations and the
> filename as a hint.  I rolled it out a few hours ago on yhbt.net/lore
> and it seems to be solving kernel blobs just fine, but the
> debug log is choosing random git URLs.

Ah, neat! That said, what happens if a series was applied with "git am -3" and
the post-image blob abbreviations are necessarily different? (I may be
misunderstanding the approach; please correct me if I am.)

-K

* Re: extra search flags and params? (ispatch, replycount, ...)
  2023-11-28 17:49     ` Konstantin Ryabitsev
@ 2023-11-28 18:20       ` Eric Wong
  2023-11-28 20:00         ` Konstantin Ryabitsev
  2023-12-12 23:29       ` Rob Herring
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Wong @ 2023-11-28 18:20 UTC
  To: Konstantin Ryabitsev; +Cc: meta, workflows

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Tue, Nov 28, 2023 at 05:35:09PM +0000, Eric Wong wrote:
> > > I understand the reasoning, but I'm not sure we should be trying too hard to
> > > make public-inbox a patch tracking platform. What makes lei great is its ability
> > > to automatically find and retrieve entire threads -- I feel like we should
> > > leave series tracking to other platforms that already exist (patchwork,
> > > patchew, etc).
> > 
> > I was thinking more along the lines of readers just trying to
> > find non-patch discussions.
> 
> Ah. I think here it is enough to just say "s:* AND NOT s:PATCH" without
> introducing additional xapian indexing parameters. Though, perhaps the web
> interface can also gain a "collapse threads" view?

topics_new.html / topics_active.html endpoints?
Also, '&t' is a weird accident that happens to work:

https://yhbt.net/lore/git/?q=s:*+AND+NOT+s:PATCH&t

I suppose that's OK for the majority of cases.

Though being able to find unanswered threads could be helpful.

> > > This made me realize that there's actually a multitude of ways the same patch
> > > can be represented (diff-algorithm, number of context lines, etc) that would
> > > cause git-patch-id to return a different value for the exact same commit.
> > 
> > Yeah, post-image blob abbreviations are probably the way to go.
> > 
> > Fwiw, solver only uses post-image blob abbreviations and the
> > filename as a hint.  I rolled it out a few hours ago on yhbt.net/lore
> > and it seems to be solving kernel blobs just fine, but the
> > debug log is choosing random git URLs.
> 
> Ah, neat! That said, what happens if a series was applied with "git am -3" and
> the post-image blob abbreviations are necessarily different? (I may be
> misunderstanding the approach; please correct me if I am.)

The resulting blob will be retrievable directly via git once
it's pushed to a mirrored repo.  solver tries to look for a blob
in git, first, before trying to search+apply patches[1].

Right now I'm only clamping to 3 coderepos for performance
reasons, but that limitation is going away.  But I need a nap,
first :x


[1] solver could reverse the order to look for the mails first
    if we wanted to see the fullest email history of a blob,
    but it's much slower because it has to do a lot more work
    (hitting Xapian and running git apply+ls-files for every
    single patch).

* Re: extra search flags and params? (ispatch, replycount, ...)
  2023-11-28 18:20       ` Eric Wong
@ 2023-11-28 20:00         ` Konstantin Ryabitsev
  2023-11-29  2:13           ` Eric Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Konstantin Ryabitsev @ 2023-11-28 20:00 UTC
  To: Eric Wong; +Cc: meta, workflows

On Tue, Nov 28, 2023 at 06:20:03PM +0000, Eric Wong wrote:
> > Ah. I think here it is enough to just say "s:* AND NOT s:PATCH" without
> > introducing additional xapian indexing parameters. Though, perhaps the web
> > interface can also gain a "collapse threads" view?
> 
> topics_new.html / topics_active.html endpoints?
> Also, '&t' is a weird accident that happens to work:
> 
> https://yhbt.net/lore/git/?q=s:*+AND+NOT+s:PATCH&t
> 
> I suppose that's OK for the majority of cases.

Nice!

> Though being able to find unanswered threads could be helpful.

Note, I'm not saying it's not a cool feature. :) However, I imagine people
would be more interested in searching for something like "show me all threads
mentioning $foo to which *I* haven't replied yet". It's not quite the same
thing as "nobody has replied yet."

I have no idea how hard this would be.

-K

* Re: extra search flags and params? (ispatch, replycount, ...)
  2023-11-28 20:00         ` Konstantin Ryabitsev
@ 2023-11-29  2:13           ` Eric Wong
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2023-11-29  2:13 UTC
  To: Konstantin Ryabitsev; +Cc: meta, workflows

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Tue, Nov 28, 2023 at 06:20:03PM +0000, Eric Wong wrote:
> > Though being able to find unanswered threads could be helpful.
> 
> Note, I'm not saying it's not a cool feature. :) However, I imagine people
> would be more interested in searching for something like "show me all threads
> mentioning $foo to which *I* haven't replied yet". It's not quite the same
> thing as "nobody has replied yet."
> 
> I have no idea how hard this would be.

I think notmuch can do it:

https://public-inbox.org/meta/20230412000654.M9394@dcvr/
https://public-inbox.org/meta/f8156a76-f4ac-8fc3-0eae-0c3a1a927ddb@intel.com/

So can we, since I've started using C++ in xap_helper and will
introduce RLIMIT_* support soonish to keep performance in check.
Going to see how much punishment the ancient (but fanless :>)
CPU on yhbt.net/lore can handle...
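
For illustration only, the "threads mentioning $foo to which *I*
haven't replied" filter can be sketched over in-memory data.  The
threads_awaiting_me() function and the thread layout below are
hypothetical; a real implementation would run against the Xapian index
(or notmuch) rather than a Python dict.

```python
def threads_awaiting_me(threads, me, term):
    """Return IDs of threads that mention `term` and have no message from `me`.

    `threads` maps a thread ID to a list of (from_addr, text) tuples;
    this layout is an assumption made for the example.
    """
    hits = []
    for thread_id, msgs in threads.items():
        mentions = any(term.lower() in text.lower() for _, text in msgs)
        i_replied = any(sender == me for sender, _ in msgs)
        if mentions and not i_replied:
            hits.append(thread_id)
    return sorted(hits)


threads = {
    "t1": [("a@example.com", "question about solver"),
           ("me@example.com", "answered")],
    "t2": [("b@example.com", "solver fails on this blob")],
    "t3": [("c@example.com", "unrelated topic")],
}
print(threads_awaiting_me(threads, "me@example.com", "solver"))
```

Note the difference from a plain reply count: thread "t1" has a reply
from "me" and is excluded, while "t2" would be excluded by a
"nobody has replied" filter only by coincidence.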

* Re: extra search flags and params? (ispatch, replycount, ...)
  2023-11-28 17:49     ` Konstantin Ryabitsev
  2023-11-28 18:20       ` Eric Wong
@ 2023-12-12 23:29       ` Rob Herring
  1 sibling, 0 replies; 8+ messages in thread
From: Rob Herring @ 2023-12-12 23:29 UTC
  To: Konstantin Ryabitsev; +Cc: Eric Wong, meta, workflows

On Tue, Nov 28, 2023 at 11:49 AM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> On Tue, Nov 28, 2023 at 05:35:09PM +0000, Eric Wong wrote:
> > > I understand the reasoning, but I'm not sure we should be trying too hard to
> > > make public-inbox a patch tracking platform. What makes lei great is its ability
> > > to automatically find and retrieve entire threads -- I feel like we should
> > > leave series tracking to other platforms that already exist (patchwork,
> > > patchew, etc).

patch tracking platforms might want to use public-inbox to get the
patches in the first place.

> >
> > I was thinking more along the lines of readers just trying to
> > find non-patch discussions.

I do this from time to time to find things I miss. Since I have patch
tracking, I don't miss patches.

> Ah. I think here it is enough to just say "s:* AND NOT s:PATCH" without
> introducing additional xapian indexing parameters. Though, perhaps the web
> interface can also gain a "collapse threads" view?

There's also [RFC 1/N], [PATCHv5], or just [vN], so:

"s:* AND NOT (s:PATCH OR s:RFC OR s:v1 OR s:v2 OR s:v3...)"

But when someone does "[RFC] Things I want to discuss" or "blah blah
patch blah blah" it won't work. It's fragile and inexact.
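
A small sketch of what maintaining such a query by hand looks like.
The non_patch_query() and search_url() helpers are hypothetical; only
the "s:" prefix, the query text, and the yhbt.net/lore/git URL shape
come from this thread.

```python
from urllib.parse import quote_plus


def non_patch_query(max_version: int = 3) -> str:
    """Build a subject query excluding common patch-style prefixes."""
    prefixes = ["PATCH", "RFC"] + [f"v{n}" for n in range(1, max_version + 1)]
    excluded = " OR ".join(f"s:{p}" for p in prefixes)
    return f"s:* AND NOT ({excluded})"


def search_url(base: str, query: str) -> str:
    # Keep ':' and '*' literal, matching the URL quoted earlier in the thread.
    return f"{base}?q={quote_plus(query, safe=':*')}"


q = non_patch_query()
print(q)
print(search_url("https://yhbt.net/lore/git/", "s:* AND NOT s:PATCH"))
```

The list of excluded prefixes has to grow with every new convention
([v4], [v5], ...), and it still cannot tell "[RFC] Things I want to
discuss" apart from an RFC patch.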

We can already find certain patches with dfn:, so why not find all patches?

Rob

Thread overview: 8+ messages
2023-11-28  0:10 extra search flags and params? (ispatch, replycount, ...) Eric Wong
2023-11-28 15:30 ` Konstantin Ryabitsev
2023-11-28 17:35   ` Eric Wong
2023-11-28 17:49     ` Konstantin Ryabitsev
2023-11-28 18:20       ` Eric Wong
2023-11-28 20:00         ` Konstantin Ryabitsev
2023-11-29  2:13           ` Eric Wong
2023-12-12 23:29       ` Rob Herring