unofficial mirror of meta@public-inbox.org
* search by whole thread?
@ 2023-04-11 22:59 Jacob Keller
  2023-04-12  0:06 ` Eric Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Jacob Keller @ 2023-04-11 22:59 UTC
  To: meta

Hi,

I'm wondering if there is a way to search a list by the entire thread?

For example, I want to find all threads which have at least one message
with dfn:<some path> and which have no messages containing the text
"Reviewed-by".

This would for example let me search an open source archive for threads
(patch series for example) which have not received any reviewed-by reply.

The current search function available through the HTML website doesn't
seem to have a "by thread" function. I also haven't been able to find
any option similar to this in the email client I typically use for
interacting with the lists (Thunderbird).

Perhaps this is something that I could implement locally from the clone
of the archive, but I am not quite sure how to go about it. It seems
like something that should be reasonably straightforward given the way
that public-inbox already tracks threads. Any suggestions on how to get
something like this would be appreciated.

Thanks,
Jake
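
A minimal sketch of the thread-level logic described above, assuming
messages are already loaded in memory and tagged with a thread
identifier. The `Message` class, its fields, and the sample data are
hypothetical and only illustrate the set arithmetic; this is not how
public-inbox or any mail client stores messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    thread_id: str     # hypothetical thread identifier
    files: frozenset   # paths touched by the patch (what dfn: would match)
    body: str


def threads_matching(messages, predicate):
    """Return the ids of threads with at least one message satisfying predicate."""
    return {m.thread_id for m in messages if predicate(m)}


def unreviewed_threads(messages, path, marker="Reviewed-by"):
    """Threads touching `path` in which no message contains `marker`."""
    touching = threads_matching(messages, lambda m: path in m.files)
    reviewed = threads_matching(messages, lambda m: marker in m.body)
    return touching - reviewed


# Made-up sample data: thread t1 got a review, thread t2 did not.
messages = [
    Message("t1", frozenset({"net/core/dev.c"}), "[PATCH] fix a leak"),
    Message("t1", frozenset(), "Reviewed-by: Someone <someone@example.com>"),
    Message("t2", frozenset({"net/core/dev.c"}), "[PATCH] add a counter"),
    Message("t2", frozenset(), "Could you rebase this?"),
    Message("t3", frozenset({"fs/inode.c"}), "[PATCH] unrelated change"),
]

result = unreviewed_threads(messages, "net/core/dev.c")
```

The key point is that both conditions are evaluated per thread rather
than per message, so the query is a difference of two thread sets.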


* Re: search by whole thread?
  2023-04-11 22:59 search by whole thread? Jacob Keller
@ 2023-04-12  0:06 ` Eric Wong
  2023-04-12 18:29   ` Jacob Keller
  2023-04-12 18:49   ` Konstantin Ryabitsev
  0 siblings, 2 replies; 7+ messages in thread
From: Eric Wong @ 2023-04-12  0:06 UTC
  To: Jacob Keller; +Cc: meta

Jacob Keller <jacob.e.keller@intel.com> wrote:
> Hi,
> 
> I'm wondering if there is a way to search a list by the entire thread?

Not yet...

> For example, I want to find all threads which have at least one message
> with dfn:<some path> and which have no messages containing the text
> "Reviewed-by".
> 
> This would for example let me search an open source archive for threads
> (patch series for example) which have not received any reviewed-by reply.

Yes, that's something I've wanted, too...

> The current search function available through the HTML website doesn't
> seem to have a "by thread" function. I also haven't been able to find
> any option similar to this in the email client I typically use for
> interacting with the lists (Thunderbird).

I think the reason it's rare in MUAs is that it's potentially
very expensive.  But I think the `thread:{subquery}' feature
from notmuch I discussed with Konstantin the other week[1] can
do what you want it to do.

Keep in mind, notmuch-search-terms(7) states:

	The performance of such queries can vary wildly.

And that's for a private client tool for a single user.

For a public-facing web UI, we'll need proper timeouts (likely
via RLIMIT_CPU + SIGXCPU) in an external process and a C++ build
against libxapian.  AFAIK, custom query parsers aren't possible
in Xapian's high-level language bindings; fortunately I can
legally reuse GPL-3+ C++ code from notmuch \o/
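
A minimal sketch of the RLIMIT_CPU + SIGXCPU idea, assuming a Unix-like
system. The helper name `run_with_cpu_limit` and the busy-loop child
are hypothetical stand-ins for an expensive query worker; this is not
public-inbox code, only an illustration of how the kernel terminates a
child that exceeds its CPU budget.

```python
import resource
import signal
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv in a child process limited to cpu_seconds of CPU time."""

    def limit():
        # Runs in the child between fork and exec.  Exceeding the soft
        # limit delivers SIGXCPU; the hard limit is a SIGKILL backstop.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))

    return subprocess.run(argv, preexec_fn=limit, capture_output=True)


# A deliberately CPU-bound child standing in for a runaway query.
busy = [sys.executable, "-c", "while True: pass"]
result = run_with_cpu_limit(busy, 1)

# A negative return code means the child was terminated by that signal.
killed_by_sigxcpu = result.returncode == -signal.SIGXCPU
```

Because the limit is set in the child only, the parent (for example a
web server) keeps running and can report a timeout to the client.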

The external process will probably be similar to
`git cat-file --batch-command' though it can use SOCK_SEQPACKET
for requests and pipes for large responses.
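
A minimal sketch of that transport split, assuming Linux (AF_UNIX
SOCK_SEQPACKET is not available on every platform). The function
`serve_one` and the request strings are hypothetical, and both ends
live in one process here purely for brevity; a real worker would be a
separate process.

```python
import os
import socket


def serve_one(sock, out_fd):
    """Handle one request and write the reply to a pipe."""
    # SOCK_SEQPACKET preserves message boundaries, so one recv() returns
    # exactly one request with no extra framing needed.
    request = sock.recv(4096)
    os.write(out_fd, b"results for " + request + b"\n")
    return request


client, worker = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
resp_r, resp_w = os.pipe()

# Two requests sent back to back stay separate on the receiving side.
client.send(b"first query")
client.send(b"second query")

handled = [serve_one(worker, resp_w), serve_one(worker, resp_w)]

# Replies travel over a plain pipe, which suits large streamed output.
os.close(resp_w)
with os.fdopen(resp_r, "rb") as reader:
    replies = reader.read().splitlines()

client.close()
worker.close()
```

The design choice illustrated is that small requests benefit from
packet boundaries, while responses of unknown size are easier to stream
through a byte-oriented pipe.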

> Perhaps this is something that I could implement locally from the clone
> of the archive, but I am not quite sure how to go about it. It seems
> like something that should be reasonably straightforward given the way
> that public-inbox already tracks threads. Any suggestions on how to get
> something like this would be appreciated.

Are you able to confirm notmuch `thread:{subquery}' is what
you're after?

I plan on implementing it with proper timeouts for untrusted
clients within the next few weeks/months; assuming some other
stuff works out and I still have Internet + power.


[1] https://public-inbox.org/meta/20230328194549.M808175@dcvr/


* Re: search by whole thread?
  2023-04-12  0:06 ` Eric Wong
@ 2023-04-12 18:29   ` Jacob Keller
  2023-04-12 18:49   ` Konstantin Ryabitsev
  1 sibling, 0 replies; 7+ messages in thread
From: Jacob Keller @ 2023-04-12 18:29 UTC
  To: Eric Wong; +Cc: meta



On 4/11/2023 5:06 PM, Eric Wong wrote:
> Jacob Keller <jacob.e.keller@intel.com> wrote:
>> Hi,
>>
>> I'm wondering if there is a way to search a list by the entire thread?
> 
> Not yet...
> 
>> For example, I want to find all threads which have at least one message
>> with dfn:<some path> and which have no messages containing the text
>> "Reviewed-by".
>>
>> This would for example let me search an open source archive for threads
>> (patch series for example) which have not received any reviewed-by reply.
> 
> Yes, that's something I've wanted, too...
> 
>> The current search function available through the HTML website doesn't
>> seem to have a "by thread" function. I also haven't been able to find
>> any option similar to this in the email client I typically use for
>> interacting with the lists (Thunderbird).
> 
> I think the reason it's rare in MUAs is that it's potentially
> very expensive.  But I think the `thread:{subquery}' feature
> from notmuch I discussed with Konstantin the other week[1] can
> do what you want it to do.
> 
> Keep in mind, notmuch-search-terms(7) states:
> 
> 	The performance of such queries can vary wildly.
> 
> And that's for a private client tool for a single user.
> 
> For a public-facing web UI, we'll need proper timeouts (likely
> via RLIMIT_CPU + SIGXCPU) in an external process and a C++ build
> against libxapian.  AFAIK, custom query parsers aren't possible
> in Xapian's high-level language bindings; fortunately I can
> legally reuse GPL-3+ C++ code from notmuch \o/
> 
> The external process will probably be similar to
> `git cat-file --batch-command' though it can use SOCK_SEQPACKET
> for requests and pipes for large responses.
> 
>> Perhaps this is something that I could implement locally from the clone
>> of the archive, but I am not quite sure how to go about it. It seems
>> like something that should be reasonably straightforward given the way
>> that public-inbox already tracks threads. Any suggestions on how to get
>> something like this would be appreciated.
> 
> Are you able to confirm notmuch `thread:{subquery}' is what
> you're after?

Ah. I tried searching but didn't hit upon notmuch. To be honest, I bet
just directly using notmuch and having it subscribe to the messages from
the public inbox would be sufficient for my purposes :D

I'll explore this and see, but it does sound like the thread:{subquery}
is basically what I want.

> 
> I plan on implementing it with proper timeouts for untrusted
> clients within the next few weeks/months; assuming some other
> stuff works out and I still have Internet + power.
> 
> 
> [1] https://public-inbox.org/meta/20230328194549.M808175@dcvr/


* Re: search by whole thread?
  2023-04-12  0:06 ` Eric Wong
  2023-04-12 18:29   ` Jacob Keller
@ 2023-04-12 18:49   ` Konstantin Ryabitsev
  2023-04-12 20:17     ` Eric Wong
  1 sibling, 1 reply; 7+ messages in thread
From: Konstantin Ryabitsev @ 2023-04-12 18:49 UTC
  To: Eric Wong; +Cc: Jacob Keller, meta

On Wed, Apr 12, 2023 at 12:06:53AM +0000, Eric Wong wrote:
> I think the reason it's rare in MUAs is that it's potentially
> very expensive.  But I think the `thread:{subquery}' feature
> from notmuch I discussed with Konstantin the other week[1] can
> do what you want it to do.
> 
> Keep in mind, notmuch-search-terms(7) states:
> 
> 	The performance of such queries can vary wildly.
> 
> And that's for a private client tool for a single user.

Yes, when I was wondering about that, it was really for the lei side of
things. I don't really want to run expensive queries on lore (though I'm okay
if we can turn it off for /all/ or other very large lists).

-K


* Re: search by whole thread?
  2023-04-12 18:49   ` Konstantin Ryabitsev
@ 2023-04-12 20:17     ` Eric Wong
  2023-04-12 21:01       ` Jacob Keller
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wong @ 2023-04-12 20:17 UTC
  To: Konstantin Ryabitsev; +Cc: Jacob Keller, meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Wed, Apr 12, 2023 at 12:06:53AM +0000, Eric Wong wrote:
> > I think the reason it's rare in MUAs is that it's potentially
> > very expensive.  But I think the `thread:{subquery}' feature
> > from notmuch I discussed with Konstantin the other week[1] can
> > do what you want it to do.
> > 
> > Keep in mind, notmuch-search-terms(7) states:
> > 
> > 	The performance of such queries can vary wildly.
> > 
> > And that's for a private client tool for a single user.
> 
> Yes, when I was wondering about that, it was really for the lei side of
> things. I don't really want to run expensive queries on lore (though I'm okay
> if we can turn it off for /all/ or other very large lists).

I expect relying on timeouts in an external process will be fine
for lore, especially since some expensive queries are already
possible :x

I suppose ITIMER_REAL is better than RLIMIT_CPU since the former
accounts for I/O time.  Xapian makes a lot of small pread
syscalls so I don't see it being stuck in D-state long on SSDs.
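
A minimal sketch of the ITIMER_REAL approach, assuming a Unix-like
system and code running in the main thread. The names `QueryTimeout`
and `run_with_wall_clock_limit` are hypothetical; `time.sleep` stands
in for a query that is blocked (for example on I/O) and burning almost
no CPU, which is the case RLIMIT_CPU alone would not catch.

```python
import signal
import time


class QueryTimeout(Exception):
    """Raised when the wall-clock budget for a query is exhausted."""


def _on_alarm(signum, frame):
    raise QueryTimeout()


def run_with_wall_clock_limit(func, seconds):
    """Call func(), raising QueryTimeout after `seconds` of real time."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func()
    finally:
        # Always disarm the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


start = time.monotonic()
try:
    # Sleeping uses no CPU time, yet the wall-clock timer still fires.
    run_with_wall_clock_limit(lambda: time.sleep(5), 0.2)
    timed_out = False
except QueryTimeout:
    timed_out = True
elapsed = time.monotonic() - start
```

ITIMER_REAL counts elapsed real time, so it bounds total latency
regardless of whether the process is computing or waiting.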


* Re: search by whole thread?
  2023-04-12 20:17     ` Eric Wong
@ 2023-04-12 21:01       ` Jacob Keller
  2023-04-12 21:21         ` Eric Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Jacob Keller @ 2023-04-12 21:01 UTC
  To: Eric Wong, Konstantin Ryabitsev; +Cc: meta



On 4/12/2023 1:17 PM, Eric Wong wrote:
> Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
>> On Wed, Apr 12, 2023 at 12:06:53AM +0000, Eric Wong wrote:
>>> I think the reason it's rare in MUAs is that it's potentially
>>> very expensive.  But I think the `thread:{subquery}' feature
>>> from notmuch I discussed with Konstantin the other week[1] can
>>> do what you want it to do.
>>>
>>> Keep in mind, notmuch-search-terms(7) states:
>>>
>>> 	The performance of such queries can vary wildly.
>>>
>>> And that's for a private client tool for a single user.
>>
>> Yes, when I was wondering about that, it was really for the lei side of
>> things. I don't really want to run expensive queries on lore (though I'm okay
>> if we can turn it off for /all/ or other very large lists).
> 
> I expect relying on timeouts in an external process will be fine
> for lore, especially since some expensive queries are already
> possible :x
> 
> I suppose ITIMER_REAL is better than RLIMIT_CPU since the former
> accounts for I/O time.  Xapian makes a lot of small pread
> syscalls so I don't see it being stuck in D-state long on SSDs.

For what it's worth to those watching the thread, I was able to get what I
needed by combining [1] with notmuch, and it's good enough for my purposes.

Being able to do the thread:{} querying directly on lore would be
convenient, but doing the search locally is good enough for my purposes.

Thanks for the tip on notmuch!

-Jake

[1]: https://github.com/wkz/notmuch-lore


* Re: search by whole thread?
  2023-04-12 21:01       ` Jacob Keller
@ 2023-04-12 21:21         ` Eric Wong
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2023-04-12 21:21 UTC
  To: Jacob Keller; +Cc: Konstantin Ryabitsev, meta

Jacob Keller <jacob.e.keller@intel.com> wrote:
> Thanks for the tip on notmuch!

No problem!  Much of the indexing and search logic in
public-inbox was originally stolen from the C++ code of notmuch
and translated to Perl.

I haven't kept up-to-date with notmuch since giant Maildirs are
too expensive for me; but upcoming FUSE support in lei should
allow exposing its git blobs w/o storing duplicate messages
anywhere on disk.
