unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
* lei q -tt doesn't work properly?
@ 2023-02-13 16:06 Maxim Mikityanskiy
  2023-02-14  2:42 ` [PATCH] lei q: do not collapse threads with `-tt' Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Maxim Mikityanskiy @ 2023-02-13 16:06 UTC (permalink / raw)
  To: meta; +Cc: Eric Wong, Kyle Meyer

Hello,

I'm trying to use the -tt flag to download the whole thread, but mark
the actual matching emails as important. I'm not sure if I'm doing it
incorrectly, or maybe there is a bug in lei.

According to the man page:

--cut--

-t  Return all messages in the same thread as the actual match(es).

    Using this twice ("-tt") sets the "flagged" (AKA "important") on
    messages which were actual matches.  This is useful to distinguish
    messages which were direct hits from messages which were merely
    part of the same thread.

--cut--

I'm using this command, for example:

lei q --no-save -a -o /tmp/lei-test -I 'https://lore.kernel.org/all' \
    -tt 'a:syzbot AND rt:2023-01-01..2023-01-07'

What I expect to see is at least one flagged email in each thread
(otherwise why would this thread by downloaded), however, instead, most
of the emails are not flagged, that is, the whole threads don't have any
flagged email (although they clearly have emails from syzbot, which
caused the match). Occasionally, some emails are flagged, for example,
these two:

https://lore.kernel.org/all/87wn621hmp.fsf@toke.dk/
https://lore.kernel.org/all/20230103081308.942805751@linuxfoundation.org/

It looks as if the match works correctly, but the -tt option fails to
mark most of the matched emails as important, except a few that actually
got marked (I couldn't find a pattern here). It's also not consistent,
for example, after I removed /tmp/lei-test and restarted the lei q
command, I got many more important emails, almost in each thread, but
there were still threads without flagged emails.

I'm checking the flags with mutt.

Does anyone know what could be the reason for such behavior?

Thanks,
Max

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] lei q: do not collapse threads with `-tt'
  2023-02-13 16:06 lei q -tt doesn't work properly? Maxim Mikityanskiy
@ 2023-02-14  2:42 ` Eric Wong
  2023-02-26 12:17   ` Maxim Mikityanskiy
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Wong @ 2023-02-14  2:42 UTC (permalink / raw)
  To: Maxim Mikityanskiy; +Cc: meta, Kyle Meyer

Maxim Mikityanskiy <maxtram95@gmail.com> wrote:
> lei q --no-save -a -o /tmp/lei-test -I 'https://lore.kernel.org/all' \
>     -tt 'a:syzbot AND rt:2023-01-01..2023-01-07'

At first, I thought -a (--augment) was causing it...

Sidenote: you also don't need to quote the query (I forget the exact
rules, but I tried to keep quotes easier for phrase searches).

> It looks as if the match works correctly, but the -tt option fails to
> mark most of the matched emails as important, except a few that actually
> got marked (I couldn't find a pattern here). It's also not consistent,
> for example, after I removed /tmp/lei-test and restarted the lei q
> command, I got many more important emails, almost in each thread, but
> there were still threads without flagged emails.

Yes, now it seems it's the collapsing optimization.

> I'm checking the flags with mutt.
> 
> Does anyone know what could be the reason for such behavior?

I think the following patch fixes it.

(I accidentally sent you a private copy with invalid blobs since
I had other unpublished changes)

-----8<-------
Subject: [PATCH] lei q: do not collapse threads with `-tt'

While having Xapian collapse threads is an easy way to reduce
the amount of deduplication work we need to do when writing
out threads; we can't rely on it when using `lei q -tt` since
that needs to flag all hits.

Reported-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Link: https://public-inbox.org/git/Y+pgBmj0jxR+cVkD@mail.gmail.com/
---
 lib/PublicInbox/Search.pm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index 2feb3e13..273cc57c 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -460,8 +460,9 @@ sub _enquire_once { # retry_reopen callback
 		$enquire->set_sort_by_relevance_then_value(TS, !$opts->{asc});
 	}
 
-	# `mairix -t / --threads' or JMAP collapseThreads
-	if ($opts->{threads} && has_threadid($self)) {
+	# `lei q -t / --threads' or JMAP collapseThreads; but don't collapse
+	# on `-tt' ({threads} > 1) which sets the Flagged|Important keyword
+	if (($opts->{threads} // 0) == 1 && has_threadid($self)) {
 		$enquire->set_collapse_key(THREADID);
 	}
 	$enquire->get_mset($opts->{offset} || 0, $opts->{limit} || 50);

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] lei q: do not collapse threads with `-tt'
  2023-02-14  2:42 ` [PATCH] lei q: do not collapse threads with `-tt' Eric Wong
@ 2023-02-26 12:17   ` Maxim Mikityanskiy
  2023-02-26 17:09     ` Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Maxim Mikityanskiy @ 2023-02-26 12:17 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Kyle Meyer

On Tue, Feb 14, 2023 at 02:42:32AM +0000, Eric Wong wrote:
> Maxim Mikityanskiy <maxtram95@gmail.com> wrote:
> > lei q --no-save -a -o /tmp/lei-test -I 'https://lore.kernel.org/all' \
> >     -tt 'a:syzbot AND rt:2023-01-01..2023-01-07'
> 
> At first, I thought -a (--augment) was causing it...
> 
> Sidenote: you also don't need to quote the query (I forget the exact
> rules, but I tried to keep quotes easier for phrase searches).
> 
> > It looks as if the match works correctly, but the -tt option fails to
> > mark most of the matched emails as important, except a few that actually
> > got marked (I couldn't find a pattern here). It's also not consistent,
> > for example, after I removed /tmp/lei-test and restarted the lei q
> > command, I got many more important emails, almost in each thread, but
> > there were still threads without flagged emails.
> 
> Yes, now it seems it's the collapsing optimization.
> 
> > I'm checking the flags with mutt.
> > 
> > Does anyone know what could be the reason for such behavior?
> 
> I think the following patch fixes it.

Sorry for taking too long, I finally found a minute to test it, and
unfortunately I didn't see a difference. I queried for:

a:syzbot AND rt:2023-02-01..2023-02-07

and I still saw I lot of threads without a single flag.

I double-checked that the patch was actually applied, killed lei-daemon,
and removed the mailbox directory, but it didn't help.

> (I accidentally sent you a private copy with invalid blobs since
> I had other unpublished changes)
> 
> -----8<-------
> Subject: [PATCH] lei q: do not collapse threads with `-tt'
> 
> While having Xapian collapse threads is an easy way to reduce
> the amount of deduplication work we need to do when writing
> out threads; we can't rely on it when using `lei q -tt` since
> that needs to flag all hits.
> 
> Reported-by: Maxim Mikityanskiy <maxtram95@gmail.com>
> Link: https://public-inbox.org/git/Y+pgBmj0jxR+cVkD@mail.gmail.com/
> ---
>  lib/PublicInbox/Search.pm | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
> index 2feb3e13..273cc57c 100644
> --- a/lib/PublicInbox/Search.pm
> +++ b/lib/PublicInbox/Search.pm
> @@ -460,8 +460,9 @@ sub _enquire_once { # retry_reopen callback
>  		$enquire->set_sort_by_relevance_then_value(TS, !$opts->{asc});
>  	}
>  
> -	# `mairix -t / --threads' or JMAP collapseThreads
> -	if ($opts->{threads} && has_threadid($self)) {
> +	# `lei q -t / --threads' or JMAP collapseThreads; but don't collapse
> +	# on `-tt' ({threads} > 1) which sets the Flagged|Important keyword
> +	if (($opts->{threads} // 0) == 1 && has_threadid($self)) {
>  		$enquire->set_collapse_key(THREADID);
>  	}
>  	$enquire->get_mset($opts->{offset} || 0, $opts->{limit} || 50);

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] lei q: do not collapse threads with `-tt'
  2023-02-26 12:17   ` Maxim Mikityanskiy
@ 2023-02-26 17:09     ` Eric Wong
  2023-02-26 17:15       ` [PATCH] doc: note "lei q -tt" is broken with HTTP(S) remotes Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Wong @ 2023-02-26 17:09 UTC (permalink / raw)
  To: Maxim Mikityanskiy; +Cc: meta, Kyle Meyer

Maxim Mikityanskiy <maxtram95@gmail.com> wrote:
> On Tue, Feb 14, 2023 at 02:42:32AM +0000, Eric Wong wrote:
> > Maxim Mikityanskiy <maxtram95@gmail.com> wrote:
> > > lei q --no-save -a -o /tmp/lei-test -I 'https://lore.kernel.org/all' \
> > >     -tt 'a:syzbot AND rt:2023-01-01..2023-01-07'

<snip>

> > Yes, now it seems it's the collapsing optimization.

<snip>

> Sorry for taking too long, I finally found a minute to test it, and
> unfortunately I didn't see a difference. I queried for:
> 
> a:syzbot AND rt:2023-02-01..2023-02-07
> 
> and I still saw I lot of threads without a single flag.
> 
> I double-checked that the patch was actually applied, killed lei-daemon,
> and removed the mailbox directory, but it didn't help.

Ah, oops.  My original fix only works for locally-cloned inboxes;
but not remote (http/https) inboxes...

I think some inconsistency on the client side is also introduced
by using -I/--include vs --only; since -I/--include will use
previously-indexed messages in ~/.local/share/lei/store

Getting -tt to work on remote inboxes will take more effort.
I'm not sure which option is better:

1) Support t=2 natively in the WWW interface.  This requires
   both the server and client to be updated.  It may require
   extra dedupe step on the server, making it more expensive.
   Thinking out loud, I think the dedupe step can be avoided
   by sorting on THREADID...

2) use t=1 in the client as-is, but index the streamed mbox
   locally, first.  This requires a temporary Xapian DB to
   ensure there's no overlap if using --only.
   This only requires a client update, but likely adds more
   complexity.  It also delays updates to the Maildir,
   meaning all messages need to be downloaded before the MUA
   sees it...

I'm leaning towards 1...

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] doc: note "lei q -tt" is broken with HTTP(S) remotes
  2023-02-26 17:09     ` Eric Wong
@ 2023-02-26 17:15       ` Eric Wong
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2023-02-26 17:15 UTC (permalink / raw)
  To: Maxim Mikityanskiy; +Cc: meta, Kyle Meyer

Eric Wong <e@80x24.org> wrote:
> Getting -tt to work on remote inboxes will take more effort.
> I'm not sure which option is better:

I suppose documenting the current breakage first is important:

--------- 8< --------
Subject: [PATCH] doc: note "lei q -tt" is broken with HTTP(S) remotes

I'm still trying to decide how to handle HTTP(S) remotes
properly...

Link: https://public-inbox.org/meta/20230226170931.M947721@dcvr/
---
 Documentation/lei-q.pod | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/lei-q.pod b/Documentation/lei-q.pod
index d52c5b04..5e9a5658 100644
--- a/Documentation/lei-q.pod
+++ b/Documentation/lei-q.pod
@@ -124,6 +124,9 @@ of the same thread.
 TODO: Warning: this flag may become persistent and saved in
 lei/store unless an MUA unflags it!  (Behavior undecided)
 
+Caveat: C<-tt> only works on locally-indexed messages at the
+moment, and not on remote (HTTP(S)) endpoints.
+
 =item --jobs=QUERY_WORKERS[,WRITE_WORKERS]
 =item --jobs=,WRITE_WORKERS
 

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-02-26 17:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-02-13 16:06 lei q -tt doesn't work properly? Maxim Mikityanskiy
2023-02-14  2:42 ` [PATCH] lei q: do not collapse threads with `-tt' Eric Wong
2023-02-26 12:17   ` Maxim Mikityanskiy
2023-02-26 17:09     ` Eric Wong
2023-02-26 17:15       ` [PATCH] doc: note "lei q -tt" is broken with HTTP(S) remotes Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).