unofficial mirror of notmuch@notmuchmail.org
* find threads where I and Jian participated but not Dave
@ 2016-01-09 21:55 Xu Wang
  2017-06-13 15:57 ` Xu Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Xu Wang @ 2016-01-09 21:55 UTC (permalink / raw)
  To: notmuch

Does anyone have a notmuch-based script for finding threads in which
both Jian and I have participated, but Dave has not?

Kind regards,

Xu

* Re: find threads where I and Jian participated but not Dave
  2016-01-09 21:55 find threads where I and Jian participated but not Dave Xu Wang
@ 2017-06-13 15:57 ` Xu Wang
  2017-06-13 23:31   ` David Bremner
  0 siblings, 1 reply; 27+ messages in thread
From: Xu Wang @ 2017-06-13 15:57 UTC (permalink / raw)
  To: notmuch

Bumping this. Actually, a simpler question: how can I search for threads
in which both Jian and I have participated? Excluding threads in which
Dave participated is perhaps more complicated.

kind regards,

Xu

On Sat, Jan 9, 2016 at 4:55 PM, Xu Wang <xuwang762@gmail.com> wrote:
> Has someone a notmuch-based script by which to search for threads in
> which I have participated and Jian has participated, but Dave has not
> participated?
>
> Kind regards,
>
> Xu

* Re: find threads where I and Jian participated but not Dave
  2017-06-13 15:57 ` Xu Wang
@ 2017-06-13 23:31   ` David Bremner
  2017-06-14  0:24     ` Xu Wang
                       ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: David Bremner @ 2017-06-13 23:31 UTC (permalink / raw)
  To: Xu Wang, notmuch

Xu Wang <xuwang762@gmail.com> writes:

> I bump this. Actually more simple than that, how to search for thread
> in which I have participated and Jian has participated? Excluding
> threads in which Dave participated is perhaps more complicated.

I don't know of an efficient way to do this. You could write a script
something like

notmuch search --output=threads from:Xu  > A
notmuch search --output=threads from:Jian  > B
comm -12 A B

I think the output is sorted, but comm needs sorted input, so you might
also have to sort A and B.

* Re: find threads where I and Jian participated but not Dave
  2017-06-13 23:31   ` David Bremner
@ 2017-06-14  0:24     ` Xu Wang
  2017-06-14  3:32     ` Brian Sniffen
  2017-06-15 17:54     ` Daniel Kahn Gillmor
  2 siblings, 0 replies; 27+ messages in thread
From: Xu Wang @ 2017-06-14  0:24 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

ooo that is nice also. I will look into that. I think it is a good
approach. Kind regards to everyone! Xu

On Tue, Jun 13, 2017 at 7:31 PM, David Bremner <david@tethera.net> wrote:
> Xu Wang <xuwang762@gmail.com> writes:
>
>> I bump this. Actually more simple than that, how to search for thread
>> in which I have participated and Jian has participated? Excluding
>> threads in which Dave participated is perhaps more complicated.
>
> I don't know of an efficient way to do this. You could write a script
> something like
>
> notmuch search --output=threads from:Xu  > A
> notmuch search --output=threads from:Jian  > B
> comm -12 A B
>
> I think the output is sorted, but you might also have to sort A and B
>

* Re: find threads where I and Jian participated but not Dave
  2017-06-13 23:31   ` David Bremner
  2017-06-14  0:24     ` Xu Wang
@ 2017-06-14  3:32     ` Brian Sniffen
  2017-06-15 17:54     ` Daniel Kahn Gillmor
  2 siblings, 0 replies; 27+ messages in thread
From: Brian Sniffen @ 2017-06-14  3:32 UTC (permalink / raw)
  To: David Bremner, Xu Wang, notmuch

David Bremner <david@tethera.net> writes:

> Xu Wang <xuwang762@gmail.com> writes:
>
>> I bump this. Actually more simple than that, how to search for thread
>> in which I have participated and Jian has participated? Excluding
>> threads in which Dave participated is perhaps more complicated.
>
> I don't know of an efficient way to do this. You could write a script
> something like
>
> notmuch search --output=threads from:Xu  > A
> notmuch search --output=threads from:Jian  > B
> comm -12 A B
>
> I think the output is sorted, but you might also have to sort A and B

I did test that part before posting mine, and the output is inverted.
--sort=oldest-first *also* gets it wrong, though in more subtle
ways. Piping through `sort -u` is the only way to be sure (I can't
imagine the -u helping, but I also can't imagine it hurting and it's
cheap). 
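
A minimal sketch of the sorted variant, extended to also drop Dave's
threads as in the original question. The function name and temporary
file names are made up for illustration. `comm -12` keeps lines common
to both inputs and `comm -23` keeps lines found only in the first; both
expect input sorted in the same locale, hence the explicit `sort -u`:

```shell
# Print the IDs of threads matched by both "$1" and "$2" but not by "$3".
# Hypothetical helper; each argument is an ordinary notmuch query.
threads_both_not_third() (
    LC_ALL=C; export LC_ALL           # sort and comm must agree on ordering
    tmp=$(mktemp -d) || exit 1
    notmuch search --output=threads "$1" | sort -u > "$tmp/a"
    notmuch search --output=threads "$2" | sort -u > "$tmp/b"
    notmuch search --output=threads "$3" | sort -u > "$tmp/c"
    # Intersection of a and b, minus anything listed in c.
    comm -12 "$tmp/a" "$tmp/b" | comm -23 - "$tmp/c"
    rm -rf "$tmp"
)

# Example call (the from: terms are the placeholders used in this thread):
#   threads_both_not_third from:Xu from:Jian from:Dave
```

This still runs three full searches, so it is no more efficient than
the original script; it only makes the set operations safe.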

* Re: find threads where I and Jian participated but not Dave
  2017-06-13 23:31   ` David Bremner
  2017-06-14  0:24     ` Xu Wang
  2017-06-14  3:32     ` Brian Sniffen
@ 2017-06-15 17:54     ` Daniel Kahn Gillmor
  2017-06-15 20:20       ` David Bremner
  2 siblings, 1 reply; 27+ messages in thread
From: Daniel Kahn Gillmor @ 2017-06-15 17:54 UTC (permalink / raw)
  To: David Bremner, Xu Wang, notmuch

On Tue 2017-06-13 20:31:19 -0300, David Bremner wrote:
> Xu Wang <xuwang762@gmail.com> writes:
>
>> I bump this. Actually more simple than that, how to search for thread
>> in which I have participated and Jian has participated? Excluding
>> threads in which Dave participated is perhaps more complicated.
>
> I don't know of an efficient way to do this. You could write a script
> something like
>
> notmuch search --output=threads from:Xu  > A
> notmuch search --output=threads from:Jian  > B
> comm -12 A B
>
> I think the output is sorted, but you might also have to sort A and B

that said, it'd be really nice to have something like this in notmuch
core.

One of my long-standing wishes is to be able to say "show me mails in my
inbox from people who have replied to messages i've sent them".

This could be re-framed as "show me threads in which i've participated,
where there are some messages flagged with 'inbox'".  but generating a
huge list of all threads in which i've participated, just to be able to
do an intersection operation with a (much smaller) list of all threads
that have a message with the inbox flag seems like a pretty gross
inefficiency.

     --dkg

* Re: find threads where I and Jian participated but not Dave
  2017-06-15 17:54     ` Daniel Kahn Gillmor
@ 2017-06-15 20:20       ` David Bremner
  2017-06-16  1:07         ` Matt Armstrong
  2017-06-16  6:28         ` Gaute Hope
  0 siblings, 2 replies; 27+ messages in thread
From: David Bremner @ 2017-06-15 20:20 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Xu Wang, notmuch

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

>
> One of my long-standing wishes is to be able to say "show me mails in my
> inbox from people who have replied to messages i've sent them".
>
> This could be re-framed as "show me threads in which i've participated,
> where there are some messages flagged with 'inbox'".  but generating a
> huge list of all threads in which i've participated, just to be able to
> do an intersection operation with a (much smaller) list of all threads
> that have a message with the inbox flag seems like a pretty gross
> inefficiency.

At the moment the best we could do is essentially the same algorithm,
but in C instead of shell / python. Threads are not documents in the
database, so they can't efficiently be searched for.  Of course we could
change that, but those kinds of changes take a fair amount of effort, and
some careful design work.

d

* Re: find threads where I and Jian participated but not Dave
  2017-06-15 20:20       ` David Bremner
@ 2017-06-16  1:07         ` Matt Armstrong
  2017-06-16  6:28         ` Gaute Hope
  1 sibling, 0 replies; 27+ messages in thread
From: Matt Armstrong @ 2017-06-16  1:07 UTC (permalink / raw)
  To: David Bremner, Daniel Kahn Gillmor, Xu Wang, notmuch

David Bremner <david@tethera.net> writes:

> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>
>>
>> One of my long-standing wishes is to be able to say "show me mails in my
>> inbox from people who have replied to messages i've sent them".
>>
>> This could be re-framed as "show me threads in which i've participated,
>> where there are some messages flagged with 'inbox'".  but generating a
>> huge list of all threads in which i've participated, just to be able to
>> do an intersection operation with a (much smaller) list of all threads
>> that have a message with the inbox flag seems like a pretty gross
>> inefficiency.
>
> At the moment the best we could do is essentially the same algorithm,
> but in C instead of shell / python. Threads are not documents in the
> database, so they can't efficiently be searched for.  Of course we could
> change that, but those kind of changes take a fair amount of effort, and
> some careful design work.

Even if the C level does the same algorithm, it may be able to do some
optimizations on behalf of the "scripting layer" queries.

I suspect that a separate "thread based" query language may be an
interesting area of investigation.

Taking Daniel's last example, "show me mails in my inbox from people who
have replied to messages I've sent them".  That isn't even an entirely
unambiguous query specification.  What is *actually* desired:

a) show me messages from X that are part of threads where at least one
message is in the inbox and for which at least one message is from me.

or,

b) same as (a) but the "message from X" must be in the inbox (not just
any other message in the thread)

or,

c) same as (a) or (b) but the "message from X" is a reply (e.g. dated
after, or in-reply-to) a message from me.

or,

d) same as (c) but "message from X" is "unread", etc.

Like David's 'comm -12 A B' solution, these pretty quickly start looking
like multi-pass, or structured/nested, queries.  They are a lot more like
relational database queries (SQL) than the single-pass, flat (NoSQL)
queries we typically use with notmuch.
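
As a rough illustration of such a two-pass query, here is a sketch of
reading (a) in shell. The function names are hypothetical, and the two
addresses are passed in as arguments rather than taken from anyone's
configuration. Pass 1 intersects two thread lists; pass 2 restricts a
message search to the surviving threads:

```shell
# Join thread IDs on stdin into one parenthesised, OR-ed query term.
threads_to_query() {
    paste -s -d ' ' - | sed -e 's/ / OR /g' -e 's/^\(..*\)$/( \1 )/'
}

# $1: my address, $2: X's address (both are placeholders).
# Pass 1: threads holding an inbox message and a message from me.
# Pass 2: messages from X that belong to one of those threads.
messages_from_x_in_my_inbox_threads() (
    LC_ALL=C; export LC_ALL           # sort and comm must agree on ordering
    tmp=$(mktemp -d) || exit 1
    notmuch search --output=threads "tag:inbox" | sort -u > "$tmp/inbox"
    notmuch search --output=threads "from:$1"   | sort -u > "$tmp/mine"
    threads=$(comm -12 "$tmp/inbox" "$tmp/mine" | threads_to_query)
    rm -rf "$tmp"
    [ -n "$threads" ] || exit 0       # no candidate threads, nothing to do
    notmuch search --output=messages "from:$2 AND $threads"
)
```

With many matching threads the second query can outgrow the
command-line length limit, which is one more sign that this kind of
query wants support below the scripting layer.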

* Re: find threads where I and Jian participated but not Dave
  2017-06-15 20:20       ` David Bremner
  2017-06-16  1:07         ` Matt Armstrong
@ 2017-06-16  6:28         ` Gaute Hope
  2017-06-21 20:04           ` Matt Armstrong
  1 sibling, 1 reply; 27+ messages in thread
From: Gaute Hope @ 2017-06-16  6:28 UTC (permalink / raw)
  To: David Bremner, Daniel Kahn Gillmor, notmuch, Xu Wang

David Bremner writes on June 15, 2017 22:20:
> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
> 
>>
>> One of my long-standing wishes is to be able to say "show me mails in my
>> inbox from people who have replied to messages i've sent them".
>>
>> This could be re-framed as "show me threads in which i've participated,
>> where there are some messages flagged with 'inbox'".  but generating a
>> huge list of all threads in which i've participated, just to be able to
>> do an intersection operation with a (much smaller) list of all threads
>> that have a message with the inbox flag seems like a pretty gross
>> inefficiency.
> 
> At the moment the best we could do is essentially the same algorithm,
> but in C instead of shell / python. Threads are not documents in the
> database, so they can't efficiently be searched for.  Of course we could
> change that, but those kind of changes take a fair amount of effort, and
> some careful design work.

There are probably multiple earlier references to this, but here's one:

  id:1471858269.x2m28lgosh.astroid@strange

matching against the whole thread vs the individual messages would be
very useful!


Regards, Gaute


* Re: find threads where I and Jian participated but not Dave
  2017-06-16  6:28         ` Gaute Hope
@ 2017-06-21 20:04           ` Matt Armstrong
  2017-06-21 21:30             ` Daniel Kahn Gillmor
  0 siblings, 1 reply; 27+ messages in thread
From: Matt Armstrong @ 2017-06-21 20:04 UTC (permalink / raw)
  To: Gaute Hope, David Bremner, Daniel Kahn Gillmor, notmuch, Xu Wang

Gaute Hope <eg@gaute.vetsj.com> writes:

> David Bremner writes on June 15, 2017 22:20:
>> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>>>
>>> One of my long-standing wishes is to be able to say "show me mails in my
>>> inbox from people who have replied to messages i've sent them".
>>>
>>> This could be re-framed as "show me threads in which i've participated,
>>> where there are some messages flagged with 'inbox'".  but generating a
>>> huge list of all threads in which i've participated, just to be able to
>>> do an intersection operation with a (much smaller) list of all threads
>>> that have a message with the inbox flag seems like a pretty gross
>>> inefficiency.
>> 
>> At the moment the best we could do is essentially the same algorithm,
>> but in C instead of shell / python. Threads are not documents in the
>> database, so they can't efficiently be searched for.  Of course we could
>> change that, but those kind of changes take a fair amount of effort, and
>> some careful design work.
>
> There are probably multiple earlier references to this, but here's one:
>
>   id:1471858269.x2m28lgosh.astroid@strange
>
> matching against the whole thread vs the individual messages would be
> very useful!

For what it is worth, I've found this idea from Daniel intriguing and
pretty useful in practice:

  "show me threads in which i've participated, where there are some
   messages flagged with 'inbox'"

I implement it like this in my post-new hook:

    # All messages in threads in which I participate get tag:participated
    notmuch search --output=threads from:marmstrong | \
      sed -e 's,^,+participated -- ,' | \
      notmuch tag --batch

On my database the query takes about two seconds to run, and lets me do
searches like "tag:inbox and tag:participated".  The set of threads
found is typically a subset of "tag:inbox and to:marmstrong", but not
always, and I now have two canned "inbox" searches:

  "participated" -> "tag:inbox and tag:participated"
  "me" -> "tag:inbox and to:marmstrong and not tag:participated"

The "me" search tends to be new stuff, bot-generated notifications, and
such.  The "participated" is typically active conversations and stuff
I've already engaged with, or initiated myself.
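
To make the middle of that pipeline concrete: the sed step turns each
thread ID into one line of "notmuch tag --batch" input, in the form
"tag changes, then --, then a query". A small sketch, with a made-up
function name, that can be run without touching the database:

```shell
# Turn thread IDs on stdin into "notmuch tag --batch" input lines.
# Hypothetical wrapper around the sed expression from the hook above.
participated_batch() {
    sed -e 's,^,+participated -- ,'
}
```

Feeding it a line such as thread:000000000000a1b2 yields
"+participated -- thread:000000000000a1b2", which tags every message in
that thread.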

* Re: find threads where I and Jian participated but not Dave
  2017-06-21 20:04           ` Matt Armstrong
@ 2017-06-21 21:30             ` Daniel Kahn Gillmor
  2017-06-22  6:08               ` Gaute Hope
  2017-06-22 20:15               ` find threads where I and Jian participated but not Dave Matt Armstrong
  0 siblings, 2 replies; 27+ messages in thread
From: Daniel Kahn Gillmor @ 2017-06-21 21:30 UTC (permalink / raw)
  To: Matt Armstrong, Gaute Hope, David Bremner, notmuch, Xu Wang

On Wed 2017-06-21 13:04:53 -0700, Matt Armstrong wrote:
> For what it is worth, I've found this idea from Daniel intriguing and
> pretty useful in practice:
>
>   "show me threads in which i've participated, where there are some
>    messages flagged with 'inbox'"
>
> I implement it like this in my post-new hook:
>
>     # All messages in threads in which I participate get tag:participated
>     notmuch search --output=threads from:marmstrong | \
>       sed -e 's,^,+participated -- ,' | \
>       notmuch tag --batch

cool, thx for the suggestion.

the "notmuch search" part of the pipeline alone takes ~19s (wall time,
and actual CPU time) for me though :/  It returns 30504 threads!  how
many threads do you get?

you're effectively re-tagging every single message in every participated
thread every time you run "notmuch new", right?

> On my database the query takes about two seconds to run, and lets me do
> searches like "tag:inbox and tag:participated".  The set of threads
> found is typically a subset of "tag:inbox and to:marmstrong", but not
> always, and I now have two canned "inbox" searches:
>
>   "participated" -> "tag:inbox and tag:participated"
>   "me" -> "tag:inbox and to:marmstrong and not tag:participated"
>
> The "me" search tends to be new stuff, bot-generated notifications, and
> such.  The "participated" is typically active conversations and stuff
> I've already engaged with, or initiated myself.

I like this outcome!  I'm just looking for a way to do it that wouldn't
cost me so many cycles -- Maybe if i only run "notmuch new" once a day
:)

     --dkg

* Re: find threads where I and Jian participated but not Dave
  2017-06-21 21:30             ` Daniel Kahn Gillmor
@ 2017-06-22  6:08               ` Gaute Hope
  2017-06-22  6:22                 ` Gaute Hope
  2017-06-22 20:15               ` find threads where I and Jian participated but not Dave Matt Armstrong
  1 sibling, 1 reply; 27+ messages in thread
From: Gaute Hope @ 2017-06-22  6:08 UTC (permalink / raw)
  To: David Bremner, Daniel Kahn Gillmor, Matt Armstrong, notmuch,
	Xu Wang

Daniel Kahn Gillmor writes on June 21, 2017 23:30:
> On Wed 2017-06-21 13:04:53 -0700, Matt Armstrong wrote:
>> For what it is worth, I've found this idea from Daniel intriguing and
>> pretty useful in practice:
>>
>>   "show me threads in which i've participated, where there are some
>>    messages flagged with 'inbox'"
>>
>> I implement it like this in my post-new hook:
>>
>>     # All messages in threads in which I participate get tag:participated
>>     notmuch search --output=threads from:marmstrong | \
>>       sed -e 's,^,+participated -- ,' | \
>>       notmuch tag --batch
> 
> cool, thx for the suggestion.
> 
> the "notmuch search" part of the pipeline alone takes ~19s (wall time,
> and actual CPU time) for me though :/  It returns 30504 threads!  how
> many threads do you get?

Is there any reason why you do not filter on a tag 'new' as well?

     notmuch search --output=threads from:marmstrong and tag:new | \
       sed -e 's,^,+participated -- ,' | \
       notmuch tag --batch


Regards, Gaute


* Re: find threads where I and Jian participated but not Dave
  2017-06-22  6:08               ` Gaute Hope
@ 2017-06-22  6:22                 ` Gaute Hope
  2017-06-23  0:00                   ` Matt Armstrong
  0 siblings, 1 reply; 27+ messages in thread
From: Gaute Hope @ 2017-06-22  6:22 UTC (permalink / raw)
  To: David Bremner, Daniel Kahn Gillmor, Matt Armstrong, notmuch,
	Xu Wang

Gaute Hope writes on June 22, 2017 8:08:
> Daniel Kahn Gillmor writes on June 21, 2017 23:30:
>> On Wed 2017-06-21 13:04:53 -0700, Matt Armstrong wrote:
>>> For what it is worth, I've found this idea from Daniel intriguing and
>>> pretty useful in practice:
>>>
>>>   "show me threads in which i've participated, where there are some
>>>    messages flagged with 'inbox'"
>>>
>>> I implement it like this in my post-new hook:
>>>
>>>     # All messages in threads in which I participate get tag:participated
>>>     notmuch search --output=threads from:marmstrong | \
>>>       sed -e 's,^,+participated -- ,' | \
>>>       notmuch tag --batch
>> 
>> cool, thx for the suggestion.
>> 
>> the "notmuch search" part of the pipeline alone takes ~19s (wall time,
>> and actual CPU time) for me though :/  It returns 30504 threads!  how
>> many threads do you get?
> 
> Is there any reason why you do not filter on a tag 'new' as well?
> 
>      notmuch search --output=threads from:marmstrong and tag:new | \
>        sed -e 's,^,+participated -- ,' | \
>        notmuch tag --batch
> 

Never mind, I get it. It might be possible to add a temporary tag
new-tag to the whole thread first, and not just to new messages. That
might be faster, as long as all sent messages get the new tag as well.


* Re: find threads where I and Jian participated but not Dave
  2017-06-21 21:30             ` Daniel Kahn Gillmor
  2017-06-22  6:08               ` Gaute Hope
@ 2017-06-22 20:15               ` Matt Armstrong
  1 sibling, 0 replies; 27+ messages in thread
From: Matt Armstrong @ 2017-06-22 20:15 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Gaute Hope, David Bremner, notmuch, Xu Wang

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

> On Wed 2017-06-21 13:04:53 -0700, Matt Armstrong wrote:
>> For what it is worth, I've found this idea from Daniel intriguing and
>> pretty useful in practice:
>>
>>   "show me threads in which i've participated, where there are some
>>    messages flagged with 'inbox'"
>>
>> I implement it like this in my post-new hook:
>>
>>     # All messages in threads in which I participate get tag:participated
>>     notmuch search --output=threads from:marmstrong | \
>>       sed -e 's,^,+participated -- ,' | \
>>       notmuch tag --batch
>
> cool, thx for the suggestion.
>
> the "notmuch search" part of the pipeline alone takes ~19s (wall time,
> and actual CPU time) for me though :/  It returns 30504 threads!  how
> many threads do you get?

The query returns 6600 threads.  I'm getting 2 seconds wall clock time.


> you're effectively re-tagging every single message in every participated
> thread every time you run "notmuch new", right?

Yeah, the "batch script" that the above search+sed creates and pipes
into "notmuch tag --batch" is 265K, but it only takes 0.5 seconds to
execute.  My understanding is that "notmuch tag" is smart enough to do
no work if the tag is already present on a message, so the only changes
happening in the database are actually for new stuff.

* Re: find threads where I and Jian participated but not Dave
  2017-06-22  6:22                 ` Gaute Hope
@ 2017-06-23  0:00                   ` Matt Armstrong
  2017-06-25 15:46                     ` finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave] Daniel Kahn Gillmor
  0 siblings, 1 reply; 27+ messages in thread
From: Matt Armstrong @ 2017-06-23  0:00 UTC (permalink / raw)
  To: Gaute Hope, David Bremner, Daniel Kahn Gillmor, notmuch, Xu Wang

Gaute Hope <eg@gaute.vetsj.com> writes:

> Gaute Hope writes on June 22, 2017 8:08:
>> Daniel Kahn Gillmor writes on June 21, 2017 23:30:
>>> On Wed 2017-06-21 13:04:53 -0700, Matt Armstrong wrote:
>>>> For what it is worth, I've found this idea from Daniel intriguing and
>>>> pretty useful in practice:
>>>>
>>>>   "show me threads in which i've participated, where there are some
>>>>    messages flagged with 'inbox'"
>>>>
>>>> I implement it like this in my post-new hook:
>>>>
>>>>     # All messages in threads in which I participate get tag:participated
>>>>     notmuch search --output=threads from:marmstrong | \
>>>>       sed -e 's,^,+participated -- ,' | \
>>>>       notmuch tag --batch
>>> 
>>> cool, thx for the suggestion.
>>> 
>>> the "notmuch search" part of the pipeline alone takes ~19s (wall time,
>>> and actual CPU time) for me though :/  It returns 30504 threads!  how
>>> many threads do you get?
>> 
>> Is there any reason why you do not filter on a tag 'new' as well?
>> 
>>      notmuch search --output=threads from:marmstrong and tag:new | \
>>        sed -e 's,^,+participated -- ,' | \
>>        notmuch tag --batch
>> 
>
> Nevermind, I get it - it might be possible to add a temporary tag 
> new-tag to the whole thread first and not just new messages. That might 
> be faster. As long as all sent messages get the new tag as well.

Gaute, I took this as a challenge and came up with what I think is an
equivalent but more efficient approach.  The disadvantage is that it is
much more complex.  The advantage is that it runs in under 0.2 seconds
to process a day's worth of my "new" mail.

I now have this in my notmuch post-new hook.  I believe I could change the
"tag:new OR date:today" query to just "tag:new".  The "OR date:today"
helped during interactive development.

# All threads in which I participate get tag:participated
#  1) Find all threads with a message tagged new
#     (finding all 'today' messages helps during testing,
#     but isn't necessary)
#  2) Run through "xargs -s 2048 echo" to group threads into
#     lines of about 2K in size.
#  3) For each line (2) produces, narrow the threads to
#     those containing a message from me.
#  4) For each such thread, tag every message with +participated.
notmuch search --output=threads tag:new OR date:today | \
  xargs -s 2048 echo | \
  xargs -I '{}' notmuch search \
  --output=threads from:marmstrong AND \( '{}' \) | \
  sed -e 's,^,+participated -- ,' | \
  notmuch tag --batch


The basic idea is that each run of the notmuch post-new hook will
incorporate relatively little mail, so the number of unique threads will
be relatively small.  So, we just list them all by thread ID.

Then, among the threads with new messages, we figure out which ones have
a message from:marmstrong (it need not be the new message).

We then tag all messages in each of those threads with +participated.

You said "it might be possible to add a temporary tag new-tag to the
whole thread first and not just new messages." -- Yes, and that is
implicitly what I am doing, except that each such thread is instead
tracked in an ephemeral way through the xargs based shell pipeline.

I did try an approach of explicitly labeling all messages in "new"
threads, temporarily, but that was slower.

You said "As long as all sent messages get the new tag as well." --
true, and I'm not sure about that.  My primary use for this is to
discover new activity from others *after* I've participated in a thread,
so I don't much care if a thread that is "participated in" is not tagged
that way until some mail from somebody else arrives.
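
To show what steps 2 and 3 do without touching a database, here is a
small sketch that performs the same grouping and prints the query each
chunk would produce instead of running it. The function names and the
address argument are made up for illustration, and the printed query
assumes, as the pipeline above does, that notmuch treats a run of
thread: terms as alternatives:

```shell
# Step 2: group thread IDs from stdin into space-separated chunks,
# one chunk per line, each line staying under 2048 bytes.
chunk_threads() {
    xargs -s 2048 echo
}

# Step 3, dry run: print the narrowing query for each chunk.
# $1: the sender to look for (a placeholder address).
chunk_queries() {
    chunk_threads | while IFS= read -r chunk; do
        [ -n "$chunk" ] || continue   # some xargs print a blank line on empty input
        printf 'from:%s AND ( %s )\n' "$1" "$chunk"
    done
}
```

The 2048 here is only the per-invocation size limit passed to xargs;
any value comfortably below the system's command-line limit should
serve the same purpose.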

* finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave]
  2017-06-23  0:00                   ` Matt Armstrong
@ 2017-06-25 15:46                     ` Daniel Kahn Gillmor
  2017-06-25 17:14                       ` David Bremner
                                         ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Daniel Kahn Gillmor @ 2017-06-25 15:46 UTC (permalink / raw)
  To: Matt Armstrong, Gaute Hope, David Bremner, notmuch, Xu Wang

Hey all--

I really appreciate the thought and experimentation and research that's
gone into this thread!

On Thu 2017-06-22 17:00:58 -0700, Matt Armstrong wrote:
> # All threads in which I participate get tag:participated
> #  1) Find all threads with a message tagged new
> #     (finding all 'today' messages helps during testing,
> #     but isn't necessary)
> #  2) Run through "xargs -s 2048 echo" to group threads into
> #     lines of about 2K in size.
> #  3) For each line (2) produces, narrow the threads to
> #     those containing a message from me.
> #  4) For each such thread, tag every message with +participated.
> notmuch search --output=threads tag:new OR date:today | \
>   xargs -s 2048 echo | \
>   xargs -I '{}' notmuch search \
>   --output=threads from:marmstrong AND \( '{}' \) | \
>   sed -e 's,^,+participated -- ,' | \
>   notmuch tag --batch

This makes sense to me, modulo the split into 2048-octet lines (magic
numbers make me nervous, though i think i understand why you've included
it).

That said, i've been trying to think lately about how to make notmuch a
tool that's usable by normal humans, who probably won't want to
understand all the moving pieces here.  I don't want yet another MUA
that requires you to edit a turing-complete config file to get useful
functionality -- we already have mutt for that :)

Is there a way that we can push this idea/functionality further into
the core of notmuch in a way that makes it easier to use?

For example, would it make sense to have "notmuch new" (and "notmuch
insert") do "thread-based propagation" of specific tags?  for example,
consider the following (i've just made up the config options):

    notmuch config set new.from_self_tags participated
    notmuch config set new.propagate_thread_tags participated

the idea is that "new.from_self_tags" would be applied by "notmuch new" or
"notmuch insert" if the message was explicitly from: user.primary_email
or user.other_email.

and additionally, if a message was inserted into a thread which has any
of the new.propagate_thread_tags applied, the new message would also
get those tags.
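
Until something like that exists in core, the behaviour can be
approximated from a post-new hook. The following is only a sketch under
stated assumptions: the function name is hypothetical, the address is
passed in as an argument, and it assumes new messages still carry
tag:new when the hook runs. It spawns one "notmuch count" per thread
with new mail, so it favours clarity over speed:

```shell
# Hypothetical post-new hook step emulating the two made-up options.
# $1: the user's own address (a placeholder; substitute your own).
propagate_participated() {
    # Like new.from_self_tags: tag new messages sent by the user.
    notmuch tag +participated -- "tag:new and from:$1"

    # Like new.propagate_thread_tags: for each thread holding a new
    # message, copy the tag to the new messages if any message in the
    # thread already carries it.
    notmuch search --output=threads tag:new |
    while IFS= read -r thread; do
        if [ "$(notmuch count "$thread and tag:participated")" -gt 0 ]; then
            printf '+participated -- %s and tag:new\n' "$thread"
        fi
    done |
    notmuch tag --batch
}
```

Because only messages tagged new are touched, the work per run scales
with the amount of new mail rather than with the whole archive.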

What do y'all think?

    --dkg

* Re: finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave]
  2017-06-25 15:46                     ` finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave] Daniel Kahn Gillmor
@ 2017-06-25 17:14                       ` David Bremner
  2017-06-26 20:49                         ` Matt Armstrong
  2017-08-20 13:48                         ` Jani Nikula
  2017-06-25 17:40                       ` finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave] Brian Sniffen
  2017-06-26 20:54                       ` Matt Armstrong
  2 siblings, 2 replies; 27+ messages in thread
From: David Bremner @ 2017-06-25 17:14 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Matt Armstrong, Gaute Hope, notmuch, Xu Wang

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

>
> For example, would it make sense to have "notmuch new" (and "notmuch
> insert") do "thread-based propagation" of specific tags?  for example,
> consider the following (i've just made up the config options):
>
>     notmuch config set new.from_self_tags participated
>     notmuch config set new.propagate_thread_tags participated
>
> the idea is that "new.from_self_tags" would be applied by "notmuch new" or
> "notmuch insert" if the message was explicitly from: user.primary_email
> or user.other_email.

At the moment I'm more inclined to work on "doing things right" by
adding xapian documents (database items) for threads. Many of the ideas
in this thread amount to working around their absence. OTOH, it's
certainly true that this last idea (unlike some of the query ideas)
would be relatively straightforward to implement.

d

* Re: finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave]
  2017-06-25 15:46                     ` finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave] Daniel Kahn Gillmor
  2017-06-25 17:14                       ` David Bremner
@ 2017-06-25 17:40                       ` Brian Sniffen
  2017-06-26 20:54                       ` Matt Armstrong
  2 siblings, 0 replies; 27+ messages in thread
From: Brian Sniffen @ 2017-06-25 17:40 UTC (permalink / raw)
  To: Daniel Kahn Gillmor
  Cc: Matt Armstrong, Gaute Hope, David Bremner, notmuch, Xu Wang

Well, it's not quite *away* from Turing complete configuration... but it sounds like you might like the program that computes tags for new messages to get not only the message but also its thread id and read-only access to the database?  Then both filtering "participated" and computing "participated" from "sent" become easy. And fancier ideas like computing tags from senders, list-id, the rest. 

-- 
Brian Sniffen

> On Jun 25, 2017, at 11:46 AM, Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:
> 
> Hey all--
> 
> I really appreciate the thought and experimentation and research that's
> gone into this thread!
> 
>> On Thu 2017-06-22 17:00:58 -0700, Matt Armstrong wrote:
>> # All threads in which I participate get tag:participated
>> #  1) Find all threads with a message tagged new
>> #     (finding all 'today' messages helps during testing,
>> #     but isn't necessary)
>> #  2) Run through "xargs -s 2048 echo" to group threads into
>> #     lines of about 2K in size.
>> #  3) For each line (2) produces, narrow the threads to
>> #     those containing a message from me.
>> #  4) For each such thread, tag every message with +participated.
>> notmuch search --output=threads tag:new OR date:today | \
>>  xargs -s 2048 echo | \
>>  xargs -I '{}' notmuch search \
>>  --output=threads from:marmstrong AND \( '{}' \) | \
>>  sed -e 's,^,+participated -- ,' | \
>>  notmuch tag --batch
> 
> This makes sense to me, modulo the split into 2048-octet lines (magic
> numbers make me nervous, though i think i understand why you've included
> it).
> 
> That said, i've been trying to think lately about how to make notmuch a
> tool that's usable by normal humans, who probably won't want to
> understand all the moving pieces here.  I don't want yet another MUA
> that requires you to edit a turing-complete config file to get useful
> functionality -- we already have mutt for that :)
> 
> Is there a way that we can push this idea/functionality further into
> the core of notmuch in a way that makes it easier to use?
> 
> For example, would it make sense to have "notmuch new" (and "notmuch
> insert") do "thread-based propagation" of specific tags?  for example,
> consider the following (i've just made up the config options):
> 
>    notmuch config set new.from_self_tags participated
>    notmuch config set new.propagate_thread_tags participated
> 
> the idea is that "new.from_self_tags" would be applied by "notmuch new" or
> "notmuch insert" if the message was explicitly from: user.primary_email
> or user.other_email.
> 
> and additionally, if a message was inserted into a thread which has any
> of the new.propagated_thread_tags applied, the new message would also
> get those tags.
> 
> What do y'all think?
> 
>    --dkg

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave]
  2017-06-25 17:14                       ` David Bremner
@ 2017-06-26 20:49                         ` Matt Armstrong
  2017-06-26 23:09                           ` David Bremner
  2017-08-20 13:35                           ` Jani Nikula
  2017-08-20 13:48                         ` Jani Nikula
  1 sibling, 2 replies; 27+ messages in thread
From: Matt Armstrong @ 2017-06-26 20:49 UTC (permalink / raw)
  To: David Bremner, Daniel Kahn Gillmor, Gaute Hope, notmuch, Xu Wang

David Bremner <david@tethera.net> writes:

> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>
>>
>> For example, would it make sense to have "notmuch new" (and "notmuch
>> insert") do "thread-based propagation" of specific tags?  for example,
>> consider the following (i've just made up the config options):
>>
>>     notmuch config set new.from_self_tags participated
>>     notmuch config set new.propagate_thread_tags participated
>>
>> the idea is that "new.from_self_tags" would be applied by "notmuch new" or
>> "notmuch insert" if the message was explicitly from: user.primary_email
>> or user.other_email.
>
> At the moment I'm more inclined to work on "doing things right" by
> adding xapian documents (database items) for threads. Many of the
> ideas in this thread amount to working around their absence. OTOH,
> it's certainly true that this last idea (unlike some of the query
> ideas) would be relatively straightforward to implement.

David, that makes a lot of sense.

Take two other examples from my post-hook:

    # Unmute all threads with new messages sent to me.
    notmuch search --output=threads tag:new AND tag:me | \
      xargs --no-run-if-empty notmuch tag -muted --

    # Remove all muted threads from the inbox and mark
    # every message in them muted.
    notmuch search --output=threads tag:muted | \
      xargs --no-run-if-empty notmuch tag -inbox +muted --

Above I'm just operating on all the messages in a thread as a unit.  If
notmuch had a 'database item' for each thread, I could potentially tag
just that to achieve similar results with simpler commands.

It has some subtle implications.  E.g. when I tag through a particular
UI do I mean to tag a particular message or the thread?  Is it worth
making the user think about the difference?  Is there some way to
express this such that they never do?  Are some tags configured to
always apply to the thread?  Are these the only tags that do?  Or do
"thread tags" always contain the union of all associated message tags?

Fun stuff.  :)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave]
  2017-06-25 15:46                     ` finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave] Daniel Kahn Gillmor
  2017-06-25 17:14                       ` David Bremner
  2017-06-25 17:40                       ` finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave] Brian Sniffen
@ 2017-06-26 20:54                       ` Matt Armstrong
  2 siblings, 0 replies; 27+ messages in thread
From: Matt Armstrong @ 2017-06-26 20:54 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Gaute Hope, David Bremner, notmuch, Xu Wang

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

> Hey all--
>
> I really appreciate the thought and experimentation and research that's
> gone into this thread!
>
> On Thu 2017-06-22 17:00:58 -0700, Matt Armstrong wrote:
>> # All threads in which I participate get tag:participated
>> #  1) Find all threads with a message tagged new
>> #     (finding all 'today' messages helps during testing,
>> #     but isn't necessary)
>> #  2) Run through "xargs -s 2048 echo" to group threads into
>> #     lines of about 2K in size.
>> #  3) For each line (2) produces, narrow the threads to
>> #     those containing a message from me.
>> #  4) For each such thread, tag every message with +participated.
>> notmuch search --output=threads tag:new OR date:today | \
>>   xargs -s 2048 echo | \
>>   xargs -I '{}' notmuch search \
>>   --output=threads from:marmstrong AND \( '{}' \) | \
>>   sed -e 's,^,+participated -- ,' | \
>>   notmuch tag --batch
>
> This makes sense to me, modulo the split into 2048-octet lines (magic
> numbers make me nervous, though i think i understand why you've included
> it).

Yes, the two xargs commands and the "2048 business" are just a hack to
work around the documented limitations of "xargs -I".  I'd love to come up
with a simpler way to do this.  I suspect there is one, but when it
comes to this kind of Unix shell hackery, I usually stop once I get to
something that works.  :)
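
One possibly simpler route, offered as a minimal sketch rather than a
tested replacement: instead of feeding thread IDs back into a second
query through xargs, list the threads that contain a message from me
once, then filter the "new" thread list against that list with grep.
The helper name keep_threads_in is hypothetical, and the hook body in
the comments assumes the same from:marmstrong and tag:new queries used
in the script quoted above.

```shell
#!/bin/sh
# Sketch only: filter one list of thread IDs against another, so that no
# thread IDs have to be pasted back into a notmuch query.

# keep_threads_in FILE
# Copy to stdout only those stdin lines that appear, as a whole line,
# in FILE.  -F treats the patterns as fixed strings, -x requires a
# whole-line match, so thread:0002 never matches thread:00021.
keep_threads_in() {
    # grep exits non-zero when nothing matches; that is not an error here.
    grep -F -x -f "$1" || true
}

# Hypothetical hook body (needs a real notmuch database):
#   mine=$(mktemp)
#   notmuch search --output=threads from:marmstrong > "$mine"
#   notmuch search --output=threads tag:new |
#       keep_threads_in "$mine" |
#       sed -e 's,^,+participated -- ,' |
#       notmuch tag --batch
#   rm -f "$mine"
```

The trade-off is that the first search lists every thread I have ever
written to, not only the recent ones, so this swaps the 2048-octet
chunking for one larger but simple listing.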

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave]
  2017-06-26 20:49                         ` Matt Armstrong
@ 2017-06-26 23:09                           ` David Bremner
  2017-08-20 13:35                           ` Jani Nikula
  1 sibling, 0 replies; 27+ messages in thread
From: David Bremner @ 2017-06-26 23:09 UTC (permalink / raw)
  To: Matt Armstrong, Daniel Kahn Gillmor, Gaute Hope, notmuch, Xu Wang

Matt Armstrong <marmstrong@google.com> writes:

> David Bremner <david@tethera.net> writes:
>
> It has some subtle implications.  E.g. when I tag through a particular
> UI do I mean to tag a particular message or the thread?  Is it worth
> making the user think about the difference?  Is there some way to
> express this such that they never do?  Are some tags configured to
> always apply to the thread?  Are these the only tags that do?  Or do
> "thread tags" always contain the union of all associated message tags?

My initial thinking is that thread documents would just have the union
of the terms (tags and otherwise) of all messages in the thread. So
they'd be visible only when querying, and not need to be dumped and
restored.

Before I get too worried about the UI (i.e. how does one specify that a
query is meant to be applied threadwise), I want to have some idea about
the time and space costs of maintaining those extra documents in the
database.

d
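
To make the "union of terms" idea concrete, here is a toy model that
runs entirely outside notmuch.  It assumes a plain-text input of
"THREAD-ID TAG" pairs, one per tagged message; the input format and
the function name thread_tag_union are inventions for illustration,
not anything the library provides.

```shell
#!/bin/sh
# Toy model only: compute, for each thread, the union of the tags carried
# by its messages.  Input on stdin: one "THREAD-ID TAG" pair per line.
# Output: one line per thread, "THREAD-ID tag tag ...", tags sorted.
thread_tag_union() {
    # sort -u groups the pairs by thread and drops duplicate pairs.
    sort -u | awk '
        $1 != cur {
            if (NR > 1) print line   # flush the previous thread
            cur = $1
            line = $1
        }
        { line = line " " $2 }
        END { if (NR > 0) print line }
    '
}
```

With the pairs "t1 inbox", "t1 muted" and "t2 inbox", the thread t1
ends up carrying both inbox and muted even if no single message has
both tags, which is what would let a query be answered threadwise.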

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave]
  2017-06-26 20:49                         ` Matt Armstrong
  2017-06-26 23:09                           ` David Bremner
@ 2017-08-20 13:35                           ` Jani Nikula
  1 sibling, 0 replies; 27+ messages in thread
From: Jani Nikula @ 2017-08-20 13:35 UTC (permalink / raw)
  To: Matt Armstrong, David Bremner, Daniel Kahn Gillmor, Gaute Hope,
	notmuch, Xu Wang

On Mon, 26 Jun 2017, Matt Armstrong <marmstrong@google.com> wrote:
> David Bremner <david@tethera.net> writes:
>
>> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>>
>>>
>>> For example, would it make sense to have "notmuch new" (and "notmuch
>>> insert") do "thread-based propagation" of specific tags?  for example,
>>> consider the following (i've just made up the config options):
>>>
>>>     notmuch config set new.from_self_tags participated
>>>     notmuch config set new.propagate_thread_tags participated
>>>
>>> the idea is that "new.from_self_tags" would be applied by "notmuch new" or
>>> "notmuch insert" if the message was explicitly from: user.primary_email
>>> or user.other_email.
>>
>> At the moment I'm more inclined to work on "doing things right" by
>> adding xapian documents (database items) for threads. Many of the
>> ideas in this thread amount to working around their absence. OTOH,
>> it's certainly true that this last idea (unlike some of the query
>> ideas) would be relatively straightforward to implement.
>
> David, that makes a lot of sense.
>
> Take two other examples from my post-hook:
>
>     # Unmute all threads with new messages sent to me.
>     notmuch search --output=threads tag:new AND tag:me | \
>       xargs --no-run-if-empty notmuch tag -muted --
>
>     # Remove all muted threads from the inbox and mark
>     # every message in them muted.
>     notmuch search --output=threads tag:muted | \
>       xargs --no-run-if-empty notmuch tag -inbox +muted --
>
> Above I'm just operating on all the messages in a thread as a unit.  If
> notmuch had a 'database item' for each thread, I could potentially tag
> just that to achieve similar results with simpler commands.

FWIW, this is my approach to "thread tags", from my post-new hook:

# Propagate thread tags
THREAD_TAGS="muted"
for tag in $THREAD_TAGS; do
    notmuch tag +$tag $(notmuch search --output=threads tag:$tag)
done

Basically if I tag a message muted, all messages in the thread will get
the tag on next post-new.

This may or may not be related to the problems people in this thread are
trying to solve. ;)

BR,
Jani.
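
One caveat with the loop above: the thread list is passed as
command-line arguments, so when no message carries the tag the command
substitution expands to nothing and "notmuch tag" is called without a
query.  A minimal sketch of a variant that goes through "notmuch tag
--batch" instead, as in Matt's script earlier in the thread.  The
function name is hypothetical, and the sketch assumes tag names are
plain words that need no special encoding in the batch format.

```shell
#!/bin/sh
# Sketch only: propagate each given tag to every message of any thread
# that already contains a message with that tag.

# propagate_thread_tags TAG...
propagate_thread_tags() {
    for tag in "$@"; do
        # Each output line becomes "+TAG -- thread:ID" for the batch
        # reader; an empty search result yields an empty batch.
        notmuch search --output=threads "tag:$tag" |
            sed -e "s,^,+$tag -- ," |
            notmuch tag --batch
    done
}

# Hypothetical use in a post-new hook:
#   propagate_thread_tags muted
```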

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave]
  2017-06-25 17:14                       ` David Bremner
  2017-06-26 20:49                         ` Matt Armstrong
@ 2017-08-20 13:48                         ` Jani Nikula
  2017-08-20 21:32                           ` [PATCH] WIP: add thread subqueries David Bremner
  1 sibling, 1 reply; 27+ messages in thread
From: Jani Nikula @ 2017-08-20 13:48 UTC (permalink / raw)
  To: David Bremner, Daniel Kahn Gillmor, Matt Armstrong, Gaute Hope,
	notmuch, Xu Wang

On Sun, 25 Jun 2017, David Bremner <david@tethera.net> wrote:
> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>
>>
>> For example, would it make sense to have "notmuch new" (and "notmuch
>> insert") do "thread-based propagation" of specific tags?  for example,
>> consider the following (i've just made up the config options):
>>
>>     notmuch config set new.from_self_tags participated
>>     notmuch config set new.propagate_thread_tags participated
>>
>> the idea is that "new.from_self_tags" would be applied by "notmuch new" or
>> "notmuch insert" if the message was explicitly from: user.primary_email
>> or user.other_email.
>
> At the moment I'm more inclined to work on "doing things right" by
> adding xapian documents (database items) for threads. Many of the ideas
> in this thread amount to working around their absence. OTOH, it's
> certainly true that this last idea (unlike some of the query ideas)
> would be relatively straightforward to implement.

I admit I didn't thoroughly read the entire thread... but I think
there's an alternative to adding thread documents, with reasonable
design, to address the original problem in this thread.

Have a new prefix that evaluates to the threads of the sub-query passed
as the prefix value, say thread-of:<query> that you could use as part of
the query. (Or, make thread: work on queries as well as thread IDs.)

The original query in this thread becomes:

thread-of:"from:me" and thread-of:"from:jian" and not thread-of:"from:dave"

As a side effect, this also makes thread: queries portable across
databases via thread-of:"id:<message-id>".

I think this should be doable in current Xapian that supports custom
prefix handlers.

IIRC the idea was suggested by Austin ages ago.


BR,
Jani.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] WIP: add thread subqueries.
  2017-08-20 13:48                         ` Jani Nikula
@ 2017-08-20 21:32                           ` David Bremner
  2017-08-21  1:35                             ` David Bremner
  0 siblings, 1 reply; 27+ messages in thread
From: David Bremner @ 2017-08-20 21:32 UTC (permalink / raw)
  To: Jani Nikula, David Bremner, Daniel Kahn Gillmor, Matt Armstrong,
	Gaute Hope, notmuch, Xu Wang

This is a quick implementation of Jani's suggestion. Performance is
not great in some obvious cases, but maybe that doesn't matter.
---

I just wrote this while waiting for dinner to cook, so it might be
full of bugs and/or strange effects of hunger.

If you're interested in this feature, please test, let me know if it's
workable for your use case.  The syntax here is thread:{$query}. The
usual caveats about whitespace apply. The running example is

	     thread:{from:me} and thread:{from:jian} and not thread:{from:dave}

Since each of those typically expands to a large number of threads
(and is processed independently), this can take a bit of time to process.
If we like the idea, maybe some caching can help.


 lib/Makefile.local |  3 ++-
 lib/database.cc    |  6 ++++-
 lib/thread-fp.cc   | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/thread-fp.h    | 42 +++++++++++++++++++++++++++++++++
 4 files changed, 117 insertions(+), 2 deletions(-)
 create mode 100644 lib/thread-fp.cc
 create mode 100644 lib/thread-fp.h

diff --git a/lib/Makefile.local b/lib/Makefile.local
index 0b5c4b08..1c1b96f1 100644
--- a/lib/Makefile.local
+++ b/lib/Makefile.local
@@ -57,7 +57,8 @@ libnotmuch_cxx_srcs =		\
 	$(dir)/query-fp.cc      \
 	$(dir)/config.cc	\
 	$(dir)/regexp-fields.cc	\
-	$(dir)/thread.cc
+	$(dir)/thread.cc \
+	$(dir)/thread-fp.cc
 
 libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
 
diff --git a/lib/database.cc b/lib/database.cc
index 8f0e22a8..be63fdbb 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -21,6 +21,7 @@
 #include "database-private.h"
 #include "parse-time-vrp.h"
 #include "query-fp.h"
+#include "thread-fp.h"
 #include "regexp-fields.h"
 #include "string-util.h"
 
@@ -258,7 +259,8 @@ prefix_t prefix_table[] = {
     { "directory",		"XDIRECTORY",	NOTMUCH_FIELD_NO_FLAGS },
     { "file-direntry",		"XFDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
     { "directory-direntry",	"XDDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
-    { "thread",			"G",		NOTMUCH_FIELD_EXTERNAL },
+    { "thread",			"G",		NOTMUCH_FIELD_EXTERNAL |
+						NOTMUCH_FIELD_PROCESSOR },
     { "tag",			"K",		NOTMUCH_FIELD_EXTERNAL |
 						NOTMUCH_FIELD_PROCESSOR },
     { "is",			"K",		NOTMUCH_FIELD_EXTERNAL |
@@ -317,6 +319,8 @@ _setup_query_field (const prefix_t *prefix, notmuch_database_t *notmuch)
 	    fp = (new DateFieldProcessor())->release ();
 	else if (STRNCMP_LITERAL(prefix->name, "query") == 0)
 	    fp = (new QueryFieldProcessor (*notmuch->query_parser, notmuch))->release ();
+	else if (STRNCMP_LITERAL(prefix->name, "thread") == 0)
+	    fp = (new ThreadFieldProcessor (*notmuch->query_parser, notmuch))->release ();
 	else
 	    fp = (new RegexpFieldProcessor (prefix->name, prefix->flags,
 					    *notmuch->query_parser, notmuch))->release ();
diff --git a/lib/thread-fp.cc b/lib/thread-fp.cc
new file mode 100644
index 00000000..7cfe0721
--- /dev/null
+++ b/lib/thread-fp.cc
@@ -0,0 +1,68 @@
+/* thread-fp.cc - "thread:" field processor glue
+ *
+ * This file is part of notmuch.
+ *
+ * Copyright © 2016 David Bremner
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see https://www.gnu.org/licenses/ .
+ *
+ * Author: David Bremner <david@tethera.net>
+ */
+
+#include "database-private.h"
+#include "thread-fp.h"
+#include <iostream>
+
+#if HAVE_XAPIAN_FIELD_PROCESSOR
+
+Xapian::Query
+ThreadFieldProcessor::operator() (const std::string & str)
+{
+    notmuch_status_t status;
+
+    if (str.at (0) == '{') {
+	if (str.length() > 1 && str.at (str.size () - 1) == '}'){
+	    std::string subquery_str = str.substr(1,str.size () - 2);
+	    notmuch_query_t *subquery = notmuch_query_create (notmuch, subquery_str.c_str());
+	    notmuch_threads_t *threads;
+	    notmuch_thread_t *thread;
+	    std::vector<std::string> terms;
+
+	    if (!subquery)
+		throw Xapian::QueryParserError ("failed to create subquery for '" + subquery_str + "'");
+
+	    status = notmuch_query_search_threads (subquery, &threads);
+	    if (status)
+		throw Xapian::QueryParserError ("failed to search threads for '" + subquery_str + "'");
+
+
+	    for (; notmuch_threads_valid (threads);  notmuch_threads_move_to_next (threads)) {
+		std::string term = "G";
+
+		thread = notmuch_threads_get (threads);
+		term += notmuch_thread_get_thread_id (thread);
+		terms.push_back (term);
+	    }
+	    return Xapian::Query (Xapian::Query::OP_OR, terms.begin(), terms.end());
+	} else {
+	    throw Xapian::QueryParserError ("missing } in '" + str + "'");
+	}
+    } else {
+	/* literal thread id */
+	std::string term = "G"+str;
+	return Xapian::Query (term);
+    }
+
+}
+#endif
diff --git a/lib/thread-fp.h b/lib/thread-fp.h
new file mode 100644
index 00000000..13725978
--- /dev/null
+++ b/lib/thread-fp.h
@@ -0,0 +1,42 @@
+/* thread-fp.h - thread field processor glue
+ *
+ * This file is part of notmuch.
+ *
+ * Copyright © 2017 David Bremner
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see https://www.gnu.org/licenses/ .
+ *
+ * Author: David Bremner <david@tethera.net>
+ */
+
+#ifndef NOTMUCH_THREAD_FP_H
+#define NOTMUCH_THREAD_FP_H
+
+#include <xapian.h>
+#include "notmuch.h"
+
+#if HAVE_XAPIAN_FIELD_PROCESSOR
+class ThreadFieldProcessor : public Xapian::FieldProcessor {
+ protected:
+    Xapian::QueryParser &parser;
+    notmuch_database_t *notmuch;
+
+ public:
+    ThreadFieldProcessor (Xapian::QueryParser &parser_, notmuch_database_t *notmuch_)
+	: parser(parser_), notmuch(notmuch_) { };
+
+    Xapian::Query operator()(const std::string & str);
+};
+#endif
+#endif /* NOTMUCH_THREAD_FP_H */
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] WIP: add thread subqueries.
  2017-08-20 21:32                           ` [PATCH] WIP: add thread subqueries David Bremner
@ 2017-08-21  1:35                             ` David Bremner
  2017-09-07 17:47                               ` Gaute Hope
  0 siblings, 1 reply; 27+ messages in thread
From: David Bremner @ 2017-08-21  1:35 UTC (permalink / raw)
  To: Jani Nikula, notmuch

David Bremner <david@tethera.net> writes:


> If you're interested in this feature, please test, let me know if it's
> workable for your use case.  The syntax here is thread:{$query}. The
> usual caveats about whitespace apply. The running example is
>
> 	     thread:{from:me} and thread:{from:jian} and not thread:{from:dave}
>

I was playing around with this, and

  'thread:"{from:bremner and date:2017}"' and thread:{from:sitaram}

is about 10x faster than without the date restriction. This is roughly
proportional to the number of threads matching the first subquery.
Unfortunately the date restriction needs to be inside the subquery with
the current design.

d
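
The quoting in that example is easy to get wrong from a script, since
the subquery contains spaces and must reach notmuch as a single
thread:"{...}" term.  A minimal sketch of a helper that builds the
term; the name thread_sub is hypothetical, and the sketch assumes the
subquery itself contains no double quotes.

```shell
#!/bin/sh
# Sketch only: wrap a notmuch query in the thread:"{...}" subquery syntax
# from the patch, so it can be embedded in a larger query string.

# thread_sub QUERY
thread_sub() {
    printf 'thread:"{%s}"' "$1"
}

# Hypothetical use (needs a notmuch built with the patch applied):
#   notmuch search \
#       "$(thread_sub 'from:bremner and date:2017') and $(thread_sub from:sitaram)"
```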

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] WIP: add thread subqueries.
  2017-08-21  1:35                             ` David Bremner
@ 2017-09-07 17:47                               ` Gaute Hope
  2017-09-07 18:51                                 ` David Bremner
  0 siblings, 1 reply; 27+ messages in thread
From: Gaute Hope @ 2017-09-07 17:47 UTC (permalink / raw)
  To: David Bremner, Jani Nikula, notmuch

David Bremner writes on August 21, 2017 3:35:
> David Bremner <david@tethera.net> writes:
> 
> 
>> If you're interested in this feature, please test, let me know if it's
>> workable for your use case.  The syntax here is thread:{$query}. The
>> usual caveats about whitespace apply. The running example is
>>
>> 	     thread:{from:me} and thread:{from:jian} and not thread:{from:dave}
>>
> 
> I was playing around with this, and
> 
>   'thread:"{from:bremner and date:2017}"' and thread:{from:sitaram}
> 
> is about 10x faster than without the date restriction. This is roughly
> proportional to the number of threads matching the first subquery.
> Unfortunately the date restriction needs to be inside the subquery with
> the current design.

So just to clarify; to make true full thread searches there cannot be more than one term in each thread:{} query? Otherwise, all terms in each thread: query must be found in at least one message at the same time. So you could put the date: restriction in a separate thread:{}, but it would be slower?

-gaute


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] WIP: add thread subqueries.
  2017-09-07 17:47                               ` Gaute Hope
@ 2017-09-07 18:51                                 ` David Bremner
  0 siblings, 0 replies; 27+ messages in thread
From: David Bremner @ 2017-09-07 18:51 UTC (permalink / raw)
  To: Gaute Hope, Jani Nikula, notmuch

Gaute Hope <eg@gaute.vetsj.com> writes:

> David Bremner writes on August 21, 2017 3:35:
>> David Bremner <david@tethera.net> writes:
>> 
>> 
>>> If you're interested in this feature, please test, let me know if it's
>>> workable for your use case.  The syntax here is thread:{$query}. The
>>> usual caveats about whitespace apply. The running example is
>>>
>>> 	     thread:{from:me} and thread:{from:jian} and not thread:{from:dave}
>>>
>> 
>> I was playing around with this, and
>> 
>>   'thread:"{from:bremner and date:2017}"' and thread:{from:sitaram}
>> 
>> is about 10x faster than without the date restriction. This is roughly
>> proportional to the number of threads matching the first subquery.
>> Unfortunately the date restriction needs to be inside the subquery with
>> the current design.
>
> So just to clarify; to make true full thread searches there cannot be
> more than one term in each thread:{} query? Otherwise, all terms in
> each thread: query must be found in at least one message at the same
> time. So you could put the date: restriction in a separate thread:{},
> but it would be slower?

I think your interpretation is correct. I would further emphasise the
following:

  - contrary to the usual intuition with notmuch/xapian, every extra
    thread:{} term will slow things down

  - having the date restriction in a separate thread:{} term would ask
    for threads with at least one message in that date range

  - there might be many such threads; e.g. I have 34178 threads from
    2016

d
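
The difference between one combined subquery and two separate ones can
be checked on a toy model that does not involve notmuch at all.  The
sketch below assumes a text listing of messages, one "THREAD-ID SENDER
YEAR" triple per line; the listing format and both function names are
inventions for illustration.

```shell
#!/bin/sh
# Toy model only.  Input on stdin: one message per line,
# "THREAD-ID SENDER YEAR".

# one_subquery SENDER YEAR
# Models thread:{from:SENDER and date:YEAR}: threads in which a single
# message satisfies both conditions.
one_subquery() {
    awk -v s="$1" -v y="$2" '$2 == s && $3 == y { print $1 }' | sort -u
}

# two_subqueries SENDER YEAR
# Models thread:{from:SENDER} and thread:{date:YEAR}: threads with some
# message from SENDER and some message, possibly a different one,
# dated YEAR.
two_subqueries() {
    awk -v s="$1" -v y="$2" '
        $2 == s { from[$1] = 1 }
        $3 == y { date[$1] = 1 }
        END { for (t in from) if (t in date) print t }
    ' | sort
}
```

Given the messages "t1 bremner 2017", "t2 bremner 2016", "t2 sitaram
2017" and "t3 sitaram 2017", the combined form selects only t1, while
the split form also selects t2, where the sender and the year match on
different messages.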

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2017-09-07 18:51 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-09 21:55 find threads where I and Jian participated but not Dave Xu Wang
2017-06-13 15:57 ` Xu Wang
2017-06-13 23:31   ` David Bremner
2017-06-14  0:24     ` Xu Wang
2017-06-14  3:32     ` Brian Sniffen
2017-06-15 17:54     ` Daniel Kahn Gillmor
2017-06-15 20:20       ` David Bremner
2017-06-16  1:07         ` Matt Armstrong
2017-06-16  6:28         ` Gaute Hope
2017-06-21 20:04           ` Matt Armstrong
2017-06-21 21:30             ` Daniel Kahn Gillmor
2017-06-22  6:08               ` Gaute Hope
2017-06-22  6:22                 ` Gaute Hope
2017-06-23  0:00                   ` Matt Armstrong
2017-06-25 15:46                     ` finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave] Daniel Kahn Gillmor
2017-06-25 17:14                       ` David Bremner
2017-06-26 20:49                         ` Matt Armstrong
2017-06-26 23:09                           ` David Bremner
2017-08-20 13:35                           ` Jani Nikula
2017-08-20 13:48                         ` Jani Nikula
2017-08-20 21:32                           ` [PATCH] WIP: add thread subqueries David Bremner
2017-08-21  1:35                             ` David Bremner
2017-09-07 17:47                               ` Gaute Hope
2017-09-07 18:51                                 ` David Bremner
2017-06-25 17:40                       ` finding incoming messages in threads in which i've participated [was: Re: find threads where I and Jian participated but not Dave] Brian Sniffen
2017-06-26 20:54                       ` Matt Armstrong
2017-06-22 20:15               ` find threads where I and Jian participated but not Dave Matt Armstrong
