unofficial mirror of notmuch@notmuchmail.org
* thread merge/split proposal
From: Daniel Kahn Gillmor @ 2016-04-04 17:14 UTC
  To: Notmuch Mail

Some people i communicate with regularly break threads when
replying. This is a major pain.  I'd like to be able to tell notmuch
(perhaps programmatically) how to connect these threads.

I know we've talked about being able to join threads, but no one has
made such a change in notmuch, afaict.  One of the major concerns people
have about joining threads is that the action seems irreversible.

If it were reversible (if it were easy to split a joined thread back
into its original threads), maybe it would be less scary to have a "join
thread" implementation?

i see two ways to do this:

 a) store an "original thread" attribute for each message that has been
    joined, and just reset it when an unjoin is requested

or

 b) when an unjoin is requested, do a graph analysis of every message in
    the thread's In-Reply-To and References headers, and recreate
    distinct threads from the connected components.


the problem with (a) is that once threads are joined, and a new message
is added to the joined thread, it's not clear which of the pre-join
threads that message should record as its "original thread".

So what do folks think about (b)?  If that was implemented, would it be
less-scary to have a "join thread" operation?
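
A minimal sketch of the connected-component analysis in option (b),
written in Python rather than against the notmuch library. The function
name `unjoin` and the input shape (a dict mapping each message-id to the
ids found in its In-Reply-To and References headers) are assumptions
made for illustration, not existing notmuch API.

```python
from collections import defaultdict


def unjoin(messages):
    """Split one joined thread back into header-derived threads.

    messages: dict mapping message-id -> iterable of message-ids taken
    from that message's In-Reply-To and References headers.
    Returns a list of sets of message-ids, one per connected component.
    """
    parent = {}

    def find(x):
        # union-find lookup with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for msgid, refs in messages.items():
        find(msgid)
        for ref in refs:
            # a referenced id links two messages even when the
            # referenced message itself is not in the store
            union(msgid, ref)

    groups = defaultdict(set)
    for msgid in messages:
        groups[find(msgid)].add(msgid)
    return sorted(groups.values(), key=sorted)


# a and b form one header-linked thread, c and d another
joined = {"a": [], "b": ["a"], "c": [], "d": ["c"]}
print(unjoin(joined))
```

Because only the headers are consulted, a message that arrives after
the join lands in whichever component its own headers point to, which
sidesteps the bookkeeping problem with (a).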

From the CLI, it would look something like:

  notmuch join-threads THREAD_A THREAD_B [ THREAD_C ... ]
  notmuch split-thread THREAD_X

What do people think about this approach?

     --dkg



* Re: thread merge/split proposal
From: Daniel Kahn Gillmor @ 2016-04-04 18:23 UTC
  To: Notmuch Mail

On Mon 2016-04-04 14:14:27 -0300, Daniel Kahn Gillmor wrote:
>   b) when an unjoin is requested, do a graph analysis of every message in
>      the thread's In-Reply-To and References headers, and recreate
>      distinct threads from the connected components.
 [...]
>  From the CLI, it would look something like:
>
>    notmuch join-threads THREAD_A THREAD_B [ THREAD_C ... ]
>    notmuch split-thread THREAD_X

On IRC, bremner pointed out two specific improvements to this proposal:

 0) the inverse operation of "join" proposed above is distinct from the
    ongoing discussion about splitting threads in arbitrary places.  I
    don't want to conflate these issues, so my proposed
    connected-component-analysis operation should be "notmuch
    unjoin-thread", and not "notmuch split-thread"

 1) a "join" operation probably has to be stored explicitly in the
    database, so that the threads will be re-joined across a
    dump/restore operation.

I'm happy with both of these improvements.

for (1) i'd propose that the join operation would be implemented by
adding a new term type "join", which can be applied to any document.
Its value is the message-id of a message that *should* be "in-reply-to"
but wasn't.

So for example: messages A and B are in one thread; messages C and D
arrive in a separate thread that should have been joined to the first
thread but was not.

i propose implementing this as something like:

    /* record on C the message it should have been a reply to */
    notmuch_message_add_term(message_c, "join", get_message_id(message_a));

    /* move C and D into A's thread */
    notmuch_message_set_thread_id(message_c, get_thread_id(message_a));
    notmuch_message_set_thread_id(message_d, get_thread_id(message_a));

i'd also add all the "join" terms to "notmuch dump", though i'm not sure
exactly how to extend the "notmuch dump" format.
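
To illustrate why storing the join explicitly helps across
dump/restore, here is a toy Python model of the idea. Everything in it
is hypothetical: the in-memory `db` dict, the `join`, `dump_joins` and
`restore_joins` helpers, and the (child, parent) pair format stand in
for whatever the real term storage and dump syntax would be.

```python
def join(db, child_id, parent_id):
    """Record that child_id should have replied to parent_id, then
    move every message in child's thread into parent's thread."""
    db[child_id]["join"].add(parent_id)
    old = db[child_id]["thread"]
    new = db[parent_id]["thread"]
    for msg in db.values():
        if msg["thread"] == old:
            msg["thread"] = new


def dump_joins(db):
    """Emit the stored join terms as sorted (child, parent) pairs."""
    return sorted(
        (msgid, target)
        for msgid, msg in db.items()
        for target in msg["join"]
    )


def restore_joins(db, pairs):
    """Replay dumped join terms onto a freshly indexed database."""
    for child_id, parent_id in pairs:
        join(db, child_id, parent_id)


def fresh_import():
    # what indexing from the headers alone would produce
    return {
        "a": {"thread": "T1", "join": set()},
        "b": {"thread": "T1", "join": set()},
        "c": {"thread": "T2", "join": set()},
        "d": {"thread": "T2", "join": set()},
    }


db = fresh_import()
join(db, "c", "a")
dumped = dump_joins(db)

rebuilt = fresh_import()
restore_joins(rebuilt, dumped)
print(dumped, rebuilt == db)
```

The point of the model is only that the join term on C carries enough
information to rebuild the merged thread; D needs no term of its own
because it already shares a thread with C.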

feedback welcome,

        --dkg


* Re: thread merge/split proposal
From: David Bremner @ 2016-04-10 13:16 UTC
  To: Daniel Kahn Gillmor, Notmuch Mail

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

> for (1) i'd propose that the join operation would be implemented by
> adding a new term type "join", which can be applied to any document.
> Its value is the message-id of a message that *should* be "in-reply-to"
> but wasn't.

Having "split" terms or equivalently "signed" +-reference terms would
allow more general thread splitting, effectively updating (via a little
journal of additions and deletions) the references data stored in the
mail file.

The implementation cost could not be that much higher than only
join/unjoin; a bit more work managing the terms attached to a document
to avoid contradictions.

Both versions probably complicate some people's syncing solutions.

d


* Re: thread merge/split proposal
From: Daniel Kahn Gillmor @ 2016-04-11 22:41 UTC
  To: David Bremner, Notmuch Mail

On Sun 2016-04-10 09:16:40 -0400, David Bremner wrote:
> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>
>> for (1) i'd propose that the join operation would be implemented by
>> adding a new term type "join", which can be applied to any document.
>> Its value is the message-id of a message that *should* be "in-reply-to"
>> but wasn't.
>
> Having "split" terms or equivalently "signed" +-reference terms would
> allow more general thread splitting, effectively updating (via a little
> journal of additions and deletions) the references data stored in the
> mail file.

I'm not sure what you mean by "signed" here (cryptographically signed?
a term named "signed"?  the idea that the term could be either positive
or negative?), but i think your proposal is that we could have a
"reference" term with a value of "+foo@example.com" or
"-foo@example.com", instead of having a "join" term with value
"foo@example.com" and a "split" term with value "foo@example.com"

I'm not sure i see much of a difference between

 a) introduce two new term types, "join" and "split", with unsigned
    values

and

 b) introduce one new term type, "reference" with signed values

> The implementation cost could not be that much higher than only
> join/unjoin; a bit more work managing the terms attached to a document
> to avoid contradictions.

right -- and we'd need an understanding of the order in which these
terms are applied if multiple possibly-conflicting terms are present.
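
One possible ordering rule, sketched in Python under the assumption
that the signed terms are kept as an ordered journal and that a later
entry overrides an earlier one. The function name
`effective_references` and the "+id" / "-id" string encoding are
illustrative only; nothing like this exists in notmuch today.

```python
def effective_references(header_refs, journal):
    """Return the references a message is treated as having.

    header_refs: message-ids from the mail file's own headers.
    journal: ordered strings such as "+foo@example.com" (add a
    reference) or "-foo@example.com" (drop one). Entries are applied
    in order, so the last entry for a given id wins.
    """
    refs = set(header_refs)
    for entry in journal:
        sign, msgid = entry[:1], entry[1:]
        if sign == "+" and msgid:
            refs.add(msgid)
        elif sign == "-" and msgid:
            refs.discard(msgid)
        else:
            raise ValueError(f"malformed journal entry: {entry!r}")
    return refs


print(effective_references(
    ["x@example.com"],
    ["+foo@example.com", "-x@example.com"],
))
```

With this rule "+foo, -foo" leaves foo unreferenced and "-foo, +foo"
leaves it referenced; an unordered term set would need some other
tie-breaker.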

> Both versions probably complicate some people's syncing solutions.

both (a) and (b) complicate syncing solutions, but my original proposal
of:

 c) just introduce a new term type "join" with unsigned value

is easy to sync, i think; i was going for the low-hanging fruit, and
trying to not let it get caught up on the more-fully-featured
arbitrary-split use case, though i understand the appeal of the generic
approach.

fwiw, i can do a really nasty workaround today to implement "join"
between two messages:

#### notmuch-join:
--------------
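#!/bin/bash
# usage: notmuch-join MESSAGE_ID_1 MESSAGE_ID_2

# abort unless the given message-id is already indexed
verify_exists() {
    if ! notmuch search --output=files id:"$1" | grep -q . ; then
        printf "message-id %s is not in your messages\n" "$1" >&2
        exit 1
    fi
}

verify_exists "$1"
verify_exists "$2"

# write a temporary message inside the mail store that references both
jdir=$(notmuch config get database.path)/join
mkdir -p "$jdir"
z=$(mktemp "$jdir/join.XXXXXX")

cat >"$z" <<EOF
From: test@example.org
Date: $(date -R)
Message-Id: <$(uuidgen)@join.example.org>
References: <$1> <$2>
Subject: join

test
EOF
# index the dummy message (which merges the two threads), then delete
# it and re-index so that only the merged thread remains
notmuch new
rm "$z"
notmuch new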
--------------

And i note that this change is also not synced across dump/restore.

So adding an explicit "join" document term (and figuring out how to
represent it in "notmuch dump" and "notmuch restore") would be a strict
improvement over the current situation, right?

        --dkg



* Re: thread merge/split proposal
From: David Bremner @ 2016-04-12  0:56 UTC
  To: Daniel Kahn Gillmor, Notmuch Mail

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

> I'm not sure what you mean by "signed" here (cryptographically signed?
> a term named "signed"?  the idea that the term could be either positive
> or negative?), but i think your proposal is that we could have a
> "reference" term with a value of "+foo@example.com" or
> "-foo@example.com", instead of having a "join" term with value
> "foo@example.com" and a "split" term with value "foo@example.com"

I was thinking mostly in terms of the UI. I think

%  notmuch reference +id1 -id2 $QUERY
        
goes well with the tag interface.
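
A rough Python sketch of how such a command line could be split, in
the style of "notmuch tag". The "reference" subcommand does not exist;
the function name `parse_reference_args` and the handling of "--" as
an explicit end-of-operations marker are assumptions for illustration.

```python
def parse_reference_args(argv):
    """Split ['+id1', '-id2', 'search', 'terms'] into three parts.

    Returns (add, remove, query): message-ids to add as references,
    message-ids to remove, and the remaining search terms joined into
    one query string.
    """
    add, remove = [], []
    rest = []
    for i, arg in enumerate(argv):
        if arg == "--":
            # explicit end of +/- operations
            rest = argv[i + 1:]
            break
        if len(arg) > 1 and arg[0] == "+":
            add.append(arg[1:])
        elif len(arg) > 1 and arg[0] == "-":
            remove.append(arg[1:])
        else:
            # first non-operation argument starts the query
            rest = argv[i:]
            break
    if not rest:
        raise ValueError("a search query is required")
    return add, remove, " ".join(rest)


print(parse_reference_args(["+id1", "-id2", "tag:inbox", "and", "from:alice"]))
```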

> I'm not sure i see much of a difference between
>
>  a) introduce two new term types, "join" and "split", with unsigned
>     values
> and
>
>  b) introduce one new term type, "reference" with signed values

Yeah, it's an implementation detail, not clear to me that it matters.

> both (a) and (b) complicate syncing solutions, but my original proposal
> of:
>
>  c) just introduce a new term type "join" with unsigned value

I just meant it isn't representable as folders, like tags are (not well,
but *shrug*).

> is easy to sync, i think; i was going for the low-hanging fruit, and
> trying to not let it get caught up on the more-fully-featured
> arbitrary-split use case, though i understand the appeal of the generic
> approach.

I'm a bit worried about UI proliferation with notmuch-join,
notmuch-unjoin now and maybe notmuch-split, notmuch-unsplit later. I'd
be fine with a more generic command with parts perhaps unimplemented.
 
> So adding an explicit "join" document term (and figuring out how to
> represent it in "notmuch dump" and "notmuch restore") would be a strict
> improvement over the current situation, right?

Making things generic in the right way will be less work in the long
run, I think.  For example, if we had thought about more general terms
attached to a message in the last revision of dump/restore, we'd be done
now. 


* Re: thread merge/split proposal
From: Daniel Kahn Gillmor @ 2016-04-12  1:29 UTC
  To: David Bremner, Notmuch Mail

On Mon 2016-04-11 20:56:57 -0400, David Bremner wrote:
> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>
>> I'm not sure what you mean by "signed" here (cryptographically signed?
>> a term named "signed"?  the idea that the term could be either positive
>> or negative?), but i think your proposal is that we could have a
>> "reference" term with a value of "+foo@example.com" or
>> "-foo@example.com", instead of having a "join" term with value
>> "foo@example.com" and a "split" term with value "foo@example.com"
>
> I was thinking mostly in terms of the UI. I think
>
> %  notmuch reference +id1 -id2 $QUERY
>         
> goes well with the tag interface.

I see, yeah, that makes sense.

That still doesn't cover the "notmuch unjoin" semantics i'd sketched out
earlier, though.  that might need to be a different use case.

The semantics would be something like:

  break the selected threads into threads based solely on their
  References headers (including any manual reference terms) using
  connected component analysis, restoring the threading to what would be
  produced on a clean import.

maybe "unjoin" is the wrong verb, but i'm open to suggestions.

> I'm a bit worried about UI proliferation with notmuch-join,
> notmuch-unjoin now and maybe notmuch-split, notmuch-unsplit later. I'd
> be fine with a more generic command with parts perhaps unimplemented.

i see, that makes sense.

> Making things generic in the right way will be less work in the long
> run, I think.  For example, if we had thought about more general terms
> attached to a message in the last revision of dump/restore, we'd be done
> now. 

right -- we don't even have any version information in the notmuch dump
file.  what's the right way to approach this?

       --dkg


* Re: thread merge/split proposal
From: David Bremner @ 2016-08-06 23:50 UTC

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

> That still doesn't cover the "notmuch unjoin" semantics i'd sketched out
> earlier, though.  that might need to be a different use case.
>
> The semantics would be something like:
>
>   break the selected threads into threads based solely on their
>   References headers (including any manual reference terms) using
>   connected component analysis, restoring the threading to what would be
>   produced on a clean import.

One thing I haven't understood is whether the problem this is working
around is unavoidable due to the order in which messages arrive (or
some other factor), or whether this is just a workaround for bugs in
the threading algorithm. I'm guessing the former, but can you explain
why the problem is intractable enough that the right answer is to
reindex a thread? (btw, that might be a UI: use the reindex command
introduced in your indexed plaintext patches.)

d

