unofficial mirror of notmuch@notmuchmail.org
* Reply all - issue
@ 2013-01-27 21:58 Robert Mast
  2013-01-28 15:13 ` Jani Nikula
  2013-01-31 10:52 ` Michał Nazarewicz
  0 siblings, 2 replies; 23+ messages in thread
From: Robert Mast @ 2013-01-27 21:58 UTC (permalink / raw)
  To: notmuch

[-- Attachment #1: Type: text/plain, Size: 1273 bytes --]

Last week I studied many Windows mail user agents with a conversation
threading feature.

None of them (Sup, mutt-kz (notmuch), Outlook 2010, Thunderbird with
the conversation threading plug-in, Postbox, Evolution) could cope with
the following case:

In our e-mail discussions, people often use 'reply-all' to compose a new
message to the same recipients.

They clear the body and the subject, but the hidden References: and
In-Reply-To: headers stay, although they should be cleared as well.

As a result, the new subject drowns in an old conversation thread, and
this unpredictable behavior makes conversation threading useless.

This weekend I analyzed the notmuch source to find the best place for a
fix.

I am thinking of a fix that indexes the date of the first occurrence of
each (stripped) subject change within a thread and, at each such first
subject change, checks the body for quotes of previous messages. If the
body quotes none of the referenced messages, the fix drops the reference
and assigns a new thread_id to the (stripped) subject.
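
The proposed heuristic could be sketched roughly as follows. This is illustrative Python, not notmuch code (the notmuch library is C/C++). The prefix regex, the simplification of "quotes of previous messages" to "a '>'-quoted line whose text also occurs in the direct parent", and the function names strip_subject, quotes_parent and starts_new_thread are all assumptions.

```python
import re

# Assumed normalisation: drop any run of "Re:", "Fw:"/"Fwd:", "AW:" and
# "[list-tag]" prefixes (case-insensitive), collapse whitespace, lower-case.
_PREFIXES = re.compile(r"^\s*((re|fwd?|aw)\s*:\s*|\[[^\]]*\]\s*)+", re.IGNORECASE)


def strip_subject(subject: str) -> str:
    return " ".join(_PREFIXES.sub("", subject).split()).lower()


def quotes_parent(body: str, parent_body: str) -> bool:
    """True if any '>'-quoted line of body also occurs in parent_body."""
    parent_lines = {line.strip() for line in parent_body.splitlines() if line.strip()}
    for line in body.splitlines():
        if line.startswith(">"):
            text = line.lstrip("> ").strip()
            if text and text in parent_lines:
                return True
    return False


def starts_new_thread(subject: str, body: str,
                      parent_subject: str, parent_body: str) -> bool:
    """Split only if the stripped subject changed AND the parent is not quoted."""
    if strip_subject(subject) == strip_subject(parent_subject):
        return False
    return not quotes_parent(body, parent_body)


print(strip_subject("Re: [notmuch] RE: Reply all - issue"))  # reply all - issue
```

Note that the quote test needs the parent's body, so in this sketch a reply indexed before its parent cannot be classified yet.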

After two days of study, I think the place with the least interference
with existing code is between running 'notmuch new' and starting the
MUA. At that point the threads are already in the Xapian database, and
new thread_ids can be inserted.

Am I right?

[-- Attachment #2: Type: text/html, Size: 3661 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Reply all - issue
  2013-01-27 21:58 Reply all - issue Robert Mast
@ 2013-01-28 15:13 ` Jani Nikula
  2013-01-28 18:15   ` Robert Mast
  2013-01-31 10:52 ` Michał Nazarewicz
  1 sibling, 1 reply; 23+ messages in thread
From: Jani Nikula @ 2013-01-28 15:13 UTC (permalink / raw)
  To: Robert Mast, notmuch

On Sun, 27 Jan 2013, Robert Mast <beheerder@tekenbeetziekten.nl> wrote:
> Last week I studied many Windows-Mail User Agents with the conversation
> threading feature.
>
> None of them (SUP, mutt-kz(notmuch), Outlook 2010, Thunderbird with
> conversation thread plug in, Postbox, Evolution) could cope with the
> following case:

Apparently all of them obey the RFC 2822 References: and In-Reply-To:
headers for threading, and have no additional heuristics. I think it's a
good thing, but YMMV. I think mutt supports manual breaking and joining
of threads. The gmail web interface, OTOH, automatically breaks threads
on Subject: changes too [1].

> In our e-mail-discussions people often choose 'reply-all' to construct a new
> message with the same recipients.
>
> They clear the body and the subject, but the hidden References: and
> In-reply-To: stay and should be cleared as well.
>
> Result is that this new subject drowns in an old
> conversation-thread-drilldown and this unpredictable behavior makes
> conversation threading useless.

The term you're looking for is thread hijacking. One could argue the
problem lies in the sender, not the recipient, and therefore should be
fixed in the sender end.

> I think of a fix that indexes the first dates of (stripped) subject-changes
> within threads, and with each first (stripped) subject change check the body
> on quotes of previous messages. If there is no quote to referenced mails
> then drop the reference and assign a new thread_id_ to the (stripped)
> subject.

Doing something like this would break threading for emailed patch series
[2], a very common practice in the open source world, including notmuch
development [3]. Indeed, the way gmail breaks patch threads, but then
joins different versions of the same patch email into new threads, is
very annoying IMO.

Also note that whatever you do, it should work the same way regardless
of the order in which messages in the thread are indexed. Regenerating the
database should always end up in the same thread structure.
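
The order-independence requirement can be illustrated with a toy model. The Python below is a hedged sketch, not notmuch's actual algorithm: it assumes threads are simply the connected components of the Message-ID/References graph, found with union-find, and it (hypothetically) labels each thread with its smallest Message-ID so that the label itself cannot depend on indexing order.

```python
from itertools import permutations


class ThreadIndex:
    """Toy thread index: union-find over Message-IDs (not notmuch's code)."""

    def __init__(self):
        self.parent = {}

    def _find(self, mid):
        self.parent.setdefault(mid, mid)
        while self.parent[mid] != mid:
            self.parent[mid] = self.parent[self.parent[mid]]  # path halving
            mid = self.parent[mid]
        return mid

    def _union(self, a, b):
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            # Keep the smallest ID as the root, so every thread is labelled
            # by its smallest Message-ID whatever the union order was.
            lo, hi = sorted((ra, rb))
            self.parent[hi] = lo

    def add(self, message_id, references):
        self._find(message_id)
        for ref in references:  # refs may name messages not yet indexed
            self._union(message_id, ref)

    def thread_of(self, message_id):
        return self._find(message_id)


def assign_threads(messages):
    """messages: list of (message_id, [referenced ids]) in indexing order."""
    index = ThreadIndex()
    for mid, refs in messages:
        index.add(mid, refs)
    return {mid: index.thread_of(mid) for mid, _ in messages}


def is_order_independent(messages):
    """Re-index in every possible order and compare the thread assignments."""
    first = assign_threads(messages)
    return all(assign_threads(list(p)) == first for p in permutations(messages))


msgs = [("a", []), ("b", ["a"]), ("c", ["a", "b"]), ("x", []), ("y", ["x"])]
print(is_order_independent(msgs))  # True
```

Any subject-splitting rule added on top of such a scheme would have to pass the same kind of permutation check.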

> After two days of studying I think the best place with the least
> interference with existing code is between 'notmuch new' and starting the
> MUA. Then the threads are in place in XAPIAN, and new thread_id_'s can be
> inserted.

The place you're looking for is probably
_notmuch_database_link_message() in lib/database.cc.

Patches, as they say, are welcome, but I believe for upstream notmuch
inclusion you'd need to address the issues I pointed out above.

HTH,
Jani.


[1] http://support.google.com/mail/bin/answer.py?hl=en&answer=5900

[2] http://git-scm.com/book/en/Distributed-Git-Contributing-to-a-Project#Public-Large-Project

[3] http://notmuchmail.org/pipermail/notmuch/2013/thread.html

* RE: Reply all - issue
  2013-01-28 15:13 ` Jani Nikula
@ 2013-01-28 18:15   ` Robert Mast
  2013-01-29  2:47     ` Carl Worth
  0 siblings, 1 reply; 23+ messages in thread
From: Robert Mast @ 2013-01-28 18:15 UTC (permalink / raw)
  To: 'Jani Nikula', notmuch

Thanks for your reply.

I never tried Gmail conversation threading, but from your first reference I
understand that it breaks threads on subject changes unconditionally.

Breaking on subject changes unconditionally would be even easier to
implement: comparing against the contents of previous messages costs
performance, and as long as the crucial linking messages haven't been
read in, the outcome is ambiguous and would lead to annoyingly jumping
results.

I'll look for client-side solutions, but the mail that broke all those
mailers originated from my own mail program (Outlook 2010, I think), so
automatically clearing References: and In-Reply-To: when the user clears
the subject and body isn't common practice for MUAs.

Your point on patch-breaking related to gmail and my proposal isn't
completely clear to me, but I've probably addressed it well with my new
approach.

I'll study the code for adding the option of unconditional (stripped)
subject breaking on top of the existing thread-breaking.

* RE: Reply all - issue
  2013-01-28 18:15   ` Robert Mast
@ 2013-01-29  2:47     ` Carl Worth
  2013-01-30 17:14       ` Robert Mast
                         ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Carl Worth @ 2013-01-29  2:47 UTC (permalink / raw)
  To: Robert Mast, 'Jani Nikula', notmuch

[-- Attachment #1: Type: text/plain, Size: 1585 bytes --]

Robert Mast <beheerder@tekenbeetziekten.nl> writes:
> Your point on patch-breaking related to gmail and my proposal isn't
> completely clear to me, but I've probably addressed it well with my new
> approach.

The issue here is that many developers tend to develop a patch series
(perhaps with dozens of patches) as a single conceptual unit. When these
are emailed out, they are often sent as one thread with a new subject
for every patch. In particular, users of git and "git send-email" often
send patches this way. For what it's worth, it's my preferred way to
send and receive patches via email.

It's extremely useful for messages like this to be presented as a
single thread. This means that the dozens of messages don't clutter the
inbox, and it also allows for an operation to act on all of the messages
at once, (for example, notmuch provides "C-u |" which can apply all of
the received patches to a code repository in a single operation).

So, those of us accustomed to sending, receiving, reviewing, and
applying patches emailed in this way would be basically unable to use an
email program that split threads unconditionally on subject changes.

So it may be tricky to find a single behavior that would make everyone
happy. Perhaps a configuration option for splitting threads on subject
changes.

> I'll study the code for adding the option of unconditional (stripped)
> subject breaking on top of the existing thread-breaking.

Is there any existing thread-breaking? There wasn't the last time I
looked at the code closely, (but admittedly, that was a while ago).

-Carl


[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

* RE: Reply all - issue
  2013-01-29  2:47     ` Carl Worth
@ 2013-01-30 17:14       ` Robert Mast
  2013-01-30 21:39         ` Suvayu Ali
  2013-01-30 20:56       ` Robert Mast
  2013-01-30 21:49       ` Robert Mast
  2 siblings, 1 reply; 23+ messages in thread
From: Robert Mast @ 2013-01-30 17:14 UTC (permalink / raw)
  To: 'Carl Worth', 'Jani Nikula', notmuch

Thanks for your clear explanation.

The thread merging and breaking is done in the function Jani already
pointed to: _notmuch_database_link_message() in lib/database.cc.

Is there a quick way to recognize those git threads by their subject syntax,
or to tag them reliably so they can be excluded from subject splitting?

* RE: Reply all - issue
  2013-01-29  2:47     ` Carl Worth
  2013-01-30 17:14       ` Robert Mast
@ 2013-01-30 20:56       ` Robert Mast
  2013-01-30 21:49       ` Robert Mast
  2 siblings, 0 replies; 23+ messages in thread
From: Robert Mast @ 2013-01-30 20:56 UTC (permalink / raw)
  To: 'Carl Worth', 'Jani Nikula', notmuch

I have never used git to mail patches, so I have no example mailbox to analyse.

I understand that a subject starting with "[PATCH <anything>]" can be a
hint that a message comes from git, but that this is not guaranteed. Or is it? [1]

If it isn't, can I assume that all git messages comply with these rules: [2]

"The patch is expected to be inline, directly following the message. Any
line that is of the form:

  * three-dashes and end-of-line, or
  * a line that begins with "diff -", or
  * a line that begins with "Index: "
"
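
The three rules quoted above translate into a simple line-based test. A hedged Python sketch (the function name looks_like_git_patch is hypothetical; git-am itself is not called here):

```python
def looks_like_git_patch(body: str) -> bool:
    """Hypothetical helper: True if any line of the body matches one of the
    three git-am patch-start forms quoted above."""
    for line in body.splitlines():  # splitlines() also drops any trailing "\r"
        if line == "---" or line.startswith("diff -") or line.startswith("Index: "):
            return True
    return False


print(looks_like_git_patch("Fix a leak.\n---\n lib/database.cc | 2 +-\n"))  # True
print(looks_like_git_patch("Just a reply.\n-- \nRobert\n"))  # False
```

Such a test would also fire on an ordinary message that happens to contain a line consisting of exactly "---", so it is a hint rather than proof.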

Or should the git filter also look for a "scissors line" [3] to identify a
git message?

[1] http://www.kernel.org/pub/software/scm/git/docs/git-send-email.html
[2] http://linux.die.net/man/1/git-am 
[3] http://linux.die.net/man/1/git-mailinfo 

Or are there any guaranteed hidden git markers in the mail header?

* Re: Reply all - issue
  2013-01-30 17:14       ` Robert Mast
@ 2013-01-30 21:39         ` Suvayu Ali
  2013-01-31 10:21           ` Andrei POPESCU
  0 siblings, 1 reply; 23+ messages in thread
From: Suvayu Ali @ 2013-01-30 21:39 UTC (permalink / raw)
  To: notmuch

Hi,

I am a *very new* notmuch user (notmuch + mutt-kz/Emacs), but I would
like to throw in a few opinions about this topic.

On Wed, Jan 30, 2013 at 06:14:48PM +0100, Robert Mast wrote:
> Thanks for your clear explanation.
> 
> The thread-merging and breaking is in the procedure already pointed at by
> Jani: (_notmuch_database_link_message() in lib/database.cc.)
> 
> Is there a quick way to recognize those git-threads by subject-syntax, or to
> reliably tag them to exclude them from subject-breaking?
> 

I don't like this feature at all.  Threads with patches from
git-send-email are not the only kind of threads where this might be
relevant.  Often I encounter threads with sub-threads which are a little
OT hence get renamed, but they are still related to the parent thread.
Sometimes this helps in following how a topic came up while discussing
something else.  This is especially true when going through old emails
for reference.  I have encountered this in mailing lists, personal
emails and discussions with colleagues.  One of the many other reasons
for me to switch from Gmail to my present setup was to avoid this
"feature".

That said, I think this feature is indeed useful at times but it should
be implemented in the UI on user command or as a configurable (e.g. mutt
provides the <break-thread> command), not a default underlying behaviour
of the backend.  If this is pursued, implementing it as a configurable
in the Emacs UI might be more appropriate (or whatever other UIs exist
out there).

Hope this is constructive to the discussion.  :)

Cheers,

-- 
Suvayu

Open source is the future. It sets us free.

* RE: Reply all - issue
  2013-01-29  2:47     ` Carl Worth
  2013-01-30 17:14       ` Robert Mast
  2013-01-30 20:56       ` Robert Mast
@ 2013-01-30 21:49       ` Robert Mast
  2013-01-31  1:12         ` David Bremner
  2 siblings, 1 reply; 23+ messages in thread
From: Robert Mast @ 2013-01-30 21:49 UTC (permalink / raw)
  To: 'Carl Worth', 'Jani Nikula', notmuch

I ran git send-email and got the following line in the mail header:

"X-Mailer: git-send-email 1.7.9.5"

I see that this git-marker already existed in version 1.3 in 2006:

http://comments.gmane.org/gmane.comp.version-control.git/23337 

Can I assume that, apart from the version number, this header marker is
present in all git mail that should not be split on subject?

Alternatively, I can leave the threads in the database as they are and
only change notmuch_query_search_threads in lib/query.cc to use the
subject as a second hash key, and change
notmuch_threads_get/_notmuch_thread_create to also match on the subject
of the seed message.

Then I only have to add the stripped subject as a search term; the
subject currently stored in the database is the original, unstripped one.

I expect next weekend to have some time again.

* RE: Reply all - issue
  2013-01-30 21:49       ` Robert Mast
@ 2013-01-31  1:12         ` David Bremner
  2013-01-31  1:14           ` David Bremner
  0 siblings, 1 reply; 23+ messages in thread
From: David Bremner @ 2013-01-31  1:12 UTC (permalink / raw)
  To: Robert Mast, 'Carl Worth', 'Jani Nikula', notmuch

Robert Mast <beheerder@tekenbeetziekten.nl> writes:

> I ran git send-email and became the following line in the mail-header:
>
> "X-Mailer: git-send-email 1.7.9.5"
>
> Can I assume, apart from the version number, that this header-marker applies
> to all git-mail that should not be subject-splitted?

Hardcoding particular headers sounds too fragile to me. With that said,
if you want a corpus of email to investigate, there is e.g. 

http://notmuchmail.org/corpus/

Unfortunately I seem to recall threading is mostly pretty trivial,
except in the notmuch mailing list archive. If you prefer a smaller
download, that is at 

          http://notmuchmail.org/corpus/

(as an mbox)

* RE: Reply all - issue
  2013-01-31  1:12         ` David Bremner
@ 2013-01-31  1:14           ` David Bremner
  2013-02-12  7:07             ` Jameson Graef Rollins
  0 siblings, 1 reply; 23+ messages in thread
From: David Bremner @ 2013-01-31  1:14 UTC (permalink / raw)
  To: Robert Mast, 'Carl Worth', 'Jani Nikula', notmuch

David Bremner <david@tethera.net> writes:
>
> Hardcoding particular headers sounds too fragile to me. With that said,
> if you want a corpus of email to investigate, there is e.g. 
>

Let me step back a level and say that special casing git patch series
strikes me as not yet seeing the problem in enough generality. Others
might disagree, of course.

d

* Re: Reply all - issue
  2013-01-30 21:39         ` Suvayu Ali
@ 2013-01-31 10:21           ` Andrei POPESCU
  0 siblings, 0 replies; 23+ messages in thread
From: Andrei POPESCU @ 2013-01-31 10:21 UTC (permalink / raw)
  To: notmuch

[-- Attachment #1: Type: text/plain, Size: 701 bytes --]

On Mi, 30 ian 13, 22:39:40, Suvayu Ali wrote:
> 
> That said, I think this feature is indeed useful at times but it should
> be implemented in the UI on user command or as a configurable (e.g. mutt
> provides the <break-thread> command), not a default underlying behaviour
> of the backend.  If this is pursued, implementing it as a configurable
> in the Emacs UI might be more appropriate (or whatever other UIs exists
> out there).

As a subscriber of 30+ mailing lists a big +1 from me. Mutt's 
<break-thread> has served me well on the very rare occasion I needed it.

Kind regards,
Andrei
-- 
If you can't explain it simply, you don't understand it well enough.
(Albert Einstein)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

* Re: Reply all - issue
  2013-01-27 21:58 Reply all - issue Robert Mast
  2013-01-28 15:13 ` Jani Nikula
@ 2013-01-31 10:52 ` Michał Nazarewicz
  2013-02-02 16:21   ` Robert Mast
  1 sibling, 1 reply; 23+ messages in thread
From: Michał Nazarewicz @ 2013-01-31 10:52 UTC (permalink / raw)
  To: Robert Mast; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 599 bytes --]

On 28 Jan 2013 08:37, "Robert Mast" <beheerder@tekenbeetziekten.nl> wrote:
> I think of a fix that indexes the first dates of (stripped)
> subject-changes within threads, and with each first (stripped) subject
> change check the body on quotes of previous messages. If there is no quote
> to referenced mails then drop the reference and assign a new thread_id_ to
> the (stripped) subject.

This is a misfeature which only reinforces the incorrect behaviour and one
of the things I hate about Gmail. As such I hope that at the *very* *least*
there will be an option to turn this behaviour off.

[-- Attachment #2: Type: text/html, Size: 701 bytes --]

* RE: Reply all - issue
  2013-01-31 10:52 ` Michał Nazarewicz
@ 2013-02-02 16:21   ` Robert Mast
  2013-02-02 20:52     ` David Bremner
  2013-02-04 10:39     ` Michał Nazarewicz
  0 siblings, 2 replies; 23+ messages in thread
From: Robert Mast @ 2013-02-02 16:21 UTC (permalink / raw)
  To: 'Michał Nazarewicz'; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 722 bytes --]

Of course I'll try not to hinder current notmuch users. My intent is even to find some support for it.

As far as I know, Gmail was the great example of threading for the Sup developers, and Sup led to notmuch.

So Gmail threading is still the best, I suppose, except for git send-email users, who happen to overlap quite a bit with the notmuch developers.

Anyone interested in me patching notmuch, or shall I keep the changes to myself?

[-- Attachment #2: Type: text/html, Size: 4037 bytes --]

* RE: Reply all - issue
  2013-02-02 16:21   ` Robert Mast
@ 2013-02-02 20:52     ` David Bremner
  2013-02-03  0:06       ` [Spam-verdenking][english 100%] " Robert Mast
                         ` (2 more replies)
  2013-02-04 10:39     ` Michał Nazarewicz
  1 sibling, 3 replies; 23+ messages in thread
From: David Bremner @ 2013-02-02 20:52 UTC (permalink / raw)
  To: Robert Mast; +Cc: notmuch

Robert Mast <beheerder@tekenbeetziekten.nl> writes:

>
> Anyone interested in me patching Notmuch, or shall I keep the changes
> to myself?
>

Hi Robert;

If you have patches, and you want feedback on them, then you are of
course welcome to send them to the list.  Previous experience suggests
that it is often faster in the long run (in terms of actually getting
code into notmuch) to take time to work out the design issues before
starting to code. Some suggestions/comments:

1) See http://notmuchmail.org/contributing/ for some general hints on
   contributing code to notmuch.
             
2) Make sure whatever threading heuristic you use is deterministic, and
   robust in the face of messages arriving in different orders, and
   munging of headers by mailing lists (subjects in particular get
   munged fairly often).  

3) In particular, it seems important that "notmuch dump" followed by
   "notmuch restore" (possibly followed by notmuch new?) yields unchanged
   or equivalent thread structure.

4) Since threading heuristics are a matter of taste (i.e. not everyone
   is convinced that the way Gmail does it is the way notmuch should),
   you'll need to make this configurable. One constraint is that the
   library itself (under ./lib) should not read configuration files
   (or environment variables, although it violates this for debugging).
   This just means you will have to change the API to pass configuration
   information in to certain routines.

5) I'd say it's more important that you can shut off the heuristic
   completely than have special handling for git (or other version
   control system) patch series.  If you do decide to add some special
   handling for patch series, I'd suggest making it as generic as
   possible, perhaps a configurable list of (header, regex) values that
   disable the thread splitting heuristics.

6) Decide how, if at all, your design will support manually joining
   threads together.  I think an acceptable answer would probably be
   "disable all thread splitting heuristics and rebuild the
   database". I'm not sure if it's feasible to do anything nicer than
   that.
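
Points 4) and 5) could look roughly like this. The Python below is only a hedged sketch: the SplitConfig type, its field names and splitting_allowed() are hypothetical and not part of the notmuch API. What it shows is that the heuristic is off by default, and that the (header, regex) exemption list arrives as a parameter instead of being read from a file inside the library.

```python
import re
from dataclasses import dataclass, field
from email import message_from_string
from email.message import Message


@dataclass
class SplitConfig:
    """Hypothetical options object: built by the CLI or UI from the user's
    config and passed into the library, which reads no config files itself."""
    enabled: bool = False  # heuristic off unless explicitly requested
    exempt: list = field(default_factory=list)  # [(header name, compiled regex)]


def splitting_allowed(msg: Message, config: SplitConfig) -> bool:
    """May the subject-splitting heuristic be applied to this message?"""
    if not config.enabled:
        return False
    for header, pattern in config.exempt:
        for value in msg.get_all(header, []):
            if pattern.search(value):
                return False  # e.g. a patch series: keep it in one thread
    return True


# Example (assumed) exemptions for patch series sent with git send-email.
config = SplitConfig(enabled=True, exempt=[
    ("X-Mailer", re.compile(r"^git-send-email\b")),
    ("Subject", re.compile(r"^\[PATCH\b")),
])
patch = message_from_string(
    "X-Mailer: git-send-email 1.7.9.5\nSubject: [PATCH 1/3] lib: fix leak\n\nbody\n")
chat = message_from_string("Subject: New topic\n\nbody\n")
print(splitting_allowed(patch, config), splitting_allowed(chat, config))  # False True
```

Keeping the exemptions as data rather than code means the same mechanism could cover other version control systems without touching the library.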

d

* RE: [Spam-verdenking][english 100%] RE: Reply all - issue
  2013-02-02 20:52     ` David Bremner
@ 2013-02-03  0:06       ` Robert Mast
  2013-02-03 15:26       ` Robert Mast
  2013-02-10 15:43       ` Robert Mast
  2 siblings, 0 replies; 23+ messages in thread
From: Robert Mast @ 2013-02-03  0:06 UTC (permalink / raw)
  To: 'David Bremner'; +Cc: notmuch

Thanks for the guidelines!

One answer I couldn't find in the coding section: do you all develop with Emacs/GDB, or is there a more visual and intuitive IDE to code with that can load and show the dependency tree? I last debugged C code with Emacs 15 years ago and feel quite clumsy trying to get Emacs to work like a proper WYSIWYG IDE.

#4) naturally.

I like your last suggestion in #5), the (header, regex) list, and I agree to first work out the remaining design issues before coding:

Re #6), #2) and #3): I doubt whether I should tamper with the threading heuristics at the level of lib/database.cc at all. Does anyone know whether the MUAs using notmuch depend on the thread IDs assigned in database.cc, or will they respect the different threads that result from seeding _notmuch_thread_create in lib/thread.cc with all known messages except those already processed, as notmuch_query_search_threads does?

If I let _notmuch_thread_create in lib/thread.cc 'eat' only those messages from 'match_set' that share the seed's stripped subject, then the 'seed' message of another subject within the same thread will lead to a separate thread in the result set.

If I start coding this I can try the result with mutt-kz/notmuch and notmuch/emacs.

My aim in getting notmuch to work well is to provide a base for reviving something like mail2forum for phpBB3, with mail-compression capabilities that prevent mail threads from being copied in again and again with every mailed answer.

I think that can be accomplished by also keeping the original mails and compressing the forum threads into Sup-like threads (by hiding quoted e-mail).


* RE: Reply all - issue
  2013-02-02 20:52     ` David Bremner
  2013-02-03  0:06       ` [Spam-verdenking][english 100%] " Robert Mast
@ 2013-02-03 15:26       ` Robert Mast
  2013-02-03 18:28         ` David Bremner
  2013-02-10 15:43       ` Robert Mast
  2 siblings, 1 reply; 23+ messages in thread
From: Robert Mast @ 2013-02-03 15:26 UTC (permalink / raw)
  To: 'David Bremner'; +Cc: notmuch

I committed a small patch for a memory issue I found.

Can someone check whether I used git the right way, or should I study git send-email further?

* RE: Reply all - issue
  2013-02-03 15:26       ` Robert Mast
@ 2013-02-03 18:28         ` David Bremner
  0 siblings, 0 replies; 23+ messages in thread
From: David Bremner @ 2013-02-03 18:28 UTC (permalink / raw)
  To: Robert Mast; +Cc: notmuch

Robert Mast <beheerder@tekenbeetziekten.nl> writes:

> I committed a little patch on a memory-issue I found.

Where did you commit it?

>
> Can someone look whether I used git the right way, or should I study
> git send-email some further?

I guess that's probably the simplest. Otherwise you need to push it to a
publicly available repo.

d

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Reply all - issue
  2013-02-02 16:21   ` Robert Mast
  2013-02-02 20:52     ` David Bremner
@ 2013-02-04 10:39     ` Michał Nazarewicz
  2013-02-04 15:29       ` Suvayu Ali
  2013-02-06 18:19       ` Istvan Marko
  1 sibling, 2 replies; 23+ messages in thread
From: Michał Nazarewicz @ 2013-02-04 10:39 UTC (permalink / raw)
  To: Robert Mast; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 738 bytes --]

On 2 Feb 2013 17:21, "Robert Mast" <beheerder@tekenbeetziekten.nl> wrote:
> So Gmail-threading is still the best I suppose,

I strongly disagree. Having said that, as long as it's configurable I
obviously won't be blocking your efforts.

> Anyone interested in me patching Notmuch, or shall I keep the changes to
> myself?

I was actually wondering whether, instead of hard-coding the logic into notmuch
itself, it would be better to provide some sort of "split-thread" and
"join-threads" operations which could then be used by a separate tagging tool.

To be user-friendly, this may require the ability to search for all ancestors
of a given message, and possibly an option to sort results topologically,
which I'm not sure notmuch has.
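Both building blocks mentioned here are small graph operations. A minimal Python sketch, assuming the parent links have already been extracted from the In-Reply-To/References headers into a dict mapping each message ID to its parent IDs; the function names are hypothetical and this is not an existing notmuch command. It uses the standard `graphlib` module, which needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set


def ancestors(message_id: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every message reachable from message_id by following parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(message_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against reference loops
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


def topological_order(message_ids: Iterable[str],
                      parents: Dict[str, List[str]]) -> List[str]:
    """Order the given messages so that every parent precedes its replies.

    Parents outside `message_ids` are ignored. graphlib raises CycleError
    if the references form a loop.
    """
    wanted = dict.fromkeys(message_ids)  # keeps the caller's order
    graph = {m: [p for p in parents.get(m, []) if p in wanted] for m in wanted}
    return list(TopologicalSorter(graph).static_order())
```

A split/join tool could, for instance, list `ancestors()` of a hijacking message to show the user what would be cut off, in `topological_order()`.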

[-- Attachment #2: Type: text/html, Size: 919 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Reply all - issue
  2013-02-04 10:39     ` Michał Nazarewicz
@ 2013-02-04 15:29       ` Suvayu Ali
  2013-02-06 18:19       ` Istvan Marko
  1 sibling, 0 replies; 23+ messages in thread
From: Suvayu Ali @ 2013-02-04 15:29 UTC (permalink / raw)
  To: notmuch

On Mon, Feb 04, 2013 at 11:39:44AM +0100, Michał Nazarewicz wrote:
> On 2 Feb 2013 17:21, "Robert Mast" <beheerder@tekenbeetziekten.nl> wrote:
> > Anyone interested in me patching Notmuch, or shall I keep the changes to
> > myself?
> 
> I was actually wondering whether, instead of hard-coding the logic into notmuch
> itself, it would be better to provide some sort of "split-thread" and
> "join-threads" operations which could then be used by a separate tagging tool.
> 
> To be user-friendly, this may require the ability to search for all ancestors
> of a given message, and possibly an option to sort results topologically,
> which I'm not sure notmuch has.

This would be a wonderful addition.  :)

-- 
Suvayu

Open source is the future. It sets us free.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Reply all - issue
  2013-02-04 10:39     ` Michał Nazarewicz
  2013-02-04 15:29       ` Suvayu Ali
@ 2013-02-06 18:19       ` Istvan Marko
  1 sibling, 0 replies; 23+ messages in thread
From: Istvan Marko @ 2013-02-06 18:19 UTC (permalink / raw)
  To: notmuch

Michał Nazarewicz <mina86-deATy8a+UHjQT0dZR+AlfA@public.gmane.org>
writes:

> I was actually wondering whether, instead of hard-coding the logic into notmuch
> itself, it would be better to provide some sort of "split-thread" and
> "join-threads" operations which could then be used by a separate tagging tool.

Such a customized threading feature would be great, I would use it to
tie together "loose threads" originating from crappy ticket tracking
tools that don't insert any In-Reply-To or References headers. Currently
I handle this by inserting fake In-Reply-To headers during delivery and
I would love to have a cleaner way.
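The delivery-time workaround described above can be approximated with Python's standard `email` package. In this sketch the ticket-number pattern and the synthetic root Message-ID are hypothetical placeholders for whatever a given tracker uses; every message about the same ticket gets the same (nonexistent) parent, which is what lets them end up in one thread.

```python
import re
from email import message_from_bytes, policy

# Hypothetical: the tracker writes "[#1234]" somewhere in the Subject.
TICKET_RE = re.compile(r"\[#(\d+)\]")


def add_fake_in_reply_to(raw: bytes) -> bytes:
    """Give tracker mail without threading headers a synthetic common parent."""
    msg = message_from_bytes(raw, policy=policy.default)
    if msg["In-Reply-To"] is not None or msg["References"] is not None:
        return raw  # already threaded by the sender; leave it untouched
    match = TICKET_RE.search(str(msg.get("Subject", "")))
    if match is None:
        return raw  # not recognizably about a ticket
    # Hypothetical synthetic Message-ID shared by all mail about this ticket.
    root = f"<ticket-{match.group(1)}@tickets.example.invalid>"
    msg["In-Reply-To"] = root
    msg["References"] = root
    return msg.as_bytes()
```

Messages that already carry threading headers, or that do not match the pattern, are returned byte-for-byte unchanged.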

To make this useful it would have to be persistent across dumps and
restores. 

If we only consider splitting then a relatively easy way might be to
allow the user to configure some tags to mark a split. In
.notmuch-config you'd have:

split_tags: split

And then you'd tag a message +split to mark the start of a new
thread. The threading code would watch for such tags, which might get
hairy if the tag information is not already at hand during threading.
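Following the earlier suggestion that the library itself should not read configuration files, the split tags could be read by the command-line layer and handed to the threading code as a plain parameter. A minimal Python sketch; the `[thread]` section and `split_tags` key are hypothetical, and the semicolon-separated value mirrors how existing notmuch list settings such as `tags=unread;inbox;` are written.

```python
import configparser
from typing import AbstractSet, FrozenSet


def load_split_tags(config_text: str) -> FrozenSet[str]:
    """CLI side: read the hypothetical thread.split_tags key from INI-style text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    raw = parser.get("thread", "split_tags", fallback="")
    return frozenset(tag.strip() for tag in raw.split(";") if tag.strip())


def starts_new_thread(message_tags: AbstractSet[str],
                      split_tags: AbstractSet[str]) -> bool:
    """Library side: split_tags is passed in; no configuration is read here."""
    return not split_tags.isdisjoint(message_tags)
```

`configparser` accepts both `key: value` and `key=value`, so the `split_tags: split` form quoted above parses as-is once it sits under a section header.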

I don't see how this would work for joins so it would not help me but it
could address the original problem.

-- 
	Istvan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Reply all - issue
  2013-02-02 20:52     ` David Bremner
  2013-02-03  0:06       ` [Spam-verdenking][english 100%] " Robert Mast
  2013-02-03 15:26       ` Robert Mast
@ 2013-02-10 15:43       ` Robert Mast
  2 siblings, 0 replies; 23+ messages in thread
From: Robert Mast @ 2013-02-10 15:43 UTC (permalink / raw)
  To: notmuch

From the many replies, I understand that manual thread joining and breaking already exist as manual commands in mutt, and that breaking threads on subject changes by default (as Gmail does) would not be preferred: not only do version control systems vary subjects within a thread, but discussions with slightly off-topic forks, and therefore slightly changed subjects, should also stay in the same thread.

I asked my question in the first place because I have many email discussions going on among about 10 board members who cannot meet in person often enough to decide everything face to face, so our mailboxes pile up with suggestions and remarks, with ever-growing thread histories and evolving addressee lists. Those addressee lists vary by individual choice: people often add new addressees without confirmation from the other participants, which sometimes results in leakage.

I thought of reviving mail2forum, a plug-in for phpBB, to force people to use the existing addressee list of each 'circle' and to archive all email discussions in a forum, so that people wouldn't be emailed about every subject and could discard their own copies of the mail. Threads should be compressed to keep mostly just the original messages (where available), plus short citations, or links to the original texts where needed. The existing mail2forum lacks this thread compression, and that is where notmuch, for example, comes in.
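Thread compression of this kind could start from something as simple as shortening runs of quoted lines. A minimal Python sketch, assuming plain-text bodies that quote with a leading '>'; `compress_body` and the omission marker are hypothetical, and this is not an existing feature of mail2forum or notmuch.

```python
from typing import List


def compress_body(body: str, max_quote_lines: int = 2) -> str:
    """Keep the author's own text; cut each run of quoted lines to a short citation."""
    out: List[str] = []
    run: List[str] = []  # the current run of consecutive quoted lines

    def flush() -> None:
        if len(run) > max_quote_lines:
            out.extend(run[:max_quote_lines])
            out.append(f"> [... {len(run) - max_quote_lines} quoted lines omitted ...]")
        else:
            out.extend(run)
        run.clear()

    for line in body.splitlines():
        if line.startswith(">"):
            run.append(line)
        else:
            flush()
            out.append(line)
    flush()  # a message may end with quoted text
    return "\n".join(out)
```

Replacing the omitted lines with a link to the quoted message instead of a counter would be a small change to `flush()`.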

From these discussions I understand that phpBB lacks the very basic feature of thread forking: any slightly off-topic remark in a phpBB message can only be split off into a completely new thread.

I wonder whether that is a blocker in my own situation, as participants in our discussions don't change the subject line very often, but it could probably affect the viability of mail2forum as an open source project.

I don't see how I could easily manipulate threads manually in a mail client when mail2forum automatically reads and processes new incoming mail, so my efforts with notmuch will probably stick to the 'optional' subject-splitting solution.

However, since mail2forum should handle postponed email as well (and replace earlier quotes with their original texts), patching from a manually altered maildir probably wouldn't be such a big step. I haven't yet studied everything that's needed there, though.

By the way, before I spend too much time on mail2forum: does anyone know of an (open source?) group-mail project, with user authentication to centrally stored messages, that already has satisfactory thread compression?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Reply all - issue
  2013-01-31  1:14           ` David Bremner
@ 2013-02-12  7:07             ` Jameson Graef Rollins
  2013-02-12 19:17               ` Carl Worth
  0 siblings, 1 reply; 23+ messages in thread
From: Jameson Graef Rollins @ 2013-02-12  7:07 UTC (permalink / raw)
  To: David Bremner, Robert Mast, 'Carl Worth',
	'Jani Nikula', notmuch

[-- Attachment #1: Type: text/plain, Size: 1104 bytes --]

On Wed, Jan 30 2013, David Bremner <david@tethera.net> wrote:
> Let me step back a level and say that special casing git patch series
> strikes me as not yet seeing the problem in enough generality. Others
> might disagree, of course.

I agree with this statement.

So I encounter the thread hijacking problem occasionally, but not
frequently enough that I would trust a particular heuristic to cover it.
I think I would prefer to just split hijacked threads manually as I
encounter them.

Just a thought: what if messages with a given tag (e.g. "new-thread")
were always treated as the source of a new thread?  A message with the
given tag could just be (re)indexed with any In-Reply-To/References
headers stripped before indexing.  This would allow users to break
threads manually, and it would mean dump && restore would always return
the same state.
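The proposal can be modelled in a few lines. In this Python sketch (not notmuch's C code; `thread_ids` and the dict-based inputs are hypothetical), threads are the connected components of the reference graph, and a message carrying the split tag simply contributes no edges, as if its In-Reply-To/References headers had been stripped. Because the result depends only on the messages and their tags, and tags are what `notmuch dump` saves, recomputing after a restore gives the same threads; removing the tag undoes the split.

```python
from typing import Dict, List, Set


def thread_ids(references: Dict[str, List[str]],
               tags: Dict[str, Set[str]],
               split_tag: str = "new-thread") -> Dict[str, str]:
    """Map each message ID to a thread ID (the smallest ID in its component)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for msg_id, refs in references.items():
        find(msg_id)  # register the message even if it has no references
        if split_tag in tags.get(msg_id, set()):
            continue  # behave as if In-Reply-To/References were stripped
        for ref in refs:
            a, b = find(msg_id), find(ref)
            if a != b:
                # Attach the larger root under the smaller one, so the thread
                # ID does not depend on iteration order.
                parent[max(a, b)] = min(a, b)
    return {m: find(m) for m in references}
```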

The actual thread breaking, or specifically where it happens, would have
to be thought through a bit.  Maybe this could be rolled into notmuch
new somehow?  Or some other top-level function that applies operations
to messages based on tags?

jamie.

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Reply all - issue
  2013-02-12  7:07             ` Jameson Graef Rollins
@ 2013-02-12 19:17               ` Carl Worth
  0 siblings, 0 replies; 23+ messages in thread
From: Carl Worth @ 2013-02-12 19:17 UTC (permalink / raw)
  To: Jameson Graef Rollins, David Bremner, Robert Mast,
	'Jani Nikula', notmuch

[-- Attachment #1: Type: text/plain, Size: 1430 bytes --]

Jameson Graef Rollins <jrollins@finestructure.net> writes:
> Just a thought: what if messages with a given tag (e.g. "new-thread")
> were always treated as the source of a new thread?

It's a good start. And an approach like that would have the advantage
that one could undo a thread-split by just removing the tag. (That's not
an explicit thread-join feature, but I don't think anyone has ever asked
for that.)

> A message with the given tag could just be (re)indexed with any
> In-Reply-To/References headers stripped before indexing.

It would require a little more than that. Imagine this thread:

  A: Subject: An original thread
  └─B: Subject: Thread hijacking is fun (tag:new-thread)
    └─C: Subject: Re: Thread hijacking is fun

In this case, message C is likely to have a References header that
mentions both A and B. So the thread stitching logic in notmuch will
want to merge threads A and B when indexing C. So special care will have
to be taken here as well (not just when indexing B).
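The A/B/C example can be reproduced with a small Python model (not notmuch code; `UnionFind`, `effective_references`, `thread_ids` and the dict inputs are hypothetical). With `refine=False` only the tagged message's own references are ignored, and C's References still pulls A and B together. The refinement shown, an assumption of this sketch rather than anything in notmuch, ignores every entry in a message's References list that precedes the most recent split-tagged message; it relies on References listing ancestors oldest first, as RFC 5322 recommends. Note that it needs the tags of every referenced message, which is exactly the kind of extra per-reference lookup mentioned below.

```python
from typing import Dict, List, Set


class UnionFind:
    """Disjoint sets keyed by message ID; the smallest ID is each set's root."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)


def effective_references(msg_id: str, references: Dict[str, List[str]],
                         tags: Dict[str, Set[str]], split_tag: str) -> List[str]:
    """References to link on, ignoring everything before the latest split point."""
    if split_tag in tags.get(msg_id, set()):
        return []  # the tagged message itself starts a new thread
    refs = references.get(msg_id, [])
    for i in range(len(refs) - 1, -1, -1):  # newest reference first
        if split_tag in tags.get(refs[i], set()):
            return refs[i:]  # keep the split point and the references after it
    return refs


def thread_ids(references: Dict[str, List[str]], tags: Dict[str, Set[str]],
               split_tag: str = "new-thread", refine: bool = True) -> Dict[str, str]:
    """Map each message ID to its thread root under the naive or refined rule."""
    uf = UnionFind()
    for msg_id, refs in references.items():
        uf.find(msg_id)
        if refine:
            linked = effective_references(msg_id, references, tags, split_tag)
        else:
            linked = [] if split_tag in tags.get(msg_id, set()) else refs
        for ref in linked:
            uf.union(msg_id, ref)
    return {m: uf.find(m) for m in references}
```

Since tags can change after C is indexed, a real implementation would also have to re-link affected messages whenever the split tag is added or removed.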

And that special care may not be cheap if it requires additional
database lookups for each unique thread ID encountered among references
of a message.

That said, I don't mean to dissuade anyone from thinking this through and
coding it up. The relevant code for the pieces I'm referring to starts
in _notmuch_database_link_message in lib/database.cc.

-Carl

-- 
carl.d.worth@intel.com

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2013-02-12 19:14 UTC | newest]

Thread overview: 23+ messages
2013-01-27 21:58 Reply all - issue Robert Mast
2013-01-28 15:13 ` Jani Nikula
2013-01-28 18:15   ` Robert Mast
2013-01-29  2:47     ` Carl Worth
2013-01-30 17:14       ` Robert Mast
2013-01-30 21:39         ` Suvayu Ali
2013-01-31 10:21           ` Andrei POPESCU
2013-01-30 20:56       ` Robert Mast
2013-01-30 21:49       ` Robert Mast
2013-01-31  1:12         ` David Bremner
2013-01-31  1:14           ` David Bremner
2013-02-12  7:07             ` Jameson Graef Rollins
2013-02-12 19:17               ` Carl Worth
2013-01-31 10:52 ` Michał Nazarewicz
2013-02-02 16:21   ` Robert Mast
2013-02-02 20:52     ` David Bremner
2013-02-03  0:06       ` [Spam-verdenking][english 100%] " Robert Mast
2013-02-03 15:26       ` Robert Mast
2013-02-03 18:28         ` David Bremner
2013-02-10 15:43       ` Robert Mast
2013-02-04 10:39     ` Michał Nazarewicz
2013-02-04 15:29       ` Suvayu Ali
2013-02-06 18:19       ` Istvan Marko
