unofficial mirror of notmuch@notmuchmail.org
* [PATCH] Handle message renames in mail spool
@ 2009-11-26 20:23 Mikhail Gusarov
  2009-12-02 21:15 ` [PATCH (rebased)] " Mikhail Gusarov
  0 siblings, 1 reply; 21+ messages in thread
From: Mikhail Gusarov @ 2009-11-26 20:23 UTC (permalink / raw)
  To: notmuch

In order to handle message renames the following changes were deemed necessary:

* Mtime check on individual files was disabled. As files may be moved around
without changing their mtime, it's necessary to parse them even if they appear
old in case old message was moved. mtime check on directories was kept as moving
files changes mtime of directory.

* If message being parsed is already found in database under different path,
then this message is considered to be moved, path is updated in database and
this file does not undergo further processing.

Note that after applying this patch notmuch still does not handle copying files
(which is harmless, database will point to the last copy of message found during
'notmuch new') and deleting files (which is more serious, as dangling entries
will show up in searches).

Signed-off-by: Mikhail Gusarov <dottedmag@dottedmag.net>
---
 lib/database.cc |   32 +++++++++++++------
 notmuch-new.c   |   92 ++++++++++++++++++++++++++----------------------------
 2 files changed, 66 insertions(+), 58 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 2c90019..257c0b8 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -990,19 +990,31 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
 	if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
 	    _notmuch_message_set_filename (message, filename);
 	    _notmuch_message_add_term (message, "type", "mail");
-	} else {
-	    ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
-	    goto DONE;
-	}
 
-	ret = _notmuch_database_link_message (notmuch, message, message_file);
-	if (ret)
-	    goto DONE;
+	    ret = _notmuch_database_link_message (notmuch, message, message_file);
+	    if (ret)
+		goto DONE;
 
-	date = notmuch_message_file_get_header (message_file, "date");
-	_notmuch_message_set_date (message, date);
+	    date = notmuch_message_file_get_header (message_file, "date");
+	    _notmuch_message_set_date (message, date);
 
-	_notmuch_message_index_file (message, filename);
+	    _notmuch_message_index_file (message, filename);
+	} else {
+	    const char *old_filename = notmuch_message_get_filename (message);
+	    if (strcmp (old_filename, filename) == 0) {
+		/* We have already seen it */
+		goto DONE;
+	    } else {
+		if (access (old_filename, R_OK) == 0) {
+		    /* old_filename still exists, we've got a duplicate */
+		    ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+		    goto DONE;
+		} else {
+		    /* Message file has been moved/renamed */
+		    _notmuch_message_set_filename (message, filename);
+		}
+	    }
+	}
 
 	_notmuch_message_sync (message);
     } catch (const Xapian::Error &error) {
diff --git a/notmuch-new.c b/notmuch-new.c
index 0dd2784..d16679c 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -174,54 +174,50 @@ add_files_recursive (notmuch_database_t *notmuch,
 	}
 
 	if (S_ISREG (st->st_mode)) {
-	    /* If the file hasn't been modified since the last
-	     * add_files, then we need not look at it. */
-	    if (path_dbtime == 0 || st->st_mtime > path_dbtime) {
-		state->processed_files++;
-
-		status = notmuch_database_add_message (notmuch, next, &message);
-		switch (status) {
-		    /* success */
-		    case NOTMUCH_STATUS_SUCCESS:
-			state->added_messages++;
-			tag_inbox_and_unread (message);
-			break;
-		    /* Non-fatal issues (go on to next file) */
-		    case NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID:
-		        /* Stay silent on this one. */
-			break;
-		    case NOTMUCH_STATUS_FILE_NOT_EMAIL:
-			fprintf (stderr, "Note: Ignoring non-mail file: %s\n",
-				 next);
-			break;
-		    /* Fatal issues. Don't process anymore. */
-		    case NOTMUCH_STATUS_READONLY_DATABASE:
-		    case NOTMUCH_STATUS_XAPIAN_EXCEPTION:
-		    case NOTMUCH_STATUS_OUT_OF_MEMORY:
-			fprintf (stderr, "Error: %s. Halting processing.\n",
-				 notmuch_status_to_string (status));
-			ret = status;
-			goto DONE;
-		    default:
-		    case NOTMUCH_STATUS_FILE_ERROR:
-		    case NOTMUCH_STATUS_NULL_POINTER:
-		    case NOTMUCH_STATUS_TAG_TOO_LONG:
-		    case NOTMUCH_STATUS_UNBALANCED_FREEZE_THAW:
-		    case NOTMUCH_STATUS_LAST_STATUS:
-			INTERNAL_ERROR ("add_message returned unexpected value: %d",  status);
-			goto DONE;
-		}
-
-		if (message) {
-		    notmuch_message_destroy (message);
-		    message = NULL;
-		}
-
-		if (do_add_files_print_progress) {
-		    do_add_files_print_progress = 0;
-		    add_files_print_progress (state);
-		}
-	    }
+            state->processed_files++;
+
+            status = notmuch_database_add_message (notmuch, next, &message);
+            switch (status) {
+                /* success */
+                case NOTMUCH_STATUS_SUCCESS:
+                    state->added_messages++;
+                    tag_inbox_and_unread (message);
+                    break;
+		/* Non-fatal issues (go on to next file) */
+                case NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID:
+                    /* Stay silent on this one. */
+                    break;
+                case NOTMUCH_STATUS_FILE_NOT_EMAIL:
+                    fprintf (stderr, "Note: Ignoring non-mail file: %s\n",
+                             next);
+                    break;
+		/* Fatal issues. Don't process anymore. */
+                case NOTMUCH_STATUS_READONLY_DATABASE:
+                case NOTMUCH_STATUS_XAPIAN_EXCEPTION:
+                case NOTMUCH_STATUS_OUT_OF_MEMORY:
+                    fprintf (stderr, "Error: %s. Halting processing.\n",
+                             notmuch_status_to_string (status));
+                    ret = status;
+                    goto DONE;
+                default:
+                case NOTMUCH_STATUS_FILE_ERROR:
+                case NOTMUCH_STATUS_NULL_POINTER:
+                case NOTMUCH_STATUS_TAG_TOO_LONG:
+                case NOTMUCH_STATUS_UNBALANCED_FREEZE_THAW:
+                case NOTMUCH_STATUS_LAST_STATUS:
+                    INTERNAL_ERROR ("add_message returned unexpected value: %d",  status);
+                    goto DONE;
+            }
+
+            if (message) {
+                notmuch_message_destroy (message);
+                message = NULL;
+            }
+
+            if (do_add_files_print_progress) {
+                do_add_files_print_progress = 0;
+                add_files_print_progress (state);
+            }
 	} else if (S_ISDIR (st->st_mode)) {
 	    status = add_files_recursive (notmuch, next, st, state);
 	    if (status && ret == NOTMUCH_STATUS_SUCCESS)
-- 
1.6.3.3
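
For readers skimming the diff, the decision the patch adds to
notmuch_database_add_message for an already-known message ID can be
restated as a standalone C sketch. The enum and function names below are
hypothetical and not part of notmuch; only strcmp and access are real
calls, used the same way as in the patch.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical outcome names for this sketch, not notmuch API. */
typedef enum {
    SEEN_SAME_PATH,   /* database already records this exact path */
    SEEN_DUPLICATE,   /* old path is still readable: a second copy */
    SEEN_RENAMED      /* old path is gone: treat as a move/rename */
} seen_outcome_t;

/* Classify a file whose message ID is already in the database.
 * old_filename is the path stored in the database, filename is the
 * path currently being scanned. */
seen_outcome_t
classify_known_message (const char *old_filename, const char *filename)
{
    assert (old_filename != NULL && filename != NULL);

    if (strcmp (old_filename, filename) == 0)
        return SEEN_SAME_PATH;

    if (access (old_filename, R_OK) == 0)
        return SEEN_DUPLICATE;

    return SEEN_RENAMED;
}
```

Because the test relies on access(), a message whose recorded file was
deleted while another copy exists elsewhere is classified the same way
as a rename.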


* [PATCH (rebased)] Handle message renames in mail spool
  2009-11-26 20:23 [PATCH] Handle message renames in mail spool Mikhail Gusarov
@ 2009-12-02 21:15 ` Mikhail Gusarov
  2009-12-04  0:45   ` Carl Worth
  0 siblings, 1 reply; 21+ messages in thread
From: Mikhail Gusarov @ 2009-12-02 21:15 UTC (permalink / raw)
  To: notmuch

In order to handle message renames the following changes were deemed necessary:

* Mtime check on individual files was disabled. As files may be moved around
without changing their mtime, it's necessary to parse them even if they appear
old in case old message was moved. mtime check on directories was kept as moving
files changes mtime of directory.

* If message being parsed is already found in database under different path,
then this message is considered to be moved, path is updated in database and
this file does not undergo further processing.

Note that after applying this patch notmuch still does not handle copying files
(which is harmless, database will point to the last copy of message found during
'notmuch new') and deleting files (which is more serious, as dangling entries
will show up in searches).

Signed-off-by: Mikhail Gusarov <dottedmag@dottedmag.net>
---
 lib/database.cc |   32 ++++++++++-----
 notmuch-new.c   |  116 ++++++++++++++++++++++++++----------------------------
 2 files changed, 78 insertions(+), 70 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 23ddd4a..45d8fc7 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -993,19 +993,31 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
 	if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
 	    _notmuch_message_set_filename (message, filename);
 	    _notmuch_message_add_term (message, "type", "mail");
-	} else {
-	    ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
-	    goto DONE;
-	}
 
-	ret = _notmuch_database_link_message (notmuch, message, message_file);
-	if (ret)
-	    goto DONE;
+	    ret = _notmuch_database_link_message (notmuch, message, message_file);
+	    if (ret)
+		goto DONE;
 
-	date = notmuch_message_file_get_header (message_file, "date");
-	_notmuch_message_set_date (message, date);
+	    date = notmuch_message_file_get_header (message_file, "date");
+	    _notmuch_message_set_date (message, date);
 
-	_notmuch_message_index_file (message, filename);
+	    _notmuch_message_index_file (message, filename);
+	} else {
+	    const char *old_filename = notmuch_message_get_filename (message);
+	    if (strcmp (old_filename, filename) == 0) {
+		/* We have already seen it */
+		goto DONE;
+	    } else {
+		if (access (old_filename, R_OK) == 0) {
+		    /* old_filename still exists, we've got a duplicate */
+		    ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+		    goto DONE;
+		} else {
+		    /* Message file has been moved/renamed */
+		    _notmuch_message_set_filename (message, filename);
+		}
+	    }
+	}
 
 	_notmuch_message_sync (message);
     } catch (const Xapian::Error &error) {
diff --git a/notmuch-new.c b/notmuch-new.c
index 9d20616..d595fc4 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -217,66 +217,62 @@ add_files_recursive (notmuch_database_t *notmuch,
 	}
 
 	if (S_ISREG (st->st_mode)) {
-	    /* If the file hasn't been modified since the last
-	     * add_files, then we need not look at it. */
-	    if (path_dbtime == 0 || st->st_mtime > path_dbtime) {
-		state->processed_files++;
-
-		if (state->verbose) {
-		    if (state->output_is_a_tty)
-			printf("\r\033[K");
-
-		    printf ("%i/%i: %s",
-			    state->processed_files,
-			    state->total_files,
-			    next);
-
-		    putchar((state->output_is_a_tty) ? '\r' : '\n');
-		    fflush (stdout);
-		}
-
-		status = notmuch_database_add_message (notmuch, next, &message);
-		switch (status) {
-		    /* success */
-		    case NOTMUCH_STATUS_SUCCESS:
-			state->added_messages++;
-			tag_inbox_and_unread (message);
-			break;
-		    /* Non-fatal issues (go on to next file) */
-		    case NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID:
-		        /* Stay silent on this one. */
-			break;
-		    case NOTMUCH_STATUS_FILE_NOT_EMAIL:
-			fprintf (stderr, "Note: Ignoring non-mail file: %s\n",
-				 next);
-			break;
-		    /* Fatal issues. Don't process anymore. */
-		    case NOTMUCH_STATUS_READONLY_DATABASE:
-		    case NOTMUCH_STATUS_XAPIAN_EXCEPTION:
-		    case NOTMUCH_STATUS_OUT_OF_MEMORY:
-			fprintf (stderr, "Error: %s. Halting processing.\n",
-				 notmuch_status_to_string (status));
-			ret = status;
-			goto DONE;
-		    default:
-		    case NOTMUCH_STATUS_FILE_ERROR:
-		    case NOTMUCH_STATUS_NULL_POINTER:
-		    case NOTMUCH_STATUS_TAG_TOO_LONG:
-		    case NOTMUCH_STATUS_UNBALANCED_FREEZE_THAW:
-		    case NOTMUCH_STATUS_LAST_STATUS:
-			INTERNAL_ERROR ("add_message returned unexpected value: %d",  status);
-			goto DONE;
-		}
-
-		if (message) {
-		    notmuch_message_destroy (message);
-		    message = NULL;
-		}
-
-		if (do_add_files_print_progress) {
-		    do_add_files_print_progress = 0;
-		    add_files_print_progress (state);
-		}
+	    state->processed_files++;
+
+	    if (state->verbose) {
+		if (state->output_is_a_tty)
+		    printf("\r\033[K");
+
+		printf ("%i/%i: %s",
+			state->processed_files,
+			state->total_files,
+			next);
+
+		putchar((state->output_is_a_tty) ? '\r' : '\n');
+		fflush (stdout);
+	    }
+
+	    status = notmuch_database_add_message (notmuch, next, &message);
+	    switch (status) {
+		/* success */
+		case NOTMUCH_STATUS_SUCCESS:
+		    state->added_messages++;
+		    tag_inbox_and_unread (message);
+		    break;
+		/* Non-fatal issues (go on to next file) */
+		case NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID:
+		    /* Stay silent on this one. */
+		    break;
+		case NOTMUCH_STATUS_FILE_NOT_EMAIL:
+		    fprintf (stderr, "Note: Ignoring non-mail file: %s\n",
+			     next);
+		    break;
+		/* Fatal issues. Don't process anymore. */
+		case NOTMUCH_STATUS_READONLY_DATABASE:
+		case NOTMUCH_STATUS_XAPIAN_EXCEPTION:
+		case NOTMUCH_STATUS_OUT_OF_MEMORY:
+		    fprintf (stderr, "Error: %s. Halting processing.\n",
+			     notmuch_status_to_string (status));
+		    ret = status;
+		    goto DONE;
+		default:
+		case NOTMUCH_STATUS_FILE_ERROR:
+		case NOTMUCH_STATUS_NULL_POINTER:
+		case NOTMUCH_STATUS_TAG_TOO_LONG:
+		case NOTMUCH_STATUS_UNBALANCED_FREEZE_THAW:
+		case NOTMUCH_STATUS_LAST_STATUS:
+		    INTERNAL_ERROR ("add_message returned unexpected value: %d",  status);
+		    goto DONE;
+	    }
+
+	    if (message) {
+		notmuch_message_destroy (message);
+		message = NULL;
+	    }
+
+	    if (do_add_files_print_progress) {
+		do_add_files_print_progress = 0;
+		add_files_print_progress (state);
 	    }
 	} else if (S_ISDIR (st->st_mode)) {
 	    status = add_files_recursive (notmuch, next, st, state);
-- 
1.6.3.3


* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-02 21:15 ` [PATCH (rebased)] " Mikhail Gusarov
@ 2009-12-04  0:45   ` Carl Worth
  2009-12-04 13:55     ` david
  0 siblings, 1 reply; 21+ messages in thread
From: Carl Worth @ 2009-12-04  0:45 UTC (permalink / raw)
  To: Mikhail Gusarov, notmuch

[-- Attachment #1: Type: text/plain, Size: 3084 bytes --]

On Thu,  3 Dec 2009 03:15:26 +0600, Mikhail Gusarov <dottedmag@dottedmag.net> wrote:
> In order to handle message renames the following changes were deemed
> necessary:

Hi Mikhail,

Thanks for contributing this patch (twice!). I think if I had gotten to
it sooner, I probably would have committed it. But now...

> * Mtime check on individual files was disabled. As files may be moved around
> without changing their mtime, it's necessary to parse them even if they appear
> old in case old message was moved. mtime check on directories was kept as moving
> files changes mtime of directory.

That sounds pretty harsh. I'm having to do a lot of stat() calls already
when new mail arrives. Having to also parse the message ID out of
(roughly, for me) 10000 files every time sounds pretty rough. Fortunately...

> Note that after applying this patch notmuch still does not handle copying files
> (which is harmless, database will point to the last copy of message found during
> 'notmuch new') and deleting files (which is more serious, as dangling entries
> will show up in searches).

Today, Keith and I designed an interface that will support addition,
copying, rename, and deletion of files. And it will be faster than the
existing code with its mtime heuristics.

The complete design is on Keith's laptop right now, and hopefully he'll
appear soon with an implementation. Basically, there are only two new
functions needed in the library (if we got the design right):

	notmuch_directory_t
	notmuch_database_read_directory (notmuch_database_t *database,
	                                 const char *path);

	notmuch_status_t
	notmuch_message_remove_filename (notmuch_message_t *message,
                                         const char *filename);

The notmuch_directory_t object will be used in place of the current
notmuch_database_get_timestamp call in notmuch-new.c. In addition to the
mtime that we currently read from the database, it will provide a list
of all directories and files (along with message IDs) known to the
database for a particular path. So notmuch-new can then quickly compare
the results of scandir with this notmuch_directory_t object and then
call notmuch_database_add_message and notmuch_message_remove_filename as
appropriate.

I'm leaving out details about how to ensure we don't delete a message
too soon if it's actually a rename that will be seen as an added file
later in the scan. Obviously the implementation will need to deal with
that, (either with an additional library call for "I'm done adding
files, go ahead and delete dangling messages", or by postponing all
calls to remove_filename until later).

Oh, and one idea is to do deletion by dropping all indexed terms, but
saving the message ID and any tags in the database. That's small and is
the only precious data, so might be worth holding onto "just in case".

Anyway, I think we'll see code for that soon, so I'm not planning to
commit the offered patch. But people really needing renames might want
to use it for now, (and live with any performance implications it
causes).

-Carl
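
The comparison step described above (scandir results against the names
the database already knows for a directory) might look roughly like the
following. This is only a sketch under assumptions: the function name
and the array-based interface are hypothetical, both lists are assumed
to be sorted in strcmp order, and nothing here reflects the actual
notmuch_directory_t design, which has not been posted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compare two name lists, both sorted in strcmp order:
 *   on_disk: names found by scanning the directory
 *   in_db:   names the database already knows for that directory
 * Names only on disk are appended to 'added' (capacity n_disk),
 * names only in the database to 'removed' (capacity n_db).
 * All names and this interface are hypothetical. */
void
diff_listings (const char **on_disk, size_t n_disk,
               const char **in_db, size_t n_db,
               const char **added, size_t *n_added,
               const char **removed, size_t *n_removed)
{
    size_t i = 0, j = 0;

    assert (n_added != NULL && n_removed != NULL);
    *n_added = 0;
    *n_removed = 0;

    while (i < n_disk || j < n_db) {
        int cmp;

        if (i == n_disk)
            cmp = 1;            /* only database names remain */
        else if (j == n_db)
            cmp = -1;           /* only on-disk names remain */
        else
            cmp = strcmp (on_disk[i], in_db[j]);

        if (cmp < 0) {
            added[(*n_added)++] = on_disk[i++];
        } else if (cmp > 0) {
            removed[(*n_removed)++] = in_db[j++];
        } else {
            i++;
            j++;
        }
    }
}
```

Collecting the removed names instead of acting on them immediately is
one way to postpone deletion until the whole scan has finished, so that
a rename seen later as an added file is not mistaken for a deletion.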

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-04  0:45   ` Carl Worth
@ 2009-12-04 13:55     ` david
  2009-12-04 18:05       ` Carl Worth
  0 siblings, 1 reply; 21+ messages in thread
From: david @ 2009-12-04 13:55 UTC (permalink / raw)
  To: Carl Worth; +Cc: notmuch


At Thu, 03 Dec 2009 16:45:22 -0800, Carl Worth wrote:

> Anyway, I think we'll see code for that soon, so I'm not planning to
> commit the offered patch. But people really needing renames might want
> to use it for now, (and live with any performance implications it
> causes).

I could live with the performance issues, but it seems that it re-tags
every "Processed" file (renamed or not) as inbox.  This brings about
20k messages back into my inbox, which is a bit unusable.  The problem
seems to be that notmuch_database_add_message returns
NOTMUCH_STATUS_SUCCESS whether or not a new message was really added.
I don't know if there is an easy fix for this, or if it is worth
pursuing, given that the patch won't be committed.

d

P.S. do people want to be CC'd on this list, or not?


* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-04 13:55     ` david
@ 2009-12-04 18:05       ` Carl Worth
  2009-12-04 18:07         ` Mikhail Gusarov
  0 siblings, 1 reply; 21+ messages in thread
From: Carl Worth @ 2009-12-04 18:05 UTC (permalink / raw)
  To: david; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 476 bytes --]

On Fri, 04 Dec 2009 09:55:45 -0400, david@tethera.net wrote:
> P.S. do people want to be CC'd on this list, or not?

We don't require subscription to the list, so I recommend CC, yes.

Plus, notmuch already handles duplicate mail just fine, (in that the
user only sees one copy at least). And I tag my mail differently when
one of my addresses appears on the CC list, so I definitely prefer that
people CC me when they want to call my specific attention to a message.

-Carl


[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-04 18:05       ` Carl Worth
@ 2009-12-04 18:07         ` Mikhail Gusarov
  2009-12-04 18:35           ` Carl Worth
  2009-12-04 18:52           ` Michael Alan Dorman
  0 siblings, 2 replies; 21+ messages in thread
From: Mikhail Gusarov @ 2009-12-04 18:07 UTC (permalink / raw)
  To: notmuch

[-- Attachment #1: Type: text/plain, Size: 558 bytes --]


Twas brillig at 10:05:05 04.12.2009 UTC-08 when cworth@cworth.org did gyre and gimble:

 CW> Plus, notmuch already handles duplicate mail just fine, (in that the
 CW> user only sees one copy at least). And I tag my mail differently when
 CW> one of my addresses appears on the CC list, so I definitely prefer that
 CW> people CC me when they want to call my specific attention to a message.

The only problem with Cc is that Mailman suppresses duplicate messages and hence
there is no List-Id: on message.

-- 
  http://fossarchy.blogspot.com/

[-- Attachment #2: Type: application/pgp-signature, Size: 834 bytes --]


* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-04 18:07         ` Mikhail Gusarov
@ 2009-12-04 18:35           ` Carl Worth
  2009-12-04 18:42             ` Mikhail Gusarov
                               ` (2 more replies)
  2009-12-04 18:52           ` Michael Alan Dorman
  1 sibling, 3 replies; 21+ messages in thread
From: Carl Worth @ 2009-12-04 18:35 UTC (permalink / raw)
  To: Mikhail Gusarov, notmuch

[-- Attachment #1: Type: text/plain, Size: 813 bytes --]

On Sat, 05 Dec 2009 00:07:36 +0600, Mikhail Gusarov <dottedmag@dottedmag.net> wrote:
> The only problem with Cc is that Mailman suppresses duplicate messages and hence
> there is no List-Id: on message.

Hey, well notmuch doesn't even index the List-Id: header anyway. [*] ;-)

But the above sounds like the List-Id header is unreliable enough to be
useless. Any reason not to just use something like
to:notmuch@notmuchmail to match messages sent to a list like this one?

I think mailman defaults to not allowing messages with the mailing-list
address implicit (such as in a Bcc) so it seems like matching the list
recipient will be more reliable than hoping the List-Id is always there.

-Carl

[*] Our TODO list does talk about supporting a configuration parameter
for indexing additional headers of interest.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-04 18:35           ` Carl Worth
@ 2009-12-04 18:42             ` Mikhail Gusarov
  2009-12-04 19:09             ` Michael Alan Dorman
  2009-12-16 23:51             ` Bdale Garbee
  2 siblings, 0 replies; 21+ messages in thread
From: Mikhail Gusarov @ 2009-12-04 18:42 UTC (permalink / raw)
  To: Carl Worth; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 908 bytes --]


Twas brillig at 10:35:27 04.12.2009 UTC-08 when cworth@cworth.org did gyre and gimble:

 >> The only problem with Cc is that Mailman suppresses duplicate
 >> messages and hence there is no List-Id: on message.

 CW> But the above sounds like the List-Id header is unreliable enough
 CW> to be useless.  Any reason not to just use something like
 CW> to:notmuch@notmuchmail to match messages sent to a list like this
 CW> one?

Automated processing. I'd go crazy to put all mailing lists' addresses
to .procmailrc instead of simple sorter in sed. But it seems it's the
only reliable way.

 CW> I think mailman defaults to not allowing messages with the
 CW> mailing-list address implicit (such as in a Bcc) so it seems like
 CW> matching the list recipient will be more reliable than hoping the
 CW> List-Id is always there.

Yep. Unfortunately.

-- 
  http://fossarchy.blogspot.com/

[-- Attachment #2: Type: application/pgp-signature, Size: 834 bytes --]


* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-04 18:07         ` Mikhail Gusarov
  2009-12-04 18:35           ` Carl Worth
@ 2009-12-04 18:52           ` Michael Alan Dorman
  2009-12-04 18:55             ` Mikhail Gusarov
  1 sibling, 1 reply; 21+ messages in thread
From: Michael Alan Dorman @ 2009-12-04 18:52 UTC (permalink / raw)
  To: notmuch

[-- Attachment #1: Type: text/plain, Size: 636 bytes --]

On Sat, 05 Dec 2009 00:07:36 +0600
Mikhail Gusarov <dottedmag@dottedmag.net> wrote:

> The only problem with Cc is that Mailman suppresses duplicate
> messages and hence there is no List-Id: on message.

Err, this makes no sense.  How can Mailman have any knowledge of, and
therefore "do anything" to any message that came by way of a CC?

Now, your mail transfer agent might do duplicate suppression, and if
the direct email reaches you before the one that went through the
mailing list, you won't have a copy that includes the list-id header,
but that's an issue on your end, not with the mailing list software.

Mike.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-04 18:52           ` Michael Alan Dorman
@ 2009-12-04 18:55             ` Mikhail Gusarov
  2009-12-04 19:40               ` Michael Alan Dorman
  0 siblings, 1 reply; 21+ messages in thread
From: Mikhail Gusarov @ 2009-12-04 18:55 UTC (permalink / raw)
  To: Michael Alan Dorman; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 484 bytes --]


Twas brillig at 13:52:20 04.12.2009 UTC-05 when mdorman@ironicdesign.com did gyre and gimble:

 MAD> Err, this makes no sense.  How can Mailman have any knowledge of,
 MAD> and therefore "do anything" to any message that came by way of a
 MAD> CC?

for each subscriber:
  if subscriber.email in message.cc:
     continue
  ...
  # delivery
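
The same loop, written out as a small C sketch. It follows the
pseudocode above and is not taken from Mailman's source; the function
name, the plain string arrays, and the exact-match comparison with
strcmp are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy into 'out' (capacity n_subscribers) every subscriber address
 * that does not also appear in the message's Cc list, and return how
 * many were copied.  Addresses are compared with strcmp, which is a
 * simplification made for this sketch. */
size_t
select_list_recipients (const char **subscribers, size_t n_subscribers,
                        const char **cc, size_t n_cc,
                        const char **out)
{
    size_t i, j, n_out = 0;

    assert (out != NULL);

    for (i = 0; i < n_subscribers; i++) {
        int in_cc = 0;

        for (j = 0; j < n_cc; j++) {
            if (strcmp (subscribers[i], cc[j]) == 0) {
                in_cc = 1;
                break;
            }
        }

        if (in_cc)
            continue;   /* subscriber already gets the direct copy */

        out[n_out++] = subscribers[i];
    }

    return n_out;
}
```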

 MAD> Now, your mail transfer agent might do duplicate suppression

No, it does not.

-- 
  http://fossarchy.blogspot.com/

[-- Attachment #2: Type: application/pgp-signature, Size: 834 bytes --]


* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-04 18:35           ` Carl Worth
  2009-12-04 18:42             ` Mikhail Gusarov
@ 2009-12-04 19:09             ` Michael Alan Dorman
  2009-12-05  0:32               ` Carl Worth
  2009-12-05  0:39               ` Carl Worth
  2009-12-16 23:51             ` Bdale Garbee
  2 siblings, 2 replies; 21+ messages in thread
From: Michael Alan Dorman @ 2009-12-04 19:09 UTC (permalink / raw)
  To: notmuch

[-- Attachment #1: Type: text/plain, Size: 1393 bytes --]

> But the above sounds like the List-Id header is unreliable enough to
> be useless.

In my current .sieve setup, I have 93 entries for mailing lists.  87
of them use list-id[1].  3 use list-post.  1 uses 'mailing-list', but
looking at it, could be switched to list-id.  2 use x-mailing-list
(blasted vger.kernel.org).

None of my email gets misfiled, so it seems pretty darn reliable to
me. :)

Now, if you have an MTA that does duplicate suppression based on
message-id, you probably won't see the copy of a message that went to
the list if you're cc:'d on it because the direct copy (sans list-id
header) is likely to arrive first.

I would argue that that's a feature not a bug---the sender, at least,
hopes you will give it closer scrutiny because you were CC:'d.  They're
trying to bring it to your attention.

Besides, in notmuch, what's the difference going to be?  It'll still be
threaded the same, etc., but you'd be able to tell that this one came
to you rather than through the list, no?

(I'm waiting for Debian packages, lazy bastard that I am, so I'm
guessing on that)

> Any reason not to just use something like
> to:notmuch@notmuchmail to match messages sent to a list like this one?

On the linux-kernel list, l-k often isn't in the to: field---or does
notmuch also index the cc: as to:?  If it does, this could work; if
not, FAIL.

Mike.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-04 18:55             ` Mikhail Gusarov
@ 2009-12-04 19:40               ` Michael Alan Dorman
  0 siblings, 0 replies; 21+ messages in thread
From: Michael Alan Dorman @ 2009-12-04 19:40 UTC (permalink / raw)
  To: Mikhail Gusarov; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 774 bytes --]

On Sat, 05 Dec 2009 00:55:20 +0600
Mikhail Gusarov <dottedmag@dottedmag.net> wrote:

> 
> Twas brillig at 13:52:20 04.12.2009 UTC-05 when
> mdorman@ironicdesign.com did gyre and gimble:
> 
>  MAD> Err, this makes no sense.  How can Mailman have any knowledge
>  MAD> of, and therefore "do anything" to any message that came by way
>  MAD> of a CC?
> 
> for each subscriber:
>   if subscriber.email in message.cc:
>      continue
>   ...
>   # delivery

I stand corrected---it seems like a gigantic misfeature to me, so
much so that I checked and apparently that is exactly how Mailman
works in its default configuration.

My apologies for suggesting you didn't know what you were talking
about.  I made the mistake of assuming sane software.

Mike.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-04 19:09             ` Michael Alan Dorman
@ 2009-12-05  0:32               ` Carl Worth
  2009-12-05  0:39               ` Carl Worth
  1 sibling, 0 replies; 21+ messages in thread
From: Carl Worth @ 2009-12-05  0:32 UTC (permalink / raw)
  To: Michael Alan Dorman, notmuch

[-- Attachment #1: Type: text/plain, Size: 1859 bytes --]

On Fri, 4 Dec 2009 14:09:46 -0500, Michael Alan Dorman <mdorman@ironicdesign.com> wrote:
> Now, if you have an MTA that does duplicate suppression based on
> message-id, you probably won't see the copy of a message that went to
> the list if you're cc:'d on it because the direct copy (sans list-id
> header) is likely to arrive first.
> 
> I would argue that that's a feature not a bug---the sender, at least,
> hopes you will give it closer scrutiny because you were CC:'d.  They're
> trying to bring it to your attention.

Sure, giving it closer scrutiny is good. But if I expect a search like:

	tag:lkml

to match all of my mail that came through the mailing list, but it
actually *misses* mail where the sender wanted me to give extra
scrutiny, then that's a big failure.

> Besides, in notmuch, what's the difference going to be?  It'll still be
> threaded the same, etc., but you'd be able to tell that this one came
> to you rather than through the list, no?

The difference is whether the message is found in a search, (see above).

> (I'm waiting for Debian packages, lazy bastard that I am, so I'm
> guessing on that)

Yeah, I'll get to that (real soon now, I promise.)

> On the linux-kernel list, l-k often isn't in the to: field---or does
> notmuch also index the cc: as to:?  If it does, this could work; if
> not, FAIL.

Yes. In notmuch, all recipient fields, (even Bcc: if a mail happens to
hit your mail store with that intact), all get indexed to a single "to"
prefix. My rationale is that when reading a message it's often very
useful to see whether I was addressed specifically or just CC'ed. But
when _searching_ for a message, it's too fragile to have to guess
whether the recipient was on the To: or CC: header (and too painful to
always type (to:me@example.com or cc:me@example.com)).

-Carl
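
As an illustration of folding every recipient header into a single
"to" prefix, here is a minimal C sketch. The function name and the
lookup table are hypothetical and this is not how notmuch's indexing
code is actually organized; it only shows the mapping described above,
with header names compared case-insensitively.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Return the search prefix under which a header's addresses would be
 * indexed, or NULL if the header is not a recipient header.  All
 * three recipient headers share the single prefix "to". */
const char *
prefix_for_header (const char *header)
{
    static const char *recipient_headers[] = { "To", "Cc", "Bcc" };
    size_t i;

    assert (header != NULL);

    for (i = 0;
         i < sizeof (recipient_headers) / sizeof (recipient_headers[0]);
         i++)
    {
        if (strcasecmp (header, recipient_headers[i]) == 0)
            return "to";
    }

    return NULL;
}
```

A search such as to:me@example.com then matches regardless of which of
the three headers carried the address.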

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-04 19:09             ` Michael Alan Dorman
  2009-12-05  0:32               ` Carl Worth
@ 2009-12-05  0:39               ` Carl Worth
  2009-12-05  0:47                 ` Mikhail Gusarov
  2009-12-05  8:51                 ` Marten Veldthuis
  1 sibling, 2 replies; 21+ messages in thread
From: Carl Worth @ 2009-12-05  0:39 UTC (permalink / raw)
  To: Michael Alan Dorman, notmuch

[-- Attachment #1: Type: text/plain, Size: 1788 bytes --]

On Fri, 4 Dec 2009 14:09:46 -0500, Michael Alan Dorman <mdorman@ironicdesign.com> wrote:
> Besides, in notmuch, what's the difference going to be?  It'll still be
> threaded the same, etc., but you'd be able to tell that this one came
> to you rather than through the list, no?

There's one other point I should make here while talking about duplicate
messages, (as determined by identical Message ID).

Currently notmuch just indexes the first version of any given message it
sees, and simply ignores anything else it sees in the future.

We're planning to change it to at least save each of the filenames for
messages with multiple files. That way if some duplicates are deleted,
then notmuch will still be able to find one of the others.
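
A rough sketch of what "save each of the filenames" could look like,
assuming a hypothetical in-memory record (message_files_t and its three
functions are made up for illustration; the real change would live in
the database layer):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one indexed message and every file carrying it. */
typedef struct {
    char **filenames;   /* owned copies of each path */
    size_t count;
} message_files_t;

/* Record another file for the message. Returns 0 on success, -1 if out
 * of memory. A path that is already recorded is not added twice. */
int
message_files_add (message_files_t *m, const char *path)
{
    char **grown;
    char *copy;
    size_t i;

    assert (m != NULL && path != NULL);

    for (i = 0; i < m->count; i++)
        if (strcmp (m->filenames[i], path) == 0)
            return 0;

    copy = malloc (strlen (path) + 1);
    if (copy == NULL)
        return -1;
    strcpy (copy, path);

    grown = realloc (m->filenames, (m->count + 1) * sizeof (char *));
    if (grown == NULL) {
        free (copy);
        return -1;
    }
    m->filenames = grown;
    m->filenames[m->count++] = copy;
    return 0;
}

/* Forget a file that was deleted from the mail store.
 * Returns the number of filenames still recorded. */
size_t
message_files_remove (message_files_t *m, const char *path)
{
    size_t i;

    assert (m != NULL && path != NULL);

    for (i = 0; i < m->count; i++) {
        if (strcmp (m->filenames[i], path) == 0) {
            free (m->filenames[i]);
            /* Shift the remaining entries down so order stays stable. */
            memmove (&m->filenames[i], &m->filenames[i + 1],
                     (m->count - i - 1) * sizeof (char *));
            m->count--;
            break;
        }
    }
    return m->count;
}

/* Any surviving filename, or NULL when every copy is gone. */
const char *
message_files_any (const message_files_t *m)
{
    assert (m != NULL);
    return m->count > 0 ? m->filenames[0] : NULL;
}
```

Starting from an empty record ({ NULL, 0 }), adding two paths and then
removing one still leaves message_files_any() with a usable filename;
only when the last path is removed does it return NULL.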

Also, we could make notmuch index duplicate messages and add any
additional terms found to the document for the message. Currently, that
wouldn't make a big difference since notmuch is only indexing the body
and a few specific headers, (From, Subject, To, Cc, Bcc, Message-ID,
In-Reply-To, References).

So any differences there should be quite minor (a "[LIST]" prefix in
subject? an extra footer in the body?), under the assumption that no
mail files will ever exist with the same message ID but disparate
content.

Now, we have a TODO item to allow for indexing additional headers,
(either by default or by user configuration). Once we start doing that,
it probably will make sense to at least index the duplicates.

But when viewing an actual message, I'm still planning on having notmuch
just return an arbitrary filename from the list of filenames associated
with that message. Does anyone see any problem with that? Can you think
of a case where you'd really care about seeing one or the other of
a particular duplicated message?

-Carl

* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-05  0:39               ` Carl Worth
@ 2009-12-05  0:47                 ` Mikhail Gusarov
  2009-12-05  8:51                 ` Marten Veldthuis
  1 sibling, 0 replies; 21+ messages in thread
From: Mikhail Gusarov @ 2009-12-05  0:47 UTC (permalink / raw)
  To: Carl Worth; +Cc: notmuch

Twas brillig at 16:39:50 04.12.2009 UTC-08 when cworth@cworth.org did gyre and gimble:

 CW> But when viewing an actual message, I'm still planning on having
 CW> notmuch just return an arbitrary filename from the list of
 CW> filenames associated with that message. Does anyone see any problem
 CW> with that? Can you think of a case where you'd really care about
 CW> seeing one or the other of a particular duplicated message?

There might be different Reply-To fields.

So I'd just return bigger dup, as it probably contains more information
:)
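
A small sketch of the "bigger dup" rule, assuming a hypothetical
message_copy_t that pairs a path with its size (for instance as reported
by stat()); pick_copy is an illustrative name, not notmuch API. Ties on
size go to the path that sorts first, so the answer does not depend on
the order in which the copies were found.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one on-disk copy of a message. */
typedef struct {
    const char *path;
    long size;          /* file size in bytes, e.g. from stat() */
} message_copy_t;

/* Pick the largest copy; equal sizes are resolved by taking the path
 * that compares lowest with strcmp. Returns NULL when n is 0. */
const message_copy_t *
pick_copy (const message_copy_t *copies, size_t n)
{
    const message_copy_t *best = NULL;
    size_t i;

    assert (copies != NULL || n == 0);

    for (i = 0; i < n; i++) {
        const message_copy_t *c = &copies[i];

        if (best == NULL
            || c->size > best->size
            || (c->size == best->size && strcmp (c->path, best->path) < 0))
            best = c;
    }
    return best;
}
```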

-- 
  http://fossarchy.blogspot.com/

* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-05  0:39               ` Carl Worth
  2009-12-05  0:47                 ` Mikhail Gusarov
@ 2009-12-05  8:51                 ` Marten Veldthuis
  2009-12-05 22:48                   ` Carl Worth
  1 sibling, 1 reply; 21+ messages in thread
From: Marten Veldthuis @ 2009-12-05  8:51 UTC (permalink / raw)
  To: Carl Worth, Michael Alan Dorman, notmuch

On Fri, 04 Dec 2009 16:39:50 -0800, Carl Worth <cworth@cworth.org> wrote:
> But when viewing an actual message, I'm still planning on having notmuch
> just return an arbitrary filename from the list of filenames associated
> with that message. Does anyone see any problem with that? Can you think
> of a case where you'd really care about seeing one or the other of
> a particular duplicated message?

As long as it's deterministic. But if you don't display the first
filename received, couldn't you exploit this by spoofing message ids?

-- 
- Marten

* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-05  8:51                 ` Marten Veldthuis
@ 2009-12-05 22:48                   ` Carl Worth
  0 siblings, 0 replies; 21+ messages in thread
From: Carl Worth @ 2009-12-05 22:48 UTC (permalink / raw)
  To: Marten Veldthuis, Michael Alan Dorman, notmuch

On Sat, 05 Dec 2009 09:51:58 +0100, Marten Veldthuis <marten@veldthuis.com> wrote:
> On Fri, 04 Dec 2009 16:39:50 -0800, Carl Worth <cworth@cworth.org> wrote:
> > But when viewing an actual message, I'm still planning on having notmuch
> > just return an arbitrary filename from the list of filenames associated
> > with that message. Does anyone see any problem with that? Can you think
> > of a case where you'd really care about seeing one or the other of
> > a particular duplicated message?
> 
> As long as it's deterministic. But if you don't display the first
> filename received, couldn't you exploit this by spoofing message ids?

What it currently does is use the filename of the first file that
notmuch encounters. That's different than "first received", but either
way, there's still a race condition here for active spoofing attempts.

And, yes, actual intentional collisions of message IDs are something I
hadn't given thought to yet. So thanks for bringing that up. It's
definitely a case where you'd want to know and see the difference.

So maybe what we really want to do is to display some full-context diff
of the message by default, and have notmuch learn about differences the
user isn't interested in seeing, (such as mailing-list footers or so).

That sounds workable and should make any spoofing attempt obvious to the
user.

-Carl

* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-04 18:35           ` Carl Worth
  2009-12-04 18:42             ` Mikhail Gusarov
  2009-12-04 19:09             ` Michael Alan Dorman
@ 2009-12-16 23:51             ` Bdale Garbee
  2009-12-17  0:01               ` Mikhail Gusarov
  2009-12-17  1:36               ` Michael Alan Dorman
  2 siblings, 2 replies; 21+ messages in thread
From: Bdale Garbee @ 2009-12-16 23:51 UTC (permalink / raw)
  To: Carl Worth; +Cc: notmuch

On Fri, 2009-12-04 at 10:35 -0800, Carl Worth wrote:

> But the above sounds like the List-Id header is unreliable enough to be
> useless. 

FWIW, that does not match my experience.

> Any reason not to just use something like
> to:notmuch@notmuchmail to match messages sent to a list like this one?

I've had much better luck matching List-Id than matching addresses in
recent years.  YMMV.

Bdale

* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-16 23:51             ` Bdale Garbee
@ 2009-12-17  0:01               ` Mikhail Gusarov
  2009-12-17  2:57                 ` Bdale Garbee
  2009-12-17  1:36               ` Michael Alan Dorman
  1 sibling, 1 reply; 21+ messages in thread
From: Mikhail Gusarov @ 2009-12-17  0:01 UTC (permalink / raw)
  To: Bdale Garbee; +Cc: notmuch

Twas brillig at 16:51:17 16.12.2009 UTC-07 when bdale@gag.com did gyre and gimble:

 >> But the above sounds like the List-Id header is unreliable enough to
 >> be useless.

 BG> FWIW, that does not match my experience.

Yeah. This mail just arrived in my "main" folder instead of "notmuch"
one, as you kept me in CC and hence Mailman did not send the copy with
List-Id to me.

Please read the whole thread.

-- 
  http://fossarchy.blogspot.com/

* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-16 23:51             ` Bdale Garbee
  2009-12-17  0:01               ` Mikhail Gusarov
@ 2009-12-17  1:36               ` Michael Alan Dorman
  1 sibling, 0 replies; 21+ messages in thread
From: Michael Alan Dorman @ 2009-12-17  1:36 UTC (permalink / raw)
  To: notmuch

> I've had much better luck matching List-Id than matching addresses in
> recent years.  YMMV.

As long as you're not CC:d, you're fine.  If you're CC:'d, well, Mailman
is more brain-dead than you could imagine.

Mike.

* Re: [PATCH (rebased)] Handle message renames in mail spool
  2009-12-17  0:01               ` Mikhail Gusarov
@ 2009-12-17  2:57                 ` Bdale Garbee
  0 siblings, 0 replies; 21+ messages in thread
From: Bdale Garbee @ 2009-12-17  2:57 UTC (permalink / raw)
  To: Mikhail Gusarov; +Cc: notmuch

On Thu, 2009-12-17 at 06:01 +0600, Mikhail Gusarov wrote:
> Twas brillig at 16:51:17 16.12.2009 UTC-07 when bdale@gag.com did gyre and gimble:
> 
>  >> But the above sounds like the List-Id header is unreliable enough to
>  >> be useless.
> 
>  BG> FWIW, that does not match my experience.
> 
> Yeah. This mail just arrived in my "main" folder instead of "notmuch"
> one, as you kept me in CC and hence Mailman did not send the copy with
> List-Id to me.
> 
> Please read the whole thread.

I did.  I guess I've just been lucky enough to mostly participate in
lists run with other software than Mailman or whose admins didn't leave
this default behavior in place...  [sigh]

I will, very unhappily, concede the point.

Bdale
