* Re: [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run
@ 2011-01-28 0:08 Austin Clements
2011-01-28 8:51 ` Sebastian Spaeth
0 siblings, 1 reply; 11+ messages in thread
From: Austin Clements @ 2011-01-28 0:08 UTC (permalink / raw)
To: Carl Worth; +Cc: notmuch
[-- Attachment #1: Type: text/plain, Size: 878 bytes --]
Looks like eagerly synchronizing tags is easy and works fine, though I need
to beef up the tests and put some transactions around things before I'm
satisfied.
I added a "notmuch_database_remove_message_get" to the public API that's
just like "notmuch_database_remove_message" except that it also returns a
notmuch_message_t if other instances of the message still exist. It feels
clunky to have two almost identical variants of this function. Is this the
preferred way to change the public API? Or should I just add the argument
to the existing function and fix the other three calls to it?
On Thu, Jan 27, 2011 at 12:43 AM, Austin Clements <amdragon@mit.edu> wrote:
> Sure. I've been wanting to take a crack at notmuch new's atomicity for a
> while. Though you'll have to get through some of my outstanding patches. I
> can only keep so many branches in my head. ]:--8)
>
[-- Attachment #2: Type: text/html, Size: 1179 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run 2011-01-28 0:08 [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run Austin Clements @ 2011-01-28 8:51 ` Sebastian Spaeth 0 siblings, 0 replies; 11+ messages in thread From: Sebastian Spaeth @ 2011-01-28 8:51 UTC (permalink / raw) To: Austin Clements, Carl Worth; +Cc: notmuch [-- Attachment #1: Type: text/plain, Size: 779 bytes --] On Thu, 27 Jan 2011 19:08:38 -0500, Austin Clements <amdragon@mit.edu> wrote: > I added a "notmuch_database_remove_message_get" to the public API that's > just like "notmuch_database_remove_message" except that it also returns a > notmuch_message_t if other instances of the message still exist. It feels > clunky to have two almost identical variants of this function. Is this the > preferred way to change the public API? Or should I just add the argument > to the existing function and fix the other three calls to it? Just adding an argument to the public API without library version bump would break my python bindings. So if we modify the public API, I would prefer if we rename the function and remove the old one completely. That I could detect at least. Sebastian [-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 0/3] Speedups and enhancements of notmuch new @ 2011-01-21 9:59 Michal Sojka 2011-01-21 9:59 ` [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run Michal Sojka 0 siblings, 1 reply; 11+ messages in thread From: Michal Sojka @ 2011-01-21 9:59 UTC (permalink / raw) To: notmuch Hi all, after yesterday discussion on IRC I looked at some ways how to speedup the initial run of notmuch new. It turned out that with a few pretty simple changes I'm able to reduce the time by about 30% (from 1h 46m to 1h 14m on ext4, 200k messages, 3GB). -Michal Michal Sojka (3): new: Do not defer maildir flag synchronization during the first run new: Add all initial tags at once new: Enhance progress reporting notmuch-new.c | 194 +++++++++++++++++++++++++++++++++++++-------------------- 1 files changed, 126 insertions(+), 68 deletions(-) -- 1.7.2.3 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run 2011-01-21 9:59 [PATCH 0/3] Speedups and enhancements of notmuch new Michal Sojka @ 2011-01-21 9:59 ` Michal Sojka 2011-01-25 22:42 ` Austin Clements 0 siblings, 1 reply; 11+ messages in thread From: Michal Sojka @ 2011-01-21 9:59 UTC (permalink / raw) To: notmuch When notmuch new is run for the first time, it is not necessary to defer maildir flags synchronization to later because we already know that no files will be removed. Performing the maildinr flag synchronization immediately after the message is added to the database has the advantage that the message is likely hot in the disk cache so the synchronization is faster. Additionally, we also save one database query for each message, which must be performed when the operation is deferred. Without this patchi, the first notmuch new of 200k messages (3 GB) took 1h and 46m out of which 20m was maildir flags synchronization. With this patch, the whole operation took only 1h and 36m. --- notmuch-new.c | 36 ++++++++++++++++++++++++++---------- 1 files changed, 26 insertions(+), 10 deletions(-) diff --git a/notmuch-new.c b/notmuch-new.c index cdf8513..a2af045 100644 --- a/notmuch-new.c +++ b/notmuch-new.c @@ -420,19 +420,35 @@ add_files_recursive (notmuch_database_t *notmuch, state->added_messages++; for (tag=state->new_tags; *tag != NULL; tag++) notmuch_message_add_tag (message, *tag); - /* Defer sync of maildir flags until after old filenames - * are removed in the case of a rename. */ - if (state->synchronize_flags == TRUE) - _filename_list_add (state->message_ids_to_sync, - notmuch_message_get_message_id (message)); + if (state->synchronize_flags == TRUE) { + if (!state->total_files) { + /* Defer sync of maildir flags until after old filenames + * are removed in the case of a rename. */ + _filename_list_add (state->message_ids_to_sync, + notmuch_message_get_message_id (message)); + } else { + /* During the first notmuch new we synchronize + * flags immediately, while the message is hot in + * disk cache. */ + notmuch_message_maildir_flags_to_tags (message); + } + } break; /* Non-fatal issues (go on to next file) */ case NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID: - /* Defer sync of maildir flags until after old filenames - * are removed in the case of a rename. */ - if (state->synchronize_flags == TRUE) - _filename_list_add (state->message_ids_to_sync, - notmuch_message_get_message_id (message)); + if (state->synchronize_flags == TRUE) { + if (!state->total_files) { + /* Defer sync of maildir flags until after old filenames + * are removed in the case of a rename. */ + _filename_list_add (state->message_ids_to_sync, + notmuch_message_get_message_id (message)); + } else { + /* During the first notmuch new we synchronize + * flags immediately, while the message is hot in + * disk cache. */ + notmuch_message_maildir_flags_to_tags (message); + } + } break; case NOTMUCH_STATUS_FILE_NOT_EMAIL: fprintf (stderr, "Note: Ignoring non-mail file: %s\n", -- 1.7.2.3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run 2011-01-21 9:59 ` [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run Michal Sojka @ 2011-01-25 22:42 ` Austin Clements 2011-01-26 9:15 ` Carl Worth 0 siblings, 1 reply; 11+ messages in thread From: Austin Clements @ 2011-01-25 22:42 UTC (permalink / raw) To: Michal Sojka; +Cc: notmuch [-- Attachment #1: Type: text/plain, Size: 5258 bytes --] Wouldn't this be simpler and more general? --- a/notmuch-new.c +++ b/notmuch-new.c @@ -419,12 +419,11 @@ add_files_recursive (notmuch_database_t *notmuch, case NOTMUCH_STATUS_SUCCESS: state->added_messages++; for (tag=state->new_tags; *tag != NULL; tag++) notmuch_message_add_tag (message, *tag); /* Defer sync of maildir flags until after old filenames * are removed in the case of a rename. */ if (state->synchronize_flags == TRUE) - _filename_list_add (state->message_ids_to_sync, - notmuch_message_get_message_id (message)); + notmuch_message_maildir_flags_to_tags (message); break; /* Non-fatal issues (go on to next file) */ case NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID: The idea is that, if notmuch_database_add_message returns NOTMUCH_STATUS_SUCCESS, then we know this is a new message (and not a rename or anything complicated) and thus might as well perform the flag synchronization immediately. If it returns NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID, then it could be a rename (or something more complicated), and so we defer the flag synchronization like usual. This works for any new messages, regardless of whether this is the initial import or not. I believe my reasoning is correct. At least, it passes the maildir sync test cases, so if it isn't correct, then we need more maildir sync tests. On Fri, Jan 21, 2011 at 4:59 AM, Michal Sojka <sojkam1@fel.cvut.cz> wrote: > When notmuch new is run for the first time, it is not necessary to defer > maildir flags synchronization to later because we already know that no > files will be removed. > > Performing the maildinr flag synchronization immediately after the > message is added to the database has the advantage that the message is > likely hot in the disk cache so the synchronization is faster. > Additionally, we also save one database query for each message, which > must be performed when the operation is deferred. > > Without this patchi, the first notmuch new of 200k messages (3 GB) took > 1h and 46m out of which 20m was maildir flags synchronization. With this > patch, the whole operation took only 1h and 36m. > --- > notmuch-new.c | 36 ++++++++++++++++++++++++++---------- > 1 files changed, 26 insertions(+), 10 deletions(-) > > diff --git a/notmuch-new.c b/notmuch-new.c > index cdf8513..a2af045 100644 > --- a/notmuch-new.c > +++ b/notmuch-new.c > @@ -420,19 +420,35 @@ add_files_recursive (notmuch_database_t *notmuch, > state->added_messages++; > for (tag=state->new_tags; *tag != NULL; tag++) > notmuch_message_add_tag (message, *tag); > - /* Defer sync of maildir flags until after old filenames > - * are removed in the case of a rename. */ > - if (state->synchronize_flags == TRUE) > - _filename_list_add (state->message_ids_to_sync, > - notmuch_message_get_message_id > (message)); > + if (state->synchronize_flags == TRUE) { > + if (!state->total_files) { > + /* Defer sync of maildir flags until after old > filenames > + * are removed in the case of a rename. */ > + _filename_list_add (state->message_ids_to_sync, > + notmuch_message_get_message_id > (message)); > + } else { > + /* During the first notmuch new we synchronize > + * flags immediately, while the message is hot in > + * disk cache. */ > + notmuch_message_maildir_flags_to_tags (message); > + } > + } > break; > /* Non-fatal issues (go on to next file) */ > case NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID: > - /* Defer sync of maildir flags until after old filenames > - * are removed in the case of a rename. */ > - if (state->synchronize_flags == TRUE) > - _filename_list_add (state->message_ids_to_sync, > - notmuch_message_get_message_id > (message)); > + if (state->synchronize_flags == TRUE) { > + if (!state->total_files) { > + /* Defer sync of maildir flags until after old > filenames > + * are removed in the case of a rename. */ > + _filename_list_add (state->message_ids_to_sync, > + notmuch_message_get_message_id > (message)); > + } else { > + /* During the first notmuch new we synchronize > + * flags immediately, while the message is hot in > + * disk cache. */ > + notmuch_message_maildir_flags_to_tags (message); > + } > + } > break; > case NOTMUCH_STATUS_FILE_NOT_EMAIL: > fprintf (stderr, "Note: Ignoring non-mail file: %s\n", > -- > 1.7.2.3 > > _______________________________________________ > notmuch mailing list > notmuch@notmuchmail.org > http://notmuchmail.org/mailman/listinfo/notmuch > [-- Attachment #2: Type: text/html, Size: 6260 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run 2011-01-25 22:42 ` Austin Clements @ 2011-01-26 9:15 ` Carl Worth 2011-01-26 11:59 ` Carl Worth 0 siblings, 1 reply; 11+ messages in thread From: Carl Worth @ 2011-01-26 9:15 UTC (permalink / raw) To: Austin Clements, Michal Sojka; +Cc: notmuch [-- Attachment #1: Type: text/plain, Size: 1893 bytes --] On Tue, 25 Jan 2011 17:42:30 -0500, Austin Clements <amdragon@mit.edu> wrote: > Wouldn't this be simpler and more general? ... > case NOTMUCH_STATUS_SUCCESS: .. > if (state->synchronize_flags == TRUE) > - _filename_list_add (state->message_ids_to_sync, > - notmuch_message_get_message_id (message)); > + notmuch_message_maildir_flags_to_tags (message); Yes, that is much simpler and should work equally well as the original patch. But there's perhaps a problem with both of these patches. Besides rename, (which obviously can't happen with a new message), we also need to take care when a message is added with multiple filenames (and with different flags on the files). We've got a plan for adding flags-to-tags mappings which only apply if every file for the message has the corresponding flag. For example, this is the semantic we want for the 'D' flag mapping to the "deleted" tag. So we'll want to make sure these cases do the right thing. Consider two new files with the same Message-Id both appearing in a run of "notmuch new", one with the D flag and one without. If the file with the D flag is seen first, and the maildir_flags_to_tags processing happens without being deferred, then the "deleted" tag will be applied to the message. This is different than would happen if both messages were seen, but I think it's just fine. It's still in a state that's consistent, nothing bad would happen if you interrupted this and then acted on the "deleted" tag, and if you restarted "notmuch new" and the second message were seen, then the tag would be correctly removed. So, I think I've convinced myself that the change is actually OK. But then I'm also wondering if perhaps we could do the processing undeferred in all cases? I haven't thought that through, but I'd be glad to hear your ideas. -Carl [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run 2011-01-26 9:15 ` Carl Worth @ 2011-01-26 11:59 ` Carl Worth 2011-01-26 15:07 ` Austin Clements 0 siblings, 1 reply; 11+ messages in thread From: Carl Worth @ 2011-01-26 11:59 UTC (permalink / raw) To: Austin Clements, Michal Sojka; +Cc: notmuch [-- Attachment #1: Type: text/plain, Size: 1040 bytes --] On Wed, 26 Jan 2011 19:15:21 +1000, Carl Worth <cworth@cworth.org> wrote: > Yes, that is much simpler and should work equally well as the original > patch. ... > So, I think I've convinced myself that the change is actually OK. For those reasons, I'm pushing the patch now. > But then I'm also wondering if perhaps we could do the processing undeferred > in all cases? > > I haven't thought that through, but I'd be glad to hear your ideas. This is still an open question if anyone wants to pursue it, (it might make it simpler to then fix the atomicity bugs with adding new messages to the database, and only later adjusting the maildir filename). On that topic, if we decide we do need to defer the tags/flags mapping, then the real fix is to probably also defer the addition of the filename to the message document in the database. If we change these things only at the same time, then we should be able to avoid any problems with things getting out of synchronization. -Carl -- carl.d.worth@intel.com [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run 2011-01-26 11:59 ` Carl Worth @ 2011-01-26 15:07 ` Austin Clements 2011-01-26 16:50 ` Michal Sojka 2011-01-27 5:04 ` Carl Worth 0 siblings, 2 replies; 11+ messages in thread From: Austin Clements @ 2011-01-26 15:07 UTC (permalink / raw) To: Carl Worth; +Cc: notmuch Quoth Carl Worth on Jan 26 at 9:59 pm: > On Wed, 26 Jan 2011 19:15:21 +1000, Carl Worth <cworth@cworth.org> wrote: > > But then I'm also wondering if perhaps we could do the processing undeferred > > in all cases? > > > > I haven't thought that through, but I'd be glad to hear your ideas. > > This is still an open question if anyone wants to pursue it, (it might > make it simpler to then fix the atomicity bugs with adding new messages > to the database, and only later adjusting the maildir filename). I believe you're right that synchronization could always be done eagerly. In fact, I believe this is true even with the addition of conjunctive and disjunctive flags. When adding or removing messages, flag synchronization fully dictates all tags in flag2tag (that is, the tags' current states don't matter). It also always examines *all* file names backing the message. This is a stateless process. With eager synchronization, the tags may go through some transient states during notmuch new, but will always settle down to the correct set as of the final add or remove for a given message ID. > On that topic, if we decide we do need to defer the tags/flags mapping, > then the real fix is to probably also defer the addition of the filename > to the message document in the database. If we change these things only > at the same time, then we should be able to avoid any problems with > things getting out of synchronization. I'd been thinking about this as a way to break up notmuch new into small, consistent transactions, but I don't think it's necessary since flags can be synchronized eagerly. In fact, with eager synchronization and one other change, I believe notmuch new can be completely correctly performed in lots of small transactions. The one other change is that currently, if notmuch is interrupted after updating a directory mtime but before processing file removals, a subsequent notmuch new will miss those removals. This could be brushed under the rug, since notmuch will notice the removal after any other change in that directory. Or it could be handled correctly by giving directories a "removal mtime" in addition to their current "addition mtime" that is only updated after removals are handled. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run 2011-01-26 15:07 ` Austin Clements @ 2011-01-26 16:50 ` Michal Sojka 2011-01-27 5:04 ` Carl Worth 1 sibling, 0 replies; 11+ messages in thread From: Michal Sojka @ 2011-01-26 16:50 UTC (permalink / raw) To: Austin Clements, Carl Worth; +Cc: notmuch On Wed, 26 Jan 2011, Austin Clements wrote: > The one other change is that currently, if notmuch is interrupted > after updating a directory mtime but before processing file removals, > a subsequent notmuch new will miss those removals. This could be > brushed under the rug, since notmuch will notice the removal after any > other change in that directory. Or it could be handled correctly by > giving directories a "removal mtime" in addition to their current > "addition mtime" that is only updated after removals are handled. Yes, that might be a problem if notmuch new is interrupted by something like SIGKILL or power-off. There is protection against interruption by Ctrl-C. -Michal ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run 2011-01-26 15:07 ` Austin Clements 2011-01-26 16:50 ` Michal Sojka @ 2011-01-27 5:04 ` Carl Worth 2011-01-27 5:43 ` Austin Clements 1 sibling, 1 reply; 11+ messages in thread From: Carl Worth @ 2011-01-27 5:04 UTC (permalink / raw) To: Austin Clements; +Cc: notmuch [-- Attachment #1: Type: text/plain, Size: 498 bytes --] On Wed, 26 Jan 2011 10:07:28 -0500, Austin Clements <amdragon@MIT.EDU> wrote: > Quoth Carl Worth on Jan 26 at 9:59 pm: > > I believe you're right that synchronization could always be done > eagerly. In fact, I believe this is true even with the addition of > conjunctive and disjunctive flags. Excellent! Want to take a whack at implementing that? And seeing if you can find any test cases that stress this more than existing test cases might? -Carl -- carl.d.worth@intel.com [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run 2011-01-27 5:04 ` Carl Worth @ 2011-01-27 5:43 ` Austin Clements 2011-01-30 0:21 ` Rob Browning 0 siblings, 1 reply; 11+ messages in thread From: Austin Clements @ 2011-01-27 5:43 UTC (permalink / raw) To: Carl Worth; +Cc: notmuch Sure. I've been wanting to take a crack at notmuch new's atomicity for a while. Though you'll have to get through some of my outstanding patches. I can only keep so many branches in my head. ]:--8) rlb, you expressed an interest in solving this problem, too. Did you make any headway? "Carl Worth" <cworth@cworth.org> wrote: >On Wed, 26 Jan 2011 10:07:28 -0500, Austin Clements <amdragon@MIT.EDU> >wrote: >> Quoth Carl Worth on Jan 26 at 9:59 pm: >> >> I believe you're right that synchronization could always be done >> eagerly. In fact, I believe this is true even with the addition of >> conjunctive and disjunctive flags. > >Excellent! Want to take a whack at implementing that? And seeing if you >can find any test cases that stress this more than existing test cases >might? > >-Carl > >-- >carl.d.worth@intel.com -- Sent from my Android. Please excuse my brevity. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run 2011-01-27 5:43 ` Austin Clements @ 2011-01-30 0:21 ` Rob Browning 0 siblings, 0 replies; 11+ messages in thread From: Rob Browning @ 2011-01-30 0:21 UTC (permalink / raw) To: Austin Clements; +Cc: notmuch Austin Clements <amdragon@MIT.EDU> writes: > Sure. I've been wanting to take a crack at notmuch new's atomicity for > a while. Though you'll have to get through some of my outstanding > patches. I can only keep so many branches in my head. ]:--8) > > rlb, you expressed an interest in solving this problem, too. Did you > make any headway? No, I haven't done anything there yet. Thanks -- Rob Browning rlb @defaultvalue.org and @debian.org GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2011-01-30 0:21 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-01-28 0:08 [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run Austin Clements 2011-01-28 8:51 ` Sebastian Spaeth -- strict thread matches above, loose matches on Subject: below -- 2011-01-21 9:59 [PATCH 0/3] Speedups and enhancements of notmuch new Michal Sojka 2011-01-21 9:59 ` [PATCH 1/3] new: Do not defer maildir flag synchronization during the first run Michal Sojka 2011-01-25 22:42 ` Austin Clements 2011-01-26 9:15 ` Carl Worth 2011-01-26 11:59 ` Carl Worth 2011-01-26 15:07 ` Austin Clements 2011-01-26 16:50 ` Michal Sojka 2011-01-27 5:04 ` Carl Worth 2011-01-27 5:43 ` Austin Clements 2011-01-30 0:21 ` Rob Browning
Code repositories for project(s) associated with this public inbox https://yhetil.org/notmuch.git/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).