* Slowness (search opens every email file?)
From: Jason Woofenden @ 2011-07-11 19:07 UTC
To: Notmuch Mail

Hi all,

I'm having a great time patching up the vim frontend, but I've got an
issue that is in the backend, and seems far above my head at this
point:

notmuch search tag:foo is slow!

(when my e-mail files are not already in the disk cache)

I saw on my activity monitor applet that it was using mostly i/o,
and started to wonder if it was opening every e-mail. A little work
with strace and searching revealed that this command was opening
many, many e-mail files from my maildir(s). I spent a little while
digging around in the notmuch source, and didn't see where it was
opening the email files.

I don't think the search command should be opening the files.

So my questions:

1) Why is it opening the e-mail files? What information is being read?

2) Do you agree that it should instead get this information from
   the database?

3) How hard would it be to make this fast? What would it take?

4) Who wants to do it?

I'd like it to be able to spit out 1000 threads in under a second.
Preferably under 100ms.

Thank you,

 -- Jason

P.S. I mean really slow...

notmuch search tag:foo took 0.5 seconds for 32 threads

notmuch search foo took 6.4 seconds for 130 threads

Everything's getting into my cache, so I can't easily get lots of
numbers. For a while I had a simple search (tag:foo and tag:bar) which
returned about 600 threads, and it would frequently take seconds.
* Re: Slowness (search opens every email file?)
From: Patrick Totzke @ 2011-07-11 21:58 UTC
To: Jason Woofenden; +Cc: Notmuch Mail

Hi Jason,

On Mon, Jul 11, 2011 at 03:07:21PM -0400, Jason Woofenden wrote:
> notmuch search tag:foo is slow!
>
yes, i've just used the vim ui for the first time and i agree, it's
sluggish, searching for * takes a while.

> (when my e-mail files are not already in the disk cache)
>
> I saw on my activity monitor applet that it was using mostly i/o,
> and started to wonder if it was opening every e-mail. A little work
> with strace and searching revealed that this command was opening
> many, many e-mail files from my maildir(s). I spent a little while
> digging around in the notmuch source, and didn't see where it was
> opening the email files.

I cannot reproduce this. I'm no expert, but at least the output of

    strace vim -c ":NotMuch" 2>log

does not contain any path that matches that of my maildir. Also, I
would be surprised if every individual mail were read, because a
search for all messages feels too fast for that.

> 2) Do you agree that it should instead get this information from
> the database?

Agreed. And if the mail files get read on every search for you,
something is definitely going wrong there.

A quick browse through notmuch.vim tells me that
1) it doesn't use notmuch's json output. I think it should, as iirc
   this api is considered 'more stable' and is easier to parse than
   the default output. More importantly,
2) the output of notmuch is copied into a list. This will of course
   be slow if your query matches a lot of messages. Could this be done
   by asynchronously writing to the buffer somehow?

best,
/p
* Re: Slowness (search opens every email file?)
From: Jason Woofenden @ 2011-07-12 20:10 UTC
To: Notmuch Mail

On 2011-07-11 10:58PM, Patrick Totzke wrote:
> Hi Jason,
> On Mon, Jul 11, 2011 at 03:07:21PM -0400, Jason Woofenden wrote:
> > notmuch search tag:foo is slow!
> >
> yes, i've just used the vim ui for the first time and i agree, it's
> sluggish, searching for * takes a while.

It's not the vim ui that's the bottleneck. The underlying notmuch
search command is slow. I gave examples in my last e-mail:

>> notmuch search tag:foo took 0.5 seconds for 32 threads
>>
>> notmuch search foo took 6.4 seconds for 130 threads

> > (when my e-mail files are not already in the disk cache)
> >
> > I saw on my activity monitor applet that it was using mostly i/o,
> > and started to wonder if it was opening every e-mail. A little work
> > with strace and searching revealed that this command was opening
> > many, many e-mail files from my maildir(s). I spent a little while
> > digging around in the notmuch source, and didn't see where it was
> > opening the email files.
>
> I cannot reproduce this. I'm no expert, but at least the output of
> strace vim -c ":NotMuch" 2>log

:NotMuch just shows the mailboxes. That's fast. It's showing the
contents that is slow. And it's not vim; it's because the notmuch
command is slow.

Try this in a terminal:

    strace notmuch search tag:flagged 2>&1 | grep 'open(.*/cur/'

Of course change the tag if you don't have flagged messages.

> A quick browse through notmuch.vim tells me that
> 1) it doesn't use notmuch's json output. I think it should, as iirc
> this api is considered 'more stable' and is easier to parse than the
> default output. More importantly,

I like this idea. I did some work earlier on improving the message
parsing in the vim ui. Might be better to use the json. I'll look
into json parsing in vim.

> 2) the output of notmuch is copied into a list. This will of course
> be slow if your query matches a lot of messages. Could this be done
> by asynchronously writing to the buffer somehow?

I'm pretty sure vim doesn't do asynchronous anything. That came up
in a vim vs emacs article I read.

Take care,

 - Jason
* Re: Slowness (search opens every email file?)
From: Austin Clements @ 2011-07-11 22:13 UTC
To: Jason Woofenden; +Cc: Notmuch Mail

On Mon, Jul 11, 2011 at 3:07 PM, Jason Woofenden <jason@jasonwoof.com> wrote:
> notmuch search tag:foo is slow!
>
> (when my e-mail files are not already in the disk cache)
>
> I saw on my activity monitor applet that it was using mostly i/o,
> and started to wonder if it was opening every e-mail. A little work
> with strace and searching revealed that this command was opening
> many, many e-mail files from my maildir(s). I spent a little while
> digging around in the notmuch source, and didn't see where it was
> opening the email files.

It is opening every file to get a few headers to display in the search
output. Istvan Marko sent an experimental patch to store these
headers in the database a while ago, though as far as I know there
hasn't been any progress cleaning it up for inclusion:
id:"m3sjsv2kw2.fsf@zsu.kismala.com" .
* Re: Slowness (search opens every email file?)
From: Jason Woofenden @ 2011-07-12 20:24 UTC
To: Notmuch Mail

On 2011-07-11 06:13PM, Austin Clements wrote:
> On Mon, Jul 11, 2011 at 3:07 PM, Jason Woofenden <jason@jasonwoof.com> wrote:
> > notmuch search tag:foo is slow!
> >
> > (when my e-mail files are not already in the disk cache)
> >
> > I saw on my activity monitor applet that it was using mostly i/o,
> > and started to wonder if it was opening every e-mail. A little work
> > with strace and searching revealed that this command was opening
> > many, many e-mail files from my maildir(s). I spent a little while
> > digging around in the notmuch source, and didn't see where it was
> > opening the email files.
>
> It is opening every file to get a few headers to display in the search
> output. Istvan Marko sent an experimental patch to store these
> headers in the database a while ago, though as far as I know there
> hasn't been any progress cleaning it up for inclusion:
> id:"m3sjsv2kw2.fsf@zsu.kismala.com" .

Cool. I suspected it was reading for header files.

I googled the id and found this patchwork link:

http://patchwork.notmuchmail.org/patch/947/

(I didn't see any way to ask mailman for a message id.)

Does someone want to work at this soon or should I try my hand at
it?

Thanks,

 - Jason
* Get mboxes from mailman [Was: Slowness (search opens every email file?)]
From: Uwe Kleine-König @ 2011-07-12 20:35 UTC
To: Notmuch Mail

Hello,

On Tue, Jul 12, 2011 at 04:24:59PM -0400, Jason Woofenden wrote:
> (I didn't see any way to ask mailman for a message id.)

It's possible to let mailman offer mbox downloads parallel to the
"Gzip'd Text" files. You need to set

    PUBLIC_MBOX = Yes;

in your mailman config and restart mailman (i.e. mailmanctl restart).

Best regards
Uwe
* Re: Slowness (search opens every email file?)
From: Austin Clements @ 2011-07-13 0:50 UTC
To: Notmuch Mail, Istvan Marko

On Tue, Jul 12, 2011 at 4:24 PM, Jason Woofenden <jason@jasonwoof.com> wrote:
> On 2011-07-11 06:13PM, Austin Clements wrote:
>> On Mon, Jul 11, 2011 at 3:07 PM, Jason Woofenden <jason@jasonwoof.com> wrote:
>> > notmuch search tag:foo is slow!
>> >
>> > (when my e-mail files are not already in the disk cache)
>> >
>> > I saw on my activity monitor applet that it was using mostly i/o,
>> > and started to wonder if it was opening every e-mail. A little work
>> > with strace and searching revealed that this command was opening
>> > many, many e-mail files from my maildir(s). I spent a little while
>> > digging around in the notmuch source, and didn't see where it was
>> > opening the email files.
>>
>> It is opening every file to get a few headers to display in the search
>> output. Istvan Marko sent an experimental patch to store these
>> headers in the database a while ago, though as far as I know there
>> hasn't been any progress cleaning it up for inclusion:
>> id:"m3sjsv2kw2.fsf@zsu.kismala.com" .
>
> Cool. I suspected it was reading for header files.
>
> I googled the id and found this patchwork link:
>
> http://patchwork.notmuchmail.org/patch/947/
>
> (I didn't see any way to ask mailman for a message id.)

Here's the full thread from Nabble:
http://notmuch.198994.n3.nabble.com/storing-From-and-Subject-in-xapian-td2901262.html

> Does someone want to work at this soon or should I try my hand at
> it?

Istvan, did you make any progress on this patch since the last
version? I seem to recall it just needed general cleanup (code style
and such) and a better answer for backwards compatibility (the
unfortunate " " thing).
* Re: Slowness (search opens every email file?)
From: Istvan Marko @ 2011-07-13 1:55 UTC
To: Austin Clements; +Cc: Notmuch Mail

Austin Clements <amdragon-3s7WtUTddSA@public.gmane.org> writes:

> Istvan, did you make any progress on this patch since the last
> version? I seem to recall it just needed general cleanup (code style
> and such) and a better answer for backwards compatibility (the
> unfortunate " " thing).

I have been using the version that encodes empty headers to " " but
another way to handle this is to simply not set a VALUE for empty
headers and then fall back to the original parsing method for these
messages. Emails without from/subject/message-id headers are not very
common so perhaps this is a good compromise.

Below is the patch without the " " hack.
[-- Attachment #2: notmuch-value2.patch --]
[-- Type: text/x-patch, Size: 3026 bytes --]

diff --git a/lib/database.cc b/lib/database.cc
index 9c2f4ec..63a15bb 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1654,7 +1654,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
             goto DONE;
 
         date = notmuch_message_file_get_header (message_file, "date");
-        _notmuch_message_set_date (message, date);
+        _notmuch_message_set_header_values (message, date, from, subject);
 
         _notmuch_message_index_file (message, filename);
     } else {
diff --git a/lib/message.cc b/lib/message.cc
index d993cde..48a31f5 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -410,6 +410,20 @@ _notmuch_message_ensure_message_file (notmuch_message_t *message)
 const char *
 notmuch_message_get_header (notmuch_message_t *message, const char *header)
 {
+    std::string value;
+
+    // fetch header from the appropriate xapian value field if available
+    if (strcmp(header,"from") == 0)
+        value=message->doc.get_value(NOTMUCH_VALUE_FROM);
+    else if (strcmp(header,"subject") == 0)
+        value=message->doc.get_value (NOTMUCH_VALUE_SUBJECT);
+    else if (strcmp(header,"message-id") == 0)
+        value=message->doc.get_value (NOTMUCH_VALUE_MESSAGE_ID);
+
+    if (!value.empty())
+        return talloc_strdup (message, value.c_str ());
+
+    // otherwise fall back to parsing the file
     _notmuch_message_ensure_message_file (message);
     if (message->message_file == NULL)
         return NULL;
@@ -785,8 +799,10 @@ notmuch_message_set_author (notmuch_message_t *message,
 }
 
 void
-_notmuch_message_set_date (notmuch_message_t *message,
-                           const char *date)
+_notmuch_message_set_header_values (notmuch_message_t *message,
+                                    const char *date,
+                                    const char *from,
+                                    const char *subject)
 {
     time_t time_value;
 
@@ -799,6 +815,8 @@ _notmuch_message_set_date (notmuch_message_t *message,
 
     message->doc.add_value (NOTMUCH_VALUE_TIMESTAMP,
                             Xapian::sortable_serialise (time_value));
+    message->doc.add_value (NOTMUCH_VALUE_FROM, from);
+    message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
 }
 
 /* Synchronize changes made to message->doc out into the database. */
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 02e24ee..2e91afd 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -111,7 +111,9 @@ _internal_error (const char *format, ...) PRINTF_ATTRIBUTE (1, 2);
 
 typedef enum {
     NOTMUCH_VALUE_TIMESTAMP = 0,
-    NOTMUCH_VALUE_MESSAGE_ID
+    NOTMUCH_VALUE_MESSAGE_ID,
+    NOTMUCH_VALUE_FROM,
+    NOTMUCH_VALUE_SUBJECT
 } notmuch_value_t;
 
 /* Xapian (with flint backend) complains if we provide a term longer
@@ -287,9 +289,10 @@ void
 _notmuch_message_ensure_thread_id (notmuch_message_t *message);
 
 void
-_notmuch_message_set_date (notmuch_message_t *message,
-                           const char *date);
-
+_notmuch_message_set_header_values (notmuch_message_t *message,
+                                    const char *date,
+                                    const char *from,
+                                    const char *subject);
 void
 _notmuch_message_sync (notmuch_message_t *message);
 

-- 
Istvan
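Editorial sketch: the fallback Istvan describes (store nothing for an empty header, and parse the mail file only when the stored value reads back empty) can be illustrated outside of notmuch. This is a minimal, self-contained approximation and not the patch itself. `FakeMessage`, `ValueSlot`, `parse_header_from_file`, `get_header` and `run_demo` are hypothetical names invented for the illustration, and a `std::map` stands in for the database's value slots.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for this sketch; none of this is notmuch or Xapian API.
enum ValueSlot { VALUE_TIMESTAMP = 0, VALUE_MESSAGE_ID, VALUE_FROM, VALUE_SUBJECT };

struct FakeMessage {
    std::map<int, std::string> values;                // stands in for database value slots
    std::map<std::string, std::string> file_headers;  // stands in for the mail file on disk
    int file_reads = 0;                               // how many times the "file" was parsed
};

// Slow path: pretend to open and parse the message file.
std::string parse_header_from_file (FakeMessage &msg, const std::string &header)
{
    ++msg.file_reads;
    auto it = msg.file_headers.find (header);
    return it == msg.file_headers.end () ? std::string () : it->second;
}

// Fast path first. An unset slot and an empty stored value look the same
// (an empty string), so both fall through to the file parser.
std::string get_header (FakeMessage &msg, const std::string &header)
{
    int slot = -1;
    if (header == "from")
        slot = VALUE_FROM;
    else if (header == "subject")
        slot = VALUE_SUBJECT;
    else if (header == "message-id")
        slot = VALUE_MESSAGE_ID;

    if (slot >= 0) {
        auto it = msg.values.find (slot);
        if (it != msg.values.end () && ! it->second.empty ())
            return it->second;
    }
    return parse_header_from_file (msg, header);
}

// Fetches two headers when only one of them is cached and returns the
// number of simulated file reads that were needed.
int run_demo ()
{
    FakeMessage msg;
    msg.file_headers["from"] = "sender@example.org";
    msg.file_headers["subject"] = "";               // empty header: nothing cached
    msg.values[VALUE_FROM] = "sender@example.org";

    assert (get_header (msg, "from") == "sender@example.org");  // served from the slot
    assert (get_header (msg, "subject").empty ());              // falls back to the file
    return msg.file_reads;
}
```

Calling `run_demo ()` from a `main` exercises both paths: the cached From header costs no file read, while the empty Subject still pays for one, which is the compromise the message above accepts for the uncommon case of missing headers.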
* Re: Slowness (search opens every email file?)
From: Austin Clements @ 2011-07-13 2:22 UTC
To: Istvan Marko; +Cc: Notmuch Mail

Quoth Istvan Marko on Jul 12 at 6:55 pm:
> Austin Clements <amdragon-3s7WtUTddSA@public.gmane.org> writes:
>
> > Istvan, did you make any progress on this patch since the last
> > version? I seem to recall it just needed general cleanup (code style
> > and such) and a better answer for backwards compatibility (the
> > unfortunate " " thing).
>
> I have been using the version that encodes empty headers to " " but
> another way to handle this is to simply not set a VALUE for empty
> headers and then fall back to the original parsing method for these
> messages. Emails without from/subject/message-id headers are not very
> common so perhaps this is a good compromise.
>
> Below is the patch without the " " hack.

Ah, clever. I was going to suggest adding another value to indicate
the presence or absence of these Xapian values, but I like your
solution better. The only downside I can think of is that it might
not extend to other headers if we store more header values in the
database in the future.

I'd say this patch looks good other than coding style
- Tab indentation
- /* */ comments, starting with a capital letter
- Space between function name and open paren
- Space after comma in argument lists
- Spaces around assignment operator
* Re: Slowness (search opens every email file?)
From: Istvan Marko @ 2011-07-13 3:07 UTC
To: Austin Clements; +Cc: Notmuch Mail

Austin Clements <amdragon@MIT.EDU> writes:

> I'd say this patch looks good other than coding style
> - Tab indentation
> - /* */ comments, starting with a capital letter
> - Space between function name and open paren
> - Space after comma in argument lists
> - Spaces around assignment operator

Thanks, fixed the ones I see:

[-- Attachment #2: notmuch-value3.patch --]
[-- Type: text/x-patch, Size: 3035 bytes --]

diff --git a/lib/database.cc b/lib/database.cc
index 9c2f4ec..63a15bb 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1654,7 +1654,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
             goto DONE;
 
         date = notmuch_message_file_get_header (message_file, "date");
-        _notmuch_message_set_date (message, date);
+        _notmuch_message_set_header_values (message, date, from, subject);
 
         _notmuch_message_index_file (message, filename);
     } else {
diff --git a/lib/message.cc b/lib/message.cc
index d993cde..55070c6 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -410,6 +410,21 @@ _notmuch_message_ensure_message_file (notmuch_message_t *message)
 const char *
 notmuch_message_get_header (notmuch_message_t *message, const char *header)
 {
+    std::string value;
+
+    /* Fetch header from the appropriate xapian value field if
+     * available */
+    if (strcmp(header, "from") == 0)
+        value = message->doc.get_value(NOTMUCH_VALUE_FROM);
+    else if (strcmp(header, "subject") == 0)
+        value = message->doc.get_value (NOTMUCH_VALUE_SUBJECT);
+    else if (strcmp(header, "message-id") == 0)
+        value = message->doc.get_value (NOTMUCH_VALUE_MESSAGE_ID);
+
+    if (!value.empty())
+        return talloc_strdup (message, value.c_str ());
+
+    /* Otherwise fall back to parsing the file */
     _notmuch_message_ensure_message_file (message);
     if (message->message_file == NULL)
         return NULL;
@@ -785,8 +800,10 @@ notmuch_message_set_author (notmuch_message_t *message,
 }
 
 void
-_notmuch_message_set_date (notmuch_message_t *message,
-                           const char *date)
+_notmuch_message_set_header_values (notmuch_message_t *message,
+                                    const char *date,
+                                    const char *from,
+                                    const char *subject)
 {
     time_t time_value;
 
@@ -799,6 +816,8 @@ _notmuch_message_set_date (notmuch_message_t *message,
 
     message->doc.add_value (NOTMUCH_VALUE_TIMESTAMP,
                             Xapian::sortable_serialise (time_value));
+    message->doc.add_value (NOTMUCH_VALUE_FROM, from);
+    message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
 }
 
 /* Synchronize changes made to message->doc out into the database. */
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 02e24ee..2e91afd 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -111,7 +111,9 @@ _internal_error (const char *format, ...) PRINTF_ATTRIBUTE (1, 2);
 
 typedef enum {
     NOTMUCH_VALUE_TIMESTAMP = 0,
-    NOTMUCH_VALUE_MESSAGE_ID
+    NOTMUCH_VALUE_MESSAGE_ID,
+    NOTMUCH_VALUE_FROM,
+    NOTMUCH_VALUE_SUBJECT
 } notmuch_value_t;
 
 /* Xapian (with flint backend) complains if we provide a term longer
@@ -287,9 +289,10 @@ void
 _notmuch_message_ensure_thread_id (notmuch_message_t *message);
 
 void
-_notmuch_message_set_date (notmuch_message_t *message,
-                           const char *date);
-
+_notmuch_message_set_header_values (notmuch_message_t *message,
+                                    const char *date,
+                                    const char *from,
+                                    const char *subject);
 void
 _notmuch_message_sync (notmuch_message_t *message);
 

-- 
Istvan
* Re: Slowness (search opens every email file?)
From: Austin Clements @ 2011-07-13 5:03 UTC
To: Istvan Marko; +Cc: Notmuch Mail

Quoth Istvan Marko on Jul 12 at 8:07 pm:
> Austin Clements <amdragon@MIT.EDU> writes:
>
> > I'd say this patch looks good other than coding style
> > - Tab indentation
> > - /* */ comments, starting with a capital letter
> > - Space between function name and open paren
> > - Space after comma in argument lists
> > - Spaces around assignment operator
>
> Thanks, fixed the ones I see:

+    /* Fetch header from the appropriate xapian value field if
+     * available */
+    if (strcmp(header, "from") == 0)
+        value = message->doc.get_value(NOTMUCH_VALUE_FROM);
+    else if (strcmp(header, "subject") == 0)
+        value = message->doc.get_value (NOTMUCH_VALUE_SUBJECT);
+    else if (strcmp(header, "message-id") == 0)
+        value = message->doc.get_value (NOTMUCH_VALUE_MESSAGE_ID);

The strcmp's should have a space before the paren, as should the first
get_value. (Yeah, it's weird. Blame glib.)

Also, it occurred to me that these should be strcasecmp's, since
headers are case-insensitive.
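Editorial sketch: the last review point (compare header names case-insensitively) might look roughly like the following. It is an illustration under stated assumptions rather than code from the thread. `HeaderSlot`, `slot_for_header` and `self_check` are hypothetical names, and the slot numbers are placeholders; only `strcasecmp` itself (POSIX, declared in `<strings.h>`) is a real library call.

```cpp
#include <cassert>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Hypothetical slot identifiers for this sketch; the values are placeholders. */
enum HeaderSlot { SLOT_NONE = -1, SLOT_MESSAGE_ID = 1, SLOT_FROM, SLOT_SUBJECT };

/* Map a header name to its cached slot, ignoring case, so that
 * "From", "FROM" and "from" all resolve to the same slot. */
HeaderSlot slot_for_header (const char *header)
{
    if (strcasecmp (header, "from") == 0)
        return SLOT_FROM;
    else if (strcasecmp (header, "subject") == 0)
        return SLOT_SUBJECT;
    else if (strcasecmp (header, "message-id") == 0)
        return SLOT_MESSAGE_ID;

    return SLOT_NONE;  /* Not cached: the caller would parse the file */
}

/* Small self-check showing the intended behaviour. */
void self_check ()
{
    assert (slot_for_header ("Subject") == SLOT_SUBJECT);
    assert (slot_for_header ("date") == SLOT_NONE);
}
```

With plain `strcmp`, a caller asking for "Subject" rather than "subject" would miss the cached value and fall back to parsing the file, which is the slow path this thread set out to avoid.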
end of thread (newest message: 2011-07-13 5:03 UTC)

Thread overview: 11+ messages
2011-07-11 19:07 Slowness (search opens every email file?) Jason Woofenden
2011-07-11 21:58 ` Patrick Totzke
2011-07-12 20:10   ` Jason Woofenden
2011-07-11 22:13 ` Austin Clements
2011-07-12 20:24   ` Jason Woofenden
2011-07-12 20:35     ` Get mboxes from mailman [Was: Slowness (search opens every email file?)] Uwe Kleine-König
2011-07-13 0:50     ` Slowness (search opens every email file?) Austin Clements
2011-07-13 1:55       ` Istvan Marko
2011-07-13 2:22         ` Austin Clements
2011-07-13 3:07           ` Istvan Marko
2011-07-13 5:03             ` Austin Clements