* Re: [PATCH] Store "from" and "subject" headers in the database.
2011-11-06 17:17 [PATCH] Store "from" and "subject" headers in the database Austin Clements
@ 2011-11-06 21:07 ` Jani Nikula
2011-11-06 21:59 ` Daniel Schoepe
2011-11-06 22:01 ` Austin Clements
2011-11-06 21:41 ` Daniel Schoepe
` (3 subsequent siblings)
4 siblings, 2 replies; 12+ messages in thread
From: Jani Nikula @ 2011-11-06 21:07 UTC (permalink / raw)
To: Austin Clements, notmuch; +Cc: notmuch
On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This is a rebase and cleanup of Istvan Marko's patch from
> id:m3pqnj2j7a.fsf@zsu.kismala.com
>
> Search retrieves these headers for every message in the search
> results. Previously, this required opening and parsing every message
> file. Storing them directly in the database significantly reduces IO
> and computation, speeding up search by between 50% and 10X.
Hi, sounds good, but...
> Taking full advantage of this requires a database rebuild, but it will
> fall back to the old behavior for messages that do not have headers
> stored in the database.
...what's the most convenient way of rebuilding the database while
preserving my tags etc.? If this was merged, would an older version of
notmuch choke on the rebuilt database with these headers? (To me it
looks like it would be fine.)
BR,
Jani.
> ---
> lib/database.cc | 2 +-
> lib/message.cc | 23 +++++++++++++++++++++--
> lib/notmuch-private.h | 11 +++++++----
> 3 files changed, 29 insertions(+), 7 deletions(-)
>
> diff --git a/lib/database.cc b/lib/database.cc
> index fa632f8..e4ef14e 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -1725,7 +1725,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
> goto DONE;
>
> date = notmuch_message_file_get_header (message_file, "date");
> - _notmuch_message_set_date (message, date);
> + _notmuch_message_set_header_values (message, date, from, subject);
>
> _notmuch_message_index_file (message, filename);
> } else {
> diff --git a/lib/message.cc b/lib/message.cc
> index 8f22e02..ca7fbf2 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -412,6 +412,21 @@ _notmuch_message_ensure_message_file (notmuch_message_t *message)
> const char *
> notmuch_message_get_header (notmuch_message_t *message, const char *header)
> {
> + std::string value;
> +
> + /* Fetch header from the appropriate xapian value field if
> + * available */
> + if (strcasecmp (header, "from") == 0)
> + value = message->doc.get_value (NOTMUCH_VALUE_FROM);
> + else if (strcasecmp (header, "subject") == 0)
> + value = message->doc.get_value (NOTMUCH_VALUE_SUBJECT);
> + else if (strcasecmp (header, "message-id") == 0)
> + value = message->doc.get_value (NOTMUCH_VALUE_MESSAGE_ID);
> +
> + if (!value.empty())
> + return talloc_strdup (message, value.c_str ());
> +
> + /* Otherwise fall back to parsing the file */
> _notmuch_message_ensure_message_file (message);
> if (message->message_file == NULL)
> return NULL;
> @@ -795,8 +810,10 @@ notmuch_message_set_author (notmuch_message_t *message,
> }
>
> void
> -_notmuch_message_set_date (notmuch_message_t *message,
> - const char *date)
> +_notmuch_message_set_header_values (notmuch_message_t *message,
> + const char *date,
> + const char *from,
> + const char *subject)
> {
> time_t time_value;
>
> @@ -809,6 +826,8 @@ _notmuch_message_set_date (notmuch_message_t *message,
>
> message->doc.add_value (NOTMUCH_VALUE_TIMESTAMP,
> Xapian::sortable_serialise (time_value));
> + message->doc.add_value (NOTMUCH_VALUE_FROM, from);
> + message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
> }
>
> /* Synchronize changes made to message->doc out into the database. */
> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> index 0d3cc27..60a932f 100644
> --- a/lib/notmuch-private.h
> +++ b/lib/notmuch-private.h
> @@ -93,7 +93,9 @@ NOTMUCH_BEGIN_DECLS
>
> typedef enum {
> NOTMUCH_VALUE_TIMESTAMP = 0,
> - NOTMUCH_VALUE_MESSAGE_ID
> + NOTMUCH_VALUE_MESSAGE_ID,
> + NOTMUCH_VALUE_FROM,
> + NOTMUCH_VALUE_SUBJECT
> } notmuch_value_t;
>
> /* Xapian (with flint backend) complains if we provide a term longer
> @@ -269,9 +271,10 @@ void
> _notmuch_message_ensure_thread_id (notmuch_message_t *message);
>
> void
> -_notmuch_message_set_date (notmuch_message_t *message,
> - const char *date);
> -
> +_notmuch_message_set_header_values (notmuch_message_t *message,
> + const char *date,
> + const char *from,
> + const char *subject);
> void
> _notmuch_message_sync (notmuch_message_t *message);
>
> --
> 1.7.2.3
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Store "from" and "subject" headers in the database.
2011-11-06 21:07 ` Jani Nikula
@ 2011-11-06 21:59 ` Daniel Schoepe
2011-11-06 22:01 ` Austin Clements
1 sibling, 0 replies; 12+ messages in thread
From: Daniel Schoepe @ 2011-11-06 21:59 UTC (permalink / raw)
To: Jani Nikula, Austin Clements, notmuch; +Cc: notmuch
On Sun, 06 Nov 2011 23:07:51 +0200, Jani Nikula <jani@nikula.org> wrote:
> ...what's the most convenient way of rebuilding the database while
> preserving my tags etc.? If this was merged, would an older version of
> notmuch choke on the rebuilt database with these headers? (To me it
> looks like it would be fine.)
Here's what I did:
notmuch dump > tags.db
rm -rf ~/Maildir/.notmuch
notmuch new
notmuch restore < tags.db
Cheers,
Daniel
* Re: [PATCH] Store "from" and "subject" headers in the database.
2011-11-06 21:07 ` Jani Nikula
2011-11-06 21:59 ` Daniel Schoepe
@ 2011-11-06 22:01 ` Austin Clements
2011-11-06 22:30 ` Jani Nikula
1 sibling, 1 reply; 12+ messages in thread
From: Austin Clements @ 2011-11-06 22:01 UTC (permalink / raw)
To: Jani Nikula; +Cc: notmuch, notmuch
On Sun, Nov 6, 2011 at 4:07 PM, Jani Nikula <jani@nikula.org> wrote:
> On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
>> Taking full advantage of this requires a database rebuild, but it will
>> fall back to the old behavior for messages that do not have headers
>> stored in the database.
>
> ...what's the most convenient way of rebuilding the database while
> preserving my tags etc.? If this was merged, would an older version of
> notmuch choke on the rebuilt database with these headers? (To me it
> looks like it would be fine.)
The standard way to rebuild the database is to do a notmuch dump, move
.notmuch out of the way, notmuch new, then notmuch restore. Some day
this process should be made automatic.
Old versions of notmuch will be blissfully unaware of the new headers
stored in the database. They can even safely add messages to an
upgraded database without breaking new versions of notmuch.
* Re: [PATCH] Store "from" and "subject" headers in the database.
2011-11-06 22:01 ` Austin Clements
@ 2011-11-06 22:30 ` Jani Nikula
0 siblings, 0 replies; 12+ messages in thread
From: Jani Nikula @ 2011-11-06 22:30 UTC (permalink / raw)
To: Austin Clements; +Cc: notmuch, notmuch
On Sun, 6 Nov 2011 17:01:14 -0500, Austin Clements <amdragon@mit.edu> wrote:
> On Sun, Nov 6, 2011 at 4:07 PM, Jani Nikula <jani@nikula.org> wrote:
> > On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> >> Taking full advantage of this requires a database rebuild, but it will
> >> fall back to the old behavior for messages that do not have headers
> >> stored in the database.
> >
> > ...what's the most convenient way of rebuilding the database while
> > preserving my tags etc.? If this was merged, would an older version of
> > notmuch choke on the rebuilt database with these headers? (To me it
> > looks like it would be fine.)
>
> The standard way to rebuild the database is to do a notmuch dump, move
> .notmuch out of the way, notmuch new, then notmuch restore. Some day
> this process should be made automatic.
>
> Old versions of notmuch will be blissfully unaware of the new headers
> stored in the database. They can even safely add messages to an
> upgraded database without breaking new versions of notmuch.
Hi, I ran a quick test with/without the patch. I don't have much mail,
but on my aging laptop the performance increase is significant. See
below. 'du -h' on the .notmuch dir increased from 82M to 83M with the
patch, IMHO well worth it.
BR,
Jani.
WITHOUT THE PATCH:
$ sudo bash -c "/bin/sync; /bin/echo 3 > /proc/sys/vm/drop_caches"
$ time notmuch search "*" | wc -l
8167
real 0m43.216s
user 0m3.860s
sys 0m2.268s
$ time notmuch search "*" | wc -l
8167
real 0m2.762s
user 0m2.196s
sys 0m0.564s
WITH THE PATCH:
$ sudo bash -c "/bin/sync; /bin/echo 3 > /proc/sys/vm/drop_caches"
$ time notmuch search "*" | wc -l
8167
real 0m8.019s
user 0m2.088s
sys 0m0.720s
$ time notmuch search "*" | wc -l
8167
real 0m2.033s
user 0m1.592s
sys 0m0.440s
* Re: [PATCH] Store "from" and "subject" headers in the database.
2011-11-06 17:17 [PATCH] Store "from" and "subject" headers in the database Austin Clements
2011-11-06 21:07 ` Jani Nikula
@ 2011-11-06 21:41 ` Daniel Schoepe
2011-11-11 1:33 ` Pieter Praet
` (2 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: Daniel Schoepe @ 2011-11-06 21:41 UTC (permalink / raw)
To: Austin Clements, notmuch; +Cc: notmuch
On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> Search retrieves these headers for every message in the search
> results. Previously, this required opening and parsing every message
> file. Storing them directly in the database significantly reduces IO
> and computation, speeding up search by between 50% and 10X.
Just tried the patch and I can confirm that, after rebuilding the
database, it makes searches a lot faster.
Cheers,
Daniel
* Re: [PATCH] Store "from" and "subject" headers in the database.
2011-11-06 17:17 [PATCH] Store "from" and "subject" headers in the database Austin Clements
2011-11-06 21:07 ` Jani Nikula
2011-11-06 21:41 ` Daniel Schoepe
@ 2011-11-11 1:33 ` Pieter Praet
2011-11-11 1:38 ` Pieter Praet
2011-11-14 6:34 ` Jameson Graef Rollins
2011-11-14 23:19 ` David Bremner
4 siblings, 1 reply; 12+ messages in thread
From: Pieter Praet @ 2011-11-11 1:33 UTC (permalink / raw)
To: Austin Clements, notmuch; +Cc: notmuch
On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This is a rebase and cleanup of Istvan Marko's patch from
> id:m3pqnj2j7a.fsf@zsu.kismala.com
>
Fantastic performance improvement Austin! This should be merged in ASAP.
BTW, compacting the db from time to time also has a significant impact:
Running:
$ du -h .notmuch
$ sync && sudo /sbin/sysctl vm.drop_caches=3
$ time notmuch search "*" | wc -l
On:
1 - original database, compacted some time ago
2 - fresh database generated before patching, non-compacted
3 - fresh database generated after patching, non-compacted
4 - fresh database generated after patching, compacted with
$ mv .notmuch/xapian .notmuch/xapian-fat
$ xapian-compact --no-renumber .notmuch/xapian-fat .notmuch/xapian
Results:
| db      | 1         | 2        | 3         | 4         |
|---------+-----------+----------+-----------+-----------|
| db size | 272M      | 289M     | 291M      | 172M      |
| results | 9536      | 9540     | 9540      | 9540      |
|---------+-----------+----------+-----------+-----------|
| real    | 1m42.221s | 2m3.193s | 0m30.762s | 0m10.505s |
| user    | 0m8.379s  | 0m8.133s | 0m4.043s  | 0m3.353s  |
| sys     | 0m5.216s  | 0m4.933s | 0m1.530s  | 0m1.000s  |
> Search retrieves these headers for every message in the search
> results. Previously, this required opening and parsing every message
> file. Storing them directly in the database significantly reduces IO
> and computation, speeding up search by between 50% and 10X.
>
> Taking full advantage of this requires a database rebuild, but it will
> fall back to the old behavior for messages that do not have headers
> stored in the database.
Peace
--
Pieter
* Re: [PATCH] Store "from" and "subject" headers in the database.
2011-11-11 1:33 ` Pieter Praet
@ 2011-11-11 1:38 ` Pieter Praet
2011-11-11 3:00 ` Austin Clements
0 siblings, 1 reply; 12+ messages in thread
From: Pieter Praet @ 2011-11-11 1:38 UTC (permalink / raw)
To: Austin Clements, notmuch; +Cc: notmuch
On Fri, 11 Nov 2011 02:33:38 +0100, Pieter Praet <pieter@praet.org> wrote:
> On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> > This is a rebase and cleanup of Istvan Marko's patch from
> > id:m3pqnj2j7a.fsf@zsu.kismala.com
> >
>
> Fantastic performance improvement Austin! [...]
... and Istvan Marko, of course! Thanks!
Peace
--
Pieter
* Re: [PATCH] Store "from" and "subject" headers in the database.
2011-11-11 1:38 ` Pieter Praet
@ 2011-11-11 3:00 ` Austin Clements
0 siblings, 0 replies; 12+ messages in thread
From: Austin Clements @ 2011-11-11 3:00 UTC (permalink / raw)
To: Pieter Praet; +Cc: notmuch, notmuch
Quoth Pieter Praet on Nov 11 at 2:38 am:
> On Fri, 11 Nov 2011 02:33:38 +0100, Pieter Praet <pieter@praet.org> wrote:
> > On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> > > This is a rebase and cleanup of Istvan Marko's patch from
> > > id:m3pqnj2j7a.fsf@zsu.kismala.com
> > >
> >
> > Fantastic performance improvement Austin! [...]
>
> ... and Istvan Marko, of course! Thanks!
Yes. This is really Istvan's patch. I just dug it out of the
archives and cleaned up some whitespace.
* Re: [PATCH] Store "from" and "subject" headers in the database.
2011-11-06 17:17 [PATCH] Store "from" and "subject" headers in the database Austin Clements
` (2 preceding siblings ...)
2011-11-11 1:33 ` Pieter Praet
@ 2011-11-14 6:34 ` Jameson Graef Rollins
2011-11-14 23:19 ` David Bremner
4 siblings, 0 replies; 12+ messages in thread
From: Jameson Graef Rollins @ 2011-11-14 6:34 UTC (permalink / raw)
To: Austin Clements, notmuch; +Cc: notmuch
On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This is a rebase and cleanup of Istvan Marko's patch from
> id:m3pqnj2j7a.fsf@zsu.kismala.com
>
> Search retrieves these headers for every message in the search
> results. Previously, this required opening and parsing every message
> file. Storing them directly in the database significantly reduces IO
> and computation, speeding up search by between 50% and 10X.
Hey, Austin. This is a very nice patch. Short and sweet, a really nice
performance improvement, and a nice gentle fallback.
I just rebuilt my database and I can definitely see the improvements.
Search results are incredibly snappy, and the resultant database is only
about 8% bigger.
I fully endorse this being pushed.
jamie.
* Re: [PATCH] Store "from" and "subject" headers in the database.
2011-11-06 17:17 [PATCH] Store "from" and "subject" headers in the database Austin Clements
` (3 preceding siblings ...)
2011-11-14 6:34 ` Jameson Graef Rollins
@ 2011-11-14 23:19 ` David Bremner
2011-11-15 1:15 ` [PATCH] news: " Austin Clements
4 siblings, 1 reply; 12+ messages in thread
From: David Bremner @ 2011-11-14 23:19 UTC (permalink / raw)
To: Austin Clements, notmuch; +Cc: notmuch
On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This is a rebase and cleanup of Istvan Marko's patch from
> id:m3pqnj2j7a.fsf@zsu.kismala.com
>
Pushed. Would you mind making a NEWS patch?
d
* [PATCH] news: Store "from" and "subject" headers in the database.
2011-11-14 23:19 ` David Bremner
@ 2011-11-15 1:15 ` Austin Clements
0 siblings, 0 replies; 12+ messages in thread
From: Austin Clements @ 2011-11-15 1:15 UTC (permalink / raw)
To: notmuch
---
NEWS | 15 +++++++++++++++
1 files changed, 15 insertions(+), 0 deletions(-)
diff --git a/NEWS b/NEWS
index 71c7c9a..88f7b20 100644
--- a/NEWS
+++ b/NEWS
@@ -23,6 +23,21 @@ Add search terms to "notmuch dump"
search/show/tag. The output file argument of dump is deprecated in
favour of using stdout.
+Optimizations
+-------------
+
+Search avoids opening and parsing message files
+
+ We now store more information in the database so search no longer
+ has to open every message file to get basic headers. This can
+ improve search speed by as much as 10X, but taking advantage of this
+ requires a database rebuild:
+
+ notmuch dump > notmuch.dump
+ # Backup, then remove notmuch database ($MAIL/.notmuch)
+ notmuch new
+ notmuch restore notmuch.dump
+
Notmuch 0.9 (2011-10-01)
========================
--
1.7.7.1