From: Pieter Praet <pieter@praet.org>
To: Austin Clements <amdragon@MIT.EDU>, notmuch@notmuchmail.org
Cc: notmuch@kismala.com
Subject: Re: [PATCH] Store "from" and "subject" headers in the database.
Date: Fri, 11 Nov 2011 02:33:38 +0100
Message-ID: <87obwjtpcd.fsf@praet.org>
In-Reply-To: <1320599856-24078-1-git-send-email-amdragon@mit.edu>
On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This is a rebase and cleanup of Istvan Marko's patch from
> id:m3pqnj2j7a.fsf@zsu.kismala.com
>
Fantastic performance improvement, Austin! This should be merged ASAP.
BTW, compacting the database from time to time also has a significant impact:
Running:
  $ du -h .notmuch
  $ sync && sudo /sbin/sysctl vm.drop_caches=3
  $ time notmuch search "*" | wc -l
On:
  1 - original database, compacted some time ago
  2 - fresh database generated before patching, non-compacted
  3 - fresh database generated after patching, non-compacted
  4 - fresh database generated after patching, compacted with:
        $ mv .notmuch/xapian .notmuch/xapian-fat
        $ xapian-compact --no-renumber .notmuch/xapian-fat .notmuch/xapian
Results:
| db      | 1         | 2        | 3         | 4         |
|---------+-----------+----------+-----------+-----------|
| db size | 272M      | 289M     | 291M      | 172M      |
| amount  | 9536      | 9540     | 9540      | 9540      |
|---------+-----------+----------+-----------+-----------|
| real    | 1m42.221s | 2m3.193s | 0m30.762s | 0m10.505s |
| user    | 0m8.379s  | 0m8.133s | 0m4.043s  | 0m3.353s  |
| sys     | 0m5.216s  | 0m4.933s | 0m1.530s  | 0m1.000s  |
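For anyone who prefers ratios, the "real" row can be turned into speedup factors. What follows is only a sketch in C++: parse_duration and speedup are hypothetical helper names (nothing in notmuch), and it assumes the "XmY.YYYs" format printed by the shell's time builtin. Relative to database 2, database 3 works out to roughly 4.0x and database 4 to roughly 11.7x; the latter figure combines the effect of the patch with that of compaction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Convert a `time`-style duration such as "1m42.221s" into seconds.
// Assumes exactly one 'm' followed later by one 's' (hypothetical helper).
double parse_duration(const std::string &text)
{
    const std::size_t m = text.find('m');
    const std::size_t s = text.find('s');
    assert(m != std::string::npos && s != std::string::npos && m < s);

    const double minutes = std::stod(text.substr(0, m));
    const double seconds = std::stod(text.substr(m + 1, s - m - 1));
    return minutes * 60.0 + seconds;
}

// Speedup factor of `after` relative to `before`, both in `time` format.
double speedup(const std::string &before, const std::string &after)
{
    const double after_s = parse_duration(after);
    assert(after_s > 0.0);
    return parse_duration(before) / after_s;
}
```

For example, speedup("2m3.193s", "0m30.762s") compares database 2 against database 3, and speedup("2m3.193s", "0m10.505s") compares database 2 against database 4.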
> Search retrieves these headers for every message in the search
> results. Previously, this required opening and parsing every message
> file. Storing them directly in the database significantly reduces IO
> and computation, speeding up search by between 50% and 10X.
>
> Taking full advantage of this requires a database rebuild, but it will
> fall back to the old behavior for messages that do not have headers
> stored in the database.
> ---
> lib/database.cc | 2 +-
> lib/message.cc | 23 +++++++++++++++++++++--
> lib/notmuch-private.h | 11 +++++++----
> 3 files changed, 29 insertions(+), 7 deletions(-)
>
> diff --git a/lib/database.cc b/lib/database.cc
> index fa632f8..e4ef14e 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -1725,7 +1725,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
> goto DONE;
>
> date = notmuch_message_file_get_header (message_file, "date");
> - _notmuch_message_set_date (message, date);
> + _notmuch_message_set_header_values (message, date, from, subject);
>
> _notmuch_message_index_file (message, filename);
> } else {
> diff --git a/lib/message.cc b/lib/message.cc
> index 8f22e02..ca7fbf2 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -412,6 +412,21 @@ _notmuch_message_ensure_message_file (notmuch_message_t *message)
> const char *
> notmuch_message_get_header (notmuch_message_t *message, const char *header)
> {
> + std::string value;
> +
> + /* Fetch header from the appropriate xapian value field if
> + * available */
> + if (strcasecmp (header, "from") == 0)
> + value = message->doc.get_value (NOTMUCH_VALUE_FROM);
> + else if (strcasecmp (header, "subject") == 0)
> + value = message->doc.get_value (NOTMUCH_VALUE_SUBJECT);
> + else if (strcasecmp (header, "message-id") == 0)
> + value = message->doc.get_value (NOTMUCH_VALUE_MESSAGE_ID);
> +
> + if (!value.empty())
> + return talloc_strdup (message, value.c_str ());
> +
> + /* Otherwise fall back to parsing the file */
> _notmuch_message_ensure_message_file (message);
> if (message->message_file == NULL)
> return NULL;
> @@ -795,8 +810,10 @@ notmuch_message_set_author (notmuch_message_t *message,
> }
>
> void
> -_notmuch_message_set_date (notmuch_message_t *message,
> - const char *date)
> +_notmuch_message_set_header_values (notmuch_message_t *message,
> + const char *date,
> + const char *from,
> + const char *subject)
> {
> time_t time_value;
>
> @@ -809,6 +826,8 @@ _notmuch_message_set_date (notmuch_message_t *message,
>
> message->doc.add_value (NOTMUCH_VALUE_TIMESTAMP,
> Xapian::sortable_serialise (time_value));
> + message->doc.add_value (NOTMUCH_VALUE_FROM, from);
> + message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
> }
>
> /* Synchronize changes made to message->doc out into the database. */
> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> index 0d3cc27..60a932f 100644
> --- a/lib/notmuch-private.h
> +++ b/lib/notmuch-private.h
> @@ -93,7 +93,9 @@ NOTMUCH_BEGIN_DECLS
>
> typedef enum {
> NOTMUCH_VALUE_TIMESTAMP = 0,
> - NOTMUCH_VALUE_MESSAGE_ID
> + NOTMUCH_VALUE_MESSAGE_ID,
> + NOTMUCH_VALUE_FROM,
> + NOTMUCH_VALUE_SUBJECT
> } notmuch_value_t;
>
> /* Xapian (with flint backend) complains if we provide a term longer
> @@ -269,9 +271,10 @@ void
> _notmuch_message_ensure_thread_id (notmuch_message_t *message);
>
> void
> -_notmuch_message_set_date (notmuch_message_t *message,
> - const char *date);
> -
> +_notmuch_message_set_header_values (notmuch_message_t *message,
> + const char *date,
> + const char *from,
> + const char *subject);
> void
> _notmuch_message_sync (notmuch_message_t *message);
>
> --
> 1.7.2.3
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch
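The lookup order in notmuch_message_get_header above (stored value first, message file as fallback) can be illustrated in isolation. This is a minimal sketch under stated assumptions, not the notmuch implementation: ValueSlots is a hypothetical std::map stand-in for the Xapian document, get_header and its used_file flag are made up for illustration, and the slot numbers merely mirror the enum order in the patch. It relies on the fact that Xapian's get_value returns an empty string for an unset slot, so a message whose stored header is genuinely empty cannot be told apart from one indexed before the patch and still takes the slow path.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for a document's value slots; not Xapian's API.
using ValueSlots = std::map<int, std::string>;

// Slot numbers follow the enum order in the patch (TIMESTAMP = 0,
// MESSAGE_ID = 1, FROM = 2, SUBJECT = 3); the names here are illustrative.
enum Slot { SLOT_FROM = 2, SLOT_SUBJECT = 3 };

// Return the stored header if present and non-empty; otherwise call
// parse_file, which stands in for opening and parsing the message file.
// If used_file is non-null, it records which path was taken.
std::string get_header(const ValueSlots &slots, int slot,
                       const std::function<std::string()> &parse_file,
                       bool *used_file = nullptr)
{
    assert(parse_file);

    const auto it = slots.find(slot);
    if (it != slots.end() && !it->second.empty()) {
        if (used_file)
            *used_file = false;
        return it->second;          // fast path: no file IO
    }

    if (used_file)
        *used_file = true;
    return parse_file();            // slow path: old behavior
}
```

With this shape, messages indexed before the patch keep working without a rebuild, at the cost of one file parse per lookup until the database is regenerated.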
Peace
--
Pieter