From mboxrd@z Thu Jan  1 00:00:00 1970
From: Pieter Praet <pieter@praet.org>
To: Austin Clements <amdragon@MIT.EDU>, notmuch@notmuchmail.org
Cc: notmuch@kismala.com
Subject: Re: [PATCH] Store "from" and "subject" headers in the database.
Date: Fri, 11 Nov 2011 02:33:38 +0100
Message-ID: <87obwjtpcd.fsf@praet.org>
In-Reply-To: <1320599856-24078-1-git-send-email-amdragon@mit.edu>
References: <1320599856-24078-1-git-send-email-amdragon@mit.edu>
User-Agent: Notmuch/0.9+76~g2fd88e6 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-unknown-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
List-Id: "Use and development of the notmuch mail system." <notmuch.notmuchmail.org>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>

On Sun,  6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This is a rebase and cleanup of Istvan Marko's patch from
> id:m3pqnj2j7a.fsf@zsu.kismala.com
>

Fantastic performance improvement Austin!  This should be merged in ASAP.

BTW, compacting the db from time to time also has a significant impact:

Running:

  $ du -h .notmuch
  $ sync && sudo /sbin/sysctl vm.drop_caches=3
  $ time notmuch search "*" | wc -l

On:

  1 - original database, compacted some time ago
  2 - fresh database generated before patching, non-compacted
  3 - fresh database generated after patching, non-compacted
  4 - fresh database generated after patching, compacted with
        $ mv .notmuch/xapian .notmuch/xapian-fat
        $ xapian-compact --no-renumber .notmuch/xapian-fat .notmuch/xapian

Results:

  | db      |         1 |        2 |         3 |         4 |
  |---------+-----------+----------+-----------+-----------|
  | db size |      272M |     289M |      291M |      172M |
  | amount  |      9536 |     9540 |      9540 |      9540 |
  |---------+-----------+----------+-----------+-----------|
  | real    | 1m42.221s | 2m3.193s | 0m30.762s | 0m10.505s |
  | user    |  0m8.379s | 0m8.133s |  0m4.043s |  0m3.353s |
  | sys     |  0m5.216s | 0m4.933s |  0m1.530s |  0m1.000s |

> Search retrieves these headers for every message in the search
> results.  Previously, this required opening and parsing every message
> file.  Storing them directly in the database significantly reduces IO
> and computation, speeding up search by between 50% and 10X.
>
> Taking full advantage of this requires a database rebuild, but it will
> fall back to the old behavior for messages that do not have headers
> stored in the database.
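
For anyone skimming the thread, the lookup order described in the quoted text above (stored value first, message file only as a fallback) can be sketched in isolation. This is a toy model and not the patch itself: it uses neither notmuch nor Xapian, and `Doc`, `ValueSlot`, `SlowParser` and `get_header` are made-up names for illustration. The one Xapian behaviour it imitates is that reading an unset value slot yields an empty string.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical slot numbers, mirroring the order of notmuch_value_t in the patch.
enum ValueSlot { VALUE_TIMESTAMP = 0, VALUE_MESSAGE_ID, VALUE_FROM, VALUE_SUBJECT };

// Toy stand-in for a database document: slot number -> stored string.
struct Doc {
    std::map<int, std::string> values;

    // Like Xapian's get_value(), an unset slot reads as an empty string.
    std::string get_value(int slot) const {
        auto it = values.find(slot);
        return it == values.end() ? std::string() : it->second;
    }
};

// Case-insensitive ASCII comparison (portable substitute for strcasecmp).
static bool iequals(const std::string &a, const std::string &b) {
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    return true;
}

// Hypothetical callback standing in for "open and parse the message file".
using SlowParser = std::function<std::string(const std::string &)>;

// Return the stored header if there is one, otherwise fall back to parsing.
std::string get_header(const Doc &doc, const std::string &header,
                       const SlowParser &parse_file) {
    assert(!header.empty()); // callers must name a header

    std::string value;
    if (iequals(header, "from"))
        value = doc.get_value(VALUE_FROM);
    else if (iequals(header, "subject"))
        value = doc.get_value(VALUE_SUBJECT);
    else if (iequals(header, "message-id"))
        value = doc.get_value(VALUE_MESSAGE_ID);

    if (!value.empty())
        return value; // fast path: no file access needed

    // Slow path: the header is not one of the stored ones, the slot was
    // never written (old database), or the stored value was empty.
    return parse_file(header);
}
```

One consequence of this scheme, at least in the toy model: a header that was stored as an empty string cannot be told apart from one that was never stored, so such messages always take the slow path.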
> ---
>  lib/database.cc       |    2 +-
>  lib/message.cc        |   23 +++++++++++++++++++++--
>  lib/notmuch-private.h |   11 +++++++----
>  3 files changed, 29 insertions(+), 7 deletions(-)
>
> diff --git a/lib/database.cc b/lib/database.cc
> index fa632f8..e4ef14e 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -1725,7 +1725,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
>                  goto DONE;
>
>              date = notmuch_message_file_get_header (message_file, "date");
> -            _notmuch_message_set_date (message, date);
> +            _notmuch_message_set_header_values (message, date, from, subject);
>
>              _notmuch_message_index_file (message, filename);
>          } else {
> diff --git a/lib/message.cc b/lib/message.cc
> index 8f22e02..ca7fbf2 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -412,6 +412,21 @@ _notmuch_message_ensure_message_file (notmuch_message_t *message)
>  const char *
>  notmuch_message_get_header (notmuch_message_t *message, const char *header)
>  {
> +    std::string value;
> +
> +    /* Fetch header from the appropriate xapian value field if
> +     * available */
> +    if (strcasecmp (header, "from") == 0)
> +        value = message->doc.get_value (NOTMUCH_VALUE_FROM);
> +    else if (strcasecmp (header, "subject") == 0)
> +        value = message->doc.get_value (NOTMUCH_VALUE_SUBJECT);
> +    else if (strcasecmp (header, "message-id") == 0)
> +        value = message->doc.get_value (NOTMUCH_VALUE_MESSAGE_ID);
> +
> +    if (!value.empty())
> +        return talloc_strdup (message, value.c_str ());
> +
> +    /* Otherwise fall back to parsing the file */
>      _notmuch_message_ensure_message_file (message);
>      if (message->message_file == NULL)
>          return NULL;
> @@ -795,8 +810,10 @@ notmuch_message_set_author (notmuch_message_t *message,
>  }
>
>  void
> -_notmuch_message_set_date (notmuch_message_t *message,
> -                           const char *date)
> +_notmuch_message_set_header_values (notmuch_message_t *message,
> +                                    const char *date,
> +                                    const char *from,
> +                                    const char *subject)
>  {
>      time_t time_value;
>
> @@ -809,6 +826,8 @@ _notmuch_message_set_date (notmuch_message_t *message,
>
>      message->doc.add_value (NOTMUCH_VALUE_TIMESTAMP,
>                              Xapian::sortable_serialise (time_value));
> +    message->doc.add_value (NOTMUCH_VALUE_FROM, from);
> +    message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
>  }
>
>  /* Synchronize changes made to message->doc out into the database. */
> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> index 0d3cc27..60a932f 100644
> --- a/lib/notmuch-private.h
> +++ b/lib/notmuch-private.h
> @@ -93,7 +93,9 @@ NOTMUCH_BEGIN_DECLS
>
>  typedef enum {
>      NOTMUCH_VALUE_TIMESTAMP = 0,
> -    NOTMUCH_VALUE_MESSAGE_ID
> +    NOTMUCH_VALUE_MESSAGE_ID,
> +    NOTMUCH_VALUE_FROM,
> +    NOTMUCH_VALUE_SUBJECT
>  } notmuch_value_t;
>
>  /* Xapian (with flint backend) complains if we provide a term longer
> @@ -269,9 +271,10 @@ void
>  _notmuch_message_ensure_thread_id (notmuch_message_t *message);
>
>  void
> -_notmuch_message_set_date (notmuch_message_t *message,
> -                           const char *date);
> -
> +_notmuch_message_set_header_values (notmuch_message_t *message,
> +                                    const char *date,
> +                                    const char *from,
> +                                    const char *subject);
>  void
>  _notmuch_message_sync (notmuch_message_t *message);
>
> --
> 1.7.2.3
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch


Peace

-- 
Pieter