From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pieter@praet.org>
Received: from localhost (localhost [127.0.0.1])
	by olra.theworths.org (Postfix) with ESMTP id 3EFB9431FD0
	for <notmuch@notmuchmail.org>; Thu, 10 Nov 2011 17:34:34 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Flag: NO
X-Spam-Score: -0.7
X-Spam-Level: 
X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5
	tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled
Received: from olra.theworths.org ([127.0.0.1])
	by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id eFAtHlloWcft for <notmuch@notmuchmail.org>;
	Thu, 10 Nov 2011 17:34:33 -0800 (PST)
Received: from mail-wy0-f181.google.com (mail-wy0-f181.google.com
	[74.125.82.181]) (using TLSv1 with cipher RC4-SHA (128/128 bits))
	(No client certificate requested)
	by olra.theworths.org (Postfix) with ESMTPS id 20915431FB6
	for <notmuch@notmuchmail.org>; Thu, 10 Nov 2011 17:34:33 -0800 (PST)
Received: by wyg8 with SMTP id 8so3761231wyg.26
	for <notmuch@notmuchmail.org>; Thu, 10 Nov 2011 17:34:32 -0800 (PST)
Received: by 10.180.81.73 with SMTP id y9mr11590030wix.37.1320975271818;
	Thu, 10 Nov 2011 17:34:31 -0800 (PST)
Received: from localhost (26.48-242-81.adsl-dyn.isp.belgacom.be.
	[81.242.48.26])
	by mx.google.com with ESMTPS id co5sm5987687wib.8.2011.11.10.17.34.30
	(version=TLSv1/SSLv3 cipher=OTHER);
	Thu, 10 Nov 2011 17:34:31 -0800 (PST)
From: Pieter Praet <pieter@praet.org>
To: Austin Clements <amdragon@MIT.EDU>, notmuch@notmuchmail.org
Subject: Re: [PATCH] Store "from" and "subject" headers in the database.
In-Reply-To: <1320599856-24078-1-git-send-email-amdragon@mit.edu>
References: <1320599856-24078-1-git-send-email-amdragon@mit.edu>
User-Agent: Notmuch/0.9+76~g2fd88e6 (http://notmuchmail.org) Emacs/23.3.1
	(x86_64-unknown-linux-gnu)
Date: Fri, 11 Nov 2011 02:33:38 +0100
Message-ID: <87obwjtpcd.fsf@praet.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: notmuch@kismala.com
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: "Use and development of the notmuch mail system."
	<notmuch.notmuchmail.org>
List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Nov 2011 01:34:34 -0000

On Sun,  6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This is a rebase and cleanup of Istvan Marko's patch from
> id:m3pqnj2j7a.fsf@zsu.kismala.com
> 

Fantastic performance improvement, Austin!  This should be merged ASAP.

BTW, compacting the database from time to time also has a significant
impact on search time:

Running:
  $ du -h .notmuch
  $ sync && sudo /sbin/sysctl vm.drop_caches=3
  $ time notmuch search "*" | wc -l

On:
  1 - original database, compacted some time ago
  2 - fresh database generated before patching, non-compacted
  3 - fresh database generated after patching, non-compacted
  4 - fresh database generated after patching, compacted with
      $ mv .notmuch/xapian .notmuch/xapian-fat
      $ xapian-compact --no-renumber .notmuch/xapian-fat .notmuch/xapian

Results:
  | db      | 1         | 2        | 3         | 4         |
  |---------+-----------+----------+-----------+-----------|
  | db size | 272M      | 289M     | 291M      | 172M      |
  | threads | 9536      | 9540     | 9540      | 9540      |
  |---------+-----------+----------+-----------+-----------|
  | real    | 1m42.221s | 2m3.193s | 0m30.762s | 0m10.505s |
  | user    | 0m8.379s  | 0m8.133s | 0m4.043s  | 0m3.353s  |
  | sys     | 0m5.216s  | 0m4.933s | 0m1.530s  | 0m1.000s  |
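
To put the "real" row in relative terms, here is a small sketch that
converts the timings to seconds and prints the ratios.  The helper names
to_seconds and speedup are made up for illustration, and each figure is a
single cold-cache run, so treat the ratios as rough.

```shell
#!/bin/sh
# Convert a `time`-style duration such as "1m42.221s" to seconds.
# Splitting on "m" and "s" yields the minutes and seconds fields.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times faster the second duration is than the first.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1fx\n", a / b }'
}

speedup 2m3.193s 0m30.762s   # db 2 -> db 3: the patch alone
speedup 0m30.762s 0m10.505s  # db 3 -> db 4: compaction alone
speedup 2m3.193s 0m10.505s   # db 2 -> db 4: both combined
```

With the numbers from the table this prints 4.0x, 2.9x and 11.7x, so on
this mailbox the patch and the compaction each help noticeably and the
two effects stack.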


> Search retrieves these headers for every message in the search
> results.  Previously, this required opening and parsing every message
> file.  Storing them directly in the database significantly reduces IO
> and computation, speeding up search by between 50% and 10X.
> 
> Taking full advantage of this requires a database rebuild, but it will
> fall back to the old behavior for messages that do not have headers
> stored in the database.
> ---
>  lib/database.cc       |    2 +-
>  lib/message.cc        |   23 +++++++++++++++++++++--
>  lib/notmuch-private.h |   11 +++++++----
>  3 files changed, 29 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/database.cc b/lib/database.cc
> index fa632f8..e4ef14e 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -1725,7 +1725,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
>  		goto DONE;
>  
>  	    date = notmuch_message_file_get_header (message_file, "date");
> -	    _notmuch_message_set_date (message, date);
> +	    _notmuch_message_set_header_values (message, date, from, subject);
>  
>  	    _notmuch_message_index_file (message, filename);
>  	} else {
> diff --git a/lib/message.cc b/lib/message.cc
> index 8f22e02..ca7fbf2 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -412,6 +412,21 @@ _notmuch_message_ensure_message_file (notmuch_message_t *message)
>  const char *
>  notmuch_message_get_header (notmuch_message_t *message, const char *header)
>  {
> +    std::string value;
> +
> +    /* Fetch header from the appropriate xapian value field if
> +     * available */
> +    if (strcasecmp (header, "from") == 0)
> +	value = message->doc.get_value (NOTMUCH_VALUE_FROM);
> +    else if (strcasecmp (header, "subject") == 0)
> +	value = message->doc.get_value (NOTMUCH_VALUE_SUBJECT);
> +    else if (strcasecmp (header, "message-id") == 0)
> +	value = message->doc.get_value (NOTMUCH_VALUE_MESSAGE_ID);
> +
> +    if (!value.empty())
> +	return talloc_strdup (message, value.c_str ());
> +
> +    /* Otherwise fall back to parsing the file */
>      _notmuch_message_ensure_message_file (message);
>      if (message->message_file == NULL)
>  	return NULL;
> @@ -795,8 +810,10 @@ notmuch_message_set_author (notmuch_message_t *message,
>  }
>  
>  void
> -_notmuch_message_set_date (notmuch_message_t *message,
> -			   const char *date)
> +_notmuch_message_set_header_values (notmuch_message_t *message,
> +				    const char *date,
> +				    const char *from,
> +				    const char *subject)
>  {
>      time_t time_value;
>  
> @@ -809,6 +826,8 @@ _notmuch_message_set_date (notmuch_message_t *message,
>  
>      message->doc.add_value (NOTMUCH_VALUE_TIMESTAMP,
>  			    Xapian::sortable_serialise (time_value));
> +    message->doc.add_value (NOTMUCH_VALUE_FROM, from);
> +    message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
>  }
>  
>  /* Synchronize changes made to message->doc out into the database. */
> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> index 0d3cc27..60a932f 100644
> --- a/lib/notmuch-private.h
> +++ b/lib/notmuch-private.h
> @@ -93,7 +93,9 @@ NOTMUCH_BEGIN_DECLS
>  
>  typedef enum {
>      NOTMUCH_VALUE_TIMESTAMP = 0,
> -    NOTMUCH_VALUE_MESSAGE_ID
> +    NOTMUCH_VALUE_MESSAGE_ID,
> +    NOTMUCH_VALUE_FROM,
> +    NOTMUCH_VALUE_SUBJECT
>  } notmuch_value_t;
>  
>  /* Xapian (with flint backend) complains if we provide a term longer
> @@ -269,9 +271,10 @@ void
>  _notmuch_message_ensure_thread_id (notmuch_message_t *message);
>  
>  void
> -_notmuch_message_set_date (notmuch_message_t *message,
> -			   const char *date);
> -
> +_notmuch_message_set_header_values (notmuch_message_t *message,
> +				    const char *date,
> +				    const char *from,
> +				    const char *subject);
>  void
>  _notmuch_message_sync (notmuch_message_t *message);
>  
> -- 
> 1.7.2.3


Peace

-- 
Pieter