unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [PATCH 0/5] Fetch all message metadata in a single pass
@ 2010-12-09 20:59 Austin Clements
  2010-12-09 20:59 ` [PATCH 1/5] Use a single unified pass to fetch scalar message metadata Austin Clements
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Austin Clements @ 2010-12-09 20:59 UTC (permalink / raw)
  To: notmuch

This is the second of the two optimizations I described a while ago,
and brings my inbox search down to 1.811 seconds, 2.5X faster than it
was originally.

This optimization is based on the observation that Xapian decompresses
a document's term list every time you iterate over it.  As a result,
notmuch can decompress the beginning of a single term list quite a few
times.  This patch series combines all of this into a single pass that
is only slightly more expensive than fetching one metadata field used
to be, but offers a huge win in the common case where a message object
is used for multiple metadata fields.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/5] Use a single unified pass to fetch scalar message metadata.
  2010-12-09 20:59 [PATCH 0/5] Fetch all message metadata in a single pass Austin Clements
@ 2010-12-09 20:59 ` Austin Clements
  2010-12-09 20:59 ` [PATCH 2/5] Implement an internal generic string list and use it Austin Clements
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Austin Clements @ 2010-12-09 20:59 UTC (permalink / raw)
  To: notmuch; +Cc: Austin Clements

This performs a single pass over a message's term list to fetch the
thread ID, message ID, and reply-to, rather than requiring a pass for
each.  Xapian decompresses the term list anew for each iteration, so
this reduces the amount of time spent decompressing message metadata.

This reduces my inbox search from 3.102 seconds to 2.555 seconds (1.2X
faster).
---
 lib/message.cc |  197 ++++++++++++++++++++++++++++----------------------------
 1 files changed, 98 insertions(+), 99 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index adcd07d..d6ab636 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -254,41 +254,106 @@ _notmuch_message_create_for_message_id (notmuch_database_t *notmuch,
     return message;
 }
 
-unsigned int
-_notmuch_message_get_doc_id (notmuch_message_t *message)
-{
-    return message->doc_id;
-}
-
-const char *
-notmuch_message_get_message_id (notmuch_message_t *message)
+static char *
+_notmuch_message_get_term (notmuch_message_t *message,
+			   Xapian::TermIterator &i, Xapian::TermIterator &end,
+			   const char *prefix)
 {
-    Xapian::TermIterator i;
+    int prefix_len = strlen (prefix);
+    const char *term = NULL;
+    char *value;
 
-    if (message->message_id)
-	return message->message_id;
+    i.skip_to (prefix);
 
-    i = message->doc.termlist_begin ();
-    i.skip_to (_find_prefix ("id"));
+    if (i != end)
+	term = (*i).c_str ();
 
-    if (i == message->doc.termlist_end ())
-	INTERNAL_ERROR ("Message with document ID of %d has no message ID.\n",
-			message->doc_id);
+    if (!term || strncmp (term, prefix, prefix_len))
+	return NULL;
 
-    message->message_id = talloc_strdup (message, (*i).c_str () + 1);
+    value = talloc_strdup (message, term + prefix_len);
 
 #if DEBUG_DATABASE_SANITY
     i++;
 
-    if (i != message->doc.termlist_end () &&
-	strncmp ((*i).c_str (), _find_prefix ("id"),
-		 strlen (_find_prefix ("id"))) == 0)
-    {
-	INTERNAL_ERROR ("Mail (doc_id: %d) has duplicate message IDs",
-			message->doc_id);
+    if (i != end && strncmp ((*i).c_str (), prefix, prefix_len) == 0) {
+	INTERNAL_ERROR ("Mail (doc_id: %d) has duplicate %s terms: %s and %s\n",
+			message->doc_id, prefix, value,
+			(*i).c_str () + prefix_len);
     }
 #endif
 
+    return value;
+}
+
+void
+_notmuch_message_ensure_metadata (notmuch_message_t *message)
+{
+    Xapian::TermIterator i, end;
+    const char *thread_prefix = _find_prefix ("thread"),
+	*id_prefix = _find_prefix ("id"),
+	*replyto_prefix = _find_prefix ("replyto");
+
+    /* We do this all in a single pass because Xapian decompresses the
+     * term list every time you iterate over it.  Thus, while this is
+     * slightly more costly than looking up individual fields if only
+     * one field of the message object is actually used, it's a huge
+     * win as more fields are used. */
+
+    i = message->doc.termlist_begin ();
+    end = message->doc.termlist_end ();
+
+    /* Get thread */
+    if (!message->thread_id)
+	message->thread_id =
+	    _notmuch_message_get_term (message, i, end, thread_prefix);
+
+    /* Get id */
+    assert (strcmp (thread_prefix, id_prefix) < 0);
+    if (!message->message_id)
+	message->message_id =
+	    _notmuch_message_get_term (message, i, end, id_prefix);
+
+    /* Get reply to */
+    assert (strcmp (id_prefix, replyto_prefix) < 0);
+    if (!message->in_reply_to)
+	message->in_reply_to =
+	    _notmuch_message_get_term (message, i, end, replyto_prefix);
+    /* It's perfectly valid for a message to have no In-Reply-To
+     * header. For these cases, we return an empty string. */
+    if (!message->in_reply_to)
+	message->in_reply_to = talloc_strdup (message, "");
+}
+
+static void
+_notmuch_message_invalidate_metadata (notmuch_message_t *message,
+				      const char *prefix_name)
+{
+    if (strcmp ("thread", prefix_name) == 0) {
+	talloc_free (message->thread_id);
+	message->thread_id = NULL;
+    }
+
+    if (strcmp ("replyto", prefix_name) == 0) {
+	talloc_free (message->in_reply_to);
+	message->in_reply_to = NULL;
+    }
+}
+
+unsigned int
+_notmuch_message_get_doc_id (notmuch_message_t *message)
+{
+    return message->doc_id;
+}
+
+const char *
+notmuch_message_get_message_id (notmuch_message_t *message)
+{
+    if (!message->message_id)
+	_notmuch_message_ensure_metadata (message);
+    if (!message->message_id)
+	INTERNAL_ERROR ("Message with document ID of %u has no message ID.\n",
+			message->doc_id);
     return message->message_id;
 }
 
@@ -327,89 +392,19 @@ notmuch_message_get_header (notmuch_message_t *message, const char *header)
 const char *
 _notmuch_message_get_in_reply_to (notmuch_message_t *message)
 {
-    const char *prefix = _find_prefix ("replyto");
-    int prefix_len = strlen (prefix);
-    Xapian::TermIterator i;
-    std::string in_reply_to;
-
-    if (message->in_reply_to)
-	return message->in_reply_to;
-
-    i = message->doc.termlist_begin ();
-    i.skip_to (prefix);
-
-    if (i != message->doc.termlist_end ())
-	in_reply_to = *i;
-
-    /* It's perfectly valid for a message to have no In-Reply-To
-     * header. For these cases, we return an empty string. */
-    if (i == message->doc.termlist_end () ||
-	strncmp (in_reply_to.c_str (), prefix, prefix_len))
-    {
-	message->in_reply_to = talloc_strdup (message, "");
-	return message->in_reply_to;
-    }
-
-    message->in_reply_to = talloc_strdup (message,
-					  in_reply_to.c_str () + prefix_len);
-
-#if DEBUG_DATABASE_SANITY
-    i++;
-
-    in_reply_to = *i;
-
-    if (i != message->doc.termlist_end () &&
-	strncmp ((*i).c_str (), prefix, prefix_len) == 0)
-    {
-       INTERNAL_ERROR ("Message %s has duplicate In-Reply-To IDs: %s and %s\n",
-			notmuch_message_get_message_id (message),
-			message->in_reply_to,
-			(*i).c_str () + prefix_len);
-    }
-#endif
-
+    if (!message->in_reply_to)
+	_notmuch_message_ensure_metadata (message);
     return message->in_reply_to;
 }
 
 const char *
 notmuch_message_get_thread_id (notmuch_message_t *message)
 {
-    const char *prefix = _find_prefix ("thread");
-    Xapian::TermIterator i;
-    std::string id;
-
-    /* This code is written with the assumption that "thread" has a
-     * single-character prefix. */
-    assert (strlen (prefix) == 1);
-
-    if (message->thread_id)
-	return message->thread_id;
-
-    i = message->doc.termlist_begin ();
-    i.skip_to (prefix);
-
-    if (i != message->doc.termlist_end ())
-	id = *i;
-
-    if (i == message->doc.termlist_end () || id[0] != *prefix)
-	INTERNAL_ERROR ("Message with document ID of %d has no thread ID.\n",
+    if (!message->thread_id)
+	_notmuch_message_ensure_metadata (message);
+    if (!message->thread_id)
+	INTERNAL_ERROR ("Message with document ID of %u has no thread ID.\n",
 			message->doc_id);
-
-    message->thread_id = talloc_strdup (message, id.c_str () + 1);
-
-#if DEBUG_DATABASE_SANITY
-    i++;
-    id = *i;
-
-    if (i != message->doc.termlist_end () && id[0] == *prefix)
-    {
-	INTERNAL_ERROR ("Message %s has duplicate thread IDs: %s and %s\n",
-			notmuch_message_get_message_id (message),
-			message->thread_id,
-			id.c_str () + 1);
-    }
-#endif
-
     return message->thread_id;
 }
 
@@ -738,6 +733,8 @@ _notmuch_message_add_term (notmuch_message_t *message,
 
     talloc_free (term);
 
+    _notmuch_message_invalidate_metadata (message, prefix_name);
+
     return NOTMUCH_PRIVATE_STATUS_SUCCESS;
 }
 
@@ -801,6 +798,8 @@ _notmuch_message_remove_term (notmuch_message_t *message,
 
     talloc_free (term);
 
+    _notmuch_message_invalidate_metadata (message, prefix_name);
+
     return NOTMUCH_PRIVATE_STATUS_SUCCESS;
 }
 
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/5] Implement an internal generic string list and use it.
  2010-12-09 20:59 [PATCH 0/5] Fetch all message metadata in a single pass Austin Clements
  2010-12-09 20:59 ` [PATCH 1/5] Use a single unified pass to fetch scalar message metadata Austin Clements
@ 2010-12-09 20:59 ` Austin Clements
  2010-12-23  2:44   ` Austin Clements
  2010-12-09 20:59 ` [PATCH 3/5] Add a generic function to get a list of terms with some prefix Austin Clements
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Austin Clements @ 2010-12-09 20:59 UTC (permalink / raw)
  To: notmuch; +Cc: Austin Clements

This replaces the guts of the filename list and tag list, making those
interfaces simple iterators over the generic string list.  The
directory, message filename, and tags-related code now build generic
string lists and then wraps them in specific iterators.  The real wins
come in later patches, when we use these for even more generic
functionality.

As a nice side-effect, this also eliminates the annoying dependency on
GList in the tag list.
---
 lib/Makefile.local    |    1 +
 lib/database.cc       |   12 +++---
 lib/directory.cc      |    7 ++--
 lib/filenames.c       |   52 +++------------------------
 lib/message.cc        |   15 ++++----
 lib/messages.c        |   11 +++---
 lib/notmuch-private.h |   68 +++++++++++++++++++++--------------
 lib/strings.c         |   94 +++++++++++++++++++++++++++++++++++++++++++++++++
 lib/tags.c            |   63 ++++++--------------------------
 lib/thread.cc         |   10 +++---
 10 files changed, 179 insertions(+), 154 deletions(-)
 create mode 100644 lib/strings.c

diff --git a/lib/Makefile.local b/lib/Makefile.local
index 5233ea6..37d1c0d 100644
--- a/lib/Makefile.local
+++ b/lib/Makefile.local
@@ -50,6 +50,7 @@ extra_cflags += -I$(dir) -fPIC
 libnotmuch_c_srcs =		\
 	$(notmuch_compat_srcs)	\
 	$(dir)/filenames.c	\
+	$(dir)/strings.c	\
 	$(dir)/libsha1.c	\
 	$(dir)/message-file.c	\
 	$(dir)/messages.c	\
diff --git a/lib/database.cc b/lib/database.cc
index 7a00917..45613bd 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1752,15 +1752,15 @@ _notmuch_convert_tags (void *ctx, Xapian::TermIterator &i,
 		       Xapian::TermIterator &end)
 {
     const char *prefix = _find_prefix ("tag");
-    notmuch_tags_t *tags;
+    notmuch_string_list_t *list;
     std::string tag;
 
     /* Currently this iteration is written with the assumption that
      * "tag" has a single-character prefix. */
     assert (strlen (prefix) == 1);
 
-    tags = _notmuch_tags_create (ctx);
-    if (unlikely (tags == NULL))
+    list = _notmuch_string_list_create (ctx);
+    if (unlikely (list == NULL))
 	return NULL;
 
     i.skip_to (prefix);
@@ -1771,14 +1771,14 @@ _notmuch_convert_tags (void *ctx, Xapian::TermIterator &i,
 	if (tag.empty () || tag[0] != *prefix)
 	    break;
 
-	_notmuch_tags_add_tag (tags, tag.c_str () + 1);
+	_notmuch_string_list_append (list, tag.c_str () + 1);
 
 	i++;
     }
 
-    _notmuch_tags_prepare_iterator (tags);
+    _notmuch_string_list_sort (list);
 
-    return tags;
+    return _notmuch_tags_create (ctx, list, TRUE);
 }
 
 notmuch_tags_t *
diff --git a/lib/directory.cc b/lib/directory.cc
index 946be4f..aeee9ca 100644
--- a/lib/directory.cc
+++ b/lib/directory.cc
@@ -33,11 +33,11 @@ _create_filenames_for_terms_with_prefix (void *ctx,
 					 notmuch_database_t *notmuch,
 					 const char *prefix)
 {
-    notmuch_filename_list_t *filename_list;
+    notmuch_string_list_t *filename_list;
     Xapian::TermIterator i, end;
     int prefix_len = strlen (prefix);
 
-    filename_list = _notmuch_filename_list_create (ctx);
+    filename_list = _notmuch_string_list_create (ctx);
     if (unlikely (filename_list == NULL))
 	return NULL;
 
@@ -47,8 +47,7 @@ _create_filenames_for_terms_with_prefix (void *ctx,
     {
 	std::string term = *i;
 
-	_notmuch_filename_list_add_filename (filename_list, term.c_str () +
-					     prefix_len);
+	_notmuch_string_list_append (filename_list, term.c_str () + prefix_len);
     }
 
     return _notmuch_filenames_create (ctx, filename_list);
diff --git a/lib/filenames.c b/lib/filenames.c
index f078c95..f1ea243 100644
--- a/lib/filenames.c
+++ b/lib/filenames.c
@@ -21,56 +21,14 @@
 #include "notmuch-private.h"
 
 struct _notmuch_filenames {
-    notmuch_filename_node_t *iterator;
+    notmuch_string_node_t *iterator;
 };
 
-/* Create a new notmuch_filename_list_t object, with 'ctx' as its
- * talloc owner.
- *
- * This function can return NULL in case of out-of-memory.
- */
-notmuch_filename_list_t *
-_notmuch_filename_list_create (const void *ctx)
-{
-    notmuch_filename_list_t *list;
-
-    list = talloc (ctx, notmuch_filename_list_t);
-    if (unlikely (list == NULL))
-	return NULL;
-
-    list->head = NULL;
-    list->tail = &list->head;
-
-    return list;
-}
-
-void
-_notmuch_filename_list_add_filename (notmuch_filename_list_t *list,
-				     const char *filename)
-{
-    /* Create and initialize new node. */
-    notmuch_filename_node_t *node = talloc (list,
-					    notmuch_filename_node_t);
-
-    node->filename = talloc_strdup (node, filename);
-    node->next = NULL;
-
-    /* Append the node to the list. */
-    *(list->tail) = node;
-    list->tail = &node->next;
-}
-
-void
-_notmuch_filename_list_destroy (notmuch_filename_list_t *list)
-{
-    talloc_free (list);
-}
-
-/* The notmuch_filenames_t is an iterator object for a
- * notmuch_filename_list_t */
+/* The notmuch_filenames_t iterates over a notmuch_string_list_t of
+ * file names */
 notmuch_filenames_t *
 _notmuch_filenames_create (const void *ctx,
-			   notmuch_filename_list_t *list)
+			   notmuch_string_list_t *list)
 {
     notmuch_filenames_t *filenames;
 
@@ -99,7 +57,7 @@ notmuch_filenames_get (notmuch_filenames_t *filenames)
     if (filenames->iterator == NULL)
 	return NULL;
 
-    return filenames->iterator->filename;
+    return filenames->iterator->string;
 }
 
 void
diff --git a/lib/message.cc b/lib/message.cc
index d6ab636..031eda5 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -32,7 +32,7 @@ struct _notmuch_message {
     char *message_id;
     char *thread_id;
     char *in_reply_to;
-    notmuch_filename_list_t *filename_list;
+    notmuch_string_list_t *filename_list;
     char *author;
     notmuch_message_file_t *message_file;
     notmuch_message_list_t *replies;
@@ -434,7 +434,7 @@ _notmuch_message_add_filename (notmuch_message_t *message,
     char *direntry;
 
     if (message->filename_list) {
-	_notmuch_filename_list_destroy (message->filename_list);
+	talloc_free (message->filename_list);
 	message->filename_list = NULL;
     }
 
@@ -511,7 +511,7 @@ _notmuch_message_ensure_filename_list (notmuch_message_t *message)
     if (message->filename_list)
 	return;
 
-    message->filename_list = _notmuch_filename_list_create (message);
+    message->filename_list = _notmuch_string_list_create (message);
 
     i = message->doc.termlist_begin ();
     i.skip_to (prefix);
@@ -532,7 +532,7 @@ _notmuch_message_ensure_filename_list (notmuch_message_t *message)
 	if (data == NULL)
 	    INTERNAL_ERROR ("message with no filename");
 
-	_notmuch_filename_list_add_filename (message->filename_list, data);
+	_notmuch_string_list_append (message->filename_list, data);
 
 	return;
     }
@@ -573,8 +573,7 @@ _notmuch_message_ensure_filename_list (notmuch_message_t *message)
 	    filename = talloc_asprintf (message, "%s/%s",
 					db_path, basename);
 
-	_notmuch_filename_list_add_filename (message->filename_list,
-					     filename);
+	_notmuch_string_list_append (message->filename_list, filename);
 
 	talloc_free (local);
     }
@@ -589,12 +588,12 @@ notmuch_message_get_filename (notmuch_message_t *message)
 	return NULL;
 
     if (message->filename_list->head == NULL ||
-	message->filename_list->head->filename == NULL)
+	message->filename_list->head->string == NULL)
     {
 	INTERNAL_ERROR ("message with no filename");
     }
 
-    return message->filename_list->head->filename;
+    return message->filename_list->head->string;
 }
 
 notmuch_filenames_t *
diff --git a/lib/messages.c b/lib/messages.c
index 120a48a..404e8b3 100644
--- a/lib/messages.c
+++ b/lib/messages.c
@@ -146,13 +146,14 @@ notmuch_messages_destroy (notmuch_messages_t *messages)
 notmuch_tags_t *
 notmuch_messages_collect_tags (notmuch_messages_t *messages)
 {
-    notmuch_tags_t *tags, *msg_tags;
+    notmuch_string_list_t *tags;
+    notmuch_tags_t *msg_tags;
     notmuch_message_t *msg;
     GHashTable *htable;
     GList *keys, *l;
     const char *tag;
 
-    tags = _notmuch_tags_create (messages);
+    tags = _notmuch_string_list_create (messages);
     if (tags == NULL) return NULL;
 
     htable = g_hash_table_new_full (g_str_hash, g_str_equal, free, NULL);
@@ -170,12 +171,12 @@ notmuch_messages_collect_tags (notmuch_messages_t *messages)
 
     keys = g_hash_table_get_keys (htable);
     for (l = keys; l; l = l->next) {
-	_notmuch_tags_add_tag (tags, (char *)l->data);
+	_notmuch_string_list_append (tags, (char *)l->data);
     }
 
     g_list_free (keys);
     g_hash_table_destroy (htable);
 
-    _notmuch_tags_prepare_iterator (tags);
-    return tags;
+    _notmuch_string_list_sort (tags);
+    return _notmuch_tags_create (messages, tags, TRUE);
 }
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 303aeb3..b6f1095 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -453,48 +453,60 @@ notmuch_sha1_of_string (const char *str);
 char *
 notmuch_sha1_of_file (const char *filename);
 
-/* tags.c */
+/* strings.c */
 
-notmuch_tags_t *
-_notmuch_tags_create (void *ctx);
+typedef struct _notmuch_string_node {
+    char *string;
+    struct _notmuch_string_node *next;
+} notmuch_string_node_t;
+
+typedef struct _notmuch_string_list {
+    int length;
+    notmuch_string_node_t *head;
+    notmuch_string_node_t **tail;
+} notmuch_string_list_t;
 
+notmuch_string_list_t *
+_notmuch_string_list_create (const void *ctx);
+
+/* Add 'string' to 'list'.
+ *
+ * The list will create its own talloced copy of 'string'.
+ */
 void
-_notmuch_tags_add_tag (notmuch_tags_t *tags, const char *tag);
+_notmuch_string_list_append (notmuch_string_list_t *list,
+			     const char *string);
 
 void
-_notmuch_tags_prepare_iterator (notmuch_tags_t *tags);
+_notmuch_string_list_sort (notmuch_string_list_t *list);
 
-/* filenames.c */
+/* tags.c */
+
+notmuch_tags_t *
+_notmuch_tags_create (const void *ctx, notmuch_string_list_t *list,
+		      notmuch_bool_t steal);
 
-typedef struct _notmuch_filename_node {
-    char *filename;
-    struct _notmuch_filename_node *next;
-} notmuch_filename_node_t;
+/* filenames.c */
 
-typedef struct _notmuch_filename_list {
-    notmuch_filename_node_t *head;
-    notmuch_filename_node_t **tail;
-} notmuch_filename_list_t;
+/* The notmuch_filenames_t iterates over a notmuch_string_list_t of
+ * file names */
+notmuch_filenames_t *
+_notmuch_filenames_create (const void *ctx,
+			   notmuch_string_list_t *list);
 
-notmuch_filename_list_t *
-_notmuch_filename_list_create (const void *ctx);
+/* tags.c */
 
-/* Add 'filename' to 'list'.
- *
- * The list will create its own talloced copy of 'filename'.
- */
-void
-_notmuch_filename_list_add_filename (notmuch_filename_list_t *list,
-				     const char *filename);
+notmuch_tags_t *
+_notmuch_tags_create (const void *ctx, notmuch_string_list_t *list,
+		      notmuch_bool_t steal);
 
-void
-_notmuch_filename_list_destroy (notmuch_filename_list_t *list);
+/* filenames.c */
 
-/* The notmuch_filenames_t is an iterator object for a
- * notmuch_filename_list_t */
+/* The notmuch_filenames_t iterates over a notmuch_string_list_t of
+ * file names */
 notmuch_filenames_t *
 _notmuch_filenames_create (const void *ctx,
-			   notmuch_filename_list_t *list);
+			   notmuch_string_list_t *list);
 
 #pragma GCC visibility pop
 
diff --git a/lib/strings.c b/lib/strings.c
new file mode 100644
index 0000000..d92a0ba
--- /dev/null
+++ b/lib/strings.c
@@ -0,0 +1,94 @@
+/* strings.c - Iterator for a list of strings
+ *
+ * Copyright © 2010 Intel Corporation
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ .
+ *
+ * Author: Carl Worth <cworth@cworth.org>
+ */
+
+#include "notmuch-private.h"
+
+/* Create a new notmuch_string_list_t object, with 'ctx' as its
+ * talloc owner.
+ *
+ * This function can return NULL in case of out-of-memory.
+ */
+notmuch_string_list_t *
+_notmuch_string_list_create (const void *ctx)
+{
+    notmuch_string_list_t *list;
+
+    list = talloc (ctx, notmuch_string_list_t);
+    if (unlikely (list == NULL))
+	return NULL;
+
+    list->length = 0;
+    list->head = NULL;
+    list->tail = &list->head;
+
+    return list;
+}
+
+void
+_notmuch_string_list_append (notmuch_string_list_t *list,
+			     const char *string)
+{
+    /* Create and initialize new node. */
+    notmuch_string_node_t *node = talloc (list, notmuch_string_node_t);
+
+    node->string = talloc_strdup (node, string);
+    node->next = NULL;
+
+    /* Append the node to the list. */
+    *(list->tail) = node;
+    list->tail = &node->next;
+    list->length++;
+}
+
+static int
+cmpnode (const void *pa, const void *pb)
+{
+    notmuch_string_node_t *a = *(notmuch_string_node_t * const *)pa;
+    notmuch_string_node_t *b = *(notmuch_string_node_t * const *)pb;
+
+    return strcmp (a->string, b->string);
+}
+
+void
+_notmuch_string_list_sort (notmuch_string_list_t *list)
+{
+    notmuch_string_node_t **nodes, *node;
+    int i;
+
+    if (list->length == 0)
+	return;
+
+    nodes = talloc_array (list, notmuch_string_node_t *, list->length);
+    if (unlikely (nodes == NULL))
+	INTERNAL_ERROR ("Could not allocate memory for list sort");
+
+    for (i = 0, node = list->head; node; i++, node = node->next)
+	nodes[i] = node;
+
+    qsort (nodes, list->length, sizeof (*nodes), cmpnode);
+
+    for (i = 0; i < list->length - 1; ++i)
+	nodes[i]->next = nodes[i+1];
+    nodes[i]->next = NULL;
+    list->head = nodes[0];
+    list->tail = &nodes[i]->next;
+
+    talloc_free (nodes);
+}
diff --git a/lib/tags.c b/lib/tags.c
index 8fe4a3f..8392ef6 100644
--- a/lib/tags.c
+++ b/lib/tags.c
@@ -20,30 +20,20 @@
 
 #include "notmuch-private.h"
 
-#include <glib.h> /* GList */
-
 struct _notmuch_tags {
-    int sorted;
-    GList *tags;
-    GList *iterator;
+    notmuch_string_node_t *iterator;
 };
 
-/* XXX: Should write some talloc-friendly list to avoid the need for
- * this. */
-static int
-_notmuch_tags_destructor (notmuch_tags_t *tags)
-{
-    g_list_free (tags->tags);
-
-    return 0;
-}
-
 /* Create a new notmuch_tags_t object, with 'ctx' as its talloc owner.
+ * If steal is true, then the iterator will steal the reference to the
+ * list (useful for one-shot iterators); otherwise it will add a
+ * reference.
  *
  * This function can return NULL in case of out-of-memory.
  */
 notmuch_tags_t *
-_notmuch_tags_create (void *ctx)
+_notmuch_tags_create (const void *ctx, notmuch_string_list_t *list,
+		      notmuch_bool_t steal)
 {
     notmuch_tags_t *tags;
 
@@ -51,44 +41,15 @@ _notmuch_tags_create (void *ctx)
     if (unlikely (tags == NULL))
 	return NULL;
 
-    talloc_set_destructor (tags, _notmuch_tags_destructor);
-
-    tags->sorted = 1;
-    tags->tags = NULL;
-    tags->iterator = NULL;
+    tags->iterator = list->head;
+    if (steal)
+	talloc_steal (tags, list);
+    else
+	(void) talloc_reference (tags, list);
 
     return tags;
 }
 
-/* Add a new tag to 'tags'. The tags object will create its own copy
- * of the string.
- *
- * Note: The tags object will not do anything to prevent duplicate
- * tags being stored, so the caller really shouldn't pass
- * duplicates. */
-void
-_notmuch_tags_add_tag (notmuch_tags_t *tags, const char *tag)
-{
-    tags->tags = g_list_prepend (tags->tags, talloc_strdup (tags, tag));
-    tags->sorted = 0;
-}
-
-/* Prepare 'tag' for iteration.
- *
- * The internal creator of 'tags' should call this function before
- * returning 'tags' to the user to call the public functions such as
- * notmuch_tags_valid, notmuch_tags_get, and
- * notmuch_tags_move_to_next. */
-void
-_notmuch_tags_prepare_iterator (notmuch_tags_t *tags)
-{
-    if (! tags->sorted)
-	tags->tags = g_list_sort (tags->tags, (GCompareFunc) strcmp);
-    tags->sorted = 1;
-
-    tags->iterator = tags->tags;
-}
-
 notmuch_bool_t
 notmuch_tags_valid (notmuch_tags_t *tags)
 {
@@ -101,7 +62,7 @@ notmuch_tags_get (notmuch_tags_t *tags)
     if (tags->iterator == NULL)
 	return NULL;
 
-    return (char *) tags->iterator->data;
+    return (char *) tags->iterator->string;
 }
 
 void
diff --git a/lib/thread.cc b/lib/thread.cc
index b29f2c9..5b024fd 100644
--- a/lib/thread.cc
+++ b/lib/thread.cc
@@ -535,23 +535,23 @@ notmuch_thread_get_newest_date (notmuch_thread_t *thread)
 notmuch_tags_t *
 notmuch_thread_get_tags (notmuch_thread_t *thread)
 {
-    notmuch_tags_t *tags;
+    notmuch_string_list_t *tags;
     GList *keys, *l;
 
-    tags = _notmuch_tags_create (thread);
+    tags = _notmuch_string_list_create (thread);
     if (unlikely (tags == NULL))
 	return NULL;
 
     keys = g_hash_table_get_keys (thread->tags);
 
     for (l = keys; l; l = l->next)
-	_notmuch_tags_add_tag (tags, (char *) l->data);
+	_notmuch_string_list_append (tags, (char *) l->data);
 
     g_list_free (keys);
 
-    _notmuch_tags_prepare_iterator (tags);
+    _notmuch_string_list_sort (tags);
 
-    return tags;
+    return _notmuch_tags_create (thread, tags, true);
 }
 
 void
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/5] Add a generic function to get a list of terms with some prefix.
  2010-12-09 20:59 [PATCH 0/5] Fetch all message metadata in a single pass Austin Clements
  2010-12-09 20:59 ` [PATCH 1/5] Use a single unified pass to fetch scalar message metadata Austin Clements
  2010-12-09 20:59 ` [PATCH 2/5] Implement an internal generic string list and use it Austin Clements
@ 2010-12-09 20:59 ` Austin Clements
  2010-12-09 20:59 ` [PATCH 4/5] Add the file name list to the unified message metadata pass Austin Clements
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Austin Clements @ 2010-12-09 20:59 UTC (permalink / raw)
  To: notmuch; +Cc: Austin Clements

Replace _notmuch_convert_tags with this and simplify
_create_filenames_for_terms_with_prefix.  This will also come in handy
shortly to get the message file name list.
---
 lib/database-private.h |   16 +++++++---------
 lib/database.cc        |   36 ++++++++++++++----------------------
 lib/directory.cc       |   18 +++---------------
 lib/message.cc         |    6 +++++-
 4 files changed, 29 insertions(+), 47 deletions(-)

diff --git a/lib/database-private.h b/lib/database-private.h
index 140b54e..9f83407 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -53,18 +53,16 @@ struct _notmuch_database {
     Xapian::ValueRangeProcessor *value_range_processor;
 };
 
-/* Convert tags from Xapian internal format to notmuch format.
- *
- * The function gets a TermIterator as argument and uses that iterator to find
- * all tag terms in the object. The tags are then converted to a
- * notmuch_tags_t list and returned. The function needs to allocate memory for
- * the resulting list and it uses the argument ctx as talloc context.
+/* Return the list of terms from the given iterator matching a prefix.
+ * The prefix will be stripped from the strings in the returned list.
+ * The list will be allocated using ctx as the talloc context.
  *
  * The function returns NULL on failure.
  */
-notmuch_tags_t *
-_notmuch_convert_tags (void *ctx, Xapian::TermIterator &i,
-		       Xapian::TermIterator &end);
+notmuch_string_list_t *
+_notmuch_get_terms_with_prefix (void *ctx, Xapian::TermIterator &i,
+				Xapian::TermIterator &end,
+				const char *prefix);
 
 #pragma GCC visibility pop
 
diff --git a/lib/database.cc b/lib/database.cc
index 45613bd..f8245ab 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1747,49 +1747,41 @@ notmuch_database_remove_message (notmuch_database_t *notmuch,
     return status;
 }
 
-notmuch_tags_t *
-_notmuch_convert_tags (void *ctx, Xapian::TermIterator &i,
-		       Xapian::TermIterator &end)
+notmuch_string_list_t *
+_notmuch_get_terms_with_prefix (void *ctx, Xapian::TermIterator &i,
+				Xapian::TermIterator &end,
+				const char *prefix)
 {
-    const char *prefix = _find_prefix ("tag");
+    int prefix_len = strlen (prefix);
     notmuch_string_list_t *list;
-    std::string tag;
-
-    /* Currently this iteration is written with the assumption that
-     * "tag" has a single-character prefix. */
-    assert (strlen (prefix) == 1);
 
     list = _notmuch_string_list_create (ctx);
     if (unlikely (list == NULL))
 	return NULL;
 
-    i.skip_to (prefix);
-
-    while (i != end) {
-	tag = *i;
-
-	if (tag.empty () || tag[0] != *prefix)
+    for (i.skip_to (prefix); i != end; i++) {
+	/* Terminate loop at first term without desired prefix. */
+	if (strncmp ((*i).c_str (), prefix, prefix_len))
 	    break;
 
-	_notmuch_string_list_append (list, tag.c_str () + 1);
-
-	i++;
+	_notmuch_string_list_append (list, (*i).c_str () + prefix_len);
     }
 
-    _notmuch_string_list_sort (list);
-
-    return _notmuch_tags_create (ctx, list, TRUE);
+    return list;
 }
 
 notmuch_tags_t *
 notmuch_database_get_all_tags (notmuch_database_t *db)
 {
     Xapian::TermIterator i, end;
+    notmuch_string_list_t *tags;
 
     try {
 	i = db->xapian_db->allterms_begin();
 	end = db->xapian_db->allterms_end();
-	return _notmuch_convert_tags(db, i, end);
+	tags = _notmuch_get_terms_with_prefix (db, i, end, _find_prefix ("tag"));
+	_notmuch_string_list_sort (tags);
+	return _notmuch_tags_create (db, tags, TRUE);
     } catch (const Xapian::Error &error) {
 	fprintf (stderr, "A Xapian exception occurred getting tags: %s.\n",
 		 error.get_msg().c_str());
diff --git a/lib/directory.cc b/lib/directory.cc
index aeee9ca..fb6ef03 100644
--- a/lib/directory.cc
+++ b/lib/directory.cc
@@ -23,10 +23,6 @@
 
 /* Create an iterator to iterate over the basenames of files (or
  * directories) that all share a common parent directory.
- *
- * The code here is general enough to be reused for any case of
- * iterating over the non-prefixed portion of terms sharing a common
- * prefix.
  */
 static notmuch_filenames_t *
 _create_filenames_for_terms_with_prefix (void *ctx,
@@ -35,21 +31,13 @@ _create_filenames_for_terms_with_prefix (void *ctx,
 {
     notmuch_string_list_t *filename_list;
     Xapian::TermIterator i, end;
-    int prefix_len = strlen (prefix);
 
-    filename_list = _notmuch_string_list_create (ctx);
+    i = notmuch->xapian_db->allterms_begin();
+    end = notmuch->xapian_db->allterms_end();
+    filename_list = _notmuch_get_terms_with_prefix (ctx, i, end, prefix);
     if (unlikely (filename_list == NULL))
 	return NULL;
 
-    end = notmuch->xapian_db->allterms_end (prefix);
-
-    for (i = notmuch->xapian_db->allterms_begin (prefix); i != end; i++)
-    {
-	std::string term = *i;
-
-	_notmuch_string_list_append (filename_list, term.c_str () + prefix_len);
-    }
-
     return _notmuch_filenames_create (ctx, filename_list);
 }
 
diff --git a/lib/message.cc b/lib/message.cc
index 031eda5..dbf683c 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -640,9 +640,13 @@ notmuch_tags_t *
 notmuch_message_get_tags (notmuch_message_t *message)
 {
     Xapian::TermIterator i, end;
+    notmuch_string_list_t *tags;
     i = message->doc.termlist_begin();
     end = message->doc.termlist_end();
-    return _notmuch_convert_tags(message, i, end);
+    tags = _notmuch_get_terms_with_prefix (message, i, end,
+					   _find_prefix ("tag"));
+    _notmuch_string_list_sort (tags);
+    return _notmuch_tags_create (message, tags, TRUE);
 }
 
 const char *
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/5] Add the file name list to the unified message metadata pass.
  2010-12-09 20:59 [PATCH 0/5] Fetch all message metadata in a single pass Austin Clements
                   ` (2 preceding siblings ...)
  2010-12-09 20:59 ` [PATCH 3/5] Add a generic function to get a list of terms with some prefix Austin Clements
@ 2010-12-09 20:59 ` Austin Clements
  2010-12-09 20:59 ` [PATCH 5/5] Add the tag " Austin Clements
  2011-02-13 20:25 ` [PATCH 0/5] Fetch all message metadata in a single pass Austin Clements
  5 siblings, 0 replies; 12+ messages in thread
From: Austin Clements @ 2010-12-09 20:59 UTC (permalink / raw)
  To: notmuch; +Cc: Austin Clements

Even if the caller never uses the file names, there is little cost to
simply fetching the file name terms.  However, retrieving the full
paths requires additional database work, so the expansion from terms
to full paths is performed lazily.

This also simplifies clearing the filename cache, since that's now
handled by the generic metadata cache code.

This further reduces my inbox search from 3.102 seconds before the
unified metadata pass to 2.206 seconds (1.4X faster).
---
 lib/message.cc |   52 +++++++++++++++++++++++++++++-----------------------
 1 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index dbf683c..adb205f 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -32,6 +32,7 @@ struct _notmuch_message {
     char *message_id;
     char *thread_id;
     char *in_reply_to;
+    notmuch_string_list_t *filename_term_list;
     notmuch_string_list_t *filename_list;
     char *author;
     notmuch_message_file_t *message_file;
@@ -101,6 +102,7 @@ _notmuch_message_create_for_document (const void *talloc_owner,
     message->message_id = NULL;
     message->thread_id = NULL;
     message->in_reply_to = NULL;
+    message->filename_term_list = NULL;
     message->filename_list = NULL;
     message->message_file = NULL;
     message->author = NULL;
@@ -292,6 +294,7 @@ _notmuch_message_ensure_metadata (notmuch_message_t *message)
     Xapian::TermIterator i, end;
     const char *thread_prefix = _find_prefix ("thread"),
 	*id_prefix = _find_prefix ("id"),
+	*filename_prefix = _find_prefix ("file-direntry"),
 	*replyto_prefix = _find_prefix ("replyto");
 
     /* We do this all in a single pass because Xapian decompresses the
@@ -314,8 +317,16 @@ _notmuch_message_ensure_metadata (notmuch_message_t *message)
 	message->message_id =
 	    _notmuch_message_get_term (message, i, end, id_prefix);
 
+    /* Get filename list.  Here we get only the terms.  We lazily
+     * expand them to full file names when needed in
+     * _notmuch_message_ensure_filename_list. */
+    assert (strcmp (id_prefix, filename_prefix) < 0);
+    if (!message->filename_term_list && !message->filename_list)
+	message->filename_term_list =
+	    _notmuch_get_terms_with_prefix (message, i, end, filename_prefix);
+
     /* Get reply to */
-    assert (strcmp (id_prefix, replyto_prefix) < 0);
+    assert (strcmp (filename_prefix, replyto_prefix) < 0);
     if (!message->in_reply_to)
 	message->in_reply_to =
 	    _notmuch_message_get_term (message, i, end, replyto_prefix);
@@ -334,6 +345,12 @@ _notmuch_message_invalidate_metadata (notmuch_message_t *message,
 	message->thread_id = NULL;
     }
 
+    if (strcmp ("file-direntry", prefix_name) == 0) {
+	talloc_free (message->filename_term_list);
+	talloc_free (message->filename_list);
+	message->filename_term_list = message->filename_list = NULL;
+    }
+
     if (strcmp ("replyto", prefix_name) == 0) {
 	talloc_free (message->in_reply_to);
 	message->in_reply_to = NULL;
@@ -433,11 +450,6 @@ _notmuch_message_add_filename (notmuch_message_t *message,
     void *local = talloc_new (message);
     char *direntry;
 
-    if (message->filename_list) {
-	talloc_free (message->filename_list);
-	message->filename_list = NULL;
-    }
-
     if (filename == NULL)
 	INTERNAL_ERROR ("Message filename cannot be NULL.");
 
@@ -504,21 +516,18 @@ _notmuch_message_clear_data (notmuch_message_t *message)
 static void
 _notmuch_message_ensure_filename_list (notmuch_message_t *message)
 {
-    const char *prefix = _find_prefix ("file-direntry");
-    int prefix_len = strlen (prefix);
-    Xapian::TermIterator i;
+    notmuch_string_node_t *node;
 
     if (message->filename_list)
 	return;
 
-    message->filename_list = _notmuch_string_list_create (message);
+    if (!message->filename_term_list)
+	_notmuch_message_ensure_metadata (message);
 
-    i = message->doc.termlist_begin ();
-    i.skip_to (prefix);
+    message->filename_list = _notmuch_string_list_create (message);
+    node = message->filename_term_list->head;
 
-    if (i == message->doc.termlist_end () ||
-	strncmp ((*i).c_str (), prefix, prefix_len))
-    {
+    if (!node) {
 	/* A message document created by an old version of notmuch
 	 * (prior to rename support) will have the filename in the
 	 * data of the document rather than as a file-direntry term.
@@ -537,19 +546,13 @@ _notmuch_message_ensure_filename_list (notmuch_message_t *message)
 	return;
     }
 
-    for (; i != message->doc.termlist_end (); i++) {
+    for (; node; node = node->next) {
 	void *local = talloc_new (message);
 	const char *db_path, *directory, *basename, *filename;
 	char *colon, *direntry = NULL;
 	unsigned int directory_id;
 
-	/* Terminate loop at first term without desired prefix. */
-	if (strncmp ((*i).c_str (), prefix, prefix_len))
-	    break;
-
-	direntry = talloc_strdup (local, (*i).c_str ());
-
-	direntry += prefix_len;
+	direntry = node->string;
 
 	directory_id = strtol (direntry, &colon, 10);
 
@@ -577,6 +580,9 @@ _notmuch_message_ensure_filename_list (notmuch_message_t *message)
 
 	talloc_free (local);
     }
+
+    talloc_free (message->filename_term_list);
+    message->filename_term_list = NULL;
 }
 
 const char *
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/5] Add the tag list to the unified message metadata pass.
  2010-12-09 20:59 [PATCH 0/5] Fetch all message metadata in a single pass Austin Clements
                   ` (3 preceding siblings ...)
  2010-12-09 20:59 ` [PATCH 4/5] Add the file name list to the unified message metadata pass Austin Clements
@ 2010-12-09 20:59 ` Austin Clements
  2011-02-13 20:25 ` [PATCH 0/5] Fetch all message metadata in a single pass Austin Clements
  5 siblings, 0 replies; 12+ messages in thread
From: Austin Clements @ 2010-12-09 20:59 UTC (permalink / raw)
  To: notmuch; +Cc: Austin Clements

Now each caller of notmuch_message_get_tags only gets a new iterator,
instead of a whole new list.  In principle this could cause problems
with iterating while modifying tags, but through the magic of talloc
references, we keep the old tag list alive even after the cache in the
message object is invalidated.

This reduces my index search from the 3.102 seconds before the unified
metadata pass to 1.811 seconds (1.7X faster).  Combined with the
thread search optimization in b3caef1f0659dac8183441357c8fee500a940889,
that makes this query 2.5X faster than when I started.
---
 lib/message.cc |   32 ++++++++++++++++++++++++--------
 1 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index adb205f..cc75a71 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -32,6 +32,7 @@ struct _notmuch_message {
     char *message_id;
     char *thread_id;
     char *in_reply_to;
+    notmuch_string_list_t *tag_list;
     notmuch_string_list_t *filename_term_list;
     notmuch_string_list_t *filename_list;
     char *author;
@@ -102,6 +103,7 @@ _notmuch_message_create_for_document (const void *talloc_owner,
     message->message_id = NULL;
     message->thread_id = NULL;
     message->in_reply_to = NULL;
+    message->tag_list = NULL;
     message->filename_term_list = NULL;
     message->filename_list = NULL;
     message->message_file = NULL;
@@ -293,6 +295,7 @@ _notmuch_message_ensure_metadata (notmuch_message_t *message)
 {
     Xapian::TermIterator i, end;
     const char *thread_prefix = _find_prefix ("thread"),
+	*tag_prefix = _find_prefix ("tag"),
 	*id_prefix = _find_prefix ("id"),
 	*filename_prefix = _find_prefix ("file-direntry"),
 	*replyto_prefix = _find_prefix ("replyto");
@@ -311,6 +314,13 @@ _notmuch_message_ensure_metadata (notmuch_message_t *message)
 	message->thread_id =
 	    _notmuch_message_get_term (message, i, end, thread_prefix);
 
+    /* Get tags */
+    if (!message->tag_list) {
+	message->tag_list =
+	    _notmuch_get_terms_with_prefix (message, i, end, tag_prefix);
+	_notmuch_string_list_sort (message->tag_list);
+    }
+
     /* Get id */
     assert (strcmp (thread_prefix, id_prefix) < 0);
     if (!message->message_id)
@@ -345,6 +355,11 @@ _notmuch_message_invalidate_metadata (notmuch_message_t *message,
 	message->thread_id = NULL;
     }
 
+    if (strcmp ("tag", prefix_name) == 0) {
+	talloc_unlink (message, message->tag_list);
+	message->tag_list = NULL;
+    }
+
     if (strcmp ("file-direntry", prefix_name) == 0) {
 	talloc_free (message->filename_term_list);
 	talloc_free (message->filename_list);
@@ -645,14 +660,14 @@ notmuch_message_get_date (notmuch_message_t *message)
 notmuch_tags_t *
 notmuch_message_get_tags (notmuch_message_t *message)
 {
-    Xapian::TermIterator i, end;
-    notmuch_string_list_t *tags;
-    i = message->doc.termlist_begin();
-    end = message->doc.termlist_end();
-    tags = _notmuch_get_terms_with_prefix (message, i, end,
-					   _find_prefix ("tag"));
-    _notmuch_string_list_sort (tags);
-    return _notmuch_tags_create (message, tags, TRUE);
+    if (!message->tag_list)
+	_notmuch_message_ensure_metadata (message);
+
+    /* We tell the iterator to add a talloc reference to the
+     * underlying list, rather than stealing it.  As a result, it's
+     * possible to modify the message tags while iterating because
+     * the iterator will keep the current list alive. */
+    return _notmuch_tags_create (message, message->tag_list, FALSE);
 }
 
 const char *
@@ -1210,6 +1225,7 @@ notmuch_message_remove_all_tags (notmuch_message_t *message)
     if (! message->frozen)
 	_notmuch_message_sync (message);
 
+    talloc_free (tags);
     return NOTMUCH_STATUS_SUCCESS;
 }
 
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/5] Implement an internal generic string list and use it.
  2010-12-09 20:59 ` [PATCH 2/5] Implement an internal generic string list and use it Austin Clements
@ 2010-12-23  2:44   ` Austin Clements
  0 siblings, 0 replies; 12+ messages in thread
From: Austin Clements @ 2010-12-23  2:44 UTC (permalink / raw)
  To: notmuch

[-- Attachment #1: Type: text/plain, Size: 3164 bytes --]

It seems I somehow repeated the function prototypes for tags.c and
filenames.c twice at the bottom of notmuch-private.h (probably through some
rebase mishap).  Obviously those should be deduplicated.

On Thu, Dec 9, 2010 at 3:59 PM, Austin Clements <amdragon@mit.edu> wrote:

> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> index 303aeb3..b6f1095 100644
> --- a/lib/notmuch-private.h
> +++ b/lib/notmuch-private.h
> @@ -453,48 +453,60 @@ notmuch_sha1_of_string (const char *str);
>  char *
>  notmuch_sha1_of_file (const char *filename);
>
> -/* tags.c */
> +/* strings.c */
>
> -notmuch_tags_t *
> -_notmuch_tags_create (void *ctx);
> +typedef struct _notmuch_string_node {
> +    char *string;
> +    struct _notmuch_string_node *next;
> +} notmuch_string_node_t;
> +
> +typedef struct _notmuch_string_list {
> +    int length;
> +    notmuch_string_node_t *head;
> +    notmuch_string_node_t **tail;
> +} notmuch_string_list_t;
>
> +notmuch_string_list_t *
> +_notmuch_string_list_create (const void *ctx);
> +
> +/* Add 'string' to 'list'.
> + *
> + * The list will create its own talloced copy of 'string'.
> + */
>  void
> -_notmuch_tags_add_tag (notmuch_tags_t *tags, const char *tag);
> +_notmuch_string_list_append (notmuch_string_list_t *list,
> +                            const char *string);
>
>  void
> -_notmuch_tags_prepare_iterator (notmuch_tags_t *tags);
> +_notmuch_string_list_sort (notmuch_string_list_t *list);
>
> -/* filenames.c */
> +/* tags.c */
> +
> +notmuch_tags_t *
> +_notmuch_tags_create (const void *ctx, notmuch_string_list_t *list,
> +                     notmuch_bool_t steal);
>
> -typedef struct _notmuch_filename_node {
> -    char *filename;
> -    struct _notmuch_filename_node *next;
> -} notmuch_filename_node_t;
> +/* filenames.c */
>
> -typedef struct _notmuch_filename_list {
> -    notmuch_filename_node_t *head;
> -    notmuch_filename_node_t **tail;
> -} notmuch_filename_list_t;
> +/* The notmuch_filenames_t iterates over a notmuch_string_list_t of
> + * file names */
> +notmuch_filenames_t *
> +_notmuch_filenames_create (const void *ctx,
> +                          notmuch_string_list_t *list);
>
> -notmuch_filename_list_t *
> -_notmuch_filename_list_create (const void *ctx);
> +/* tags.c */
>
> -/* Add 'filename' to 'list'.
> - *
> - * The list will create its own talloced copy of 'filename'.
> - */
> -void
> -_notmuch_filename_list_add_filename (notmuch_filename_list_t *list,
> -                                    const char *filename);
> +notmuch_tags_t *
> +_notmuch_tags_create (const void *ctx, notmuch_string_list_t *list,
> +                     notmuch_bool_t steal);
>
> -void
> -_notmuch_filename_list_destroy (notmuch_filename_list_t *list);
> +/* filenames.c */
>
> -/* The notmuch_filenames_t is an iterator object for a
> - * notmuch_filename_list_t */
> +/* The notmuch_filenames_t iterates over a notmuch_string_list_t of
> + * file names */
>  notmuch_filenames_t *
>  _notmuch_filenames_create (const void *ctx,
> -                          notmuch_filename_list_t *list);
> +                          notmuch_string_list_t *list);
>
>  #pragma GCC visibility pop
>

[-- Attachment #2: Type: text/html, Size: 3724 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] Fetch all message metadata in a single pass
  2010-12-09 20:59 [PATCH 0/5] Fetch all message metadata in a single pass Austin Clements
                   ` (4 preceding siblings ...)
  2010-12-09 20:59 ` [PATCH 5/5] Add the tag " Austin Clements
@ 2011-02-13 20:25 ` Austin Clements
  2011-03-11  3:48   ` Carl Worth
  5 siblings, 1 reply; 12+ messages in thread
From: Austin Clements @ 2011-02-13 20:25 UTC (permalink / raw)
  To: notmuch

I've rebased this against current master and fixed a few merge
conflicts.  The rebased patches are on the eager-metadata-v3 branch of
  http://awakening.csail.mit.edu/git/notmuch.git

On Thu, Dec 9, 2010 at 3:59 PM, Austin Clements <amdragon@mit.edu> wrote:
> This is the second of the two optimizations I described a while ago,
> and brings my inbox search down to 1.811 seconds, 2.5X faster than it
> was originally.
>
> This optimization is based on the observation that Xapian decompresses
> a document's term list every time you iterate over it.  As a result,
> notmuch can decompress the beginning of a single term list quite a few
> times.  This patch series combines all of this into a single pass that
> is only slightly more expensive than fetching one metadata field used
> to be, but offers a huge win in the common case where a message object
> is used for multiple metadata fields.
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] Fetch all message metadata in a single pass
  2011-02-13 20:25 ` [PATCH 0/5] Fetch all message metadata in a single pass Austin Clements
@ 2011-03-11  3:48   ` Carl Worth
  2011-03-21  6:56     ` Austin Clements
  0 siblings, 1 reply; 12+ messages in thread
From: Carl Worth @ 2011-03-11  3:48 UTC (permalink / raw)
  To: Austin Clements, notmuch

[-- Attachment #1: Type: text/plain, Size: 5776 bytes --]

On Sun, 13 Feb 2011 15:25:48 -0500, Austin Clements <amdragon@mit.edu> wrote:
> I've rebased this against current master and fixed a few merge
> conflicts.  The rebased patches are on the eager-metadata-v3 branch of
>   http://awakening.csail.mit.edu/git/notmuch.git

Hi Austin,

This looks like a great set of optimizations and cleanup. Here is some
(long-overdue) review.

First, the patch series looks excellent and my review here is quite
nit-picky. I feel bad reviewing this so late and not just immediately
merging it, but I will commit to picking up a refreshed series very
quickly.

One very minor thing is that the word "metadata" might be confused with
Xapian metadata which we are using already such as:

    version_string = notmuch->xapian_db->get_metadata ("version");

So we might want to come up with a better name here. I don't have any
concrete suggestion yet, so if you don't think of anything obvious, then
don't worry about it.

> +    /* Get tags */
> +    if (!message->tag_list) {
> +	message->tag_list =
> +	    _notmuch_get_terms_with_prefix (message, i, end, tag_prefix);
> +	_notmuch_string_list_sort (message->tag_list);
> +    }

Looks like the above case is missing the assert to ensure proper prefix
ordering.

> +    if (!message->tag_list)
> +	_notmuch_message_ensure_metadata (message);
> +
> +    /* We tell the iterator to add a talloc reference to the
> +     * underlying list, rather than stealing it.  As a result, it's
> +     * possible to modify the message tags while iterating because
> +     * the iterator will keep the current list alive. */
> +    return _notmuch_tags_create (message, message->tag_list, FALSE);

The behavior here is great, but don't like Boolean function parameters
being used to change the semantics. The problem with a Boolean argument
is that it's impossible to know the semantic by just reading the calling
code---you must also consult the implementation as well.

For any case where it seems natural to implement something with a
Boolean argument, I sometimes use an approach something like this:

	static void
	_foo_function_internal (foo_t *, ..., bool_t be_different)
	{
		...;
	}

        void
	foo_function (foo_t *, ...)
	{
		return _foo_function_internal (foo, ..., FALSE);
	}

        void
	foo_function_different (foo_t *, ...)
	{
		return _foo_function_internal (foo, ..., TRUE);
	}

That is, I'm willing to accept the Boolean parameter in the case of a
static function defined immediately next to all callers. Here, for any
non-static callers the semantics should be quite clear. (And perhaps the
function calling with the FALSE case will need a clarifying suffix as
well---one might have some_function_foo and some_function_bar for the
two cases).

Of course, my proscription against Boolean parameter doesn't apply to
something like a function that is setting some Boolean feature to TRUE
or FALSE. The important criterion is that the function semantics should
be evident to someone reading code calling the function. So if the
Boolean argument is obviously tied to some portion of the function name,
then that can be adequate.

Enough with the generalities, back to _notmuch_tags_create...

The caller above is the exceptional caller. It's the only one using
passing FALSE, and it also already has a large comment. So it seems that
the right fix here is to have the extra talloc_reference happen here
where there's a comment talking about adding a talloc reference.

So it would be great to see something here like:

	tags = _notmuch_tags_create (message, message->tag_list);
        talloc_reference (message, message->tag_list);
        return tags;

Of course, that would mean that _notmuch_tags_create can't have done the
talloc_steal. So perhaps all the other callers should be calling:

	_notmuch_tags_create_with_steal (ctx, tag_list);

What do you think?

> -notmuch_tags_t *
> -_notmuch_convert_tags (void *ctx, Xapian::TermIterator &i,
> -		       Xapian::TermIterator &end);
> +notmuch_string_list_t *
> +_notmuch_get_terms_with_prefix (void *ctx, Xapian::TermIterator &i,
> +				Xapian::TermIterator &end,
> +				const char *prefix);

I don't really like the fact that _notmuch_get_terms_with_prefix doesn't
make a clear indication of what file it's defined in, (the old function
_notmuch_convert_tags had the same problem). It could be named
_notmuch_database_convert_tags since it's in database.cc, but that would
be odd in not actually accepting a notmuch_database_t * as the first
parameter. Any suggestion here?

> index 0000000..d92a0ba
> --- /dev/null
> +++ b/lib/strings.c
> @@ -0,0 +1,94 @@
> +/* strings.c - Iterator for a list of strings

Similarly, this file might be better as string-list.c.

> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see http://www.gnu.org/licenses/ .
> + *
> + * Author: Carl Worth <cworth@cworth.org>

Hey, wait a second, some of this code is mine, but I think the sort
function is yours. Please do start annotating the Author tags on all
files as appropriate. (There are probably lots of files missing your
Author attribution---I just happened to notice this one here since you
happened to have an Author line in the patch.)

> -/* XXX: Should write some talloc-friendly list to avoid the need for
> - * this. */

Hurrah! I love patches that get to remove XXX comments.

> +    /* Get thread */
> +    if (!message->thread_id)
> +	message->thread_id =
> +	    _notmuch_message_get_term (message, i, end, thread_prefix);
> +
> +    /* Get id */
> +    assert (strcmp (thread_prefix, id_prefix) < 0);
> +    if (!message->message_id)
> +	message->message_id =
> +	    _notmuch_message_get_term (message, i, end, id_prefix);

Missing asserts on the above two as well?

-Carl

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] Fetch all message metadata in a single pass
  2011-03-11  3:48   ` Carl Worth
@ 2011-03-21  6:56     ` Austin Clements
  2011-04-25 20:52       ` Carl Worth
  0 siblings, 1 reply; 12+ messages in thread
From: Austin Clements @ 2011-03-21  6:56 UTC (permalink / raw)
  To: Carl Worth; +Cc: notmuch

Thanks for the thorough review.  My updated patches are on the
eager-metadata-v4 branch (also, for-review/eager-metadata-v4) at
  http://awakening.csail.mit.edu/git/notmuch.git
Responses inline.

On Thu, Mar 10, 2011 at 10:48 PM, Carl Worth <cworth@cworth.org> wrote:
> On Sun, 13 Feb 2011 15:25:48 -0500, Austin Clements <amdragon@mit.edu> wrote:
>> I've rebased this against current master and fixed a few merge
>> conflicts.  The rebased patches are on the eager-metadata-v3 branch of
>>   http://awakening.csail.mit.edu/git/notmuch.git
>
> Hi Austin,
>
> This looks like a great set of optimizations and cleanup. Here is some
> (long-overdue) review.
>
> First, the patch series looks excellent and my review here is quite
> nit-picky. I feel bad reviewing this so late and not just immediately
> merging it, but I will commit to picking up a refreshed series very
> quickly.
>
> One very minor thing is that the word "metadata" might be confused with
> Xapian metadata which we are using already such as:
>
>    version_string = notmuch->xapian_db->get_metadata ("version");
>
> So we might want to come up with a better name here. I don't have any
> concrete suggestion yet, so if you don't think of anything obvious, then
> don't worry about it.

Hmm.  I think this is okay because I always refer to "message
metadata" (and I couldn't think of a better term).

>> +    /* Get tags */
>> +    if (!message->tag_list) {
>> +     message->tag_list =
>> +         _notmuch_get_terms_with_prefix (message, i, end, tag_prefix);
>> +     _notmuch_string_list_sort (message->tag_list);
>> +    }
>
> Looks like the above case is missing the assert to ensure proper prefix
> ordering.

Fixed.

>> +    if (!message->tag_list)
>> +     _notmuch_message_ensure_metadata (message);
>> +
>> +    /* We tell the iterator to add a talloc reference to the
>> +     * underlying list, rather than stealing it.  As a result, it's
>> +     * possible to modify the message tags while iterating because
>> +     * the iterator will keep the current list alive. */
>> +    return _notmuch_tags_create (message, message->tag_list, FALSE);
>
> The behavior here is great, but don't like Boolean function parameters
> being used to change the semantics. The problem with a Boolean argument
> is that it's impossible to know the semantic by just reading the calling
> code---you must also consult the implementation as well.
>
> For any case where it seems natural to implement something with a
> Boolean argument, I sometimes use an approach something like this:
>
>        static void
>        _foo_function_internal (foo_t *, ..., bool_t be_different)
>        {
>                ...;
>        }
>
>        void
>        foo_function (foo_t *, ...)
>        {
>                return _foo_function_internal (foo, ..., FALSE);
>        }
>
>        void
>        foo_function_different (foo_t *, ...)
>        {
>                return _foo_function_internal (foo, ..., TRUE);
>        }
>
> That is, I'm willing to accept the Boolean parameter in the case of a
> static function defined immediately next to all callers. Here, for any
> non-static callers the semantics should be quite clear. (And perhaps the
> function calling with the FALSE case will need a clarifying suffix as
> well---one might have some_function_foo and some_function_bar for the
> two cases).
>
> Of course, my proscription against Boolean parameter doesn't apply to
> something like a function that is setting some Boolean feature to TRUE
> or FALSE. The important criterion is that the function semantics should
> be evident to someone reading code calling the function. So if the
> Boolean argument is obviously tied to some portion of the function name,
> then that can be adequate.
>
> Enough with the generalities, back to _notmuch_tags_create...
>
> The caller above is the exceptional caller. It's the only one using
> passing FALSE, and it also already has a large comment. So it seems that
> the right fix here is to have the extra talloc_reference happen here
> where there's a comment talking about adding a talloc reference.
>
> So it would be great to see something here like:
>
>        tags = _notmuch_tags_create (message, message->tag_list);
>        talloc_reference (message, message->tag_list);
>        return tags;
>
> Of course, that would mean that _notmuch_tags_create can't have done the
> talloc_steal. So perhaps all the other callers should be calling:
>
>        _notmuch_tags_create_with_steal (ctx, tag_list);
>
> What do you think?

Your point about the boolean argument is well taken.  In this case
there's really no need for two difference versions, so I wound up
making _notmuch_tags_create always steal the reference.

I debated for a while whether it should add a reference, thus forcing
most callers to talloc_unlink, or steal the reference, thus forcing
one caller to talloc_reference.  I ultimately decided it was much more
likely that the caller would forget the talloc_unlink, resulting in a
difficult-to-track memory leak (since the list would *eventually* be
cleaned up), than that the steal would cause confusion.  Also, there
is some precedence for internal functions that steal an argument.  So
I made it steal the reference.

This doesn't cause any problems in the one weird case that still needs
the reference to the list.  After the _notmuch_tags_create, the caller
simply adds that reference.

>> -notmuch_tags_t *
>> -_notmuch_convert_tags (void *ctx, Xapian::TermIterator &i,
>> -                    Xapian::TermIterator &end);
>> +notmuch_string_list_t *
>> +_notmuch_get_terms_with_prefix (void *ctx, Xapian::TermIterator &i,
>> +                             Xapian::TermIterator &end,
>> +                             const char *prefix);
>
> I don't really like the fact that _notmuch_get_terms_with_prefix doesn't
> make a clear indication of what file it's defined in, (the old function
> _notmuch_convert_tags had the same problem). It could be named
> _notmuch_database_convert_tags since it's in database.cc, but that would
> be odd in not actually accepting a notmuch_database_t * as the first
> parameter. Any suggestion here?

_notmuch_get_terms_with_prefix is weird because it captures a pattern
that lies squarely in the Xapian world, being all about term
iterators.  Hence, I think the "right" solution (note, not the best
solution), would be to hide the term iterators and make two copies of
the function: one that takes a notmuch_database_t and always considers
all database terms, and one private to message.cc that acts as a
helper to what's now all bundled in _notmuch_message_ensure_metadata.

But that bothers me more than a function that doesn't take a
notmuch_database_t *.  So I just added "database" to the name.

>> index 0000000..d92a0ba
>> --- /dev/null
>> +++ b/lib/strings.c
>> @@ -0,0 +1,94 @@
>> +/* strings.c - Iterator for a list of strings
>
> Similarly, this file might be better as string-list.c.

Done.

>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see http://www.gnu.org/licenses/ .
>> + *
>> + * Author: Carl Worth <cworth@cworth.org>
>
> Hey, wait a second, some of this code is mine, but I think the sort
> function is yours. Please do start annotating the Author tags on all
> files as appropriate. (There are probably lots of files missing your
> Author attribution---I just happened to notice this one here since you
> happened to have an Author line in the patch.)

Heh, fixed.  I suppose I hadn't been thinking about it, since none of
the source in lib/ has other authors listed.

But it raises an interesting question.  When is it kosher to add
oneself to a file's author list?  In this case I wrote about half of
the code in that admittedly short file, so that makes sense  Changing
a few lines presumably isn't enough.  Adding a few functions?

>> -/* XXX: Should write some talloc-friendly list to avoid the need for
>> - * this. */
>
> Hurrah! I love patches that get to remove XXX comments.

]:--8)

>> +    /* Get thread */
>> +    if (!message->thread_id)
>> +     message->thread_id =
>> +         _notmuch_message_get_term (message, i, end, thread_prefix);
>> +
>> +    /* Get id */
>> +    assert (strcmp (thread_prefix, id_prefix) < 0);
>> +    if (!message->message_id)
>> +     message->message_id =
>> +         _notmuch_message_get_term (message, i, end, id_prefix);
>
> Missing asserts on the above two as well?

Fixed.  (Actually, I think there was only one assert missing in total,
but it had propagated through the history.)

> -Carl

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] Fetch all message metadata in a single pass
  2011-03-21  6:56     ` Austin Clements
@ 2011-04-25 20:52       ` Carl Worth
  2011-04-25 21:42         ` Carl Worth
  0 siblings, 1 reply; 12+ messages in thread
From: Carl Worth @ 2011-04-25 20:52 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 1903 bytes --]

On Mon, 21 Mar 2011 02:56:05 -0400, Austin Clements <amdragon@mit.edu> wrote:
> Thanks for the thorough review.  My updated patches are on the
> eager-metadata-v4 branch (also, for-review/eager-metadata-v4) at
>   http://awakening.csail.mit.edu/git/notmuch.git

Great. I failed at my commitment to quickly apply the updated series. I
want to do it now, but I'm failing to be able to fetch the code:

	$ git fetch amdragon
	error: RPC failed; result=22, HTTP code = 417
	fatal: The remote end hung up unexpectedly

(This is using the URL above.)

> This doesn't cause any problems in the one weird case that still needs
> the reference to the list.  After the _notmuch_tags_create, the caller
> simply adds that reference.

This, (and the other comments you made about the cleanup), all sound
great to me. So I'm looking forward to seeing the code.

> Heh, fixed.  I suppose I hadn't been thinking about it, since none of
> the source in lib/ has other authors listed.
> 
> But it raises an interesting question.  When is it kosher to add
> oneself to a file's author list?  In this case I wrote about half of
> the code in that admittedly short file, so that makes sense  Changing
> a few lines presumably isn't enough.  Adding a few functions?

My intention is simply to leave this up to the author of the code in
question. If you feel your name belongs on the author list, then please
put it there.

Similarly, if you think your code is sufficient to deserve copyright
protection, (so probably anything beyond simple typo fixes or otherwise
tiny changes), then *please* add your name (or your employer's name as
appropriate) to that list as well.

A quick check with grep shows that I've definitely not been asking for
this attribution as much as I should have. Hopefully we can all get in a
better habit going forward.

-Carl

-- 
carl.d.worth@intel.com

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] Fetch all message metadata in a single pass
  2011-04-25 20:52       ` Carl Worth
@ 2011-04-25 21:42         ` Carl Worth
  0 siblings, 0 replies; 12+ messages in thread
From: Carl Worth @ 2011-04-25 21:42 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 403 bytes --]

On Mon, 25 Apr 2011 13:52:04 -0700, Carl Worth <cworth@cworth.org> wrote:
> 	$ git fetch amdragon
> 	error: RPC failed; result=22, HTTP code = 417
> 	fatal: The remote end hung up unexpectedly
> 
> (This is using the URL above.)

Sorry. This appears to have been due to a bogus http proxy on my end.

I've fixed that and pushed the code now.

Thanks!

-Carl

-- 
carl.d.worth@intel.com

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-04-25 21:42 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-12-09 20:59 [PATCH 0/5] Fetch all message metadata in a single pass Austin Clements
2010-12-09 20:59 ` [PATCH 1/5] Use a single unified pass to fetch scalar message metadata Austin Clements
2010-12-09 20:59 ` [PATCH 2/5] Implement an internal generic string list and use it Austin Clements
2010-12-23  2:44   ` Austin Clements
2010-12-09 20:59 ` [PATCH 3/5] Add a generic function to get a list of terms with some prefix Austin Clements
2010-12-09 20:59 ` [PATCH 4/5] Add the file name list to the unified message metadata pass Austin Clements
2010-12-09 20:59 ` [PATCH 5/5] Add the tag " Austin Clements
2011-02-13 20:25 ` [PATCH 0/5] Fetch all message metadata in a single pass Austin Clements
2011-03-11  3:48   ` Carl Worth
2011-03-21  6:56     ` Austin Clements
2011-04-25 20:52       ` Carl Worth
2011-04-25 21:42         ` Carl Worth

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).