From: Austin Clements <amdragon@MIT.EDU>
To: Jani Nikula <jani@nikula.org>
Cc: notmuch@notmuchmail.org
Subject: Re: [PATCH 08/11] search: Add stable queries to thread search results
Date: Wed, 9 Oct 2013 10:36:58 -0400
Message-ID: <20131009143658.GQ21611@mit.edu>
In-Reply-To: <87fvsaao2q.fsf@nikula.org>
Quoth Jani Nikula on Oct 09 at 9:41 am:
> On Tue, 08 Oct 2013, Austin Clements <amdragon@MIT.EDU> wrote:
> > These queries will match exactly the set of messages currently in the
> > thread, even if more messages later arrive. Two queries are provided:
> > one for matched messages and one for unmatched messages.
> >
> > This can be used to fix race conditions with tagging threads from
> > search results. While tagging based on a thread: query can affect
> > messages that arrived after the search, tagging based on stable
> > queries affects only the messages the user was shown in the search UI.
> >
> > Since we want clients to be able to depend on the presence of these
> > queries, this ushers in schema version 2.
> > ---
> > devel/schemata | 22 +++++++++++++++++--
> > notmuch-client.h | 2 +-
> > notmuch-search.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > test/json | 2 ++
> > test/missing-headers | 6 ++++--
> > test/sexp | 4 ++--
> > 6 files changed, 89 insertions(+), 7 deletions(-)
> >
> > diff --git a/devel/schemata b/devel/schemata
> > index cdd0e43..41dc4a6 100644
> > --- a/devel/schemata
> > +++ b/devel/schemata
> > @@ -14,7 +14,17 @@ are interleaved. Keys are printed as keywords (symbols preceded by a
> > colon), e.g. (:id "123" :time 54321 :from "foobar"). Null is printed as
> > nil, true as t and false as nil.
> >
> > -This is version 1 of the structured output format.
> > +This is version 2 of the structured output format.
> > +
> > +Version history
> > +---------------
> > +
> > +v1
> > +- First versioned schema release.
> > +- Added part.content-length and part.content-transfer-encoding fields.
> > +
> > +v2
> > +- Added the thread_summary.query field.
> >
> > Common non-terminals
> > --------------------
> > @@ -145,7 +155,15 @@ thread_summary = {
> > authors: string, # comma-separated names with | between
> > # matched and unmatched
> > subject: string,
> > - tags: [string*]
> > + tags: [string*],
> > +
> > + # Two stable query strings identifying exactly the matched and
> > + # unmatched messages currently in this thread. The messages
> > + # matched by these queries will not change even if more messages
> > + # arrive in the thread. If there are no matched or unmatched
> > + # messages, the corresponding query will be null (there is no
> > + # query that matches nothing). (Added in schema version 2.)
> > + query: [string|null, string|null],
> > }
> >
> > notmuch reply schema
> > diff --git a/notmuch-client.h b/notmuch-client.h
> > index 8d986f4..1b14910 100644
> > --- a/notmuch-client.h
> > +++ b/notmuch-client.h
> > @@ -138,7 +138,7 @@ chomp_newline (char *str)
> > * this. New (required) map fields can be added without increasing
> > * this.
> > */
> > -#define NOTMUCH_FORMAT_CUR 1
> > +#define NOTMUCH_FORMAT_CUR 2
> > /* The minimum supported structured output format version. Requests
> > * for format versions below this will return an error. */
> > #define NOTMUCH_FORMAT_MIN 1
> > diff --git a/notmuch-search.c b/notmuch-search.c
> > index d9d39ec..1d14651 100644
> > --- a/notmuch-search.c
> > +++ b/notmuch-search.c
> > @@ -20,6 +20,7 @@
> >
> > #include "notmuch-client.h"
> > #include "sprinter.h"
> > +#include "string-util.h"
> >
> > typedef enum {
> > OUTPUT_SUMMARY,
> > @@ -46,6 +47,46 @@ sanitize_string (const void *ctx, const char *str)
> > return out;
> > }
> >
> > +/* Return two stable query strings that identify exactly the matched
> > + * and unmatched messages currently in thread. If there are no
> > + * matched or unmatched messages, the returned buffers will be
> > + * NULL. */
> > +static int
> > +get_thread_query (notmuch_thread_t *thread,
> > + char **matched_out, char **unmatched_out)
> > +{
> > + notmuch_messages_t *messages;
> > + char *escaped = NULL;
> > + size_t escaped_len = 0;
> > +
> > + *matched_out = *unmatched_out = NULL;
> > +
> > + for (messages = notmuch_thread_get_messages (thread);
> > + notmuch_messages_valid (messages);
> > + notmuch_messages_move_to_next (messages))
> > + {
> > + notmuch_message_t *message = notmuch_messages_get (messages);
> > + const char *mid = notmuch_message_get_message_id (message);
> > + /* Determine which query buffer to extend */
> > + char **buf = notmuch_message_get_flag (
> > + message, NOTMUCH_MESSAGE_FLAG_MATCH) ? matched_out : unmatched_out;
> > + /* Allocate the query buffer if this is the first message */
> > + if (!*buf && (*buf = talloc_strdup (thread, "")) == NULL)
> > + return -1;
>
> I think it would improve clarity if you dropped the above...
>
> > + /* Add this message's id: query. Since "id" is an exclusive
> > + * prefix, it is implicitly 'or'd together, so we only need to
> > + * join queries with a space. */
> > + if (make_boolean_term (thread, "id", mid, &escaped, &escaped_len) < 0)
> > + return -1;
> > + *buf = talloc_asprintf_append_buffer (
> > + *buf, "%s%s", **buf ? " " : "", escaped);
>
> ...and turned this into:
>
> if (*buf)
> *buf = talloc_asprintf_append_buffer (*buf, " %s", escaped);
> else
> *buf = talloc_strdup (thread, escaped);
Much nicer!
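For reference, here is a self-contained sketch of the whole loop with that append-or-copy suggestion applied. It is only an approximation under stated assumptions: plain malloc/realloc stand in for talloc, the hypothetical struct msg stands in for notmuch_message_t and its match flag, and the hypothetical id_term only approximates make_boolean_term's quoting (quote on whitespace, ')', '"' or non-ASCII bytes, doubling embedded quotes). As the patch's comment says, plain space-joining is enough because "id" is an exclusive prefix.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a notmuch message: its Message-ID and
 * whether it matched the search (cf. NOTMUCH_MESSAGE_FLAG_MATCH). */
struct msg {
    const char *id;
    int matched;
};

/* Build an "id:" term for ID.  ASSUMPTION: the ID is double-quoted if
 * it contains whitespace/control bytes, ')', '"' or non-ASCII bytes,
 * with embedded '"' doubled; this approximates make_boolean_term but
 * is not a copy of it.  Returns malloc'd memory or NULL. */
static char *
id_term (const char *id)
{
    size_t len = 3 + 1;          /* "id:" plus the terminating NUL */
    int quote = 0;
    const char *in;
    char *out, *p;

    for (in = id; *in; in++) {
	unsigned char c = (unsigned char) *in;
	if (c <= ' ' || c == ')' || c == '"' || c > 127)
	    quote = 1;
	len += (c == '"') ? 2 : 1;
    }
    if (quote)
	len += 2;                /* opening and closing quote */
    if (!(out = malloc (len)))
	return NULL;
    memcpy (out, "id:", 3);
    p = out + 3;
    if (quote)
	*p++ = '"';
    for (in = id; *in; in++) {
	if (*in == '"')
	    *p++ = '"';          /* '"' always forces quoting, so double it */
	*p++ = *in;
    }
    if (quote)
	*p++ = '"';
    *p = '\0';
    return out;
}

/* Append TERM to *BUF separated by a space, or copy it if *BUF does
 * not exist yet: the "append if it exists, else copy" shape above. */
static int
append_term (char **buf, const char *term)
{
    size_t tlen = strlen (term);

    if (*buf) {
	size_t old = strlen (*buf);
	char *grown = realloc (*buf, old + 1 + tlen + 1);
	if (!grown)
	    return -1;           /* *buf is still valid; caller frees it */
	grown[old] = ' ';
	memcpy (grown + old + 1, term, tlen + 1);
	*buf = grown;
    } else {
	if (!(*buf = malloc (tlen + 1)))
	    return -1;
	memcpy (*buf, term, tlen + 1);
    }
    return 0;
}

/* Split MSGS into matched and unmatched "id:" queries.  Either output
 * stays NULL if no message falls in that group.  Returns 0 on success,
 * or -1 on allocation failure (both outputs are then freed and NULL). */
static int
get_thread_query_sketch (const struct msg *msgs, size_t n,
			 char **matched_out, char **unmatched_out)
{
    size_t i;

    *matched_out = *unmatched_out = NULL;
    for (i = 0; i < n; i++) {
	char **buf = msgs[i].matched ? matched_out : unmatched_out;
	char *term = id_term (msgs[i].id);
	int err = term ? append_term (buf, term) : -1;

	free (term);
	if (err < 0) {
	    free (*matched_out);
	    free (*unmatched_out);
	    *matched_out = *unmatched_out = NULL;
	    return -1;
	}
    }
    return 0;
}
```

With messages "a@x" (matched), "b c@x" (unmatched) and "d@x" (matched), this yields the matched query id:a@x id:d@x and the unmatched query id:"b c@x".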
> Also one talloc less. Which brings me to the main worry:
> performance. What's the impact?
Seems to be about 1%-3% for CLI search (tested on the medium corpus).
It's hard to measure what the effect on Emacs search is, though I
would expect it to be similarly negligible. Some work I did on an
earlier attempt at this suggests that this slows down tagging (though I
doubt it would be noticeable for single threads), but I also found
that switching to docid-based queries significantly sped things up:
id:CAH-f9WsPj=1Eu=g3sOePJgCTBFs6HrLdLq18xMEnJ8aZ00yCEg@mail.gmail.com
Actually, docid queries probably make tagging faster than it is *now*,
but I didn't measure that when I did the experiments.
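Since these queries are meant to be fed back into tagging, a client that wants to act on everything it displayed for a thread can join the non-null entries of the query pair. The C sketch below does that; combine_thread_queries is a hypothetical helper, not a notmuch function, and it relies on the patch's note that space-separated id: terms are implicitly OR'd because "id" is an exclusive prefix.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join the non-null entries of a thread's
 * [matched, unmatched] query pair into one query string.  Both
 * entries contain only id: terms, so a space ORs them together.
 * Returns a malloc'd string, or NULL if both entries are NULL
 * (or on allocation failure). */
static char *
combine_thread_queries (const char *matched, const char *unmatched)
{
    size_t mlen = matched ? strlen (matched) : 0;
    size_t ulen = unmatched ? strlen (unmatched) : 0;
    char *out;

    if (!matched && !unmatched)
	return NULL;             /* nothing was shown, so nothing to query */
    out = malloc (mlen + 1 + ulen + 1);
    if (!out)
	return NULL;
    if (matched && unmatched) {
	memcpy (out, matched, mlen);
	out[mlen] = ' ';
	memcpy (out + mlen + 1, unmatched, ulen + 1);
    } else if (matched) {
	memcpy (out, matched, mlen + 1);
    } else {
	memcpy (out, unmatched, ulen + 1);
    }
    return out;
}
```

The result would then serve as the search terms for something like notmuch tag -inbox -- <query>. A NULL result means there is nothing to tag; as the schema comment notes, there is no query that matches nothing.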
> BR,
> Jani.
>
>
> > + if (!*buf)
> > + return -1;
> > + }
> > + talloc_free (escaped);
> > + return 0;
> > +}
> > +
> > static int
> > do_search_threads (sprinter_t *format,
> > notmuch_query_t *query,
> > @@ -131,6 +172,25 @@ do_search_threads (sprinter_t *format,
> > format->string (format, authors);
> > format->map_key (format, "subject");
> > format->string (format, subject);
> > + if (notmuch_format_version >= 2) {
> > + char *matched_query, *unmatched_query;
> > + if (get_thread_query (thread, &matched_query,
> > + &unmatched_query) < 0) {
> > + fprintf (stderr, "Out of memory\n");
> > + return 1;
> > + }
> > + format->map_key (format, "query");
> > + format->begin_list (format);
> > + if (matched_query)
> > + format->string (format, matched_query);
> > + else
> > + format->null (format);
> > + if (unmatched_query)
> > + format->string (format, unmatched_query);
> > + else
> > + format->null (format);
> > + format->end (format);
> > + }
> > }
> >
> > talloc_free (ctx_quote);
> > diff --git a/test/json b/test/json
> > index b87b7f6..e07a290 100755
> > --- a/test/json
> > +++ b/test/json
> > @@ -26,6 +26,7 @@ test_expect_equal_json "$output" "[{\"thread\": \"XXX\",
> > \"total\": 1,
> > \"authors\": \"Notmuch Test Suite\",
> > \"subject\": \"json-search-subject\",
> > + \"query\": [\"id:$gen_msg_id\", null],
> > \"tags\": [\"inbox\",
> > \"unread\"]}]"
> >
> > @@ -59,6 +60,7 @@ test_expect_equal_json "$output" "[{\"thread\": \"XXX\",
> > \"total\": 1,
> > \"authors\": \"Notmuch Test Suite\",
> > \"subject\": \"json-search-utf8-body-sübjéct\",
> > + \"query\": [\"id:$gen_msg_id\", null],
> > \"tags\": [\"inbox\",
> > \"unread\"]}]"
> >
> > diff --git a/test/missing-headers b/test/missing-headers
> > index f14b878..43e861b 100755
> > --- a/test/missing-headers
> > +++ b/test/missing-headers
> > @@ -43,7 +43,8 @@ test_expect_equal_json "$output" '
> > ],
> > "thread": "XXX",
> > "timestamp": 978709437,
> > - "total": 1
> > + "total": 1,
> > + "query": ["id:notmuch-sha1-7a6e4eac383ef958fcd3ebf2143db71b8ff01161", null]
> > },
> > {
> > "authors": "Notmuch Test Suite",
> > @@ -56,7 +57,8 @@ test_expect_equal_json "$output" '
> > ],
> > "thread": "XXX",
> > "timestamp": 0,
> > - "total": 1
> > + "total": 1,
> > + "query": ["id:notmuch-sha1-ca55943aff7a72baf2ab21fa74fab3d632401334", null]
> > }
> > ]'
> >
> > diff --git a/test/sexp b/test/sexp
> > index 492a82f..be815e1 100755
> > --- a/test/sexp
> > +++ b/test/sexp
> > @@ -19,7 +19,7 @@ test_expect_equal "$output" "((((:id \"${gen_msg_id}\" :match t :excluded nil :f
> > test_begin_subtest "Search message: sexp"
> > add_message "[subject]=\"sexp-search-subject\"" "[date]=\"Sat, 01 Jan 2000 12:00:00 -0000\"" "[body]=\"sexp-search-message\""
> > output=$(notmuch search --format=sexp "sexp-search-message" | notmuch_search_sanitize)
> > -test_expect_equal "$output" "((:thread \"0000000000000002\" :timestamp 946728000 :date_relative \"2000-01-01\" :matched 1 :total 1 :authors \"Notmuch Test Suite\" :subject \"sexp-search-subject\" :tags (\"inbox\" \"unread\")))"
> > +test_expect_equal "$output" "((:thread \"0000000000000002\" :timestamp 946728000 :date_relative \"2000-01-01\" :matched 1 :total 1 :authors \"Notmuch Test Suite\" :subject \"sexp-search-subject\" :query (\"id:$gen_msg_id\" nil) :tags (\"inbox\" \"unread\")))"
> >
> > test_begin_subtest "Show message: sexp, utf-8"
> > add_message "[subject]=\"sexp-show-utf8-body-sübjéct\"" "[date]=\"Sat, 01 Jan 2000 12:00:00 -0000\"" "[body]=\"jsön-show-méssage\""
> > @@ -44,7 +44,7 @@ test_expect_equal "$output" "((((:id \"$id\" :match t :excluded nil :filename \"
> > test_begin_subtest "Search message: sexp, utf-8"
> > add_message "[subject]=\"sexp-search-utf8-body-sübjéct\"" "[date]=\"Sat, 01 Jan 2000 12:00:00 -0000\"" "[body]=\"jsön-search-méssage\""
> > output=$(notmuch search --format=sexp "jsön-search-méssage" | notmuch_search_sanitize)
> > -test_expect_equal "$output" "((:thread \"0000000000000005\" :timestamp 946728000 :date_relative \"2000-01-01\" :matched 1 :total 1 :authors \"Notmuch Test Suite\" :subject \"sexp-search-utf8-body-sübjéct\" :tags (\"inbox\" \"unread\")))"
> > +test_expect_equal "$output" "((:thread \"0000000000000005\" :timestamp 946728000 :date_relative \"2000-01-01\" :matched 1 :total 1 :authors \"Notmuch Test Suite\" :subject \"sexp-search-utf8-body-sübjéct\" :query (\"id:$gen_msg_id\" nil) :tags (\"inbox\" \"unread\")))"
> >
> >
> > test_done
> >
> > _______________________________________________
> > notmuch mailing list
> > notmuch@notmuchmail.org
> > http://notmuchmail.org/mailman/listinfo/notmuch
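To make the version gating and null handling in do_search_threads concrete, here is a standalone sketch that renders the new key in the same shape as the expected output in test/json, e.g. "query": ["id:...", null]. It does not use notmuch's sprinter. render_query_field, put_json_string_or_null and QUERY_FIELD_MIN_VERSION are hypothetical names, and the escaping is a generic minimal JSON escaper rather than notmuch's own.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: first output format version that carries "query". */
#define QUERY_FIELD_MIN_VERSION 2

/* Write S at P as a JSON string literal, or the literal null when S
 * is NULL (mirroring format->string / format->null in the patch).
 * Returns the new end of P.  The caller reserves enough space: at
 * most 6 bytes per input byte plus 2 for the quotes. */
static char *
put_json_string_or_null (char *p, const char *s)
{
    static const char hex[] = "0123456789abcdef";

    if (!s) {
	memcpy (p, "null", 4);
	return p + 4;
    }
    *p++ = '"';
    for (; *s; s++) {
	unsigned char c = (unsigned char) *s;
	if (c == '"' || c == '\\') {
	    *p++ = '\\';
	    *p++ = (char) c;
	} else if (c < 0x20) {
	    memcpy (p, "\\u00", 4);      /* control bytes become \u00XX */
	    p[4] = hex[c >> 4];
	    p[5] = hex[c & 0xf];
	    p += 6;
	} else {
	    *p++ = (char) c;             /* UTF-8 bytes pass through */
	}
    }
    *p++ = '"';
    return p;
}

/* Render the "query" key for one thread.  Returns NULL if the
 * requested format version predates the field (or on allocation
 * failure); otherwise a malloc'd string the caller must free. */
static char *
render_query_field (int format_version,
		    const char *matched, const char *unmatched)
{
    size_t cap = 32;     /* key, brackets, separator, two nulls, NUL */
    char *buf, *p;

    if (format_version < QUERY_FIELD_MIN_VERSION)
	return NULL;
    if (matched)
	cap += 6 * strlen (matched) + 2;
    if (unmatched)
	cap += 6 * strlen (unmatched) + 2;
    if (!(buf = malloc (cap)))
	return NULL;
    memcpy (buf, "\"query\": [", 10);
    p = put_json_string_or_null (buf + 10, matched);
    memcpy (p, ", ", 2);
    p = put_json_string_or_null (p + 2, unmatched);
    *p++ = ']';
    *p = '\0';
    return buf;
}
```

A client that asks for format version 1 gets NULL here and therefore no query key, which matches the patch's "if (notmuch_format_version >= 2)" check.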
--
Austin Clements MIT/'06/PhD/CSAIL
amdragon@mit.edu http://web.mit.edu/amdragon
Somewhere in the dream we call reality you will find me,
searching for the reality we call dreams.