Re: [PATCH 2/2] [RFC] possible solution for "Race condition for '*' command"

unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed

* Re: [PATCH 2/2] [RFC] possible solution for "Race condition for '*' command"
@ 2011-07-01 16:37 Austin Clements
  2011-07-02 14:20 ` Pieter Praet
  0 siblings, 1 reply; 27+ messages in thread
From: Austin Clements @ 2011-07-01 16:37 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 2026 bytes --]

On Jul 1, 2011 10:55 AM, "Austin Clements" <amdragon@mit.edu> wrote:
>
> On Thu, Jun 30, 2011 at 3:38 PM, Pieter Praet <pieter@praet.org> wrote:
> > Ok, even though my very first reply [1] may have created the impression
> > that I understood the issue, I clearly didn't...
> >
> > The test [2] needs a more applicable commit message, and the subsequent
> > patch [3] points more or less in the right direction, but the Message-Id
> > list should be local to the *search buffer* rather than to the
> > `notmuch-search-operate-all' function.
> >
> > `notmuch-search' could:
> >  - run "notmuch-command search" with the "--output=messages" option
> >    instead of a plain search,
> >  - maintain a buffer-local var with a list of returned Message-Id's,
> >  - ...and populate the buffer based on that list.
> >
> > As such we'd have -for each individual search buffer- a canonical list
> > of Message-Id's (i.e. messages which actually *match* the query AND are
> > currently visible in the search buffer), to be used by
> > `notmuch-search-operate-all' et al.
> >
> >
> > Peace
> >
> > --
> > Pieter
> >
> > [1] id:"87fwmuxxgd.fsf@praet.org"
> > [2] id:"1309450108-2793-2-git-send-email-pieter@praet.org"
> > [3] id:"1309450108-2793-1-git-send-email-pieter@praet.org"
>
> Ideally, this wouldn't be per-buffer, but per *line*.  This race
> equally affects adding and removing tags from individual results,
> since that's done using a thread: query, whose results could have
> changed since the original search.
>
> This almost certainly requires support from the notmuch core.  The
> good news is that the library already provides this information, so
> there will be virtually no performance hit for outputting it.

Actually, with a smidgeon of library support, you could even use document
IDs for this, rather than message IDs, which would make the tagging
operations (even '*') no more expensive than they are now.  (Of course, it
would be good to know just how much overhead going through message IDs
actually introduces.)

[-- Attachment #2: Type: text/html, Size: 2819 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] [RFC] possible solution for "Race condition for '*' command"
  2011-07-01 16:37 [PATCH 2/2] [RFC] possible solution for "Race condition for '*' command" Austin Clements
@ 2011-07-02 14:20 ` Pieter Praet
  2011-07-03 17:17   ` Austin Clements
  0 siblings, 1 reply; 27+ messages in thread
From: Pieter Praet @ 2011-07-02 14:20 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch

On Fri, 1 Jul 2011 12:37:11 -0400, Austin Clements <amdragon@mit.edu> wrote:
Non-text part: multipart/alternative
> On Jul 1, 2011 10:55 AM, "Austin Clements" <amdragon@mit.edu> wrote:
> >
> > On Thu, Jun 30, 2011 at 3:38 PM, Pieter Praet <pieter@praet.org> wrote:
> > > Ok, even though my very first reply [1] may have created the impression
> > > that I understood the issue, I clearly didn't...
> > >
> > > The test [2] needs a more applicable commit message, and the subsequent
> > > patch [3] points more or less in the right direction, but the Message-Id
> > > list should be local to the *search buffer* rather than to the
> > > `notmuch-search-operate-all' function.
> > >
> > > `notmuch-search' could:
> > >  - run "notmuch-command search" with the "--output=messages" option
> > >    instead of a plain search,
> > >  - maintain a buffer-local var with a list of returned Message-Id's,
> > >  - ...and populate the buffer based on that list.
> > >
> > > As such we'd have -for each individual search buffer- a canonical list
> > > of Message-Id's (i.e. messages which actually *match* the query AND are
> > > currently visible in the search buffer), to be used by
> > > `notmuch-search-operate-all' et al.
> > >
> > >
> > > Peace
> > >
> > > --
> > > Pieter
> > >
> > > [1] id:"87fwmuxxgd.fsf@praet.org"
> > > [2] id:"1309450108-2793-2-git-send-email-pieter@praet.org"
> > > [3] id:"1309450108-2793-1-git-send-email-pieter@praet.org"
> >
> > Ideally, this wouldn't be per-buffer, but per *line*.  This race
> > equally affects adding and removing tags from individual results,
> > since that's done using a thread: query, whose results could have
> > changed since the original search.
> >
> > This almost certainly requires support from the notmuch core.  The
> > good news is that the library already provides this information, so
> > there will be virtually no performance hit for outputting it.
> 
> Actually, with a smidgeon of library support, you could even use document
> IDs for this, rather than message IDs, which would make the tagging
> operations (even '*') no more expensive than they are now.  (Of course, it
> would be good to know just how much overhead going through message IDs
> actually introduces.)
Non-text part: text/html


That would be awesome!

Though I'd rather leave the plumbing to someone sufficiently capable,
so, if leaving libnotmuch hacking out of the equation, how would one go
about doing this?

I (ignorantly) assume we'd get `notmuch-search-process-filter'
to append each line with an invisible field containing a list
of matching Message-Id's, presumably obtained via
`notmuch_thread_get_matched_messages' (@ lib/thread.cc) ?


Peace

-- 
Pieter

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] [RFC] possible solution for "Race condition for '*' command"
  2011-07-02 14:20 ` Pieter Praet
@ 2011-07-03 17:17   ` Austin Clements
  2011-07-04  6:51     ` [PROTO] " Pieter Praet
  0 siblings, 1 reply; 27+ messages in thread
From: Austin Clements @ 2011-07-03 17:17 UTC (permalink / raw)
  To: Pieter Praet; +Cc: notmuch

Quoth Pieter Praet on Jul 02 at  4:20 pm:
> On Fri, 1 Jul 2011 12:37:11 -0400, Austin Clements <amdragon@mit.edu> wrote:
> Non-text part: multipart/alternative
> > On Jul 1, 2011 10:55 AM, "Austin Clements" <amdragon@mit.edu> wrote:
> > >
> > > On Thu, Jun 30, 2011 at 3:38 PM, Pieter Praet <pieter@praet.org> wrote:
> > > > Ok, even though my very first reply [1] may have created the impression
> > > > that I understood the issue, I clearly didn't...
> > > >
> > > > The test [2] needs a more applicable commit message, and the subsequent
> > > > patch [3] points more or less in the right direction, but the Message-Id
> > > > list should be local to the *search buffer* rather than to the
> > > > `notmuch-search-operate-all' function.
> > > >
> > > > `notmuch-search' could:
> > > >  - run "notmuch-command search" with the "--output=messages" option
> > > >    instead of a plain search,
> > > >  - maintain a buffer-local var with a list of returned Message-Id's,
> > > >  - ...and populate the buffer based on that list.
> > > >
> > > > As such we'd have -for each individual search buffer- a canonical list
> > > > of Message-Id's (i.e. messages which actually *match* the query AND are
> > > > currently visible in the search buffer), to be used by
> > > > `notmuch-search-operate-all' et al.
> > > >
> > > >
> > > > Peace
> > > >
> > > >
> > > > [1] id:"87fwmuxxgd.fsf@praet.org"
> > > > [2] id:"1309450108-2793-2-git-send-email-pieter@praet.org"
> > > > [3] id:"1309450108-2793-1-git-send-email-pieter@praet.org"
> > >
> > > Ideally, this wouldn't be per-buffer, but per *line*.  This race
> > > equally affects adding and removing tags from individual results,
> > > since that's done using a thread: query, whose results could have
> > > changed since the original search.
> > >
> > > This almost certainly requires support from the notmuch core.  The
> > > good news is that the library already provides this information, so
> > > there will be virtually no performance hit for outputting it.
> > 
> > Actually, with a smidgeon of library support, you could even use document
> > IDs for this, rather than message IDs, which would make the tagging
> > operations (even '*') no more expensive than they are now.  (Of course, it
> > would be good to know just how much overhead going through message IDs
> > actually introduces.)
> Non-text part: text/html
> 
> 
> That would be awesome!
> 
> Though I'd rather leave the plumbing to someone sufficiently capable,
> so, if leaving libnotmuch hacking out of the equation, how would one go
> about doing this?
> 
> I (ignorantly) assume we'd get `notmuch-search-process-filter'
> to append each line with an invisible field containing a list
> of matching Message-Id's, presumably obtained via
> `notmuch_thread_get_matched_messages' (@ lib/thread.cc) ?

Here's a super-hacky patch that adds an "id:x or id:y or ..." query
string to the end of each search line, right after the list of tags.
This is just the C side; care to prototype the elisp side?

I also added a --stdin argument to notmuch tag that'll read the query
string from stdin, since these queries could get very long (though
ARG_MAX appears to be 2 megs these days, so maybe this wasn't
necessary for prototyping).

diff --git a/notmuch-search.c b/notmuch-search.c
index faccaf7..65fe438 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -190,6 +190,23 @@ format_thread_json (const void *ctx,
     talloc_free (ctx_quote);
 }
 
+static void
+show_message_ids (notmuch_messages_t *messages, notmuch_bool_t first)
+{
+    notmuch_message_t *message;
+
+    for (;
+	 notmuch_messages_valid (messages);
+	 notmuch_messages_move_to_next (messages)) {
+	if (!first)
+	    fputs (" or ", stdout);
+	first = FALSE;
+	message = notmuch_messages_get (messages);
+	printf ("id:\"%s\"", notmuch_message_get_message_id (message));
+	show_message_ids (notmuch_message_get_replies (message), FALSE);
+    }
+}
+
 static int
 do_search_threads (const search_format_t *format,
 		   notmuch_query_t *query,
@@ -252,6 +269,12 @@ do_search_threads (const search_format_t *format,
 
 	    fputs (format->tag_end, stdout);
 
+	    if (format == &format_text) {
+		notmuch_messages_t *toplevel = notmuch_thread_get_toplevel_messages (thread);
+		fputs (" ", stdout);
+		show_message_ids (toplevel, TRUE);
+	    }
+
 	    fputs (format->item_end, stdout);
 	}
 
diff --git a/notmuch-tag.c b/notmuch-tag.c
index 6204ae3..5609d02 100644
--- a/notmuch-tag.c
+++ b/notmuch-tag.c
@@ -30,6 +30,36 @@ handle_sigint (unused (int sig))
     interrupted = 1;
 }
 
+static char*
+query_string_from_stdin (void *ctx)
+{
+    char *query_string = talloc_strdup (ctx, "");
+    char buf[4096];
+
+    if (query_string == NULL) {
+	fprintf (stderr, "Out of memory.\n");
+	return NULL;
+    }
+
+    while (1) {
+	ssize_t r = read(0, buf, sizeof (buf));
+	if (r < 0) {
+	    fprintf (stderr, "Error reading from stdin: %s\n",
+		     strerror (errno));
+	    return NULL;
+	} else if (r == 0) {
+	    break;
+	}
+	query_string = talloc_strndup_append (query_string, buf, r);
+	if (!query_string) {
+	    fprintf (stderr, "Out of memory.\n");
+	    return NULL;
+	}
+    }
+
+    return query_string;
+}
+
 int
 notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
 {
@@ -44,6 +74,7 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
     notmuch_message_t *message;
     struct sigaction action;
     notmuch_bool_t synchronize_flags;
+    notmuch_bool_t use_stdin = FALSE;
     int i;
 
     /* Setup our handler for SIGINT */
@@ -70,7 +101,9 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
 	    i++;
 	    break;
 	}
-	if (argv[i][0] == '+') {
+	if (STRNCMP_LITERAL (argv[i], "--stdin") == 0) {
+	    use_stdin = TRUE;
+	} else if (argv[i][0] == '+') {
 	    add_tags[add_tags_count++] = i;
 	} else if (argv[i][0] == '-') {
 	    remove_tags[remove_tags_count++] = i;
@@ -84,7 +117,13 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
 	return 1;
     }
 
-    query_string = query_string_from_args (ctx, argc - i, &argv[i]);
+    if (use_stdin)
+	query_string = query_string_from_stdin (ctx);
+    else
+	query_string = query_string_from_args (ctx, argc - i, &argv[i]);
+
+    if (!query_string)
+	return 1;
 
     if (*query_string == '\0') {
 	fprintf (stderr, "Error: notmuch tag requires at least one search term.\n");

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PROTO] possible solution for "Race condition for '*' command"
  2011-07-03 17:17   ` Austin Clements
@ 2011-07-04  6:51     ` Pieter Praet
  2011-07-04  6:51       ` [PATCH 1/5] emacs: add property "matched-msgids" to each search result Pieter Praet
                         ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Pieter Praet @ 2011-07-04  6:51 UTC (permalink / raw)
  To: Austin Clements; +Cc: Notmuch Mail

Thanks Austin!

Unfortunately, your patch causes *all* Message-Id's in the thread to be
appended, as opposed to only the ones matching the query:

#+BEGIN_EXAMPLE
$ notmuch search tag:inbox AND from:amdragon@mit.edu
thread:0000000000002777 Yest. 19:17 [1/3] Austin Clements| Pieter Praet;
[PATCH 2/2] [RFC] possible solution for "Race condition for '*' command"
(inbox replied sent to-me x/notmuch)
id:"CAH-f9WticM4EN8F1_ik_-mcBcBtrXwSpO+Drbtp7=UN7McECrg@mail.gmail.com"
or id:"87zkkwydag.fsf@praet.org" or id:"20110703171743.GL15901@mit.edu"
#+END_EXAMPLE

As you can see, according to matched/total ("[1/3]") only a single
message matches the query, yet all 3 MsgId's are returned.

If this were to be corrected (probably a trivial change, but I'm pretty
much oblivious as to the what and where of it), the following patch
series should work as intended.

The "--stdin" option works as expected (and ARG_MAX is indeed a very
valid concern with this particular use case), but I haven't yet gotten
around to making use of it from the Emacs UI as this would require some
screwing around with `notmuch-tag' and `notmuch-call-notmuch-process',
and it's still pretty early I-).

Peace

-- 
Pieter

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/5] emacs: add property "matched-msgids" to each search result
  2011-07-04  6:51     ` [PROTO] " Pieter Praet
@ 2011-07-04  6:51       ` Pieter Praet
  2011-07-04  6:51       ` [PATCH 2/5] emacs: add some functions to fetch the matched-msgids of a (region of) search result(s) Pieter Praet
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Pieter Praet @ 2011-07-04  6:51 UTC (permalink / raw)
  To: Austin Clements; +Cc: Notmuch Mail

Signed-off-by: Pieter Praet <pieter@praet.org>
---
 emacs/notmuch.el |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/emacs/notmuch.el b/emacs/notmuch.el
index f11ec24..674deb7 100644
--- a/emacs/notmuch.el
+++ b/emacs/notmuch.el
@@ -801,13 +801,14 @@ non-authors is found, assume that all of the authors match."
 	      (while more
 		(while (and (< line (length string)) (= (elt string line) ?\n))
 		  (setq line (1+ line)))
-		(if (string-match "^\\(thread:[0-9A-Fa-f]*\\) \\([^][]*\\) \\(\\[[0-9/]*\\]\\) \\([^;]*\\); \\(.*\\) (\\([^()]*\\))$" string line)
+		(if (string-match "^\\(thread:[0-9A-Fa-f]*\\) \\([^][]*\\) \\(\\[[0-9/]*\\]\\) \\([^;]*\\); \\(.*\\) (\\([^()]*\\)) \\(.*\\)$" string line)
 		    (let* ((thread-id (match-string 1 string))
 			   (date (match-string 2 string))
 			   (count (match-string 3 string))
 			   (authors (match-string 4 string))
 			   (subject (match-string 5 string))
 			   (tags (match-string 6 string))
+			   (matched-msgids (match-string 7 string))
 			   (tag-list (if tags (save-match-data (split-string tags)))))
 		      (goto-char (point-max))
 		      (if (/= (match-beginning 1) line)
@@ -816,6 +817,7 @@ non-authors is found, assume that all of the authors match."
 			(notmuch-search-show-result date count authors subject tags)
 			(notmuch-search-color-line beg (point-marker) tag-list)
 			(put-text-property beg (point-marker) 'notmuch-search-thread-id thread-id)
+			(put-text-property beg (point-marker) 'notmuch-search-matched-msgids matched-msgids)
 			(put-text-property beg (point-marker) 'notmuch-search-authors authors)
 			(put-text-property beg (point-marker) 'notmuch-search-subject subject)
 			(if (string= thread-id notmuch-search-target-thread)
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/5] emacs: add some functions to fetch the matched-msgids of a (region of) search result(s)
  2011-07-04  6:51     ` [PROTO] " Pieter Praet
  2011-07-04  6:51       ` [PATCH 1/5] emacs: add property "matched-msgids" to each search result Pieter Praet
@ 2011-07-04  6:51       ` Pieter Praet
  2011-07-04  6:51       ` [PATCH 3/5] emacs: stashing (a region of) matched-msgids Pieter Praet
                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Pieter Praet @ 2011-07-04  6:51 UTC (permalink / raw)
  To: Austin Clements; +Cc: Notmuch Mail

Signed-off-by: Pieter Praet <pieter@praet.org>
---
 emacs/notmuch.el |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/emacs/notmuch.el b/emacs/notmuch.el
index 674deb7..2338044 100644
--- a/emacs/notmuch.el
+++ b/emacs/notmuch.el
@@ -402,6 +402,14 @@ Complete list of currently available key bindings:
   "Return a list of threads for the current region"
   (notmuch-search-properties-in-region 'notmuch-search-thread-id beg end))
 
+(defun notmuch-search-find-matched-msgids ()
+  "Return the matched Message-Id's for the current thread"
+  (get-text-property (point) 'notmuch-search-matched-msgids))
+
+(defun notmuch-search-find-matched-msgids-region (beg end)
+  "Return a list of matched Message-Id's for the current region"
+  (notmuch-search-properties-in-region 'notmuch-search-matched-msgids beg end))
+
 (defun notmuch-search-find-authors ()
   "Return the authors for the current thread"
   (get-text-property (point) 'notmuch-search-authors))
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/5] emacs: stashing (a region of) matched-msgids
  2011-07-04  6:51     ` [PROTO] " Pieter Praet
  2011-07-04  6:51       ` [PATCH 1/5] emacs: add property "matched-msgids" to each search result Pieter Praet
  2011-07-04  6:51       ` [PATCH 2/5] emacs: add some functions to fetch the matched-msgids of a (region of) search result(s) Pieter Praet
@ 2011-07-04  6:51       ` Pieter Praet
  2011-07-04  6:51       ` [PATCH 4/5] test: emacs: add/remove tags from all matching messages with `notmuch-search-operate-all' Pieter Praet
                         ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Pieter Praet @ 2011-07-04  6:51 UTC (permalink / raw)
  To: Austin Clements; +Cc: Notmuch Mail

Signed-off-by: Pieter Praet <pieter@praet.org>
---
 emacs/notmuch.el |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/emacs/notmuch.el b/emacs/notmuch.el
index 2338044..46e276a 100644
--- a/emacs/notmuch.el
+++ b/emacs/notmuch.el
@@ -226,6 +226,7 @@ For a mouse binding, return nil."
 (defvar notmuch-search-stash-map
   (let ((map (make-sparse-keymap)))
     (define-key map "i" 'notmuch-search-stash-thread-id)
+    (define-key map "m" 'notmuch-search-stash-matched-msgids)
     map)
   "Submap for stash commands")
 (fset 'notmuch-search-stash-map notmuch-search-stash-map)
@@ -235,6 +236,19 @@ For a mouse binding, return nil."
   (interactive)
   (notmuch-common-do-stash (notmuch-search-find-thread-id)))
 
+(defun notmuch-search-stash-matched-msgids ()
+  "Copy matched Message-ID's of current thread (or threads in region) to kill-ring."
+  (interactive)
+  (save-excursion
+    (if (region-active-p)
+        (let* ((beg (region-beginning))
+               (end (region-end)))
+          (notmuch-common-do-stash
+           (mapconcat 'identity
+                      (notmuch-search-find-matched-msgids-region beg end)
+                      " or ")))
+      (notmuch-common-do-stash (notmuch-search-find-matched-msgids)))))
+
 (defvar notmuch-search-query-string)
 (defvar notmuch-search-target-thread)
 (defvar notmuch-search-target-line)
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/5] test: emacs: add/remove tags from all matching messages with `notmuch-search-operate-all'
  2011-07-04  6:51     ` [PROTO] " Pieter Praet
                         ` (2 preceding siblings ...)
  2011-07-04  6:51       ` [PATCH 3/5] emacs: stashing (a region of) matched-msgids Pieter Praet
@ 2011-07-04  6:51       ` Pieter Praet
  2011-07-04  6:51       ` [PATCH 5/5] emacs: make `notmuch-search-operate-all' use matched-msgids instead of the original query string Pieter Praet
  2011-07-04 17:56       ` [PROTO] possible solution for "Race condition for '*' command" Austin Clements
  5 siblings, 0 replies; 27+ messages in thread
From: Pieter Praet @ 2011-07-04  6:51 UTC (permalink / raw)
  To: Austin Clements; +Cc: Notmuch Mail

Signed-off-by: Pieter Praet <pieter@praet.org>
---
 test/emacs-search-operate-all |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)
 create mode 100755 test/emacs-search-operate-all

diff --git a/test/emacs-search-operate-all b/test/emacs-search-operate-all
new file mode 100755
index 0000000..ff3e701
--- /dev/null
+++ b/test/emacs-search-operate-all
@@ -0,0 +1,29 @@
+#!/usr/bin/env bash
+
+test_description="emacs interface"
+. test-lib.sh
+
+EXPECTED=$TEST_DIRECTORY/emacs.expected-output
+
+add_email_corpus
+
+test_begin_subtest "Add/remove tags from all matching messages."
+test_emacs '(notmuch-search "tag:inbox AND tags")
+	    (notmuch-test-wait)
+	    (notmuch-search-operate-all "+matching -inbox")
+	    (notmuch-search "tag:matching")
+	    (notmuch-test-wait)
+	    (test-output)'
+cat <<EOF >EXPECTED
+  2009-11-18 [3/3]   Adrian Perez de Castro, Keith Packard, Carl Worth  [notmuch] Introducing myself (matching signed unread)
+  2009-11-18 [1/3]   Carl Worth, Israel Herraiz, Keith Packard       [notmuch] New to the list (inbox matching unread)
+  2009-11-18 [2/2]   Keith Packard, Carl Worth    [notmuch] [PATCH] Make notmuch-show 'X' (and 'x') commands remove inbox (and unread) tags (matching unread)
+  2009-11-18 [1/2]   Keith Packard, Alexander Botero-Lowry    [notmuch] [PATCH] Create a default notmuch-show-hook that highlights URLs and uses word-wrap (inbox matching unread)
+  2009-11-18 [1/1]   Jan Janak            [notmuch] [PATCH] notmuch new: Support for conversion of spool subdirectories into tags (matching unread)
+  2009-11-18 [1/1]   Stewart Smith        [notmuch] [PATCH] Fix linking with gcc to use g++ to link in C++ libs. (matching unread)
+  2009-11-17 [1/2]   Ingmar Vanhassel, Carl Worth  [notmuch] [PATCH] Typsos (inbox matching unread)
+End of search results.
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_done
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 5/5] emacs: make `notmuch-search-operate-all' use matched-msgids instead of the original query string
  2011-07-04  6:51     ` [PROTO] " Pieter Praet
                         ` (3 preceding siblings ...)
  2011-07-04  6:51       ` [PATCH 4/5] test: emacs: add/remove tags from all matching messages with `notmuch-search-operate-all' Pieter Praet
@ 2011-07-04  6:51       ` Pieter Praet
  2011-07-04 17:56       ` [PROTO] possible solution for "Race condition for '*' command" Austin Clements
  5 siblings, 0 replies; 27+ messages in thread
From: Pieter Praet @ 2011-07-04  6:51 UTC (permalink / raw)
  To: Austin Clements; +Cc: Notmuch Mail

Signed-off-by: Pieter Praet <pieter@praet.org>
---
 emacs/notmuch.el |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/emacs/notmuch.el b/emacs/notmuch.el
index 46e276a..0d040a2 100644
--- a/emacs/notmuch.el
+++ b/emacs/notmuch.el
@@ -869,7 +869,10 @@ Each character of the tag name may consist of alphanumeric
 characters as well as `_.+-'.
 "
   (interactive "sOperation (+add -drop): notmuch tag ")
-  (let ((action-split (split-string action " +")))
+  (let ((action-split (split-string action " +"))
+        (msgids (mapconcat 'identity
+                           (notmuch-search-find-matched-msgids-region (point-min) (- (point-max) 2))
+                           " or ")))
     ;; Perform some validation
     (let ((words action-split))
       (when (null words) (error "No operation given"))
@@ -877,7 +880,7 @@ characters as well as `_.+-'.
 	(unless (string-match-p "^[-+][-+_.[:word:]]+$" (car words))
 	  (error "Action must be of the form `+thistag -that_tag'"))
 	(setq words (cdr words))))
-    (apply 'notmuch-tag notmuch-search-query-string action-split)))
+    (apply 'notmuch-tag msgids action-split)))
 
 (defun notmuch-search-buffer-title (query)
   "Returns the title for a buffer with notmuch search results."
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PROTO] possible solution for "Race condition for '*' command"
  2011-07-04  6:51     ` [PROTO] " Pieter Praet
                         ` (4 preceding siblings ...)
  2011-07-04  6:51       ` [PATCH 5/5] emacs: make `notmuch-search-operate-all' use matched-msgids instead of the original query string Pieter Praet
@ 2011-07-04 17:56       ` Austin Clements
  2011-07-04 18:48         ` Pieter Praet
  5 siblings, 1 reply; 27+ messages in thread
From: Austin Clements @ 2011-07-04 17:56 UTC (permalink / raw)
  To: Pieter Praet; +Cc: Notmuch Mail

[-- Attachment #1: Type: text/plain, Size: 2042 bytes --]

Ah, this is subtler than I thought. You're right that, for the '*' command, you want only the matched ID's. Tagging a region, however, operates on entire threads. I think it's important to retain this behavior because threads are what the user sees and selects in the search buffer (*, on the other hand, doesn't have a strong visual analogue).

(This also means * is not, in fact, equivalent to expanding the region the the entire buffer and then tagging the region.)

I should probably emit two lists per thread: one of matched IDs and one of unmatched IDs. Tagging a region can then operate on the concatenation of these, while * can operate only on the matched lists. This should be easy to do. I'll send an updated patch when I'm back at a computer.
-- 
Sent from my Android. Please excuse my brevity.

Pieter Praet <pieter@praet.org> wrote:

Thanks Austin!

Unfortunately, your patch causes *all* Message-Id's in the thread to be
appended, as opposed to only the ones matching the query:

#+BEGIN_EXAMPLE
$ notmuch search tag:inbox AND from:amdragon@mit.edu
thread:0000000000002777 Yest. 19:17 [1/3] Austin Clements| Pieter Praet;
[PATCH 2/2] [RFC] possible solution for "Race condition for '*' command"
(inbox replied sent to-me x/notmuch)
id:"CAH-f9WticM4EN8F1_ik_-mcBcBtrXwSpO+Drbtp7=UN7McECrg@mail.gmail.com"
or id:"87zkkwydag.fsf@praet.org" or id:"20110703171743.GL15901@mit.edu"
#+END_EXAMPLE

As you can see, according to matched/total ("[1/3]") only a single
message matches the query, yet all 3 MsgId's are returned.

If this were to be corrected (probably a trivial change, but I'm pretty
much oblivious as to the what and where of it), the following patch
series should work as intended.

The "--stdin" option works as expected (and ARG_MAX is indeed a very
valid concern with this particular use case), but I haven't yet gotten
around to making use of it from the Emacs UI as this would require some
screwing around with `notmuch-tag' and `notmuch-call-notmuch-process',
and it's still pretty early I-).

Peace

-- 
Pieter

[-- Attachment #2: Type: text/html, Size: 2575 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PROTO] possible solution for "Race condition for '*' command"
  2011-07-04 17:56       ` [PROTO] possible solution for "Race condition for '*' command" Austin Clements
@ 2011-07-04 18:48         ` Pieter Praet
  2011-07-05 19:04           ` Pieter Praet
  0 siblings, 1 reply; 27+ messages in thread
From: Pieter Praet @ 2011-07-04 18:48 UTC (permalink / raw)
  To: Austin Clements; +Cc: Notmuch Mail

On Mon, 04 Jul 2011 13:56:26 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
Non-text part: multipart/alternative

> Ah, this is subtler than I thought. You're right that, for the '*'
> command, you want only the matched ID's. Tagging a region, however,
> operates on entire threads. I think it's important to retain this
> behavior because threads are what the user sees and selects in the
> search buffer (*, on the other hand, doesn't have a strong visual
> analogue).

Duly noted.

> (This also means * is not, in fact, equivalent to expanding the region
> the the entire buffer and then tagging the region.)

In fact it *is* but we'd be fetching the `matched-msgids' property
rather than the `thread-id' property, for each line in region.

> I should probably emit two lists per thread: one of matched IDs and
> one of unmatched IDs. Tagging a region can then operate on the
> concatenation of these, while * can operate only on the matched
> lists. This should be easy to do. I'll send an updated patch when I'm
> back at a computer.

The matched MsgIds will be sufficient, as we'll want to operate on
either the matched messages or the entire thread (for which the
`thread-id' property is already present).

Can't think of a use case for non-matched messages right now,
but if required, we'll just use `set-exclusive-or'.

> -- 
> Sent from my Android. Please excuse my brevity.

Tip: K-9 Mail [1], apart from being much better than the stock email client,
     supports bottom-posting :)

> Pieter Praet <pieter@praet.org> wrote:
> 
> Thanks Austin!
> 
> Unfortunately, your patch causes *all* Message-Id's in the thread to be
> appended, as opposed to only the ones matching the query:
> 
> #+BEGIN_EXAMPLE
> $ notmuch search tag:inbox AND from:amdragon@mit.edu
> thread:0000000000002777 Yest. 19:17 [1/3] Austin Clements| Pieter Praet;
> [PATCH 2/2] [RFC] possible solution for "Race condition for '*' command"
> (inbox replied sent to-me x/notmuch)
> id:"CAH-f9WticM4EN8F1_ik_-mcBcBtrXwSpO+Drbtp7=UN7McECrg@mail.gmail.com"
> or id:"87zkkwydag.fsf@praet.org" or id:"20110703171743.GL15901@mit.edu"
> #+END_EXAMPLE
> 
> As you can see, according to matched/total ("[1/3]") only a single
> message matches the query, yet all 3 MsgId's are returned.
> 
> If this were to be corrected (probably a trivial change, but I'm pretty
> much oblivious as to the what and where of it), the following patch
> series should work as intended.
> 
> The "--stdin" option works as expected (and ARG_MAX is indeed a very
> valid concern with this particular use case), but I haven't yet gotten
> around to making use of it from the Emacs UI as this would require some
> screwing around with `notmuch-tag' and `notmuch-call-notmuch-process',
> and it's still pretty early I-).
> 
> Peace
> 
> -- 
> Pieter
> 
Non-text part: text/html


Peace

-- 
Pieter

[1] http://code.google.com/p/k9mail/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PROTO] possible solution for "Race condition for '*' command"
  2011-07-04 18:48         ` Pieter Praet
@ 2011-07-05 19:04           ` Pieter Praet
  2011-07-05 21:42             ` Austin Clements
  0 siblings, 1 reply; 27+ messages in thread
From: Pieter Praet @ 2011-07-05 19:04 UTC (permalink / raw)
  To: Austin Clements; +Cc: Notmuch Mail

On Mon, 04 Jul 2011 20:48:12 +0200, Pieter Praet <pieter@praet.org> wrote:
> On Mon, 04 Jul 2011 13:56:26 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
> > I should probably emit two lists per thread: one of matched IDs and
> > one of unmatched IDs. Tagging a region can then operate on the
> > concatenation of these, while * can operate only on the matched
> > lists. This should be easy to do. I'll send an updated patch when I'm
> > back at a computer.
> 
> The matched MsgIds will be sufficient, as we'll want to operate on
> either the matched messages or the entire thread (for which the
> `thread-id' property is already present).
> 
> Can't think of a use case for non-matched messages right now,
> but if required, we'll just use `set-exclusive-or'.

Wasn't thinking clearly:

You're right, we *will* be needing both a list of matched as well as one
of unmatched Message-Id's per result. Otherwise there would still be a
potential race condition when tagging with +/-.


Peace

-- 
Pieter

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PROTO] possible solution for "Race condition for '*' command"
  2011-07-05 19:04           ` Pieter Praet
@ 2011-07-05 21:42             ` Austin Clements
  2011-07-10 14:11               ` [PATCH] emacs: bad regexp @ `notmuch-search-process-filter' Pieter Praet
                                 ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Austin Clements @ 2011-07-05 21:42 UTC (permalink / raw)
  To: Pieter Praet; +Cc: Notmuch Mail

Quoth Pieter Praet on Jul 05 at  9:04 pm:
> On Mon, 04 Jul 2011 20:48:12 +0200, Pieter Praet <pieter@praet.org> wrote:
> > On Mon, 04 Jul 2011 13:56:26 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
> > > I should probably emit two lists per thread: one of matched IDs and
> > > one of unmatched IDs. Tagging a region can then operate on the
> > > concatenation of these, while * can operate only on the matched
> > > lists. This should be easy to do. I'll send an updated patch when I'm
> > > back at a computer.
> > 
> > The matched MsgIds will be sufficient, as we'll want to operate on
> > either the matched messages or the entire thread (for which the
> > `thread-id' property is already present).
> > 
> > Can't think of a use case for non-matched messages right now,
> > but if required, we'll just use `set-exclusive-or'.
> 
> Wasn't thinking clearly:
> 
> You're right, we *will* be needing both a list of matched as well as one
> of unmatched Message-Id's per result. Otherwise there would still be a
> potential race condition when tagging with +/-.

Yes, exactly.  (I had originally thought this race was a strict
superset of the '*' race; I now realize that's not the case, but
they're related enough that they might as well be addressed together.)

Below is an updated patch that separates the matched and unmatched
ID's (it's ugly, but no point in cleaning it up since it's a
prototype).  Now the tag list on each search line is followed by
something that looks like

  (id:x or id:y) or id:z

or just

  (id:x or id:y)

where the parenthesized part of the query is for the matched messages
and the (possibly empty) unparenthesized part is for the non-matched
messages.  This is designed to be easy for emacs to parse: grab just
the parenthesized part for the queries used by '*' and the whole thing
for region tagging queries.  This should also eliminate corner cases
for pasting together multiple queries with "or".

diff --git a/notmuch-search.c b/notmuch-search.c
index faccaf7..2288eb7 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -190,6 +190,28 @@ format_thread_json (const void *ctx,
     talloc_free (ctx_quote);
 }
 
+static void
+show_message_ids (notmuch_messages_t *messages, const char **first,
+		  notmuch_bool_t match)
+{
+    notmuch_message_t *message;
+
+    for (;
+	 notmuch_messages_valid (messages);
+	 notmuch_messages_move_to_next (messages)) {
+	message = notmuch_messages_get (messages);
+	if (notmuch_message_get_flag (message, NOTMUCH_MESSAGE_FLAG_MATCH) == match) {
+	    if (*first)
+		fputs (*first, stdout);
+	    else
+		fputs (" or ", stdout);
+	    *first = NULL;
+	    printf ("id:\"%s\"", notmuch_message_get_message_id (message));
+	}
+	show_message_ids (notmuch_message_get_replies (message), first, match);
+    }
+}
+
 static int
 do_search_threads (const search_format_t *format,
 		   notmuch_query_t *query,
@@ -252,6 +274,22 @@ do_search_threads (const search_format_t *format,
 
 	    fputs (format->tag_end, stdout);
 
+	    if (format == &format_text) {
+		notmuch_messages_t *toplevel;
+		const char *first;
+
+		toplevel = notmuch_thread_get_toplevel_messages (thread);
+		first = " (";
+		show_message_ids (toplevel, &first, TRUE);
+		if (first)
+		    INTERNAL_ERROR ("No matched messages");
+		fputs (")", stdout);
+
+		toplevel = notmuch_thread_get_toplevel_messages (thread);
+		first = " or ";
+		show_message_ids (toplevel, &first, FALSE);
+	    }
+
 	    fputs (format->item_end, stdout);
 	}
 
diff --git a/notmuch-tag.c b/notmuch-tag.c
index 6204ae3..5609d02 100644
--- a/notmuch-tag.c
+++ b/notmuch-tag.c
@@ -30,6 +30,36 @@ handle_sigint (unused (int sig))
     interrupted = 1;
 }
 
+static char*
+query_string_from_stdin (void *ctx)
+{
+    char *query_string = talloc_strdup (ctx, "");
+    char buf[4096];
+
+    if (query_string == NULL) {
+	fprintf (stderr, "Out of memory.\n");
+	return NULL;
+    }
+
+    while (1) {
+	ssize_t r = read(0, buf, sizeof (buf));
+	if (r < 0) {
+	    fprintf (stderr, "Error reading from stdin: %s\n",
+		     strerror (errno));
+	    return NULL;
+	} else if (r == 0) {
+	    break;
+	}
+	query_string = talloc_strndup_append (query_string, buf, r);
+	if (!query_string) {
+	    fprintf (stderr, "Out of memory.\n");
+	    return NULL;
+	}
+    }
+
+    return query_string;
+}
+
 int
 notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
 {
@@ -44,6 +74,7 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
     notmuch_message_t *message;
     struct sigaction action;
     notmuch_bool_t synchronize_flags;
+    notmuch_bool_t use_stdin = FALSE;
     int i;
 
     /* Setup our handler for SIGINT */
@@ -70,7 +101,9 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
 	    i++;
 	    break;
 	}
-	if (argv[i][0] == '+') {
+	if (STRNCMP_LITERAL (argv[i], "--stdin") == 0) {
+	    use_stdin = TRUE;
+	} else if (argv[i][0] == '+') {
 	    add_tags[add_tags_count++] = i;
 	} else if (argv[i][0] == '-') {
 	    remove_tags[remove_tags_count++] = i;
@@ -84,7 +117,13 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
 	return 1;
     }
 
-    query_string = query_string_from_args (ctx, argc - i, &argv[i]);
+    if (use_stdin)
+	query_string = query_string_from_stdin (ctx);
+    else
+	query_string = query_string_from_args (ctx, argc - i, &argv[i]);
+
+    if (!query_string)
+	return 1;
 
     if (*query_string == '\0') {
 	fprintf (stderr, "Error: notmuch tag requires at least one search term.\n");

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH] emacs: bad regexp @ `notmuch-search-process-filter'
  2011-07-05 21:42             ` Austin Clements
@ 2011-07-10 14:11               ` Pieter Praet
  2011-07-11 20:43               ` [PATCH v2] " Pieter Praet
  2011-08-03 20:47               ` [PROTO] possible solution for "Race condition for '*' command" Austin Clements
  2 siblings, 0 replies; 27+ messages in thread
From: Pieter Praet @ 2011-07-10 14:11 UTC (permalink / raw)
  To: Austin Clements; +Cc: Notmuch Mail

Ok, I've got everything pretty much ready to go, minus the crux:
the regexp @ `notmuch-search-process-filter' :)

My misguided attempt seems to work perfectly fine with `re-builder' on a
large sample of my mail store, but in the actual search buffer and tests,
the Message-Ids appear to get cut off at some arbitrary point.

This patch introduces the following changes (on top of Austin's 2nd patch):
- notmuch-search.c: to prevent a slew of failing tests obscuring
  interesting failures, only output Message-Ids when supplied with the
  `--output=summary-ids' option.
- emacs/notmuch.el: add property `msgids' to every result in search buffer

After applying it, two tests fail, at different places, even though they
output the exact same results (albeit with a different sort order) :
- Basic notmuch-search view in emacs
- Navigation of notmuch-hello to search results

Could someone please provide my ignorant @$$ with a regexp that works?

Peace

Signed-off-by: Pieter Praet <pieter@praet.org>
---
 emacs/notmuch.el |    5 ++++-
 notmuch-search.c |    6 +++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/emacs/notmuch.el b/emacs/notmuch.el
index f11ec24..501c1a2 100644
--- a/emacs/notmuch.el
+++ b/emacs/notmuch.el
@@ -801,13 +801,14 @@ non-authors is found, assume that all of the authors match."
 	      (while more
 		(while (and (< line (length string)) (= (elt string line) ?\n))
 		  (setq line (1+ line)))
-		(if (string-match "^\\(thread:[0-9A-Fa-f]*\\) \\([^][]*\\) \\(\\[[0-9/]*\\]\\) \\([^;]*\\); \\(.*\\) (\\([^()]*\\))$" string line)
+		(if (string-match "^\\(thread:[0-9A-Fa-f]*\\) \\([^][]*\\) \\(\\[[0-9/]*\\]\\) \\([^;]*\\); \\(.*\\) (\\([^()]*\\)) \\(.*\\)$" string line)
 		    (let* ((thread-id (match-string 1 string))
 			   (date (match-string 2 string))
 			   (count (match-string 3 string))
 			   (authors (match-string 4 string))
 			   (subject (match-string 5 string))
 			   (tags (match-string 6 string))
+			   (msgids (match-string 7 string))
 			   (tag-list (if tags (save-match-data (split-string tags)))))
 		      (goto-char (point-max))
 		      (if (/= (match-beginning 1) line)
@@ -816,6 +817,7 @@ non-authors is found, assume that all of the authors match."
 			(notmuch-search-show-result date count authors subject tags)
 			(notmuch-search-color-line beg (point-marker) tag-list)
 			(put-text-property beg (point-marker) 'notmuch-search-thread-id thread-id)
+			(put-text-property beg (point-marker) 'notmuch-search-msgids msgids)
 			(put-text-property beg (point-marker) 'notmuch-search-authors authors)
 			(put-text-property beg (point-marker) 'notmuch-search-subject subject)
 			(if (string= thread-id notmuch-search-target-thread)
@@ -913,6 +915,7 @@ The optional parameters are used as follows:
 	(let ((proc (start-process
 		     "notmuch-search" buffer
 		     notmuch-command "search"
+		     "--output=summary-ids"
 		     (if oldest-first
 			 "--sort=oldest-first"
 		       "--sort=newest-first")
diff --git a/notmuch-search.c b/notmuch-search.c
index 2288eb7..b3af88b 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -22,6 +22,7 @@
 
 typedef enum {
     OUTPUT_SUMMARY,
+    OUTPUT_SUMMARY_IDS,
     OUTPUT_THREADS,
     OUTPUT_MESSAGES,
     OUTPUT_FILES,
@@ -274,7 +275,7 @@ do_search_threads (const search_format_t *format,
 
 	    fputs (format->tag_end, stdout);
 
-	    if (format == &format_text) {
+	    if (format == &format_text && output == OUTPUT_SUMMARY_IDS) {
 		notmuch_messages_t *toplevel;
 		const char *first;
 
@@ -462,6 +463,8 @@ notmuch_search_command (void *ctx, int argc, char *argv[])
 	    opt = argv[i] + sizeof ("--output=") - 1;
 	    if (strcmp (opt, "summary") == 0) {
 		output = OUTPUT_SUMMARY;
+	    } else if (strcmp (opt, "summary-ids") == 0) {
+		output = OUTPUT_SUMMARY_IDS;
 	    } else if (strcmp (opt, "threads") == 0) {
 		output = OUTPUT_THREADS;
 	    } else if (strcmp (opt, "messages") == 0) {
@@ -513,6 +516,7 @@ notmuch_search_command (void *ctx, int argc, char *argv[])
     switch (output) {
     default:
     case OUTPUT_SUMMARY:
+    case OUTPUT_SUMMARY_IDS:
     case OUTPUT_THREADS:
 	ret = do_search_threads (format, query, sort, output);
 	break;
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter'
  2011-07-05 21:42             ` Austin Clements
  2011-07-10 14:11               ` [PATCH] emacs: bad regexp @ `notmuch-search-process-filter' Pieter Praet
@ 2011-07-11 20:43               ` Pieter Praet
  2011-07-11 21:05                 ` Austin Clements
  2011-08-03 20:47               ` [PROTO] possible solution for "Race condition for '*' command" Austin Clements
  2 siblings, 1 reply; 27+ messages in thread
From: Pieter Praet @ 2011-07-11 20:43 UTC (permalink / raw)
  To: Austin Clements; +Cc: Notmuch Mail


TL;DR: I can haz regex pl0x?


I've updated the regex a bit to prevent it from choking on the whole
"parens in subject vs. parens around tags vs. parens around matching
Message-Id's" deal, but it still causes errors in the search buffer due
to the list of Message-Id's being cut off at a seemingly arbitrary point,
and this for *different* results on pretty much every refresh.

All tests pass, except for the ones I've mentioned before [1] (which
don't test anything tagging-related, just `notmuch-show'), though even
those fail to fail consistently :<. This variability can't be related to
residual files, as I always run the test suite like this:

  rm -rf /dev/shm/notmuch/* && make test OPTIONS="--root=/dev/shm/notmuch"


So, to keep this on track for 0.7 whilst keeping myself from having to
send spammerific amounts of patches, I've squashed the whole deal in
this single patch. Don't worry though, it's all quite grokkable:


notmuch-search.c

- Make `notmuch search' *only* append results with their Message-Id's
  when supplied with the "--output=summary-ids" option, to prevent a
  slew of failing tests obscuring the relevant ones.


emacs/notmuch.el

- Make `notmuch-search' run the notmuch binary with the "--output=summary-ids"
  option, to receive search results appended with their lists of Message-Id's.

- Update the regex @ `notmuch-search-process-filter' to include a new atom,
  which matches the list of Message-Id's at the end of every search result
  returned by the notmuch binary.
  To each individual result in the search buffer, this matched string is
  added as a text property called `notmuch-search-msgids'.

- Add 2 functions to return the `notmuch-search-msgids' property of search
  results: `notmuch-search-find-msgids', `notmuch-search-find-msgids-region'.

- Add a function to stash the Message-Id's of (a region of) search results:
  `notmuch-search-stash-msgids', bound to "m" in `notmuch-search-stash-map'.
  Mainly for testing purposes.

- Merge `notmuch-call-notmuch-process' into `notmuch-tag':
  `notmuch-tag' was the only thing making use of `notmuch-call-notmuch-process',
  and the extra layer of abstraction would complicate making `notmuch-tag' send
  arguments on stdin (see next point).

- Make `notmuch-tag' send its query string on stdin:
  Instead of providing the query string as a (potentially very long) command
  line argument, `notmuch-tag' now dumps it into a temporary buffer, which
  `call-process-region' sends to `notmuch-command' on stdin.
  This is needed to circumvent "$command: arg list too long" errors due
  to command line argument length limitations imposed by the kernel (ARG_MAX).

- Fix the actual bug(s) this patch series is intended to address by making
  the tagging functions procure their targets from the `notmuch-search-msgids'
  property with `notmuch-search-find-msgids-region', instead of using x:
  - `notmuch-search-add-tag-region'    (x = `notmuch-search-find-thread-id-region')
  - `notmuch-search-remove-tag-region' (x = `notmuch-search-find-thread-id-region')
  - `notmuch-search-remove-tag'        (x = `notmuch-search-find-thread-id-region')
  - `notmuch-search-operate-all'       (x = `notmuch-search-query-string')


test/emacs, test/emacs-search-operate-all, test/notmuch-test

- Expand the test suite to also cover:
  - Tagging messages with `notmuch-search-operate-all'.
  - Tagging messages to which a reply is sent. For this I also needed to
    correct the title of an existing test, and add a test for sending
    replies from within Emacs.


Side note:
  After playing around with Austin's new patch for a bit, I've come to the
  conclusion that making a clear distinction between matched and unmatched
  messages in the binary's output *is* the way to go, but in the case of
  `notmuch-search-operate-all', this capability shouldn't be leveraged.

  The way `notmuch-search-operate-all' currently works, i.e. operate on
  matched *messages* instead of matched *threads*, is not only counter-
  intuitive (same as it would be for `notmuch-search-add-tag' and
  `notmuch-search-remove-tag' [2]), but semantically incorrect as well:
  Its name implies operating on *all* that is visible in the current
  buffer, instead of only a subset.


Peace

[1] id:"1310307099-25197-1-git-send-email-pieter@praet.org"
[2] id:"e8c5fbf4-4dfa-461a-8f5c-6c696291a270@email.android.com"


Signed-off-by: Pieter Praet <pieter@praet.org>
---
 emacs/notmuch.el              |   93 ++++++++++++++++++++++++++++-------------
 notmuch-search.c              |    6 ++-
 test/emacs                    |   49 +++++++++++++++++++++-
 test/emacs-search-operate-all |   29 +++++++++++++
 test/notmuch-test             |    1 +
 5 files changed, 147 insertions(+), 31 deletions(-)
 create mode 100755 test/emacs-search-operate-all

diff --git a/emacs/notmuch.el b/emacs/notmuch.el
index f11ec24..400adcc 100644
--- a/emacs/notmuch.el
+++ b/emacs/notmuch.el
@@ -226,6 +226,7 @@ For a mouse binding, return nil."
 (defvar notmuch-search-stash-map
   (let ((map (make-sparse-keymap)))
     (define-key map "i" 'notmuch-search-stash-thread-id)
+    (define-key map "m" 'notmuch-search-stash-msgids)
     map)
   "Submap for stash commands")
 (fset 'notmuch-search-stash-map notmuch-search-stash-map)
@@ -235,6 +236,19 @@ For a mouse binding, return nil."
   (interactive)
   (notmuch-common-do-stash (notmuch-search-find-thread-id)))
 
+(defun notmuch-search-stash-msgids ()
+  "Copy all Message-ID's in currently selected thread(s) to kill-ring."
+  (interactive)
+  (save-excursion
+    (if (region-active-p)
+        (let* ((beg (region-beginning))
+               (end (region-end)))
+          (notmuch-common-do-stash
+           (mapconcat 'identity
+                      (notmuch-search-find-msgids-region beg end)
+                      " or ")))
+      (notmuch-common-do-stash (notmuch-search-find-msgids)))))
+
 (defvar notmuch-search-query-string)
 (defvar notmuch-search-target-thread)
 (defvar notmuch-search-target-line)
@@ -402,6 +416,14 @@ Complete list of currently available key bindings:
   "Return a list of threads for the current region"
   (notmuch-search-properties-in-region 'notmuch-search-thread-id beg end))
 
+(defun notmuch-search-find-msgids ()
+  "Return all Message-Id's for the current thread"
+  (get-text-property (point) 'notmuch-search-msgids))
+
+(defun notmuch-search-find-msgids-region (beg end)
+  "Return a list of all Message-Id's for the threads in the current region"
+  (notmuch-search-properties-in-region 'notmuch-search-msgids beg end))
+
 (defun notmuch-search-find-authors ()
   "Return the authors for the current thread"
   (get-text-property (point) 'notmuch-search-authors))
@@ -448,23 +470,6 @@ Complete list of currently available key bindings:
   (let ((message-id (notmuch-search-find-thread-id)))
     (notmuch-mua-new-reply message-id prompt-for-sender)))
 
-(defun notmuch-call-notmuch-process (&rest args)
-  "Synchronously invoke \"notmuch\" with the given list of arguments.
-
-Output from the process will be presented to the user as an error
-and will also appear in a buffer named \"*Notmuch errors*\"."
-  (let ((error-buffer (get-buffer-create "*Notmuch errors*")))
-    (with-current-buffer error-buffer
-	(erase-buffer))
-    (if (eq (apply 'call-process notmuch-command nil error-buffer nil args) 0)
-	(point)
-      (progn
-	(with-current-buffer error-buffer
-	  (let ((beg (point-min))
-		(end (- (point-max) 1)))
-	    (error (buffer-substring beg end))
-	    ))))))
-
 (defun notmuch-tag (query &rest tags)
   "Add/remove tags in TAGS to messages matching QUERY.
 
@@ -476,8 +481,32 @@ messages instead of running (notmuch-call-notmuch-process \"tag\" ..)
 directly, so that hooks specified in notmuch-before-tag-hook and
 notmuch-after-tag-hook will be run."
   (run-hooks 'notmuch-before-tag-hook)
-  (apply 'notmuch-call-notmuch-process
-	 (append (list "tag") tags (list "--" query)))
+
+  (let ((query-buffer (get-buffer-create "*Notmuch query*"))
+        (error-buffer (get-buffer-create "*Notmuch errors*")))
+    (with-current-buffer error-buffer
+	(erase-buffer))
+
+    (with-current-buffer query-buffer
+        (erase-buffer)
+        (insert query)
+
+    (if (eq
+	 (apply 'call-process-region
+		(append
+		 (list (point-min) (point-max) notmuch-command nil error-buffer nil)
+		 (list "tag" "--stdin") tags))
+	 0)
+	(point)
+      (progn
+	(with-current-buffer error-buffer
+	  (let ((beg (point-min))
+		(end (- (point-max) 1)))
+	    (error (buffer-substring beg end))
+	    )))))
+
+    (kill-buffer query-buffer))
+
   (run-hooks 'notmuch-after-tag-hook))
 
 (defcustom notmuch-before-tag-hook nil
@@ -541,7 +570,7 @@ the messages that were tagged"
   (notmuch-search-add-tag-region tag (point) (point)))
 
 (defun notmuch-search-add-tag-region (tag beg end)
-  (let ((search-id-string (mapconcat 'identity (notmuch-search-find-thread-id-region beg end) " or ")))
+  (let ((search-id-string (mapconcat 'identity (notmuch-search-find-msgids-region beg end) " or ")))
     (notmuch-tag search-id-string (concat "+" tag))
     (save-excursion
       (let ((last-line (line-number-at-pos end))
@@ -555,7 +584,7 @@ the messages that were tagged"
   (notmuch-search-remove-tag-region tag (point) (point)))
 
 (defun notmuch-search-remove-tag-region (tag beg end)
-  (let ((search-id-string (mapconcat 'identity (notmuch-search-find-thread-id-region beg end) " or ")))
+  (let ((search-id-string (mapconcat 'identity (notmuch-search-find-msgids-region beg end) " or ")))
     (notmuch-tag search-id-string (concat "-" tag))
     (save-excursion
       (let ((last-line (line-number-at-pos end))
@@ -589,9 +618,9 @@ thread or threads in the current region."
 	  "Tag to remove: "
 	  (if (region-active-p)
 	      (mapconcat 'identity
-			 (notmuch-search-find-thread-id-region (region-beginning) (region-end))
+			 (notmuch-search-find-msgids-region (region-beginning) (region-end))
 			 " ")
-	    (notmuch-search-find-thread-id)))))
+	    (notmuch-search-find-msgids)))))
   (save-excursion
     (if (region-active-p)
 	(let* ((beg (region-beginning))
@@ -801,13 +830,14 @@ non-authors is found, assume that all of the authors match."
 	      (while more
 		(while (and (< line (length string)) (= (elt string line) ?\n))
 		  (setq line (1+ line)))
-		(if (string-match "^\\(thread:[0-9A-Fa-f]*\\) \\([^][]*\\) \\(\\[[0-9/]*\\]\\) \\([^;]*\\); \\(.*\\) (\\([^()]*\\))$" string line)
+		(if (string-match "^\\(thread:[0-9A-Fa-f]*\\) \\([^][]*\\) \\(\\[[0-9/]*\\]\\) \\([^;]*\\); \\(.*\\) (\\([^()]*\\)) \\((.*\\(or.*\\)*\\)$" string line)
 		    (let* ((thread-id (match-string 1 string))
 			   (date (match-string 2 string))
 			   (count (match-string 3 string))
 			   (authors (match-string 4 string))
 			   (subject (match-string 5 string))
 			   (tags (match-string 6 string))
+			   (msgids (match-string 7 string))
 			   (tag-list (if tags (save-match-data (split-string tags)))))
 		      (goto-char (point-max))
 		      (if (/= (match-beginning 1) line)
@@ -816,6 +846,7 @@ non-authors is found, assume that all of the authors match."
 			(notmuch-search-show-result date count authors subject tags)
 			(notmuch-search-color-line beg (point-marker) tag-list)
 			(put-text-property beg (point-marker) 'notmuch-search-thread-id thread-id)
+			(put-text-property beg (point-marker) 'notmuch-search-msgids msgids)
 			(put-text-property beg (point-marker) 'notmuch-search-authors authors)
 			(put-text-property beg (point-marker) 'notmuch-search-subject subject)
 			(if (string= thread-id notmuch-search-target-thread)
@@ -834,10 +865,10 @@ non-authors is found, assume that all of the authors match."
       (delete-process proc))))
 
 (defun notmuch-search-operate-all (action)
-  "Add/remove tags from all matching messages.
+  "Add/remove tags to/from all threads in current search buffer.
 
-This command adds or removes tags from all messages matching the
-current search terms. When called interactively, this command
+This command adds or removes tags from all threads displayed in
+the current search buffer. When called interactively, this command
 will prompt for tags to be added or removed. Tags prefixed with
 '+' will be added and tags prefixed with '-' will be removed.
 
@@ -845,7 +876,10 @@ Each character of the tag name may consist of alphanumeric
 characters as well as `_.+-'.
 "
   (interactive "sOperation (+add -drop): notmuch tag ")
-  (let ((action-split (split-string action " +")))
+  (let ((action-split (split-string action " +"))
+        (msgids (mapconcat 'identity
+                           (notmuch-search-find-msgids-region (point-min) (- (point-max) 2))
+                           " or ")))
     ;; Perform some validation
     (let ((words action-split))
       (when (null words) (error "No operation given"))
@@ -853,7 +887,7 @@ characters as well as `_.+-'.
 	(unless (string-match-p "^[-+][-+_.[:word:]]+$" (car words))
 	  (error "Action must be of the form `+thistag -that_tag'"))
 	(setq words (cdr words))))
-    (apply 'notmuch-tag notmuch-search-query-string action-split)))
+    (apply 'notmuch-tag msgids action-split)))
 
 (defun notmuch-search-buffer-title (query)
   "Returns the title for a buffer with notmuch search results."
@@ -913,6 +947,7 @@ The optional parameters are used as follows:
 	(let ((proc (start-process
 		     "notmuch-search" buffer
 		     notmuch-command "search"
+		     "--output=summary-ids"
 		     (if oldest-first
 			 "--sort=oldest-first"
 		       "--sort=newest-first")
diff --git a/notmuch-search.c b/notmuch-search.c
index 2288eb7..b3af88b 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -22,6 +22,7 @@
 
 typedef enum {
     OUTPUT_SUMMARY,
+    OUTPUT_SUMMARY_IDS,
     OUTPUT_THREADS,
     OUTPUT_MESSAGES,
     OUTPUT_FILES,
@@ -274,7 +275,7 @@ do_search_threads (const search_format_t *format,
 
 	    fputs (format->tag_end, stdout);
 
-	    if (format == &format_text) {
+	    if (format == &format_text && output == OUTPUT_SUMMARY_IDS) {
 		notmuch_messages_t *toplevel;
 		const char *first;
 
@@ -462,6 +463,8 @@ notmuch_search_command (void *ctx, int argc, char *argv[])
 	    opt = argv[i] + sizeof ("--output=") - 1;
 	    if (strcmp (opt, "summary") == 0) {
 		output = OUTPUT_SUMMARY;
+	    } else if (strcmp (opt, "summary-ids") == 0) {
+		output = OUTPUT_SUMMARY_IDS;
 	    } else if (strcmp (opt, "threads") == 0) {
 		output = OUTPUT_THREADS;
 	    } else if (strcmp (opt, "messages") == 0) {
@@ -513,6 +516,7 @@ notmuch_search_command (void *ctx, int argc, char *argv[])
     switch (output) {
     default:
     case OUTPUT_SUMMARY:
+    case OUTPUT_SUMMARY_IDS:
     case OUTPUT_THREADS:
 	ret = do_search_threads (format, query, sort, output);
 	break;
diff --git a/test/emacs b/test/emacs
index 53f455a..6479c4e 100755
--- a/test/emacs
+++ b/test/emacs
@@ -239,7 +239,7 @@ Subject:
 EOF
 test_expect_equal_file OUTPUT EXPECTED
 
-test_begin_subtest "Reply within emacs"
+test_begin_subtest "Compose reply in emacs"
 test_emacs '(notmuch-search "subject:\"testing message sent via SMTP\"")
 	    (notmuch-test-wait)
 	    (notmuch-search-reply-to-thread)
@@ -257,6 +257,53 @@ On 01 Jan 2000 12:00:00 -0000, Notmuch Test Suite <test_suite@notmuchmail.org> w
 EOF
 test_expect_equal_file OUTPUT EXPECTED
 
+test_begin_subtest "Send reply from within Emacs"
+$TEST_DIRECTORY/smtp-dummy sent_message &
+smtp_dummy_pid=$!
+test_emacs \
+'(let ((message-send-mail-function '\''message-smtpmail-send-it)
+       (smtpmail-smtp-server "localhost")
+       (smtpmail-smtp-service "25025"))
+  (notmuch-search "subject:\"testing message sent via SMTP\"")
+  (notmuch-test-wait)
+  (notmuch-search-reply-to-thread)
+  (message-goto-to)
+  (message-goto-body)
+  (end-of-buffer)
+  (newline)
+  (insert "Reply to a message via Emacs with fake SMTP")
+  (message-send-and-exit))' >/dev/null 2>&1
+wait ${smtp_dummy_pid}
+notmuch new >/dev/null
+sed \
+    -e s',^User-Agent: Notmuch/.* Emacs/.*,User-Agent: Notmuch/XXX Emacs/XXX,' \
+    -e s',^Message-ID: <.*>$,Message-ID: <XXX>,' \
+    -e s',^In-Reply-To: <.*>$,In-Reply-To: <XXX>,' \
+    -e s',^References: <.*>$,References: <XXX>,' \
+    -e s',^Date: .*$,Date: Fri\, 29 Mar 1974 10:05:00 -0000,' < sent_message >OUTPUT
+cat <<EOF >EXPECTED
+From: Notmuch Test Suite <test_suite@notmuchmail.org>
+To: user@example.com
+Subject: Re: Testing message sent via SMTP
+In-Reply-To: <XXX>
+References: <XXX>
+User-Agent: Notmuch/XXX Emacs/XXX
+Date: Fri, 29 Mar 1974 10:05:00 -0000
+Message-ID: <XXX>
+MIME-Version: 1.0
+Content-Type: text/plain; charset=us-ascii
+
+On 01 Jan 2000 12:00:00 -0000, Notmuch Test Suite <test_suite@notmuchmail.org> wrote:
+> This is a test that messages are sent via SMTP
+
+Reply to a message via Emacs with fake SMTP
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "Verify that 'replied' tag is added to reply's parent message."
+output=$(notmuch search 'tag:replied' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2000-01-01 [1/2] Notmuch Test Suite; Testing message sent via SMTP (inbox replied)"
+
 test_begin_subtest "Save attachment from within emacs using notmuch-show-save-attachments"
 # save as archive to test that Emacs does not re-compress .gz
 test_emacs '(let ((standard-input "\"attachment1.gz\""))
diff --git a/test/emacs-search-operate-all b/test/emacs-search-operate-all
new file mode 100755
index 0000000..48326c8
--- /dev/null
+++ b/test/emacs-search-operate-all
@@ -0,0 +1,29 @@
+#!/usr/bin/env bash
+
+test_description="emacs interface"
+. test-lib.sh
+
+EXPECTED=$TEST_DIRECTORY/emacs.expected-output
+
+add_email_corpus
+
+test_begin_subtest "Add/remove tags to/from all matching threads."
+test_emacs '(notmuch-search "tag:inbox AND tags")
+	    (notmuch-test-wait)
+	    (notmuch-search-operate-all "+matching -inbox")
+	    (notmuch-search "tag:matching AND NOT tag:inbox")
+	    (notmuch-test-wait)
+	    (test-output)'
+cat <<EOF >EXPECTED
+  2009-11-18 [2/2]   Ingmar Vanhassel, Carl Worth  [notmuch] [PATCH] Typsos (matching unread)
+  2009-11-18 [3/3]   Adrian Perez de Castro, Keith Packard, Carl Worth  [notmuch] Introducing myself (matching signed unread)
+  2009-11-18 [3/3]   Israel Herraiz, Keith Packard, Carl Worth   [notmuch] New to the list (matching unread)
+  2009-11-18 [2/2]   Keith Packard, Carl Worth    [notmuch] [PATCH] Make notmuch-show 'X' (and 'x') commands remove inbox (and unread) tags (matching unread)
+  2009-11-18 [2/2]   Keith Packard, Alexander Botero-Lowry    [notmuch] [PATCH] Create a default notmuch-show-hook that highlights URLs and uses word-wrap (matching unread)
+  2009-11-18 [1/1]   Jan Janak            [notmuch] [PATCH] notmuch new: Support for conversion of spool subdirectories into tags (matching unread)
+  2009-11-18 [1/1]   Stewart Smith        [notmuch] [PATCH] Fix linking with gcc to use g++ to link in C++ libs. (matching unread)
+End of search results.
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_done
diff --git a/test/notmuch-test b/test/notmuch-test
index 79e6267..bafbcb1 100755
--- a/test/notmuch-test
+++ b/test/notmuch-test
@@ -38,6 +38,7 @@ TESTS="
   encoding
   emacs
   emacs-large-search-buffer
+  emacs-search-operate-all
   maildir-sync
   crypto
   symbol-hiding
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter'
  2011-07-11 20:43               ` [PATCH v2] " Pieter Praet
@ 2011-07-11 21:05                 ` Austin Clements
  2011-07-13 14:16                   ` Pieter Praet
  2011-08-12  8:07                   ` [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter' Sebastian Spaeth
  0 siblings, 2 replies; 27+ messages in thread
From: Austin Clements @ 2011-07-11 21:05 UTC (permalink / raw)
  To: Pieter Praet; +Cc: Notmuch Mail

Quoth Pieter Praet on Jul 11 at 10:43 pm:
> TL;DR: I can haz regex pl0x?

Oof, what a pain.  I'm happy to change the output format of search; I
hadn't realized how difficult it would be to parse.  In fact, I'm not
sure it's even parsable by regexp, because the message ID's themselves
could contain parens.

So what would be a good format?  One possibility would be to
NULL-delimit the query part; as distasteful as I find that, this part
of the search output isn't meant for user consumption.  Though I fear
this is endemic to the dual role the search output currently plays as
both user and computer readable.

I've also got the code to do everything using document ID's instead of
message ID's.  As a side-effect, it makes the search output clean and
readily parsable since document ID's are just numbers.  Hence, there
are no quoting or escaping issues (plus the output is much more
compact).  I haven't sent this to the list yet because I haven't had a
chance to benchmark it and determine if the performance benefits make
exposing document ID's worthwhile.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter'
  2011-07-11 21:05                 ` Austin Clements
@ 2011-07-13 14:16                   ` Pieter Praet
  2011-07-13 14:47                     ` David Edmondson
  2011-07-13 18:57                     ` Austin Clements
  2011-08-12  8:07                   ` [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter' Sebastian Spaeth
  1 sibling, 2 replies; 27+ messages in thread
From: Pieter Praet @ 2011-07-13 14:16 UTC (permalink / raw)
  To: Austin Clements, David Edmondson; +Cc: Notmuch Mail

On Mon, 11 Jul 2011 17:05:32 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
> Quoth Pieter Praet on Jul 11 at 10:43 pm:
> > TL;DR: I can haz regex pl0x?
> 
> Oof, what a pain.  I'm happy to change the output format of search; I
> hadn't realized how difficult it would be to parse.  In fact, I'm not
> sure it's even parsable by regexp, because the message ID's themselves
> could contain parens.
> 
> So what would be a good format?  One possibility would be to
> NULL-delimit the query part; as distasteful as I find that, this part
> of the search output isn't meant for user consumption.  Though I fear
> this is endemic to the dual role the search output currently plays as
> both user and computer readable.
> 
> I've also got the code to do everything using document ID's instead of
> message ID's.  As a side-effect, it makes the search output clean and
> readily parsable since document ID's are just numbers.  Hence, there
> are no quoting or escaping issues (plus the output is much more
> compact).  I haven't sent this to the list yet because I haven't had a
> chance to benchmark it and determine if the performance benefits make
> exposing document ID's worthwhile.

Jamie Zawinski once said/wrote [1]:
  'Some people, when confronted with a problem, think "I know,
  I'll use regular expressions." Now they have two problems.'

With this in mind, I set out to get rid of this whole regex mess altogether,
by populating the search buffer using Notmuch's JSON output instead of doing
brittle text matching tricks.

Looking for some documentation, I stumbled upon a long-forgotten gem [2].

David's already done pretty much all of the work for us!

Unfortunately, it doesn't apply cleanly to master anymore.

David, would you mind rebasing it?


Peace

-- 
Pieter

[1] http://www.jwz.org/hacks/
[2] id:"1290777202-14040-1-git-send-email-dme@dme.org"

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter'
  2011-07-13 14:16                   ` Pieter Praet
@ 2011-07-13 14:47                     ` David Edmondson
  2011-07-13 18:57                     ` Austin Clements
  1 sibling, 0 replies; 27+ messages in thread
From: David Edmondson @ 2011-07-13 14:47 UTC (permalink / raw)
  To: Pieter Praet; +Cc: Notmuch Mail, Austin Clements

* pieter@praet.org [2011-07-13 Wed 15:16]
> David, would you mind rebasing it?

I'm sorry, I'm not likely to have time to do this.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter'
  2011-07-13 14:16                   ` Pieter Praet
  2011-07-13 14:47                     ` David Edmondson
@ 2011-07-13 18:57                     ` Austin Clements
  2011-07-16 15:07                       ` Pieter Praet
                                         ` (2 more replies)
  1 sibling, 3 replies; 27+ messages in thread
From: Austin Clements @ 2011-07-13 18:57 UTC (permalink / raw)
  To: Pieter Praet; +Cc: Notmuch Mail, David Edmondson

Quoth Pieter Praet on Jul 13 at  4:16 pm:
> On Mon, 11 Jul 2011 17:05:32 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
> > Quoth Pieter Praet on Jul 11 at 10:43 pm:
> > > TL;DR: I can haz regex pl0x?
> > 
> > Oof, what a pain.  I'm happy to change the output format of search; I
> > hadn't realized how difficult it would be to parse.  In fact, I'm not
> > sure it's even parsable by regexp, because the message ID's themselves
> > could contain parens.
> > 
> > So what would be a good format?  One possibility would be to
> > NULL-delimit the query part; as distasteful as I find that, this part
> > of the search output isn't meant for user consumption.  Though I fear
> > this is endemic to the dual role the search output currently plays as
> > both user and computer readable.
> > 
> > I've also got the code to do everything using document ID's instead of
> > message ID's.  As a side-effect, it makes the search output clean and
> > readily parsable since document ID's are just numbers.  Hence, there
> > are no quoting or escaping issues (plus the output is much more
> > compact).  I haven't sent this to the list yet because I haven't had a
> > chance to benchmark it and determine if the performance benefits make
> > exposing document ID's worthwhile.
> 
> Jamie Zawinski once said/wrote [1]:
>   'Some people, when confronted with a problem, think "I know,
>   I'll use regular expressions." Now they have two problems.'
> 
> With this in mind, I set out to get rid of this whole regex mess altogether,
> by populating the search buffer using Notmuch's JSON output instead of doing
> brittle text matching tricks.
> 
> Looking for some documentation, I stumbled upon a long-forgotten gem [2].
> 
> David's already done pretty much all of the work for us!

Yes, similar thoughts were running through my head as I futzed with
the formatting for this.  My concern with moving to JSON for search
buffers is that parsing it is about *30 times slower* than the current
regexp-based approach (0.6 seconds versus 0.02 seconds for a mere 1413
result search buffer).  I think JSON makes a lot of sense for show
buffers because there's generally less data and it has a lot of
complicated structure.  Search results, on the other hand, have a very
simple, regular, and constrained structure, so JSON doesn't buy us
nearly as much.

JSON is hard to parse because, like the text search output, it's
designed for human consumption (of course, unlike the text search
output, it's also designed for computer consumption).  There's
something to be said for the debuggability and generality of this and
JSON is very good for exchanging small objects, but it's a remarkably
inefficient way to exchange large amounts of data between two
programs.

I guess what I'm getting at, though it pains me to say it, is perhaps
search needs a fast, computer-readable interchange format.  The
structure of the data is so simple and constrained that this could be
altogether trivial.

Or maybe I need a faster computer.

If anyone is curious, here's how I timed the parsing.

(defmacro time-it (code)
  `(let ((start-time (get-internal-run-time)))
     ,code
     (float-time (time-subtract (get-internal-run-time) start-time))))

(with-current-buffer "json"
  (goto-char (point-min))
  (time-it (json-read)))

(with-current-buffer "text"
  (goto-char (point-min))
  (time-it
   (while (re-search-forward "^\\(thread:[0-9A-Fa-f]*\\) \\([^][]*\\) \\(\\[[0-9/]*\\]\\) \\([^;]*\\); \\(.*\\) (\\([^()]*\\))$" nil t))))

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter'
  2011-07-13 18:57                     ` Austin Clements
@ 2011-07-16 15:07                       ` Pieter Praet
  2011-07-20  4:50                       ` servilio
  2011-07-20 20:50                       ` JSON parsing performance (was Re: [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter') Austin Clements
  2 siblings, 0 replies; 27+ messages in thread
From: Pieter Praet @ 2011-07-16 15:07 UTC (permalink / raw)
  To: Austin Clements; +Cc: Notmuch Mail, David Edmondson

[-- Attachment #1: Type: text/plain, Size: 4581 bytes --]

On Wed, 13 Jul 2011 14:57:21 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
> Quoth Pieter Praet on Jul 13 at  4:16 pm:
> > On Mon, 11 Jul 2011 17:05:32 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
> > > Quoth Pieter Praet on Jul 11 at 10:43 pm:
> > > > TL;DR: I can haz regex pl0x?
> > > 
> > > Oof, what a pain.  I'm happy to change the output format of search; I
> > > hadn't realized how difficult it would be to parse.  In fact, I'm not
> > > sure it's even parsable by regexp, because the message ID's themselves
> > > could contain parens.
> > > 
> > > So what would be a good format?  One possibility would be to
> > > NULL-delimit the query part; as distasteful as I find that, this part
> > > of the search output isn't meant for user consumption.  Though I fear
> > > this is endemic to the dual role the search output currently plays as
> > > both user and computer readable.
> > > 
> > > I've also got the code to do everything using document ID's instead of
> > > message ID's.  As a side-effect, it makes the search output clean and
> > > readily parsable since document ID's are just numbers.  Hence, there
> > > are no quoting or escaping issues (plus the output is much more
> > > compact).  I haven't sent this to the list yet because I haven't had a
> > > chance to benchmark it and determine if the performance benefits make
> > > exposing document ID's worthwhile.
> > 
> > Jamie Zawinski once said/wrote [1]:
> >   'Some people, when confronted with a problem, think "I know,
> >   I'll use regular expressions." Now they have two problems.'
> > 
> > With this in mind, I set out to get rid of this whole regex mess altogether,
> > by populating the search buffer using Notmuch's JSON output instead of doing
> > brittle text matching tricks.
> > 
> > Looking for some documentation, I stumbled upon a long-forgotten gem [2].
> > 
> > David's already done pretty much all of the work for us!
> 
> Yes, similar thoughts were running through my head as I futzed with
> the formatting for this.  My concern with moving to JSON for search
> buffers is that parsing it is about *30 times slower* than the current
> regexp-based approach (0.6 seconds versus 0.02 seconds for a mere 1413
> result search buffer).  I think JSON makes a lot of sense for show
> buffers because there's generally less data and it has a lot of
> complicated structure.  Search results, on the other hand, have a very
> simple, regular, and constrained structure, so JSON doesn't buy us
> nearly as much.

That seems about right. Using the entire Notmuch mailing list archive,
processing JSON ends up taking 23x longer (see test in att).

> JSON is hard to parse because, like the text search output, it's
> designed for human consumption (of course, unlike the text search
> output, it's also designed for computer consumption).  There's
> something to be said for the debuggability and generality of this and
> JSON is very good for exchanging small objects, but it's a remarkably
> inefficient way to exchange large amounts of data between two
> programs.
> 
> I guess what I'm getting at, though it pains me to say it, is perhaps
> search needs a fast, computer-readable interchange format.  The
> structure of the data is so simple and constrained that this could be
> altogether trivial.

I guess that's our only option then. Could you implement it for me?
I'll make sure to rebase my patch series in an acceptable time frame.

An extra output format shouldn't be that much of a problem though, if we
further compartmentalize the code. What are your thoughts on (in the
long term) moving to a plugin-based architecture? Eg. enable something
like this:

  ./input/{Maildir, ...}
  ./output/{plain, JSON, ...}
  ./filters/{crypto, ...}
  ./backends/(Xapian, ...)
  ./uis/{Emacs, VIM, web, ...}

> Or maybe I need a faster computer.

That's what M$ Tech Support would want you to believe :)
What we need is slower computers, so devs are forced to count cycles again.
The rise of netbooks has thankfully done wonders in this respect.

> If anyone is curious, here's how I timed the parsing.
> 
> (defmacro time-it (code)
>   `(let ((start-time (get-internal-run-time)))
>      ,code
>      (float-time (time-subtract (get-internal-run-time) start-time))))
> 
> (with-current-buffer "json"
>   (goto-char (point-min))
>   (time-it (json-read)))
> 
> (with-current-buffer "text"
>   (goto-char (point-min))
>   (time-it
>    (while (re-search-forward "^\\(thread:[0-9A-Fa-f]*\\) \\([^][]*\\) \\(\\[[0-9/]*\\]\\) \\([^;]*\\); \\(.*\\) (\\([^()]*\\))$" nil t))))


Peace

-- 
Pieter

[-- Attachment #2: regexp-vs-json.org --]
[-- Type: application/octet-stream, Size: 1437 bytes --]

* Parsing plain-text w/ regexp vs. parsing JSON.

  #+SOURCE: timer-macro
  #+BEGIN_SRC emacs-lisp
    (defmacro time-it (code)
      `(let ((start-time (get-internal-run-time)))
         ,code
         (float-time (time-subtract (get-internal-run-time) start-time))))
  #+END_SRC

  #+SOURCE: count-msgs
  #+BEGIN_SRC sh
    notmuch count -- tag:x/notmuch
  #+END_SRC

  #+SOURCE: time-text
  #+BEGIN_SRC emacs-lisp :noweb yes
    <<timer-macro>>
    (with-temp-buffer
      (call-process "notmuch" nil t nil "search" "--format=text" "--" "tag:x/notmuch")
      (goto-char (point-min))
      (time-it
       (while (re-search-forward "^\\(thread:[0-9A-Fa-f]*\\) \\([^][]*\\) \\(\\[[0-9/]*\\]\\) \\([^;]*\\);
                                  \\(.*\\) (\\([^()]*\\))$" nil t))))
  #+END_SRC

  #+SOURCE: time-json
  #+BEGIN_SRC emacs-lisp :noweb yes
    <<timer-macro>>
    (with-temp-buffer
      (call-process "notmuch" nil t nil "search" "--format=json" "--" "tag:x/notmuch")
      (goto-char (point-min))
      (time-it (json-read)))
  #+END_SRC

  #+TBLNAME: results
  |----------+------------+------------+----------|
  | msgcount | time(text) | time(json) | % slower |
  |----------+------------+------------+----------|
  |     5294 |       0.01 |       0.23 |    2300. |
  |----------+------------+------------+----------|
  #+TBLFM: $1='(sbe "count-msgs")' :: $2='(sbe "time-text")' :: $3='(sbe "time-json")' :: $4=($3/$2)*100

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter'
  2011-07-13 18:57                     ` Austin Clements
  2011-07-16 15:07                       ` Pieter Praet
@ 2011-07-20  4:50                       ` servilio
  2011-07-20 20:50                       ` JSON parsing performance (was Re: [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter') Austin Clements
  2 siblings, 0 replies; 27+ messages in thread
From: servilio @ 2011-07-20  4:50 UTC (permalink / raw)
  To: Austin Clements; +Cc: Notmuch Mail, David Edmondson

What about encoding in notmuch the elements composing the line, print
the elements with a separator that would be encoded if it appears in
an element, then do the reverse in emacs. One such encoding might be
URL-encoding.

Servilio

^ permalink raw reply	[flat|nested] 27+ messages in thread

* JSON parsing performance (was Re: [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter')
  2011-07-13 18:57                     ` Austin Clements
  2011-07-16 15:07                       ` Pieter Praet
  2011-07-20  4:50                       ` servilio
@ 2011-07-20 20:50                       ` Austin Clements
  2 siblings, 0 replies; 27+ messages in thread
From: Austin Clements @ 2011-07-20 20:50 UTC (permalink / raw)
  To: Pieter Praet; +Cc: servilio, Notmuch Mail, David Edmondson

[-- Attachment #1: Type: text/plain, Size: 4076 bytes --]

Quoth myself on Jul 13 at  2:57 pm:
> Quoth Pieter Praet on Jul 13 at  4:16 pm:
> > Jamie Zawinski once said/wrote [1]:
> >   'Some people, when confronted with a problem, think "I know,
> >   I'll use regular expressions." Now they have two problems.'
> > 
> > With this in mind, I set out to get rid of this whole regex mess altogether,
> > by populating the search buffer using Notmuch's JSON output instead of doing
> > brittle text matching tricks.
> > 
> > Looking for some documentation, I stumbled upon a long-forgotten gem [2].
> > 
> > David's already done pretty much all of the work for us!
> 
> Yes, similar thoughts were running through my head as I futzed with
> the formatting for this.  My concern with moving to JSON for search
> buffers is that parsing it is about *30 times slower* than the current
> regexp-based approach (0.6 seconds versus 0.02 seconds for a mere 1413
> result search buffer).  I think JSON makes a lot of sense for show
> buffers because there's generally less data and it has a lot of
> complicated structure.  Search results, on the other hand, have a very
> simple, regular, and constrained structure, so JSON doesn't buy us
> nearly as much.
> 
> JSON is hard to parse because, like the text search output, it's
> designed for human consumption (of course, unlike the text search
> output, it's also designed for computer consumption).  There's
> something to be said for the debuggability and generality of this and
> JSON is very good for exchanging small objects, but it's a remarkably
> inefficient way to exchange large amounts of data between two
> programs.
> 
> I guess what I'm getting at, though it pains me to say it, is perhaps
> search needs a fast, computer-readable interchange format.  The
> structure of the data is so simple and constrained that this could be
> altogether trivial.
> 
> Or maybe I need a faster computer.

Or maybe I need to un-lame my benchmark.

TL;DR: We should use JSON for search results, but possibly not the
json.el shipped with Emacs.

I realized that my text benchmark didn't capture the cost of
extracting the match strings.  re-search-forward records matches as
buffer positions, which don't get realized into strings until you call
match-string.  Hence, match-string is quite expensive.

Also, Emacs' json.el is slow, so I perked it up.  My modified json.el
is ~3X faster, particularly for string-heavy output like notmuch's.
Though now I'm well into the realm of "eq is faster than =" and "M-x
disassemble", so unless I missed something big, this is as fast as it
gets.

While I was still thinking about new IPC formats, I realized that the
text format and the Emacs UI are already tightly coupled, so why not
go all the way and use S-expressions for IPC?  I now think JSON is
fast enough to use, but S-expressions still have a certain appeal.
They share most of the benefits of JSON; structure and extensibility
in particular.  Further, while the content of some ad-hoc format could
easily diverge from both the text and JSON formats, S-expressions
could exactly parallel the JSON content (with a little more
abstraction, they could even share the same format code).  For kicks,
I included an S-expression benchmark.  It beats out the text parser by
a factor of two and the optimized JSON parser by a factor of three.

Here are the results for my 1,413 result search buffer and timeworn
computer

                 Time   Normalized
--format=text   0.148s     1.00x
--format=json   0.598s     4.04x
custom json.el  0.209s     1.41x
 + string keys  0.195s     1.32x
S-expressions   0.066s     0.45x

I don't have time right now, but next week I might be able to look
through and update dme's JSON-based search code.


The benchmark and modified json.el are attached.

The benchmark is written so you can open it and eval-buffer, then C-x
C-e the various calls in the comments.  You can either
make-text/make-json, or run notmuch manually, pipe the results into
files "text" and "json", and open them in Emacs.

Please excuse the modified json.el code; it's gone through zero
cleanup.

[-- Attachment #2: timeparse.el --]
[-- Type: text/plain, Size: 3674 bytes --]

(defmacro time-it (repeat &rest body)
  (declare (indent 1))
  (when (not (numberp repeat))
    (push repeat body)
    (setq repeat 1))
  (let ((start-time (gensym)) (i (gensym)))
    `(let ((,start-time (get-internal-run-time)))
       (dotimes (,i ,repeat)
	 ,@body)
       (/ (float-time (time-subtract (get-internal-run-time) ,start-time))
	  ,repeat))))

;; Text

(defun make-text ()
  (with-current-buffer (get-buffer-create "text")
    (erase-buffer)
    (call-process "notmuch" nil t nil "search" "--format=text" "--" "tag:x/notmuch")))

(defun time-text ()
  (with-current-buffer "text"
    (time-it 10
      (goto-char (point-min))
      (while (re-search-forward "^\\(thread:[0-9A-Fa-f]*\\) \\([^][]*\\) \\(\\[[0-9/]*\\]\\) \\([^;]*\\); \\(.*\\) (\\([^()]*\\))$" nil t)
	(let* ((thread-id (match-string 1))
	       (date (match-string 2))
	       (count (match-string 3))
	       (authors (match-string 4))
	       (subject (match-string 5))
	       (tags (match-string 6))
	       (tag-list (if tags (save-match-data (split-string tags)))))
	  t)))))

(byte-compile 'time-text)
;; (make-text)
;; (time-text)

;; JSON

(defun load-custom-json ()
  (byte-compile-file "json.el")
  (load-file "./json.elc"))

(defun make-json ()
  (with-current-buffer (get-buffer-create "json")
    (erase-buffer)
    (call-process "notmuch" nil t nil "search" "--format=json" "--" "tag:x/notmuch")))

(defun time-json (&optional buf)
  (with-current-buffer (or buf "json")
    (let ((json-array-type 'list)
	  (json-object-type 'alist)
	  (json-key-type 'symbol))
      (time-it 10
	(goto-char (point-min))
	(dolist (ent (json-read))
	  ;; (Surprisingly, traversing the structure has no noticeable
	  ;; impact to performance)
	  (let ((thread-id (assq 'thread ent))
		(date (assq 'timestamp ent))
		(matched (assq 'matched ent))
		(total (assq 'total ent))
		(authors (assq 'authors ent))
		(subject (assq 'subject ent))
		(tag-list (assq 'tags ent)))
	    t))))))

(defun time-json-string-keys (&optional buf)
  (with-current-buffer (or buf "json")
    (let ((json-array-type 'list)
	  (json-object-type 'alist)
	  (json-key-type 'string))
      (time-it 10
	(goto-char (point-min))
	(dolist (ent (json-read))
	  (let ((thread-id (assoc "thread" ent))
		(date (assoc "timestamp" ent))
		(matched (assoc "matched" ent))
		(total (assoc "total" ent))
		(authors (assoc "authors" ent))
		(subject (assoc "subject" ent))
		(tag-list (assoc "tags" ent)))
	    t))))))

(byte-compile 'time-json)
(byte-compile 'time-json-string-keys)
;; (make-json)
;; (time-json)
;; (time-json-string-keys)
;; (load-custom-json)

;; S-expression

(defun make-sexp ()
  (with-current-buffer (get-buffer-create "sexp")
    (erase-buffer))
  (print
   (with-current-buffer "json"
     (let ((json-array-type 'list)
	   (json-object-type 'alist)
	   (json-key-type 'symbol))
       (goto-char (point-min))
       (json-read)))
   (get-buffer "sexp"))
  t)

(defun time-sexp ()
  (with-current-buffer "sexp"
    (let ((buf (current-buffer)))
      (time-it 10 (goto-char (point-min)) (read buf)))))

(byte-compile 'time-sexp)
;; (make-sexp)
;; (time-sexp)

;; Packed JSON

(defun make-packed-json ()
  (let ((buf (get-buffer-create "packed-json")))
    (with-current-buffer "json"
      (copy-to-buffer buf (point-min) (point-max)))
    (with-current-buffer buf
      (while (re-search-forward "^\\([^\"]*\"[^\"]+\"\\): \\([[\"0-9]\\)" nil t)
	(replace-match "\\1:\\2" nil nil))
      (goto-char (point-min))
      (while (re-search-forward "\\([\"0-9]\\),\n" nil t)
	(replace-match "\\1," nil nil)))))

(defun time-packed-json ()
  (time-json "packed-json"))

;; (make-packed-json)
;; (time-packed-json)

[-- Attachment #3: json.el --]
[-- Type: text/plain, Size: 23249 bytes --]

;;; json.el --- JavaScript Object Notation parser / generator

;; Copyright (C) 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.

;; Author: Edward O'Connor <ted@oconnor.cx>
;; Version: 1.2
;; Keywords: convenience

;; This file is part of GNU Emacs.

;; GNU Emacs is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.

;;; Commentary:

;; This is a library for parsing and generating JSON (JavaScript Object
;; Notation).

;; Learn all about JSON here: <URL:http://json.org/>.

;; The user-serviceable entry points for the parser are the functions
;; `json-read' and `json-read-from-string'. The encoder has a single
;; entry point, `json-encode'.

;; Since there are several natural representations of key-value pair
;; mappings in elisp (alist, plist, hash-table), `json-read' allows you
;; to specify which you'd prefer (see `json-object-type' and
;; `json-array-type').

;; Similarly, since `false' and `null' are distinct in JSON, you can
;; distinguish them by binding `json-false' and `json-null' as desired.

;;; History:

;; 2011-07-20 - Optimized by Austin Clements <aclements@csail.mit.edu>.
;; 2006-03-11 - Initial version.
;; 2006-03-13 - Added JSON generation in addition to parsing. Various
;;              other cleanups, bugfixes, and improvements.
;; 2006-12-29 - XEmacs support, from Aidan Kehoe <kehoea@parhasard.net>.
;; 2008-02-21 - Installed in GNU Emacs.

;;; Code:

(eval-when-compile (require 'cl))

;; Compatibility code

(defalias 'json-encode-char0 'encode-char)
(defalias 'json-decode-char0 'decode-char)


;; Parameters

(defvar json-object-type 'alist
  "Type to convert JSON objects to.
Must be one of `alist', `plist', or `hash-table'.  Consider let-binding
this around your call to `json-read' instead of `setq'ing it.")

(defvar json-array-type 'vector
  "Type to convert JSON arrays to.
Must be one of `vector' or `list'.  Consider let-binding this around
your call to `json-read' instead of `setq'ing it.")

(defvar json-key-type nil
  "Type to convert JSON keys to.
Must be one of `string', `symbol', `keyword', or nil.

If nil, `json-read' will guess the type based on the value of
`json-object-type':

    If `json-object-type' is:   nil will be interpreted as:
      `hash-table'                `string'
      `alist'                     `symbol'
      `plist'                     `keyword'

Note that values other than `string' might behave strangely for
Sufficiently Weird keys.  Consider let-binding this around your call to
`json-read' instead of `setq'ing it.")

(defvar json-false :json-false
  "Value to use when reading JSON `false'.
If this has the same value as `json-null', you might not be able to tell
the difference between `false' and `null'.  Consider let-binding this
around your call to `json-read' instead of `setq'ing it.")

(defvar json-null nil
  "Value to use when reading JSON `null'.
If this has the same value as `json-false', you might not be able to
tell the difference between `false' and `null'.  Consider let-binding
this around your call to `json-read' instead of `setq'ing it.")

\f

;;; Utilities

(defun json-join (strings separator)
  "Join STRINGS with SEPARATOR."
  (mapconcat 'identity strings separator))

(defun json-alist-p (list)
  "Non-null if and only if LIST is an alist."
  (or (null list)
      (and (consp (car list))
           (json-alist-p (cdr list)))))

(defun json-plist-p (list)
  "Non-null if and only if LIST is a plist."
  (or (null list)
      (and (keywordp (car list))
           (consp (cdr list))
           (json-plist-p (cddr list)))))

;; Reader utilities

;; (defsubst json-advance (&optional n)
;;   "Skip past the following N characters."
;;   (forward-char n))

(defalias 'json-advance 'forward-char)

;; (defsubst json-peek ()
;;   "Return the character at point."
;;   (let ((char (char-after (point))))
;;     (or char :json-eof)))

(defsubst json-peek ()
  "Return the character at point."
  (or (char-after) :json-eof))

(defsubst json-pop ()
  "Advance past the character at point, returning it."
  (let ((char (json-peek)))
    (if (eq char :json-eof)
        (signal 'end-of-file nil)
      (json-advance)
      char)))

;; (defun json-skip-whitespace ()
;;   "Skip past the whitespace at point."
;;   (skip-chars-forward "\t\r\n\f\b "))

(defsubst json-skip-whitespace ()
  "Skip past the whitespace at point."
  (skip-chars-forward "\t\r\n\f\b "))

\f

;; Error conditions

(put 'json-error 'error-message "Unknown JSON error")
(put 'json-error 'error-conditions '(json-error error))

(put 'json-readtable-error 'error-message "JSON readtable error")
(put 'json-readtable-error 'error-conditions
     '(json-readtable-error json-error error))

(put 'json-unknown-keyword 'error-message "Unrecognized keyword")
(put 'json-unknown-keyword 'error-conditions
     '(json-unknown-keyword json-error error))

(put 'json-number-format 'error-message "Invalid number format")
(put 'json-number-format 'error-conditions
     '(json-number-format json-error error))

(put 'json-string-escape 'error-message "Bad unicode escape")
(put 'json-string-escape 'error-conditions
     '(json-string-escape json-error error))

(put 'json-string-format 'error-message "Bad string format")
(put 'json-string-format 'error-conditions
     '(json-string-format json-error error))

(put 'json-object-format 'error-message "Bad JSON object")
(put 'json-object-format 'error-conditions
     '(json-object-format json-error error))

\f

;;; Keywords

(defvar json-keywords '("true" "false" "null")
  "List of JSON keywords.")

;; Keyword parsing

(defun json-read-keyword (keyword)
  "Read a JSON keyword at point.
KEYWORD is the keyword expected."
  (unless (member keyword json-keywords)
    (signal 'json-unknown-keyword (list keyword)))
  (mapc (lambda (char)
          (unless (char-equal char (json-peek))
            (signal 'json-unknown-keyword
                    (list (save-excursion
                            (backward-word 1)
                            (thing-at-point 'word)))))
          (json-advance))
        keyword)
  (unless (looking-at "\\(\\s-\\|[],}]\\|$\\)")
    (signal 'json-unknown-keyword
            (list (save-excursion
                    (backward-word 1)
                    (thing-at-point 'word)))))
  (cond ((string-equal keyword "true") t)
        ((string-equal keyword "false") json-false)
        ((string-equal keyword "null") json-null)))

;; Keyword encoding

(defun json-encode-keyword (keyword)
  "Encode KEYWORD as a JSON value."
  (cond ((eq keyword t)          "true")
        ((eq keyword json-false) "false")
        ((eq keyword json-null)  "null")))

;;; Numbers

;; Number parsing

;; (defun json-read-number (&optional sign)
;;  "Read the JSON number following point.
;; The optional SIGN argument is for internal use.

;; N.B.: Only numbers which can fit in Emacs Lisp's native number
;; representation will be parsed correctly."
;;  ;; If SIGN is non-nil, the number is explicitly signed.
;;  (let ((number-regexp
;;         "\\([0-9]+\\)?\\(\\.[0-9]+\\)?\\([Ee][+-]?[0-9]+\\)?"))
;;    (cond ((and (null sign) (char-equal (json-peek) ?-))
;;           (json-advance)
;;           (- (json-read-number t)))
;;          ((and (null sign) (char-equal (json-peek) ?+))
;;           (json-advance)
;;           (json-read-number t))
;;          ((and (looking-at number-regexp)
;;                (or (match-beginning 1)
;;                    (match-beginning 2)))
;;           (goto-char (match-end 0))
;;           (string-to-number (match-string 0)))
;;          (t (signal 'json-number-format (list (point)))))))

(defun json-read-number ()
 "Read the JSON number following point.

N.B.: Only numbers which can fit in Emacs Lisp's native number
representation will be parsed correctly."
 ;; This regexp requires one character of backtrack in the common case
 ;; of a whole number, but is slightly faster than a more explicit
 ;; regexp like "\\([0-9]+\\)?\\(\\.[0-9]+\\)?"
 (if (looking-at "[-+]?[0-9]*[.0-9][0-9]*\\([Ee][+-]?[0-9]+\\)?")
     (progn
       (goto-char (match-end 0))
       (string-to-number (match-string 0)))
   (signal 'json-number-format (list (point)))))

;; Number encoding

(defun json-encode-number (number)
  "Return a JSON representation of NUMBER."
  (format "%s" number))

;;; Strings

(defvar json-special-chars
  '((?\" . ?\")
    (?\\ . ?\\)
    (?/ . ?/)
    (?b . ?\b)
    (?f . ?\f)
    (?n . ?\n)
    (?r . ?\r)
    (?t . ?\t))
  "Characters which are escaped in JSON, with their elisp counterparts.")

;; String parsing

(defun json-read-escaped-char ()
  "Read the JSON string escaped character at point."
  ;; Skip over the '\'
  (json-advance)
  (let* ((char (json-pop))
         (special (assq char json-special-chars)))
    (cond
     (special (cdr special))
     ((not (eq char ?u)) char)
     ((looking-at "[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]")
      (let ((hex (match-string 0)))
        (json-advance 4)
        (json-decode-char0 'ucs (string-to-number hex 16))))
     (t
      (signal 'json-string-escape (list (point)))))))

;; (defun json-read-string ()
;;   "Read the JSON string at point."
;;   (unless (char-equal (json-peek) ?\")
;;     (signal 'json-string-format (list "doesn't start with '\"'!")))
;;   ;; Skip over the '"'
;;   (json-advance)
;;   (let ((characters '())
;;         (char (json-peek)))
;;     (while (not (char-equal char ?\"))
;;       (push (if (char-equal char ?\\)
;;                 (json-read-escaped-char)
;;               (json-pop))
;;             characters)
;;       (setq char (json-peek)))
;;     ;; Skip over the '"'
;;     (json-advance)
;;     (if characters
;;         (apply 'string (nreverse characters))
;;       "")))

;; Really matters
(defun json-read-string ()
  "Read the JSON string at point."
;;  (unless (char-equal (json-peek) ?\")
;;    (signal 'json-string-format (list "doesn't start with '\"'!")))
  ;; Skip over the '"'
  (json-advance)
  (let ((parts '()) (more t))
    (while more
      (let ((start (point)))
	(when (> (skip-chars-forward "^\\\\\"") 0)
	  (push (buffer-substring-no-properties start (point)) parts)))
      ;; Helps a little
      (let ((char (char-after)))
	(cond ((eq char ?\") (json-advance) (setq more nil))
	      ((eq char ?\\) (push (string (json-read-escaped-char)) parts))
	      (t (error "XXX Unterminated string")))))
      ;; (let ((char (json-peek)))
      ;; 	(case char
      ;; 	  ((?\") (json-advance) (setq done t))
      ;; 	  ((?\\) (push (string (json-read-escaped-char)) parts))
      ;; 	  (t     (error "XXX Unterminated string")))))
    (if parts
	(if (cdr parts)
	    (apply 'concat (nreverse parts))
	  (car parts))
      "")))

;; String encoding

(defun json-encode-char (char)
  "Encode CHAR as a JSON string."
  (setq char (json-encode-char0 char 'ucs))
  (let ((control-char (car (rassoc char json-special-chars))))
    (cond
     ;; Special JSON character (\n, \r, etc.)
     (control-char
      (format "\\%c" control-char))
     ;; ASCIIish printable character
     ((and (> char 31) (< char 161))
      (format "%c" char))
     ;; Fallback: UCS code point in \uNNNN form
     (t
      (format "\\u%04x" char)))))

(defun json-encode-string (string)
  "Return a JSON representation of STRING."
  (format "\"%s\"" (mapconcat 'json-encode-char string "")))

;;; JSON Objects

(defun json-new-object ()
  "Create a new Elisp object corresponding to a JSON object.
Please see the documentation of `json-object-type'."
  (cond ((eq json-object-type 'hash-table)
         (make-hash-table :test 'equal))
        (t
         (list))))

(defun json-add-to-object (object key value)
  "Add a new KEY -> VALUE association to OBJECT.
Returns the updated object, which you should save, e.g.:
    (setq obj (json-add-to-object obj \"foo\" \"bar\"))
Please see the documentation of `json-object-type' and `json-key-type'."
  (let ((json-key-type
         (if (eq json-key-type nil)
             (cdr (assq json-object-type '((hash-table . string)
                                           (alist . symbol)
                                           (plist . keyword))))
           json-key-type)))
    (setq key
          (cond ((eq json-key-type 'string)
                 key)
                ((eq json-key-type 'symbol)
                 (intern key))
                ((eq json-key-type 'keyword)
                 (intern (concat ":" key)))))
    (cond ((eq json-object-type 'hash-table)
           (puthash key value object)
           object)
          ((eq json-object-type 'alist)
           (cons (cons key value) object))
          ((eq json-object-type 'plist)
           (cons key (cons value object))))))

;; JSON object parsing

;; (defun json-read-object ()
;;   "Read the JSON object at point."
;;   ;; Skip over the "{"
;;   (json-advance)
;;   (json-skip-whitespace)
;;   ;; read key/value pairs until "}"
;;   (let ((elements (json-new-object))
;;         key value)
;;     (while (not (char-equal (json-peek) ?}))
;;       (json-skip-whitespace)
;;       (setq key (json-read-string))
;;       (json-skip-whitespace)
;;       (if (char-equal (json-peek) ?:)
;;           (json-advance)
;;         (signal 'json-object-format (list ":" (json-peek))))
;;       (setq value (json-read))
;;       (setq elements (json-add-to-object elements key value))
;;       (json-skip-whitespace)
;;       (unless (char-equal (json-peek) ?})
;;         (if (char-equal (json-peek) ?,)
;;             (json-advance)
;;           (signal 'json-object-format (list "," (json-peek))))))
;;     ;; Skip over the "}"
;;     (json-advance)
;;     elements))


(defun json-read-object ()
  "Read the JSON object at point."
  ;; Skip over the "{"
  (json-advance)
  (json-skip-whitespace)
  ;; read key/value pairs until "}"
  (let ((elements (json-new-object))
        key value (more t))
    (unless (eq (char-after) ?})
;;    (while (not (eq (char-after) ?}))
      (while more
	(unless (eq (char-after) ?\")
	  (json-skip-whitespace)
	  (unless (eq (char-after) ?\")
	    (signal 'json-string-format (list "doesn't start with '\"'!"))))
      (setq key (json-read-string))
      ;; Makes a small but surprising difference, adds up if done
      ;; consistently
      (if (eq (char-after) ?:)
          (json-advance)
	(if (progn (json-skip-whitespace) (eq (char-after) ?:))
	    (json-advance)
	  (signal 'json-object-format (list ":" (json-peek)))))
      (setq value (json-read))
      (setq elements (json-add-to-object elements key value))
      ;; Order matters a little
      (cond ((eq (char-after) ?,) (json-advance))
      	    ((eq (char-after) ?}) (setq more nil))
      	    ((progn
      	       (json-skip-whitespace)
      	       (eq (char-after) ?,)) (json-advance))
      	    ((eq (char-after) ?}) (setq more nil))
      	    (t (signal 'json-object-format (list "," (json-peek)))))))
      ;; (unless (char-equal (json-peek) ?})
      ;;   (if (char-equal (json-peek) ?,)
      ;;       (json-advance)
      ;;     (signal 'json-object-format (list "," (json-peek))))))
    ;; Skip over the "}"
    (json-advance)
    elements))

;; Hash table encoding

(defun json-encode-hash-table (hash-table)
  "Return a JSON representation of HASH-TABLE."
  (format "{%s}"
          (json-join
           (let (r)
             (maphash
              (lambda (k v)
                (push (format "%s:%s"
                              (json-encode k)
                              (json-encode v))
                      r))
              hash-table)
             r)
           ", ")))

;; List encoding (including alists and plists)

(defun json-encode-alist (alist)
  "Return a JSON representation of ALIST."
  (format "{%s}"
          (json-join (mapcar (lambda (cons)
                               (format "%s:%s"
                                       (json-encode (car cons))
                                       (json-encode (cdr cons))))
                             alist)
                     ", ")))

(defun json-encode-plist (plist)
  "Return a JSON representation of PLIST."
  (let (result)
    (while plist
      (push (concat (json-encode (car plist))
                    ":"
                    (json-encode (cadr plist)))
            result)
      (setq plist (cddr plist)))
    (concat "{" (json-join (nreverse result) ", ") "}")))

(defun json-encode-list (list)
  "Return a JSON representation of LIST.
Tries to DWIM: simple lists become JSON arrays, while alists and plists
become JSON objects."
  (cond ((null list)         "null")
        ((json-alist-p list) (json-encode-alist list))
        ((json-plist-p list) (json-encode-plist list))
        ((listp list)        (json-encode-array list))
        (t
         (signal 'json-error (list list)))))

;;; Arrays

;; Array parsing

;; (defun json-read-array ()
;;   "Read the JSON array at point."
;;   ;; Skip over the "["
;;   (json-advance)
;;   (json-skip-whitespace)
;;   ;; read values until "]"
;;   (let (elements)
;;     (while (not (char-equal (json-peek) ?\]))
;;       (push (json-read) elements)
;;       (json-skip-whitespace)
;;       (unless (char-equal (json-peek) ?\])
;;         (if (char-equal (json-peek) ?,)
;;             (json-advance)
;;           (signal 'json-error (list 'bleah)))))
;;     ;; Skip over the "]"
;;     (json-advance)
;;     (apply json-array-type (nreverse elements))))

(defun json-read-array ()
  "Read the JSON array at point."
  ;; Skip over the "["
  (json-advance)
  (json-skip-whitespace)
  ;; read values until "]"
  (let* (elements (more t))
    (unless (eq (char-after) ?\])
      (while more
;;    (while (not (char-equal (json-peek) ?\]))
      (push (json-read) elements)
      ;; Doesn't help
;;	(setq tail (setcdr tail (cons (json-read) nil)))

;;      (json-skip-whitespace)
      (cond ((eq (char-after) ?,) (json-advance))
	    ((eq (char-after) ?\]) (setq more nil))
	    ((progn
	       (json-skip-whitespace)
	       (eq (char-after) ?,)) (json-advance))
	    ((eq (char-after) ?\]) (setq more nil))
	    (t (signal 'json-error (list 'bleah))))))
      ;; (unless (char-equal (json-peek) ?\])
      ;;   (if (char-equal (json-peek) ?,)
      ;;       (json-advance)
      ;;     (signal 'json-error (list 'bleah))))))
    ;; Skip over the "]"
    (json-advance)
    ;; Matters
    (if (eq json-array-type 'list)
	(nreverse elements)
      (apply json-array-type (nreverse elements)))))

;; Array encoding

(defun json-encode-array (array)
  "Return a JSON representation of ARRAY."
  (concat "[" (mapconcat 'json-encode array ", ") "]"))

\f

;;; JSON reader.

;; (defvar json-readtable
;;   (let ((table
;;          '((?t json-read-keyword "true")
;;            (?f json-read-keyword "false")
;;            (?n json-read-keyword "null")
;;            (?{ json-read-object)
;;            (?\[ json-read-array)
;;            (?\" json-read-string))))
;;     (mapc (lambda (char)
;;             (push (list char 'json-read-number) table))
;;           '(?- ?+ ?. ?0 ?1 ?2 ?3 ?4 ?5 ?6 ?7 ?8 ?9))
;;     table)
;;   "Readtable for JSON reader.")

;; (defun json-read ()
;;   "Parse and return the JSON object following point.
;; Advances point just past JSON object."
;;   (json-skip-whitespace)
;;   (let ((char (json-peek)))
;;     (if (not (eq char :json-eof))
;;         (let ((record (cdr (assq char json-readtable))))
;;           (if (functionp (car record))
;;               (apply (car record) (cdr record))
;;             (signal 'json-readtable-error record)))
;;       (signal 'end-of-file nil))))

(defvar my-json-readtable
  (let ((table (make-char-table nil)))
    (aset table ?t '(json-read-keyword "true"))
    (aset table ?f '(json-read-keyword "false"))
    (aset table ?n '(json-read-keyword "null"))
    (aset table ?{ '(json-read-object))
    (aset table ?\[ '(json-read-array))
    (aset table ?\" '(json-read-string))
    (mapc (lambda (char)
	    (aset table char '(json-read-number)))
	  '(?- ?+ ?. ?0 ?1 ?2 ?3 ?4 ?5 ?6 ?7 ?8 ?9))
    table)
  "Readtable for JSON reader.")

;; Char-table matters a bit; (if (null ..)) matters more
;; (defun json-read ()
;;   "Parse and return the JSON object following point.
;; Advances point just past JSON object."
;;   (json-skip-whitespace)
;;   (let ((char (json-peek)))
;;     (if (not (eq char :json-eof))
;; 	(let ((record (aref my-json-readtable char)))
;; 	  (if (null (car record))
;; 	      (signal 'json-readtable-error record)
;; 	    (apply (car record) (cdr record))))
;;       (signal 'end-of-file nil))))

;; Makes no difference or slower, probably because there's usually whitespace
;; (defun json-read ()
;;   "Parse and return the JSON object following point.
;; Advances point just past JSON object."
;;   (let ((record (and (char-after) (aref my-json-readtable (char-after)))))
;;     (when (null record)
;;       (json-skip-whitespace)
;;       (when (eobp)
;; 	(signal 'end-of-file nil))
;;       (setq record (aref my-json-readtable (char-after)))
;;       (when (null record)
;; 	(signal 'json-readtable-error record)))
;;     (apply (car record) (cdr record))))

;; Makes a difference
(defun json-read ()
  "Parse and return the JSON object following point.
Advances point just past JSON object."
  (json-skip-whitespace)
  (if (char-after)
      (let ((record (aref my-json-readtable (char-after))))
	(if record
	    (apply (car record) (cdr record))
	  (signal 'json-readtable-error record)))
    (signal 'end-of-file nil)))

;; Syntactic sugar for the reader

(defun json-read-from-string (string)
  "Read the JSON object contained in STRING and return it."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (json-read)))

(defun json-read-file (file)
  "Read the first JSON object contained in FILE and return it."
  (with-temp-buffer
    (insert-file-contents file)
    (goto-char (point-min))
    (json-read)))

\f

;;; JSON encoder

(defun json-encode (object)
  "Return a JSON representation of OBJECT as a string."
  (cond ((memq object (list t json-null json-false))
         (json-encode-keyword object))
        ((stringp object)      (json-encode-string object))
        ((keywordp object)     (json-encode-string
                                (substring (symbol-name object) 1)))
        ((symbolp object)      (json-encode-string
                                (symbol-name object)))
        ((numberp object)      (json-encode-number object))
        ((arrayp object)       (json-encode-array object))
        ((hash-table-p object) (json-encode-hash-table object))
        ((listp object)        (json-encode-list object))
        (t                     (signal 'json-error (list object)))))

(provide 'json)

;; arch-tag: 15f6e4c8-b831-4172-8749-bbc680c50ea1
;;; json.el ends here

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PROTO] possible solution for "Race condition for '*' command"
  2011-07-05 21:42             ` Austin Clements
  2011-07-10 14:11               ` [PATCH] emacs: bad regexp @ `notmuch-search-process-filter' Pieter Praet
  2011-07-11 20:43               ` [PATCH v2] " Pieter Praet
@ 2011-08-03 20:47               ` Austin Clements
  2011-08-03 21:42                 ` Jameson Graef Rollins
  2 siblings, 1 reply; 27+ messages in thread
From: Austin Clements @ 2011-08-03 20:47 UTC (permalink / raw)
  To: Pieter Praet, Carl Worth; +Cc: Olly Betts, Notmuch Mail

On Tue, Jul 5, 2011 at 5:42 PM, Austin Clements <amdragon@mit.edu> wrote:
> Quoth Pieter Praet on Jul 05 at  9:04 pm:
>> On Mon, 04 Jul 2011 20:48:12 +0200, Pieter Praet <pieter@praet.org> wrote:
>> > On Mon, 04 Jul 2011 13:56:26 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
>> > > I should probably emit two lists per thread: one of matched IDs and
>> > > one of unmatched IDs. Tagging a region can then operate on the
>> > > concatenation of these, while * can operate only on the matched
>> > > lists. This should be easy to do. I'll send an updated patch when I'm
>> > > back at a computer.
>> >
>> > The matched MsgIds will be sufficient, as we'll want to operate on
>> > either the matched messages or the entire thread (for which the
>> > `thread-id' property is already present).
>> >
>> > Can't think of a use case for non-matched messages right now,
>> > but if required, we'll just use `set-exclusive-or'.
>>
>> Wasn't thinking clearly:
>>
>> You're right, we *will* be needing both a list of matched as well as one
>> of unmatched Message-Id's per result. Otherwise there would still be a
>> potential race condition when tagging with +/-.
>
> Yes, exactly.  (I had originally thought this race was a strict
> superset of the '*' race; I now realize that's not the case, but
> they're related enough that they might as well be addressed together.)
>
> Below is an updated patch that separates the matched and unmatched
> ID's (it's ugly, but no point in cleaning it up since it's a
> prototype).  Now the tag list on each search line is followed by
> something that looks like
>
>  (id:x or id:y) or id:z
>
> or just
>
>  (id:x or id:y)
>
> where the parenthesized part of the query is for the matched messages
> and the (possibly empty) unparenthesized part is for the non-matched
> messages.  This is designed to be easy for emacs to parse: grab just
> the parenthesized part for the queries used by '*' and the whole thing
> for region tagging queries.  This should also eliminate corner cases
> for pasting together multiple queries with "or".
>
> *snip*

The patch I posted above includes message ID's in search results as a
proxy for the match set (which can then be used in a tagging operation
to tag exactly the results you saw).  However, from an efficiency
standpoint, it makes more sense to capture the match set directly as
document ID's.

I've had an implementation of this for a while, but finally got around
to benchmarking the difference between tagging using message ID's
versus using document ID's.  It looks like tagging spends about 2/3rds
of its time performing queries, and only about 1/3rd actually tagging,
so tagging using document ID's is 3x-4x faster.

The downside to using document ID's is that we need API's to expose
them.  My prototype exposes these as opaque "object ID"s, which acts a
lot like message IDs, but has no intrinsic meaning outside of the
library.  This needs two library functions: one to retrieve a
message's object ID and another to retrieve a message by object ID.

3x-4x isn't enough to make me jump on this added complexity, but it's
enough to make me consider it.  Carl, I'd love to hear your thoughts.

The benchmark results are below.  All results are for a corpus of 10K
messages taken from the Enron email data set and in all cases the
database is in the buffer cache.  "P4" is my old Pentium 4 and "C2" is
my newer Core 2 Duo.

                Message IDs   Document IDs   Diff
HDD/ext3,   P4     235.8s         76.3s      3.1x
SSD/btrfs,  P4     239.4s         69.0s      3.5x
HDD/reiser, C2      72.4s         23.7s      3.1x

I also ran with a patch to Xapian from ojwb that optimizes
adding/removing terms that don't have position information [1], which
reduces the baseline tagging cost.

HDD/reiser, C2      71.6s         19.4s      3.7x

[1] http://oligarchy.co.uk/xapian/patches/xapian-chert-optimise-adding-term-without-positions.patch

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PROTO] possible solution for "Race condition for '*' command"
  2011-08-03 20:47               ` [PROTO] possible solution for "Race condition for '*' command" Austin Clements
@ 2011-08-03 21:42                 ` Jameson Graef Rollins
  2011-08-03 22:21                   ` Austin Clements
  0 siblings, 1 reply; 27+ messages in thread
From: Jameson Graef Rollins @ 2011-08-03 21:42 UTC (permalink / raw)
  To: Austin Clements, Pieter Praet, Carl Worth; +Cc: Olly Betts, Notmuch Mail

[-- Attachment #1: Type: text/plain, Size: 1575 bytes --]

On Wed, 3 Aug 2011 16:47:32 -0400, Austin Clements <amdragon@mit.edu> wrote:
> The patch I posted above includes message ID's in search results as a
> proxy for the match set (which can then be used in a tagging operation
> to tag exactly the results you saw).  However, from an efficiency
> standpoint, it makes more sense to capture the match set directly as
> document ID's.
> 
> I've had an implementation of this for a while, but finally got around
> to benchmarking the difference between tagging using message ID's
> versus using document ID's.  It looks like tagging spends about 2/3rds
> of its time performing queries, and only about 1/3rd actually tagging,
> so tagging using document ID's is 3x-4x faster.

Wow, this sounds very cool, Austin.

> The downside to using document ID's is that we need API's to expose
> them.  My prototype exposes these as opaque "object ID"s, which acts a
> lot like message IDs, but has no intrinsic meaning outside of the
> library.  This needs two library functions: one to retrieve a
> message's object ID and another to retrieve a message by object ID.

This sounds totally reasonable to me.  Maybe we could use something like
"oid:" from the command line?

> 3x-4x isn't enough to make me jump on this added complexity, but it's
> enough to make me consider it.  Carl, I'd love to hear your thoughts.

Imho 3x-4x is actually a pretty huge improvement.  Is it really that
much of an added complexity to add those two functions?  That actually
seems like a relatively simple patch to me.

jamie.

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PROTO] possible solution for "Race condition for '*' command"
  2011-08-03 21:42                 ` Jameson Graef Rollins
@ 2011-08-03 22:21                   ` Austin Clements
  0 siblings, 0 replies; 27+ messages in thread
From: Austin Clements @ 2011-08-03 22:21 UTC (permalink / raw)
  To: Jameson Graef Rollins; +Cc: Olly Betts, Notmuch Mail

Quoth Jameson Graef Rollins on Aug 03 at  2:42 pm:
> On Wed, 3 Aug 2011 16:47:32 -0400, Austin Clements <amdragon@mit.edu> wrote:
> > The downside to using document ID's is that we need API's to expose
> > them.  My prototype exposes these as opaque "object ID"s, which acts a
> > lot like message IDs, but has no intrinsic meaning outside of the
> > library.  This needs two library functions: one to retrieve a
> > message's object ID and another to retrieve a message by object ID.
> 
> This sounds totally reasonable to me.  Maybe we could use something like
> "oid:" from the command line?

That would be difficult with Xapian's query parser (though, probably
fairly easy with the custom query parser).  This is one rough edge in
my prototype.  Currently, I have notmuch tag take an --objids flag
that makes it interpret the query as a whitespace-separated list of
object ID's instead of as a query string.  In a way, this makes sense,
but it would be annoying to do this to all of the commands.

We could special-case the query syntax and accept *either* a regular
query or an object ID query, but not a mixture of terms.  That would
be pleasantly forward-compatible and would address the special
handling in notmuch tag.  It would also give an easier path to fixing
the race condition, since we could fix it right away with message ID
queries and then move to object ID queries.  (Also, this could
eliminate the lookup-by-object-ID API function, though maybe we want
that anyway.)

> > 3x-4x isn't enough to make me jump on this added complexity, but it's
> > enough to make me consider it.  Carl, I'd love to hear your thoughts.
> 
> Imho 3x-4x is actually a pretty huge improvement.  Is it really that
> much of an added complexity to add those two functions?  That actually
> seems like a relatively simple patch to me.

The patch itself is only a few lines, but it adds a new concept.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter'
  2011-07-11 21:05                 ` Austin Clements
  2011-07-13 14:16                   ` Pieter Praet
@ 2011-08-12  8:07                   ` Sebastian Spaeth
  2011-08-12  8:28                     ` Austin Clements
  1 sibling, 1 reply; 27+ messages in thread
From: Sebastian Spaeth @ 2011-08-12  8:07 UTC (permalink / raw)
  To: Austin Clements, Pieter Praet; +Cc: Notmuch Mail

[-- Attachment #1: Type: text/plain, Size: 657 bytes --]

On Mon, 11 Jul 2011 17:05:32 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
> So what would be a good format?  One possibility would be to
> NULL-delimit the query part; as distasteful as I find that, this part
> of the search output isn't meant for user consumption.  Though I fear
> this is endemic to the dual role the search output currently plays as
> both user and computer readable.

Perhaps we take a queue from xargs and have a command line switch for \n
vs \0 output?

--null
-- 0
  Input items are terminated by a null character instead of by
  whitespace, and the quotes and backslash are not special (every
  character is taken lit- erally).

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter'
  2011-08-12  8:07                   ` [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter' Sebastian Spaeth
@ 2011-08-12  8:28                     ` Austin Clements
  0 siblings, 0 replies; 27+ messages in thread
From: Austin Clements @ 2011-08-12  8:28 UTC (permalink / raw)
  To: Sebastian Spaeth; +Cc: Notmuch Mail

Quoth Sebastian Spaeth on Aug 12 at 10:07 am:
> On Mon, 11 Jul 2011 17:05:32 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
> > So what would be a good format?  One possibility would be to
> > NULL-delimit the query part; as distasteful as I find that, this part
> > of the search output isn't meant for user consumption.  Though I fear
> > this is endemic to the dual role the search output currently plays as
> > both user and computer readable.
> 
> Perhaps we take a queue from xargs and have a command line switch for \n
> vs \0 output?
> 
> --null
> -- 0
>   Input items are terminated by a null character instead of by
>   whitespace, and the quotes and backslash are not special (every
>   character is taken lit- erally).

This was one of the approaches I considered, but given that JSON
parsing (with my optimized json.el) is nearly as fast as regexp-based
parsing (which would also be the fastest way to parse \0-separated
output) and is flexible, robust, and structured (unlike any simple
delimited text format), I concluded we should just go with JSON.  If
we care about speed enough to introduce another format, I'd propose
S-expressions over a new text format, since they double parsing
performance compared to text, have the same structural benefits as
JSON, and JSON and S-expressions could even share the same formatting
code (so there's no chance of representation divergence).

BTW, reviving the JSON search parser is on my shortlist.  I should
also bundle up my optimized json.el, since it should seriously boost
the performance of notmuch-show, too.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2011-08-12  8:29 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-01 16:37 [PATCH 2/2] [RFC] possible solution for "Race condition for '*' command" Austin Clements
2011-07-02 14:20 ` Pieter Praet
2011-07-03 17:17   ` Austin Clements
2011-07-04  6:51     ` [PROTO] " Pieter Praet
2011-07-04  6:51       ` [PATCH 1/5] emacs: add property "matched-msgids" to each search result Pieter Praet
2011-07-04  6:51       ` [PATCH 2/5] emacs: add some functions to fetch the matched-msgids of a (region of) search result(s) Pieter Praet
2011-07-04  6:51       ` [PATCH 3/5] emacs: stashing (a region of) matched-msgids Pieter Praet
2011-07-04  6:51       ` [PATCH 4/5] test: emacs: add/remove tags from all matching messages with `notmuch-search-operate-all' Pieter Praet
2011-07-04  6:51       ` [PATCH 5/5] emacs: make `notmuch-search-operate-all' use matched-msgids instead of the original query string Pieter Praet
2011-07-04 17:56       ` [PROTO] possible solution for "Race condition for '*' command" Austin Clements
2011-07-04 18:48         ` Pieter Praet
2011-07-05 19:04           ` Pieter Praet
2011-07-05 21:42             ` Austin Clements
2011-07-10 14:11               ` [PATCH] emacs: bad regexp @ `notmuch-search-process-filter' Pieter Praet
2011-07-11 20:43               ` [PATCH v2] " Pieter Praet
2011-07-11 21:05                 ` Austin Clements
2011-07-13 14:16                   ` Pieter Praet
2011-07-13 14:47                     ` David Edmondson
2011-07-13 18:57                     ` Austin Clements
2011-07-16 15:07                       ` Pieter Praet
2011-07-20  4:50                       ` servilio
2011-07-20 20:50                       ` JSON parsing performance (was Re: [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter') Austin Clements
2011-08-12  8:07                   ` [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter' Sebastian Spaeth
2011-08-12  8:28                     ` Austin Clements
2011-08-03 20:47               ` [PROTO] possible solution for "Race condition for '*' command" Austin Clements
2011-08-03 21:42                 ` Jameson Graef Rollins
2011-08-03 22:21                   ` Austin Clements

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).