Quoth myself on Jul 13 at  2:57 pm:
> Quoth Pieter Praet on Jul 13 at  4:16 pm:
> > Jamie Zawinski once said/wrote [1]:
> >   'Some people, when confronted with a problem, think "I know,
> >   I'll use regular expressions." Now they have two problems.'
> > 
> > With this in mind, I set out to get rid of this whole regex mess altogether,
> > by populating the search buffer using Notmuch's JSON output instead of doing
> > brittle text matching tricks.
> > 
> > Looking for some documentation, I stumbled upon a long-forgotten gem [2].
> > 
> > David's already done pretty much all of the work for us!
> 
> Yes, similar thoughts were running through my head as I futzed with
> the formatting for this.  My concern with moving to JSON for search
> buffers is that parsing it is about *30 times slower* than the current
> regexp-based approach (0.6 seconds versus 0.02 seconds for a mere 1413
> result search buffer).  I think JSON makes a lot of sense for show
> buffers because there's generally less data and it has a lot of
> complicated structure.  Search results, on the other hand, have a very
> simple, regular, and constrained structure, so JSON doesn't buy us
> nearly as much.
> 
> JSON is hard to parse because, like the text search output, it's
> designed for human consumption (of course, unlike the text search
> output, it's also designed for computer consumption).  There's
> something to be said for the debuggability and generality of this and
> JSON is very good for exchanging small objects, but it's a remarkably
> inefficient way to exchange large amounts of data between two
> programs.
> 
> I guess what I'm getting at, though it pains me to say it, is perhaps
> search needs a fast, computer-readable interchange format.  The
> structure of the data is so simple and constrained that this could be
> altogether trivial.
> 
> Or maybe I need a faster computer.

Or maybe I need to un-lame my benchmark.

TL;DR: We should use JSON for search results, but possibly not the
json.el shipped with Emacs.

I realized that my text benchmark didn't capture the cost of
extracting the match strings.  re-search-forward records matches as
buffer positions, which don't get realized into strings until you call
match-string.  Hence, match-string is quite expensive.

Also, Emacs' json.el is slow, so I perked it up.  My modified json.el
is ~3X faster, particularly for string-heavy output like notmuch's.
Though now I'm well into the realm of "eq is faster than =" and "M-x
disassemble", so unless I missed something big, this is as fast as it
gets.

While I was still thinking about new IPC formats, I realized that the
text format and the Emacs UI are already tightly coupled, so why not
go all the way and use S-expressions for IPC?  I now think JSON is
fast enough to use, but S-expressions still have a certain appeal.
They share most of the benefits of JSON; structure and extensibility
in particular.  Further, while the content of some ad-hoc format could
easily diverge from both the text and JSON formats, S-expressions
could exactly parallel the JSON content (with a little more
abstraction, they could even share the same format code).  For kicks,
I included an S-expression benchmark.  It beats out the text parser by
a factor of two and the optimized JSON parser by a factor of three.

Here are the results for my 1,413 result search buffer and timeworn
computer

                 Time   Normalized
--format=text   0.148s     1.00x
--format=json   0.598s     4.04x
custom json.el  0.209s     1.41x
 + string keys  0.195s     1.32x
S-expressions   0.066s     0.45x

I don't have time right now, but next week I might be able to look
through and update dme's JSON-based search code.


The benchmark and modified json.el are attached.

The benchmark is written so you can open it and eval-buffer, then C-x
C-e the various calls in the comments.  You can either
make-text/make-json, or run notmuch manually, pipe the results into
files "text" and "json", and open them in Emacs.

Please excuse the modified json.el code; it's gone through zero
cleanup.