From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 13 Jul 2011 14:57:21 -0400
From: Austin Clements
To: Pieter Praet
Cc: Notmuch Mail , David Edmondson
Subject: Re: [PATCH v2] emacs: bad regexp @ `notmuch-search-process-filter'
Message-ID: <20110713185721.GI25558@mit.edu>
References: <20110705214234.GA15360@mit.edu> <1310416993-31031-1-git-send-email-pieter@praet.org> <20110711210532.GC25558@mit.edu> <878vs28dvo.fsf@praet.org>
In-Reply-To: <878vs28dvo.fsf@praet.org>
List-Id: "Use and development of the notmuch mail system."

Quoth Pieter Praet on Jul 13 at 4:16 pm:
> On Mon, 11 Jul 2011 17:05:32 -0400, Austin Clements wrote:
> > Quoth Pieter Praet on Jul 11 at 10:43 pm:
> > > TL;DR: I can haz regex pl0x?
> >
> > Oof, what a pain.  I'm happy to change the output format of search; I
> > hadn't realized how difficult it would be to parse.  In fact, I'm not
> > sure it's even parsable by regexp, because the message ID's themselves
> > could contain parens.
> >
> > So what would be a good format?
> > One possibility would be to
> > NULL-delimit the query part; as distasteful as I find that, this part
> > of the search output isn't meant for user consumption.  Though I fear
> > this is endemic to the dual role the search output currently plays as
> > both user and computer readable.
> >
> > I've also got the code to do everything using document ID's instead of
> > message ID's.  As a side-effect, it makes the search output clean and
> > readily parsable since document ID's are just numbers.  Hence, there
> > are no quoting or escaping issues (plus the output is much more
> > compact).  I haven't sent this to the list yet because I haven't had a
> > chance to benchmark it and determine if the performance benefits make
> > exposing document ID's worthwhile.
>
> Jamie Zawinski once said/wrote [1]:
>
>   'Some people, when confronted with a problem, think "I know,
>   I'll use regular expressions."  Now they have two problems.'
>
> With this in mind, I set out to get rid of this whole regex mess
> altogether, by populating the search buffer using Notmuch's JSON output
> instead of doing brittle text matching tricks.
>
> Looking for some documentation, I stumbled upon a long-forgotten gem [2].
>
> David's already done pretty much all of the work for us!

Yes, similar thoughts were running through my head as I futzed with the
formatting for this.

My concern with moving to JSON for search buffers is that parsing it is
about *30 times slower* than the current regexp-based approach (0.6
seconds versus 0.02 seconds for a mere 1413-result search buffer).

I think JSON makes a lot of sense for show buffers because there's
generally less data and it has a lot of complicated structure.  Search
results, on the other hand, have a very simple, regular, and constrained
structure, so JSON doesn't buy us nearly as much.
JSON is hard to parse because, like the text search output, it's
designed for human consumption (of course, unlike the text search
output, it's also designed for computer consumption).  There's something
to be said for the debuggability and generality of this, and JSON is
very good for exchanging small objects, but it's a remarkably
inefficient way to exchange large amounts of data between two programs.

I guess what I'm getting at, though it pains me to say it, is that
perhaps search needs a fast, computer-readable interchange format.  The
structure of the data is so simple and constrained that this could be
altogether trivial.  Or maybe I need a faster computer.

If anyone is curious, here's how I timed the parsing:

(defmacro time-it (code)
  `(let ((start-time (get-internal-run-time)))
     ,code
     (float-time (time-subtract (get-internal-run-time) start-time))))

(with-current-buffer "json"
  (goto-char (point-min))
  (time-it (json-read)))

(with-current-buffer "text"
  (goto-char (point-min))
  (time-it
   (while (re-search-forward
           "^\\(thread:[0-9A-Fa-f]*\\) \\([^][]*\\) \\(\\[[0-9/]*\\]\\) \\([^;]*\\); \\(.*\\) (\\([^()]*\\))$"
           nil t))))
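
And for the record, here's a contrived illustration of the paren
problem I mentioned above.  The thread ID, sender, and message ID are
all made up, but they show why the trailing "(...)" group in that
regexp can't cope with a query that itself contains parens:

;; Made-up search line whose query part contains parens (e.g. a
;; message ID like foo(bar)@example.com).
(let ((line (concat "thread:0000000000000001 today [1/1] "
                    "Someone; Some subject (id:\"foo(bar)@example.com\")"))
      (regexp (concat "^\\(thread:[0-9A-Fa-f]*\\) \\([^][]*\\) "
                      "\\(\\[[0-9/]*\\]\\) \\([^;]*\\); \\(.*\\) "
                      "(\\([^()]*\\))$")))
  ;; Returns nil: [^()]* can't span the inner parens, so the whole
  ;; line fails to match and would be silently skipped by the filter.
  (string-match regexp line))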