From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 43011431FAF for ; Thu, 5 Jul 2012 11:36:25 -0700 (PDT) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.7 X-Spam-Level: X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id xQXdBg+Hsd-E for ; Thu, 5 Jul 2012 11:36:24 -0700 (PDT) Received: from dmz-mailsec-scanner-7.mit.edu (DMZ-MAILSEC-SCANNER-7.MIT.EDU [18.7.68.36]) by olra.theworths.org (Postfix) with ESMTP id AC51C431FAE for ; Thu, 5 Jul 2012 11:36:23 -0700 (PDT) X-AuditID: 12074424-b7f2a6d0000008bf-73-4ff5dea6533d Received: from mailhub-auth-2.mit.edu ( [18.7.62.36]) by dmz-mailsec-scanner-7.mit.edu (Symantec Messaging Gateway) with SMTP id 4B.55.02239.6AED5FF4; Thu, 5 Jul 2012 14:36:22 -0400 (EDT) Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by mailhub-auth-2.mit.edu (8.13.8/8.9.2) with ESMTP id q65IaLka008157; Thu, 5 Jul 2012 14:36:21 -0400 Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91]) (authenticated bits=0) (User authenticated as amdragon@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id q65IaJKa023407 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NOT); Thu, 5 Jul 2012 14:36:20 -0400 (EDT) Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.77) (envelope-from ) id 1SmquZ-00045N-36; Thu, 05 Jul 2012 14:36:19 -0400 Date: Thu, 5 Jul 2012 14:36:19 -0400 From: Austin Clements To: Mark Walters Subject: Re: [PATCH 7/8] emacs: Implement an incremental JSON parser Message-ID: <20120705183619.GI21653@mit.edu> References: <1341354059-29396-1-git-send-email-amdragon@mit.edu> <1341354059-29396-8-git-send-email-amdragon@mit.edu> <873956fw4u.fsf@qmul.ac.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <873956fw4u.fsf@qmul.ac.uk> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFuplleLIzCtJLcpLzFFi42IRYrdT0V1276u/wc/9ihar5/JYXL85k9ni zcp5rA7MHjtn3WX3OPx1IYvHs1W3mAOYo7hsUlJzMstSi/TtErgyGtbNYi/oDqw4MeUnUwNj i10XIyeHhICJxOH5l5ggbDGJC/fWs3UxcnEICexjlNj76QojhLOeUaL/wS1WCOcEk8Txo5uZ IZwljBKrb7SxgvSzCKhIbFm8CWwWm4CGxLb9yxlBbBEBHYnbhxawg9jMAvoSK0/OZAaxhQVc JM5+Og9WwwtU83HVcaihUxkl2u5+YIVICEqcnPmEBaJZS+LGv5dACziAbGmJ5f84QExOoF0d c9RAKkSBTphychvbBEahWUiaZyFpnoXQvICReRWjbEpulW5uYmZOcWqybnFyYl5eapGuuV5u ZoleakrpJkZwqLuo7GBsPqR0iFGAg1GJh9cw94u/EGtiWXFl7iFGSQ4mJVHe11e/+gvxJeWn VGYkFmfEF5XmpBYfYpTgYFYS4e3NAMrxpiRWVqUW5cOkpDlYlMR5r6fc9BcSSE8sSc1OTS1I LYLJynBwKEnwGgFjWkiwKDU9tSItM6cEIc3EwQkynAdouBlIDW9xQWJucWY6RP4Uo6KUOO+9 u0AJAZBERmkeXC8sFb1iFAd6RZiXF6SdB5jG4LpfAQ1mAhqct/gTyOCSRISUVAPjrOrcqax/ 7b805tu0hstyrpl14cfy6KiJYtER6o1dfDNfZH/flJZv9GZjVUTV/UI5zT7FtA3JZ2Ykm7Hw q3k+2Ld+iuosa9HrOw5V8c7KepS1LJdh92/NUxa8Nu3/erL2v3h30v67SlnAB8Np9kai0bvb rk/oyamsOnXUJdUnKXPlvCt2ETlKLMUZiYZazEXFiQDUe/HSIAMAAA== Cc: tomi.ollila@iki.fi, notmuch@notmuchmail.org X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Jul 2012 18:36:26 -0000 Quoth Mark Walters on Jul 05 at 9:30 am: > On Tue, 03 Jul 2012, Austin Clements wrote: > > This parser is designed to read streaming JSON whose structure is > > known to the caller. Like a typical JSON parsing interface, it > > provides a function to read a complete JSON value from the input. > > However, it extends this with an additional function that > > requires the next value in the input to be a compound value and > > descends into it, allowing its elements to be read one at a time > > or further descended into. Both functions can return 'retry to > > indicate that not enough input is available. > > > > The parser supports efficient partial parsing, so there's no need to > > frame the input for correctness or performance. > > > > Currently only descending into JSON lists is supported because that's > > all we need, but support for descending into JSON objects can be added > > in the future. > > --- > > emacs/notmuch-lib.el | 183 ++++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 183 insertions(+) > > > > diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el > > index c829df3..f7cda33 100644 > > --- a/emacs/notmuch-lib.el > > +++ b/emacs/notmuch-lib.el > > @@ -23,6 +23,7 @@ > > > > (require 'mm-view) > > (require 'mm-decode) > > +(require 'json) > > (eval-when-compile (require 'cl)) > > > > (defvar notmuch-command "notmuch" > > @@ -296,6 +297,188 @@ was called." > > (defvar notmuch-show-process-crypto nil) > > (make-variable-buffer-local 'notmuch-show-process-crypto) > > > > +;; Incremental JSON parsing > > + > > +(defun notmuch-json-create-parser (buffer) > > + "Return a streaming JSON parser that consumes input from BUFFER. > > + > > +This parser is designed to read streaming JSON whose structure is > > +known to the caller. Like a typical JSON parsing interface, it > > +provides a function to read a complete JSON value from the input. > > +However, it extends this with an additional function that > > +requires the next value in the input to be a compound value and > > +descends into it, allowing its elements to be read one at a time > > +or further descended into. Both functions can return 'retry to > > +indicate that not enough input is available. > > + > > +The parser always consumes input from BUFFER's point. Hence, the > > +caller is allowed to delete and data before point and may > > +resynchronize after an error by moving point." > > + > > + (list buffer > > + ;; Terminator stack: a stack of characters that indicate the > > + ;; end of the compound values enclosing point > > + '() > > + ;; Next: One of > > + ;; * 'expect-value if the next token must be a value, but a > > + ;; value has not yet been reached > > + ;; * 'value if point is at the beginning of a value > > + ;; * 'expect-comma if the next token must be a comma > > + 'expect-value > > + ;; Allow terminator: non-nil if the next token may be a > > + ;; terminator > > + nil > > + ;; Partial parse position: If state is 'value, a marker for > > + ;; the position of the partial parser or nil if no partial > > + ;; parsing has happened yet > > + nil > > + ;; Partial parse state: If state is 'value, the current > > + ;; `parse-partial-sexp' state > > + nil)) > > + > > +(defmacro notmuch-json-buffer (jp) `(first ,jp)) > > +(defmacro notmuch-json-term-stack (jp) `(second ,jp)) > > +(defmacro notmuch-json-next (jp) `(third ,jp)) > > +(defmacro notmuch-json-allow-term (jp) `(fourth ,jp)) > > +(defmacro notmuch-json-partial-pos (jp) `(fifth ,jp)) > > +(defmacro notmuch-json-partial-state (jp) `(sixth ,jp)) > > + > > +(defvar notmuch-json-syntax-table > > + (let ((table (make-syntax-table))) > > + ;; The standard syntax table is what we need except that "." needs > > + ;; to have word syntax instead of punctuation syntax. > > + (modify-syntax-entry ?. "w" table) > > + table) > > + "Syntax table used for incremental JSON parsing.") > > + > > +(defun notmuch-json-scan-to-value (jp) > > + ;; Helper function that consumes separators, terminators, and > > + ;; whitespace from point. Returns nil if it successfully reached > > + ;; the beginning of a value, 'end if it consumed a terminator, or > > + ;; 'retry if not enough input was available to reach a value. Upon > > + ;; nil return, (notmuch-json-next jp) is always 'value. > > + > > + (if (eq (notmuch-json-next jp) 'value) > > + ;; We're already at a value > > + nil > > + ;; Drive the state toward 'expect-value > > + (skip-chars-forward " \t\r\n") > > + (or (when (eobp) 'retry) > > + ;; Test for the terminator for the current compound > > + (when (and (notmuch-json-allow-term jp) > > + (eq (char-after) (car (notmuch-json-term-stack jp)))) > > + ;; Consume it and expect a comma or terminator next > > + (forward-char) > > + (setf (notmuch-json-term-stack jp) (cdr (notmuch-json-term-stack jp)) > > + (notmuch-json-next jp) 'expect-comma > > + (notmuch-json-allow-term jp) t) > > + 'end) > > + ;; Test for a separator > > + (when (eq (notmuch-json-next jp) 'expect-comma) > > + (when (/= (char-after) ?,) > > + (signal 'json-readtable-error (list "expected ','"))) > > + ;; Consume it, switch to 'expect-value, and disallow a > > + ;; terminator > > + (forward-char) > > + (skip-chars-forward " \t\r\n") > > + (setf (notmuch-json-next jp) 'expect-value > > + (notmuch-json-allow-term jp) nil) > > + ;; We moved point, so test for eobp again and fall through > > + ;; to the next test if there's more input > > + (when (eobp) 'retry)) > > + ;; Next must be 'expect-value and we know this isn't > > + ;; whitespace, EOB, or a terminator, so point must be on a > > + ;; value > > + (progn > > + (assert (eq (notmuch-json-next jp) 'expect-value)) > > + (setf (notmuch-json-next jp) 'value) > > + nil)))) > > + > > +(defun notmuch-json-begin-compound (jp) > > + "Parse the beginning of a compound value and traverse inside it. > > + > > +Returns 'retry if there is insufficient input to parse the > > +beginning of the compound. If this is able to parse the > > +beginning of a compound, it returns t and later calls to > > +`notmuch-json-read' will return the compound's elements. > > + > > +Entering JSON objects is current unimplemented." > > + > > + (with-current-buffer (notmuch-json-buffer jp) > > + ;; Disallow terminators > > + (setf (notmuch-json-allow-term jp) nil) > > + (or (notmuch-json-scan-to-value jp) > > + (if (/= (char-after) ?\[) > > + (signal 'json-readtable-error (list "expected '['")) > > + (forward-char) > > + (push ?\] (notmuch-json-term-stack jp)) > > + ;; Expect a value or terminator next > > + (setf (notmuch-json-next jp) 'expect-value > > + (notmuch-json-allow-term jp) t) > > + t)))) > > + > > +(defun notmuch-json-read (jp) > > + "Parse the value at point in JP's buffer. > > + > > +Returns 'retry if there is insufficient input to parse a complete > > +JSON value. If the parser is currently inside a compound value > > +and the next token ends the list or object, returns 'end. > > +Otherwise, returns the value." > > > This looks excellent! My only comment is that I think it would be > helpful it the above comment and the one for notmuch-json-begin-compound > said what happens to point when they get called. Good idea. > Also, I think in practice a lot of the parsing is done by json.el (that > you use this to enter one level of the hierarchy and then use this to > ensure that json.el only gets complete objects to parse) so I assume > this means that your much faster replacement for json.el could be > slotted in? It might be worth mentioning that in the commit message. Yes, definitely. My hope is that we'll skip over the optimized JSON parser and move straight to S-expressions, which perform ~3X better than my optimized json.el (and ~8X better than Emacs' json.el). > Best wishes > > Mark > > > + > > + (with-current-buffer (notmuch-json-buffer jp) > > + (or > > + ;; Get to a value state > > + (notmuch-json-scan-to-value jp) > > + > > + ;; Can we parse a complete value? > > + (let ((complete > > + (if (looking-at "[-+0-9tfn]") > > + ;; This is a number or a keyword, so the partial > > + ;; parser isn't going to help us because a truncated > > + ;; number or keyword looks like a complete symbol to > > + ;; it. Look for something that clearly ends it. > > + (save-excursion > > + (skip-chars-forward "^]},: \t\r\n") > > + (not (eobp))) > > + > > + ;; We're looking at a string, object, or array, which we > > + ;; can partial parse. If we just reached the value, set > > + ;; up the partial parser. > > + (when (null (notmuch-json-partial-state jp)) > > + (setf (notmuch-json-partial-pos jp) (point-marker))) > > + > > + ;; Extend the partial parse until we either reach EOB or > > + ;; get the whole value > > + (save-excursion > > + (let ((pstate > > + (with-syntax-table notmuch-json-syntax-table > > + (parse-partial-sexp > > + (notmuch-json-partial-pos jp) (point-max) 0 nil > > + (notmuch-json-partial-state jp))))) > > + ;; A complete value is available if we've reached > > + ;; depth 0 or less and encountered a complete > > + ;; subexpression. > > + (if (and (<= (first pstate) 0) (third pstate)) > > + t > > + ;; Not complete. Update the partial parser state > > + (setf (notmuch-json-partial-pos jp) (point-marker) > > + (notmuch-json-partial-state jp) pstate) > > + nil)))))) > > + > > + (if (not complete) > > + 'retry > > + ;; We have a value. Reset the partial parse state and expect > > + ;; a comma or terminator after the value. > > + (setf (notmuch-json-next jp) 'expect-comma > > + (notmuch-json-allow-term jp) t > > + (notmuch-json-partial-pos jp) nil > > + (notmuch-json-partial-state jp) nil) > > + ;; Parse the value > > + (let* ((json-object-type 'plist) > > + (json-array-type 'list) > > + (json-false nil)) > > + (json-read))))))) > > + > > (provide 'notmuch-lib) > > > > ;; Local Variables: