From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 07049431FAF for ; Mon, 27 May 2013 12:04:55 -0700 (PDT) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.7 X-Spam-Level: X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 4uuybgHLJgkT for ; Mon, 27 May 2013 12:04:45 -0700 (PDT) Received: from dmz-mailsec-scanner-4.mit.edu (DMZ-MAILSEC-SCANNER-4.MIT.EDU [18.9.25.15]) by olra.theworths.org (Postfix) with ESMTP id 4565E431FAE for ; Mon, 27 May 2013 12:04:45 -0700 (PDT) X-AuditID: 1209190f-b7f256d000005616-05-51a3ae491c90 Received: from mailhub-auth-3.mit.edu ( [18.9.21.43]) by dmz-mailsec-scanner-4.mit.edu (Symantec Messaging Gateway) with SMTP id 4C.05.22038.94EA3A15; Mon, 27 May 2013 15:04:41 -0400 (EDT) Received: from outgoing.mit.edu (OUTGOING-AUTH-1.MIT.EDU [18.9.28.11]) by mailhub-auth-3.mit.edu (8.13.8/8.9.2) with ESMTP id r4RJ4ewa024942; Mon, 27 May 2013 15:04:41 -0400 Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91]) (authenticated bits=0) (User authenticated as amdragon@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id r4RJ4bg4029612 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Mon, 27 May 2013 15:04:38 -0400 Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80) (envelope-from ) id 1Uh2ii-0008EC-PM; Mon, 27 May 2013 15:04:36 -0400 Date: Mon, 27 May 2013 15:04:36 -0400 From: Austin Clements To: Mark Walters Subject: Re: [PATCH 4/5] emacs: Streaming S-expression parser Message-ID: <20130527190436.GS5999@mit.edu> References: <1368851472-5382-1-git-send-email-amdragon@mit.edu> <1368851472-5382-5-git-send-email-amdragon@mit.edu> <87fvxg2wc4.fsf@qmul.ac.uk> <87ip2b3moz.fsf@awakening.csail.mit.edu> <8761y7ph8v.fsf@qmul.ac.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <8761y7ph8v.fsf@qmul.ac.uk> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFlrLKsWRmVeSWpSXmKPExsUixCmqreu5bnGgwcstYhar5/JYXL85k9mB yWPnrLvsHs9W3WIOYIrisklJzcksSy3St0vgyth1bTljwbf8iu6bk1kaGI+FdTFyckgImEg0 H5nPBmGLSVy4tx7I5uIQEtjHKLH0ynVmCGcjo8TKb9OgnNNMEo3dC9khnCWMEtPWLGIC6WcR UJVo7/nBCGKzCWhIbNu/HMwWEdCRuH1oATuIzSwgLfHtdzNQPQeHsICtxKL9YKt5BbQl9i35 wwox8w6jxJe+lUwQCUGJkzOfsED06kjs3HqHDaQXZM7yfxwQYXmJ5q2zmUFsTqC1E161g80U FVCRmHJyG9sERuFZSCbNQjJpFsKkWUgmLWBkWcUom5JbpZubmJlTnJqsW5ycmJeXWqRropeb WaKXmlK6iREcB5L8Oxi/HVQ6xCjAwajEwzshe3GgEGtiWXFl7iFGSQ4mJVHe0yuAQnxJ+SmV GYnFGfFFpTmpxYcYJTiYlUR4t4PkeFMSK6tSi/JhUtIcLErivFdTbvoLCaQnlqRmp6YWpBbB ZGU4OJQkeP3XAjUKFqWmp1akZeaUIKSZODhBhvMADdcAqeEtLkjMLc5Mh8ifYjTm2Hx+8jtG jravQFKIJS8/L1VKnNcJpFQApDSjNA9uGiyVvWIUB3pOmNcepIoHmAbh5r0CWsUEtEqcGWxV SSJCSqqBUeNA6of0Pk3vGeqBq//naWTxVC67FPLsQOUdycY10e1B9ds7/lccCbbb9r+h/9nB 6wtqLTvVfZlMbn44Hrl/X9fe+Q5vXk5bbpD2hTnqfemuHZvnMS7o1g20PGJWffnr4etdxbfN WBmTDt3ycZy9dpZcqhnnKRVvsczuT+5LFf8/UFB737oyS4mlOCPRUIu5qDgRAMQVG6RAAwAA Cc: notmuch@notmuchmail.org X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 May 2013 19:04:55 -0000 Quoth Mark Walters on May 25 at 9:59 am: > > Hi > > On Wed, 22 May 2013, Austin Clements wrote: > > On Tue, 21 May 2013, Mark Walters wrote: > >> Hi > >> > >> This patch looks good to me. Some minor comments below. > > > > Some minor replies below. > > > > In building some other code on top of this, I found an interesting (but > > easy to fix) interface bug. Currently, the interface is designed as if > > it doesn't matter what buffer these functions are called from, however, > > because they move point and expect this point motion to persist, it's > > actually not safe to call this interface unless the caller is in the > > right buffer anyway. For example, if the buffer is selected in a > > window, the with-current-buffer in the parser functions will actually > > move a *temporary* point, meaning that the only way the caller can > > discover the new point is to first select the buffer for itself. I can > > think of two solutions: 1) maintain our own mark for the parser's > > current position or 2) tweak the doc strings and code so that it reads > > from the current buffer. 1 keeps the interface the way it's currently > > documented, but complicates the parser implementation and interface and > > doesn't simplify the caller. 2 simplifies the parser and it turns out > > all callers already satisfy the requirement. > > I am confused by this: the docs strings for json/sexp-parse-partial-list > both say something like "Parse a partial JSON list from current buffer"? > Or do you mean the with-current-buffer in notmuch-search-process-filter? I was referring to the lower level parser, which effectively has the same requirement but isn't documented to and has code that pointlessly tries to track the parsing buffer (I consider json/sexp-parse-partial-list to be a helper). In fact, one reason the lower level parser didn't choke is because right now we only use it through json/sexp-parse-partial-list, which requires that it be called from the right buffer. > >> On Sat, 18 May 2013, Austin Clements wrote: > >>> This provides the same interface as the streaming JSON parser, but > >>> reads S-expressions incrementally. The only difference is that the > >>> `notmuch-sexp-parse-partial-list' helper does not handle interleaved > >>> error messages (since we now have the ability to separate these out at > >>> the invocation level), so it no longer takes an error function and > >>> does not need to do the horrible resynchronization that the JSON > >>> parser had to. > >>> > >>> Some implementation improvements have been made over the JSON parser. > >>> This uses a vector instead of a list for the parser data structure, > >>> since this allows faster access to elements (and modern versions of > >>> Emacs handle storage of small vectors efficiently). Private functions > >>> follow the "prefix--name" convention. And the implementation is much > >>> simpler overall because S-expressions are much easier to parse. > >>> --- > >>> emacs/Makefile.local | 1 + > >>> emacs/notmuch-parser.el | 212 +++++++++++++++++++++++++++++++++++++++++++++++ > >>> 2 files changed, 213 insertions(+) > >>> create mode 100644 emacs/notmuch-parser.el > >>> > >>> diff --git a/emacs/Makefile.local b/emacs/Makefile.local > >>> index 456700a..a910aff 100644 > >>> --- a/emacs/Makefile.local > >>> +++ b/emacs/Makefile.local > >>> @@ -3,6 +3,7 @@ > >>> dir := emacs > >>> emacs_sources := \ > >>> $(dir)/notmuch-lib.el \ > >>> + $(dir)/notmuch-parser.el \ > >>> $(dir)/notmuch.el \ > >>> $(dir)/notmuch-query.el \ > >>> $(dir)/notmuch-show.el \ > >>> diff --git a/emacs/notmuch-parser.el b/emacs/notmuch-parser.el > >>> new file mode 100644 > >>> index 0000000..1b7cf64 > >>> --- /dev/null > >>> +++ b/emacs/notmuch-parser.el > >>> @@ -0,0 +1,212 @@ > >>> +;; notmuch-parser.el --- streaming S-expression parser > >>> +;; > >>> +;; Copyright © Austin Clements > >>> +;; > >>> +;; This file is part of Notmuch. > >>> +;; > >>> +;; Notmuch is free software: you can redistribute it and/or modify it > >>> +;; under the terms of the GNU General Public License as published by > >>> +;; the Free Software Foundation, either version 3 of the License, or > >>> +;; (at your option) any later version. > >>> +;; > >>> +;; Notmuch is distributed in the hope that it will be useful, but > >>> +;; WITHOUT ANY WARRANTY; without even the implied warranty of > >>> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > >>> +;; General Public License for more details. > >>> +;; > >>> +;; You should have received a copy of the GNU General Public License > >>> +;; along with Notmuch. If not, see . > >>> +;; > >>> +;; Authors: Austin Clements > >>> + > >>> +(require 'cl) > >>> + > >>> +(defun notmuch-sexp-create-parser (buffer) > >>> + "Return a streaming S-expression parser that reads from BUFFER. > >>> + > >>> +This parser is designed to incrementally read an S-expression > >>> +whose structure is known to the caller. Like a typical > >>> +S-expression parsing interface, it provides a function to read a > >>> +complete S-expression from the input. However, it extends this > >>> +with an additional function that requires the next value in the > >>> +input to be a list and descends into it, allowing its elements to > >>> +be read one at a time or further descended into. Both functions > >>> +can return 'retry to indicate that not enough input is available. > >>> + > >>> +The parser always consumes input from BUFFER's point. Hence, the > >>> +caller is allowed to delete any data before point and may > >>> +resynchronize after an error by moving point." > >>> + > >>> + (vector 'notmuch-sexp-parser > >>> + buffer > >>> + ;; List depth > >>> + 0 > >>> + ;; Partial parse position marker > >>> + nil > >>> + ;; Partial parse state > >>> + nil)) > >>> + > >>> +(defmacro notmuch-sexp--buffer (sp) `(aref ,sp 1)) > >>> +(defmacro notmuch-sexp--depth (sp) `(aref ,sp 2)) > >>> +(defmacro notmuch-sexp--partial-pos (sp) `(aref ,sp 3)) > >>> +(defmacro notmuch-sexp--partial-state (sp) `(aref ,sp 4)) > >> > >> Why the double hyphen --? Is it a name-space or some convention? > > > > More specifically, this seems to be the most common Elisp convention for > > indicating private symbols. > > Ok. If we are keeping the json parser it might be worth making it follow > the same convention but as it is purely internal it's probably not worth > bothering. Yeah, it is purely internal. Also, I was planning to remove the JSON parser (or maybe move it somewhere else? I feel bad deleting that much perfectly functional code, though of course git will keep it in perpetuity). > >>> +(defun notmuch-sexp-read (sp) > >>> + "Consume and return the value at point in SP's buffer. > >>> + > >>> +Returns 'retry if there is insufficient input to parse a complete > >>> +value (though it may still move point over whitespace). If the > >>> +parser is currently inside a list and the next token ends the > >>> +list, this moves point just past the terminator and returns 'end. > >>> +Otherwise, this moves point to just past the end of the value and > >>> +returns the value." > >>> + > >>> + (with-current-buffer (notmuch-sexp--buffer sp) > >>> + (skip-chars-forward " \n\r\t") > >>> + (cond ((eobp) 'retry) > >>> + ((= (char-after) ?\)) > >>> + ;; We've reached the end of a list > >>> + (if (= (notmuch-sexp--depth sp) 0) > >>> + ;; .. but we weren't in a list. Let read signal the > >>> + ;; error. > >>> + (read (current-buffer)) > >> > >> Why is good for read to signal the error rather than us doing it? > > > > This ensures the syntax error handling and signal behavior of > > notmuch-sexp-read is identical in every way to a regular read call. > > Maybe the comment should read "Let read signal the error like we do in > > all other code paths."? > > Yes that would be good: or perhaps "like it does in all other code > paths". Sure. > Best wishes > > Mark > > > > >>> + ;; Go up a level and return an end token > >>> + (decf (notmuch-sexp--depth sp)) > >>> + (forward-char) > >>> + 'end)) > >>> + ((= (char-after) ?\() > >>> + ;; We're at the beginning of a list. If we haven't started > >>> + ;; a partial parse yet, attempt to read the list in its > >>> + ;; entirety. If this fails, or we've started a partial > >>> + ;; parse, extend the partial parse to figure out when we > >>> + ;; have a complete list. > >>> + (catch 'return > >>> + (when (null (notmuch-sexp--partial-state sp)) > >>> + (let ((start (point))) > >>> + (condition-case nil > >>> + (throw 'return (read (current-buffer))) > >>> + (end-of-file (goto-char start))))) > >>> + ;; Extend the partial parse > >>> + (let (is-complete) > >>> + (save-excursion > >>> + (let* ((new-state (parse-partial-sexp > >>> + (or (notmuch-sexp--partial-pos sp) (point)) > >>> + (point-max) 0 nil > >>> + (notmuch-sexp--partial-state sp))) > >>> + ;; A complete value is available if we've > >>> + ;; reached depth 0. > >>> + (depth (first new-state))) > >>> + (assert (>= depth 0)) > >>> + (if (= depth 0) > >>> + ;; Reset partial parse state > >>> + (setf (notmuch-sexp--partial-state sp) nil > >>> + (notmuch-sexp--partial-pos sp) nil > >>> + is-complete t) > >>> + ;; Update partial parse state > >>> + (setf (notmuch-sexp--partial-state sp) new-state > >>> + (notmuch-sexp--partial-pos sp) (point-marker))))) > >>> + (if is-complete > >>> + (read (current-buffer)) > >>> + 'retry)))) > >>> + (t > >>> + ;; Attempt to read a non-compound value > >>> + (let ((start (point))) > >>> + (condition-case nil > >>> + (let ((val (read (current-buffer)))) > >>> + ;; We got what looks like a complete read, but if > >>> + ;; we reached the end of the buffer in the process, > >>> + ;; we may not actually have all of the input we > >>> + ;; need (unless it's a string, which is delimited). > >>> + (if (or (stringp val) (not (eobp))) > >>> + val > >>> + ;; We can't be sure the input was complete > >>> + (goto-char start) > >>> + 'retry)) > >>> + (end-of-file > >>> + (goto-char start) > >>> + 'retry))))))) > >>> + > >>> +(defun notmuch-sexp-begin-list (sp) > >>> + "Parse the beginning of a list value and enter the list. > >>> + > >>> +Returns 'retry if there is insufficient input to parse the > >>> +beginning of the list. If this is able to parse the beginning of > >>> +a list, it moves point past the token that opens the list and > >>> +returns t. Later calls to `notmuch-sexp-read' will return the > >>> +elements inside the list. If the input in buffer is not the > >>> +beginning of a list, throw invalid-read-syntax." > >>> + > >>> + (with-current-buffer (notmuch-sexp--buffer sp) > >>> + (skip-chars-forward " \n\r\t") > >>> + (cond ((eobp) 'retry) > >>> + ((= (char-after) ?\() > >>> + (forward-char) > >>> + (incf (notmuch-sexp--depth sp)) > >>> + t) > >>> + (t > >>> + ;; Skip over the bad character like `read' does > >>> + (forward-char) > >>> + (signal 'invalid-read-syntax (list (string (char-before)))))))) > >>> + > >>> +(defun notmuch-sexp-eof (sp) > >>> + "Signal an error if there is more data in SP's buffer. > >>> + > >>> +Moves point to the beginning of any trailing data or to the end > >>> +of the buffer if there is only trailing whitespace." > >>> + > >>> + (with-current-buffer (notmuch-sexp--buffer sp) > >>> + (skip-chars-forward " \n\r\t") > >>> + (unless (eobp) > >>> + (error "Trailing garbage following expression")))) > >>> + > >>> +(defvar notmuch-sexp--parser nil > >>> + "The buffer-local notmuch-sexp-parser instance. > >>> + > >>> +Used by `notmuch-sexp-parse-partial-list'.") > >>> + > >>> +(defvar notmuch-sexp--state nil > >>> + "The buffer-local `notmuch-sexp-parse-partial-list' state.") > >>> + > >>> +(defun notmuch-sexp-parse-partial-list (result-function result-buffer) > >>> + "Incrementally parse an S-expression list from the current buffer. > >>> + > >>> +This function consume an S-expression list from the current > >> > >> consumes > > > > Oops, yes. > > > >>> +buffer, applying RESULT-FUNCTION in RESULT-BUFFER to each > >>> +complete value in the list. It operates incrementally and should > >>> +be called whenever the input buffer has been extended with > >>> +additional data. The caller just needs to ensure it does not > >>> +move point in the input buffer." > >>> + > >>> + ;; Set up the initial state > >>> + (unless (local-variable-p 'notmuch-sexp--parser) > >>> + (set (make-local-variable 'notmuch-sexp--parser) > >>> + (notmuch-sexp-create-parser (current-buffer))) > >>> + (set (make-local-variable 'notmuch-sexp--state) 'begin)) > >>> + (let (done) > >>> + (while (not done) > >>> + (case notmuch-sexp--state > >>> + (begin > >>> + ;; Enter the list > >>> + (if (eq (notmuch-sexp-begin-list notmuch-sexp--parser) 'retry) > >>> + (setq done t) > >>> + (setq notmuch-sexp--state 'result))) > >>> + (result > >>> + ;; Parse a result > >>> + (let ((result (notmuch-sexp-read notmuch-sexp--parser))) > >>> + (case result > >>> + (retry (setq done t)) > >>> + (end (setq notmuch-sexp--state 'end)) > >>> + (t (with-current-buffer result-buffer > >>> + (funcall result-function result)))))) > >>> + (end > >>> + ;; Any trailing data is unexpected > >>> + (notmuch-sexp-eof notmuch-sexp--parser) > >>> + (setq done t))))) > >>> + ;; Clear out what we've parsed > >>> + (delete-region (point-min) (point))) > >>> + > >>> +(provide 'notmuch-parser) > >>> + > >>> +;; Local Variables: > >>> +;; byte-compile-warnings: (not cl-functions) > >>> +;; End: