From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Eric Abrahamsen Newsgroups: gmane.emacs.devel Subject: Re: Make peg.el a built-in library? Date: Wed, 08 Sep 2021 21:36:10 -0700 Message-ID: <875yvafjr9.fsf@ericabrahamsen.net> References: <875yvtbbn3.fsf@ericabrahamsen.net> <83wno8u3uz.fsf@gnu.org> <87v93s9q4n.fsf@ericabrahamsen.net> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="7424"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux) Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org To: Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Thu Sep 09 06:37:26 2021 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1mOBoT-0001ku-Gs for ged-emacs-devel@m.gmane-mx.org; Thu, 09 Sep 2021 06:37:25 +0200 Original-Received: from localhost ([::1]:42088 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mOBoS-00035j-Dk for ged-emacs-devel@m.gmane-mx.org; Thu, 09 Sep 2021 00:37:24 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:51436) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mOBnW-000227-7j for emacs-devel@gnu.org; Thu, 09 Sep 2021 00:36:26 -0400 Original-Received: from mail.ericabrahamsen.net ([52.70.2.18]:38852) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mOBnS-0006o1-RK; Thu, 09 Sep 2021 00:36:25 -0400 Original-Received: from localhost (c-71-197-232-156.hsd1.wa.comcast.net [71.197.232.156]) (Authenticated sender: eric@ericabrahamsen.net) by mail.ericabrahamsen.net (Postfix) with ESMTPSA id B0B5CFA82D; Thu, 9 Sep 2021 04:36:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ericabrahamsen.net; s=mail; t=1631162173; bh=dIdU10Ps6If4Q87MhiVxCa1Hh+ZIIj82E76ydjHKWO0=; h=From:To:Cc:Subject:References:Date:In-Reply-To:From; b=fB4TmC9P1gkXbtpi75usRjpQAz0yrlfXxgVKNb/FxDOAx9SDjpzd4vAsfx3O37qFa zJYYstC40eqCjdZCVZG2F4iUt/eFvkmmgDMqTpyVN2tgrSySlFJuVt0qXlqtBlkFVN OWy6Gu/6BuEhLIJhqAsDqzG764BVgCenP1DMzWlQ= In-Reply-To: <87v93s9q4n.fsf@ericabrahamsen.net> (Eric Abrahamsen's message of "Thu, 26 Aug 2021 08:34:16 -0700") Received-SPF: pass client-ip=52.70.2.18; envelope-from=eric@ericabrahamsen.net; helo=mail.ericabrahamsen.net X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:274421 Archived-At: --=-=-= Content-Type: text/plain On 08/26/21 08:34 AM, Eric Abrahamsen wrote: > Eli Zaretskii writes: > >>> From: Eric Abrahamsen >>> Date: Wed, 25 Aug 2021 11:52:00 -0700 >>> Cc: Stefan Monnier >>> >>> In my on-again-off-again quest to not have to write text parsers myself, >>> I was pointed towards the PEG library (in ELPA), which does pretty much >>> exactly what I want (Parsing Expression Grammars). >>> >>> Would the maintainers consider moving this into Emacs proper? I ask >>> mostly because this would be very useful to have in Gnus, both to >>> replace the home-made parser in gnus-search.el, and I would hope to >>> parse eg IMAP server responses more fully and reliably. >> >> Fine with me, but please update the (outdated) Wiki page to say where >> the latest peg.el is, when it is imported. > > Will do. Stefan also asked me to make sure the library actually does > what I expect it to do, before making this move, so I'll write the code > first. Okay, I wrote some code: the "use-peg-in-gnus-search.diff" attachment is the result of that. It works really well! A net removal of ~100 LOC (obviously we're still in deficit with the addition of peg.el), it already fixes some wrong behavior of the old parser, and it's much easier to reason about and add new behavior to. It's the shiny declarative future I was looking forward to. Whether or not PEG gets added to core I'd like to propose some patches. The "peg-doc-patches.diff" attachment adds some documentation to the Commentary section, including an example grammar based on a much-simplified version of what gnus-search does. The peg-allow-symbols patch is more tentative. The issue is that _all_ of the entry-points to peg code are macros, meaning you can't build your grammar up in a variable, and then pass that variable to any of `peg-run', `peg-parse', `with-peg-rules', etc. Nobody will evaluate the variable; you have to literally write the rules inside the `with-peg-rules' form. It seems like a fairly plausible use-case to store the rules in a variable or an option, even if you're not doing run-time manipulation of them. The only solution, as Adam found with org-ql, is to `eval' one of the macros. This doesn't seem necessary! The patch has `with-peg-rules' check if the rules are a symbol, and take the `symbol-value' if so. But I wonder if it wouldn't be nicer to break some of the code out: `peg-normalize' seems to be the entry-point for "compile this grammar", and that could be modified to work the way that some languages provide for pre-compiled regexps: a way to let the developer build and compile the grammar at load-time or launch-time, then feed the stored compiled version to parsing routines. `peg-parse' could be a function, or maybe it also could also just check if its argument is a symbol. I hope someone will have some thoughts on this! Eric --=-=-= Content-Type: text/x-patch Content-Disposition: attachment; filename=use-peg-in-gnus-search.diff diff --git a/lisp/gnus/gnus-search.el b/lisp/gnus/gnus-search.el index 2a8069d400..5574061457 100644 --- a/lisp/gnus/gnus-search.el +++ b/lisp/gnus/gnus-search.el @@ -82,6 +82,7 @@ (require 'gnus-sum) (require 'message) (require 'gnus-util) +(require 'peg) (require 'eieio) (eval-when-compile (require 'cl-lib)) (autoload 'eieio-build-class-alist "eieio-opt") @@ -390,8 +391,29 @@ gnus-search-contact-tables ;;; Search language -;; This "language" was generalized from the original IMAP search query -;; parsing routine. +;; Here's our attempt at using the PEG library to rewrite the parser. + +(defvar gnus-search-query-pexs + '((query (+ (or compound-term term))) + (term (or subquery prefixed-term kv-term value) term-end) + (subquery "(" query ")" + `(query -- (if (= 1 (length query)) query (list query)))) + (prefixed-term (or negated-term near-term)) + (negated-term (or "not " "-") term + `(term -- (list 'not term))) + (near-term "near " term + `(term -- (list 'near term))) + (compound-term (or or-terms and-terms)) + (or-terms (or subquery prefixed-term term) "or " (or subquery prefixed-term term) + `(t1 t2 -- (list 'or t1 t2))) + (and-terms (or subquery prefixed-term term) "and " (or subquery prefixed-term term) + `(t1 t2 -- (list 'and t1 t2))) + (value (or quoted-value plain-value)) + (plain-value (substring (+ [word]))) + (quoted-value "\"" (substring (+ (not "\"") (any))) "\"") + (kv-term plain-value ":" value + `(k v -- (gnus-search-query-parse-kv k v))) + (term-end (opt (+ [space]))))) (defun gnus-search-parse-query (string) "Turn STRING into an s-expression based query. @@ -459,108 +481,26 @@ gnus-search-parse-query structured query. Malformed, unusable or invalid queries will typically be silently ignored." (with-temp-buffer - ;; Set up the parsing environment. (insert string) (goto-char (point-min)) - ;; Now, collect the output terms and return them. - (let (out) - (while (not (gnus-search-query-end-of-input)) - (push (gnus-search-query-next-expr) out)) - (reverse out)))) - -(defun gnus-search-query-next-expr (&optional count halt) - "Return the next expression from the current buffer." - (let ((term (gnus-search-query-next-term count)) - (next (gnus-search-query-peek-symbol))) - ;; Deal with top-level expressions. And, or, not, near... What - ;; else? Notmuch also provides xor and adj. It also provides a - ;; "nearness" parameter for near and adj. - (cond - ;; Handle 'expr or expr' - ((and (eq next 'or) - (null halt)) - (list 'or term (gnus-search-query-next-expr 2))) - ;; Handle 'near operator. - ((eq next 'near) - (let ((near-next (gnus-search-query-next-expr 2))) - (if (and (stringp term) - (stringp near-next)) - (list 'near term near-next) - (signal 'gnus-search-parse-error - (list "\"Near\" keyword must appear between two plain strings."))))) - ;; Anything else - (t term)))) - -(defun gnus-search-query-next-term (&optional count) - "Return the next TERM from the current buffer." - (let ((term (gnus-search-query-next-symbol count))) - ;; What sort of term is this? - (cond - ;; negated term - ((eq term 'not) (list 'not (gnus-search-query-next-expr nil 'halt))) - ;; generic term - (t term)))) - -(defun gnus-search-query-peek-symbol () - "Return the next symbol from the current buffer, but don't consume it." - (save-excursion - (gnus-search-query-next-symbol))) - -(defun gnus-search-query-next-symbol (&optional count) - "Return the next symbol from the current buffer, or nil if we are -at the end of the buffer. If supplied COUNT skips some symbols before -returning the one at the supplied position." - (when (and (numberp count) (> count 1)) - (gnus-search-query-next-symbol (1- count))) - (let ((case-fold-search t)) - ;; end of input stream? - (unless (gnus-search-query-end-of-input) - ;; No, return the next symbol from the stream. - (cond - ;; Negated expression -- return it and advance one char. - ((looking-at "-") (forward-char 1) 'not) - ;; List expression -- we parse the content and return this as a list. - ((looking-at "(") - (gnus-search-parse-query (gnus-search-query-return-string ")" t))) - ;; Keyword input -- return a symbol version. - ((looking-at "\\band\\b") (forward-char 3) 'and) - ((looking-at "\\bor\\b") (forward-char 2) 'or) - ((looking-at "\\bnot\\b") (forward-char 3) 'not) - ((looking-at "\\bnear\\b") (forward-char 4) 'near) - ;; Plain string, no keyword - ((looking-at "[\"/]?\\b[^:]+\\([[:blank:]]\\|\\'\\)") - (gnus-search-query-return-string - (when (looking-at-p "[\"/]") t))) - ;; Assume a K:V expression. - (t (let ((key (gnus-search-query-expand-key - (buffer-substring - (point) - (progn - (re-search-forward ":" (point-at-eol) t) - (1- (point)))))) - (value (gnus-search-query-return-string - (when (looking-at-p "[\"/]") t)))) - (gnus-search-query-parse-kv key value))))))) + (with-peg-rules gnus-search-query-pexs + peg-run (peg query)))) (defun gnus-search-query-parse-kv (key value) "Handle KEY and VALUE, parsing and expanding as necessary. -This may result in (key value) being turned into a larger query -structure. - In the simplest case, they are simply consed together. String KEY is converted to a symbol." - (let () ;; return - (cond - ((member key gnus-search-date-keys) - (when (string= "after" key) - (setq key "since")) - (setq value (gnus-search-query-parse-date value))) - ((equal key "mark") - (setq value (gnus-search-query-parse-mark value))) - ((string= "message-id" key) - (setq key "id"))) - (or nil ;; return - (cons (intern key) value)))) + (setq key (gnus-search-query-expand-key key)) + (cond + ((member key gnus-search-date-keys) + (when (string= "after" key) + (setq key "since")) + (setq value (gnus-search-query-parse-date value))) + ((equal key "mark") + (setq value (gnus-search-query-parse-mark value))) + ((string= "message-id" key) + (setq key "id"))) + (cons (intern key) value)) (defun gnus-search-query-parse-date (value &optional rel-date) "Interpret VALUE as a date specification. @@ -647,44 +587,6 @@ gnus-search-query-expand-key ;; We completed to a unique known key. comp)))) -(defun gnus-search-query-return-string (&optional delimited trim) - "Return a string from the current buffer. -If DELIMITED is non-nil, assume the next character is a delimiter -character, and return everything between point and the next -occurrence of the delimiter, including the delimiters themselves. -If TRIM is non-nil, do not return the delimiters. Otherwise, -return one word." - ;; This function cannot handle nested delimiters, as it's not a - ;; proper parser. Ie, you cannot parse "to:bob or (from:bob or - ;; (cc:bob or bcc:bob))". - (let ((start (point)) - (delimiter (if (stringp delimited) - delimited - (when delimited - (char-to-string (char-after))))) - end) - (if delimiter - (progn - (when trim - ;; Skip past first delimiter if we're trimming. - (forward-char 1)) - (while (not end) - (unless (search-forward delimiter nil t (unless trim 2)) - (signal 'gnus-search-parse-error - (list (format "Unmatched delimited input with %s in query" delimiter)))) - (let ((here (point))) - (unless (equal (buffer-substring (- here 2) (- here 1)) "\\") - (setq end (if trim (1- (point)) (point)) - start (if trim (1+ start) start)))))) - (setq end (progn (re-search-forward "\\([[:blank:]]+\\|$\\)" (point-max) t) - (match-beginning 0)))) - (buffer-substring-no-properties start end))) - -(defun gnus-search-query-end-of-input () - "Are we at the end of input?" - (skip-chars-forward "[:blank:]") - (looking-at "$")) - ;;; Search engines ;; Search engines are implemented as classes. This is good for two --=-=-= Content-Type: text/x-patch Content-Disposition: attachment; filename=peg-doc-patch.diff diff --git a/peg.el b/peg.el index d71c707dc0..0e4221eeb7 100644 --- a/peg.el +++ b/peg.el @@ -79,17 +79,69 @@ ;; Beginning-of-Symbol (bos) ;; End-of-Symbol (eos) ;; -;; PEXs also support parsing actions, i.e. Lisp snippets which -;; are executed when a pex matches. This can be used to construct -;; syntax trees or for similar tasks. Actions are written as +;; Rules can refer to other rules, and a grammar is often structured +;; as a tree, with a root rule referring to one or more "branch +;; rules", all the way down to the "leaf rules" that deal with actual +;; buffer text. Rules can be recursive or mutually referential, +;; though care must be taken not to create infinite loops. +;; +;; PEXs also support parsing actions, i.e. Lisp snippets which are +;; executed when a pex matches. This can be used to construct syntax +;; trees or for similar tasks. The most basic form of action is +;; written as: ;; ;; (action FORM) ; evaluate FORM for its side-effects -;; `(VAR... -- FORM...) ; stack action ;; ;; Actions don't consume input, but are executed at the point of -;; match. A "stack action" takes VARs from the "value stack" and -;; pushes the result of evaluating FORMs to that stack. -;; See `peg-ex-parse-int' in `peg-tests.el' for an example. +;; match. Another kind of action is called a "stack action", and +;; looks like this: +;; +;; `(VAR... -- FORM...) ; stack action +;; +;; A stack action takes VARs from the "value stack" and pushes the +;; results of evaluating FORMs to that stack. + +;; The value stack is created during the course of parsing. Certain +;; operators (see below) that match buffer text can push values onto +;; this stack. "Upstream" rules can then draw values from the stack, +;; and optionally push new ones back. For instance, consider this +;; very simple grammar: +;; +;; (with-peg-rules +;; ((query (+ term) (eol)) +;; (term key ":" value (opt (+ [space])) +;; `(k v -- (cons (intern k) v))) +;; (key (substring (and (not ":") (+ [word])))) +;; (value (or string-value number-value)) +;; (string-value (substring (+ [alpha]))) +;; (number-value (substring (+ [digit])) +;; `(val -- (string-to-number val)))) +;; (peg-run (peg query))) +;; +;; This invocation of `peg-run' would parse this buffer text: +;; +;; name:Jane age:30 +;; +;; And return this Elisp sexp: +;; +;; ((age . 30) (name . "Jane")) +;; +;; Note that, in complex grammars, some care must be taken to make +;; sure that the number and type of values drawn from the stack always +;; match those pushed. In the example above, both `string-value' and +;; `number-value' push a single value to the stack. Since the `value' +;; rule only includes these two sub-rules, any upstream rule that +;; makes use of `value' can be confident it will always and only push +;; a single value to the stack. +;; +;; Stack action forms are in a sense analogous to lambda forms: the +;; symbols before the "--" are the equivalent of lambda arguments, +;; while the forms after the "--" are return values. The difference +;; being that a lambda form can only return a single value, while a +;; stack action can push multiple values onto the stack. It's also +;; perfectly valid to use `(-- FORM...)' or `(VAR... --)': the former +;; pushes values to the stack without consuming any, and the latter +;; pops values from the stack and discards them. ;; ;; Derived Operators: ;; @@ -101,6 +153,8 @@ ;; (replace E RPL); Match E and replace the matched region with RPL. ;; (list E) ; Match E and push a list of the items that E produced. ;; +;; See `peg-ex-parse-int' in `peg-tests.el' for further examples. +;; ;; Regexp equivalents: ;; ;; Here a some examples for regexps and how those could be written as pex. @@ -177,7 +231,7 @@ EXPS is a list of rules/expressions that failed.") ;;;; Main entry points -;; Sometimes (with-peg-rule ... (peg-run (peg ...))) is too +;; Sometimes (with-peg-rules ... (peg-run (peg ...))) is too ;; longwinded for the task at hand, so `peg-parse' comes in handy. (defmacro peg-parse (&rest pexs) "Match PEXS at point. --=-=-= Content-Type: text/x-patch Content-Disposition: attachment; filename=peg-allow-symbols.diff diff --git a/peg.el b/peg.el index 0e4221eeb7..fa7e23619f 100644 --- a/peg.el +++ b/peg.el @@ -314,10 +314,14 @@ RULES is a list of rules of the form (NAME . PEXS), where PEXS is a sequence of PEG expressions, implicitly combined with `and'." (declare (indent 1) (debug (sexp form))) ;FIXME: `sexp' is not good enough! (let ((rules - ;; First, macroexpand the rules. - (mapcar (lambda (rule) - (cons (car rule) (peg-normalize `(and . ,(cdr rule))))) - rules)) + (progn + ;; Handle RULES as a variable. + (when (symbolp rules) + (setq rules (symbol-value rules))) + ;; Then macroexpand the rules. + (mapcar (lambda (rule) + (cons (car rule) (peg-normalize `(and . ,(cdr rule))))) + rules))) (ctx (assq :peg-rules macroexpand-all-environment))) (macroexpand-all `(cl-labels --=-=-=--