* Make peg.el a built-in library?
@ 2021-08-25 18:52 Eric Abrahamsen
2021-08-26 6:17 ` Eli Zaretskii
` (3 more replies)
0 siblings, 4 replies; 100+ messages in thread
From: Eric Abrahamsen @ 2021-08-25 18:52 UTC (permalink / raw)
To: emacs-devel; +Cc: Stefan Monnier
Hi all,
In my on-again-off-again quest to not have to write text parsers myself,
I was pointed towards the PEG library (in ELPA), which does pretty much
exactly what I want (Parsing Expression Grammars).
Would the maintainers consider moving this into Emacs proper? I ask
mostly because this would be very useful to have in Gnus, both to
replace the home-made parser in gnus-search.el, and I would hope to
parse eg IMAP server responses more fully and reliably.
I pinged the original author Helmut Eller, and he said the library
pretty much belongs to Stefan now, though he'd be happy to have it in
core. He also said he didn't think it was the most ergonomic or
efficient thing out there. It looks fine to me, but I haven't
benchmarked it.
I understand it might be redundant with bovine/wisent, but TBH I've
never been able to make them work at all.
Anyway, plenty of reasons to say no, but I thought I'd check!
Thanks,
Eric
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-08-25 18:52 Make peg.el a built-in library? Eric Abrahamsen
@ 2021-08-26 6:17 ` Eli Zaretskii
2021-08-26 15:34 ` Eric Abrahamsen
2021-08-26 17:02 ` Adam Porter
` (2 subsequent siblings)
3 siblings, 1 reply; 100+ messages in thread
From: Eli Zaretskii @ 2021-08-26 6:17 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: monnier, emacs-devel
> From: Eric Abrahamsen <eric@ericabrahamsen.net>
> Date: Wed, 25 Aug 2021 11:52:00 -0700
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>
>
> In my on-again-off-again quest to not have to write text parsers myself,
> I was pointed towards the PEG library (in ELPA), which does pretty much
> exactly what I want (Parsing Expression Grammars).
>
> Would the maintainers consider moving this into Emacs proper? I ask
> mostly because this would be very useful to have in Gnus, both to
> replace the home-made parser in gnus-search.el, and I would hope to
> parse eg IMAP server responses more fully and reliably.
Fine with me, but please update the (outdated) Wiki page to say where
the latest peg.el is, when it is imported.
> I understand it might be redundant with bovine/wisent, but TBH I've
> never been able to make them work at all.
That should at least warrant a bug report, IMO.
* Re: Make peg.el a built-in library?
2021-08-26 6:17 ` Eli Zaretskii
@ 2021-08-26 15:34 ` Eric Abrahamsen
2021-09-09 4:36 ` Eric Abrahamsen
0 siblings, 1 reply; 100+ messages in thread
From: Eric Abrahamsen @ 2021-08-26 15:34 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: monnier, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Eric Abrahamsen <eric@ericabrahamsen.net>
>> Date: Wed, 25 Aug 2021 11:52:00 -0700
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>
>>
>> In my on-again-off-again quest to not have to write text parsers myself,
>> I was pointed towards the PEG library (in ELPA), which does pretty much
>> exactly what I want (Parsing Expression Grammars).
>>
>> Would the maintainers consider moving this into Emacs proper? I ask
>> mostly because this would be very useful to have in Gnus, both to
>> replace the home-made parser in gnus-search.el, and I would hope to
>> parse eg IMAP server responses more fully and reliably.
>
> Fine with me, but please update the (outdated) Wiki page to say where
> the latest peg.el is, when it is imported.
Will do. Stefan also asked me to make sure the library actually does
what I expect it to do, before making this move, so I'll write the code
first.
>> I understand it might be redundant with bovine/wisent, but TBH I've
>> never been able to make them work at all.
>
> That should at least warrant a bug report, IMO.
I'll take another look and remind myself of where I got lost.
Thanks,
Eric
* Re: Make peg.el a built-in library?
2021-08-25 18:52 Make peg.el a built-in library? Eric Abrahamsen
2021-08-26 6:17 ` Eli Zaretskii
@ 2021-08-26 17:02 ` Adam Porter
2021-08-26 17:25 ` Eric Abrahamsen
2021-08-27 3:17 ` Eric Abrahamsen
2021-10-09 1:31 ` Michael Heerdegen
2022-11-07 3:33 ` Ihor Radchenko
3 siblings, 2 replies; 100+ messages in thread
From: Adam Porter @ 2021-08-26 17:02 UTC (permalink / raw)
To: emacs-devel
FWIW, I've been happy using peg.el in org-ql. I use it to parse strings
like:
"todo:WAITING scheduled:from=2021-08-01,to=2021-08-31"
into a sexp like:
(and (todo "WAITING")
(scheduled :from "2021-08-01" :to "2021-08-31"))
You can see the code I use here:
https://github.com/alphapapa/org-ql/blob/master/org-ql.el#L854
I can't speak much to its performance, because it's plenty fast enough for the
relatively light usage it gets in org-ql. My only, minor complaint is
that I ended up having to use `eval' on its `with-peg-rules' macro at
runtime:
https://github.com/alphapapa/org-ql/blob/94f9e6f3031b32cf5e2149beca7074807235dcb0/org-ql.el#L908
I tried many, many things before resorting to that, so I don't think I
missed any alternatives at the time. If that aspect of the API could be
improved, it would be welcomed, but I don't think it's necessary to do
so before adding it to Emacs.
(The issue is that the tokens that are parsed can be added to at
runtime, so they are stored in a variable, and the parsing function is
redefined as necessary at runtime, so it's not possible to define the
parser fully at expansion time.)
Thanks for suggesting this, Eric. And thanks to Helmut and Stefan for
their work on peg. It's a great tool.
* Re: Make peg.el a built-in library?
2021-08-26 17:02 ` Adam Porter
@ 2021-08-26 17:25 ` Eric Abrahamsen
2021-08-27 3:17 ` Eric Abrahamsen
1 sibling, 0 replies; 100+ messages in thread
From: Eric Abrahamsen @ 2021-08-26 17:25 UTC (permalink / raw)
To: Adam Porter; +Cc: emacs-devel
Adam Porter <adam@alphapapa.net> writes:
> FWIW, I've been happy using peg.el in org-ql. I use it to parse strings
> like:
>
> "todo:WAITING scheduled:from=2021-08-01,to=2021-08-31"
>
> into a sexp like:
>
> (and (todo "WAITING")
> (scheduled :from "2021-08-01" :to "2021-08-31"))
>
> You can see the code I use here:
>
> https://github.com/alphapapa/org-ql/blob/master/org-ql.el#L854
This is very helpful, thanks. The peg.el comments are not very helpful
when it comes to actions, so it's great to have more examples. I'll try
to provide a comment patch to the library once I've made more progress.
> I can't speak much to its performance, because it's plenty fast enough for the
> relatively light usage it gets in org-ql.
In gnus-search.el it would be the same situation: performance wouldn't
matter at all. If I can use it for IMAP server parsing, though, it would
be important not to be too slow.
> My only, minor complaint is that I ended up having to use `eval' on
> its `with-peg-rules' macro at runtime:
>
> https://github.com/alphapapa/org-ql/blob/94f9e6f3031b32cf5e2149beca7074807235dcb0/org-ql.el#L908
>
> I tried many, many things before resorting to that, so I don't think I
> missed any alternatives at the time. If that aspect of the API could be
> improved, it would be welcomed, but I don't think it's necessary to do
> so before adding it to Emacs.
>
> (The issue is that the tokens that are parsed can be added to at
> runtime, so they are stored in a variable, and the parsing function is
> redefined as necessary at runtime, so it's not possible to define the
> parser fully at expansion time.)
This doesn't mean much to me (yet), but I'll keep an eye out!
> Thanks for suggesting this, Eric.
Thanks for pointing out that it exists!
* Re: Make peg.el a built-in library?
2021-08-26 17:02 ` Adam Porter
2021-08-26 17:25 ` Eric Abrahamsen
@ 2021-08-27 3:17 ` Eric Abrahamsen
2021-08-27 6:41 ` Helmut Eller
1 sibling, 1 reply; 100+ messages in thread
From: Eric Abrahamsen @ 2021-08-27 3:17 UTC (permalink / raw)
To: Adam Porter; +Cc: emacs-devel
Adam Porter <adam@alphapapa.net> writes:
> FWIW, I've been happy using peg.el in org-ql. I use it to parse strings
> like:
>
> "todo:WAITING scheduled:from=2021-08-01,to=2021-08-31"
>
> into a sexp like:
>
> (and (todo "WAITING")
> (scheduled :from "2021-08-01" :to "2021-08-31"))
>
> You can see the code I use here:
>
> https://github.com/alphapapa/org-ql/blob/master/org-ql.el#L854
Whoo, I've been trying to get enough of a handle on the parsing actions
to write a documentation patch for them -- now I'm seeing what Helmut
meant by "semantically unintuitive". The sum total of docs regarding
actions is:
A "stack action" takes VARs from the "value stack" and pushes the result
of evaluating FORMs to that stack.
So lower-level pexs need to explicitly push values onto the stack. They
can do that with either one of the built-in "operators" (substring,
region, replace, list), or by using the pattern:
(and <your pex> `(VARS... -- FORM...))
Which confused me mightily until I realized that the backquoted sexp was
essentially a lambda with funny syntax:
`(VARS... -- FORM...) ==> (lambda (vars...) form...)
`(-- FORM...) ==> (lambda () form...)
You don't actually need the leading `and' if you're writing a top-level
pex, it only seems necessary if you're lining up a series of them under
an `or'.
A built-in operator pushes a value onto the stack. No operator (or stack
action) means no push. An action lambda with no argument but a return
value simply pushes that value onto the stack. An action lambda that
accepts arguments consumes values from the stack, and then pushes a new
value (its return value) onto the stack.
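The push/pop behaviour described above can be sketched roughly as follows. This is only an illustration, assuming peg.el (from ELPA) is loadable; the `my/peg-demo-...' function names and the rule name `letters' are made up for the example and are not part of the library.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'peg)

(defun my/peg-demo-upcase (string)
  "Parse STRING as alphabetic text and return the PEG value stack.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (with-peg-rules
        ((letters (substring (+ [alpha]))   ; operator: pushes matched text
                  `(str -- (upcase str))))  ; action: pops it, pushes a new value
      (peg-run (peg letters)))))

(defun my/peg-demo-no-push (string)
  "Match STRING with a rule that has no operator and no stack action.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (with-peg-rules
        ((letters (+ [alpha])))             ; matches, but pushes nothing
      (peg-run (peg letters)))))

(my/peg-demo-upcase "abc")   ; the stack holds one value: ("ABC")
(my/peg-demo-no-push "abc")  ; non-nil on success, but no parsed values
```

The first rule both pushes (via `substring') and consumes-then-pushes (via the stack action), so the caller gets exactly one value back; the second rule succeeds without leaving anything useful on the stack.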
So lower-level pexs can take values from the stack and push new ones
back onto the stack, and higher-level pexs can pick those up later. But
because higher-level pexs often simply "or" lower-level pexs, the
developer has to be consistent with the number and type of pushed
values: if a high-level pex looks like:
(foo (or baz bar) `(str -- (upcase str)))
Then the contract is that both the "baz" and "bar" pexs (or an even
lower-level pex referred to by them) will push a single string value
onto the stack (probably with the "substring" operator). Essentially we
need to be calling our upper-level lambda with the right number/type of
argument(s).
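A small sketch of that contract, under the assumption that peg.el is loadable. The rule names `token', `digits' and `letters' and the function name are invented for this example; the point is only that every branch of the `or' pushes exactly one string, so the action on `token' can always pop one.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'peg)

(defun my/peg-demo-token (string)
  "Parse STRING as digits or letters and return the PEG value stack.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (with-peg-rules
        ;; Whichever branch of the `or' matches, exactly one string is
        ;; on the stack when the action runs.
        ((token (or digits letters)
                `(str -- (upcase str)))
         (digits  (substring (+ [digit])))
         (letters (substring (+ [alpha]))))
      (peg-run (peg token)))))

(my/peg-demo-token "abc")  ; ("ABC")
(my/peg-demo-token "42")   ; ("42")
```

If one of the branches pushed nothing, or pushed two values, the action on `token' would pop the wrong thing (or an empty stack), which is the kind of mismatch the paragraph above warns about.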
If this email makes no sense, it's because I'm halfway through trying to
understand this library. I guess I could wish that these action forms
were simply callables, since they're clearly modeled after function
calls. Anyhoo, I'm going to try to confirm all of the above, and then at
least add to the commentary section for the main package file.
Eric
* Re: Make peg.el a built-in library?
2021-08-27 3:17 ` Eric Abrahamsen
@ 2021-08-27 6:41 ` Helmut Eller
2021-08-27 16:57 ` Eric Abrahamsen
2021-09-26 10:59 ` Augusto Stoffel
0 siblings, 2 replies; 100+ messages in thread
From: Helmut Eller @ 2021-08-27 6:41 UTC (permalink / raw)
To: emacs-devel
On Thu, Aug 26 2021, Eric Abrahamsen wrote:
> Whoo, I've been trying to get enough of a handle on the parsing actions
> to write a documentation patch for them -- now I'm seeing what Helmut
> meant by "semantically unintuitive".
What I actually meant with "semantically unintuitive" are issues
described in Roman Redziejowski's "Trying to understand PEG" paper[*].
He writes:
The problem with limited backtracking is that by not trying hard it
may miss some inputs that it should accept. A notorious example is
the rule A = aAa | aa that defines the set of strings of a’s of even
length. Implemented with limited backtracking, this rule accepts only
strings of length 2^n.
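The effect is easy to reproduce with peg.el itself. The following is a sketch under the assumption that peg.el is loadable; the function name and the `top' rule are invented for the example, and `(eob)' is used to require that the whole input is consumed.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'peg)

(defun my/peg-demo-even-as-p (n)
  "Return non-nil if the rule A = aAa | aa matches a string of exactly N a's.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert (make-string n ?a))
    (goto-char (point-min))
    (with-peg-rules
        ((top A (eob))                      ; A must consume the whole buffer
         (A (or (and "a" A "a") "aa")))     ; A = aAa | aa
      (peg-run (peg top)))))

;; Lengths 2, 4 and 8 are accepted; length 6 is rejected although it
;; is even, because the inner A commits to a match that leaves the
;; outer "a" with nothing suitable to consume.
(mapcar #'my/peg-demo-even-as-p '(2 4 6 8))
```

Tracing the 6-character case by hand: the outermost A ends up consuming only four characters, so the end-of-buffer check fails and nothing goes back to retry A with a different split.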
> The sum total of docs regarding
> actions is:
>
> A "stack action" takes VARs from the "value stack" and pushes the result
> of evaluating FORMs to that stack.
Using an "open stack" for actions was my rather idiosyncratic choice and
I'm sure that many people will not like it. The syntax ( a b -- b a )
should be familiar to Forth programmers, where it's used to describe the
stack-effect of commands. The example would be the SWAP operator. If
you have never, or not recently, written some Forth or Postscript, then
mentally keeping track of the stack state can be challenging.
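As a rough illustration of the SWAP-style notation, assuming peg.el is loadable (the function name and the rule names `pair' and `w' are made up for this sketch):

```elisp
;;; -*- lexical-binding: t; -*-
(require 'peg)

(defun my/peg-demo-two-words (string &optional swap)
  "Parse two space-separated words in STRING and return the value stack.
With non-nil SWAP, apply the stack action (a b -- b a) after matching.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (if swap
        (with-peg-rules
            ((pair w " " w
                   `(a b -- b a))           ; pop two values, push them swapped
             (w (substring (+ [alpha]))))
          (peg-run (peg pair)))
      (with-peg-rules
          ((pair w " " w)
           (w (substring (+ [alpha]))))
        (peg-run (peg pair))))))

(my/peg-demo-two-words "hello world")    ; ("world" "hello"), top of stack first
(my/peg-demo-two-words "hello world" t)  ; ("hello" "world")
```

The returned list shows the most recently pushed value first, so without the action the second word is at the head; the action pops both and pushes them back in the opposite order.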
As for "documentation" of actions: there are also some examples. I
think that the s-exp parsing example turned out quite elegant.
Helmut
[*] http://www.romanredz.se/papers/FI2017.pdf
* Re: Make peg.el a built-in library?
2021-08-27 6:41 ` Helmut Eller
@ 2021-08-27 16:57 ` Eric Abrahamsen
2021-09-26 10:59 ` Augusto Stoffel
1 sibling, 0 replies; 100+ messages in thread
From: Eric Abrahamsen @ 2021-08-27 16:57 UTC (permalink / raw)
To: Helmut Eller; +Cc: emacs-devel
Helmut Eller <eller.helmut@gmail.com> writes:
Thanks for jumping in (and thanks for the package)!
> On Thu, Aug 26 2021, Eric Abrahamsen wrote:
>
>> Whoo, I've been trying to get enough of a handle on the parsing actions
>> to write a documentation patch for them -- now I'm seeing what Helmut
>> meant by "semantically unintuitive".
>
> What I actually meant with "semantically unintuitive" are issues
> described in Roman Redziejowski's "Trying to understand PEG" paper[*].
> He writes:
>
> The problem with limited backtracking is that by not trying hard it
> may miss some inputs that it should accept. A notorious example is
> the rule A = aAa | aa that defines the set of strings of a’s of even
> length. Implemented with limited backtracking, this rule accepts only
> strings of length 2^n.
Oh... well personally I haven't got to the stage where this is an
issue...
>> The sum total of docs regarding
>> actions is:
>>
>> A "stack action" takes VARs from the "value stack" and pushes the result
>> of evaluating FORMs to that stack.
>
> Using an "open stack" for actions was my rather idiosyncratic choice and
> I'm sure that many people will not like it. The syntax ( a b -- b a )
> should be familiar to Forth programmers, where it's used to describe the
> stack-effect of commands. The example would be the SWAP operator. If
> you have never, or not recently, written some Forth or Postscript, then
> mentally keeping track of the stack state can be challenging.
The stack itself isn't that hard to handle, but I do think the
documentation could be fleshed out with a little hand-holding. The
examples are good, _after_ you understand the basics. I've never written
Forth, and we probably shouldn't expect anyone else to have, either.
I originally assumed the `(a b -- b a) bits could just be replaced with
lambda forms, but I suppose the problem there is that a lambda has a
single return value, and we'd have to do something ugly if we wanted to
push multiple values back onto the stack.
Eric
* Re: Make peg.el a built-in library?
2021-08-26 15:34 ` Eric Abrahamsen
@ 2021-09-09 4:36 ` Eric Abrahamsen
2021-09-19 15:25 ` Eric Abrahamsen
2021-09-30 19:44 ` Stefan Monnier
0 siblings, 2 replies; 100+ messages in thread
From: Eric Abrahamsen @ 2021-09-09 4:36 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: monnier, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 2892 bytes --]
On 08/26/21 08:34 AM, Eric Abrahamsen wrote:
> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Eric Abrahamsen <eric@ericabrahamsen.net>
>>> Date: Wed, 25 Aug 2021 11:52:00 -0700
>>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>
>>>
>>> In my on-again-off-again quest to not have to write text parsers myself,
>>> I was pointed towards the PEG library (in ELPA), which does pretty much
>>> exactly what I want (Parsing Expression Grammars).
>>>
>>> Would the maintainers consider moving this into Emacs proper? I ask
>>> mostly because this would be very useful to have in Gnus, both to
>>> replace the home-made parser in gnus-search.el, and I would hope to
>>> parse eg IMAP server responses more fully and reliably.
>>
>> Fine with me, but please update the (outdated) Wiki page to say where
>> the latest peg.el is, when it is imported.
>
> Will do. Stefan also asked me to make sure the library actually does
> what I expect it to do, before making this move, so I'll write the code
> first.
Okay, I wrote some code: the "use-peg-in-gnus-search.diff" attachment is
the result of that. It works really well! A net removal of ~100 LOC
(obviously we're still in deficit with the addition of peg.el), it
already fixes some wrong behavior of the old parser, and it's much
easier to reason about and add new behavior to. It's the shiny
declarative future I was looking forward to.
Whether or not PEG gets added to core I'd like to propose some patches.
The "peg-doc-patch.diff" attachment adds some documentation to the
Commentary section, including an example grammar based on a
much-simplified version of what gnus-search does.
The peg-allow-symbols patch is more tentative. The issue is that _all_
of the entry-points to peg code are macros, meaning you can't build your
grammar up in a variable, and then pass that variable to any of
`peg-run', `peg-parse', `with-peg-rules', etc. Nobody will evaluate the
variable; you have to literally write the rules inside the
`with-peg-rules' form. It seems like a fairly plausible use-case to
store the rules in a variable or an option, even if you're not doing
run-time manipulation of them. The only solution, as Adam found with
org-ql, is to `eval' one of the macros.
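For concreteness, the `eval' workaround looks roughly like this. It is a sketch only, assuming peg.el is loadable; the variable `my/peg-demo-rules', the rule name `num' and the function name are invented here and are not taken from org-ql or gnus-search.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'peg)

(defvar my/peg-demo-rules
  '((num (substring (+ [digit]))
         `(s -- (string-to-number s))))
  "Hypothetical grammar stored as plain data, for illustration only.")

(defun my/peg-demo-parse-num (string)
  "Parse STRING using the rules stored in `my/peg-demo-rules'.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    ;; `with-peg-rules' is a macro and will not evaluate a variable in
    ;; its RULES position, so splice the value in and `eval' the form.
    (eval `(with-peg-rules ,my/peg-demo-rules
             (peg-run (peg num)))
          t)))

(my/peg-demo-parse-num "42")  ; (42)
```

The cost is that the grammar is macroexpanded on every call rather than once at compile time, which is part of why a function-based or pre-compiled entry point would be attractive.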
This doesn't seem necessary! The patch has `with-peg-rules' check if the
rules are a symbol, and take the `symbol-value' if so. But I wonder if
it wouldn't be nicer to break some of the code out: `peg-normalize'
seems to be the entry-point for "compile this grammar", and that could
be modified to work the way that some languages provide for pre-compiled
regexps: a way to let the developer build and compile the grammar at
load-time or launch-time, then feed the stored compiled version to
parsing routines.
`peg-parse' could be a function, or maybe it could also just check
if its argument is a symbol.
I hope someone will have some thoughts on this!
Eric
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: use-peg-in-gnus-search.diff --]
[-- Type: text/x-patch, Size: 8107 bytes --]
diff --git a/lisp/gnus/gnus-search.el b/lisp/gnus/gnus-search.el
index 2a8069d400..5574061457 100644
--- a/lisp/gnus/gnus-search.el
+++ b/lisp/gnus/gnus-search.el
@@ -82,6 +82,7 @@
(require 'gnus-sum)
(require 'message)
(require 'gnus-util)
+(require 'peg)
(require 'eieio)
(eval-when-compile (require 'cl-lib))
(autoload 'eieio-build-class-alist "eieio-opt")
@@ -390,8 +391,29 @@ gnus-search-contact-tables
;;; Search language
-;; This "language" was generalized from the original IMAP search query
-;; parsing routine.
+;; Here's our attempt at using the PEG library to rewrite the parser.
+
+(defvar gnus-search-query-pexs
+ '((query (+ (or compound-term term)))
+ (term (or subquery prefixed-term kv-term value) term-end)
+ (subquery "(" query ")"
+ `(query -- (if (= 1 (length query)) query (list query))))
+ (prefixed-term (or negated-term near-term))
+ (negated-term (or "not " "-") term
+ `(term -- (list 'not term)))
+ (near-term "near " term
+ `(term -- (list 'near term)))
+ (compound-term (or or-terms and-terms))
+ (or-terms (or subquery prefixed-term term) "or " (or subquery prefixed-term term)
+ `(t1 t2 -- (list 'or t1 t2)))
+ (and-terms (or subquery prefixed-term term) "and " (or subquery prefixed-term term)
+ `(t1 t2 -- (list 'and t1 t2)))
+ (value (or quoted-value plain-value))
+ (plain-value (substring (+ [word])))
+ (quoted-value "\"" (substring (+ (not "\"") (any))) "\"")
+ (kv-term plain-value ":" value
+ `(k v -- (gnus-search-query-parse-kv k v)))
+ (term-end (opt (+ [space])))))
(defun gnus-search-parse-query (string)
"Turn STRING into an s-expression based query.
@@ -459,108 +481,26 @@ gnus-search-parse-query
structured query. Malformed, unusable or invalid queries will
typically be silently ignored."
(with-temp-buffer
- ;; Set up the parsing environment.
(insert string)
(goto-char (point-min))
- ;; Now, collect the output terms and return them.
- (let (out)
- (while (not (gnus-search-query-end-of-input))
- (push (gnus-search-query-next-expr) out))
- (reverse out))))
-
-(defun gnus-search-query-next-expr (&optional count halt)
- "Return the next expression from the current buffer."
- (let ((term (gnus-search-query-next-term count))
- (next (gnus-search-query-peek-symbol)))
- ;; Deal with top-level expressions. And, or, not, near... What
- ;; else? Notmuch also provides xor and adj. It also provides a
- ;; "nearness" parameter for near and adj.
- (cond
- ;; Handle 'expr or expr'
- ((and (eq next 'or)
- (null halt))
- (list 'or term (gnus-search-query-next-expr 2)))
- ;; Handle 'near operator.
- ((eq next 'near)
- (let ((near-next (gnus-search-query-next-expr 2)))
- (if (and (stringp term)
- (stringp near-next))
- (list 'near term near-next)
- (signal 'gnus-search-parse-error
- (list "\"Near\" keyword must appear between two plain strings.")))))
- ;; Anything else
- (t term))))
-
-(defun gnus-search-query-next-term (&optional count)
- "Return the next TERM from the current buffer."
- (let ((term (gnus-search-query-next-symbol count)))
- ;; What sort of term is this?
- (cond
- ;; negated term
- ((eq term 'not) (list 'not (gnus-search-query-next-expr nil 'halt)))
- ;; generic term
- (t term))))
-
-(defun gnus-search-query-peek-symbol ()
- "Return the next symbol from the current buffer, but don't consume it."
- (save-excursion
- (gnus-search-query-next-symbol)))
-
-(defun gnus-search-query-next-symbol (&optional count)
- "Return the next symbol from the current buffer, or nil if we are
-at the end of the buffer. If supplied COUNT skips some symbols before
-returning the one at the supplied position."
- (when (and (numberp count) (> count 1))
- (gnus-search-query-next-symbol (1- count)))
- (let ((case-fold-search t))
- ;; end of input stream?
- (unless (gnus-search-query-end-of-input)
- ;; No, return the next symbol from the stream.
- (cond
- ;; Negated expression -- return it and advance one char.
- ((looking-at "-") (forward-char 1) 'not)
- ;; List expression -- we parse the content and return this as a list.
- ((looking-at "(")
- (gnus-search-parse-query (gnus-search-query-return-string ")" t)))
- ;; Keyword input -- return a symbol version.
- ((looking-at "\\band\\b") (forward-char 3) 'and)
- ((looking-at "\\bor\\b") (forward-char 2) 'or)
- ((looking-at "\\bnot\\b") (forward-char 3) 'not)
- ((looking-at "\\bnear\\b") (forward-char 4) 'near)
- ;; Plain string, no keyword
- ((looking-at "[\"/]?\\b[^:]+\\([[:blank:]]\\|\\'\\)")
- (gnus-search-query-return-string
- (when (looking-at-p "[\"/]") t)))
- ;; Assume a K:V expression.
- (t (let ((key (gnus-search-query-expand-key
- (buffer-substring
- (point)
- (progn
- (re-search-forward ":" (point-at-eol) t)
- (1- (point))))))
- (value (gnus-search-query-return-string
- (when (looking-at-p "[\"/]") t))))
- (gnus-search-query-parse-kv key value)))))))
+ (with-peg-rules gnus-search-query-pexs
+ (peg-run (peg query)))))
(defun gnus-search-query-parse-kv (key value)
"Handle KEY and VALUE, parsing and expanding as necessary.
-This may result in (key value) being turned into a larger query
-structure.
-
In the simplest case, they are simply consed together. String
KEY is converted to a symbol."
- (let () ;; return
- (cond
- ((member key gnus-search-date-keys)
- (when (string= "after" key)
- (setq key "since"))
- (setq value (gnus-search-query-parse-date value)))
- ((equal key "mark")
- (setq value (gnus-search-query-parse-mark value)))
- ((string= "message-id" key)
- (setq key "id")))
- (or nil ;; return
- (cons (intern key) value))))
+ (setq key (gnus-search-query-expand-key key))
+ (cond
+ ((member key gnus-search-date-keys)
+ (when (string= "after" key)
+ (setq key "since"))
+ (setq value (gnus-search-query-parse-date value)))
+ ((equal key "mark")
+ (setq value (gnus-search-query-parse-mark value)))
+ ((string= "message-id" key)
+ (setq key "id")))
+ (cons (intern key) value))
(defun gnus-search-query-parse-date (value &optional rel-date)
"Interpret VALUE as a date specification.
@@ -647,44 +587,6 @@ gnus-search-query-expand-key
;; We completed to a unique known key.
comp))))
-(defun gnus-search-query-return-string (&optional delimited trim)
- "Return a string from the current buffer.
-If DELIMITED is non-nil, assume the next character is a delimiter
-character, and return everything between point and the next
-occurrence of the delimiter, including the delimiters themselves.
-If TRIM is non-nil, do not return the delimiters. Otherwise,
-return one word."
- ;; This function cannot handle nested delimiters, as it's not a
- ;; proper parser. Ie, you cannot parse "to:bob or (from:bob or
- ;; (cc:bob or bcc:bob))".
- (let ((start (point))
- (delimiter (if (stringp delimited)
- delimited
- (when delimited
- (char-to-string (char-after)))))
- end)
- (if delimiter
- (progn
- (when trim
- ;; Skip past first delimiter if we're trimming.
- (forward-char 1))
- (while (not end)
- (unless (search-forward delimiter nil t (unless trim 2))
- (signal 'gnus-search-parse-error
- (list (format "Unmatched delimited input with %s in query" delimiter))))
- (let ((here (point)))
- (unless (equal (buffer-substring (- here 2) (- here 1)) "\\")
- (setq end (if trim (1- (point)) (point))
- start (if trim (1+ start) start))))))
- (setq end (progn (re-search-forward "\\([[:blank:]]+\\|$\\)" (point-max) t)
- (match-beginning 0))))
- (buffer-substring-no-properties start end)))
-
-(defun gnus-search-query-end-of-input ()
- "Are we at the end of input?"
- (skip-chars-forward "[:blank:]")
- (looking-at "$"))
-
;;; Search engines
;; Search engines are implemented as classes. This is good for two
[-- Attachment #3: peg-doc-patch.diff --]
[-- Type: text/x-patch, Size: 4172 bytes --]
diff --git a/peg.el b/peg.el
index d71c707dc0..0e4221eeb7 100644
--- a/peg.el
+++ b/peg.el
@@ -79,17 +79,69 @@
;; Beginning-of-Symbol (bos)
;; End-of-Symbol (eos)
;;
-;; PEXs also support parsing actions, i.e. Lisp snippets which
-;; are executed when a pex matches. This can be used to construct
-;; syntax trees or for similar tasks. Actions are written as
+;; Rules can refer to other rules, and a grammar is often structured
+;; as a tree, with a root rule referring to one or more "branch
+;; rules", all the way down to the "leaf rules" that deal with actual
+;; buffer text. Rules can be recursive or mutually referential,
+;; though care must be taken not to create infinite loops.
+;;
+;; PEXs also support parsing actions, i.e. Lisp snippets which are
+;; executed when a pex matches. This can be used to construct syntax
+;; trees or for similar tasks. The most basic form of action is
+;; written as:
;;
;; (action FORM) ; evaluate FORM for its side-effects
-;; `(VAR... -- FORM...) ; stack action
;;
;; Actions don't consume input, but are executed at the point of
-;; match. A "stack action" takes VARs from the "value stack" and
-;; pushes the result of evaluating FORMs to that stack.
-;; See `peg-ex-parse-int' in `peg-tests.el' for an example.
+;; match. Another kind of action is called a "stack action", and
+;; looks like this:
+;;
+;; `(VAR... -- FORM...) ; stack action
+;;
+;; A stack action takes VARs from the "value stack" and pushes the
+;; results of evaluating FORMs to that stack.
+
+;; The value stack is created during the course of parsing. Certain
+;; operators (see below) that match buffer text can push values onto
+;; this stack. "Upstream" rules can then draw values from the stack,
+;; and optionally push new ones back. For instance, consider this
+;; very simple grammar:
+;;
+;; (with-peg-rules
+;; ((query (+ term) (eol))
+;; (term key ":" value (opt (+ [space]))
+;; `(k v -- (cons (intern k) v)))
+;; (key (substring (and (not ":") (+ [word]))))
+;; (value (or string-value number-value))
+;; (string-value (substring (+ [alpha])))
+;; (number-value (substring (+ [digit]))
+;; `(val -- (string-to-number val))))
+;; (peg-run (peg query)))
+;;
+;; This invocation of `peg-run' would parse this buffer text:
+;;
+;; name:Jane age:30
+;;
+;; And return this Elisp sexp:
+;;
+;; ((age . 30) (name . "Jane"))
+;;
+;; Note that, in complex grammars, some care must be taken to make
+;; sure that the number and type of values drawn from the stack always
+;; match those pushed. In the example above, both `string-value' and
+;; `number-value' push a single value to the stack. Since the `value'
+;; rule only includes these two sub-rules, any upstream rule that
+;; makes use of `value' can be confident it will always and only push
+;; a single value to the stack.
+;;
+;; Stack action forms are in a sense analogous to lambda forms: the
+;; symbols before the "--" are the equivalent of lambda arguments,
+;; while the forms after the "--" are return values. The difference
+;; being that a lambda form can only return a single value, while a
+;; stack action can push multiple values onto the stack. It's also
+;; perfectly valid to use `(-- FORM...)' or `(VAR... --)': the former
+;; pushes values to the stack without consuming any, and the latter
+;; pops values from the stack and discards them.
;;
;; Derived Operators:
;;
@@ -101,6 +153,8 @@
;; (replace E RPL); Match E and replace the matched region with RPL.
;; (list E) ; Match E and push a list of the items that E produced.
;;
+;; See `peg-ex-parse-int' in `peg-tests.el' for further examples.
+;;
;; Regexp equivalents:
;;
;; Here a some examples for regexps and how those could be written as pex.
@@ -177,7 +231,7 @@ EXPS is a list of rules/expressions that failed.")
;;;; Main entry points
-;; Sometimes (with-peg-rule ... (peg-run (peg ...))) is too
+;; Sometimes (with-peg-rules ... (peg-run (peg ...))) is too
;; longwinded for the task at hand, so `peg-parse' comes in handy.
(defmacro peg-parse (&rest pexs)
"Match PEXS at point.
[-- Attachment #4: peg-allow-symbols.diff --]
[-- Type: text/x-patch, Size: 919 bytes --]
diff --git a/peg.el b/peg.el
index 0e4221eeb7..fa7e23619f 100644
--- a/peg.el
+++ b/peg.el
@@ -314,10 +314,14 @@ RULES is a list of rules of the form (NAME . PEXS), where PEXS is a sequence
of PEG expressions, implicitly combined with `and'."
(declare (indent 1) (debug (sexp form))) ;FIXME: `sexp' is not good enough!
(let ((rules
- ;; First, macroexpand the rules.
- (mapcar (lambda (rule)
- (cons (car rule) (peg-normalize `(and . ,(cdr rule)))))
- rules))
+ (progn
+ ;; Handle RULES as a variable.
+ (when (symbolp rules)
+ (setq rules (symbol-value rules)))
+ ;; Then macroexpand the rules.
+ (mapcar (lambda (rule)
+ (cons (car rule) (peg-normalize `(and . ,(cdr rule)))))
+ rules)))
(ctx (assq :peg-rules macroexpand-all-environment)))
(macroexpand-all
`(cl-labels
* Re: Make peg.el a built-in library?
2021-09-09 4:36 ` Eric Abrahamsen
@ 2021-09-19 15:25 ` Eric Abrahamsen
2021-09-30 19:44 ` Stefan Monnier
1 sibling, 0 replies; 100+ messages in thread
From: Eric Abrahamsen @ 2021-09-19 15:25 UTC (permalink / raw)
To: monnier; +Cc: emacs-devel
Bumping this up in case it slid off radars: I'd like to at least push
the documentation patch to peg.el...
On 09/08/21 21:36 PM, Eric Abrahamsen wrote:
> On 08/26/21 08:34 AM, Eric Abrahamsen wrote:
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>>>> From: Eric Abrahamsen <eric@ericabrahamsen.net>
>>>> Date: Wed, 25 Aug 2021 11:52:00 -0700
>>>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>
>>>>
>>>> In my on-again-off-again quest to not have to write text parsers myself,
>>>> I was pointed towards the PEG library (in ELPA), which does pretty much
>>>> exactly what I want (Parsing Expression Grammars).
>>>>
>>>> Would the maintainers consider moving this into Emacs proper? I ask
>>>> mostly because this would be very useful to have in Gnus, both to
>>>> replace the home-made parser in gnus-search.el, and I would hope to
>>>> parse eg IMAP server responses more fully and reliably.
>>>
>>> Fine with me, but please update the (outdated) Wiki page to say where
>>> the latest peg.el is, when it is imported.
>>
>> Will do. Stefan also asked me to make sure the library actually does
>> what I expect it to do, before making this move, so I'll write the code
>> first.
>
> Okay, I wrote some code: the "use-peg-in-gnus-search.diff" attachment is
> the result of that. It works really well! A net removal of ~100 LOC
> (obviously we're still in deficit with the addition of peg.el), it
> already fixes some wrong behavior of the old parser, and it's much
> easier to reason about and add new behavior to. It's the shiny
> declarative future I was looking forward to.
>
> Whether or not PEG gets added to core I'd like to propose some patches.
> The "peg-doc-patches.diff" attachment adds some documentation to the
> Commentary section, including an example grammar based on a
> much-simplified version of what gnus-search does.
>
> The peg-allow-symbols patch is more tentative. The issue is that _all_
> of the entry-points to peg code are macros, meaning you can't build your
> grammar up in a variable, and then pass that variable to any of
> `peg-run', `peg-parse', `with-peg-rules', etc. Nobody will evaluate the
> variable; you have to literally write the rules inside the
> `with-peg-rules' form. It seems like a fairly plausible use-case to
> store the rules in a variable or an option, even if you're not doing
> run-time manipulation of them. The only solution, as Adam found with
> org-ql, is to `eval' one of the macros.
>
> This doesn't seem necessary! The patch has `with-peg-rules' check if the
> rules are a symbol, and take the `symbol-value' if so. But I wonder if
> it wouldn't be nicer to break some of the code out: `peg-normalize'
> seems to be the entry-point for "compile this grammar", and that could
> be modified to work the way that some languages provide for pre-compiled
> regexps: a way to let the developer build and compile the grammar at
> load-time or launch-time, then feed the stored compiled version to
> parsing routines.
>
> `peg-parse' could be a function, or maybe it could also just check
> if its argument is a symbol.
>
> I hope someone will have some thoughts on this!
>
> Eric
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-08-27 6:41 ` Helmut Eller
2021-08-27 16:57 ` Eric Abrahamsen
@ 2021-09-26 10:59 ` Augusto Stoffel
2021-09-26 15:06 ` Eric Abrahamsen
2021-09-27 22:34 ` Richard Stallman
1 sibling, 2 replies; 100+ messages in thread
From: Augusto Stoffel @ 2021-09-26 10:59 UTC (permalink / raw)
To: Helmut Eller; +Cc: Eric Abrahamsen, emacs-devel
I think it would be really cool to have PEGs built into Emacs. Things
like json.el could be simplified by at least (log10 2) orders of
magnitude with PEGs. Whatever the use case of `rx' is, PEGs are
probably the "real" solution.
But I suspect this would only gain traction with a fast and robust C
implementation like Lua's LPEG (see below for a reason).
On Fri, 27 Aug 2021 at 08:41, Helmut Eller <eller.helmut@gmail.com> wrote:
> On Thu, Aug 26 2021, Eric Abrahamsen wrote:
>
>> Whoo, I've been trying to get enough of a handle on the parsing actions
>> to write a documentation patch for them -- now I'm seeing what Helmut
>> meant by "semantically unintuitive".
>
> What I actually meant with "semantically unintuitive" are issues
> described in Roman Redziejowski's "Trying to understand PEG" paper[*].
> He writes:
>
> The problem with limited backtracking is that by not trying hard it
> may miss some inputs that it should accept. A notorious example is
> the rule A = aAa | aa that defines the set of strings of a’s of even
> length. Implemented with limited backtracking, this rule accepts only
> strings of length 2^n.
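The 2^n behaviour quoted above can be reproduced with a small model. This is only a sketch in Python, not peg.el code: `match_a` and `accepts` are made-up names, and the function models nothing more than ordered choice with limited backtracking for this one rule.

```python
def match_a(s, i=0):
    """Model of the PEG rule  A <- 'a' A 'a' / 'a' 'a'.

    Returns the index just past the match starting at i, or None.
    The inner A is never asked for a shorter match after it succeeds.
    """
    # First alternative: 'a' A 'a'
    if s.startswith("a", i):
        j = match_a(s, i + 1)
        if j is not None and s.startswith("a", j):
            return j + 1
    # Second alternative: 'a' 'a'
    if s.startswith("aa", i):
        return i + 2
    return None


def accepts(n):
    """True if the rule consumes a string of n a's completely."""
    return match_a("a" * n) == n


accepted = [n for n in range(1, 21) if accepts(n)]
print(accepted)  # → [2, 4, 8, 16]
```

Length 6 is rejected because the inner call commits to a 4-character match, leaving the outer rule unable to consume the whole input.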
When I started to write PEGs intensively, I thought the limited
backtracking would be a problem. It's not. In fact, I find the
regexp-style backtracking great, but only for “quick and dirty” things
(e.g., those throw-away little programs one writes for grep or isearch).
But if you are trying to write a more complex parser, aggressive
backtracking actually gets in the way.
The example above is kind of silly. You can parse an even number of a's
with the rule A = aaA | ε. This is still kind of bad, because (unless
peg.el is way fancier than I'm imagining), it consumes the call stack.
LPEG has a kind of “tail call optimization” that allows you to do this.
Obviously, the sane way to parse an even number of a's is the rule
(aa)*, aka (* "aa"). But there are many justifiable use-cases for the
tail call optimization. For instance, given a pattern P, produce a new
pattern that looks ahead for the first match of P. This would be
P | .P, or
(or P (and (any) P))
in peg.el notation. Is there a simple and efficient way to do this in
peg.el, one that allows skipping over thousands of characters without a
new call stack entry for each one of them?
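To make the stack-depth concern concrete, here is a sketch in Python (not peg.el) of the search pattern, read as the recursive rule S <- P / . S. It is written once as direct recursion and once as the loop that a tail-call-optimizing implementation would effectively run. The names `search_recursive`, `search_loop` and `literal` are illustrative only; whether peg.el turns such a rule into a loop is not established in this thread.

```python
def literal(text):
    """Return a matcher for a fixed string: (s, i) -> end index or None."""
    def match(s, i):
        return i + len(text) if s.startswith(text, i) else None
    return match


def search_recursive(p, s, i=0):
    """S <- P / . S as direct recursion: one stack frame per skipped char."""
    end = p(s, i)
    if end is not None:
        return end
    if i < len(s):                      # '.' consumes any one character
        return search_recursive(p, s, i + 1)
    return None


def search_loop(p, s, i=0):
    """Same rule, with the tail call turned into a loop."""
    while True:
        end = p(s, i)
        if end is not None:
            return end
        if i >= len(s):
            return None
        i += 1


needle = literal("END")
short = "xxxxEND"
assert search_recursive(needle, short) == search_loop(needle, short) == 7

# 100000 characters to skip: fine for the loop, whereas the recursive
# version would need one Python stack frame per skipped character.
long_input = "x" * 100_000 + "END"
print(search_loop(needle, long_input))  # → 100003
```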
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-09-26 10:59 ` Augusto Stoffel
@ 2021-09-26 15:06 ` Eric Abrahamsen
2021-09-26 18:36 ` Augusto Stoffel
2021-09-27 22:34 ` Richard Stallman
1 sibling, 1 reply; 100+ messages in thread
From: Eric Abrahamsen @ 2021-09-26 15:06 UTC (permalink / raw)
To: emacs-devel
Augusto Stoffel <arstoffel@gmail.com> writes:
> I think it would be really cool to have PEGs built into Emacs. Things
> like json.el could be simplified by at least (log10 2) orders of
> magnitude with PEGs. Whatever the use case of `rx' is, PEGs are
> probably the "real" solution.
>
> But I suspect this would only gain traction with a fast and robust C
> implementation like Lua's LPEG (see below for a reason).
I wonder if it would make sense to adopt this elisp library for now, see
if people use it (or want to use it but complain about speed), and
consider translating to C if they do?
The elisp version has generic methods for `peg-normalize' (and
`peg--macroexpand', though I guess that's private) which would allow
library authors to write new peg expressions. We'd lose that with C,
though I suppose speed vs extensibility is always the tradeoff with
C vs Elisp.
In a previous message I complained a little bit about the entry-points
to PEG as it stands now -- they're all macros. Maybe if we were thinking
in terms of a future C translation, we could narrow the API down a
little and lock it down, and discourage authors from using anything that
wouldn't be made available by the future version.
Eric
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-09-26 15:06 ` Eric Abrahamsen
@ 2021-09-26 18:36 ` Augusto Stoffel
2021-09-27 16:18 ` Eric Abrahamsen
0 siblings, 1 reply; 100+ messages in thread
From: Augusto Stoffel @ 2021-09-26 18:36 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: emacs-devel
On Sun, 26 Sep 2021 at 08:06, Eric Abrahamsen <eric@ericabrahamsen.net> wrote:
> Augusto Stoffel <arstoffel@gmail.com> writes:
>
>> I think it would be really cool to have PEGs built into Emacs. Things
>> like json.el could be simplified by at least (log10 2) orders of
>> magnitude with PEGs. Whatever the use case of `rx' is, PEGs are
>> probably the "real" solution.
>>
>> But I suspect this would only gain traction with a fast and robust C
>> implementation like Lua's LPEG (see below for a reason).
>
> I wonder if it would make sense to adopt this elisp library for now, see
> if people use it (or want to use it but complain about speed), and
> consider translating to C if they do?
Yes, that sounds reasonable. But the efficiency problem isn't even just
about speed, it's also about which patterns you can run at all without
exhausting the call stack. Without the “tail call optimization” that I
mentioned in the previous message, I think much of the appeal of PEGs is
gone...
>
> The elisp version has generic methods for `peg-normalize' (and
> `peg--macroexpand', though I guess that's private) which would allow
> library authors to write new peg expressions. We'd lose that with C,
> though I suppose speed vs extensibility is always the tradeoff with
> C vs Elisp.
I'm not sure I understand this comment, and I confess I didn't look
closely at peg.el. But there's a difference between _defining_ a
pattern and _executing_ it. If the basic PEG vocabulary (sequence,
ordered choice, repetition, grammars, etc.) is implemented in C, you can
define all sorts of combinators, such as
(define-peg-rule search (patt)
(or patt (and (any) (search patt))))
[or whatever the syntax is for grammars/recursive definitions], and
executing the patterns doesn't involve any Lisp calls.
>
> In a previous message I complained a little bit about the entry-points
> to PEG as it stands now -- they're all macros. Maybe if we were thinking
> in terms of a future C translation, we could narrow the API down a
> little and lock it down, and discourage authors from using anything that
> wouldn't be made available by the future version.
I can't say anything useful here without studying peg.el a bit, but I
think it would be ideal if PEGs are just values (which, in particular,
you can manipulate without naming) and there are functions that allow
making new PEGs out of old ones.
And once again, Lua's LPEG is a fantastic library. It might be worth
taking a look at how it works.
>
> Eric
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-09-26 18:36 ` Augusto Stoffel
@ 2021-09-27 16:18 ` Eric Abrahamsen
0 siblings, 0 replies; 100+ messages in thread
From: Eric Abrahamsen @ 2021-09-27 16:18 UTC (permalink / raw)
To: emacs-devel
Augusto Stoffel <arstoffel@gmail.com> writes:
> On Sun, 26 Sep 2021 at 08:06, Eric Abrahamsen <eric@ericabrahamsen.net> wrote:
>
>> Augusto Stoffel <arstoffel@gmail.com> writes:
>>
>>> I think it would be really cool to have PEGs built into Emacs. Things
>>> like json.el could be simplified by at least (log10 2) orders of
>>> magnitude with PEGs. Whatever the use case of `rx' is, PEGs are
>>> probably the "real" solution.
>>>
>>> But I suspect this would only gain traction with a fast and robust C
>>> implementation like Lua's LPEG (see below for a reason).
>>
>> I wonder if it would make sense to adopt this elisp library for now, see
>> if people use it (or want to use it but complain about speed), and
>> consider translating to C if they do?
>
> Yes, that sounds reasonable. But the efficiency problem isn't even just
> about speed, it's also about which patterns you can run at all without
> exhausting the call stack. Without the “tail call optimization” that I
> mentioned in the previous message, I think much of the appeal of PEGs is
> gone...
For someone hoping to use PEG to simplify parsing of very regular
(though possibly complex) text (me), it's still pretty appealing.
>>
>> The elisp version has generic methods for `peg-normalize' (and
>> `peg--macroexpand', though I guess that's private) which would allow
>> library authors to write new peg expressions. We'd lose that with C,
>> though I suppose speed vs extensibility is always the tradeoff with
>> C vs Elisp.
>
> I'm not sure I understand this comment, and I confess I didn't look
> closely at peg.el. But there's a difference between _defining_ a
> pattern and _executing_ it. If the basic PEG vocabulary (sequence,
> ordered choice, repetition, grammars, etc.) is implemented in C, you can
> define all sorts of combinators, such as
>
> (define-peg-rule search (patt)
> (or patt (and (any) (search patt))))
>
> [or whatever the syntax is for grammars/recursive definitions], and
> executing the patterns doesn't involve any Lisp calls.
Yes, that's all I meant. So long as rules can still be defined in Lisp,
this isn't an issue.
>> In a previous message I complained a little bit about the entry-points
>> to PEG as it stands now -- they're all macros. Maybe if we were thinking
>> in terms of a future C translation, we could narrow the API down a
>> little and lock it down, and discourage authors from using anything that
>> wouldn't be made available by the future version.
>
> I can't say anything useful here without studying peg.el a bit, but I
> think it would be ideal if PEGs are just values (which, in particular,
> you can manipulate without naming) and there are functions that allow
> making new PEGs out of old ones.
>
> And once again, Lua's LPEG is a fantastic library. It might be worth
> taking a look at how it works.
I don't really know anything about PEGs or the theory behind them, and
was just hoping to be the squeaky wheel in this case. It would be great
to improve peg.el, but I still think it would be nice to get it into
Emacs first.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-09-26 10:59 ` Augusto Stoffel
2021-09-26 15:06 ` Eric Abrahamsen
@ 2021-09-27 22:34 ` Richard Stallman
2021-09-28 3:52 ` Eric Abrahamsen
1 sibling, 1 reply; 100+ messages in thread
From: Richard Stallman @ 2021-09-27 22:34 UTC (permalink / raw)
To: Augusto Stoffel; +Cc: eric, eller.helmut, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
What is a PEG?
--
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-09-27 22:34 ` Richard Stallman
@ 2021-09-28 3:52 ` Eric Abrahamsen
2021-09-28 8:09 ` tomas
2021-09-30 6:04 ` Richard Stallman
0 siblings, 2 replies; 100+ messages in thread
From: Eric Abrahamsen @ 2021-09-28 3:52 UTC (permalink / raw)
To: Richard Stallman; +Cc: eller.helmut, Augusto Stoffel, emacs-devel
On 09/27/21 18:34 PM, Richard Stallman wrote:
> [[[ To any NSA and FBI agents reading my email: please consider ]]]
> [[[ whether defending the US Constitution against all enemies, ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> What is a PEG?
A Parsing Expression Grammar:
https://en.wikipedia.org/wiki/Parsing_expression_grammar
Basically a way of composing a parser out of smaller regexp-like
expressions. They can be very useful in a wide variety of situations.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-09-28 3:52 ` Eric Abrahamsen
@ 2021-09-28 8:09 ` tomas
2021-09-28 9:32 ` Helmut Eller
2021-09-28 15:24 ` Augusto Stoffel
2021-09-30 6:04 ` Richard Stallman
1 sibling, 2 replies; 100+ messages in thread
From: tomas @ 2021-09-28 8:09 UTC (permalink / raw)
To: emacs-devel
On Mon, Sep 27, 2021 at 08:52:38PM -0700, Eric Abrahamsen wrote:
>
> On 09/27/21 18:34 PM, Richard Stallman wrote:
> > [[[ To any NSA and FBI agents reading my email: please consider ]]]
> > [[[ whether defending the US Constitution against all enemies, ]]]
> > [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> >
> > What is a PEG?
>
> A Parsing Expression Grammar:
> https://en.wikipedia.org/wiki/Parsing_expression_grammar
>
> Basically a way of composing a parser out of smaller regexp-like
> expressions. They can be very useful in a wide variety of situations.
In the Chomsky hierarchy, they live in some funny place between
regular (Type-3) and context free (Type-2). They are strictly
more powerful than regular grammars (but can eat memory for
breakfast [1]), but, quoting the Wikipedia ref above:
"It is an open problem to give a concrete example of a
context-free language which cannot be recognized by a
parsing expression grammar."
I don't know at the moment whether there is a (non-constructive)
proof that CFGs be strictly more expressive than PEGs?
Cheers
[1] Memory has become significantly cheaper since Thompson, this
might have a practical significance or not ;-)
- t
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-09-28 8:09 ` tomas
@ 2021-09-28 9:32 ` Helmut Eller
2021-09-28 10:45 ` tomas
2021-09-28 15:24 ` Augusto Stoffel
1 sibling, 1 reply; 100+ messages in thread
From: Helmut Eller @ 2021-09-28 9:32 UTC (permalink / raw)
To: emacs-devel
On Tue, Sep 28 2021, tomas@tuxteam.de wrote:
> I don't know at the moment whether there is a (non-constructive)
> proof that CFGs be strictly more expressive than PEGs?
You could ask this question on the PEG mailing list [1].
Apparently it has been proven[2] that for every CFG in LL(1) there is a
corresponding PEG. This is very nice, because in practice we are mostly
interested in grammars that can be parsed efficiently. Unfortunately,
it seems[3] difficult/impossible to tell (statically) if a given PEG
corresponds to LL(1) or how much backtracking it needs.
Helmut
[1] https://lists.csail.mit.edu/mailman/listinfo/peg
[2] https://arxiv.org/abs/1304.3177
[3] Trying to understand PEG
Fundamenta Informaticae 157, 4 (2018) 463-475.
http://www.romanredz.se/papers/FI2017.pdf
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-09-28 9:32 ` Helmut Eller
@ 2021-09-28 10:45 ` tomas
0 siblings, 0 replies; 100+ messages in thread
From: tomas @ 2021-09-28 10:45 UTC (permalink / raw)
To: emacs-devel
On Tue, Sep 28, 2021 at 11:32:58AM +0200, Helmut Eller wrote:
> On Tue, Sep 28 2021, tomas@tuxteam.de wrote:
>
> > I don't know at the moment whether there is a (non-constructive)
> > proof that CFGs be strictly more expressive than PEGs?
>
> You could ask this question on the PEG mailing list [1].
Uh, thanks for the links. They'll possibly fill most long evenings
this winter ;-)
Cheers
- t
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-09-28 8:09 ` tomas
2021-09-28 9:32 ` Helmut Eller
@ 2021-09-28 15:24 ` Augusto Stoffel
1 sibling, 0 replies; 100+ messages in thread
From: Augusto Stoffel @ 2021-09-28 15:24 UTC (permalink / raw)
To: tomas; +Cc: emacs-devel
On Tue, 28 Sep 2021 at 10:09, <tomas@tuxteam.de> wrote:
> On Mon, Sep 27, 2021 at 08:52:38PM -0700, Eric Abrahamsen wrote:
>>
>> On 09/27/21 18:34 PM, Richard Stallman wrote:
>> > [[[ To any NSA and FBI agents reading my email: please consider ]]]
>> > [[[ whether defending the US Constitution against all enemies, ]]]
>> > [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>> >
>> > What is a PEG?
>>
>> A Parsing Expression Grammar:
>> https://en.wikipedia.org/wiki/Parsing_expression_grammar
>>
>> Basically a way of composing a parser out of smaller regexp-like
>> expressions. They can be very useful in a wide variety of situations.
>
> In the Chomsky hierarchy, they live in some funny place between
> regular (Type-3) and context free (Type-2). They are strictly
> more powerful than regular grammars (but can eat memory for
> breakfast [1]), but, quoting the Wikipedia ref above:
>
> "It is an open problem to give a concrete example of a
> context-free language which cannot be recognized by a
> parsing expression grammar."
Perhaps more interesting in practice: a PEG can compute and return a
value as it parses the subject string. So one can (easily) write a PEG
that recognizes well-formed arithmetic expressions _and_ computes the
value of the arithmetic expression along the way. Or a PEG that
recognizes email headers and returns those headers as an alist.
Regexps usually only produce substrings of the subject string (in Emacs
regexps can also call Lisp code, but this is not as general.)
[Also note that a PEG defines a parser for a grammar, not just a
grammar.]
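As an illustration of computing a value while parsing, here is a minimal sketch in Python rather than peg.el. The grammar and all function names are assumptions made up for this example; it handles only non-negative integers, `+`, `-`, `*` and parentheses, with no whitespace.

```python
DIGITS = "0123456789"


def parse_number(s, i):
    """Number <- [0-9]+   =>  (int value, next index) or None."""
    j = i
    while j < len(s) and s[j] in DIGITS:
        j += 1
    return (int(s[i:j]), j) if j > i else None


def parse_factor(s, i):
    """Factor <- Number / '(' Expr ')'"""
    r = parse_number(s, i)
    if r is not None:
        return r
    if s.startswith("(", i):
        r = parse_expr(s, i + 1)
        if r is not None and s.startswith(")", r[1]):
            return r[0], r[1] + 1
    return None


def parse_term(s, i):
    """Term <- Factor ('*' Factor)*"""
    r = parse_factor(s, i)
    if r is None:
        return None
    value, i = r
    while s.startswith("*", i):
        r = parse_factor(s, i + 1)
        if r is None:
            break                       # repetition stops before the '*'
        value, i = value * r[0], r[1]
    return value, i


def parse_expr(s, i=0):
    """Expr <- Term (('+' / '-') Term)*"""
    r = parse_term(s, i)
    if r is None:
        return None
    value, i = r
    while i < len(s) and s[i] in "+-":
        r = parse_term(s, i + 1)
        if r is None:
            break                       # repetition stops before the operator
        value = value + r[0] if s[i] == "+" else value - r[0]
        i = r[1]
    return value, i


def evaluate(s):
    """Return the value of S, or None unless the whole string parses."""
    r = parse_expr(s)
    return r[0] if r is not None and r[1] == len(s) else None


print(evaluate("1+2*3"))      # → 7
print(evaluate("(1+2)*3"))    # → 9
```

Each rule returns its value together with the new position, so recognition and evaluation happen in the same pass.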
>
> I don't know at the moment whether there is a (non-constructive)
> proof that CFGs be strictly more expressive than PEGs?
>
> Cheers
>
> [1] Memory has become significantly cheaper since Thompson, this
> might have a practical significance or not ;-)
>
> - t
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-09-28 3:52 ` Eric Abrahamsen
2021-09-28 8:09 ` tomas
@ 2021-09-30 6:04 ` Richard Stallman
2021-10-01 3:27 ` Eric Abrahamsen
1 sibling, 1 reply; 100+ messages in thread
From: Richard Stallman @ 2021-09-30 6:04 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: eller.helmut, arstoffel, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> Basically a way of composing a parser out of smaller regexp-like
> expressions. They can be very useful in a wide variety of situations.
It does sound useful. Can you post a description of a specific simple
example where this approach is advantageous?
--
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-09-09 4:36 ` Eric Abrahamsen
2021-09-19 15:25 ` Eric Abrahamsen
@ 2021-09-30 19:44 ` Stefan Monnier
2021-09-30 20:34 ` Adam Porter
1 sibling, 1 reply; 100+ messages in thread
From: Stefan Monnier @ 2021-09-30 19:44 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: Eli Zaretskii, emacs-devel
> Whether or not PEG gets added to core I'd like to propose some patches.
> The "peg-doc-patches.diff" attachment adds some documentation to the
> Commentary section, including an example grammar based on a
> much-simplified version of what gnus-search does.
Looks great, thanks.
> The peg-allow-symbols patch is more tentative. The issue is that _all_
> of the entry-points to peg code are macros, meaning you can't build your
> grammar up in a variable, and then pass that variable to any of
> `peg-run', `peg-parse', `with-peg-rules', etc. Nobody will evaluate the
> variable; you have to literally write the rules inside the
> `with-peg-rules' form. It seems like a fairly plausible use-case to
> store the rules in a variable or an option, even if you're not doing
> run-time manipulation of them. The only solution, as Adam found with
> org-ql, is to `eval' one of the macros.
>
> This doesn't seem necessary! The patch has `with-peg-rules' check if the
> rules are a symbol, and take the `symbol-value' if so. But I wonder if
> it wouldn't be nicer to break some of the code out: `peg-normalize'
> seems to be the entry-point for "compile this grammar", and that could
> be modified to work the way that some languages provide for pre-compiled
> regexps: a way to let the developer build and compile the grammar at
> load-time or launch-time, then feed the stored compiled version to
> parsing routines.
`peg` is the macro that's supposed to be this compilation step: you pass
it a PEX and you receive a value in return. It's a bit like `lambda`.
You can then use this value (a "peg matcher") to parse something by
passing it to `peg-run`.
So you can do
(let ((parser (peg PEX)))
...
(peg-run parser ...)
...)
What might still be missing, tho, is a way to invoke this `parser` from
within a PEX. So we might want to add a new PEX form that would be akin
to `funcall`. We could name it `call`:
(let* ((parser (peg PEX))
...
(with-peg-rules
((foo ...)
(bar ... (call parser) ...)
(baz ...))
...))
so (peg-parse (call FORM)) would end up equivalent to (peg-run FORM ...).
WDYT?
Stefan
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-09-30 19:44 ` Stefan Monnier
@ 2021-09-30 20:34 ` Adam Porter
2021-10-01 8:14 ` Augusto Stoffel
2021-10-01 18:05 ` Stefan Monnier
0 siblings, 2 replies; 100+ messages in thread
From: Adam Porter @ 2021-09-30 20:34 UTC (permalink / raw)
To: emacs-devel
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> The peg-allow-symbols patch is more tentative. The issue is that _all_
>> of the entry-points to peg code are macros, meaning you can't build your
>> grammar up in a variable, and then pass that variable to any of
>> `peg-run', `peg-parse', `with-peg-rules', etc. Nobody will evaluate the
>> variable; you have to literally write the rules inside the
>> `with-peg-rules' form. It seems like a fairly plausible use-case to
>> store the rules in a variable or an option, even if you're not doing
>> run-time manipulation of them. The only solution, as Adam found with
>> org-ql, is to `eval' one of the macros.
>>
>> This doesn't seem necessary! The patch has `with-peg-rules' check if the
>> rules are a symbol, and take the `symbol-value' if so. But I wonder if
>> it wouldn't be nicer to break some of the code out: `peg-normalize'
>> seems to be the entry-point for "compile this grammar", and that could
>> be modified to work the way that some languages provide for pre-compiled
>> regexps: a way to let the developer build and compile the grammar at
>> load-time or launch-time, then feed the stored compiled version to
>> parsing routines.
>
> `peg` is the macro that's supposed to be this compilation step: you pass
> it a PEX and you receive a value in return. It's a bit like `lambda`.
>
> You can then use this value (a "peg matcher") to parse something by
> passing it to `peg-run`.
>
> So you can do
>
> (let ((parser (peg PEX)))
> ...
> (peg-run parser ...)
> ...)
>
> What might still be missing, tho is a way to invoke this `parser` from
> within a PEX. So we might want to add a new PEX form that would be akin
> to `funcall`. We could name it `call`:
>
> (let* ((parser (peg PEX))
> ...
> (with-peg-rules
> ((foo ...)
> (bar ... (call parser) ...)
> (baz ...))
> ...))
>
> so (peg-parse (call FORM)) would end up equivalent to (peg-run FORM ...).
> WDYT?
In org-ql, the PEX is redefined at load time and/or run time, being
derived from search keywords that are defined by the package and
possibly by the user. So the PEX can't be defined in advance, at
compile time. So having to use `with-peg-rules' means having to use
`eval'.
That's why it would be nice to have a `peg' function that could be
called with a PEX form, to return a function that could be stored in a
variable and later be called with a string argument, that would parse
the string with the PEG. Sort of like Python's re.compile.
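A sketch of what "parsers as ordinary values" could look like, written in Python rather than Emacs Lisp. None of these names exist in peg.el: `lit`, `seq`, `choice` and `compile_keywords` are hypothetical helpers, and the keyword list stands in for search keywords that are only known at run time.

```python
def lit(text):
    """Matcher for a fixed string: (s, i) -> end index or None."""
    def match(s, i):
        return i + len(text) if s.startswith(text, i) else None
    return match


def seq(*parsers):
    """Match all PARSERS one after another."""
    def match(s, i):
        for p in parsers:
            i = p(s, i)
            if i is None:
                return None
        return i
    return match


def choice(*parsers):
    """Ordered choice: the first parser that matches wins."""
    def match(s, i):
        for p in parsers:
            end = p(s, i)
            if end is not None:
                return end
        return None
    return match


def compile_keywords(keywords):
    """Build a matcher for  KEYWORD ':'  from a list known only at run time."""
    # Longest first, so a keyword is not shadowed by its own prefix.
    ordered = sorted(keywords, key=len, reverse=True)
    return seq(choice(*(lit(k) for k in ordered)), lit(":"))


user_keywords = ["tag", "tags", "todo"]
keyword = compile_keywords(user_keywords)   # a plain value; store it anywhere

print(keyword("tags:emacs", 0))   # → 5
print(keyword("tag:emacs", 0))    # → 4
print(keyword("prio:A", 0))       # → None
```

Because the matcher is a closure, it can be rebuilt whenever the keyword list changes, without any `eval`-like step.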
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-09-30 6:04 ` Richard Stallman
@ 2021-10-01 3:27 ` Eric Abrahamsen
0 siblings, 0 replies; 100+ messages in thread
From: Eric Abrahamsen @ 2021-10-01 3:27 UTC (permalink / raw)
To: emacs-devel
Richard Stallman <rms@gnu.org> writes:
> [[[ To any NSA and FBI agents reading my email: please consider ]]]
> [[[ whether defending the US Constitution against all enemies, ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> > Basically a way of composing a parser out of smaller regexp-like
> > expressions. They can be very useful in a wide variety of situations.
>
> It does sound useful. Can you post a description of a specific simple
> example where this approach is advantageous?
I feel like I've ended up advocating for this thing when I know less
about it than anyone here, but...
My sense is that really powerful PEG systems are the sort of thing you
use to parse source code into ASTs, or do syntax highlighting, etc. We
don't need that, and the use-cases I have in mind, anyway, are simpler
situations where I want to parse a stream of
well-defined-but-still-pretty-complicated text. The sort of thing where
a regexp solution turns into a rat's nest very quickly.
One theoretical example is parsing IMAP server responses. The response
text is fully defined, but could vary enormously depending on the
capabilities of the server. Writing naive regexps is a headache.
Another non-theoretical example is the homemade token-parser in
lisp/gnus/gnus-search.el:390-680, which turns a string like:
from:bob (subject:lunch or subject:dinner)
into the sexp
((from . "bob") (or (subject . "lunch") (subject . "dinner")))
There are many, many libraries that need to do something similar. With
peg.el I can parse the above (including arbitrarily-nested
sub-expressions) with twenty lines of peg definition, which is
comprehensible to look at (once you've got the basics), easier to reason
about, and easier to modify. I guess it's sort of equivalent to a BNF.
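For readers who want to see the shape of such a grammar without peg.el at hand, here is a rough Python model of the query example above. It is not the gnus-search code and not peg.el syntax; the grammar (Query, Or, Atom, Pair) and every function name are assumptions invented for this sketch, and it covers only `key:value` pairs, `or` and parentheses.

```python
import re

WORD = re.compile(r"[A-Za-z0-9_]+")
WS = re.compile(r"\s+")
OR = re.compile(r"\s+or\s+")


def p_pair(s, i):
    """Pair <- Word ':' Word   =>  (key, value)"""
    k = WORD.match(s, i)
    if k and s.startswith(":", k.end()):
        v = WORD.match(s, k.end() + 1)
        if v:
            return (k.group(), v.group()), v.end()
    return None


def p_atom(s, i):
    """Atom <- '(' Query ')' / Pair"""
    if s.startswith("(", i):
        r = p_query(s, i + 1)
        if r and s.startswith(")", r[1]):
            items, end = r
            # A group holding a single item stands for that item.
            return (items[0] if len(items) == 1 else items), end + 1
    return p_pair(s, i)


def p_or(s, i):
    """Or <- Atom (WS 'or' WS Atom)*   =>  atom, or ("or", atom, ...)"""
    r = p_atom(s, i)
    if r is None:
        return None
    first, i = r
    rest = []
    while True:
        m = OR.match(s, i)
        r = m and p_atom(s, m.end())
        if not r:
            break
        rest.append(r[0])
        i = r[1]
    return (("or", first, *rest) if rest else first), i


def p_query(s, i=0):
    """Query <- Or (WS Or)*   =>  list of terms"""
    r = p_or(s, i)
    if r is None:
        return None
    items, i = [r[0]], r[1]
    while True:
        m = WS.match(s, i)
        r = m and p_or(s, m.end())
        if not r:
            break
        items.append(r[0])
        i = r[1]
    return items, i


def parse(s):
    """Return the parsed query, or None unless the whole string parses."""
    r = p_query(s)
    return r[0] if r and r[1] == len(s) else None


print(parse("from:bob (subject:lunch or subject:dinner)"))
```

Each function corresponds to one grammar rule, which is what keeps nested sub-expressions manageable compared with a single large regexp.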
PEGs and their implementation are the subject of academic research,
obviously, but for my modest uses, anyway, almost anything will do.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-09-30 20:34 ` Adam Porter
@ 2021-10-01 8:14 ` Augusto Stoffel
2021-10-01 18:05 ` Stefan Monnier
1 sibling, 0 replies; 100+ messages in thread
From: Augusto Stoffel @ 2021-10-01 8:14 UTC (permalink / raw)
To: Adam Porter; +Cc: emacs-devel
On Thu, 30 Sep 2021 at 15:34, Adam Porter <adam@alphapapa.net> wrote:
> In org-ql, the PEX is redefined at load time and/or run time, being
> derived from search keywords that are defined by the package and
> possibly by the user. So the PEX can't be defined in advance, at
> compile time. So having to use `with-peg-rules' means having to use
> `eval'.
>
> That's why it would be nice to have a `peg' function that could be
> called with a PEX form, to return a function that could be stored in a
> variable and later be called with a string argument, that would parse
> the string with the PEG. Sort of like Python's re.compile.
FWIW, in my use of PEGs (which is outside of Emacs, in a code
analyzer/language server for TeX), such on-the-fly generation of parsers
is used all the time as well.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-09-30 20:34 ` Adam Porter
2021-10-01 8:14 ` Augusto Stoffel
@ 2021-10-01 18:05 ` Stefan Monnier
2021-10-01 18:40 ` Eric Abrahamsen
2021-10-02 7:32 ` Adam Porter
1 sibling, 2 replies; 100+ messages in thread
From: Stefan Monnier @ 2021-10-01 18:05 UTC (permalink / raw)
To: Adam Porter; +Cc: emacs-devel
> In org-ql, the PEX is redefined at load time and/or run time, being
> derived from search keywords that are defined by the package and
> possibly by the user. So the PEX can't be defined in advance, at
> compile time. So having to use `with-peg-rules' means having to use
> `eval'.
If the grammar changes radically at run time, based on external/user
data there's probably no better way than via `eval` or similar (`load`,
`byte-compile`, you name it).
But if the changes are sufficiently limited (e.g. have an (or "foo"
"bar" ....) with a variable set of strings that can match), then we can
do better.
E.g. we could have a PEX of the form (re FORM) where FORM can be any
ELisp expression that returns a regular expression. Then `org-ql.el`
could do
(let ((predicate-re (regexp-opt predicate-names)))
(peg-parse
((query (+ term
(opt (+ (syntax-class whitespace) (any)))))
[...]
(predicate (re predicate-re))
[...])))
-- Stefan
PS: BTW, regarding your comment:
;; Sort the keywords longest-first to work around what seems to be an
;; obscure bug in `peg': when one keyword is a substring of another,
;; and the shorter one is listed first, the shorter one fails to match.
The behavior you describe indeed seems like a bug, but maybe what you
see is slightly different (and not a bug): if you have a PEX like
(and (or "foo" "foobar") "X")
the "foo" will match when faced with "foobarX" and the parser won't
backtrack to try and match the "foobar" when the "X" fails to match.
It's one of those differences between BNF and PEG grammars.
So indeed you do want to sort from longest to shortest to avoid
this problem.
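The ordering effect described above can be sketched as follows, assuming
peg.el is installed. The function names `demo-short-first` and
`demo-long-first` are hypothetical. `peg-run` is used rather than
`peg-parse` because it returns nil on failure instead of signalling an
error.

```elisp
;;; -*- lexical-binding: t -*-
(require 'peg)

;; Hypothetical helpers, for illustration only.
(defun demo-short-first (string)
  "Match STRING with the shorter alternative listed first."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    ;; Once "foo" has matched, `or' is committed to it: if "X" then
    ;; fails, the parser does not come back to try "foobar".
    (peg-run (peg (and (or "foo" "foobar") "X")))))

(defun demo-long-first (string)
  "Match STRING with the longer alternative listed first."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    ;; "foobar" is tried first; "foo" is only tried if it fails.
    (peg-run (peg (and (or "foobar" "foo") "X")))))

(demo-short-first "foobarX") ; => nil
(demo-long-first "foobarX")  ; => non-nil
(demo-long-first "fooX")     ; => non-nil
```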
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-01 18:05 ` Stefan Monnier
@ 2021-10-01 18:40 ` Eric Abrahamsen
2021-10-02 3:57 ` Stefan Monnier
2021-10-02 7:32 ` Adam Porter
1 sibling, 1 reply; 100+ messages in thread
From: Eric Abrahamsen @ 2021-10-01 18:40 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Adam Porter, emacs-devel
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> In org-ql, the PEX is redefined at load time and/or run time, being
>> derived from search keywords that are defined by the package and
>> possibly by the user. So the PEX can't be defined in advance, at
>> compile time. So having to use `with-peg-rules' means having to use
>> `eval'.
>
> If the grammar changes radically at run time, based on external/user
> data there's probably no better way than via `eval` or similar (`load`,
> `byte-compile`, you name it).
Can you explain why a function plus some sort of pre-compilation step
won't work? Maybe if I just tried to write the patch I would naturally
see the problem, but theoretically I don't get it...
> But if the changes are sufficiently limited (e.g. have an (or "foo"
> "bar" ....) with a variable set of strings that can match), then we can
> do better.
>
> E.g. we could have a PEX of the form (re FORM) where FORM can be any
> ELisp expression that returns a regular expression.
I suppose the `call' pex you mentioned up-thread could also ease things
a bit.
I'll hold off on the documentation patch until we know whether any code
will change.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-01 18:40 ` Eric Abrahamsen
@ 2021-10-02 3:57 ` Stefan Monnier
0 siblings, 0 replies; 100+ messages in thread
From: Stefan Monnier @ 2021-10-02 3:57 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: Adam Porter, emacs-devel
Eric Abrahamsen [2021-10-01 11:40:47] wrote:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>> In org-ql, the PEX is redefined at load time and/or run time, being
>>> derived from search keywords that are defined by the package and
>>> possibly by the user. So the PEX can't be defined in advance, at
>>> compile time. So having to use `with-peg-rules' means having to use
>>> `eval'.
>>
>> If the grammar changes radically at run time, based on external/user
>> data there's probably no better way than via `eval` or similar (`load`,
>> `byte-compile`, you name it).
>
> Can you explain why a function plus some sort of pre-compilation step
> won't work?
That "function plus precompilation step" would do the equivalent of
`eval` ;-)
> I suppose the `call' pex you mentioned up-thread could also ease things
> a bit.
Indeed, with it you can define a function like `peg-and` such that
(peg-and (peg PEX1) (peg PEX2)) === (peg (and PEX1 PEX2))
but using such functions to build a PEG would result in substantially
slower code (because it gets split into many small functions, thus
increasing the function call overheads).
Stefan
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-01 18:05 ` Stefan Monnier
2021-10-01 18:40 ` Eric Abrahamsen
@ 2021-10-02 7:32 ` Adam Porter
2021-10-02 14:45 ` Stefan Monnier
1 sibling, 1 reply; 100+ messages in thread
From: Adam Porter @ 2021-10-02 7:32 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
On Fri, Oct 1, 2021 at 1:05 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> > In org-ql, the PEX is redefined at load time and/or run time, being
> > derived from search keywords that are defined by the package and
> > possibly by the user. So the PEX can't be defined in advance, at
> > compile time. So having to use `with-peg-rules' means having to use
> > `eval'.
>
> If the grammar changes radically at run time, based on external/user
> data there's probably no better way than via `eval` or similar (`load`,
> `byte-compile`, you name it).
>
> But if the changes are sufficiently limited (e.g. have an (or "foo"
> "bar" ....) with a variable set of strings that can match), then we can
> do better.
In org-ql's case, it's the latter: the grammar doesn't fundamentally
change, only the list of strings that can be matched in a certain
expression:
https://github.com/alphapapa/org-ql/blob/31aeb0a2505acf8044c07824888ddec7f3e529c1/org-ql.el#L869
> E.g. we could have a PEX of the form (re FORM) where FORM can be any
> ELisp expression that returns a regular expression. Then `org-ql.el`
> could do
>
> (let ((predicate-re (regexp-opt predicate-names)))
> (peg-parse
> ((query (+ term
> (opt (+ (syntax-class whitespace) (any)))))
> [...]
> (predicate (re predicate-re))
> [...])))
That would be helpful, yes.
> PS: BTW, regarding your comment:
>
> ;; Sort the keywords longest-first to work around what seems to be an
> ;; obscure bug in `peg': when one keyword is a substring of another,
> ;; and the shorter one is listed first, the shorter one fails to match.
>
> The behavior you describe indeed seems like a bug, but maybe what you
> see is slightly different (and not a bug): if you have a PEX like
> (and (or "foo" "foobar") "X")
> the "foo" will match when faced with "foobarX" and the parser won't
> backtrack to try and match the "foobar" when the "X" fails to match.
Hmm, thanks. I think an example of the problem is that a predicate in
org-ql might have a shorter alias, e.g. "heading" has the alias
"h", and predicates are followed by arguments, like "heading:foo", so
IIRC, without sorting them there, "heading:foo" would work, while
"h:foo" wouldn't. (Or maybe a better example is predicates that
optionally accept keyword-style arguments, like
"ts-active:from=2021-10-01", which has the alias "ts-a", and could
also be called without arguments, like "ts-a:".)
> It's one of those differences between BNF and PEG grammars.
> So indeed you do want to sort from longest to shortest to avoid
> this problem.
Thanks, I didn't realize that.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-02 7:32 ` Adam Porter
@ 2021-10-02 14:45 ` Stefan Monnier
2021-10-02 15:13 ` Adam Porter
0 siblings, 1 reply; 100+ messages in thread
From: Stefan Monnier @ 2021-10-02 14:45 UTC (permalink / raw)
To: Adam Porter; +Cc: emacs-devel
>> E.g. we could have a PEX of the form (re FORM) where FORM can be any
>> ELisp expression that returns a regular expression. Then `org-ql.el`
>> could do
>>
>> (let ((predicate-re (regexp-opt predicate-names)))
>> (peg-parse
>> ((query (+ term
>> (opt (+ (syntax-class whitespace) (any)))))
>> [...]
>> (predicate (re predicate-re))
>> [...])))
>
> That would be helpful, yes.
Thanks, I'll think about what can be done here.
>> PS: BTW, regarding your comment:
>>
>> ;; Sort the keywords longest-first to work around what seems to be an
>> ;; obscure bug in `peg': when one keyword is a substring of another,
>> ;; and the shorter one is listed first, the shorter one fails to match.
>>
>> The behavior you describe indeed seems like a bug, but maybe what you
>> see is slightly different (and not a bug): if you have a PEX like
>> (and (or "foo" "foobar") "X")
>> the "foo" will match when faced with "foobarX" and the parser won't
>> backtrack to try and match the "foobar" when the "X" fails to match.
>
> Hmm, thanks. I think an example of the problem is that a predicate in
> org-ql might have a shorter alias, e.g. "heading" has the alias
> "h", and predicates are followed by arguments, like "heading:foo", so
> IIRC, without sorting them there, "heading:foo" would work, while
> "h:foo" wouldn't.
Odd. If you have (or "h" "heading") in the grammar then I'd expect
"h:foo" to be recognized but "heading:foo" to be rejected (IOW, that
would be a bug in the grammar rather than in `peg.el`).
But you describe the exact opposite for which I don't have
an explanation. So maybe it's a bug in `peg.el`. Could you try and
distill it into a bug report?
Stefan
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-02 14:45 ` Stefan Monnier
@ 2021-10-02 15:13 ` Adam Porter
0 siblings, 0 replies; 100+ messages in thread
From: Adam Porter @ 2021-10-02 15:13 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
On Sat, Oct 2, 2021 at 9:45 AM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> >> PS: BTW, regarding your comment:
> >>
> >> ;; Sort the keywords longest-first to work around what seems to be an
> >> ;; obscure bug in `peg': when one keyword is a substring of another,
> >> ;; and the shorter one is listed first, the shorter one fails to match.
> >>
> >> The behavior you describe indeed seems like a bug, but maybe what you
> >> see is slightly different (and not a bug): if you have a PEX like
> >> (and (or "foo" "foobar") "X")
> >> the "foo" will match when faced with "foobarX" and the parser won't
> >> backtrack to try and match the "foobar" when the "X" fails to match.
> >
> > Hmm, thanks. I think an example of the problem is that a predicate in
> > > org-ql might have a shorter alias, e.g. "heading" has the alias
> > "h", and predicates are followed by arguments, like "heading:foo", so
> > IIRC, without sorting them there, "heading:foo" would work, while
> > "h:foo" wouldn't.
>
> Odd. If you have (or "h" "heading") in the grammar then I'd expect
> "h:foo" to be recognized but "heading:foo" to be rejected (IOW, that
> would be a bug in the grammar rather than in `peg.el`).
>
> But you describe the exact opposite for which I don't have
> an explanation. So maybe it's a bug in `peg.el`. Could you try and
> distill it into a bug report?
Frankly, probably not. :) I worked on that code a long time ago and
haven't touched it since, so my recollection might not even be
accurate. For me, it Just Works(TM), and I have other Emacs-related
projects that are higher priority, so I don't expect to be able to
work on that part of org-ql or peg.el anytime soon. Sorry. :(
(e.g. I'd really like to make progress on this bug report, so I could
reasonably submit plz.el to ELPA (though I might do that anyway, since
it mostly works fine):
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=50166 But it's stumped
me so far. Maybe someone else would have some ideas sometime...)
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-08-25 18:52 Make peg.el a built-in library? Eric Abrahamsen
2021-08-26 6:17 ` Eli Zaretskii
2021-08-26 17:02 ` Adam Porter
@ 2021-10-09 1:31 ` Michael Heerdegen
2021-10-09 5:28 ` Michael Heerdegen
` (2 more replies)
2022-11-07 3:33 ` Ihor Radchenko
3 siblings, 3 replies; 100+ messages in thread
From: Michael Heerdegen @ 2021-10-09 1:31 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: Stefan Monnier, emacs-devel
Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> Hi all,
>
> In my on-again-off-again quest to not have to write text parsers myself,
> I was pointed towards the PEG library (in ELPA), which does pretty much
> exactly what I want (Parsing Expression Grammars).
I like the idea, and I have some remarks:
(1) Can we improve the introduction in the file header a bit? I would
add a link to the wikipedia page:
https://en.wikipedia.org/wiki/Parsing_expression_grammar
it explains some background.
And: one example could contain the (non-standard if you only know
regexps, but very educative) solution to the problem: "how do you jump
over arbitrary text preceding a match?" (the answer seems to be: "use
`or' and recursion", at least this is what I found out by myself after a
while).
(2) Would (replace E RPL) not be much more useful if it would be allowed
to pop from the stack? Something like (replace E [VAR...] -- REPL)
where REPL could use the VAR bindings? Background is of course that a
replacement may depend on intermediate parsing results.
(3) `(_ --) seems to produce an "Empty let body" compiler warning - can
we silence it?
(4) How hard would it be to parse regexps (or translate `rx' forms) into
an equivalent peg?
(5) We need to add a Game-like tutorial to PEGs called Peg-Man. Ok,
that one was only a joke.
WDYT?
Thanks,
Michael.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-09 1:31 ` Michael Heerdegen
@ 2021-10-09 5:28 ` Michael Heerdegen
2021-10-09 8:12 ` Helmut Eller
2021-10-09 12:54 ` Stefan Monnier
2021-10-09 16:49 ` Eric Abrahamsen
2 siblings, 1 reply; 100+ messages in thread
From: Michael Heerdegen @ 2021-10-09 5:28 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: Stefan Monnier, emacs-devel
Michael Heerdegen <michael_heerdegen@web.de> writes:
> "how do you jump over arbitrary text preceding a match?" (the answer
> seems to be: "use `or' and recursion", at least this is what I found
> out by myself after a while).
No - using recursive rules of the kind
(rule [matches what I want])
(search (or rule (and (any) search)))
to advance over preceding text is not a good method in Emacs: this hits
Emacs' maximum recursion level after a bunch of lines if we advance one
character each time (which can't be avoided when searching text). Is
there a better solution for this kind of problem?
Thanks,
Michael.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-09 5:28 ` Michael Heerdegen
@ 2021-10-09 8:12 ` Helmut Eller
2021-10-09 12:52 ` Stefan Monnier
2021-10-14 10:25 ` Michael Heerdegen
0 siblings, 2 replies; 100+ messages in thread
From: Helmut Eller @ 2021-10-09 8:12 UTC (permalink / raw)
To: emacs-devel
On Sat, Oct 09 2021, Michael Heerdegen wrote:
>> "how do you jump over arbitrary text preceding a match?" (the answer
>> seems to be: "use `or' and recursion", at least this is what I found
>> out by myself after a while).
>
> No - using recursive rules of the kind
>
> (rule [matches what I want])
> (search (or rule (and (any) search)))
>
> to advance over preceding text is not a good method in Emacs, this hits
> Emacs' maximum recursion level after a bunch of lines if we advance one
> character each time (which can't be avoided when searching text). Is
> there a better solution for this kind of problem?
Self-recursion can sometimes be rewritten using *. In peg.el, * is "inlined"
and so doesn't run out of stack:
(rule [matches what I want])
(search (and (* (not rule) (any)) rule))
It's kinda like rewriting a self tail call to a while loop.
For the general case, peg.el would need some form of proper tail calls.
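A minimal sketch of the loop-based search above, assuming peg.el is
installed. The function name `demo-search-number` and the `number` rule
are hypothetical stand-ins for "what I want". The body of `*` is wrapped
in an explicit `and` here to keep the grouping obvious.

```elisp
;;; -*- lexical-binding: t -*-
(require 'peg)

;; Hypothetical helper, for illustration only.
(defun demo-search-number (string)
  "Skip arbitrary text in STRING and return the first run of digits, or nil."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (with-peg-rules
        ((number (+ [0-9]))
         ;; Loop rather than recurse: consume one character at a time
         ;; for as long as `number' does not match at point.
         (search (* (and (not number) (any)))
                 (substring number)))
      ;; On success the value stack holds the one matched string.
      (car (peg-run (peg search))))))

(demo-search-number "abc 123 def")                     ; => "123"
(demo-search-number (concat (make-string 5000 ?x) "42")) ; => "42"
(demo-search-number "no digits here")                  ; => nil
```

The second call skips 5000 characters before the match, which is the
kind of input where the recursive `search` rule would be expected to hit
the recursion limit.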
Helmut
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-09 8:12 ` Helmut Eller
@ 2021-10-09 12:52 ` Stefan Monnier
2021-10-10 5:49 ` Helmut Eller
2021-10-14 10:25 ` Michael Heerdegen
1 sibling, 1 reply; 100+ messages in thread
From: Stefan Monnier @ 2021-10-09 12:52 UTC (permalink / raw)
To: Helmut Eller; +Cc: emacs-devel
> For the general case, peg.el would need some form of proper tail calls.
Maybe we could (re)use the tail-call elimination that I implemented for
`named-let`.
Stefan
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-09 1:31 ` Michael Heerdegen
2021-10-09 5:28 ` Michael Heerdegen
@ 2021-10-09 12:54 ` Stefan Monnier
2021-10-09 16:47 ` Eric Abrahamsen
2021-10-09 16:49 ` Eric Abrahamsen
2 siblings, 1 reply; 100+ messages in thread
From: Stefan Monnier @ 2021-10-09 12:54 UTC (permalink / raw)
To: Michael Heerdegen; +Cc: Eric Abrahamsen, emacs-devel
> (1) Can we improve the introduction in the file header a bit? I would
> add a link to the wikipedia page:
>
> https://en.wikipedia.org/wiki/Parsing_expression_grammar
>
> it explains some background.
I can't speak for Helmut, but I think you should feel free to make such
a change, yes.
Stefan
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-09 12:54 ` Stefan Monnier
@ 2021-10-09 16:47 ` Eric Abrahamsen
2021-10-10 4:20 ` Michael Heerdegen
0 siblings, 1 reply; 100+ messages in thread
From: Eric Abrahamsen @ 2021-10-09 16:47 UTC (permalink / raw)
To: emacs-devel; +Cc: Michael Heerdegen
[-- Attachment #1: Type: text/plain, Size: 456 bytes --]
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> (1) Can we improve the introduction in the file header a bit? I would
>> add a link to the wikipedia page:
>>
>> https://en.wikipedia.org/wiki/Parsing_expression_grammar
>>
>> it explains some background.
>
> I can't speak for Helmut, but I think you should feel free to make such
> a change, yes.
I've still got this documentation patch I haven't applied, I can just
add that link to this patch?
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: peg-doc-patch.diff --]
[-- Type: text/x-patch, Size: 4172 bytes --]
diff --git a/peg.el b/peg.el
index d71c707dc0..0e4221eeb7 100644
--- a/peg.el
+++ b/peg.el
@@ -79,17 +79,69 @@
;; Beginning-of-Symbol (bos)
;; End-of-Symbol (eos)
;;
-;; PEXs also support parsing actions, i.e. Lisp snippets which
-;; are executed when a pex matches. This can be used to construct
-;; syntax trees or for similar tasks. Actions are written as
+;; Rules can refer to other rules, and a grammar is often structured
+;; as a tree, with a root rule referring to one or more "branch
+;; rules", all the way down to the "leaf rules" that deal with actual
+;; buffer text. Rules can be recursive or mutually referential,
+;; though care must be taken not to create infinite loops.
+;;
+;; PEXs also support parsing actions, i.e. Lisp snippets which are
+;; executed when a pex matches. This can be used to construct syntax
+;; trees or for similar tasks. The most basic form of action is
+;; written as:
;;
;; (action FORM) ; evaluate FORM for its side-effects
-;; `(VAR... -- FORM...) ; stack action
;;
;; Actions don't consume input, but are executed at the point of
-;; match. A "stack action" takes VARs from the "value stack" and
-;; pushes the result of evaluating FORMs to that stack.
-;; See `peg-ex-parse-int' in `peg-tests.el' for an example.
+;; match. Another kind of action is called a "stack action", and
+;; looks like this:
+;;
+;; `(VAR... -- FORM...) ; stack action
+;;
+;; A stack action takes VARs from the "value stack" and pushes the
+;; results of evaluating FORMs to that stack.
+
+;; The value stack is created during the course of parsing. Certain
+;; operators (see below) that match buffer text can push values onto
+;; this stack. "Upstream" rules can then draw values from the stack,
+;; and optionally push new ones back. For instance, consider this
+;; very simple grammar:
+;;
+;; (with-peg-rules
+;; ((query (+ term) (eol))
+;; (term key ":" value (opt (+ [space]))
+;; `(k v -- (cons (intern k) v)))
+;; (key (substring (and (not ":") (+ [word]))))
+;; (value (or string-value number-value))
+;; (string-value (substring (+ [alpha])))
+;; (number-value (substring (+ [digit]))
+;; `(val -- (string-to-number val))))
+;; (peg-run (peg query)))
+;;
+;; This invocation of `peg-run' would parse this buffer text:
+;;
+;; name:Jane age:30
+;;
+;; And return this Elisp sexp:
+;;
+;; ((age . 30) (name . "Jane"))
+;;
+;; Note that, in complex grammars, some care must be taken to make
+;; sure that the number and type of values drawn from the stack always
+;; match those pushed. In the example above, both `string-value' and
+;; `number-value' push a single value to the stack. Since the `value'
+;; rule only includes these two sub-rules, any upstream rule that
+;; makes use of `value' can be confident it will always and only push
+;; a single value to the stack.
+;;
+;; Stack action forms are in a sense analogous to lambda forms: the
+;; symbols before the "--" are the equivalent of lambda arguments,
+;; while the forms after the "--" are return values. The difference
+;; being that a lambda form can only return a single value, while a
+;; stack action can push multiple values onto the stack. It's also
+;; perfectly valid to use `(-- FORM...)' or `(VAR... --)': the former
+;; pushes values to the stack without consuming any, and the latter
+;; pops values from the stack and discards them.
;;
;; Derived Operators:
;;
@@ -101,6 +153,8 @@
;; (replace E RPL); Match E and replace the matched region with RPL.
;; (list E) ; Match E and push a list of the items that E produced.
;;
+;; See `peg-ex-parse-int' in `peg-tests.el' for further examples.
+;;
;; Regexp equivalents:
;;
;; Here a some examples for regexps and how those could be written as pex.
@@ -177,7 +231,7 @@ EXPS is a list of rules/expressions that failed.")
;;;; Main entry points
-;; Sometimes (with-peg-rule ... (peg-run (peg ...))) is too
+;; Sometimes (with-peg-rules ... (peg-run (peg ...))) is too
;; longwinded for the task at hand, so `peg-parse' comes in handy.
(defmacro peg-parse (&rest pexs)
"Match PEXS at point.
^ permalink raw reply related [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-09 1:31 ` Michael Heerdegen
2021-10-09 5:28 ` Michael Heerdegen
2021-10-09 12:54 ` Stefan Monnier
@ 2021-10-09 16:49 ` Eric Abrahamsen
2021-10-10 3:43 ` Stefan Monnier
2 siblings, 1 reply; 100+ messages in thread
From: Eric Abrahamsen @ 2021-10-09 16:49 UTC (permalink / raw)
To: Michael Heerdegen; +Cc: Stefan Monnier, emacs-devel
Michael Heerdegen <michael_heerdegen@web.de> writes:
> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
>> Hi all,
>>
>> In my on-again-off-again quest to not have to write text parsers myself,
>> I was pointed towards the PEG library (in ELPA), which does pretty much
>> exactly what I want (Parsing Expression Grammars).
>
> I like the idea, and I have some remarks:
>
> (1) Can we improve the introduction in the file header a bit? I would
> add a link to the wikipedia page:
>
> https://en.wikipedia.org/wiki/Parsing_expression_grammar
>
> it explains some background.
[...]
> (4) How hard would it be to parse regexps (or translate `rx' forms) into
> an equivalent peg?
I had this idea as well -- we've already got "regexps that look like
forms", it seems like it would be natural to integrate this with rx.
One thing we're not short of here is new ideas for code, but I do think
this would make a lot of sense.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-09 16:49 ` Eric Abrahamsen
@ 2021-10-10 3:43 ` Stefan Monnier
2021-10-10 4:46 ` Michael Heerdegen
0 siblings, 1 reply; 100+ messages in thread
From: Stefan Monnier @ 2021-10-10 3:43 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: Michael Heerdegen, emacs-devel
>> (4) How hard would it be to parse regexps (or translate `rx' forms) into
>> an equivalent peg?
> I had this idea as well -- we've already got "regexps that look like
> forms", it seems like it would be natural to integrate this with rx.
> One thing we're not short of here is new ideas for code, but I do think
> this would make a lot of sense.
I think turning a regexp into a PEG should be easy, but on one condition:
you shouldn't expect that PEG to be *equivalent* to the regexp.
E.g. when matching
(string-match "\\(ab\\|a\\)bc" "abc")
the "natural" PEG for that regexp will fail to match (because it will
see a success to match "ab" and will hence just skip the "a"
alternative).
Correctly matching regexps requires a deeper form of backtracking than
provided by PEGs.
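The difference can be checked with a short sketch, assuming peg.el is
installed. The function name `demo-naive-peg` is hypothetical, and its
PEX is just one direct transliteration of the regexp
"\\(ab\\|a\\)bc", not the output of any existing converter.

```elisp
;;; -*- lexical-binding: t -*-
(require 'peg)

;; Hypothetical helper, for illustration only.
(defun demo-naive-peg (string)
  "Match STRING against a direct PEG transliteration of \"\\\\(ab\\\\|a\\\\)bc\"."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    ;; `peg-run' returns nil on failure rather than signalling an error.
    (peg-run (peg (and (or "ab" "a") "bc")))))

;; The regexp engine backtracks into the group and retries with "a".
(string-match "\\(ab\\|a\\)bc" "abc") ; => 0

;; The PEG commits to "ab", then fails on "bc" and gives up.
(demo-naive-peg "abc")                ; => nil
(demo-naive-peg "abbc")               ; => non-nil
```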
Stefan
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-09 16:47 ` Eric Abrahamsen
@ 2021-10-10 4:20 ` Michael Heerdegen
2021-10-10 21:40 ` Eric Abrahamsen
0 siblings, 1 reply; 100+ messages in thread
From: Michael Heerdegen @ 2021-10-10 4:20 UTC (permalink / raw)
To: emacs-devel
Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> >> (1) Can we improve the introduction in the file header a bit? I would
> >> add a link to the wikipedia page:
> >>
> >> https://en.wikipedia.org/wiki/Parsing_expression_grammar
> >>
> >> it explains some background.
> >
> > I can't speak for Helmut, but I think you should feel free to make such
> > a change, yes.
>
> I've still got this documentation patch I haven't applied, I can just
> add that link to this patch?
From my side, nothing against that. I have quickly skimmed over your
text and found nothing obviously wrong or confusing, and it makes some
things a bit clearer.
Should we say something about how to use globally defined pegs? AFAIU
you can use them like (my-peg) in parens, contrary to rules, which
appear as plain symbols. At least, this was one of the things I
wondered while trying this out: what do I have to wrap in
parens.
Michael.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-10 3:43 ` Stefan Monnier
@ 2021-10-10 4:46 ` Michael Heerdegen
2021-10-10 5:58 ` Helmut Eller
0 siblings, 1 reply; 100+ messages in thread
From: Michael Heerdegen @ 2021-10-10 4:46 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eric Abrahamsen, emacs-devel
Stefan Monnier <monnier@iro.umontreal.ca> writes:
> Correctly matching regexps requires a deeper form of backtracking than
> provided by PEGs.
I learned that PEGs are able to accept any type 3 (regular) language. I also
learned that PEG alternatives work differently.
Is it practically possible to transform a regexp into a really
equivalent PEG, or is it too difficult, or would the resulting PEG just
be too large or inefficient?
Michael.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-09 12:52 ` Stefan Monnier
@ 2021-10-10 5:49 ` Helmut Eller
0 siblings, 0 replies; 100+ messages in thread
From: Helmut Eller @ 2021-10-10 5:49 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
On Sat, Oct 09 2021, Stefan Monnier wrote:
>> For the general case, peg.el would need some form of proper tail calls.
>
> Maybe we could (re)use the tail-call elimination that I implemented for
> `named-let`.
This reminds me of a question I wanted to ask. Suppose we want to
implement a LPeg-like virtual machine[*] as a dynamic module. Is there
a reasonably efficient API to read a buffer's content? Maybe something
like a FILE* stream backed by an Emacs buffer?
Helmut
[*] http://www.inf.puc-rio.br/~roberto/docs/ry08-4.pdf
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-10 4:46 ` Michael Heerdegen
@ 2021-10-10 5:58 ` Helmut Eller
2021-10-10 13:56 ` Stefan Monnier
` (2 more replies)
0 siblings, 3 replies; 100+ messages in thread
From: Helmut Eller @ 2021-10-10 5:58 UTC (permalink / raw)
To: emacs-devel
On Sun, Oct 10 2021, Michael Heerdegen wrote:
> Is it practically possible to transform a regexp into a really
> equivalent PEG, or is it too difficult, or would the resulting PEG just
> be too large or inefficient?
The LPEG people wrote a paper[*] about this problem. But I haven't read
it. I think that regexps without backrefs can be implemented with DFAs
and hence shouldn't need any backtracking. The problem is probably
backrefs and other extensions.
Helmut
[*] http://www.lua.inf.puc-rio.br/publications/medeiros11regular.pdf
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-10 5:58 ` Helmut Eller
@ 2021-10-10 13:56 ` Stefan Monnier
2021-10-22 16:33 ` Michael Heerdegen
2021-10-31 23:43 ` Michael Heerdegen
2 siblings, 0 replies; 100+ messages in thread
From: Stefan Monnier @ 2021-10-10 13:56 UTC (permalink / raw)
To: Helmut Eller; +Cc: emacs-devel
>> Is it practically possible to transform a regexp into a really
>> equivalent PEG, or is it too difficult, or would the resulting PEG just
>> be too large or inefficient?
> The LPEG people wrote a paper[*] about this problem. But I haven't read
This is similar to turning the regexp into an NFA and then using the PEG
backtracking to run the NFA. Our regexp engine tries to remove some
simple forms of backtracking (e.g. for regexps like "\\(.foo\\)*\nbar"
because \n and . are mutually exclusive). This significantly reduces
the amount of stack use. We could/should perform similar optimizations
in `peg.el`.
Stefan
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-10 4:20 ` Michael Heerdegen
@ 2021-10-10 21:40 ` Eric Abrahamsen
2021-10-13 2:58 ` Michael Heerdegen
0 siblings, 1 reply; 100+ messages in thread
From: Eric Abrahamsen @ 2021-10-10 21:40 UTC (permalink / raw)
To: emacs-devel
Michael Heerdegen <michael_heerdegen@web.de> writes:
> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
>> >> (1) Can we improve the introduction in the file header a bit? I would
>> >> add a link to the wikipedia page:
>> >>
>> >> https://en.wikipedia.org/wiki/Parsing_expression_grammar
>> >>
>> >> it explains some background.
>> >
>> > I can't speak for Helmut, but I think you should feel free to make such
>> > a change, yes.
>>
>> I've still got this documentation patch I haven't applied, I can just
>> add that link to this patch?
>
> From my side, nothing against that. I have quickly skimmed over your
> text and found nothing obviously wrong or confusing, and it makes some
> things a bit clearer.
>
> Should we say something about how to use globally defined pegs? AFAIU
> you can use them like (my-peg) in parens, contrary to rules, which
> appear as plain symbols. At least, this was one of the things I
> wondered while trying this out: what do I have to wrap in
> parens.
I'm not quite sure what you mean here. If you use `define-peg-rule'
you can use the symbol plain, you don't have to wrap it in parentheses.
If you want to use one of the built-in action functions, like
"substring", then you have to wrap your symbol in that, same as if you
were defining a rule on the spot. But that's just for convenience. The
shorthand:
(substring <my-peg-symbol>)
is defined as:
(and `(-- (point))
<my-peg-symbol>
`(start -- (buffer-substring-no-properties start (point))))
I don't think you have to wrap anything in parentheses, though you *can*
if you want to, and it will work correctly.
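As a rough check of the expansion above, the two forms can be run side
by side, assuming peg.el is installed. The function names are
hypothetical, and `(+ [a-z])` stands in for <my-peg-symbol>.

```elisp
;;; -*- lexical-binding: t -*-
(require 'peg)

;; Hypothetical helpers, for illustration only.
(defun demo-substring-shorthand (string)
  "Parse a lowercase word at the start of STRING using `substring'."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (peg-run (peg (substring (+ [a-z]))))))

(defun demo-substring-expanded (string)
  "Same as `demo-substring-shorthand', with `substring' written out."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (peg-run
     (peg (and `(-- (point))            ; push the start position
               (+ [a-z])
               ;; pop the start position, push the matched text
               `(start -- (buffer-substring-no-properties
                           start (point))))))))

(demo-substring-shorthand "hello world") ; => ("hello")
(demo-substring-expanded "hello world")  ; => ("hello")
```

Both calls return the value stack, which here holds the single matched
string.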
Am I misunderstanding you?
Eric
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-10 21:40 ` Eric Abrahamsen
@ 2021-10-13 2:58 ` Michael Heerdegen
0 siblings, 0 replies; 100+ messages in thread
From: Michael Heerdegen @ 2021-10-13 2:58 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: emacs-devel
Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> I don't think you have to wrap anything in parentheses, though you
> *can* if you want to, and it will work correctly.
>
> Am I misunderstanding you?
No, thanks for making that clear, I just didn't know.
Michael.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-09 8:12 ` Helmut Eller
2021-10-09 12:52 ` Stefan Monnier
@ 2021-10-14 10:25 ` Michael Heerdegen
1 sibling, 0 replies; 100+ messages in thread
From: Michael Heerdegen @ 2021-10-14 10:25 UTC (permalink / raw)
To: Helmut Eller; +Cc: emacs-devel
Helmut Eller <eller.helmut@gmail.com> writes:
> Self-recursion can sometimes be rewritten using *. In peg.el, * is
> "inlined" and so doesn't run out of stack:
>
> (rule [matches what I want])
> (search (and (* (not rule) (any)) rule))
>
> It's kinda like rewriting a self tail call to a while loop.
Yes, that works well in my case, thanks.
Michael.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-10 5:58 ` Helmut Eller
2021-10-10 13:56 ` Stefan Monnier
@ 2021-10-22 16:33 ` Michael Heerdegen
2021-10-31 23:43 ` Michael Heerdegen
2 siblings, 0 replies; 100+ messages in thread
From: Michael Heerdegen @ 2021-10-22 16:33 UTC (permalink / raw)
To: Helmut Eller; +Cc: emacs-devel
Helmut Eller <eller.helmut@gmail.com> writes:
> On Sun, Oct 10 2021, Michael Heerdegen wrote:
>
> > Is it practically possible to transform a regexp into a really
> > equivalent PEG, or is it too difficult, or would the resulting PEG just
> > be too large or inefficient?
>
> The LPEG people wrote a paper[*] about this problem.
IIUC their answer to the ordered `or' operator problem is simply, at the
end, to apply the distributive law when performing the transcription.
So e.g.
(and (or "a" "aa") "b") doesn't match "aab" as a peg, but
(or (and "a" "b") (and "aa" "b")) does.
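The difference can be checked directly; this is only a sketch, assuming
peg.el is installed. The helper `my-peg-matches-aab-p' and the two
variable names are hypothetical and exist only for this illustration.

```elisp
(require 'peg)

(defun my-peg-matches-aab-p (peg-matcher)
  "Run PEG-MATCHER at the start of a buffer containing \"aab\".
Hypothetical helper; returns non-nil if the matcher succeeds."
  (with-temp-buffer
    (insert "aab")
    (goto-char (point-min))
    (peg-run peg-matcher)))

;; `or' commits to "a" and is never retried with "aa", so "b" is then
;; looked for at the second "a" and the whole `and' fails.
(defvar my-peg-undistributed
  (my-peg-matches-aab-p (peg (and (or "a" "aa") "b"))))

;; After distributing, the failed first branch is abandoned as a
;; whole, and the second branch gets to try "aa" followed by "b".
(defvar my-peg-distributed
  (my-peg-matches-aab-p (peg (or (and "a" "b") (and "aa" "b")))))
```

Here `my-peg-undistributed' should end up nil and `my-peg-distributed'
non-nil.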
Michael.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-10 5:58 ` Helmut Eller
2021-10-10 13:56 ` Stefan Monnier
2021-10-22 16:33 ` Michael Heerdegen
@ 2021-10-31 23:43 ` Michael Heerdegen
2021-11-15 23:16 ` Michael Heerdegen
2 siblings, 1 reply; 100+ messages in thread
From: Michael Heerdegen @ 2021-10-31 23:43 UTC (permalink / raw)
To: Helmut Eller; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 720 bytes --]
Helmut Eller <eller.helmut@gmail.com> writes:
> The LPEG people wrote a paper[*] about this problem.
I tried to convert their transcription function to Elisp. See below.
Seems to work - but so far only basic regexp constructs are supported.
> The problem probably are backrefs and other extensions.
I think backrefs can be implemented in peg.el in a simple way. They
can't be a standard extension though because matching the backref needs
to advance point (so they are not just a certain `guard').
If that is supported by peg.el I think backrefs could just be
transcribed more or less directly but I'm not sure about equivalence.
> Helmut
>
> http://www.lua.inf.puc-rio.br/publications/medeiros11regular.pdf
[-- Attachment #2: rx-to-peg.el --]
[-- Type: application/emacs-lisp, Size: 4283 bytes --]
[-- Attachment #3: Type: text/plain, Size: 12 bytes --]
Michael.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-10-31 23:43 ` Michael Heerdegen
@ 2021-11-15 23:16 ` Michael Heerdegen
0 siblings, 0 replies; 100+ messages in thread
From: Michael Heerdegen @ 2021-11-15 23:16 UTC (permalink / raw)
To: Helmut Eller; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1227 bytes --]
Michael Heerdegen <michael_heerdegen@web.de> writes:
> > The LPEG people wrote a paper[*] about this problem.
The converter is more or less done - see below. Feedback welcome!
Nearly everything regexps support is implemented. I tried to make
everything so that the resulting peg is really equivalent to the given
regexp - please tell me if you find a translation where this is not
respected.
Remaining problems:
(1) Group numbering currently has to be explicit - unnumbered groups are
silently treated as shy. That's because getting the numbering right is
not trivial.
I implemented groups and backrefs using an uninterned global variable
owned by the peg. It would be better to add built-in support to peg.el
if we want that feature.
(2) Transforming character ranges to the vector representation that
peg.el uses is not trivial. I would welcome help to get it done
correctly. A possible (slow) fallback solution is a guard calling
`looking-at' followed by an (any).
Oh, and why do I think this conversion code is useful? It's nice for
learning, but also for cases where a regexp would almost suffice but you
need some Elisp guard somewhere in the middle of matching the regexp to
examine the buffer at that position.
[-- Attachment #2: rx-to-peg.el --]
[-- Type: application/emacs-lisp, Size: 13406 bytes --]
[-- Attachment #3: Type: text/plain, Size: 11 bytes --]
Michael.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2021-08-25 18:52 Make peg.el a built-in library? Eric Abrahamsen
` (2 preceding siblings ...)
2021-10-09 1:31 ` Michael Heerdegen
@ 2022-11-07 3:33 ` Ihor Radchenko
2022-11-07 19:46 ` Eric Abrahamsen
3 siblings, 1 reply; 100+ messages in thread
From: Ihor Radchenko @ 2022-11-07 3:33 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: emacs-devel, Stefan Monnier
Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> Would the maintainers consider moving this into Emacs proper? I ask
> mostly because this would be very useful to have in Gnus, both to
> replace the home-made parser in gnus-search.el, and I would hope to
> parse eg IMAP server responses more fully and reliably.
Is there any progress merging peg.el to Emacs?
I do not see any obvious blockers in the discussion, but the merge never
happened?
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-07 3:33 ` Ihor Radchenko
@ 2022-11-07 19:46 ` Eric Abrahamsen
2022-11-08 6:57 ` Helmut Eller
` (2 more replies)
0 siblings, 3 replies; 100+ messages in thread
From: Eric Abrahamsen @ 2022-11-07 19:46 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: emacs-devel, Stefan Monnier
Ihor Radchenko <yantar92@posteo.net> writes:
> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
>> Would the maintainers consider moving this into Emacs proper? I ask
>> mostly because this would be very useful to have in Gnus, both to
>> replace the home-made parser in gnus-search.el, and I would hope to
>> parse eg IMAP server responses more fully and reliably.
>
> Is there any progress merging peg.el to Emacs?
> I do not see any obvious blockers in the discussion, but the merge never
> happened?
It certainly did lose momentum. I think there were some issues regarding
implementation and API, some open questions, and then whoever would have
needed to take ownership of the ticket and see it through did not do so.
Probably that should have been whoever opened the bug report to begin
with!
I believe peg.el does a few things in non-standard ways. I'm not very
familiar with parsing expression grammars, and I don't feel qualified to
judge just how non-standard those ways are, and whether it's a real
issue. But if no one has any massive objections (or plausible fixes)
then personally I'd be okay with it going in like this. I'm not a
maintainer though!
I will say that I tried to use PEG to resolve some gruesome text-parsing
issues in EBDB very recently, and failed to make it work in the hour or
two I'd allotted to the problem. The file-comment docs are pretty good,
but I think they would need to be expanded in a few crucial ways,
particularly to help those who don't necessarily know how PEGs work.
Specifically, it is not obvious (to me) the ways in which PEGs (or maybe
just peg.el) are not fully declarative. It doesn't backtrack, and I
suspect it won't ever backtrack or isn't even supposed to, which means
users should be made explicitly aware of the ways in which their rules
can fail, and the ways in which declaration order matters. The comment
for the `or' construct reads:
Prioritized Choice
And that's about the only hint you get. I was trying to parse a
multiword name like
Eric Edwin Abrahamsen
into the structure
(("Eric" "Edwin") "Abrahamsen")
using rules like
(plain-name (substring (+ [word])) (* [space]))
(full-name (list (+ plain-name) plain-name)
`(names -- (list (butlast names) (car (last names)))))
Which always fails to match because (+ plain-name) is greedy and eats up
all the words. It doesn't ever try leaving out the last word in an
attempt to make the rule match.
I'm happy to write the docs (should it have its own info manual
section?), if we really think there are no other necessary
fixes/improvements.
Eric
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-07 19:46 ` Eric Abrahamsen
@ 2022-11-08 6:57 ` Helmut Eller
2022-11-08 8:51 ` Ihor Radchenko
2022-11-10 4:04 ` Richard Stallman
2022-11-08 8:47 ` Ihor Radchenko
2022-11-08 14:01 ` Stefan Monnier
2 siblings, 2 replies; 100+ messages in thread
From: Helmut Eller @ 2022-11-08 6:57 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: Ihor Radchenko, emacs-devel
On Mon, Nov 07 2022, Eric Abrahamsen wrote:
>> Is there any progress merging peg.el to Emacs?
>> I do not see any obvious blockers in the discussion, but the merge never
>> happened?
>
> It certainly did lose momentum. I think there were some issues regarding
> implementation and API, some open questions, and then whoever would have
> needed to take ownership of the ticket and see it through did not do so.
> Probably that should have been whoever opened the bug report to begin
> with!
Isn't Tree-sitter a better alternative to peg.el? I've never used
Tree-sitter, but from the few things I read about it, it seems to be
more "declarative", more efficient, and actually supported in core.
Helmut
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-07 19:46 ` Eric Abrahamsen
2022-11-08 6:57 ` Helmut Eller
@ 2022-11-08 8:47 ` Ihor Radchenko
2022-11-08 16:18 ` Eric Abrahamsen
2022-11-08 14:01 ` Stefan Monnier
2 siblings, 1 reply; 100+ messages in thread
From: Ihor Radchenko @ 2022-11-08 8:47 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: emacs-devel, Stefan Monnier
Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>> Is there any progress merging peg.el to Emacs?
>> I do not see any obvious blockers in the discussion, but the merge never
>> happened?
>
> I will say that I tried to use PEG to resolve some gruesome text-parsing
> issues in EBDB very recently, and failed to make it work in the hour or
> two I'd allotted to the problem. The file-comment docs are pretty good,
> but I think they would need to be expanded in a few crucial ways,
> particularly to help those who don't necessarily know how PEGs work.
>
> Specifically, it is not obvious (to me) the ways in which PEGs (or maybe
> just peg.el) are not fully declarative. It doesn't backtrack, and I
> suspect it won't ever backtrack or isn't even supposed to, which means
> users should be made explicitly aware of the ways in which their rules
> can fail, and the ways in which declaration order matters. The comment
> for the `or' construct reads:
>
> Prioritized Choice
>
> And that's about the only hint you get.
As the comment in peg.el states, the definitions are adapted from the
original PEG paper. There is even a link to the paper and also to a
presentation explaining how PEGs work. I strongly advise you to read
them. Prioritized Choice is explained there.
> I was trying to parse a
> multiword name like
>
> Eric Edwin Abrahamsen
>
> into the structure
>
> (("Eric" "Edwin") "Abrahamsen")
>
> using rules like
>
> (plain-name (substring (+ [word])) (* [space]))
> (full-name (list (+ plain-name) plain-name)
> `(names -- (list (butlast names) (car (last names)))))
>
> Which always fails to match because (+ plain-name) is greedy and eats up
> all the words. It doesn't ever try leaving out the last word in an
> attempt to make the rule match.
One way is
(with-peg-rules
((name (substring (+ [word])) (* [blank]))
(given-name name (not (eol)))
(last-name name (and (eol)))
(full-name (list (+ given-name) last-name) `(names -- (list (butlast names) (car (last names))))))
(peg-run (peg full-name)))
A simple-minded non-greedy version would be ambiguous. You must
necessarily indicate end of input.
A more appropriate non-ambiguous non-greedy statement would involve `or'
(which you admittedly did not understand):
(with-peg-rules
((name (substring (+ [word])) (* [blank]))
(given-name name)
(last-name name (and (eol)))
(full-name (list (+ (or last-name given-name)) (and (eol))) `(names -- (list (butlast names) (car (last names))))))
;;;;;;;;;;;;;;;;;;;;;^^
(peg-run (peg full-name)))
> I'm happy to write the docs (should it have its own info manual
> section?), if we really think there are no other necessary
> fixes/improvements.
I find PEG to be a nice addition for cases where regexps do not cut it
for the parsing needed, while Bovine or tree-sitter would be overkill.
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-08 6:57 ` Helmut Eller
@ 2022-11-08 8:51 ` Ihor Radchenko
2022-11-10 4:04 ` Richard Stallman
1 sibling, 0 replies; 100+ messages in thread
From: Ihor Radchenko @ 2022-11-08 8:51 UTC (permalink / raw)
To: Helmut Eller; +Cc: Eric Abrahamsen, emacs-devel
Helmut Eller <eller.helmut@gmail.com> writes:
> Isn't Tree-sitter a better alternative to peg.el? I've never used
> Tree-sitter, but from the few things I read about it, it seems to be
> more "declarative", more efficient, and actually supported in core.
Tree-sitter is massive overkill when you need to parse something just
slightly more complex than can be done via regexps.
Tree-sitter requires a whole separate .so file with a compiled parser,
plus buffer setup.
Bovine is a bit easier to use (you can, at least, define the grammar in
Elisp), but you also need to set up the parser in a separate buffer, and
the existing documentation is even more limited than peg.el's.
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-07 19:46 ` Eric Abrahamsen
2022-11-08 6:57 ` Helmut Eller
2022-11-08 8:47 ` Ihor Radchenko
@ 2022-11-08 14:01 ` Stefan Monnier
2022-11-08 14:42 ` tomas
` (2 more replies)
2 siblings, 3 replies; 100+ messages in thread
From: Stefan Monnier @ 2022-11-08 14:01 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: Ihor Radchenko, emacs-devel
> And that's about the only hint you get. I was trying to parse a
> multiword name like
>
> Eric Edwin Abrahamsen
Side note: the division between "given name" and "family name" is not
a universal property, so as a general rule I'd advise against trying to do
it (and treat the whole thing as just "the name" without trying to
analyze its structure) unless there's some strong external factor that
requires it.
Stefan
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-08 14:01 ` Stefan Monnier
@ 2022-11-08 14:42 ` tomas
2022-11-08 15:08 ` Visuwesh
2022-11-08 16:10 ` Eric Abrahamsen
2 siblings, 0 replies; 100+ messages in thread
From: tomas @ 2022-11-08 14:42 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 657 bytes --]
On Tue, Nov 08, 2022 at 09:01:00AM -0500, Stefan Monnier wrote:
> > And that's about the only hint you get. I was trying to parse a
> > multiword name like
> >
> > Eric Edwin Abrahamsen
>
> Side note: the division between "given name" and "family name" is not
> a universal property [...]
HAH! That's what I try to tell all my customers. But they won't listen.
I cheat: the display says "given name" and "family name", but search
goes just over a combination of those. Users don't complain :-)
The different conventions in middle/western Europe and the USA are
already pretty dizzying. Once you include Africa and East Asia, well...
Cheers
--
t
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-08 14:01 ` Stefan Monnier
2022-11-08 14:42 ` tomas
@ 2022-11-08 15:08 ` Visuwesh
2022-11-08 16:29 ` Juanma Barranquero
2022-11-08 16:10 ` Eric Abrahamsen
2 siblings, 1 reply; 100+ messages in thread
From: Visuwesh @ 2022-11-08 15:08 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eric Abrahamsen, Ihor Radchenko, emacs-devel
[செவ்வாய் நவம்பர் 08, 2022] Stefan Monnier wrote:
>> And that's about the only hint you get. I was trying to parse a
>> multiword name like
>>
>> Eric Edwin Abrahamsen
>
> Side note: the division between "given name" and "family name" is not
> a universal property, so as a general rule I'd advise against trying to do
> it (and treat the whole thing as just "the name" without trying to
> analyze its structure) unless there's some strong external factor that
> requires it.
+1. Nothing annoys me more than a form that says "First name",
"Surname", and "Last name": I don't have a last name, just an initial.
I usually put my father's "first name" as my last name when there's an
absolute need to but then all the mails addressed to me make *zero*
sense since they are addressed as
Dear <Father's "first name">
rather than
Dear Visuwesh
which always makes me doubt that I got my father's mail *somehow*
instead. (Side side note: every single time my family needs to fill up
a form, we have a ten minute meeting about what to do with the
first-name-last-name situation; it is not fun as you can imagine.)
I breathed a sigh of relief when the FSF CA form did not have anything
like "last name".
[ Ever since I came to the university campus, explaining that I have no
"last name" has been a recurring and fun activity. ]
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-08 14:01 ` Stefan Monnier
2022-11-08 14:42 ` tomas
2022-11-08 15:08 ` Visuwesh
@ 2022-11-08 16:10 ` Eric Abrahamsen
2022-11-08 18:59 ` tomas
2 siblings, 1 reply; 100+ messages in thread
From: Eric Abrahamsen @ 2022-11-08 16:10 UTC (permalink / raw)
To: emacs-devel
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> And that's about the only hint you get. I was trying to parse a
>> multiword name like
>>
>> Eric Edwin Abrahamsen
>
> Side note: the division between "given name" and "family name" is not
> a universal property, so as a general rule I'd advise against trying to do
> it (and treat the whole thing as just "the name" without trying to
> analyze its structure) unless there's some strong external factor that
> requires it.
Oh, I've gone down all the rabbit holes... EBDB doesn't force this, it
distinguishes between "complex" and "simple" names, and also allows
"complex" names that only have a given name, or a list of given names,
etc. Input can be done explicitly, field by field, or you can just chunk
a string in there and see what happens. This peg.el adventure is only
about "seeing what happens" with complex names.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-08 8:47 ` Ihor Radchenko
@ 2022-11-08 16:18 ` Eric Abrahamsen
2022-11-08 19:08 ` tomas
0 siblings, 1 reply; 100+ messages in thread
From: Eric Abrahamsen @ 2022-11-08 16:18 UTC (permalink / raw)
To: emacs-devel
Ihor Radchenko <yantar92@posteo.net> writes:
> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
>>> Is there any progress merging peg.el to Emacs?
>>> I do not see any obvious blockers in the discussion, but the merge never
>>> happened?
>>
>> I will say that I tried to use PEG to resolve some gruesome text-parsing
>> issues in EBDB very recently, and failed to make it work in the hour or
>> two I'd allotted to the problem. The file-comment docs are pretty good,
>> but I think they would need to be expanded in a few crucial ways,
>> particularly to help those who don't necessarily know how PEGs work.
>>
>> Specifically, it is not obvious (to me) the ways in which PEGs (or maybe
>> just peg.el) are not fully declarative. It doesn't backtrack, and I
>> suspect it won't ever backtrack or isn't even supposed to, which means
>> users should be made explicitly aware of the ways in which their rules
>> can fail, and the ways in which declaration order matters. The comment
>> for the `or' construct reads:
>>
>> Prioritized Choice
>>
>> And that's about the only hint you get.
>
> As the comment in peg.el states, the definitions are adapted from the
> original PEG paper. There is even a link to the paper and also to a
> presentation explaining how PEGs work. I strongly advise you to read
> them. Prioritized Choice is explained there.
This is what I was saying in my original message, though: if peg.el is
going to go into core, it probably needs more/better docs than code
comments and "read this paper". Its likely users will be Elisp library
authors like me, who are just trying to free themselves from regexp hell
and want a relatively straightforward alternative.
I used peg.el to prototype search-string parsing in Gnus and everything
Just Worked the first time and it was pretty amazing. In my later
example below everything did not Just Work, but I think with some more
hand-holdy documentation it would have.
>> I was trying to parse a
>> multiword name like
>>
>> Eric Edwin Abrahamsen
>>
>> into the structure
>>
>> (("Eric" "Edwin") "Abrahamsen")
>>
>> using rules like
>>
>> (plain-name (substring (+ [word])) (* [space]))
>> (full-name (list (+ plain-name) plain-name)
>> `(names -- (list (butlast names) (car (last names)))))
>>
>> Which always fails to match because (+ plain-name) is greedy and eats up
>> all the words. It doesn't ever try leaving out the last word in an
>> attempt to make the rule match.
>
> One way is
>
> (with-peg-rules
> ((name (substring (+ [word])) (* [blank]))
> (given-name name (not (eol)))
> (last-name name (and (eol)))
> (full-name (list (+ given-name) last-name) `(names -- (list (butlast names) (car (last names))))))
> (peg-run (peg full-name)))
>
> A simple-minded non-greedy version would be ambiguous. You must
> necessarily indicate end of input.
>
> A more appropriate non-ambiguous non-greedy statement would involve `or'
> (which you admittedly did not understand):
>
> (with-peg-rules
> ((name (substring (+ [word])) (* [blank]))
> (given-name name)
> (last-name name (and (eol)))
> (full-name (list (+ (or last-name given-name)) (and (eol))) `(names -- (list (butlast names) (car (last names))))))
> ;;;;;;;;;;;;;;;;;;;;;^^
> (peg-run (peg full-name)))
Thanks! This is very helpful to my understanding. In this particular
case I'm putting strings in a temporary buffer, so signals like (eol) or
more likely (eob) are present and reliable.
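For the temporary-buffer case, the first version quoted above can be
adapted to use (eob); the following is only a sketch, assuming peg.el is
installed. The function name `my-split-full-name' is hypothetical and
is not part of peg.el or EBDB; the rules are the quoted ones with (eol)
swapped for (eob).

```elisp
;; -*- lexical-binding: t -*-
(require 'peg)

(defun my-split-full-name (string)
  "Split STRING into ((GIVEN-NAME ...) LAST-NAME), or return nil.
Hypothetical helper, not part of peg.el or EBDB."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (with-peg-rules
        ((name (substring (+ [word])) (* [blank]))
         ;; A given name is a name that does not end the buffer...
         (given-name name (not (eob)))
         ;; ...and the last name is the one that does.
         (last-name name (eob))
         (full-name (list (+ given-name) last-name)
                    `(names -- (list (butlast names)
                                     (car (last names))))))
      ;; `peg-run' returns the value stack on success, nil on failure.
      (car (peg-run (peg full-name))))))
```

With these rules, (my-split-full-name "Eric Edwin Abrahamsen") should
give (("Eric" "Edwin") "Abrahamsen"). A single-word input gives nil,
because (+ given-name) needs at least one name that is not the last.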
>> I'm happy to write the docs (should it have its own info manual
>> section?), if we really think there are no other necessary
>> fixes/improvements.
>
> I find PEG to be a nice addition for cases where regexps do not cut it
> for the parsing needed, while Bovine or tree-sitter would be overkill.
I've never tried tree-sitter, but I have tried and failed to make Bovine
do this sort of thing more than once over the years. I also agree that a
middle ground is needed.
Eric
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-08 15:08 ` Visuwesh
@ 2022-11-08 16:29 ` Juanma Barranquero
2022-12-02 20:20 ` Augusto Stoffel
0 siblings, 1 reply; 100+ messages in thread
From: Juanma Barranquero @ 2022-11-08 16:29 UTC (permalink / raw)
To: Visuwesh; +Cc: Stefan Monnier, Eric Abrahamsen, Ihor Radchenko, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 913 bytes --]
> [ Ever since I came to the university campus, explaining that I have no
> "last name" has been a recurring and fun activity. ]
My name, following Spanish usage, is "Juan Manuel" (name) "Barranquero Ríos"
(two surnames).
It's uncommon here to refer to someone by both surnames, other than in
specific situations. And I don't like my double name, so I *always* use
Juanma, except in official documents. I introduce myself as Juanma
Barranquero and that's how I self-identify.
...Except that I went to Buenos Aires, coming from São Paulo, and I don't
know what the travel agency assumed about my origins. All I know is that
I was in a hotel lobby and suddenly they called for a "Mr. Juan Ríos" and
I thought for a moment "curious, that guy's got the same name as my
maternal grandfather"... Until it dawned on me, a few seconds later, that
*I* was supposed to be "Mr. Juan Ríos".
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-08 16:10 ` Eric Abrahamsen
@ 2022-11-08 18:59 ` tomas
2022-11-08 19:42 ` Eric Abrahamsen
0 siblings, 1 reply; 100+ messages in thread
From: tomas @ 2022-11-08 18:59 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 904 bytes --]
On Tue, Nov 08, 2022 at 08:10:55AM -0800, Eric Abrahamsen wrote:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
> >> And that's about the only hint you get. I was trying to parse a
> >> multiword name like
> >>
> >> Eric Edwin Abrahamsen
> >
> > Side note: the division between "given name" and "family name" is not
> > a universal property [...]
> Oh, I've gone down all the rabbit holes...
;-D
And all this because of a small bunch of PEGs...
> EBDB doesn't force this, it
> distinguishes between "complex" and "simple" names, and also allows
> "complex" names that only have a given name, or a list of given names,
> etc. Input can be done explicitly, field by field, or you can just chunk
> a string in there and see what happens. This peg.el adventure is only
> about "seeing what happens" with complex names.
I think we all got that. Still... ;-)
Cheers
--
t
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-08 16:18 ` Eric Abrahamsen
@ 2022-11-08 19:08 ` tomas
2022-11-08 19:42 ` Eric Abrahamsen
0 siblings, 1 reply; 100+ messages in thread
From: tomas @ 2022-11-08 19:08 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1398 bytes --]
On Tue, Nov 08, 2022 at 08:18:15AM -0800, Eric Abrahamsen wrote:
> Ihor Radchenko <yantar92@posteo.net> writes:
>
> > Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> >
> >>> Is there any progress merging peg.el to Emacs?
> >>> I do not see any obvious blockers in the discussion, but the merge never
> >>> happened?
[...]
> > As the comment in peg.el states, the definitions are adapted from the
> > original PEG paper [...]
> This is what I was saying in my original message, though: if peg.el is
> going to go into core, it probably needs more/better docs than code
> comments and "read this paper". Its likely users will be Elisp library
> authors like me, who are just trying to free themselves from regexp hell
> and want a relatively straightforward alternative.
Yes. Coming from regexps, they are deceptively similar but frustratingly
different.
The best way I found to wrap my head around them is that they are a
fancy notation for a recursive descent parser. Thus slightly more
powerful than regexps, but slightly less than a full YACC (i.e. LALR
or thereabouts).
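To make the recursive-descent reading concrete, here is a small sketch,
assuming peg.el is installed. It checks for balanced parentheses, which
needs rules that refer to each other and is beyond what a plain regular
expression can express. The function name `my-balanced-parens-p' and
the rule names are hypothetical, chosen only for this illustration.

```elisp
;; -*- lexical-binding: t -*-
(require 'peg)

(defun my-balanced-parens-p (string)
  "Return non-nil if the parentheses in STRING are balanced.
Hypothetical helper, not part of peg.el."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (with-peg-rules
        (;; `group' refers to `item', which refers back to `group':
         ;; each rule behaves like one function of a recursive
         ;; descent parser.
         (group "(" (* item) ")")
         (item (or group
                   ;; Any single character that is not a parenthesis.
                   (and (not (or "(" ")")) (any))))
         ;; The whole buffer must be consumed for the match to count.
         (top (* item) (eob)))
      (peg-run (peg top)))))
```

With these rules, "(a(b)c)" should be accepted and "(a(b" rejected.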
What is attractive about them is that one can do "full" parsers
(as long as your grammar is roughly LL(k)) without having to build
two storey buildings. I guess it takes some practice, though (I
haven't).
I think comparing them to treesitter is a category error.
Cheers
--
t
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-08 18:59 ` tomas
@ 2022-11-08 19:42 ` Eric Abrahamsen
2022-11-08 22:03 ` Tim Cross
0 siblings, 1 reply; 100+ messages in thread
From: Eric Abrahamsen @ 2022-11-08 19:42 UTC (permalink / raw)
To: emacs-devel
<tomas@tuxteam.de> writes:
> On Tue, Nov 08, 2022 at 08:10:55AM -0800, Eric Abrahamsen wrote:
>> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>
>> >> And that's about the only hint you get. I was trying to parse a
>> >> multiword name like
>> >>
>> >> Eric Edwin Abrahamsen
>> >
>> > Side note: the division between "given name" and "family name" is not
>> > a universal property [...]
>
>> Oh, I've gone down all the rabbit holes...
>
> ;-D
>
> And all this because of a small bunch of PEGs...
Oh the rabbit holes started as soon as I started EBDB! Personal
information is complicated -- I won't claim it's as bad as timezones and
calendars, but it's pretty messy...
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-08 19:08 ` tomas
@ 2022-11-08 19:42 ` Eric Abrahamsen
2022-11-16 4:27 ` [PATCH] " Eric Abrahamsen
0 siblings, 1 reply; 100+ messages in thread
From: Eric Abrahamsen @ 2022-11-08 19:42 UTC (permalink / raw)
To: emacs-devel
<tomas@tuxteam.de> writes:
> On Tue, Nov 08, 2022 at 08:18:15AM -0800, Eric Abrahamsen wrote:
>> Ihor Radchenko <yantar92@posteo.net> writes:
>>
>> > Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>> >
>> >>> Is there any progress merging peg.el to Emacs?
>> >>> I do not see any obvious blockers in the discussion, but the merge never
>> >>> happened?
>
> [...]
>
>> > As the comment in peg.el states, the definitions are adapted from the
>> > original PEG paper [...]
>
>> This is what I was saying in my original message, though: if peg.el is
>> going to go into core, it probably needs more/better docs than code
>> comments and "read this paper". Its likely users will be Elisp library
>> authors like me, who are just trying to free themselves from regexp hell
>> and want a relatively straightforward alternative.
>
> Yes. Coming from regexps, they are deceptively similar but frustratingly
> different.
>
> The best way I found to wrap my head around them is that they are a
> fancy notation for a recursive descent parser. Thus slightly more
> powerful than regexps, but slightly less than a full YACC (i.e. LALR
> or thereabouts).
>
> What is attractive about them is that one can do "full" parsers
> (as long as your grammar is roughly LL(k)) without having to build
> two storey buildings. I guess it takes some practice, though (I
> haven't).
>
> I think comparing them to treesitter is a category error.
Okay, this is all sounding good. I'm going to read the paper, try to get
my head around all this, and write some docs for peg.el.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-08 19:42 ` Eric Abrahamsen
@ 2022-11-08 22:03 ` Tim Cross
0 siblings, 0 replies; 100+ messages in thread
From: Tim Cross @ 2022-11-08 22:03 UTC (permalink / raw)
To: emacs-devel
Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> <tomas@tuxteam.de> writes:
>
>> On Tue, Nov 08, 2022 at 08:10:55AM -0800, Eric Abrahamsen wrote:
>>> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>>
>>> >> And that's about the only hint you get. I was trying to parse a
>>> >> multiword name like
>>> >>
>>> >> Eric Edwin Abrahamsen
>>> >
>>> > Side note: the division between "given name" and "family name" is not
>>> > a universal property [...]
>>
>>> Oh, I've gone down all the rabbit holes...
>>
>> ;-D
>>
>> And all this because of a small bunch of PEGs...
>
> Oh the rabbit holes started as soon as I started EBDB! Personal
> information is complicated -- I won't claim it's as bad as timezones and
> calendars, but it's pretty messy...
Yes, a definite minefield. I worked in the identity management space
for a few years and this was a constant challenge. As Stefan noted,
there is nothing intrinsic about the name which tells you what case the
letters should have, the relationship between first/last name, cultural
differences - some locales don't have anything which corresponds to
first/last and some vary the order depending on the context or have
different names depending on the level of perceived formality etc. To
make it even more difficult, oddly enough, names are very personal and
people get upset when you get it wrong. Then you can add in things like
title e.g. Mr, Mrs, Ms etc and you open the whole gender identity issue.
Our general solution at the time was twofold:
- As far as possible, allow the user to specify how they wanted to be
addressed or how their name was to be displayed 'on-line'. This may
require formal and informal versions
- Train/educate staff and developers to avoid unnecessary use of names,
title etc.
We also tried to avoid using culturally biased terms like 'surname' or
even 'first name' 'last name' as these simply don't map to anything
consistent for some locations.
Where I found the wheels often dropped off was when the legal department
got involved. My experience was they were the least culturally aware
area in the organisation. Not only did they often fail to recognise
external cultural differences, they were also slow to acknowledge
internal cultural evolution.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-08 6:57 ` Helmut Eller
2022-11-08 8:51 ` Ihor Radchenko
@ 2022-11-10 4:04 ` Richard Stallman
2022-11-10 5:25 ` tomas
1 sibling, 1 reply; 100+ messages in thread
From: Richard Stallman @ 2022-11-10 4:04 UTC (permalink / raw)
To: Helmut Eller; +Cc: eric, yantar92, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Would someone like to tell me in 10 lines what job peg.el does?
--
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-10 4:04 ` Richard Stallman
@ 2022-11-10 5:25 ` tomas
2022-11-10 8:15 ` Eli Zaretskii
2022-11-11 4:36 ` Richard Stallman
0 siblings, 2 replies; 100+ messages in thread
From: tomas @ 2022-11-10 5:25 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1142 bytes --]
On Wed, Nov 09, 2022 at 11:04:48PM -0500, Richard Stallman wrote:
> [[[ To any NSA and FBI agents reading my email: please consider ]]]
> [[[ whether defending the US Constitution against all enemies, ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> Would someone like to tell me in 10 lines what job peg.el does?
PEG (Parsing Expression Grammars [1]) is a grammar notation which can
be automatically translated into a parser (think regular expressions).
The notation is actually similar to that of regexps. The main difference
is that the "alternative" operator is an "ordered" choice instead of an
ambiguous choice. To compensate for this, the notation provides for a
(potential) lookahead mechanism, which, in the naive implementation would
lead to exponential running time in the worst case. The canonical
implementation (nicknamed "packrat") addresses that by memoizing.
Basically they can do what a recursive descent parser can, and are thus
slightly more powerful than regexps. They lead to nice little grammars,
but they do take some practice to be useful.
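
A minimal sketch of the "ordered choice" point, assuming peg.el (from
GNU ELPA) is installed and loaded. The helper name
my-peg-demo-ordered-choice is made up for illustration and is not part
of peg.el:

```elisp
;;; -*- lexical-binding: t; -*-
(require 'peg)

(defun my-peg-demo-ordered-choice (text)
  "Return non-nil if the rule START below matches at the start of TEXT.
Hypothetical helper for illustration; not part of peg.el."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (with-peg-rules
        ((start prefix "c")
         ;; Alternatives are tried in order; the first one that
         ;; matches is kept and never revisited.
         (prefix (or "a" "ab")))
      (peg-run (peg start)))))

(my-peg-demo-ordered-choice "ac")   ; "a" matches, then "c" matches
(my-peg-demo-ordered-choice "abc")  ; "a" wins the choice, "c" fails on "b"

;; For comparison, the similar-looking regexp backtracks into "ab":
(string-match-p "\\`\\(?:a\\|ab\\)c" "abc")
```

With these rules I'd expect "ac" to parse and "abc" to fail: once
`prefix' has matched "a" the choice is settled, and nothing goes back to
try "ab" when the following "c" does not match. Putting the longer
alternative first, (or "ab" "a"), is the usual way around that.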
Cheers
--
t
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-10 5:25 ` tomas
@ 2022-11-10 8:15 ` Eli Zaretskii
2022-11-10 8:29 ` tomas
2022-11-11 4:36 ` Richard Stallman
1 sibling, 1 reply; 100+ messages in thread
From: Eli Zaretskii @ 2022-11-10 8:15 UTC (permalink / raw)
To: tomas; +Cc: emacs-devel
> Date: Thu, 10 Nov 2022 06:25:55 +0100
> From: <tomas@tuxteam.de>
>
> > Would someone like to tell me in 10 lines what job peg.el does?
>
> PEG (Parsing Expression Grammars [1]) is a grammar notation which can
> be automatically translated into a parser (think regular expressions).
The reference [1] was probably meant to be
https://en.wikipedia.org/wiki/Parsing_expression_grammar
or somesuch
> The notation is actually similar to that of regexps.
I believe you meant "similar to regular expressions in rx form"?
> The main difference
> is that the "alternative" operator is an "ordered" choice instead of an
> ambiguous choice. To compensate for this, the notation provides for a
> (potential) lookahead mechanism, which, in the naive implementation would
> lead to exponential running time in the worst case. The canonical
> implementation (nicknamed "packrat") addresses that by memoizing.
>
> Basically they can do what a recursive descent parser can, are thus
> slightly more powerful than regexps. They lead to nice little grammars,
> but they do take some practice to be useful.
I think an example from peg.el will clarify the issue:
;; This file implements the macros `define-peg-rule', `with-peg-rules', and
;; `peg-parse' which parses the current buffer according to a PEG.
;; E.g. we can match integers with:
;;
;; (with-peg-rules
;; ((number sign digit (* digit))
;; (sign (or "+" "-" ""))
;; (digit [0-9]))
;; (peg-run (peg number)))
;; or
;; (define-peg-rule digit ()
;; [0-9])
;; (peg-parse (number sign digit (* digit))
;; (sign (or "+" "-" "")))
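
For anyone who wants to try the first form outside of peg.el's
comments, here is a sketch that runs it against a string via a
temporary buffer. It assumes peg.el is installed; the function name
my-peg-demo-leading-integer is hypothetical, not something peg.el
provides:

```elisp
;;; -*- lexical-binding: t; -*-
(require 'peg)

(defun my-peg-demo-leading-integer (text)
  "Return the optionally signed integer at the start of TEXT, or nil.
Hypothetical helper for illustration; not part of peg.el."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; `peg-run' returns non-nil on success and leaves point just
    ;; after the text that the entry rule `number' consumed.
    (when (with-peg-rules
              ((number sign digit (* digit))
               (sign (or "+" "-" ""))
               (digit [0-9]))
            (peg-run (peg number)))
      (buffer-substring (point-min) (point)))))

(my-peg-demo-leading-integer "-42abc")  ; the matched prefix, "-42"
(my-peg-demo-leading-integer "abc")     ; no digit at point, so nil
```

Note that the grammar only moves point; extracting the matched text is
done here by hand with `buffer-substring'.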
HTH
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-10 8:15 ` Eli Zaretskii
@ 2022-11-10 8:29 ` tomas
0 siblings, 0 replies; 100+ messages in thread
From: tomas @ 2022-11-10 8:29 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1738 bytes --]
On Thu, Nov 10, 2022 at 10:15:23AM +0200, Eli Zaretskii wrote:
> > Date: Thu, 10 Nov 2022 06:25:55 +0100
> > From: <tomas@tuxteam.de>
> >
> > > Would someone like to tell me in 10 lines what job peg.el does?
> >
> > PEG (Parsing Expression Grammars [1]) is a grammar notation which can
> > be automatically translated into a parser (think regular expressions).
>
> The reference [1] was probably meant to be
>
> https://en.wikipedia.org/wiki/Parsing_expression_grammar
Thanks. -ENOCOFFEE, possibly :)
> or somesuch
>
> > The notation is actually similar to that of regexps.
>
> I believe you meant "similar to regular expressions in rx form"?
I wasn't particularly thinking of rx. PEGs in general also have a
"classical" spelling which looks deceptively similar to "classic"
regular expressions.
[...]
> I think an example from peg.el will clarify the issue:
>
> ;; This file implements the macros `define-peg-rule', `with-peg-rules', and
> ;; `peg-parse' which parses the current buffer according to a PEG.
> ;; E.g. we can match integers with:
> ;;
> ;; (with-peg-rules
> ;; ((number sign digit (* digit))
> ;; (sign (or "+" "-" ""))
> ;; (digit [0-9]))
> ;; (peg-run (peg number)))
> ;; or
> ;; (define-peg-rule digit ()
> ;; [0-9])
> ;; (peg-parse (number sign digit (* digit))
> ;; (sign (or "+" "-" "")))
Thanks for the example. This one stresses the main "selling point" of
PEGs, that they can "do" a grown up parser without separating lex and
"parse" into two "floors". The downside is that they aren't as "tall"
(as the customary LALR/regexp combo).
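
To make the "no separate lexer" point a bit more concrete, here is a
rough sketch, assuming peg.el is installed. One set of rules handles
both the token level (digits, separators, optional spaces) and the
structure level (a comma-separated list). The function name
my-peg-demo-number-list is made up for this mail; `substring' and
`list' are the peg.el shorthands for collecting matched text:

```elisp
;;; -*- lexical-binding: t; -*-
(require 'peg)

(defun my-peg-demo-number-list (text)
  "Parse TEXT like \"1, 22 ,3\" into a list of integers, or nil on failure.
Hypothetical helper for illustration; not part of peg.el."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((result
           (with-peg-rules
               ;; Structure: one number, then any number of
               ;; separator-plus-number pairs, up to end of buffer.
               ((numbers (list number (* sep number)) (eob))
                ;; Tokens: what a lexer would otherwise produce.
                (number (substring digit (* digit)))
                (sep (* " ") "," (* " "))
                (digit [0-9]))
             (peg-run (peg numbers)))))
      ;; On success the stack holds a single list of matched strings.
      (when (consp result)
        (mapcar #'string-to-number (car result))))))

(my-peg-demo-number-list "1, 22 ,3")  ; I'd expect (1 22 3)
(my-peg-demo-number-list "1, x")      ; trailing junk, so nil
```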
Cheers
--
t
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-10 5:25 ` tomas
2022-11-10 8:15 ` Eli Zaretskii
@ 2022-11-11 4:36 ` Richard Stallman
1 sibling, 0 replies; 100+ messages in thread
From: Richard Stallman @ 2022-11-11 4:36 UTC (permalink / raw)
To: tomas; +Cc: emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > Would someone like to tell me in 10 lines what job peg.el does?
> PEG (Parsing Expression Grammars [1]) is a grammar notation which can
> be automatically translated into a parser (think regular expressions).
...
Thanks.
--
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
^ permalink raw reply [flat|nested] 100+ messages in thread
* [PATCH] Re: Make peg.el a built-in library?
2022-11-08 19:42 ` Eric Abrahamsen
@ 2022-11-16 4:27 ` Eric Abrahamsen
2022-11-16 5:07 ` tomas
` (2 more replies)
0 siblings, 3 replies; 100+ messages in thread
From: Eric Abrahamsen @ 2022-11-16 4:27 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 2085 bytes --]
Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> <tomas@tuxteam.de> writes:
>
>> On Tue, Nov 08, 2022 at 08:18:15AM -0800, Eric Abrahamsen wrote:
>>> Ihor Radchenko <yantar92@posteo.net> writes:
>>>
>>> > Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>>> >
>>> >>> Is there any progress merging peg.el to Emacs?
>>> >>> I do not see any obvious blockers in the discussion, but the merge never
>>> >>> happened?
>>
>> [...]
>>
>>> > As the comment in peg.el states, the definitions are adapted from the
>>> > original PEG paper [...]
>>
>>> This is what I was saying in my original message, though: if peg.el is
>>> going to go into core, it probably needs more/better docs than code
>>> comments and "read this paper". Its likely users will be Elisp library
>>> authors like me, who are just trying to free themselves from regexp hell
>>> and want a relatively straightforward alternative.
>>
>> Yes. Coming from regexp they are deceivingly similar but frustratingly
>> different.
>>
>> The best way I found to wrap my head around them is that they are a
>> fancy notation for a recursive descent parser. Thus slightly more
>> powerful than regexps, but slightly less than a full YACC (i.e. LALR
>> or thereabouts).
>>
>> What is attractive about them is that one can do "full" parsers
>> (as long as your grammar is roughly LL(k)) without having to build
>> two storey buildings. I guess it takes some practice, though (I
>> haven't).
>>
>> I think comparing them to treesitter is a category error.
>
> Okay, this is all sounding good. I'm going to read the paper, try to get
> my head around all this, and write some docs for peg.el.
Okay, here's a first stab. I read the paper, and understood about half
of it, which seemed like enough. It was interesting to see that the
paper explicitly calls out the exact greedy-matching behavior I'd
encountered.
I'm sure I've got some of the conventions wrong, here, and it's
unfortunate that there's already a manual node called "Expression
Parsing", but I don't know what to call this except "Expression Parsing
Grammars"...
Eric
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: pexmanual.diff --]
[-- Type: text/x-patch, Size: 10722 bytes --]
diff --git a/doc/lispref/elisp.texi b/doc/lispref/elisp.texi
index a3d1d80408..6440728541 100644
--- a/doc/lispref/elisp.texi
+++ b/doc/lispref/elisp.texi
@@ -222,6 +222,7 @@ Top
* Non-ASCII Characters:: Non-ASCII text in buffers and strings.
* Searching and Matching:: Searching buffers for strings or regexps.
* Syntax Tables:: The syntax table controls word and list parsing.
+* Parsing Expression Grammars:: Parsing buffer text.
* Abbrevs:: How Abbrev mode works, and its data structures.
* Threads:: Concurrency in Emacs Lisp.
@@ -1703,6 +1704,7 @@ Top
@include searching.texi
@include syntax.texi
+@include peg.texi
@include abbrevs.texi
@include threads.texi
@include processes.texi
diff --git a/doc/lispref/peg.texi b/doc/lispref/peg.texi
new file mode 100644
index 0000000000..ec3962d7bf
--- /dev/null
+++ b/doc/lispref/peg.texi
@@ -0,0 +1,314 @@
+@c -*-texinfo-*-
+@c This is part of the GNU Emacs Lisp Reference Manual.
+@c Copyright (C) 1990--1995, 1998--1999, 2001--2022 Free Software
+@c Foundation, Inc.
+@c See the file elisp.texi for copying conditions.
+@node Parsing Expression Grammars
+@chapter Parsing Expression Grammars
+@cindex text parsing
+
+ Emacs Lisp provide several tools for parsing and matching text, from
+regular expressions (@pxref{Regular Expressions}) to full @acronym{LL}
+grammar parsers (@pxref{Top,, Bovine parser development, bovine}).
+@dfn{Parsing Expression Grammars} (@acronym{PEG}) are another approach
+to text parsing that offer more structure and composibility than
+regular expressions, but less complexity than context-free grammars.
+
+A @acronym{PEG} parser is defined as a list of named rules, each of
+which match text patterns, and/or contain references to other rules.
+Parsing is initiated with the function @code{peg-run} or the macro
+@code{peg-parse}, and parses text after point in the current buffer,
+using a given set of rules.
+
+The definition of each rule is referred to as a @dfn{parsing
+expression} (@acronym{PEX}), and can consist of a literal string, a
+regexp-like character range or set, a peg-specific construct
+resembling an elisp function call, a reference to another rule, or a
+combination of any of these. A grammar is expressed as a set of rules
+in which one rule is typically treated as a ``top-level'' or
+``entry-point'' rule. For instance:
+
+@example
+@group
+((number sign digit (* digit))
+ (sign (or "+" "-" ""))
+ (digit [0-9]))
+@end group
+@end example
+
+The above grammar could be used directly in a call to
+@code{peg-parse}, in which the first rule is considered the
+``entry-point'' rule:
+
+@example
+(peg-parse
+ ((number sign digit (* digit))
+ (sign (or "+" "-" ""))
+ (digit [0-9])))
+@end example
+
+Or set as the value of a variable, and the variable used in a
+combination of calls to @code{with-peg-rules} and @code{peg-run},
+where the ``entry-point'' rule is given explicitly:
+
+@example
+(defvar number-grammar
+ '((number sign digit (* digit))
+ (sign (or "+" "-" ""))
+ (digit [0-9])))
+
+(with-peg-rules number-grammar
+ (peg-run (peg number)))
+@end example
+
+By default, calls to @code{peg-run} or @code{peg-parse} produce no
+output: parsing simply moves point. In order to return or otherwise
+act upon parsed strings, rules can include @dfn{actions}, see
+@xref{Parsing Actions} for more information.
+
+Individual rules can also be defined using a more @code{defun}-like
+syntax, using the macro @code{define-peg-rule}:
+
+@example
+(define-peg-rule digit ()
+ [0-9])
+@end example
+
+This allows the rule to be referred to by name within calls to
+@code{peg-run} or @code{peg-parse} elsewhere, and also allows the use
+of function arguments in the rule body.
+
+@node PEX Definitions
+@section PEX Definitions
+
+Parsing expressions can be defined using the following syntax:
+
+@table @code
+@item (and E1 E2 ...)
+A sequence of PEXs that must all be matched. The @code{and} form is
+optional and implicit.
+
+@item (or E1 E2 ...)
+Prioritized choices, meaning that, as in Elisp, the choices are tried
+in order, and the first successful match is used.
+
+@item (any)
+Matches any single character, as the regexp ``.''.
+
+@item "abc"
+A literal string.
+
+@item (char C)
+A single character, as an Elisp character literal.
+
+@item (* E)
+Zero or more of an expression, as the regexp ``*''.
+
+@item (+ E)
+One or more of an expression, as the regexp ``+''.
+
+@item (opt E)
+Zero or one of an expression, as the regexp ``?''.
+
+@item SYMBOL
+A symbol representing a previously-define PEG rule.
+
+@item (range A B)
+The character range between A and B, as the regexp ``[A-B]''.
+
+@item [a-b "+*" ?x]
+A character set, including ranges, literal characters, or strings of
+characters.
+
+@item [ascii cntrl]
+A list of named character classes (see below).
+
+@item (syntax-class NAME)
+A single syntax class.
+
+@item (null)
+The empty string.
+@end table
+
+The following expressions are used as anchors -- they do not move
+point.
+
+@table @code
+@item (bob)
+Beginning of buffer.
+
+@item (eob)
+End of buffer.
+
+@item (bol)
+Beginning of line.
+
+@item (eol)
+End of line.
+
+@item (bow)
+Beginning of word.
+
+@item (eow)
+End of word.
+
+@item (bos)
+Beginning of symbol.
+
+@item (eos)
+End of symbol.
+@end table
+
+The following expressions are used as booleans, to constrain matching
+(@pxref{Writing PEG Rules}), and do not move point.
+
+@table @code
+@item (not E)
+@item (if E)
+@item (guard EXP)
+@end table
+
+@vindex peg-char-classes
+Named character classes include the following:
+
+@itemize
+@item ascii
+@item alnum
+@item alpha
+@item blank
+@item cntrl
+@item digit
+@item graph
+@item lower
+@item multibyte
+@item nonascii
+@item print
+@item punct
+@item space
+@item unibyte
+@item upper
+@item word
+@item xdigit
+@end itemize
+
+@node Parsing Actions
+@section Parsing Actions
+
+By default the process of parsing simply moves point in the current
+buffer, ultimately returning @code{t} if the parsing succeeds, and
+@code{nil} if it doesn't. It's also possible to define ``actions''
+that can run arbitrary Elisp at certain points during parsing. These
+actions can affect something called the @dfn{parsing stack}: a list of
+values built up during the course of parsing. If the stack is
+non-@code{nil} at the end of parsing, it is returned as the final
+value of the parsing process.
+
+Actions can be added anywhere in the definition of a rule. They are
+distinguished from parsing expressions by an initial backquote
+(@samp{`}), followed by a parenthetical form that must contain a pair
+of hyphens (@samp{--}) somewhere within it. Symbols to the left of
+the hyphens are bound to values popped from the stack (they are
+somewhat analogous to the argument list in a lambda). Values produced
+by code to the right are pushed to the stack (analogous to the return
+value of the lambda). For instance, the previous grammar can be
+augmented with actions to return the parsed number as an actual
+integer:
+
+@example
+(with-peg-rules ((number sign digit (* digit
+ `(a b -- (+ (* a 10) b)))
+ `(sign val -- (* sign val)))
+ (sign (or (and "+" `(-- 1))
+ (and "-" `(-- -1))
+ (and "" `(-- 1))))
+ (digit [0-9] `(-- (- (char-before) ?0))))
+ (peg-run (peg number)))
+@end example
+
+There must be values on the stack before they can be popped and
+returned. An action with no left-hand terms will only push values to
+the stack; an action with no right-hand terms will consume (and
+discard) values from the stack.
+
+To return the string matched by a PEX (instead of simply moving point
+over it), a rule like this can be used:
+
+@example
+(one-word
+ `(-- (point))
+ (+ [word])
+ `(start -- (buffer-substring start (point))))
+@end example
+
+The first action pushes the initial value of point to the stack. The
+intervening PEX moves point over the next word. The second action pops
+the previous value from the stack (binding it to the variable
+@code{start}), and uses that value to extract a substring from the
+buffer and push it to the stack. This pattern is so common that
+peg.el provides a shorthand function that does exactly the above,
+along with a few other shorthands for common scenarios:
+
+@table @code
+@item (substring E)
+Match PEX E and push the matched string to the stack.
+
+@item (region E)
+Match E and push the start and end positions of the matched region to
+the stack.
+
+@item (replace E "repl")
+Match E and replace the matched region with the string "repl".
+
+@item (list E)
+Match E, collect all values produced by E (and its sub-expressions)
+into a list, and push that list to the stack.
+@end table
+
+It is up to the grammar author to keep track of which rules and
+sub-rules push values to the stack, and the state of the stack at any
+given point in the parsing. If an action pops values from an empty
+stack, the symbols will be bound to @code{nil}.
+
+@node Writing PEG Rules
+@section Writing PEG Rules
+
+Something to be aware of when writing PEG rules is that they are
+greedy. Rules which consume a variable amount of text will always
+consume the maximum amount possible, even if that causes a rule that
+might otherwise have matched to fail later on. For instance, this
+rule will never succeed:
+
+@example
+(forest (+ "tree" (* [blank])) "tree" (eol))
+@end example
+
+The @acronym{PEX} @code{(+ "tree" (* [blank]))} will consume all
+repetitions of the word ``tree'', leaving none to match the final
+@code{"tree"}.
+
+In these situations, the desired result can be obtained by using
+predicates and guards -- namely the @code{not}, @code{if} and
+@code{guard} expressions -- to restrict behavior. For instance:
+
+@example
+(forest (+ "tree" (* [blank])) (not (eol)) "tree" (eol))
+@end example
+
+The @code{if} and @code{not} operators accept a parsing expression and
+interpret it as a boolean, without moving point. The contents of a
+@code{guard} operator are evaluated as regular Elisp (not a
+@acronym{PEX}) and should return a boolean value. A @code{nil} value
+causes the match to fail.
+
+Another potentially unexpected behavior is that parsing will move
+point as far as possible, even if the parsing ultimately fails. This
+rule:
+
+@example
+(end-game "game" (eob))
+@end example
+
+when run in a buffer containing the text ``game over'' after point,
+will move point to just after ``game'' and halt parsing, returning
+@code{nil}. Successful parsing will always return @code{t}, or the
+contents of the parsing stack.
^ permalink raw reply related [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2022-11-16 4:27 ` [PATCH] " Eric Abrahamsen
@ 2022-11-16 5:07 ` tomas
2022-11-16 5:39 ` Eric Abrahamsen
2022-11-16 6:24 ` Ihor Radchenko
2023-01-11 7:39 ` Michael Heerdegen
2 siblings, 1 reply; 100+ messages in thread
From: tomas @ 2022-11-16 5:07 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 918 bytes --]
On Tue, Nov 15, 2022 at 08:27:56PM -0800, Eric Abrahamsen wrote:
[...]
> Okay, here's a first stab. I read the paper, and understood about half
> of it, which seemed like enough. It was interesting to see that the
> paper explicitly calls out the exact greedy-matching behavior I'd
> encountered.
Half of it sounds like twice as much as I understood ;-)
Seriously: thanks for your work. And thanks to the original authors!
> I'm sure I've got some of the conventions wrong, here, and it's
> unfortunate that there's already a manual node called "Expression
> Parsing", but I don't know what to call this except "Expression Parsing
> Grammars"...
Hm. Perhaps "Parsing Expression Grammars" might be less confusing,
since it is the moniker which has established itself. Admittedly,
it grammars a bit awkwardly, but people having seen it once will
rather recognize that one.
Cheers
--
t
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2022-11-16 5:07 ` tomas
@ 2022-11-16 5:39 ` Eric Abrahamsen
2022-11-16 15:53 ` tomas
0 siblings, 1 reply; 100+ messages in thread
From: Eric Abrahamsen @ 2022-11-16 5:39 UTC (permalink / raw)
To: emacs-devel
<tomas@tuxteam.de> writes:
> On Tue, Nov 15, 2022 at 08:27:56PM -0800, Eric Abrahamsen wrote:
>
> [...]
>
>> Okay, here's a first stab. I read the paper, and understood about half
>> of it, which seemed like enough. It was interesting to see that the
>> paper explicitly calls out the exact greedy-matching behavior I'd
>> encountered.
>
> Half of it sounds like double as much as I understood ;-)
The second half is all mathematical notation. I only understand
mathematics when it's written in plain English :)
> Seriously: thanks for your work. And thanks to the original authors!
>
>> I'm sure I've got some of the conventions wrong, here, and it's
>> unfortunate that there's already a manual node called "Expression
>> Parsing", but I don't know what to call this except "Expression Parsing
>> Grammars"...
>
> Hm. Perhaps "Parsing Expression Grammars" might be less confusing,
> since it is the moniker which has established itself. Admittedly,
> it grammars a bit awkwardly, but people having seen it once will
> rather recognize that one.
I foolishly reversed both of those: the new node *is* called "Parsing
Expression Grammars", and the existing node is called "Parsing
Expressions". Same issue, just reversed...
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2022-11-16 4:27 ` [PATCH] " Eric Abrahamsen
2022-11-16 5:07 ` tomas
@ 2022-11-16 6:24 ` Ihor Radchenko
2022-11-16 18:15 ` Eric Abrahamsen
2023-01-11 7:39 ` Michael Heerdegen
2 siblings, 1 reply; 100+ messages in thread
From: Ihor Radchenko @ 2022-11-16 6:24 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: emacs-devel
Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> Okay, here's a first stab. I read the paper, and understood about half
> of it, which seemed like enough. It was interesting to see that the
> paper explicitly calls out the exact greedy-matching behavior I'd
> encountered.
Thanks!
> + Emacs Lisp provide several tools for parsing and matching text, from
provides
> +regular expressions (@pxref{Regular Expressions}) to full @acronym{LL}
> +grammar parsers (@pxref{Top,, Bovine parser development, bovine}).
> +@dfn{Parsing Expression Grammars} (@acronym{PEG}) are another approach
> +to text parsing that offer more structure and composibility than
> +regular expressions, but less complexity than context-free grammars.
> +
> +A @acronym{PEG} parser is defined as a list of named rules, each of
> +which match text patterns, and/or contain references to other rules.
> +Parsing is initiated with the function @code{peg-run} or the macro
> +@code{peg-parse}, and parses text after point in the current buffer,
> +using a given set of rules.
> +
> +The definition of each rule is referred to as a @dfn{parsing
> +expression} (@acronym{PEX}), and can consist of a literal string, a
> +regexp-like character range or set, a peg-specific construct
> +resembling an elisp function call, a reference to another rule, or a
> +combination of any of these. A grammar is expressed as a set of rules
> +in which one rule is typically treated as a ``top-level'' or
> +``entry-point'' rule. For instance:
> +
> +@example
> +@group
> +((number sign digit (* digit))
> + (sign (or "+" "-" ""))
> + (digit [0-9]))
> +@end group
> +@end example
> +
> +The above grammar could be used directly in a call to
> +@code{peg-parse}, in which the first rule is considered the
> +``entry-point'' rule:
> +
> +@example
> +(peg-parse
> + ((number sign digit (* digit))
> + (sign (or "+" "-" ""))
> + (digit [0-9])))
> +@end example
> +
> +Or set as the value of a variable, and the variable used in a
> +combination of calls to @code{with-peg-rules} and @code{peg-run},
> +where the ``entry-point'' rule is given explicitly:
> +
> +@example
> +(defvar number-grammar
> + '((number sign digit (* digit))
> + (sign (or "+" "-" ""))
> + (digit [0-9])))
> +
> +(with-peg-rules number-grammar
> + (peg-run (peg number)))
> +@end example
> +
> +By default, calls to @code{peg-run} or @code{peg-parse} produce no
> +output: parsing simply moves point. In order to return or otherwise
> +act upon parsed strings, rules can include @dfn{actions}, see
> +@xref{Parsing Actions} for more information.
> +
> +Individual rules can also be defined using a more @code{defun}-like
> +syntax, using the macro @code{define-peg-rule}:
> +
> +@example
> +(define-peg-rule digit ()
> + [0-9])
> +@end example
> +
> +This allows the rule to be referred to by name within calls to
> +@code{peg-run} or @code{peg-parse} elsewhere, and also allows the use
> +of function arguments in the rule body.
> +
> +@node PEX Definitions
> +@section PEX Definitions
> +
> +Parsing expressions can be defined using the following syntax:
> +
> +@table @code
> +@item (and E1 E2 ...)
> +A sequence of PEXs that must all be matched. The @code{and} form is
> +optional and implicit.
> +
> +@item (or E1 E2 ...)
> +Prioritized choices, meaning that, as in Elisp, the choices are tried
> +in order, and the first successful match is used.
It is worth highlighting that it is different from CFGs.
> +@item (* E)
> +Zero or more of an expression, as the regexp ``*''.
> +
> +@item (+ E)
> +One or more of an expression, as the regexp ``+''.
It is worth highlighting the greedy part here and referring to &A and
!A.
> +@item SYMBOL
> +A symbol representing a previously-define PEG rule.
defined
> +By default the process of parsing simply moves point in the current
> +buffer, ultimately returning @code{t} if the parsing succeeds, and
> +@code{nil} if it doesn't. It's also possible to define ``actions''
> +that can run arbitrary Elisp at certain points during parsing. These
> +actions can affect something called the @dfn{parsing stack}: a list of
> +values built up during the course of parsing. If the stack is
> +non-@code{nil} at the end of parsing, it is returned as the final
> +value of the parsing process.
Actions are only run when the expression matches; with point moved after
the match, right? What about &A and !A?
> +There must be values on the stack before they can be popped and
> +returned.
What if there is just one value on the stack while the action requires two?
> +@item (list E)
> +Match E, collect all values produced by E (and its sub-expressions)
> +into a list, and push that list to the stack.
> +@end table
This one is not very clear. Does it imply that E is recursively wrapped
into substring?
> +It is up to the grammar author to keep track of which rules and
> +sub-rules push values to the stack, and the state of the stack at any
> +given point in the parsing. If an action pops values from an empty
> +stack, the symbols will be bound to @code{nil}.
The part about popping from an empty stack looks out of place here. Maybe
move it to the earlier discussion of variable bindings in actions?
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2022-11-16 5:39 ` Eric Abrahamsen
@ 2022-11-16 15:53 ` tomas
0 siblings, 0 replies; 100+ messages in thread
From: tomas @ 2022-11-16 15:53 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 419 bytes --]
On Tue, Nov 15, 2022 at 09:39:25PM -0800, Eric Abrahamsen wrote:
> <tomas@tuxteam.de> writes:
[...]
> > Hm. Perhaps "Parsing Expression Grammars" might be less confusing,
[...]
> I foolishly reversed both of those: the new node *is* called "Parsing
> Expression Grammars", and the existing node is called "Parsing
> Expressions". Same issue, just reversed...
Kind of makes sense :)
Thanks
--
t
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2022-11-16 6:24 ` Ihor Radchenko
@ 2022-11-16 18:15 ` Eric Abrahamsen
2022-11-17 12:21 ` Ihor Radchenko
0 siblings, 1 reply; 100+ messages in thread
From: Eric Abrahamsen @ 2022-11-16 18:15 UTC (permalink / raw)
To: emacs-devel
Ihor Radchenko <yantar92@posteo.net> writes:
> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
>> Okay, here's a first stab. I read the paper, and understood about half
>> of it, which seemed like enough. It was interesting to see that the
>> paper explicitly calls out the exact greedy-matching behavior I'd
>> encountered.
>
> Thanks!
And thanks for the review! I'll add in all your simpler notes; more
responses below.
>> + Emacs Lisp provide several tools for parsing and matching text, from
>
> provides
>
>> +regular expressions (@pxref{Regular Expressions}) to full @acronym{LL}
>> +grammar parsers (@pxref{Top,, Bovine parser development, bovine}).
>> +@dfn{Parsing Expression Grammars} (@acronym{PEG}) are another approach
>> +to text parsing that offer more structure and composibility than
>> +regular expressions, but less complexity than context-free grammars.
>> +
>> +A @acronym{PEG} parser is defined as a list of named rules, each of
>> +which match text patterns, and/or contain references to other rules.
>> +Parsing is initiated with the function @code{peg-run} or the macro
>> +@code{peg-parse}, and parses text after point in the current buffer,
>> +using a given set of rules.
>> +
>> +The definition of each rule is referred to as a @dfn{parsing
>> +expression} (@acronym{PEX}), and can consist of a literal string, a
>> +regexp-like character range or set, a peg-specific construct
>> +resembling an elisp function call, a reference to another rule, or a
>> +combination of any of these. A grammar is expressed as a set of rules
>> +in which one rule is typically treated as a ``top-level'' or
>> +``entry-point'' rule. For instance:
>> +
>> +@example
>> +@group
>> +((number sign digit (* digit))
>> + (sign (or "+" "-" ""))
>> + (digit [0-9]))
>> +@end group
>> +@end example
>> +
>> +The above grammar could be used directly in a call to
>> +@code{peg-parse}, in which the first rule is considered the
>> +``entry-point'' rule:
>> +
>> +@example
>> +(peg-parse
>> + ((number sign digit (* digit))
>> + (sign (or "+" "-" ""))
>> + (digit [0-9])))
>> +@end example
>> +
>> +Or set as the value of a variable, and the variable used in a
>> +combination of calls to @code{with-peg-rules} and @code{peg-run},
>> +where the ``entry-point'' rule is given explicitly:
>> +
>> +@example
>> +(defvar number-grammar
>> + '((number sign digit (* digit))
>> + (sign (or "+" "-" ""))
>> + (digit [0-9])))
>> +
>> +(with-peg-rules number-grammar
>> + (peg-run (peg number)))
>> +@end example
>> +
>> +By default, calls to @code{peg-run} or @code{peg-parse} produce no
>> +output: parsing simply moves point. In order to return or otherwise
>> +act upon parsed strings, rules can include @dfn{actions}, see
>> +@xref{Parsing Actions} for more information.
>> +
>> +Individual rules can also be defined using a more @code{defun}-like
>> +syntax, using the macro @code{define-peg-rule}:
>> +
>> +@example
>> +(define-peg-rule digit ()
>> + [0-9])
>> +@end example
>> +
>> +This allows the rule to be referred to by name within calls to
>> +@code{peg-run} or @code{peg-parse} elsewhere, and also allows the use
>> +of function arguments in the rule body.
>> +
>> +@node PEX Definitions
>> +@section PEX Definitions
>> +
>> +Parsing expressions can be defined using the following syntax:
>> +
>> +@table @code
>> +@item (and E1 E2 ...)
>> +A sequence of PEXs that must all be matched. The @code{and} form is
>> +optional and implicit.
>> +
>> +@item (or E1 E2 ...)
>> +Prioritized choices, meaning that, as in Elisp, the choices are tried
>> +in order, and the first successful match is used.
>
> It is worth highlighting that it is different from CFGs.
>
>> +@item (* E)
>> +Zero or more of an expression, as the regexp ``*''.
>> +
>> +@item (+ E)
>> +One or more of an expression, as the regexp ``+''.
>
> It is worth highlighting the greedy part here and referring to &A and
> !A.
I don't believe there is separate syntax for &A and !A -- those are
written (if A) and (not A).
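To make the lookahead forms concrete, here is a minimal sketch, assuming
peg.el 1.0 or later can be loaded with (require 'peg). The function
name my-peg-demo-lookahead is made up for illustration. It checks that
(if E) and (not E) report success without moving point, and that point
only advances once a consuming expression follows:

```elisp
(require 'peg)

(defun my-peg-demo-lookahead ()
  "Return (IF-RESULT NOT-RESULT POINT-AFTER-LOOKAHEAD POINT-AFTER-MATCH).
Hypothetical helper, not part of peg.el."
  (with-temp-buffer
    (insert "foo")
    (goto-char (point-min))
    (let* (;; Positive lookahead: "foo" is ahead, so this succeeds.
           (if-result (peg-run (peg (if "foo"))))
           ;; Negative lookahead: "bar" is not ahead, so this succeeds too.
           (not-result (peg-run (peg (not "bar"))))
           ;; Neither lookahead consumed any text.
           (pos-after-lookahead (point)))
      ;; Lookahead followed by a real match: only the match moves point.
      (peg-run (peg (and (if "foo") "foo")))
      (list if-result not-result pos-after-lookahead (point)))))

;; (my-peg-demo-lookahead) ; => (t t 1 4)
```

Both lookahead calls return t because the stack is empty at the end of
a successful parse.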
>> +@item SYMBOL
>> +A symbol representing a previously-define PEG rule.
>
> defined
>
>> +By default the process of parsing simply moves point in the current
>> +buffer, ultimately returning @code{t} if the parsing succeeds, and
>> +@code{nil} if it doesn't. It's also possible to define ``actions''
>> +that can run arbitrary Elisp at certain points during parsing. These
>> +actions can affect something called the @dfn{parsing stack}: a list of
>> +values built up during the course of parsing. If the stack is
>> +non-@code{nil} at the end of parsing, it is returned as the final
>> +value of the parsing process.
>
> Actions are only run when the expression matches; with point moved after
> the match, right? What about &A and !A?
That's right, actions only run if the parsing succeeds, and they run all
at once at the end. Maybe I can move all discussions of parsing success
vs failure into one place.
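A small sketch of that behavior, assuming peg.el 1.0 or later and its
(action FORM) construct; my-peg-demo-log and my-peg-demo-actions are
hypothetical names used only for this illustration. The action sits
between two literals, and it only leaves a trace when the whole parse
succeeds:

```elisp
(require 'peg)

(defvar my-peg-demo-log nil
  "Hypothetical log that records which actions actually ran.")

(defun my-peg-demo-actions (text)
  "Parse TEXT as \"a\", a logging action, then \"b\"; return (RESULT LOG)."
  (setq my-peg-demo-log nil)
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((result (peg-run
                   (peg (and "a"
                             ;; Collected during parsing, executed only
                             ;; after the whole match has succeeded.
                             (action (push 'after-a my-peg-demo-log))
                             "b")))))
      (list result my-peg-demo-log))))

;; (my-peg-demo-actions "ab") ; => (t (after-a))
;; (my-peg-demo-actions "ax") ; => (nil nil)
```

In the second call "a" matches, but the parse fails at "x", so the
action that was collected after "a" is never executed.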
>> +There must be values on the stack before they can be popped and
>> +returned.
>
> What if there is just one value in the stack while the action required two?
>
>> +@item (list E)
>> +Match E, collect all values produced by E (and its sub-expressions)
>> +into a list, and push that list to the stack.
>> +@end table
>
> This one is not very clear. Does it imply that E is recursively wrapped
> into substring?
It's not very clear because I don't fully understand it! It does not
implicitly create any value-returning calls (such as `substring'). I
think what it means is that, by default, values returned by actions are
all spliced into a single flat list. If you need some of those values to
be returned in a sub-list, you can use this form.
It's a bit tricky to use because the E in (list E) could potentially
descend many levels and branch out into any number of sub-expressions,
so you need to have a clear mental model of what values might ultimately
be coming out of E. I guess that's also true for the whole thing,
though.
>> +It is up to the grammar author to keep track of which rules and
>> +sub-rules push values to the stack, and the state of the stack at any
>> +given point in the parsing. If an action pops values from an empty
>> +stack, the symbols will be bound to @code{nil}.
>
> The part about popping out of empty stack looks out of scope. Maybe move
> it to earlier discussion of variable bindings in actions?
Okay, I'll remove this, and just add a shorter note up above about empty
stacks.
Thanks again,
Eric
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2022-11-16 18:15 ` Eric Abrahamsen
@ 2022-11-17 12:21 ` Ihor Radchenko
2022-11-27 1:46 ` Eric Abrahamsen
0 siblings, 1 reply; 100+ messages in thread
From: Ihor Radchenko @ 2022-11-17 12:21 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: emacs-devel
Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>>> +@item (* E)
>>> +Zero or more of an expression, as the regexp ``*''.
>>> +
>>> +@item (+ E)
>>> +One or more of an expression, as the regexp ``+''.
>>
>> It is worth highlighting the greedy part here and referring to &A and
>> !A.
>
> I don't believe there is separate syntax for &A and !A -- those are
> written (if A) and (not A).
Indeed. I was just too lazy to write (if A) and (not A), and wrote &A and !A instead :)
The comment suggests adding a reference to (if A)/(not A) and to the
"Writing PEGs" section.
>> Actions are only run when the expression matches; with point moved after
>> the match, right? What about &A and !A?
>
> That's right, actions only run if the parsing succeeds, and they run all
> at once at the end. Maybe I can move all discussions of parsing success
> vs failure into one place.
I think there might be confusion here because people are used to
full success or full failure, but not to partial success.
(if A) feels even more confusing because it does not actually move
point and does not advance the parser. So it is unclear what success
means, and what the buffer/stack context is when an action is executed.
>>> +@item (list E)
>>> +Match E, collect all values produced by E (and its sub-expressions)
>>> +into a list, and push that list to the stack.
>>> +@end table
>>
>> This one is not very clear. Does it imply that E is recursively wrapped
>> into substring?
>
> It's not very clear because I don't fully understand it! It does not
> implicitly create any value-returning calls (such as `substring'). I
> think what it means is that, by default, values returned by actions are
> all spliced into a single flat list. If you need some of those values to
> be returned in a sub-list, you can use this form.
>
> It's a bit tricky to use because the E in (list E) could potentially
> descend many levels and branch out into any number of sub-expressions,
> so you need to have a clear mental model of what values might ultimately
> be coming out of E. I guess that's also true for the whole thing,
> though.
I also don't fully understand this, but I tried to play around with the
following:
(with-peg-rules
    ((name (substring (+ [word])) (* [blank]))
     (given-name name (not (eol)))
     (last-name (list name) (if (eol)))
     (full-name (list (+ given-name)) last-name))
  (peg-run (peg full-name)))
;; <point>Eric Edwin Abrahamsen
;; => (("Abrahamsen") ("Eric" "Edwin"))
;; Suggested stack states:
;; 1. nil
;; 2. Match Eric via given-name: ("Eric")
;; 3. Match Edwin via given-name: ("Edwin" "Eric")
;; 4. No more match for given-name. List operation: (("Eric" "Edwin"))
;; 5. Match Abrahamsen via last-name. ("Abrahamsen" ("Eric" "Edwin"))
;; 6. Done with last-name. List operation: (("Abrahamsen") ("Eric" "Edwin"))
;; 7. done
So, one may think that the stack values coming from E in (list E) are
simply reversed, wrapped into a list, and pushed back onto the stack.
A kind of grouping operation.
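A smaller sketch that isolates the grouping effect, assuming peg.el 1.0
or later; my-peg-demo-words and the rule names token, flat and nested
are hypothetical. The same repetition is run once bare and once wrapped
in (list ...):

```elisp
(require 'peg)

(defun my-peg-demo-words (grouped)
  "Parse \"a b c\"; wrap the repetition in `list' when GROUPED is non-nil.
Hypothetical helper, not part of peg.el."
  (with-temp-buffer
    (insert "a b c")
    (goto-char (point-min))
    (with-peg-rules
        ;; Each token pushes its text, then skips trailing spaces.
        ((token (substring (+ [a-z])) (* " "))
         (flat (+ token))
         (nested (list (+ token))))
      (if grouped
          (peg-run (peg nested))
        (peg-run (peg flat))))))

;; (my-peg-demo-words nil) ; => ("c" "b" "a")
;; (my-peg-demo-words t)   ; => (("a" "b" "c"))
```

Without list, the returned stack has the most recently pushed value
first; with list, the values pushed inside it come back as one element
in buffer order.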
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2022-11-17 12:21 ` Ihor Radchenko
@ 2022-11-27 1:46 ` Eric Abrahamsen
2022-11-27 8:57 ` Eli Zaretskii
0 siblings, 1 reply; 100+ messages in thread
From: Eric Abrahamsen @ 2022-11-27 1:46 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1869 bytes --]
Ihor Radchenko <yantar92@posteo.net> writes:
> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
>>>> +@item (* E)
>>>> +Zero or more of an expression, as the regexp ``*''.
>>>> +
>>>> +@item (+ E)
>>>> +One or more of an expression, as the regexp ``+''.
>>>
>>> It is worth highlighting the greedy part here and referring to &A and
>>> !A.
>>
>> I don't believe there is separate syntax for &A and !A -- those are
>> written (if A) and (not A).
>
> Indeed. I just felt lazy to write (if A) and (not A) and wrote &A and !A :)
>
> The comment is suggesting to add reference to the (if A)/(not A) and the
> "Writing PEGs" section.
>
>>> Actions are only run when the expression matches; with point moved after
>>> the match, right? What about &A and !A?
>>
>> That's right, actions only run if the parsing succeeds, and they run all
>> at once at the end. Maybe I can move all discussions of parsing success
>> vs failure into one place.
>
> I think that there might be confusion here because people are used to
> full success/full failure but not to partial success.
>
> And (if A) feels even more confusing because it does not actually move
> point and does not advance the parser. So, it is unclear what success
> means and what is the buffer/stack context when action is executed.
Here's a new version, that I hope clarifies these questions (instead of
doing the opposite).
Note that there's an open peg.el bug now (#59345), about whether the
"syntax-class" PEX is supposed to advance point or not -- you'd think
that it would, but it doesn't. No word from the author yet.
Lastly, nobody with a maintainer's hat on has actually given the green
light on this, and I assume we'll want to hold off until the next
version of Emacs is released; anyway it would be good to know what
Eli/Lars think. I haven't done any NEWS additions or anything, either.
Thanks!
Eric
[-- Attachment #2: peg.texi --]
[-- Type: application/x-texinfo, Size: 10028 bytes --]
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2022-11-27 1:46 ` Eric Abrahamsen
@ 2022-11-27 8:57 ` Eli Zaretskii
2022-11-28 1:09 ` Eric Abrahamsen
0 siblings, 1 reply; 100+ messages in thread
From: Eli Zaretskii @ 2022-11-27 8:57 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: yantar92, emacs-devel
> From: Eric Abrahamsen <eric@ericabrahamsen.net>
> Cc: emacs-devel@gnu.org
> Date: Sat, 26 Nov 2022 17:46:04 -0800
>
> Here's a new version, that I hope clarifies these questions (instead of
> doing the opposite).
Thanks, a few minor comments below.
> Lastly, nobody with a maintainer's hat on has actually given the green
> light on this, and I assume we'll want to hold off until the next
> version of Emacs is released; anyway it would be good to know what
> Eli/Lars think. I haven't done any NEWS additions or anything, either.
What exactly are you asking about here?
> @c -*-texinfo-*-
> @c This is part of the GNU Emacs Lisp Reference Manual.
This would mean a suitable change to elisp.texi at the least, and probably
also to another file that is part of the ELisp reference manual sources?
> A @acronym{PEG} parser is defined as a list of named rules, each of
> which match text patterns, and/or contain references to other rules.
^^^^^ ^^^^^^^
"matches" and "contains", in singular.
> Parsing is initiated with the function @code{peg-run} or the macro
> @code{peg-parse}, and parses text after point in the current buffer,
> using a given set of rules.
This function and this macro need to be formally documented with @defun and
@defmac, as we do elsewhere in the ELisp reference.
> The definition of each rule is referred to as a @dfn{parsing
> expression} (@acronym{PEX}), and can consist of a literal string, a
Ideally, each @dfn in the manual should have a @cindex entry, because people
are likely to look up these terms.
> Or set as the value of a variable, and the variable used in a
> combination of calls to @code{with-peg-rules} and @code{peg-run},
> where the ``entry-point'' rule is given explicitly:
This sentence reads awkwardly, because it starts with "Or set". Suggest to
rephrase:
Alternatively, use a variable whose value is a grammar, and use it in a
combination of calls to...
> @example
> (defvar number-grammar
> '((number sign digit (* digit))
> (sign (or "+" "-" ""))
> (digit [0-9])))
Btw, this begs a question: how come the value of the variable is a (quoted)
list, but the value you pass to peg-parse in the previous example was not
quoted?
> By default, calls to @code{peg-run} or @code{peg-parse} produce no
> output: parsing simply moves point. In order to return or otherwise
> act upon parsed strings, rules can include @dfn{actions}, see
> @xref{Parsing Actions} for more information.
Again, a @cindex for "actions" is in order here.
Also, @xref produces a Capitalized "See", so you want a @ref here, not
@xref. And please always follow the closing brace of a cross-reference with
a period or a comma, because some versions of Texinfo insist on that. (The
only exception from this rule is @pxref inside parentheses.)
> Individual rules can also be defined using a more @code{defun}-like
> syntax, using the macro @code{define-peg-rule}:
>
> @example
> (define-peg-rule digit ()
> [0-9])
> @end example
define-peg-rule should be documented with a @defmac.
> @node PEX Definitions
> @section PEX Definitions
There should be a @menu in the parent @chapter's node for all the child
@section nodes. Otherwise, makeinfo will barf.
> @item "abc"
> A literal string.
You don't mean "abc" literally here, do you? The correct way of expressing
"a string" is
@item @var{string}
> @item (char C)
> A single character, as an Elisp character literal.
Likewise here:
@item @var{C}
A single character @var{C}, as a Lisp character literal.
> @item (* E)
> Zero or more of an expression, as the regexp ``*''. Matching is
> always ``greedy''.
Likewise. Basically, all the elements here are meta-syntactic variables:
they stand for something else. The right markup for them is @var.
Also, "zero or more of an expression" reads awkwardly. I don't even think I
understand what you mean.
And please quote regexps using @samp, not literal quotes (here and
elsewhere).
> @item (+ E)
> One or more of an expression, as the regexp ``+''. Matching is always
> ``greedy''.
Likewise about "one or more of an expression".
> @item (opt E)
> Zero or one of an expression, as the regexp ``?''.
Same.
> @item (range A B)
> The character range between A and B, as the regexp ``[A-B]''.
It is better to use CH1 and CH2 instead of A and B.
> @item [a-b "+*" ?x]
> A character set, including ranges, literal characters, or strings of
> characters.
Same comment about a and b.
> @vindex peg-char-classes
> Named character classes include the following:
Instead of listing them, just use a cross-reference to the node where
classes are documented as part of regexp syntax.
> The first action pushes the initial value of point to the stack. The
> intervening @acronym{PEX} moves point over the next word. The second
^^
Two spaces there.
> action pops the previous value from the stack (binding it to the
> variable @code{start}), and uses that value to extract a substring
> from the buffer and push it to the stack. This pattern is so common
> that peg.el provides a shorthand function that does exactly the above,
^^^^^^
@file{peg.el}. Or maybe just @acronym{PEG}.
> @item (substring E)
> Match @acronym{PEX} E and push the matched string to the stack.
Same comments here regarding @var markup of meta-syntactic variables.
> @item (replace E "repl")
> Match E and replaced the matched region with the string "repl".
"repl" is not a literal string, it's a meta-syntactic variable, just like E.
Finally, this needs a lot of index entries to make it a useful reference
that is easily looked up for stuff.
Thanks.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2022-11-27 8:57 ` Eli Zaretskii
@ 2022-11-28 1:09 ` Eric Abrahamsen
2022-11-28 12:16 ` Eli Zaretskii
0 siblings, 1 reply; 100+ messages in thread
From: Eric Abrahamsen @ 2022-11-28 1:09 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: yantar92, emacs-devel
On 11/27/22 10:57 AM, Eli Zaretskii wrote:
>> From: Eric Abrahamsen <eric@ericabrahamsen.net>
>> Cc: emacs-devel@gnu.org
>> Date: Sat, 26 Nov 2022 17:46:04 -0800
>>
>> Here's a new version, that I hope clarifies these questions (instead of
>> doing the opposite).
>
> Thanks, a few minor comments below.
Thank you! I feel like you've given me many of the same notes in the
past (particularly @xref/@ref), I'll get it eventually.
>> Lastly, nobody with a maintainer's hat on has actually given the green
>> light on this, and I assume we'll want to hold off until the next
>> version of Emacs is released; anyway it would be good to know what
>> Eli/Lars think. I haven't done any NEWS additions or anything, either.
>
> What exactly are you asking about here?
Making peg.el a built-in. I looked back over this whole thread and it
turns out you already gave the OK early on, but now I'm not sure if this
would go in Emacs proper, or as a built-in package... So that's my
question. Where is the natural place to put it?
>> @c -*-texinfo-*-
>> @c This is part of the GNU Emacs Lisp Reference Manual.
>
> This would mean a suitable change to elisp.texi at the least, and probably
> also to another file that is part of the ELisp reference manual sources?
This would depend on how, exactly, it gets included.
[...]
>> @example
>> (defvar number-grammar
>> '((number sign digit (* digit))
>> (sign (or "+" "-" ""))
>> (digit [0-9])))
>
> Btw, this begs a question: how come the value of the variable is a (quoted)
> list, but the value you pass to peg-parse in the previous example was not
> quoted?
peg-parse is a macro, peg-run is a function. peg-parse constructs a call
to peg-run, passing in the car of whatever list you've given to it as
the argument.
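As a rough illustration of the macro/function split, here is a sketch
that assumes peg.el 1.0 or later, where peg-parse also accepts a bare
parsing expression; my-peg-demo-entry-points is a made-up name. The
same expression is given once to the macro, unquoted, and once to the
function, as a matcher value built by the peg macro:

```elisp
(require 'peg)

(defun my-peg-demo-entry-points ()
  "Parse the same digits twice; return (PARSE-RESULT RUN-RESULT POINT).
Hypothetical helper, not part of peg.el."
  (with-temp-buffer
    (insert "2022")
    (goto-char (point-min))
    ;; `peg-parse' is a macro: the expression is written unquoted and
    ;; expanded in place.  It signals an error if the match fails.
    (let ((parse-result (peg-parse (+ [0-9]))))
      (goto-char (point-min))
      ;; `peg-run' is a function: it receives a matcher value, here
      ;; built by the `peg' macro and held in an ordinary variable.
      (let* ((matcher (peg (+ [0-9])))
             (run-result (peg-run matcher)))
        (list parse-result run-result (point))))))
```

Because the grammar is consumed at macro-expansion time in both cases,
nothing here is quoted; only the matcher is passed around as a value.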
The rest of your comments seem straightforward, I'll make those edits
now.
Thanks,
Eric
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2022-11-28 1:09 ` Eric Abrahamsen
@ 2022-11-28 12:16 ` Eli Zaretskii
2023-09-25 1:30 ` Eric Abrahamsen
0 siblings, 1 reply; 100+ messages in thread
From: Eli Zaretskii @ 2022-11-28 12:16 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: yantar92, emacs-devel
> From: Eric Abrahamsen <eric@ericabrahamsen.net>
> Cc: yantar92@posteo.net, emacs-devel@gnu.org
> Date: Sun, 27 Nov 2022 17:09:38 -0800
>
> >> Lastly, nobody with a maintainer's hat on has actually given the green
> >> light on this, and I assume we'll want to hold off until the next
> >> version of Emacs is released; anyway it would be good to know what
> >> Eli/Lars think. I haven't done any NEWS additions or anything, either.
> >
> > What exactly are you asking about here?
>
> Making peg.el a built-in. I looked back over this whole thread and it
> turns out you already gave the OK early on, but now I'm not sure if this
> would go in Emacs proper, or as a built-in package...
The former, of course. I'd defer to Stefan if I thought it should go to
ELPA.
> Where is the natural place to put it?
Either in lisp/progmodes or in lisp/emacs-lisp. I prefer the former, FWIW.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Make peg.el a built-in library?
2022-11-08 16:29 ` Juanma Barranquero
@ 2022-12-02 20:20 ` Augusto Stoffel
0 siblings, 0 replies; 100+ messages in thread
From: Augusto Stoffel @ 2022-12-02 20:20 UTC (permalink / raw)
To: Juanma Barranquero
Cc: Visuwesh, Stefan Monnier, Eric Abrahamsen, Ihor Radchenko,
emacs-devel
On Tue, 8 Nov 2022 at 17:29, Juanma Barranquero wrote:
> ...Except that I went to Buenos Aires, coming from São Paulo, and I don't know what did
> the travel agency assume about my origins. All I know is that I was in a hotel lobby and
> suddenly they called for a "Mr. Juan Ríos" and I thought for a moment "curious, that guy's
> got the same name as my maternal grandfather"... Until it dawned on me, a few seconds
> later, that *I* was supposed to be "Mr. Juan Ríos".
That's because the Portuguese system is similar to the Spanish one,
except that the order of names is reversed. Typically, one's last last
name comes from the paternal side, etc.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2022-11-16 4:27 ` [PATCH] " Eric Abrahamsen
2022-11-16 5:07 ` tomas
2022-11-16 6:24 ` Ihor Radchenko
@ 2023-01-11 7:39 ` Michael Heerdegen
2023-01-11 8:04 ` Ihor Radchenko
2 siblings, 1 reply; 100+ messages in thread
From: Michael Heerdegen @ 2023-01-11 7:39 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: emacs-devel
Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> Okay, here's a first stab. I read the paper, and understood about half
> of it, which seemed like enough. It was interesting to see that the
> paper explicitly calls out the exact greedy-matching behavior I'd
> encountered.
I missed this discussion.
Two points from my side:
- When you have worked in all comments could you please post an
up-to-date version of your additions to the manual for me to review?
- When I read that paper, the outcome was an rx-to-peg.el
translator. If someone is interested I can attach it. This was some
time ago, and I no longer know as much about PEGs as I did
at that time.
Michael.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2023-01-11 7:39 ` Michael Heerdegen
@ 2023-01-11 8:04 ` Ihor Radchenko
2023-01-11 11:01 ` Michael Heerdegen
0 siblings, 1 reply; 100+ messages in thread
From: Ihor Radchenko @ 2023-01-11 8:04 UTC (permalink / raw)
To: Michael Heerdegen; +Cc: Eric Abrahamsen, emacs-devel
Michael Heerdegen <michael_heerdegen@web.de> writes:
> - When I read that paper, the outcome was an rx-to-peg.el
> translator. If someone is interested I can attach it. This was some
> time ago, and I no longer know as much about PEGs as I did
> at that time.
I am wondering if we may instead just support traditional regexps as an
extra PEG construct. Considering that regexp support is anyway built-in,
why not?
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2023-01-11 8:04 ` Ihor Radchenko
@ 2023-01-11 11:01 ` Michael Heerdegen
2023-01-11 11:32 ` tomas
2023-02-05 12:10 ` Ihor Radchenko
0 siblings, 2 replies; 100+ messages in thread
From: Michael Heerdegen @ 2023-01-11 11:01 UTC (permalink / raw)
To: emacs-devel
Ihor Radchenko <yantar92@posteo.net> writes:
> I am wondering if we may instead just support traditional regexps as an
> extra PEG construct. Considering that regexp support is anyway built-in,
> why not?
Dunno. I wrote the translator for academic interest and for
learning.
AFAIR not all Emacs regexp features work in PEGs - backrefs for example.
Or match data handling.
Michael.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2023-01-11 11:01 ` Michael Heerdegen
@ 2023-01-11 11:32 ` tomas
2023-02-05 12:10 ` Ihor Radchenko
1 sibling, 0 replies; 100+ messages in thread
From: tomas @ 2023-01-11 11:32 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 596 bytes --]
On Wed, Jan 11, 2023 at 12:01:43PM +0100, Michael Heerdegen wrote:
> Ihor Radchenko <yantar92@posteo.net> writes:
>
> > I am wondering if we may instead just support traditional regexps as an
> > extra PEG construct. Considering that regexp support is anyway built-in,
> > why not?
>
> Dunno. I wrote the translator for academic interest and for
> learning.
>
> AFAIR not all Emacs regexp features work in PEGs - backrefs for example.
> Or match data handling.
Plus, I don't think a PEG packrat parser is always as efficient as the
regexps we know and love.
Cheers
--
t
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2023-01-11 11:01 ` Michael Heerdegen
2023-01-11 11:32 ` tomas
@ 2023-02-05 12:10 ` Ihor Radchenko
2023-02-05 15:41 ` Eduardo Ochs
2023-02-06 0:33 ` Michael Heerdegen
1 sibling, 2 replies; 100+ messages in thread
From: Ihor Radchenko @ 2023-02-05 12:10 UTC (permalink / raw)
To: Michael Heerdegen; +Cc: emacs-devel
Michael Heerdegen <michael_heerdegen@web.de> writes:
> Ihor Radchenko <yantar92@posteo.net> writes:
>
>> I am wondering if we may instead just support traditional regexps as an
>> extra PEG construct. Considering that regexp support is anyway built-in,
>> why not?
>
> Dunno. I wrote the translator for academic interest and for
> learning.
>
> AFAIR not all Emacs regexp features work in PEGs - backrefs for example.
> Or match data handling.
Sure. But if we make Emacs regexp a valid PEG construct, they will work.
It is the whole point.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2023-02-05 12:10 ` Ihor Radchenko
@ 2023-02-05 15:41 ` Eduardo Ochs
2023-02-05 15:45 ` Ihor Radchenko
2023-02-09 5:44 ` Jean Louis
2023-02-06 0:33 ` Michael Heerdegen
1 sibling, 2 replies; 100+ messages in thread
From: Eduardo Ochs @ 2023-02-05 15:41 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: Michael Heerdegen, emacs-devel
On Sun, 5 Feb 2023 at 09:10, Ihor Radchenko <yantar92@posteo.net> wrote:
>
> Michael Heerdegen <michael_heerdegen@web.de> writes:
>
> > Ihor Radchenko <yantar92@posteo.net> writes:
> >
> >> I am wondering if we may instead just support traditional regexps as an
> >> extra PEG construct. Considering that regexp support is anyway built-in,
> >> why not?
> >
> > Dunno. I wrote the translator for academic interest and for
> > learning.
> >
> > AFAIR not all Emacs regexp features work in PEGs - backrefs for example.
> > Or match data handling.
>
> Sure. But if we make Emacs regexp a valid PEG construct, they will work.
> It is the whole point.
I played a bit with peg.el some time ago - it is very elegant and it's
very easy to inspect how it does things, but it is much slower than
Lua's LPEG. I'm now using this to write my parsers:
https://github.com/edubart/lpegrex
https://github.com/edubart/lpegrex/blob/main/parsers/lua.lua
The second link above is an example - a parser for Lua written in
Lpegrex.
I'm starting to use this thing, that lets me run a Lua interpreter
inside Emacs as a module,
https://github.com/edrx/emlua/#introduction
to call lpegrex parsers to parse parts of Emacs buffers. The result -
let me call it lpegrex+emlua - is very fragile because I'm too bad &
lazy with C programming to implement better error handling in emlua,
but if anyone else wants to play with lpegrex+emlua I can create a
page with instructions...
Cheers,
Eduardo Ochs
http://anggtwu.net/#eev
http://anggtwu.net/eepitch.html
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2023-02-05 15:41 ` Eduardo Ochs
@ 2023-02-05 15:45 ` Ihor Radchenko
2023-02-05 16:19 ` Eduardo Ochs
2023-02-09 5:44 ` Jean Louis
1 sibling, 1 reply; 100+ messages in thread
From: Ihor Radchenko @ 2023-02-05 15:45 UTC (permalink / raw)
To: Eduardo Ochs; +Cc: Michael Heerdegen, emacs-devel
Eduardo Ochs <eduardoochs@gmail.com> writes:
>> > AFAIR not all Emacs regexp features work in PEGs - backrefs for example.
>> > Or match data handling.
>>
>> Sure. But if we make Emacs regexp a valid PEG construct, they will work.
>> It is the whole point.
>
> I played a bit with peg.el some time ago - it is very elegant and it's
> very easy to inspect how it does things, but it is much slower than
> Lua's LPEG...
What do you mean by slower?
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2023-02-05 15:45 ` Ihor Radchenko
@ 2023-02-05 16:19 ` Eduardo Ochs
2023-02-05 16:50 ` Ihor Radchenko
0 siblings, 1 reply; 100+ messages in thread
From: Eduardo Ochs @ 2023-02-05 16:19 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: Michael Heerdegen, emacs-devel
On Sun, 5 Feb 2023 at 12:44, Ihor Radchenko <yantar92@posteo.net> wrote:
>
> Eduardo Ochs <eduardoochs@gmail.com> writes:
>
> >> > AFAIR not all Emacs regexp features work in PEGs - backrefs for example.
> >> > Or match data handling.
> >>
> >> Sure. But if we make Emacs regexp a valid PEG construct, they will work.
> >> It is the whole point.
> >
> > I played a bit with peg.el some time ago - it is very elegant and it's
> > very easy to inspect how it does things, but it is much slower than
> > Lua's LPEG...
>
> What do you mean by slower?
I wrote a simple peg.el parser and it took two seconds to parse an
input of just 2KB. The lpeg parser that I use to htmlize some of
my files takes about 0.5s to parse a 3MB file and to return the
htmlized version.
[[]], E.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2023-02-05 16:19 ` Eduardo Ochs
@ 2023-02-05 16:50 ` Ihor Radchenko
0 siblings, 0 replies; 100+ messages in thread
From: Ihor Radchenko @ 2023-02-05 16:50 UTC (permalink / raw)
To: Eduardo Ochs; +Cc: Michael Heerdegen, emacs-devel
Eduardo Ochs <eduardoochs@gmail.com> writes:
>> What do you mean by slower?
>
> I wrote a simple peg.el parser and it took two seconds to parse an
> input of just 2KB. The lpeg parser that I use to htmlize some of
> my files takes about 0.5s to parse a 3MB file and to return the
> htmlized version.
There is no fundamental reason why Elisp peg implementation should be
that much slower. I suggest filing a bug report.
I am also wondering if you can share the example file and grammar.
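For a reproducible report, something along these lines could serve as a
starting point. It is only a sketch: the input and the grammar are
placeholders, not the ones from your test, and my-peg-demo-benchmark is
a hypothetical name. It assumes peg.el 1.0 or later and uses the
standard benchmark-run macro:

```elisp
(require 'peg)
(require 'benchmark)

(defun my-peg-demo-benchmark (words repetitions)
  "Time REPETITIONS peg.el parses of a buffer holding WORDS short words.
Return the list (ELAPSED-SECONDS GC-RUNS GC-SECONDS) from `benchmark-run'.
Hypothetical helper, not part of peg.el."
  (with-temp-buffer
    ;; Placeholder input: WORDS copies of "word ", 5 bytes each.
    (dotimes (_ words)
      (insert "word "))
    (benchmark-run repetitions
      (goto-char (point-min))
      ;; Placeholder grammar: runs of lowercase letters separated by spaces.
      (peg-run (peg (* (and (+ [a-z]) (* " "))))))))

;; (my-peg-demo-benchmark 400 10) ; about 2KB of input, parsed ten times
```

Swapping in the real grammar and file would show whether the slowdown
comes from peg.el itself or from a particular rule.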
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2023-02-05 12:10 ` Ihor Radchenko
2023-02-05 15:41 ` Eduardo Ochs
@ 2023-02-06 0:33 ` Michael Heerdegen
1 sibling, 0 replies; 100+ messages in thread
From: Michael Heerdegen @ 2023-02-06 0:33 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: emacs-devel
Ihor Radchenko <yantar92@posteo.net> writes:
> Sure. But if we make Emacs regexp a valid PEG construct, they will work.
> It is the whole point.
I misunderstood what you meant. That could make sense, and it should
not be hard to do.
Michael.
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2023-02-05 15:41 ` Eduardo Ochs
2023-02-05 15:45 ` Ihor Radchenko
@ 2023-02-09 5:44 ` Jean Louis
1 sibling, 0 replies; 100+ messages in thread
From: Jean Louis @ 2023-02-09 5:44 UTC (permalink / raw)
To: Eduardo Ochs; +Cc: Ihor Radchenko, Michael Heerdegen, emacs-devel
* Eduardo Ochs <eduardoochs@gmail.com> [2023-02-05 18:42]:
> https://github.com/edrx/emlua/#introduction
>
> to call lpegrex parsers to parse parts of Emacs buffers. The result -
> let me call it lpegrex+emlua - is very fragile because I'm too bad &
> lazy with C programming to implement better error handling in emlua,
> but if anyone else wants to play with lpegrex+emlua I can create a
> page with instructions...
I just recommend not relying on Github, rather on free software
repositories.
Savannah on nongnu.org:
https://savannah.nongnu.org
Savannah, the software forge for people committed to free software:
https://savannah.gnu.org
Codeberg.org (Germany):
https://codeberg.org
Sourcehut.org:
https://sourcehut.org
Pagure:
https://pagure.io/pagure
Trisquel GNU/Linux-libre Git Repositories:
https://devel.trisquel.info/groups/trisquel
GitGud - Fast and Free Git Hosting:
https://gitgud.io/users/sign_in
Fosshost:
https://fosshost.org/
--
Jean
Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns
In support of Richard M. Stallman
https://stallmansupport.org/
^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: [PATCH] Re: Make peg.el a built-in library?
2022-11-28 12:16 ` Eli Zaretskii
@ 2023-09-25 1:30 ` Eric Abrahamsen
2023-09-25 2:27 ` Adam Porter
2024-03-24 14:19 ` Ihor Radchenko
0 siblings, 2 replies; 100+ messages in thread
From: Eric Abrahamsen @ 2023-09-25 1:30 UTC (permalink / raw)
To: emacs-devel; +Cc: Michael Heerdegen, Eli Zaretskii, Stefan Monnier, yantar92
[-- Attachment #1: Type: text/plain, Size: 1615 bytes --]
Before another year goes by...
Since my last attempt at this, Stefan has made some additions to the
ELPA version of peg.el (adding him to cc in case he wants to look at
this), and I have realized that my last stab at the manual inadvertently
documented some local changes I had made and then forgotten about.
So here's a commit adding package, tests, and manual all at once. I've
cc'd the people who indicated interest. The manual should be up to date
with the code, I hope I've managed to follow all the pointers, and I
believe I've done a better job of explaining how to use the various
entry points of the library.
I hope this looks okay!
Thanks,
Eric
On 11/28/22 14:16 PM, Eli Zaretskii wrote:
>> From: Eric Abrahamsen <eric@ericabrahamsen.net>
>> Cc: yantar92@posteo.net, emacs-devel@gnu.org
>> Date: Sun, 27 Nov 2022 17:09:38 -0800
>>
>> >> Lastly, nobody with a maintainer's hat on has actually given the green
>> >> light on this, and I assume we'll want to hold off until the next
>> >> version of Emacs is released; anyway it would be good to know what
>> >> Eli/Lars think. I haven't done any NEWS additions or anything, either.
>> >
>> > What exactly are you asking about here?
>>
>> Making peg.el a built-in. I looked back over this whole thread and it
>> turns out you already gave the OK early on, but now I'm not sure if this
>> would go in Emacs proper, or as a built-in package...
>
> The former, of course. I'd defer to Stefan if I thought it should go to
> ELPA.
>
>> Where is the natural place to put it?
>
> Either in lisp/progmodes or in lisp/emacs-lisp. I prefer the former, FWIW.
[-- Attachment #2: 0001-Add-peg.el-as-a-built-in-library.patch --]
[-- Type: text/x-patch, Size: 65704 bytes --]
From a8d1b3ad3162e92b4f8c8dd52690d9c1f3333661 Mon Sep 17 00:00:00 2001
From: Eric Abrahamsen <eric@ericabrahamsen.net>
Date: Mon, 5 Dec 2022 21:59:03 -0800
Subject: [PATCH] Add peg.el as a built-in library
* lisp/progmodes/peg.el: New file, taken from ELPA package.
* test/lisp/peg-tests.el: Package tests.
* doc/lispref/peg.texi: Documentation.
---
doc/lispref/Makefile.in | 1 +
doc/lispref/elisp.texi | 2 +
doc/lispref/peg.texi | 351 +++++++++++++++
lisp/progmodes/peg.el | 944 ++++++++++++++++++++++++++++++++++++++++
test/lisp/peg-tests.el | 367 ++++++++++++++++
5 files changed, 1665 insertions(+)
create mode 100644 doc/lispref/peg.texi
create mode 100644 lisp/progmodes/peg.el
create mode 100644 test/lisp/peg-tests.el
diff --git a/doc/lispref/Makefile.in b/doc/lispref/Makefile.in
index 325f23a3c0f..8ac1242996d 100644
--- a/doc/lispref/Makefile.in
+++ b/doc/lispref/Makefile.in
@@ -112,6 +112,7 @@ srcs =
$(srcdir)/os.texi \
$(srcdir)/package.texi \
$(srcdir)/parsing.texi \
+ $(srcdir)/peg.texi \
$(srcdir)/positions.texi \
$(srcdir)/processes.texi \
$(srcdir)/records.texi \
diff --git a/doc/lispref/elisp.texi b/doc/lispref/elisp.texi
index 72441c8d442..e12f61fc7eb 100644
--- a/doc/lispref/elisp.texi
+++ b/doc/lispref/elisp.texi
@@ -222,6 +222,7 @@ Top
* Non-ASCII Characters:: Non-ASCII text in buffers and strings.
* Searching and Matching:: Searching buffers for strings or regexps.
* Syntax Tables:: The syntax table controls word and list parsing.
+* Parsing Expression Grammars:: Parsing structured buffer text.
* Parsing Program Source:: Generate syntax tree for program sources.
* Abbrevs:: How Abbrev mode works, and its data structures.
@@ -1719,6 +1720,7 @@ Top
@include searching.texi
@include syntax.texi
+@include peg.texi
@include parsing.texi
@include abbrevs.texi
@include threads.texi
diff --git a/doc/lispref/peg.texi b/doc/lispref/peg.texi
new file mode 100644
index 00000000000..64950f148b1
--- /dev/null
+++ b/doc/lispref/peg.texi
@@ -0,0 +1,351 @@
+@c -*-texinfo-*-
+@c This is part of the GNU Emacs Lisp Reference Manual.
+@c Copyright (C) 1990--1995, 1998--1999, 2001--2023 Free Software
+@c Foundation, Inc.
+@c See the file elisp.texi for copying conditions.
+@node Parsing Expression Grammars
+@chapter Parsing Expression Grammars
+@cindex text parsing
+@cindex parsing expression grammar
+
+ Emacs Lisp provides several tools for parsing and matching text,
+from regular expressions (@pxref{Regular Expressions}) to full
+@acronym{LL} grammar parsers (@pxref{Top,, Bovine parser
+development,bovine}). @dfn{Parsing Expression Grammars}
+(@acronym{PEG}) are another approach to text parsing that offer more
+structure and composability than regular expressions, but less
+complexity than context-free grammars.
+
+A @acronym{PEG} parser is defined as a list of named rules, each of
+which matches text patterns, and/or contains references to other
+rules. Parsing is initiated with the function @code{peg-run} or the
+macro @code{peg-parse} (see below), and parses text after point in the
+current buffer, using a given set of rules.
+
+@cindex parsing expression
+The definition of each rule is referred to as a @dfn{parsing
+expression} (@acronym{PEX}), and can consist of a literal string, a
+regexp-like character range or set, a peg-specific construct
+resembling an elisp function call, a reference to another rule, or a
+combination of any of these. A grammar is expressed as a tree of
+rules in which one rule is typically treated as a ``root'' or
+``entry-point'' rule. For instance:
+
+@example
+@group
+((number sign digit (* digit))
+ (sign (or "+" "-" ""))
+ (digit [0-9]))
+@end group
+@end example
+
+Once defined, grammars can be used to parse text after point in the
+current buffer, in the following ways:
+
+@defmac peg-parse &rest pexs
+Match @var{pexs} at point. If @var{pexs} is a list of PEG rules, the
+first rule is considered the ``entry-point'':
+@end defmac
+
+@example
+@group
+(peg-parse
+ ((number sign digit (* digit))
+ (sign (or "+" "-" ""))
+ (digit [0-9])))
+@end group
+@end example
+
+This macro represents the simplest use of the @acronym{PEG} library,
+but also the least flexible, as the rules must be written directly
+into the source code. A more flexible approach involves use of three
+macros in conjunction: @code{with-peg-rules}, a @code{let}-like
+construct that makes a set of rules available within the macro body;
+@code{peg-run}, which initiates parsing given a single rule; and
+@code{peg}, which is used to wrap the entry-point rule name. In fact,
+a call to @code{peg-parse} expands to just this set of calls. The
+above example could be written as:
+
+@example
+@group
+(with-peg-rules
+ ((number sign digit (* digit))
+ (sign (or "+" "-" ""))
+ (digit [0-9]))
+ (peg-run (peg number)))
+@end group
+@end example
+
+This allows more explicit control over the ``entry-point'' of parsing,
+and allows the combination of rules from different sources.
+
+Individual rules can also be defined using a more @code{defun}-like
+syntax, using the macro @code{define-peg-rule}:
+
+@example
+(define-peg-rule digit ()
+ [0-9])
+@end example
+
+This also allows for rules that accept an argument (supplied by the
+@code{funcall} PEG rule).
+
+Another possibility is to define a named set of rules with
+@code{define-peg-ruleset}:
+
+@example
+(define-peg-ruleset number-grammar
+  ;; `digit' refers to the rule defined above.
+  (number () sign digit (* digit))
+  (sign () (or "+" "-" "")))
+@end example
+
+Rules and rulesets defined this way can be referred to by name in
+later calls to @code{peg-run} or @code{with-peg-rules}:
+
+@example
+(with-peg-rules (number-grammar)
+ (peg-run (peg number)))
+@end example
+
+By default, calls to @code{peg-run} or @code{peg-parse} produce no
+output: parsing simply moves point. In order to return or otherwise
+act upon parsed strings, rules can include @dfn{actions}, see
+@ref{Parsing Actions}.
+
+@menu
+* PEX Definitions:: The syntax of PEX rules.
+* Parsing Actions:: Running actions upon successful parsing.
+* Writing PEG Rules:: Tips for writing parsing rules.
+@end menu
+
+@node PEX Definitions
+@section PEX Definitions
+
+Parsing expressions can be defined using the following syntax:
+
+@table @code
+@item (and E1 E2 ...)
+A sequence of @acronym{PEX}s that must all be matched. The @code{and} form is
+optional and implicit.
+
+@item (or E1 E2 ...)
+Prioritized choices, meaning that, as in Elisp, the choices are tried
+in order, and the first successful match is used. Note that this is
+distinct from context-free grammars, in which selection between
+multiple matches is indeterminate.
+
+@item (any)
+Matches any single character, as the regexp ``.''.
+
+@item @var{string}
+A literal string.
+
+@item (char @var{C})
+A single character @var{C}, as an Elisp character literal.
+
+@item (* @var{E})
+Zero or more instances of expression @var{E}, as the regexp @samp{*}.
+Matching is always ``greedy''.
+
+@item (+ @var{E})
+One or more instances of expression @var{E}, as the regexp @samp{+}.
+Matching is always ``greedy''.
+
+@item (opt @var{E})
+Zero or one instance of expression @var{E}, as the regexp @samp{?}.
+
+@item SYMBOL
+A symbol representing a previously-defined PEG rule.
+
+@item (range CH1 CH2)
+The character range between CH1 and CH2, as the regexp @samp{[CH1-CH2]}.
+
+@item [CH1-CH2 "+*" ?x]
+A character set, which can include ranges, character literals, or
+strings of characters.
+
+@item [ascii cntrl]
+A list of named character classes.
+
+@item (syntax-class @var{NAME})
+A single syntax class.
+
+@item (funcall E ARGS...)
+Call @acronym{PEX} E (previously defined with @code{define-peg-rule})
+with arguments @var{ARGS}.
+
+@item (null)
+The empty string.
+
+@end table
+
+The following expressions are used as anchors or tests -- they do not
+move point, but return a boolean value which can be used to constrain
+matches as a way of controlling the parsing process (@pxref{Writing
+PEG Rules}).
+
+@table @code
+@item (bob)
+Beginning of buffer.
+
+@item (eob)
+End of buffer.
+
+@item (bol)
+Beginning of line.
+
+@item (eol)
+End of line.
+
+@item (bow)
+Beginning of word.
+
+@item (eow)
+End of word.
+
+@item (bos)
+Beginning of symbol.
+
+@item (eos)
+End of symbol.
+
+@item (if E)
+Returns non-@code{nil} if parsing @acronym{PEX} E from point succeeds (point
+is not moved).
+
+@item (not E)
+Returns non-@code{nil} if parsing @acronym{PEX} E from point fails (point
+is not moved).
+
+@item (guard EXP)
+Treats the value of the Lisp expression EXP as a boolean.
+
+@end table
+
+@vindex peg-char-classes
+Character class matching can use the same named character classes as
+in regular expressions (@pxref{Top,, Character Classes,elisp}).
+
+@node Parsing Actions
+@section Parsing Actions
+
+@cindex parsing actions
+@cindex parsing stack
+By default the process of parsing simply moves point in the current
+buffer, ultimately returning @code{t} if the parsing succeeds, and
+@code{nil} if it doesn't. It's also possible to define ``actions''
+that can run arbitrary Elisp at certain points in the parsed text.
+These actions can optionally affect something called the @dfn{parsing
+stack}, which is a list of values returned by the parsing process.
+These actions only run (and only return values) if the parsing process
+ultimately succeeds; if it fails the action code is not run at all.
+
+Actions can be added anywhere in the definition of a rule. They are
+distinguished from parsing expressions by an initial backquote
+(@samp{`}), followed by a parenthetical form that must contain a pair
+of hyphens (@samp{--}) somewhere within it. Symbols to the left of
+the hyphens are bound to values popped from the stack (they are
+somewhat analogous to the argument list of a lambda form). Values
+produced by code to the right are pushed to the stack (analogous to
+the return value of the lambda). For instance, the previous grammar
+can be augmented with actions to return the parsed number as an actual
+integer:
+
+@example
+(with-peg-rules ((number sign digit (* digit
+ `(a b -- (+ (* a 10) b)))
+ `(sign val -- (* sign val)))
+ (sign (or (and "+" `(-- 1))
+ (and "-" `(-- -1))
+ (and "" `(-- 1))))
+ (digit [0-9] `(-- (- (char-before) ?0))))
+ (peg-run (peg number)))
+@end example
+
+There must be values on the stack before they can be popped and
+returned -- if there aren't enough stack values to bind to an action's
+left-hand terms, they will be bound to @code{nil}. An action with
+only right-hand terms will push values to the stack; an action with
+only left-hand terms will consume (and discard) values from the stack.
+At the end of parsing, stack values are returned as a flat list.
+
+To return the string matched by a @acronym{PEX} (instead of simply
+moving point over it), a rule like this can be used:
+
+@example
+(one-word
+ `(-- (point))
+ (+ [word])
+ `(start -- (buffer-substring start (point))))
+@end example
+
+The first action pushes the initial value of point to the stack. The
+intervening @acronym{PEX} moves point over the next word. The second
+action pops the previous value from the stack (binding it to the
+variable @code{start}), and uses that value to extract a substring
+from the buffer and push it to the stack. This pattern is so common
+that @acronym{PEG} provides a shorthand function that does exactly the
+above, along with a few other shorthands for common scenarios:
+
+@table @code
+@item (substring @var{E})
+Match @acronym{PEX} @var{E} and push the matched string to the stack.
+
+@item (region @var{E})
+Match @var{E} and push the start and end positions of the matched
+region to the stack.
+
+@item (replace @var{E} @var{replacement})
+Match @var{E} and replace the matched region with the string @var{replacement}.
+
+@item (list @var{E})
+Match @var{E}, collect all values produced by @var{E} (and its
+sub-expressions) into a list, and push that list to the stack. Stack
+values are typically returned as a flat list; this is a way of
+``grouping'' values together.
+@end table
+
+@node Writing PEG Rules
+@section Writing PEG Rules
+
+Something to be aware of when writing PEG rules is that they are
+greedy. Rules which can consume a variable amount of text will always
+consume the maximum amount possible, even if that causes a rule that
+might otherwise have matched to fail later on -- there is no
+backtracking. For instance, this rule will never succeed:
+
+@example
+(forest (+ "tree" (* [blank])) "tree" (eol))
+@end example
+
+The @acronym{PEX} @code{(+ "tree" (* [blank]))} will consume all
+repetitions of the word ``tree'', leaving none to match the final
+@code{"tree"}.
+
+In these situations, the desired result can be obtained by using
+predicates and guards -- namely the @code{not}, @code{if} and
+@code{guard} expressions -- to constrain behavior. For instance:
+
+@example
+(forest (+ "tree" (* [blank]) (not (eol))) "tree" (eol))
+@end example
+
+The @code{if} and @code{not} operators accept a parsing expression and
+interpret it as a boolean, without moving point. The contents of a
+@code{guard} operator are evaluated as regular Lisp (not a
+@acronym{PEX}) and should return a boolean value. A @code{nil} value
+causes the match to fail.
+
+Another potentially unexpected behavior is that parsing will move
+point as far as possible, even if the parsing ultimately fails. This
+rule:
+
+@example
+(end-game "game" (eob))
+@end example
+
+when run in a buffer containing the text ``game over'' after point,
+will move point to just after ``game'' then halt parsing, returning
+@code{nil}. Successful parsing will always return @code{t}, or the
+contents of the parsing stack.
diff --git a/lisp/progmodes/peg.el b/lisp/progmodes/peg.el
new file mode 100644
index 00000000000..2eb4a7384d0
--- /dev/null
+++ b/lisp/progmodes/peg.el
@@ -0,0 +1,944 @@
+;;; peg.el --- Parsing Expression Grammars in Emacs Lisp -*- lexical-binding:t -*-
+
+;; Copyright (C) 2008-2023 Free Software Foundation, Inc.
+;;
+;; Author: Helmut Eller <eller.helmut@gmail.com>
+;; Maintainer: Stefan Monnier <monnier@iro.umontreal.ca>
+;; Version: 1.0.1
+;;
+;; This program is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+;;
+;; This program is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with this program. If not, see <https://www.gnu.org/licenses/>.
+;;
+;;; Commentary:
+;;
+;; This package implements Parsing Expression Grammars for Emacs Lisp.
+
+;; Parsing Expression Grammars (PEG) are a formalism in the spirit of
+;; Context Free Grammars (CFG) with some simplifications which make
+;; the implementation of PEGs as recursive descent parsers particularly
+;; simple and easy to understand [Ford, Baker].
+;; PEGs are more expressive than regexps and potentially easier to use.
+;;
+;; This file implements the macros `define-peg-rule', `with-peg-rules', and
+;; `peg-parse' which parses the current buffer according to a PEG.
+;; E.g. we can match integers with:
+;;
+;; (with-peg-rules
+;; ((number sign digit (* digit))
+;; (sign (or "+" "-" ""))
+;; (digit [0-9]))
+;; (peg-run (peg number)))
+;; or
+;; (define-peg-rule digit ()
+;; [0-9])
+;; (peg-parse (number sign digit (* digit))
+;; (sign (or "+" "-" "")))
+;;
+;; In contrast to regexps, PEGs allow us to define recursive "rules".
+;; A "grammar" is a set of rules. A rule is written as (NAME PEX...)
+;; E.g. (sign (or "+" "-" "")) is a rule with the name "sign".
+;; The syntax for PEX (Parsing Expression) is as follows:
+;;
+;; Description Lisp Traditional, as in Ford's paper
+;; =========== ==== ===========
+;; Sequence (and E1 E2) e1 e2
+;; Prioritized Choice (or E1 E2) e1 / e2
+;; Not-predicate (not E) !e
+;; And-predicate (if E) &e
+;; Any character (any) .
+;; Literal string "abc" "abc"
+;; Character C (char C) 'c'
+;; Zero-or-more (* E) e*
+;; One-or-more (+ E) e+
+;; Optional (opt E) e?
+;; Non-terminal SYMBOL A
+;; Character range (range A B) [a-b]
+;; Character set [a-b "+*" ?x] [a-b+*x] ;Note: it's a vector
+;; Character classes [ascii cntrl]
+;; Boolean-guard (guard EXP)
+;; Syntax-Class (syntax-class NAME)
+;; Local definitions (with RULES PEX...)
+;; Indirect call (funcall EXP ARGS...)
+;; and
+;; Empty-string (null) ε
+;; Beginning-of-Buffer (bob)
+;; End-of-Buffer (eob)
+;; Beginning-of-Line (bol)
+;; End-of-Line (eol)
+;; Beginning-of-Word (bow)
+;; End-of-Word (eow)
+;; Beginning-of-Symbol (bos)
+;; End-of-Symbol (eos)
+;;
+;; Rules can refer to other rules, and a grammar is often structured
+;; as a tree, with a root rule referring to one or more "branch
+;; rules", all the way down to the "leaf rules" that deal with actual
+;; buffer text. Rules can be recursive or mutually referential,
+;; though care must be taken not to create infinite loops.
+;;
+;;;; Named rulesets:
+;;
+;; You can define a set of rules for later use with:
+;;
+;; (define-peg-ruleset myrules
+;; (sign () (or "+" "-" ""))
+;; (digit () [0-9])
+;; (nat () digit (* digit))
+;; (int () sign digit (* digit))
+;; (float () int "." nat))
+;;
+;; and later refer to it:
+;;
+;; (with-peg-rules
+;; (myrules
+;; (complex float "+i" float))
+;; ... (peg-parse nat "," nat "," complex) ...)
+;;
+;;;; Parsing actions:
+;;
+;; PEXs also support parsing actions, i.e. Lisp snippets which are
+;; executed when a pex matches. This can be used to construct syntax
+;; trees or for similar tasks. The most basic form of action is
+;; written as:
+;;
+;; (action FORM) ; evaluate FORM for its side-effects
+;;
+;; Actions don't consume input, but are executed at the point of
+;; match. Another kind of action is called a "stack action", and
+;; looks like this:
+;;
+;; `(VAR... -- FORM...) ; stack action
+;;
+;; A stack action takes VARs from the "value stack" and pushes the
+;; results of evaluating FORMs to that stack.
+
+;; The value stack is created during the course of parsing. Certain
+;; operators (see below) that match buffer text can push values onto
+;; this stack. "Upstream" rules can then draw values from the stack,
+;; and optionally push new ones back. For instance, consider this
+;; very simple grammar:
+;;
+;; (with-peg-rules
+;; ((query (+ term) (eol))
+;; (term key ":" value (opt (+ [space]))
+;; `(k v -- (cons (intern k) v)))
+;; (key (substring (and (not ":") (+ [word]))))
+;; (value (or string-value number-value))
+;; (string-value (substring (+ [alpha])))
+;; (number-value (substring (+ [digit]))
+;; `(val -- (string-to-number val))))
+;; (peg-run (peg query)))
+;;
+;; This invocation of `peg-run' would parse this buffer text:
+;;
+;; name:Jane age:30
+;;
+;; And return this Elisp sexp:
+;;
+;; ((age . 30) (name . "Jane"))
+;;
+;; Note that, in complex grammars, some care must be taken to make
+;; sure that the number and type of values drawn from the stack always
+;; match those pushed. In the example above, both `string-value' and
+;; `number-value' push a single value to the stack. Since the `value'
+;; rule only includes these two sub-rules, any upstream rule that
+;; makes use of `value' can be confident it will always and only push
+;; a single value to the stack.
+;;
+;; Stack action forms are in a sense analogous to lambda forms: the
+;; symbols before the "--" are the equivalent of lambda arguments,
+;; while the forms after the "--" are return values. The difference
+;; being that a lambda form can only return a single value, while a
+;; stack action can push multiple values onto the stack. It's also
+;; perfectly valid to use `(-- FORM...)' or `(VAR... --)': the former
+;; pushes values to the stack without consuming any, and the latter
+;; pops values from the stack and discards them.
+;;
+;;;; Derived Operators:
+;;
+;; The following operators are implemented as combinations of
+;; primitive expressions:
+;;
+;; (substring E) ; Match E and push the substring for the matched region.
+;; (region E) ; Match E and push the start and end positions.
+;; (replace E RPL); Match E and replace the matched region with RPL.
+;; (list E) ; Match E and push a list of the items that E produced.
+;;
+;; See `peg-ex-parse-int' in `peg-tests.el' for further examples.
+;;
+;; Regexp equivalents:
+;;
+;; Here are some examples for regexps and how those could be written as pex.
+;; [Most are taken from rx.el]
+;;
+;; "^[a-z]*"
+;; (and (bol) (* [a-z]))
+;;
+;; "\n[^ \t]"
+;; (and "\n" (not [" \t"]) (any))
+;;
+;; "\\*\\*\\* EOOH \\*\\*\\*\n"
+;; "*** EOOH ***\n"
+;;
+;; "\\<\\(catch\\|finally\\)\\>[^_]"
+;; (and (bow) (or "catch" "finally") (eow) (not "_") (any))
+;;
+;; "[ \t\n]*:\\([^:]+\\|$\\)"
+;; (and (* [" \t\n"]) ":" (or (+ (not ":") (any)) (eol)))
+;;
+;; "^content-transfer-encoding:\\(\n?[\t ]\\)*quoted-printable\\(\n?[\t ]\\)*"
+;; (and (bol)
+;; "content-transfer-encoding:"
+;; (* (opt "\n") ["\t "])
+;; "quoted-printable"
+;; (* (opt "\n") ["\t "]))
+;;
+;; "\\$[I]d: [^ ]+ \\([^ ]+\\) "
+;; (and "$Id: " (+ (not " ") (any)) " " (+ (not " ") (any)) " ")
+;;
+;; "^;;\\s-*\n\\|^\n"
+;; (or (and (bol) ";;" (* (syntax-class whitespace)) "\n")
+;; (and (bol) "\n"))
+;;
+;; "\\\\\\\\\\[\\w+"
+;; (and "\\\\[" (+ (syntax-class word)))
+;;
+;; See ";;; Examples" in `peg-tests.el' for other examples.
+;;
+;;;; Rule argument and indirect calls:
+;;
+;; Rules can take arguments and those arguments can themselves be PEGs.
+;; For example:
+;;
+;; (define-peg-rule 2-or-more (peg)
+;; (funcall peg)
+;; (funcall peg)
+;; (* (funcall peg)))
+;;
+;; ... (peg-parse
+;; ...
+;; (2-or-more (peg foo))
+;; ...
+;; (2-or-more (peg bar))
+;; ...)
+;;
+;;;; References:
+;;
+;; [Ford] Bryan Ford. Parsing Expression Grammars: a Recognition-Based
+;; Syntactic Foundation. In POPL'04: Proceedings of the 31st ACM
+;; SIGPLAN-SIGACT symposium on Principles of Programming Languages,
+;; pages 111-122, New York, NY, USA, 2004. ACM Press.
+;; http://pdos.csail.mit.edu/~baford/packrat/
+;;
+;; [Baker] Baker, Henry G. "Pragmatic Parsing in Common Lisp". ACM Lisp
+;; Pointers 4(2), April--June 1991, pp. 3--15.
+;; http://home.pipeline.com/~hbaker1/Prag-Parse.html
+;;
+;; Roman Redziejowski does good PEG related research
+;; http://www.romanredz.se/pubs.htm
+
+;;;; Todo:
+
+;; - Fix the exponential blowup in `peg-translate-exp'.
+;; - Add a proper debug-spec for PEXs.
+
+;;; News:
+
+;; Since 1.0.1:
+;; - Use OClosures to represent PEG rules when available, and let cl-print
+;; display their source code.
+;; - New PEX form (with RULES PEX...).
+;; - Named rulesets.
+;; - You can pass arguments to rules.
+;; - New `funcall' rule to call rules indirectly (e.g. a peg you received
+;; as argument).
+
+;; Version 1.0:
+;; - New official entry points `peg` and `peg-run`.
+
+;;; Code:
+
+(eval-when-compile (require 'cl-lib))
+
+(defvar peg--actions nil
+ "Actions collected along the current parse.
+Used at runtime for backtracking. It's a list ((POS . THUNK)...).
+Each THUNK is executed at the corresponding POS. Thunks are
+executed in a postprocessing step, not during parsing.")
+
+(defvar peg--errors nil
+ "Data keeping track of the rightmost parse failure location.
+It's a pair (POSITION . EXPS ...). POSITION is the buffer position and
+EXPS is a list of rules/expressions that failed.")
+
+;;;; Main entry points
+
+(defmacro peg--when-fboundp (f &rest body)
+ (declare (indent 1) (debug (sexp body)))
+ (when (fboundp f)
+ (macroexp-progn body)))
+
+(peg--when-fboundp oclosure-define
+ (oclosure-define peg-function
+ "Parsing function built from PEG rule."
+ pexs)
+
+ (cl-defmethod cl-print-object ((peg peg-function) stream)
+ (princ "#f<peg " stream)
+ (let ((args (help-function-arglist peg 'preserve-names)))
+ (if args
+ (prin1 args stream)
+ (princ "()" stream)))
+ (princ " " stream)
+ (prin1 (peg-function--pexs peg) stream)
+ (princ ">" stream)))
+
+(defmacro peg--lambda (pexs args &rest body)
+ (declare (indent 2)
+ (debug (&define form lambda-list def-body)))
+ (if (fboundp 'oclosure-lambda)
+ `(oclosure-lambda (peg-function (pexs ,pexs)) ,args . ,body)
+ `(lambda ,args . ,body)))
+
+;; Sometimes (with-peg-rules ... (peg-run (peg ...))) is too
+;; longwinded for the task at hand, so `peg-parse' comes in handy.
+(defmacro peg-parse (&rest pexs)
+ "Match PEXS at point.
+PEXS is a sequence of PEG expressions, implicitly combined with `and'.
+Returns STACK if the match succeeds and signals an error on failure,
+moving point along the way.
+PEXS can also be a list of PEG rules, in which case the first rule is used."
+ (if (and (consp (car pexs))
+ (symbolp (caar pexs))
+ (not (ignore-errors (peg-normalize (car pexs)))))
+ ;; `pexs' is a list of rules: use the first rule as entry point.
+ `(with-peg-rules ,pexs (peg-run (peg ,(caar pexs)) #'peg-signal-failure))
+ `(peg-run (peg ,@pexs) #'peg-signal-failure)))
+
+(defmacro peg (&rest pexs)
+ "Return a PEG-matcher that matches PEXS."
+ (pcase (peg-normalize `(and . ,pexs))
+ (`(call ,name) `#',(peg--rule-id name)) ;Optimize this case by η-reduction!
+ (exp `(peg--lambda ',pexs () ,(peg-translate-exp exp)))))
+
+;; There are several "infos we want to return" when parsing a given PEX:
+;; 1- We want to return the success/failure of the parse.
+;; 2- We want to return the data of the successful parse (the stack).
+;; 3- We want to return the diagnostic of the failures.
+;; 4- We want to perform the actions (upon parse success)!
+;; `peg-parse' used an error signal to encode the (1) boolean, which
+;; lets it return all the info conveniently but the error signal was sometimes
+;; inconvenient. Other times one wants to just know (1) maybe without even
+;; performing (4).
+;; `peg-run' lets you choose all that, and by default gives you
+;; (1) as a simple boolean, while also doing (2), and (4).
+
+(defun peg-run (peg-matcher &optional failure-function success-function)
+ "Parse with PEG-MATCHER at point and run the success/failure function.
+If a match was found, move to the end of the match and call SUCCESS-FUNCTION
+with one argument: a function which will perform all the actions collected
+during the parse and then return the resulting stack (or t if empty).
+If no match was found, move to the (rightmost) point of parse failure and call
+FAILURE-FUNCTION with one argument, which is a list of PEG expressions that
+failed at this point.
+SUCCESS-FUNCTION defaults to `funcall' and FAILURE-FUNCTION
+defaults to `ignore'."
+ (let ((peg--actions '()) (peg--errors '(-1)))
+ (if (funcall peg-matcher)
+ ;; Found a parse: run the actions collected along the way.
+ (funcall (or success-function #'funcall)
+ (lambda ()
+ (save-excursion (peg-postprocess peg--actions))))
+ (goto-char (car peg--errors))
+ (when failure-function
+ (funcall failure-function (peg-merge-errors (cdr peg--errors)))))))
+
+(defmacro define-peg-rule (name args &rest pexs)
+ "Define PEG rule NAME as equivalent to PEXS.
+The PEG expressions in PEXS are implicitly combined with the
+sequencing `and' operator of PEG grammars."
+ (declare (indent 1))
+ (let ((inline nil))
+ (while (keywordp (car pexs))
+ (pcase (pop pexs)
+ (:inline (setq inline (car pexs))))
+ (setq pexs (cdr pexs)))
+ (let ((id (peg--rule-id name))
+ (exp (peg-normalize `(and . ,pexs))))
+ `(progn
+ (defalias ',id
+ (peg--lambda ',pexs ,args
+ ,(if inline
+ ;; Short-circuit to peg--translate in order to skip
+ ;; the extra failure-recording of `peg-translate-exp'.
+ ;; It also skips the cycle detection of
+ ;; `peg--translate-rule-body', which is not the main
+ ;; purpose but we can live with it.
+ (apply #'peg--translate exp)
+ (peg--translate-rule-body name exp))))
+ (eval-and-compile
+ ;; FIXME: We shouldn't need this any more since the info is now
+ ;; stored in the function, but sadly we need to find a name's EXP
+ ;; during compilation (i.e. before the `defalias' is executed)
+ ;; as part of cycle-detection!
+ (put ',id 'peg--rule-definition ',exp)
+ ,@(when inline
+ ;; FIXME: Copied from `defsubst'.
+ `(;; Never native-compile defsubsts as we need the byte
+ ;; definition in `byte-compile-unfold-bcf' to perform the
+ ;; inlining (Bug#42664, Bug#43280, Bug#44209).
+ ,(byte-run--set-speed id nil -1)
+ (put ',id 'byte-optimizer #'byte-compile-inline-expand))))))))
+
+(defmacro define-peg-ruleset (name &rest rules)
+ "Define a set of PEG rules for later use, e.g., in `with-peg-rules'."
+ (declare (indent 1))
+ (let ((defs ())
+ (aliases ()))
+ (dolist (rule rules)
+ (let* ((rname (car rule))
+ (full-rname (format "%s %s" name rname)))
+ (push `(define-peg-rule ,full-rname . ,(cdr rule)) defs)
+ (push `(,(peg--rule-id rname) #',(peg--rule-id full-rname)) aliases)))
+ `(cl-flet ,aliases
+ ,@defs
+ (eval-and-compile (put ',name 'peg--rules ',aliases)))))
+
+(defmacro with-peg-rules (rules &rest body)
+ "Make PEG rules RULES available within the scope of BODY.
+RULES is a list of rules of the form (NAME . PEXS), where PEXS is a sequence
+of PEG expressions, implicitly combined with `and'.
+RULES can also contain symbols in which case these must name
+rulesets defined previously with `define-peg-ruleset'."
+ (declare (indent 1) (debug (sexp form))) ;FIXME: `sexp' is not good enough!
+ (let* ((rulesets nil)
+ (rules
+ ;; First, macroexpand the rules.
+ (delq nil
+ (mapcar (lambda (rule)
+ (if (symbolp rule)
+ (progn (push rule rulesets) nil)
+ (cons (car rule) (peg-normalize `(and . ,(cdr rule))))))
+ rules)))
+ (ctx (assq :peg-rules macroexpand-all-environment)))
+ (macroexpand-all
+ `(cl-labels
+ ,(mapcar (lambda (rule)
+ ;; FIXME: Use `peg--lambda' as well.
+ `(,(peg--rule-id (car rule))
+ ()
+ ,(peg--translate-rule-body (car rule) (cdr rule))))
+ rules)
+ ,@body)
+ `((:peg-rules ,@(append rules (cdr ctx)))
+ ,@macroexpand-all-environment))))
+
+;;;;; Old entry points
+
+(defmacro peg-parse-exp (exp)
+ "Match the parsing expression EXP at point."
+ (declare (obsolete peg-parse "peg-0.9"))
+ `(peg-run (peg ,exp)))
+
+;;;; The actual implementation
+
+(defun peg--lookup-rule (name)
+ (or (cdr (assq name (cdr (assq :peg-rules macroexpand-all-environment))))
+ ;; With `peg-function' objects, we can recover the PEG from which it was
+ ;; defined, but this info is not yet available at compile-time. :-(
+ ;;(let ((id (peg--rule-id name)))
+ ;; (peg-function--pexs (symbol-function id)))
+ (get (peg--rule-id name) 'peg--rule-definition)))
+
+(defun peg--rule-id (name)
+ (intern (format "peg-rule %s" name)))
+
+(define-error 'peg-search-failed "Parse error at %d (expecting %S)")
+
+(defun peg-signal-failure (failures)
+ (signal 'peg-search-failed (list (point) failures)))
+
+(defun peg-parse-at-point (peg-matcher)
+ "Parse text at point according to the PEG rule PEG-MATCHER."
+ (declare (obsolete peg-run "peg-1.0"))
+ (peg-run peg-matcher
+ #'peg-signal-failure
+ (lambda (f) (let ((r (funcall f))) (if (listp r) r)))))
+
+;; Internally we use a regularized syntax, e.g. we only have binary OR
+;; nodes. Regularized nodes are lists of the form (OP ARGS...).
+(cl-defgeneric peg-normalize (exp)
+ "Return a \"normalized\" form of EXP."
+ (error "Invalid parsing expression: %S" exp))
+
+(cl-defmethod peg-normalize ((exp string))
+ (let ((len (length exp)))
+ (cond ((zerop len) '(guard t))
+ ((= len 1) `(char ,(aref exp 0)))
+ (t `(str ,exp)))))
+
+(cl-defmethod peg-normalize ((exp symbol))
+ ;; (peg--lookup-rule exp)
+ `(call ,exp))
+
+(cl-defmethod peg-normalize ((exp vector))
+ (peg-normalize `(set . ,(append exp '()))))
+
+(cl-defmethod peg-normalize ((exp cons))
+ (apply #'peg--macroexpand exp))
+
+(defconst peg-leaf-types '(any call action char range str set
+ guard syntax-class = funcall))
+
+(cl-defgeneric peg--macroexpand (head &rest args)
+ (cond
+ ((memq head peg-leaf-types) (cons head args))
+ (t `(call ,head ,@args))))
+
+(cl-defmethod peg--macroexpand ((_ (eql or)) &rest args)
+ (cond ((null args) '(guard nil))
+ ((null (cdr args)) (peg-normalize (car args)))
+ (t `(or ,(peg-normalize (car args))
+ ,(peg-normalize `(or . ,(cdr args)))))))
+
+(cl-defmethod peg--macroexpand ((_ (eql and)) &rest args)
+ (cond ((null args) '(guard t))
+ ((null (cdr args)) (peg-normalize (car args)))
+ (t `(and ,(peg-normalize (car args))
+ ,(peg-normalize `(and . ,(cdr args)))))))
+
+(cl-defmethod peg--macroexpand ((_ (eql *)) &rest args)
+ `(* ,(peg-normalize `(and . ,args))))
+
+;; FIXME: this duplicates code; could use some loop to avoid that
+(cl-defmethod peg--macroexpand ((_ (eql +)) &rest args)
+ (let ((e (peg-normalize `(and . ,args))))
+ `(and ,e (* ,e))))
+
+(cl-defmethod peg--macroexpand ((_ (eql opt)) &rest args)
+ (let ((e (peg-normalize `(and . ,args))))
+ `(or ,e (guard t))))
+
+(cl-defmethod peg--macroexpand ((_ (eql if)) &rest args)
+ `(if ,(peg-normalize `(and . ,args))))
+
+(cl-defmethod peg--macroexpand ((_ (eql not)) &rest args)
+ `(not ,(peg-normalize `(and . ,args))))
+
+(cl-defmethod peg--macroexpand ((_ (eql \`)) form)
+ (peg-normalize `(stack-action ,form)))
+
+(cl-defmethod peg--macroexpand ((_ (eql stack-action)) form)
+ (unless (member '-- form)
+ (error "Malformed stack action: %S" form))
+ (let ((args (cdr (member '-- (reverse form))))
+ (values (cdr (member '-- form))))
+ (let ((form `(let ,(mapcar (lambda (var) `(,var (pop peg--stack))) args)
+ ,@(mapcar (lambda (val) `(push ,val peg--stack)) values))))
+ `(action ,form))))
+
+(defvar peg-char-classes
+ '(ascii alnum alpha blank cntrl digit graph lower multibyte nonascii print
+ punct space unibyte upper word xdigit))
+
+(cl-defmethod peg--macroexpand ((_ (eql set)) &rest specs)
+ (cond ((null specs) '(guard nil))
+ ((and (null (cdr specs))
+ (let ((range (peg-range-designator (car specs))))
+ (and range `(range ,(car range) ,(cdr range))))))
+ (t
+ (let ((chars '()) (ranges '()) (classes '()))
+ (while specs
+ (let* ((spec (pop specs))
+ (range (peg-range-designator spec)))
+ (cond (range
+ (push range ranges))
+ ((peg-characterp spec)
+ (push spec chars))
+ ((stringp spec)
+ (setq chars (append (reverse (append spec ())) chars)))
+ ((memq spec peg-char-classes)
+ (push spec classes))
+ (t (error "Invalid set specifier: %S" spec)))))
+ (setq ranges (reverse ranges))
+ (setq chars (delete-dups (reverse chars)))
+ (setq classes (reverse classes))
+ (cond ((and (null ranges)
+ (null classes)
+ (cond ((null chars) '(guard nil))
+ ((null (cdr chars)) `(char ,(car chars))))))
+ (t `(set ,ranges ,chars ,classes)))))))
+
+(defun peg-range-designator (x)
+ (and (symbolp x)
+ (let ((str (symbol-name x)))
+ (and (= (length str) 3)
+ (eq (aref str 1) ?-)
+ (< (aref str 0) (aref str 2))
+ (cons (aref str 0) (aref str 2))))))
+
+;; characterp is new in Emacs 23.
+(defun peg-characterp (x)
+ (if (fboundp 'characterp)
+ (characterp x)
+ (integerp x)))
+
+(cl-defmethod peg--macroexpand ((_ (eql list)) &rest args)
+ (peg-normalize
+ (let ((marker (make-symbol "magic-marker")))
+ `(and (stack-action (-- ',marker))
+ ,@args
+ (stack-action (--
+ (let ((l '()))
+ (while
+ (let ((e (pop peg--stack)))
+ (cond ((eq e ',marker) nil)
+ ((null peg--stack)
+ (error "No marker on stack"))
+ (t (push e l) t))))
+ l)))))))
+
+(cl-defmethod peg--macroexpand ((_ (eql substring)) &rest args)
+ (peg-normalize
+ `(and `(-- (point))
+ ,@args
+ `(start -- (buffer-substring-no-properties start (point))))))
+
+(cl-defmethod peg--macroexpand ((_ (eql region)) &rest args)
+ (peg-normalize
+ `(and `(-- (point))
+ ,@args
+ `(-- (point)))))
+
+(cl-defmethod peg--macroexpand ((_ (eql replace)) pe replacement)
+ (peg-normalize
+ `(and (stack-action (-- (point)))
+ ,pe
+ (stack-action (start -- (progn
+ (delete-region start (point))
+ (insert-before-markers ,replacement))))
+ (stack-action (_ --)))))
+
+(cl-defmethod peg--macroexpand ((_ (eql quote)) _form)
+ (error "quote is reserved for future use"))
+
+(cl-defgeneric peg--translate (head &rest args)
+ (error "No translator for: %S" (cons head args)))
+
+(defun peg--translate-rule-body (name exp)
+ (let ((msg (condition-case err
+ (progn (peg-detect-cycles exp (list name)) nil)
+ (error (error-message-string err))))
+ (code (peg-translate-exp exp)))
+ (cond
+ ((null msg) code)
+ ((fboundp 'macroexp--warn-and-return)
+ (macroexp--warn-and-return msg code))
+ (t
+ (message "%s" msg)
+ code))))
+
+;; This is the main translation function.
+(defun peg-translate-exp (exp)
+ "Return the ELisp code to match the PE EXP."
+ ;; FIXME: This expansion basically duplicates `exp' in the output, which is
+ ;; a serious problem because it's done recursively, so it makes the output
+ ;; code's size exponentially larger than the input!
+ `(or ,(apply #'peg--translate exp)
+ (peg--record-failure ',exp))) ; for error reporting
+
+(define-obsolete-function-alias 'peg-record-failure
+ #'peg--record-failure "peg-1.0")
+(defun peg--record-failure (exp)
+ (cond ((= (point) (car peg--errors))
+ (setcdr peg--errors (cons exp (cdr peg--errors))))
+ ((> (point) (car peg--errors))
+ (setq peg--errors (list (point) exp))))
+ nil)
+
+(cl-defmethod peg--translate ((_ (eql and)) e1 e2)
+ `(and ,(peg-translate-exp e1)
+ ,(peg-translate-exp e2)))
+
+;; Choicepoints are used for backtracking. At a choicepoint we save
+;; enough state, so that we can continue from there if needed.
+(defun peg--choicepoint-moved-p (choicepoint)
+ `(/= ,(car choicepoint) (point)))
+
+(defun peg--choicepoint-restore (choicepoint)
+ `(progn
+ (goto-char ,(car choicepoint))
+ (setq peg--actions ,(cdr choicepoint))))
+
+(defmacro peg--with-choicepoint (var &rest body)
+ (declare (indent 1) (debug (symbolp form)))
+ `(let ((,var (cons (make-symbol "point") (make-symbol "actions"))))
+ `(let ((,(car ,var) (point))
+ (,(cdr ,var) peg--actions))
+ ,@(list ,@body))))
+
+(cl-defmethod peg--translate ((_ (eql or)) e1 e2)
+ (peg--with-choicepoint cp
+ `(or ,(peg-translate-exp e1)
+ (,@(peg--choicepoint-restore cp)
+ ,(peg-translate-exp e2)))))
+
+(cl-defmethod peg--translate ((_ (eql with)) rules &rest exps)
+ `(with-peg-rules ,rules ,(peg--translate `(and . ,exps))))
+
+(cl-defmethod peg--translate ((_ (eql guard)) exp) exp)
+
+(defvar peg-syntax-classes
+ '((whitespace ?-) (word ?w) (symbol ?s) (punctuation ?.)
+ (open ?\() (close ?\)) (string ?\") (escape ?\\) (charquote ?/)
+ (math ?$) (prefix ?') (comment ?<) (endcomment ?>)
+ (comment-fence ?!) (string-fence ?|)))
+
+(cl-defmethod peg--translate ((_ (eql syntax-class)) class)
+ (let ((probe (assoc class peg-syntax-classes)))
+ (cond (probe `(when (looking-at ,(format "\\s%c" (cadr probe)))
+ (forward-char)
+ t))
+ (t (error "Invalid syntax class: %S\nMust be one of: %s" class
+ (mapcar #'car peg-syntax-classes))))))
+
+(cl-defmethod peg--translate ((_ (eql =)) string)
+ `(let ((str ,string))
+ (when (zerop (length str))
+ (error "Empty strings not allowed for ="))
+ (search-forward str (+ (point) (length str)) t)))
+
+(cl-defmethod peg--translate ((_ (eql *)) e)
+ `(progn (while ,(peg--with-choicepoint cp
+ `(if ,(peg-translate-exp e)
+ ;; Just as regexps do for the `*' operator,
+ ;; we allow the body of `*' loops to match
+ ;; the empty string, but we don't repeat the loop if
+ ;; we haven't moved, to avoid inf-loops.
+ ,(peg--choicepoint-moved-p cp)
+ ,(peg--choicepoint-restore cp)
+ nil)))
+ t))
+
+(cl-defmethod peg--translate ((_ (eql if)) e)
+ (peg--with-choicepoint cp
+ `(when ,(peg-translate-exp e)
+ ,(peg--choicepoint-restore cp)
+ t)))
+
+(cl-defmethod peg--translate ((_ (eql not)) e)
+ (peg--with-choicepoint cp
+ `(unless ,(peg-translate-exp e)
+ ,(peg--choicepoint-restore cp)
+ t)))
+
+(cl-defmethod peg--translate ((_ (eql any)) )
+ '(when (not (eobp))
+ (forward-char)
+ t))
+
+(cl-defmethod peg--translate ((_ (eql char)) c)
+ `(when (eq (char-after) ',c)
+ (forward-char)
+ t))
+
+(cl-defmethod peg--translate ((_ (eql set)) ranges chars classes)
+ `(when (looking-at ',(peg-make-charset-regexp ranges chars classes))
+ (forward-char)
+ t))
+
+(defun peg-make-charset-regexp (ranges chars classes)
+ (when (and (not ranges) (not classes) (<= (length chars) 1))
+ (error "Bug"))
+ (let ((rbracket (member ?\] chars))
+ (minus (member ?- chars))
+ (hat (member ?^ chars)))
+ (dolist (c '(?\] ?- ?^))
+ (setq chars (remove c chars)))
+ (format "[%s%s%s%s%s%s]"
+ (if rbracket "]" "")
+ (if minus "-" "")
+ (mapconcat (lambda (x) (format "%c-%c" (car x) (cdr x))) ranges "")
+ (mapconcat (lambda (c) (format "[:%s:]" c)) classes "")
+ (mapconcat (lambda (c) (format "%c" c)) chars "")
+ (if hat "^" ""))))
+
+(cl-defmethod peg--translate ((_ (eql range)) from to)
+ `(when (and (char-after)
+ (<= ',from (char-after))
+ (<= (char-after) ',to))
+ (forward-char)
+ t))
+
+(cl-defmethod peg--translate ((_ (eql str)) str)
+ `(when (looking-at ',(regexp-quote str))
+ (goto-char (match-end 0))
+ t))
+
+(cl-defmethod peg--translate ((_ (eql call)) name &rest args)
+ `(,(peg--rule-id name) ,@args))
+
+(cl-defmethod peg--translate ((_ (eql funcall)) exp &rest args)
+ `(funcall ,exp ,@args))
+
+(cl-defmethod peg--translate ((_ (eql action)) form)
+ `(progn
+ (push (cons (point) (lambda () ,form)) peg--actions)
+ t))
+
+(defvar peg--stack nil)
+(defun peg-postprocess (actions)
+ "Execute \"actions\"."
+ (let ((peg--stack '())
+ (forw-actions ()))
+ (pcase-dolist (`(,pos . ,thunk) actions)
+ (push (cons (copy-marker pos) thunk) forw-actions))
+ (pcase-dolist (`(,pos . ,thunk) forw-actions)
+ (goto-char pos)
+ (funcall thunk))
+ (or peg--stack t)))
+
+;; Left recursion is presumably a common mistake when using PEGs.
+;; Here we try to detect such mistakes. Essentially we traverse the
+;; graph as long as we can without consuming input. When we find a
+;; recursive call we signal an error.
+
+(defun peg-detect-cycles (exp path)
+ "Signal an error on a cycle.
+Otherwise traverse EXP recursively and return T if EXP can match
+without consuming input. Return nil if EXP definitely consumes
+input. PATH is the list of rules that we have visited so far."
+ (apply #'peg--detect-cycles path exp))
+
+(cl-defgeneric peg--detect-cycles (head _path &rest args)
+ (error "No detect-cycle method for: %S" (cons head args)))
+
+(cl-defmethod peg--detect-cycles (path (_ (eql call)) name)
+ (if (member name path)
+ (error "Possible left recursion: %s"
+ (mapconcat (lambda (x) (format "%s" x))
+ (reverse (cons name path)) " -> "))
+ (let ((exp (peg--lookup-rule name)))
+ (if (null exp)
+ ;; If there's no rule by that name, either we'll fail at
+ ;; run-time or it will be defined later. In any case, at this
+ ;; point there's no evidence of a cycle, and if a cycle appears
+ ;; later we'll hopefully catch it when the rule gets defined.
+ ;; FIXME: In practice, if `name' is part of the cycle, we will
+ ;; indeed detect it when it gets defined, but OTOH if `name'
+ ;; is not part of a cycle but it *enables* a cycle because
+ ;; it matches the empty string (i.e. we should have returned t
+ ;; here), then we may not catch the problem at all :-(
+ nil
+ (peg-detect-cycles exp (cons name path))))))
+
+(cl-defmethod peg--detect-cycles (path (_ (eql and)) e1 e2)
+ (and (peg-detect-cycles e1 path)
+ (peg-detect-cycles e2 path)))
+
+(cl-defmethod peg--detect-cycles (path (_ (eql or)) e1 e2)
+ (or (peg-detect-cycles e1 path)
+ (peg-detect-cycles e2 path)))
+
+(cl-defmethod peg--detect-cycles (path (_ (eql *)) e)
+ (peg-detect-cycles e path)
+ t)
+
+(cl-defmethod peg--detect-cycles (path (_ (eql if)) e)
+ (peg-unary-nullable e path))
+(cl-defmethod peg--detect-cycles (path (_ (eql not)) e)
+ (peg-unary-nullable e path))
+
+(defun peg-unary-nullable (exp path)
+ (peg-detect-cycles exp path)
+ t)
+
+(cl-defmethod peg--detect-cycles (_path (_ (eql any))) nil)
+(cl-defmethod peg--detect-cycles (_path (_ (eql char)) _c) nil)
+(cl-defmethod peg--detect-cycles (_path (_ (eql set)) _r _c _k) nil)
+(cl-defmethod peg--detect-cycles (_path (_ (eql range)) _c1 _c2) nil)
+(cl-defmethod peg--detect-cycles (_path (_ (eql str)) s) (equal s ""))
+(cl-defmethod peg--detect-cycles (_path (_ (eql guard)) _e) t)
+(cl-defmethod peg--detect-cycles (_path (_ (eql =)) _s) nil)
+(cl-defmethod peg--detect-cycles (_path (_ (eql syntax-class)) _n) nil)
+(cl-defmethod peg--detect-cycles (_path (_ (eql action)) _form) t)
+
+(defun peg-merge-errors (exps)
+ "Build a more readable error message out of the failed expressions."
+ (let ((merged '()))
+ (dolist (exp exps)
+ (setq merged (peg-merge-error exp merged)))
+ merged))
+
+(defun peg-merge-error (exp merged)
+ (apply #'peg--merge-error merged exp))
+
+(cl-defgeneric peg--merge-error (_merged head &rest args)
+ (error "No merge-error method for: %S" (cons head args)))
+
+(cl-defmethod peg--merge-error (merged (_ (eql or)) e1 e2)
+ (peg-merge-error e2 (peg-merge-error e1 merged)))
+
+(cl-defmethod peg--merge-error (merged (_ (eql and)) e1 _e2)
+ ;; FIXME: Why is `e2' not used?
+ (peg-merge-error e1 merged))
+
+(cl-defmethod peg--merge-error (merged (_ (eql str)) str)
+ ;;(add-to-list 'merged str)
+ (cl-adjoin str merged :test #'equal))
+
+(cl-defmethod peg--merge-error (merged (_ (eql call)) rule)
+ ;; (add-to-list 'merged rule)
+ (cl-adjoin rule merged :test #'equal))
+
+(cl-defmethod peg--merge-error (merged (_ (eql char)) char)
+ ;; (add-to-list 'merged (string char))
+ (cl-adjoin (string char) merged :test #'equal))
+
+(cl-defmethod peg--merge-error (merged (_ (eql set)) r c k)
+ ;; (add-to-list 'merged (peg-make-charset-regexp r c k))
+ (cl-adjoin (peg-make-charset-regexp r c k) merged :test #'equal))
+
+(cl-defmethod peg--merge-error (merged (_ (eql range)) from to)
+ ;; (add-to-list 'merged (format "[%c-%c]" from to))
+ (cl-adjoin (format "[%c-%c]" from to) merged :test #'equal))
+
+(cl-defmethod peg--merge-error (merged (_ (eql *)) exp)
+ (peg-merge-error exp merged))
+
+(cl-defmethod peg--merge-error (merged (_ (eql any)))
+ ;; (add-to-list 'merged '(any))
+ (cl-adjoin '(any) merged :test #'equal))
+
+(cl-defmethod peg--merge-error (merged (_ (eql not)) x)
+ ;; (add-to-list 'merged `(not ,x))
+ (cl-adjoin `(not ,x) merged :test #'equal))
+
+(cl-defmethod peg--merge-error (merged (_ (eql action)) _action) merged)
+(cl-defmethod peg--merge-error (merged (_ (eql null))) merged)
+
+(provide 'peg)
+(require 'peg)
+
+(define-peg-rule null () :inline t (guard t))
+(define-peg-rule fail () :inline t (guard nil))
+(define-peg-rule bob () :inline t (guard (bobp)))
+(define-peg-rule eob () :inline t (guard (eobp)))
+(define-peg-rule bol () :inline t (guard (bolp)))
+(define-peg-rule eol () :inline t (guard (eolp)))
+(define-peg-rule bow () :inline t (guard (looking-at "\\<")))
+(define-peg-rule eow () :inline t (guard (looking-at "\\>")))
+(define-peg-rule bos () :inline t (guard (looking-at "\\_<")))
+(define-peg-rule eos () :inline t (guard (looking-at "\\_>")))
+
+;;; peg.el ends here
diff --git a/test/lisp/peg-tests.el b/test/lisp/peg-tests.el
new file mode 100644
index 00000000000..864e09b4200
--- /dev/null
+++ b/test/lisp/peg-tests.el
@@ -0,0 +1,367 @@
+;;; peg-tests.el --- Tests of PEG parsers -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2008-2023 Free Software Foundation, Inc.
+
+;; This program is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; This program is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;; Tests and examples, that used to live in peg.el wrapped inside an `eval'.
+
+;;; Code:
+
+(require 'peg)
+(require 'ert)
+
+;;; Tests:
+
+(defmacro peg-parse-string (pex string &optional noerror)
+ "Parse STRING according to PEX.
+If NOERROR is non-nil, return nil if the parse failed and non-nil
+if it succeeded, instead of signaling an error."
+ (let ((oldstyle (consp (car-safe pex)))) ;PEX is really a list of rules.
+ `(with-temp-buffer
+ (insert ,string)
+ (goto-char (point-min))
+ ,(if oldstyle
+ `(with-peg-rules ,pex
+ (peg-run (peg ,(caar pex))
+ ,(unless noerror '#'peg-signal-failure)))
+ `(peg-run (peg ,pex)
+ ,(unless noerror '#'peg-signal-failure))))))
+
+(define-peg-rule peg-test-natural ()
+ [0-9] (* [0-9]))
+
+(ert-deftest peg-test ()
+ (should (peg-parse-string peg-test-natural "99 bottles" t))
+ (should (peg-parse-string ((s "a")) "a" t))
+ (should (not (peg-parse-string ((s "a")) "b" t)))
+ (should (peg-parse-string ((s (not "a"))) "b" t))
+ (should (not (peg-parse-string ((s (not "a"))) "a" t)))
+ (should (peg-parse-string ((s (if "a"))) "a" t))
+ (should (not (peg-parse-string ((s (if "a"))) "b" t)))
+ (should (peg-parse-string ((s "ab")) "ab" t))
+ (should (not (peg-parse-string ((s "ab")) "ba" t)))
+ (should (not (peg-parse-string ((s "ab")) "a" t)))
+ (should (peg-parse-string ((s (range ?0 ?9))) "0" t))
+ (should (not (peg-parse-string ((s (range ?0 ?9))) "a" t)))
+ (should (peg-parse-string ((s [0-9])) "0" t))
+ (should (not (peg-parse-string ((s [0-9])) "a" t)))
+ (should (not (peg-parse-string ((s [0-9])) "" t)))
+ (should (peg-parse-string ((s (any))) "0" t))
+ (should (not (peg-parse-string ((s (any))) "" t)))
+ (should (peg-parse-string ((s (eob))) "" t))
+ (should (peg-parse-string ((s (not (eob)))) "a" t))
+ (should (peg-parse-string ((s (or "a" "b"))) "a" t))
+ (should (peg-parse-string ((s (or "a" "b"))) "b" t))
+ (should (not (peg-parse-string ((s (or "a" "b"))) "c" t)))
+ (should (peg-parse-string (and "a" "b") "ab" t))
+ (should (peg-parse-string ((s (and "a" "b"))) "abc" t))
+ (should (not (peg-parse-string (and "a" "b") "ba" t)))
+ (should (peg-parse-string ((s (and "a" "b" "c"))) "abc" t))
+ (should (peg-parse-string ((s (* "a") "b" (eob))) "b" t))
+ (should (peg-parse-string ((s (* "a") "b" (eob))) "ab" t))
+ (should (peg-parse-string ((s (* "a") "b" (eob))) "aaab" t))
+ (should (not (peg-parse-string ((s (* "a") "b" (eob))) "abc" t)))
+ (should (peg-parse-string ((s "")) "abc" t))
+ (should (peg-parse-string ((s "" (eob))) "" t))
+ (should (peg-parse-string ((s (opt "a") "b")) "abc" t))
+ (should (peg-parse-string ((s (opt "a") "b")) "bc" t))
+ (should (not (peg-parse-string ((s (or))) "ab" t)))
+ (should (peg-parse-string ((s (and))) "ab" t))
+ (should (peg-parse-string ((s (and))) "" t))
+ (should (peg-parse-string ((s ["^"])) "^" t))
+ (should (peg-parse-string ((s ["^a"])) "a" t))
+ (should (peg-parse-string ["-"] "-" t))
+ (should (peg-parse-string ((s ["]-"])) "]" t))
+ (should (peg-parse-string ((s ["^]"])) "^" t))
+ (should (peg-parse-string ((s [alpha])) "z" t))
+ (should (not (peg-parse-string ((s [alpha])) "0" t)))
+ (should (not (peg-parse-string ((s [alpha])) "" t)))
+ (should (not (peg-parse-string ((s ["][:alpha:]"])) "z" t)))
+ (should (peg-parse-string ((s (bob))) "" t))
+ (should (peg-parse-string ((s (bos))) "x" t))
+ (should (not (peg-parse-string ((s (bos))) " x" t)))
+ (should (peg-parse-string ((s "x" (eos))) "x" t))
+ (should (peg-parse-string ((s (syntax-class whitespace))) " " t))
+ (should (peg-parse-string ((s (= "foo"))) "foo" t))
+ (should (let ((f "foo")) (peg-parse-string ((s (= f))) "foo" t)))
+ (should (not (peg-parse-string ((s (= "foo"))) "xfoo" t)))
+ (should (equal (peg-parse-string ((s `(-- 1 2))) "") '(2 1)))
+ (should (equal (peg-parse-string ((s `(-- 1 2) `(a b -- a b))) "") '(2 1)))
+ (should (equal (peg-parse-string ((s (or (and (any) s)
+ (substring [0-9]))))
+ "ab0cd1ef2gh")
+ '("2")))
+ ;; The PEG rule `other' doesn't exist, which will cause a byte-compiler
+ ;; warning, but not an error at run time because the rule is not actually
+ ;; used in this particular case.
+ (should (equal (peg-parse-string ((s (substring (or "a" other)))
+ ;; Unused left-recursive rule, should
+ ;; cause a byte-compiler warning.
+ (r (* "a") r))
+ "af")
+ '("a")))
+ (should (equal (peg-parse-string ((s (list x y))
+ (x `(-- 1))
+ (y `(-- 2)))
+ "")
+ '((1 2))))
+ (should (equal (peg-parse-string ((s (list (* x)))
+ (x "" `(-- 'x)))
+ "xxx")
+ ;; The empty loop body should be matched once!
+ '((x))))
+ (should (equal (peg-parse-string ((s (list (* x)))
+ (x "x" `(-- 'x)))
+ "xxx")
+ '((x x x))))
+ (should (equal (peg-parse-string ((s (region (* x)))
+ (x "x" `(-- 'x)))
+ "xxx")
+ ;; FIXME: Since string positions start at 0, this should
+ ;; really be '(3 x x x 0) !!
+ '(4 x x x 1)))
+ (should (equal (peg-parse-string ((s (region (list (* x))))
+ (x "x" `(-- 'x 'y)))
+ "xxx")
+ '(4 (x y x y x y) 1)))
+ (should (equal (with-temp-buffer
+ (save-excursion (insert "abcdef"))
+ (list
+ (peg-run (peg "a"
+ (replace "bc" "x")
+ (replace "de" "y")
+ "f"))
+ (buffer-string)))
+ '(t "axyf")))
+ (with-temp-buffer
+ (insert "toro")
+ (goto-char (point-min))
+ (should (peg-run (peg "to")))
+ (should-not (peg-run (peg "to")))
+ (should (peg-run (peg "ro")))
+ (should (eobp)))
+ (with-temp-buffer
+ (insert " ")
+ (goto-char (point-min))
+ (peg-run (peg (+ (syntax-class whitespace))))
+ (should (eobp)))
+ )
+
+;;; Examples:
+
+;; peg-ex-recognize-int recognizes integers. An integer begins with an
+;; optional sign, followed by one or more digits. Digits are all
+;; characters from 0 to 9.
+;;
+;; Notes:
+;; 1) "" matches the empty sequence, i.e. matches without consuming
+;; input.
+;; 2) [0-9] is the character range from 0 to 9. This can also be
+;; written as (range ?0 ?9). Note that 0-9 is a symbol.
+(defun peg-ex-recognize-int ()
+ (with-peg-rules ((number sign digit (* digit))
+ (sign (or "+" "-" ""))
+ (digit [0-9]))
+ (peg-run (peg number))))
+
+;; peg-ex-parse-int recognizes integers and computes the corresponding
+;; value. The grammar is the same as for `peg-ex-recognize-int'
+;; augmented with parsing actions. Unfortunately, the actions add
+;; quite a bit of clutter.
+;;
+;; The actions for the sign rule push -1 on the stack for a minus sign
+;; and 1 for plus or no sign.
+;;
+;; The action for the digit rule pushes the value for a single digit.
+;;
+;; The action `(a b -- (+ (* a 10) b)) takes two items from the stack
+;; and pushes the first item times 10 plus the second item.
+;;
+;; The action `(sign val -- (* sign val)) multiplies val by the
+;; sign (1 or -1).
+(defun peg-ex-parse-int ()
+ (with-peg-rules ((number sign digit (* digit
+ `(a b -- (+ (* a 10) b)))
+ `(sign val -- (* sign val)))
+ (sign (or (and "+" `(-- 1))
+ (and "-" `(-- -1))
+ (and "" `(-- 1))))
+ (digit [0-9] `(-- (- (char-before) ?0))))
+ (peg-run (peg number))))
+
+;; Put point after the ) and press C-x C-e
+;; (peg-ex-parse-int)-234234
+
+;; Parse arithmetic expressions and compute the result as a side effect.
+(defun peg-ex-arith ()
+ (peg-parse
+ (expr _ sum eol)
+ (sum product (* (or (and "+" _ product `(a b -- (+ a b)))
+ (and "-" _ product `(a b -- (- a b))))))
+ (product value (* (or (and "*" _ value `(a b -- (* a b)))
+ (and "/" _ value `(a b -- (/ a b))))))
+ (value (or (and (substring number) `(string -- (string-to-number string)))
+ (and "(" _ sum ")" _)))
+ (number (+ [0-9]) _)
+ (_ (* [" \t"]))
+ (eol (or "\n" "\r\n" "\r"))))
+
+;; (peg-ex-arith) 1 + 2 * 3 * (4 + 5)
+;; (peg-ex-arith) 1 + 2 ^ 3 * (4 + 5) ; fails to parse
+
+;; Parse URI according to RFC 2396.
+(defun peg-ex-uri ()
+ (peg-parse
+ (URI-reference (or absoluteURI relativeURI)
+ (or (and "#" (substring fragment))
+ `(-- nil))
+ `(scheme user host port path query fragment --
+ (list :scheme scheme :user user
+ :host host :port port
+ :path path :query query
+ :fragment fragment)))
+ (absoluteURI (substring scheme) ":" (or hier-part opaque-part))
+ (hier-part ;(-- user host port path query)
+ (or net-path
+ (and `(-- nil nil nil)
+ abs-path))
+ (or (and "?" (substring query))
+ `(-- nil)))
+ (net-path "//" authority (or abs-path `(-- nil)))
+ (abs-path "/" path-segments)
+ (path-segments segment (list (* "/" segment)) `(s l -- (cons s l)))
+ (segment (substring (* pchar) (* ";" param)))
+ (param (* pchar))
+ (pchar (or unreserved escaped [":@&=+$,"]))
+ (query (* uric))
+ (fragment (* uric))
+ (relativeURI (or net-path abs-path rel-path) (opt "?" query))
+ (rel-path rel-segment (opt abs-path))
+ (rel-segment (+ unreserved escaped [";@&=+$,"]))
+ (authority (or server reg-name))
+ (server (or (and (or (and (substring userinfo) "@")
+ `(-- nil))
+ hostport)
+ `(-- nil nil nil)))
+ (userinfo (* (or unreserved escaped [";:&=+$,"])))
+ (hostport (substring host) (or (and ":" (substring port))
+ `(-- nil)))
+ (host (or hostname ipv4address))
+ (hostname (* domainlabel ".") toplabel (opt "."))
+ (domainlabel alphanum
+ (opt (* (or alphanum "-") (if alphanum))
+ alphanum))
+ (toplabel alpha
+ (* (or alphanum "-") (if alphanum))
+ alphanum)
+ (ipv4address (+ digit) "." (+ digit) "." (+ digit) "." (+ digit))
+ (port (* digit))
+ (scheme alpha (* (or alpha digit ["+-."])))
+ (reg-name (or unreserved escaped ["$,;:@&=+"]))
+ (opaque-part uric-no-slash (* uric))
+ (uric (or reserved unreserved escaped))
+ (uric-no-slash (or unreserved escaped [";?:@&=+$,"]))
+ (reserved (set ";/?:@&=+$,"))
+ (unreserved (or alphanum mark))
+ (escaped "%" hex hex)
+ (hex (or digit [A-F] [a-f]))
+ (mark (set "-_.!~*'()"))
+ (alphanum (or alpha digit))
+ (alpha (or lowalpha upalpha))
+ (lowalpha [a-z])
+ (upalpha [A-Z])
+ (digit [0-9])))
+
+;; (peg-ex-uri)http://luser@www.foo.com:8080/bar/baz.html?x=1#foo
+;; (peg-ex-uri)file:/bar/baz.html?foo=df#x
+
+;; Split STRING where SEPARATOR occurs.
+(defun peg-ex-split (string separator)
+ (peg-parse-string ((s (list (* (* sep) elt)))
+ (elt (substring (+ (not sep) (any))))
+ (sep (= separator)))
+ string))
+
+;; (peg-ex-split "-abc-cd-" "-")
+
+;; Parse a lisp style Sexp.
+;; [To keep the example short, ' and . are handled as ordinary symbol characters.]
+(defun peg-ex-lisp ()
+ (peg-parse
+ (sexp _ (or string list number symbol))
+ (_ (* (or [" \n\t"] comment)))
+ (comment ";" (* (not (or "\n" (eob))) (any)))
+ (string "\"" (substring (* (not "\"") (any))) "\"")
+ (number (substring (opt (set "+-")) (+ digit))
+ (if terminating)
+ `(string -- (string-to-number string)))
+ (symbol (substring (and symchar (* (not terminating) symchar)))
+ `(s -- (intern s)))
+ (symchar [a-z A-Z 0-9 "-;!#%&'*+,./:;<=>?@[]^_`{|}~"])
+ (list "(" `(-- (cons nil nil)) `(hd -- hd hd)
+ (* sexp `(tl e -- (setcdr tl (list e))))
+ _ ")" `(hd _tl -- (cdr hd)))
+ (digit [0-9])
+ (terminating (or (set " \n\t();\"'") (eob)))))
+
+;; (peg-ex-lisp)
+
+;; We try to detect left recursion and report it as an error.
+(defun peg-ex-left-recursion ()
+ (eval '(peg-parse (exp (or term
+ (and exp "+" exp)))
+ (term (or digit
+ (and term "*" term)))
+ (digit [0-9]))
+ t))
+
+(defun peg-ex-infinite-loop ()
+ (eval '(peg-parse (exp (* (or "x"
+ "y"
+ (action (foo))))))
+ t))
+
+;; Some efficiency problems:
+
+;; Find the last digit in a string.
+;; Recursive definition with excessive stack usage.
+(defun peg-ex-last-digit (string)
+ (peg-parse-string ((s (or (and (any) s)
+ (substring [0-9]))))
+ string))
+
+;; (peg-ex-last-digit "ab0cd1ef2gh")
+;; (peg-ex-last-digit (make-string 50 ?-))
+;; (peg-ex-last-digit (make-string 1000 ?-))
+
+;; Find the last digit without recursion. Doesn't run out of stack,
+;; but probably still too inefficient for large inputs.
+(defun peg-ex-last-digit2 (string)
+ (peg-parse-string ((s `(-- nil)
+ (+ (* (not digit) (any))
+ (substring digit)
+ `(_d1 d2 -- d2)))
+ (digit [0-9]))
+ string))
+
+;; (peg-ex-last-digit2 "ab0cd1ef2gh")
+;; (peg-ex-last-digit2 (concat (make-string 500000 ?-) "8a9b"))
+;; (peg-ex-last-digit2 (make-string 500000 ?-))
+;; (peg-ex-last-digit2 (make-string 500000 ?5))
+
+(provide 'peg-tests)
+;;; peg-tests.el ends here
--
2.42.0
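The `peg-parse-string` helper used in the examples above is, as far as I can tell, defined in peg-tests.el rather than in peg.el, so it is not available outside the test suite. A minimal sketch of running the `peg-ex-last-digit2` grammar against a string without it, assuming peg.el is loadable; the name `peg-sketch-last-digit` is made up for illustration:

```elisp
;;; -*- lexical-binding: t -*-
(require 'peg)

;; Hypothetical helper (not part of peg.el): run the last-digit grammar
;; from the test file against STRING in a temporary buffer.
(defun peg-sketch-last-digit (string)
  "Return a list holding the last digit in STRING, or nil if none is found."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (with-peg-rules
        ((s `(-- nil)                     ; seed the stack with a dummy value
            (+ (* (not digit) (any))      ; skip over non-digits
               (substring digit)          ; push the digit as a string
               `(_d1 d2 -- d2)))          ; drop the older value, keep the newer
         (digit [0-9]))
      ;; With no failure function, `peg-run' returns nil when nothing matches.
      (peg-run (peg s)))))
```

Evaluating `(peg-sketch-last-digit "ab0cd1ef2gh")` should give `("2")`, and a string without any digit should give nil.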
* Re: [PATCH] Re: Make peg.el a built-in library?
2023-09-25 1:30 ` Eric Abrahamsen
@ 2023-09-25 2:27 ` Adam Porter
2023-09-25 13:00 ` Alexander Adolf
2024-03-24 14:19 ` Ihor Radchenko
1 sibling, 1 reply; 100+ messages in thread
From: Adam Porter @ 2023-09-25 2:27 UTC (permalink / raw)
To: eric; +Cc: eliz, emacs-devel, michael_heerdegen, monnier, yantar92
Hi Eric,
Thanks for picking this up again. I recently used peg.el in another
package of mine to rewrite and simplify the parsing of a simple query
syntax, and I was reminded of how useful it is. I think Emacs would
definitely benefit from having it in core.
--Adam
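For readers who have not seen peg.el applied to a query syntax, here is a minimal sketch of the idea. It is not taken from Adam's package: the grammar is the `key:value` example from the commentary section of peg.el, and the wrapper name `peg-sketch-parse-query` is made up. It assumes peg.el is loadable.

```elisp
;;; -*- lexical-binding: t -*-
(require 'peg)

;; Hypothetical helper name; the grammar itself is copied from the
;; commentary section of peg.el.
(defun peg-sketch-parse-query (string)
  "Parse STRING of the form \"key:value key:value\" into an alist."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (with-peg-rules
        ((query (+ term) (eol))
         ;; Each term pops the key and value strings and pushes a cons.
         (term key ":" value (opt (+ [space]))
               `(k v -- (cons (intern k) v)))
         (key (substring (and (not ":") (+ [word]))))
         (value (or string-value number-value))
         (string-value (substring (+ [alpha])))
         (number-value (substring (+ [digit]))
                       `(val -- (string-to-number val))))
      ;; Returns the value stack on success, nil on failure.
      (peg-run (peg query)))))
```

According to the peg.el commentary, parsing "name:Jane age:30" this way yields `((age . 30) (name . "Jane"))`; the most recently pushed term comes first because results are returned as a stack.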
* Re: [PATCH] Re: Make peg.el a built-in library?
2023-09-25 2:27 ` Adam Porter
@ 2023-09-25 13:00 ` Alexander Adolf
0 siblings, 0 replies; 100+ messages in thread
From: Alexander Adolf @ 2023-09-25 13:00 UTC (permalink / raw)
To: Adam Porter; +Cc: eric, eliz, emacs-devel, michael_heerdegen, monnier, yantar92
Hello,
I fully second Adam’s comments, and would thus also be in favour of including peg.el in core.
--alex
--
www.condition-alpha.com / @c_alpha
Sent from my iPhone; apologies for brevity and autocorrect weirdness.
> On 25. Sep 2023, at 04:28, Adam Porter <adam@alphapapa.net> wrote:
>
> Hi Eric,
>
> Thanks for picking this up again. I recently used peg.el in another package of mine to rewrite and simplify the parsing of a simple query syntax, and I was reminded of how useful it is. I think Emacs would definitely benefit from having it in core.
>
> --Adam
>
* Re: [PATCH] Re: Make peg.el a built-in library?
2023-09-25 1:30 ` Eric Abrahamsen
2023-09-25 2:27 ` Adam Porter
@ 2024-03-24 14:19 ` Ihor Radchenko
2024-03-24 15:32 ` Eli Zaretskii
1 sibling, 1 reply; 100+ messages in thread
From: Ihor Radchenko @ 2024-03-24 14:19 UTC (permalink / raw)
To: Eric Abrahamsen
Cc: emacs-devel, Michael Heerdegen, Eli Zaretskii, Stefan Monnier
Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> So here's a commit adding package, tests, and manual all at once. I've
> cc'd the people who indicated interest. The manual should be up to date
> with the code, I hope I've managed to follow all the pointers, and I
> believe I've done a better job of explaining how to use the various
> entry points of the library.
It has been a while since the last message in this thread.
I am wondering if there is anything wrong with the latest version of the
patch. Or maybe something else should be done to move forward towards
merging peg.el?
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
* Re: [PATCH] Re: Make peg.el a built-in library?
2024-03-24 14:19 ` Ihor Radchenko
@ 2024-03-24 15:32 ` Eli Zaretskii
2024-03-25 1:45 ` Eric Abrahamsen
0 siblings, 1 reply; 100+ messages in thread
From: Eli Zaretskii @ 2024-03-24 15:32 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: eric, emacs-devel, michael_heerdegen, monnier
> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: emacs-devel@gnu.org, Michael Heerdegen <michael_heerdegen@web.de>, Eli
> Zaretskii <eliz@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sun, 24 Mar 2024 14:19:58 +0000
>
> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
> > So here's a commit adding package, tests, and manual all at once. I've
> > cc'd the people who indicated interest. The manual should be up to date
> > with the code, I hope I've managed to follow all the pointers, and I
> > believe I've done a better job of explaining how to use the various
> > entry points of the library.
>
> It has been a while since the last message in this thread.
> I am wondering if there is anything wrong with the latest version of the
> patch. Or maybe something else should be done to move forward towards
> merging peg.el?
If the patch is still good to go, the only thing that's missing,
AFAICT, is a NEWS entry.
* Re: [PATCH] Re: Make peg.el a built-in library?
2024-03-24 15:32 ` Eli Zaretskii
@ 2024-03-25 1:45 ` Eric Abrahamsen
0 siblings, 0 replies; 100+ messages in thread
From: Eric Abrahamsen @ 2024-03-25 1:45 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Ihor Radchenko, emacs-devel, michael_heerdegen, monnier
[-- Attachment #1: Type: text/plain, Size: 1460 bytes --]
On 03/24/24 17:32 PM, Eli Zaretskii wrote:
>> From: Ihor Radchenko <yantar92@posteo.net>
>> Cc: emacs-devel@gnu.org, Michael Heerdegen <michael_heerdegen@web.de>, Eli
>> Zaretskii <eliz@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca>
>> Date: Sun, 24 Mar 2024 14:19:58 +0000
>>
>> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>>
>> > So here's a commit adding package, tests, and manual all at once. I've
>> > cc'd the people who indicated interest. The manual should be up to date
>> > with the code, I hope I've managed to follow all the pointers, and I
>> > believe I've done a better job of explaining how to use the various
>> > entry points of the library.
>>
>> It has been a while since the last message in this thread.
>> I am wondering if there is anything wrong with the latest version of the
>> patch. Or maybe something else should be done to move forward towards
>> merging peg.el?
>
> If the patch is still good to go, the only thing that's missing,
> AFAICT, is a NEWS entry.
Huh, I'm not sure what I was expecting to happen after my last message.
Anyway, thanks for the nudge! The code itself reached a stable state a
while ago; the last feedback on the patch was from Eli regarding
improvements to the manual, all of which I incorporated. Just so we're
all on the same page I'm reattaching the last version of the patch. I'm
assuming all this is okay, and in a little bit I'll add a NEWS entry and
push.
Thanks to all!
Eric
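For anyone who wants to try the library before reading the whole patch, a minimal sketch follows. It wraps the integer grammar from the manual's "Parsing Actions" section in a function that parses a string instead of the current buffer. The function name `peg-sketch-parse-int` is made up for illustration, and the sketch assumes peg.el is loadable (from ELPA, or from core once this is pushed).

```elisp
;;; -*- lexical-binding: t -*-
(require 'peg)

;; Hypothetical helper name; the rules are the ones from the
;; "Parsing Actions" example in the attached manual.
(defun peg-sketch-parse-int (string)
  "Return the integer at the start of STRING, or nil if there is none."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let ((stack
           (with-peg-rules
               ((number sign digit (* digit
                                      `(a b -- (+ (* a 10) b)))
                        `(sign val -- (* sign val)))
                (sign (or (and "+" `(-- 1))
                          (and "-" `(-- -1))
                          (and "" `(-- 1))))
                (digit [0-9] `(-- (- (char-before) ?0))))
             ;; No failure function is passed, so a failed parse
             ;; yields nil instead of signalling an error.
             (peg-run (peg number)))))
      ;; On success the value stack comes back as a list.
      (car-safe stack))))
```

With this, `(peg-sketch-parse-int "-123")` should return -123 and `(peg-sketch-parse-int "abc")` should return nil. Per its docstring, `peg-parse` would signal an error in the failing case instead.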
[-- Attachment #2: 0001-Add-peg.el-as-a-built-in-library.patch --]
[-- Type: text/x-patch, Size: 65704 bytes --]
From a8d1b3ad3162e92b4f8c8dd52690d9c1f3333661 Mon Sep 17 00:00:00 2001
From: Eric Abrahamsen <eric@ericabrahamsen.net>
Date: Mon, 5 Dec 2022 21:59:03 -0800
Subject: [PATCH] Add peg.el as a built-in library
* lisp/progmodes/peg.el: New file, taken from ELPA package.
* test/lisp/peg-tests.el: Package tests.
* doc/lispref/peg.texi: Documentation.
---
doc/lispref/Makefile.in | 1 +
doc/lispref/elisp.texi | 2 +
doc/lispref/peg.texi | 351 +++++++++++++++
lisp/progmodes/peg.el | 944 ++++++++++++++++++++++++++++++++++++++++
test/lisp/peg-tests.el | 367 ++++++++++++++++
5 files changed, 1665 insertions(+)
create mode 100644 doc/lispref/peg.texi
create mode 100644 lisp/progmodes/peg.el
create mode 100644 test/lisp/peg-tests.el
diff --git a/doc/lispref/Makefile.in b/doc/lispref/Makefile.in
index 325f23a3c0f..8ac1242996d 100644
--- a/doc/lispref/Makefile.in
+++ b/doc/lispref/Makefile.in
@@ -112,6 +112,7 @@ srcs =
$(srcdir)/os.texi \
$(srcdir)/package.texi \
$(srcdir)/parsing.texi \
+ $(srcdir)/peg.texi \
$(srcdir)/positions.texi \
$(srcdir)/processes.texi \
$(srcdir)/records.texi \
diff --git a/doc/lispref/elisp.texi b/doc/lispref/elisp.texi
index 72441c8d442..e12f61fc7eb 100644
--- a/doc/lispref/elisp.texi
+++ b/doc/lispref/elisp.texi
@@ -222,6 +222,7 @@ Top
* Non-ASCII Characters:: Non-ASCII text in buffers and strings.
* Searching and Matching:: Searching buffers for strings or regexps.
* Syntax Tables:: The syntax table controls word and list parsing.
+* Parsing Expression Grammars:: Parsing structured buffer text.
* Parsing Program Source:: Generate syntax tree for program sources.
* Abbrevs:: How Abbrev mode works, and its data structures.
@@ -1719,6 +1720,7 @@ Top
@include searching.texi
@include syntax.texi
+@include peg.texi
@include parsing.texi
@include abbrevs.texi
@include threads.texi
diff --git a/doc/lispref/peg.texi b/doc/lispref/peg.texi
new file mode 100644
index 00000000000..64950f148b1
--- /dev/null
+++ b/doc/lispref/peg.texi
@@ -0,0 +1,351 @@
+@c -*-texinfo-*-
+@c This is part of the GNU Emacs Lisp Reference Manual.
+@c Copyright (C) 1990--1995, 1998--1999, 2001--2023 Free Software
+@c Foundation, Inc.
+@c See the file elisp.texi for copying conditions.
+@node Parsing Expression Grammars
+@chapter Parsing Expression Grammars
+@cindex text parsing
+@cindex parsing expression grammar
+
+ Emacs Lisp provides several tools for parsing and matching text,
+from regular expressions (@pxref{Regular Expressions}) to full
+@acronym{LL} grammar parsers (@pxref{Top,, Bovine parser
+development,bovine}). @dfn{Parsing Expression Grammars}
+(@acronym{PEG}) are another approach to text parsing that offer more
+structure and composability than regular expressions, but less
+complexity than context-free grammars.
+
+A @acronym{PEG} parser is defined as a list of named rules, each of
+which matches text patterns, and/or contains references to other
+rules. Parsing is initiated with the function @code{peg-run} or the
+macro @code{peg-parse} (see below), and parses text after point in the
+current buffer, using a given set of rules.
+
+@cindex parsing expression
+The definition of each rule is referred to as a @dfn{parsing
+expression} (@acronym{PEX}), and can consist of a literal string, a
+regexp-like character range or set, a peg-specific construct
+resembling an elisp function call, a reference to another rule, or a
+combination of any of these. A grammar is expressed as a tree of
+rules in which one rule is typically treated as a ``root'' or
+``entry-point'' rule. For instance:
+
+@example
+@group
+((number sign digit (* digit))
+ (sign (or "+" "-" ""))
+ (digit [0-9]))
+@end group
+@end example
+
+Once defined, grammars can be used to parse text after point in the
+current buffer, in the following ways:
+
+@defmac peg-parse &rest pexs
+Match @var{pexs} at point. If @var{pexs} is a list of PEG rules, the
+first rule is considered the ``entry-point'':
+@end defmac
+
+@example
+@group
+(peg-parse
+ (number sign digit (* digit))
+ (sign (or "+" "-" ""))
+ (digit [0-9]))
+@end group
+@end example
+
+This macro represents the simplest use of the @acronym{PEG} library,
+but also the least flexible, as the rules must be written directly
+into the source code. A more flexible approach involves use of three
+macros in conjunction: @code{with-peg-rules}, a @code{let}-like
+construct that makes a set of rules available within the macro body;
+@code{peg-run}, which initiates parsing given a single rule; and
+@code{peg}, which is used to wrap the entry-point rule name. In fact,
+a call to @code{peg-parse} expands to just this set of calls. The
+above example could be written as:
+
+@example
+@group
+(with-peg-rules
+ ((number sign digit (* digit))
+ (sign (or "+" "-" ""))
+ (digit [0-9]))
+ (peg-run (peg number)))
+@end group
+@end example
+
+This allows more explicit control over the ``entry-point'' of parsing,
+and allows the combination of rules from different sources.
+
+Individual rules can also be defined using a more @code{defun}-like
+syntax, using the macro @code{define-peg-rule}:
+
+@example
+(define-peg-rule digit ()
+ [0-9])
+@end example
+
+This also allows for rules that accept an argument (supplied by the
+@code{funcall} PEG rule).
+
+Another possibility is to define a named set of rules with
+@code{define-peg-ruleset}:
+
+@example
+(define-peg-ruleset number-grammar
+ ;; `digit' here refers to the rule defined above.
+ (number () sign digit (* digit))
+ (sign () (or "+" "-" "")))
+@end example
+
+Rules and rulesets defined this way can be referred to by name in
+later calls to @code{peg-run} or @code{with-peg-rules}:
+
+@example
+(with-peg-rules (number-grammar)
+ (peg-run (peg number)))
+@end example
+
+By default, calls to @code{peg-run} or @code{peg-parse} produce no
+output: parsing simply moves point. In order to return or otherwise
+act upon parsed strings, rules can include @dfn{actions}, see
+@ref{Parsing Actions}.
+
+@menu
+* PEX Definitions:: The syntax of PEX rules.
+* Parsing Actions:: Running actions upon successful parsing.
+* Writing PEG Rules:: Tips for writing parsing rules.
+@end menu
+
+@node PEX Definitions
+@section PEX Definitions
+
+Parsing expressions can be defined using the following syntax:
+
+@table @code
+@item (and E1 E2 ...)
+A sequence of @acronym{PEX}s that must all be matched. The @code{and} form is
+optional and implicit.
+
+@item (or E1 E2 ...)
+Prioritized choices, meaning that, as in Elisp, the choices are tried
+in order, and the first successful match is used. Note that this is
+distinct from context-free grammars, in which selection between
+multiple matches is indeterminate.
+
+@item (any)
+Matches any single character, as the regexp ``.''.
+
+@item @var{string}
+A literal string.
+
+@item (char @var{C})
+A single character @var{C}, as an Elisp character literal.
+
+@item (* @var{E})
+Zero or more instances of expression @var{E}, as the regexp @samp{*}.
+Matching is always ``greedy''.
+
+@item (+ @var{E})
+One or more instances of expression @var{E}, as the regexp @samp{+}.
+Matching is always ``greedy''.
+
+@item (opt @var{E})
+Zero or one instance of expression @var{E}, as the regexp @samp{?}.
+
+@item SYMBOL
+A symbol representing a previously-defined PEG rule.
+
+@item (range CH1 CH2)
+The character range between CH1 and CH2, as the regexp @samp{[CH1-CH2]}.
+
+@item [CH1-CH2 "+*" ?x]
+A character set, which can include ranges, character literals, or
+strings of characters.
+
+@item [ascii cntrl]
+A list of named character classes.
+
+@item (syntax-class @var{NAME})
+A single syntax class.
+
+@item (funcall E ARGS...)
+Call @acronym{PEX} E (previously defined with @code{define-peg-rule})
+with arguments @var{ARGS}.
+
+@item (null)
+The empty string.
+
+@end table
+
+The following expressions are used as anchors or tests -- they do not
+move point, but return a boolean value which can be used to constrain
+matches as a way of controlling the parsing process (@pxref{Writing
+PEG Rules}).
+
+@table @code
+@item (bob)
+Beginning of buffer.
+
+@item (eob)
+End of buffer.
+
+@item (bol)
+Beginning of line.
+
+@item (eol)
+End of line.
+
+@item (bow)
+Beginning of word.
+
+@item (eow)
+End of word.
+
+@item (bos)
+Beginning of symbol.
+
+@item (eos)
+End of symbol.
+
+@item (if E)
+Returns non-@code{nil} if parsing @acronym{PEX} E from point succeeds (point
+is not moved).
+
+@item (not E)
+Returns non-@code{nil} if parsing @acronym{PEX} E from point fails (point
+is not moved).
+
+@item (guard EXP)
+Treats the value of the Lisp expression EXP as a boolean.
+
+@end table
+
+@vindex peg-char-classes
+Character class matching can use the same named character classes as
+in regular expressions (@pxref{Top,, Character Classes,elisp}).
+
+@node Parsing Actions
+@section Parsing Actions
+
+@cindex parsing actions
+@cindex parsing stack
+By default the process of parsing simply moves point in the current
+buffer, ultimately returning @code{t} if the parsing succeeds, and
+@code{nil} if it doesn't. It's also possible to define ``actions''
+that can run arbitrary Elisp at certain points in the parsed text.
+These actions can optionally affect something called the @dfn{parsing
+stack}, which is a list of values returned by the parsing process.
+These actions only run (and only return values) if the parsing process
+ultimately succeeds; if it fails the action code is not run at all.
+
+Actions can be added anywhere in the definition of a rule. They are
+distinguished from parsing expressions by an initial backquote
+(@samp{`}), followed by a parenthetical form that must contain a pair
+of hyphens (@samp{--}) somewhere within it. Symbols to the left of
+the hyphens are bound to values popped from the stack (they are
+somewhat analogous to the argument list of a lambda form). Values
+produced by code to the right are pushed to the stack (analogous to
+the return value of the lambda). For instance, the previous grammar
+can be augmented with actions to return the parsed number as an actual
+integer:
+
+@example
+(with-peg-rules ((number sign digit (* digit
+ `(a b -- (+ (* a 10) b)))
+ `(sign val -- (* sign val)))
+ (sign (or (and "+" `(-- 1))
+ (and "-" `(-- -1))
+ (and "" `(-- 1))))
+ (digit [0-9] `(-- (- (char-before) ?0))))
+ (peg-run (peg number)))
+@end example
+
+There must be values on the stack before they can be popped and
+returned -- if there aren't enough stack values to bind to an action's
+left-hand terms, they will be bound to @code{nil}. An action with
+only right-hand terms will push values to the stack; an action with
+only left-hand terms will consume (and discard) values from the stack.
+At the end of parsing, stack values are returned as a flat list.
+
+To return the string matched by a @acronym{PEX} (instead of simply
+moving point over it), a rule like this can be used:
+
+@example
+(one-word
+ `(-- (point))
+ (+ [word])
+ `(start -- (buffer-substring start (point))))
+@end example
+
+The first action pushes the initial value of point to the stack. The
+intervening @acronym{PEX} moves point over the next word. The second
+action pops the previous value from the stack (binding it to the
+variable @code{start}), and uses that value to extract a substring
+from the buffer and push it to the stack. This pattern is so common
+that @acronym{PEG} provides a shorthand function that does exactly the
+above, along with a few other shorthands for common scenarios:
+
+@table @code
+@item (substring @var{E})
+Match @acronym{PEX} @var{E} and push the matched string to the stack.
+
+@item (region @var{E})
+Match @var{E} and push the start and end positions of the matched
+region to the stack.
+
+@item (replace @var{E} @var{replacement})
+Match @var{E} and replace the matched region with the string @var{replacement}.
+
+@item (list @var{E})
+Match @var{E}, collect all values produced by @var{E} (and its
+sub-expressions) into a list, and push that list to the stack. Stack
+values are typically returned as a flat list; this is a way of
+``grouping'' values together.
+@end table
+
+@node Writing PEG Rules
+@section Writing PEG Rules
+
+Something to be aware of when writing PEG rules is that they are
+greedy. Rules which can consume a variable amount of text will always
+consume the maximum amount possible, even if that causes a rule that
+might otherwise have matched to fail later on -- there is no
+backtracking. For instance, this rule will never succeed:
+
+@example
+(forest (+ "tree" (* [blank])) "tree" (eol))
+@end example
+
+The @acronym{PEX} @code{(+ "tree" (* [blank]))} will consume all
+repetitions of the word ``tree'', leaving none to match the final
+@code{"tree"}.
+
+In these situations, the desired result can be obtained by using
+predicates and guards -- namely the @code{not}, @code{if} and
+@code{guard} expressions -- to constrain behavior. For instance:
+
+@example
+(forest (+ "tree" (* [blank]) (not (eol))) "tree" (eol))
+@end example
+
+The @code{if} and @code{not} operators accept a parsing expression and
+interpret it as a boolean, without moving point. The contents of a
+@code{guard} operator are evaluated as regular Lisp (not a
+@acronym{PEX}) and should return a boolean value. A @code{nil} value
+causes the match to fail.
+
+Another potentially unexpected behavior is that parsing will move
+point as far as possible, even if the parsing ultimately fails. This
+rule:
+
+@example
+(end-game "game" (eob))
+@end example
+
+when run in a buffer containing the text ``game over'' after point,
+will move point to just after ``game'' then halt parsing, returning
+@code{nil}. Successful parsing will always return @code{t}, or the
+contents of the parsing stack.
diff --git a/lisp/progmodes/peg.el b/lisp/progmodes/peg.el
new file mode 100644
index 00000000000..2eb4a7384d0
--- /dev/null
+++ b/lisp/progmodes/peg.el
@@ -0,0 +1,944 @@
+;;; peg.el --- Parsing Expression Grammars in Emacs Lisp -*- lexical-binding:t -*-
+
+;; Copyright (C) 2008-2023 Free Software Foundation, Inc.
+;;
+;; Author: Helmut Eller <eller.helmut@gmail.com>
+;; Maintainer: Stefan Monnier <monnier@iro.umontreal.ca>
+;; Version: 1.0.1
+;;
+;; This program is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+;;
+;; This program is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with this program. If not, see <https://www.gnu.org/licenses/>.
+;;
+;;; Commentary:
+;;
+;; This package implements Parsing Expression Grammars for Emacs Lisp.
+
+;; Parsing Expression Grammars (PEG) are a formalism in the spirit of
+;; Context Free Grammars (CFG) with some simplifications which make
+;; the implementation of PEGs as recursive descent parsers particularly
+;; simple and easy to understand [Ford, Baker].
+;; PEGs are more expressive than regexps and potentially easier to use.
+;;
+;; This file implements the macros `define-peg-rule', `with-peg-rules', and
+;; `peg-parse' which parses the current buffer according to a PEG.
+;; E.g. we can match integers with:
+;;
+;; (with-peg-rules
+;; ((number sign digit (* digit))
+;; (sign (or "+" "-" ""))
+;; (digit [0-9]))
+;; (peg-run (peg number)))
+;; or
+;; (define-peg-rule digit ()
+;; [0-9])
+;; (peg-parse (number sign digit (* digit))
+;; (sign (or "+" "-" "")))
+;;
+;; In contrast to regexps, PEGs allow us to define recursive "rules".
+;; A "grammar" is a set of rules. A rule is written as (NAME PEX...)
+;; E.g. (sign (or "+" "-" "")) is a rule with the name "sign".
+;; The syntax for PEX (Parsing Expression) is as follows:
+;;
+;; Description Lisp Traditional, as in Ford's paper
+;; =========== ==== ===========
+;; Sequence (and E1 E2) e1 e2
+;; Prioritized Choice (or E1 E2) e1 / e2
+;; Not-predicate (not E) !e
+;; And-predicate (if E) &e
+;; Any character (any) .
+;; Literal string "abc" "abc"
+;; Character C (char C) 'c'
+;; Zero-or-more (* E) e*
+;; One-or-more (+ E) e+
+;; Optional (opt E) e?
+;; Non-terminal SYMBOL A
+;; Character range (range A B) [a-b]
+;; Character set [a-b "+*" ?x] [a-b+*x] ;Note: it's a vector
+;; Character classes [ascii cntrl]
+;; Boolean-guard (guard EXP)
+;; Syntax-Class (syntax-class NAME)
+;; Local definitions (with RULES PEX...)
+;; Indirect call (funcall EXP ARGS...)
+;; and
+;; Empty-string (null) ε
+;; Beginning-of-Buffer (bob)
+;; End-of-Buffer (eob)
+;; Beginning-of-Line (bol)
+;; End-of-Line (eol)
+;; Beginning-of-Word (bow)
+;; End-of-Word (eow)
+;; Beginning-of-Symbol (bos)
+;; End-of-Symbol (eos)
+;;
+;; Rules can refer to other rules, and a grammar is often structured
+;; as a tree, with a root rule referring to one or more "branch
+;; rules", all the way down to the "leaf rules" that deal with actual
+;; buffer text. Rules can be recursive or mutually referential,
+;; though care must be taken not to create infinite loops.
+;;
+;;;; Named rulesets:
+;;
+;; You can define a set of rules for later use with:
+;;
+;; (define-peg-ruleset myrules
+;; (sign () (or "+" "-" ""))
+;; (digit () [0-9])
+;; (nat () digit (* digit))
+;; (int () sign digit (* digit))
+;; (float () int "." nat))
+;;
+;; and later refer to it:
+;;
+;; (with-peg-rules
+;; (myrules
+;; (complex float "+i" float))
+;; ... (peg-parse nat "," nat "," complex) ...)
+;;
+;;;; Parsing actions:
+;;
+;; PEXs also support parsing actions, i.e. Lisp snippets which are
+;; executed when a pex matches. This can be used to construct syntax
+;; trees or for similar tasks. The most basic form of action is
+;; written as:
+;;
+;; (action FORM) ; evaluate FORM for its side-effects
+;;
+;; Actions don't consume input, but are executed at the point of
+;; match. Another kind of action is called a "stack action", and
+;; looks like this:
+;;
+;; `(VAR... -- FORM...) ; stack action
+;;
+;; A stack action takes VARs from the "value stack" and pushes the
+;; results of evaluating FORMs to that stack.
+
+;; The value stack is created during the course of parsing. Certain
+;; operators (see below) that match buffer text can push values onto
+;; this stack. "Upstream" rules can then draw values from the stack,
+;; and optionally push new ones back. For instance, consider this
+;; very simple grammar:
+;;
+;; (with-peg-rules
+;; ((query (+ term) (eol))
+;; (term key ":" value (opt (+ [space]))
+;; `(k v -- (cons (intern k) v)))
+;; (key (substring (and (not ":") (+ [word]))))
+;; (value (or string-value number-value))
+;; (string-value (substring (+ [alpha])))
+;; (number-value (substring (+ [digit]))
+;; `(val -- (string-to-number val))))
+;; (peg-run (peg query)))
+;;
+;; This invocation of `peg-run' would parse this buffer text:
+;;
+;; name:Jane age:30
+;;
+;; And return this Elisp sexp:
+;;
+;; ((age . 30) (name . "Jane"))
+;;
+;; Note that, in complex grammars, some care must be taken to make
+;; sure that the number and type of values drawn from the stack always
+;; match those pushed. In the example above, both `string-value' and
+;; `number-value' push a single value to the stack. Since the `value'
+;; rule only includes these two sub-rules, any upstream rule that
+;; makes use of `value' can be confident it will always and only push
+;; a single value to the stack.
+;;
+;; Stack action forms are in a sense analogous to lambda forms: the
+;; symbols before the "--" are the equivalent of lambda arguments,
+;; while the forms after the "--" are return values. The difference
+;; being that a lambda form can only return a single value, while a
+;; stack action can push multiple values onto the stack. It's also
+;; perfectly valid to use `(-- FORM...)' or `(VAR... --)': the former
+;; pushes values to the stack without consuming any, and the latter
+;; pops values from the stack and discards them.
+;;
+;;;; Derived Operators:
+;;
+;; The following operators are implemented as combinations of
+;; primitive expressions:
+;;
+;; (substring E) ; Match E and push the substring for the matched region.
+;; (region E) ; Match E and push the start and end positions.
+;; (replace E RPL); Match E and replace the matched region with RPL.
+;; (list E) ; Match E and push a list of the items that E produced.
+;;
+;; See `peg-ex-parse-int' in `peg-tests.el' for further examples.
+;;
+;; Regexp equivalents:
+;;
+;; Here are some examples for regexps and how those could be written as pex.
+;; [Most are taken from rx.el]
+;;
+;; "^[a-z]*"
+;; (and (bol) (* [a-z]))
+;;
+;; "\n[^ \t]"
+;; (and "\n" (not [" \t"]) (any))
+;;
+;; "\\*\\*\\* EOOH \\*\\*\\*\n"
+;; "*** EOOH ***\n"
+;;
+;; "\\<\\(catch\\|finally\\)\\>[^_]"
+;; (and (bow) (or "catch" "finally") (eow) (not "_") (any))
+;;
+;; "[ \t\n]*:\\([^:]+\\|$\\)"
+;; (and (* [" \t\n"]) ":" (or (+ (not ":") (any)) (eol)))
+;;
+;; "^content-transfer-encoding:\\(\n?[\t ]\\)*quoted-printable\\(\n?[\t ]\\)*"
+;; (and (bol)
+;; "content-transfer-encoding:"
+;; (* (opt "\n") ["\t "])
+;; "quoted-printable"
+;; (* (opt "\n") ["\t "]))
+;;
+;; "\\$[I]d: [^ ]+ \\([^ ]+\\) "
+;; (and "$Id: " (+ (not " ") (any)) " " (+ (not " ") (any)) " ")
+;;
+;; "^;;\\s-*\n\\|^\n"
+;; (or (and (bol) ";;" (* (syntax-class whitespace)) "\n")
+;; (and (bol) "\n"))
+;;
+;; "\\\\\\\\\\[\\w+"
+;; (and "\\\\[" (+ (syntax-class word)))
+;;
+;; See ";;; Examples" in `peg-tests.el' for other examples.
+;;
+;;;; Rule argument and indirect calls:
+;;
+;; Rules can take arguments and those arguments can themselves be PEGs.
+;; For example:
+;;
+;; (define-peg-rule 2-or-more (peg)
+;; (funcall peg)
+;; (funcall peg)
+;; (* (funcall peg)))
+;;
+;; ... (peg-parse
+;; ...
+;; (2-or-more (peg foo))
+;; ...
+;; (2-or-more (peg bar))
+;; ...)
+;;
+;;;; References:
+;;
+;; [Ford] Bryan Ford. Parsing Expression Grammars: a Recognition-Based
+;; Syntactic Foundation. In POPL'04: Proceedings of the 31st ACM
+;; SIGPLAN-SIGACT symposium on Principles of Programming Languages,
+;; pages 111-122, New York, NY, USA, 2004. ACM Press.
+;; http://pdos.csail.mit.edu/~baford/packrat/
+;;
+;; [Baker] Baker, Henry G. "Pragmatic Parsing in Common Lisp". ACM Lisp
+;; Pointers 4(2), April--June 1991, pp. 3--15.
+;; http://home.pipeline.com/~hbaker1/Prag-Parse.html
+;;
+;; Roman Redziejowski does good PEG related research
+;; http://www.romanredz.se/pubs.htm
+
+;;;; Todo:
+
+;; - Fix the exponential blowup in `peg-translate-exp'.
+;; - Add a proper debug-spec for PEXs.
+
+;;; News:
+
+;; Since 1.0.1:
+;; - Use OClosures to represent PEG rules when available, and let cl-print
+;; display their source code.
+;; - New PEX form (with RULES PEX...).
+;; - Named rulesets.
+;; - You can pass arguments to rules.
+;; - New `funcall' rule to call rules indirectly (e.g. a peg you received
+;; as argument).
+
+;; Version 1.0:
+;; - New official entry points `peg` and `peg-run`.
+
+;;; Code:
+
+(eval-when-compile (require 'cl-lib))
+
+(defvar peg--actions nil
+ "Actions collected along the current parse.
+Used at runtime for backtracking. It's a list ((POS . THUNK)...).
+Each THUNK is executed at the corresponding POS. Thunks are
+executed in a postprocessing step, not during parsing.")
+
+(defvar peg--errors nil
+ "Data keeping track of the rightmost parse failure location.
+It's a pair (POSITION . EXPS ...). POSITION is the buffer position and
+EXPS is a list of rules/expressions that failed.")
+
+;;;; Main entry points
+
+(defmacro peg--when-fboundp (f &rest body)
+ (declare (indent 1) (debug (sexp body)))
+ (when (fboundp f)
+ (macroexp-progn body)))
+
+(peg--when-fboundp oclosure-define
+ (oclosure-define peg-function
+ "Parsing function built from PEG rule."
+ pexs)
+
+ (cl-defmethod cl-print-object ((peg peg-function) stream)
+ (princ "#f<peg " stream)
+ (let ((args (help-function-arglist peg 'preserve-names)))
+ (if args
+ (prin1 args stream)
+ (princ "()" stream)))
+ (princ " " stream)
+ (prin1 (peg-function--pexs peg) stream)
+ (princ ">" stream)))
+
+(defmacro peg--lambda (pexs args &rest body)
+ (declare (indent 2)
+ (debug (&define form lambda-list def-body)))
+ (if (fboundp 'oclosure-lambda)
+ `(oclosure-lambda (peg-function (pexs ,pexs)) ,args . ,body)
+ `(lambda ,args . ,body)))
+
+;; Sometimes (with-peg-rules ... (peg-run (peg ...))) is too
+;; longwinded for the task at hand, so `peg-parse' comes in handy.
+(defmacro peg-parse (&rest pexs)
+ "Match PEXS at point.
+PEXS is a sequence of PEG expressions, implicitly combined with `and'.
+Returns STACK if the match succeeds and signals an error on failure,
+moving point along the way.
+PEXS can also be a list of PEG rules, in which case the first rule is used."
+ (if (and (consp (car pexs))
+ (symbolp (caar pexs))
+ (not (ignore-errors (peg-normalize (car pexs)))))
+ ;; `pexs' is a list of rules: use the first rule as entry point.
+ `(with-peg-rules ,pexs (peg-run (peg ,(caar pexs)) #'peg-signal-failure))
+ `(peg-run (peg ,@pexs) #'peg-signal-failure)))
+
+(defmacro peg (&rest pexs)
+ "Return a PEG-matcher that matches PEXS."
+ (pcase (peg-normalize `(and . ,pexs))
+ (`(call ,name) `#',(peg--rule-id name)) ;Optimize this case by η-reduction!
+ (exp `(peg--lambda ',pexs () ,(peg-translate-exp exp)))))
+
+;; There are several "infos we want to return" when parsing a given PEX:
+;; 1- We want to return the success/failure of the parse.
+;; 2- We want to return the data of the successful parse (the stack).
+;; 3- We want to return the diagnostic of the failures.
+;; 4- We want to perform the actions (upon parse success)!
+;; `peg-parse' used an error signal to encode the (1) boolean, which
+;; lets it return all the info conveniently but the error signal was sometimes
+;; inconvenient. Other times one wants to just know (1) maybe without even
+;; performing (4).
+;; `peg-run' lets you choose all that, and by default gives you
+;; (1) as a simple boolean, while also doing (2), and (4).
+
+(defun peg-run (peg-matcher &optional failure-function success-function)
+ "Parse with PEG-MATCHER at point and run the success/failure function.
+If a match was found, move to the end of the match and call SUCCESS-FUNCTION
+with one argument: a function which will perform all the actions collected
+during the parse and then return the resulting stack (or t if empty).
+If no match was found, move to the (rightmost) point of parse failure and call
+FAILURE-FUNCTION with one argument, which is a list of PEG expressions that
+failed at this point.
+SUCCESS-FUNCTION defaults to `funcall' and FAILURE-FUNCTION
+defaults to `ignore'."
+ (let ((peg--actions '()) (peg--errors '(-1)))
+ (if (funcall peg-matcher)
+ ;; Found a parse: run the actions collected along the way.
+ (funcall (or success-function #'funcall)
+ (lambda ()
+ (save-excursion (peg-postprocess peg--actions))))
+ (goto-char (car peg--errors))
+ (when failure-function
+ (funcall failure-function (peg-merge-errors (cdr peg--errors)))))))
+
+(defmacro define-peg-rule (name args &rest pexs)
+ "Define PEG rule NAME as equivalent to PEXS.
+The PEG expressions in PEXS are implicitly combined with the
+sequencing `and' operator of PEG grammars."
+ (declare (indent 1))
+ (let ((inline nil))
+ (while (keywordp (car pexs))
+ (pcase (pop pexs)
+ (:inline (setq inline (car pexs))))
+ (setq pexs (cdr pexs)))
+ (let ((id (peg--rule-id name))
+ (exp (peg-normalize `(and . ,pexs))))
+ `(progn
+ (defalias ',id
+ (peg--lambda ',pexs ,args
+ ,(if inline
+ ;; Short-circuit to peg--translate in order to skip
+ ;; the extra failure-recording of `peg-translate-exp'.
+ ;; It also skips the cycle detection of
+ ;; `peg--translate-rule-body', which is not the main
+ ;; purpose but we can live with it.
+ (apply #'peg--translate exp)
+ (peg--translate-rule-body name exp))))
+ (eval-and-compile
+ ;; FIXME: We shouldn't need this any more since the info is now
+ ;; stored in the function, but sadly we need to find a name's EXP
+ ;; during compilation (i.e. before the `defalias' is executed)
+ ;; as part of cycle-detection!
+ (put ',id 'peg--rule-definition ',exp)
+ ,@(when inline
+ ;; FIXME: Copied from `defsubst'.
+ `(;; Never native-compile defsubsts as we need the byte
+ ;; definition in `byte-compile-unfold-bcf' to perform the
+ ;; inlining (Bug#42664, Bug#43280, Bug#44209).
+ ,(byte-run--set-speed id nil -1)
+ (put ',id 'byte-optimizer #'byte-compile-inline-expand))))))))
+
+(defmacro define-peg-ruleset (name &rest rules)
+ "Define a set of PEG rules for later use, e.g., in `with-peg-rules'."
+ (declare (indent 1))
+ (let ((defs ())
+ (aliases ()))
+ (dolist (rule rules)
+ (let* ((rname (car rule))
+ (full-rname (format "%s %s" name rname)))
+ (push `(define-peg-rule ,full-rname . ,(cdr rule)) defs)
+ (push `(,(peg--rule-id rname) #',(peg--rule-id full-rname)) aliases)))
+ `(cl-flet ,aliases
+ ,@defs
+ (eval-and-compile (put ',name 'peg--rules ',aliases)))))
+
+(defmacro with-peg-rules (rules &rest body)
+ "Make PEG rules RULES available within the scope of BODY.
+RULES is a list of rules of the form (NAME . PEXS), where PEXS is a sequence
+of PEG expressions, implicitly combined with `and'.
+RULES can also contain symbols in which case these must name
+rulesets defined previously with `define-peg-ruleset'."
+ (declare (indent 1) (debug (sexp form))) ;FIXME: `sexp' is not good enough!
+ (let* ((rulesets nil)
+ (rules
+ ;; First, macroexpand the rules.
+ (delq nil
+ (mapcar (lambda (rule)
+ (if (symbolp rule)
+ (progn (push rule rulesets) nil)
+ (cons (car rule) (peg-normalize `(and . ,(cdr rule))))))
+ rules)))
+ (ctx (assq :peg-rules macroexpand-all-environment)))
+ (macroexpand-all
+ `(cl-labels
+ ,(mapcar (lambda (rule)
+ ;; FIXME: Use `peg--lambda' as well.
+ `(,(peg--rule-id (car rule))
+ ()
+ ,(peg--translate-rule-body (car rule) (cdr rule))))
+ rules)
+ ,@body)
+ `((:peg-rules ,@(append rules (cdr ctx)))
+ ,@macroexpand-all-environment))))
+
+;;;;; Old entry points
+
+(defmacro peg-parse-exp (exp)
+ "Match the parsing expression EXP at point."
+ (declare (obsolete peg-parse "peg-0.9"))
+ `(peg-run (peg ,exp)))
+
+;;;; The actual implementation
+
+(defun peg--lookup-rule (name)
+ (or (cdr (assq name (cdr (assq :peg-rules macroexpand-all-environment))))
+ ;; With `peg-function' objects, we can recover the PEG from which it was
+ ;; defined, but this info is not yet available at compile-time. :-(
+ ;;(let ((id (peg--rule-id name)))
+ ;; (peg-function--pexs (symbol-function id)))
+ (get (peg--rule-id name) 'peg--rule-definition)))
+
+(defun peg--rule-id (name)
+ (intern (format "peg-rule %s" name)))
+
+(define-error 'peg-search-failed "Parse error at %d (expecting %S)")
+
+(defun peg-signal-failure (failures)
+ (signal 'peg-search-failed (list (point) failures)))
+
+(defun peg-parse-at-point (peg-matcher)
+ "Parse text at point according to the PEG rule PEG-MATCHER."
+ (declare (obsolete peg-run "peg-1.0"))
+ (peg-run peg-matcher
+ #'peg-signal-failure
+ (lambda (f) (let ((r (funcall f))) (if (listp r) r)))))
+
+;; Internally we use a regularized syntax, e.g. we only have binary OR
+;; nodes. Regularized nodes are lists of the form (OP ARGS...).
+(cl-defgeneric peg-normalize (exp)
+ "Return a \"normalized\" form of EXP."
+ (error "Invalid parsing expression: %S" exp))
+
+(cl-defmethod peg-normalize ((exp string))
+ (let ((len (length exp)))
+ (cond ((zerop len) '(guard t))
+ ((= len 1) `(char ,(aref exp 0)))
+ (t `(str ,exp)))))
+
+(cl-defmethod peg-normalize ((exp symbol))
+ ;; (peg--lookup-rule exp)
+ `(call ,exp))
+
+(cl-defmethod peg-normalize ((exp vector))
+ (peg-normalize `(set . ,(append exp '()))))
+
+(cl-defmethod peg-normalize ((exp cons))
+ (apply #'peg--macroexpand exp))
+
+(defconst peg-leaf-types '(any call action char range str set
+ guard syntax-class = funcall))
+
+(cl-defgeneric peg--macroexpand (head &rest args)
+ (cond
+ ((memq head peg-leaf-types) (cons head args))
+ (t `(call ,head ,@args))))
+
+(cl-defmethod peg--macroexpand ((_ (eql or)) &rest args)
+ (cond ((null args) '(guard nil))
+ ((null (cdr args)) (peg-normalize (car args)))
+ (t `(or ,(peg-normalize (car args))
+ ,(peg-normalize `(or . ,(cdr args)))))))
+
+(cl-defmethod peg--macroexpand ((_ (eql and)) &rest args)
+ (cond ((null args) '(guard t))
+ ((null (cdr args)) (peg-normalize (car args)))
+ (t `(and ,(peg-normalize (car args))
+ ,(peg-normalize `(and . ,(cdr args)))))))
+
+(cl-defmethod peg--macroexpand ((_ (eql *)) &rest args)
+ `(* ,(peg-normalize `(and . ,args))))
+
+;; FIXME: this duplicates code; could use some loop to avoid that
+(cl-defmethod peg--macroexpand ((_ (eql +)) &rest args)
+ (let ((e (peg-normalize `(and . ,args))))
+ `(and ,e (* ,e))))
+
+(cl-defmethod peg--macroexpand ((_ (eql opt)) &rest args)
+ (let ((e (peg-normalize `(and . ,args))))
+ `(or ,e (guard t))))
+
+(cl-defmethod peg--macroexpand ((_ (eql if)) &rest args)
+ `(if ,(peg-normalize `(and . ,args))))
+
+(cl-defmethod peg--macroexpand ((_ (eql not)) &rest args)
+ `(not ,(peg-normalize `(and . ,args))))
+
+(cl-defmethod peg--macroexpand ((_ (eql \`)) form)
+ (peg-normalize `(stack-action ,form)))
+
+(cl-defmethod peg--macroexpand ((_ (eql stack-action)) form)
+ (unless (member '-- form)
+ (error "Malformed stack action: %S" form))
+ (let ((args (cdr (member '-- (reverse form))))
+ (values (cdr (member '-- form))))
+ (let ((form `(let ,(mapcar (lambda (var) `(,var (pop peg--stack))) args)
+ ,@(mapcar (lambda (val) `(push ,val peg--stack)) values))))
+ `(action ,form))))
+
+(defvar peg-char-classes
+ '(ascii alnum alpha blank cntrl digit graph lower multibyte nonascii print
+ punct space unibyte upper word xdigit))
+
+(cl-defmethod peg--macroexpand ((_ (eql set)) &rest specs)
+ (cond ((null specs) '(guard nil))
+ ((and (null (cdr specs))
+ (let ((range (peg-range-designator (car specs))))
+ (and range `(range ,(car range) ,(cdr range))))))
+ (t
+ (let ((chars '()) (ranges '()) (classes '()))
+ (while specs
+ (let* ((spec (pop specs))
+ (range (peg-range-designator spec)))
+ (cond (range
+ (push range ranges))
+ ((peg-characterp spec)
+ (push spec chars))
+ ((stringp spec)
+ (setq chars (append (reverse (append spec ())) chars)))
+ ((memq spec peg-char-classes)
+ (push spec classes))
+ (t (error "Invalid set specifier: %S" spec)))))
+ (setq ranges (reverse ranges))
+ (setq chars (delete-dups (reverse chars)))
+ (setq classes (reverse classes))
+ (cond ((and (null ranges)
+ (null classes)
+ (cond ((null chars) '(guard nil))
+ ((null (cdr chars)) `(char ,(car chars))))))
+ (t `(set ,ranges ,chars ,classes)))))))
+
+(defun peg-range-designator (x)
+ (and (symbolp x)
+ (let ((str (symbol-name x)))
+ (and (= (length str) 3)
+ (eq (aref str 1) ?-)
+ (< (aref str 0) (aref str 2))
+ (cons (aref str 0) (aref str 2))))))
+
+;; characterp is new in Emacs 23.
+(defun peg-characterp (x)
+ (if (fboundp 'characterp)
+ (characterp x)
+ (integerp x)))
+
+(cl-defmethod peg--macroexpand ((_ (eql list)) &rest args)
+ (peg-normalize
+ (let ((marker (make-symbol "magic-marker")))
+ `(and (stack-action (-- ',marker))
+ ,@args
+ (stack-action (--
+ (let ((l '()))
+ (while
+ (let ((e (pop peg--stack)))
+ (cond ((eq e ',marker) nil)
+ ((null peg--stack)
+ (error "No marker on stack"))
+ (t (push e l) t))))
+ l)))))))
+
+(cl-defmethod peg--macroexpand ((_ (eql substring)) &rest args)
+ (peg-normalize
+ `(and `(-- (point))
+ ,@args
+ `(start -- (buffer-substring-no-properties start (point))))))
+
+(cl-defmethod peg--macroexpand ((_ (eql region)) &rest args)
+ (peg-normalize
+ `(and `(-- (point))
+ ,@args
+ `(-- (point)))))
+
+(cl-defmethod peg--macroexpand ((_ (eql replace)) pe replacement)
+ (peg-normalize
+ `(and (stack-action (-- (point)))
+ ,pe
+ (stack-action (start -- (progn
+ (delete-region start (point))
+ (insert-before-markers ,replacement))))
+ (stack-action (_ --)))))
+
+(cl-defmethod peg--macroexpand ((_ (eql quote)) _form)
+ (error "quote is reserved for future use"))
+
+(cl-defgeneric peg--translate (head &rest args)
+ (error "No translator for: %S" (cons head args)))
+
+(defun peg--translate-rule-body (name exp)
+ (let ((msg (condition-case err
+ (progn (peg-detect-cycles exp (list name)) nil)
+ (error (error-message-string err))))
+ (code (peg-translate-exp exp)))
+ (cond
+ ((null msg) code)
+ ((fboundp 'macroexp--warn-and-return)
+ (macroexp--warn-and-return msg code))
+ (t
+ (message "%s" msg)
+ code))))
+
+;; This is the main translation function.
+(defun peg-translate-exp (exp)
+ "Return the ELisp code to match the PE EXP."
+ ;; FIXME: This expansion basically duplicates `exp' in the output, which is
+ ;; a serious problem because it's done recursively, so it makes the output
+ ;; code's size exponentially larger than the input!
+ `(or ,(apply #'peg--translate exp)
+ (peg--record-failure ',exp))) ; for error reporting
+
+(define-obsolete-function-alias 'peg-record-failure
+ #'peg--record-failure "peg-1.0")
+(defun peg--record-failure (exp)
+ (cond ((= (point) (car peg--errors))
+ (setcdr peg--errors (cons exp (cdr peg--errors))))
+ ((> (point) (car peg--errors))
+ (setq peg--errors (list (point) exp))))
+ nil)
+
+(cl-defmethod peg--translate ((_ (eql and)) e1 e2)
+ `(and ,(peg-translate-exp e1)
+ ,(peg-translate-exp e2)))
+
+;; Choicepoints are used for backtracking. At a choicepoint we save
+;; enough state that we can continue from there if needed.
+(defun peg--choicepoint-moved-p (choicepoint)
+ `(/= ,(car choicepoint) (point)))
+
+(defun peg--choicepoint-restore (choicepoint)
+ `(progn
+ (goto-char ,(car choicepoint))
+ (setq peg--actions ,(cdr choicepoint))))
+
+(defmacro peg--with-choicepoint (var &rest body)
+ (declare (indent 1) (debug (symbolp form)))
+ `(let ((,var (cons (make-symbol "point") (make-symbol "actions"))))
+ `(let ((,(car ,var) (point))
+ (,(cdr ,var) peg--actions))
+ ,@(list ,@body))))
+
+(cl-defmethod peg--translate ((_ (eql or)) e1 e2)
+ (peg--with-choicepoint cp
+ `(or ,(peg-translate-exp e1)
+ (,@(peg--choicepoint-restore cp)
+ ,(peg-translate-exp e2)))))
+
+(cl-defmethod peg--translate ((_ (eql with)) rules &rest exps)
+ `(with-peg-rules ,rules ,(peg--translate `(and . ,exps))))
+
+(cl-defmethod peg--translate ((_ (eql guard)) exp) exp)
+
+(defvar peg-syntax-classes
+ '((whitespace ?-) (word ?w) (symbol ?s) (punctuation ?.)
+ (open ?\() (close ?\)) (string ?\") (escape ?\\) (charquote ?/)
+ (math ?$) (prefix ?') (comment ?<) (endcomment ?>)
+ (comment-fence ?!) (string-fence ?|)))
+
+(cl-defmethod peg--translate ((_ (eql syntax-class)) class)
+ (let ((probe (assoc class peg-syntax-classes)))
+ (cond (probe `(when (looking-at ,(format "\\s%c" (cadr probe)))
+ (forward-char)
+ t))
+ (t (error "Invalid syntax class: %S\nMust be one of: %s" class
+ (mapcar #'car peg-syntax-classes))))))
+
+(cl-defmethod peg--translate ((_ (eql =)) string)
+ `(let ((str ,string))
+ (when (zerop (length str))
+ (error "Empty strings not allowed for ="))
+ (search-forward str (+ (point) (length str)) t)))
+
+(cl-defmethod peg--translate ((_ (eql *)) e)
+ `(progn (while ,(peg--with-choicepoint cp
+ `(if ,(peg-translate-exp e)
+ ;; Just as regexps do for the `*' operator,
+ ;; we allow the body of `*' loops to match
+ ;; the empty string, but we don't repeat the loop if
+ ;; we haven't moved, to avoid inf-loops.
+ ,(peg--choicepoint-moved-p cp)
+ ,(peg--choicepoint-restore cp)
+ nil)))
+ t))
+
+(cl-defmethod peg--translate ((_ (eql if)) e)
+ (peg--with-choicepoint cp
+ `(when ,(peg-translate-exp e)
+ ,(peg--choicepoint-restore cp)
+ t)))
+
+(cl-defmethod peg--translate ((_ (eql not)) e)
+ (peg--with-choicepoint cp
+ `(unless ,(peg-translate-exp e)
+ ,(peg--choicepoint-restore cp)
+ t)))
+
+(cl-defmethod peg--translate ((_ (eql any)) )
+ '(when (not (eobp))
+ (forward-char)
+ t))
+
+(cl-defmethod peg--translate ((_ (eql char)) c)
+ `(when (eq (char-after) ',c)
+ (forward-char)
+ t))
+
+(cl-defmethod peg--translate ((_ (eql set)) ranges chars classes)
+ `(when (looking-at ',(peg-make-charset-regexp ranges chars classes))
+ (forward-char)
+ t))
+
+(defun peg-make-charset-regexp (ranges chars classes)
+ (when (and (not ranges) (not classes) (<= (length chars) 1))
+ (error "Bug"))
+ (let ((rbracket (member ?\] chars))
+ (minus (member ?- chars))
+ (hat (member ?^ chars)))
+ (dolist (c '(?\] ?- ?^))
+ (setq chars (remove c chars)))
+ (format "[%s%s%s%s%s%s]"
+ (if rbracket "]" "")
+ (if minus "-" "")
+ (mapconcat (lambda (x) (format "%c-%c" (car x) (cdr x))) ranges "")
+ (mapconcat (lambda (c) (format "[:%s:]" c)) classes "")
+ (mapconcat (lambda (c) (format "%c" c)) chars "")
+ (if hat "^" ""))))
+
+(cl-defmethod peg--translate ((_ (eql range)) from to)
+ `(when (and (char-after)
+ (<= ',from (char-after))
+ (<= (char-after) ',to))
+ (forward-char)
+ t))
+
+(cl-defmethod peg--translate ((_ (eql str)) str)
+ `(when (looking-at ',(regexp-quote str))
+ (goto-char (match-end 0))
+ t))
+
+(cl-defmethod peg--translate ((_ (eql call)) name &rest args)
+ `(,(peg--rule-id name) ,@args))
+
+(cl-defmethod peg--translate ((_ (eql funcall)) exp &rest args)
+ `(funcall ,exp ,@args))
+
+(cl-defmethod peg--translate ((_ (eql action)) form)
+ `(progn
+ (push (cons (point) (lambda () ,form)) peg--actions)
+ t))
+
+(defvar peg--stack nil)
+(defun peg-postprocess (actions)
+ "Execute \"actions\"."
+ (let ((peg--stack '())
+ (forw-actions ()))
+ (pcase-dolist (`(,pos . ,thunk) actions)
+ (push (cons (copy-marker pos) thunk) forw-actions))
+ (pcase-dolist (`(,pos . ,thunk) forw-actions)
+ (goto-char pos)
+ (funcall thunk))
+ (or peg--stack t)))
+
+;; Left recursion is presumably a common mistake when using PEGs.
+;; Here we try to detect such mistakes. Essentially we traverse the
+;; graph as long as we can without consuming input. When we find a
+;; recursive call we signal an error.
+
+(defun peg-detect-cycles (exp path)
+ "Signal an error on a cycle.
+Otherwise traverse EXP recursively and return T if EXP can match
+without consuming input. Return nil if EXP definitely consumes
+input. PATH is the list of rules that we have visited so far."
+ (apply #'peg--detect-cycles path exp))
+
+(cl-defgeneric peg--detect-cycles (_path head &rest args)
+ (error "No detect-cycle method for: %S" (cons head args)))
+
+(cl-defmethod peg--detect-cycles (path (_ (eql call)) name)
+ (if (member name path)
+ (error "Possible left recursion: %s"
+ (mapconcat (lambda (x) (format "%s" x))
+ (reverse (cons name path)) " -> "))
+ (let ((exp (peg--lookup-rule name)))
+ (if (null exp)
+ ;; If there's no rule by that name, either we'll fail at
+ ;; run-time or it will be defined later. In any case, at this
+ ;; point there's no evidence of a cycle, and if a cycle appears
+ ;; later we'll hopefully catch it when the rule gets defined.
+ ;; FIXME: In practice, if `name' is part of the cycle, we will
+ ;; indeed detect it when it gets defined, but OTOH if `name'
+ ;; is not part of a cycle but it *enables* a cycle because
+ ;; it matches the empty string (i.e. we should have returned t
+ ;; here), then we may not catch the problem at all :-(
+ nil
+ (peg-detect-cycles exp (cons name path))))))
+
+(cl-defmethod peg--detect-cycles (path (_ (eql and)) e1 e2)
+ (and (peg-detect-cycles e1 path)
+ (peg-detect-cycles e2 path)))
+
+(cl-defmethod peg--detect-cycles (path (_ (eql or)) e1 e2)
+ (or (peg-detect-cycles e1 path)
+ (peg-detect-cycles e2 path)))
+
+(cl-defmethod peg--detect-cycles (path (_ (eql *)) e)
+ (peg-detect-cycles e path)
+ t)
+
+(cl-defmethod peg--detect-cycles (path (_ (eql if)) e)
+ (peg-unary-nullable e path))
+(cl-defmethod peg--detect-cycles (path (_ (eql not)) e)
+ (peg-unary-nullable e path))
+
+(defun peg-unary-nullable (exp path)
+ (peg-detect-cycles exp path)
+ t)
+
+(cl-defmethod peg--detect-cycles (_path (_ (eql any))) nil)
+(cl-defmethod peg--detect-cycles (_path (_ (eql char)) _c) nil)
+(cl-defmethod peg--detect-cycles (_path (_ (eql set)) _r _c _k) nil)
+(cl-defmethod peg--detect-cycles (_path (_ (eql range)) _c1 _c2) nil)
+(cl-defmethod peg--detect-cycles (_path (_ (eql str)) s) (equal s ""))
+(cl-defmethod peg--detect-cycles (_path (_ (eql guard)) _e) t)
+(cl-defmethod peg--detect-cycles (_path (_ (eql =)) _s) nil)
+(cl-defmethod peg--detect-cycles (_path (_ (eql syntax-class)) _n) nil)
+(cl-defmethod peg--detect-cycles (_path (_ (eql action)) _form) t)
+
+(defun peg-merge-errors (exps)
+ "Build a more readable error message out of failed expressions."
+ (let ((merged '()))
+ (dolist (exp exps)
+ (setq merged (peg-merge-error exp merged)))
+ merged))
+
+(defun peg-merge-error (exp merged)
+ (apply #'peg--merge-error merged exp))
+
+(cl-defgeneric peg--merge-error (_merged head &rest args)
+ (error "No merge-error method for: %S" (cons head args)))
+
+(cl-defmethod peg--merge-error (merged (_ (eql or)) e1 e2)
+ (peg-merge-error e2 (peg-merge-error e1 merged)))
+
+(cl-defmethod peg--merge-error (merged (_ (eql and)) e1 _e2)
+ ;; FIXME: Why is `e2' not used?
+ (peg-merge-error e1 merged))
+
+(cl-defmethod peg--merge-error (merged (_ (eql str)) str)
+ ;;(add-to-list 'merged str)
+ (cl-adjoin str merged :test #'equal))
+
+(cl-defmethod peg--merge-error (merged (_ (eql call)) rule)
+ ;; (add-to-list 'merged rule)
+ (cl-adjoin rule merged :test #'equal))
+
+(cl-defmethod peg--merge-error (merged (_ (eql char)) char)
+ ;; (add-to-list 'merged (string char))
+ (cl-adjoin (string char) merged :test #'equal))
+
+(cl-defmethod peg--merge-error (merged (_ (eql set)) r c k)
+ ;; (add-to-list 'merged (peg-make-charset-regexp r c k))
+ (cl-adjoin (peg-make-charset-regexp r c k) merged :test #'equal))
+
+(cl-defmethod peg--merge-error (merged (_ (eql range)) from to)
+ ;; (add-to-list 'merged (format "[%c-%c]" from to))
+ (cl-adjoin (format "[%c-%c]" from to) merged :test #'equal))
+
+(cl-defmethod peg--merge-error (merged (_ (eql *)) exp)
+ (peg-merge-error exp merged))
+
+(cl-defmethod peg--merge-error (merged (_ (eql any)))
+ ;; (add-to-list 'merged '(any))
+ (cl-adjoin '(any) merged :test #'equal))
+
+(cl-defmethod peg--merge-error (merged (_ (eql not)) x)
+ ;; (add-to-list 'merged `(not ,x))
+ (cl-adjoin `(not ,x) merged :test #'equal))
+
+(cl-defmethod peg--merge-error (merged (_ (eql action)) _action) merged)
+(cl-defmethod peg--merge-error (merged (_ (eql null))) merged)
+
+(provide 'peg)
+(require 'peg)
+
+(define-peg-rule null () :inline t (guard t))
+(define-peg-rule fail () :inline t (guard nil))
+(define-peg-rule bob () :inline t (guard (bobp)))
+(define-peg-rule eob () :inline t (guard (eobp)))
+(define-peg-rule bol () :inline t (guard (bolp)))
+(define-peg-rule eol () :inline t (guard (eolp)))
+(define-peg-rule bow () :inline t (guard (looking-at "\\<")))
+(define-peg-rule eow () :inline t (guard (looking-at "\\>")))
+(define-peg-rule bos () :inline t (guard (looking-at "\\_<")))
+(define-peg-rule eos () :inline t (guard (looking-at "\\_>")))
+
+;;; peg.el ends here
diff --git a/test/lisp/peg-tests.el b/test/lisp/peg-tests.el
new file mode 100644
index 00000000000..864e09b4200
--- /dev/null
+++ b/test/lisp/peg-tests.el
@@ -0,0 +1,367 @@
+;;; peg-tests.el --- Tests of PEG parsers -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2008-2023 Free Software Foundation, Inc.
+
+;; This program is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; This program is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;; Tests and examples that used to live in peg.el, wrapped inside an `eval'.
+
+;;; Code:
+
+(require 'peg)
+(require 'ert)
+
+;;; Tests:
+
+(defmacro peg-parse-string (pex string &optional noerror)
+ "Parse STRING according to PEX.
+If NOERROR is non-nil, return nil if the parse failed and the stack
+(or t) if it succeeded, instead of signaling an error."
+ (let ((oldstyle (consp (car-safe pex)))) ;PEX is really a list of rules.
+ `(with-temp-buffer
+ (insert ,string)
+ (goto-char (point-min))
+ ,(if oldstyle
+ `(with-peg-rules ,pex
+ (peg-run (peg ,(caar pex))
+ ,(unless noerror '#'peg-signal-failure)))
+ `(peg-run (peg ,pex)
+ ,(unless noerror '#'peg-signal-failure))))))
+
+(define-peg-rule peg-test-natural ()
+ [0-9] (* [0-9]))
+
+(ert-deftest peg-test ()
+ (should (peg-parse-string peg-test-natural "99 bottles" t))
+ (should (peg-parse-string ((s "a")) "a" t))
+ (should (not (peg-parse-string ((s "a")) "b" t)))
+ (should (peg-parse-string ((s (not "a"))) "b" t))
+ (should (not (peg-parse-string ((s (not "a"))) "a" t)))
+ (should (peg-parse-string ((s (if "a"))) "a" t))
+ (should (not (peg-parse-string ((s (if "a"))) "b" t)))
+ (should (peg-parse-string ((s "ab")) "ab" t))
+ (should (not (peg-parse-string ((s "ab")) "ba" t)))
+ (should (not (peg-parse-string ((s "ab")) "a" t)))
+ (should (peg-parse-string ((s (range ?0 ?9))) "0" t))
+ (should (not (peg-parse-string ((s (range ?0 ?9))) "a" t)))
+ (should (peg-parse-string ((s [0-9])) "0" t))
+ (should (not (peg-parse-string ((s [0-9])) "a" t)))
+ (should (not (peg-parse-string ((s [0-9])) "" t)))
+ (should (peg-parse-string ((s (any))) "0" t))
+ (should (not (peg-parse-string ((s (any))) "" t)))
+ (should (peg-parse-string ((s (eob))) "" t))
+ (should (peg-parse-string ((s (not (eob)))) "a" t))
+ (should (peg-parse-string ((s (or "a" "b"))) "a" t))
+ (should (peg-parse-string ((s (or "a" "b"))) "b" t))
+ (should (not (peg-parse-string ((s (or "a" "b"))) "c" t)))
+ (should (peg-parse-string (and "a" "b") "ab" t))
+ (should (peg-parse-string ((s (and "a" "b"))) "abc" t))
+ (should (not (peg-parse-string (and "a" "b") "ba" t)))
+ (should (peg-parse-string ((s (and "a" "b" "c"))) "abc" t))
+ (should (peg-parse-string ((s (* "a") "b" (eob))) "b" t))
+ (should (peg-parse-string ((s (* "a") "b" (eob))) "ab" t))
+ (should (peg-parse-string ((s (* "a") "b" (eob))) "aaab" t))
+ (should (not (peg-parse-string ((s (* "a") "b" (eob))) "abc" t)))
+ (should (peg-parse-string ((s "")) "abc" t))
+ (should (peg-parse-string ((s "" (eob))) "" t))
+ (should (peg-parse-string ((s (opt "a") "b")) "abc" t))
+ (should (peg-parse-string ((s (opt "a") "b")) "bc" t))
+ (should (not (peg-parse-string ((s (or))) "ab" t)))
+ (should (peg-parse-string ((s (and))) "ab" t))
+ (should (peg-parse-string ((s (and))) "" t))
+ (should (peg-parse-string ((s ["^"])) "^" t))
+ (should (peg-parse-string ((s ["^a"])) "a" t))
+ (should (peg-parse-string ["-"] "-" t))
+ (should (peg-parse-string ((s ["]-"])) "]" t))
+ (should (peg-parse-string ((s ["^]"])) "^" t))
+ (should (peg-parse-string ((s [alpha])) "z" t))
+ (should (not (peg-parse-string ((s [alpha])) "0" t)))
+ (should (not (peg-parse-string ((s [alpha])) "" t)))
+ (should (not (peg-parse-string ((s ["][:alpha:]"])) "z" t)))
+ (should (peg-parse-string ((s (bob))) "" t))
+ (should (peg-parse-string ((s (bos))) "x" t))
+ (should (not (peg-parse-string ((s (bos))) " x" t)))
+ (should (peg-parse-string ((s "x" (eos))) "x" t))
+ (should (peg-parse-string ((s (syntax-class whitespace))) " " t))
+ (should (peg-parse-string ((s (= "foo"))) "foo" t))
+ (should (let ((f "foo")) (peg-parse-string ((s (= f))) "foo" t)))
+ (should (not (peg-parse-string ((s (= "foo"))) "xfoo" t)))
+ (should (equal (peg-parse-string ((s `(-- 1 2))) "") '(2 1)))
+ (should (equal (peg-parse-string ((s `(-- 1 2) `(a b -- a b))) "") '(2 1)))
+ (should (equal (peg-parse-string ((s (or (and (any) s)
+ (substring [0-9]))))
+ "ab0cd1ef2gh")
+ '("2")))
+ ;; The PEG rule `other' doesn't exist, which will cause a byte-compiler
+ ;; warning, but not an error at run time because the rule is not actually
+ ;; used in this particular case.
+ (should (equal (peg-parse-string ((s (substring (or "a" other)))
+ ;; Unused left-recursive rule, should
+ ;; cause a byte-compiler warning.
+ (r (* "a") r))
+ "af")
+ '("a")))
+ (should (equal (peg-parse-string ((s (list x y))
+ (x `(-- 1))
+ (y `(-- 2)))
+ "")
+ '((1 2))))
+ (should (equal (peg-parse-string ((s (list (* x)))
+ (x "" `(-- 'x)))
+ "xxx")
+ ;; The empty loop body should be matched once!
+ '((x))))
+ (should (equal (peg-parse-string ((s (list (* x)))
+ (x "x" `(-- 'x)))
+ "xxx")
+ '((x x x))))
+ (should (equal (peg-parse-string ((s (region (* x)))
+ (x "x" `(-- 'x)))
+ "xxx")
+ ;; FIXME: Since string positions start at 0, this should
+ ;; really be '(3 x x x 0) !!
+ '(4 x x x 1)))
+ (should (equal (peg-parse-string ((s (region (list (* x))))
+ (x "x" `(-- 'x 'y)))
+ "xxx")
+ '(4 (x y x y x y) 1)))
+ (should (equal (with-temp-buffer
+ (save-excursion (insert "abcdef"))
+ (list
+ (peg-run (peg "a"
+ (replace "bc" "x")
+ (replace "de" "y")
+ "f"))
+ (buffer-string)))
+ '(t "axyf")))
+ (with-temp-buffer
+ (insert "toro")
+ (goto-char (point-min))
+ (should (peg-run (peg "to")))
+ (should-not (peg-run (peg "to")))
+ (should (peg-run (peg "ro")))
+ (should (eobp)))
+ (with-temp-buffer
+ (insert " ")
+ (goto-char (point-min))
+ (peg-run (peg (+ (syntax-class whitespace))))
+ (should (eobp)))
+ )
+
+;;; Examples:
+
+;; peg-ex-recognize-int recognizes integers. An integer begins with an
+;; optional sign, followed by one or more digits. Digits are all
+;; characters from 0 to 9.
+;;
+;; Notes:
+;; 1) "" matches the empty sequence, i.e. matches without consuming
+;; input.
+;; 2) [0-9] is the character range from 0 to 9. This can also be
+;; written as (range ?0 ?9). Note that 0-9 is a symbol.
+(defun peg-ex-recognize-int ()
+ (with-peg-rules ((number sign digit (* digit))
+ (sign (or "+" "-" ""))
+ (digit [0-9]))
+ (peg-run (peg number))))
+
+;; peg-ex-parse-int recognizes integers and computes the corresponding
+;; value. The grammar is the same as for `peg-ex-recognize-int'
+;; augmented with parsing actions. Unfortunately, the actions add
+;; quite a bit of clutter.
+;;
+;; The actions for the sign rule push -1 on the stack for a minus sign
+;; and 1 for plus or no sign.
+;;
+;; The action for the digit rule pushes the value for a single digit.
+;;
+;; The action `(a b -- (+ (* a 10) b)) takes two items from the stack
+;; and pushes the first digit times 10 added to the second digit.
+;;
+;; The action `(sign val -- (* sign val)) multiplies val by the
+;; sign (1 or -1).
+(defun peg-ex-parse-int ()
+ (with-peg-rules ((number sign digit (* digit
+ `(a b -- (+ (* a 10) b)))
+ `(sign val -- (* sign val)))
+ (sign (or (and "+" `(-- 1))
+ (and "-" `(-- -1))
+ (and "" `(-- 1))))
+ (digit [0-9] `(-- (- (char-before) ?0))))
+ (peg-run (peg number))))
+
+;; Put point after the ) and press C-x C-e
+;; (peg-ex-parse-int)-234234
+
+;; Parse arithmetic expressions and compute the result as a side effect.
+(defun peg-ex-arith ()
+ (peg-parse
+ (expr _ sum eol)
+ (sum product (* (or (and "+" _ product `(a b -- (+ a b)))
+ (and "-" _ product `(a b -- (- a b))))))
+ (product value (* (or (and "*" _ value `(a b -- (* a b)))
+ (and "/" _ value `(a b -- (/ a b))))))
+ (value (or (and (substring number) `(string -- (string-to-number string)))
+ (and "(" _ sum ")" _)))
+ (number (+ [0-9]) _)
+ (_ (* [" \t"]))
+ (eol (or "\n" "\r\n" "\r"))))
+
+;; (peg-ex-arith) 1 + 2 * 3 * (4 + 5)
+;; (peg-ex-arith) 1 + 2 ^ 3 * (4 + 5) ; fails to parse
+
+;; Parse a URI according to RFC 2396.
+(defun peg-ex-uri ()
+ (peg-parse
+ (URI-reference (or absoluteURI relativeURI)
+ (or (and "#" (substring fragment))
+ `(-- nil))
+ `(scheme user host port path query fragment --
+ (list :scheme scheme :user user
+ :host host :port port
+ :path path :query query
+ :fragment fragment)))
+ (absoluteURI (substring scheme) ":" (or hier-part opaque-part))
+ (hier-part ;(-- user host port path query)
+ (or net-path
+ (and `(-- nil nil nil)
+ abs-path))
+ (or (and "?" (substring query))
+ `(-- nil)))
+ (net-path "//" authority (or abs-path `(-- nil)))
+ (abs-path "/" path-segments)
+ (path-segments segment (list (* "/" segment)) `(s l -- (cons s l)))
+ (segment (substring (* pchar) (* ";" param)))
+ (param (* pchar))
+ (pchar (or unreserved escaped [":@&=+$,"]))
+ (query (* uric))
+ (fragment (* uric))
+ (relativeURI (or net-path abs-path rel-path) (opt "?" query))
+ (rel-path rel-segment (opt abs-path))
+ (rel-segment (+ (or unreserved escaped [";@&=+$,"])))
+ (authority (or server reg-name))
+ (server (or (and (or (and (substring userinfo) "@")
+ `(-- nil))
+ hostport)
+ `(-- nil nil nil)))
+ (userinfo (* (or unreserved escaped [";:&=+$,"])))
+ (hostport (substring host) (or (and ":" (substring port))
+ `(-- nil)))
+ (host (or hostname ipv4address))
+ (hostname (* domainlabel ".") toplabel (opt "."))
+ (domainlabel alphanum
+ (opt (* (or alphanum "-") (if alphanum))
+ alphanum))
+ (toplabel alpha
+ (* (or alphanum "-") (if alphanum))
+ alphanum)
+ (ipv4address (+ digit) "." (+ digit) "." (+ digit) "." (+ digit))
+ (port (* digit))
+ (scheme alpha (* (or alpha digit ["+-."])))
+ (reg-name (or unreserved escaped ["$,;:@&=+"]))
+ (opaque-part uric-no-slash (* uric))
+ (uric (or reserved unreserved escaped))
+ (uric-no-slash (or unreserved escaped [";?:@&=+$,"]))
+ (reserved (set ";/?:@&=+$,"))
+ (unreserved (or alphanum mark))
+ (escaped "%" hex hex)
+ (hex (or digit [A-F] [a-f]))
+ (mark (set "-_.!~*'()"))
+ (alphanum (or alpha digit))
+ (alpha (or lowalpha upalpha))
+ (lowalpha [a-z])
+ (upalpha [A-Z])
+ (digit [0-9])))
+
+;; (peg-ex-uri)http://luser@www.foo.com:8080/bar/baz.html?x=1#foo
+;; (peg-ex-uri)file:/bar/baz.html?foo=df#x
+
+;; Split STRING where SEPARATOR occurs.
+(defun peg-ex-split (string separator)
+ (peg-parse-string ((s (list (* (* sep) elt)))
+ (elt (substring (+ (not sep) (any))))
+ (sep (= separator)))
+ string))
+
+;; (peg-ex-split "-abc-cd-" "-")
+
+;; Parse a lisp style Sexp.
+;; [To keep the example short, ' and . are handled as ordinary symbols.]
+(defun peg-ex-lisp ()
+ (peg-parse
+ (sexp _ (or string list number symbol))
+ (_ (* (or [" \n\t"] comment)))
+ (comment ";" (* (not (or "\n" (eob))) (any)))
+ (string "\"" (substring (* (not "\"") (any))) "\"")
+ (number (substring (opt (set "+-")) (+ digit))
+ (if terminating)
+ `(string -- (string-to-number string)))
+ (symbol (substring (and symchar (* (not terminating) symchar)))
+ `(s -- (intern s)))
+ (symchar [a-z A-Z 0-9 "-;!#%&'*+,./:;<=>?@[]^_`{|}~"])
+ (list "(" `(-- (cons nil nil)) `(hd -- hd hd)
+ (* sexp `(tl e -- (setcdr tl (list e))))
+ _ ")" `(hd _tl -- (cdr hd)))
+ (digit [0-9])
+ (terminating (or (set " \n\t();\"'") (eob)))))
+
+;; (peg-ex-lisp)
+
+;; We try to detect left recursion and report it as an error.
+(defun peg-ex-left-recursion ()
+ (eval '(peg-parse (exp (or term
+ (and exp "+" exp)))
+ (term (or digit
+ (and term "*" term)))
+ (digit [0-9]))
+ t))
+
+(defun peg-ex-infinite-loop ()
+ (eval '(peg-parse (exp (* (or "x"
+ "y"
+ (action (foo))))))
+ t))
+
+;; Some efficiency problems:
+
+;; Find the last digit in a string.
+;; Recursive definition with excessive stack usage.
+(defun peg-ex-last-digit (string)
+ (peg-parse-string ((s (or (and (any) s)
+ (substring [0-9]))))
+ string))
+
+;; (peg-ex-last-digit "ab0cd1ef2gh")
+;; (peg-ex-last-digit (make-string 50 ?-))
+;; (peg-ex-last-digit (make-string 1000 ?-))
+
+;; Find the last digit without recursion. Doesn't run out of stack,
+;; but probably still too inefficient for large inputs.
+(defun peg-ex-last-digit2 (string)
+ (peg-parse-string ((s `(-- nil)
+ (+ (* (not digit) (any))
+ (substring digit)
+ `(_d1 d2 -- d2)))
+ (digit [0-9]))
+ string))
+
+;; (peg-ex-last-digit2 "ab0cd1ef2gh")
+;; (peg-ex-last-digit2 (concat (make-string 500000 ?-) "8a9b"))
+;; (peg-ex-last-digit2 (make-string 500000 ?-))
+;; (peg-ex-last-digit2 (make-string 500000 ?5))
+
+(provide 'peg-tests)
+;;; peg-tests.el ends here
--
2.42.0
Thread overview: 100+ messages (end of thread; newest message 2024-03-25 1:45 UTC)
2021-08-25 18:52 Make peg.el a built-in library? Eric Abrahamsen
2021-08-26 6:17 ` Eli Zaretskii
2021-08-26 15:34 ` Eric Abrahamsen
2021-09-09 4:36 ` Eric Abrahamsen
2021-09-19 15:25 ` Eric Abrahamsen
2021-09-30 19:44 ` Stefan Monnier
2021-09-30 20:34 ` Adam Porter
2021-10-01 8:14 ` Augusto Stoffel
2021-10-01 18:05 ` Stefan Monnier
2021-10-01 18:40 ` Eric Abrahamsen
2021-10-02 3:57 ` Stefan Monnier
2021-10-02 7:32 ` Adam Porter
2021-10-02 14:45 ` Stefan Monnier
2021-10-02 15:13 ` Adam Porter
2021-08-26 17:02 ` Adam Porter
2021-08-26 17:25 ` Eric Abrahamsen
2021-08-27 3:17 ` Eric Abrahamsen
2021-08-27 6:41 ` Helmut Eller
2021-08-27 16:57 ` Eric Abrahamsen
2021-09-26 10:59 ` Augusto Stoffel
2021-09-26 15:06 ` Eric Abrahamsen
2021-09-26 18:36 ` Augusto Stoffel
2021-09-27 16:18 ` Eric Abrahamsen
2021-09-27 22:34 ` Richard Stallman
2021-09-28 3:52 ` Eric Abrahamsen
2021-09-28 8:09 ` tomas
2021-09-28 9:32 ` Helmut Eller
2021-09-28 10:45 ` tomas
2021-09-28 15:24 ` Augusto Stoffel
2021-09-30 6:04 ` Richard Stallman
2021-10-01 3:27 ` Eric Abrahamsen
2021-10-09 1:31 ` Michael Heerdegen
2021-10-09 5:28 ` Michael Heerdegen
2021-10-09 8:12 ` Helmut Eller
2021-10-09 12:52 ` Stefan Monnier
2021-10-10 5:49 ` Helmut Eller
2021-10-14 10:25 ` Michael Heerdegen
2021-10-09 12:54 ` Stefan Monnier
2021-10-09 16:47 ` Eric Abrahamsen
2021-10-10 4:20 ` Michael Heerdegen
2021-10-10 21:40 ` Eric Abrahamsen
2021-10-13 2:58 ` Michael Heerdegen
2021-10-09 16:49 ` Eric Abrahamsen
2021-10-10 3:43 ` Stefan Monnier
2021-10-10 4:46 ` Michael Heerdegen
2021-10-10 5:58 ` Helmut Eller
2021-10-10 13:56 ` Stefan Monnier
2021-10-22 16:33 ` Michael Heerdegen
2021-10-31 23:43 ` Michael Heerdegen
2021-11-15 23:16 ` Michael Heerdegen
2022-11-07 3:33 ` Ihor Radchenko
2022-11-07 19:46 ` Eric Abrahamsen
2022-11-08 6:57 ` Helmut Eller
2022-11-08 8:51 ` Ihor Radchenko
2022-11-10 4:04 ` Richard Stallman
2022-11-10 5:25 ` tomas
2022-11-10 8:15 ` Eli Zaretskii
2022-11-10 8:29 ` tomas
2022-11-11 4:36 ` Richard Stallman
2022-11-08 8:47 ` Ihor Radchenko
2022-11-08 16:18 ` Eric Abrahamsen
2022-11-08 19:08 ` tomas
2022-11-08 19:42 ` Eric Abrahamsen
2022-11-16 4:27 ` [PATCH] " Eric Abrahamsen
2022-11-16 5:07 ` tomas
2022-11-16 5:39 ` Eric Abrahamsen
2022-11-16 15:53 ` tomas
2022-11-16 6:24 ` Ihor Radchenko
2022-11-16 18:15 ` Eric Abrahamsen
2022-11-17 12:21 ` Ihor Radchenko
2022-11-27 1:46 ` Eric Abrahamsen
2022-11-27 8:57 ` Eli Zaretskii
2022-11-28 1:09 ` Eric Abrahamsen
2022-11-28 12:16 ` Eli Zaretskii
2023-09-25 1:30 ` Eric Abrahamsen
2023-09-25 2:27 ` Adam Porter
2023-09-25 13:00 ` Alexander Adolf
2024-03-24 14:19 ` Ihor Radchenko
2024-03-24 15:32 ` Eli Zaretskii
2024-03-25 1:45 ` Eric Abrahamsen
2023-01-11 7:39 ` Michael Heerdegen
2023-01-11 8:04 ` Ihor Radchenko
2023-01-11 11:01 ` Michael Heerdegen
2023-01-11 11:32 ` tomas
2023-02-05 12:10 ` Ihor Radchenko
2023-02-05 15:41 ` Eduardo Ochs
2023-02-05 15:45 ` Ihor Radchenko
2023-02-05 16:19 ` Eduardo Ochs
2023-02-05 16:50 ` Ihor Radchenko
2023-02-09 5:44 ` Jean Louis
2023-02-06 0:33 ` Michael Heerdegen
2022-11-08 14:01 ` Stefan Monnier
2022-11-08 14:42 ` tomas
2022-11-08 15:08 ` Visuwesh
2022-11-08 16:29 ` Juanma Barranquero
2022-12-02 20:20 ` Augusto Stoffel
2022-11-08 16:10 ` Eric Abrahamsen
2022-11-08 18:59 ` tomas
2022-11-08 19:42 ` Eric Abrahamsen
2022-11-08 22:03 ` Tim Cross
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git