unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Augusto Stoffel <arstoffel@gmail.com>
To: Helmut Eller <eller.helmut@gmail.com>
Cc: Eric Abrahamsen <eric@ericabrahamsen.net>, emacs-devel@gnu.org
Subject: Re: Make peg.el a built-in library?
Date: Sun, 26 Sep 2021 12:59:03 +0200	[thread overview]
Message-ID: <874ka7wqko.fsf@gmail.com> (raw)
In-Reply-To: <m235qvxucw.fsf@gmail.com> (Helmut Eller's message of "Fri, 27 Aug 2021 08:41:19 +0200")

I think it would be really cool to have PEGs built into Emacs.  Things
like json.el could be simplified by at least (log10 2) orders of
magnitude with PEGs.  Whatever the use case of `rx' is, PEGs are
probably the "real" solution.

But I suspect this would only take traction with a fast and robust C
implementation like Lua's LPEG (see below for a reason).

On Fri, 27 Aug 2021 at 08:41, Helmut Eller <eller.helmut@gmail.com> wrote:

> On Thu, Aug 26 2021, Eric Abrahamsen wrote:

>
>> Whoo, I've been trying to get enough of a handle on the parsing actions
>> to write a documentation patch for them -- now I'm seeing what Helmut
>> meant by "semantically unintuitive".
>
> What I actually meant with "semantically unintuitive" are issues
> described in Roman Redziejowski's "Trying to understand PEG" paper[*].
> He writes:
>
>    The problem with limited backtracking is that by not trying hard it
>    may miss some inputs that it should accept.  A notorious example is
>    the rule A = aAa | aa that defines the set of strings of a’s of even
>    length. Implemented with limited backtracking, this rule accepts only
>    strings of length 2^n.

When I started to write PEGs intensively, I thought the limited
backtracking would be a problem.  It's not.  In fact, I find the
regexp-style backtracking great, but only for “quick and dirty” things
(e.g., those throw-away little programs one writes for grep or isearch).
But if are trying to write a more complex parser, aggressive
backtracking actually gets in the way.

The example above is kind of silly.  You can parse an even number of a's
with the rule A = aaA | ε.  This is still kind of bad, because (unless
peg.el is way fancier than I'm imagining), it consumes the call stack.
LPEG has a kind of “tail call optimization” that allows you to do this.

Obviously, the sane way to parse an even number of a's is the rule
(aa)*, aka (* "aa").  But there are many justifiable use-cases for the
tail call optimization.  For instance, given a pattern P, produce a new
pattern that looks ahead for the first match of P.  This would be
P | .P, or

    (or P (and (any) P))

in peg.el notation.  Is there a simple an efficient way to do this in
peg.el, that allows to skip over thousands of characters without a new
call stack entry for each one of them?



  parent reply	other threads:[~2021-09-26 10:59 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-25 18:52 Make peg.el a built-in library? Eric Abrahamsen
2021-08-26  6:17 ` Eli Zaretskii
2021-08-26 15:34   ` Eric Abrahamsen
2021-09-09  4:36     ` Eric Abrahamsen
2021-09-19 15:25       ` Eric Abrahamsen
2021-09-30 19:44       ` Stefan Monnier
2021-09-30 20:34         ` Adam Porter
2021-10-01  8:14           ` Augusto Stoffel
2021-10-01 18:05           ` Stefan Monnier
2021-10-01 18:40             ` Eric Abrahamsen
2021-10-02  3:57               ` Stefan Monnier
2021-10-02  7:32             ` Adam Porter
2021-10-02 14:45               ` Stefan Monnier
2021-10-02 15:13                 ` Adam Porter
2021-08-26 17:02 ` Adam Porter
2021-08-26 17:25   ` Eric Abrahamsen
2021-08-27  3:17   ` Eric Abrahamsen
2021-08-27  6:41     ` Helmut Eller
2021-08-27 16:57       ` Eric Abrahamsen
2021-09-26 10:59       ` Augusto Stoffel [this message]
2021-09-26 15:06         ` Eric Abrahamsen
2021-09-26 18:36           ` Augusto Stoffel
2021-09-27 16:18             ` Eric Abrahamsen
2021-09-27 22:34         ` Richard Stallman
2021-09-28  3:52           ` Eric Abrahamsen
2021-09-28  8:09             ` tomas
2021-09-28  9:32               ` Helmut Eller
2021-09-28 10:45                 ` tomas
2021-09-28 15:24               ` Augusto Stoffel
2021-09-30  6:04             ` Richard Stallman
2021-10-01  3:27               ` Eric Abrahamsen
2021-10-09  1:31 ` Michael Heerdegen
2021-10-09  5:28   ` Michael Heerdegen
2021-10-09  8:12     ` Helmut Eller
2021-10-09 12:52       ` Stefan Monnier
2021-10-10  5:49         ` Helmut Eller
2021-10-14 10:25       ` Michael Heerdegen
2021-10-09 12:54   ` Stefan Monnier
2021-10-09 16:47     ` Eric Abrahamsen
2021-10-10  4:20       ` Michael Heerdegen
2021-10-10 21:40         ` Eric Abrahamsen
2021-10-13  2:58           ` Michael Heerdegen
2021-10-09 16:49   ` Eric Abrahamsen
2021-10-10  3:43     ` Stefan Monnier
2021-10-10  4:46       ` Michael Heerdegen
2021-10-10  5:58         ` Helmut Eller
2021-10-10 13:56           ` Stefan Monnier
2021-10-22 16:33           ` Michael Heerdegen
2021-10-31 23:43           ` Michael Heerdegen
2021-11-15 23:16             ` Michael Heerdegen
2022-11-07  3:33 ` Ihor Radchenko
2022-11-07 19:46   ` Eric Abrahamsen
2022-11-08  6:57     ` Helmut Eller
2022-11-08  8:51       ` Ihor Radchenko
2022-11-10  4:04       ` Richard Stallman
2022-11-10  5:25         ` tomas
2022-11-10  8:15           ` Eli Zaretskii
2022-11-10  8:29             ` tomas
2022-11-11  4:36           ` Richard Stallman
2022-11-08  8:47     ` Ihor Radchenko
2022-11-08 16:18       ` Eric Abrahamsen
2022-11-08 19:08         ` tomas
2022-11-08 19:42           ` Eric Abrahamsen
2022-11-16  4:27             ` [PATCH] " Eric Abrahamsen
2022-11-16  5:07               ` tomas
2022-11-16  5:39                 ` Eric Abrahamsen
2022-11-16 15:53                   ` tomas
2022-11-16  6:24               ` Ihor Radchenko
2022-11-16 18:15                 ` Eric Abrahamsen
2022-11-17 12:21                   ` Ihor Radchenko
2022-11-27  1:46                     ` Eric Abrahamsen
2022-11-27  8:57                       ` Eli Zaretskii
2022-11-28  1:09                         ` Eric Abrahamsen
2022-11-28 12:16                           ` Eli Zaretskii
2023-09-25  1:30                             ` Eric Abrahamsen
2023-09-25  2:27                               ` Adam Porter
2023-09-25 13:00                                 ` Alexander Adolf
2024-03-24 14:19                               ` Ihor Radchenko
2024-03-24 15:32                                 ` Eli Zaretskii
2024-03-25  1:45                                   ` Eric Abrahamsen
2023-01-11  7:39               ` Michael Heerdegen
2023-01-11  8:04                 ` Ihor Radchenko
2023-01-11 11:01                   ` Michael Heerdegen
2023-01-11 11:32                     ` tomas
2023-02-05 12:10                     ` Ihor Radchenko
2023-02-05 15:41                       ` Eduardo Ochs
2023-02-05 15:45                         ` Ihor Radchenko
2023-02-05 16:19                           ` Eduardo Ochs
2023-02-05 16:50                             ` Ihor Radchenko
2023-02-09  5:44                         ` Jean Louis
2023-02-06  0:33                       ` Michael Heerdegen
2022-11-08 14:01     ` Stefan Monnier
2022-11-08 14:42       ` tomas
2022-11-08 15:08       ` Visuwesh
2022-11-08 16:29         ` Juanma Barranquero
2022-12-02 20:20           ` Augusto Stoffel
2022-11-08 16:10       ` Eric Abrahamsen
2022-11-08 18:59         ` tomas
2022-11-08 19:42           ` Eric Abrahamsen
2022-11-08 22:03             ` Tim Cross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874ka7wqko.fsf@gmail.com \
    --to=arstoffel@gmail.com \
    --cc=eller.helmut@gmail.com \
    --cc=emacs-devel@gnu.org \
    --cc=eric@ericabrahamsen.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).