From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Augusto Stoffel Newsgroups: gmane.emacs.devel Subject: Re: Make peg.el a built-in library? Date: Sun, 26 Sep 2021 12:59:03 +0200 Message-ID: <874ka7wqko.fsf@gmail.com> References: <875yvtbbn3.fsf@ericabrahamsen.net> <87bl5k87hq.fsf@alphapapa.net> <87fsuvpod4.fsf@ericabrahamsen.net> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="7228"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux) Cc: Eric Abrahamsen , emacs-devel@gnu.org To: Helmut Eller Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Sun Sep 26 13:01:29 2021 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1mURuT-0001my-3b for ged-emacs-devel@m.gmane-mx.org; Sun, 26 Sep 2021 13:01:29 +0200 Original-Received: from localhost ([::1]:34516 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mURuR-0007IT-Hc for ged-emacs-devel@m.gmane-mx.org; Sun, 26 Sep 2021 07:01:27 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:35998) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mURsD-0005UZ-8t for emacs-devel@gnu.org; Sun, 26 Sep 2021 06:59:10 -0400 Original-Received: from mail-wr1-x42b.google.com ([2a00:1450:4864:20::42b]:40895) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1mURsB-0003dp-OP for emacs-devel@gnu.org; Sun, 26 Sep 2021 06:59:09 -0400 Original-Received: by mail-wr1-x42b.google.com with SMTP id s21so2465151wra.7 for ; Sun, 26 Sep 2021 03:59:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version:content-transfer-encoding; bh=DimPlEzMdPRgOeIMdleQ5zYVsY81fJ2tM5MwUnkVBK4=; b=DBpxHeQ4P0TczDk1UFDWpVXensmnWUJvNUfK9ckSbQyWsVnRPwZlElOC6Z7yzTvQRr WGTG3IEqD+UTGF/8NDsG61ADZ+1CwSzXjZuBdsYstKk33nil92Q/g1Gf1D0yECR93b1O p2YFmw0Pq9AFqmu8/GoNz2bfNa2CQDUgIoFf2zf8s1EGHNspWMvB4FyL7/IXXYNhrJtg cyKc/JiZXbBUT4lUNhTWImMf9h6M0hvqfE1fEYJ2MvCUXW73qqGQQzq4DFlrr6PfowfZ CNM++T8b3/Q/UjhwsAULmJXI/FIO3RPbjhjQl7QYWW/CVs4UTzLgQfgVqQzkIt1OSxc/ Oq+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:references:date:in-reply-to :message-id:user-agent:mime-version:content-transfer-encoding; bh=DimPlEzMdPRgOeIMdleQ5zYVsY81fJ2tM5MwUnkVBK4=; b=GIiZc4YEyK1c72ZjP7Mt6TVEXhCeUkADsgLdYK/G23iAURYHg5dzQrY62pCDrAr+4L K7TgP5wGyU+bHS6f8T0cvj+fUZza3ERBG0GhLIZMXUASZQlsDCzjrhEnd+XmJjoZ9DhG LMgVcclh7GHbG3LNIrH8OSsFR9qiAr3xLzPBkuKnpBHG2YzhcODNpnOYfAqJPFKy9xpA /K4QIosZlARCTTTLMuBwax9Y9MmSY2bY/VQ0ZFGbO1cq8upeab3NyTkKDFQqDGnYBVk0 vRJcnH7woZUQyz4mneCyTLrCrX8/bTEeWzG0zhuZ+yDFZ5yIoVff1FN1oUu+v8EgS7Es zkGw== X-Gm-Message-State: AOAM532ePuCtyLsFqN6HrsZCW+LSUyLr9c42pAS7/H6S7d33ReVuzht2 WCTYW+RqXX7TM4i4DJkXqldd/CiQZYo= X-Google-Smtp-Source: ABdhPJxlNPmpxdJcv7Ke27g64EeuKZq9womnZLtxlCjLEyoTAgn3BGTPEGX40aNIysStXU8ujUm0fQ== X-Received: by 2002:a1c:80cd:: with SMTP id b196mr10882342wmd.36.1632653945417; Sun, 26 Sep 2021 03:59:05 -0700 (PDT) Original-Received: from ars3 ([2a02:8109:8ac0:56d0::2f72]) by smtp.gmail.com with ESMTPSA id z12sm13433831wro.75.2021.09.26.03.59.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 26 Sep 2021 03:59:04 -0700 (PDT) In-Reply-To: (Helmut Eller's message of "Fri, 27 Aug 2021 08:41:19 +0200") Received-SPF: pass client-ip=2a00:1450:4864:20::42b; envelope-from=arstoffel@gmail.com; helo=mail-wr1-x42b.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:275509 Archived-At: I think it would be really cool to have PEGs built into Emacs. Things like json.el could be simplified by at least (log10 2) orders of magnitude with PEGs. Whatever the use case of `rx' is, PEGs are probably the "real" solution. But I suspect this would only take traction with a fast and robust C implementation like Lua's LPEG (see below for a reason). On Fri, 27 Aug 2021 at 08:41, Helmut Eller wrote: > On Thu, Aug 26 2021, Eric Abrahamsen wrote: > >> Whoo, I've been trying to get enough of a handle on the parsing actions >> to write a documentation patch for them -- now I'm seeing what Helmut >> meant by "semantically unintuitive". > > What I actually meant with "semantically unintuitive" are issues > described in Roman Redziejowski's "Trying to understand PEG" paper[*]. > He writes: > > The problem with limited backtracking is that by not trying hard it > may miss some inputs that it should accept. A notorious example is > the rule A =3D aAa | aa that defines the set of strings of a=E2=80=99s= of even > length. Implemented with limited backtracking, this rule accepts only > strings of length 2^n. When I started to write PEGs intensively, I thought the limited backtracking would be a problem. It's not. In fact, I find the regexp-style backtracking great, but only for =E2=80=9Cquick and dirty=E2= =80=9D things (e.g., those throw-away little programs one writes for grep or isearch). But if are trying to write a more complex parser, aggressive backtracking actually gets in the way. The example above is kind of silly. You can parse an even number of a's with the rule A =3D aaA | =CE=B5. This is still kind of bad, because (unle= ss peg.el is way fancier than I'm imagining), it consumes the call stack. LPEG has a kind of =E2=80=9Ctail call optimization=E2=80=9D that allows you= to do this. Obviously, the sane way to parse an even number of a's is the rule (aa)*, aka (* "aa"). But there are many justifiable use-cases for the tail call optimization. For instance, given a pattern P, produce a new pattern that looks ahead for the first match of P. This would be P | .P, or (or P (and (any) P)) in peg.el notation. Is there a simple an efficient way to do this in peg.el, that allows to skip over thousands of characters without a new call stack entry for each one of them?