From: Eric Abrahamsen
Newsgroups: gmane.emacs.devel
To: emacs-devel@gnu.org
Subject: Re: Make peg.el a built-in library?
Date: Tue, 08 Nov 2022 08:18:15 -0800
Message-ID: <87a6511ku0.fsf@ericabrahamsen.net>
References: <875yvtbbn3.fsf@ericabrahamsen.net> <877d07a16u.fsf@localhost> <87tu3asg2r.fsf@ericabrahamsen.net> <87edud25ov.fsf@localhost>

Ihor Radchenko writes:

> Eric Abrahamsen writes:
>
>>> Is there any progress merging peg.el to Emacs?
>>> I do not see any obvious blockers in the discussion, but the merge never
>>> happened?
>>
>> I will say that I tried to use PEG to resolve some gruesome text-parsing
>> issues in EBDB very recently, and failed to make it work in the hour or
>> two I'd allotted to the problem. The file-comment docs are pretty good,
>> but I think they would need to be expanded in a few crucial ways,
>> particularly to help those who don't necessarily know how PEGs work.
>>
>> Specifically, it is not obvious (to me) the ways in which PEGs (or maybe
>> just peg.el) are not fully declarative. It doesn't backtrack, and I
>> suspect it won't ever backtrack or isn't even supposed to, which means
>> users should be made explicitly aware of the ways in which their rules
>> can fail, and the ways in which declaration order matter. The comment
>> for the `or' construct reads:
>>
>>   Prioritized Choice
>>
>> And that's about the only hint you get.
>
> As the comment in peg.el states, the definitions are adapted from the
> original PEG paper. There is even a link to paper and also to
> presentation explaining how peg works.
> I strongly advice you to read
> that. Prioritized Choice is explained there.

This is what I was saying in my original message, though: if peg.el is
going to go into core, it probably needs more/better docs than code
comments and "read this paper". Its likely users will be Elisp library
authors like me, who are just trying to free themselves from regexp hell
and want a relatively straightforward alternative.

I used peg.el to prototype search-string parsing in Gnus and everything
Just Worked the first time and it was pretty amazing. In my later
example below everything did not Just Work, but I think with some more
hand-holdy documentation it would have.

>> I was trying to parse a
>> multiword name like
>>
>>   Eric Edwin Abrahamsen
>>
>> into the structure
>>
>>   (("Eric" "Edwin") "Abrahamsen")
>>
>> using rules like
>>
>>   (plain-name (substring (+ [word])) (* [space]))
>>   (full-name (list (+ plain-name) plain-name)
>>     `(names -- (list (butlast names) (car (last names)))))
>>
>> Which always fails to match because (+ plain-name) is greedy and eats up
>> all the words. It doesn't ever try leaving out the last word in an
>> attempt to make the rule match.
>
> One way is
>
> (with-peg-rules
>     ((name (substring (+ [word])) (* [blank]))
>      (given-name name (not (eol)))
>      (last-name name (and (eol)))
>      (full-name (list (+ given-name) last-name) `(names -- (list (butlast names) (car (last names))))))
>   (peg-run (peg full-name)))
>
> A simple-minded non-greedy version would be ambiguous. You must
> necessarily indicate end of input.
>
> A more appropriate non-ambiguous non-greedy statement would involve or
> (which you admittedly did not understand):
>
> (with-peg-rules
>     ((name (substring (+ [word])) (* [blank]))
>      (given-name name)
>      (last-name name (and (eol)))
>      (full-name (list (+ (or last-name given-name)) (and (eol))) `(names -- (list (butlast names) (car (last names))))))
>      ;;;;;;;;;;;;;;;;;;;;;^^
>   (peg-run (peg full-name)))

Thanks!
This is very helpful to my understanding. In this particular case
I'm putting strings in a temporary buffer, so signals like (eol) or more
likely (eob) are present and reliable.

>> I'm happy to write the docs (should it have its own info manual
>> section?), if we really think there are no other necessary
>> fixes/improvements.
>
> I find PEG to be a nice addition when regexps do not cut the necessary
> parsing, while using Bovine or tree-sitter is an overkill.

I've never tried tree-sitter, but I have tried and failed to make Bovine
do this sort of thing more than once over the years. I also agree that a
middle ground is needed.

Eric