From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Michael Heerdegen Newsgroups: gmane.emacs.devel Subject: Re: Make peg.el a built-in library? Date: Mon, 01 Nov 2021 00:43:13 +0100 Message-ID: <87mtmobw1a.fsf@web.de> References: <875yvtbbn3.fsf@ericabrahamsen.net> <87a6jjc7c8.fsf@web.de> <87v926w3bs.fsf@ericabrahamsen.net> <87zgrhh4he.fsf@web.de> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="34706"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux) Cc: emacs-devel@gnu.org To: Helmut Eller Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Mon Nov 01 00:44:49 2021 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1mhKVN-0008qC-3T for ged-emacs-devel@m.gmane-mx.org; Mon, 01 Nov 2021 00:44:49 +0100 Original-Received: from localhost ([::1]:57598 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mhKVK-0007gb-VV for ged-emacs-devel@m.gmane-mx.org; Sun, 31 Oct 2021 19:44:46 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:57112) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mhKTw-0006vo-HP for emacs-devel@gnu.org; Sun, 31 Oct 2021 19:43:20 -0400 Original-Received: from mout.web.de ([212.227.15.3]:53211) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mhKTu-0005zV-24 for emacs-devel@gnu.org; Sun, 31 Oct 2021 19:43:20 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=web.de; s=dbaedf251592; t=1635723794; bh=TyXZC3iiH4CJT1fBBGh5Gasy4BcMfoSOSo/zsKDrB1g=; h=X-UI-Sender-Class:From:To:Cc:Subject:References:Date:In-Reply-To; b=O/gOVhFKPnbjMMb6YGSePvVP2Qa4xxfclqsrNZiWL8FZer3JBl13YgINldOn/JA+t swJZR+xJ75TNWG+WWwTh9oTzflZHHTtq2Qj2LShZu2BwrNDvo8PESt4Rj3wgOWRobP FM4qJ7gV1jp5xhk7NMm/EKaviyQUcXyMMyR1J0PE= X-UI-Sender-Class: c548c8c5-30a9-4db5-a2e7-cb6cb037b8f9 Original-Received: from drachen.dragon ([92.208.225.87]) by smtp.web.de (mrweb005 [213.165.67.108]) with ESMTPSA (Nemesis) id 1MdO56-1nGfIY1YX2-00ZGR8; Mon, 01 Nov 2021 00:43:14 +0100 In-Reply-To: (Helmut Eller's message of "Sun, 10 Oct 2021 07:58:13 +0200") X-Provags-ID: V03:K1:njg5FvJTtDzM88EjUsGJuJBnQjC/MzkAugOi6Ts7b8nVkwYeaN6 y49BU+HKdHKZnvRZulMlCutQHhI5UKbjKJ9IP0jVYTtp1tq/m4xucgdnxp+ojf7OOHBkI6D bnn2re3+CS3w8XK/DZ96vuT7FRlvKuDmijZhqBw3E4Hmhmr+cqkhrXENhrJac2FkJy0UNnH /Osuvw5QX6nT+xmmjjq6Q== X-UI-Out-Filterresults: notjunk:1;V03:K0:JqkAQyD+9Jo=:djL/TXhDcVOCPb/q3Q3TIm 80YjuZYutsctSWsl2zUy3CMeO7PdRKPF6kRHzlvbZ92JJvsVAxys5pNzKEOmL6+DcKexzxbFJ oXWFN/PMktiRqc8KWEmxqODozA7fk5ogVG3MOawcHnu6aRXxs9rSSq50LokzePqqcShHOINSu QVHR3jcSqD1vNj3x/xfngxevlKZjl82zOaWxVT6GDyjiZUhws9wSo9hYZ8uRO+NDhvj01VC2Q fDS1U+xGOlJpqcVIcmnFCqYSHT/YVqikwlkHItNly2aL+EBiz+tF1sKUfrBzNKaDYjloovhpc xYuevF3t51F4o39yj6hQRAZxUT15U9+jwFaMsQxcIHk1SKzhNhRcUbICJE8lZNQ2uXUPCex5Q naOdDKN7RdJsvG8RD0v9HySpirXXvroarUoYSgONPGbrtKskHCv8+BVJYgfEx7ixhx9ZQxV3K boRl1h2WY45XI5CaoM0XdB8j31SiAw1SNNfH5hLZe/qnVodZJyNyd7fph4alULMn8/SlVOIfn DdDQY7hJUEJeSzkDLCc7atphGu18XMqLEE3Fu2fHjfIVEMSqtZu66l959Zmqhq7y2L2OndfCs H0htDzM7WVgdSXyFoyvfi5ikLuZCqArNjgL8y1ZafdDCBD0FYQaMUqRGCs6ITOdimwJjzMzcA /CP7rcydReKE/qY0qFXhxq9IU+KcLaMPgz1qmFG8XVN1CQXWBmHXVDjLzaTPykzWz7Of8yDur rFczSC3lSSt7K6v6eaLY5Vd9XtUdGUPcpsI0blZV2KANQshBCKvyvhU+XryBHfC8+glWJkE5 Received-SPF: pass client-ip=212.227.15.3; envelope-from=michael_heerdegen@web.de; helo=mout.web.de X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:278348 Archived-At: --=-=-= Content-Type: text/plain Helmut Eller writes: > The LPEG people wrote a paper[*] about this problem. I tried to convert their transcription function to Elisp. See below. Seems to work - but so far only basic regexp constructs are supported. > The problem probably are backrefs and other extensions. I think backrefs can be implemented in peg.el in a simple way. They can't be a standard extension though because matching the backref needs to advance point (so they are not just a certain `guard'). If that is supported by peg.el I think backrefs could just be transcribed more or less directly but I I'm not sure about equivalence. > Helmut > > http://www.lua.inf.puc-rio.br/publications/medeiros11regular.pdf --=-=-= Content-Type: application/emacs-lisp Content-Disposition: inline; filename=rx-to-peg.el Content-Transfer-Encoding: quoted-printable ;;; rx-to-peg.el --- Transform simple rx forms into really equivalent pegs = -*- lexical-binding: t -*- ;; Copyright (C) 2021 Free Software Foundation, Inc ;; For a peg implementation in Emacs see peg.el in Gnu Elpa. ;; The algorithm is from here: ;; http://www.lua.inf.puc-rio.br/publications/medeiros11regular.pdf ;; ;; We use the terms "peg" and "grammar" synonymously. A peg is ;; represented as pair (i.e. list) (P p) where P the list of rule ;; definitions and p the expression to match. ;; ;; Use like this: ;; ;; (rx->peg '(and (* (and (or "a" "aa") "b")) eos)) ;; =3D=3D> ;; (((a (or (and "ab" a) ;; (and "aab" a) ;; (eob)))) ;; a) ;;; Code: ;; Helpers for handling grammars. ;; these two help to construct nicer `and' and `or' expressions (defun g-or (x y) (let ((conds (append (if (eq (car-safe x) 'or) (cdr x) (list x)) (if (eq (car-safe y) 'or) (cdr y) (list y))))) (if (cdr conds) `(or ,@conds) (car conds)))) (defun g-and (x y) (cl-flet ((concat-subsequent-strings-in (list) (let ((rest list)) (while rest (when (stringp (car rest)) (while (stringp (cadr rest)) (setcar rest (concat (car rest) (cadr rest))) (setcdr rest (cddr rest)))) (cl-callf cdr rest))) list)) (let ((conds (delete "" (concat-subsequent-strings-in (append (if (eq (car-safe x) 'and) (cdr x) (li= st x)) (if (eq (car-safe y) 'and) (cdr y) (li= st y))))))) (if (cdr conds) `(and ,@conds) (car conds))))) (defvar peg-symbol-ctr) (defun g--get-rule-name (n) "Return non-terminal (symbol) number N." (intern (if (<=3D n 26) (string (+ (- ?a 1) n)) (format "r%d" n)))) (defun g-make (P p) (list P p)) (defun g-rules (g) ;; Return G's rule defintions (car g)) (defun g-expression (g) ;; Return G's match expression (cadr g)) (defun g-w/extended-expr (g f) ;; apply F to G's match expression and return the result as new peg (pcase-let ((`(,P ,p) g)) `(,P ,(funcall f p)))) ;; Definition of the transcription function =CE=A0 as in the paper (cl-defgeneric =CE=A0 (expr cont) ;; EXPR is a regular expression (as an s-exp). ;; Second arg CONT is the continuation grammar. ;; ;; When the following two conditions hold: ;; ;; (1) For no subrx (* E) of EXPR does E match the empty string. ;; (2) The language of EXPR has the prefix property: There are no ;; distinct string S1 and S2 in the language of EXPR so that S1 is a ;; prefix of S2 ;; ;; then the returned peg is equivalent to EXPR, i.e. both match the ;; same strings. (2) can be ensured by matching the end of the string ;; (`eos' in `rx') ) ;; Our entry point: (defun rx->peg (rx) (let ((peg-symbol-ctr 0)) ;; call =CE=A0 with rx and a continuation grammar with language {""} (=CE=A0 rx '(() "")))) (cl-defmethod =CE=A0 ((expr string) cont) (g-w/extended-expr cont (lambda (p) (g-and expr p)))) (cl-defmethod =CE=A0 ((_expr (eql eos)) cont) ;; pegs search buffers, so eos gets eob (g-w/extended-expr cont (lambda (p) (g-and '(eob) p)))) (cl-defmethod =CE=A0 ((expr (head and)) cont) (cl-reduce #'=CE=A0 (cdr expr) :from-end t :initial-value cont)) (cl-defmethod =CE=A0 ((expr (head or)) cont) (pcase (cdr expr) ((and (pred cddr) factors) (=CE=A0 `(or ,(car factors) (or ,@(cdr factors))) cont)) ((and (pred cdr) `(,e1 ,e2)) (let* ((g1 (=CE=A0 e1 cont)) (g2 (=CE=A0 e2 (g-w/extended-expr g1 (lambda (_) (g-expression = cont)))))) (g-w/extended-expr g2 (lambda (p2) (g-or (g-expression g1) p2))))) (ex (=CE=A0 ex cont)))) (cl-defmethod =CE=A0 ((expr (head *)) cont) (let* ((e (cadr expr)) (rulesym (g--get-rule-name (cl-incf peg-symbol-ctr))) (g1 (=CE=A0 e (g-make (g-rules cont) rulesym)))) (g-make (cons (list rulesym (g-or (g-expression g1) (g-expression cont))) (g-rules g1)) rulesym))) (provide 'rx-to-peg) ;;; =CE=A0.el ends here --=-=-= Content-Type: text/plain Michael. --=-=-=--