Subject: Re: A formal grammar for Org
From: Jakob Schöttl
To: Tom Gillespie
Cc: emacs-orgmode
Date: Tue, 1 Jun 2021 23:22:38 +0200
Message-ID: <13899acc-0760-a772-67d0-50b91cc5d9ac@gmail.com>
List-Id: "General discussions about Org-mode."

On 01.06.21 at 11:53, Tom Gillespie wrote:
>
>> We have a pretty similar project, org-parser[1]. It's also written in
>> a Lisp dialect, Clojure, but it uses instaparse instead of brag as
>> parser library.
> https://github.com/tgbugs/laundry/tree/next#similar-projects I managed
> to get it into my README as a reminder to myself to have a thorough
> look at it, but have been occupied with other work since then.

Thanks, I'll also add a link to related work in our README.

>> My idea was, to transform the formal grammar to a grammar.js for
>> tree-sitter. It would be so cool, if it could be generated from one
>> formal specification.
> Yes, that would be great. It would be a major step to have a couple of
> grammars for org that can be used for stuff like this and compared to
> each other, along with test cases that we can use to define correct
> behavior.

Right, that would be interesting. But it requires all parsers to yield
exactly the same structure (to be comparable). I think a design goal of
org-parser is to provide an easy-to-use AST, but not necessarily a 100%
match to the AST from org-element.el.

How is it with laundry? Do you try to stick exactly to Org mode's parse
result structure?

> One issue that I don't have a full understanding of at the
> moment is how certain ambiguous forms will impact the ability to
> transform directly into the tree sitter grammar.
>
> The reason I mention
> this is because I have had to move to a two phase parser in order to
> deal with ambiguous parses.

We also have two phases: "parse" and "transform" (the latter is
basically a mapping function that transforms nodes of the AST). I also
see that as a problem for generating grammar.js.

a) For tree-sitter, depending on what we expect from it, the second
   phase may not be necessary. E.g. for syntax highlighting, the
   context-free grammar might be enough.
b) Since the transformations of org-parser can be compiled to JS, it
   might even be possible to create the grammar.js as a two-phase
   parser.

>> Do you plan, in your parser, to do a transformation step from the raw
>> parser AST to a higher-level AST? E.g. the raw parser AST would parse
>> a (:date "2021-06-01") and the transformed AST would transform this
>> to a higher-level timestamp object.
> Yes. I already do that to a certain extent in the expander
> https://github.com/tgbugs/laundry/blob/next/laundry/expander.rkt (the
> raw AST is hard to work with directly), but there will be more. I also
> expect that I will add an intermediate step where the AST is
> rearranged to account for aspects of org semantics that cannot be
> captured by the context free part of the grammar.
>
> After that step there are a number of potential conversions, one of which will
> transform the AST into Racket structs, but I haven't made it quite
> that far yet. That said, I think that in terms of defining a canonical
> parse, I am aiming to do that in the transformed intermediate
> s-expression representation because I think it will be easier to
> define the correctness of certain user interactions on that form rather than
> on the higher level object representation, even if the higher level
> objects are ultimately used to actually implement that behavior.

Interesting. Yeah, because things like timestamps have
language-specific representations, the higher-level objects may not be
comparable across e.g. Emacs Lisp, Rust, and Clojure/JS.

>> Do you have any automated tests for your parser?
> Yes. See https://github.com/tgbugs/laundry/blob/next/laundry/test.rkt
> you can run them from the working directory via =raco test laundry=.

Ah, alright, I didn't see them at first. Wow.
These parser projects are really a huge amount of work, times 4
(grammar, transformation, tests, re-export) ^^

>
> It would be great to align the grammars and the behavior using a set
> of common test cases.

If it works out that our parsers produce exactly the same resulting
structure, that would be great. But I'm not sure it will, to be honest.
At least we can share each other's mean test.org files ^^

Best,
Jakob
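
P.S. To make the "parse" then "transform" split discussed above a bit
more concrete, here is a minimal sketch in JavaScript. It is only an
illustration under stated assumptions: the raw AST shape (nested arrays
with the tag first), the `handlers` table, and the `transform` function
are all hypothetical and are not taken from org-parser or laundry. The
result is a plain object rather than a language-specific date type, so
that it would stay comparable across implementations.

```javascript
// Phase 1 output (hypothetical): a raw AST as nested arrays, tag first.
// The shape mimics the (:date "2021-06-01") example from this thread;
// it is not the real output of org-parser or laundry.
const rawAst = ["timestamp", ["date", "2021-06-01"]];

// Phase 2: a mapping from node tags to functions. Each function
// receives the already-transformed children of its node.
const handlers = {
  // Turn the raw date string into a plain {year, month, day} object.
  date: ([text]) => {
    const [year, month, day] = text.split("-").map(Number);
    return { type: "date", year, month, day };
  },
  timestamp: ([date]) => ({ type: "timestamp", date }),
};

// Walk the raw AST bottom-up and apply the handler for each tag.
function transform(node) {
  if (!Array.isArray(node)) return node; // leaves (strings) stay as-is
  const [tag, ...children] = node;
  const transformedChildren = children.map(transform);
  const handler = handlers[tag];
  // Unknown tags keep their raw shape, so handlers can be added step by step.
  return handler ? handler(transformedChildren) : [tag, ...transformedChildren];
}

console.log(JSON.stringify(transform(rawAst)));
// prints {"type":"timestamp","date":{"type":"date","year":2021,"month":6,"day":1}}
```

Because the second phase is just data in, data out, a shared test.org
file could in principle be checked against the transformed form in each
implementation, assuming we agree on that form.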