From: Ihor Radchenko
To: Przemysław Kamiński, emacs-orgmode@gnu.org
Subject: Re: official orgmode parser
Date: Wed, 16 Sep 2020 20:27:36 +0800
Message-ID: <87bli5nbyf.fsf@localhost>

FYI: You may find https://github.com/ndwarshuis/org-ml helpful.

Przemysław Kamiński writes:

> On 9/15/20 2:37 PM, tomas@tuxteam.de wrote:
>> On Tue, Sep 15, 2020 at 01:15:56PM +0200, Przemysław Kamiński wrote:
>>
>> [...]
>>
>>> There's the org-json (or ox-json) package but for some reason I
>>> wasn't able to run it successfully. I guess export to S-exps would
>>> be best here. But yes I'll check that out.
>>
>> If that's your route, perhaps the "Org element API" [1] might be
>> helpful. Especially `org-element-parse-buffer' gives you a Lisp
>> data structure which is supposed to be a parse of your Org buffer.
>>
>> From there to S-expression can be trivial (e.g. `print' or `pp'),
>> depending on what you want to do.
>>
>> Walking the structure should be nice in Lisp, too.
>>
>> The topic of (non-Emacs) parsing of Org comes up regularly, and
>> there is a good (but AFAIK not-quite-complete) Org syntax spec
>> in Worg [2], but there are a couple of difficulties to be mastered
>> before such a thing can become really enjoyable and useful.
>>
>> The loose specification of Org's format (arguably its second
>> or third strongest asset, the first two being its incredible
>> community and Emacs itself) is something which makes this
>> problem "interesting". People have invented lots of usages
>> which might be broken should Org change to a strict formal
>> spec. You don't want to break those people.
>>
>> But yes, perhaps some day someone nails it. Perhaps it's you :)
>>
>> Cheers
>>
>> [1] https://orgmode.org/worg/dev/org-element-api.html
>> [2] https://orgmode.org/worg/dev/org-syntax.html
>>
>> - t
>>
>
> So I looked at (pp (org-element-parse-buffer)) however it does print out
> recursive stuff which other schemes have trouble parsing.
>
> My code looks more or less like this:
>
> (defun org-parse (f)
>   (with-temp-buffer
>     (find-file f)
>     (let* ((parsed (org-element-parse-buffer))
>            (all (append org-element-all-elements org-element-all-objects))
>            (mapped (org-element-map parsed all
>                      (lambda (item)
>                        (strip-parent item)))))
>       (pp mapped))))
>
>
> strip-parent is basically (plist-put props :parent nil) for elements
> properties. However it turns out there are more recursive objects, like
>
> :title
> #("Headline 1" 0 10
>   (:parent
>    (headline #2
>     (section
>
> So I'm wondering do I have to do it by hand for all cases or is there
> some way to output only a simple AST without those nested objects?
>
> Best,
> Przemek
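A minimal sketch of the "by hand" route, in case it helps. It assumes the parse-tree layout of the Org 9.3/9.4 releases current at the time of this thread, where each node is (TYPE PROPERTIES . CONTENTS) and the back-reference lives under the :parent key. It copies the tree, drops every :parent key together with its value, and strips text properties from strings, since plain-text objects such as the headline title above carry their :parent as a text property. my-org-strip-parents and my-org-parse-file are hypothetical names, not Org functions. As far as I know, newer Org (9.7 and later) keeps :parent inside a :standard-properties vector instead, which this sketch does not touch.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'org)
(require 'org-element)

;; Hypothetical helper (not part of Org).  Assumes the Org 9.3/9.4
;; tree layout: every node is (TYPE PROPERTIES . CONTENTS) and the
;; back-reference to the enclosing node is stored under :parent.
(defun my-org-strip-parents (data)
  "Return a copy of DATA with all `:parent' back-references removed.
DATA is a tree, or part of one, from `org-element-parse-buffer'.
Strings are copied without text properties, because plain-text
objects (e.g. a headline title) keep their `:parent' there."
  (cond
   ((stringp data) (substring-no-properties data))
   ((consp data)
    (let ((result nil))
      ;; Walk the (possibly dotted) list, dropping each `:parent'
      ;; key together with the value that follows it.
      (while (consp data)
        (if (eq (car data) :parent)
            (setq data (cdr-safe (cdr data)))
          (push (my-org-strip-parents (car data)) result)
          (setq data (cdr data))))
      ;; DATA is now nil or the final atom of a dotted list.
      (nconc (nreverse result) (my-org-strip-parents data))))
   ;; Symbols, numbers and other atoms are returned unchanged.
   (t data)))

;; Hypothetical helper: parse FILE in a temporary buffer instead of
;; visiting it, then return the stripped tree.
(defun my-org-parse-file (file)
  "Parse the Org FILE and return its tree without `:parent' links."
  (with-temp-buffer
    (insert-file-contents file)
    (delay-mode-hooks (org-mode))
    (my-org-strip-parents (org-element-parse-buffer))))
```

Usage, with notes.org standing in for a real file: (pp (my-org-parse-file "notes.org")). Reading the file with insert-file-contents rather than find-file keeps the parse inside the temporary buffer instead of leaving a second buffer visiting the file.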