Re: Org Syntax Specification

From: Ihor Radchenko <yantar92@gmail.com>
To: Tom Gillespie <tgbugs@gmail.com>
Cc: org-mode-email <emacs-orgmode@gnu.org>,
	Nicolas Goaziou <mail@nicolasgoaziou.fr>,
	Timothy <tecosaur@gmail.com>
Subject: Re: Org Syntax Specification
Date: Tue, 18 Jan 2022 20:09:59 +0800
Message-ID: <87r195nt2g.fsf@localhost>
In-Reply-To: <CA+G3_PM15pru_zRgT=t-gzbd6qTOw7xzXCqWd3VMb8ciH3D77g@mail.gmail.com>

Tom Gillespie <tgbugs@gmail.com> writes:

> Extremely in favor of removing switches. There are so many better ways
> to do this now that aren't like some eldritch unix horror crawling up
> out of the abyss and into the eBNF :)

I also agree that switches and $$-style equations may be deprecated.
We can:
1. Stop mentioning them in the document
2. Add org-lint warnings about their obsolescence

As for your other comments, you seem to be suggesting a number of
changes to the existing Org syntax. Some of them look fine, some do
not. However, please keep in mind that we have to deal with backward
compatibility, third-party compatibility, and not breaking existing Org
documents unless we have a very strong justification. I suggest
branching a new thread from here for each concrete suggestion
where you want to change Org syntax, as opposed to just the
document wording. Otherwise, this discussion will become a total mess.

More details below.

> +Elements are further divided into "[[#Headings][headings]]", "[[#Sections][sections]]"[fn::sections are not elements], "[[#Greater_Elements][greater

Nope. Sections are actually elements. See =org-element-all-elements=.

> +other headings. [fn:tom2:I would not discuss strata here because it is
> +not related to the syntax of the document. It is related to how that
> +syntax is interpreted by org mode. The strata are nesting rules that
> +are independent of the syntax, and discussing that here in the syntax
> +document is confusing, because the nesting is not something that can be
> +parsed directly because it depends on the number of asterisks.]

I disagree. Nesting rules are an important part of the syntax. We have
restrictions on which elements can appear inside other elements. The
same pattern may or may not be recognised in Org depending on where it
is nested. For example, links that you put into property drawers are
not considered link objects.

> +citation references and [[#Table_Cells][table cells]].[fn:tom3:Table cells should
> +be treated in a way that is entirely separate from objects. This document has included
> +them as such as has org-element (iirc) however since they can never appear in a paragraph
> +and because tables are completely separate syntactically, we should probably drop the
> +idea that table cells are objects. I realize that this might mean the creation of a
> +distinction between paragraph-objects, title-objects, table-objects etc.]

Again I disagree. While your idea about table cells is reasonable
(similarly for citation references inside citations), I am against
decoupling Org syntax from the org-element implementation. In
org-element.el, table cells are just another object. If we let
org-element and the syntax document get out of sync, confusion and
errors will follow during future maintenance.

>  A line containing only spaces, tabs, newlines, and line feeds (=\t\n\r=)
> -is considered a /blank line/.  Blank lines can be used to separate
> +is considered a /blank line/.  Blank lines separate
>  paragraphs and other elements.

This actually reads as slightly confusing. "Blank lines separate
paragraphs and other elements" sounds like blank lines are only relevant
before/after paragraphs. However, there are also footnote references and
lists. Maybe we can try something like:

Blank lines can be used to indicate the end of some elements.

"can" because a single blank line usually does not separate anything.

> +considered part of the paragraph.[fn:tom4:I don't think we need to discuss
> +nesting scope here, it is confusing, it is always the immediately prior
> +(lesser?) element.]

Then where can we put it? This is one of the tricky conventions we use
in the parser.

> ++ STARS :: A string consisting of one or more asterisks[fn::removed
> +  note about inline tasks because it is still a heading, any mention
> +  of a concrete number should not appear in the specification of
> syntax.]

I am not sure here. Inline tasks are special because a one-line inline
task must not contain any text below it and cannot have planning or
properties.

> +  contains =TODO= and =DONE=, however org-todo-keywords-1 is a buffer local
> +  variable and can be set by users in an org file using =#+todo:=.].

If we mention this, we also need to elaborate on what kind of element
#+todo: is, where it can be located, and how to parse multiple instances
of #+todo: in the document.

> -A heading contains directly one section (optionally), followed by
> -any number of deeper level headings.
> +The level of a heading can be used to construct a nested structure.
> +All content following a heading that appears before the next heading
> +(regardless of the level of that next heading) is a section. In addition,
> +text before the first heading in an org document is also a section.

Note that this is not true for one-line inline tasks.

> +considered a section), sections only occur within headings.[fn:: The
> +choice to call this syntactic component a section is confusing because
> +it is at odds with the usual notion of a section, namely that the
> +usual concept of a section implies that it includes nested content.  I
> +personally didn't realize that it ended at the next heading until
> +writing this comment (as can be seen from reading my comments in the
> +laundry implementation). Therefore I suggest that we look for an
> +alternate name for this syntactic component. Maybe "segment" or
> +something similar that indicates that it is truncated?]

Sounds reasonable. However, we may also need to make this change at the
Elisp level, which is tricky when you think about backward
compatibility.

> +however, contain [[Planning][planning]].[fn::This is wrong? If it is not
> +wrong, then it should be. Property drawers are already annoying to implement
> +because they share syntax with regular drawers, and allowing a property drawer
> +at the top of a file without a heading means that it should be a regular drawer
> +not a property drawer, otherwise you have to special case the handling of drawers
> +in the zeroth section. What is the use case for a property drawer as opposed to
> +a #+property: line in the zeroth section? I may come around on this at some point,
> +but right now it seems more complex, however it might actually be more consistent
> +if we imagine the zeroth section as being nested inside a single heading that has
> +level zero implicitly at the top of a document. Unfortunately that means that such
> +property drawers cannot be determined from a homogeneous syntax but instead require
> +some operations on the internal representation. Note also that if this were allowed
> +then the property drawer should only be allowed as the very first line of a file
> +because newlines at the start of a file need to be preserved. More though required.]

The statement about property drawers in the first section (that is how
we refer to it in org-element) is correct. The first section and the
location of its property drawer are special.

I agree that it's inconsistent with normal property drawers. However, we
cannot change it without breaking existing Org files. If we decide to
change the syntax in this area, we should think carefully about possible
consequences.

> + [fn::Without going into to much detail, affiliated keywords should
> +not be distinguished from other keywords at the level of the syntax.
> +The fact that they are is an artifact of the elisp implementation.
> +The determination of the behavior of a keyword with regard to
> +affiliating behavior should be determined in a later pass, even if in
> +some cases some implementations may want to materialize them into the
> +parser for performance reasons. Allowing users to promote a keyword to
> +be an affiliated keyword would be incredibly powerful for attaching
> +metadata to parts of org-files in a way that is user extensible. It
> +may still be desirable to describe the behavior of affiliated keywords
> +here, but they are not in any way distinct from other keywords at the
> +level of org syntax and trying to implement them as such is usually a
> +mistake (that I have made).]

I generally support this idea. Handling keywords in org-element is not
pretty. Having them in the parse tree would make things easier. However,
we again need to consider backward compatibility. I can imagine
third-party ox-* packages breaking if we make this change, so we should
double-check before we decide to change this.

> +property of the element they apply to. [fn::While it is tempting to try
> +to do this at the level of the grammar it induces a number of nasty
> +ambiguities in practice. It is saner to have a single unified keyword
> +syntax and then to determine affiliation behavior in a later pass.]

Yes, it is saner. However, our syntax document is supposed to be a
human-readable description of what org-element does. We cannot introduce
differences between the grammar document and the de facto parser
implementation. This would defeat the purpose of providing a reference
syntax: we would get inconsistency between Emacs Org mode and external
parsers.

> +  ~org-element-dual-keywords~ contains =CAPTION= and =RESULTS=.].[fn::
> +  All keywords should allow OPTVAL, it regularizes and simplifies the syntax.]

I support this idea.

> + [fn:: ~:end:~ may be capitalized (legacy support)]

Both :END: and :end: are supported by the Org parser. What do you mean
by legacy?

> + [fn::I suggest that we remove inlinetasks from this document.
> +They are a hack that cannot be implemented as part of a grammar
> +because they require a concrete value to be specified which breaks
> +the arbitrary nesting depth of headings. I think I wrote this somewhere
> +else as well, but inline tasks can only be a layer on top of headings,
> +they cannot displace them.]

I disagree. Inlinetasks are de facto a part of the syntax and they can
be encountered in Org documents in the wild. If you treat inlinetasks as
ordinary headings, things may break unpredictably during parsing.

Instead, we may consider making inlinetask level constant.
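
To illustrate what a constant level would mean for a parser, here is a
minimal Python sketch (not org-element code). It assumes a fixed
threshold of 15, which matches the default of org-inlinetask-min-level;
the names INLINETASK_MIN_LEVEL and classify_heading_line are made up for
this example, and details such as the closing END line of an inline task
are ignored.

```python
import re

# Assumption: 15 mirrors the default of `org-inlinetask-min-level`.
# In Org today this is a user option, not a constant of the syntax.
INLINETASK_MIN_LEVEL = 15

# One or more asterisks at the start of the line, followed by a space.
_STARS = re.compile(r"^(\*+) ")


def classify_heading_line(line, min_level=INLINETASK_MIN_LEVEL):
    """Return 'inlinetask', 'heading', or None for a single line."""
    match = _STARS.match(line)
    if match is None:
        return None
    level = len(match.group(1))
    return "inlinetask" if level >= min_level else "heading"
```

With the threshold fixed, the decision depends only on the number of
stars, so an external parser would not need to know any user setting.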

> +indicate that it should, which is misleading. Further, it is actually
> +not possible to implement contents as specified because grammars
> +cannot track the indentation level that is required to reconstruct
> +list items correctly. Therefore CONTENTS should not be defined as such
> +but should only specify that they can be anything except a newline. I
> +think that the intent of this document is somewhat a conflation of the
> +syntax for org and of the semantics as determined by export backends
> +and/or org-element, however it makes it extremely confusing because it
> +is not actually possible to parse CONTENTS, they must be reconstructed
> +from the parse tree.]

Could you elaborate on why grammars cannot track the indentation level?
AFAIU, if that were the case, Python would not be parseable.
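
For reference, Python handles indentation in its tokenizer: a stack of
indentation widths is used to emit INDENT and DEDENT tokens, and the
grammar then treats those like ordinary brackets. Below is a rough
sketch of that technique. It is not how org-element parses list items;
the function name indent_tokens is hypothetical, and the sketch assumes
space-only indentation.

```python
def indent_tokens(lines):
    """Yield ('INDENT',), ('DEDENT',) and ('LINE', text) tokens.

    Blank lines are skipped. Assumption: indentation uses spaces only.
    """
    stack = [0]  # indentation widths of the currently open blocks
    for raw in lines:
        if not raw.strip():
            continue
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            yield ("INDENT",)
        while width < stack[-1]:
            # Shallower: close blocks until we are back at a known width.
            stack.pop()
            yield ("DEDENT",)
        if width != stack[-1]:
            raise ValueError(f"inconsistent dedent: {raw!r}")
        yield ("LINE", raw.strip())
    while len(stack) > 1:  # close blocks still open at end of input
        stack.pop()
        yield ("DEDENT",)
```

Whether this counts as "the grammar" tracking indentation is debatable,
since the state lives in the tokenizer, but it shows that nested
indentation can be recovered in a single pass over the lines.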

> + [fn::The failure mode for malformed contents needs to be
> +determined more clearly here. We don't want property draws to suddenly
> +become plain drawers just because a user has a malformed line, that
> +could be disastrous if certain settings in the property drawer mask
> +settings from further up the tree.  In short, malformed contents
> +should not poison the whole property drawer.]

Yet that is exactly what happens in Org. Malformed property drawers
become ordinary drawers.
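
A simplified Python model of that behaviour, for illustration only. The
node-property pattern below is an assumption, not the regexp org-element
uses; the sketch ignores where the drawer sits relative to its heading,
and classify_drawer is a made-up name.

```python
import re

# Simplified node-property pattern (an assumption, not Org's regexp):
# ":NAME:", ":NAME: VALUE" or ":NAME+: VALUE".
_NODE_PROPERTY = re.compile(r"^[ \t]*:\S+:(?:[ \t]+.*)?$")


def classify_drawer(lines):
    """Classify a ':PROPERTIES:' ... ':END:' block given as a list of lines."""
    if len(lines) < 2:
        return "not a drawer"
    first, last = lines[0].strip(), lines[-1].strip()
    if first.upper() != ":PROPERTIES:" or last.upper() != ":END:":
        return "not a drawer"
    body = lines[1:-1]
    # A single malformed line demotes the whole block to a plain drawer.
    if all(_NODE_PROPERTY.match(line) for line in body):
        return "property-drawer"
    return "drawer"
```

For example, classify_drawer([":PROPERTIES:", ":ID: abc", "oops", ":END:"])
returns "drawer" in this model, which is the "poisoning" effect the
footnote complains about.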

>      + SWITCHES :: Any number of SWITCH patterns, separated by a single
> -      space character
> +      space character [fn::For the love of all that is sane can we
> +      please just remove this from the spec or mark it as legacy.]

I support this idea.

> +PLANNING must directly follow HEADING without any blank lines in between. 
> +
> + [fn::Need a spec for how to handle multiple instances of the same keyword with different values.]

The last one wins (as in org-element-planning-parser).
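
As a sketch of that rule in Python (a simplified model, not a port of
org-element-planning-parser): the pattern below is an assumption that
accepts any bracketed timestamp-like text, and parse_planning is a
hypothetical name.

```python
import re

# Assumption: simplified pattern. Real timestamp and keyword handling in
# org-element is more involved.
_PLANNING_ITEM = re.compile(
    r"(DEADLINE|SCHEDULED|CLOSED):[ \t]*([<\[][^>\]\n]*[>\]])"
)


def parse_planning(line):
    """Map each planning keyword to its timestamp; later duplicates override."""
    planning = {}
    for keyword, timestamp in _PLANNING_ITEM.findall(line):
        # A repeated keyword replaces the value stored earlier.
        planning[keyword] = timestamp
    return planning
```

Given a line with two SCHEDULED entries, only the timestamp of the
second one is kept in the result.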

> + [fn::As I think I mention elsewhere, the concrete names here
> +should NOT be part of the syntax, it makes the parser brittle
> +and hard to maintain. Differentiation between entities and fragments
> +should be handled at the syntax level for cases where the fragment
> +has brackets, and then in a second pass for values that are
> +syntactically entity-or-fragment and must be determined after
> +the fact.]

How would you define the entity object then? First/second pass is an
implementation detail. Our current description follows how org-element
handles entities.

> + [fn::We probably want to node that BACKEND can be the empty string
> +per that thread on how to deal with intra-word markup. Again this
> +also touches on the general principle of wanting to close over the
> +empty string so that users aren't surprised when ~@@:lol@@~ suddenly
> +appears in plain text just because no backend was specified.]

While I am not opposed to the idea, your principle is not followed by
the org-element parser. We may consider changing it, but that is again a
whole separate discussion where we need to consider pros and cons.

>  Note that the first pattern may not occur on an /unindented/ line, as it
> -is then a [[#Footnote_Definitions][footnote definition]].
> +is then a [[#Footnote_Definitions][footnote definition]]. [fn::I'm not sure this is quite right?
> +the font locking code is not consistent with actual behavior, need to
> +review the laundry test cases and example files.]

Do not look at font-locking. You can safely consider that fontification
is wrong in all non-trivial cases. Always check org-element-at-point and
org-element-context.

> -  [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]].
> +  [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]]. [fn::Like for the comma in
> +  macros, I think it would be safe to add ~\|~ as an escape character.
> +  The issue in the elisp implementation is not actually at the level
> +  of the syntax, but is actually in the export backends or somewhere
> +  deeper, because even using a macro that expands to be a pipe ~|~
> +  breaks the table (which is really bad).]

I am not sure if it is needed. We can already use \vert.

> + [fn::I have some suggestions for extensions to timestamp syntax to
> +support historical and far future dates, as well timezone offsets (NOT
> +the 3 letter ambiguous disaster) and seconds and sub-second times.]

That would be welcome, but someone™ should implement timezone support at
the Elisp level. We have had several discussions about this in the past.

> +The four =*/_+= may be arbitrarily nested to any depth. Verbatim and
> +code ==~= may be nested inside any other markup, but no other markup
> +will be interpreted inside of them since they are interpreted exactly.

That's not accurate. You cannot nest, say, bold inside bold. You cannot
put code inside any other markup freely: consider *bold =asd*asd= not bold*

Best,
Ihor
