emacs-orgmode@gnu.org archives
* Org Syntax Specification
@ 2022-01-09 18:02 Timothy
  2022-01-15 12:40 ` Sébastien Miquel
  2022-01-18  0:54 ` Org Syntax Specification Tom Gillespie
  0 siblings, 2 replies; 15+ messages in thread
From: Timothy @ 2022-01-09 18:02 UTC (permalink / raw)
  To: org-mode-email, mail


Hi All,

I’ve talked about adding citation syntax to the org-syntax document before, and
previously expressed the thought that it could be generally improved quite a
bit. This has culminated in me spending the last few days straight working on a
rewrite of org-syntax.org to try to bring it closer to the point where we can
knock “(draft)” out of the title 🙂.

Ihor has been a tremendous help pointing out inaccuracies and explaining some of
the parsing behaviour (thanks!), which has allowed me to get it to a point where
I think it would benefit from wider feedback.

I’ve just pushed my latest revision to worg as
<https://orgmode.org/worg/dev/org-syntax-edited.html>. Personally though, I think
it’s best viewed as a PDF, so I’ve also uploaded the PDF export to
<https://0x0.st/oiM5.pdf>.

It would be great if those of you with an interest/understanding of Org’s syntax
could have a look and let me know what you think. I think the best way to
compare to the current org-syntax.org would be to put them side-by-side. I’ve
attempted to list the main changes I’ve made in the appendix, however I’ve
likely missed things.

Lastly, having spent a while looking at the syntax, I’m wondering if we should
take this opportunity to mark some of the syntactic elements we’ve become less
happy with as *(deprecated)*. I’m specifically thinking of the TeX-style LaTeX
fragments which have been a bit of a pain. To quote Nicolas in org-syntax.org:
      It would introduce incompatibilities with previous Org versions,
      but support for `$...$' (and for symmetry, `$$...$$') constructs
      ought to be removed.

      They are slow to parse, fragile, redundant and imply false
      positives.  — ngz

Marking this as deprecated would have no effect on Org’s current behaviour, but
we could:
1. Mark as deprecated now-ish
2. Add a utility to convert from TeX-style to LaTeX-style
3. Add org lint/fontification warnings
4. A while later (half a decade? more?) actually remove support
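
A minimal sketch of what the conversion utility in step 2 might look like,
assuming it leans on Org's own parser (`org-element-context') to decide what
counts as a fragment, so that stray dollar signs are left alone. The name
`my/org-tex-to-latex-fragments' is made up for illustration; nothing like it
ships with Org:

#+begin_src emacs-lisp
  (require 'org-element)

  ;; Hypothetical helper, not part of Org.
  (defun my/org-tex-to-latex-fragments ()
    "Convert TeX-style LaTeX fragments in the current Org buffer.
  Inline $...$ fragments get backslash-parenthesis delimiters and
  display $$...$$ fragments get backslash-bracket delimiters.
  Return the number of fragments converted."
    (interactive)
    (unless (derived-mode-p 'org-mode)
      (user-error "Not in an Org buffer"))
    (let ((count 0))
      (save-excursion
        (goto-char (point-min))
        (while (search-forward "$" nil t)
          (backward-char)
          ;; Ask the parser whether a LaTeX fragment starts exactly here.
          (let* ((object (org-element-context))
                 (value (and (eq (org-element-type object) 'latex-fragment)
                             (= (org-element-property :begin object) (point))
                             (org-element-property :value object))))
            (if (null value)
                ;; Not a fragment (e.g. a price): leave the dollar alone.
                (forward-char)
              (let ((display (string-prefix-p "$$" value)))
                (delete-region (point) (+ (point) (length value)))
                (insert (if display "\\[" "\\(")
                        (substring value (if display 2 1) (if display -2 -1))
                        (if display "\\]" "\\)"))
                (setq count (1+ count)))))))
      (when (called-interactively-p 'interactive)
        (message "Converted %d fragment(s)" count))
      count))
#+end_src

Run with M-x my/org-tex-to-latex-fragments in an Org buffer. Because the
decision is delegated to the parser, text the parser does not treat as a
fragment is never touched, at the cost of one parser call per dollar sign.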

The other component of the syntax which feels particularly awkward to me is
source block switches. They seem a bit odd, and since arguments exist,
completely redundant.

――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

That’s all for now, I hope you all had a great Christmas and new year!

All the best,
Timothy

* Re: Org Syntax Specification
  2022-01-09 18:02 Org Syntax Specification Timothy
@ 2022-01-15 12:40 ` Sébastien Miquel
  2022-01-15 16:36   ` Depreciating TeX-style LaTeX fragments (was: Org Syntax Specification) Timothy
  2022-01-18  0:54 ` Org Syntax Specification Tom Gillespie
  1 sibling, 1 reply; 15+ messages in thread
From: Sébastien Miquel @ 2022-01-15 12:40 UTC (permalink / raw)
  To: Timothy, org-mode-email

Hi,

The new document seems much clearer. It makes a nice complement to the
manual and we should definitely lose the (draft). Thank you Timothy
for the work.

> Lastly, having spent a while looking at the syntax, I’m wondering if 
> we should take this opportunity to mark some of the syntactic elements 
> we’ve become less happy with as *(deprecated)*. I’m specifically 
> thinking of the TeX-style LaTeX fragments which have been a bit of a 
> pain. To quote Nicolas in org-syntax.org:
>
>     It would introduce incompatibilities with previous Org versions,
>     but support for |$...$| (and for symmetry, |$$...$$|) constructs
>     ought to be removed.
>
>     They are slow to parse, fragile, redundant and imply false
>     positives. — ngz
>

This quote has been mentioned a few times lately, and no one has yet
spoken in favor of the $…$ syntax, so I'll have a quick go.

It is easier to use, easier to read and more commonly used (and known)
in tex documents (a quick web search for sample tex documents confirms
the latter). Removing this syntax would make org slightly harder to
pick up, with respect to writing scientific documents.

As for the listed shortcomings, I don't think we know whether its
slowness is significant, and false positives can (possibly?) be avoided
by using the \dollar entity. In my own use, the only usability issue I
can think of is false negatives while writing: inserting a space or
other such character at the end of a snippet removes the fontification
(I solve this by modifying the fontification regexps).

Regards,

-- 
Sébastien Miquel

* Re: Depreciating TeX-style LaTeX fragments (was: Org Syntax Specification)
  2022-01-15 12:40 ` Sébastien Miquel
@ 2022-01-15 16:36   ` Timothy
  2022-01-16  8:08     ` Sébastien Miquel
  2022-01-16 12:10     ` Eric S Fraga
  0 siblings, 2 replies; 15+ messages in thread
From: Timothy @ 2022-01-15 16:36 UTC (permalink / raw)
  To: sebastien.miquel; +Cc: org-mode-email


Hi Sebastien,

Thanks for your comments, and your thoughts on the proposed deprecation.

It’s worth explicitly considering why we wouldn’t want to steer people away from
the TeX-syntax LaTeX fragments, so I am glad you have brought up some reasons.
I do not find myself agreeing with them however, and will endeavour to explain
why below.

⁃ It is easier to use
  • Hmm. Not sure about this. Keystroke wise we’re comparing `$$' to `\('. The
    latter can be completed by smartparens, but since single dollars are
    reasonable Org content the former can’t. At this point the only argument is
    muscle memory, and if you’re a LaTeX user (a good target audience for LaTeX
    fragments I think), I’d expect LaTeX-style `\(' to be more familiar.
⁃ Easier to read
  • I had a quick look at a document to gauge this for myself, and if anything I
    found the opposite (see <https://0x0.st/o-32.png>). This may be influenced by
    a minor fontification tweak I made to LaTeX style input though.
⁃ more commonly used (and known) in tex documents (a quick web search for sample
  tex documents confirms the latter).
⁃ Removing this syntax would make org slightly harder to pick up, with respect
  to writing scientific documents.
  • With respect to writing scientific documents, I think we can reasonably
    expect people to be familiar with `\(', particularly given the points I raise
    below.

These points seem to have a common thread in wanting to have Org be like LaTeX.
I find this sensible, but I think this is a good opportunity to point out that
$/$$ are very much second class citizens in LaTeX now, no matter what you may
see in old documents.

To quote from David Carlisle (one of the main members of the LaTeX3 team) on [tex.stackexchange]:
> $$ is TeX primitive syntax, which, as others have commented is hard to
> redefine (in classic TeX there is no command name which triggers entering or
> leaving display math).
> LaTeX doesn’t officially support $$. The most noticeable failure if you use
> the syntax is that the fleqn option will no longer affect the display of the
> mathematics, it will remain centered rather than being set flush left.

Another member of the LaTeX3 team, Joseph Wright, has made even stronger
comments about $-syntax on [tex.stackexchange]:
> I’d note with my ’LaTeX3’ hat on that there is a strong chance we’ll favour `\(
> ... \)' to the point of not supporting `$...$' for LaTeX3. So in the long term it
> might be best to get used to `\(...\)'.

In further comments Joseph goes on to say that it is likely that $-syntax will
/not/ be dropped outright, but that $$ likely will be. Among other
things the $-syntax produces worse error reporting and spacing.

So, to sum up, LaTeX currently prefers `\(...\)' / `\[...\]' over `$' / `$$', and it
looks like people will be pushed more strongly in this direction in future.

More than anything else, I think this demonstrates why, annoyances with the
parsing aside, it would make sense purely from a user perspective to favour
LaTeX-syntax LaTeX fragments.

All the best,
Timothy


[tex.stackexchange] <https://tex.stackexchange.com/questions/503/why-is-preferable-to>

[tex.stackexchange] <https://tex.stackexchange.com/questions/510/are-and-preferable-to-dollar-signs-for-math-mode?noredirect=1&lq=1#comment2607_513>

* Re: Depreciating TeX-style LaTeX fragments (was: Org Syntax Specification)
  2022-01-15 16:36   ` Depreciating TeX-style LaTeX fragments (was: Org Syntax Specification) Timothy
@ 2022-01-16  8:08     ` Sébastien Miquel
  2022-01-16  9:23       ` Depreciating TeX-style LaTeX fragments Martin Steffen
  2022-01-16  9:46       ` Colin Baxter 😺
  2022-01-16 12:10     ` Eric S Fraga
  1 sibling, 2 replies; 15+ messages in thread
From: Sébastien Miquel @ 2022-01-16  8:08 UTC (permalink / raw)
  To: Timothy; +Cc: org-mode-email

Hi,

With respect to readability, I only mean to point out that the $…$
syntax is one less character, and that the \(\) characters are quite
overloaded.

> this is a good opportunity to point out that $/$$ are very much second 
> class citizens in LaTeX now, no matter what you may see in old documents. 


The posts that you quote are 10 years old. As per [0] (2020), there
will be no LaTeX3. Nor is it only old documents that use the $…$
syntax: looking for learning resources (see [1]), everything that I
find uses it. That includes The Not So Short Introduction to LaTeX [2]
(2021) and https://en.wikibooks.org/wiki/LaTeX/Mathematics.

Although I have no evidence of this, my expectation is that the
majority of tex users use the $…$ syntax (it is in fact widely used
outside of tex: in most markdown flavors and texmacs for example). I
also expect that a significant proportion of tex users are not aware
of the \(…\) syntax. I think here of users that are less tech literate
than most of this mailing list.

Regards,

[0]: 
https://www.latex-project.org/publications/2020-FMi-TUB-tb128mitt-quovadis.pdf
[1]: 
https://tex.stackexchange.com/questions/11/what-are-good-learning-resources-for-a-latex-beginner
[2]: https://ctan.tetaneutral.net/info/lshort/english/lshort.pdf

-- 
Sébastien Miquel



* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16  8:08     ` Sébastien Miquel
@ 2022-01-16  9:23       ` Martin Steffen
  2022-01-16  9:46       ` Colin Baxter 😺
  1 sibling, 0 replies; 15+ messages in thread
From: Martin Steffen @ 2022-01-16  9:23 UTC (permalink / raw)
  To: emacs-orgmode



Hi

To add my two cents: I have been a LaTeX user for _many_ years (and a
user of Emacs + Org), and I often use it for math-heavy texts.

I do use $ (I actually did not even know that \( \) is (supposed to be)
the new way until I saw it generated by Org).

As for $$ (or \[), I basically don't use it. I use
\begin/\end{displaymath}.

I don't mind that it's a lot to type, as I use an editor that
assists me (said Emacs ;-), or rather its AUCTeX mode).

I like the keybindings for environments there (and with the usual prefix,
C-u C-c C-e, one can for instance turn a display-math into an equation,
should one decide later).

$$ I never used. The display-math simply looks nicer and is better
supported by AUCTeX in that it uses standard indentation for
environments. For me it's likewise important that the text is properly
indented and highlighted, so I can read the source file with ease
while working on it.

Also \[ \] does proper indentation, but as said, I got used to C-c C-e
and that produces for me displaymath (probably it can be customized, but
I am happy with it as is).


Martin

* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16  8:08     ` Sébastien Miquel
  2022-01-16  9:23       ` Depreciating TeX-style LaTeX fragments Martin Steffen
@ 2022-01-16  9:46       ` Colin Baxter 😺
  2022-01-16 11:11         ` Tim Cross
                           ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Colin Baxter 😺 @ 2022-01-16  9:46 UTC (permalink / raw)
  To: Sébastien Miquel; +Cc: org-mode-email, Timothy

>>>>> Sébastien Miquel <sebastien.miquel@posteo.eu> writes:

    > Hi, With respect to readability, I only mean to point out that the
    > $…$ syntax is one less character, and that the \(\) characters are
    > quite overloaded.

Indeed. Compare something like

$g=\lim_{\delta m\to 0}(\delta F/\delta m)$

with

\(g=\lim_{\delta m\to 0}(\delta F/\delta m)\)

Backslash city! I know which one I'd prefer to read.

    >> this is a good opportunity to point out that $/$$ are very much
    >> second class citizens in LaTeX now, no matter what you may see in
    >> old documents.

    > The posts that you quote are 10 years old. As per [0] (2020),
    > there will be no LaTeX3. Nor is it only old documents that use the
    > $…$ syntax: looking for learning resources (see [1]), everything
    > that I find uses it. That includes The Not So Short Introduction
    > to LaTeX [2] (2021) and
    > https://en.wikibooks.org/wiki/LaTeX/Mathematics.

Ah, LaTeX3 - whatever happened to that?

    > Although I have no evidence of this, my expectation is that the
    > majority of tex users use the $…$ syntax (it is in fact widely
    > used outside of tex: in most markdown flavors and texmacs for
    > example). I also expect that a significant proportion of tex users
    > are not aware of the \(…\) syntax. I think here of users that are
    > less tech literate than most of this mailing list.

Agreed.

Best wishes,


* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16  9:46       ` Colin Baxter 😺
@ 2022-01-16 11:11         ` Tim Cross
  2022-01-16 13:26         ` Juan Manuel Macías
  2022-01-16 17:45         ` Rudolf Adamkovič
  2 siblings, 0 replies; 15+ messages in thread
From: Tim Cross @ 2022-01-16 11:11 UTC (permalink / raw)
  To: emacs-orgmode


Colin Baxter 😺 <m43cap@yandex.com> writes:

>>>>>> Sébastien Miquel <sebastien.miquel@posteo.eu> writes:
>
>     > Hi, With respect to readability, I only mean to point out that the
>     > $…$ syntax is one less character, and that the \(\) characters are
>     > quite overloaded.
>
> Indeed. Compare something like
>
> $g=\lim_{\delta m\to 0}(\delta F/\delta m)$
>
> with
>
> \(g=\lim_{\delta m\to 0}(\delta F/\delta m)\)
>
> Backslash city! I know which one I'd prefer to read.
>
>     >> this is a good opportunity to point out that $/$$ are very much
>     >> second class citizens in LaTeX now, no matter what you may see in
>     >> old documents.
>
>     > The posts that you quote are 10 years old. As per [0] (2020),
>     > there will be no LaTeX3. Nor is it only old documents that use the
>     > $…$ syntax: looking for learning resources (see [1]), everything
>     > that I find uses it. That includes The Not So Short Introduction
>     > to LaTeX [2] (2021) and
>     > https://en.wikibooks.org/wiki/LaTeX/Mathematics.
>
> Ah, LaTeX3 - whatever happened to that?
>
>     > Although I have no evidence of this, my expectation is that the
>     > majority of tex users use the $…$ syntax (it is in fact widely
>     > used outside of tex: in most markdown flavors and texmacs for
>     > example). I also expect that a significant proportion of tex users
>     > are not aware of the \(…\) syntax. I think here of users that are
>     > less tech literate than most of this mailing list.
>
> Agreed.
>
> Best wishes,

While I can see the advantages of $..$ for equations, I think we also
need to keep in mind that org mode is NOT a latex or tex editing mode.
While it is excellent at providing a higher level abstraction which
works well with Latex, other considerations also need to come into play,
especially with respect to efficient and consistent parsing of org mode
syntax. From that perspective, $...$ seems to add complexity which is
making it much harder to get consistency and efficiency in parsing and
processing things like font locking, indentation etc.

The question then becomes "Is the slight reduction in typing and/or
possibly more readable $..$ syntax sufficient justification for more
complex and difficult to maintain code for parsing, font-locking and
indentation/filling?" Furthermore, could the readability issue not be
improved even further with the \[...\] syntax if we are able to parse
the contents more reliably/efficiently and possibly provide other
mechanisms to improve readability of math/formulas (i.e. better
font-locking, hiding of delimiters etc.)?

I'm not convinced that arguments about what authors used to writing
in TeX/LaTeX are familiar with are terribly relevant to org mode. There
are already things in org mode which are inconsistent with what you
would write in pure Tex/Latex and as mentioned, org mode is not just a
front-end for writing Tex/Latex documents. Org has its own flavoured
markup and we should work towards making the syntax of that markup as
consistent, clean and verifiable as possible. 


* Re: Depreciating TeX-style LaTeX fragments
  2022-01-15 16:36   ` Depreciating TeX-style LaTeX fragments (was: Org Syntax Specification) Timothy
  2022-01-16  8:08     ` Sébastien Miquel
@ 2022-01-16 12:10     ` Eric S Fraga
  2022-01-16 14:30       ` Anthony Cowley
  1 sibling, 1 reply; 15+ messages in thread
From: Eric S Fraga @ 2022-01-16 12:10 UTC (permalink / raw)
  To: Timothy; +Cc: sebastien.miquel, org-mode-email

On Sunday, 16 Jan 2022 at 00:36, Timothy wrote:
>         Hmm. Not sure about this. Keystroke wise we’re comparing $$
>         to \(. The latter can be completed by smartparens, but since
>         single dollars are reasonable Org content the former can’t.
>         At this point the only argument is muscle memory, and if

As an aside, I will suggest including the following code in your Emacs
customization:

#+begin_src emacs-lisp :tangle "esf-org.el"
  ;; from Nicolas Richard <theonewiththeevillook@yahoo.fr>
  ;; Date: Fri, 8 Mar 2013 16:23:02 +0100
  ;; Message-ID: <87vc913oh5.fsf@yahoo.fr>
  (defun yf/org-electric-dollar ()
    "When called once, insert \\(\\) and leave point in between.
  When called twice, replace the previously inserted \\(\\) by one $."
    (interactive)
    ;; Point sits inside an empty \(\) pair: undo it and insert a plain $.
    (if (and (looking-at "\\\\)")
             (looking-back "\\\\(" (line-beginning-position)))
        (progn (delete-char 2)
               (delete-char -2)
               (insert "$"))
      (insert "\\(\\)")
      (backward-char 2)))
  ;; Bind the key only once Org is loaded, so `org-mode-map' is defined.
  (with-eval-after-load 'org
    (define-key org-mode-map (kbd "$") #'yf/org-electric-dollar))
#+end_src

I've been using this for years now and it works very well: I also had
$...$ in my muscle memory.

The only time it can be annoying is if you wish to edit/write org table
expressions directly instead of using org's features for this, such as
editing the equation (C-c ') or inserting one (C-c = with or without
C-u).

-- 
: Eric S Fraga, with org release_9.5.2-306-g9623da in Emacs 29.0.50


* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16  9:46       ` Colin Baxter 😺
  2022-01-16 11:11         ` Tim Cross
@ 2022-01-16 13:26         ` Juan Manuel Macías
  2022-01-16 14:43           ` Colin Baxter 😺
  2022-01-16 17:45         ` Rudolf Adamkovič
  2 siblings, 1 reply; 15+ messages in thread
From: Juan Manuel Macías @ 2022-01-16 13:26 UTC (permalink / raw)
  To: Colin Baxter, Timothy, Sébastien Miquel; +Cc: orgmode

Colin Baxter writes:

> Ah, LaTeX3 - whatever happened to that?

If you're a LaTeX user, you're already using LaTeX3 to a very high
extent, even if you don't see it. The current idea is not to replace
LaTeX2e with LaTeX3 as a new version, but to gradually incorporate
elements of LaTeX3 into the LaTeX kernel, like the new syntax, xparse,
etc. LaTeX3 is already present in many aspects of LaTeX, and that is an
undeniable advance. If anyone is interested in the state of the art,
this short talk by Frank Mittelbach at TUG 2020 is very illustrative:

https://invidious.snopyta.org/watch?v=zNci4lcb8Vo

Best regards,

Juan Manuel 




* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16 12:10     ` Eric S Fraga
@ 2022-01-16 14:30       ` Anthony Cowley
  0 siblings, 0 replies; 15+ messages in thread
From: Anthony Cowley @ 2022-01-16 14:30 UTC (permalink / raw)
  To: Eric S Fraga; +Cc: sebastien.miquel, org-mode-email, Timothy



> On Jan 16, 2022, at 7:13 AM, Eric S Fraga <e.fraga@ucl.ac.uk> wrote:
> 
> On Sunday, 16 Jan 2022 at 00:36, Timothy wrote:
>>        Hmm. Not sure about this. Keystroke wise we’re comparing $$
>>        to \(. The latter can be completed by smartparens, but since
>>        single dollars are reasonable Org content the former can’t.
>>        At this point the only argument is muscle memory, and if
> 
> As an aside, I will suggest including the following code in your Emacs
> customization:
> 
> #+begin_src emacs-lisp :tangle "esf-org.el"
>  ;; from Nicolas Richard <theonewiththeevillook@yahoo.fr>
>  ;; Date: Fri, 8 Mar 2013 16:23:02 +0100
>  ;; Message-ID: <87vc913oh5.fsf@yahoo.fr>
>  (defun yf/org-electric-dollar nil
>    "When called once, insert \\(\\) and leave point in between.
>  When called twice, replace the previously inserted \\(\\) by one $."
>         (interactive)
>         (if (and (looking-at "\\\\)") (looking-back "\\\\("))
>             (progn (delete-char 2)
>                    (delete-char -2)
>                    (insert "$"))
>           (insert "\\(\\)")
>           (backward-char 2)))
>  (define-key org-mode-map (kbd "$") 'yf/org-electric-dollar)
> #+end_src
> 
> I've been using this for years now and it works very well: I also had
> $...$ in my muscle memory.

This is a really helpful snippet, but I tried it out for a while a previous time this issue came up and found the readability of equations took too much of a hit. The “backslash city” really is tough to visually parse. Backslash density is already an unfortunate bit of the LaTeX experience, and I didn’t get used to the extra slashed characters as bookends over a two week trial.

I still wanted to express my appreciation for you sharing this!

Anthony

* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16 13:26         ` Juan Manuel Macías
@ 2022-01-16 14:43           ` Colin Baxter 😺
  2022-01-16 15:16             ` Greg Minshall
  0 siblings, 1 reply; 15+ messages in thread
From: Colin Baxter 😺 @ 2022-01-16 14:43 UTC (permalink / raw)
  To: Juan Manuel Macías; +Cc: Sébastien Miquel, orgmode, Timothy

>>>>> Juan Manuel Macías <maciaschain@posteo.net> writes:

    > Colin Baxter writes:
    >> Ah, LaTeX3 - whatever happened to that?

    > If you're a LaTeX user, you're already using LaTeX3 to a very high
    > extent, even if you don't see it. The current idea is not to
    > replace LaTeX2e with LaTeX3 as a new version, but to gradually
    > incorporate elements of LaTeX3 into the LaTeX kernel, like the new
    > syntax, xparse, etc. LaTeX3 is already present in many aspects of
    > LaTeX, and that is an undeniable advance. If anyone is interested
    > in the state of the art, this short talk by Frank Mittelbach at
    > TUG 2020 is very illustrative:

Yes, I know. My remark was tongue in cheek.


* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16 14:43           ` Colin Baxter 😺
@ 2022-01-16 15:16             ` Greg Minshall
  0 siblings, 0 replies; 15+ messages in thread
From: Greg Minshall @ 2022-01-16 15:16 UTC (permalink / raw)
  To: Colin Baxter 😺
  Cc: Juan Manuel Macías, Sébastien Miquel, orgmode, Timothy

Colin,

>     > Colin Baxter writes:
>     >> Ah, LaTeX3 - whatever happened to that?
...
> Yes, I know. My remark was tongue in cheek.

which leaves open whether your tongue was already in your cheek at:

> Indeed. Compare something like
> 
> $g=\lim_{\delta m\to 0}(\delta F/\delta m)$
> 
> with
> 
> \(g=\lim_{\delta m\to 0}(\delta F/\delta m)\)

?

additionally, fwiw, i was a long time '$...$'-user.  at one point i was
betrayed, and switched to '\(...\)'.  it may be more to type (i hadn't
noticed the suggestion Eric just sent in), but i liked the
repeatability.  and, in terms of parsing, i'm very sympathetic to having
"directional" end markers.

cheers, Greg


* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16  9:46       ` Colin Baxter 😺
  2022-01-16 11:11         ` Tim Cross
  2022-01-16 13:26         ` Juan Manuel Macías
@ 2022-01-16 17:45         ` Rudolf Adamkovič
  2 siblings, 0 replies; 15+ messages in thread
From: Rudolf Adamkovič @ 2022-01-16 17:45 UTC (permalink / raw)
  To: Colin Baxter 😺, Sébastien Miquel; +Cc: org-mode-email, Timothy

Colin Baxter 😺 <m43cap@yandex.com> writes:

> \(g=\lim_{\delta m\to 0}(\delta F/\delta m)\)
>
> Backslash city! I know which one I'd prefer to read.

Further, in-text single-letter variables permeate mathematical
writing, and I think everyone would agree that $k$ reads well.  Alas, as
soon as one needs to write $k$-th, it stops working and one must rewrite
it as \(k\)-th.  So, one often ends up using both ways anyway, right?
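
Whether a given piece of text is picked up as a fragment can be checked
against the parser directly rather than by eye. A small probe, assuming
`org-element-context' is available; the helper name
`my/org-latex-fragment-at-start-p' is hypothetical, and the answer for
`$k$-th' may depend on the Org version in use, so no output is shown:

#+begin_src emacs-lisp
  (require 'org)
  (require 'org-element)

  ;; Hypothetical helper, not part of Org.
  (defun my/org-latex-fragment-at-start-p (text)
    "Return non-nil if Org parses the start of TEXT as a LaTeX fragment."
    (with-temp-buffer
      (org-mode)
      (insert text)
      (goto-char (point-min))
      (eq (org-element-type (org-element-context)) 'latex-fragment)))

  ;; Compare the two delimiter styles on the same kind of input.
  (dolist (sample '("$k$ is an index" "$k$-th term" "\\(k\\)-th term"))
    (message "%s -> %s" sample (my/org-latex-fragment-at-start-p sample)))
#+end_src

The probe only looks at the object starting at the first character, which
is enough to compare how the closing delimiter interacts with whatever
follows it.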

Rudy

-- 
"Logic is a science of the necessary laws of thought, without which no
employment of the understanding and the reason takes place." -- Immanuel
Kant, 1785

Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia


* Re: Org Syntax Specification
  2022-01-09 18:02 Org Syntax Specification Timothy
  2022-01-15 12:40 ` Sébastien Miquel
@ 2022-01-18  0:54 ` Tom Gillespie
  2022-01-18 12:09   ` Ihor Radchenko
  1 sibling, 1 reply; 15+ messages in thread
From: Tom Gillespie @ 2022-01-18  0:54 UTC (permalink / raw)
  To: Timothy; +Cc: org-mode-email, Nicolas Goaziou

Hi Timothy,
    I have attached a patch with some modifications and a bunch of
comments (as footnotes). More replies in line. Thank you for all your
work on this!
Tom

> Marking this as deprecated would have no effect on Org’s current behaviour, but we could:
>
> Mark as deprecated now-ish
> Add a utility to convert from TeX-style to LaTeX-style
> Add org lint/fontification warnings
> A while later (half a decade? more?) actually remove support

In favor of this. There are good alternatives for this now.

> The other component of the syntax which feels particularly awkward to me is source block switches. They seem a bit odd, and since arguments exist, completely redundant.

Extremely in favor of removing switches. There are so many better ways
to do this now that aren't like some eldritch unix horror crawling up
out of the abyss and into the eBNF :)

[-- Attachment #2: 0001-Tom-s-comments-and-modifications-to-org-syntax-edite.patch --]
[-- Type: text/x-patch, Size: 41243 bytes --]

From 3527331f02e593ec6ba6cb4c8bde3f64de3ad216 Mon Sep 17 00:00:00 2001
From: Tom Gillespie <tgbugs@gmail.com>
Date: Mon, 17 Jan 2022 19:34:21 -0500
Subject: [PATCH] Tom's comments and modifications to org syntax edited

I removed any mention of markdown because it is a distraction in this
document and is not something we want anyone attending to here.

I change "top level section" to "zeroth section" which I think is more
consistent terminology because level is often used to refer to the
depth of parsing at any given point in the file and the top level
refers to anything that can be parsed without context. Zeroth makes it
clear that we are talking about the actual zeroth occurrence of a
section in a file/buffer/stream.
---
 dev/org-syntax-edited.org | 399 +++++++++++++++++++++++++++++++-------
 1 file changed, 331 insertions(+), 68 deletions(-)

diff --git a/dev/org-syntax-edited.org b/dev/org-syntax-edited.org
index c3259473..2e99070d 100644
--- a/dev/org-syntax-edited.org
+++ b/dev/org-syntax-edited.org
@@ -19,9 +19,7 @@ under the GNU General Public License v3 or later.
 Org is a plaintext format composed of simple, yet versatile, forms
 which represent formatting and structural information.  It is designed
 to be both intuitive to use, and capable of representing complex
-documents.  Like [[https://datatracker.ietf.org/doc/html/rfc7763][Markdown]], Org may be considered a lightweight markup
-language.  However, while Markdown refers to a collection of similar
-syntaxes, Org is a single syntax.
+documents.
 
 This document describes and comments on Org syntax as it is currently
 read by its parser (=org-element.el=) and, therefore, by the export
@@ -32,14 +30,13 @@ framework.
 ** Objects and Elements
 
 The components of this syntax can be divided into two classes:
-"[[#Objects][objects]]" and "[[#Elements][elements]]".  To better understand these classes,
-consider the paragraph as a unit of measurement.  /Elements/ are
-syntactic components that exist at the same or greater scope than a
-paragraph, i.e. which could not be contained by a paragraph.
-Conversely, /objects/ are syntactic components that exist with a smaller
-scope than a paragraph, and so can be contained within a paragraph.
-
-Elements can be stratified into "[[#Headings][headings]]", "[[#Sections][sections]]", "[[#Greater_Elements][greater
+"[[#Elements][elements]]" and "[[#Objects][objects]]".  Elements are
+syntactic components that have the same priority as or greater
+priority than a paragraph. Objects are syntactic components that are
+only recognized inside a paragraph or other paragraph-like elements
+such as heading titles.
+
+Elements are further divided into "[[#Headings][headings]]", "[[#Sections][sections]]"[fn::sections are not elements], "[[#Greater_Elements][greater
 elements]]", and "[[#Lesser_Elements][lesser elements]]", from broadest scope to
 narrowest.  Along with objects, these sub-classes define categories of
 syntactic environments.  Only [[#Headings][headings]], [[#Sections][sections]], [[#Property_Drawers][property drawers]], and
@@ -52,7 +49,12 @@ elements that cannot contain any other elements.  As such, a paragraph
 is considered a lesser element.  Greater elements can themselves
 contain greater elements or lesser elements. Sections contain both
 greater and lesser elements, and headings can contain a section and
-other headings.
+other headings. [fn:tom2:I would not discuss strata here because it is
+not related to the syntax of the document. It is related to how that
+syntax is interpreted by org mode. The strata are nesting rules that
+are independent of the syntax, and discussing that here in the syntax
+document is confusing, because the nesting is not something that can be
+parsed directly because it depends on the number of asterisks.]
 
 ** The minimal and standard sets of objects
 
@@ -60,25 +62,33 @@ To simplify references to common collections of objects, we define two
 useful sets.  The /<<<minimal set>>> of objects/ refers to [[#Plain_Text][plain text]], [[#Emphasis_Markers][text
 markup]], [[#Entities][entities]], [[#LaTeX_Fragments][LaTeX fragments]], [[#Subscript_and_Superscript][superscripts and subscripts]].  The
 /<<<standard set>>> of objects/ refers to the entire set of objects, excluding
-citation references and [[#Table_Cells][table cells]].
+citation references and [[#Table_Cells][table cells]].[fn:tom3:Table cells should
+be treated in a way that is entirely separate from objects. This document has included
+them as such, as has org-element (iirc); however, since they can never appear in a paragraph
+and because tables are completely separate syntactically, we should probably drop the
+idea that table cells are objects. I realize that this might mean the creation of a
+distinction between paragraph-objects, title-objects, table-objects etc.]
 
 ** Blank lines
 
 A line containing only spaces, tabs, newlines, and line feeds (=\t\n\r=)
-is considered a /blank line/.  Blank lines can be used to separate
+is considered a /blank line/.  Blank lines separate
 paragraphs and other elements.
 
 With the exception of [[#Items][list items]], blank lines belong to the preceding
 element with the narrowest possible scope.  For example, if at the end
 of a section we have a paragraph and a blank line, that blank line is
-considered part of the paragraph.
+considered part of the paragraph.[fn:tom4:I don't think we need to discuss
+nesting scope here, it is confusing, it is always the immediately prior
+(lesser?) element.]
 
 ** Indentation
 
 Indentation consists of a series of space and tab characters at the
 beginning of a line. Most elements can be indentated, with the
 exception of [[#Headings][headings]], [[#Inlinetasks][inlinetasks]], [[#Footnote_Definitions][footnote definitions]], and [[#Diary_Sexp][diary
-sexps]].
+sexps]]. [fn::Maybe a note that indentation is only meaningful in plain lists
+and for greater blocks is aligned to the indentation of the #+end_ block?]
 
 ** Syntax patterns
 
@@ -97,7 +107,8 @@ meaning, For instance, "KEY" and "VALUE" when describing
 elements or objects.
 
 Unless otherwise specified, a space in a pattern represents one or
-more horizontal whitespace characters.
+more horizontal whitespace characters.[fn::This should be in bold
+so that people don't miss it.]
 
 Patterns will often also contain static structures that serve to
 differentiate a particular element or object type from others, but
@@ -141,25 +152,34 @@ In this document, unless specified otherwise, case is insignificant.
 :CUSTOM_ID: Headings
 :END:
 
-A Heading is a /unindented/ line structured according to the following pattern:
+A Heading is an /unindented/ line structured according to the following pattern:
 
 #+begin_example
 STARS KEYWORD PRIORITY TITLE TAGS
 #+end_example
 
-+ STARS :: A string consisting of one or more asterisks (up to
-  ~org-inlinetask-min-level~ if the =org-inlinetask= library is loaded)
++ STARS :: A string consisting of one or more asterisks[fn::removed
+  note about inline tasks because it is still a heading, any mention
+  of a concrete number should not appear in the specification of syntax.]
   and ended by a space character.  The number of asterisks is used to
-  define the level of the heading.
+  define the level of the heading. [fn::Implementation note: when parsing
+  stars the space following the stars MUST NOT BE CONSUMED and the next
+  phase of parsing MUST start with the space so that it is possible to have
+  a heading with no title that also has tags.]
 
 + KEYWORD (optional) :: A string which is a member of
   ~org-todo-keywords-1~[fn:otkw1:By default, ~org-todo-keywords-1~ only
-  contains =TODO= and =DONE=, however this is liable to change.].  Case is
-  significant.  This is called a "TODO keyword".
-
-+ PRIORITY (optional) :: A single alphanumeric character preceded by a
-  hash sign =#= and enclosed within square brackets (e.g. =[#A]= or =[#1]=).  This
-  is called a "priority cookie".
+  contains =TODO= and =DONE=, however org-todo-keywords-1 is a buffer local
+  variable and can be set by users in an org file using =#+todo:=.].
+  Case is significant.  This is called a "TODO keyword". [fn::Implementation note:
+  TODO keywords cannot be hardcoded in a tokenizer, the tokenizer must
+  be configurable at runtime so that in-file TODO keywords are properly interpreted.]
+
++ PRIORITY (optional) :: A single letter preceded by a
+  hash sign =#= and enclosed within square brackets (e.g. =[#A]= or =[#D]=).  This
+  is called a "priority cookie".[fn::Numeric values are not supported;
+  it is a quirk of the elisp implementation that they appear to work,
+  however they break in nasty and unexpected ways.]
 
 + TITLE (optional) :: A series of objects from the standard set,
   excluding line break objects.  It is matched after every other part.
@@ -180,15 +200,17 @@ STARS KEYWORD PRIORITY TITLE TAGS
 If the first word appearing in the title is =COMMENT=, the heading
 will be considered as "commented".  Case is significant.
 
-If its title is the value of ~org-footnote-section~ (=Footnotes= by
-default), it will be considered as a "footnote section".  Case is
-significant.
+If the title of a heading is exactly the value of ~org-footnote-section~
+(=Footnotes= by default), it will be considered as a "footnote section".
+Case is significant.
 
 If =ARCHIVE= is one of the tags given, the heading will be considered as
 "archived".  Case is significant.
 
-A heading contains directly one section (optionally), followed by
-any number of deeper level headings.
+The level of a heading can be used to construct a nested structure.
+All content following a heading that appears before the next heading
+(regardless of the level of that next heading) is a section. In addition,
+text before the first heading in an org document is also a section.
 
 *** Sections
 :PROPERTIES:
@@ -197,7 +219,15 @@ any number of deeper level headings.
 
 Sections contain one or more non-heading elements.  With the exception
 of the text before the first heading in a document (which is
-considered a section), sections only occur within headings.
+considered a section), sections only occur within headings.[fn:: The
+choice to call this syntactic component a section is confusing because
+it is at odds with the usual notion of a section, namely that the
+usual concept of a section implies that it includes nested content.  I
+personally didn't realize that it ended at the next heading until
+writing this comment (as can be seen from reading my comments in the
+laundry implementation). Therefore I suggest that we look for an
+alternate name for this syntactic component. Maybe "segment" or
+something similar that indicates that it is truncated?]
 
 *Example*
 
@@ -224,31 +254,67 @@ Its internal structure could be summarized as:
    (heading))))
 #+end_example
 
-*** The top level section
+*** The zeroth section
 :PROPERTIES:
-:CUSTOM_ID: Top_level_section
+:CUSTOM_ID: Zeroth_section
 :END:
 
 All elements before the first heading in a document lie in a special
-section called the /top level section/.  It may be preceded by blank
-lines.  Unlike a normal section, the top level section can immediately
+section called the /zeroth section/.  It may be preceded by blank
+lines.  Unlike a normal section, the zeroth section can immediately
 contain a [[#Property_Drawers][property drawer]], optionally preceded by [[#Comments][comments]].  It cannot
-however, contain [[Planning][planning]].
+however, contain [[Planning][planning]].[fn::This is wrong? If it is not
+wrong, then it should be. Property drawers are already annoying to implement
+because they share syntax with regular drawers, and allowing a property drawer
+at the top of a file without a heading means that it should be a regular drawer
+not a property drawer, otherwise you have to special case the handling of drawers
+in the zeroth section. What is the use case for a property drawer as opposed to
+a #+property: line in the zeroth section? I may come around on this at some point,
+but right now it seems more complex, however it might actually be more consistent
+if we imagine the zeroth section as being nested inside a single heading that has
+level zero implicitly at the top of a document. Unfortunately that means that such
+property drawers cannot be determined from a homogeneous syntax but instead require
+some operations on the internal representation. Note also that if this were allowed
+then the property drawer should only be allowed as the very first line of a file
+because newlines at the start of a file need to be preserved. More thought required.]
 
 ** Affiliated Keywords
 :PROPERTIES:
 :CUSTOM_ID: Affiliated_Keywords
 :END:
 
+ [fn::Without going into too much detail, affiliated keywords should
+not be distinguished from other keywords at the level of the syntax.
+The fact that they are is an artifact of the elisp implementation.
+The determination of the behavior of a keyword with regard to
+affiliating behavior should be determined in a later pass, even if in
+some cases some implementations may want to materialize them into the
+parser for performance reasons. Allowing users to promote a keyword to
+be an affiliated keyword would be incredibly powerful for attaching
+metadata to parts of org-files in a way that is user extensible. It
+may still be desirable to describe the behavior of affiliated keywords
+here, but they are not in any way distinct from other keywords at the
+level of org syntax and trying to implement them as such is usually a
+mistake (that I have made).]
+
 With the exception of [[#Comments][comments]], [[#Clocks][clocks]], [[#Headings][headings]], [[#Inlinetasks][inlinetasks]],
 [[#Items][items]], [[#Node_Properties][node properties]], [[#Planning][planning]], [[#Property_Drawers][property drawers]], [[#Sections][sections]], and
 [[#Table_Rows][table rows]], every other element type can be assigned attributes.
+ [fn::Technically tables can be assigned attributes; if you try to affiliate to a table
+row you are accidentally creating a new table. Also, comments probably shouldn't be
+in this list, but I need to review what the behavior was when trying to affiliate
+to a paragraph where there is a comment in between, I'm pretty sure it doesn't work
+though some of the reordering via org-element does .... Being able to affiliate to
+comments could be quite powerful for some specialized use cases.]
 
 This is done by adding specific [[#Keywords][keywords]], named /affiliated/ keywords,
 immediately above the element considered (a blank line cannot lie
 between the affiliated keyword and element). Structurally, affiliated
 keyword are not considered an element in their own right but a
-property of the element they apply to.
+property of the element they apply to. [fn::While it is tempting to try
+to do this at the level of the grammar it induces a number of nasty
+ambiguities in practice. It is saner to have a single unified keyword
+syntax and then to determine affiliation behavior in a later pass.]
 
 Affiliated keywords are structured according to one of the following pattern:
 
@@ -268,19 +334,42 @@ Affiliated keywords are structured according to one of the following pattern:
 + OPTVAL (optional) :: A string consisting of any characters but a
   newline.  This term is only valid when KEY is a member of
   ~org-element-dual-keywords~[fn:oedkw:By default,
-  ~org-element-dual-keywords~ contains =CAPTION= and =RESULTS=.].
+  ~org-element-dual-keywords~ contains =CAPTION= and =RESULTS=.].[fn::
+  All keywords should allow OPTVAL, it regularizes and simplifies the syntax.]
 + VALUE :: A string consisting of any characters but a newline, except
   in the case where KEY is member of
   ~org-element-parsed-keywords~[fn:oepkw:By default,
   ~org-element-parsed-keywords~ contains =CAPTION=.] in which case VALUE
   is a series of objects from the standard set, excluding footnote
-  references.
-
-Repeating an affiliated keyword before an element will usually result
-in the prior VALUEs being overwritten by the last instance of KEY.
-There are two situations under which the VALUEs will be concatenated:
+  references (and line breaks ???).[fn::This is confusing.  A
+  =#+caption:= cannot contain a @@export: snippet@@ with a newline in
+  it, which this text seems to imply. A better wording would be to
+  state that there are some keywords where the contents of VALUE will
+  be further parsed as paragraphs (or whatever we are calling that
+  thing now. I think we are still missing the term for "object
+  containing syntax component")]
+
+ [fn::The behavior of affiliated keywords with respect to shadowing
+needs to be fully specified because it has major semantic implications,
+and for org babel headers it has security implications.]
+By default, when there are multiple affiliated keywords, the last occurrence
+of a given keyword is the one that has priority. Normally users should not
+specify more than a single instance of an affiliated keyword per element, but
+if they do the last one on the page wins.
+
+The default behavior is NOT followed for the ~#+header:~ keyword that
+is used for org-babel blocks. ~#+header:~ keywords combine header
+fields and resolve conflicts by having the top-rightmost (first line,
+last instance on the line) instance of a field take priority. [fn::This
+behavior is critical for org babel and code execution security. If
+there are cases where aff keywords are not following this behavior
+then they need to be fixed. The reason to do first one wins in cases
+like this is so that users do not have to insert lines below which
+lead to hard to understand diffs.]
+
+In addition, there are two situations in which the VALUEs will be concatenated:
 1. If KEY is a member of ~org-element-dual-keywords~[fn:oedkw].
-2. If the affiliated keyword is an instance of the patten
+2. If the affiliated keyword is an instance of the pattern
    =#+attr_BACKEND: VALUE=.
 
 The following example contains three affiliated keywords:
@@ -296,16 +385,20 @@ The following example contains three affiliated keywords:
 :CUSTOM_ID: Greater_Elements
 :END:
 
-Unless specified otherwise, greater elements can contain directly
+Unless otherwise specified, greater elements can directly contain
 any greater or [[#Lesser_Elements][lesser element]] except:
 + Elements of their own type.
 + [[#Planning][Planning]], which may only occur in a [[#Headings][heading]].
-+ [[#Property_Drawers][Property drawers]], which may only occur in a [[#Headings][heading]] or the [[#Top_level_section][top level
++ [[#Property_Drawers][Property drawers]], which may only occur in a [[#Headings][heading]] or the [[#Zeroth_section][zeroth
   section]].
 + [[#Node_Properties][Node properties]], which can only be found in [[#Property_Drawers][property drawers]].
 + [[#Items][Items]], which may only occur in [[#Plain_Lists][plain lists]].
 + [[#Table_Rows][Table rows]], which may only occur in [[#Tables][tables]].
 
+ [fn::This is somewhat confusing because it lists combinations that
+should already be impossible by default because e.g. items are meaningless
+outside plain lists and should not even be mentioned outside of that context.]
+
 *** Greater Blocks
 :PROPERTIES:
 :CUSTOM_ID: Greater_Blocks
@@ -329,10 +422,14 @@ CONTENTS
   than a newline.
 + CONTENTS :: A collection of zero or more elements, subject to two
   conditions:
-  - No line may start with =#+end_NAME=.
+  - No line in the block may start with =#+end_NAME=.
   - Lines beginning with an asterisk must be quoted by a comma (=,*=).
   Furthermore, lines starting with =#+= may be quoted by a comma (=,#+=).
 
+ [fn::Implementation note: ~#+begin_name~ to ~#+end_name~ usually needs to
+be implemented in the tokenization step. The substructure discussed here
+is thus usually handled in a second pass.]
+
 *** Drawers and Property Drawers
 :PROPERTIES:
 :CUSTOM_ID: Drawers
@@ -349,6 +446,8 @@ CONTENTS
   and underscores (=-_=).
 + CONTENTS :: A collection of zero or more elements, except another drawer.
 
+ [fn:: ~:end:~ may be capitalized (legacy support)]
+
 *** Dynamic Blocks
 :PROPERTIES:
 :CUSTOM_ID: Dynamic_Blocks
@@ -366,12 +465,20 @@ CONTENTS
 + CONTENTS :: A collection of zero or more elements, except another
   dynamic block.
 
+ [fn::The spec needs to clarify how to handle ~#+begin:~ alone on a line or followed by
+only whitespace. It is quite nasty to have the behavior of ~#+begin:~ change if it is
+or is not followed by invisible whitespace. I suggest that we change the behavior of
+~#+begin:~ without whitespace to regularize it so that it is _always_ the start of a
+dynamic block since the ~#+begin:~ keyword by itself is pretty much completely useless
+since if you put anything after it, it becomes the start of a dynamic block anyway.]
+
 *** Footnote Definitions
 :PROPERTIES:
 :CUSTOM_ID: Footnote_Definitions
 :END:
 
-Footnote definitions must occur at the start of an /unindented/ line,
+Footnote definitions must occur at the start of an /unindented/ line
+(they must be preceded by only a newline, nothing else),
 and are structured according to the following pattern:
 #+begin_example
 [fn:LABEL] CONTENTS
@@ -401,6 +508,13 @@ It even contains a single blank line.
 :CUSTOM_ID: Inlinetasks
 :END:
 
+ [fn::I suggest that we remove inlinetasks from this document.
+They are a hack that cannot be implemented as part of a grammar
+because they require a concrete value to be specified which breaks
+the arbitrary nesting depth of headings. I think I wrote this somewhere
+else as well, but inline tasks can only be a layer on top of headings,
+they cannot displace them.]
+
 Inlinetasks are syntactically a [[#Headings][heading]] with a level of at least
 ~org-inlinetask-min-level~[fn:oiml:The default value of
 ~org-inlinetask-min-level~ is =15=.], i.e. starting with at least that
@@ -448,8 +562,8 @@ BULLET COUNTER-SET CHECK-BOX TAG CONTENTS
   character, or a hyphen enclosed by square brackets (i.e. =[ ]=, =[X]=, or =[-]=).
 + TAG (optional) :: An instance of the pattern =TAG-TEXT ::= where
   =TAG-TEXT= represents a string consisting of non-newline characters
-  that does not contain the substring "\nbsp{}::\nbsp{}" (two colons surrounded by
-  whitespace).
+  that does not contain the substring ~" :: "~ (two colons surrounded by
+  whitespace without the quotes).
 + CONTENTS (optional) :: A collection of zero or more elements, ending
   at the first instance of one of the following:
   - The next item.
@@ -457,6 +571,22 @@ BULLET COUNTER-SET CHECK-BOX TAG CONTENTS
     not counting lines within other elements or [[#Inlinetasks][inlinetask]] boundaries.
   - Two consecutive blank lines.
 
+ [fn:: The description of CONTENTS is confusing since it cannot contain
+a heading, which is implicit in the indentation rule but not
+obvious. In addition, contents may not actually contain zero or more
+elements because many elements must start on their own line. So
+e.g. 1. #+begin_src does not work, however, the wording seems to
+indicate that it should, which is misleading. Further, it is actually
+not possible to implement contents as specified because grammars
+cannot track the indentation level that is required to reconstruct
+list items correctly. Therefore CONTENTS should not be defined as such
+but should only specify that they can be anything except a newline. I
+think that the intent of this document is somewhat a conflation of the
+syntax for org and of the semantics as determined by export backends
+and/or org-element, however it makes it extremely confusing because it
+is not actually possible to parse CONTENTS, they must be reconstructed
+from the parse tree.]
+
 *Examples*
 
 #+begin_example
@@ -471,11 +601,17 @@ BULLET COUNTER-SET CHECK-BOX TAG CONTENTS
 :END:
 
 A /plain list/ is a set of consecutive [[#Items][items]] of the same indentation.
+ [fn::This is confusing because the definition of contents above is
+confusing, it also implies that plain lists cannot be nested, or are
+not somehow nested, which is also confusing. Maybe a line to the effect
+that plain lists may be nested along with any other element that is
+properly indented or something?]
 
 If first item in a plain list has a COUNTER in its BULLET, the plain
 list will be an "ordered plain-list".  If it contains a TAG, it will
 be a "descriptive list".  Otherwise, it will be an "unordered list".
-List types are mutually exclusive.
+List types are mutually exclusive at the same level of indentation; if
+both types are present consecutively then they parse as separate lists.
 
 For example, consider the following excerpt of an Org document:
 
@@ -524,6 +660,13 @@ CONTENTS
 + CONTENTS :: A collection of zero or more [[#Node_Properties][node properties]], not
   separated by blank lines.
 
+ [fn::The failure mode for malformed contents needs to be
+determined more clearly here. We don't want property drawers to suddenly
+become plain drawers just because a user has a malformed line, that
+could be disastrous if certain settings in the property drawer mask
+settings from further up the tree.  In short, malformed contents
+should not poison the whole property drawer.]
+
 *Example*
 
 #+begin_example
@@ -537,11 +680,24 @@ CONTENTS
 :CUSTOM_ID: Tables
 :END:
 
+ [fn::I think that this section needs to be split into two separate
+sections one for each grammar. It will make it much easier to specify
+each grammar, and it will also make it clear that they are not
+syntactic elements that are trivially interchangeable since only a
+subset of table.el tables can be converted to org table syntax (at
+the moment). I'm willing to take a shot at it.]
+
 Tables are started by a line beginning with either:
 + A vertical bar (=|=), forming an "org" type table.
 + The string =+-= followed by a sequence of plus (=+=) and minus (=-=)
   signs, forming a "table.el" type table.
 
+ [fn::Consider whether table.el tables should be supported by the
+syntax outside of elisp org mode. There are some slightly divergent
+use cases and features and we likely need/want to explore some of
+the alternatives proposed for how to allow pure org tables to support
+the features that are currently only possible for table.el tables.]
+
 Tables cannot be immediately preceded by such lines, as the current
 line would the be part of the earlier table.
 
@@ -577,6 +733,11 @@ blocks]], [[#Paragraphs][paragraphs]] or [[#Table_Rows][table rows]] can contain
 :CUSTOM_ID: Babel_Call
 :END:
 
+ [fn::As with the other keyword-like things and syntax, I suggest that
+we ultimately move babel calls to live under a section on keyword
+content parsers so that it is clear that they should not be treated as
+separate syntactic components.]
+
 Babel calls are structured according to one of the following patterns:
 #+begin_example
 ,#+call: NAME(ARGUMENTS)
@@ -593,11 +754,20 @@ Babel calls are structured according to one of the following patterns:
   non-newline characters.  Opening and closing square brackets must be
   balanced.
 
+ [fn::Nesting rules for the parens and square brackets need review here
+and elsewhere. The "must be balanced" requirement is implemented with
+an extremely nasty materialized regex which only works for 3 or 4 levels
+of nesting and thus is really likely to not be what we want.]
+
 *** Blocks
 :PROPERTIES:
 :CUSTOM_ID: Blocks
 :END:
 
+ [fn::These probably should not actually be distinct from greater blocks.
+The syntax is the same; the only difference is that there are 5 types that
+have special specified handling.]
+
 Like [[#Greater_Blocks][greater blocks]], blocks are structured according to the following pattern:
 
 #+begin_example
@@ -622,7 +792,8 @@ CONTENTS
     the pattern =LANGUAGE SWITCHES ARGUMENTS= with:
     + LANGUAGE :: A string consisting of any non-whitespace characters
     + SWITCHES :: Any number of SWITCH patterns, separated by a single
-      space character
+      space character [fn::For the love of all that is sane can we
+      please just remove this from the spec or mark it as legacy.]
       - SWITCH :: Either the pattern =-l "FORMAT"= where =FORMAT=
         represents a string consisting of any characters but a double
         quote (="=) or newline, or the pattern =-S= or =+S= where =S=
@@ -631,7 +802,7 @@ CONTENTS
 + CONTENTS (optional) :: A string consisting of any characters
   (including newlines) subject to the same two conditions of greater
   block's CONTENTS, i.e.
-  - No line may start with =#+end_NAME=.
+  - No line in the block may start with =#+end_NAME=.
   - Lines beginning with an asterisk must be quoted by a comma (=,*=).
   As with greater blocks, lines starting with =#+= may be quoted by a
   comma (=,#+=).
@@ -655,6 +826,10 @@ CONTENTS
 :CUSTOM_ID: Clocks
 :END:
 
+ [fn::This section seems to have been made way too simple? Or is this
+specifically the clocking-clock? If it is the clocking-clock then
+that should be clarified.]
+
 A clock element is structured according to the following pattern:
 
 #+begin_example
@@ -674,7 +849,6 @@ clock: INACTIVE-TIMESTAMP-RANGE DURATION
 clock: [2024-10-12]
 #+end_example
 
-
 *** Diary Sexp
 :PROPERTIES:
 :CUSTOM_ID: Diary_Sexp
@@ -701,6 +875,9 @@ A diary sexp[fn::A common abbreviation for S-expression] element is an
 :CUSTOM_ID: Planning
 :END:
 
+ [fn::I think this and property drawers should be moved to be closer
+to the heading spec section?]
+
 A planning element is structured according to the following pattern:
 
 #+begin_example
@@ -709,13 +886,15 @@ PLANNING
 #+end_example
 
 + HEADING :: A [[#Headings][heading]] element.
-+ PLANNING :: A line consisting of a series of =KEYWORD: TIMESTAMP=
++ PLANNING :: A line consisting of one or more =KEYWORD: TIMESTAMP=
   patterns (termed "info" patterns).
-  - KEYWORD :: Either the string =DEADLINE=, =SCHEDULED=, or =CLOSED=.
+  - KEYWORD :: Either the string =DEADLINE=, =SCHEDULED=, or =CLOSED=. [fn::
+    Request to add the =OPENED= keyword to track when a task was first known/entered into a file.]
   - TIMESTAMP :: A [[#Timestamps][timestamp]] object.
 
-It is not permitted for any blank lines to lie between HEADING and
-PLANNING.
+PLANNING must directly follow HEADING without any blank lines in between.
+
+ [fn::Need a spec for how to handle multiple instances of the same keyword with different values.]
 
 *Example*
 
@@ -742,7 +921,6 @@ Comments consist of one or more consecutive comment lines.
 # Over multiple lines
 #+end_example
 
-
 *** Fixed Width Areas
 :PROPERTIES:
 :CUSTOM_ID: Fixed_Width_Areas
@@ -773,6 +951,9 @@ consecutive hyphens (=-----=).
 :CUSTOM_ID: Keywords
 :END:
 
+ [fn::Reminder about regularizing keyword syntax so that it
+always supports ~#+key[opt]:value~ syntax.]
+
 Keywords are structured according to the following pattern:
 
 #+begin_example
@@ -780,7 +961,11 @@ Keywords are structured according to the following pattern:
 #+end_example
 
 + KEY :: A string consisting of any non-whitespace characters, other
-  than =call= (which would forms a [[#Babel_Call][babel call]] element).
+  than =call= (which would form a [[#Babel_Call][babel call]] element). [fn::This is
+  why I have the note on the ~#+call:~ section. If someone tries to
+  implement this they are going to be in a world of pain because there
+  is a concrete value here. This is because that distinction is not in
+  the syntax but instead should be in a later stage.]
 + VALUE :: A string consisting of any characters but a newline.
 
 When KEY is a member of ~org-element-parsed-keywords~[fn:oepkw], VALUE can contain
@@ -791,7 +976,9 @@ Note that while instances of this pattern are preferentially parsed as
 keyword may occur so long as it is not immediately preceding a valid
 element that can be affiliated.  For example, an instance of
 =#+caption: hi= followed by a blank line will be parsed as a keyword,
-not an affiliated keyword.
+not an affiliated keyword. [fn::A full spec for user defined aff keywords
+will require a bit more clarity for how lonely affiliated keywords should
+behave.]
 
 *** LaTeX Environments
 :PROPERTIES:
@@ -838,6 +1025,16 @@ according to one of the following patterns:
   which does not end in a plus characters (=+=).
 + VALUE (optional) :: A string containing any characters but a newline.
 
+ [fn::This spec is not consistent with the behavior and has bad design.
+Name should be allowed to be empty, same as with heading tags. It is
+critical to include the empty string as part of a grammar like this so
+that it is closed, otherwise we get nasty edge cases. For example it
+should be the case that ~:+:~ is syntactically valid as a node
+property.  The fact that it won't apply to anything is ok, it might
+also be useful if we regularize ~#+begin_NAME~ to allow the empty
+string for NAME. Note that ~:+:~ is already treated as syntactically
+valid for font locking and for property drawer detection (I think).]
+
 *** Paragraphs
 :PROPERTIES:
 :CUSTOM_ID: Paragraphs
@@ -850,11 +1047,21 @@ Empty lines and other elements end paragraphs.
 
 Paragraphs can contain the standard set of objects.
 
+ [fn::Implementation note: it is possible to define
+paragraphs constructively instead of as they are defined
+here as the negation or fall through of all other things.
+We should update this section with the positive definition
+once I have it nailed down.]
+
 *** Table Rows
 :PROPERTIES:
 :CUSTOM_ID: Table_Rows
 :END:
 
+ [fn::I suggest we roll this up into the org tables section
+spec so that we don't have to worry about making a note that
+these only occur in tables.]
+
 A table row consists of a vertical bar (=|=) followed by:
 + Any number of [[#Table_Cells][table cells]], forming a "standard" type row.
 + A hyphen (=-=), forming a "rule" type row.  Any non-newline characters
@@ -892,6 +1099,13 @@ such as a paragraph.
 :CUSTOM_ID: Entities
 :END:
 
+ [fn::As I think I mention elsewhere, the concrete names here
+should NOT be part of the syntax, it makes the parser brittle
+and hard to maintain. Differentiation between entities and fragments
+should be handled at the syntax level for cases where the fragment
+has brackets, and then in a second pass for values that are
+syntactically entity-or-fragment and must be determined after
+the fact.]
 Entities are structured according to the following pattern:
 
 #+begin_example
@@ -987,6 +1201,7 @@ ought to be removed.
 They are slow to parse, fragile, redundant and imply false
 positives.  --- ngz
 #+end_quote
+ [fn::Strong support for removing these.]
 
 ** Export Snippets
 :PROPERTIES:
@@ -1002,6 +1217,12 @@ Export snippets are structured according to the following pattern:
 + BACKEND :: A string consisting of alphanumeric characters and hyphens.
 + VALUE (optional) :: A string containing anything but the string =@@=.
 
+ [fn::We probably want to note that BACKEND can be the empty string
+per that thread on how to deal with intra-word markup. Again this
+also touches on the general principle of wanting to close over the
+empty string so that users aren't surprised when ~@@:lol@@~ suddenly
+appears in plain text just because no backend was specified.]
+
 ** Footnote References
 :PROPERTIES:
 :CUSTOM_ID: Footnote_References
@@ -1019,14 +1240,17 @@ Footnote references are structured according to one of the following patterns:
   hyphens and underscores (=-_=).
 + DEFINITION (optional) :: A series of objects from the standard set,
   so long as opening and closing square brackets are balanced within
-  DEFINITION.
+  DEFINITION. [fn::As noted elsewhere, the balanced brackets
+  requirement is a nightmare and needs a review.]
 
 If the reference follows the second pattern, it is called an "inline
 footnote".  If it follows the third pattern, i.e. if LABEL is omitted,
 it is called an "anonymous footnote".
 
 Note that the first pattern may not occur on an /unindented/ line, as it
-is then a [[#Footnote_Definitions][footnote definition]].
+is then a [[#Footnote_Definitions][footnote definition]]. [fn::I'm not sure this is quite right?
+the font locking code is not consistent with actual behavior, need to
+review the laundry test cases and example files.]
 
 ** Citations
 :PROPERTIES:
@@ -1127,7 +1351,8 @@ src_LANG[HEADERS]{BODY}
 + LANG :: A string consisting of any non-whitespace characters.
 + HEADERS (optional), BODY (optional) :: A string consisting of any
   characters but a newline.  Opening and closing square brackets must
-  be balanced.
+  be balanced. [fn::Nesting issues need review. Suggestion to do
+  something like what Racket scribble does.]
 
 ** Line Breaks
 :PROPERTIES:
@@ -1366,6 +1591,10 @@ SIGN CHARS FINAL
 :CUSTOM_ID: Table_Cells
 :END:
 
+ [fn::Need to condense this with tables and table rows because
+spreading these out makes it super hard to understand the table syntax
+for basically no reason.]
+
 Table cells are structured according to the following pattern:
 
 #+begin_example
@@ -1375,7 +1604,12 @@ CONTENTS SPACES|
 + CONTENTS :: A series of objects not containing the vertical bar
   character (=|=).  It can contain the minimal set of objects,
   [[#Citations][citations]], [[#Export_Snippets][export snippets]], [[#Footnote_References][footnote references]], [[#Links][links]], [[#Macros][macros]],
-  [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]].
+  [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]]. [fn::Like for the comma in
+  macros, I think it would be safe to add ~\|~ as an escape character.
+  The issue in the elisp implementation is not actually at the level
+  of the syntax, but is actually in the export backends or somewhere
+  deeper, because even using a macro that expands to be a pipe ~|~
+  breaks the table (which is really bad).]
 + SPACES :: A string consisting of zero or more of space characters,
   used to align the table columns.
 
@@ -1386,6 +1620,10 @@ The final vertical bar (=|=) may be omitted in the last cell of a row.
 :CUSTOM_ID: Timestamps
 :END:
 
+ [fn::I have some suggestions for extensions to timestamp syntax to
+support historical and far future dates, as well as timezone offsets (NOT
+the 3 letter ambiguous disaster) and seconds and sub-second times.]
+
 Timestamps are structured according to one of the seven following patterns:
 
 #+begin_example
@@ -1470,6 +1708,10 @@ BORDER BODY BORDER
 + [[#Special_Tokens][POST]] :: Either a whitespace character, =-=, =.=, =,=, =;=, =:=, =!=, =?=, ='=, =)=, =}=,
   =[=, ="=, or the end of a line.
 
+The four =*/_+= may be arbitrarily nested to any depth. Verbatim and
+code ==~= may be nested inside any other markup, but no other markup
+will be interpreted inside of them since they are interpreted exactly.
+
 *Examples*
 
 #+begin_example
@@ -1483,12 +1725,16 @@ functions starting with ~org-element-~.
 :CUSTOM_ID: Plain_Text
 :END:
 
+ [fn::I'm not sure I would add this, the fall through is sloppy
+and it is better to specify values constructively.]
+
 Any string that doesn't match any other object can be considered a
 plain text object.[fn::In ~org-element.el~ plain text objects are
 abstracted away to strings for performance reasons.]
 Within a plain text object, all whitespace is collapsed to a single
 space. For instance, =hello\n there= is equivalent to =hello there=.
 
+
 * Footnotes
 
 [fn:1] In particular, the parser requires stars at column 0 to be
@@ -1497,7 +1743,10 @@ quoted by a comma when they do not define a heading.
 [fn:2] It also means that only headings and sections can be recognized
 just by looking at the beginning of the line.  Planning lines and
 property drawers can be recognized by looking at one or two lines
-above.
+above. [fn::This is incorrect. There are many elements that can be
+recognized by looking at the start of a line, however the conflation
+between pure syntax level and other levels of parsing and processing
+obscures this.]
 
 As a consequence, using ~org-element-at-point~ or ~org-element-context~
 will move up to the parent heading, and parse top-down from there
@@ -1571,6 +1820,20 @@ until context around the original location is found.
 :CUSTOM_ID: Entities_List
 :END:
 
+ [fn::The org entities section is useful, but I suggest not including
+that section at all right now. There is a way to define and abstract
+syntax that does not require the parser to pull in all those concrete
+forms which reflects how org mode implements that functionality but
+should not be specified as part of the syntax document. There are some
+significant edge cases that need to be worked out in the grammar for
+this that having a hardcoded list masks. I suggest we work that
+portion out before committing any of that to a spec doc. It is also a
+bad idea to list all of those in the spec doc because it will
+likely get out of sync with the code that implements such detection in
+elisp (despite the fact that the list is being auto generated via a
+code block). Maybe it makes sense to include the code block so that
+devs and users can discover it for themselves?]
+
 #+begin_src emacs-lisp :results raw :exports results
 (concat "| Name | Character |\n|-\n"
         (mapconcat
-- 
2.34.1



* Re: Org Syntax Specification
  2022-01-18  0:54 ` Org Syntax Specification Tom Gillespie
@ 2022-01-18 12:09   ` Ihor Radchenko
  0 siblings, 0 replies; 15+ messages in thread
From: Ihor Radchenko @ 2022-01-18 12:09 UTC (permalink / raw)
  To: Tom Gillespie; +Cc: org-mode-email, Nicolas Goaziou, Timothy

Tom Gillespie <tgbugs@gmail.com> writes:

> Extremely in favor of removing switches. There are so many better ways
> to do this now that aren't like some eldritch unix horror crawling up
> out of the abyss and into the eBNF :)

I also agree that switches and $$-style equations may be deprecated.
We can:
1. Stop mentioning them in the document
2. Add org-lint warnings about their obsolescence

As for your other comments, you seem to be suggesting a number of
changes to the existing Org syntax. Some of them look fine, some do
not. However, please keep in mind that we have to deal with backward
compatibility, third-party compatibility, and not breaking existing Org
documents unless we have a very strong justification. I suggest
branching a new thread from here for each concrete suggestion
where you want to change Org syntax, as opposed to just the
document wording. Otherwise, this discussion will become a total mess.

More details below.

> +Elements are further divided into "[[#Headings][headings]]", "[[#Sections][sections]]"[fn::sections are not elements], "[[#Greater_Elements][greater

Nope. Sections are actually elements. See =org-element-all-elements=.

> +other headings. [fn:tom2:I would not discuss strata here because it is
> +not related to the syntax of the document. It is related to how that
> +syntax is interpreted by org mode. The strata are nesting rules that
> +are independent of the syntax, and discussing that here in the syntax
> +document is confusing, because the nesting is not something that can be
> +parsed directly because it depends on the number of asterisks.]

I disagree. Nesting rules are an important part of the syntax. We have
restrictions on which elements can be inside other elements. The same
pattern may or may not be recognised in Org depending on its nesting.
For example, links that you put into property drawers are not
considered link objects.
  
> +citation references and [[#Table_Cells][table cells]].[fn:tom3:Table cells should
> +be treated in a way that is entirely separate from objects. This document has included
> +them as such as has org-element (iirc) however since they can never appear in a paragraph
> +and because tables are completely separate syntactically, we should probably drop the
> +idea that table cells are objects. I realize that this might mean the creation of a
> +distinction between paragraph-objects, title-objects, table-objects etc.]

Again I disagree. While your idea about table cells is reasonable
(similar for citation-references inside citations), I am against
decoupling Org syntax from org-element implementation. In
org-element.el, table-cells are just yet another object. If we let
org-element and the syntax document get out of sync, confusion and
errors will follow during future maintenance.
  
>  A line containing only spaces, tabs, newlines, and line feeds (=\t\n\r=)
> -is considered a /blank line/.  Blank lines can be used to separate
> +is considered a /blank line/.  Blank lines separate
>  paragraphs and other elements.

This actually reads as slightly confusing. "Blank lines separate paragraphs
and other elements" sounds like blank lines are only relevant
before/after paragraphs. However, there are also footnote references and
lists. Maybe we can try something like:

Blank lines can be used to indicate the end of some elements.

"can" because a single blank line usually does not separate anything.

> +considered part of the paragraph.[fn:tom4:I don't think we need to discuss
> +nesting scope here, it is confusing, it is always the immediately prior
> +(lesser?) element.]

Then where can we put it? This is one of the tricky conventions we use
in the parser.
  
> ++ STARS :: A string consisting of one or more asterisks[fn::removed
> +  note about inline tasks because it is still a heading, any mention
> +  of a concrete number should not appear in the specification of
> syntax.]

I am not sure here. Inline tasks are special because a one-line inline
task must not contain any text below it and cannot have planning or
properties.

> +  contains =TODO= and =DONE=, however org-todo-keywords-1 is a buffer local
> +  variable and can be set by users in an org file using =#+todo:=.].

If we mention this, we also need to explain what kind of element
#+todo: is, where it can be located, and how to parse multiple
instances of #+todo: in the document.

> -A heading contains directly one section (optionally), followed by
> -any number of deeper level headings.
> +The level of a heading can be used to construct a nested structure.
> +All content following a heading that appears before the next heading
> +(regardless of the level of that next heading) is a section. In addition,
> +text before the first heading in an org document is also a section.

Note that this is not true for one-line inline tasks.

> +considered a section), sections only occur within headings.[fn:: The
> +choice to call this syntactic component a section is confusing because
> +it is at odds with the usual notion of a section, namely that the
> +usual concept of a section implies that it includes nested content.  I
> +personally didn't realize that it ended at the next heading until
> +writing this comment (as can be seen from reading my comments in the
> +laundry implementation). Therefore I suggest that we look for an
> +alternate name for this syntactic component. Maybe "segment" or
> +something similar that indicates that it is truncated?]

Sounds reasonable. However, we may also need to make this change at
the Elisp level, which is tricky when you think about
backward compatibility.
  
> +however, contain [[Planning][planning]].[fn::This is wrong? If it is not
> +wrong, then it should be. Property drawers are already annoying to implement
> +because they share syntax with regular drawers, and allowing a property drawer
> +at the top of a file without a heading means that it should be a regular drawer
> +not a property drawer, otherwise you have to special case the handling of drawers
> +in the zeroth section. What is the use case for a property drawer as opposed to
> +a #+property: line in the zeroth section? I may come around on this at some point,
> +but right now it seems more complex, however it might actually be more consistent
> +if we imagine the zeroth section as being nested inside a single heading that has
> +level zero implicitly at the top of a document. Unfortunately that means that such
> +property drawers cannot be determined from a homogeneous syntax but instead require
> +some operations on the internal representation. Note also that if this were allowed
> +then the property drawer should only be allowed as the very first line of a file
> +because newlines at the start of a file need to be preserved. More though required.]

The statement about property drawers in the first section (that is how
we refer to it in org-element) is correct. The first section and the
location of its property drawer are special.

I agree that it's inconsistent with normal property drawers. However, we
cannot change it without breaking existing Org files. If we decide to
change syntax in this area, we should think carefully about possible
consequences.

> + [fn::Without going into to much detail, affiliated keywords should
> +not be distinguished from other keywords at the level of the syntax.
> +The fact that they are is an artifact of the elisp implementation.
> +The determination of the behavior of a keyword with regard to
> +affiliating behavior should be determined in a later pass, even if in
> +some cases some implementations may want to materialize them into the
> +parser for performance reasons. Allowing users to promote a keyword to
> +be an affiliated keyword would be incredibly powerful for attaching
> +metadata to parts of org-files in a way that is user extensible. It
> +may still be desirable to describe the behavior of affiliated keywords
> +here, but they are not in any way distinct from other keywords at the
> +level of org syntax and trying to implement them as such is usually a
> +mistake (that I have made).]

I generally support this idea. Handling keywords in org-element is not
pretty. Having them in the parse tree would make things easier. However,
we again need to consider backward compatibility. I can imagine
third-party ox-* packages breaking if we make this change, so we should
double-check before we decide to change this.

> +property of the element they apply to. [fn::While it is tempting to try
> +to do this at the level of the grammar it induces a number of nasty
> +ambiguities in practice. It is saner to have a single unified keyword
> +syntax and then to determine affiliation behavior in a later pass.]

Yes, it is saner. However, our syntax document is supposed to be a
human-readable description of what org-element does. We cannot introduce
differences between the grammar document and the de facto parser
implementation. This would defeat the purpose of providing a reference
syntax: we would get inconsistency between Emacs Org mode and external
parsers.
  
> +  ~org-element-dual-keywords~ contains =CAPTION= and =RESULTS=.].[fn::
> +  All keywords should allow OPTVAL, it regularizes and simplifies the syntax.]

I support this idea.

> + [fn:: ~:end:~ may be capitalized (legacy support)]

Both :END: and :end: are supported by the Org parser. What do you mean
by legacy?

> + [fn::I suggest that we remove inlinetasks from this document.
> +They are a hack that cannot be implemented as part of a grammar
> +because they require a concrete value to be specified which breaks
> +the arbitrary nesting depth of headings. I think I wrote this somewhere
> +else as well, but inline tasks can only be a layer on top of headings,
> +they cannot displace them.]

I disagree. Inlinetasks are de facto a part of the syntax and they can
be encountered in Org documents in the wild. If you treat inlinetasks as
ordinary headings, things may break unpredictably during parsing.

Instead, we may consider making inlinetask level constant.

> +indicate that it should, which is misleading. Further, it is actually
> +not possible to implement contents as specified because grammars
> +cannot track the indentation level that is required to reconstruct
> +list items correctly. Therefore CONTENTS should not be defined as such
> +but should only specify that they can be anything except a newline. I
> +think that the intent of this document is somewhat a conflation of the
> +syntax for org and of the semantics as determined by export backends
> +and/or org-element, however it makes it extremely confusing because it
> +is not actually possible to parse CONTENTS, they must be reconstructed
> +from the parse tree.]

Could you elaborate on why grammars cannot track the indentation level?
AFAIU, if that were the case, Python would not be parseable.
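
For reference, Python's tokenizer handles this by keeping a stack of
indentation widths and emitting INDENT/DEDENT tokens, so the grammar
itself never has to count spaces. A minimal sketch of that idea follows.
The name indent_tokens and the token tuples are made up for illustration,
and the sketch does not attempt to model Org's actual list item rules:

```python
def indent_tokens(lines):
    """Turn leading whitespace into INDENT/DEDENT tokens using a stack.

    Hypothetical helper for illustration only; tabs and inconsistent
    dedents are deliberately ignored to keep the sketch short.
    """
    stack = [0]          # indentation widths of the currently open blocks
    tokens = []
    for line in lines:
        if not line.strip():
            continue     # blank lines do not change the nesting here
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            tokens.append(("INDENT", width))
        while width < stack[-1]:
            stack.pop()
            tokens.append(("DEDENT", width))
        tokens.append(("LINE", line.strip()))
    # Close whatever is still open at end of input.
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", 0))
    return tokens


example = [
    "- item",
    "  - nested item",
    "    continuation text",
    "- second item",
]
for token in indent_tokens(example):
    print(token)
```

Once the token stream carries explicit INDENT/DEDENT markers, a
context-free grammar can treat them like ordinary brackets.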

> + [fn::The failure mode for malformed contents needs to be
> +determined more clearly here. We don't want property draws to suddenly
> +become plain drawers just because a user has a malformed line, that
> +could be disastrous if certain settings in the property drawer mask
> +settings from further up the tree.  In short, malformed contents
> +should not poison the whole property drawer.]

Yet, that is exactly what happens in Org. Malformed property drawers
become ordinary drawers.
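
A rough sketch of that all-or-nothing behaviour, in Python. The regexp
is a simplified assumption about the node property pattern, and
classify_drawer is a hypothetical name, not an org-element function:

```python
import re

# Simplified assumption: ":NAME: VALUE" or ":NAME+: VALUE", VALUE optional.
NODE_PROPERTY = re.compile(r"^[ \t]*:\S+:(?:[ \t]+.*)?[ \t]*$")


def classify_drawer(lines):
    """Classify a ':PROPERTIES:' ... ':END:' block (hypothetical helper).

    Returns "property-drawer" only when every content line looks like a
    node property; a single malformed line demotes the whole block.
    """
    if len(lines) < 2:
        return "not-a-drawer"
    first, last = lines[0].strip(), lines[-1].strip()
    if first.upper() != ":PROPERTIES:" or last.upper() != ":END:":
        return "not-a-drawer"
    if all(NODE_PROPERTY.match(line) for line in lines[1:-1]):
        return "property-drawer"
    return "drawer"


good = [":PROPERTIES:", ":CUSTOM_ID: Entities", ":END:"]
bad = [":PROPERTIES:", ":CUSTOM_ID: Entities", "oops, not a property", ":END:"]
print(classify_drawer(good))  # property-drawer
print(classify_drawer(bad))   # drawer
```

The sketch ignores where the drawer sits relative to the heading and
planning line, which also matters in practice.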

>      + SWITCHES :: Any number of SWITCH patterns, separated by a single
> -      space character
> +      space character [fn::For the love of all that is sane can we
> +      please just remove this from the spec or mark it as legacy.]

I support this idea.

> +PLANNING must directly follow HEADING without any blank lines in between. 
> +
> + [fn::Need a spec for how to handle multiple instances of the same keyword with different values.]

The last one wins (as in org-element-planning-parser).
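
To make "the last one wins" concrete, here is a small Python sketch. It
is only an illustration of the rule and does not mirror the code of
org-element-planning-parser; the timestamp pattern is simplified and
parse_planning is a made-up name:

```python
import re

# A planning keyword followed by an active or inactive timestamp (simplified).
PLANNING_ITEM = re.compile(
    r"(DEADLINE|SCHEDULED|CLOSED):[ \t]*(<[^>\n]*>|\[[^\]\n]*\])"
)


def parse_planning(line):
    """Collect planning keywords from one line.

    Items are visited left to right, so a later duplicate keyword
    overwrites the value stored for an earlier one.
    """
    planning = {}
    for keyword, timestamp in PLANNING_ITEM.findall(line):
        planning[keyword] = timestamp
    return planning


line = ("DEADLINE: <2022-01-18 Tue> SCHEDULED: <2022-01-10 Mon> "
        "DEADLINE: <2022-02-01 Tue>")
print(parse_planning(line))
```

With the line above, the second DEADLINE replaces the first, while
SCHEDULED is unaffected.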

> + [fn::As I think I mention elsewhere, the concrete names here
> +should NOT be part of the syntax, it makes the parser brittle
> +and hard to maintain. Differentiation between entities and fragments
> +should be handled at the syntax level for cases where the fragment
> +has brackets, and then in a second pass for values that are
> +syntactically entity-or-fragment and must be determined after
> +the fact.]

How would you define the entity object then? First/second pass is an
implementation detail. Our current description follows how org-element
handles entities.

> + [fn::We probably want to note that BACKEND can be the empty string
> +per that thread on how to deal with intra-word markup. Again this
> +also touches on the general principle of wanting to close over the
> +empty string so that users aren't surprised when ~@@:lol@@~ suddenly
> +appears in plain text just because no backend was specified.]

While I am not opposed to the idea, your principle is not followed by
the org-element parser. We may consider changing it, but it is again a
whole separate discussion where we need to consider pros and cons.

>  Note that the first pattern may not occur on an /unindented/ line, as it
> -is then a [[#Footnote_Definitions][footnote definition]].
> +is then a [[#Footnote_Definitions][footnote definition]]. [fn::I'm not sure this is quite right?
> +the font locking code is not consistent with actual behavior, need to
> +review the laundry test cases and example files.]

Do not look at font-locking. You can safely consider that fontification
is wrong in all non-trivial cases. Always check org-element-at-point and
org-element-context.
  
> -  [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]].
> +  [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]]. [fn::Like for the comma in
> +  macros, I think it would be safe to add ~\|~ as an escape character.
> +  The issue in the elisp implementation is not actually at the level
> +  of the syntax, but is actually in the export backends or somewhere
> +  deeper, because even using a macro that expands to be a pipe ~|~
> +  breaks the table (which is really bad).]

I am not sure if it is needed. We can already do \vert

> + [fn::I have some suggestions for extensions to timestamp syntax to
> +support historical and far future dates, as well as timezone offsets (NOT
> +the 3 letter ambiguous disaster) and seconds and sub-second times.]

That would be welcome, but someone™ should implement timezone support at
the Elisp level. We have had several discussions about this in the past.

> +The four =*/_+= may be arbitrarily nested to any depth. Verbatim and
> +code ==~= may be nested inside any other markup, but no other markup
> +will be interpreted inside of them since they are interpreted exactly.

That's not accurate. You cannot nest, say, bold inside bold. You cannot
put code inside any other markup freely: consider *bold =asd*asd= not bold*

Best,
Ihor



end of thread, other threads:[~2022-01-18 13:19 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
2022-01-09 18:02 Org Syntax Specification Timothy
2022-01-15 12:40 ` Sébastien Miquel
2022-01-15 16:36   ` Depreciating TeX-style LaTeX fragments (was: Org Syntax Specification) Timothy
2022-01-16  8:08     ` Sébastien Miquel
2022-01-16  9:23       ` Depreciating TeX-style LaTeX fragments Martin Steffen
2022-01-16  9:46       ` Colin Baxter 😺
2022-01-16 11:11         ` Tim Cross
2022-01-16 13:26         ` Juan Manuel Macías
2022-01-16 14:43           ` Colin Baxter 😺
2022-01-16 15:16             ` Greg Minshall
2022-01-16 17:45         ` Rudolf Adamkovič
2022-01-16 12:10     ` Eric S Fraga
2022-01-16 14:30       ` Anthony Cowley
2022-01-18  0:54 ` Org Syntax Specification Tom Gillespie
2022-01-18 12:09   ` Ihor Radchenko
