* The less ambiguous math delimiters in tables
From: Rudolf Adamkovič @ 2024-12-24 9:20 UTC
To: emacs-orgmode
We know that
\(...\) math delimiters are superior to $...$
in that
they remove parsing ambiguities.
So, I updated my notes to the new delimiters, but
Org still struggles with | within \(...\) in tables,
such as
| \(|x|\) |
Is this a feature or a bug?
I was hoping to reap the benefits of \(...\) in this case.
P.S. I know about '\vert' in LaTeX.
Rudy
--
"Programming reliably -- must be an activity of an undeniably
mathematical nature […] You see, mathematics is about thinking, and
doing mathematics is always trying to think as well as possible."
--- Edsger W. Dijkstra, 1981
Rudolf Adamkovič <rudolf@adamkovic.org> [he/him]
http://adamkovic.org
* Re: The less ambiguous math delimiters in tables
From: Ihor Radchenko @ 2024-12-24 9:25 UTC
To: Rudolf Adamkovič; +Cc: emacs-orgmode
Rudolf Adamkovič <rudolf@adamkovic.org> writes:
> Org still struggles with | within \(...\) in tables,
>
> such as
>
> | \(|x|\) |
>
> Is this a feature or a bug?
It is a syntax limitation.
Org parser is outer-inner - the table row is parsed first.
So, | are unconditionally used as table delimiters, *before* verbatim
LaTeX markup is recognized.
\vert is one possible workaround, but it does not work inside verbatim
text.
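As a hedged illustration of the \vert workaround mentioned above, the
fragments below avoid a literal | inside a table cell. \vert and \Vert are
standard LaTeX (\Vert is a synonym for \|). \lvert, \rvert, \lVert and
\rVert assume that the amsmath package is loaded. Whether these suit a
given export setup is an assumption, not something tested in this thread.

```latex
% Candidates for use inside an Org table cell; none contains a literal |.
\(\vert x \vert\)     % absolute value, plain LaTeX
\(\lvert x \rvert\)   % absolute value with proper delimiter spacing (amsmath)
\(\Vert v \Vert\)     % norm, plain LaTeX synonym for \|
\(\lVert v \rVert\)   % norm with proper delimiter spacing (amsmath)
```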
The only sane way would be adding some kind of alternative table
delimiter syntax to aid the situations like in your example.
--
Ihor Radchenko // yantar92,
Org mode maintainer,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
* Re: The less ambiguous math delimiters in tables
From: Rudolf Adamkovič @ 2024-12-25 17:56 UTC
To: Ihor Radchenko; +Cc: emacs-orgmode
Ihor Radchenko <yantar92@posteo.net> writes:
> It is a syntax limitation.
> Org parser is outer-inner - the table row is parsed first.
> So, | are unconditionally used as table delimiters, *before* verbatim
> LaTeX markup is recognized.
How is that not a parser bug?
I thought that, with the noisy \(...\) delimiters,
the parser finally has enough information
to not screw up.
P.S. It is impossible to write, e.g. \|, which is vector length (norm).
Rudy
--
"I would prefer an intelligent hell to a stupid paradise."
--- Blaise Pascal
Rudolf Adamkovič <rudolf@adamkovic.org> [he/him]
http://adamkovic.org
* Re: The less ambiguous math delimiters in tables
From: Ihor Radchenko @ 2024-12-25 18:05 UTC
To: Rudolf Adamkovič; +Cc: emacs-orgmode
Rudolf Adamkovič <rudolf@adamkovic.org> writes:
>> It is a syntax limitation.
>> Org parser is outer-inner - the table row is parsed first.
>> So, | are unconditionally used as table delimiters, *before* verbatim
>> LaTeX markup is recognized.
>
> How is that not a parser bug?
>
> I thought that, with the noisy \(...\) delimiters,
>
> the parser finally has enough information
>
> to not screw up.
Your example shows ambiguous markup that can be
interpreted in multiple ways:
1. <begin cell> \(<end cell><begin cell>x<end cell><begin cell>\) <end cell>
2. <begin cell> <begin latex>|x|<end latex> <end cell>
Org parser chooses one. It has to choose some.
Org parser also chooses a simpler interpretation that does not require
backtracking.
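Interpretation 1 can be reproduced with a toy model. The sketch below is
not Org's parser. It is a hypothetical Python function
(split_row_outer_first is an invented name) that only illustrates the
"split the row on every | first, look at the cell contents later" order
described above.

```python
def split_row_outer_first(row: str) -> list[str]:
    """Split a table row on every '|', ignoring any inline markup."""
    body = row.strip()
    # Drop the leading and trailing row delimiters, if present.
    if body.startswith("|"):
        body = body[1:]
    if body.endswith("|"):
        body = body[:-1]
    # Every remaining '|' ends a cell, even inside \(...\).
    return [cell.strip() for cell in body.split("|")]


print(split_row_outer_first(r"| \(|x|\) |"))
```

Under this rule the row yields three cells, holding \(, x and \), and no
cell ever contains a complete LaTeX fragment.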
> P.S. It is impossible to write, e.g. \|, which is vector length (norm).
Yes, it is impossible. That's why I call it markup limitation.
We need to improve the markup to address this known problematic scenario.
* Re: The less ambiguous math delimiters in tables
From: Rudolf Adamkovič @ 2024-12-25 20:14 UTC
To: Ihor Radchenko; +Cc: emacs-orgmode
Ihor Radchenko <yantar92@posteo.net> writes:
> Your example shows ambiguous markup that can be
> interpreted in multiple ways:
>
> 1. <begin cell> \(<end cell><begin cell>x<end cell><begin cell>\) <end cell>
> 2. <begin cell> <begin latex>|x|<end latex> <end cell>
>
> Org parser chooses one. It has to choose some.
> Org parser also chooses a simpler interpretation that does not require
> backtracking.
But (2) is a *much, much, much* better choice (for the user).
How often does a table row contain
- a sole `\(' in one cell and
- a sole `\)' in another cell?
Virtually never [excluding verbatim/code, which is a different problem.]
How often does LaTeX include `|'? Often:
- absolute value,
- parallel lines,
- vector length,
- set cardinality,
- various norms,
on and on.
The use starts at elementary school level, so
it is like Org reserving the minus sign in LaTeX for itself,
which would be a similar usability disaster.
Rudy
--
"It is no paradox to say that in our most theoretical moods we may be
nearest to our most practical applications." --- Alfred North
Whitehead, 1861-1947
Rudolf Adamkovič <rudolf@adamkovic.org> [he/him]
http://adamkovic.org
* Re: The less ambiguous math delimiters in tables
From: Ihor Radchenko @ 2024-12-26 9:17 UTC
To: Rudolf Adamkovič; +Cc: emacs-orgmode
Rudolf Adamkovič <rudolf@adamkovic.org> writes:
>> 1. <begin cell> \(<end cell><begin cell>x<end cell><begin cell>\) <end cell>
>> 2. <begin cell> <begin latex>|x|<end latex> <end cell>
>>
>> Org parser chooses one. It has to choose some.
>> Org parser also chooses a simpler interpretation that does not require
>> backtracking.
>
> But (2) is a *much, much, much* better choice (for the user).
Maybe, but it is also much more complex in terms of parser.
Backtracking will introduce non-linear complexity to the parser,
degrading the performance significantly. It will also make Org syntax
much, much harder in more complex cases - there will still be
ambiguities when you have more than 2 interpretations: e.g.
| \(|x|\) | \(|x|\) |
this one has 3 possibilities:
1. <cell> \(</cell><cell>x...
2. <cell> <latex>|x|\) | \(|x|</latex> </cell>
3. <cell> <latex>|x|</latex> </cell><cell> <latex>|x|</latex> </cell>
And there will be similar situations with even more possibilities. In
fact, the number of ambiguous alternatives can blow up pretty quickly
when the text is a sufficiently complex combination of literal and
non-literal markup.
In any case, the way Org parser works in this example is one of the most
fundamental design decisions in the Org markup. We cannot change it at
this point without breaking all the historical documents + third-party
parsers. That's why I am talking about providing markup extension to
address the issue rather than altering the existing parser fundamentals.
* Re: The less ambiguous math delimiters in tables
From: Rudolf Adamkovič @ 2024-12-26 13:31 UTC
To: Ihor Radchenko; +Cc: emacs-orgmode
Ihor Radchenko <yantar92@posteo.net> writes:
> Maybe, but it is also much more complex in terms of parser.
> Backtracking will introduce non-linear complexity to the parser,
> degrading the performance significantly.
Is that so? I thought it is all about simple precedence rules. In this
case, once the parser finds the opening \(, it interprets everything as
LaTeX, until it finds the closing \).
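The precedence rule described above can be sketched as a single
left-to-right scan. This is a toy model in Python, not Org code, and
split_row_latex_first is a hypothetical name. It assumes that \( and \)
are the only markup that may hide a |.

```python
def split_row_latex_first(row: str) -> list[str]:
    """Split a table row on '|', except between \\( and \\)."""
    body = row.strip()
    if body.startswith("|"):
        body = body[1:]
    cells, current = [], []
    in_math = False
    i = 0
    while i < len(body):
        two = body[i:i + 2]
        if not in_math and two == r"\(":
            in_math = True
            current.append(two)
            i += 2
        elif in_math and two == r"\)":
            in_math = False
            current.append(two)
            i += 2
        elif body[i] == "|" and not in_math:
            # A '|' outside math closes the current cell.
            cells.append("".join(current).strip())
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    # Keep any text left after the last delimiter.
    tail = "".join(current).strip()
    if tail:
        cells.append(tail)
    return cells


print(split_row_latex_first(r"| \(|x|\) | \(|x|\) |"))
print(split_row_latex_first(r"| \(a | b |"))
```

In this sketch the first row gives two cells, each holding \(|x|\). The
second row shows the cost: an unclosed \( swallows the rest of the row
into one cell, because the scan never goes back to reconsider. Avoiding
that is where backtracking would come in.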
> It will also make Org syntax much, much harder in more complex cases -
> there will still be ambiguities when you have more than 2
> interpretations: e.g.
>
> | \(|x|\) | \(|x|\) |
>
> this one has 3 possibilities:
>
> 1. <cell> \(</cell><cell>x...
> 2. <cell> <latex>|x|\) | \(|x|</latex> </cell>
> 3. <cell> <latex>|x|</latex> </cell><cell> <latex>|x|</latex> </cell>
Again, if \(...\) has a higher precedence than the table |, then this is
not a problem. There is no ambiguity, right?
> We cannot change it at this point without breaking all the historical
> documents + third-party parsers. That's why I am talking about
> providing markup extension to address the issue rather than altering
> the existing parser fundamentals.
It would only break the documents that have one-sided \( or \) in the
cells of the same column, no? And that is ... virtually never?
Rudy
--
"I do not fear death. I had been dead for billions and billions of years
before I was born, and had not suffered the slightest inconvenience from it."
--- Mark Twain, paraphrased
Rudolf Adamkovič <rudolf@adamkovic.org> [he/him]
http://adamkovic.org
* Re: The less ambiguous math delimiters in tables
From: Ihor Radchenko @ 2024-12-26 13:51 UTC
To: Rudolf Adamkovič; +Cc: emacs-orgmode
Rudolf Adamkovič <rudolf@adamkovic.org> writes:
>> Maybe, but it is also much more complex in terms of parser.
>> Backtracking will introduce non-linear complexity to the parser,
>> degrading the performance significantly.
>
> Is that so? I thought it is all about simple precedence rules. In this
> case, once the parser finds the opening \(, it interprets everything as
> LaTeX, until it finds the closing \).
Yes, it is. Please check how Org parser works in
`org-element--parse-elements' and `org-element--parse-objects'.
>> We cannot change it at this point without breaking all the historical
>> documents + third-party parsers. That's why I am talking about
>> providing markup extension to address the issue rather than altering
>> the existing parser fundamentals.
>
> It would only break the documents that have one-sided \( or \) in the
> cells of the same column, no? And that is ... virtually never?
In a theoretical case if we agree to what you are suggesting, it should
not be just for tables. There are similar cases with other markup, like
*foo =* *= bar*
* Re: The less ambiguous math delimiters in tables
From: Rudolf Adamkovič @ 2024-12-27 13:23 UTC
To: Ihor Radchenko; +Cc: emacs-orgmode
Ihor Radchenko <yantar92@posteo.net> writes:
> In a theoretical case if we agree to what you are suggesting, it should
> not be just for tables. There are similar cases with other markup, like
>
> *foo =* *= bar*
Agreed! We could introduce a kind of escaping that means "this MUST be
interpreted as markup" and/or "this MUST NOT be interpreted as markup",
but that could lead to documents that are hard to read for humans. Or,
we could add structured markup that is unambiguous and takes precedence
over all unstructured markup. For example:
emphasis{...}
verbatim{...}
table[...]{...}
src[...]{...}
The last one already exists. :)
That said, as for my original problem, I still think that \(...\) should
take precedence over |. Even if we added structured latex{...} markup,
it should not be necessary in my case, as Org should not severely break
basic LaTeX within tables in the first place.
Rudy
--
"All you have to do is write one true sentence. Write the truest
sentence that you know." --- Ernest Miller Hemingway (1899-1961)
Rudolf Adamkovič <rudolf@adamkovic.org> [he/him]
http://adamkovic.org
* Re: The less ambiguous math delimiters in tables
From: Ihor Radchenko @ 2024-12-27 13:35 UTC
To: Rudolf Adamkovič; +Cc: emacs-orgmode
Rudolf Adamkovič <rudolf@adamkovic.org> writes:
>> In a theoretical case if we agree to what you are suggesting, it should
>> not be just for tables. There are similar cases with other markup, like
>>
>> *foo =* *= bar*
>
> Agreed! We could introduce a kind of escaping that means "this MUST be
> interpreted as markup" and/or "this MUST NOT be interpreted as markup",
> but that could lead to documents that are hard to read for humans. Or,
> we could add structured markup that is unambiguous and takes precedence
> over all unstructured markup. For example:
>
> emphasis{...}
> verbatim{...}
> table[...]{...}
> src[...]{...}
>
> The last one already exists. :)
See https://list.orgmode.org/875xwqj4tl.fsf@localhost/
> That said, as for my original problem, I still think that \(...\) should
> take precedence over |. Even if we added structured latex{...} markup,
> it should not be necessary in my case, as Org should not severely break
> basic LaTeX within tables in the first place.
Sorry, but no.
Basically, what you propose is a rabbit hole that will introduce new
parser bugs and, worse, new systematic problems with syntax.
For context, I proposed similar ideas to Nicolas, the author of
org-element parser, in the past, and he rejected them firmly.
FYI, my approach to solve this problem is different - I want
(eventually) to allow some kind of alternative syntax for tables that
will allow bypassing similar situations. For example, we can allow
multiple || to serve as delimiters:
| this | is | a | normal | table | row |
|| here || we || allow || verbatim "|" || inside || ||
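To make the idea concrete, here is a toy sketch in Python. The || syntax
is only a proposal in this message and does not exist in Org. The
function name split_row and the rule "a row that starts with || is split
on || only" are assumptions made for illustration.

```python
def split_row(row: str) -> list[str]:
    """Split a row on '||' if it starts with '||', otherwise on '|'."""
    body = row.strip()
    # The opening delimiter of the row selects the delimiter for the row.
    delim = "||" if body.startswith("||") else "|"
    if body.startswith(delim):
        body = body[len(delim):]
    if body.endswith(delim):
        body = body[:-len(delim)]
    return [cell.strip() for cell in body.split(delim)]


print(split_row("| this | is | a | normal | table | row |"))
print(split_row('|| here || we || allow || verbatim "|" || inside || ||'))
print(split_row(r"|| \(|x|\) ||"))
```

Under this rule a single | stays inside its cell, so \(|x|\) survives as
one cell. A literal || inside a cell would still split it, so a real
design would need to decide how longer delimiter runs behave.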
* Re: The less ambiguous math delimiters in tables
From: Rudolf Adamkovič @ 2024-12-30 21:02 UTC
To: Ihor Radchenko; +Cc: emacs-orgmode
Ihor Radchenko <yantar92@posteo.net> writes:
> FYI, my approach to solve this problem is different - I want
> (eventually) to allow some kind of alternative syntax for tables that
> will allow bypassing similar situations. For example, we can allow
> multiple || to serve as delimiters:
>
> | this | is | a | normal | table | row |
> || here || we || allow || verbatim "|" || inside || ||
Your idea is not that different from what I proposed, except that it is
less general, in that you special-case one particular problem with
tables. Or, would the "doubling escapes" work for all markup? Still,
that would not solve other table-related problems, such as
| =foo= and =bar= |,
where the problem is = and not |, right?
As for "parser bugs", I find it hard to believe that we could not come
up with some unambiguous sequence of characters, even if ugly, to
instruct the parser about precedence.
Rudy
--
"It is no paradox to say that in our most theoretical moods we may be
nearest to our most practical applications."
--- Alfred North Whitehead, 1861-1947
Rudolf Adamkovič <rudolf@adamkovic.org> [he/him]
http://adamkovic.org
* Re: The less ambiguous math delimiters in tables
From: Ihor Radchenko @ 2024-12-31 17:47 UTC
To: Rudolf Adamkovič; +Cc: emacs-orgmode
Rudolf Adamkovič <rudolf@adamkovic.org> writes:
>> | this | is | a | normal | table | row |
>> || here || we || allow || verbatim "|" || inside || ||
>
> Your idea is not that different from what I proposed, except that it is
> less general, in that you special-case one particular problem with
> tables. Or, would the "doubling escapes" work for all markup?
Yes, all the markup.
> ... Still,
> that would not solve other table-related problems, such as
>
> | =foo= and =bar= |,
>
> where the problem is = and not |, right?
What exactly is the problem in your example?
> As for "parser bugs", I find it hard to believe that we could not come
> up with some unambiguous sequence of characters, even if ugly, to
> instruct the parser about precedence.
We can. For example, via inline special blocks. Or via "repetitive
marker" syntax I suggested above.
And coming up with ugly syntax is worse than finding a good one.
* Re: The less ambiguous math delimiters in tables
From: Leo Butler @ 2024-12-27 17:17 UTC
To: Rudolf Adamkovič; +Cc: Ihor Radchenko, emacs-orgmode@gnu.org
On Wed, Dec 25 2024, Rudolf Adamkovič <rudolf@adamkovic.org> wrote:
> Ihor Radchenko <yantar92@posteo.net> writes:
>
>> It is a syntax limitation.
>> Org parser is outer-inner - the table row is parsed first.
>> So, | are unconditionally used as table delimiters, *before* verbatim
>> LaTeX markup is recognized.
>
> How is that not a parser bug?
>
> I thought that, with the noisy \(...\) delimiters,
>
> the parser finally has enough information
>
> to not screw up.
>
> P.S. It is impossible to write, e.g. \|, which is vector length (norm).
Rudy,
Since you are using Org -> LaTeX, I suggest using something like the
following:
| \([x]\) |
Depending on the flavour of LaTeX you use, you can use a package to
typeset the left/right square brackets as |, e.g.
\newunicodechar{[}{|}
\newunicodechar{]}{|}
Leo
* Re: The less ambiguous math delimiters in tables
From: Max Nikulin @ 2024-12-28 16:39 UTC
To: emacs-orgmode
On 26/12/2024 00:56, Rudolf Adamkovič wrote:
> I thought that, with the noisy \(...\) delimiters,
GitHub users faced ambiguities with $...$ delimiters as well, so do not
be upset by "noisy" syntax. Alternatives may be painful as well.
<https://nschloe.github.io/2022/06/27/math-on-github-follow-up.html#-vs-dollar-bugs>
I have no idea how expensive a parser that handles more cases
with balanced delimiters would be. E.g. pandoc uses another approach:
printf '%s\n' '| \(|x|\) | \(|x|\) |' | pandoc -f org -t latex
\begin{longtable}[]{@{}ll@{}}
\toprule
\endhead
\(|x|\) & \(|x|\) \\
\bottomrule
\end{longtable}
Likely you would be unhappy if some of your documents were exported in a
different way due to change of parsing rules. On the other hand, I do
not have a collection of pitfalls for pandoc.
An extensive test suite is necessary to consider alternatives for
parsing rules.
Current logic may be roughly described as follows. When Org
recognizes start of some element, it tries to find its end, mostly
neglecting opening markers. A fragment is parsed for nested elements
*after* boundaries of the parent element are determined.
The parser ignores latex fragments when it splits table row into cells.
It ignores link markers when it tries to find where emphasis terminates,
etc.
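The "find the end of whatever started first" logic can be sketched with a
toy inline parser. This is a hypothetical Python model, not org-element.
It knows only two markers, ignores Org's real rules about which
characters may surround a marker, and the names MARKERS and
parse_first_started_wins are invented for illustration.

```python
MARKERS = {"*": "bold", "=": "verbatim"}


def parse_first_started_wins(text: str) -> list[tuple[str, str]]:
    """Scan left to right; the first marker that opens claims its span."""
    out: list[tuple[str, str]] = []
    plain: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        # Look for the next identical marker, ignoring anything in between.
        end = text.find(ch, i + 1) if ch in MARKERS else -1
        if end != -1:
            if plain:
                out.append(("text", "".join(plain)))
                plain = []
            out.append((MARKERS[ch], text[i + 1:end]))
            i = end + 1
        else:
            plain.append(ch)
            i += 1
    if plain:
        out.append(("text", "".join(plain)))
    return out


print(parse_first_started_wins("*foo =* *= bar*"))
print(parse_first_started_wins("=a *b= c*"))
```

In this sketch the first input becomes two bold spans, "foo =" and
"= bar", because each * finds its closing * before the = pair is ever
considered. In the second input the = opens first, so the * inside it
stays literal.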
* Re: The less ambiguous math delimiters in tables
From: Rudolf Adamkovič @ 2024-12-30 21:25 UTC
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
> E.g. pandoc uses another approach:
>
> printf '%s\n' '| \(|x|\) | \(|x|\) |' | pandoc -f org -t latex
> \begin{longtable}[]{@{}ll@{}}
> \toprule
> \endhead
> \(|x|\) & \(|x|\) \\
> \bottomrule
> \end{longtable}
>
> Likely you would be unhappy if some of your documents were exported in a
> different way due to change of parsing rules. On the other hand, I do
> not have a collection of pitfalls for pandoc.
That Pandoc output looks correct to me. Is there a gotcha I do not see?
If not, then that is exactly what I would expect from the Org parser.
> An extensive test suite is necessary to consider alternatives for
> parsing rules.
Yes, that much is given. Without an extensive test suite, working on a
parser would be nothing but a waste of time, and the end result would
be, at least for us humans, an infinite stream of bugs.
> Current logic may be roughly described as follows. When Org
> recognizes start of some element, it tries to find its end, mostly
> neglecting opening markers. A fragment is parsed for nested elements
> *after* boundaries of the parent element are determined.
Honestly? That sounds like a wrong approach to parsing. (And if that
is the case, then that could explain why I keep fighting the Org parser
on a daily basis, compared to practically never in every other language
I use.)
Rudy
--
"We shall not cease from exploration
And the end of all our exploring
Will be to arrive where we started
And know the place for the first time"
--- T. S. Eliot, Little Gidding, Four Quartets, 1943
Rudolf Adamkovič <rudolf@adamkovic.org> [he/him]
http://adamkovic.org
* Re: The less ambiguous math delimiters in tables
From: Max Nikulin @ 2025-01-02 17:02 UTC
To: emacs-orgmode
On 31/12/2024 04:25, Rudolf Adamkovič wrote:
> Max Nikulin writes:
>> An extensive test suite is necessary to consider alternatives for
>> parsing rules.
>
> Yes, that much is given. Without an extensive test suite, working on a
> parser would be nothing but a waste of time, and the end result would
> be, at least for us humans, an infinite stream of bugs.
I am not saying that there are no tests, but changing the parser would
almost certainly require adding many more corner cases. The format should
be suitable for non-elisp tools.
Ihor Radchenko. [PATCH] org-test: Create a collaborative test set for
Org buffer parser. Sat, 11 Dec 2021 22:39:07 +0800.
<https://list.orgmode.org/87fsqzi4tw.fsf@localhost>
>> Current logic may be roughly described as follows. When Org
>> recognizes start of some element, it tries to find its end, mostly
>> neglecting opening markers. A fragment is parsed for nested elements
>> *after* boundaries of the parent element are determined.
>
> Honestly? That sounds like a wrong approach to parsing. (And if that
> is the case, then that could explain why I keep fighting the Org parser
> on a daily basis, compared to practically never in every other language
> I use.)
I expect that a pandoc-like parser may reduce the number of pitfalls.
However, having no experience with parsers, I am unlikely to try to
implement it. I would not be surprised if it is unfeasible without some
parser generator, since it should be a bottom-up one. I have no idea how to
collect real-life cases where the current behavior is better. Without them,
it is hard to estimate the degree of disaster caused by a breaking change.
* Re: The less ambiguous math delimiters in tables
From: Ihor Radchenko @ 2024-12-31 5:43 UTC
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
> printf '%s\n' '| \(|x|\) | \(|x|\) |' | pandoc -f org -t latex
> \begin{longtable}[]{@{}ll@{}}
> \toprule
> \endhead
> \(|x|\) & \(|x|\) \\
> \bottomrule
> \end{longtable}
>
> Likely you would be unhappy if some of your documents were exported in a
> different way due to change of parsing rules. On the other hand, I do
> not have a collection of pitfalls for pandoc.
These edge cases come in pairs:
printf '%s\n' '| \(first cell | mid | last\) |' | pandoc -f org -t latex
\begin{longtable}[]{@{}l@{}}
\toprule\noalign{}
\endhead
\bottomrule\noalign{}
\endlastfoot
\(first cell | mid | last\) \\
\end{longtable}
* Re: The less ambiguous math delimiters in tables
From: Rudolf Adamkovič @ 2024-12-31 12:20 UTC
To: Ihor Radchenko, Max Nikulin; +Cc: emacs-orgmode
Ihor Radchenko <yantar92@posteo.net> writes:
> These edge cases come in pairs:
>
> printf '%s\n' '| \(first cell | mid | last\) |' | pandoc -f org -t latex
> \begin{longtable}[]{@{}l@{}}
> \toprule\noalign{}
> \endhead
> \bottomrule\noalign{}
> \endlastfoot
> \(first cell | mid | last\) \\
> \end{longtable}
But how many Org tables exist in the world that have a row with an
un-closed \( in one cell and an un-closed \) in a subsequent cell?
About zero, would be my guess. :) Yet, there exist billions, if not
trillions, of LaTeX fragments containing |, and many of them are located in
tables. We optimize for the former, that is a super-ultra-rare edge
case, if it even exists, instead of the common use case.
Rudy
--
"Programming reliably -- must be an activity of an undeniably
mathematical nature […] You see, mathematics is about thinking, and
doing mathematics is always trying to think as well as possible."
--- Edsger W. Dijkstra, 1981
Rudolf Adamkovič <rudolf@adamkovic.org> [he/him]
http://adamkovic.org
* Re: The less ambiguous math delimiters in tables
From: Ihor Radchenko @ 2024-12-31 13:32 UTC
To: Rudolf Adamkovič; +Cc: Max Nikulin, emacs-orgmode
Rudolf Adamkovič <rudolf@adamkovic.org> writes:
>> ... | \(first cell | mid | last\) |
>
> But how many Org tables exist in the world that have a row with an
> un-closed \( in one cell and an un-closed \) in a subsequent cell?
It is not just about \(...\). Consider other verbatim Org markup like
| $10 | foo | $\alpha$ |
or
| =10+20 | 30 | =foo= |
I strongly encourage you to look into Org parser implementation and
think what is needed to achieve what you want. And then think again,
considering more complex scenarios when we have ambiguity between 3 or
more markup pairs. The idea you propose is very tempting (I have been
tempted myself), but that way lies madness.
* Re: The less ambiguous math delimiters in tables
From: Rudolf Adamkovič @ 2024-12-31 15:57 UTC
To: Ihor Radchenko; +Cc: Max Nikulin, emacs-orgmode
Ihor Radchenko <yantar92@posteo.net> writes:
> It is not just about \(...\).
Of course, and I agree that a more general mechanism is needed to solve
all precedence problems, be it in tables, or (better) in general. But,
as I said before, even if we had such a mechanism, it should not be
necessary to use it for \(...\) in tables, as there is no ambiguity, and
so no "madness", no? Does that make sense? Or, am I mistaken?
Rudy
--
"It is far better to have a question that can't be answered than an
answer that can't be questioned." --- Carl Sagan
Rudolf Adamkovič <rudolf@adamkovic.org> [he/him]
http://adamkovic.org
* Re: The less ambiguous math delimiters in tables
From: Ihor Radchenko @ 2024-12-31 16:46 UTC
To: Rudolf Adamkovič; +Cc: Max Nikulin, emacs-orgmode
Rudolf Adamkovič <rudolf@adamkovic.org> writes:
>> It is not just about \(...\).
>
> Of course, and I agree that a more general mechanism is needed to solve
> all precedence problems, be it in tables, or (better) in general. But,
> as I said before, even if we had such a mechanism, it should not be
> necessary to use it for \(...\) in tables, as there is no ambiguity, and
> so no "madness", no? Does that make sense? Or, am I mistaken?
IMHO, making special cases just for certain types of syntax is worse
than a universal approach.
Imagine that \(...\) takes precedence as you say. Then, users may look
at it and assume that everything else works the same. And get confused
when it does not. Or the opposite - get used to how the rest of the markup
works just to find out that \(...\) is suddenly different.
And that is aside from complicating the parser *a lot*. It is not at all
trivial to implement what you want.
* Re: The less ambiguous math delimiters in tables
From: Max Nikulin @ 2025-01-02 17:20 UTC
To: emacs-orgmode
On 31/12/2024 20:32, Ihor Radchenko wrote:
> | $10 | foo | $\alpha$ |
pandoc result matches my expectation:
\$10 & foo & \(\alpha\) \\
> | =10+20 | 30 | =foo= |
I find the cases where, with the current parser, an emphasis end marker
accidentally appears in the middle of a link significantly more
convincing.
=something in a table looks like a formula imported from a spreadsheet.
In a "native" Org table, this form is unlikely as input for
some code block, so a zero-width space may be used.
I admit that changing the parser is not an easy task and I am aware that
Nicolas is strongly in favor of the current approach.
Nicolas Goaziou. c47b535bb origin/main org-element: Remove dependency on
‘org-emphasis-regexp-components’
Thu, 18 Nov 2021 13:35:19 +0100.
<https://list.orgmode.org/87y25l8wvs.fsf@nicolasgoaziou.fr/>
> I disagree. Priority should be given to the first object being started.
> This is, IMO, the only sane way to handle syntax.
Nicolas Goaziou. org parser and priorities of inline elements.
Sat, 27 Nov 2021 20:02:31 +0100.
<https://list.orgmode.org/87mtlppgl4.fsf@nicolasgoaziou.fr/>
> I don't see any incentive to change the order objects are parsed, once
> you know how Org does it. This is just a red herring. What is useful,
> however, is to fontify them the way Org sees them.
* Re: The less ambiguous math delimiters in tables
2025-01-02 17:20 ` Max Nikulin
@ 2025-01-02 17:32 ` Ihor Radchenko
2025-01-03 15:14 ` Max Nikulin
0 siblings, 1 reply; 25+ messages in thread
From: Ihor Radchenko @ 2025-01-02 17:32 UTC (permalink / raw)
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
> On 31/12/2024 20:32, Ihor Radchenko wrote:
>> | $10 | foo | $\alpha$ |
>
> pandoc result matches my expectation:
>
> \$10 & foo & \(\alpha\) \\
How so?
I just tried
printf '%s\n' '| =10+20 | 30 | =foo= |' | pandoc -f org -t latex
\begin{longtable}[]{@{}l@{}}
\toprule\noalign{}
\endhead
\bottomrule\noalign{}
\endlastfoot
\texttt{10+20\ \textbar{}\ 30\ \textbar{}\ =foo} \\
\end{longtable}
The table cells are not there.
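A toy Python scan shows how a result of this shape can arise when inline
markup is looked for before cell separators. This is not pandoc's reader
and not org-element: the regular expression is a simplified stand-in for
Org's verbatim rule, and the name first_verbatim is made up for
illustration.

```python
import re

# Simplified stand-in for Org's verbatim rule (an assumption, not the real
# org-element or pandoc pattern): "=" preceded by whitespace or line start,
# content that neither starts nor ends with whitespace, and a closing "="
# followed by whitespace or line end.
VERBATIM = re.compile(r"(?<!\S)=(\S(?:.*?\S)?)=(?!\S)")


def first_verbatim(text):
    """Return the contents of the first verbatim span in text, or None."""
    match = VERBATIM.search(text)
    return match.group(1) if match else None


row = "| =10+20 | 30 | =foo= |"

# Inline markup first: the span opened before 10+20 runs to the = after foo.
print(first_verbatim(row))

# Table first: split on | and only then look for markup inside each cell.
cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
print([first_verbatim(cell) for cell in cells])
```

Scanning the whole row yields the single span "10+20 | 30 | =foo", the
same text that ends up inside \texttt above. Scanning after the split
finds verbatim markup only in the third cell.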
--
Ihor Radchenko // yantar92,
Org mode maintainer,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
* Re: The less ambiguous math delimiters in tables
2025-01-02 17:32 ` Ihor Radchenko
@ 2025-01-03 15:14 ` Max Nikulin
0 siblings, 0 replies; 25+ messages in thread
From: Max Nikulin @ 2025-01-03 15:14 UTC (permalink / raw)
To: emacs-orgmode
On 03/01/2025 00:32, Ihor Radchenko wrote:
> Max Nikulin writes:
>
>> On 31/12/2024 20:32, Ihor Radchenko wrote:
>>> | $10 | foo | $\alpha$ |
>>
>> pandoc result matches my expectation:
>>
>> \$10 & foo & \(\alpha\) \\
>
> How so?
> I just tried
Sorry, I failed to express that the above example is handled perfectly
by pandoc, while I find the following one unrealistic:
> printf '%s\n' '| =10+20 | 30 | =foo= |' | pandoc -f org -t latex
I cannot imagine when this construct might emerge in real life. Parsing
it as a single cell is questionable and a kind of pitfall, but it may
still be considered valid.
Perhaps the following one is better
| ~0.001 | ~func3(x)~ |
but there are workarounds like
| \(\sim 0.001\) | ~func3(x)~ |
On the other hand, the current approach already has known pitfalls
(omitting links to mailing list messages)
Descriptive list
- goal :: src_haskell[:exports code]{monoidBSFold :: FilePath -> IO
Counts}
- term :: [[https://example.org][Bread :: Crumbs]]
Verbatim inside emphasis
- A _b =c_ d= e_ f
Links (a workaround is to break emphasis on link borders)
- /lorem [[https://ormode.org/?oops=1][ipsum]] dolor/
- /lorem [[https://orgmode.org/,oops][ipsum]] dolor/
The same with macro
#+macro: Pathname =$1=
/Open the file {{{Pathname(~/.bashrc)}}}/
* Re: The less ambiguous math delimiters in tables
2024-12-24 9:20 The less ambiguous math delimiters in tables Rudolf Adamkovič
2024-12-24 9:25 ` Ihor Radchenko
@ 2024-12-24 9:50 ` Rudolf Adamkovič
1 sibling, 0 replies; 25+ messages in thread
From: Rudolf Adamkovič @ 2024-12-24 9:50 UTC (permalink / raw)
To: emacs-orgmode
Rudolf Adamkovič <rudolf@adamkovic.org> writes:
> P.S. I know about '\vert' in LaTeX.
But, I was not able to figure out how to put
\(\|\vec{u}\|\)
into a table.
[The \| sequence typesets as || but with different spacing than \vert\vert.]
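One option that avoids a literal | altogether, offered as a suggestion
rather than something established in this thread: in LaTeX, \Vert is the
named form of \|, so the two should typeset identically. With amsmath
loaded (an assumption about the export setup), \lVert and \rVert also
mark the bars as opening and closing delimiters.

```latex
% \Vert names the same double-bar symbol as \|, without a literal | character.
\( \Vert \vec{u} \Vert \)

% With amsmath (assumed to be loaded), explicit left/right delimiters:
\( \lVert \vec{u} \rVert \)
```

In a table, this would be written as | \(\Vert\vec{u}\Vert\) |.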
Rudy
--
"Logic is a science of the necessary laws of thought, without which no
employment of the understanding and the reason takes place."
--- Immanuel Kant, 1785
Rudolf Adamkovič <rudolf@adamkovic.org> [he/him]
http://adamkovic.org
Thread overview: 25+ messages
2024-12-24 9:20 The less ambiguous math delimiters in tables Rudolf Adamkovič
2024-12-24 9:25 ` Ihor Radchenko
2024-12-25 17:56 ` Rudolf Adamkovič
2024-12-25 18:05 ` Ihor Radchenko
2024-12-25 20:14 ` Rudolf Adamkovič
2024-12-26 9:17 ` Ihor Radchenko
2024-12-26 13:31 ` Rudolf Adamkovič
2024-12-26 13:51 ` Ihor Radchenko
2024-12-27 13:23 ` Rudolf Adamkovič
2024-12-27 13:35 ` Ihor Radchenko
2024-12-30 21:02 ` Rudolf Adamkovič
2024-12-31 17:47 ` Ihor Radchenko
2024-12-27 17:17 ` Leo Butler
2024-12-28 16:39 ` Max Nikulin
2024-12-30 21:25 ` Rudolf Adamkovič
2025-01-02 17:02 ` Max Nikulin
2024-12-31 5:43 ` Ihor Radchenko
2024-12-31 12:20 ` Rudolf Adamkovič
2024-12-31 13:32 ` Ihor Radchenko
2024-12-31 15:57 ` Rudolf Adamkovič
2024-12-31 16:46 ` Ihor Radchenko
2025-01-02 17:20 ` Max Nikulin
2025-01-02 17:32 ` Ihor Radchenko
2025-01-03 15:14 ` Max Nikulin
2024-12-24 9:50 ` Rudolf Adamkovič