* Inline markup: How does org identify nested code/verbatim?
From: c.buhtz @ 2023-01-29 18:05 UTC (permalink / raw)
To: emacs-orgmode
Hi folks,
this is a question about org(mode) development itself.
It is magic to me how you do this. ;) I would like to learn how it
works because I am writing a kind of Org parser in Python.
Here is a nested code-in-verbatim text.
This =is ~code~ in verbatim= text.
Exporting this to HTML (via org-html-export-as-html) gives:
This <code>is ~code~ in verbatim</code> text.
Awesome! :D
The point is that I am able to identify code or verbatim with a regex
that has three capture groups for the content before, between, and
after the inline markers.
for verbatim: "(^|[ .,;:\-?!({\"'])=(.*?)=([ .,;:\-?!)}\"']|$)"
for code: "(^|[ .,;:\-?!({\"'])~(.*?)~([ .,;:\-?!)}\"']|$)"
But they don't work together. In the example above I need to use the
verbatim regex first to make it right.
If I applied the code regex first it wouldn't work, because it would
find the ~code~ without knowing that it is surrounded by verbatim markers.
I don't know what my users will feed into my software: verbatim in code or
code in verbatim. So I have to figure out which regex to apply first.
How does org solve this problem? I don't need a full working solution
but just an idea.
One approach I have in mind is to run both regexes separately and then
compare the results "somehow":
Verbatim: ['This', ' ', 'is ~code~ in verbatim', ' ', 'text.']
Code : ['This =is', ' ', 'code', ' ', 'in verbatim= text.']
"Somehow"!
Another approach I have in mind is something I would call a nested
regex: constructing a pattern that looks for verbatim with code in
it, and the other way around, of course.
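The first of these two approaches can be made concrete without much machinery, as a minimal sketch: run both regexes and let the match whose opening marker comes first decide which one is the outer markup. The function name `outer_markup` is made up for illustration, and the sketch only looks at the first markup on a line.

```python
import re

# The two patterns from this message, written as raw strings.
VERBATIM = re.compile(r"""(^|[ .,;:\-?!({"'])=(.*?)=([ .,;:\-?!)}"']|$)""")
CODE = re.compile(r"""(^|[ .,;:\-?!({"'])~(.*?)~([ .,;:\-?!)}"']|$)""")


def outer_markup(line):
    """Return (kind, content) of the markup that opens first, or None."""
    candidates = []
    for kind, pattern in (("verbatim", VERBATIM), ("code", CODE)):
        m = pattern.search(line)
        if m:
            # Group 2 is the content, so the opening marker sits one
            # character before it.
            candidates.append((m.start(2) - 1, kind, m.group(2)))
    if not candidates:
        return None
    _, kind, content = min(candidates)  # smallest start index wins
    return kind, content


print(outer_markup("This =is ~code~ in verbatim= text."))
print(outer_markup("This ~is =verbatim= in code~ text."))
```

With the two sample lines, the first call reports verbatim as the outer markup and the second reports code, so no fixed ordering of the two regexes is needed.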
* Re: Inline markup: How does org identify nested code/verbatim?
From: c.buhtz @ 2023-01-29 18:20 UTC (permalink / raw)
To: emacs-orgmode
Please let me add the nested-regex approach. I wouldn't call this a
solution, just an approach. No one understands that regex; it is
nearly unmaintainable.
I hope for a more elegant solution.
This matches if we have code in verbatim:
(?:^|[ .,;:\-?!({\"'])=.*?(?:^|[ .,;:\-?!({\"'])~.*?~(?:[ .,;:\-?!)}\"']|$).*?=(?:[ .,;:\-?!)}\"']|$)
This matches if we have verbatim in code:
(?:^|[ .,;:\-?!({\"'])~.*?(?:^|[ .,;:\-?!({\"'])=.*?=(?:[ .,;:\-?!)}\"']|$).*?~(?:[ .,;:\-?!)}\"']|$)
If one of these matches, I know which of my "usual" regex patterns (the ones with capture groups) I should apply first to extract the content.
Just for testing (maybe on regex101.com) here is the text I used.
This =is ~code~ in verbatim= text.
This =is usual verbatim= text.
This ~is =verbatim= in code~ text.
This ~is usual code~ text.
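For anyone who wants to try this outside regex101.com, here is a small Python harness, offered as a sketch only. It builds the two nested patterns from shared pre/post pieces, with every alternation wrapped in its own non-capturing group so that `^` and `$` apply only to the neighbouring character class. The names `PRE`, `POST` and `classify` are made up for illustration.

```python
import re

# Characters allowed before an opening marker / after a closing marker.
PRE = r"""(?:^|[ .,;:\-?!({"'])"""
POST = r"""(?:[ .,;:\-?!)}"']|$)"""

# Outer =...= containing an inner ~...~, and the other way around.
CODE_IN_VERBATIM = re.compile(f"{PRE}=.*?{PRE}~.*?~{POST}.*?={POST}")
VERBATIM_IN_CODE = re.compile(f"{PRE}~.*?{PRE}=.*?={POST}.*?~{POST}")


def classify(line):
    """Say which nesting (if any) the line contains."""
    if CODE_IN_VERBATIM.search(line):
        return "code in verbatim"
    if VERBATIM_IN_CODE.search(line):
        return "verbatim in code"
    return None


SAMPLES = [
    "This =is ~code~ in verbatim= text.",
    "This =is usual verbatim= text.",
    "This ~is =verbatim= in code~ text.",
    "This ~is usual code~ text.",
]

for sample in SAMPLES:
    print(f"{sample!r}: {classify(sample)}")
```

The harness treats each line separately, as the test text does; several independent markups on one line can still confuse patterns of this shape.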
* Re: Inline markup: How does org identify nested code/verbatim?
From: Max Nikulin @ 2023-01-30 2:29 UTC (permalink / raw)
To: emacs-orgmode
On 30/01/2023 01:20, c.buhtz wrote:
> Please let me add the nested-regex-approach.
You should first look for the start of any markup. The org-element parser
uses a "first wins" approach. Notice the following:
/italics ~code/ verbatim~
is exported as
<p>
<i>italics ~code</i> verbatim~</p>
Notice that the closing italics marker prevents the code snippet from being recognized.
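In Python, the "first wins" behaviour can be imitated with one combined pattern and a backreference to the opening marker, because `re.sub` scans left to right and never revisits text it has already consumed. This is only a sketch and not what org-element does internally: the pre/post character sets are borrowed from the regexes earlier in the thread, the tag table is an assumption for illustration, and HTML escaping is left out.

```python
import re

# Assumed marker-to-tag table, for illustration only.
TAGS = {"/": "i", "*": "b", "=": "code", "~": "code"}

PRE = r"""(^|[ .,;:\-?!({"'])"""      # captured so it can be put back
POST = r"""(?=[ .,;:\-?!)}"']|$)"""   # lookahead, so it is not consumed

# Group 2 is the opening marker; \2 demands the same marker to close.
MARKUP = re.compile(PRE + r"([/*=~])(.+?)\2" + POST)


def to_html(line):
    """Wrap the leftmost markup spans in tags; inner markers stay literal."""
    def repl(m):
        tag = TAGS[m.group(2)]
        return f"{m.group(1)}<{tag}>{m.group(3)}</{tag}>"
    return MARKUP.sub(repl, line)


print(to_html("/italics ~code/ verbatim~"))
print(to_html("This =is ~code~ in verbatim= text."))
```

For the example above this gives `<i>italics ~code</i> verbatim~`: once the italics span has been consumed, the remaining `~` has no partner left.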
* Re: Inline markup: How does org identify nested code/verbatim?
From: Ihor Radchenko @ 2023-01-30 14:56 UTC (permalink / raw)
To: c.buhtz; +Cc: emacs-orgmode
<c.buhtz@posteo.jp> writes:
> The point is that I am able to identify code or verbatim with a regex
> that has three capture groups for the content before, between, and
> after the inline markers.
>
> for verbatim: "(^|[ .,;:\-?!({\"'])=(.*?)=([ .,;:\-?!)}\"']|$)"
> for code: "(^|[ .,;:\-?!({\"'])~(.*?)~([ .,;:\-?!)}\"']|$)"
>
> But they don't work together. In the example above I need to use the
> verbatim regex first to make it right.
See https://orgmode.org/worg/org-syntax.html#Emphasis_Markers
Note that Org is not context-free. Within Org AST elements that can
contain objects, the first match "wins":
1. Org looks at the text and searches for the first match of an object regexp
2. Everything before the match is considered plain text
3. Everything inside the match is considered the matched object and is then
parsed recursively
4. Go to (1)
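The four steps can be sketched in Python roughly as follows. This is not Org's implementation (org-element is written in Emacs Lisp and handles many more object types); it covers emphasis markers only, and the pre/post character sets are borrowed from the regexes quoted above rather than from the syntax document. The names `parse`, `NESTING` and `LITERAL` are invented for the sketch.

```python
import re

# Markers whose contents are parsed again vs. taken literally.
NESTING = {"*": "bold", "/": "italic", "_": "underline", "+": "strike-through"}
LITERAL = {"=": "verbatim", "~": "code"}

OBJECT = re.compile(
    r"""(?:^|(?<=[ .,;:\-?!({"']))"""  # start of text or an allowed char before
    r"""([*/_+=~])(.+?)\1"""           # marker, contents, the same marker again
    r"""(?=[ .,;:\-?!)}"']|$)"""       # allowed char after, or end of text
)


def parse(text):
    """Return a list of plain strings and (kind, children) tuples."""
    nodes = []
    pos = 0
    while True:
        m = OBJECT.search(text, pos)           # step 1: first match wins
        if m is None:
            break
        if m.start() > pos:
            nodes.append(text[pos:m.start()])  # step 2: plain text before it
        marker, body = m.group(1), m.group(2)
        if marker in LITERAL:
            nodes.append((LITERAL[marker], [body]))       # contents kept as-is
        else:
            nodes.append((NESTING[marker], parse(body)))  # step 3: recurse
        pos = m.end()                          # step 4: continue after the match
    if pos < len(text):
        nodes.append(text[pos:])               # trailing plain text
    return nodes


print(parse("This =is ~code~ in verbatim= text."))
print(parse("/italics ~code/ verbatim~"))
```

The first call returns `['This ', ('verbatim', ['is ~code~ in verbatim']), ' text.']`. Because verbatim and code contents are never parsed again, no separate "code in verbatim" pattern is needed, and the order of the two original regexes stops mattering.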
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>