* Inconsistent text markup handling when double-nesting markers
From: Tom Alexander @ 2023-10-09 23:02 UTC
To: emacs-orgmode

I used the following test document:

```
__foo__

**foo**
```

I'd expect the two to behave the same, but the first one parses as:

```
(paragraph
  "_"
  (subscript "foo")
  "__"
  )
```

whereas the second parses as:

```
(paragraph
  (bold
    (bold "foo"
      )
    )
  )
```

This pattern occurs in Worg at [2].

Looking at the description of text markup in the syntax document [1], I don't see any reason the first wouldn't be parsed as an underline:

1. PRE: valid because it is the beginning of a line.
2. MARKER: valid underscore.
3. CONTENTS: valid. The series of objects from the standard set includes both subscript and text markup, so regardless of how we parse the interior, it's valid. CONTENTS also cannot begin or end with whitespace, but there is no whitespace in the CONTENTS.
4. MARKER: valid underscore.
5. POST: only valid if we extend the underline to the 2nd underscore so that it ends at the end of the line. But the 2nd line shows us that having copies of the marker inside the CONTENTS is fine, so I see two possible expected parses of the CONTENTS:
   4a. (underline "foo")
   4b. ((subscript "foo") (plain-text "_"))

I also ran the following test document to further prove that having copies of the marker inside the CONTENTS is fine:

```
*foo*bar*
```

which parses as (bold "foo*bar").

So the only way the top line would fail to parse as an underline is if it matched the first closing underscore as closing the underline, but that would be invalid because underscore is not a valid POST character, and invalid copies of the closing marker are ignored, as proven by both "**foo**" and "*foo*bar*".

[1] https://orgmode.org/worg/org-syntax.html#Emphasis_Markers
[2] https://git.sr.ht/~bzg/worg/tree/ba6cda890f200d428a5d68e819eef15b5306055f/org-contrib/babel/intro.org#L117

--
Tom Alexander
pgp: https://fizz.buzz/pgp.asc
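The POST argument in the message above can be checked mechanically. What follows is a minimal Python sketch, not Org's parser: it assumes the opening marker sits at index 0 of the line, the POST character set is an assumption based on a reading of the syntax document [1], and the name `closing_candidates` is hypothetical.

```python
# Assumed POST set: characters allowed right after a closing marker.
# End of line also counts as valid POST.
POST = set(" \t\n-.,;:!?')}[\"\\")


def closing_candidates(line, marker):
    """List (index, is_valid) for each possible closing marker in line.

    The opening marker is assumed to sit at index 0, so the first possible
    closing position is index 2 (CONTENTS must not be empty).
    """
    result = []
    for j in range(2, len(line)):
        if line[j] != marker:
            continue
        # CONTENTS cannot end with whitespace.
        contents_ok = not line[j - 1].isspace()
        # POST must be an allowed character or the end of the line.
        post_ok = j + 1 == len(line) or line[j + 1] in POST
        result.append((j, contents_ok and post_ok))
    return result


print(closing_candidates("__foo__", "_"))
print(closing_candidates("*foo*bar*", "*"))
```

Under these assumptions, the first closing underscore in `__foo__` (index 5) is rejected because the character after it is another underscore, and only the final one (index 6) is accepted. The same holds for the inner `*` in `*foo*bar*`.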
* Re: Inconsistent text markup handling when double-nesting markers
From: Ihor Radchenko @ 2023-10-10 12:07 UTC
To: Tom Alexander
Cc: emacs-orgmode

"Tom Alexander" <tom@fizz.buzz> writes:

> I used the following test document:
> ```
> __foo__
>
> **foo**
> ```
>
> I'd expect the two to behave the same but the first one parses as:
> ```
> (paragraph
>   "_"
>   (subscript "foo")
>   "__"
>   )
> ```

Fixed, on main.
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=fe23bec60

--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
* Re: Inconsistent text markup handling when double-nesting markers
From: Max Nikulin @ 2023-10-11 2:23 UTC
To: Tom Alexander
Cc: emacs-orgmode

On 10/10/2023 19:07, Ihor Radchenko wrote:
> "Tom Alexander" writes:
>
>> I used the following test document:
>> ```
>> __foo__
>>
>> **foo**
>> ```
>
> Fixed, on main.
> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=fe23bec60

Isn't nested bold for "**bold**" a bug? Generally it is not allowed and

*b1 *b2* b3*

is parsed as bold only for "b1 *b2".
* Re: Inconsistent text markup handling when double-nesting markers
From: Ihor Radchenko @ 2023-10-11 9:15 UTC
To: Max Nikulin
Cc: Tom Alexander, emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

> Isn't nested bold for "**bold**" a bug? Generally it is not allowed and
>
> *b1 *b2* b3*
>
> is parsed as bold only for "b1 *b2".

No, **bold** is not a bug. The parser is recursive, with inner markup not "seeing" its parent. So, we first parse the outer bold and then continue parsing the contents separately, as *bold*.

Were it otherwise, /*bold italic*/ would also not be allowed, as we demand bol, whitespace, -, (, {, ', or " before the markup:
https://orgmode.org/worg/org-syntax.html#Emphasis_Markers

--
Ihor Radchenko // yantar92
* Re: Inconsistent text markup handling when double-nesting markers
From: Max Nikulin @ 2023-10-11 12:16 UTC
To: Ihor Radchenko
Cc: Tom Alexander, emacs-orgmode

On 11/10/2023 16:15, Ihor Radchenko wrote:
> Max Nikulin <manikulin@gmail.com> writes:
>
>> Isn't nested bold for "**bold**" a bug? Generally it is not allowed and
>>
>> *b1 *b2* b3*
>>
>> is parsed as bold only for "b1 *b2".
>
> No, **bold** is not a bug. The parser is recursive, with inner markup
> not "seeing" its parent. So, we first parse the outer bold and then
> continue parsing the contents separately, as *bold*.

I just find the following rather confusing:

(org-export-string-as "**bold**" 'html t)
"<p>\n<b><b>bold</b></b></p>\n"
(org-export-string-as "**inner* outer*" 'html t)
"<p>\n<b>*inner</b> outer*</p>\n"
(org-export-string-as "*outer *inner**" 'html t)
"<p>\n<b>outer <b>inner</b></b></p>\n"
(org-export-string-as "*begin *inner* end*" 'html t)
"<p>\n<b>begin *inner</b> end*</p>\n"

> Were it otherwise, /*bold italic*/ would also not be allowed, as
> we demand bol, whitespace, -, (, {, ', or " before the markup:
> https://orgmode.org/worg/org-syntax.html#Emphasis_Markers

Certainly /*b*/ should work, but nested bold was a surprise for me. I believed that nesting is strictly prohibited. The case of underscores is even more tricky due to the ambiguity between underline and subscript.

P.S. Juan Manuel at some point discovered that pandoc allows nesting for *b1 *b2* b3*.
* Re: Inconsistent text markup handling when double-nesting markers
From: Ihor Radchenko @ 2023-10-11 12:26 UTC
To: Max Nikulin
Cc: Tom Alexander, emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

>> No, **bold** is not a bug. The parser is recursive, with inner markup
>> not "seeing" its parent. So, we first parse the outer bold and then
>> continue parsing the contents separately, as *bold*.
>
> I just find the following rather confusing:
>
> (org-export-string-as "**bold**" 'html t)
> "<p>\n<b><b>bold</b></b></p>\n"
> (org-export-string-as "**inner* outer*" 'html t)
> "<p>\n<b>*inner</b> outer*</p>\n"
> (org-export-string-as "*outer *inner**" 'html t)
> "<p>\n<b>outer <b>inner</b></b></p>\n"
> (org-export-string-as "*begin *inner* end*" 'html t)
> "<p>\n<b>begin *inner</b> end*</p>\n"

Maybe. It is indeed one of the edge cases. But it follows the parser logic, which is (1) the first matching markup is parsed; (2) parsing of recursive contents is isolated.

>> Were it otherwise, /*bold italic*/ would also not be allowed, as
>> we demand bol, whitespace, -, (, {, ', or " before the markup:
>> https://orgmode.org/worg/org-syntax.html#Emphasis_Markers
>
> Certainly /*b*/ should work, but nested bold was a surprise for me. I
> believed that nesting is strictly prohibited. The case of underscores is
> even more tricky due to ambiguity of underline and subscript.

It is not strictly prohibited on purpose. It is just a consequence of how the parser works that nesting <end> constructs is almost impossible, except certain edge cases like **b**.

> P.S. Juan Manuel at some point discovered that pandoc allows nesting
> for *b1 *b2* b3*.

Which is a bug in pandoc.

I think we have discussed this topic a number of times in the past: our markup is a compromise between simplicity for users and simplicity of the parser. This works in many simple cases, but edge cases become problematic.

Workarounds have been discussed as well. For example, creole markup and generic inline markup constructs (your idea with direct AST and the idea with inline special blocks).

--
Ihor Radchenko // yantar92
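The parser logic described in the message above ((1) the first matching markup is parsed; (2) recursive contents are parsed in isolation) can be modelled with a toy renderer. This is only a sketch under stated assumptions, not Org's implementation: it handles just `*` and `/`, ignores subscript, underline and every other object, takes the PRE set from the list quoted in the thread, and assumes the POST set from the syntax document. The names `render`, `find_close` and `TAGS` are hypothetical.

```python
# Toy model only: not Org's real implementation.
PRE = set(" \t\n-({'\"")            # allowed before an opening marker (or start of text)
POST = set(" \t\n-.,;:!?')}[\"\\")  # assumed: allowed after a closing marker (or end of text)
TAGS = {"*": "b", "/": "i"}         # hypothetical marker-to-tag table


def find_close(text, start, marker):
    """Return the index of the first valid closing marker at or after start."""
    for j in range(start, len(text)):
        # The character before a closing marker must not be whitespace.
        if text[j] != marker or text[j - 1].isspace():
            continue
        if j + 1 == len(text) or text[j + 1] in POST:
            return j
    return None


def render(text):
    """Render text to HTML-like output, parsing markup contents in isolation."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        pre_ok = i == 0 or text[i - 1] in PRE
        opens = ch in TAGS and pre_ok and i + 1 < len(text) and not text[i + 1].isspace()
        # Rule (1): take the first valid closing marker, scanning left to right.
        close = find_close(text, i + 2, ch) if opens else None
        if close is None:
            out.append(ch)
            i += 1
            continue
        tag = TAGS[ch]
        # Rule (2): the recursive call sees only the contents, so their first
        # character counts as "beginning of line" and their last as "end of line".
        out.append(f"<{tag}>{render(text[i + 1:close])}</{tag}>")
        i = close + 1
    return "".join(out)


for sample in ("**bold**", "**inner* outer*", "*outer *inner**", "*begin *inner* end*"):
    print(sample, "->", render(sample))
```

With these assumptions the toy model gives the same nesting as the four export results quoted above (minus the paragraph wrapper), which suggests that the surprising cases do follow from the two rules rather than from a special case for doubled markers.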
* Re: Inconsistent text markup handling when double-nesting markers
From: Tom Alexander @ 2023-10-11 14:40 UTC
To: Ihor Radchenko, Max Nikulin
Cc: emacs-orgmode

> Fixed, on main.

Thanks!

--
Tom Alexander
* Re: Inconsistent text markup handling when double-nesting markers
From: Max Nikulin @ 2023-10-12 10:23 UTC
To: Ihor Radchenko
Cc: Tom Alexander, emacs-orgmode

On 11/10/2023 19:26, Ihor Radchenko wrote:
> Max Nikulin writes:
>
>> P.S. Juan Manuel at some point discovered that pandoc allows nesting
>> for *b1 *b2* b3*.
>
> Which is a bug in pandoc.
>
> I think we have discussed this topic a number of times in the past: our
> markup is a compromise between simplicity for users and simplicity of
> the parser. This works in many simple cases, but edge cases become
> problematic.

I have no intention of reopening the discussion about changing the patterns that recognize the beginning and end of objects, or about extending the syntax.

My guess is that pandoc may use a bottom-up rather than a top-down approach. I admit my opinion may be biased by reading complaints about unexpected behavior of the current implementation. Perhaps the pandoc parser has downsides as well as advantages. I would not be surprised if a bottom-up parser is unbearable to write without some tool that generates code from the provided rules.

By the way, is it explicitly specified that, within an element, a top-down strategy must be used to recognize objects?
* Re: Inconsistent text markup handling when double-nesting markers
From: Ihor Radchenko @ 2023-10-12 12:04 UTC
To: Max Nikulin
Cc: Tom Alexander, emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

> By the way, is it explicitly specified that within an element namely
> top-down strategy must be used to recognize objects?

https://orgmode.org/worg/org-syntax.html has it, I think.

--
Ihor Radchenko // yantar92