emacs-orgmode@gnu.org archives
From: Ihor Radchenko <yantar92@gmail.com>
To: Max Nikulin <manikulin@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: [PATCH v2] Add new entity \-- serving as markup separator/escape symbol
Date: Fri, 29 Jul 2022 17:06:21 +0800
Message-ID: <878rocmgoi.fsf@localhost>
In-Reply-To: <tbvhuk$pt3$1@ciao.gmane.io>

Max Nikulin <manikulin@gmail.com> writes:

>>> The good point in your patch is that \- still works as a shy hyphen
>>> (that, by the way, may be used in some cases instead of a zero width
>>> space: *intra*\-word). On the other hand, I have managed to find a case
>>> where your approach is not ideal:
>>>
>>> *\--scratch\--*
>>>
>>> <p>
>>> <b>&#x00ad;-scratch</b></p>
>> 
>> Well. I think that it is impossible to use the same escape construct to
>> both force emphasis and escape it.
>
> Let's articulate the problem as follows: when some characters ("*", "/",
> etc.), besides being used literally, are overloaded with 2 additional
> roles, namely starting an emphasis group and terminating an emphasis
> group, then in addition to lightweight markup heuristics it is
> necessary to provide a way to disambiguate which of the 3 roles is
> associated with a particular character.
>
> "Activate" and "deactivate" characters or entities for emphasis markers
> are alternative, and perhaps less clear, terms that have been used
> before.
>
> The advantage of zero width space is that "[:space:]" is part of
> PREMATCH and POSTMATCH (outer) regexps in
> `org-emphasis-regexp-components' and "[:space:]" is forbidden at the
> inner borders of an emphasized span of text. The latter is mostly
> reasonable; however, I am unsure whether a bold space has the same
> width as a regular one, and a space in a fixed-width font is certainly
> distinct.
>
> The problem with the "\--" entity is that it is not handled properly at
> the start of an emphasis region. It neither disables emphasis nor is
> parsed as a complete entity; instead it becomes a combination of the
> "\-" shy hyphen and a literal "-".
>
> Unsure if it can be solved consistently. Possible ways:
> - In addition to the space-like (with respect to the current regexp)
> entity, add another one that acts as part of a word but, like "\--", is
> stripped from the output. Likely it should be accompanied by more
> changes in the parser and regexps.
> - Provide some new explicit syntax for a literal character, the start
> of an emphasis group, and the end of an emphasis group.

\-- was not parsed in your example because entities cannot be directly
followed by a letter (see 12.4 Special Symbols).

You need

*\--{}scratch\--*
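
A rough Python model can make this rule concrete. It is only a sketch
under assumptions: the regexp is a simplified stand-in for the one in
Org's parser, `find_entities' is a made-up helper name, and "--" is the
entity name proposed in this patch rather than an existing one.

```python
import re

# Simplified stand-in for Org's entity regexp (an assumption, not the
# real one): backslash, entity name, then "{}", a non-letter, or end of
# line.  "--" is the entity name proposed in this patch.
ENTITY = re.compile(r"\\(--|[A-Za-z]+)(?:\{\}|(?=[^A-Za-z])|$)")


def find_entities(text):
    """Return the names of all entities recognized in TEXT."""
    return [m.group(1) for m in ENTITY.finditer(text)]


# The leading \-- is directly followed by "s", so only the trailing one
# is recognized.
print(find_entities(r"*\--scratch\--*"))
# With the "{}" terminator both occurrences are recognized.
print(find_entities(r"*\--{}scratch\--*"))
```

In this model the leading \-- in the first string is not an entity at
all, which is consistent with it falling back to "\-" plus a literal "-".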

Concerning the 3 listed roles of the *_/+ markup, I propose to simplify
the problem a bit and not try to make \-- serve as a proper escape symbol.
Instead, we can document the already existing quoting entities:

 ("slash" "/" nil "/" "/" "/" "/")
 ("plus" "+" nil "+" "+" "+" "+")
 ("under" "\\_" nil "_" "_" "_" "_")
 ("equal" "=" nil "=" "=" "=" "=")
 ("star" "\\star" t "*" "*" "*" "⋆")

Then your example would be better written as

\star{}scratch\star

\-- may work better between markup, not inside it.
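
As an illustration of how these quoting entities would behave in a
plain-text export, here is a minimal Python sketch. The replacement
table is copied from the ASCII column of the entries listed above; the
regexp and the `expand_entities' helper are simplifications invented for
this example and are not Org's actual exporter code.

```python
import re

# ASCII replacements taken from the entity entries listed above.
ASCII_ENTITIES = {"slash": "/", "plus": "+", "under": "_",
                  "equal": "=", "star": "*"}

# Simplified entity matcher (an assumption): the name must be followed
# by "{}", a non-letter, or end of line.  A trailing "{}" is consumed.
ENTITY = re.compile(r"\\([A-Za-z]+)(?:\{\}|(?=[^A-Za-z])|$)")


def expand_entities(text):
    """Replace known quoting entities in TEXT; leave unknown ones as is."""
    def replace(match):
        return ASCII_ENTITIES.get(match.group(1), match.group(0))
    return ENTITY.sub(replace, text)


print(expand_entities(r"\star{}scratch\star"))
```

Because the stars come from entities rather than from literal "*"
characters in the source, they are never candidates for emphasis markers.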

> Concerning the zero width space workaround, I may be wrong, but Nicolas
> might consider using U+200B zero width space as the escape character
> for itself: a single one is filtered out during export, a double zero
> width space becomes a single character. (I do not like this kind of
> "white space programming language".)

This is too complex, IMHO.
If desired, we can again go the entity road and introduce a \zws entity.
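
For reference, the single/double rule quoted above could be sketched as
follows. This is purely illustrative Python: `strip_zws_escapes' is a
hypothetical name, and nothing like it exists in Org.

```python
import re

ZWS = "\u200b"  # U+200B ZERO WIDTH SPACE


def strip_zws_escapes(text):
    """Drop a lone zero-width space; collapse a doubled one into one."""
    # The optional second ZWS is captured and kept, the first is dropped.
    return re.sub(ZWS + "(" + ZWS + "?)", r"\1", text)


single = "*bold*" + ZWS + "text"
double = "a" + ZWS + ZWS + "b"
print(len(strip_zws_escapes(single)), len(strip_zws_escapes(double)))
```

Every export path would have to apply such a filter, and users would
have to count invisible characters, which is where the complexity comes
from.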

Note that we already have

 ("nbsp" "~" nil "&nbsp;" " " " " " ")
 ("ensp" "\\hspace*{.5em}" nil "&ensp;" " " " " " ")
 ("emsp" "\\hspace*{1em}" nil "&emsp;" " " " " " ")
 ("thinsp" "\\hspace*{.2em}" nil "&thinsp;" " " " " " ")

Generally, it is a good idea to advertise entities in the manual.
A zero-width space is not only limited; in some places it cannot be used
at all, e.g. in tables when you want to quote "|". There the only
solution is the \vert or \vbar entity.
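
A small Python sketch of the table case, under stated assumptions:
`split_row' and `expand_vert' are made-up helpers that only mimic the
idea that cells are split on every literal "|" before entities are
expanded, and the "|" replacement for \vert is assumed for plain-text
output.

```python
import re


def split_row(row):
    """Split a table row into cells on every literal "|"."""
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


def expand_vert(cell):
    """Replace \\vert (optionally terminated by "{}") with "|"."""
    return re.sub(r"\\vert(?:\{\}|(?![A-Za-z]))", "|", cell)


# A zero-width space before "|" does not stop the split: three cells.
print(len(split_row("| a \u200b| b | c |")))
# The entity contains no literal "|", so the row keeps two cells.
print([expand_vert(c) for c in split_row(r"| a \vert{} b | c |")])
```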

> Another question is whether U+2060 word 
> joiner (or some other character) should be added either as alternative 
> to zero width space or to allow =    verbatim    = fixed width text 
> surrounded by fixed width spaces.

This particular example is tricky.
If we put an escape symbol _inside_ the verbatim, it is never possible to
know whether the user intends to use that symbol literally or not.
But the requirement of a non-space character after the opening and
before the closing markup char is hard-coded, and changing it is fragile.

Instead of using some kind of "escape" symbol here, I suggest turning to
the idea about inline special blocks. We can introduce a more verbose
markup that will allow spaces inside at the beginning/end of the
contents.

https://orgmode.org/list/87a6b8pbhg.fsf@posteo.net
Juan Manuel Macías [ML:Org mode] (2022) About 'inline special blocks'

Instead of using the tricky *bold text*, we may allow _*{bold text}*_ or
something similar, with _name{...}name_ being an inline special block.

Best,
Ihor

