emacs-orgmode@gnu.org archives
* How to force markup without spaces
@ 2012-11-19  5:32 cinsky
  2012-11-19  7:11 ` Vladimir Lomov
  0 siblings, 1 reply; 30+ messages in thread
From: cinsky @ 2012-11-19  5:32 UTC (permalink / raw)
  To: emacs-orgmode


Hi,

AFAIK, if the markup syntax (=code=, *bold*, ..) is directly followed
by non-whitespace characters, then it will not be marked-up:

   =hello=there
   /not/italic

This may be the right decision for English text, but in some languages
a postposition (a grammatical particle) is attached directly to the
preceding noun without a space, so this causes trouble.  (The following
text contains Korean characters in UTF-8; you may need an additional
Korean font to read it properly.)

   =printf=는
   =bold=로
   =철수=는

I'm sure that some other languages have the same problem
(e.g. Japanese or Chinese).

Is there any way to force markup in this situation?

If this pattern cannot be implemented easily, how about introducing a
new escape character, so that no whitespace has to be inserted between
the marked-up text and the text that follows it?  For example:

  =printf=\is      => rendered in HTML: <code>printf</code>is
  *bold*\asdf      => rendered in HTML: <b>bold</b>asdf
  /철수/\는        => rendered in HTML: <i>철수</i>는

I can't say the above solution is well-designed, but I'm sure that
you'll get the point.

Thanks.

-- 
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
Korean Ver: http://www.cinsk.org/cfaqs/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: How to force markup without spaces
  2012-11-19  5:32 How to force markup without spaces cinsky
@ 2012-11-19  7:11 ` Vladimir Lomov
  2012-11-19 10:06   ` Seong-Kook Shin
  0 siblings, 1 reply; 30+ messages in thread
From: Vladimir Lomov @ 2012-11-19  7:11 UTC (permalink / raw)
  To: cinsky; +Cc: emacs-orgmode

Hello,
** cinsky@gmail.com [2012-11-19 14:32:21 +0900]:

> Hi,

> AFAIK, if the markup syntax (=code=, *bold*, ..) is directly followed
> by non-whitespace characters, then it will not be marked-up:

>    =hello=there
>    /not/italic

> This may be the right decision for English text, but in some languages
> a postposition (a grammatical particle) is attached directly to the
> preceding noun without a space, so this causes trouble.  (The following
> text contains Korean characters in UTF-8; you may need an additional
> Korean font to read it properly.)

>    =printf=는
>    =bold=로
>    =철수=는

> I'm sure that some other languages have the same problem
> (e.g. Japanese or Chinese).

> Is there any way to force markup in this situation?

> If this pattern cannot be implemented easily, how about introducing a
> new escape character, so that no whitespace has to be inserted between
> the marked-up text and the text that follows it?  For example:

>   =printf=\is      => rendered in HTML: <code>printf</code>is
>   *bold*\asdf      => rendered in HTML: <b>bold</b>asdf
>   /철수/\는        => rendered in HTML: <i>철수</i>는

> I can't say the above solution is well-designed, but I'm sure that
> you'll get the point.

Maybe this will help you:
http://article.gmane.org/gmane.emacs.orgmode/46263/match=zero+width+space
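
As the search terms in the link and the replies in this thread indicate,
the workaround is to put a zero-width space (U+200B) between the closing
marker and the text that follows it. A minimal sketch of a helper for
typing that character, assuming a hypothetical command name
`my/insert-zero-width-space' (it is not part of Org):

```elisp
;; Hypothetical helper, not part of Org: insert U+200B ZERO WIDTH SPACE
;; at point, so that markup such as =printf= can be followed directly
;; by text without a visible gap.
(defun my/insert-zero-width-space ()
  "Insert a U+200B ZERO WIDTH SPACE character at point."
  (interactive)
  (insert-char #x200B))
```

The same character can also be typed without any helper via
C-x 8 RET 200b RET.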

-- 
"Had he and I but met
By some old ancient inn,		But ranged as infantry,
We should have sat us down to wet	And staring face to face,
Right many a nipperkin!			I shot at him as he at me,
					And killed him in his place.
I shot him dead because --
Because he was my foe,			He thought he'd 'list, perhaps,
Just so: my foe of course he was;	Off-hand-like -- just as I --
That's clear enough; although		Was out of work -- had sold his traps
					No other reason why.
Yes; quaint and curious war is!
You shoot a fellow down
You'd treat, if met where any bar is
Or help to half-a-crown."
		-- Thomas Hardy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: How to force markup without spaces
  2012-11-19  7:11 ` Vladimir Lomov
@ 2012-11-19 10:06   ` Seong-Kook Shin
  2012-11-19 14:40     ` Suvayu Ali
  0 siblings, 1 reply; 30+ messages in thread
From: Seong-Kook Shin @ 2012-11-19 10:06 UTC (permalink / raw)
  To: Vladimir Lomov; +Cc: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 2796 bytes --]

Yes, thanks for the solution.

By the way, I prefer the "word joiner" character (U+2060) to the "zero width
space" character (U+200B), because a postposition should not be separated
from its noun by a line break.

Anyway, is there any plan to implement this feature in another way?
The solution you pointed to ties the Org document to Unicode,
so it can't be used with other character encodings.

Thanks.



-- 
C FAQs: http://c-faq.com/
Korean: http://www.cinsk.org/cfaqs/


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: How to force markup without spaces
  2012-11-19 10:06   ` Seong-Kook Shin
@ 2012-11-19 14:40     ` Suvayu Ali
  2012-12-13 21:26       ` Bastien
  0 siblings, 1 reply; 30+ messages in thread
From: Suvayu Ali @ 2012-11-19 14:40 UTC (permalink / raw)
  To: emacs-orgmode

On Mon, Nov 19, 2012 at 07:06:10PM +0900, Seong-Kook Shin wrote:
> 
> Anyway, is there any plan to implement this feature in another way?
> The solution you pointed to ties the Org document to Unicode,
> so it can't be used with other character encodings.
> 

AFAIK, this will not be included;

  <http://thread.gmane.org/gmane.emacs.orgmode/59881/focus=59971>


-- 
Suvayu

Open source is the future. It sets us free.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: How to force markup without spaces
  2012-11-19 14:40     ` Suvayu Ali
@ 2012-12-13 21:26       ` Bastien
  2022-07-25 17:50         ` K
  2022-07-25 18:27         ` K
  0 siblings, 2 replies; 30+ messages in thread
From: Bastien @ 2012-12-13 21:26 UTC (permalink / raw)
  To: Suvayu Ali; +Cc: emacs-orgmode

Hi,

Suvayu Ali <fatkasuvayu+linux@gmail.com> writes:

>> Anyway, is there any plan to implement this feature in another way?
>> The solution you pointed to ties the Org document to Unicode,
>> so it can't be used with other character encodings.
>> 
>
> AFAIK, this will not be included;
>
>   <http://thread.gmane.org/gmane.emacs.orgmode/59881/focus=59971>

More precisely this can be included when we decide to drop support 
of Emacs 22.

Does anyone know what is the current backward compatibility state
of major native Emacs packages (Gnus/ERC/etc) wrt Emacs 22?

Thanks,

-- 
 Bastien

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: How to force markup without spaces
  2012-12-13 21:26       ` Bastien
@ 2022-07-25 17:50         ` K
  2022-07-25 18:27         ` K
  1 sibling, 0 replies; 30+ messages in thread
From: K @ 2022-07-25 17:50 UTC (permalink / raw)
  To: Bastien, Suvayu Ali; +Cc: emacs-orgmode


Hello everyone, I am a Chinese user and also came across this problem.

Bastin once wrote this almost a decade ago:

> More precisely this can be included when we decide to drop support 
> of Emacs 22.
> 
> Does anyone know what is the current backward compatibility state
> of major native Emacs packages (Gnus/ERC/etc) wrt Emacs 22?
> 
> Thanks,
> 

Since Emacs 28.1 has been released, could this problem be solved now?

Although we have the zero-width space workaround, in some fonts the
character does not actually have zero width. So it would be nice to solve
this problem.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: How to force markup without spaces
  2022-07-25 18:27         ` K
@ 2022-07-25 19:02           ` K
  2022-07-26  1:26             ` Ihor Radchenko
  0 siblings, 1 reply; 30+ messages in thread
From: K @ 2022-07-25 19:02 UTC (permalink / raw)
  To: k_foreign; +Cc: bzg, emacs-orgmode

> Bastin once wrote this almost a decade ago:

Sorry for the misspelling, the name is Bastien, not Bastin.

The thread and post I am mentioning is at
https://list.orgmode.org/orgmode/87bodxy77m.fsf@bzg.ath.cx/




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: How to force markup without spaces
  2022-07-25 19:02           ` K
@ 2022-07-26  1:26             ` Ihor Radchenko
  2022-07-26  2:23               ` Max Nikulin
  0 siblings, 1 reply; 30+ messages in thread
From: Ihor Radchenko @ 2022-07-26  1:26 UTC (permalink / raw)
  To: K; +Cc: bzg, emacs-orgmode

K <k_foreign@outlook.com> writes:

> The thread and post I am mentioning is at
> https://list.orgmode.org/orgmode/87bodxy77m.fsf@bzg.ath.cx/

That thread references yet another thread at
http://thread.gmane.org/gmane.emacs.orgmode/59881/focus=59971
However, gmane links are no longer working.
Do you happen to know which thread id or subject the link is referring
to in the mailing list archive?

To add regarding the markup without spaces, we have discussed something
called "inline special blocks" in
https://orgmode.org/list/87a6b8pbhg.fsf@posteo.net
Such blocks can be used as an alternative markup.

Another idea we have discussed is using something similar to Markdown
format: **bold**, //italics//, __underline__, etc. It is less verbose
compared to the special blocks, which should be valuable for
Japanese/Chinese/other languages with no spaces between words.

Best,
Ihor


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: How to force markup without spaces
  2022-07-26  1:26             ` Ihor Radchenko
@ 2022-07-26  2:23               ` Max Nikulin
  2022-07-26  4:26                 ` K K
  0 siblings, 1 reply; 30+ messages in thread
From: Max Nikulin @ 2022-07-26  2:23 UTC (permalink / raw)
  To: emacs-orgmode

On 26/07/2022 08:26, Ihor Radchenko wrote:
> K writes:
> 
>> The thread and post I am mentioning is at
>> https://list.orgmode.org/orgmode/87bodxy77m.fsf@bzg.ath.cx/
> 
> That thread references yet another thread at
> http://thread.gmane.org/gmane.emacs.orgmode/59881/focus=59971
> However, gmane links are no longer working.
> Do you happen to know which thread id or subject the link is referring
> to in the mailing list archive?

https://list.orgmode.org/orgmode/9C09CF9B-5B8F-4435-98D0-7E0B32BA5ACA@nf.mpg.de/T/
Stefan Vollmar. suggestion for org-emphasis-regexp-components: *U*nited 
*N*ations. 2012-09-05  8:05 UTC

However, the suggestion there was precisely to use U+200B ZERO WIDTH SPACE, 
and it is actually implemented, since `org-emphasis-regexp-components' 
currently contains [:space:].

The U+2060 word joiner character (from this thread) is not a space, so 
currently it cannot be used in this role. A recent mention of this character:
Tom Gillespie. On zero width spaces and Org syntax. Fri, 3 Dec 2021 
20:04:28 -0800. 
https://CA+G3_PM4cxHa8bU+3QG541UiOauLNAQFZQu-+UKczx3itOeTHg@mail.gmail.com
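
The difference between the two characters can be checked directly. The 
sketch below assumes that the relevant test is simply whether a character 
matches the [[:space:]] regexp class under the current buffer's syntax 
table; `my/space-class-p' is a hypothetical helper name, and the exact 
contents of `org-emphasis-regexp-components' may differ between Org versions.

```elisp
;; Hypothetical helper: report whether CHAR matches the [[:space:]]
;; regexp class, which depends on the current buffer's syntax table.
(defun my/space-class-p (char)
  "Return t if CHAR matches the [[:space:]] regexp class, else nil."
  (and (string-match-p "[[:space:]]" (char-to-string char)) t))

;; Compare the two candidates discussed in this thread.
(list (my/space-class-p #x200B)   ; ZERO WIDTH SPACE
      (my/space-class-p #x2060))  ; WORD JOINER
```

Evaluating it with M-: inside an Org buffer uses the same syntax table 
that emphasis matching sees there.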

K, could you please clarify what your particular use case is? Some 
other workarounds, e.g. custom links, have been discussed over the last 
couple of years.

P.S. list.orgmode.org supports search by gmane article number:
     https://list.orgmode.org/orgmode/?q=gmane%3A59881
see
Kyle Meyer. yhetil.org/orgmode now supports searching by Gmane ID. Thu, 
23 Apr 2020 04:43:20 +0000 
https://list.orgmode.org/87k126revr.fsf@kyleam.com

Another recipe to fetch the article (from the same message) is
     w3m -m nntp://news.gmane.io/gmane.emacs.orgmode/59971




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: How to force markup without spaces
  2022-07-26  2:23               ` Max Nikulin
@ 2022-07-26  4:26                 ` K K
  2022-07-26  6:30                   ` Max Nikulin
  2022-07-26 12:59                   ` [PATCH] org-export: Remove zero-width space escapes during export Ihor Radchenko
  0 siblings, 2 replies; 30+ messages in thread
From: K K @ 2022-07-26  4:26 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode, Ihor Radchenko

[-- Attachment #1: Type: text/plain, Size: 1281 bytes --]

On 2022-07-26 Tue. 09:23 +0700,Max Nikulin wrote:

> However the suggestion was namely to use U+200B ZERO WIDTH SPACE and
> it
> is actually implemented since `org-emphasis-regexp-components'
> currently
> contains [:space:].
> ...
> K, could you, please, clarify what is your particular use case?

My bad, I misunderstood the "feature" mentioned in the old post.

My use case is to emphasize Chinese characters without spaces being inserted, not even zero-width spaces. For example "中文*测*试" should be enough to emphasize "测".

I am using zero-width spaces right now, and it works fine in org-mode buffers, but when exported to PDF via LaTeX, the U+200B ZERO WIDTH SPACE character is not zero-width in certain fonts. So I would prefer not to use that character.

On Tue, 26 Jul 2022 09:26:42 +0800, Ihor Radchenko wrote:
> Another idea we have discussed is using something similar to Markdown
> format: **bold**, //italics//, __underline__, etc. It is less verbose
> compared to the special blocks, which should be valuable for
> Japanese/Chinese/other languages with no spaces between words.

By the way, it seems that my use case has already been implemented by markdown-mode. In a markdown-mode buffer "中文**测**试" will certainly make "测" bold.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: How to force markup without spaces
  2022-07-26  4:26                 ` K K
@ 2022-07-26  6:30                   ` Max Nikulin
  2022-07-26 12:59                   ` [PATCH] org-export: Remove zero-width space escapes during export Ihor Radchenko
  1 sibling, 0 replies; 30+ messages in thread
From: Max Nikulin @ 2022-07-26  6:30 UTC (permalink / raw)
  To: emacs-orgmode

On 26/07/2022 11:26, K K wrote:
> On 2022-07-26 Tue. 09:23 +0700,Max Nikulin wrote:
> 
>> > However the suggestion was namely to use U+200B ZERO WIDTH SPACE and
>> > it
>> > is actually implemented since `org-emphasis-regexp-components'
>> > currently
>> > contains [:space:].
>> > ...
>> > K, could you, please, clarify what is your particular use case?
> 
> My bad, I misunderstood the "feature" mentioned in the old post.
> 
> My use case is to emphasize chinese characters without spaces being 
> inserted, even those zero-width spaces. For example "中文*测*试" should 
> be enough to emphasize "测".
> 
> I am using zero-width spaces right now, and it works fine in org-mode 
> buffers, but if exported to latex-pdf files, the U+200B ZERO WIDTH SPACE 
> character will not be zero-width for certain fonts. So I hope not to use 
> that character.

I have not tested it, but I expect you can use
- an export filter that removes zero-width spaces at the last export stage
  (I assume that your documents do not contain them apart from the markup
  workaround)
- #+latex_header: \DeclareUnicodeCharacter{200B}{}
- a custom link

    #+begin_src elisp :results none :exports both
      (org-link-set-parameters
       "sep"
       :export (lambda (path desc backend)
                 ;; Org-to-Org export keeps the link as a link; every
                 ;; other backend emits only the description.
                 (if (org-export-derived-backend-p backend 'org)
                     (org-link-make-string (concat "sep:" path) desc)
                   (or desc ""))))
    #+end_src
    "中文[[sep:][*测*]]试"

   https://list.orgmode.org/ssp8e7$ah2$1@ciao.gmane.io/
   Max Nikulin Re: [RFC] Creole-style / Support for 
**emphasis**__within__**a word** Tue, 25 Jan 2022 23:27:50 +0700
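
The first item in the list above (an export filter) could look roughly 
like this. It is only a sketch under the assumption already stated there, 
namely that U+200B occurs in the document only as a markup workaround. 
`my/org-export-strip-zwsp' is a hypothetical name; 
`org-export-filter-final-output-functions' is the standard filter list 
that ox.el applies to the final exported string.

```elisp
(require 'ox)

(defun my/org-export-strip-zwsp (text _backend _info)
  "Return TEXT with every U+200B ZERO WIDTH SPACE removed.
Export filters receive the exported TEXT, the BACKEND symbol and the
INFO plist; only TEXT is used here."
  (replace-regexp-in-string "\u200B" "" text))

;; Run the filter on the complete output of every export.
(add-to-list 'org-export-filter-final-output-functions
             #'my/org-export-strip-zwsp)
```

Because it removes every U+200B, such a filter also drops zero-width 
spaces that were inserted on purpose, e.g. as line break opportunities.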

In another thread we are discussing the advantages and problems of 
switching from pdfLaTeX to LuaLaTeX for non-Latin scripts. The latter is a 
Unicode engine. I am curious about your opinion from the standpoint of the 
Chinese language, namely the amount of customization required in both 
cases. I think it is better to either start a dedicated thread or find the 
part of the discussion related to font and babel (LaTeX package) setup.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] org-export: Remove zero-width space escapes during export
  2022-07-26  4:26                 ` K K
  2022-07-26  6:30                   ` Max Nikulin
@ 2022-07-26 12:59                   ` Ihor Radchenko
  2022-07-26 14:25                     ` Timothy
                                       ` (3 more replies)
  1 sibling, 4 replies; 30+ messages in thread
From: Ihor Radchenko @ 2022-07-26 12:59 UTC (permalink / raw)
  To: K K; +Cc: Max Nikulin, emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 1597 bytes --]

K K <k_foreign@outlook.com> writes:

> My use case is to emphasize chinese characters without spaces being inserted, even those zero-width spaces. For example "中文*测*试" should be enough to emphasize "测".
>
> I am using zero-width spaces right now, and it works fine in org-mode buffers, but if exported to latex-pdf files, the U+200B ZERO WIDTH SPACE character will not be zero-width for certain fonts. So I hope not to use that character.

This is a bug. While escape symbols do not affect export in most common
scenarios, your report is adding yet another case when zero-width space
is actually altering the export result.

I am attaching a tentative patch that will make Org export remove
zero-width spaces when those spaces actually separate the object
boundaries.

Any objections?

> On Tue, 26 Jul 2022 09:26:42 +0800, Ihor Radchenko wrote:
>> Another idea we have discussed is using something similar to Markdown
>> format: **bold**, //italics//, __underline__, etc. It is less verbose
>> compared to the special blocks, which should be valuable for
>> Japanese/Chinese/other languages with no spaces between words.
>
> By the way, it seems that my use case has already been implemented by markdown-mode. In a markdown-mode buffer "中文**测**试" will certainly make "测" bold.

The idea was indeed inspired by Markdown.
However, Markdown is different - **bold** is the official syntax to
indicate bold markup. Though things are more complex in reality:
https://www.markdownguide.org/basic-syntax/ Markdown has its own edge
cases.

Best,
Ihor


[-- Attachment #2: 0001-org-export-Remove-zero-width-space-escapes-during-ex.patch --]
[-- Type: text/x-patch, Size: 3213 bytes --]

From 5764b41b858bff3d56dcb24741cf550a7e245d36 Mon Sep 17 00:00:00 2001
Message-Id: <5764b41b858bff3d56dcb24741cf550a7e245d36.1658840330.git.yantar92@gmail.com>
From: Ihor Radchenko <yantar92@gmail.com>
Date: Tue, 26 Jul 2022 20:50:47 +0800
Subject: [PATCH] org-export: Remove zero-width space escapes during export

* lisp/ox.el (org-export--remove-escaped): New function removing
zero-width spaces when they separate object boundaries.
(org-export-as): Call `org-export--remove-escaped'.
* testing/lisp/test-ox.el (test-org-export/remove-escaped): New test.
---
 lisp/ox.el              | 22 ++++++++++++++++++++++
 testing/lisp/test-ox.el | 13 +++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/lisp/ox.el b/lisp/ox.el
index 40ad7ae4e..de034fd22 100644
--- a/lisp/ox.el
+++ b/lisp/ox.el
@@ -2916,6 +2916,25 @@ (defun org-export--remove-uninterpreted-data (data info)
   ;; Return modified parse tree.
   data)
 
+(defun org-export--remove-escaped (data info)
+  "Remove escape symbols from plain-text in DATA.
+DATA is a parse tree or a secondary string.  INFO is a plist
+containing export options.  It is modified by side effect and
+returned by the function."
+  (org-element-map data '(plain-text)
+    (lambda (string)
+      (let (processed-string)
+        (setq processed-string
+              (replace-regexp-in-string "\\`​" "" string))
+        (setq processed-string
+              (replace-regexp-in-string "​\\'" "" processed-string))
+        (unless (equal string processed-string)
+          (org-element-insert-before processed-string string)
+          (org-element-extract-element string))))
+    info nil nil t)
+  ;; Return modified parse tree.
+  data)
+
 ;;;###autoload
 (defun org-export-as
     (backend &optional subtreep visible-only body-only ext-plist)
@@ -3046,6 +3065,9 @@ (defun org-export-as
 	   ;; communication channel.
 	   (org-export--prune-tree tree info)
 	   (org-export--remove-uninterpreted-data tree info)
+           ;; Remove zero-width spaces that escape Org syntax
+           ;; elements.
+           (org-export--remove-escaped tree info)
 	   ;; Call parse tree filters.
 	   (setq tree
 	         (org-export-filter-apply-functions
diff --git a/testing/lisp/test-ox.el b/testing/lisp/test-ox.el
index 7c71b6e24..ea4fce363 100644
--- a/testing/lisp/test-ox.el
+++ b/testing/lisp/test-ox.el
@@ -982,6 +982,19 @@ (ert-deftest test-org-export/uninterpreted ()
 			     (section . (lambda (s c i) c))))
 	     nil nil nil '(:with-sub-superscript {}))))))
 
+(ert-deftest test-org-export/remove-escaped ()
+  "Test removing escape symbols."
+  ;; Remove zero-width space around markup.
+  (should
+   (equal "This*is*test.\n"
+          (org-test-with-temp-text "This​*is*​test.\n"
+            (org-export-as (org-test-default-backend)))))
+  ;; Do not remove zero-width space in other places.
+  (should
+   (equal "This​is​test.\n"
+          (org-test-with-temp-text "This​is​test.\n"
+            (org-export-as (org-test-default-backend))))))
+
 (ert-deftest test-org-export/export-scope ()
   "Test all export scopes."
   ;; Subtree.
-- 
2.35.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] org-export: Remove zero-width space escapes during export
  2022-07-26 12:59                   ` [PATCH] org-export: Remove zero-width space escapes during export Ihor Radchenko
@ 2022-07-26 14:25                     ` Timothy
  2022-07-26 15:27                       ` András Simonyi
  2022-07-26 16:38                     ` Max Nikulin
                                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Timothy @ 2022-07-26 14:25 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: K K, Max Nikulin, emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 1262 bytes --]

Hi Ihor,

> I am attaching a tentative patch that will make Org export remove
> zero-width spaces when those spaces actually separate the object
> boundaries.
>
> Any objections?

IMO this is an eminently sensible idea. I added an export filter like this to
my config basically as soon as I found out about zero-width spaces.

One minor quibble: I find the name mildly misleading. When you say “escaped” I
think of escaped characters, which isn’t really connected to what the zero-width
space does. I’d personally be inclined to call the zero-width space an “invisible
semantic separator”.

> +(defun org-export--remove-escaped (data info)
> +  "Remove escape symbols from plain-text in DATA.
> +DATA is a parse tree or a secondary string.  INFO is a plist
> +containing export options.  It is modified by side effect and
> +returned by the function."

How about:

┌────
│ (defun org-export--remove-semantic-separators (data info)
│   "Remove Org-specific semantic separators from plain-text in DATA.
│ DATA is a parse tree or a secondary string.  INFO is a plist
│ containing export options.  It is modified by side effect and
│ returned by the function."
└────

All the best,
Timothy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] org-export: Remove zero-width space escapes during export
  2022-07-26 14:25                     ` Timothy
@ 2022-07-26 15:27                       ` András Simonyi
  0 siblings, 0 replies; 30+ messages in thread
From: András Simonyi @ 2022-07-26 15:27 UTC (permalink / raw)
  To: Timothy; +Cc: Ihor Radchenko, K K, Max Nikulin, emacs-orgmode

Dear All,

this might be a very stupid question as I'm not familiar with the
internals of the Org export engine, but couldn't this change lead to
problems with the Org-to-Org export of documents containing these
"semantic separators"?

thanks and best wishes,
András

On Tue, 26 Jul 2022 at 16:52, Timothy <orgmode@tec.tecosaur.net> wrote:
>
> Hi Ihor,
>
> > I am attaching a tentative patch that will make Org export remove
> > zero-width spaces when those spaces actually separate the object
> > boundaries.
> >
> > Any objections?
>
> IMO this is an eminently sensible idea. I added an export filter like this to
> my config basically as soon as I found out about zero-width spaces.
>
> One minor quibble, I find the name mildly misleading. When you say “escaped” I
> think of escaped characters, which isn’t really connected to what the zero width
> does. I’d personally be inclined to call the zero width space an “invisible
> semantic separator”.
>
> > +(defun org-export--remove-escaped (data info)
> > +  "Remove escape symbols from plain-text in DATA.
> > +DATA is a parse tree or a secondary string.  INFO is a plist
> > +containing export options.  It is modified by side effect and
> > +returned by the function."
>
> How about:
>
> ┌────
> │ (defun org-export--remove-semantic-separators (data info)
> │   "Remove Org-specific semantic separators from plain-text in DATA.
> │ DATA is a parse tree or a secondary string.  INFO is a plist
> │ containing export options.  It is modified by side effect and
> │ returned by the function."
> └────
>
> All the best,
> Timothy


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] org-export: Remove zero-width space escapes during export
  2022-07-26 12:59                   ` [PATCH] org-export: Remove zero-width space escapes during export Ihor Radchenko
  2022-07-26 14:25                     ` Timothy
@ 2022-07-26 16:38                     ` Max Nikulin
  2022-07-27  3:30                     ` Max Nikulin
  2022-07-28 13:17                     ` [PATCH] Add new entity \-- serving as markup separator/escape symbol Ihor Radchenko
  3 siblings, 0 replies; 30+ messages in thread
From: Max Nikulin @ 2022-07-26 16:38 UTC (permalink / raw)
  To: emacs-orgmode

On 26/07/2022 19:59, Ihor Radchenko wrote:
> 
> This is a bug. While escape symbols do not affect export in most common
> scenarios, your report is adding yet another case when zero-width space
> is actually altering the export result.

I agree that a zero-width space used as an escape character is too 
intrusive. It adds stray line break opportunities, and it may be unwanted 
when text is copied and pasted, especially if that text is code or a command.

> I am attaching a tentative patch that will make Org export remove
> zero-width spaces when those spaces actually separate the object
> boundaries.
> 
> Any objections?

I think you broke a valid use case, in which a zero-width space allows 
objects to wrap on a narrow page:

[[unicorn-1.jpg]]​[[unicorn-2.jpg]]​[[unicorn-3.jpg]]​[[unicorn-4.jpg]]

It was briefly discussed, see
https://list.orgmode.org/874k7qboaq.fsf@nicolasgoaziou.fr/
Nicolas Goaziou. Re: Org-syntax: Intra-word markup. Fri, 03 Dec 2021 
00:05:33 +0100

> The idea was indeed inspired by Markdown.
> However, Markdown is different - **bold** is the official syntax to
> indicate bold markup.

Or by asciidoc 
https://list.orgmode.org/1ef0e093-c165-2a5f-954d-6a33b64c8ee9@mailbox.org/

> +        (setq processed-string
> +              (replace-regexp-in-string "\\`​" "" string))
> +        (setq processed-string
> +              (replace-regexp-in-string "​\\'" "" processed-string))

Please, use \u200B instead of the invisible character.
info "(elisp) Non-ASCII Characters in Strings"
https://www.gnu.org/software/emacs/manual/html_node/elisp/Non_002dASCII-in-Strings.html
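
For illustration, the two replacements quoted above could be written with 
the escape instead of the literal character. This is only a sketch of the 
suggestion; `my/strip-boundary-zwsp' is a hypothetical standalone helper, 
not a function from the patch.

```elisp
(defun my/strip-boundary-zwsp (string)
  "Remove one U+200B from the start and one from the end of STRING."
  (let ((processed-string
         ;; "\\`" anchors the match at the beginning of the string.
         (replace-regexp-in-string "\\`\u200B" "" string)))
    ;; "\\'" anchors the match at the end of the string.
    (replace-regexp-in-string "\u200B\\'" "" processed-string)))
```

With this, "\u200Bis\u200B" becomes "is", while a separator in the middle 
of a string, as in "a\u200Bb", is left untouched.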



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] org-export: Remove zero-width space escapes during export
  2022-07-26 12:59                   ` [PATCH] org-export: Remove zero-width space escapes during export Ihor Radchenko
  2022-07-26 14:25                     ` Timothy
  2022-07-26 16:38                     ` Max Nikulin
@ 2022-07-27  3:30                     ` Max Nikulin
  2022-07-28 13:17                     ` [PATCH] Add new entity \-- serving as markup separator/escape symbol Ihor Radchenko
  3 siblings, 0 replies; 30+ messages in thread
From: Max Nikulin @ 2022-07-27  3:30 UTC (permalink / raw)
  To: emacs-orgmode

On 26/07/2022 19:59, Ihor Radchenko wrote:
> 
> I am attaching a tentative patch that will make Org export remove
> zero-width spaces when those spaces actually separate the object
> boundaries.

Ihor, I have realized that you did not address another use case: zero 
width spaces may be added to suppress activation of markup. In such 
cases they are in the middle of text objects, but they should be removed.

     Switch to the *​scratch​* buffer.

I consider zero width spaces as a workaround that is acceptable in some 
cases but awkward in others. It is tricky to deal with it in some 
general way.

I do not agree with the stance "just maintain status quo" expressed in 
response to Juan Manuel Macías. On zero width spaces and Org syntax. 
Fri, 03 Dec 2021 12:48:16 +0000 
https://list.orgmode.org/orgmode/87ilw5yhv3.fsf@posteo.net/

An idea. At one point I was surprised that markup markers are not 
activated at the borders of export snippets:

     intra@@org:@@*w*@@org:@@ord

It is not really lightweight markup, but at least it is purely ASCII and 
visible by default. It might be a breaking change in some edge cases. I am 
unsure about the increase in parser complexity. Macro markers 
{{{macro}}} behave similarly.




* [PATCH] Add new entity \-- serving as markup separator/escape symbol
  2022-07-26 12:59                   ` [PATCH] org-export: Remove zero-width space escapes during export Ihor Radchenko
                                       ` (2 preceding siblings ...)
  2022-07-27  3:30                     ` Max Nikulin
@ 2022-07-28 13:17                     ` Ihor Radchenko
  2022-07-28 15:34                       ` Max Nikulin
                                         ` (3 more replies)
  3 siblings, 4 replies; 30+ messages in thread
From: Ihor Radchenko @ 2022-07-28 13:17 UTC (permalink / raw)
  To: K K; +Cc: Max Nikulin, emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 1948 bytes --]

Ihor Radchenko <yantar92@gmail.com> writes:

> I am attaching a tentative patch that will make Org export remove
> zero-width spaces when those spaces actually separate the object
> boundaries.
>
> Any objections?

Given the raised objections, zero-width space does not appear to be a
useful escape symbol because it has its valid uses as a standalone space
symbol.

The raised objections can be solved using some kind of intricate
heuristics, but I do not feel like it is a good direction to go. The
code will be too complex and fragile.

Therefore, I am proposing a different approach for shielding
fontification: introducing a special entity.

The new entity is \--, which is a valid boundary between emphasis
markup. It will be removed during export (replaced by "").

"\--" specifically is a somewhat arbitrary choice. The actual requirements
for the entity name are: (1) No clash with LaTeX (which is why simpler
\- would not cut it); (2) Being a valid markup boundary: entity must end
with (any space ?- ?\( ?' ?\" ?\{).

I am attaching a tentative patch introducing the new entity. Note that
some minor tweaks to the parser were needed. I do not see it as a big
deal - the current entity regexp has much more cumbersome exceptions.

Also, the patch will not work correctly on org → org export, as was
pointed out in one of the replies to the previous abandoned approach. I do
not want to address it here because a much more appropriate solution for
this issue is changing org-element-interpret-data.

Consider (org-element-interpret-data '("asd" (bold () "bold") "bsd"))
This will return "asd*bold*bsd", which is not correct even though the
given Org datum is not wrong by itself - such things can easily appear
when user filters are applied to parse tree during org→org export.
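
A rough way to see why that string is a problem, sketched under the
assumption that Org is loadable: parse the interpreted text back and
count the bold objects. The helper my/count-bold is hypothetical and
not part of the patch; org-element-parse-buffer and org-element-map
are existing org-element functions.

```emacs-lisp
(require 'org)
(require 'org-element)

;; Hypothetical helper for illustration, not part of the patch.
(defun my/count-bold (text)
  "Return the number of bold objects Org parses in TEXT."
  (with-temp-buffer
    (insert text)
    (delay-mode-hooks (org-mode))
    (length (org-element-map (org-element-parse-buffer) 'bold #'identity))))

;; The asterisks touch letters on both sides, so no bold is recognized
;; and the bold object from the parse tree is lost on re-parsing.
(my/count-bold "asd*bold*bsd")

;; With spaces around the markers the bold object is recognized.
(my/count-bold "asd *bold* bsd")
```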

Otherwise, the patch should be good enough to play around and kick-start
the discussion.

WDYT?

Best,
Ihor


[-- Attachment #2: 0001-Add-new-entity-serving-as-markup-separator-escape-sy.patch --]
[-- Type: text/x-patch, Size: 2994 bytes --]

From 521a4b06578cf37f22e9f33d2f45b967419ad3a3 Mon Sep 17 00:00:00 2001
Message-Id: <521a4b06578cf37f22e9f33d2f45b967419ad3a3.1659013441.git.yantar92@gmail.com>
From: Ihor Radchenko <yantar92@gmail.com>
Date: Thu, 28 Jul 2022 21:02:26 +0800
Subject: [PATCH] Add new entity \-- serving as markup separator/escape symbol

* lisp/org-entities.el (org-entities): Add \-- entity.  This entity is
exported as an empty string and simply serves as markup separator if
the user needs any.
* lisp/org.el (org-fontify-entities):
* lisp/org-element.el (org-element-entity-parser):
(org-element--set-regexps): Update entity regexp to match "-".
---
 lisp/org-element.el  | 4 ++--
 lisp/org-entities.el | 4 ++++
 lisp/org.el          | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/lisp/org-element.el b/lisp/org-element.el
index 9e9b7c5ec..6405b4db8 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -258,7 +258,7 @@ (defun org-element--set-regexps ()
 		      "\\$"
 		      ;; Objects starting with "\": line break,
 		      ;; entity, latex fragment.
-		      "\\\\\\(?:[a-zA-Z[(]\\|\\\\[ \t]*$\\|_ +\\)"
+		      "\\\\\\(?:[-a-zA-Z[(]\\|\\\\[ \t]*$\\|_ +\\)"
 		      ;; Objects starting with raw text: inline Babel
 		      ;; source block, inline Babel call.
 		      "\\(?:call\\|src\\)_"))
@@ -3158,7 +3158,7 @@ (defun org-element-entity-parser ()
 
 Assume point is at the beginning of the entity."
   (catch 'no-object
-    (when (looking-at "\\\\\\(?:\\(?1:_ +\\)\\|\\(?1:there4\\|sup[123]\\|frac[13][24]\\|[a-zA-Z]+\\)\\(?2:$\\|{}\\|[^[:alpha:]]\\)\\)")
+    (when (looking-at "\\\\\\(?:\\(?1:_ +\\)\\|\\(?1:there4\\|sup[123]\\|frac[13][24]\\|[a-zA-Z-]+\\)\\(?2:$\\|{}\\|[^[:alpha:]]\\)\\)")
       (save-excursion
 	(let* ((value (or (org-entity-get (match-string 1))
 			  (throw 'no-object nil)))
diff --git a/lisp/org-entities.el b/lisp/org-entities.el
index d35e3fa8a..9d79d23fc 100644
--- a/lisp/org-entities.el
+++ b/lisp/org-entities.el
@@ -264,6 +264,10 @@ (defconst org-entities
      ("rsaquo" "\\guilsinglright{}" nil "&rsaquo;" ">" ">" "›")
 
      "* Other"
+     
+     "** Escaping Org markup"
+     ("--" "" nil "" "" "" "")
+     
      "** Misc. (often used)"
      ("circ" "\\^{}" nil "&circ;" "^" "^" "∘")
      ("vert" "\\vert{}" t "&vert;" "|" "|" "|")
diff --git a/lisp/org.el b/lisp/org.el
index 937892ef3..29ccff83b 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -5828,7 +5828,7 @@ (defun org-fontify-entities (limit)
 	;; i.e., "\_ ", could be fontified anyway, and it would be
 	;; confusing when adding a second white space character.
 	(while (re-search-forward
-		"\\\\\\(there4\\|sup[123]\\|frac[13][24]\\|[a-zA-Z]+\\)\\($\\|{}\\|[^[:alpha:]\n]\\)"
+		"\\\\\\(there4\\|sup[123]\\|frac[13][24]\\|[a-zA-Z-]+\\)\\($\\|{}\\|[^[:alpha:]\n]\\)"
 		limit t)
 	  (when (and (not (org-at-comment-p))
 		     (setq ee (org-entity-get (match-string 1)))
-- 
2.35.1



* Re: [PATCH] Add new entity \-- serving as markup separator/escape symbol
  2022-07-28 13:17                     ` [PATCH] Add new entity \-- serving as markup separator/escape symbol Ihor Radchenko
@ 2022-07-28 15:34                       ` Max Nikulin
  2022-07-29  1:43                         ` Ihor Radchenko
  2022-07-28 22:20                       ` [PATCH] " Tim Cross
                                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Max Nikulin @ 2022-07-28 15:34 UTC (permalink / raw)
  To: emacs-orgmode

On 28/07/2022 20:17, Ihor Radchenko wrote:
> 
> Therefore, I am proposing a different approach for shielding
> fontification: introducing a special entity.
> 
> The new entity is \--, which is a valid boundary between emphasis
> markup. It will be removed during export (replaced by "").

I like your idea more than my similar attempt:
Max Nikulin to emacs-orgmode. [PATCH] Intra-word markup: \relax. Fri, 28 
Jan 2022 19:12:51 +0700.
https://list.orgmode.org/st0mk5$fnv$1@ciao.gmane.io

The good point in your patch is that \- still works as a shy hyphen 
(which, by the way, may be used in some cases instead of zero width 
space: *intra*\-word). On the other hand I have managed to find a case 
where your approach is not ideal:

*\--scratch\--*

<p>
<b>&#x00ad;-scratch</b></p>

The "\--" entities are added in the hope of suppressing bold text and
keeping the asterisks.

I expected a possible problem at the border of "-" and "$", but 
fortunately the following works well:

/pre/\--$n$\--*th*




* Re: [PATCH] Add new entity \-- serving as markup separator/escape symbol
  2022-07-28 13:17                     ` [PATCH] Add new entity \-- serving as markup separator/escape symbol Ihor Radchenko
  2022-07-28 15:34                       ` Max Nikulin
@ 2022-07-28 22:20                       ` Tim Cross
  2022-07-29  0:32                       ` Juan Manuel Macías
  2022-07-29  5:49                       ` tomas
  3 siblings, 0 replies; 30+ messages in thread
From: Tim Cross @ 2022-07-28 22:20 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: K K, Max Nikulin, emacs-orgmode


Ihor Radchenko <yantar92@gmail.com> writes:

> Ihor Radchenko <yantar92@gmail.com> writes:
>
>> I am attaching a tentative patch that will make Org export remove
>> zero-width spaces when those spaces actually separate the object
>> boundaries.
>>
>> Any objections?
>
> Given the raised objections, zero-width space does not appear to be a
> useful escape symbol because it has its valid uses as a standalone space
> symbol.
>
> The raised objections can be solved using some kind of intricate
> heuristics, but I do not feel like it is a good direction to go. The
> code will be too complex and fragile.
>

Ihor, thanks for articulating this as it was something I was becoming
increasingly concerned about. 

> Therefore, I am proposing a different approach for shielding
> fontification: introducing a special entity.
>
> The new entity is \--, which is a valid boundary between emphasis
> markup. It will be removed during export (replaced by "").
>
> "\--" specifically is somewhat arbitrary choice. The actual requirements
> for the entity name are: (1) No clash with LaTeX (which is why simpler
> \- would not cut it); (2) Being a valid markup boundary: entity must end
> with (any space ?- ?\( ?' ?\" ?\{).
>
> I am attaching a tentative patch introducing the new entity. Note that
> some minor tweaks to the parser were needed. I do not see it as a big
> deal - the current entity regexp has much more cumbersome exceptions.
>
> Also, the patch will not work correctly on org → org export, similar to
> pointed in one of the replies to the previous abandoned approach. I do
> not want to address it here because a much more appropriate solution for
> this issue is changing org-element-interpret-data.
>
> Consider (org-element-interpret-data '("asd" (bold () "bold") "bsd"))
> This will return "asd*bold*bsd", which is not correct even though the
> given Org datum is not wrong by itself - such things can easily appear
> when user filters are applied to parse tree during org→org export.
>
> Otherwise, the patch should be good enough to play around and kick-start
> the discussion.
>
> WDYT?
>

I think this is definitely preferred over the zero width space as it is
clearer and 'intentional'. While I'm still 'on the fence' regarding the
tension between the need for this new functionality and the additional
complexity it introduces, this approach seems potentially cleaner and
more manageable.

Given the important work you are doing to integrate parsing of elements
and fontification, I feel you are in the best position to judge whether
this addition can be justified wrt complexity vs functionality and am
confident you're on the right track here.



* Re: [PATCH] Add new entity \-- serving as markup separator/escape symbol
  2022-07-28 13:17                     ` [PATCH] Add new entity \-- serving as markup separator/escape symbol Ihor Radchenko
  2022-07-28 15:34                       ` Max Nikulin
  2022-07-28 22:20                       ` [PATCH] " Tim Cross
@ 2022-07-29  0:32                       ` Juan Manuel Macías
  2022-07-29  5:49                       ` tomas
  3 siblings, 0 replies; 30+ messages in thread
From: Juan Manuel Macías @ 2022-07-29  0:32 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: orgmode

Hi, Ihor,

Ihor Radchenko writes:

> Given the raised objections, zero-width space does not appear to be a
> useful escape symbol because it has its valid uses as a standalone space
> symbol.
>
> The raised objections can be solved using some kind of intricate
> heuristics, but I do not feel like it is a good direction to go. The
> code will be too complex and fragile.
>
> Therefore, I am proposing a different approach for shielding
> fontification: introducing a special entity.
>
> The new entity is \--, which is a valid boundary between emphasis
> markup. It will be removed during export (replaced by "").
>
> "\--" specifically is somewhat arbitrary choice. The actual requirements
> for the entity name are: (1) No clash with LaTeX (which is why simpler
> \- would not cut it); (2) Being a valid markup boundary: entity must end
> with (any space ?- ?\( ?' ?\" ?\{).
>
> I am attaching a tentative patch introducing the new entity. Note that
> some minor tweaks to the parser were needed. I do not see it as a big
> deal - the current entity regexp has much more cumbersome exceptions.
>
> Also, the patch will not work correctly on org → org export, similar to
> pointed in one of the replies to the previous abandoned approach. I do
> not want to address it here because a much more appropriate solution for
> this issue is changing org-element-interpret-data.
>
> Consider (org-element-interpret-data '("asd" (bold () "bold") "bsd"))
> This will return "asd*bold*bsd", which is not correct even though the
> given Org datum is not wrong by itself - such things can easily appear
> when user filters are applied to parse tree during org→org export.
>
> Otherwise, the patch should be good enough to play around and kick-start
> the discussion.

I'm late joining this thread, although I am particularly interested in
the topic.

I can't make any technical comments because I haven't had time to test
the patch yet, but I have to say that your idea of using a special
entity seems to me the best approach to the problem. I would vote for
this to be the way to go.

I believe that using the zero width space character as an escape
character is not a happy idea, and I have already left my arguments in
some other thread, long ago. The zero width space is a random
workaround, but should not (in my opinion) be part of the markup. For
various reasons: it is not an ascii character, there are certain
contexts in which it can produce an unexpected result in LaTeX, etc. In
addition, the zero width space, as an escape character, has a curious
anomaly: it is an escape character that does not have a plan B and a way
to escape the escape character when you want to use it by itself.

I also like the idea of using a special entity because it is not
necessary to invent anything new and it takes advantage of an existing
resource.

Well, that's my opinion.

Best regards,

Juan Manuel



* Re: [PATCH] Add new entity \-- serving as markup separator/escape symbol
  2022-07-28 15:34                       ` Max Nikulin
@ 2022-07-29  1:43                         ` Ihor Radchenko
  2022-07-29  2:50                           ` Max Nikulin
  0 siblings, 1 reply; 30+ messages in thread
From: Ihor Radchenko @ 2022-07-29  1:43 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

> The good point in your patch is that \- is still work as shy hyphen 
> (that, by the way, may be used in some cases instead of zero width 
> space: *intra*\-word). On the other hand I have managed to find a case 
> when your approach is not ideal:
>
> *\--scratch\--*
>
> <p>
> <b>&#x00ad;-scratch</b></p>

Well. I think that it is impossible to use the same escape construct to
both force emphasis and escape it.

However, we can do

 *scratch\--{}*

which is a bit hacky, but it is the best thing I can think of without
introducing two separate entities: one for forcing the markup and one
for escaping the markup.

In general, the proposed \-- entity is only meaningful _before_ markup
characters. When it is placed after a markup character, it does literally
nothing.

Best,
Ihor



* Re: [PATCH] Add new entity \-- serving as markup separator/escape symbol
  2022-07-29  1:43                         ` Ihor Radchenko
@ 2022-07-29  2:50                           ` Max Nikulin
  2022-07-29  9:06                             ` [PATCH v2] " Ihor Radchenko
  0 siblings, 1 reply; 30+ messages in thread
From: Max Nikulin @ 2022-07-29  2:50 UTC (permalink / raw)
  To: emacs-orgmode

On 29/07/2022 08:43, Ihor Radchenko wrote:
> Max Nikulin writes:
> 
>> The good point in your patch is that \- is still work as shy hyphen
>> (that, by the way, may be used in some cases instead of zero width
>> space: *intra*\-word). On the other hand I have managed to find a case
>> when your approach is not ideal:
>>
>> *\--scratch\--*
>>
>> <p>
>> <b>&#x00ad;-scratch</b></p>
> 
> Well. I think that it is impossible to use the same escape construct to
> both force emphasis and escape it.

Let's articulate the problem as follows: some characters ("*", "/", 
etc.), besides being used literally, are overloaded with 2 additional 
roles: starting an emphasis group and terminating an emphasis group. So, 
in addition to the lightweight markup heuristics, it is necessary to 
provide a way to disambiguate which of the 3 roles is associated with a 
particular character.

"Activate" and "deactivate" characters or entities for emphasis markers 
are alternative and perhaps not so clear terms I have used before.

The advantage of zero width space is that "[:space:]" is part of 
PREMATCH and POSTMATCH (outer) regexps in 
`org-emphasis-regexp-components' and "[:space:]" is forbidden at the 
inner borders of emphasized span of text. The latter is mostly 
meaningful, however I am unsure if bold space has the same width as 
regular one, and space in fixed width font is certainly distinct.

The problem with the "\--" entity is that it is not handled properly at 
the start of an emphasis region. It neither disables emphasis nor is 
parsed as a complete entity; instead it becomes a combination of the "\-" 
shy hyphen and a literal "-".

Unsure if it can be solved consistently. Possible ways:
- In addition to the space-like (with respect to the current regexp) 
entity, add another one that acts as a part of a word but, like "\--", is 
stripped from the output. Likely it should be accompanied by more changes 
in the parser and regexps.
- Provide some new explicit syntax for a literal character, the start of 
an emphasis group, and the end of an emphasis group.

Concerning the zero width space workaround, I may be wrong, but Nicolas 
might consider using U+200B zero width space as the escape character for 
itself: a single one is filtered out during export, a double zero width 
space becomes a single character. (I do not like this kind of "white 
space" programming language.) Another question is whether U+2060 word 
joiner (or some other character) should be added either as an alternative 
to zero width space or to allow =    verbatim    = fixed width text 
surrounded by fixed width spaces.




* Re: [PATCH] Add new entity \-- serving as markup separator/escape symbol
  2022-07-28 13:17                     ` [PATCH] Add new entity \-- serving as markup separator/escape symbol Ihor Radchenko
                                         ` (2 preceding siblings ...)
  2022-07-29  0:32                       ` Juan Manuel Macías
@ 2022-07-29  5:49                       ` tomas
  3 siblings, 0 replies; 30+ messages in thread
From: tomas @ 2022-07-29  5:49 UTC (permalink / raw)
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 985 bytes --]

On Thu, Jul 28, 2022 at 09:17:32PM +0800, Ihor Radchenko wrote:
> Ihor Radchenko <yantar92@gmail.com> writes:
> 
> > I am attaching a tentative patch that will make Org export remove
> > zero-width spaces when those spaces actually separate the object
> > boundaries.
> >
> > Any objections?
> 
> Given the raised objections, zero-width space does not appear to be a
> useful escape symbol because it has its valid uses as a standalone space
> symbol.
> 
> The raised objections can be solved using some kind of intricate
> heuristics, but I do not feel like it is a good direction to go. The
> code will be too complex and fragile.
> 
> Therefore, I am proposing a different approach for shielding
> fontification: introducing a special entity.
> 
> The new entity is \--, which is a valid boundary between emphasis
> markup. It will be removed during export (replaced by "").

[...]

I like that approach very much. I'm impressed, really.

Cheers
-- 
t

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]


* [PATCH v2] Add new entity \-- serving as markup separator/escape symbol
  2022-07-29  2:50                           ` Max Nikulin
@ 2022-07-29  9:06                             ` Ihor Radchenko
  2022-07-30  0:22                               ` Samuel Wales
  0 siblings, 1 reply; 30+ messages in thread
From: Ihor Radchenko @ 2022-07-29  9:06 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

>>> The good point in your patch is that \- is still work as shy hyphen
>>> (that, by the way, may be used in some cases instead of zero width
>>> space: *intra*\-word). On the other hand I have managed to find a case
>>> when your approach is not ideal:
>>>
>>> *\--scratch\--*
>>>
>>> <p>
>>> <b>&#x00ad;-scratch</b></p>
>> 
>> Well. I think that it is impossible to use the same escape construct to
>> both force emphasis and escape it.
>
> Let's articulate the problem as follows: when some characters ("*". "/". 
> etc.) besides used literally are overloaded with 2 additional roles that 
> are start emphasis group and terminate emphasis group, in addition to 
> lightweight markup heuristics, it is necessary to provide a way to 
> disambiguate which of 3 roles is associated with particular character.
>
> "Activate" and "deactivate" characters or entities for emphasis markers 
> are alternative and perhaps not so clear terms have used before.
>
> The advantage of zero width space is that "[:space:]" is part of 
> PREMATCH and POSTMATCH (outer) regexps in 
> `org-emphasis-regexp-components' and "[:space:]" is forbidden at the 
> inner borders of emphasized span of text. The latter is mostly 
> meaningful, however I am unsure if bold space has the same width as 
> regular one, and space in fixed width font is certainly distinct.
>
> The problem with the "\--" entity is that it is not handled properly at 
> the start of emphasis region. It neither disables emphasis nor parsed as 
> complete entity, instead it becomes combination of "\-" shy hyphen and 
> literal "-".
>
> Unsure if it can be solved consistently. Possible ways:
> - It addition to space-like (in respect to current regexp) entity add 
> another one that acts as a part of word, but like "\--" stripped from 
> output. Likely it should be accompanied by more changes in the parser 
> and regexps.
> - Provide some new explicit syntax for literal character, start of 
> emphasis group, end of emphasis group.

The fact that \-- was not parsed in your example is because entities
cannot be directly followed by a letter (see 12.4 Special Symbols).

You need

*\--{}scratch\--*

Concerning the 3 listed roles of the *_/+ markup, I propose to simplify
the problem a bit and not try to make \-- serve as a proper escape symbol.
Instead, we can document the already existing quoting entities:

 ("slash" "/" nil "/" "/" "/" "/")
 ("plus" "+" nil "+" "+" "+" "+")
 ("under" "\\_" nil "_" "_" "_" "_")
 ("equal" "=" nil "=" "=" "=" "=")
 ("star" "\\star" t "*" "*" "*" "⋆")

Then, your example should better be written as

\star{}scratch\star

\-- may better work between markup, not inside.
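
To check what such an entity expands to, something like the following
sketch could be used. The helper my/entity-ascii is hypothetical;
org-entity-get is the existing lookup function from org-entities.el,
and the position of the ASCII field is taken from the entries quoted
above.

```emacs-lisp
(require 'org-entities)

;; Hypothetical helper for illustration.  An entity entry has the form
;; (NAME LATEX MATHP HTML ASCII LATIN1 UTF-8), so the ASCII
;; replacement is the fifth element.
(defun my/entity-ascii (name)
  "Return the ASCII replacement of Org entity NAME, or nil if unknown.
NAME is given without the leading backslash."
  (nth 4 (org-entity-get name)))

(my/entity-ascii "star")   ; the literal asterisk
(my/entity-ascii "vert")   ; the literal vertical bar
```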

> Concerning zero width space workaround, I may be wrong, but Nicolas 
> might consider using U+200B zero width space as the escape character for 
> itself: single one is filtered out during export, double zero width 
> space becomes single character. (I do not like this kind of "white 
> space" programming language".)

This is too complex, IMHO.
If desired, we can again go the entity road and introduce
\zws entity.

Note that we already have

 ("nbsp" "~" nil "&nbsp;" " " " " " ")
 ("ensp" "\\hspace*{.5em}" nil "&ensp;" " " " " " ")
 ("emsp" "\\hspace*{1em}" nil "&emsp;" " " " " " ")
 ("thinsp" "\\hspace*{.2em}" nil "&thinsp;" " " " " " ")

Generally, it is a good idea to advertise entities in the manual.
Zero-width space is not only limited, it is impossible to use, e.g. in
tables when you want to quote "|". The only solution is using the \vert
or \vbar entity.

> Another question is whether U+2060 word 
> joiner (or some other character) should be added either as alternative 
> to zero width space or to allow =    verbatim    = fixed width text 
> surrounded by fixed width spaces.

This particular example is tricky.
If we put an escape symbol _inside_ the verbatim, it is never possible to
know if the user intends to use that symbol literally or not.
But non-space before/after opening/closing markup char is hard-coded and
changing it is fragile.

Instead of using some kind of "escape" symbol here, I suggest turning to
the idea about inline special blocks. We can introduce a more verbose
markup that will allow spaces inside at the beginning/end of the
contents.

https://orgmode.org/list/87a6b8pbhg.fsf@posteo.net
Manuel Macías [ML:Org mode] (2022) About 'inline special blocks'

Instead of using the tricky *bold text*, we may allow _*{bold text}*_ or
something similar, with _name{...}name_ being inline special block.

Best,
Ihor



* Re: [PATCH v2] Add new entity \-- serving as markup separator/escape symbol
  2022-07-29  9:06                             ` [PATCH v2] " Ihor Radchenko
@ 2022-07-30  0:22                               ` Samuel Wales
  2022-07-30  4:12                                 ` Samuel Wales
  2022-07-30  6:49                                 ` Ihor Radchenko
  0 siblings, 2 replies; 30+ messages in thread
From: Samuel Wales @ 2022-07-30  0:22 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Max Nikulin, emacs-orgmode

i am not in a position to judge \-- but i like the idea of not having
zws be used, and expect you have thought it out.


just an idea: something approximately like this might work, or
something like john kitchen's poc implementation of it might.  this is
called extensible syntax.  one of the goals of es is to reduce the
proliferation of org syntax and other stuff.

es was proposed long ago, but i was unable to sufficiently follow up
for unrelated reasons.  i have lots of replies and lots of further
work on it but that's neither here nor there in this case.

[other stuff includes but is not limited to increase reusability and
reliability of code to implement things you want to do with syntax
such as whether to show it, add a subfeature, export it variantly in
different exporters, escape it, quote it, pretty-print it, etc.; allow
user to do this so org is not burdened by it; etc.  terms to look up
in the mailing list archives include extensible syntax, parsing risk,
and id markers.]

  $[emphasis :position beg :type bold :display "*"]bold text$[emphasis
:position end :type bold :display "*"]

alternatively:

  $()...

other than the basics, such as sexp, i do NOT care about the details
of the $[] low level syntax in general OR the arglist details in this
particular case.  those can change according to consensus or
implementation needs etc.  instead, it is getting the concept across
that matters to me.  one key thing about es is that when we want a new
feature, we do not need new org syntax for that new feature.  OR for
new subfeatures.  we just do something like this:

  $[extended-timestamp :whatever yes :displays-as interval]

or whatever.  this has nothing to do with bold emphasis.  it is an
unrelated feature, using the same outer syntax.  another completely
unrelated feature i'd strongly like, for emacs in general, is id
markers.  that too can be done with this syntax.

it looks verbose to 3rd party tools but is parseable by them.  this
example displays as * to the user.  parseable as lisp sexp data using
lisp tools.  it is meant to be vaguely reminiscent of a cl function
call while still not likely to occur naturally.

it would of course not be typed by the user directly but by some
completion thing.

i am not doing well so i am unlikely to be able to respond much or at
all to queries.  please take it easy on me if this rubs you the wrong
way.  it is just an idea and it does not have to be the answer.

merely saying that once implemented, could solve this problem and ALSO
later problems.  in fact, we discussed coloring of text using this
syntax.  although with various understandings of it.  that's kinda
similar to emphasis.

On 7/29/22, Ihor Radchenko <yantar92@gmail.com> wrote:
> Max Nikulin <manikulin@gmail.com> writes:
>
>>>> The good point in your patch is that \- is still work as shy hyphen
>>>> (that, by the way, may be used in some cases instead of zero width
>>>> space: *intra*\-word). On the other hand I have managed to find a case
>>>> when your approach is not ideal:
>>>>
>>>> *\--scratch\--*
>>>>
>>>> <p>
>>>> <b>&#x00ad;-scratch</b></p>
>>>
>>> Well. I think that it is impossible to use the same escape construct to
>>> both force emphasis and escape it.
>>
>> Let's articulate the problem as follows: when some characters ("*". "/".
>> etc.) besides used literally are overloaded with 2 additional roles that
>> are start emphasis group and terminate emphasis group, in addition to
>> lightweight markup heuristics, it is necessary to provide a way to
>> disambiguate which of 3 roles is associated with particular character.
>>
>> "Activate" and "deactivate" characters or entities for emphasis markers
>> are alternative and perhaps not so clear terms have used before.
>>
>> The advantage of zero width space is that "[:space:]" is part of
>> PREMATCH and POSTMATCH (outer) regexps in
>> `org-emphasis-regexp-components' and "[:space:]" is forbidden at the
>> inner borders of emphasized span of text. The latter is mostly
>> meaningful, however I am unsure if bold space has the same width as
>> regular one, and space in fixed width font is certainly distinct.
>>
>> The problem with the "\--" entity is that it is not handled properly at
>> the start of emphasis region. It neither disables emphasis nor parsed as
>> complete entity, instead it becomes combination of "\-" shy hyphen and
>> literal "-".
>>
>> Unsure if it can be solved consistently. Possible ways:
>> - It addition to space-like (in respect to current regexp) entity add
>> another one that acts as a part of word, but like "\--" stripped from
>> output. Likely it should be accompanied by more changes in the parser
>> and regexps.
>> - Provide some new explicit syntax for literal character, start of
>> emphasis group, end of emphasis group.
>
> The fact that \-- was not parsed in your example is because entities
> cannot be directly followed by a letter (see 12.4 Special Symbols).
>
> You need
>
> *\--{}scratch\--*
>
> Concerning the 3 listed roles of the *_/+ markup, I propose to simplify
> the problem a bit and not try to make \-- serve as a proper escape symbol.
> Instead, we can document the already existing quoting entities:
>
>  ("slash" "/" nil "/" "/" "/" "/")
>  ("plus" "+" nil "+" "+" "+" "+")
>  ("under" "\\_" nil "_" "_" "_" "_")
>  ("equal" "=" nil "=" "=" "=" "=")
>  ("star" "\\star" t "*" "*" "*" "⋆")
>
> Then, your example should better be written as
>
> \star{}scratch\star
>
> \-- may better work between markup, not inside.
>
>> Concerning zero width space workaround, I may be wrong, but Nicolas
>> might consider using U+200B zero width space as the escape character for
>> itself: single one is filtered out during export, double zero width
>> space becomes single character. (I do not like this kind of "white
>> space" programming language".)
>
> This is too complex, IMHO.
> If desired, we can again go the entity road and introduce
> \zws entity.
>
> Note that we already have
>
>  ("nbsp" "~" nil "&nbsp;" " " " " " ")
>  ("ensp" "\\hspace*{.5em}" nil "&ensp;" " " " " " ")
>  ("emsp" "\\hspace*{1em}" nil "&emsp;" " " " " " ")
>  ("thinsp" "\\hspace*{.2em}" nil "&thinsp;" " " " " " ")
>
> Generally, it is a good idea to advertise entities in the manual.
> Zero-width space is not only limited, it is impossible to use, e.g. in
> tables when you want to quote "|". The only solution is using \vert or
> \vbar entity.
>
>> Another question is whether U+2060 word
>> joiner (or some other character) should be added either as alternative
>> to zero width space or to allow =    verbatim    = fixed width text
>> surrounded by fixed width spaces.
>
> This particular example is tricky.
> If we put escape symbol _inside_ the verbatim, it is never possible to
> know if the user intends to use that symbol literally or not.
> But non-space before/after opening/closing markup char is hard-coded and
> changing it is fragile.
>
> Instead of using some kind of "escape" symbol here, I suggest turning to
> the idea about inline special blocks. We can introduce a more verbose
> markup that will allow spaces inside at the beginning/end of the
> contents.
>
> https://orgmode.org/list/87a6b8pbhg.fsf@posteo.net
> Manuel Macías [ML:Org mode] (2022) About 'inline special blocks'
>
> Instead of using the tricky *bold text*, we may allow _*{bold text}*_ or
> something similar, with _name{...}name_ being inline special block.
>
> Best,
> Ihor
>
>


-- 
The Kafka Pandemic

A blog about science, health, human rights, and misopathy:
https://thekafkapandemic.blogspot.com



* Re: [PATCH v2] Add new entity \-- serving as markup separator/escape symbol
  2022-07-30  0:22                               ` Samuel Wales
@ 2022-07-30  4:12                                 ` Samuel Wales
  2022-07-30  6:49                                 ` Ihor Radchenko
  1 sibling, 0 replies; 30+ messages in thread
From: Samuel Wales @ 2022-07-30  4:12 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Max Nikulin, emacs-orgmode

my deep apologies for the typo in john's name.  i meant of course John
Kitchin -- jkitchin.  i refer to his new style link syntax and his
proof of concept for cl style keyword args.  i still owe you email
replies.


On 7/29/22, Samuel Wales <samologist@gmail.com> wrote:
> i am not in a position to judge \-- but i like the idea of not having
> zws be used, and expect you have thought it out.
>
>
> just an idea: something approximately like this might work, or
> something like john kitchen's poc implementation of it might.  this is
> called extensible syntax.  one of the goals of es is to reduce the
> proliferation of org syntax and other stuff.
>
> es was proposed long ago, but i was unable to sufficiently follow up
> for unrelated reasons.  i have lots of replies and lots of further
> work on it but that's neither here nor there in this case.
>
> [other stuff includes but is not limited to increase reusability and
> reliability of code to implement things you want to do with syntax
> such as whether to show it, add a subfeature, export it variantly in
> different exporters, escape it, quote it, pretty-print it, etc.; allow
> user to do this so org is not burdened by it; etc.  terms to look up
> in the mailing list archives include extensible syntax, parsing risk,
> and id markers.]
>
>   $[emphasis :position beg :type bold :display "*"]bold text$[emphasis
> :position end :type bold :display "*"]
>
> alternatively:
>
>   $()...
>
> other than the basics, such as sexp, i do NOT care about the details
> of the $[] low level syntax in general OR the arglist details in this
> particular case.  those can change according to consensus or
> implementation needs etc.  instead, it is getting the concept across
> that matters to me.  one key thing about es is that when we want a new
> feature, we do not need new org syntax for that new feature.  OR for
> new subfeatures.  we just do something like this:
>
>   $[extended-timestamp :whatever yes :displays-as interval]
>
> or whatever.  this has nothing to do with bold emphasis.  it is an
> unrelated feature, using the same outer syntax.  another completely
> unrelated feature i'd strongly like, for emacs in general, is id
> markers.  that too can be done with this syntax.
>
> it looks verbose to 3rd party tools but is parseable by them.  this
> example displays as * to the user.  parseable as lisp sexp data using
> lisp tools.  it is meant to be vaguely reminiscent of a cl function
> call while still not likely to occur naturally.
>
> it would of course not be typed by the user directly but by some
> completion thing.
>
> i am not doing well so i am unlikely to be able to respond much or at
> all to queries.  please take it easy on me if this rubs you the wrong
> way.  it is just an idea and it does not have to be the answer.
>
> merely saying that once implemented, could solve this problem and ALSO
> later problems.  in fact, we discussed coloring of text using this
> syntax.  although with various understandings of it.  that's kinda
> similar to emphasis.
>
> On 7/29/22, Ihor Radchenko <yantar92@gmail.com> wrote:
>> Max Nikulin <manikulin@gmail.com> writes:
>>
>>>>> The good point in your patch is that \- is still work as shy hyphen
>>>>> (that, by the way, may be used in some cases instead of zero width
>>>>> space: *intra*\-word). On the other hand I have managed to find a case
>>>>> when your approach is not ideal:
>>>>>
>>>>> *\--scratch\--*
>>>>>
>>>>> <p>
>>>>> <b>&#x00ad;-scratch</b></p>
>>>>
>>>> Well. I think that it is impossible to use the same escape construct to
>>>> both force emphasis and escape it.
>>>
>>> Let's articulate the problem as follows: when some characters ("*", "/",
>>> etc.), besides being used literally, are overloaded with 2 additional
>>> roles, namely starting an emphasis group and terminating an emphasis
>>> group, it is necessary, in addition to the lightweight markup
>>> heuristics, to provide a way to disambiguate which of the 3 roles is
>>> associated with a particular character.
>>>
>>> "Activate" and "deactivate" characters or entities for emphasis markers
>>> are alternative, and perhaps less clear, terms I have used before.
>>>
>>> The advantage of zero width space is that "[:space:]" is part of
>>> PREMATCH and POSTMATCH (outer) regexps in
>>> `org-emphasis-regexp-components' and "[:space:]" is forbidden at the
>>> inner borders of emphasized span of text. The latter is mostly
>>> meaningful, however I am unsure if bold space has the same width as
>>> regular one, and space in fixed width font is certainly distinct.
>>>
>>> The problem with the "\--" entity is that it is not handled properly at
>>> the start of emphasis region. It neither disables emphasis nor is parsed
>>> as a complete entity; instead it becomes a combination of a "\-" shy
>>> hyphen and a literal "-".
>>>
>>> Unsure if it can be solved consistently. Possible ways:
>>> - In addition to the space-like (with respect to the current regexp)
>>> entity, add another one that acts as part of a word but, like "\--", is
>>> stripped from the output. This would likely need more changes in the
>>> parser and regexps.
>>> - Provide some new explicit syntax for literal character, start of
>>> emphasis group, end of emphasis group.
>>
>> The fact that \-- was not parsed in your example is because entities
>> cannot be directly followed by a letter (see 12.4 Special Symbols).
>>
>> You need
>>
>> *\--{}scratch\--*
>>
>> Concerning the 3 listed roles of the *_/+ markup, I propose to simplify
>> the problem a bit and not try to make \-- serve as a proper escape
>> symbol.
>> Instead, we can document the already existing quoting entities:
>>
>>  ("slash" "/" nil "/" "/" "/" "/")
>>  ("plus" "+" nil "+" "+" "+" "+")
>>  ("under" "\\_" nil "_" "_" "_" "_")
>>  ("equal" "=" nil "=" "=" "=" "=")
>>  ("star" "\\star" t "*" "*" "*" "⋆")
>>
>> Then, your example should better be written as
>>
>> \star{}scratch\star
>>
>> \-- may better work between markup, not inside.
>>
>>> Concerning the zero width space workaround, I may be wrong, but Nicolas
>>> might consider using U+200B zero width space as the escape character for
>>> itself: a single one is filtered out during export, a double zero width
>>> space becomes a single character. (I do not like this kind of "white
>>> space programming language".)
>>
>> This is too complex, IMHO.
>> If desired, we can again go the entity road and introduce
>> \zws entity.
>>
>> Note that we already have
>>
>>  ("nbsp" "~" nil "&nbsp;" " " " " " ")
>>  ("ensp" "\\hspace*{.5em}" nil "&ensp;" " " " " " ")
>>  ("emsp" "\\hspace*{1em}" nil "&emsp;" " " " " " ")
>>  ("thinsp" "\\hspace*{.2em}" nil "&thinsp;" " " " " " ")
>>
>> Generally, it is a good idea to advertise entities in the manual.
>> Zero-width space is not only limited, it is impossible to use, e.g. in
>> tables when you want to quote "|". The only solution is using \vert or
>> \vbar entity.
>>
>>> Another question is whether U+2060 word
>>> joiner (or some other character) should be added either as alternative
>>> to zero width space or to allow =    verbatim    = fixed width text
>>> surrounded by fixed width spaces.
>>
>> This particular example is tricky.
>> If we put escape symbol _inside_ the verbatim, it is never possible to
>> know if the user intends to use that symbol literally or not.
>> But non-space before/after opening/closing markup char is hard-coded and
>> changing it is fragile.
>>
>> Instead of using some kind of "escape" symbol here, I suggest turning to
>> the idea about inline special blocks. We can introduce a more verbose
>> markup that will allow spaces inside at the beginning/end of the
>> contents.
>>
>> https://orgmode.org/list/87a6b8pbhg.fsf@posteo.net
>> Manuel Macías [ML:Org mode] (2022) About 'inline special blocks'
>>
>> Instead of using the tricky *bold text*, we may allow _*{bold text}*_ or
>> something similar, with _name{...}name_ being inline special block.
>>
>> Best,
>> Ihor
>>
>>
>
>
> --
> The Kafka Pandemic
>
> A blog about science, health, human rights, and misopathy:
> https://thekafkapandemic.blogspot.com
>


-- 
The Kafka Pandemic

A blog about science, health, human rights, and misopathy:
https://thekafkapandemic.blogspot.com



* Re: [PATCH v2] Add new entity \-- serving as markup separator/escape symbol
  2022-07-30  0:22                               ` Samuel Wales
  2022-07-30  4:12                                 ` Samuel Wales
@ 2022-07-30  6:49                                 ` Ihor Radchenko
  2022-07-30 15:44                                   ` Max Nikulin
  1 sibling, 1 reply; 30+ messages in thread
From: Ihor Radchenko @ 2022-07-30  6:49 UTC (permalink / raw)
  To: Samuel Wales; +Cc: Max Nikulin, emacs-orgmode

Samuel Wales <samologist@gmail.com> writes:

> i am not in a position to judge \-- but i like the idea of not having
> zws be used, and expect you have thought it out.
>
>
> just an idea: something approximately like this might work, or
> something like john kitchen's poc implementation of it might.  this is
> called extensible syntax.  one of the goals of es is to reduce the
> proliferation of org syntax and other stuff.
>
> es was proposed long ago, but i was unable to sufficiently follow up
> for unrelated reasons.  i have lots of replies and lots of further
> work on it but that's neither here nor there in this case.
>
> [other stuff includes but is not limited to increase reusability and
> reliability of code to implement things you want to do with syntax
> such as whether to show it, add a subfeature, export it variantly in
> different exporters, escape it, quote it, pretty-print it, etc.; allow
> user to do this so org is not burdened by it; etc.  terms to look up
> in the mailing list archives include extensible syntax, parsing risk,
> and id markers.]
>
>   $[emphasis :position beg :type bold :display "*"]bold text$[emphasis
> :position end :type bold :display "*"]

This is similar to another recent idea about inline special blocks.
Among other things, we discussed supplying parameters to such inline
special blocks. This suggestion is essentially equivalent, except you
give a slightly different syntax.
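
To illustrate the "parseable as Lisp data" point: the text between the
brackets of such a marker can be read with the ordinary Lisp reader. The
sketch below is only an illustration. The $[...] syntax is a proposal
from this thread, not existing Org syntax, and the function name
my/es-parse is hypothetical.

```elisp
;; Sketch only: "$[NAME :KEY VALUE ...]" is the syntax proposed in this
;; thread, not something Org currently parses.  `my/es-parse' is a
;; made-up name used for illustration.
(defun my/es-parse (string)
  "Parse STRING of the form \"$[NAME :KEY VALUE ...]\".
Return the list (NAME :KEY VALUE ...), or nil if STRING does not
look like such a marker."
  (when (string-match "\\`\\$\\(\\[.*\\]\\)\\'" string)
    ;; The bracketed part is read as a vector; `read-from-string'
    ;; only reads data, it does not evaluate anything.
    (let ((vec (car (read-from-string (match-string 1 string)))))
      ;; Convert the vector to a list so the tail can be used as a plist.
      (append vec nil))))

;; Usage: the tail of the result is a property list.
;; (plist-get (cdr (my/es-parse
;;                  "$[emphasis :position beg :type bold :display \"*\"]"))
;;            :type)
```

Reading the marker as data keeps the outer syntax fixed while the
argument list stays free to change, which seems to be the point of the
proposal.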

> alternatively:
>
>   $()...
>
> other than the basics, such as sexp, i do NOT care about the details
> of the $[] low level syntax in general OR the arglist details in this
> particular case.  those can change according to consensus or
> implementation needs etc.  instead, it is getting the concept across
> that matters to me.  one key thing about es is that when we want a new
> feature, we do not need new org syntax for that new feature.  OR for
> new subfeatures.  we just do something like this:
>
>   $[extended-timestamp :whatever yes :displays-as interval]
>
> or whatever.  this has nothing to do with bold emphasis.  it is an
> unrelated feature, using the same outer syntax.  another completely
> unrelated feature i'd strongly like, for emacs in general, is id
> markers.  that too can be done with this syntax.

I feel like generalizing syntax to arbitrary inline object types is a
bit too much **at this point in time**. Yes, we can do this, but a lot
of places in the Org codebase depend on the existing syntax. It is not
easy to extend, for example, the code dealing with timestamps to work
with arbitrary timestamp-like objects. Too many things are hard-coded;
changing them would be a humongous amount of work.

> merely saying that once implemented, could solve this problem and ALSO
> later problems.  in fact, we discussed coloring of text using this
> syntax.  although with various understandings of it.  that's kinda
> similar to emphasis.

Colouring was also one of the things I thought of when discussing inline
special blocks. Also, authored comments where we need to keep the author
metadata.

> i am not doing well so i am unlikely to be able to respond much or at
> all to queries.  please take it easy on me if this rubs you the wrong
> way.  it is just an idea and it does not have to be the answer.

Sorry to hear this, both now and recently. I hope you get better soon.

Best,
Ihor



* Re: [PATCH v2] Add new entity \-- serving as markup separator/escape symbol
  2022-07-30  6:49                                 ` Ihor Radchenko
@ 2022-07-30 15:44                                   ` Max Nikulin
  0 siblings, 0 replies; 30+ messages in thread
From: Max Nikulin @ 2022-07-30 15:44 UTC (permalink / raw)
  To: emacs-orgmode

On 30/07/2022 13:49, Ihor Radchenko wrote:
> Samuel Wales writes:
>>
>>    $[emphasis :position beg :type bold :display "*"]bold text$[emphasis
>> :position end :type bold :display "*"]
> 
> This is similar to another recent idea about inline special blocks.
> Among other things, we discussed supplying parameters to such inline
> special blocks. This suggestion is essentially equivalent, except you
> give a slightly different syntax.

Samuel asked for a syntax extension that allows defining a feature as
a Lisp function more than a decade ago:

https://list.orgmode.org/orgmode/20524da70901041233g105f372fv175a47dc9884fa43@mail.gmail.com/T/
Samuel Wales. extensible syntax. Sun, 4 Jan 2009 13:33:23 -0700




* Re: How to force markup without spaces
@ 2022-07-26 10:24 K
  0 siblings, 0 replies; 30+ messages in thread
From: K @ 2022-07-26 10:24 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

On Tue, 2022-07-26 at 13:30 +0700, Max Nikulin wrote:

> I have not tested it, but I expect you can use
> - export filter that removes zero-width spaces at the last export
> stage.
> I assume that your documents do not contain them besides markup
> workaround
> - #+latex_header: \DeclareUnicodeCharacter{200B}{}
> - custom link
> 
>     #+begin_src elisp :results none :exports both
>       (org-link-set-parameters
>        "sep"
>        :export (lambda (path desc backend)
>                (if (org-export-derived-backend-p backend 'org)
>                    (org-link-make-string (concat "sep:" path) desc)
>                  (or desc ""))))
>     #+end_src
>     "中文[[sep:][*测*]]试"

I tested the second workaround, replacing
\DeclareUnicodeCharacter{200B}{} with \newunicodechar{​}{} (the first
argument is a literal U+200B zero-width space), since I am using
xelatex, which does not support the former. It works fine so far.
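
For comparison, the first workaround quoted above (an export filter)
might look roughly like the sketch below. It has not been tried on real
documents; the function name my/org-export-strip-zws is made up for
illustration, and it assumes the document contains U+200B only as a
markup separator, never as intended content.

```elisp
;; Sketch only: strip U+200B ZERO WIDTH SPACE from the final export
;; output.  `my/org-export-strip-zws' is a hypothetical name.
(defun my/org-export-strip-zws (text _backend _info)
  "Return TEXT with every U+200B ZERO WIDTH SPACE removed.
Export filters are called with the output string, the back-end
symbol and the export info plist; the last two are unused here."
  (replace-regexp-in-string "\u200B" "" text))

;; Register the filter once the export framework is loaded.
(with-eval-after-load 'ox
  (add-to-list 'org-export-filter-final-output-functions
               #'my/org-export-strip-zws))
```

Because the filter runs on the final output, it applies to every
back-end alike, so no LaTeX-specific declaration is needed.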

> In other thread we are discussing advantages and problems of
> switching
> from PdfLaTeX to LuaLaTeX for non-latin scripts. The latter is a
> Unicode
> engine. I am curious what is your opinion from standpoint of Chinese
> language, namely amount of required customization in both cases. I
> think, it is better to either start a dedicated thread, or find the
> part
> of discussion related to fonts and babel (LaTeX package) setup.

As far as I know, Chinese users commonly use the ctex package
(https://ctan.org/pkg/ctex) to handle Chinese typesetting, and they
prefer xelatex and lualatex over pdflatex. The package supports fewer
fonts under pdflatex than under xelatex etc. (see page 7 of its PDF
documentation). So I just use xelatex and don't have much experience
with pdflatex.

When using ctex, you just need to declare \documentclass{ctexart}
(ctexart is the ctex version of the article class) to use Chinese
characters. Then, if your system has the required default fonts, the
PDF output should be OK.



Thread overview: 30+ messages
2012-11-19  5:32 How to force markup without spaces cinsky
2012-11-19  7:11 ` Vladimir Lomov
2012-11-19 10:06   ` Seong-Kook Shin
2012-11-19 14:40     ` Suvayu Ali
2012-12-13 21:26       ` Bastien
2022-07-25 17:50         ` K
2022-07-25 18:27         ` K
2022-07-25 19:02           ` K
2022-07-26  1:26             ` Ihor Radchenko
2022-07-26  2:23               ` Max Nikulin
2022-07-26  4:26                 ` K K
2022-07-26  6:30                   ` Max Nikulin
2022-07-26 12:59                   ` [PATCH] org-export: Remove zero-width space escapes during export Ihor Radchenko
2022-07-26 14:25                     ` Timothy
2022-07-26 15:27                       ` András Simonyi
2022-07-26 16:38                     ` Max Nikulin
2022-07-27  3:30                     ` Max Nikulin
2022-07-28 13:17                     ` [PATCH] Add new entity \-- serving as markup separator/escape symbol Ihor Radchenko
2022-07-28 15:34                       ` Max Nikulin
2022-07-29  1:43                         ` Ihor Radchenko
2022-07-29  2:50                           ` Max Nikulin
2022-07-29  9:06                             ` [PATCH v2] " Ihor Radchenko
2022-07-30  0:22                               ` Samuel Wales
2022-07-30  4:12                                 ` Samuel Wales
2022-07-30  6:49                                 ` Ihor Radchenko
2022-07-30 15:44                                   ` Max Nikulin
2022-07-28 22:20                       ` [PATCH] " Tim Cross
2022-07-29  0:32                       ` Juan Manuel Macías
2022-07-29  5:49                       ` tomas
2022-07-26 10:24 How to force markup without spaces K

Code repositories for project(s) associated with this inbox:

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
