From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ihor Radchenko <yantar92@gmail.com>
To: Max Nikulin <manikulin@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: [PATCH v2] Add new entity \-- serving as markup separator/escape
 symbol
In-Reply-To: <tbvhuk$pt3$1@ciao.gmane.io>
References: <BY5PR10MB4289167298649297E045360996959@BY5PR10MB4289.namprd10.prod.outlook.com>
 <87r128d5pp.fsf@localhost> <tbnj6u$11sv$1@ciao.gmane.io>
 <80f0990042a564556cc6b047a94f7e9dddf5a280.camel@outlook.com>
 <87v8rkav2x.fsf@localhost> <87mtct9y1f.fsf@localhost>
 <tbua9i$hg4$1@ciao.gmane.io> <87mtcsn173.fsf@localhost>
 <tbvhuk$pt3$1@ciao.gmane.io>
Date: Fri, 29 Jul 2022 17:06:21 +0800
Message-ID: <878rocmgoi.fsf@localhost>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-BeenThere: emacs-orgmode@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "General discussions about Org-mode." <emacs-orgmode.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-orgmode>,
 <mailto:emacs-orgmode-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-orgmode>
List-Post: <mailto:emacs-orgmode@gnu.org>
List-Help: <mailto:emacs-orgmode-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-orgmode>,
 <mailto:emacs-orgmode-request@gnu.org?subject=subscribe>

Max Nikulin <manikulin@gmail.com> writes:

>>> The good point in your patch is that \- is still work as shy hyphen
>>> (that, by the way, may be used in some cases instead of zero width
>>> space: *intra*\-word). On the other hand I have managed to find a case
>>> when your approach is not ideal:
>>>
>>> *\--scratch\--*
>>>
>>> <p>
>>> <b>&#x00ad;-scratch</b></p>
>>
>> Well. I think that it is impossible to use the same escape construct to
>> both force emphasis and escape it.
>
> Let's articulate the problem as follows: when some characters ("*". "/".
> etc.) besides used literally are overloaded with 2 additional roles that
> are start emphasis group and terminate emphasis group, in addition to
> lightweight markup heuristics, it is necessary to provide a way to
> disambiguate which of 3 roles is associated with particular character.
>
> "Activate" and "deactivate" characters or entities for emphasis markers
> are alternative and perhaps not so clear terms have used before.
>
> The advantage of zero width space is that "[:space:]" is part of
> PREMATCH and POSTMATCH (outer) regexps in
> `org-emphasis-regexp-components' and "[:space:]" is forbidden at the
> inner borders of emphasized span of text. The latter is mostly
> meaningful, however I am unsure if bold space has the same width as
> regular one, and space in fixed width font is certainly distinct.
>
> The problem with the "\--" entity is that it is not handled properly at
> the start of emphasis region. It neither disables emphasis nor parsed as
> complete entity, instead it becomes combination of "\-" shy hyphen and
> literal "-".
>
> Unsure if it can be solved consistently. Possible ways:
> - It addition to space-like (in respect to current regexp) entity add
> another one that acts as a part of word, but like "\--" stripped from
> output. Likely it should be accompanied by more changes in the parser
> and regexps.
> - Provide some new explicit syntax for literal character, start of
> emphasis group, end of emphasis group.

\-- was not parsed in your example because entities cannot be directly
followed by a letter (see section 12.4, Special Symbols, in the manual).

You need

*\--{}scratch\--*

Concerning the 3 listed roles of the *_/+ markup, I propose to simplify
the problem a bit and not try to make \-- serve as a proper escape symbol.
Instead, we can document the already existing quoting entities:

 ("slash" "/" nil "/" "/" "/" "/")
 ("plus" "+" nil "+" "+" "+" "+")
 ("under" "\\_" nil "_" "_" "_" "_")
 ("equal" "=" nil "=" "=" "=" "=")
 ("star" "\\star" t "*" "*" "*" "⋆")

Then, your example is better written as

\star{}scratch\star

\-- works better between markup, not inside it.

> Concerning zero width space workaround, I may be wrong, but Nicolas
> might consider using U+200B zero width space as the escape character for
> itself: single one is filtered out during export, double zero width
> space becomes single character. (I do not like this kind of "white
> space" programming language".)

This is too complex, IMHO.
If desired, we can again go the entity road and introduce a
\zws entity.
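
As a rough sketch only (this is not part of the patch), such an entity
could be tried out locally through `org-entities-user'. The name "zws"
and every replacement string below are assumptions for illustration:

```elisp
;; Hypothetical \zws entity; the name and all replacements are assumptions.
;; Field order follows the `org-entities-user' docstring:
;; name, LaTeX, LaTeX math-p, HTML, ASCII, Latin1, UTF-8.
(with-eval-after-load 'org-entities
  (add-to-list 'org-entities-user
               '("zws" "\\hspace{0pt}" nil "&#8203;" "" "" "\u200b")))
```

With something like this in place, \zws{} should export as a zero-width
space in HTML and UTF-8 output and as nothing in ASCII and Latin1.
Whether it helps next to emphasis markers still depends on the emphasis
regexps discussed above.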

Note that we already have

 ("nbsp" "~" nil "&nbsp;" " " " " " ")
 ("ensp" "\\hspace*{.5em}" nil "&ensp;" " " " " " ")
 ("emsp" "\\hspace*{1em}" nil "&emsp;" " " " " " ")
 ("thinsp" "\\hspace*{.2em}" nil "&thinsp;" " " " " " ")

Generally, it is a good idea to advertise entities in the manual.
Zero-width space is not only limited; in some cases it is impossible to
use at all, e.g. in tables when you want to quote "|". There, the only
solution is the \vert or \vbar entity.

> Another question is whether U+2060 word
> joiner (or some other character) should be added either as alternative
> to zero width space or to allow =    verbatim    = fixed width text
> surrounded by fixed width spaces.

This particular example is tricky.
If we put an escape symbol _inside_ the verbatim, it is never possible to
know whether the user intends that symbol to be taken literally or not.
But the requirement of a non-space character after the opening and before
the closing markup char is hard-coded, and changing it is fragile.

Instead of using some kind of "escape" symbol here, I suggest turning to
the idea about inline special blocks. We can introduce a more verbose
markup that will allow spaces inside at the beginning/end of the
contents.

https://orgmode.org/list/87a6b8pbhg.fsf@posteo.net
Manuel Macías [ML:Org mode] (2022) About 'inline special blocks'

Instead of using the tricky *bold text*, we may allow _*{bold text}*_ or
something similar, with _name{...}name_ being an inline special block.

Best,
Ihor