From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ihor Radchenko <yantar92@gmail.com>
To: K K <k_foreign@outlook.com>
Cc: Max Nikulin <manikulin@gmail.com>,  "emacs-orgmode@gnu.org"
 <emacs-orgmode@gnu.org>
Subject: [PATCH] org-export: Remove zero-width space escapes during export
In-Reply-To: <80f0990042a564556cc6b047a94f7e9dddf5a280.camel@outlook.com>
References: <BY5PR10MB4289167298649297E045360996959@BY5PR10MB4289.namprd10.prod.outlook.com>
 <87r128d5pp.fsf@localhost> <tbnj6u$11sv$1@ciao.gmane.io>
 <80f0990042a564556cc6b047a94f7e9dddf5a280.camel@outlook.com>
Date: Tue, 26 Jul 2022 20:59:18 +0800
Message-ID: <87v8rkav2x.fsf@localhost>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-BeenThere: emacs-orgmode@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "General discussions about Org-mode." <emacs-orgmode.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-orgmode>,
 <mailto:emacs-orgmode-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-orgmode>
List-Post: <mailto:emacs-orgmode@gnu.org>
List-Help: <mailto:emacs-orgmode-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-orgmode>,
 <mailto:emacs-orgmode-request@gnu.org?subject=subscribe>
Errors-To: emacs-orgmode-bounces+larch=yhetil.org@gnu.org
Sender: "Emacs-orgmode" <emacs-orgmode-bounces+larch=yhetil.org@gnu.org>

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

K K <k_foreign@outlook.com> writes:

> My use case is to emphasize Chinese characters without spaces being =
inserted, even those zero-width spaces. For example =
"=E4=B8=AD=E6=96=87*=E6=B5=8B*=E8=AF=95" should be enough to emphasize =
"=E6=B5=8B".
>
> I am using zero-width spaces right now, and it works fine in org-mode buf=
fers, but if exported to latex-pdf files, the U+200B ZERO WIDTH SPACE chara=
cter will not be zero-width for certain fonts. So I hope not to use that ch=
aracter.

This is a bug. While escape symbols do not affect export in most common
scenarios, your report adds yet another case where a zero-width space
actually alters the export result.

I am attaching a tentative patch that makes Org export remove
zero-width spaces when they are used as escapes, i.e. when they
separate object boundaries.
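
To illustrate the intended behaviour on plain strings, here is a minimal
sketch.  It is not the patch itself: `my/strip-boundary-zwsp' is a
hypothetical helper name, and I am assuming that the parser splits
"This<ZWSP>*is*<ZWSP>test." into the plain-text objects "This<ZWSP>"
and "<ZWSP>test." around the bold object.

```emacs-lisp
;; Hypothetical helper for illustration only; not part of ox.el.
;; Removes one zero-width space (U+200B) at the very start and/or
;; the very end of STRING and leaves any other occurrence alone.
(defun my/strip-boundary-zwsp (string)
  "Return STRING without a leading or trailing U+200B."
  (replace-regexp-in-string
   "\\`\u200b\\|\u200b\\'" "" string))

;; Text before and after a bold object: the escape is dropped.
(my/strip-boundary-zwsp "This\u200b")      ; returns "This"
(my/strip-boundary-zwsp "\u200btest.\n")   ; returns "test.\n"
;; A zero-width space inside a single string is kept.
(my/strip-boundary-zwsp "This\u200bis")    ; returns the input as is
```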

Any objections?

> On Tue, 26 Jul 2022 09:26:42 +0800, Ihor Radchenko wrote:
>> Another idea we have discussed is using something similar to Markdown
>> format: **bold**, //italics//, __underline__, etc. It is less verbose
>> compared to the special blocks, which should be valuable for
>> Japanese/Chinese/other languages with no spaces between words.
>
> By the way, it seems that my use case has already been implemented by mar=
kdown-mode. In a markdown-mode buffer "=E4=B8=AD=E6=96=87**=E6=B5=8B**=E8=
=AF=95" will certainly make "=E6=B5=8B" bold.

The idea was indeed inspired by Markdown.
However, Markdown is different: **bold** is its official syntax for
bold markup. Things are more complex in reality, though, and Markdown
has its own edge cases; see
https://www.markdownguide.org/basic-syntax/

Best,
Ihor


--=-=-=
Content-Type: text/x-patch; charset=utf-8
Content-Disposition: inline;
 filename=0001-org-export-Remove-zero-width-space-escapes-during-ex.patch
Content-Transfer-Encoding: quoted-printable

>From 5764b41b858bff3d56dcb24741cf550a7e245d36 Mon Sep 17 00:00:00 2001
Message-Id: <5764b41b858bff3d56dcb24741cf550a7e245d36.1658840330.git.yantar=
92@gmail.com>
From: Ihor Radchenko <yantar92@gmail.com>
Date: Tue, 26 Jul 2022 20:50:47 +0800
Subject: [PATCH] org-export: Remove zero-width space escapes during export

* lisp/ox.el (org-export--remove-escaped): New function removing
zero-width spaces when they separate object boundaries.
(org-export-as): Call `org-export--remove-escaped'.
* testing/lisp/test-ox.el (test-org-export/remove-escaped): New test.
---
 lisp/ox.el              | 22 ++++++++++++++++++++++
 testing/lisp/test-ox.el | 13 +++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/lisp/ox.el b/lisp/ox.el
index 40ad7ae4e..de034fd22 100644
--- a/lisp/ox.el
+++ b/lisp/ox.el
@@ -2916,6 +2916,25 @@ (defun org-export--remove-uninterpreted-data (data i=
nfo)
   ;; Return modified parse tree.
   data)
=20
+(defun org-export--remove-escaped (data info)
+  "Remove escape symbols from plain-text in DATA.
+DATA is a parse tree or a secondary string.  INFO is a plist
+containing export options.  DATA is modified by side effect and
+returned by the function."
+  (org-element-map data '(plain-text)
+    (lambda (string)
+      (let ((processed-string
+             (replace-regexp-in-string
+              "\\`=E2=80=8B" ""
+              (replace-regexp-in-string
+               "=E2=80=8B\\'" "" string))))
+        (unless (equal string processed-string)
+          (org-element-insert-before processed-string string)
+          (org-element-extract-element string))))
+    info nil nil t)
+  ;; Return modified parse tree.
+  data)
+
 ;;;###autoload
 (defun org-export-as
     (backend &optional subtreep visible-only body-only ext-plist)
@@ -3046,6 +3065,9 @@ (defun org-export-as
 	   ;; communication channel.
 	   (org-export--prune-tree tree info)
 	   (org-export--remove-uninterpreted-data tree info)
+           ;; Remove zero-width spaces that escape Org syntax
+           ;; elements.
+           (org-export--remove-escaped tree info)
 	   ;; Call parse tree filters.
 	   (setq tree
 	         (org-export-filter-apply-functions
diff --git a/testing/lisp/test-ox.el b/testing/lisp/test-ox.el
index 7c71b6e24..ea4fce363 100644
--- a/testing/lisp/test-ox.el
+++ b/testing/lisp/test-ox.el
@@ -982,6 +982,19 @@ (ert-deftest test-org-export/uninterpreted ()
 			     (section . (lambda (s c i) c))))
 	     nil nil nil '(:with-sub-superscript {}))))))
=20
+(ert-deftest test-org-export/remove-escaped ()
+  "Test removing escape symbols."
+  ;; Remove zero-width space around markup.
+  (should
+   (equal "This*is*test.\n"
+          (org-test-with-temp-text "This=E2=80=8B*is*=E2=80=8Btest.\n"
+            (org-export-as (org-test-default-backend)))))
+  ;; Do not remove zero-width space in other places.
+  (should
+   (equal "This=E2=80=8Bis=E2=80=8Btest.\n"
+          (org-test-with-temp-text "This=E2=80=8Bis=E2=80=8Btest.\n"
+            (org-export-as (org-test-default-backend))))))
+
 (ert-deftest test-org-export/export-scope ()
   "Test all export scopes."
   ;; Subtree.
--=20
2.35.1


--=-=-=--