From: Samuel Wales
Date: Fri, 29 Jul 2022 17:22:16 -0700
Subject: Re: [PATCH v2] Add new entity \-- serving as markup separator/escape symbol
To: Ihor Radchenko
Cc: Max Nikulin, emacs-orgmode@gnu.org
In-Reply-To: <878rocmgoi.fsf@localhost>
References: <87r128d5pp.fsf@localhost> <80f0990042a564556cc6b047a94f7e9dddf5a280.camel@outlook.com> <87v8rkav2x.fsf@localhost> <87mtct9y1f.fsf@localhost> <87mtcsn173.fsf@localhost> <878rocmgoi.fsf@localhost>

i am not in a position to judge \-- but i like the idea of not having zws be used, and expect you have thought it out.

just an idea: something approximately like this might work, or something like john kitchin's poc implementation of it might.

this is called extensible syntax. one of the goals of es is to reduce the proliferation of org syntax and other stuff. es was proposed long ago, but i was unable to sufficiently follow up for unrelated reasons. i have lots of replies and lots of further work on it but that's neither here nor there in this case.

[other stuff includes but is not limited to: increasing reusability and reliability of the code that implements things you want to do with syntax, such as whether to show it, add a subfeature, export it variantly in different exporters, escape it, quote it, pretty-print it, etc.; allowing the user to do this so org is not burdened by it; etc. terms to look up in the mailing list archives include extensible syntax, parsing risk, and id markers.]

$[emphasis :position beg :type bold :display "*"]bold text$[emphasis :position end :type bold :display "*"]

alternatively: $()...
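
a minimal sketch, in python, of how a third-party tool might read markers like the ones in the emphasis example. everything in it is an assumption made for illustration: the `$[name :key value ...]` shape is taken loosely from that example, the names `parse_marker`, `markers`, and `display_text` are hypothetical, and es has no settled low-level syntax.

```python
import re
import shlex

# assumed marker shape: $[name :key value ...]
# a double-quoted value may contain "]" without ending the marker
MARKER = re.compile(r'\$\[((?:[^\]"]|"[^"]*")*)\]')


def parse_marker(body):
    """split 'name :key value ...' into (name, {key: value})."""
    tokens = shlex.split(body)  # honours double-quoted values
    if not tokens:
        raise ValueError("empty marker")
    name, rest = tokens[0], tokens[1:]
    if len(rest) % 2:
        raise ValueError("keyword without a value in: " + body)
    props = {}
    for key, value in zip(rest[0::2], rest[1::2]):
        if not key.startswith(":"):
            raise ValueError("expected a :keyword, got: " + key)
        props[key[1:]] = value
    return name, props


def markers(text):
    """return every marker in text as a (name, props) pair."""
    return [parse_marker(m.group(1)) for m in MARKER.finditer(text)]


def display_text(text):
    """replace each marker by its :display value (empty if it has none)."""
    def shown(match):
        _, props = parse_marker(match.group(1))
        return props.get("display", "")
    return MARKER.sub(shown, text)


sample = ('$[emphasis :position beg :type bold :display "*"]bold text'
          '$[emphasis :position end :type bold :display "*"]')
print(display_text(sample))  # prints: *bold text*
```

a lisp-based tool would more likely hand the marker body to a lisp reader; the regexp and the `shlex` tokenizer only stand in for that here.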

other than the basics, such as sexp, i do NOT care about the details of the $[] low level syntax in general OR the arglist details in this particular case. those can change according to consensus or implementation needs etc. instead, it is getting the concept across that matters to me.

one key thing about es is that when we want a new feature, we do not need new org syntax for that new feature. OR for new subfeatures. we just do something like this:

$[extended-timestamp :whatever yes :displays-as interval]

or whatever. this has nothing to do with bold emphasis. it is an unrelated feature, using the same outer syntax.

another completely unrelated feature i'd strongly like, for emacs in general, is id markers. that too can be done with this syntax.

it looks verbose to 3rd party tools but is parseable by them. each marker in the emphasis example displays as * to the user. it is parseable as lisp sexp data using lisp tools. it is meant to be vaguely reminiscent of a cl function call while still not likely to occur naturally. it would of course not be typed by the user directly but by some completion thing.

i am not doing well so i am unlikely to be able to respond much or at all to queries. please take it easy on me if this rubs you the wrong way. it is just an idea and it does not have to be the answer. merely saying that, once implemented, it could solve this problem and ALSO later problems.

in fact, we discussed coloring of text using this syntax, although with various understandings of it. that's kinda similar to emphasis.

On 7/29/22, Ihor Radchenko wrote:
> Max Nikulin writes:
>
>>>> The good point in your patch is that \- still works as shy hyphen
>>>> (that, by the way, may be used in some cases instead of zero width
>>>> space: *intra*\-word). On the other hand I have managed to find a case
>>>> when your approach is not ideal:
>>>>
>>>> *\--scratch\--*
>>>>

>>>> ­-scratch

>>>
>>> Well. I think that it is impossible to use the same escape construct to
>>> both force emphasis and escape it.
>>
>> Let's articulate the problem as follows: when some characters ("*", "/",
>> etc.), besides being used literally, are overloaded with 2 additional
>> roles that are start emphasis group and terminate emphasis group, in
>> addition to lightweight markup heuristics, it is necessary to provide a
>> way to disambiguate which of 3 roles is associated with a particular
>> character.
>>
>> "Activate" and "deactivate" characters or entities for emphasis markers
>> are alternative and perhaps not so clear terms I have used before.
>>
>> The advantage of zero width space is that "[:space:]" is part of
>> PREMATCH and POSTMATCH (outer) regexps in
>> `org-emphasis-regexp-components' and "[:space:]" is forbidden at the
>> inner borders of emphasized span of text. The latter is mostly
>> meaningful, however I am unsure if bold space has the same width as
>> regular one, and space in fixed width font is certainly distinct.
>>
>> The problem with the "\--" entity is that it is not handled properly at
>> the start of emphasis region. It neither disables emphasis nor is parsed
>> as a complete entity, instead it becomes a combination of "\-" shy hyphen
>> and literal "-".
>>
>> Unsure if it can be solved consistently. Possible ways:
>> - In addition to the space-like (in respect to current regexp) entity add
>> another one that acts as a part of word, but like "\--" is stripped from
>> output. Likely it should be accompanied by more changes in the parser
>> and regexps.
>> - Provide some new explicit syntax for literal character, start of
>> emphasis group, end of emphasis group.
>
> The fact that \-- was not parsed in your example is because entities
> cannot be directly followed by a letter (see 12.4 Special Symbols).
>
> You need
>
> *\--{}scratch\--*
>
> Concerning the 3 listed roles of the *_/+ markup, I propose to simplify
> the problem a bit and not try to make \-- serve as a proper escape symbol.
> Instead, we can document the already existing quoting entities:
>
> ("slash" "/" nil "/" "/" "/" "/")
> ("plus" "+" nil "+" "+" "+" "+")
> ("under" "\\_" nil "_" "_" "_" "_")
> ("equal" "=" nil "=" "=" "=" "=")
> ("star" "\\star" t "*" "*" "*" "⋆")
>
> Then, your example should better be written as
>
> \star{}scratch\star
>
> \-- may better work between markup, not inside.
>
>> Concerning zero width space workaround, I may be wrong, but Nicolas
>> might consider using U+200B zero width space as the escape character for
>> itself: single one is filtered out during export, double zero width
>> space becomes single character. (I do not like this kind of "white
>> space" programming language.)
>
> This is too complex, IMHO.
> If desired, we can again go the entity road and introduce
> \zws entity.
>
> Note that we already have
>
> ("nbsp" "~" nil " " " " " " " ")
> ("ensp" "\\hspace*{.5em}" nil " " " " " " " ")
> ("emsp" "\\hspace*{1em}" nil " " " " " " " ")
> ("thinsp" "\\hspace*{.2em}" nil " " " " " " " ")
>
> Generally, it is a good idea to advertise entities in the manual.
> Zero-width space is not only limited, it is impossible to use, e.g. in
> tables when you want to quote "|". The only solution is using \vert or
> \vbar entity.
>
>> Another question is whether U+2060 word
>> joiner (or some other character) should be added either as alternative
>> to zero width space or to allow = verbatim = fixed width text
>> surrounded by fixed width spaces.
>
> This particular example is tricky.
> If we put escape symbol _inside_ the verbatim, it is never possible to
> know if the user intends to use that symbol literally or not.
> But non-space before/after opening/closing markup char is hard-coded and
> changing it is fragile.
>
> Instead of using some kind of "escape" symbol here, I suggest turning to
> the idea about inline special blocks. We can introduce a more verbose
> markup that will allow spaces inside at the beginning/end of the
> contents.
>
> https://orgmode.org/list/87a6b8pbhg.fsf@posteo.net
> Manuel Macías [ML:Org mode] (2022) About 'inline special blocks'
>
> Instead of using the tricky *bold text*, we may allow _*{bold text}*_ or
> something similar, with _name{...}name_ being inline special block.
>
> Best,
> Ihor
>

-- 
The Kafka Pandemic

A blog about science, health, human rights, and misopathy:
https://thekafkapandemic.blogspot.com