From: Juan Manuel Macías
To: orgmode
Subject: Re: A simple Lua filter for Pandoc
Date: Tue, 04 Jan 2022 15:06:16 +0000
Message-ID: <87o84r4k2f.fsf@posteo.net>
References: <875yqzu7rx.fsf@posteo.net>
In-Reply-To: (Max Nikulin's message of "Tue, 4 Jan 2022 21:05:54 +0700")

Max Nikulin writes:

> Ideally it should be done pandoc and only if it causes incorrect
> parsing of org markup. NBSP, probably, should be replaced by some
> exporters, I do not think, it is a problem e.g. in HTML files.

The reason for this filter is my own comfort. Linguistics texts contain
a lot of characters such as "/" or "*", and they are often italicized
or bold. So, in order not to be more confused than necessary (Org uses
those same characters as emphasis markers), I prefer that they pass as
entities.

In general, there are certain characters that I am more comfortable
working with as entities than as literal characters: for example, the
many zero-width combining diacritics that are used a lot in linguistics
or epigraphy. No font includes precomposed (NFC-normalized) glyphs for
all possible combinations; in fact, most of those combinations do not
exist in Unicode as precomposed characters, and would have to go to the
private use area. In short, I prefer that these characters get their
actual typographic representation only with LuaTeX. A very typical
example is the character U+0323 (COMBINING DOT BELOW). It is very
uncomfortable to work with /in situ/, although there are fonts that
usually render it well (through the OpenType 'mark' feature).

(Naturally, I have to make a lot of corrections to the italics later,
inside Org, due to the bad habit that Word users have of applying
direct formatting. Interestingly, only the pandoc docx reader trims the
emphasis before exporting to Org or Markdown, so as not to produce
things like "/ foo /". The odt reader doesn't. I don't know if I'm
missing something.)
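
To make the point about U+0323 a bit more concrete, here is a minimal
Python sketch (it is not the Lua filter itself, only an illustration).
It checks whether NFC has a precomposed form for a base letter plus the
dot below, and rewrites the combining mark as an entity. The entity name
\dotbelow{} and both function names are assumptions of mine: as far as I
know Org does not ship such an entity, so it would have to be defined by
the user, for example in org-entities-user.

```python
import unicodedata

DOT_BELOW = "\u0323"  # U+0323 COMBINING DOT BELOW

# Hypothetical entity name, assumed to be defined by the user;
# it is not one of Org's built-in entities as far as I know.
ENTITY = r"\dotbelow{}"


def has_precomposed_form(base: str) -> bool:
    """True if NFC folds base + dot below into a single code point."""
    return len(unicodedata.normalize("NFC", base + DOT_BELOW)) == 1


def dots_to_entities(text: str) -> str:
    """Replace every U+0323 with the entity, leaving other marks alone.

    The text is decomposed first so that precomposed letters such as
    U+1E0D also expose their dot, then recomposed afterwards.
    Sketch only: stacked diacritics on one base are not handled.
    """
    decomposed = unicodedata.normalize("NFD", text)
    replaced = decomposed.replace(DOT_BELOW, ENTITY)
    return unicodedata.normalize("NFC", replaced)


if __name__ == "__main__":
    for base in ("d", "α"):
        print(base, has_precomposed_form(base))
    print(dots_to_entities("\u1e0d"))
```

Latin "d" has a precomposed dotted form, but Greek "α" does not, which
is the usual situation with dotted letters in epigraphy. After the
rewrite the Org source contains only the base letter and the entity, so
the combining mark never has to be edited directly.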