From mboxrd@z Thu Jan 1 00:00:00 1970
From: Teemu Likonen <tlikonen@iki.fi>
To: notmuch@notmuchmail.org
Subject: [PATCH 1/2] Emacs: Add a new function for balancing bidi control chars
Date: Sat, 15 Aug 2020 12:30:35 +0300
Message-Id: <20200815093036.5930-2-tlikonen@iki.fi>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20200815093036.5930-1-tlikonen@iki.fi>
References: <20200815093036.5930-1-tlikonen@iki.fi>
MIME-Version: 1.0
CC: tomi.ollila@iki.fi
List-Id: "Use and development of the notmuch mail system."
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

The following Unicode bidirectional control chars are modal so that
they push a new bidirectional rendering mode to a stack:

  U+202A LEFT-TO-RIGHT EMBEDDING
  U+202B RIGHT-TO-LEFT EMBEDDING
  U+202D LEFT-TO-RIGHT OVERRIDE
  U+202E RIGHT-TO-LEFT OVERRIDE

Every mode must be terminated with the character U+202C POP
DIRECTIONAL FORMATTING, which pops the mode from the stack. The stack
is per paragraph: a new text paragraph resets the rendering mode
changed by these control characters.

This change adds a new function "notmuch-balance-bidi-ctrl-chars"
which reads its STRING argument and ensures that every push character
(U+202A, U+202B, U+202D, U+202E) has a matching pop character
(U+202C). The function may append U+202C characters to the end of the
returned string, or it may drop U+202C characters that have no
matching push character. The returned string is safe in the sense
that it won't change the surrounding bidirectional rendering mode.
This function should be used when sanitizing arbitrary input.
---
 emacs/notmuch-lib.el | 54 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 118faf1e..e6252c6c 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -469,6 +469,60 @@ be displayed."
         "[No Subject]"
       subject)))
 
+
+(defun notmuch-balance-bidi-ctrl-chars (string)
+  "Balance bidirectional control chars in STRING.
+
+The following Unicode bidirectional control chars are modal so
+that they push a new bidirectional rendering mode to a stack:
+U+202A LEFT-TO-RIGHT EMBEDDING, U+202B RIGHT-TO-LEFT EMBEDDING,
+U+202D LEFT-TO-RIGHT OVERRIDE and U+202E RIGHT-TO-LEFT OVERRIDE.
+Every mode must be terminated with the character U+202C POP
+DIRECTIONAL FORMATTING which pops the mode from the stack. The
+stack is per paragraph. A new text paragraph resets the rendering
+mode changed by these control characters.
+
+This function reads the STRING argument and ensures that all push
+characters (U+202A, U+202B, U+202D, U+202E) have a pop character
+pair (U+202C). The function may add more U+202C characters at the
+end of the returned string, or it may remove some U+202C
+characters. The returned string is safe in the sense that it
+won't change the surrounding bidirectional rendering mode. This
+function should be used when sanitizing arbitrary input."
+
+  (let ((new-string nil)
+        (stack-count 0))
+
+    (cl-flet ((push-char-p (c)
+                ;; U+202A LEFT-TO-RIGHT EMBEDDING
+                ;; U+202B RIGHT-TO-LEFT EMBEDDING
+                ;; U+202D LEFT-TO-RIGHT OVERRIDE
+                ;; U+202E RIGHT-TO-LEFT OVERRIDE
+                (cl-find c '(?\u202a ?\u202b ?\u202d ?\u202e)))
+              (pop-char-p (c)
+                ;; U+202C POP DIRECTIONAL FORMATTING
+                (eql c ?\u202c)))
+
+      (cl-loop for char across string
+               do (cond ((push-char-p char)
+                         (cl-incf stack-count)
+                         (push char new-string))
+                        ((and (pop-char-p char)
+                              (cl-plusp stack-count))
+                         (cl-decf stack-count)
+                         (push char new-string))
+                        ((and (pop-char-p char)
+                              (not (cl-plusp stack-count)))
+                         ;; The stack is empty. Ignore this pop character.
+                         )
+                        (t (push char new-string)))))
+
+    ;; Add possible missing pop characters.
+    (cl-loop repeat stack-count
+             do (push ?\u202c new-string))
+
+    (seq-into (nreverse new-string) 'string)))
+
 (defun notmuch-sanitize (str)
   "Sanitize control character in STR.
 
-- 
2.20.1
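
For illustration only, the balancing rule described in the commit
message can be restated as a stand-alone sketch. The name
"example-balance-bidi" is hypothetical and not part of the patch, and
the sketch uses dolist and concat where the patch uses cl-loop and
seq-into. It is meant to show the intended behaviour, not to replace
the patched function.

```elisp
;;; Hypothetical sketch, not part of the patch.
(require 'cl-lib)

(defun example-balance-bidi (string)
  "Return STRING with bidi push and pop control chars balanced.
Hypothetical restatement of the algorithm in this patch."
  (let ((depth 0)
        (chars nil))
    (dolist (c (string-to-list string))
      (cond ((memq c '(?\u202a ?\u202b ?\u202d ?\u202e))
             ;; Push character: keep it and count the open mode.
             (cl-incf depth)
             (push c chars))
            ((eq c ?\u202c)
             ;; Pop character: keep it only if a mode is open.
             (when (> depth 0)
               (cl-decf depth)
               (push c chars)))
            (t (push c chars))))
    ;; Close every mode that is still open at the end.
    (concat (nreverse chars) (make-string depth ?\u202c))))

;; Unclosed U+202E override: one U+202C is appended.
(example-balance-bidi "a\u202eb")
;; Stray U+202C with nothing to pop: it is dropped.
(example-balance-bidi "a\u202cb")
```

Counting the open modes is enough here because U+202C closes whichever
mode was pushed last, so the kind of each pushed mode does not need to
be tracked.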