From: Teemu Likonen
To: tomi.ollila@iki.fi, notmuch@notmuchmail.org
Subject: Sanitize bidi control chars
In-Reply-To: <87v9hqv4a0.fsf@iki.fi>
References: <874kpfq14z.fsf@iki.fi> <20200807044641.3745-1-tlikonen@iki.fi> <87v9hqv4a0.fsf@iki.fi>
Date: Mon, 10 Aug 2020 21:27:59 +0300
Message-ID: <87sgcuuzio.fsf@iki.fi>
List-Id: "Use and development of the notmuch mail system."

* 2020-08-10 19:45:11+03, Teemu Likonen wrote:

> If we wanted to clean message headers from possible unpaired overrides
> we should clean all these:
>
>     U+202A LEFT-TO-RIGHT EMBEDDING (push)
>     U+202B RIGHT-TO-LEFT EMBEDDING (push)
>     U+202C POP DIRECTIONAL FORMATTING (pop)
>     U+202D LEFT-TO-RIGHT OVERRIDE (push)
>     U+202E RIGHT-TO-LEFT OVERRIDE (push)
>
> Or we could even try to be clever and count those characters and then
> insert or remove some of them so that there are as many "push"
> characters as "pop" characters.

Below is an example Emacs Lisp function that balances those "push" and
"pop" bidi control chars. This kind of code could be used to sanitize
message headers or any arbitrary text coming from a user. I'm not even
sure whether such a thing should be done in Emacs or in lower-level
Notmuch code.

Anyway, I tried adding it to the notmuch-sanitize function. With that
change, Tomi's message no longer switched the direction of other text
(in a notmuch-search-mode buffer).
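To make the counting idea from the quoted text concrete before the
balancing function itself, here is a smaller sketch that only reports
how unbalanced a string is and does not change it. The name
my-bidi-ctrl-imbalance is made up for this illustration and is not part
of Notmuch; the sketch assumes only cl-lib and considers the same five
characters listed above.

```elisp
(require 'cl-lib)

(defun my-bidi-ctrl-imbalance (string)
  "Return (UNMATCHED-PUSHES . STRAY-POPS) for bidi controls in STRING.
Hypothetical helper for illustration only; not part of Notmuch."
  (let ((open 0)    ; pushes still waiting for a matching pop
        (stray 0))  ; pops seen while nothing was open
    (cl-loop for char across string
             do (cond
                 ;; U+202A, U+202B, U+202D, U+202E: "push" characters.
                 ((memq char '(#x202a #x202b #x202d #x202e))
                  (cl-incf open))
                 ;; U+202C POP DIRECTIONAL FORMATTING.
                 ((eq char #x202c)
                  (if (cl-plusp open)
                      (cl-decf open)
                    (cl-incf stray)))))
    (cons open stray)))

;; One unmatched RIGHT-TO-LEFT OVERRIDE, no stray pops.
(my-bidi-ctrl-imbalance (string #x202e ?a ?b ?c))   ; => (1 . 0)
```

A result of (0 . 0) means the string needs no balancing; any other
result means pops would have to be appended or stray pops dropped.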
    (require 'cl-lib)
    (require 'seq)

    (defun notmuch-balance-bidi-ctrl-chars (string)
      "Return STRING with its bidi \"push\" and \"pop\" control chars balanced.
    Pop chars that have no matching push are dropped, and a pop is
    appended for every push that is still open at the end."
      (let ((new nil)
            (stack-count 0))
        (cl-flet ((push-char-p (c)
                    ;; U+202A LEFT-TO-RIGHT EMBEDDING
                    ;; U+202B RIGHT-TO-LEFT EMBEDDING
                    ;; U+202D LEFT-TO-RIGHT OVERRIDE
                    ;; U+202E RIGHT-TO-LEFT OVERRIDE
                    (cl-find c '(?\x202a ?\x202b ?\x202d ?\x202e)))
                  (pop-char-p (c)
                    ;; U+202C POP DIRECTIONAL FORMATTING
                    (eql c ?\x202c)))
          (cl-loop for char across string
                   do (cond ((push-char-p char)
                             (cl-incf stack-count)
                             (push char new))
                            ((and (pop-char-p char)
                                  (cl-plusp stack-count))
                             (cl-decf stack-count)
                             (push char new))
                            ((and (pop-char-p char)
                                  (not (cl-plusp stack-count)))
                             ;; The stack is empty. Ignore this pop char.
                             )
                            (t (push char new)))))
        ;; Add missing pops.
        (cl-loop repeat stack-count
                 do (push ?\x202c new))
        ;; NEW was built in reverse; restore the order and make a string.
        (seq-into (nreverse new) 'string)))

-- 
/// Teemu Likonen - .-.. http://www.iki.fi/tlikonen/
// OpenPGP: 4E1055DC84E9DFF613D78557719D69D324539450