From: Teemu Likonen <tlikonen@iki.fi>
To: tomi.ollila@iki.fi, notmuch@notmuchmail.org
Subject: Sanitize bidi control chars
Date: Mon, 10 Aug 2020 21:27:59 +0300
Message-ID: <87sgcuuzio.fsf@iki.fi>
In-Reply-To: <87v9hqv4a0.fsf@iki.fi>


* 2020-08-10 19:45:11+03, Teemu Likonen wrote:

> If we wanted to clean message headers from possible unpaired overrides
> we should clean all these:
>
>     U+202A LEFT-TO-RIGHT EMBEDDING (push)
>     U+202B RIGHT-TO-LEFT EMBEDDING (push)
>     U+202C POP DIRECTIONAL FORMATTING (pop)
>     U+202D LEFT-TO-RIGHT OVERRIDE (push)
>     U+202E RIGHT-TO-LEFT OVERRIDE (push)
>
> Or we could even try to be clever and count those characters and then
> insert or remove some of them so that there are as many "push"
> characters as "pop" characters.

Below is an example Emacs Lisp function that balances those "push" and
"pop" bidi control chars. This kind of code could be used to sanitize
message headers or any arbitrary text coming from a user.

I'm not sure whether this should be done in Emacs or in lower-level
Notmuch code. Anyway, I tried adding it to the notmuch-sanitize
function. With that change, Tomi's message no longer switches the
direction of other text (in a notmuch-search-mode buffer).
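
For anyone who wants to try this without editing Notmuch's source, one
option is to advise notmuch-sanitize so that its return value is passed
through the balancing function. This is only a sketch of an alternative
to patching the function directly. It assumes that notmuch-sanitize
takes a string and returns a string, and that
notmuch-balance-bidi-ctrl-chars (the function in this message) has been
evaluated before notmuch-sanitize is called.

```elisp
;; Sketch: filter the return value of `notmuch-sanitize' through the
;; balancing function instead of patching `notmuch-sanitize' itself.
;; Assumes `notmuch-balance-bidi-ctrl-chars' is defined by the time
;; `notmuch-sanitize' is actually called.
(advice-add 'notmuch-sanitize :filter-return
            #'notmuch-balance-bidi-ctrl-chars)

;; To undo the experiment:
;; (advice-remove 'notmuch-sanitize #'notmuch-balance-bidi-ctrl-chars)
```

The advice can be removed again at any time, which makes it easy to
compare the display with and without balancing.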


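(require 'cl-lib)
(require 'seq)

(defun notmuch-balance-bidi-ctrl-chars (string)
  "Return a copy of STRING with balanced bidi control characters.
Drop each pop character that has no matching push, and append a
pop character for every push that is left open."
  (let ((new nil)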
        (stack-count 0))

    (cl-flet ((push-char-p (c)
                ;; U+202A LEFT-TO-RIGHT EMBEDDING
                ;; U+202B RIGHT-TO-LEFT EMBEDDING
                ;; U+202D LEFT-TO-RIGHT OVERRIDE
                ;; U+202E RIGHT-TO-LEFT OVERRIDE
                (cl-find c '(?\x202a ?\x202b ?\x202d ?\x202e)))
              (pop-char-p (c)
                ;; U+202C POP DIRECTIONAL FORMATTING
                (eql c ?\x202c)))

      (cl-loop
       for char across string
       do (cond ((push-char-p char)
                 (cl-incf stack-count)
                 (push char new))
                ((and (pop-char-p char)
                      (cl-plusp stack-count))
                 (cl-decf stack-count)
                 (push char new))
                ((and (pop-char-p char)
                      (not (cl-plusp stack-count)))
                 ;; The stack is empty. Ignore this pop char.
                 )
                (t (push char new)))))

    ;; Add missing pops.
    (cl-loop
     repeat stack-count
     do (push ?\x202c new))

    (seq-into (nreverse new) 'string)))
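
To check the result of the function above, a small predicate can
confirm that a string has no unmatched push or pop characters. This is
a hypothetical helper for experimenting, not part of Notmuch; the name
my/bidi-balanced-p is made up for this example.

```elisp
(defun my/bidi-balanced-p (string)
  "Return non-nil if the bidi push and pop chars in STRING are balanced.
Hypothetical helper for testing; not part of Notmuch."
  (let ((depth 0)
        (ok t))
    (dolist (char (string-to-list string))
      (cond
       ;; U+202A, U+202B, U+202D, U+202E open a new level.
       ((memql char '(#x202a #x202b #x202d #x202e))
        (setq depth (1+ depth)))
       ;; U+202C closes a level; with nothing open it is unmatched.
       ((eql char #x202c)
        (if (> depth 0)
            (setq depth (1- depth))
          (setq ok nil)))))
    ;; Balanced means no stray pop and no level left open.
    (and ok (zerop depth))))

;; A string like the name in Tomi's message, with an open override:
(my/bidi-balanced-p "T\u202eomi")        ; not balanced
(my/bidi-balanced-p "T\u202eomi\u202c")  ; balanced

;; If the function from this message is loaded, its output should
;; always satisfy the predicate:
(when (fboundp 'notmuch-balance-bidi-ctrl-chars)
  (my/bidi-balanced-p (notmuch-balance-bidi-ctrl-chars "T\u202eomi")))
```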



-- 
/// Teemu Likonen - .-.. http://www.iki.fi/tlikonen/
// OpenPGP: 4E1055DC84E9DFF613D78557719D69D324539450
