* 2020-08-16 19:28:51+03, Tomi Ollila wrote:

> Good stuff -- implementation looks like port of the php code in
>
>     https://www.iamcal.com/understanding-bidirectional-text
>
> to emacs lisp... anyway nice implementation, took me a bit of time to
> understand it...

I don't read PHP and didn't try to read that code at all, but the idea
is simple enough.

> thoughts
>
> - is it slow to execute it always, pure lisp implementation;
>   (string-match "[\u202a-\u202e]") could be done before that.
>   (if it were executed often could loop with `looking-at`
>   (and then moving point based on match-end) be faster...

I don't see any speed issues, but if we wanted to optimize I would
create a new sanitize function which walks across the characters just
once, without using regular expressions. Currently, though, I think
that would be unnecessary micro-optimization.

> - *but* adding U+202C's in `notmuch-sanitize` is doing it too early, as
>   some functions truncate the strings afterwards if those are too long
>   (e.g. `notmuch-search-insert-authors`) so those get lost..

Good point. This means that we shouldn't do the bidi control character
balancing in `notmuch-sanitize`. Instead, we should call the new
`notmuch-balance-bidi-ctrl-chars` function in various places: before
inserting arbitrary strings into a buffer and before combining such
strings with other strings.

> (what I noticed when looking at `notmuch-search-insert-authors` is
> that it uses `length` to check the length of a string -- but that
> also counts these bidi mode changing "characters" (as one char each).
> `string-width` would be better there -- and probably in many other
> places.)

Yes, definitely `string-width` when truncation is based on display
width and when using a tabular format in buffers. With that function
zero-width characters really have no width.

-- 
/// Teemu Likonen - .-.. http://www.iki.fi/tlikonen/
// OpenPGP: 4E1055DC84E9DFF613D78557719D69D324539450
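P.S. The balancing idea discussed above could be sketched roughly like
this. This is a hypothetical illustration only, not the actual notmuch
code, and the function name `my-balance-bidi-ctrl-chars` is made up
(the proposed notmuch function is `notmuch-balance-bidi-ctrl-chars`):

```elisp
;; Sketch: walk the string once, count bidi embedding/override openers
;; (U+202A LRE, U+202B RLE, U+202D LRO, U+202E RLO) that are not yet
;; closed by U+202C (PDF), then append the missing U+202C characters.
(defun my-balance-bidi-ctrl-chars (string)
  "Return STRING with unclosed bidi control characters closed.
U+202A, U+202B, U+202D and U+202E each open an embedding or
override level; U+202C pops one level.  Append enough U+202C
characters so that every opener is closed."
  (let ((depth 0))
    (dolist (char (string-to-list string))
      (cond ((memq char '(?\u202a ?\u202b ?\u202d ?\u202e))
             (setq depth (1+ depth)))
            ((and (eq char ?\u202c) (> depth 0))
             (setq depth (1- depth)))))
    (concat string (make-string depth ?\u202c))))
```

Because this walks the characters directly, it also avoids the
regular-expression pass mentioned earlier; calling it just before
inserting or truncating a string keeps the appended U+202C's from
being cut off.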