Eli Zaretskii <eliz@gnu.org> writes:

>> From: Pip Cet <pipcet@gmail.com>
>> Cc: 41506@debbugs.gnu.org
>> Date: Sat, 06 Jun 2020 07:58:24 +0000
>> 
>> when we're called with bidi_it->first_elt = true, it's possible we
>> shouldn't touch bidi_it->new_paragraph at all...
>
> Can you elaborate on why you think that?

Sorry, I shouldn't have said "touch" there. I meant "set", though I no
longer think that's right either.

> first_elt can be set when we are at the beginning of a paragraph or
> when we are in the middle of it, so its meaning is different from that
> of new_paragraph.

Indeed.

>> +	 paragraph might start.  But don't do that for the first
>> +	 element since this function will be called twice in that
>> +	 case.  */
>
> Which code causes the two calls, and why is that significant in this
> case?

Maybe this code would be clearer:

      if (!bidi_it->first_elt)
	{
	  bytepos++;
	  pos++;
	}

We always look at the paragraph containing the next character to be
loaded by bidi_level_of_next_char. If first_elt is set, that is the
current character; otherwise, it's the one after that.

In the "\n\nש" case, this happens:

1. bidi_paragraph_init is called with first_elt = 1 at buffer position 1
2. new_paragraph is cleared to false
3. bidi_at_paragraph_end is called for buffer position 2. That looks
like a line ending a paragraph, though it's actually a line starting the
next paragraph. Still, it returns true.
4. new_paragraph is set again
5. bidi_paragraph_init is called with first_elt = 0 at buffer position 1

So everything happens to work in this case, even though several of the
assumptions in the bidi code are violated.  The code is written to
assume paragraphs contain at least two characters: that assumption means
it's valid for bidi_paragraph_init to clear new_paragraph. In this case,
the assumption doesn't hold, but the next line we're looking at, while
not actually ending a paragraph, looks like it does...
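
To make the sequence above easier to follow, here is a toy model of it.
This is only a sketch, not the code in bidi.c: struct toy_it, toy_text,
and every toy_* function are made up for illustration, 'R' stands in for
ש so the example stays ASCII, and the "empty line looks like a paragraph
end" test is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the iterator state; NOT the real struct bidi_it.  */
struct toy_it
{
  int pos;		/* 1-based buffer position */
  bool first_elt;
  bool new_paragraph;
};

/* What the toy run records at each step.  */
struct toy_trace
{
  int first_lookup;	/* position examined by the first init call */
  bool after_init;	/* new_paragraph after the first init call */
  bool after_end_check;	/* new_paragraph after the paragraph-end check */
  int second_lookup;	/* position examined by the second init call */
};

/* "\n\nR": 'R' is a placeholder for the Hebrew letter.  */
static const char toy_text[] = "\n\nR";

static char
toy_char_at (int pos)
{
  assert (pos >= 1 && (size_t) pos < sizeof toy_text);
  return toy_text[pos - 1];
}

/* Assumption: a line that is empty "looks like" a paragraph end.  */
static bool
toy_at_paragraph_end (int pos)
{
  return toy_char_at (pos) == '\n';
}

/* Clear new_paragraph and return the position whose paragraph is
   examined: the current character if first_elt is set, otherwise the
   one after it.  */
static int
toy_paragraph_init (struct toy_it *it)
{
  it->new_paragraph = false;
  return it->first_elt ? it->pos : it->pos + 1;
}

static struct toy_trace
toy_run (void)
{
  struct toy_trace t;
  struct toy_it it = { 1, true, true };

  t.first_lookup = toy_paragraph_init (&it);	/* steps 1 and 2 */
  t.after_init = it.new_paragraph;

  if (toy_at_paragraph_end (2))			/* steps 3 and 4 */
    it.new_paragraph = true;
  t.after_end_check = it.new_paragraph;

  it.first_elt = false;
  t.second_lookup = toy_paragraph_init (&it);	/* step 5 */
  return t;
}
```

Calling toy_run () records that the first call examines position 1 and
leaves new_paragraph cleared, that the check at position 2 sets it
again, and that the second call examines position 2.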

What I'm not sure about is "\n \nש". It could be either a single
two-line paragraph followed by ש, or a single-character paragraph
followed by another paragraph whose first line happens to contain only a
space character; in the first case, paragraph orientation would default
to L2R, in the second case, it would be R2L. Do you happen to know what
Unicode says for this case?
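
To spell out the two readings, here is a toy sketch that computes the
orientation of the paragraph containing the space-only line under each
of them.  It doesn't say which reading is right; it only shows what each
one implies.  The names are made up, 'R' stands in for ש, and "first
strong character wins, default L2R" is the only rule modelled.

```c
#include <assert.h>
#include <stddef.h>

enum toy_dir { TOY_L2R, TOY_R2L };

/* "\n \nR": 'R' is a placeholder for the Hebrew letter.  */
static const char toy_sample[] = "\n \nR";

/* Orientation of the span [start, end): decided by the first strong
   character, defaulting to L2R when there is none.  Only 'R' (strong
   R2L) and a-z (strong L2R) count as strong in this toy model.  */
static enum toy_dir
toy_base_dir (const char *text, size_t start, size_t end)
{
  assert (start <= end);
  for (size_t i = start; i < end; i++)
    {
      if (text[i] == 'R')
	return TOY_R2L;
      if (text[i] >= 'a' && text[i] <= 'z')
	return TOY_L2R;
    }
  return TOY_L2R;
}

/* Reading 1: "\n \n" is one two-line paragraph, R follows it.  */
static enum toy_dir
toy_reading_one (void)
{
  return toy_base_dir (toy_sample, 0, 3);
}

/* Reading 2: "\n" is a paragraph by itself, and " \nR" is the next
   paragraph, whose first line contains only a space.  */
static enum toy_dir
toy_reading_two (void)
{
  return toy_base_dir (toy_sample, 1, 4);
}
```

Under the first reading the paragraph has no strong character and falls
back to L2R; under the second, it picks up R2L from the placeholder
letter.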