From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#41506: 28.0.50; RTL problem
Date: Sat, 06 Jun 2020 16:45:38 +0300
Message-ID: <83lfl08fzx.fsf@gnu.org>
References: <838shhxuff.fsf@gnu.org> <83tuztctpk.fsf@gnu.org> <83ftbdcmm3.fsf@gnu.org> <87k10kzkv3.fsf@gmail.com> <833678a8xx.fsf@gnu.org> <871rmsz6mw.fsf@gmail.com>
In-Reply-To: <871rmsz6mw.fsf@gmail.com> (message from Pip Cet on Sat, 06 Jun 2020 13:05:43 +0000)
To: Pip Cet
Cc: 41506@debbugs.gnu.org

> From: Pip Cet
> Cc: 41506@debbugs.gnu.org
> Date: Sat, 06 Jun 2020 13:05:43 +0000
>
> >> + paragraph might start. But don't do that for the first
> >> + element since this function will be called twice in that
> >> + case. */
> >
> > Which code causes the two calls, and why is that significant in this
> > case?
>
> Maybe this code would be clearer:
>
> if (!bidi_it->first_elt)
>   {
>     bytepos++;
>     pos++;
>   }

Could be, let's see what is the conclusion of this discussion.

> In the "\n\nש" case, this happens:
>
> 1. bidi_paragraph_init is called with first_elt = 1 at buffer position 1
> 2. new_paragraph is cleared to false
> 3. bidi_at_paragraph_end is called for buffer position 2. That looks
> like a line ending a paragraph, though it's actually a line starting the
> next paragraph. Still, it returns true.
> 4. new_paragraph is set again
> 5. bidi_paragraph_init is called with first_elt = 0 at buffer position 1

A minor correction to item 3: the second newline in this example is
handled as belonging to the previous paragraph.  You can see that by
examining the behavior of the RIGHT and LEFT arrow keys: they behave
differently in R2L and L2R paragraphs.

> What I'm not sure about is "\n \nש". It could be either a single
> two-line paragraph followed by ש, or a single-character paragraph
> followed by another paragraph whose first line happens to contain only a
> space character; in the first case, paragraph orientation would default
> to L2R, in the second case, it would be R2L. Do you happen to know what
> Unicode says for this case?

It's not Unicode in this case, it's Emacs.

If UAX#9 is read and followed strictly, then each \n ends a paragraph
and begins a new one.  IOW, every physical line is a separate
paragraph.  This is a direct consequence of Newline's bidi class being
B (paragraph separator):

  (get-char-code-property #x0a 'bidi-class) => B

(as mandated by 3.2 in UAX#9), and of rules P1--P3 in UAX#9.

However, since in Emacs the usual case is that hard newlines are used
to fill text, the default UAX#9 behavior would make no sense, as a
line that happens to start with an R2L character would be rendered
right-to-left, even if the previous line wasn't.  It would produce a
randomly jagged display of paragraphs that mix L2R and R2L characters
just because a line was broken in a different place by filling.

So we use the "higher-level protocols" fire escape (see 4.3 in UAX#9)
and define a "paragraph" differently, for the purposes of base
paragraph direction: by default, we require that paragraphs be
separated by empty lines, see bidi-paragraph-separate-re.  Thus, the
above example by default treats the " \n" line as a paragraph
separator, and the ש after it as the start of a new paragraph.

(For completeness, we do support the strict interpretation of UAX#9:
if you set both bidi-paragraph-start-re and bidi-paragraph-separate-re
to "^", you get that.  Any code changes we come up with here must
therefore be tested at least with those settings as well.)
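
A minimal sketch of how such a test could be set up from Lisp follows.
It assumes an Emacs in which both variables can be set buffer-locally;
the helper name my-bidi-strict-direction is hypothetical, not
something in Emacs, and no particular return value is claimed here,
since that is exactly what any code change would have to be checked
against.

```elisp
;; Hypothetical helper (not part of Emacs): report the base paragraph
;; direction on the last line of TEXT under the strict UAX#9 rules.
(defun my-bidi-strict-direction (text)
  "Return the paragraph direction on the last line of TEXT.
Both bidi paragraph regexps are set to \"^\", so every physical
line is treated as a separate paragraph."
  (with-temp-buffer
    ;; Assumption: setting these buffer-locally is enough for the
    ;; bidi code to pick them up in this temporary buffer.
    (setq-local bidi-paragraph-start-re "^")
    (setq-local bidi-paragraph-separate-re "^")
    (insert text)
    ;; Point is at the end of the last line; move to its beginning.
    (beginning-of-line)
    (current-bidi-paragraph-direction)))

;; The two examples discussed in this thread:
(my-bidi-strict-direction "\n\nש")
(my-bidi-strict-direction "\n \nש")
```

Evaluating the same two strings without the two setq-local forms gives
the default, empty-line-separated behavior to compare against.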