From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
Date: Thu, 29 Jun 2017 17:49:24 +0300
Message-ID: <83injergqz.fsf@gnu.org>
To: Itai Berli
Cc: 27526@debbugs.gnu.org
In-reply-to: (message from Itai Berli on Thu, 29 Jun 2017 12:16:00 +0300)

> From: Itai Berli
> Date: Thu, 29 Jun 2017 12:16:00 +0300
>
> I'll repeat: according to Unicode a paragraph ends with a paragraph
> separator. What constitutes a paragraph separator is specified precisely
> in section 5.8 'Newline Guidelines' of The Unicode Standard version
> 8.0.0. For instance, on a MacOS X system, it is `LF` (line feed,
> Unicode 000A). The formatting effects of the bidi algorithm must not
> cross the paragraph separator boundary.
>
> And yet in Emacs the formatting extends beyond the paragraph separator,
> and this is the case on all operating systems. Consider, for instance,
> the following example.

The UBA allows applications to employ "higher-level protocols" when
deciding on base paragraph direction.  See section 4.3 in UAX#9 and
specifically clause HL1 there.  This is what Emacs does: it applies
its own heuristics for this decision.

The reason is that Emacs's implementation of the UBA must work
reasonably well in plain-text buffers, where long paragraphs are
typically broken into lines by newline characters (which are paragraph
separators according to the UBA).  Often the partition into lines is
done by auto-fill or similar features, which makes the first character
of the next line fairly arbitrary.  Using the UBA's own determination
of paragraph direction would then produce unacceptable results: the
direction of a part of a paragraph could change in unpredictable ways
whenever the text is refilled.

> Consider, for instance, a LaTeX document for typesetting Hebrew
> text. Normally in order to eliminate the usual leading indentation of
> the first line of a paragraph, a `\noindent` command is placed at the
> beginning of the paragraph.
> However, because the Unicode bidi algorithm
> determines the directionality of a paragraph based on its first word, the
> Hebrew text is formatted like English text. This is not a problem; it is
> to be expected.

The Emacs bidirectional display doesn't have special facilities for
marked-up text, such as TeX and HTML/XML.  Because those languages use
punctuation characters for their markup, displaying them in an RTL
context can produce unpleasant results in the default display, as you
point out.

You can alleviate this to some extent by (in your case) starting the
paragraph with an RLM control character before \noindent, and
optionally either following the RLM with an LRM or enclosing \noindent
in LRE..PDF (so that the backslash displays to the left of
"noindent").  This is admittedly a bit awkward, but I think the
results are still acceptable.

I will gladly work with anyone who'd volunteer to introduce the
features required to better support markup languages.  This will
require low-level display changes and some support from the relevant
major modes to use those features.  So far, demand has been low enough
(I think you are about the second person to raise the issue since
bidirectional display debuted in Emacs 24.1) that this issue has
stayed low on our TODO list.

> One way to resolve this is to explicitly change the directionality of the
> paragraph, however, disregarding the fact that this is not currently
> possible due to a separate Emacs bug, even if it were possible, it would
> affect the placement of the backslash at the beginning of the
> `\noindent` command, which will no longer look like a LaTeX command.

I think my suggestion above fixes this latter issue as well.

Thanks.
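
P.S. To illustrate the two points above, here is a rough Python sketch
(not Emacs code) of the UBA's first-strong-character rule for base
paragraph direction (rules P2 and P3 in UAX#9).  The function name
base_direction and the sample strings are made up for this example,
and the sketch ignores the isolate-skipping part of P2.  It only shows
why treating every newline as a paragraph separator makes the
direction of a refilled line depend on whichever word happens to start
it, and why a leading RLM changes the outcome for the \noindent case.

```python
import unicodedata

def base_direction(text, default="L"):
    """Return 'L' or 'R' based on the first strong character in TEXT.

    Simplified form of UBA rules P2/P3: text inside isolates, which
    P2 skips, is not handled specially here.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "L"
        if bidi_class in ("R", "AL"):
            return "R"
    return default  # no strong character found

RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

hebrew = "\u05e9\u05dc\u05d5\u05dd"  # a sample Hebrew word

# The LaTeX case: the first strong character is the Latin "n".
plain = "\\noindent " + hebrew
# The workaround: RLM first, then \noindent wrapped in LRE..PDF.
marked = RLM + LRE + "\\noindent" + PDF + " " + hebrew

# A filled RTL paragraph whose second line happens to start with a
# Latin word after refilling.
filled = hebrew + " " + hebrew + "\n" + "Emacs " + hebrew
per_line = [base_direction(line) for line in filled.split("\n")]
whole = base_direction(filled)

print(base_direction(plain))   # L
print(base_direction(marked))  # R
print(per_line)                # ['R', 'L']
print(whole)                   # R
```

The last two results show the refilling problem: judged line by line,
the second line flips to left-to-right, while judging the lines as one
unit keeps the whole text right-to-left.  Grouping the lines this way
is only a stand-in for a higher-level protocol; it is not a
description of the exact heuristics Emacs uses.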