From mboxrd@z Thu Jan 1 00:00:00 1970
From: Itai Berli
Newsgroups: gmane.emacs.bugs
Subject: bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
Date: Thu, 29 Jun 2017 12:16:00 +0300
To: 27526@debbugs.gnu.org
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
X-GNU-PR-Message: report 27526
X-GNU-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:134017

According to the Emacs manual (section 37.26 Bidirectional Display)

> Emacs provides a “Full Bidirectionality” class implementation of the
> UBA, consistent with the requirements of the Unicode Standard v8.0.

And again (section 22.19 Bidirectional Editing)

> Emacs implements the Unicode Bidirectional Algorithm described in the
> Unicode Standard Annex #9, for reordering of bidirectional text for
> display.

However, these statements are false. Emacs does not implement the
Unicode Bidirectional Algorithm correctly, and therefore does not even
provide 'Implicit bidirectionality', which is the minimal level of
conformance listed in section 4.2 'Explicit Formatting Characters' of
the Unicode 8.0.0 Bidirectional Algorithm specification
(www.unicode.org/reports/tr9/tr9-33.html), let alone 'Full
bidirectionality'.

The reason has to do with the way the Emacs bidi implementation
recognizes separate paragraphs, which is inconsistent with the Unicode
specification.

The Unicode Bidirectional Algorithm specifies (section 3 'Basic Display
Algorithm')

> The algorithm reorders text only within a paragraph; characters in one
> paragraph have no effect on characters in a different
> paragraph.
> Paragraphs are divided by the Paragraph Separator or
> appropriate Newline Function (for guidelines on the handling of CR,
> LF, and CRLF, see Section 4.4, Directionality, and Section 5.8,
> Newline Guidelines of [Unicode]).

However, Emacs, by its own admission (section 22.19 Bidirectional
Editing), takes the following approach:

> Paragraph boundaries are empty lines, i.e., lines consisting entirely
> of whitespace characters.

I'll repeat: according to Unicode, a paragraph ends with a paragraph
separator. What constitutes a paragraph separator is specified
precisely in section 5.8 'Newline Guidelines' of The Unicode Standard
version 8.0.0. For instance, on a Mac OS X system, it is `LF` (line
feed, U+000A).

The formatting effects of the bidi algorithm must not cross the
paragraph separator boundary. And yet in Emacs the formatting extends
beyond the paragraph separator, and this is the case on all operating
systems. Consider, for instance, the following example.

ILLUSTRATION: An English paragraph directly following a Hebrew
paragraph is formatted like Hebrew text.
http://imgur.com/3eyrUfA

The first, Hebrew paragraph is formatted correctly. The second, English
paragraph, however, is formatted wrongly, as though it were a Hebrew
paragraph: it is right-justified, the question mark appears on the
left, and so does the cursor.

Once an empty paragraph is inserted between the two paragraphs, the
English paragraph is formatted correctly.

ILLUSTRATION: When paragraphs are separated by an empty paragraph, they
are formatted correctly.
http://imgur.com/ZsHGkwf

This is not just a theoretical question of conformance to standards;
this problem has practical consequences. Consider, for instance, a
LaTeX document for typesetting Hebrew text. Normally, in order to
eliminate the usual leading indentation of the first line of a
paragraph, a `\noindent` command is placed at the beginning of the
paragraph.
However, because the Unicode bidi algorithm determines the
directionality of a paragraph based on its first strongly directional
character (here, the `n` of `\noindent`), the Hebrew text is formatted
like English text. This is not a problem; it is to be expected.

ILLUSTRATION: A LaTeX document for typesetting a Hebrew paragraph with
no indentation of the first line.
http://imgur.com/xYUkZKr

One way to resolve this is to explicitly change the directionality of
the paragraph. This is not currently possible due to a separate Emacs
bug, but even if it were possible, it would affect the placement of the
backslash at the beginning of the `\noindent` command, which would no
longer look like a LaTeX command.

ILLUSTRATION: Explicitly changing the directionality of the paragraph.
http://imgur.com/sPcVReA
(Note: This is a screenshot of Microsoft Word, since due to a bug,
Emacs doesn't currently allow changing the automatically determined
directionality of a paragraph.)

So the best way to resolve this problem would be to place the
`\noindent` command in a separate paragraph. Unfortunately, here Emacs'
faulty implementation of the Unicode bidi algorithm rears its ugly
head. Since Emacs doesn't recognize the paragraph separator for what it
is, it will format the Hebrew text wrongly, as though it were English
text.

ILLUSTRATION: Putting the `\noindent` in a separate paragraph results
in the Hebrew text being formatted like English text.
http://imgur.com/44ds6rK

Placing an empty paragraph between the `\noindent` command and the
Hebrew text will resolve the formatting problem inside the Emacs
editor, but now the `\noindent` command, which only affects the current
LaTeX paragraph (LaTeX paragraphs are ended by an empty line), no
longer eliminates the indentation of the first line of the Hebrew
paragraph in the typeset file.
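
To make the difference between the two paragraph rules concrete, here is a
minimal Python sketch. It is only an illustration of the behaviour described
above, not Emacs code. The function names are made up for this example, the
input is assumed to use bare LF line endings, and the direction check is a
simplified form of UBA rules P2 and P3 that ignores isolates.

```python
import unicodedata

def split_at_separators(text):
    """Split after every character of bidi class B (paragraph separator).

    Assumes bare LF line endings; a CRLF pair would need to be treated
    as a single separator, which this sketch does not do.
    """
    paragraphs, current = [], []
    for ch in text:
        current.append(ch)
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append("".join(current))
            current = []
    if current:
        paragraphs.append("".join(current))
    return paragraphs

def split_at_blank_lines(text):
    """Split only at whitespace-only lines (the rule quoted from the manual)."""
    paragraphs, current = [], []
    for line in text.split("\n"):
        if line.strip():
            current.append(line)
        elif current:
            paragraphs.append("\n".join(current))
            current = []
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs

def base_direction(paragraph):
    """Direction of the first strong character; LTR if there is none.

    Simplified: characters inside isolates are not skipped.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return "LTR"

# Hebrew "shalom olam", written with escapes to keep the source unambiguous.
HEBREW = "\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd"

# A Hebrew line directly followed by an English line.
mixed = HEBREW + "\nHello world?\n"
print([base_direction(p) for p in split_at_separators(mixed)])   # ['RTL', 'LTR']
print([base_direction(p) for p in split_at_blank_lines(mixed)])  # ['RTL']

# `\noindent` on its own line, directly followed by Hebrew text.
latex = "\\noindent\n" + HEBREW + "\n"
print([base_direction(p) for p in split_at_separators(latex)])   # ['LTR', 'RTL']
print([base_direction(p) for p in split_at_blank_lines(latex)])  # ['LTR']
```

Under the separator rule each line gets its own base direction. Under the
blank-line rule the second line inherits the direction of the first, which
matches what the screenshots show.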

In GNU Emacs 25.1.1 (x86_64-apple-darwin13.4.0, NS appkit-1265.21 Version 10.9.5 (Build 13F1911))
 of 2016-09-21 built on builder10-9.porkrind.org
Windowing system distributor 'Apple', version 10.3.1504
Configured using:
 'configure --with-ns '--enable-locallisppath=/Library/Application
 Support/Emacs/${version}/site-lisp:/Library/Application
 Support/Emacs/site-lisp' --with-modules'

Configured features:
NOTIFY ACL GNUTLS LIBXML2 ZLIB TOOLKIT_SCROLL_BARS NS MODULES

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Fundamental

Minor modes in effect:
  ivy-mode: t
  shell-dirtrack-mode: t
  projectile-mode: t
  helm-descbinds-mode: t
  async-bytecomp-package-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
ad-handle-definition: ‘ibuffer’ got redefined
Turn on helm-projectile key bindings
For information about GNU Emacs and the GNU system, type C-h C-a.

Load-path shadows:
/Users/itaiberli/.emacs.d/elpa/seq-2.20/seq hides /Applications/Emacs.app/Contents/Resources/lisp/emacs-lisp/seq

Features:
(shadow sort mail-extr emacsbug message rfc822 mml mml-sec epg mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
sendmail rfc2047 rfc2045 ietf-drums mail-utils colir color counsel
jka-compr esh-util etags xref project swiper reftex reftex-vars
two-column ivy delsel ivy-overlay helm-projectile helm-files rx
image-dired tramp tramp-compat tramp-loaddefs trampver shell pcomplete
format-spec dired-x dired-aux ffap helm-tags helm-bookmark helm-adaptive
helm-info bookmark pp helm-external helm-net browse-url xml url
url-proxy url-privacy url-expand url-methods url-history url-cookie
url-domsuf url-util url-parse auth-source gnus-util mm-util help-fns
mail-prsvr password-cache url-vars mailcap helm-buffers helm-grep
helm-regexp helm-utils helm-locate helm-help helm-types projectile grep
compile comint ansi-color ring ibuf-ext ibuffer thingatpt helm-descbinds
helm easy-mmode helm-source cl-seq eieio-compat eieio eieio-core
helm-multi-match helm-lib dired helm-config helm-easymenu cl-macs
async-bytecomp async advice edmacro kmacro finder-inf tex-site info
package epg-config seq byte-opt gv bytecomp byte-compile cl-extra
help-mode easymenu cconv cl-loaddefs pcase cl-lib time-date mule-util
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel ns-win ucs-normalize term/common-win tool-bar dnd fontset image
regexp-opt fringe tabulated-list newcomment elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core frame cl-generic cham
georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese charscript case-table epa-hook
jka-cmpr-hook help simple abbrev minibuffer cl-preloaded nadvice
loaddefs button faces cus-face macroexp files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote kqueue cocoa ns multi-tty
make-network-process emacs)

Memory information:
((conses 16 312045 13704)
 (symbols 48 30403 0)
 (miscs 40 88 192)
 (strings 32 51754 11765)
 (string-bytes 1 1669992)
 (vectors 16 50218)
 (vector-slots 8 844617 7052)
 (floats 8 564 218)
 (intervals 56 242 111)
 (buffers 976 18))