From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Itai Berli
Newsgroups: gmane.emacs.bugs
Subject: bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
Date: Tue, 4 Jul 2017 13:42:19 +0300
To: 27526@debbugs.gnu.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="94eb2c1498f413f11b05537b8c2d"
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:134165

--94eb2c1498f413f11b05537b8c2d
Content-Type: text/plain; charset="UTF-8"

I'd like to add another reason why this behavior is problematic: it
breaks interoperability with other plain text editors, since the text
will not be displayed the same way. Consider, for instance, the very
same plain text file

in GEdit: http://imgur.com/Iw4yrdQ
in Emacs: http://imgur.com/7kfWseE

Finally, whether Emacs's behavior is consistent with the UBA
specification is debatable. UBA section 3 states "Paragraphs may also
be determined by higher-level protocols", and the question is what
exactly the "also" means: can a higher-level protocol (HLP) decide that
a newline character is not a paragraph boundary, as Emacs does, or can
an HLP only declare paragraph boundaries *in addition to* the paragraph
separator characters?

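A minimal Python sketch of the display difference described above. It is
an illustration only, not code from Emacs or GEdit. It assumes a
simplified reading of the UBA first-strong rule (P2/P3) that ignores
isolates and explicit embedding controls; the function names and the
sample text are hypothetical.

```python
import re
import unicodedata


def first_strong_direction(text, default="L"):
    """Return 'L' or 'R' from the first strong character in text.

    Simplified form of UAX#9 rules P2/P3: isolates are not skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "L"
        if bidi_class in ("R", "AL"):
            return "R"
    return default


def directions_per_newline(text):
    """Treat every newline as a paragraph separator."""
    return [first_strong_direction(line) for line in text.split("\n") if line]


def directions_per_blank_line(text):
    """Treat only blank lines as paragraph separators (assumed Emacs-like)."""
    result = []
    for block in re.split(r"\n[ \t]*\n", text):
        lines = [line for line in block.split("\n") if line]
        # Every line in the block inherits the direction of the block's
        # first strong character.
        result.extend([first_strong_direction(block)] * len(lines))
    return result


# Hypothetical sample: a Latin-script line followed directly by a Hebrew line.
sample = "\\begin{document}\n\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd"

print(directions_per_newline(sample))     # ['L', 'R']
print(directions_per_blank_line(sample))  # ['L', 'L']
```

Under these assumptions the Hebrew line gets a right-to-left base
direction when each newline ends a paragraph, but inherits left-to-right
when only a blank line does, which would account for the same file
looking different in the two screenshots.
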
On Thu, Jun 29, 2017 at 9:36 PM, Itai Berli wrote:

> > The UBA allows applications to employ "higher-level protocols" when
> > deciding on base paragraph direction.  See section 4.3 in UAX#9 and
> > specifically clause HL1 there.
> >
> > This is what Emacs does: it applies its own heuristics for this
> > decision.  The reason for that is that Emacs's implementation of the
> > UBA must work reasonably well in plain-text buffers, where typically
> > long paragraphs are broken into lines by newline characters (which are
> > paragraph separators according to the UBA), and many times the
> > partition into lines is done by auto-fill or similar features, thus
> > making the first character of the next line fairly arbitrary.  Using
> > the UBA paragraph-direction determination would then produce
> > unacceptable results, whereby the direction of a part of a paragraph
> > could change in unpredictable ways when text is refilled.
>
> As I understand it, the "higher-level protocols" provision is intended
> to allow for such things as table cells, elements of structured markup
> languages, and word processors that use an idiosyncratic
> implementation of a paragraph separator *under the hood*. It is not
> intended for plain running text; for this the standard specifies
> explicitly what the paragraph separators for every operating system
> are.
>
> > typically long paragraphs are broken into lines by newline characters
>
> I see no evidence of the validity of this statement on my system (Emacs
> 25.1.1). But even if this were so, it would still not merit
> *hard-coding* the paragraph separator as a blank line, as there are
> situations (such as the one I presented in my bug report) that require
> a different configuration.
>
> > You can alleviate this to some extent by ...(in your case) starting
> > the paragraph with an RLM control character before \noindent,
> > optionally followed by an LRM or enclosing \noindent in LRE..PDF (so
> > that the backslash displays to the left of "noindent").  This is
> > admittedly a bit awkward, but I think the results are still acceptable.
>
> As you mentioned, the solution is cumbersome. It might have been
> acceptable if this were the sole issue, but this example illustrates
> just one of several problems that arise from the current paragraph
> separator convention.
>
> In conclusion, and on a personal note, I implore you to change this
> behavior, and to do so as soon as possible, and not only for specialized
> markup documents, but for every document.
>
> I am currently working on my thesis. Emacs is useless to me as a text
> editor of Hebrew texts without this feature. This is no
> exaggeration.
>
> The original reason I chose Emacs over other editors was the
> combination of AUCTeX and the promise of full Unicode
> compatibility. AUCTeX has delivered on its promise, but in the area of
> Unicode, as far as my needs are concerned, it is as if there were no
> Unicode support at all, and I will sadly be forced to look for a
> different editor.

--94eb2c1498f413f11b05537b8c2d
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--94eb2c1498f413f11b05537b8c2d--