From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Itai Berli <itai.berli@gmail.com>
Newsgroups: gmane.emacs.bugs
Subject: bug#27526: 25.1;
	Nonconformance to Unicode bidirectionality algorithm due to paragraph
	separator
Date: Tue, 4 Jul 2017 13:42:19 +0300
Message-ID: <CABsNJ=PZwamr9wwn=HSDmedVcSvw94gUTU6=-O=-u1A7BMUuUw@mail.gmail.com>
References: <CABsNJ=PXKOfOS=HE=TPNFyidHxWkSsRDMBPKQ98=QsQVDWM4AA@mail.gmail.com>
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="94eb2c1498f413f11b05537b8c2d"
X-Trace: blaine.gmane.org 1499165063 30938 195.159.176.226 (4 Jul 2017 10:44:23 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Tue, 4 Jul 2017 10:44:23 +0000 (UTC)
To: 27526@debbugs.gnu.org
Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Tue Jul 04 12:44:17 2017
Return-path: <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geb-bug-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by blaine.gmane.org with esmtp (Exim 4.84_2)
	(envelope-from <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>)
	id 1dSLJY-0007Vq-7P
	for geb-bug-gnu-emacs@m.gmane.org; Tue, 04 Jul 2017 12:44:16 +0200
Original-Received: from localhost ([::1]:40325 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>)
	id 1dSLJa-0001Av-2K
	for geb-bug-gnu-emacs@m.gmane.org; Tue, 04 Jul 2017 06:44:18 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:56248)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1dSLJO-0001Ae-6C
	for bug-gnu-emacs@gnu.org; Tue, 04 Jul 2017 06:44:12 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1dSLJL-0001Hq-32
	for bug-gnu-emacs@gnu.org; Tue, 04 Jul 2017 06:44:06 -0400
Original-Received: from debbugs.gnu.org ([208.118.235.43]:48534)
	by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
	(Exim 4.71) (envelope-from <Debian-debbugs@debbugs.gnu.org>)
	id 1dSLJK-0001Hg-Uv
	for bug-gnu-emacs@gnu.org; Tue, 04 Jul 2017 06:44:03 -0400
Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1dSLJK-0005bq-H3
	for bug-gnu-emacs@gnu.org; Tue, 04 Jul 2017 06:44:02 -0400
X-Loop: help-debbugs@gnu.org
Resent-From: Itai Berli <itai.berli@gmail.com>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Tue, 04 Jul 2017 10:44:02 +0000
Resent-Message-ID: <handler.27526.B27526.149916498821487@debbugs.gnu.org>
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 27526
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
Original-Received: via spool by 27526-submit@debbugs.gnu.org id=B27526.149916498821487
	(code B ref 27526); Tue, 04 Jul 2017 10:44:02 +0000
Original-Received: (at 27526) by debbugs.gnu.org; 4 Jul 2017 10:43:08 +0000
Original-Received: from localhost ([127.0.0.1]:51211 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1dSLIR-0005aV-Hn
	for submit@debbugs.gnu.org; Tue, 04 Jul 2017 06:43:07 -0400
Original-Received: from mail-vk0-f48.google.com ([209.85.213.48]:32836)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <itai.berli@gmail.com>) id 1dSLIP-0005a1-Qf
	for 27526@debbugs.gnu.org; Tue, 04 Jul 2017 06:43:06 -0400
Original-Received: by mail-vk0-f48.google.com with SMTP id r126so109028161vkg.0
	for <27526@debbugs.gnu.org>; Tue, 04 Jul 2017 03:43:05 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; 
	h=mime-version:in-reply-to:references:from:date:message-id:subject:to;
	bh=ZjCBk8Wi5XfVHVZe5OA3kRh0qnHcoTJU5G8BClGLIWQ=;
	b=KVFFab9vrGppyI0lMHZzusaYV9vn20yhEVXntQK8SLdz+oqdA5sKA4G4F5k23pCc9i
	8hTOfI4DUPcmGqNmn9SZKcGQqBkrItPLE/SIZe/il/9kUFJ195UOb+KPDjWyQKAJUDMi
	UZ2U2WfX1rexK23RrANuvr6FSUr2I9pq8lQGAlznpbZ09m8OxRkqo+JT0QKwEZ1ISuho
	T/Clif21OD0VaiAWJl8i4wX/vxw2TFhTqDo1ctjqCXppwE4GAyGzoIJSI5tOvWjunwB/
	lwNX+dQRCuAjXRgVfkzUjFDSEPuIbs3TA6zjFXCQy+XEvBPcspi6HNOTnAUORa2eZ5Gl
	LxQQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:in-reply-to:references:from:date
	:message-id:subject:to;
	bh=ZjCBk8Wi5XfVHVZe5OA3kRh0qnHcoTJU5G8BClGLIWQ=;
	b=WUKZZqWy4wwqPW+kKIVIe0EheJpGOY3gQu1LwZGx0lYNhr8jBsddVrPDWBDGpLerTl
	K+vRn7nORM/LJUTPvhPLbBbV3YI/YCwFi2QY2r73A/EpFYUiQETg/NMIPytzlK0Phd+z
	f+AHsCFPoVC21mMgM8luFizvtC3EBIb3V1tpll/XeehP1K4am/kO6c+h1ELxvmxsnBJe
	MkkP1ml8nQJ+fm9yWptV+YL9TXnAtNxI9P7hgOeWxZBGypMpaKdFo5yhyue1l1ElK0sA
	jXXKAJgczvIan9ZYdh4XTRJ8bC781Xw+oPfQUjqO2gp6HQYjJ4d3vlpawaoTmeBlP1YJ
	geZw==
X-Gm-Message-State: AKS2vOw+bpfZahTF31jM3gb4IvEszYg/V5B6Is48qnc3d/FZy0FGXPVH
	eSWDcPcUGS0b/hIDHRWOildjq2ZP3xZZElE=
X-Received: by 10.31.248.10 with SMTP id w10mr17984108vkh.55.1499164979751;
	Tue, 04 Jul 2017 03:42:59 -0700 (PDT)
Original-Received: by 10.176.70.85 with HTTP; Tue, 4 Jul 2017 03:42:19 -0700 (PDT)
In-Reply-To: <CABsNJ=PXKOfOS=HE=TPNFyidHxWkSsRDMBPKQ98=QsQVDWM4AA@mail.gmail.com>
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 208.118.235.43
X-BeenThere: bug-gnu-emacs@gnu.org
List-Id: "Bug reports for GNU Emacs,
	the Swiss army knife of text editors" <bug-gnu-emacs.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/bug-gnu-emacs/>
List-Post: <mailto:bug-gnu-emacs@gnu.org>
List-Help: <mailto:bug-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=subscribe>
Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Original-Sender: "bug-gnu-emacs"
	<bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.bugs:134165
Archived-At: <http://permalink.gmane.org/gmane.emacs.bugs/134165>

--94eb2c1498f413f11b05537b8c2d
Content-Type: text/plain; charset="UTF-8"

I'd like to add another reason why this behavior is problematic: it breaks
interoperability with other plain text editors, since the text will not be
displayed the same way. Consider, for instance, the very same plain text file:
in GEdit: http://imgur.com/Iw4yrdQ
in Emacs: http://imgur.com/7kfWseE

Finally, whether Emacs's behavior is consistent with the UBA specification
is debatable. UBA section 3 states "Paragraphs may also be determined by
higher-level protocols", and the question is what exactly the "also" means:
may the higher-level protocols (HLP) decide that a newline character is not
a paragraph boundary, as Emacs does, or may the HLP only declare paragraph
boundaries *in addition to* paragraph separator characters?
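
To make the two readings concrete, here is a minimal Python sketch. It is
an illustration only, not Emacs's actual implementation: it finds the base
direction from the first strong character (roughly rules P2/P3 of UAX#9,
ignoring isolates and embeddings), and it approximates the Emacs convention
by splitting on blank lines. The function names and the sample text are
made up for this example.

```python
import re
import unicodedata


def base_direction(paragraph):
    """Base direction from the first strong character (simplified P2/P3)."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return "LTR"  # no strong character found: default to left-to-right


def directions_per_newline(text):
    """Every newline ends a paragraph (newline as paragraph separator)."""
    return [base_direction(line) for line in text.split("\n")]


def directions_per_blank_line(text):
    """Only a blank line ends a paragraph (approximation of the Emacs default)."""
    return [base_direction(p) for p in re.split(r"\n[ \t]*\n", text)]


# Hypothetical sample: a LaTeX command followed by Hebrew, then a second line
# that starts with a Hebrew word.
sample = "\\noindent \u05e9\u05dc\u05d5\u05dd\n\u05e2\u05d5\u05dc\u05dd hello"

print(directions_per_newline(sample))
print(directions_per_blank_line(sample))
```

With newline as the separator, the second line gets its own direction from
its first strong character; with the blank-line convention, both lines
inherit the direction chosen by the "n" of \noindent.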

On Thu, Jun 29, 2017 at 9:36 PM, Itai Berli <itai.berli@gmail.com> wrote:

> > The UBA allows applications to employ "higher-level protocols" when
> > deciding on base paragraph direction.  See section 4.3 in UAX#9 and
> > specifically clause HL1 there.
>
> > This is what Emacs does: it applies its own heuristics for this
> > decision.  The reason for that is that Emacs's implementation of the
> > UBA must work reasonably well in plain-text buffers, where typically
> > long paragraphs are broken into lines by newline characters (which are
> > paragraph separators according to the UBA), and many times the
> > partition into lines is done by auto-fill or similar features, thus
> > making the first character of the next line fairly arbitrary.  Using
> > the UBA paragraph-direction determination would then produce
> > unacceptable results, whereby the direction of a part of a paragraph
> > could change in unpredictable ways when text is refilled.
>
> As I understand it, the "higher-level protocols" provision is intended
> to allow for such things as table cells, elements of structured markup
> languages, and word processors that use an idiosyncratic
> implementation of a paragraph separator *under the hood*. It is not
> intended for plain running text; for that, the standard specifies
> explicitly what the paragraph separators are on every operating
> system.
>
> > typically long paragraphs are broken into lines by newline characters
>
> I see no evidence of the validity of this statement on my system (Emacs
> 25.1.1). But even if this were so, it would still not merit
> *hard-coding* the paragraph separator as a blank line, as there are
> situations (such as the one I presented in my bug report) that require
> a different configuration.
>
> > You can alleviate this to some extent by ...(in your case) starting
> > the paragraph with an RLM control character before \noindent,
> > optionally followed by an LRM or enclosing \noindent in LRE..PDF (so
> > that the backslash displays to the left of "noindent").  This is
> > admittedly a bit awkward, but I think the results are still acceptable.
>
> As you mentioned, the solution is cumbersome. It might have been
> acceptable if this were the sole issue, but this example illustrates
> just one of several problems that arise from the current paragraph
> separator convention.
>
> In conclusion, and on a personal note, I implore you to change this
> behavior, and to do so as soon as possible, and not only for specialized
> markup documents, but for every document.
>
> I am currently working on my thesis. Emacs is useless to me as a text
> editor of Hebrew texts without this feature. This is no
> exaggeration.
>
> The original reason I chose Emacs over other editors was the
> combination of AUCTeX and the promise of full Unicode
> compatibility. AUCTeX has delivered on its promise, but in the area of
> Unicode, as far as my needs are concerned, it is as if there were no
> Unicode support at all, and I will sadly be forced to look for a
> different editor.
>

--94eb2c1498f413f11b05537b8c2d--