unofficial mirror of bug-gnu-emacs@gnu.org 
From: Itai Berli <itai.berli@gmail.com>
To: 27526@debbugs.gnu.org
Subject: bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
Date: Tue, 4 Jul 2017 13:42:19 +0300
Message-ID: <CABsNJ=PZwamr9wwn=HSDmedVcSvw94gUTU6=-O=-u1A7BMUuUw@mail.gmail.com>
In-Reply-To: <CABsNJ=PXKOfOS=HE=TPNFyidHxWkSsRDMBPKQ98=QsQVDWM4AA@mail.gmail.com>


I'd like to add another reason why this behavior is problematic: it breaks
interoperability with other plain-text editors, because the same text is not
displayed the same way. Consider, for instance, the very same plain-text file
in GEdit: http://imgur.com/Iw4yrdQ
in Emacs: http://imgur.com/7kfWseE

Finally, whether Emacs's behavior is consistent with the UBA specification
is debatable. UBA section 3 states "Paragraphs may also be determined by
higher-level protocols", and the question is what exactly the "also" means:
can a higher-level protocol (HLP) decide that a newline character is not a
paragraph boundary, as Emacs does, or can the HLP only declare paragraph
boundaries *in addition to* paragraph separator characters?
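
For illustration, the practical difference between the two readings can be
sketched in Python. This is a simplified sketch under stated assumptions, not
Emacs's implementation: the function names and the sample text are
hypothetical, only the first-strong-character rule (UBA P2/P3) is modeled, and
isolates and explicit embeddings are ignored.

```python
import re
import unicodedata


def base_direction(text):
    """Simplified UBA P2/P3: the first strong character decides; default LTR."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return "LTR"


def directions_newline(text):
    """Every newline ends a paragraph, so each line gets its own direction."""
    return [base_direction(line) for line in text.split("\n") if line.strip()]


def directions_blank_line(text):
    """Only blank lines end a paragraph (assumed to model the Emacs heuristic):
    every line inherits the direction of its paragraph's first strong character."""
    result = []
    for paragraph in re.split(r"\n[ \t]*\n", text):
        lines = [line for line in paragraph.split("\n") if line.strip()]
        result.extend([base_direction(paragraph)] * len(lines))
    return result


# Hypothetical sample: a LaTeX command followed by Hebrew, then a line that
# starts with Hebrew, then a blank line and a line that starts with Latin text.
sample = (
    "\\noindent \u05e9\u05dc\u05d5\u05dd\n"
    "\u05e2\u05d5\u05dc\u05dd hello\n"
    "\n"
    "abc \u05d0\u05d1\u05d2"
)

print(directions_newline(sample))     # ['LTR', 'RTL', 'LTR']
print(directions_blank_line(sample))  # ['LTR', 'LTR', 'LTR']
```

In this sketch the second line is right-to-left only when the newline is
treated as a paragraph boundary. Prefixing the first line with U+200F (RLM),
the workaround mentioned in the quoted reply below, makes `base_direction`
return "RTL" for it.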

On Thu, Jun 29, 2017 at 9:36 PM, Itai Berli <itai.berli@gmail.com> wrote:

> > The UBA allows applications to employ "higher-level protocols" when
> > deciding on base paragraph direction.  See section 4.3 in UAX#9 and
> > specifically clause HL1 there.
>
> > This is what Emacs does: it applies its own heuristics for this
> > decision.  The reason for that is that Emacs's implementation of the
> > UBA must work reasonably well in plain-text buffers, where typically
> > long paragraphs are broken into lines by newline characters (which are
> > paragraph separators according to the UBA), and many times the
> > partition into lines is done by auto-fill or similar features, thus
> > making the first character of the next line fairly arbitrary.  Using
> > the UBA paragraph-direction determination would then produce
> > unacceptable results, whereby the direction of a part of a paragraph
> > could change in unpredictable ways when text is refilled.
>
> As I understand it, the "higher-level protocols" provision is intended
> to allow for such things as table cells, elements of structured markup
> languages, and word processors that use an idiosyncratic
> implementation of a paragraph separator *under the hood*. It is not
> intended for plain running text; for this the standard specifies
> explicitly what the paragraph separators for every operating system
> are.
>
> > typically long paragraphs are broken into lines by newline characters
>
> I see no evidence of the validity of this statement on my system (Emacs
> 25.1.1). But even if this were so, it would still not merit
> *hard-coding* the paragraph separator as a blank line, as there are
> situations (such as the one I presented in my bug report) that require
> a different configuration.
>
> > You can alleviate this to some extent by ...(in your case) starting
> > the paragraph with an RLM control character before \noindent,
> > optionally followed by an LRM or enclosing \noindent in LRE..PDF (so
> > that the backslash displays to the left of "noindent").  This is
> > admittedly a bit awkward, but I think the results are still acceptable.
>
> As you mentioned, the solution is cumbersome. It might have been
> acceptable if this were the sole issue, but this example illustrates just
> one of several problems that arise from the current paragraph separator
> convention.
>
> In conclusion, and on a personal note, I implore you to change this
> behavior, and to do so as soon as possible, and not only for specialized
> markup documents, but for every document.
>
> I am currently working on my thesis. Emacs is useless to me as a text
> editor of Hebrew texts without this feature. This is no
> exaggeration.
>
> The original reason I chose Emacs over other editors was the
> combination of AUCTeX and the promise of full Unicode
> compatibility. AUCTeX has delivered on its promise, but in the area of
> Unicode, as far as my needs are concerned, it is as if there were no Unicode
> support at all, and I will sadly be forced to look for a different editor.
>


