From: Itai Berli <itai.berli@gmail.com>
To: 27526@debbugs.gnu.org
Subject: bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
Date: Tue, 4 Jul 2017 13:42:19 +0300
Message-ID: <CABsNJ=PZwamr9wwn=HSDmedVcSvw94gUTU6=-O=-u1A7BMUuUw@mail.gmail.com>
In-Reply-To: <CABsNJ=PXKOfOS=HE=TPNFyidHxWkSsRDMBPKQ98=QsQVDWM4AA@mail.gmail.com>
I'd like to add another reason why this behavior is problematic: it breaks
interoperability with other plain text editors, since the text will not be
displayed the same way. Consider, for instance, the very same plain text
file, as rendered
in GEdit: http://imgur.com/Iw4yrdQ
in Emacs: http://imgur.com/7kfWseE
Finally, whether Emacs's behavior is consistent with the UBA
specification is debatable. UBA section 3 states "Paragraphs
may also be determined by higher-level protocols", and the question is what
exactly the "also" means: can a higher-level protocol (HLP)
decide that a newline character is not a paragraph boundary, as Emacs does,
or can an HLP only declare paragraph boundaries *in addition
to* paragraph separator characters?
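To make the difference concrete, here is a minimal Python sketch (not Emacs
code) of how the base direction comes out under the two readings. It assumes
the simplified first-strong-character rule of UBA rules P2 and P3, ignores
directional isolates, and uses hypothetical helper names (base_direction,
split_on_newlines, split_on_blank_lines) that I made up for illustration.

```python
import re
import unicodedata


def base_direction(paragraph):
    """Simplified UBA P2/P3: direction of the first strong character.

    Assumption: directional isolates are ignored; a real implementation
    skips the text between an isolate initiator and its matching PDI.
    """
    for ch in paragraph:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "LTR"
        if bidi in ("R", "AL"):
            return "RTL"
    return "LTR"  # P3: no strong character found, default to left-to-right


def split_on_newlines(text):
    """Every newline ends a paragraph (the plain UBA reading)."""
    return [p for p in text.split("\n") if p]


def split_on_blank_lines(text):
    """Only blank lines end a paragraph (the convention discussed here)."""
    return [p for p in re.split(r"\n[ \t]*\n", text) if p]


# Two lines, no blank line between them: Latin-first, then Hebrew-first.
sample = "abc \u05d0\u05d1\u05d2\n\u05d0\u05d1\u05d2 abc"

per_line = [base_direction(p) for p in split_on_newlines(sample)]
per_block = [base_direction(p) for p in split_on_blank_lines(sample)]
print(per_line)
print(per_block)
```

Under the newline reading each line gets its own base direction, while under
the blank-line reading the second line inherits the direction chosen by the
first line of the block.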
On Thu, Jun 29, 2017 at 9:36 PM, Itai Berli <itai.berli@gmail.com> wrote:
> > The UBA allows applications to employ "higher-level protocols" when
> > deciding on base paragraph direction. See section 4.3 in UAX#9 and
> > specifically clause HL1 there.
>
> > This is what Emacs does: it applies its own heuristics for this
> > decision. The reason for that is that Emacs's implementation of the
> > UBA must work reasonably well in plain-text buffers, where typically
> > long paragraphs are broken into lines by newline characters (which are
> > paragraph separators according to the UBA), and many times the
> > partition into lines is done by auto-fill or similar features, thus
> > making the first character of the next line fairly arbitrary. Using
> > the UBA paragraph-direction determination would then produce
> > unacceptable results, whereby the direction of a part of a paragraph
> > could change in unpredictable ways when text is refilled.
>
> As I understand it, the "higher-level protocols" provision is intended
> to allow for such things as table cells, elements of structured markup
> languages, and word processors that use an idiosyncratic
> implementation of a paragraph separator *under the hood*. It is not
> intended for plain running text; for this the standard specifies
> explicitly what the paragraph separators for every operating system
> are.
>
> > typically long paragraphs are broken into lines by newline characters
>
> I see no evidence of the validity of this statement on my system (Emacs
> 25.1.1). But even if this were so, it would still not merit
> *hard-coding* the paragraph separator as a blank line, as there are
> situations (such as the one I presented in my bug report) that require
> a different configuration.
>
> > You can alleviate this to some extent by ...(in your case) starting
> > the paragraph with an RLM control character before \noindent,
> > optionally followed by an LRM or enclosing \noindent in LRE..PDF (so
> > that the backslash displays to the left of "noindent"). This is
> > admittedly a bit awkward, but I think the results are still acceptable.
>
> As you mentioned, the solution is cumbersome. It might have been
> acceptable if this were the sole issue, but this example illustrates just
> one of several problems that arise from the current paragraph separator
> convention.
>
> In conclusion, and on a personal note, I implore you to change this
> behavior, and to do so as soon as possible, and not only for specialized
> markup documents, but for every document.
>
> I am currently working on my thesis. Emacs is useless to me as a text
> editor of Hebrew texts without this feature. This is no
> exaggeration.
>
> The original reason I chose Emacs over other editors was because of
> the combination of AUCTeX and the promise of full Unicode
> compatibility. AUCTeX has delivered on its promise, but in the area of
> Unicode, as far as my needs are concerned, it is as if there were no Unicode
> support at all, and I will sadly be forced to look for a different editor.
>