all messages for Emacs-related lists mirrored at yhetil.org
From: Itai Berli <itai.berli@gmail.com>
To: 27526@debbugs.gnu.org
Subject: bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
Date: Tue, 4 Jul 2017 18:57:33 +0300
Message-ID: <CABsNJ=Nu8A-rej-C8qFH3w4LEzirHrf5q+LMzmrTqBE77ibHFQ@mail.gmail.com>
In-Reply-To: <83inj8nt0h.fsf@gnu.org>


> As I already explained, the behavior of GEdit is unacceptable for
> Emacs, because most modes derived from Text mode tend to deal with
> buffers where lines are broken by newlines, so potentially switching
> paragraph direction just because a newline happens to be there would
> have devastating effect on the text as displayed.

How about letting users decide what's best for them? Would it be
possible to add an option to Emacs, which a user could set in their
.emacs file, that determines whether the bidi implementation treats a
newline character or an empty line as the paragraph separator?
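To make the two choices concrete, here is a minimal Python sketch (illustrative only; it is not how Emacs implements bidi display). `split_on_separators` follows UAX#9 rule P1, ending a paragraph after every paragraph-separator character (bidi class B, which includes the newline). `split_on_blank_lines` is a hypothetical simplification of the Emacs-style notion of paragraph described in this thread, where only empty lines end a paragraph. `base_direction` is a simplified version of rules P2/P3 (first strong character decides, isolates ignored, default LTR). All function names are assumptions made for this example.

```python
import re
import unicodedata


def split_on_separators(text):
    """UAX#9 rule P1: end a paragraph after every paragraph separator
    (bidi class B, which includes LF, CR and U+2029). CR LF counts as one."""
    paragraphs, start, i = [], 0, 0
    while i < len(text):
        if unicodedata.bidirectional(text[i]) == "B":
            if text[i] == "\r" and text[i + 1:i + 2] == "\n":
                i += 1  # keep CR LF together as a single separator
            paragraphs.append(text[start:i + 1])
            start = i + 1
        i += 1
    if start < len(text):
        paragraphs.append(text[start:])
    return paragraphs


def split_on_blank_lines(text):
    """Hypothetical alternative: only empty (or blank) lines end a paragraph,
    so single newlines stay inside the paragraph."""
    return [p for p in re.split(r"\n(?:[ \t]*\n)+", text) if p]


def base_direction(paragraph):
    """Simplified rules P2/P3: the first strong character decides
    (isolates are not handled); default to LTR if there is none."""
    for ch in paragraph:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "LTR"
        if cls in ("R", "AL"):
            return "RTL"
    return "LTR"


if __name__ == "__main__":
    # Two lines (Hebrew "shalom"/"olam" as escapes), a blank line, then "abc".
    text = "\u05e9\u05dc\u05d5\u05dd world\nhello \u05e2\u05d5\u05dc\u05dd\n\nabc"
    print([base_direction(p) for p in split_on_separators(text)])
    # ['RTL', 'LTR', 'LTR', 'LTR']
    print([base_direction(p) for p in split_on_blank_lines(text)])
    # ['RTL', 'LTR']
```

Under P1 the second line starts with a Latin letter and therefore gets its own LTR base direction; under the blank-line policy both lines belong to a single RTL paragraph. A user option of the kind proposed here would amount to choosing between these two splitting policies.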

On Tue, Jul 4, 2017 at 6:03 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Itai Berli <itai.berli@gmail.com>
> > Date: Tue, 4 Jul 2017 13:42:19 +0300
> >
> > I'd like to add another reason why this behavior is problematic: it
> > breaks interoperability with other plain text editors, since the text
> > will not be displayed the same way. Consider, for instance, the very
> > same plain text file
> > in GEdit: http://imgur.com/Iw4yrdQ
> > in Emacs: http://imgur.com/7kfWseE
>
> As I already explained, the behavior of GEdit is unacceptable for
> Emacs, because most modes derived from Text mode tend to deal with
> buffers where lines are broken by newlines, so potentially switching
> paragraph direction just because a newline happens to be there would
> have devastating effect on the text as displayed.  This is perhaps in
> contrast with other editors and word-processors which mostly deal with
> long lines without hard newlines.  That's why the notion of paragraph
> in Emacs's UBA implementation was chosen to fit the traditional Emacs
> definition of paragraph in text-mode and its derivatives.
>
> > Finally, the question of whether Emacs behavior is consistent with the
> > UBA specifications is debatable, since when UBA section 3 states
> > "Paragraphs may also be determined by higher-level protocols" the
> > question is what exactly the "also" means: is it that the higher-level
> > protocols (HLP) can decide that a newline character is not a paragraph
> > boundary, as Emacs does, or is it that the HLP can only declare
> > paragraph boundaries in addition to paragraph separator characters?
>
> It is clear from the context and the example following the above
> sentence that "also" doesn't mean "in addition".
>
> However, the main issue is not the paragraph boundary, the main issue
> is how the base direction of the paragraph is determined.  Because no
> matter where the paragraph boundary is, if the base direction is not
> recalculated there, then the fact that the boundary is there doesn't
> matter.
>
> From Section 4.3 Higher-Level Protocols of the UAX#9:
>
>   HL1. Override P3, and set the paragraph embedding level
>        explicitly. This does not apply when deciding how to treat FSI
>        in rule X5c.
>
>        . A higher-level protocol may set any paragraph level. This can
>          be done on the basis of the context, such as on a table cell,
>          paragraph, document, or system level. (P2 may be skipped if
>          P3 is overridden). [...]
>        . A higher-level protocol may apply rules equivalent to P2 and
>          P3 but default to level 1 (RTL) rather than 0 (LTR) to match
>          overall RTL context.
>        . A higher-level protocol may use an entirely different
>          algorithm that heuristically auto-detects the paragraph
>          embedding level based on the paragraph text and its
>          context. For example, it could base it on whether there are
>          more RTL characters in the text than LTR. As another example,
>          when the paragraph contains no strong characters, its
>          direction could be determined by the levels of the paragraphs
>          before and after.
>
> And Section 3.3.1, which describes the P1, P2, and P3 paragraph-level
> rules, says:
>
>   Whenever a higher-level protocol specifies the paragraph level,
>   rules P2 and P3 may be overridden: see HL1.
>
> So an application is allowed to override _all_ of the paragraph-level
> rules, and do what suits it best.  And based on some non-negligible
> experience with bidi-aware applications, I submit that an application
> that does _not_ employ some higher-level protocol for base paragraph
> direction will violate user expectations when working with plain text.
> E.g., try reading in MS Outlook an unformatted text message which has
> a lot of RTL text mixed with LTR.  It's unreadable; I always
> copy/paste it into Emacs, and only then am I able to read it.
>
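The paragraph-level rules quoted above can be sketched in Python as follows. This is an illustrative sketch, not Emacs's implementation, and every name in it is hypothetical. `first_strong_class` implements rule P2, including the requirement to skip text between an isolate initiator and its matching PDI; `paragraph_level` applies P3, with an `explicit` parameter standing in for HL1's "set the paragraph level explicitly" and a `default` parameter for HL1's "default to level 1 (RTL)" option. `majority_level` and `levels_from_neighbors` are one possible reading of the two example heuristics in HL1; in particular, preferring the preceding paragraph over the following one is an assumption.

```python
import unicodedata

STRONG_RTL = {"R", "AL"}
ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def first_strong_class(paragraph):
    """Rule P2: bidi class ('L', 'R' or 'AL') of the first strong character,
    skipping text between an isolate initiator and its matching PDI.
    Returns None if there is no strong character outside isolates."""
    depth = 0
    for ch in paragraph:
        cls = unicodedata.bidirectional(ch)
        if cls in ISOLATE_INITIATORS:
            depth += 1
        elif cls == "PDI":
            if depth > 0:
                depth -= 1
        elif depth == 0 and (cls == "L" or cls in STRONG_RTL):
            return cls
    return None


def paragraph_level(paragraph, explicit=None, default=0):
    """Rules P2/P3 with two HL1 hooks:
    explicit -- level imposed by a higher-level protocol (overrides P2/P3);
    default  -- level used when P2 finds no strong character (HL1 allows 1)."""
    if explicit is not None:
        return explicit
    cls = first_strong_class(paragraph)
    if cls is None:
        return default
    return 1 if cls in STRONG_RTL else 0


def majority_level(paragraph):
    """HL1 example heuristic: level 1 if RTL letters outnumber LTR letters."""
    classes = [unicodedata.bidirectional(ch) for ch in paragraph]
    rtl = sum(cls in STRONG_RTL for cls in classes)
    ltr = sum(cls == "L" for cls in classes)
    return 1 if rtl > ltr else 0


def levels_from_neighbors(paragraphs, default=0):
    """HL1 example heuristic (one reading): a paragraph with no strong
    characters takes the level of the preceding paragraph, or of the
    following one when nothing precedes it."""
    levels = [paragraph_level(p) if first_strong_class(p) else None
              for p in paragraphs]
    for i in range(1, len(levels)):          # inherit from the previous paragraph
        if levels[i] is None:
            levels[i] = levels[i - 1]
    for i in range(len(levels) - 2, -1, -1):  # leading gaps inherit from the next
        if levels[i] is None:
            levels[i] = levels[i + 1]
    return [default if lv is None else lv for lv in levels]


if __name__ == "__main__":
    mixed = "a \u05d0\u05d1\u05d2"  # one Latin letter, then three Hebrew letters
    print(paragraph_level(mixed), majority_level(mixed))  # 0 1
    print(paragraph_level("\u2067abc\u2069 \u05d0"))      # 1 (isolate skipped)
    print(levels_from_neighbors(["123", "\u05d0", "---", "abc"]))  # [1, 1, 1, 0]
```

The `mixed` example shows why the choice matters: the standard first-strong rule and the majority heuristic assign different base directions to the same paragraph, which is exactly the freedom HL1 grants a higher-level protocol.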



Thread overview: 31+ messages
2017-06-29  9:16 bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator Itai Berli
2017-06-29  9:42 ` bug#27526: Explicit directionality marks CAN be inserted! Itai Berli
2017-06-29 14:49 ` bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator Eli Zaretskii
2017-06-29 18:36 ` Itai Berli
2017-07-04 10:42   ` Itai Berli
2017-07-04 15:03     ` Eli Zaretskii
2017-07-04 15:57       ` Itai Berli [this message]
2017-07-04 16:18         ` Eli Zaretskii
2017-07-04 16:37           ` Itai Berli
2017-07-04 16:47             ` Eli Zaretskii
2017-07-04 17:01               ` Itai Berli
2017-07-04 17:46                 ` Eli Zaretskii
2017-07-12 15:10                   ` Itai Berli
2017-07-12 15:36                     ` Eli Zaretskii
2017-07-12 15:52                       ` Itai Berli
2017-07-12 16:12                         ` Eli Zaretskii
2017-07-17 14:54           ` Eli Zaretskii
2017-07-17 15:16             ` Itai Berli
2017-07-17 15:23               ` Jean-Christophe Helary
2017-07-17 18:33                 ` Itai Berli
2017-07-17 20:20                   ` Eli Zaretskii
2017-07-17 20:43                     ` Itai Berli
2017-07-18  2:33                       ` Eli Zaretskii
2017-07-18  2:45                         ` Glenn Morris
2017-07-18  4:01                           ` Itai Berli
2017-07-18  4:54                             ` Eli Zaretskii
2017-07-18  5:52                               ` Itai Berli
2017-07-18 13:27                                 ` Itai Berli
2017-07-18 14:44                                   ` Eli Zaretskii
2017-07-18 15:22                                     ` Itai Berli
2017-07-18 15:55                                       ` Eli Zaretskii
