unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
@ 2017-06-29  9:16 Itai Berli
  2017-06-29  9:42 ` bug#27526: Explicit directionality marks CAN be inserted! Itai Berli
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Itai Berli @ 2017-06-29  9:16 UTC (permalink / raw)
  To: 27526

According to the Emacs manual (section 37.26 Bidirectional Display)

>  Emacs provides a “Full Bidirectionality” class implementation of the
>  UBA, consistent with the requirements of the Unicode Standard v8.0.

And again (section 22.19 Bidirectional Editing)

> Emacs implements the Unicode Bidirectional Algorithm described in the Unicode Standard Annex #9, for reordering of bidirectional text for display.

However, these statements are false. Emacs does not implement the Unicode
Bidirectional Algorithm correctly, and therefore does not even provide
'Implicit bidirectionality', which is the minimal level of conformance
listed in section 4.2 'Explicit Formatting Characters' of the Unicode
8.0.0 Bidirectional Algorithm specification
(www.unicode.org/reports/tr9/tr9-33.html), let alone 'Full bidirectionality'.

The reason has to do with the way the Emacs bidi implementation
recognizes separate paragraphs, which is inconsistent with the Unicode
specifications.

The Unicode Bidirectional Algorithm specifies (section 3 'Basic
Display Algorithm'):

> The algorithm reorders text only within a paragraph; characters in one
> paragraph have no effect on characters in a different
> paragraph. Paragraphs are divided by the Paragraph Separator or
> appropriate Newline Function (for guidelines on the handling of CR,
> LF, and CRLF, see Section 4.4, Directionality, and Section 5.8,
> Newline Guidelines of [Unicode]).

However, Emacs, by its own admission (section 22.19 Bidirectional
Editing), takes the following approach:

> Paragraph boundaries are empty lines, i.e., lines consisting entirely of whitespace characters.

I'll repeat: according to Unicode a paragraph ends with a paragraph
separator. What constitutes a paragraph separator is specified precisely
in section 5.8 'Newline Guidelines' of The Unicode Standard version
8.0.0. For instance, on a MacOS X system, it is `LF` (line feed,
Unicode 000A). The formatting effects of the bidi algorithm must not
cross the paragraph separator boundary.
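
The paragraph splitting described above can be sketched in Python. This is
only an illustration, not code from Emacs or from the Unicode reference
implementation: the function name `split_uba_paragraphs` is hypothetical, and
treating CR LF as a single separator is a simplifying assumption. The sketch
relies on the standard `unicodedata` module, which reports bidi class "B"
(paragraph separator) for LF, CR, and U+2029.

```python
import unicodedata


def split_uba_paragraphs(text):
    """Split text at every character whose bidi class is B.

    Hypothetical helper for illustration; CR LF is consumed as one
    separator (an assumption made to keep the sketch short).
    """
    paragraphs, current = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if unicodedata.bidirectional(ch) == "B":
            # Skip the LF of a CR LF pair so it does not start an empty paragraph.
            if ch == "\r" and text[i + 1:i + 2] == "\n":
                i += 1
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if current:
        paragraphs.append("".join(current))
    return paragraphs


# A Hebrew line directly followed by an English line: two paragraphs.
sample = "\u05e9\u05dc\u05d5\u05dd\nHello?"
print(len(split_uba_paragraphs(sample)))
```

Under this reading, the Hebrew line and the English line in the example
below are two separate paragraphs even though no empty line separates them.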

And yet in Emacs the formatting extends beyond the paragraph separator,
and this is the case on all operating systems. Consider, for instance,
the following example.

ILLUSTRATION: An English paragraph directly following a Hebrew paragraph
is formatted like Hebrew text.
http://imgur.com/3eyrUfA

The first, Hebrew paragraph is formatted correctly. However, the second,
English paragraph is formatted wrongly, as though it were a Hebrew
paragraph: it is right-justified, the question mark appears on the left,
and so does the cursor. Once an empty paragraph is inserted between the two
paragraphs, the English paragraph is formatted correctly.

ILLUSTRATION: When paragraphs are separated by an empty paragraph, they
are formatted correctly.
http://imgur.com/ZsHGkwf

This is not just a theoretical question of conformance to standards;
this problem has practical consequences.

Consider, for instance, a LaTeX document for typesetting Hebrew
text. Normally, in order to eliminate the usual leading indentation of
the first line of a paragraph, a `\noindent` command is placed at the
beginning of the paragraph. However, because the Unicode bidi algorithm
determines the directionality of a paragraph based on its first strong
directional character (here, the Latin letters of `\noindent`), the
Hebrew text is formatted like English text. This is not a problem; it is
to be expected.

ILLUSTRATION: A LaTeX document for typesetting a Hebrew paragraph with
no indentation of the first line.
http://imgur.com/xYUkZKr

One way to resolve this is to explicitly change the directionality of the
paragraph. However, setting aside the fact that this is not currently
possible due to a separate Emacs bug, even if it were possible, it would
affect the placement of the backslash at the beginning of the
`\noindent` command, which would no longer look like a LaTeX command.

ILLUSTRATION: Explicitly changing the directionality of the
paragraph.
http://imgur.com/sPcVReA

(Note: This is a screenshot of Microsoft Word, since, due to a bug,
Emacs doesn't currently allow changing the automatically determined
directionality of a paragraph.)

So the best way to resolve this problem would be to place the `\noindent`
command in a separate paragraph. Unfortunately, here Emacs' faulty
implementation of the Unicode bidi algorithm rears its ugly
head. Since Emacs doesn't recognize the paragraph separator for what it
is, it will format the Hebrew text wrongly, as though it were English text.

ILLUSTRATION: Putting the `\noindent` on a separate paragraph results in
the Hebrew text being formatted like English text
http://imgur.com/44ds6rK

Placing an empty paragraph between the `\noindent` command and the
Hebrew text will resolve the formatting problem inside the Emacs editor, but
now the `\noindent` command, which only affects the current LaTeX
paragraph (LaTeX paragraphs are ended by an empty line), no longer
eliminates the indentation of the first line of the Hebrew paragraph in
the typeset file.



In GNU Emacs 25.1.1 (x86_64-apple-darwin13.4.0, NS appkit-1265.21
Version 10.9.5 (Build 13F1911))
 of 2016-09-21 built on builder10-9.porkrind.org
Windowing system distributor 'Apple', version 10.3.1504
Configured using:
 'configure --with-ns '--enable-locallisppath=/Library/Application
 Support/Emacs/${version}/site-lisp:/Library/Application
 Support/Emacs/site-lisp' --with-modules'

Configured features:
NOTIFY ACL GNUTLS LIBXML2 ZLIB TOOLKIT_SCROLL_BARS NS MODULES

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Fundamental

Minor modes in effect:
  ivy-mode: t
  shell-dirtrack-mode: t
  projectile-mode: t
  helm-descbinds-mode: t
  async-bytecomp-package-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
ad-handle-definition: ‘ibuffer’ got redefined
Turn on helm-projectile key bindings
For information about GNU Emacs and the GNU system, type C-h C-a.

Load-path shadows:
/Users/itaiberli/.emacs.d/elpa/seq-2.20/seq hides
/Applications/Emacs.app/Contents/Resources/lisp/emacs-lisp/seq

Features:
(shadow sort mail-extr emacsbug message rfc822 mml mml-sec epg mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
sendmail rfc2047 rfc2045 ietf-drums mail-utils colir color counsel
jka-compr esh-util etags xref project swiper reftex reftex-vars
two-column ivy delsel ivy-overlay helm-projectile helm-files rx
image-dired tramp tramp-compat tramp-loaddefs trampver shell pcomplete
format-spec dired-x dired-aux ffap helm-tags helm-bookmark helm-adaptive
helm-info bookmark pp helm-external helm-net browse-url xml url
url-proxy url-privacy url-expand url-methods url-history url-cookie
url-domsuf url-util url-parse auth-source gnus-util mm-util help-fns
mail-prsvr password-cache url-vars mailcap helm-buffers helm-grep
helm-regexp helm-utils helm-locate helm-help helm-types projectile grep
compile comint ansi-color ring ibuf-ext ibuffer thingatpt helm-descbinds
helm easy-mmode helm-source cl-seq eieio-compat eieio eieio-core
helm-multi-match helm-lib dired helm-config helm-easymenu cl-macs
async-bytecomp async advice edmacro kmacro finder-inf tex-site info
package epg-config seq byte-opt gv bytecomp byte-compile cl-extra
help-mode easymenu cconv cl-loaddefs pcase cl-lib time-date mule-util
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel ns-win ucs-normalize term/common-win tool-bar dnd fontset image
regexp-opt fringe tabulated-list newcomment elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core frame cl-generic cham
georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese charscript case-table epa-hook
jka-cmpr-hook help simple abbrev minibuffer cl-preloaded nadvice
loaddefs button faces cus-face macroexp files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote kqueue cocoa ns multi-tty
make-network-process emacs)

Memory information:
((conses 16 312045 13704)
 (symbols 48 30403 0)
 (miscs 40 88 192)
 (strings 32 51754 11765)
 (string-bytes 1 1669992)
 (vectors 16 50218)
 (vector-slots 8 844617 7052)
 (floats 8 564 218)
 (intervals 56 242 111)
 (buffers 976 18))






* bug#27526: Explicit directionality marks CAN be inserted!
  2017-06-29  9:16 bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator Itai Berli
@ 2017-06-29  9:42 ` Itai Berli
  2017-06-29 14:49 ` bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator Eli Zaretskii
  2017-06-29 18:36 ` Itai Berli
  2 siblings, 0 replies; 31+ messages in thread
From: Itai Berli @ 2017-06-29  9:42 UTC (permalink / raw)
  To: 27526

I'd like to retract the statement I made in the LaTeX example that
inserting explicit directionality marks doesn't work in Emacs. It
does.






* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-06-29  9:16 bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator Itai Berli
  2017-06-29  9:42 ` bug#27526: Explicit directionality marks CAN be inserted! Itai Berli
@ 2017-06-29 14:49 ` Eli Zaretskii
  2017-06-29 18:36 ` Itai Berli
  2 siblings, 0 replies; 31+ messages in thread
From: Eli Zaretskii @ 2017-06-29 14:49 UTC (permalink / raw)
  To: Itai Berli; +Cc: 27526

> From: Itai Berli <itai.berli@gmail.com>
> Date: Thu, 29 Jun 2017 12:16:00 +0300
> 
> I'll repeat: according to Unicode a paragraph ends with a paragraph
> separator. What constitutes a paragraph separator is specified precisely
> in section 5.8 'Newline Guidelines' of The Unicode Standard version
> 8.0.0. For instance, on a MacOS X system, it is `LF` (line feed,
> Unicode 000A). The formatting effects of the bidi algorithm must not
> cross the paragraph separator boundary.
> 
> And yet in Emacs the formatting extend beyond the paragraph separator,
> and this is the case on all operating systems. Consider, for instance,
> the following example.

The UBA allows applications to employ "higher-level protocols" when
deciding on base paragraph direction.  See section 4.3 in UAX#9 and
specifically clause HL1 there.

This is what Emacs does: it applies its own heuristics for this
decision.  The reason for that is that Emacs's implementation of the
UBA must work reasonably well in plain-text buffers, where typically
long paragraphs are broken into lines by newline characters (which are
paragraph separators according to the UBA), and many times the
partition into lines is done by auto-fill or similar features, thus
making the first character of the next line fairly arbitrary.  Using
the UBA paragraph-direction determination would then produce
unacceptable results, whereby the direction of a part of a paragraph
could change in unpredictable ways when text is refilled.
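
The refill problem described above can be sketched in Python. This is a
rough model only, not the Emacs implementation: the function names are
hypothetical, the first-strong rule is a simplified form of P2/P3 that
ignores isolates, and "empty line" is taken to mean a line containing
nothing but spaces or tabs.

```python
import re
import unicodedata


def first_strong_direction(text, default="LTR"):
    """Simplified P2/P3: direction of the first strong character."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "LTR"
        if cls in ("R", "AL"):
            return "RTL"
    return default


def per_line_directions(text):
    """Model where every newline ends a paragraph."""
    return [first_strong_direction(line) for line in text.split("\n")]


def blank_line_directions(text):
    """Model where only empty (whitespace-only) lines end a paragraph."""
    directions = []
    for block in re.split(r"\n(?:[ \t]*\n)+", text):
        direction = first_strong_direction(block)
        # Every line of the block inherits the block's base direction.
        directions.extend([direction] * len(block.split("\n")))
    return directions


# One filled Hebrew paragraph whose second line happens to start
# with an English word after refilling.
filled = "\u05e9\u05dc\u05d5\u05dd Emacs\nUnicode \u05e2\u05d5\u05dc\u05dd"
print(per_line_directions(filled))
print(blank_line_directions(filled))
```

In the per-line model the second line flips to left-to-right merely because
filling put a Latin word first; in the blank-line model both lines keep the
direction of the paragraph's first strong character.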

> Consider, for instance, a LaTeX document for typesetting Hebrew
> text. Normally in order to eliminate the usual leading indentation of
> the first line of a paragraph, a `\noindent` command is placed at the
> beginning of the paragraph. However, because the Unicode bidi algorithm
> determines the directionality of a paragraph based on its first word, the
> Hebrew text is formatted like English text. This is not a problem; it is
> to be expected.

The Emacs bidirectional display doesn't have special facilities for
marked-up text, such as TeX and HTML/XML.  Because those markups use
punctuation characters for their markup, doing so in RTL context can
produce unpleasant results in the default display, as you point out.

You can alleviate this to some extent by (in your case) starting the
paragraph with an RLM control character before \noindent, optionally
followed by an LRM or enclosing \noindent in LRE..PDF (so that the
backslash displays to the left of "noindent").  This is admittedly a
bit awkward, but I think the results are still acceptable.
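
The effect of the suggested RLM on the base direction can be sketched in
Python. This is an illustration under assumptions: `first_strong_direction`
is a hypothetical helper implementing a simplified first-strong rule, and the
sketch models only the choice of base direction, not how the backslash is
reordered for display.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, bidi class R
LRM = "\u200e"  # LEFT-TO-RIGHT MARK, bidi class L


def first_strong_direction(text, default="LTR"):
    """Simplified first-strong rule: direction of the first strong character."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "LTR"
        if cls in ("R", "AL"):
            return "RTL"
    return default


hebrew = "\u05e9\u05dc\u05d5\u05dd"
plain = "\\noindent " + hebrew
# RLM first sets the base direction; the LRM that follows keeps the
# backslash attached to the Latin command name.
marked = RLM + LRM + "\\noindent " + hebrew

print(first_strong_direction(plain))
print(first_strong_direction(marked))
```

Both marks are invisible, so the visible text of the paragraph is unchanged;
only the first strong character seen by the algorithm differs.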

I will gladly work with anyone who'd volunteer to introduce features
required to better support markup languages.  This will require
low-level display changes and some support from the relevant major
modes to use those features.  For now, the demand was sufficiently low
(I think you are about the second person to raise the issue since
bidirectional display debuted in Emacs 24.1) to keep this issue low on
our TODO.

> One way to resolve this is to explicitly change the directionality of the
> paragraph, however, disregarding the fact that this is not currently
> possible due to a separate Emacs bug, even if it were possible, it would
> affect the placement of the backslash at the beginning of the
> `\noindent` command, which will no longer look like a LaTeX command.

I think my suggestion above fixes this latter issue as well.

Thanks.






* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-06-29  9:16 bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator Itai Berli
  2017-06-29  9:42 ` bug#27526: Explicit directionality marks CAN be inserted! Itai Berli
  2017-06-29 14:49 ` bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator Eli Zaretskii
@ 2017-06-29 18:36 ` Itai Berli
  2017-07-04 10:42   ` Itai Berli
  2 siblings, 1 reply; 31+ messages in thread
From: Itai Berli @ 2017-06-29 18:36 UTC (permalink / raw)
  To: 27526

> The UBA allows applications to employ "higher-level protocols" when
> deciding on base paragraph direction.  See section 4.3 in UAX#9 and specifically clause HL1 there.

> This is what Emacs does: it applies its own heuristics for this
> decision.  The reason for that is that Emacs's implementation of the
> UBA must work reasonably well in plain-text buffers, where typically
> long paragraphs are broken into lines by newline characters (which are
> paragraph separators according to the UBA), and many times the
> partition into lines is done by auto-fill or similar features, thus
> making the first character of the next line fairly arbitrary.  Using
> the UBA paragraph-direction determination would then produce
> unacceptable results, whereby the direction of a part of a paragraph
> could change in unpredictable ways when text is refilled.

As I understand it, the "higher-level protocols" provision is intended
to allow for such things as table cells, elements of structured markup
languages, and word processors that use an idiosyncratic
implementation of a paragraph separator *under the hood*. It is not
intended for plain running text; for this the standard specifies
explicitly what the paragraph separators for every operating system
are.

> typically long paragraphs are broken into lines by newline characters

I see no evidence of the validity of this statement on my system (Emacs
25.1.1). But even if this were so, it would still not merit
*hard-coding* the paragraph separator as a blank line, as there are
situations (such as the one I presented in my bug report) that require
a different configuration.

> You can alleviate this to some extent by ...(in your case) starting
> the paragraph with an RLM control character before \noindent,
> optionally followed by an LRM or enclosing \noindent in LRE..PDF (so
> that the backslash displays to the left of "noindent").  This is
> admittedly a bit awkward, but I think the results are still acceptable.

As you mentioned, the solution is cumbersome. It might have been
acceptable if this were the sole issue, but this example illustrates
just one of several problems that arise due to the current paragraph
separator convention.

In conclusion, and on a personal note, I implore you to change this
behavior, and to do so as soon as possible, and not only for specialized
markup documents, but for every document.

I am currently working on my thesis. Emacs is useless to me as a text
editor of Hebrew texts without this feature. This is no
exaggeration.

The original reason I chose Emacs over other editors was the
combination of AUCTeX and the promise of full Unicode
compatibility. AUCTeX has delivered on its promise, but in the area of
Unicode, as far as my needs are concerned, it is as if there were no Unicode
support at all, and I will sadly be forced to look for a different editor.






* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-06-29 18:36 ` Itai Berli
@ 2017-07-04 10:42   ` Itai Berli
  2017-07-04 15:03     ` Eli Zaretskii
  0 siblings, 1 reply; 31+ messages in thread
From: Itai Berli @ 2017-07-04 10:42 UTC (permalink / raw)
  To: 27526


I'd like to add another reason why this behavior is problematic: it breaks
interoperability with other plain text editors, since the text will not be
displayed the same way. Consider, for instance, the very same plain text
file
  in GEdit: http://imgur.com/Iw4yrdQ
  in Emacs: http://imgur.com/7kfWseE

Finally, whether Emacs's behavior is consistent with the UBA specification
is debatable. UBA section 3 states "Paragraphs may also be determined by
higher-level protocols", and the question is what exactly the "also"
means: can a higher-level protocol (HLP) decide that a newline character is
not a paragraph boundary, as Emacs does, or can it only declare paragraph
boundaries *in addition to* paragraph separator characters?



* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-04 10:42   ` Itai Berli
@ 2017-07-04 15:03     ` Eli Zaretskii
  2017-07-04 15:57       ` Itai Berli
  0 siblings, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2017-07-04 15:03 UTC (permalink / raw)
  To: Itai Berli; +Cc: 27526

> From: Itai Berli <itai.berli@gmail.com>
> Date: Tue, 4 Jul 2017 13:42:19 +0300
> 
> I'd like to add another reason why this behavior is problematic: it breaks interoperability with other plain text
> editors, since the text will not be displayed the same way. Consider, for instance, the very same plain text file
> in GEdit: http://imgur.com/Iw4yrdQ
> in Emacs: http://imgur.com/7kfWseE

As I already explained, the behavior of GEdit is unacceptable for
Emacs, because most modes derived from Text mode tend to deal with
buffers where lines are broken by newlines, so potentially switching
paragraph direction just because a newline happens to be there would
have a devastating effect on the text as displayed.  This is perhaps in
contrast with other editors and word processors, which mostly deal with
long lines without hard newlines.  That's why the notion of paragraph
in Emacs's UBA implementation was chosen to fit the traditional Emacs
definition of paragraph in text-mode and its derivatives.

> Finally, the question of whether Emacs behavior is consistent with the UBA specifications is debatable, since
> when UBA section 3 states "Paragraphs may also be determined by higher-level protocols" the question is
> what exactly the "also" means: is it that the higher-level protocols (HLP) can decide that a newline character is
> not a paragraph boundary, as Emacs does, or is it that the HLP can only declare paragraph boundaries in
> addition to paragraph separator characters?

It is clear from the context and the example following the above
sentence that "also" doesn't mean "in addition".

However, the main issue is not the paragraph boundary, the main issue
is how the base direction of the paragraph is determined.  Because no
matter where the paragraph boundary is, if the base direction is not
recalculated there, then the fact that the boundary is there doesn't
matter.

From Section 4.3 Higher-Level Protocols of the UAX#9:

  HL1. Override P3, and set the paragraph embedding level
       explicitly. This does not apply when deciding how to treat FSI
       in rule X5c.

       . A higher-level protocol may set any paragraph level. This can
         be done on the basis of the context, such as on a table cell,
         paragraph, document, or system level. (P2 may be skipped if
         P3 is overridden). [...]
       . A higher-level protocol may apply rules equivalent to P2 and
         P3 but default to level 1 (RTL) rather than 0 (LTR) to match
         overall RTL context.
       . A higher-level protocol may use an entirely different
         algorithm that heuristically auto-detects the paragraph
         embedding level based on the paragraph text and its
         context. For example, it could base it on whether there are
         more RTL characters in the text than LTR. As another example,
         when the paragraph contains no strong characters, its
         direction could be determined by the levels of the paragraphs
         before and after.

And Section 3.3.1, which describes the P1, P2, and P3 paragraph-level
rules, says:

  Whenever a higher-level protocol specifies the paragraph level,
  rules P2 and P3 may be overridden: see HL1.

So an application is allowed to override _all_ of the paragraph-level
rules, and do what suits it best.  And based on some non-negligible
experience with bidi-aware applications, I submit that an application
that does _not_ employ some higher-level protocol for base paragraph
direction will violate user expectations when working with plain text.
E.g., try reading in MS Outlook an unformatted text message which has
a lot of RTL text mixed with LTR.  It's unreadable; I always
copy/paste it into Emacs, and only then I'm able to read it.
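
One of the heuristics named in the HL1 text quoted above, counting RTL
against LTR characters, can be sketched in Python. This is not what Emacs
does and not a reference implementation; the function name
`majority_direction` and the tie-breaking default are assumptions made for
the illustration.

```python
import unicodedata


def majority_direction(text, default="LTR"):
    """Pick the base direction by counting strong characters.

    Hypothetical helper: characters of bidi class R or AL count as RTL,
    class L counts as LTR, everything else is ignored. Ties and text
    without strong characters fall back to `default` (an assumption).
    """
    rtl = sum(1 for ch in text if unicodedata.bidirectional(ch) in ("R", "AL"))
    ltr = sum(1 for ch in text if unicodedata.bidirectional(ch) == "L")
    if rtl > ltr:
        return "RTL"
    if ltr > rtl:
        return "LTR"
    return default


# A LaTeX command followed by three Hebrew words: 8 Latin letters
# against 12 Hebrew letters.
line = ("\\noindent \u05e9\u05dc\u05d5\u05dd "
        "\u05e2\u05d5\u05dc\u05dd \u05d5\u05e2\u05d5\u05d3")
print(majority_direction(line))
```

With such a heuristic the paragraph level no longer depends on which
character happens to come first, which is the kind of freedom HL1 grants.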






* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-04 15:03     ` Eli Zaretskii
@ 2017-07-04 15:57       ` Itai Berli
  2017-07-04 16:18         ` Eli Zaretskii
  0 siblings, 1 reply; 31+ messages in thread
From: Itai Berli @ 2017-07-04 15:57 UTC (permalink / raw)
  To: 27526


> As I already explained, the behavior of GEdit is unacceptable for
> Emacs, because most modes derived from Text mode tend to deal with
> buffers where lines are broken by newlines, so potentially switching
> paragraph direction just because a newline happens to be there would
> have devastating effect on the text as displayed.

How about letting users decide what's best for them? Would it be
possible to add an option to Emacs that a user can set, say, in their
.emacs file, which would determine whether the bidi implementation
treats the newline character or an empty line as the paragraph separator?



* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-04 15:57       ` Itai Berli
@ 2017-07-04 16:18         ` Eli Zaretskii
  2017-07-04 16:37           ` Itai Berli
  2017-07-17 14:54           ` Eli Zaretskii
  0 siblings, 2 replies; 31+ messages in thread
From: Eli Zaretskii @ 2017-07-04 16:18 UTC (permalink / raw)
  To: Itai Berli; +Cc: 27526

> From: Itai Berli <itai.berli@gmail.com>
> Date: Tue, 4 Jul 2017 18:57:33 +0300
> 
> How about letting the user decide what's best for them? Would it be possible to add an option to Emacs that a
> user can set, say, in their .emacs file, which will determine whether the bidi implementation will consider the
> newline character as the paragraph separator or an empty line?

Could be.  I'd need to carefully review the code to say for sure.
Originally, the regexp which defines where paragraph begins was
customizable, but it led to grave bugs, so I removed that.  Maybe a
more restricted facility could avoid such pitfalls.






* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-04 16:18         ` Eli Zaretskii
@ 2017-07-04 16:37           ` Itai Berli
  2017-07-04 16:47             ` Eli Zaretskii
  2017-07-17 14:54           ` Eli Zaretskii
  1 sibling, 1 reply; 31+ messages in thread
From: Itai Berli @ 2017-07-04 16:37 UTC (permalink / raw)
  To: 27526

If you can do it, that'll be fantastic. And while you're perusing the code,
perhaps you can see if it is also possible to allow the user to decide
whether they want the bidi control characters to be visible or not?


* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-04 16:37           ` Itai Berli
@ 2017-07-04 16:47             ` Eli Zaretskii
  2017-07-04 17:01               ` Itai Berli
  0 siblings, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2017-07-04 16:47 UTC (permalink / raw)
  To: Itai Berli; +Cc: 27526

> From: Itai Berli <itai.berli@gmail.com>
> Date: Tue, 4 Jul 2017 19:37:04 +0300
> 
> And while you're perusing the code, perhaps you can see if it is also
> possible to allow the user to decide whether they want the bidi control characters to be visible or not

You can do that already: just customize glyphless-char-display-control
to be 'zero-width' for the 'format-control' class, and these
characters will become invisible.  Didn't I mention that up-thread?
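
For reference, a minimal sketch of that customization as it might appear
in an init file.  It assumes that the other classes should keep their
stock display method (the no-font entry below is that assumption), and
it uses customize-set-variable rather than setq so that any setter
function attached to the option runs:

```elisp
;; Sketch only: display format-control characters (which, per the
;; discussion above, include the bidi controls such as LRM and RLM)
;; with zero width, i.e. invisibly.
;; The (no-font . hex-code) entry is an assumption about the default
;; value; it is kept so that other glyphless characters stay visible.
(customize-set-variable
 'glyphless-char-display-control
 '((format-control . zero-width)
   (no-font . hex-code)))
```

The same change can be made interactively with
M-x customize-variable RET glyphless-char-display-control RET.  The set
of class names may differ between Emacs versions, so it is worth
checking the variable's doc string (C-h v) first.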






* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-04 16:47             ` Eli Zaretskii
@ 2017-07-04 17:01               ` Itai Berli
  2017-07-04 17:46                 ` Eli Zaretskii
  0 siblings, 1 reply; 31+ messages in thread
From: Itai Berli @ 2017-07-04 17:01 UTC (permalink / raw)
  To: 27526

You did, but it would be much nicer for a noob like me to be able to simply
type in my .emacs file something like: (bidi.markers.visible false), or
maybe even

(bidi.markers.ALM null)
(bidi.markers.RLM ⊲)
(bidi.markers.LRM ⊳)
...

Isn't the Bidi feature important and complicated enough to merit its own
tailored set of customizable parameters?



* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-04 17:01               ` Itai Berli
@ 2017-07-04 17:46                 ` Eli Zaretskii
  2017-07-12 15:10                   ` Itai Berli
  0 siblings, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2017-07-04 17:46 UTC (permalink / raw)
  To: Itai Berli; +Cc: 27526

> From: Itai Berli <itai.berli@gmail.com>
> Date: Tue, 4 Jul 2017 20:01:25 +0300
> 
> You did, but it would be much nicer for a noob like me to be able to simply type in my .emacs file something
> like: (bidi.markers.visible false), or maybe even
> 
> (bidi.markers.ALM null)
> (bidi.markers.RLM ⊲)
> (bidi.markers.LRM ⊳)

Sorry, I don't see why the exact way to customize this is so
important.  glyphless-char-display-control is a user-level
customizable variable, not some obscure feature that requires Lisp
programming to tailor it to your needs.

> Isn't the Bidi feature important and complicated enough to merit its own tailored set of customizable
> parameters?

It does have its private customizations, but this one isn't one of
them, and I don't see why it should be.  There are quite a few
characters of the Cf general category, and Emacs handles them all the
same way, because they all have the same nature.






* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-04 17:46                 ` Eli Zaretskii
@ 2017-07-12 15:10                   ` Itai Berli
  2017-07-12 15:36                     ` Eli Zaretskii
  0 siblings, 1 reply; 31+ messages in thread
From: Itai Berli @ 2017-07-12 15:10 UTC (permalink / raw)
  To: 27526

Is there any progress with allowing the user to customize the
end-of-paragraph mark to be the OS paragraph separator character?


* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-12 15:10                   ` Itai Berli
@ 2017-07-12 15:36                     ` Eli Zaretskii
  2017-07-12 15:52                       ` Itai Berli
  0 siblings, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2017-07-12 15:36 UTC (permalink / raw)
  To: Itai Berli; +Cc: 27526

> From: Itai Berli <itai.berli@gmail.com>
> Date: Wed, 12 Jul 2017 18:10:19 +0300
> 
> Is there any progress with allowing the user to customize the end-of-paragraph mark to be the OS paragraph
> separator character?

No, I didn't yet have time to work on that.  (And I think you were
talking about the newline character, not the paragraph separator
character.)






* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-12 15:36                     ` Eli Zaretskii
@ 2017-07-12 15:52                       ` Itai Berli
  2017-07-12 16:12                         ` Eli Zaretskii
  0 siblings, 1 reply; 31+ messages in thread
From: Itai Berli @ 2017-07-12 15:52 UTC (permalink / raw)
  To: 27526

> I think you were talking about the newline character, not the paragraph
> separator character.

On UNIX and contemporary macOS it's U+000A (LF); on Windows it's the
sequence U+000D U+000A (CR LF).


* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-12 15:52                       ` Itai Berli
@ 2017-07-12 16:12                         ` Eli Zaretskii
  0 siblings, 0 replies; 31+ messages in thread
From: Eli Zaretskii @ 2017-07-12 16:12 UTC (permalink / raw)
  To: Itai Berli; +Cc: 27526

> From: Itai Berli <itai.berli@gmail.com>
> Date: Wed, 12 Jul 2017 18:52:10 +0300
> 
> > I think you were talking about the newline character, not the paragraph separator character.
> 
> On UNIX and contemporary macOS it's U+000A (LF), on Windows it's the sequence U+000D U+000A (CR
> LF).

Not in the Emacs buffer: there we have only the newline (a.k.a. "LF")
characters.
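
A small sketch of what that means in practice, assuming the text
arrives with DOS-style line endings: decoding with a coding system
whose end-of-line variant is "dos" converts each CR LF pair into a
single newline, so the bidi code only ever sees LF.  The sample string
is made up for illustration.

```elisp
;; Decode a string that uses CR LF line endings.  The utf-8-dos
;; coding system performs end-of-line conversion while decoding, so
;; the decoded text should contain plain newlines and no carriage
;; returns.
(let ((decoded (decode-coding-string "first line\r\nsecond line"
                                     'utf-8-dos)))
  (list (string-match-p "\r" decoded)    ; nil when no CR is left
        (string-match-p "\n" decoded)))  ; index of the newline
```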






* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-04 16:18         ` Eli Zaretskii
  2017-07-04 16:37           ` Itai Berli
@ 2017-07-17 14:54           ` Eli Zaretskii
  2017-07-17 15:16             ` Itai Berli
  1 sibling, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2017-07-17 14:54 UTC (permalink / raw)
  To: itai.berli; +Cc: 27526

> Date: Tue, 04 Jul 2017 19:18:39 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 27526@debbugs.gnu.org
> 
> > From: Itai Berli <itai.berli@gmail.com>
> > Date: Tue, 4 Jul 2017 18:57:33 +0300
> > 
> > How about letting the user decide what's best for them? Would it be possible to add an option to Emacs that a
> > user can set, say, in their .emacs file, which will determine whether the bidi implementation will consider the
> > newline character as the paragraph separator or an empty line?
> 
> Could be.  I'd need to carefully review the code to say for sure.
> Originally, the regexp which defines where paragraph begins was
> customizable, but it led to grave bugs, so I removed that.  Maybe a
> more restricted facility could avoid such pitfalls.

It turned out to be relatively easy, so I implemented this on the
master branch of the Emacs Git repository.  There are two new
variables that you should set to "^" to get the behavior you wanted.
I hope you can build the master branch and see whether the new
facilities solve your case.

Thanks.






* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-17 14:54           ` Eli Zaretskii
@ 2017-07-17 15:16             ` Itai Berli
  2017-07-17 15:23               ` Jean-Christophe Helary
  0 siblings, 1 reply; 31+ messages in thread
From: Itai Berli @ 2017-07-17 15:16 UTC (permalink / raw)
  To: 27526

Thanks. I've never built Emacs from source. I think it might be easier for
me to wait till this patch makes it to the official release.


* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-17 15:16             ` Itai Berli
@ 2017-07-17 15:23               ` Jean-Christophe Helary
  2017-07-17 18:33                 ` Itai Berli
  0 siblings, 1 reply; 31+ messages in thread
From: Jean-Christophe Helary @ 2017-07-17 15:23 UTC (permalink / raw)
  To: Itai Berli; +Cc: 27526


> On Jul 18, 2017, at 0:16, Itai Berli <itai.berli@gmail.com> wrote:
> 
> Thanks. I've never built Emacs from source. I think it might be easier for me to wait till this patch makes it to the official release.

It's actually pretty easy to build from source. The easiest way (which depends on your platform) is to install the version that corresponds to HEAD. The slightly less trivial way is to get the code from Savannah:
https://savannah.gnu.org/projects/emacs
Clone the code and follow the instructions.
I got used to doing that a few weeks ago, and it is fascinating to see all the new features pouring in every day.

Jean-Christophe


* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-17 15:23               ` Jean-Christophe Helary
@ 2017-07-17 18:33                 ` Itai Berli
  2017-07-17 20:20                   ` Eli Zaretskii
  0 siblings, 1 reply; 31+ messages in thread
From: Itai Berli @ 2017-07-17 18:33 UTC (permalink / raw)
  To: 27526

Eli, what version number should I download?


* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-17 18:33                 ` Itai Berli
@ 2017-07-17 20:20                   ` Eli Zaretskii
  2017-07-17 20:43                     ` Itai Berli
  0 siblings, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2017-07-17 20:20 UTC (permalink / raw)
  To: Itai Berli; +Cc: 27526

> From: Itai Berli <itai.berli@gmail.com>
> Date: Mon, 17 Jul 2017 21:33:07 +0300
> 
> Eli, what version number should I download?

You should clone the Emacs Git repository as described here:

  https://savannah.gnu.org/git/?group=emacs

(Follow the instructions under "Anonymous clone", only for "Emacs
source repository", you don't need ELPA.)  Then the file INSTALL.REPO
which you will find in the top-level directory of the cloned
repository should explain how to build the development sources.






* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-17 20:20                   ` Eli Zaretskii
@ 2017-07-17 20:43                     ` Itai Berli
  2017-07-18  2:33                       ` Eli Zaretskii
  0 siblings, 1 reply; 31+ messages in thread
From: Itai Berli @ 2017-07-17 20:43 UTC (permalink / raw)
  To: 27526

I get the error message

xml.c:23:10: fatal error: 'libxml/tree.h' file not found
#include <libxml/tree.h>
         ^
1 error generated.
make[2]: *** [xml.o] Error 1
make[1]: *** [src] Error 2
make: *** [default] Error 2

even though my path includes
'/usr/local/Cellar/libxml2/2.9.4_3/include/libxml2', and this directory in
turn contains 'libxml/tree.h'.


* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-17 20:43                     ` Itai Berli
@ 2017-07-18  2:33                       ` Eli Zaretskii
  2017-07-18  2:45                         ` Glenn Morris
  0 siblings, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2017-07-18  2:33 UTC (permalink / raw)
  To: Itai Berli; +Cc: 27526

> From: Itai Berli <itai.berli@gmail.com>
> Date: Mon, 17 Jul 2017 23:43:22 +0300
> 
> I get the error message
> 
> xml.c:23:10: fatal error: 'libxml/tree.h' file not found
> #include <libxml/tree.h>
> ^
> 1 error generated.
> make[2]: *** [xml.o] Error 1
> make[1]: *** [src] Error 2
> make: *** [default] Error 2
> 
> even though my path includes '/usr/local/Cellar/libxml2/2.9.4_3/include/libxml2', and this directory in turn
> contains 'libxml/tree.h'.

If you say "make V=1", do you see /usr/local/Cellar mentioned among
the -I command-line options of the compilation command line?

If that doesn't give a clue, you could configure Emacs --without-xml2.






* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-18  2:33                       ` Eli Zaretskii
@ 2017-07-18  2:45                         ` Glenn Morris
  2017-07-18  4:01                           ` Itai Berli
  0 siblings, 1 reply; 31+ messages in thread
From: Glenn Morris @ 2017-07-18  2:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 27526, Itai Berli


>> xml.c:23:10: fatal error: 'libxml/tree.h' file not found

https://debbugs.gnu.org/18779
https://lists.gnu.org/archive/html/emacs-devel/2015-11/msg01926.html

TL;DR:
xcode-select --install






* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-18  2:45                         ` Glenn Morris
@ 2017-07-18  4:01                           ` Itai Berli
  2017-07-18  4:54                             ` Eli Zaretskii
  0 siblings, 1 reply; 31+ messages in thread
From: Itai Berli @ 2017-07-18  4:01 UTC (permalink / raw)
  To: 27526

Thanks, Glenn. This worked. Why isn't this in the INSTALL file?

Eli, you mentioned I needed to set a couple variables. What are these
variables, and what do I need to set them to?


* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-18  4:01                           ` Itai Berli
@ 2017-07-18  4:54                             ` Eli Zaretskii
  2017-07-18  5:52                               ` Itai Berli
  0 siblings, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2017-07-18  4:54 UTC (permalink / raw)
  To: 27526, itai.berli

On July 18, 2017 7:01:41 AM GMT+03:00, Itai Berli <itai.berli@gmail.com> wrote:
> Thanks, Glenn. This worked. Why isn't this in the INSTALL file?
> 
> Eli, you mentioned I needed to set a couple variables. What are these
> variables, and what do I need to set them to?

The variables are bidi-paragraph-start-re and bidi-paragraph-separate-re. 
Their doc strings should have the details you need; in particular, the
settings for your case are already described there as an example.
You can also find their description in the manual.






* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-18  4:54                             ` Eli Zaretskii
@ 2017-07-18  5:52                               ` Itai Berli
  2017-07-18 13:27                                 ` Itai Berli
  0 siblings, 1 reply; 31+ messages in thread
From: Itai Berli @ 2017-07-18  5:52 UTC (permalink / raw)
  To: 27526

I cannot access the variables; not only these, but all others too. When I
execute C-h v, the echo area says:

> Cannot open load file: No such file or directory, help-fns


* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-18  5:52                               ` Itai Berli
@ 2017-07-18 13:27                                 ` Itai Berli
  2017-07-18 14:44                                   ` Eli Zaretskii
  0 siblings, 1 reply; 31+ messages in thread
From: Itai Berli @ 2017-07-18 13:27 UTC (permalink / raw)
  To: 27526

OK, so I've managed to get C-h v to work by running `make install`.

I've tested the feature, and it works. Thanks!

However, it can only be changed in the current buffer, and only by
executing M-: (setq ...). Can you please make it possible to set this
variable globally, say in the .emacs file, as well as to customize it with
M-x set-variable?


* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-18 13:27                                 ` Itai Berli
@ 2017-07-18 14:44                                   ` Eli Zaretskii
  2017-07-18 15:22                                     ` Itai Berli
  0 siblings, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2017-07-18 14:44 UTC (permalink / raw)
  To: Itai Berli; +Cc: 27526

> From: Itai Berli <itai.berli@gmail.com>
> Date: Tue, 18 Jul 2017 16:27:28 +0300
> 
> I've tested the feature, and it works. Thanks!

Thanks for testing.

> However, it can only be changed in the current buffer, and only by executing M-: (setq ...). Can you please
> make it possible to set this variable globally, say in the .emacs file, as well as to customize it with M-x
> set-variable?

As with any variable, you can set these two globally by using
setq-default instead of setq.  That should work in your .emacs as
well.
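
A minimal sketch of such a setup for an init file, assuming an Emacs
build that includes the two new variables and using the "^" value
mentioned earlier in this thread.  The mode hook in the commented-out
alternative is only an example choice.

```elisp
;; Make every newline a paragraph boundary for bidi reordering, in
;; all buffers that don't override these variables locally.
(setq-default bidi-paragraph-start-re "^"
              bidi-paragraph-separate-re "^")

;; Alternative: restrict the change to buffers of one major mode.
;; `text-mode-hook' is chosen here purely for illustration.
;; (add-hook 'text-mode-hook
;;           (lambda ()
;;             (setq-local bidi-paragraph-start-re "^")
;;             (setq-local bidi-paragraph-separate-re "^")))
```

With the global form in place, C-h v on either variable in a fresh
buffer should show "^" as its value.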

I don't want to make this a defcustom yet, as I'm not sure it should
be a user option.  I'd like first to see how many people would like to
use it.






* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-18 14:44                                   ` Eli Zaretskii
@ 2017-07-18 15:22                                     ` Itai Berli
  2017-07-18 15:55                                       ` Eli Zaretskii
  0 siblings, 1 reply; 31+ messages in thread
From: Itai Berli @ 2017-07-18 15:22 UTC (permalink / raw)
  To: 27526

How can you tell how many people would like to use it, or indeed if anyone
uses it at all?

At any rate, thanks for this fix. It is extremely helpful, and even
provides a workaround -- to a degree! -- for the line-wrapping problem, as
long as one is writing a document in a markup language like TeX/LaTeX or
XML where line breaks are treated the same as spaces. However, the
line-wrapping bug is still a major annoyance, at best, and until it is
fixed, Emacs cannot claim to be Unicode compliant. I saw that Mr. Stallman
chimed in on the line-wrapping bug; does this mean that there's hope that
it will get fixed in the foreseeable future?


* bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
  2017-07-18 15:22                                     ` Itai Berli
@ 2017-07-18 15:55                                       ` Eli Zaretskii
  0 siblings, 0 replies; 31+ messages in thread
From: Eli Zaretskii @ 2017-07-18 15:55 UTC (permalink / raw)
  To: Itai Berli; +Cc: 27526-done

> Resent-Sender: help-debbugs@gnu.org
> From: Itai Berli <itai.berli@gmail.com>
> Date: Tue, 18 Jul 2017 18:22:12 +0300
> 
> How can you tell how many people would like to use it, or indeed if anyone uses it at all?

By reading most of the Emacs-related traffic out there.

> At any rate, thanks for this fix. It is extremely helpful, and even provides a workaround -- to a degree! -- for the
> line-wrapping problem, as long as one is writing a document in a markup language like TeX/LaTeX or XML
> where line breaks are treated the same as spaces.

Thanks, so I'm closing this bug report.

> However, the line-wrapping bug is still a major
> annoyance, at best, and until it is fixed, Emacs cannot claim to be Unicode compliant.

I disagree, as I already said many times.  In any case, that's a
separate bug report.

> I saw that Mr. Stallman
> chimed in on the line-wrapping bug. Does this mean that there's hope that it will get fixed in the foreseeable
> future?

Richard chimed in on a tangent; it wasn't about the wrapping of bidi
text when the paragraph direction is the opposite one.


Thread overview: 31+ messages
2017-06-29  9:16 bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator Itai Berli
2017-06-29  9:42 ` bug#27526: Explicit directionality marks CAN be inserted! Itai Berli
2017-06-29 14:49 ` bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator Eli Zaretskii
2017-06-29 18:36 ` Itai Berli
2017-07-04 10:42   ` Itai Berli
2017-07-04 15:03     ` Eli Zaretskii
2017-07-04 15:57       ` Itai Berli
2017-07-04 16:18         ` Eli Zaretskii
2017-07-04 16:37           ` Itai Berli
2017-07-04 16:47             ` Eli Zaretskii
2017-07-04 17:01               ` Itai Berli
2017-07-04 17:46                 ` Eli Zaretskii
2017-07-12 15:10                   ` Itai Berli
2017-07-12 15:36                     ` Eli Zaretskii
2017-07-12 15:52                       ` Itai Berli
2017-07-12 16:12                         ` Eli Zaretskii
2017-07-17 14:54           ` Eli Zaretskii
2017-07-17 15:16             ` Itai Berli
2017-07-17 15:23               ` Jean-Christophe Helary
2017-07-17 18:33                 ` Itai Berli
2017-07-17 20:20                   ` Eli Zaretskii
2017-07-17 20:43                     ` Itai Berli
2017-07-18  2:33                       ` Eli Zaretskii
2017-07-18  2:45                         ` Glenn Morris
2017-07-18  4:01                           ` Itai Berli
2017-07-18  4:54                             ` Eli Zaretskii
2017-07-18  5:52                               ` Itai Berli
2017-07-18 13:27                                 ` Itai Berli
2017-07-18 14:44                                   ` Eli Zaretskii
2017-07-18 15:22                                     ` Itai Berli
2017-07-18 15:55                                       ` Eli Zaretskii
