unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#43207: 26.3; Strange bidi behavior
@ 2020-09-04 19:36 Niels Möller
  2020-09-04 20:03 ` Eli Zaretskii
  0 siblings, 1 reply; 5+ messages in thread
From: Niels Möller @ 2020-09-04 19:36 UTC
  To: 43207

I get some pretty confusing behavior when I load the file at

  https://www.lysator.liu.se/~nisse/misc/emacs-bidi-bug-2.txt

This is what it looks like

  https://www.lysator.liu.se/~nisse/misc/emacs-bidi-bug-2.png

I'll try to describe the behavior I get when loading the file with

  emacs -Q emacs-bidi-bug-2.txt

(I'm not sure exactly what the intended behavior is, but what I see is
quite confusing). The file contains some Arabic characters (originating
in a discussion of the Vatican reportedly registering an Arabic domain
name meaning "Catholic"), followed by some Swedish text.

The Swedish text is displayed mostly in left-to-right order (except for
punctuation characters), but right-justified in the buffer, and to me it
seems like some parts of Emacs think the text is rendered
right-to-left.

More specifically, C-f moves point in the expected "logical order" of
the text, which is mostly to the right on the screen. However, pressing
the right-arrow key (bound to right-char) moves the cursor to the left on
most parts of this text (the opposite direction of C-f (forward-char)),
despite the text being rendered in left-to-right order.

I would have expected the later part of the file to be displayed
left-justified in left-to-right order, with the exception of the single
word "كاثولي" rendered right-to-left.

Regards,
/Niels

In GNU Emacs 26.3 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.20)
 of 2020-05-17, modified by Debian built on x86-csail-01
Windowing system distributor 'The X.Org Foundation', version
 11.0.12004000

Configured using:
 'configure --build x86_64-linux-gnu --prefix=/usr
 --sharedstatedir=/var/lib --libexecdir=/usr/lib
 --localstatedir=/var/lib --infodir=/usr/share/info
 --mandir=/usr/share/man --enable-libsystemd --with-pop=yes
 --enable-locallisppath=/etc/emacs:/usr/local/share/emacs/26.3/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/26.3/site-lisp:/usr/share/emacs/site-lisp
 --with-sound=alsa --without-gconf --with-mailutils --build
 x86_64-linux-gnu --prefix=/usr --sharedstatedir=/var/lib
 --libexecdir=/usr/lib --localstatedir=/var/lib
 --infodir=/usr/share/info --mandir=/usr/share/man --enable-libsystemd
 --with-pop=yes
 --enable-locallisppath=/etc/emacs:/usr/local/share/emacs/26.3/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/26.3/site-lisp:/usr/share/emacs/site-lisp
 --with-sound=alsa --without-gconf --with-mailutils --with-x=yes
 --with-x-toolkit=gtk3 --with-toolkit-scroll-bars 'CFLAGS=-g -O2
 -fdebug-prefix-map=/build/emacs-mHAik2/emacs-26.3+1=.
 -fstack-protector-strong
 -Wformat -Werror=format-security -Wall' 'CPPFLAGS=-Wdate-time
 -D_FORTIFY_SOURCE=2' LDFLAGS=-Wl,-z,relro'

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS GLIB
NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM THREADS LIBSYSTEMD LCMS2

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Text

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
/usr/share/emacs/site-lisp/lyskom-elisp-client hides
/usr/share/emacs/site-lisp/lyskom-elisp-client/lyskom-elisp-client

Features:
(pp shadow sort mail-extr emacsbug message dired dired-loaddefs
format-spec rfc822 mml mml-sec epa derived epg epg-config gnus-util
rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util mail-prsvr mail-utils parse-time browse-url wid-edit
network-stream puny nsm rmc seq auth-source cl-seq eieio eieio-core
cl-macs eieio-loaddefs password-cache starttls tls gnutls lyskom-rest
string lyskom-menus lyskom-ansaphone lyskom-messages mship-edit
lyskom-cache lyskom-services lyskom-mime lyskom-aux-items lyskom-command
advice lyskom-clienttypes lyskom-types lyskom-language-sv lyskom-strings
lyskom-language lyskom-macros lyskom-vars lyskom-feature lyskom-defvar
lyskom edmacro kmacro cl-print byte-opt gv bytecomp byte-compile cconv
thingatpt cl-extra help-fns radix-tree help-mode easymenu cl-loaddefs
cl-lib elec-pair time-date mule-util tooltip eldoc electric uniquify
ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win
term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow isearch timer select
scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932
hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite charscript charprop case-table epa-hook jka-cmpr-hook
help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads dbusbind inotify lcms2 dynamic-setting
system-font-setting font-render-setting move-toolbar gtk x-toolkit x
multi-tty make-network-process emacs)

Memory information:
((conses 16 264803 23621)
 (symbols 48 30298 0)
 (miscs 40 845 608)
 (strings 32 61450 3495)
 (string-bytes 1 1701594)
 (vectors 16 37130)
 (vector-slots 8 1636019 140596)
 (floats 8 99 394)
 (intervals 56 3949 356)
 (buffers 992 21))

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.







* bug#43207: 26.3; Strange bidi behavior
  2020-09-04 19:36 bug#43207: 26.3; Strange bidi behavior Niels Möller
@ 2020-09-04 20:03 ` Eli Zaretskii
  2020-09-05  0:09   ` Stefan Kangas
  2020-09-05  6:46   ` Niels Möller
  0 siblings, 2 replies; 5+ messages in thread
From: Eli Zaretskii @ 2020-09-04 20:03 UTC
  To: Niels Möller; +Cc: 43207

tags 43207 notabug
thanks

> From: nisse@lysator.liu.se (Niels Möller)
> Date: Fri, 04 Sep 2020 21:36:41 +0200
> 
> I get some pretty confusing behavior when I load the file at
> 
>   https://www.lysator.liu.se/~nisse/misc/emacs-bidi-bug-2.txt
> 
> This is what it looks like
> 
>   https://www.lysator.liu.se/~nisse/misc/emacs-bidi-bug-2.png

It is not a bug, but the expected behavior.  The display of
bidirectional text is affected by the "base paragraph direction", and
in Emacs paragraphs are separated by empty lines.  Since there's no
empty line between the Arabic text and the following lines of Latin
text, that Latin text "inherits" the base paragraph direction of
right-to-left, set by the line that has only the Arabic text.

You can either insert an empty line after that Arabic line, or you
can force the entire buffer to be displayed with left-to-right base
directionality by doing

  M-x set-variable RET bidi-paragraph-direction RET left-to-right RET

This is all described in the Emacs manual, btw; see the node
"Bidirectional Editing" there.
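
As a rough sketch, the same setting could also be made from Lisp instead
of via M-x. The hook function name my/force-ltr-paragraphs is
hypothetical and used only for illustration; bidi-paragraph-direction
itself is the variable named above, and it is local to each buffer.

```elisp
;; Force a left-to-right base direction in the current buffer only.
(setq-local bidi-paragraph-direction 'left-to-right)

;; Or, from an init file, apply it to every Text mode buffer.
;; `my/force-ltr-paragraphs' is a hypothetical helper name.
(defun my/force-ltr-paragraphs ()
  "Display all paragraphs of the current buffer left-to-right."
  (setq-local bidi-paragraph-direction 'left-to-right))

(add-hook 'text-mode-hook #'my/force-ltr-paragraphs)
```

Setting the variable back to nil should restore the default, where Emacs
determines the base direction of each paragraph from its text.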

> More specifically, C-f moves point in the expected "logical order" of
> the text, which is mostly to the right on the screen. However, pressing
> the right-arrow key (bound to right-char) moves the cursor to the left on
> most parts of this text (the opposite direction of C-f (forward-char)),
> despite the text being rendered in left-to-right order.

This is also the expected behavior.  It might be surprising for
someone who isn't used to reading bidirectional text, especially when
the base direction of a paragraph is the opposite of the natural text
direction.  But this is how most bidi-supporting applications out
there behave.

If you prefer the arrow keys to move the cursor visually, you can do

  M-x set-variable RET visual-order-cursor-movement RET t RET

(This is also in the manual.)
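
To make this persistent, a minimal init-file sketch (assuming the
default bindings of the arrow keys to left-char and right-char are still
in place) would be:

```elisp
;; Make the arrow keys (left-char / right-char) move the cursor in
;; visual order, i.e. in the direction the arrow points on screen,
;; regardless of the paragraph's base direction.
(setq visual-order-cursor-movement t)
```

C-f and C-b are not affected by this; they keep moving in logical order.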

> I would have expected the later part of the file to be displayed
> left-justified in left-to-right order, with the exception of the single
> word "كاثولي" rendered right-to-left.

That would cause annoying changes of justification to the left or right
whenever Arabic or Latin words are pushed to the next line due to text
insertion.






* bug#43207: 26.3; Strange bidi behavior
  2020-09-04 20:03 ` Eli Zaretskii
@ 2020-09-05  0:09   ` Stefan Kangas
  2020-09-05  6:46   ` Niels Möller
  1 sibling, 0 replies; 5+ messages in thread
From: Stefan Kangas @ 2020-09-05  0:09 UTC
  To: Eli Zaretskii, Niels Möller; +Cc: 43207-done

Eli Zaretskii <eliz@gnu.org> writes:

> tags 43207 notabug
> thanks

I'm therefore closing this bug report.






* bug#43207: 26.3; Strange bidi behavior
  2020-09-04 20:03 ` Eli Zaretskii
  2020-09-05  0:09   ` Stefan Kangas
@ 2020-09-05  6:46   ` Niels Möller
  2020-09-05  7:35     ` Eli Zaretskii
  1 sibling, 1 reply; 5+ messages in thread
From: Niels Möller @ 2020-09-05  6:46 UTC
  To: Eli Zaretskii; +Cc: 43207

Eli Zaretskii <eliz@gnu.org> writes:

> It is not a bug, but the expected behavior.  The display of
> bidirectional text is affected by the "base paragraph direction", and
> in Emacs paragraphs are separated by empty lines.  Since there's no
> empty line between the Arabic text and the following lines of Latin
> text, that Latin text "inherits" the base paragraph direction of
> right-to-left, set by the line that has only the Arabic text.

Thanks for the explanation. So if the base paragraph direction is
right-to-left, then right arrow is supposed to move logically backwards,
like C-b.

> You can either insert an empty line after that Arabic line,

Is there any way to tell Emacs that a new paragraph starts, without
inserting anything visible in the buffer? Some special Unicode
character, or Emacs text property?

> But this is how most bidi-supporting applications out there behave.

For what it's worth, display in Firefox works differently. The line of
Arabic text is right-to-left and right-justified on the screen, but
following lines are left-to-right, more like what I expected. So it
seems to use different paragraph heuristics than Emacs.

> If you prefer the arrow keys to move the cursor visually, you can do
>
>   M-x set-variable RET visual-order-cursor-movement RET t RET
>
> (This is also in the manual.)

I was also able to find this setting via the documentation for
left-char/right-char. That's nice.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.






* bug#43207: 26.3; Strange bidi behavior
  2020-09-05  6:46   ` Niels Möller
@ 2020-09-05  7:35     ` Eli Zaretskii
  0 siblings, 0 replies; 5+ messages in thread
From: Eli Zaretskii @ 2020-09-05  7:35 UTC
  To: Niels Möller; +Cc: 43207

> From: nisse@lysator.liu.se (Niels Möller)
> Cc: 43207@debbugs.gnu.org
> Date: Sat, 05 Sep 2020 08:46:51 +0200
> 
> Thanks for the explanation. So if the base paragraph direction is
> right-to-left, then right arrow is supposed to move logically backwards,
> like C-b.

Yes.

> > You can either insert an empty line after that Arabic line,
> 
> Is there any way to tell Emacs that a new paragraph starts, without
> inserting anything visible in the buffer? Some special Unicode
> character, or Emacs text property?

No, not at the moment.  FWIW, I don't think there's a significant need
for such a feature, given the related features we already have.

> > But this is how most bidi-supporting applications out there behave.
> 
> For what it's worth, display in Firefox works differently. The line of
> Arabic text is right-to-left and right-justified on the screen, but
> following lines are left-to-right, more like what I expected. So it
> seems to use different paragraph heuristics than Emacs.

Yes, the Unicode Standard allows some leeway in this matter, and Emacs
uses it.  Firefox is not a text editor, so it doesn't need to cope
with the various situations we have every day in Emacs, where text
filling rearranges the words at the beginning of a physical line,
which generally can change the base direction at random (because the
base direction depends on the first strong directional character of
the first line of the paragraph).
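
To illustrate the "first strong directional character" rule, here is a
simplified sketch in Emacs Lisp. It is not how the display engine is
implemented; the function name my/first-strong-direction is hypothetical,
and the sketch ignores directional isolates, embeddings and overrides.
It only looks up the Unicode bidi class of each character with
get-char-code-property and stops at the first strong one.

```elisp
;; Hypothetical helper, for illustration only.
(defun my/first-strong-direction (text)
  "Return the direction of the first strong character in TEXT.
The value is `left-to-right', `right-to-left', or nil if TEXT
contains no strong directional character."
  (catch 'found
    (dolist (ch (string-to-list text))
      (pcase (get-char-code-property ch 'bidi-class)
        ;; L is the strong left-to-right class.
        ('L (throw 'found 'left-to-right))
        ;; R and AL (Arabic letter) are the strong right-to-left classes.
        ((or 'R 'AL) (throw 'found 'right-to-left))))
    ;; Only weak or neutral characters (digits, punctuation, spaces).
    nil))
```

Applied to the first line of a paragraph, this shows why refilling can
flip the base direction: whichever word ends up first on that line
decides the result. For the real answer at point, the built-in function
current-bidi-paragraph-direction can be evaluated in the buffer.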

Because someone requested strict adherence to the Unicode
Bidirectional Algorithm in this particular matter, Emacs does support
the "each physical line starts a new paragraph" behavior; you will
find in the manual how to set that up using the variables
bidi-paragraph-start-re and bidi-paragraph-separate-re.  But I don't
recommend such a setup in Emacs, for the reasons I described above.
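
For completeness, a sketch of that setup, based on my reading of the
description in the Emacs Lisp manual (setting both variables to a regexp
that matches at the start of every line). Treat the exact values as an
assumption and check the manual before relying on them.

```elisp
;; Treat every physical line as its own paragraph for the purposes of
;; bidirectional display.  Both variables default to nil, in which
;; case paragraphs are delimited by empty lines.
(setq bidi-paragraph-start-re "^"
      bidi-paragraph-separate-re "^")
```

With this in effect, a Latin line that directly follows an Arabic line
would get its own left-to-right base direction, at the cost of the
justification changes during filling described above.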





