* bug#61269: 28.2; Sequence of spaces preceding tab in bidirectional line
From: Halim @ 2023-02-03 19:41 UTC (permalink / raw)
To: 61269
In a left-to-right line, Emacs displays a sequence of one or more
spaces (U+0020) that precedes a tab (U+0009), when both appear
between two right-to-left letters, to the left of the first (in
typing order) RTL letter.
The bug does not occur when the RTL text is inside an RTL
isolate.
Let s represent a space, t a tab, and l itself, while r and m
represent Arabic letters. The following example has this format,
in typing order from left to right.
Format:
lsrssstm
Example text:
l ح م
The expected display is 'lsrssstm'; the actual display is 'lssssrtm'.
The spaces following 'r' in the format are displayed to the left
of 'r'. Pressing 'C-f' from 'r' moves the cursor to the left until
it reaches 't', where the cursor jumps to the right of 'r'.
I have tried viewing the file containing the problematic text in
FocusWriter and FriBidi. Both display it as expected.
Extra Info
The bug is also present for LTR text on an RTL line. I believe
the problem is generic and is caused by this line:
'&& level != bidi_it->level_stack[0].level' (see below).
The bug is also present in Emacs built from commit
'ac7ec87a7a0db887e4ae7fe9005aea517958b778' with
--without-all. On top of that commit I made the following
modification.
---------------
$ git diff ac7ec87a7a0db887e4ae7fe9005aea517958b778
diff --git a/src/bidi.c b/src/bidi.c
index e012512..fe6e4d6 100644
--- a/src/bidi.c
+++ b/src/bidi.c
@@ -3302,10 +3302,7 @@ bidi_level_of_next_char (struct bidi_it *bidi_it)
if ((bidi_it->orig_type == NEUTRAL_WS
|| bidi_it->orig_type == WEAK_BN
|| bidi_isolate_fmt_char (bidi_it->orig_type))
- && bidi_it->next_for_ws.charpos < bidi_it->charpos
- /* If this character is already at base level, we don't need to
- reset it, so avoid the potentially costly loop below. */
- && level != bidi_it->level_stack[0].level)
+ && bidi_it->next_for_ws.charpos < bidi_it->charpos)
{
int ch;
ptrdiff_t clen = bidi_it->ch_len;
---------------
It fixes the bug.
In GNU Emacs 28.2 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.36, cairo version 1.17.6)
of 2023-01-03 built on 2
Windowing system distributor 'The X.Org Foundation', version 11.0.12101006
System Description: Arch Linux
Configured using:
'configure --sysconfdir=/etc --prefix=/usr --libexecdir=/usr/lib
--localstatedir=/var --with-cairo --with-harfbuzz --with-libsystemd
--with-modules --with-x-toolkit=gtk3 'CFLAGS=-march=x86-64
-mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2
-Wformat -Werror=format-security -fstack-clash-protection
-fcf-protection -g
-ffile-prefix-map=/build/emacs/src=/usr/src/debug/emacs -flto=auto'
'LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto''
Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY INOTIFY
PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE
XIM XPM GTK3 ZLIB
Important settings:
value of $LANG: en_US.UTF-8
locale-coding-system: utf-8-unix
Major mode: Fundamental
Minor modes in effect:
delete-selection-mode: t
cua-mode: t
umath-mode: umath-insert-common
tooltip-mode: t
global-eldoc-mode: t
show-paren-mode: t
electric-indent-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
indent-tabs-mode: t
transient-mark-mode: t
Load-path shadows:
None found.
Features:
(shadow sort mail-extr emacsbug sendmail misearch multi-isearch
mule-util jka-compr nndraft nnmh nnfolder utf-7 rfc2104 gnutls
gnus-agent gnus-srvr gnus-score score-mode nnvirtual gnus-msg gnus-art
mm-uu mml2015 mm-view mml-smime smime dig nntp gnus-cache gnus-sum shr
kinsoku svg dom gnus-group gnus-undo gnus-start gnus-dbus dbus xml
gnus-cloud nnimap nnmail mail-source utf7 netrc nnoo parse-time iso8601
gnus-spec gnus-int gnus-range message dired dired-loaddefs rfc822 mml
mml-sec epa mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader gnus-win gnus nnheader gnus-util rmail
rmail-loaddefs rfc2047 rfc2045 ietf-drums time-date mail-utils mm-util
mail-prsvr display-fill-column-indicator display-line-numbers delsel
cua-base cus-load lsp-mode lsp-protocol help-mode xref project
tree-widget wid-edit spinner pcase network-stream puny nsm rmc
markdown-mode rx color thingatpt noutline outline lv inline imenu ht
filenotify f f-shortdoc shortdoc s ewoc epg rfc6068 epg-config dash
compile text-property-search comint ansi-color ring finder-inf edmacro
kmacro easy-mmode derived info cl package browse-url url url-proxy
url-privacy url-expand url-methods url-history url-cookie url-domsuf
url-util mailcap url-handlers url-parse auth-source cl-seq eieio
eieio-core cl-macs eieio-loaddefs password-cache json subr-x map
url-vars seq byte-opt gv bytecomp byte-compile cconv cl-loaddefs cl-lib
iso-transl tooltip eldoc paren electric uniquify ediff-hook vc-hooks
lisp-float-type elisp-mode mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice button loaddefs faces cus-face macroexp files
window text-properties overlay sha1 md5 base64 format env code-pages
mule custom widget hashtable-print-readable backquote threads dbusbind
inotify lcms2 dynamic-setting system-font-setting font-render-setting
cairo move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)
Memory information:
((conses 16 386790 21130)
(symbols 48 30110 6)
(strings 32 132616 6853)
(string-bytes 1 3608021)
(vectors 16 51861)
(vector-slots 8 610382 31136)
(floats 8 356 324)
(intervals 56 4882 0)
(buffers 992 21))
From: Eli Zaretskii @ 2023-02-04 11:38 UTC (permalink / raw)
To: Halim; +Cc: 61269
> From: Halim <mhalimln@outlook.com>
> Date: Sat, 04 Feb 2023 02:41:35 +0700
>
> It fixes the bug.
Thanks.
You are right that the logic there was flawed. However, just removing
the base-level test is sub-optimal: that test was added to speed up
redisplay when the buffer has a lot of control characters (e.g.,
binary null bytes) that don't need to be reordered; see bug#22739.
So I have installed a slightly different change, reproduced below;
please see that it solves the problem, including (presumably) some
real-life problems you had in displaying RTL text with embedded TABs.
diff --git a/src/bidi.c b/src/bidi.c
index e012512..93875d2 100644
--- a/src/bidi.c
+++ b/src/bidi.c
@@ -3300,12 +3300,15 @@ bidi_level_of_next_char (struct bidi_it *bidi_it)
it belongs to a sequence of WS characters preceding a newline
or a TAB or a paragraph separator. */
if ((bidi_it->orig_type == NEUTRAL_WS
- || bidi_it->orig_type == WEAK_BN
+ || (bidi_it->orig_type == WEAK_BN
+ /* If this BN character is already at base level, we don't
+ need to consider resetting it, since I1 and I2 below
+ will not change the level, so avoid the potentially
+ costly loop below. */
+ && level != bidi_it->level_stack[0].level)
|| bidi_isolate_fmt_char (bidi_it->orig_type))
- && bidi_it->next_for_ws.charpos < bidi_it->charpos
- /* If this character is already at base level, we don't need to
- reset it, so avoid the potentially costly loop below. */
- && level != bidi_it->level_stack[0].level)
+ /* This means the information about WS resolution is not valid. */
+ && bidi_it->next_for_ws.charpos < bidi_it->charpos)
{
int ch;
ptrdiff_t clen = bidi_it->ch_len;
@@ -3340,7 +3343,7 @@ bidi_level_of_next_char (struct bidi_it *bidi_it)
|| bidi_it->orig_type == NEUTRAL_S
|| bidi_it->ch == '\n' || bidi_it->ch == BIDI_EOB
|| ((bidi_it->orig_type == NEUTRAL_WS
- || bidi_it->orig_type == WEAK_BN
+ || bidi_it->orig_type == WEAK_BN /* L1/Retaining */
|| bidi_isolate_fmt_char (bidi_it->orig_type)
|| bidi_explicit_dir_char (bidi_it->ch))
&& (bidi_it->next_for_ws.type == NEUTRAL_B
From: Halim @ 2023-02-05 16:55 UTC (permalink / raw)
To: 61269
Eli Zaretskii <eliz@gnu.org> writes:
>> It fixes the bug.
>
> Thanks.
>
> You are right that the logic there was flawed. However, just removing
> the base-level test is sub-optimal: that test was added to speed up
> redisplay when the buffer has a lot of control characters (e.g.,
> binary null bytes) that don't need to be reordered; see bug#22739.
>
> So I have installed a slightly different change, reproduced below;
> please see that it solves the problem, including (presumably) some
> real-life problems you had in displaying RTL text with embedded TABs.
>
> diff --git a/src/bidi.c b/src/bidi.c
> index e012512..93875d2 100644
> --- a/src/bidi.c
> +++ b/src/bidi.c
> @@ -3300,12 +3300,15 @@ bidi_level_of_next_char (struct bidi_it *bidi_it)
> it belongs to a sequence of WS characters preceding a newline
> or a TAB or a paragraph separator. */
> if ((bidi_it->orig_type == NEUTRAL_WS
> - || bidi_it->orig_type == WEAK_BN
> + || (bidi_it->orig_type == WEAK_BN
> + /* If this BN character is already at base level, we don't
> + need to consider resetting it, since I1 and I2 below
> + will not change the level, so avoid the potentially
> + costly loop below. */
> + && level != bidi_it->level_stack[0].level)
> || bidi_isolate_fmt_char (bidi_it->orig_type))
> - && bidi_it->next_for_ws.charpos < bidi_it->charpos
> - /* If this character is already at base level, we don't need to
> - reset it, so avoid the potentially costly loop below. */
> - && level != bidi_it->level_stack[0].level)
> + /* This means the information about WS resolution is not valid. */
> + && bidi_it->next_for_ws.charpos < bidi_it->charpos)
> {
> int ch;
> ptrdiff_t clen = bidi_it->ch_len;
> @@ -3340,7 +3343,7 @@ bidi_level_of_next_char (struct bidi_it *bidi_it)
> || bidi_it->orig_type == NEUTRAL_S
> || bidi_it->ch == '\n' || bidi_it->ch == BIDI_EOB
> || ((bidi_it->orig_type == NEUTRAL_WS
> - || bidi_it->orig_type == WEAK_BN
> + || bidi_it->orig_type == WEAK_BN /* L1/Retaining */
> || bidi_isolate_fmt_char (bidi_it->orig_type)
> || bidi_explicit_dir_char (bidi_it->ch))
> && (bidi_it->next_for_ws.type == NEUTRAL_B
I have run the same test as before, and your patch does fix the
problem. Unfortunately, I have never had any real-life problems, as I
have not written any bidi text (I do write some, but only to help my
understanding of the UBA), so I can't report any results on that.
Thanks.
From: Eli Zaretskii @ 2023-02-05 17:17 UTC (permalink / raw)
To: Halim; +Cc: 61269-done
> From: Halim <mhalimln@outlook.com>
> Date: Sun, 05 Feb 2023 23:55:38 +0700
>
> I have run the same test as before, and your patch does fix the
> problem. Unfortunately, I have never had any real-life problems, as I
> have not written any bidi text (I do write some, but only to help my
> understanding of the UBA), so I can't report any results on that.
OK, thanks. So I'm closing this bug; feel free to reopen if you
encounter some similar issues with whitespace and TABs in bidi
context.