unofficial mirror of emacs-devel@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: Dmitry Antipov <dmantipov@yandex.ru>
Cc: eggert@cs.ucla.edu, emacs-devel@gnu.org
Subject: Re: Long lines and bidi
Date: Mon, 11 Feb 2013 18:42:12 +0200
Message-ID: <83ehgm6bvf.fsf@gnu.org>
In-Reply-To: <511884F5.6030806@yandex.ru>

> Date: Mon, 11 Feb 2013 09:43:17 +0400
> From: Dmitry Antipov <dmantipov@yandex.ru>
> CC: Eli Zaretskii <eliz@gnu.org>, Paul Eggert <eggert@cs.ucla.edu>
> 
> Yet another interesting profile (generated by scroll-both micro-benchmark with
> r111730) is shown below.
> 
> Input is 4K lines, each line is ~27K bytes, Imla'ei (modern Arabic) script.

Can you publish the file, or the URL where you downloaded it from?

> IIUC this R2L text with long lines should push bidi really hard,
> but... bidi core routines (by itself) are almost irrelevant in the
> profile:

Actually, that's expected, see below.

>      39.96%        emacs  emacs                          [.] scan_buffer
>      28.72%        emacs  emacs                          [.] buf_charpos_to_bytepos
>      21.82%        emacs  emacs                          [.] buf_bytepos_to_charpos
>       0.59%        emacs  emacs                          [.] re_match_2_internal
>       0.51%        emacs  emacs                          [.] sub_char_table_ref
>       0.42%        emacs  emacs                          [.] mark_object
>       0.23%        emacs  emacs                          [.] composition_gstring_width
>       0.19%        emacs  libc-2.16.so                   [.] __memcpy_ssse3_back
>       0.18%        emacs  emacs                          [.] x_produce_glyphs
>       0.17%        emacs  emacs                          [.] move_it_in_display_line_to
>       0.17%        emacs  emacs                          [.] hash_lookup
>       0.17%        emacs  emacs                          [.] Fgarbage_collect
>       0.17%        emacs  emacs                          [.] lface_hash
>       0.16%        emacs  emacs                          [.] decode_coding_utf_8
>       0.16%        emacs  emacs                          [.] face_for_font
>       0.16%        emacs  emacs                          [.] composition_gstring_p
>       0.15%        emacs  emacs                          [.] compile_pattern
>       0.15%        emacs  emacs                          [.] get_next_display_element
>       0.14%        emacs  emacs                          [.] bidi_level_of_next_char
>       0.12%        emacs  emacs                          [.] font_range
>       0.12%        emacs  emacs                          [.] bidi_fetch_char
>       0.12%        emacs  emacs                          [.] internal_equal
>       0.11%        emacs  emacs                          [.] autocmp_chars
>       0.11%        emacs  emacs                          [.] char_table_ref
>       0.11%        emacs  libgtk-3.so.0.600.4            [.] 0x0000000000115bf0
>       0.10%        emacs  emacs                          [.] next_element_from_buffer
>       0.10%        emacs  emacs                          [.] composition_update_it
>       0.10%        emacs  emacs                          [.] boyer_moore

The Arabic script is a heavy user of character compositions: they are
important for correct shaping of the glyphs, without which any speaker
of Arabic will turn away in disgust.  The fact that you see functions
like composition_update_it, composition_gstring_p,
composition_gstring_width, and sub_char_table_ref all points to
this.  Character compositions work by scanning the vicinity of a
composable character using regular expression matching in Lisp.  That
is why you see re_match_2_internal relatively high in the profile.
The cost of handling these compositions can obscure the cost of any
bidi reordering.  To take this factor out of the measurement, turn
off auto-composition-mode.

More importantly, you cannot easily "push bidi really hard", not with
a file that consists of predominantly RTL characters.  That's because
such a file is as easy to display as a pure LTR text: the characters
are delivered for display entirely in their logical order in the
buffer, and only laid out starting at the right margin of the window
instead of at the left margin.
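
To make "predominantly RTL" measurable before profiling, here is a
rough Python sketch.  It is only a heuristic for inspecting a test
file, not what bidi.c does, and it does not implement the Unicode
Bidirectional Algorithm.  The names direction_mix and
count_direction_changes are made up for this illustration; the only
library call used is unicodedata.bidirectional from the Python
standard library.

```python
import unicodedata

# Bidi classes of strong characters, as reported by unicodedata.
STRONG_RTL = {"R", "AL"}   # Hebrew-style and Arabic letters
STRONG_LTR = {"L"}


def strong_direction(ch):
    """Return "rtl", "ltr", or None for a character with no strong direction."""
    cls = unicodedata.bidirectional(ch)
    if cls in STRONG_RTL:
        return "rtl"
    if cls in STRONG_LTR:
        return "ltr"
    return None


def direction_mix(text):
    """Return (rtl_fraction, ltr_fraction) among the strong characters."""
    rtl = ltr = 0
    for ch in text:
        direction = strong_direction(ch)
        if direction == "rtl":
            rtl += 1
        elif direction == "ltr":
            ltr += 1
    strong = rtl + ltr
    if strong == 0:
        return 0.0, 0.0
    return rtl / strong, ltr / strong


def count_direction_changes(text):
    """Count switches between strong RTL and strong LTR characters."""
    changes = 0
    previous = None
    for ch in text:
        current = strong_direction(ch)
        if current is None:
            continue  # digits, spaces, punctuation: ignored by this heuristic
        if previous is not None and current != previous:
            changes += 1
        previous = current
    return changes
```

A file whose RTL fraction is close to 1 and whose count of direction
changes is close to 0 is, by this crude measure, the easy case
described above.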

To exercise bidi.c, you need heavily mixed RTL and LTR text, with
digits, punctuation, and lots of embeddings and directional overrides
(using the LRE, RLE, RLO, and LRO control characters), which push and
pop the reordering stack.  Only then will the reordering of characters
become non-trivial, and you _might_ see some bidi functions as hot
spots.  I say "might" because bidi.c uses a dynamic cache which allows
it to fetch and analyze each character only once, even if reordering
jumps here and there like a young goat.  Thus, the only overhead of
reordering is the logic that decides where in the cache is the next
character to deliver for display; the cache is accessed directly (it
is implemented as a linear array).
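
A test input of the kind described above could be generated along
these lines.  This is a minimal sketch under my own assumptions: the
word lists, token probabilities, nesting limit, and the function names
make_line and make_text are all invented for illustration, and nothing
here was used to produce the profiles in this thread.  The control
characters are the standard ones: LRE (U+202A), RLE (U+202B), PDF
(U+202C), LRO (U+202D), and RLO (U+202E).

```python
import random

# Explicit directional formatting characters.
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
OPENERS = [LRE, RLE, LRO, RLO]

# Hypothetical vocabulary: Latin words, Hebrew and Arabic words,
# digits and punctuation.
LTR_WORDS = ["emacs", "buffer", "display", "cache"]
RTL_WORDS = ["\u05e9\u05dc\u05d5\u05dd", "\u0633\u0644\u0627\u0645",
             "\u05e2\u05d5\u05dc\u05dd", "\u0639\u0627\u0644\u0645"]
OTHER = ["123", "4.5", ",", "!", "(", ")"]


def make_line(rng, tokens=40, max_depth=8):
    """Build one line of mixed-direction tokens with nested embeddings."""
    parts = []
    depth = 0
    for _ in range(tokens):
        roll = rng.random()
        if roll < 0.15 and depth < max_depth:
            # Open an embedding or override: pushes the reordering stack.
            parts.append(rng.choice(OPENERS))
            depth += 1
        elif roll < 0.30 and depth > 0:
            # Close the innermost embedding or override: pops the stack.
            parts.append(PDF)
            depth -= 1
        else:
            parts.append(rng.choice(LTR_WORDS + RTL_WORDS + OTHER))
    # Close whatever is still open so every line is balanced.
    parts.extend([PDF] * depth)
    return " ".join(parts)


def make_text(lines=100, seed=0):
    """Return a reproducible multi-line stress text."""
    rng = random.Random(seed)
    return "\n".join(make_line(rng) for _ in range(lines))


if __name__ == "__main__":
    with open("bidi-stress.txt", "w", encoding="utf-8") as out:
        out.write(make_text(lines=4000))
```

Raising the tokens argument gives longer lines; the fixed seed keeps
runs comparable between profiling sessions.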

There could be rare pathological situations where bidi.c needs to
examine lots (and I'm talking tens or hundreds of thousands) of
characters for some simple redisplay operation.  A few of these were
discovered and taken care of during late stages of v24.1 development,
but maybe there are some more.  These typically show up as heavy usage
of bidi_fetch_char or its subroutines, or of bidi_find_paragraph_start
and its subroutines.  I haven't seen such problems since last July.



