From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Long lines and bidi
Date: Mon, 11 Feb 2013 18:42:12 +0200
Message-ID: <83ehgm6bvf.fsf@gnu.org>
References: <877gmp5a04.fsf@ed.ac.uk> <83vca89izh.fsf@gnu.org>
 <5110906D.7020406@yandex.ru> <83fw1aac3d.fsf@gnu.org>
 <51120360.4060104@yandex.ru> <51127363.5030203@yandex.ru>
 <834nhp9u9j.fsf@gnu.org> <5114FEBB.8020201@yandex.ru>
 <838v6y99wk.fsf@gnu.org> <836222983u.fsf@gnu.org>
 <51152A00.6070101@yandex.ru> <83y5ey7npl.fsf@gnu.org>
 <5115C3BC.8020203@cs.ucla.edu> <83txpl7u3w.fsf@gnu.org>
 <5116113D.5070707@cs.ucla.edu> <83mwvd7qlx.fsf@gnu.org>
 <83r4ko5cpv.fsf@gnu.org> <511884F5.6030806@yandex.ru>
Reply-To: Eli Zaretskii
Cc: eggert@cs.ucla.edu, emacs-devel@gnu.org
To: Dmitry Antipov
In-reply-to: <511884F5.6030806@yandex.ru>
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:156960

> Date: Mon, 11 Feb 2013 09:43:17 +0400
> From: Dmitry Antipov
> CC: Eli Zaretskii , Paul Eggert
>
> Yet another interesting profile (generated by scroll-both micro-benchmark with
> r111730) is shown below.
>
> Input is 4K lines, each line is ~27K bytes, Imla'ei (modern Arabic) script.

Can you publish the file, or the URL where you downloaded it from?

> IIUC this R2L text with long lines should push bidi really hard,
> but... bidi core routines (by itself) are almost irrelevant in the
> profile:

Actually, that's expected, see below.

>  39.96%  emacs  emacs                [.] scan_buffer
>  28.72%  emacs  emacs                [.] buf_charpos_to_bytepos
>  21.82%  emacs  emacs                [.] buf_bytepos_to_charpos
>   0.59%  emacs  emacs                [.] re_match_2_internal
>   0.51%  emacs  emacs                [.] sub_char_table_ref
>   0.42%  emacs  emacs                [.] mark_object
>   0.23%  emacs  emacs                [.] composition_gstring_width
>   0.19%  emacs  libc-2.16.so         [.] __memcpy_ssse3_back
>   0.18%  emacs  emacs                [.] x_produce_glyphs
>   0.17%  emacs  emacs                [.] move_it_in_display_line_to
>   0.17%  emacs  emacs                [.] hash_lookup
>   0.17%  emacs  emacs                [.] Fgarbage_collect
>   0.17%  emacs  emacs                [.] lface_hash
>   0.16%  emacs  emacs                [.] decode_coding_utf_8
>   0.16%  emacs  emacs                [.] face_for_font
>   0.16%  emacs  emacs                [.] composition_gstring_p
>   0.15%  emacs  emacs                [.] compile_pattern
>   0.15%  emacs  emacs                [.] get_next_display_element
>   0.14%  emacs  emacs                [.] bidi_level_of_next_char
>   0.12%  emacs  emacs                [.] font_range
>   0.12%  emacs  emacs                [.] bidi_fetch_char
>   0.12%  emacs  emacs                [.] internal_equal
>   0.11%  emacs  emacs                [.] autocmp_chars
>   0.11%  emacs  emacs                [.] char_table_ref
>   0.11%  emacs  libgtk-3.so.0.600.4  [.] 0x0000000000115bf0
>   0.10%  emacs  emacs                [.] next_element_from_buffer
>   0.10%  emacs  emacs                [.] composition_update_it
>   0.10%  emacs  emacs                [.] boyer_moore

The Arabic script is a heavy user of character compositions: they are
important for correct shaping of the glyphs, without which any speaker
of Arabic will turn away in disgust.  The presence in the profile of
functions like composition_update_it, composition_gstring_p,
composition_gstring_width, and sub_char_table_ref points to this.
Character compositions work by scanning the vicinity of a composable
character using regular expression matching in Lisp.  That is why you
see re_match_2_internal relatively high in the profile.  The cost of
handling these compositions can obscure the cost of any bidi
reordering.  To remove this factor, turn off auto-composition-mode.

More importantly, you cannot easily "push bidi really hard", not with
a file that consists predominantly of RTL characters.  That's because
such a file is as easy to display as pure LTR text: the characters
are delivered for display entirely in their logical order in the
buffer, and are merely laid out starting at the right margin of the
window instead of at the left margin.
To exercise bidi.c, you need heavily mixed RTL and LTR text, with
digits, punctuation, and lots of embeddings and directional overrides
(using the LRE, RLE, RLO, and LRO control characters), which push and
pop the reordering stack.  Only then will the reordering of characters
become non-trivial, and you _might_ see some bidi functions as hot
spots.

I say "might" because bidi.c uses a dynamic cache which allows it to
fetch and analyze each character only once, even if reordering jumps
here and there like a young goat.  Thus, the only overhead of
reordering is the logic that decides where in the cache to find the
next character to deliver for display; the cache is accessed directly
(it is implemented as a linear array).

There could be rare pathological situations where bidi.c needs to
examine lots (and I'm talking tens or hundreds of thousands) of
characters for some simple redisplay operation.  A few of these were
discovered and taken care of during late stages of v24.1 development,
but maybe there are some more.  These typically show up as heavy usage
of bidi_fetch_char or its subroutines, or of bidi_find_paragraph_start
and its subroutines.  I haven't seen such problems since last July.
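
For illustration only, a test input of the kind described above
(heavily mixed RTL and LTR text with digits, punctuation, and nested
embeddings and overrides) could be produced by a short script.  The
sketch below is not part of the benchmark discussed in this thread:
the function names, the sample vocabulary, the probabilities, and the
nesting limit are all assumptions chosen for the example.  The only
facts it relies on are the Unicode code points of the directional
formatting characters.

```python
import random

# Unicode directional formatting characters.
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING (closes an embedding/override)
LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE

# Hypothetical sample vocabulary; any RTL and LTR words would do.
RTL_WORDS = ["שלום", "עולם", "مرحبا", "عالم"]
LTR_WORDS = ["emacs", "bidi", "display", "cache"]
NEUTRALS = [",", ".", "-", "(", ")", ":"]
OPENERS = [LRE, RLE, LRO, RLO]


def make_line(rng, n_tokens=200, max_depth=8):
    """Build one line of mixed-direction text.

    Every embedding or override that is opened is closed by a PDF,
    either inside the loop or at the end of the line.
    """
    parts = []
    depth = 0
    for _ in range(n_tokens):
        roll = rng.random()
        if roll < 0.15 and depth < max_depth:
            parts.append(rng.choice(OPENERS))   # push a level
            depth += 1
        elif roll < 0.30 and depth > 0:
            parts.append(PDF)                   # pop a level
            depth -= 1
        elif roll < 0.55:
            parts.append(rng.choice(RTL_WORDS))
        elif roll < 0.80:
            parts.append(rng.choice(LTR_WORDS))
        elif roll < 0.90:
            parts.append(str(rng.randint(0, 99999)))
        else:
            parts.append(rng.choice(NEUTRALS))
        parts.append(" ")
    parts.append(PDF * depth)  # close whatever is still open
    return "".join(parts)


def make_text(n_lines=100, seed=0, **kwargs):
    """Return n_lines generated lines, reproducible for a given seed."""
    rng = random.Random(seed)
    lines = [make_line(rng, **kwargs) for _ in range(n_lines)]
    return "\n".join(lines) + "\n"
```

Writing the result of make_text() to a UTF-8 file and visiting that
file gives an input where reordering is non-trivial; raising n_tokens
makes the lines longer.  Whether such a file actually makes bidi
functions show up in a profile is exactly the open question above.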
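
The caching idea described above (analyze each character once, then
serve it from a linear array no matter how the visual order jumps
around) can be shown with a toy model.  This is a sketch under stated
assumptions and not the code in bidi.c: the class name, the slot
contents, and the "fetches" counter are invented for the example, and
the "analysis" step is a stand-in for the real level resolution.

```python
class LinearBidiCache:
    """Toy model: one array slot per character position, filled once."""

    def __init__(self, text, start=0):
        self.text = text
        self.start = start   # character position held in slots[0]
        self.slots = []      # slots[i] holds the state for start + i
        self.fetches = 0     # number of characters actually analyzed

    def _analyze(self, pos):
        # Stand-in for the expensive work (fetching the character and
        # resolving its properties); here it only records the character.
        self.fetches += 1
        return {"pos": pos, "char": self.text[pos]}

    def get(self, pos):
        """Return the cached state for pos, analyzing it only if needed."""
        idx = pos - self.start
        if idx < 0 or pos >= len(self.text):
            raise IndexError(pos)
        # Fill the array sequentially up to the requested position.
        while len(self.slots) <= idx:
            self.slots.append(self._analyze(self.start + len(self.slots)))
        return self.slots[idx]  # direct index, no searching


# Deliver characters in a jumpy, non-monotonic order with repeats.
cache = LinearBidiCache("abcdef")
order = [5, 4, 3, 0, 1, 2, 3, 4, 5, 0]
delivered = "".join(cache.get(p)["char"] for p in order)
```

In this model, ten deliveries cost only six analyses, one per
character; everything else is an array lookup.  That is the sense in
which the remaining overhead is only the logic that picks the next
position.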