From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: IT testing for multiple characters (ִרֹ) that occupy 1 display string.
Date: Mon, 01 Oct 2018 09:35:29 +0300
Message-ID: <83d0su41ji.fsf@gnu.org>
In-reply-to: (message from Keith David Bershatsky on Sun, 30 Sep 2018 17:14:04 -0700)
To: Keith David Bershatsky
Cc: emacs-devel@gnu.org

> Date: Sun, 30 Sep 2018 17:14:04 -0700
> From: Keith David Bershatsky
>
> When using move_it_in_display_line_to, ִרֹ is treated as one display
> string; however, it is 3 different characters occupying 3 HPOS.
> it->c only reports 1460.
>
> I initially guessed that this was a composition situation, however,
> it.what returns a 0, which is the same as any other regular character.

It is definitely a composition, but you've put the individual
codepoints in the wrong order, at least in your email.  The correct
order should be: first u+05e8, the base character, and after that
u+5b4 (1460 decimal) and u+5b9, in any order.

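For readers who want to check the ordering point outside Emacs, here
is a minimal sketch in Python using the standard unicodedata module.
It only inspects Unicode properties of the three codepoints; it says
nothing about how Emacs's own composition machinery matches the
sequence, and the variable names are made up for illustration.

```python
import unicodedata

RESH, HIRIQ, HOLAM = "\u05e8", "\u05b4", "\u05b9"


def describe(text):
    """Return (U+XXXX, name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]


as_sent = HIRIQ + RESH + HOLAM      # order used in the quoted message
base_first = RESH + HIRIQ + HOLAM   # base character first

for label, text in (("as sent", as_sent), ("base first", base_first)):
    print(label)
    for row in describe(text):
        print("  ", *row)

# Combining class 0 marks a base character; nonzero marks a combining mark.
starts_with_base = unicodedata.combining(base_first[0]) == 0
mark_leads = unicodedata.combining(as_sent[0]) != 0

# After the base, the two marks are canonically equivalent in either order.
same_cluster = (
    unicodedata.normalize("NFD", RESH + HOLAM + HIRIQ)
    == unicodedata.normalize("NFD", RESH + HIRIQ + HOLAM)
)
print(starts_with_base, mark_leads, same_cluster)
```

In the order used in the quoted message, the first codepoint is a
combining mark with no base character before it, which fits the
explanation that u+5b4 ends up outside the composition.
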
Here's what Emacs reports about this composition on my system:

               position: 1 of 3 (0%), column: 0
              character: ר (displayed as ר) (codepoint 1512, #o2750, #x5e8)
      preferred charset: unicode (Unicode (ISO10646))
  code point in charset: 0x05E8
                 script: hebrew
                 syntax: w    which means: word
               category: .:Base, R:Right-to-left (strong)
               to input: type "C-x 8 RET 5e8" or "C-x 8 RET HEBREW LETTER RESH"
            buffer code: #xD7 #xA8
              file code: not encodable by coding system iso-latin-1-dos
                display: composed to form "רִֹ" (see below)

  Composed with the following character(s) "ִֹ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
  by these glyphs:
    [0 2 1512 696 8 1 6 6 0 nil]
    [0 2 1465 662 8 4 5 8 -7 [-9 0 0]]
    [0 2 1460 657 8 4 5 -1 2 [-9 0 0]]

> Q:  How do we test for this situation, so that we know all of the
> different characters and each of their individual pixel widths?

If I put a breakpoint in move_it_in_display_line_to, at or after the
call to PRODUCE_GLYPHS, I see a composition:

  (gdb) p it->what
  $2 = IT_COMPOSITION
  (gdb) p it->c
  $3 = 1512
  (gdb) p it->cmp_it
  $4 = {
    stop_pos = 1,
    id = 0,
    ch = 1465,
    rule_idx = 2,
    lookback = 1,
    nglyphs = 3,
    reversed_p = false,
    charpos = 1,
    nchars = 3,
    nbytes = 6,
    from = 0,
    to = 3,
    width = 1
  }

The nchars and nbytes members of the cmp_it structure clearly show
that Emacs composes 3 characters whose combined length in the internal
buffer representation is 6 bytes.

You didn't say how and where you obtained the iterator information, so
I don't know why this didn't look like a composition to you.  Maybe
you had the codepoints in the wrong order, like in your mail, in which
case the u+5b4 character will indeed not compose with the other 2.

As for knowing how many pixels this composed character takes -- the
answer is the same as with any other display element: you subtract the
X coordinate (it->current_x) before the glyph from the X coordinate
after it.
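
The two numeric points above can be illustrated with a small Python
sketch.  It is not Emacs code.  The first part assumes that UTF-8 is a
fair stand-in for the internal buffer representation of these three
codepoints (consistent with the "buffer code: #xD7 #xA8" shown for
u+05e8).  The second part models the before/after subtraction with a
hypothetical ToyIterator class whose pixel widths are made-up numbers,
not measurements from any font.

```python
# Part 1: nchars and nbytes for the cluster.
# Assumption: UTF-8 stands in for the internal buffer representation.
cluster = "\u05e8\u05b9\u05b4"   # resh, holam, hiriq
nchars = len(cluster)
nbytes = len(cluster.encode("utf-8"))
print(nchars, nbytes)


# Part 2: width of a display element as "X after" minus "X before".
class ToyIterator:
    """Hypothetical stand-in for the display iterator; not Emacs code."""

    def __init__(self, widths):
        self.widths = list(widths)   # made-up pixel widths, one per element
        self.index = 0
        self.current_x = 0

    def step(self):
        """Produce the next display element and advance current_x."""
        self.current_x += self.widths[self.index]
        self.index += 1


def measure(it):
    """Return the width of the next element produced by the iterator."""
    x_before = it.current_x
    it.step()
    return it.current_x - x_before


it = ToyIterator([7, 8, 7])      # illustrative numbers only
measured = [measure(it) for _ in range(3)]
print(measured, it.current_x)
```

The point of measure() is that the caller never needs to know what
kind of display element was produced: a plain character, a composition
or anything else is measured the same way, by recording the X
coordinate on both sides of it.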