From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#970: 23.0.60; Non-ASCII display problems on a tty
Date: Sat, 27 Sep 2008 17:48:02 +0300
References: <87vdwtz4nz.fsf@cyd.mit.edu>
In-reply-to: <87vdwtz4nz.fsf@cyd.mit.edu>
Reply-To: Eli Zaretskii, 970@emacsbugs.donarmstrong.com
To: Kenichi Handa, 970@emacsbugs.donarmstrong.com
Cc: bug-gnu-emacs@gnu.org

I have some more info about this bug.
What follows is based on displaying a file that is encoded in
iso-2022-7bit-unix and has a single line that is a copy of line 20
from etc/HELLO, which is the entry for the Bengali language.  To
produce this file, copy line 20 of HELLO, paste it into a new file,
type "C-x RET f iso-2022-7bit-unix RET" and save the file.

The display problems for this line are directly caused by the fact
that tty_write_glyphs is called with its last argument len=22, which
means the display engine expects 22 characters to be displayed.
tty_write_glyphs therefore moves the cursor by 22 positions to account
for that.  However, encode_terminal_code returns a string whose length
is only 13 characters, and the difference between 13 and 22 is the
immediate cause of the display problems: the displayed string looks as
if it were padded by whitespace, but typing "C-x =" on these
``whitespace'' characters reveals that they are not spaces at all.

Looking inside encode_terminal_code, I see that the problem is somehow
related to composite characters.  The first group of non-ASCII
characters (in parentheses) are composite characters whose
u.cmp.automatic flag is set.  The Lisp object returned by
composition_gstring_from_id for this group of characters is a Lisp
vector:

    [[nil 2476 2494 2434 2482 2494] 0
     [0 0 2476 2476 1 0 1 1 0 nil]
     [1 1 2494 2494 1 0 1 1 0 nil]
     [2 2 2434 2434 1 0 1 1 0 nil]
     [3 3 2482 2482 1 0 1 1 0 nil]
     [4 4 2494 2494 1 0 1 1 0 nil]]

When this code:

    if (src->u.cmp.automatic)
      for (i = src->u.cmp.from; i < src->u.cmp.to; i++)
        {
          Lisp_Object g = LGSTRING_GLYPH (gstring, i);
          int c = LGLYPH_CHAR (g);

          if (! char_charset (c, charset_list, NULL))
            break;
          buf += CHAR_STRING (c, buf);
          nchars++;
        }

walks this Lisp vector, it immediately finds that the 1st character
cannot be encoded by the current terminal's encoding, and breaks out
of the loop.  Then the `?' character gets stored in the buffer that is
being prepared for encoding:

    if (i == 0)
      {
        /* The first character of the composition is not encodable.  */
        *buf++ = '?';
        nchars++;
      }

This is all as expected, but because of the "if (i == 0)" clause
above, the `?' character gets stored only for the first character in
this composition, whose codepoint is 2476.  For the other characters,
the u.cmp.from value is greater than 0, so the loop breaks with i
still greater than 0 and `?' is not stored for them.

By contrast, on a graphics terminal, the 5 characters inside the
parentheses are displayed as 2 visible glyphs, one (codepoint 2476)
for buffer position 10, the other (codepoint 2482) for buffer position
13.  Thus, I would expect to see two `?' question marks inside the
parentheses, not one.

A similar problem happens with the second group of non-ASCII
characters on this line, the characters that follow the TAB character.
Here's the Lisp object returned by composition_gstring_from_id:

    [[nil 2472 2478 2488 2509 2453 2494 2480] 1
     [0 0 2472 2472 1 0 1 1 0 nil]
     [1 1 2478 2478 1 0 1 1 0 nil]
     [2 3 2488 2488 1 0 1 1 0 nil]
     [2 3 2509 2509 0 0 0 1 0 nil]
     [4 4 2453 2453 1 0 1 1 0 nil]
     [5 5 2494 2494 1 0 1 1 0 nil]
     [6 6 2480 2480 1 0 1 1 0 nil]]

(Note that in this case, there are elements in this vector whose
FROM-IDX and TO-IDX values are not identical, and also the WIDTH value
is zero for one of them.)

This group of characters is displayed as 4 visible glyphs on a
graphics terminal: respectively, for buffer positions 17 (code 2472),
18 (code 2478), 19 (code 2488), and 23 (code 2480).  On a TTY, only
one `?' is shown, again for the same reason as described above: the
"if (i == 0)" test.

My first suspicion would be that the object returned by
composition_gstring_from_id gives incorrect data for FROM-IDX and
TO-IDX, but I'm not sure I understand the composition machinery well
enough to draw a definitive conclusion.  It is not even clear to me
how we want to display these characters: do we want the number of `?'s
to be identical to the number of glyphs displayed by a graphics
terminal, or do we want something else?

Handa-san, can you please comment on these findings?