From: Eli Zaretskii <eliz@gnu.org>
To: Kenichi Handa <handa@m17n.org>, 970@emacsbugs.donarmstrong.com
Cc: bug-gnu-emacs@gnu.org
Subject: bug#970: 23.0.60; Non-ASCII display problems on a tty
Date: Sat, 27 Sep 2008 17:48:02 +0300
Message-ID: <u8wtdlk65.fsf@gnu.org>
In-Reply-To: <87vdwtz4nz.fsf@cyd.mit.edu>

I have some more info about this bug.

The following is based on displaying a file that is encoded in
iso-2022-7bit-unix, and has a single line that is a copy of line 20
from etc/HELLO, which is the entry for the Bengali language.

To produce this file, copy line 20 of HELLO, paste it into a new file,
type "C-x RET f iso-2022-7bit-unix RET" and save the file.

The display problems for this line are directly caused by the fact
that tty_write_glyphs is called with its last argument len=22, which
means the display engine expects 22 characters to be displayed.  And
tty_write_glyphs therefore moves the cursor by 22 positions to account
for that.

However, encode_terminal_code returns a string whose length is only 13
characters, and the difference between 13 and 22 is the immediate
cause for display problems: the displayed string looks as if it were
padded by whitespace, but typing "C-x =" on these ``whitespace''
characters reveals that they are not spaces at all.
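
The arithmetic behind this mismatch can be sketched in a few lines of
C.  This is not Emacs code: cursor_drift is a hypothetical helper, and
it assumes that every character written to the terminal occupies
exactly one column.

```c
/* Hypothetical helper, not part of Emacs: the number of columns by
   which the cursor position assumed by the display engine runs ahead
   of what was actually written to the terminal.  Assumes one column
   per written character.  */
int
cursor_drift (int glyphs_expected, int chars_written)
{
  return glyphs_expected - chars_written;
}
```

With the numbers above, cursor_drift (22, 13) is 9, i.e. 9 columns
that the cursor skips without anything being written to them.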

Looking inside encode_terminal_code, I see that the problem is somehow
related to composite characters.  The first group of non-ASCII
characters (in parentheses) are composite characters whose
u.cmp.automatic flag is set.  The Lisp object returned by
composition_gstring_from_id for this group of characters is a Lisp
vector:

  [[nil 2476 2494 2434 2482 2494] 0 [0 0 2476 2476 1 0 1 1 0 nil] [1 1 2494 2494 1 0 1 1 0 nil] [2 2 2434 2434 1 0 1 1 0 nil] [3 3 2482 2482 1 0 1 1 0 nil] [4 4 2494 2494 1 0 1 1 0 nil]]

When this code:

	  if (src->u.cmp.automatic)
	    for (i = src->u.cmp.from; i < src->u.cmp.to; i++)
	      {
		Lisp_Object g = LGSTRING_GLYPH (gstring, i);
		int c = LGLYPH_CHAR (g);

		if (! char_charset (c, charset_list, NULL))
		  break;
		buf += CHAR_STRING (c, buf);
		nchars++;
	      }

walks this Lisp vector, it immediately finds that the 1st character
cannot be encoded by the current terminal's encoding, and breaks out
of the loop.  Then the `?' character gets stored in the buffer that is
being prepared for encoding:

	  if (i == 0)
	    {
	      /* The first character of the composition is not encodable.  */
	      *buf++ = '?';
	      nchars++;
	    }

This is all as expected, but because of the "if (i == 0)" clause
above, the `?' character gets stored only for the first character in
this composition, whose codepoint is 2476.  For other characters, the
u.cmp.from value is greater than 0, so i is never 0 after the loop
and `?' is not stored for them.
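
To make the effect of the "if (i == 0)" test easier to see, here is a
simplified, self-contained model of the two fragments quoted above.
It is only a sketch: struct cmp_glyph, encodable_p and encode_one are
stand-ins invented for illustration, encodable_p assumes an ASCII-only
terminal encoding, and the from/to values used in the note below are
assumed, not taken from a debugging session.

```c
/* Simplified stand-in for the u.cmp part of a composite glyph.  */
struct cmp_glyph
{
  int from;	/* first index into the glyph string */
  int to;	/* one past the last index */
};

/* Assumption for this sketch: the terminal encoding can represent
   only ASCII, so every Bengali character is rejected.  */
int
encodable_p (int c)
{
  return c < 128;
}

/* Mimic the quoted loop and the "if (i == 0)" test for one glyph.
   CHARS holds the characters of the whole composition.  Return the
   number of characters appended to BUF.  */
int
encode_one (const struct cmp_glyph *g, const int *chars, char *buf)
{
  int nchars = 0;
  int i;

  for (i = g->from; i < g->to; i++)
    {
      if (! encodable_p (chars[i]))
	break;
      buf[nchars++] = (char) chars[i];
    }

  if (i == 0)
    {
      /* Reached only when the glyph starts at index 0.  */
      buf[nchars++] = '?';
    }

  return nchars;
}
```

For the five characters of the first group, a glyph with from = 0
contributes one `?', while a glyph with from = 3 contributes nothing
at all, although the display engine still counts it.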

By contrast, on a graphics terminal, the 5 characters inside the
parentheses are displayed as 2 visible glyphs, one (codepoint 2476)
for buffer position 10, the other (codepoint 2482) for buffer position
13.  Thus, I would expect to see two `?' question marks inside
parentheses, not one.

A similar problem happens with the second group of non-ASCII characters
on this line, the characters that follow the TAB character.  Here's
the Lisp object returned by composition_gstring_from_id:

  [[nil 2472 2478 2488 2509 2453 2494 2480] 1 [0 0 2472 2472 1 0 1 1 0 nil] [1 1 2478 2478 1 0 1 1 0 nil] [2 3 2488 2488 1 0 1 1 0 nil] [2 3 2509 2509 0 0 0 1 0 nil] [4 4 2453 2453 1 0 1 1 0 nil] [5 5 2494 2494 1 0 1 1 0 nil] [6 6 2480 2480 1 0 1 1 0 nil]]

(Note that in this case, there are elements in this vector whose
FROM-IDX and TO-IDX values are not identical, and also the WIDTH value
is zero for one of them.)  This group of characters is displayed as 4
visible glyphs on a graphics terminal: respectively, for buffer
positions 17 (code 2472), 18 (code 2478), 19 (code 2488), and 23
(2480).  On a TTY, only one `?' is shown, again for the same reason as
described above: the "if (i == 0)" test.
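
The entries that stand out in the second vector can be picked out
mechanically.  The sketch below copies the leading fields of each
glyph entry into a plain C array; the field order (FROM-IDX, TO-IDX,
character, code, WIDTH) is my reading of the data and should be
treated as an assumption, and struct glyph_entry, spans_several_p and
count_unusual are names made up for this example.

```c
/* Leading fields of one glyph entry, in the order assumed above.
   The remaining fields are left out of this sketch.  */
struct glyph_entry
{
  int from_idx;
  int to_idx;
  int c;
  int code;
  int width;
};

/* The seven glyph entries of the second group, copied from the
   vector shown above.  */
const struct glyph_entry second_group[] = {
  {0, 0, 2472, 2472, 1},
  {1, 1, 2478, 2478, 1},
  {2, 3, 2488, 2488, 1},
  {2, 3, 2509, 2509, 0},
  {4, 4, 2453, 2453, 1},
  {5, 5, 2494, 2494, 1},
  {6, 6, 2480, 2480, 1}
};

/* Nonzero if FROM-IDX and TO-IDX of G differ.  */
int
spans_several_p (const struct glyph_entry *g)
{
  return g->from_idx != g->to_idx;
}

/* Count the entries whose indices differ or whose width is zero.  */
int
count_unusual (const struct glyph_entry *glyphs, int n)
{
  int count = 0;
  int i;

  for (i = 0; i < n; i++)
    if (spans_several_p (&glyphs[i]) || glyphs[i].width == 0)
      count++;
  return count;
}
```

Here count_unusual (second_group, 7) is 2: the entries for codes 2488
and 2509, which share FROM-IDX 2 and TO-IDX 3, the latter also having
a WIDTH of zero.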

My first suspicion would be that the object returned by
composition_gstring_from_id gives incorrect data for FROM-IDX and
TO-IDX, but I'm not sure I understand the composition machinery well
enough to draw a definitive conclusion.  It is not even clear to me
how we want to display these characters: do we want the number of
`?'s to be identical to the number of glyphs displayed by a graphics
terminal, or do we want something else?

Handa-san, can you please comment on these findings?





