all messages for Emacs-related lists mirrored at yhetil.org
From: Kenichi Handa <handa@m17n.org>
Cc: dalias@aerifal.cx, emacs-devel@gnu.org
Subject: Re: [dalias@aerifal.cx: BUG: Emacs ignores charcell width when running on terminal (w/rtfs & ideas for fix)]
Date: Tue, 24 Oct 2006 09:30:38 +0900
Message-ID: <E1GcABi-0003oX-00@etlken>
In-Reply-To: <E1GZSrj-0005dK-W6@fencepost.gnu.org> (message from Richard Stallman on Mon, 16 Oct 2006 09:50:51 -0400)

In article <E1GZSrj-0005dK-W6@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

> Would you please look at this issue and comment?
> I am not sure if this is something we should try to fix, now or ever.
> But I would like you to think about it.

Sorry for the late response.  Actually, there's not much we
can do about this matter.

> ------- Start of forwarded message -------
> Date: Wed, 11 Oct 2006 15:16:50 -0400
> To: bug-gnu-emacs@gnu.org
> From: Rich Felker <dalias@aerifal.cx>
> Subject: BUG: Emacs ignores charcell width when running on terminal (w/rtfs
> 	& ideas for fix)
[...]
> When GNU Emacs is run on a terminal (-nw mode) and editing UTF-8 text
> files, it treats all characters as if they occupy one character cell
> column on the terminal. This causes it to become confused about the
> cursor position whenever there is CJK fullwidth text or scripts that
> use nonspacing combining characters present, to the point that editing
> is impossible.

Unfortunately, the current Emacs assumes that all characters
in a charset have the same width.  As long as we were dealing
with legacy charsets (e.g. ISO8859, JISX, KSC, GB), that
assumption worked well.

> Attached to this email is a UTF-8 file you can open in Emacs which
> exhibits the problem: Japanese Hiragana (for CJK wide) and Tibetan and
> Thai (for nonspacing).

> The root of the problem: In term.c, produce_glyphs() function, the
> code assumes all multibyte characters for a given 'charset' have the
> same width:

The root of the problem is that there's no way for Emacs to
know how many columns a terminal uses to display a specific
character.  For Hiragana, Emacs can reasonably guess that it
will be displayed in two columns, but for Tibetan and Thai,
it depends heavily on the terminal's capability of handling
CTL (Complex Text Layout).  If a terminal doesn't know how
to do CTL for Tibetan, it will just produce glyphs for each
syllable component without stacking them (and thus occupy
several columns).  If a terminal does, it will display them
in one (or two) columns.  But there's no way for Emacs to
know which is the case.

> Correctly fixing the issue:

> 1. Needs some sort of width lookup for unicode characters without
>    having to convert from Emacs' native encoding to UCS thru UTF-8.
>    This should be straightforward for someone who understands the
>    code.

That only works for simple characters such as Hiragana.  In
the emacs-unicode-2 branch, I introduced char-width-table,
which maps each character to the number of columns that
character occupies on screen.

> 2. The append_glyph() function needs to handle width==0 case, perhaps
>    converting the previous glyph into a COMPOSITE_GLYPH instead of
>    adding a CHAR_GLYPH. However I don't understand the COMPOSITE_GLYPH
>    system in Emacs so I don't know if this is feasible.

COMPOSITE_GLYPH is a glyph containing multiple characters
that must be displayed as a single grapheme cluster.  On X,
Emacs displays the characters in a COMPOSITE_GLYPH correctly
(sometimes by stacking, sometimes by overstriking, sometimes
by using an alternate glyph, etc.).  But as there's no way on
a terminal to perform such an operation, the current Emacs
just displays the first character of a COMPOSITE_GLYPH.

> At present this issue is making it very difficult for me to use
> Tibetan text in composing email and material for the web, so I'm
> looking for some way to fix it, either upstream or with hacks I can
> make locally for the time being until it's fixed properly.

If you want to handle Tibetan text, using X is the only way
for the moment.

---
Kenichi Handa
handa@m17n.org
