* [dalias@aerifal.cx: BUG: Emacs ignores charcell width when running on terminal (w/rtfs & ideas for fix)]
@ 2006-10-16 13:50 Richard Stallman
2006-10-24 0:30 ` Kenichi Handa
0 siblings, 1 reply; 2+ messages in thread
From: Richard Stallman @ 2006-10-16 13:50 UTC (permalink / raw)
Cc: emacs-devel
Would you please look at this issue and comment?
I am not sure if this is something we should try to fix, now or ever.
But I would like you to think about it.
------- Start of forwarded message -------
Date: Wed, 11 Oct 2006 15:16:50 -0400
To: bug-gnu-emacs@gnu.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="izwpuPcl6rIhGH0d"
Content-Disposition: inline
From: Rich Felker <dalias@aerifal.cx>
Subject: BUG: Emacs ignores charcell width when running on terminal (w/rtfs
& ideas for fix)
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=failed
version=3.0.4
- --izwpuPcl6rIhGH0d
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
When GNU Emacs is run on a terminal (-nw mode) and editing UTF-8 text
files, it treats all characters as if they occupy one character cell
column on the terminal. This causes it to become confused about the
cursor position whenever there is CJK fullwidth text or scripts that
use nonspacing combining characters present, to the point that editing
is impossible.
My coding system settings:
(setq locale-coding-system 'utf-8)
(set-terminal-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)
(set-selection-coding-system 'utf-8)
(prefer-coding-system 'utf-8)
I run emacs inside GNU screen, running on a real UTF-8 terminal, but
if you don't have a real UTF-8 terminal, screen can emulate UTF-8
(showing ? for unavailable width-1 characters and ?? for unavailable
width-2 characters) on any terminal. Using a UTF-8 xterm or other
terminal that supports UTF-8 may make it easier to see the problem
though.
Attached to this email is a UTF-8 file you can open in Emacs which
exhibits the problem: Japanese Hiragana (for CJK wide) and Tibetan and
Thai (for nonspacing).
The root of the problem: in the produce_glyphs() function in term.c,
the code assumes all multibyte characters for a given 'charset' have
the same width:
/* A multi-byte character. The display width is fixed for all
characters of the set. Some of the glyphs may have to be
ignored because they are already displayed in a continued
line. */
int charset = CHAR_CHARSET (it->c);
it->pixel_width = CHARSET_WIDTH (charset);
I put together a horrible elaborate hack to work around this:
struct glyph glyph = { .type = CHAR_GLYPH, .u = { .ch = it->c } };
char *foo = encode_terminal_code (&glyph, 1, &terminal_coding);
wchar_t wc = dec_utf8(foo); /* naive utf8 decode function */
it->pixel_width = mk_wcwidth(wc); /* Kuhn's UCS wcwidth func */
But it's incorrect and assumes the terminal encoding is UTF-8, not to
mention that it's quite inefficient and ugly. (Note: for term.c,
"pixel" means character cell.)
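For illustration, the "naive utf8 decode function" used in the hack
above could look roughly like the following. The name dec_utf8 is taken
from the snippet, but its body was not included in the report, so this
is an assumed reconstruction. It takes an unsigned char pointer for
simplicity, and it does not validate continuation bytes or reject
overlong encodings.

```c
#include <stdint.h>

/* Hypothetical stand-in for the dec_utf8() helper named in the report.
   Decodes the first UTF-8 sequence in S and returns its code point.
   Deliberately naive: continuation bytes are not validated and
   overlong encodings are not rejected.  */
static uint32_t
dec_utf8 (const unsigned char *s)
{
  if (s[0] < 0x80)                      /* 0xxxxxxx: plain ASCII */
    return s[0];
  if ((s[0] & 0xE0) == 0xC0)            /* 110xxxxx + 1 continuation byte */
    return ((uint32_t) (s[0] & 0x1F) << 6)
           | (uint32_t) (s[1] & 0x3F);
  if ((s[0] & 0xF0) == 0xE0)            /* 1110xxxx + 2 continuation bytes */
    return ((uint32_t) (s[0] & 0x0F) << 12)
           | ((uint32_t) (s[1] & 0x3F) << 6)
           | (uint32_t) (s[2] & 0x3F);
  if ((s[0] & 0xF8) == 0xF0)            /* 11110xxx + 3 continuation bytes */
    return ((uint32_t) (s[0] & 0x07) << 18)
           | ((uint32_t) (s[1] & 0x3F) << 12)
           | ((uint32_t) (s[2] & 0x3F) << 6)
           | (uint32_t) (s[3] & 0x3F);
  return 0xFFFD;                        /* stray byte: U+FFFD REPLACEMENT CHARACTER */
}
```

The decoded code point is what a wcwidth-style function would then be
asked about; as noted above, this only makes sense when the terminal
coding system really is UTF-8.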
With this change made, CJK characters are correctly treated as two
columns and combining marks as zero; however, combining marks disappear
_entirely_ because the loop in append_glyph() (term.c) never executes
if width==0.
Correctly fixing the issue:
1. Needs some sort of width lookup for unicode characters without
having to convert from Emacs' native encoding to UCS thru UTF-8.
This should be straightforward for someone who understands the
code.
2. The append_glyph() function needs to handle the width==0 case, perhaps
converting the previous glyph into a COMPOSITE_GLYPH instead of
adding a CHAR_GLYPH. However I don't understand the COMPOSITE_GLYPH
system in Emacs so I don't know if this is feasible.
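To make the width==0 problem in item 2 concrete, here is a toy model
of a terminal row, not Emacs's actual struct glyph or append_glyph().
The names struct cell, append_cell and MAX_MARKS are invented for this
sketch. A loop that emits one cell per column emits nothing for a
zero-width character, so the sketch attaches such a character to the
preceding base cell instead. Whether COMPOSITE_GLYPH can play that
role in Emacs is exactly the open question raised above.

```c
#include <stdint.h>

#define MAX_MARKS 4             /* arbitrary limit chosen for this sketch */

/* Hypothetical model of one terminal cell; not Emacs's struct glyph.  */
struct cell
{
  uint32_t base;                /* character occupying this cell */
  int padding;                  /* nonzero for the 2nd column of a wide char */
  uint32_t marks[MAX_MARKS];    /* zero-width characters attached to BASE */
  int n_marks;
};

/* Append character C with display width WIDTH to ROW, which holds
   *USED cells out of CAPACITY.  Returns the number of cells added.  */
static int
append_cell (struct cell *row, int *used, int capacity, uint32_t c, int width)
{
  int added = 0;

  if (width == 0)
    {
      /* A per-column loop would run zero times here and drop C.
         Attach it to the nearest preceding non-padding cell instead.  */
      int i = *used - 1;
      while (i >= 0 && row[i].padding)
        i--;
      if (i >= 0 && row[i].n_marks < MAX_MARKS)
        row[i].marks[row[i].n_marks++] = c;
      return 0;
    }

  /* One cell per column; columns after the first are padding.  */
  for (int i = 0; i < width && *used < capacity; i++)
    {
      struct cell *cell = &row[(*used)++];
      cell->base = c;
      cell->padding = i > 0;
      cell->n_marks = 0;
      added++;
    }
  return added;
}
```

In this model, appending a wide Hiragana letter followed by a combining
mark leaves the row two cells long, with the mark recorded on the first
cell rather than lost.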
At present this issue is making it very difficult for me to use
Tibetan text in composing email and material for the web, so I'm
looking for some way to fix it, either upstream or with hacks I can
make locally for the time being until it's fixed properly.
Rich
- --izwpuPcl6rIhGH0d
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment; filename="example.txt"
Content-Transfer-Encoding: 8bit
????????
????????
???????: ??? ??? ??? ???
- --izwpuPcl6rIhGH0d
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
_______________________________________________
bug-gnu-emacs mailing list
bug-gnu-emacs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-gnu-emacs
- --izwpuPcl6rIhGH0d--
------- End of forwarded message -------
* Re: [dalias@aerifal.cx: BUG: Emacs ignores charcell width when running on terminal (w/rtfs & ideas for fix)]
2006-10-16 13:50 [dalias@aerifal.cx: BUG: Emacs ignores charcell width when running on terminal (w/rtfs & ideas for fix)] Richard Stallman
@ 2006-10-24 0:30 ` Kenichi Handa
0 siblings, 0 replies; 2+ messages in thread
From: Kenichi Handa @ 2006-10-24 0:30 UTC (permalink / raw)
Cc: dalias, emacs-devel
In article <E1GZSrj-0005dK-W6@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
> Would you please look at this issue and comment?
> I am not sure if this is something we should try to fix, now or ever.
> But I would like you to think about it.
Sorry for the late response. Actually there's not that
much we can do on this matter.
> ------- Start of forwarded message -------
> Date: Wed, 11 Oct 2006 15:16:50 -0400
> To: bug-gnu-emacs@gnu.org
> From: Rich Felker <dalias@aerifal.cx>
> Subject: BUG: Emacs ignores charcell width when running on terminal (w/rtfs
> & ideas for fix)
[...]
> When GNU Emacs is run on a terminal (-nw mode) and editing UTF-8 text
> files, it treats all characters as if they occupy one character cell
> column on the terminal. This causes it to become confused about the
> cursor position whenever there is CJK fullwidth text or scripts that
> use nonspacing combining characters present, to the point that editing
> is impossible.
Unfortunately, the current Emacs assumes that all characters
in a charset have the same width. As long as we were dealing
only with legacy charsets (e.g. ISO8859, JISX, KSC, GB), that
assumption worked well.
> Attached to this email is a UTF-8 file you can open in Emacs which
> exhibits the problem: Japanese Hiragana (for CJK wide) and Tibetan and
> Thai (for nonspacing).
> The root of the problem: In term.c, produce_glyphs() function, the
> code assumes all multibyte characters for a given 'charset' have the
> same width:
The root of the problem is that there's no way for Emacs to
know how many columns a terminal uses to display a specific
character. For Hiragana, Emacs can guess that it will be
displayed in two columns, but for Tibetan and Thai, it
depends heavily on the terminal's capability of handling CTL
(Complex Text Layout). If a terminal doesn't know how to do
CTL for Tibetan, it will just produce glyphs for each
syllable component without stacking (and thus occupy several
columns). If a terminal does, it will display them in one
(or two) columns. But there's no way for Emacs to know which
is the case.
> Correctly fixing the issue:
> 1. Needs some sort of width lookup for unicode characters without
> having to convert from Emacs' native encoding to UCS thru UTF-8.
> This should be straightforward for someone who understands the
> code.
That only works for such simple characters as Hiragana. In
the emacs-unicode-2 branch, I introduced char-width-table,
which maps each character to the column width occupied by
that character on screen.
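As a rough picture of what a per-character width mapping involves,
here is a sorted range table with a binary-search lookup. This is not
how char-width-table is implemented in Emacs; the names width_ranges
and char_columns are invented for the sketch, and the table covers
only a handful of ranges relevant to this thread. It also only encodes
what Emacs would expect, which, as explained above, need not match
what a given terminal actually does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical width table entry: code points FIRST..LAST (inclusive)
   are expected to occupy WIDTH columns.  */
struct width_range
{
  uint32_t first, last;
  int width;
};

/* Illustrative and far from complete; must stay sorted by FIRST.  */
static const struct width_range width_ranges[] = {
  { 0x0300, 0x036F, 0 },        /* Combining Diacritical Marks */
  { 0x0E31, 0x0E31, 0 },        /* Thai MAI HAN-AKAT */
  { 0x0E34, 0x0E3A, 0 },        /* Thai vowel signs above/below, PHINTHU */
  { 0x0E47, 0x0E4E, 0 },        /* Thai tone marks and other signs */
  { 0x0F71, 0x0F7E, 0 },        /* Tibetan vowel signs */
  { 0x3041, 0x3096, 2 },        /* Hiragana letters */
  { 0x4E00, 0x9FFF, 2 },        /* CJK Unified Ideographs */
};

/* Return the expected column width of code point C; characters not
   listed in the table default to one column.  */
static int
char_columns (uint32_t c)
{
  size_t lo = 0;
  size_t hi = sizeof width_ranges / sizeof width_ranges[0];

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (c < width_ranges[mid].first)
        hi = mid;
      else if (c > width_ranges[mid].last)
        lo = mid + 1;
      else
        return width_ranges[mid].width;
    }
  return 1;
}
```

A lookup like this works directly on code points, so it avoids the
round trip through UTF-8 that the original hack needed.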
> 2. The append_glyph() function needs to handle the width==0 case, perhaps
> converting the previous glyph into a COMPOSITE_GLYPH instead of
> adding a CHAR_GLYPH. However I don't understand the COMPOSITE_GLYPH
> system in Emacs so I don't know if this is feasible.
COMPOSITE_GLYPH is a glyph containing multiple characters
that must be displayed as a single grapheme cluster. On X,
Emacs displays characters in a COMPOSITE_GLYPH correctly
(sometimes by stacking, sometimes by overstriking, sometimes
by using an alternate glyph, etc.). But as there's no way to
perform such an operation on a terminal, current Emacs just
displays the first character of a COMPOSITE_GLYPH.
> At present this issue is making it very difficult for me to use
> Tibetan text in composing email and material for the web, so I'm
> looking for some way to fix it, either upstream or with hacks I can
> make locally for the time being until it's fixed properly.
If you want to handle Tibetan text, using X is the only way
for the moment.
---
Kenichi Handa
handa@m17n.org