From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: [dalias@aerifal.cx: BUG: Emacs ignores charcell width when running on terminal (w/rtfs & ideas for fix)]
Date: Tue, 24 Oct 2006 09:30:38 +0900
To: rms@gnu.org
Cc: dalias@aerifal.cx, emacs-devel@gnu.org
In-reply-to: (message from Richard Stallman on Mon, 16 Oct 2006 09:50:51 -0400)

In article , Richard Stallman writes:

> Would you please look at this issue and comment?
> I am not sure if this is something we should try to fix, now or ever.
> But I would like you to think about it.

Sorry for the late response.  Actually, there's not much we can do
about this matter.

> ------- Start of forwarded message -------
> Date: Wed, 11 Oct 2006 15:16:50 -0400
> To: bug-gnu-emacs@gnu.org
> From: Rich Felker
> Subject: BUG: Emacs ignores charcell width when running on terminal (w/rtfs
> & ideas for fix)
[...]
> When GNU Emacs is run on a terminal (-nw mode) and editing UTF-8 text
> files, it treats all characters as if they occupy one character cell
> column on the terminal.  This causes it to become confused about the
> cursor position whenever there is CJK fullwidth text or scripts that
> use nonspacing combining characters present, to the point that editing
> is impossible.

Unfortunately, the current Emacs assumes that all characters in a
charset have the same width.  As long as we are dealing with legacy
charsets (e.g. ISO8859, JISX, KSC, GB), that assumption works well.
With Unicode, however, a single charset contains zero-width,
single-width, and double-width characters.
> Attached to this email is a UTF-8 file you can open in Emacs which
> exhibits the problem: Japanese Hiragana (for CJK wide) and Tibetan and
> Thai (for nonspacing).

> The root of the problem: In term.c, produce_glyphs() function, the
> code assumes all multibyte characters for a given 'charset' have the
> same width:

The root of the problem is that there's no way for Emacs to know how
many columns a terminal uses to display a specific character.  For
Hiragana, Emacs can reasonably guess that it will be displayed in two
columns, but for Tibetan and Thai, it heavily depends on the terminal's
capability of handling CTL (Complex Text Layout).  If a terminal doesn't
know how to do CTL for Tibetan, it will just produce glyphs for each
syllable component without stacking them (and thus occupy several
columns).  If it does, it will display the syllable in one (or two)
columns.  But there's no way for Emacs to know which is the case.

> Correctly fixing the issue:
>
> 1. Needs some sort of width lookup for unicode characters without
> having to convert from Emacs' native encoding to UCS thru UTF-8.
> This should be straightforward for someone who understands the
> code.

That only works for simple characters such as Hiragana.  In the
emacs-unicode-2 branch, I introduced char-width-table, which maps each
character to the number of columns it occupies on screen.

> 2. The append_glyph() function needs to handle width==0 case, perhaps
> converting the previous glyph into a COMPOSITE_GLYPH instead of
> adding a CHAR_GLYPH.  However I don't understand the COMPOSITE_GLYPH
> system in Emacs so I don't know if this is feasible.

A COMPOSITE_GLYPH is a glyph containing multiple characters that must be
displayed as a single grapheme cluster.  On X, Emacs displays the
characters in a COMPOSITE_GLYPH correctly (sometimes by stacking,
sometimes by overstriking, sometimes by using an alternate glyph, etc.).
But since a terminal offers no way to perform such an operation, the
current Emacs just displays the first character of a COMPOSITE_GLYPH.

> At present this issue is making it very difficult for me to use
> Tibetan text in composing email and material for the web, so I'm
> looking for some way to fix it, either upstream or with hacks I can
> make locally for the time being until it's fixed properly.

If you want to handle Tibetan text, using X is the only way for the
moment.

---
Kenichi Handa
handa@m17n.org