From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Richard Stallman
Newsgroups: gmane.emacs.devel
Subject: [dalias@aerifal.cx: BUG: Emacs ignores charcell width when running on terminal (w/rtfs & ideas for fix)]
Date: Mon, 16 Oct 2006 09:50:51 -0400
Message-ID:
Reply-To: rms@gnu.org
NNTP-Posting-Host: main.gmane.org
Content-Type: text/plain; charset=ISO-8859-15
X-Trace: sea.gmane.org 1161006881 1440 80.91.229.2 (16 Oct 2006 13:54:41 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Mon, 16 Oct 2006 13:54:41 +0000 (UTC)
Cc: emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Oct 16 15:54:39 2006
Return-path:
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by ciao.gmane.org with esmtp (Exim 4.43) id 1GZSvI-0004k4-5q
	for ged-emacs-devel@m.gmane.org; Mon, 16 Oct 2006 15:54:32 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1GZSvD-0005sz-NQ
	for ged-emacs-devel@m.gmane.org; Mon, 16 Oct 2006 09:54:27 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1GZSrm-00011U-EP
	for emacs-devel@gnu.org; Mon, 16 Oct 2006 09:50:54 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1GZSrl-00010I-3z
	for emacs-devel@gnu.org; Mon, 16 Oct 2006 09:50:53 -0400
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1GZSrk-000101-TO
	for emacs-devel@gnu.org; Mon, 16 Oct 2006 09:50:52 -0400
Original-Received: from [199.232.76.164] (helo=fencepost.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.52) id 1GZT0s-00082M-WA
	for emacs-devel@gnu.org; Mon, 16 Oct 2006 10:00:19 -0400
Original-Received: from rms by fencepost.gnu.org with local (Exim 4.34)
	id 1GZSrj-0005dK-W6; Mon, 16 Oct 2006 09:50:52 -0400
Original-To: handa@m17n.org
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:60789
Archived-At:

Would you please look at this issue and comment?
I am not sure if this is something we should try to fix,
now or ever.  But I would like you to think about it.

------- Start of forwarded message -------
Date: Wed, 11 Oct 2006 15:16:50 -0400
To: bug-gnu-emacs@gnu.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="izwpuPcl6rIhGH0d"
Content-Disposition: inline
From: Rich Felker
Subject: BUG: Emacs ignores charcell width when running on terminal (w/rtfs & ideas for fix)
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=failed version=3.0.4

- --izwpuPcl6rIhGH0d
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

When GNU Emacs is run on a terminal (-nw mode) and editing UTF-8 text
files, it treats all characters as if they occupy one character cell
column on the terminal. This causes it to become confused about the
cursor position whenever there is CJK fullwidth text or scripts that
use nonspacing combining characters present, to the point that editing
is impossible.

My coding system settings:

    (setq locale-coding-system 'utf-8)
    (set-terminal-coding-system 'utf-8)
    (set-keyboard-coding-system 'utf-8)
    (set-selection-coding-system 'utf-8)
    (prefer-coding-system 'utf-8)

I run emacs inside GNU screen, running on a real UTF-8 terminal, but
if you don't have a real UTF-8 terminal, screen can emulate UTF-8
(showing ? for unavailable width-1 characters and ?? for unavailable
width-2 characters) on any terminal. Using a UTF-8 xterm or other
terminal that supports UTF-8 may make it easier to see the problem
though.
Attached to this email is a UTF-8 file you can open in Emacs which
exhibits the problem: Japanese Hiragana (for CJK wide) and Tibetan and
Thai (for nonspacing).

The root of the problem:

In term.c, in the produce_glyphs() function, the code assumes all
multibyte characters for a given 'charset' have the same width:

      /* A multi-byte character.  The display width is fixed for
         all characters of the set.  Some of the glyphs may have
         to be ignored because they are already displayed in a
         continued line.  */
      int charset = CHAR_CHARSET (it->c);

      it->pixel_width = CHARSET_WIDTH (charset);

I put together a horrible, elaborate hack to work around this:

      struct glyph glyph = { .type = CHAR_GLYPH, .u = { .ch = it->c } };
      char *foo = encode_terminal_code (&glyph, 1, &terminal_coding);
      wchar_t wc = dec_utf8(foo);        /* naive utf8 decode function */
      it->pixel_width = mk_wcwidth(wc);  /* Kuhn's UCS wcwidth func */

But it's incorrect and assumes the terminal encoding is UTF-8, not to
mention it's quite inefficient and ugly. (Note: for term.c, "pixel"
means character cell.)

With this change made, CJK characters are correctly treated as two
columns, and combining marks as 0; however, combining marks disappear
_entirely_ because the loop in append_glyph() (term.c) never executes
if width==0.

Correctly fixing the issue:

1. Needs some sort of width lookup for Unicode characters without
   having to convert from Emacs' native encoding to UCS through UTF-8.
   This should be straightforward for someone who understands the
   code.

2. The append_glyph() function needs to handle the width==0 case,
   perhaps converting the previous glyph into a COMPOSITE_GLYPH
   instead of adding a CHAR_GLYPH. However, I don't understand the
   COMPOSITE_GLYPH system in Emacs, so I don't know if this is
   feasible.
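To make the two points above concrete, here is a minimal standalone
sketch in C. It is not Emacs code and not a proposed patch. The names
cell_width(), layout_cells() and struct cell are hypothetical, and the
range tables are deliberately tiny and incomplete (a real table would
be generated from the Unicode data files, as Kuhn's mk_wcwidth is). It
also assumes the input is already a sequence of Unicode code points,
so it sidesteps the question in point 1 of how to get from Emacs'
internal character codes to UCS. What it does show is a per-character
width lookup, and a layout loop that attaches a width-0 mark to the
previous cell instead of dropping it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, incomplete tables for illustration only.
   Each table must stay sorted: in_ranges() uses binary search. */
struct range { uint32_t first, last; };

static const struct range zero_width[] = {
  { 0x0300, 0x036F },   /* combining diacritical marks */
  { 0x0E31, 0x0E31 },   /* Thai mai han-akat */
  { 0x0E34, 0x0E3A },   /* Thai vowel signs above/below */
  { 0x0E47, 0x0E4E },   /* Thai tone marks and signs */
  { 0x0F71, 0x0F7E },   /* Tibetan vowel signs */
  { 0x0F90, 0x0F97 },   /* Tibetan subjoined letters */
  { 0x0F99, 0x0FBC },   /* Tibetan subjoined letters, continued */
  { 0x302A, 0x302F },   /* ideographic tone marks */
  { 0x3099, 0x309A },   /* combining kana voicing marks */
};

static const struct range wide[] = {
  { 0x1100, 0x115F },   /* Hangul Jamo initial consonants */
  { 0x2E80, 0x303E },   /* CJK radicals, symbols, punctuation */
  { 0x3040, 0xA4CF },   /* kana through Yi */
  { 0xAC00, 0xD7A3 },   /* Hangul syllables */
  { 0xF900, 0xFAFF },   /* CJK compatibility ideographs */
  { 0xFF00, 0xFF60 },   /* fullwidth forms */
};

#define COUNT(t) (sizeof (t) / sizeof (t)[0])

/* Return 1 if code point c falls inside one of the n sorted ranges. */
static int in_ranges(uint32_t c, const struct range *t, size_t n)
{
  size_t lo = 0, hi = n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (c < t[mid].first)
      hi = mid;
    else if (c > t[mid].last)
      lo = mid + 1;
    else
      return 1;
  }
  return 0;
}

/* Width in terminal cells: 0 for nonspacing marks, 2 for wide
   characters, 1 for everything else.  Zero-width is checked first
   because some marks lie inside the wide ranges. */
int cell_width(uint32_t c)
{
  if (in_ranges(c, zero_width, COUNT(zero_width)))
    return 0;
  if (in_ranges(c, wide, COUNT(wide)))
    return 2;
  return 1;
}

/* One displayed unit: a base character plus any marks attached to it. */
struct cell { uint32_t base; int width; int n_marks; };

/* Lay out len code points into at most cap cells and return the number
   of cells used.  A width-0 character is attached to the previous cell
   rather than skipped; one with no preceding base gets a cell of its
   own (a choice made for this sketch). */
size_t layout_cells(const uint32_t *text, size_t len,
                    struct cell *out, size_t cap)
{
  size_t n = 0;
  assert(text != NULL && out != NULL);
  for (size_t i = 0; i < len; i++) {
    int w = cell_width(text[i]);
    if (w == 0 && n > 0) {
      out[n - 1].n_marks++;
      continue;
    }
    if (n == cap)
      break;
    out[n].base = text[i];
    out[n].width = w > 0 ? w : 1;
    out[n].n_marks = 0;
    n++;
  }
  return n;
}
```

With these tables, a Tibetan letter followed by a vowel sign and then
a Hiragana letter lays out as two cells: the first one column wide
with one attached mark, the second two columns wide. Whether the
attached marks should become a COMPOSITE_GLYPH or something else
inside Emacs is exactly the part I can't judge.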
At present this issue is making it very difficult for me to use
Tibetan text in composing email and material for the web, so I'm
looking for some way to fix it, either upstream or with hacks I can
make locally for the time being until it's fixed properly.

Rich

- --izwpuPcl6rIhGH0d
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment; filename="example.txt"
Content-Transfer-Encoding: 8bit

???????? ???????? ???????: ??? ??? ??? ???

- --izwpuPcl6rIhGH0d
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
bug-gnu-emacs mailing list
bug-gnu-emacs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-gnu-emacs

- --izwpuPcl6rIhGH0d--
------- End of forwarded message -------