From: James Ralston
Newsgroups: gmane.emacs.bugs
Subject: emacs thinks UTF-8 can't encode Japanese text?
Date: Thu, 13 Jan 2005 17:56:28 -0500
Organization: Software Engineering Institute

I posted the following to gnu.emacs.help:

On 2005-01-12 at 01:32-05, James Ralston wrote:

> I'm trying to use Emacs 21.3 on Fedora Core 3 to edit files
> containing Japanese text encoded with UTF-8.
>
> I've used the same version of Emacs on Fedora Core 2 with no
> problems. Everything just works. My locale is the same on both
> systems: en_US.UTF-8.
>
> But on my FC3 system, if I visit a UTF-8 encoded file, the Japanese
> characters display as empty boxes. Also, if I paste Japanese text
> into an Emacs window and try to save the buffer, I receive this
> message:
>
>> These default coding systems were tried:
>> utf-8-unix
>> However, none of them safely encodes the target text.
>
> This message makes no sense, because UTF-8 encodes everything.
>
> On my FC2 system, here's what "C-u C-x =" says:
>
>> character: い (0151044, 53796, 0xd224)
>> charset: japanese-jisx0208 (JISX0208.1983/1990 Japanese Kanji: ISO-IR-87)
>> code point: 36 36
>> syntax: word
>> category: H:Japanese Hiragana characters of 2-byte character sets
>> j:Japanese
>> |:While filling, we can break a line at this character.
>> buffer code: 0x92 0xA4 0xA4
>> file code: 0xE3 0x81 0x84 (encoded by coding system utf-8-unix)
>> font: -mplus-gothic-medium-R-normal--12-120-75-75-C-120-jisx0208.1990-0
>
> On my FC3 system, here's what "C-u C-x =" on the same character says:
>
>> character: い (0151044, 53796, 0xd224)
>> charset: japanese-jisx0208 (JISX0208.1983/1990 Japanese Kanji: ISO-IR-87)
>> code point: 36 36
>> syntax: word
>> category: H:Japanese Hiragana characters of 2-byte character sets
>> j:Japanese
>> |:While filling, we can break a line at this character.
>> buffer code: 0x92 0xA4 0xA4
>> file code: not encodable by coding system utf-8-unix
>> font: -mplus-gothic-medium-R-normal--12-120-75-75-C-120-jisx0208.1990-0
>
> The only difference is the "file code:" line. But I don't
> understand why Emacs 21.3 on FC3 doesn't think that UTF-8 encodes
> that character, because it absolutely does.
>
> The FC3 packager claims that he has no problems:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=144707
>
> Does anyone have any ideas?

The more I ponder this, the more I'm beginning to think that this is
actually a bug in Emacs that I've managed to trigger somehow. Claiming
that UTF-8 doesn't encode い is bogus.

I've even gone so far as to trace Emacs while it opened a file that
contains Japanese characters, but I didn't detect any glaring
differences between Emacs on FC2 (which works) and Emacs on FC3
(which doesn't). I'm just about out of ideas. Does anyone else have
any?

Thanks,
James
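To make the "UTF-8 encodes everything" point concrete, here is a minimal sketch, written in Python by the editor and not part of the original report, that checks outside Emacs that い (U+3044) has a UTF-8 encoding and that its bytes agree with the FC2 `describe-char` output. The helper `utf8_encode_bmp` is hypothetical and exists only to show the three-byte UTF-8 bit layout; the standard `euc_jp` codec is used as an assumed stand-in for "JIS X 0208 with the high bit set" to relate the reported "code point: 36 36" to the 0xA4 0xA4 bytes that follow 0x92 in the reported buffer code.

```python
"""Independent cross-check of the encodings reported by describe-char.

A sketch using Python's standard codecs; it does not reproduce Emacs's own
coding-system machinery.
"""


def utf8_encode_bmp(cp: int) -> bytes:
    """Hypothetical hand-rolled UTF-8 encoder for code points below U+10000."""
    if cp < 0x80:
        return bytes([cp])  # 1 byte: 0xxxxxxx
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points have no UTF-8 encoding")
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("this sketch only handles the Basic Multilingual Plane")


ch = "い"  # HIRAGANA LETTER I
assert ord(ch) == 0x3044

# UTF-8 encodes every Unicode scalar value; this one takes three bytes.
utf8 = ch.encode("utf-8")
print(utf8.hex(" ").upper())  # E3 81 84, as on the FC2 "file code:" line
assert utf8_encode_bmp(ord(ch)) == utf8

# JIS X 0208 row 4, cell 4 ("code point: 36 36", i.e. 0x24 0x24) becomes
# 0xA4 0xA4 in EUC-JP, where each byte has its high bit set.
euc = ch.encode("euc_jp")
print(euc.hex(" ").upper())  # A4 A4
assert euc == bytes([36 | 0x80, 36 | 0x80])
assert euc.decode("euc_jp") == ch
```

A check like this only shows that the text is encodable in principle; it says nothing about why Emacs 21.3 on FC3 reports otherwise, which is the behavior being reported.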