From: ken
Newsgroups: gmane.emacs.help
Subject: Re: utf8 char display in buffer
Date: Fri, 12 Jun 2009 10:54:23 -0400
Message-ID: <4A326C1F.1060601@mousecar.com>
References: <7I2dndeTy7sqkLLXnZ2dnUVZ_gmdnZ2d@sysmatrix.net>
Reply-To: gebser@mousecar.com
To: GNU Emacs List

Ed,

Thanks for distributing.

Everyone responding to this thread,

Please either CC me when posting about this issue or else edit the "To" field so that your response comes to the whole list. I'd like to get everyone's input. Thanks.

Lewis,

Thanks for posting. It's lonely out there when you're the only one with a particular problem. To make sure we're suffering the same cyber-indignity, here's the scenario as I see it (from an older version of emacs running on Linux):

0) Some others and I want to include some non-English characters in a file being edited in emacs. Problems arise, however:

1) In a buffer which is already utf-8 encoded, I set the appropriate input method and type in the desired characters. They display just peachy and there is happiness in EmacsLand.

2) I save the buffer to a file, then close the buffer.

3) I visit the same file (i.e., load it again into emacs). Because it has
<!-- -*- coding: utf-8; -*- -->
as the first line, it opens utf-8 encoded. This is confirmed by the presence of a 'u' as the second character in the status bar.

4) The text in the buffer displays fine, except that in place of each of those non-English characters is a little empty box. With the cursor on one of those boxes (which should be an 'a' with a horizontal bar above it), "C-x =" returns "Char: ā (01210041, 331809, 0x51021, file ...)". (While in emacs the character after "Char:" is a little box, if I load this same file into Firefox, that same character appears as it should, as an 'a' with a horizontal bar above it. How it appears in your email client will depend upon your email client.)

A) The fact that, as described in (4), the characters display correctly in Firefox but not in emacs indicates that emacs is not drawing on the needed character set. Yet the fact that in (1) the characters initially display correctly (when first input) indicates that the needed character set is present on the system and that emacs can find it and has permission to access it. Further, we would think that emacs would throw out an error message if either of these conditions were not met... and it doesn't. We can only assume that, when visiting and then decoding a file and pulling it into a buffer for display, emacs is not even asking for the proper character set when encountering a non-English character. This is where I would start to look for the error.

B) It would be helpful if the code which decodes a file and renders it into the buffer display would throw an error message when it encounters a character it doesn't know how to display, i.e., when a little box character is displayed. After all, isn't it an error when a little box is displayed in lieu of the correct character?
Possible error messages would be something like: "decoding process can't find /path/to/charset.file" or "decoding process doesn't have requisite permission to read /path/to/charset.file" or "invalid character: [hex/decimal value]" or other.

On 06/10/2009 11:21 PM B. T. Raven wrote:
> Lewis Perin wrote:
>> I've been following this thread closely because I have the original
>> poster's problem, only the characters that give me trouble are some -
>> not many, actually - Chinese characters, e.g. ni3, the normal second
>> person pronoun. And, as with the original poster, the troublesome
>> characters, when copied and pasted to other applications from Emacs,
>> display perfectly.
>>
>> "B. T. Raven" writes:
>>
>>> [...]
>>> (set-language-environment 'UTF-8)
>>> (set-default-coding-systems 'utf-8)
>>> (setq file-name-coding-system 'utf-8)
>>> (setq default-buffer-file-coding-system 'utf-8)
>>> (setq coding-system-for-write 'utf-8)
>>> (set-keyboard-coding-system 'utf-8)
>>> (set-terminal-coding-system 'utf-8)
>>> (set-clipboard-coding-system 'utf-8)
>>> (set-selection-coding-system 'utf-8)
>>> (prefer-coding-system 'utf-8)
>>> (modify-coding-system-alist 'process
>>> "[cC][mM][dD][pP][rR][oO][xX][yY]" 'utf-8-dos)
>>>
>>>
>>> and try C-x ret c utf-8
>>> C-x C-f
>>>
>>> to open the file.
>>
>> I tried this, but it didn't help. Emacs 22.3 / Win32.
>
> Even on Emacs 23 although I see the characters in the buffer, I can't
> save the following as utf-8:
>
> nǐ hǎo 你 好
> u+4f60 and u+597d
>
> Or at least not so as to be readable with 22.3. Both versions are using
> Arial Unicode MS.
>
> Why is that?
>
>
>>
>> /Lew
>> ---
>> Lew Perin / perin@acm.org
>> http://www.panix.com/~perin/babelcarp.html
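
P.S. On point (A) above: one way to tell a file-content problem from a display problem is to inspect the saved bytes outside emacs, much as loading the file into Firefox does. What follows is only a minimal sketch, assuming Python 3 is available. The function name describe_non_ascii and the sample text are made up for illustration and are not part of emacs or of anyone's setup in this thread. It decodes the bytes as UTF-8 and lists each non-ASCII character with its Unicode code point, name, and UTF-8 byte sequence. If the expected characters show up here (for instance U+0101 for the 'a' with a bar, or U+4F60 and U+597D for the two Chinese characters quoted above), the file itself is probably fine and the little boxes are more likely a display or font-selection matter.

```python
import unicodedata


def describe_non_ascii(data: bytes):
    """Decode bytes as UTF-8 and describe every non-ASCII character found.

    Returns a list of (code point, Unicode name, UTF-8 bytes in hex) tuples.
    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    text = data.decode("utf-8")
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),
            ch.encode("utf-8").hex(),
        )
        for ch in text
        if ord(ch) > 0x7F
    ]


# Hypothetical sample standing in for the saved file's contents.
# For a real file, read it in binary mode instead: open(path, "rb").read()
sample = "<!-- -*- coding: utf-8; -*- -->\nā 你 好\n".encode("utf-8")

for entry in describe_non_ascii(sample):
    print(entry)
```

If the decode step raises an error instead, or the code points listed are not the ones that were typed, that would point at how the file was written rather than at how it is displayed.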