From: Simon Josefsson
Newsgroups: gmane.emacs.devel
Subject: Cyrillic vs UTF-8
Date: Fri, 25 Apr 2003 18:12:17 +0200
Original-To: emacs-devel@gnu.org
User-Agent: Gnus/5.090019 (Oort Gnus v0.19) Emacs/21.3.50 (gnu/linux)

$ emacs -q --no-site-file

C-h H                    (view HELLO file)
Mark the line with Russian text with mouse
q                        (quit HELLO file)
C-x C-f ff RET           (open a new file)
C-y                      (yank the text, looks fine in the new buffer)
C-x C-s                  (save file; it complains that iso-latin-1 cannot
                          encode the data, and suggests utf-8)
RET                      (go with the default utf-8)
C-x C-k                  (kill buffer)
C-x C-f ff RET           (open file again)
                         (Emacs fails to recognize it as utf-8 and
                          displays gibberish)
C-x C-k                  (kill buffer)
C-x RET c utf-8 C-x C-f ff RET
                         (open file as utf-8)
                         (Emacs recognizes the file as utf-8 but
                          displays empty boxes)

Pressing C-u C-x = on the first empty box (the first non-ASCII
character) shows:

  character: Р (01212100, 332864, 0x51440)
    charset: mule-unicode-0100-24ff
             (Unicode characters of the range U+0100..U+24FF.)
 code point: 40 64
     syntax: w  which means: word
   category: y:Cyrillic
buffer code: 0x9C 0xF4 0xA8 0xC0
  file code: 0xD0 0xA0 (encoded by coding system mule-utf-8-unix)
    Unicode: 0420
       font: -Adobe-Courier-Medium-R-Normal--17-120-100-100-M-100-ISO10646-1

I think there are two problems.  First, opening the file the first
time should guess that it is a utf-8 file.  Second, Emacs should be
able to find a font that contains the characters -- I have all font
packages from Debian installed.
The following font works fine:

-Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1

In GNU Emacs 21.3.50.12 (i686-pc-linux-gnu)
 of 2003-04-25 on latte.josefsson.org
configured using `configure '--with-gtk''
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: en_US.UTF-8
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: en_US.UTF-8
  value of $LANG: nil
  locale-coding-system: nil
  default-enable-multibyte-characters: t

Recent input:
M-x r e p o r

Recent messages:
(emacs -q)
Loading tool-bar...done
Loading image...done
Loading tooltip...done
For information about the GNU Project and its goals, type C-h C-p.
Loading emacsbug...done
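
For anyone who wants to cross-check the "file code" and "Unicode" values
reported by C-u C-x = above outside of Emacs, here is a minimal Python
sketch.  It is not part of the original report and does not touch Emacs
at all.  It only confirms that U+0420 encodes to the bytes 0xD0 0xA0 in
UTF-8, and shows what those bytes turn into when decoded as ISO-8859-1,
which is my assumption about the kind of misreading behind the gibberish
seen when the file was reopened.  The variable names are illustrative.

```python
# Cross-check of the values reported by C-u C-x = (sketch, not Emacs code).
ch = "\u0420"  # CYRILLIC CAPITAL LETTER ER, the "Unicode: 0420" in the report

# Encode to UTF-8 and print the bytes in hex; these are the "file code" bytes.
utf8_bytes = ch.encode("utf-8")
print(" ".join(f"0x{b:02X}" for b in utf8_bytes))   # 0xD0 0xA0

# Decoding the bytes as UTF-8 recovers the original character.
assert utf8_bytes.decode("utf-8") == ch

# Assumption: a reader that does not detect UTF-8 falls back to a
# single-byte charset such as ISO-8859-1, so each byte becomes one character.
misread = utf8_bytes.decode("latin-1")
print([f"U+{ord(c):04X}" for c in misread])         # ['U+00D0', 'U+00A0']
```

So the file on disk is valid UTF-8; the two problems described above are
about detection and font selection, not about how the file was written.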