From mboxrd@z Thu Jan 1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Cyrillic, utf-8 and windows
Date: Wed, 10 Dec 2003 08:58:28 +0900 (JST)
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <200312092358.IAA16526@etlken.m17n.org>
References:
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
X-Trace: sea.gmane.org 1071014873 2358 80.91.224.253 (10 Dec 2003 00:07:53 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Wed, 10 Dec 2003 00:07:53 +0000 (UTC)
Cc: emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Wed Dec 10 01:07:41 2003
Return-path:
Original-Received: from quimby.gnus.org ([80.91.224.244]) by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 1ATrtF-0002uE-00 for ; Wed, 10 Dec 2003 01:07:41 +0100
Original-Received: from monty-python.gnu.org ([199.232.76.173]) by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian)) id 1ATrtF-0003OC-00 for ; Wed, 10 Dec 2003 01:07:41 +0100
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org) by monty-python.gnu.org with esmtp (Exim 4.24) id 1ATsma-00010P-8f for emacs-devel@quimby.gnus.org; Tue, 09 Dec 2003 20:04:52 -0500
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.24) id 1ATskR-00085y-Ab for emacs-devel@gnu.org; Tue, 09 Dec 2003 20:02:39 -0500
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.24) id 1ATsjl-0007ml-FR for emacs-devel@gnu.org; Tue, 09 Dec 2003 20:02:28 -0500
Original-Received: from [192.47.44.130] (helo=tsukuba.m17n.org) by monty-python.gnu.org with esmtp (Exim 4.24) id 1ATshq-0007Bn-Oo; Tue, 09 Dec 2003 19:59:59 -0500
Original-Received: from fs.m17n.org (fs.m17n.org [192.47.44.2]) by tsukuba.m17n.org (8.11.6p2/3.7W-20010518204228) with ESMTP id hB9NwTh02393; Wed, 10 Dec 2003 08:58:29 +0900 (JST)
  (envelope-from handa@m17n.org)
Original-Received: from etlken.m17n.org (etlken.m17n.org [192.47.44.125]) by fs.m17n.org (8.11.6/3.7W-20010823150639) with ESMTP id hB9NwTs12110; Wed, 10 Dec 2003 08:58:29 +0900 (JST)
Original-Received: (from handa@localhost) by etlken.m17n.org (8.8.8+Sun/3.7W-2001040620) id IAA16526; Wed, 10 Dec 2003 08:58:28 +0900 (JST)
Original-To: sds@gnu.org
In-reply-to: (message from Sam Steingold on Tue, 09 Dec 2003 13:25:00 -0500)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.2
Precedence: list
List-Id: Emacs development discussions.
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:18591
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:18591

Thank you for the report.

In article , Sam Steingold writes:
>>
>> I can open in Emacs a utf-8 file with Cyrillic characters in it and it
>> is displayed just fine - with correct glyphs &c.
>> I set `default-input-method' to "cyrillic-yawerty" in .emacs,
>> so when I try C-\ `toggle-input-method', I get 2 "character outline
>> boxes" in the modeline and when I type, I see these "character outline
>> boxes" in the buffer instead of the characters I just typed.
>> When I save the buffer, kill it, and re-visit the file,
>> I see what I just typed displayed correctly as Cyrillic!
>> So, why does Emacs display the characters that I type as boxes
>> (rectangles) but shows them correctly when loaded from a file on disk?

Because those are different characters for Emacs, as you already
found below.

> when I type using cyrillic-yawerty, I get this:
[...]
>   charset: cyrillic-iso8859-5
[...]
> when I save the file, kill the buffer and visit the file again, that
> character becomes
[...]
>   charset: mule-unicode-0100-24ff

> So, how do I tell cyrillic-yawerty to insert UTF-8?!

The input method cyrillic-yawerty generates iso-8859-5
characters, and Emacs has a facility to automatically adjust
an input character to what the buffer-file-coding-system
expects. But I found a bug in that facility and an
insufficiency in set-default-coding-systems (called from
prefer-coding-system). Please try the attached patch.

But there still exists one problem. As you don't have
iso8859-5 fonts, the input-method indicator in the modeline
can't be displayed correctly. For the moment, Emacs doesn't
have a facility to automatically try other fonts
(e.g. iso10646-1). The emacs-unicode version has it.

---
Ken'ichi HANDA
handa@m17n.org

*** ucs-tables.el.~1.34.~ Tue Sep 2 08:25:38 2003
--- ucs-tables.el Wed Dec 10 08:17:57 2003
***************
*** 2507,2512 ****
--- 2507,2514 ----
                        (coding-system-base default-buffer-file-coding-system))))
      (when cs
        (setq table (coding-system-get cs 'translation-table-for-encode))
+       (if (and table (symbolp table))
+           (setq table (get table 'translation-table)))
        (unless (char-table-p table)
          (setq table (coding-system-get cs 'translation-table-for-input)))
        (when (char-table-p table)
*** mule-cmds.el.~1.249.~ Wed Nov 26 08:10:10 2003
--- mule-cmds.el Wed Dec 10 08:42:43 2003
***************
*** 321,326 ****
--- 321,331 ----
    o default value for the command `set-keyboard-coding-system'."
    (check-coding-system coding-system)
    (setq-default buffer-file-coding-system coding-system)
+   (if (fboundp 'ucs-set-table-for-input)
+       (dolist (buffer (buffer-list))
+         (or (local-variable-p 'buffer-file-coding-system buffer)
+             (ucs-set-table-for-input buffer))))
+ 
    (if default-enable-multibyte-characters
        (setq default-file-name-coding-system coding-system))
    ;; If coding-system is nil, honor that on MS-DOS as well, so