From: Simon Josefsson
Newsgroups: gmane.emacs.devel
Subject: Re: Cyrillic vs UTF-8
Date: Sun, 04 May 2003 14:01:03 +0200
To: Dave Love
Cc: Kenichi Handa, rms@gnu.org, eliz@elta.co.il, emacs-devel@gnu.org
In-Reply-To: (Dave Love's message of "04 May 2003 12:04:13 +0100")
User-Agent: Gnus/5.1002 (Gnus v5.10.2) Emacs/21.3 (gnu/linux)

Dave Love writes:

> What is this thread all about? I've replied to private mail to
> correct misconceptions, but it seemed to be nothing to do with
> Cyrillic (which I did all the recent work on as far as I know).

The original problem was that Cyrillic text (e.g., from the HELLO file) saved as UTF-8 wasn't auto-detected as UTF-8 when the file was loaded back again. This has been fixed now.

Another problem was that Emacs, when asked to load the file as UTF-8, picked a Unicode font that didn't include these glyphs. This has not been fixed (although Stephen seemed to have some ideas). To work around the problem, users need to define a fontset and use it.
Doing this is rather user-unfriendly (it requires an X resource or elisp), so I suggested making it possible to customize fontsets.

My other Cyrillic thread was about (double-width) Cyrillic not being possible to save as UTF-8 at all. This was fixed by adding a PROBLEMS entry that says not all of Unicode is supported. While discussing it, it seemed that the real problem was the cut'n'paste behavior that generated the double-width Cyrillic in the first place, so there was some discussion about making Emacs use UTF8_STRING, when available, instead of COMPOUND_TEXT.

I hope this summarizes the thread.