From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Cyrillic vs UTF-8
Date: Mon, 28 Apr 2003 18:18:41 +0900 (JST)
Message-ID: <200304280918.SAA10779@etlken.m17n.org>
References: <1858-Fri25Apr2003194023+0300-eliz@elta.co.il> <200304260811.RAA08227@etlken.m17n.org>
Original-To: jas@extundo.com
Cc: emacs-devel@gnu.org
Original-cc: eliz@elta.co.il
In-reply-to: (message from Simon Josefsson on Sat, 26 Apr 2003 14:25:15 +0200)

Simon Josefsson writes:
> Kenichi Handa writes:
>> Unfortunately, the current Emacs doesn't have a facility to
>> detect UTF-8 byte sequence.  So, if we put UTF-8 the higher
>> priority, all files are detected as UTF-8.  :-(

> I see.  Is this very difficult to solve, or why hasn't it?  The
> algorithm to detect UTF-8 is not that complicated.

Oops, I'm very sorry, I was wrong.  The current Emacs
contains built-in detectors for utf-8 and utf-16 (with BOM).
So, giving UTF-8 a higher priority should cause no
problems.

Richard Stallman writes:
> It seems binary is preferred over utf-8 and utf-16-* in
> coding-category-list.  This seems extremely conservative.  I guess it
> means UTF-8 can never be autodetected by default?

> That certainly seems undesirable.  Unless there is a specific reason
> why it needs to be this way, I agree with you that we should raise
> the priority of utf-8 and utf-16.

We can raise the priority of utf-16-le-with-signature and
utf-16-be-with-signature, but we can't raise the priority of
utf-16-le, utf-16-be, or utf-16, because it's impossible to
distinguish them from binary data.

So, I've just installed these changes.

2003-04-28  Kenichi Handa

	* international/mule-cmds.el (reset-language-environment): Raise
	the priority of mule-utf-8, mule-utf-16-be-with-signature, and
	mule-utf-16-le-with-signature.

	* international/mule-conf.el: Set coding-category-utf-16-be to
	mule-utf-16-be-with-signature, coding-category-utf-16-le to
	mule-utf-16-le-with-signature.  Raise the priority of
	coding-category-utf-8, coding-category-utf-16-be, and
	coding-category-utf-16-le.

---
Ken'ichi HANDA
handa@m17n.org
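
For readers following the thread, the detection order discussed above can be illustrated with a minimal sketch. It is written in Python and is not Emacs's actual detector. The function name `sniff_encoding` and the returned labels are hypothetical, chosen only to mirror the coding-system names in this message. It checks for a UTF-16 signature (BOM) first, then tries strict UTF-8 validation, and otherwise falls back to "binary". It makes no attempt to recognize UTF-16 without a signature, since such data cannot be told apart from binary.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess an encoding label for DATA (illustrative sketch only)."""
    # A UTF-16 signature is an unambiguous prefix:
    # FF FE means little endian, FE FF means big endian.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le-with-signature"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be-with-signature"
    # Strict UTF-8 decoding rejects malformed byte sequences, so
    # 8-bit text in legacy charsets (e.g. KOI8-R) usually fails here.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"
    # Note: pure ASCII input is also valid UTF-8 and ends up here.
    return "utf-8"
```

With this ordering, Cyrillic text saved as UTF-8 is reported as "utf-8", while the same text saved as KOI8-R is not valid UTF-8 and falls through to "binary", which is where a lower-priority 8-bit coding system would get its chance.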