From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: can not decode 0x93 and 0x94 to correct char
Date: Sat, 29 Sep 2007 09:47:37 -0400
To: "William Xue"
Cc: Eli Zaretskii, emacs-devel@gnu.org

> ; for cp1258
> (prefer-coding-system 'windows-1258)
> ; for displaying utf-8 encoded file
> (prefer-coding-system 'utf-8-emacs)
> ; for displaying chinese characters
> (prefer-coding-system 'gb2312)

> It would be a little problem. Because if I changed the gb2312 to gb18030
> or gbk, the first setting (prefer-coding-system 'windows-1258) would
> be failed.

I'm not sure what you mean by "would be failed", but when you use
prefer-coding-system, you have to realize that it's not quite as simple as
it sounds:
- first, each call puts its coding system at the front of the priority
  list, so the three statements above mean to try (in this order) first
  gb2312, then utf-8, then windows-1258.
- second, this order should not be chosen exclusively based on how often
  you expect to use each of those encodings, because it also depends a lot
  on the frequency of false positives.  E.g. utf-8 should usually be first,
  because it has very few false positives (if the auto-detection decides
  it's utf-8, then it's very unlikely that the file isn't utf-8).  OTOH
  windows-1258 should *not* be first because it has many false positives:
  any file without a 0 byte in it is a valid windows-1258 file.
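As a rough sketch of the two points above, assuming you want utf-8 tried
first, gb2312 second and windows-1258 last: since each call to
prefer-coding-system moves its argument to the front of the priority list,
the calls have to be written in reverse order of priority.  The three
coding systems are taken from the quoted configuration; the particular
order is only one plausible choice, and the final form assumes an Emacs
that provides coding-system-priority-list (Emacs 23 or later).

```elisp
;; Each call moves its coding system to the front of the detection
;; priority list, so the LAST call ends up with the highest priority.
(prefer-coding-system 'windows-1258) ; many false positives: tried last
(prefer-coding-system 'gb2312)       ; tried second
(prefer-coding-system 'utf-8)        ; few false positives: tried first

;; Inspect the resulting order (highest priority first).
;; Assumption: Emacs 23 or later, where this function exists.
(coding-system-priority-list)
```

Evaluating the last form, e.g. with M-: in a running session, lets you
check that the list starts with utf-8 followed by gb2312 and windows-1258.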
The second point is the main reason why the order of detection of coding
systems when reading a file should not be the same as the order of
preference for choosing a coding system when writing a file.


        Stefan