From: "William Xue"
Newsgroups: gmane.emacs.devel
Subject: Re: can not decode 0x93 and 0x94 to correct char
Date: Sat, 29 Sep 2007 23:30:32 +0800
To: "Stefan Monnier"
Cc: emacs-devel@gnu.org

On Sat, 29 Sep 2007 21:47:37 +0800, Stefan Monnier wrote:

>> ; for cp1258
>> (prefer-coding-system 'windows-1258)
>> ; for displaying utf-8 encoded file
>> (prefer-coding-system 'utf-8-emacs)
>> ; for displaying chinese characters
>> (prefer-coding-system 'gb2312)
>
>> It would be a little problem. Because if I changed the gb2312 to gb18030
>> or gbk, the first setting (prefer-coding-system 'windows-1258) would
>> be failed.
>
> I'm not sure what you mean by "would be failed", but when you use

If I change gb2312 to gb18030 or gbk, the characters \223 and \224
(0x93 and 0x94), which are the left and right double quotation marks in
cp1258, are no longer decoded correctly. So I don't think this is the
right solution for this situation. And if somebody also needs to decode
Japanese, French, Russian, and so on, it becomes too complex.

> prefer-coding-system, you have to realize that it's not quite as simple
> as it sounds:
> - first, the three statements above mean to try (in this order) first
>   gb2312, then utf-8, then windows-1258.
> - second, this order should not be chosen exclusively based on how often
>   you expect to use each of those encodings. Because it depends a lot on
>   the frequency of false positives. E.g. utf-8 should usually be first,
>   because it has very few false positives (if the auto-detect decides
>   it's utf-8, then it's very unlikely that the file isn't utf-8).
>   OTOH windows-1258 should *not* be first because it has many false
>   positives: any file without a 0 byte in it is a valid windows-1258
>   file.
>
> The second point is the main reason why the order of detection of coding
> systems when reading a file should be the same as the order of
> preference to choose a coding system to use when writing a file.

Thanks!

>
>         Stefan

--
Yours,
WilliamX
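
For reference, the ordering rule Stefan describes above could be written in an init file roughly as follows. This is only a sketch, not a tested recommendation. It relies on the behaviour quoted above, that the last `prefer-coding-system` call is tried first. The file-name pattern `\.cp1258\.txt` is a made-up example, not something taken from this thread.

```elisp
;; Each call to `prefer-coding-system' moves its argument to the front
;; of the detection priority list, so the LAST call is tried first.
(prefer-coding-system 'windows-1258) ; many false positives: try last
(prefer-coding-system 'gb2312)       ; tried second
(prefer-coding-system 'utf-8)        ; few false positives: try first

;; Assumption: files known to be cp1258 can be recognised by name.
;; The pattern below is hypothetical; adjust it to your own files.
(modify-coding-system-alist 'file "\\.cp1258\\.txt\\'" 'windows-1258)
```

If a single file is still detected wrongly, `C-x RET r windows-1258 RET` (`revert-buffer-with-coding-system`) re-reads it with an explicit coding system, without changing the global priority order.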