From: Kenichi Handa
Newsgroups: gmane.emacs.bugs
Subject: bug#1654: 23.0.60; auto encoding detection (detect-coding-region) not working
Date: Fri, 27 Mar 2009 15:55:16 +0900
To: poppyer
Cc: cyd@stupidchicken.com, 1654@emacsbugs.donarmstrong.com
poppyer writes:
> But for big5: in the list returned by
> "(detect-coding-region (region-beginning) (region-end))",
> big5 does not appear. I do understand that GBK and Big5 byte sequences
> might not be easy to distinguish, but in this case both encodings are
> compatible with the input text, so both should be in the returned list.
> Am I right?

You are right. But the current Emacs can't have both GBK and Big5 in the
list of coding systems to try during detection, because they belong to
the same coding-system category (i.e. charset-based). I know this
restriction is not good, and improving it is on my todo list, but I
still haven't had time to work on it.

> BTW, is there any hook that I can run after coding detection? I might
> want to write a small Lisp function to distinguish Big5 and GBK (by
> character statistics, for example).

We don't have such a hook, but I think you can use
after-insert-file-functions when reading a file. When that hook is
called, the buffer already contains the text decoded by
buffer-file-coding-system, with point at the start of the newly
inserted text. You can re-decode that text like this:

(defun check-gbk-big5 (nchars)
  (if (and enable-multibyte-characters
           (not coding-system-for-read)
           (coding-system-equal 'chinese-gbk
                                (coding-system-base buffer-file-coding-system)))
      ;; Hook functions must leave point unchanged.
      (save-excursion
        (let* ((pos (point))
               (end (+ pos nchars))
               (modified (buffer-modified-p)))
          (when (search-forward "\x5201" end t) ;; (*1)
            (save-restriction
              (goto-char pos)
              (narrow-to-region pos end)
              ;; Undo the GBK decoding, then decode the same bytes as Big5.
              (encode-coding-region pos end buffer-file-coding-system)
              (decode-coding-region pos (point-max) 'big5)
              (set-buffer-file-coding-system last-coding-system-used)
              (set-buffer-modified-p modified)
              ;; The hook must return the new character count, not a position.
              (setq nchars (- (point-max) pos)))))))
  nchars)

(add-hook 'after-insert-file-functions 'check-gbk-big5)

You can replace the (*1) part with your own check function.

---
Kenichi Handa
handa@m17n.org
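The category restriction described above can be observed with a small helper. This is a minimal sketch, not part of Emacs: the name my-detect-as is hypothetical, and the sketch assumes, as the documentation of set-coding-system-priority describes, that when several coding systems of the same category are given priority, only the first one takes effect. It uses the existing with-coding-priority macro (from mule-util) to put CODING temporarily at the front of the priority list, so calling it with big5 rather than chinese-gbk should change which of the two charset-based coding systems can appear in the detection result.

```elisp
(require 'mule-util)  ; provides `with-coding-priority'

;; Hypothetical helper (not part of Emacs): run `detect-coding-region'
;; with CODING temporarily at the front of the coding-system priority list.
(defun my-detect-as (beg end coding)
  "Return the coding systems detected for BEG..END while CODING has top priority.
Only one coding system per category takes part in detection, so giving
`big5' top priority should make it, rather than `chinese-gbk', the
candidate for their shared category."
  (with-coding-priority (list coding)
    ;; With the optional HIGHEST argument omitted, a list is returned.
    (detect-coding-region beg end)))

;; Usage, comparing the two candidates on the active region:
;; (my-detect-as (region-beginning) (region-end) 'chinese-gbk)
;; (my-detect-as (region-beginning) (region-end) 'big5)
```

The priority list is restored when with-coding-priority exits, so the comparison does not change the user's global settings.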