unofficial mirror of bug-gnu-emacs@gnu.org 
From: Kenichi Handa <handa@m17n.org>
To: poppyer <poppyer@gmail.com>
Cc: cyd@stupidchicken.com, 1654@emacsbugs.donarmstrong.com
Subject: bug#1654: 23.0.60; auto encoding detection (detect-coding-region) not working
Date: Fri, 27 Mar 2009 15:55:16 +0900
Message-ID: <E1Ln5yG-0002fU-65@etlken>
In-Reply-To: <ukwsabwo8x.fsf@nusnet-97-126.dynip.nus.edu.sg> (message from poppyer on Fri, 27 Mar 2009 13:03:58 +0800)

In article <ukwsabwo8x.fsf@nusnet-97-126.dynip.nus.edu.sg>, poppyer <poppyer@gmail.com> writes:

> But for the big5, in the list returned by 
> "(detect-coding-region (region-beginning) (region-end))", 
> there is no big5. I do understand that gbk and big5's sequences might
> not be easy to distinguish, but in this case, both encodings are
> compatible with the input literal text, so both should be in the returned list. Am
> I right?

You are right.  But the current Emacs can't have both GBK
and Big5 in the list of coding systems to try during
detection, because they are in the same coding-system
category (i.e. charset-base).  I know that this restriction
is not good, and improving it is on my todo list, but I
still don't have time to work on it.
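
To make the ambiguity concrete, here is a minimal sketch (the
helper name my-decode-both-ways is hypothetical, not an Emacs
function).  It decodes the same raw bytes once as GBK and once as
Big5, and also reports the coding type of each coding system.  The
two bytes #xA4 #xA4 are assumed here to be a valid sequence in both
encodings, which is why a detector cannot rule either one out.

```elisp
;; Hypothetical helper for illustration only.
(defun my-decode-both-ways (bytes)
  "Decode the unibyte string BYTES as GBK and as Big5.
Return a list (GBK-TEXT BIG5-TEXT GBK-TYPE BIG5-TYPE)."
  (list (decode-coding-string bytes 'chinese-gbk)
        (decode-coding-string bytes 'chinese-big5)
        ;; `coding-system-type' returns the coding type that each
        ;; coding system was defined with.
        (coding-system-type 'chinese-gbk)
        (coding-system-type 'chinese-big5)))

;; Same bytes, two different but equally well-formed readings.
(my-decode-both-ways (unibyte-string #xA4 #xA4))
```

Evaluating the last form in *scratch* shows both decodings side by
side.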

> BTW, is there any hook that I can put after the coding detection? I might
> want to write a small lisp to distinguish BIG5 and GBK (by char statistics,
> for example).

We don't have such a hook, but I think you can use
after-insert-file-functions when reading a file.  When that
hook is called, the buffer already contains text decoded
by buffer-file-coding-system.  You can re-decode the newly
inserted text like this:

(defun check-gbk-big5 (nchars)
  ;; Called from `after-insert-file-functions' with point at the start of
  ;; the inserted text; must return the new number of inserted characters.
  (if (and enable-multibyte-characters
           (not coding-system-for-read)
           (coding-system-equal
            'chinese-gbk (coding-system-base buffer-file-coding-system)))
      (let* ((pos (point))
             (end (+ pos nchars))
             (modified (buffer-modified-p)))
        (when (search-forward "\x5201" end t)  ;; (*1)
          (save-restriction
            (goto-char pos)
            (narrow-to-region pos end)
            ;; Recover the original bytes, then decode them as Big5.
            (encode-coding-region pos end buffer-file-coding-system)
            (decode-coding-region pos (point-max) 'big5)
            (set-buffer-file-coding-system last-coding-system-used)
            (set-buffer-modified-p modified)
            ;; Return a character count, not a buffer position.
            (setq nchars (- (point-max) pos))))))
  nchars)

(add-hook 'after-insert-file-functions 'check-gbk-big5)

You can replace the (*1) part with your own check function.
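
As one possible shape for such a check, here is a rough sketch based
on character statistics.  All names (my-common-traditional-chars,
my-count-chars-from, my-looks-like-big5-p) are hypothetical, and the
character sample is an arbitrary assumption rather than a tuned
frequency table.  The idea is to take the text that was decoded as
GBK, recover its bytes, decode them as Big5 instead, and see which
reading contains more characters that are common in Traditional
Chinese.

```elisp
;; Hypothetical names throughout; the sample list is only an assumption.
(defvar my-common-traditional-chars "這個們來說時國會對學"
  "A small sample of characters common in Traditional Chinese text.")

(defun my-count-chars-from (string set)
  "Return how many characters of STRING occur in the string SET."
  (let ((chars (append set nil))   ; string -> list of characters
        (count 0))
    (dotimes (i (length string))
      (when (memq (aref string i) chars)
        (setq count (1+ count))))
    count))

(defun my-looks-like-big5-p (beg end)
  "Return non-nil if the GBK-decoded text in BEG..END scores better as Big5."
  (let* ((as-gbk (buffer-substring-no-properties beg end))
         ;; Recover the original bytes, then try the other decoding.
         (bytes (encode-coding-string as-gbk 'chinese-gbk))
         (as-big5 (decode-coding-string bytes 'chinese-big5)))
    (> (my-count-chars-from as-big5 my-common-traditional-chars)
       (my-count-chars-from as-gbk my-common-traditional-chars))))
```

With these definitions, the (*1) form could become
(when (my-looks-like-big5-p pos end) ...).  A real check would want
a larger sample and probably a threshold instead of a bare
comparison.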

---
Kenichi Handa
handa@m17n.org





