From: Kenichi Handa <handa@m17n.org>
To: poppyer <poppyer@gmail.com>
Cc: cyd@stupidchicken.com, 1654@emacsbugs.donarmstrong.com
Subject: bug#1654: 23.0.60; auto encoding detection (detect-coding-region) not working
Date: Fri, 27 Mar 2009 15:55:16 +0900
Message-ID: <E1Ln5yG-0002fU-65@etlken>
In-Reply-To: <ukwsabwo8x.fsf@nusnet-97-126.dynip.nus.edu.sg> (message from poppyer on Fri, 27 Mar 2009 13:03:58 +0800)
In article <ukwsabwo8x.fsf@nusnet-97-126.dynip.nus.edu.sg>, poppyer <poppyer@gmail.com> writes:
> But for Big5, the list returned by
> "(detect-coding-region (region-beginning) (region-end))"
> does not contain big5. I understand that GBK and Big5 byte sequences may
> not be easy to distinguish, but in this case both encodings are
> compatible with the input text, so both should be in the returned list. Am
> I right?
You are right. But the current Emacs can't have both GBK
and Big5 in the list of coding systems to try when detecting,
because they are in the same coding-system category
(i.e. charset-base). I know that this restriction is not
good, and improving it is on my todo list, but I still don't
have time to work on it.
> BTW, is there any hook that I can run after the coding detection? I might
> want to write a small piece of Lisp to distinguish Big5 and GBK (by character
> statistics, for example).
We don't have such a hook, but I think you can use
after-insert-file-functions when reading a file. When that
hook is called, the buffer already contains the text decoded
with buffer-file-coding-system. You can re-decode the newly
inserted text like this:
(defun check-gbk-big5 (nchars)
  ;; Called with point at the start of the NCHARS newly inserted characters.
  (if (and enable-multibyte-characters
           (not coding-system-for-read)
           (coding-system-equal
            'chinese-gbk (coding-system-base buffer-file-coding-system)))
      (let* ((pos (point))
             (end (+ pos nchars))
             (modified (buffer-modified-p)))
        (when (search-forward "\x5201" end t) ;; (*1)
          (save-restriction
            (goto-char pos)
            (narrow-to-region pos end)
            ;; Turn the text back into bytes, then decode them as Big5.
            (encode-coding-region pos end buffer-file-coding-system)
            (decode-coding-region pos (point-max) 'big5)
            (set-buffer-file-coding-system last-coding-system-used)
            (set-buffer-modified-p modified)
            ;; Return the new character count, not a buffer position.
            (setq nchars (- (point-max) (point-min)))))))
  ;; The hook expects the (possibly updated) number of inserted characters.
  nchars)
(add-hook 'after-insert-file-functions 'check-gbk-big5)
You can replace the (*1) part with your own check function.
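As a rough illustration of what such a check function might look like, here is a minimal sketch of a character-count heuristic. Every name in it (my-big5-hint-chars, my-big5-hint-threshold, my-count-hint-chars, my-text-looks-like-big5-p) is hypothetical and not part of Emacs. The hint list is only a placeholder seeded with the one character used at (*1); a usable list would have to come from your own statistics.

```elisp
;; Hypothetical helper names; none of these are defined by Emacs.

(defvar my-big5-hint-chars (list #x5201)
  "Characters whose presence in GBK-decoded text hints at Big5 input.
Assumption: placeholder list holding only the character tested at (*1).")

(defvar my-big5-hint-threshold 1
  "Minimum number of hint characters needed to suspect Big5 (assumed value).")

(defun my-count-hint-chars (beg end)
  "Count characters between BEG and END found in `my-big5-hint-chars'."
  (let ((count 0)
        (pos beg))
    ;; Walk the region by position so that point is left untouched.
    (while (< pos end)
      (when (memq (char-after pos) my-big5-hint-chars)
        (setq count (1+ count)))
      (setq pos (1+ pos)))
    count))

(defun my-text-looks-like-big5-p (beg end)
  "Return non-nil if the text between BEG and END reaches the hint threshold."
  (>= (my-count-hint-chars beg end) my-big5-hint-threshold))
```

With this in place, the (*1) test could become (my-text-looks-like-big5-p pos end). Raising the threshold or extending the hint list trades false positives against missed Big5 files.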
---
Kenichi Handa
handa@m17n.org