From: tomas@tuxteam.de
To: help-gnu-emacs@gnu.org
Subject: Re: How to determine encoding for file?
Date: Mon, 25 Jan 2010 06:57:43 +0100 [thread overview]
Message-ID: <20100125055743.GB26580@tomas> (raw)
In-Reply-To: <87y6jn2mgd.fsf@hubble.lan.informatimago.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Sun, Jan 24, 2010 at 10:59:46PM +0100, Pascal J. Bourguignon wrote:
> kj <no.email@please.post> writes:
>
> > I've downloaded a large file that is supposed to contain a mixture
> > of Japanese and English (it's basically a learner's dictionary).
> > The English is displayed correctly, but not so for the Japanese.
> >
> > I've tried setting the buffer's coding system to utf-8,
> > japanese-shift-jis, japanese-shift-jis-mac, japanese-shift-jis-dos
> > (just guessing). None worked.
> >
> > In fact, I'm not even sure that any of these changes of the coding
> > system achieved *anything*, since the buffer's appearance remained
> > unchanged throughout all this mucking around. I used the command
> > set-buffer-file-coding-system to do this.
This won't do the trick (see below for what will do). This function just
says: "forget you loaded this file as shift-JIS. From now on it will be
UTF-8" (for example). So it doesn't change anything, but when you save
the file, it will be transformed to the new coding system (if possible).
> > Should I need to do
> > anything besides re-setting the coding system to see a change in
> > how the file is displayed?
You'll have to use `revert-buffer-with-coding-system' (by default mapped
to the key seqence C-x RET r). This will reload the file under
assumption of the new coding system.
> > More importantly, is there a better way to determine a file's
> > correct coding system besides trial and error?
Pascal answered this part better than I could :-)
There will be always lots of byte sequences valid under several coding
systems (but meaning different things). The methods out there to get a
grip on the problem are heuristic, partly based on statistical
properties of the text. If you want to have some fun understanding the
kind of problems involved, have a look at [1]. For an implementation in
Emacs Lisp, see Unicad [2]
- --------
[1] <http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html>
[2] <http://www.emacswiki.org/emacs-en/Unicad>
Regards
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
iD8DBQFLXTLXBcgs9XrR2kYRAnR5AJ9Jowgc9pPrCaW0lRe1Tv7xFGya+QCfRXJ8
mLTW2GBvke8OYbVdWiVcrcU=
=gJuQ
-----END PGP SIGNATURE-----
next prev parent reply other threads:[~2010-01-25 5:57 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-01-24 21:29 How to determine encoding for file? kj
2010-01-24 21:59 ` Pascal J. Bourguignon
2010-01-25 5:57 ` tomas [this message]
[not found] ` <mailman.146.1264399148.14305.help-gnu-emacs@gnu.org>
2010-01-25 14:55 ` kj
2010-01-26 9:01 ` Thien-Thi Nguyen
2010-01-26 2:58 ` Kevin Rodgers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100125055743.GB26580@tomas \
--to=tomas@tuxteam.de \
--cc=help-gnu-emacs@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.