From: tomas@tuxteam.de
To: help-gnu-emacs@gnu.org
Subject: Re: How to determine encoding for file?
Date: Mon, 25 Jan 2010 06:57:43 +0100
Message-ID: <20100125055743.GB26580@tomas>
In-Reply-To: <87y6jn2mgd.fsf@hubble.lan.informatimago.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sun, Jan 24, 2010 at 10:59:46PM +0100, Pascal J. Bourguignon wrote:
> kj <no.email@please.post> writes:
> 
> > I've downloaded a large file that is supposed to contain a mixture
> > of Japanese and English (it's basically a learner's dictionary).
> > The English is displayed correctly, but not so for the Japanese.
> >
> > I've tried setting the buffer's coding system to utf-8,
> > japanese-shift-jis, japanese-shift-jis-mac, japanese-shift-jis-dos
> > (just guessing).  None worked.
> >
> > In fact, I'm not even sure that any of these changes of the coding
> > system achieved *anything*, since the buffer's appearance remained
> > unchanged throughout all this mucking around.  I used the command
> > set-buffer-file-coding-system to do this.

That won't do the trick (see below for what will). This command only
tells Emacs: "forget that you loaded this file as Shift-JIS; from now on
treat it as UTF-8" (for example). The text already in the buffer is not
decoded again, so nothing changes on screen; only when you save the file
is it written out in the new coding system (if possible).

> >                                            Should I need to do
> > anything besides re-setting the coding system to see a change in
> > how the file is displayed?

You'll have to use `revert-buffer-with-coding-system' (bound by default
to the key sequence C-x RET r). This reloads the file from disk and
decodes it under the assumption of the new coding system.
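
To make the difference between relabelling and re-decoding concrete,
here is a minimal sketch in Python rather than Emacs Lisp. It is only an
analogy for what the two commands do, not how Emacs implements them; the
sample string and all variable names are made up for illustration.

```python
# Hypothetical sample: the bytes a Shift-JIS file would hold for "日本語".
raw = "日本語".encode("shift_jis")

# Loading under a wrong guess (Latin-1 here) yields garbage characters.
wrong_text = raw.decode("latin-1")

# Relabelling only changes what a later save would write out; the
# mis-decoded characters in memory stay exactly as they are.
saved_as_utf8 = wrong_text.encode("utf-8")

# Re-decoding goes back to the original bytes and reads them again under
# the new assumption, which is roughly what a revert does.
right_text = raw.decode("shift_jis")

print(repr(wrong_text))
print(right_text)
```

The point of the sketch: once the bytes have been turned into the wrong
characters, choosing a different encoding for output cannot repair them;
only a fresh decode of the original bytes can.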

> > More importantly, is there a better way to determine a file's
> > correct coding system besides trial and error?

Pascal answered this part better than I could :-)

There will always be lots of byte sequences that are valid under several
coding systems (but mean different things). The methods out there to get
a grip on the problem are heuristic, partly based on statistical
properties of the text. If you want to have some fun understanding the
kind of problems involved, have a look at [1]. For an implementation in
Emacs Lisp, see Unicad [2].
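
As a rough illustration of why validity alone cannot settle the
question, the Python sketch below tries a handful of candidate encodings
on the same bytes and keeps every one that decodes without error. The
helper name `try_decodings`, the candidate list and the sample string
are assumptions made for this example; this is not the statistical
method described in [1], nor what Unicad does.

```python
def try_decodings(raw, candidates):
    """Return {codec name: decoded text} for each candidate that decodes raw."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid under this candidate; rule it out.
            continue
    return results


# Hypothetical sample: bytes that were really written as Shift-JIS.
raw = "日本語".encode("shift_jis")
candidates = ["utf-8", "shift_jis", "euc_jp", "latin-1"]

results = try_decodings(raw, candidates)
for name, text in results.items():
    print(name, repr(text))
```

More than one candidate survives, and the survivors disagree about what
the text says. Latin-1 in particular accepts any byte sequence at all,
so a real detector still has to rank the survivors, for instance by how
plausible the decoded characters are as text in some language.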

- --------
[1] <http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html>
[2] <http://www.emacswiki.org/emacs-en/Unicad>

Regards

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFLXTLXBcgs9XrR2kYRAnR5AJ9Jowgc9pPrCaW0lRe1Tv7xFGya+QCfRXJ8
mLTW2GBvke8OYbVdWiVcrcU=
=gJuQ
-----END PGP SIGNATURE-----



