unofficial mirror of help-gnu-emacs@gnu.org
* Detecting if a file is binary
From: Nordlöw @ 2009-11-24 15:23 UTC
  To: help-gnu-emacs

Is there a way in Emacs Lisp code to detect whether a file is binary,
that is, that its contents are *not* valid text in any multi-byte
character coding? Or can every possible combination of bytes always be
decoded correctly by some character coding?

/Nordlöw



* Re: Detecting if a file is binary
From: tomas @ 2009-11-24 17:42 UTC
  To: Nordlöw; +Cc: help-gnu-emacs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Nov 24, 2009 at 07:23:34AM -0800, Nordlöw wrote:
> Is there a way in Emacs Lisp code to detect whether a file is binary,
> that is, that its contents are *not* valid text in any multi-byte
> character coding? Or can every possible combination of bytes always be
> decoded correctly by some character coding?

Yes, it can. In a one-byte encoding such as ISO-8859-1, for example,
every byte value maps to a code point, so any byte sequence decodes
without error. UTF-8, by contrast, has byte sequences which can't (or at
least shouldn't) occur in valid text.
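To make the difference concrete, here is a minimal sketch in Python rather
than Emacs Lisp, so it illustrates the principle and is not an answer in
the language originally asked about. The helper name `decodes_as` is made
up for this example.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Every possible byte value, 0x00 through 0xFF, in one buffer.
every_byte = bytes(range(256))

# ISO-8859-1 assigns a code point to each byte, so this never fails.
print(decodes_as(every_byte, "iso-8859-1"))

# UTF-8 rejects the same buffer: 0x80, for one, cannot start a character.
print(decodes_as(every_byte, "utf-8"))

# A complete two-byte sequence is fine, a truncated one is not.
print(decodes_as(b"\xc3\xb6", "utf-8"))
print(decodes_as(b"\xc3", "utf-8"))
```

So a successful decode as ISO-8859-1 says nothing about whether the file
is text, while a failed decode as UTF-8 only says it is not UTF-8.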

I think the only way to gain some confidence is by statistical analysis
of the text.
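
One crude form of such an analysis, sketched in Python rather than Emacs
Lisp: treat a NUL byte or a large share of control bytes as a sign of
binary data. The names `looks_binary` and `file_looks_binary`, the 30%
threshold and the 8192-byte sample size are all assumptions picked for
illustration, not values taken from Emacs or any other tool.

```python
# Byte values treated as plausible in text: a few common control
# characters (BEL, BS, TAB, LF, FF, CR, ESC) plus everything from
# 0x20 upwards except DEL. High bytes are allowed because they occur
# in 8-bit and UTF-8 encoded text.
TEXT_BYTES = frozenset({7, 8, 9, 10, 12, 13, 27}) | (
    frozenset(range(0x20, 0x100)) - {0x7F}
)


def looks_binary(data: bytes, threshold: float = 0.30) -> bool:
    """Guess whether `data` is binary. This is a heuristic, not a proof."""
    if not data:
        return False  # an empty file is treated as text
    if b"\x00" in data:
        return True  # NUL bytes are rare in 8-bit and UTF-8 text
    suspicious = sum(1 for byte in data if byte not in TEXT_BYTES)
    return suspicious / len(data) > threshold


def file_looks_binary(path: str, sample_size: int = 8192) -> bool:
    """Apply the heuristic to the first `sample_size` bytes of a file."""
    with open(path, "rb") as handle:
        return looks_binary(handle.read(sample_size))


print(looks_binary("Nordlöw writes plain text\n".encode("utf-8")))
print(looks_binary(b"\x7fELF\x02\x01\x01\x00"))
```

The result is only a guess. UTF-16 text, for instance, is full of NUL
bytes and would be misreported as binary by this check.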

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFLDBrsBcgs9XrR2kYRAvtOAJ9wJZ1Q9oTHX7rJUCb/0G3IhbzzKwCfaqBt
2ZZsjoR0Skn0QwptSPQVH1A=
=/HfN
-----END PGP SIGNATURE-----



