From: tomas@tuxteam.de
Newsgroups: gmane.emacs.help
Subject: Re: Detecting if a file is binary
Date: Tue, 24 Nov 2009 18:42:04 +0100
Message-ID: <20091124174204.GA3679@tomas>
References: <825bd047-222a-46a5-9a8b-c50d6126f86d@j35g2000vbl.googlegroups.com>
To: Nordlöw
Cc: help-gnu-emacs@gnu.org

On Tue, Nov 24, 2009 at 07:23:34AM -0800, Nordlöw wrote:
> Is there a way in emacs-lisp code to detect if a file binary, that is
> it does *not* contain a correct multi-character coding.

> Or can every possible combination of bytes always be correctly decoded
> by some character coding?

Yes, it can. In the one-byte encodings of the ISO-8859-x family, for
example, each byte represents a valid code point, so any sequence of
bytes decodes without error. In UTF-8, by contrast, there are byte
sequences which can't (shouldn't) happen.

I think the only way to gain some confidence is by statistical analysis
of the text.

Regards
-- 
tomás
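
To make the two points above concrete, here is a rough sketch. It is written in Python rather than Emacs Lisp, purely for illustration. The function names `decodes_as` and `looks_binary`, the set of "harmless" control bytes, and the 30% threshold are all assumptions made for this example, not an established method; the byte-counting heuristic is only one simple form the statistical analysis could take.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes without error in `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Control bytes that routinely show up in text files (assumed set):
# BEL, BS, TAB, LF, FF, CR, ESC.
TEXT_CONTROLS = frozenset({7, 8, 9, 10, 12, 13, 27})


def looks_binary(data: bytes, threshold: float = 0.30) -> bool:
    """Crude statistical guess; the threshold is an arbitrary assumption."""
    if not data:
        return False
    if b"\x00" in data:
        # NUL bytes are rare in text and common in binary formats.
        return True
    suspicious = sum(1 for b in data if b < 32 and b not in TEXT_CONTROLS)
    return suspicious / len(data) > threshold


if __name__ == "__main__":
    every_byte = bytes(range(256))
    # A one-byte encoding accepts any byte sequence ...
    print(decodes_as(every_byte, "iso-8859-1"))
    # ... whereas UTF-8 rejects some of them.
    print(decodes_as(every_byte, "utf-8"))
    # The heuristic only yields a guess, never a proof.
    print(looks_binary(b"plain text\n"), looks_binary(b"\x7fELF\x01\x01\x00"))
```

A successful decode as ISO-8859-1 therefore says nothing about whether a file is text, while a failed UTF-8 decode only shows that the file is not valid UTF-8. Whatever cut-off is chosen for the heuristic, some files will be classified wrongly.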