From: Jürgen Hartmann
Newsgroups: gmane.emacs.help
Subject: RE: Automatic recognition of some specific coding systems
Date: Fri, 27 Feb 2015 13:12:46 +0100
To: help-gnu-emacs@gnu.org

Thank you, Yuri Khan, for widening the perspective:

> The general problem you’re solving is that of encoding detection.
> There exist ready-made solutions for that, e.g. by computing byte
> frequencies and matching them against known character frequencies in
> your language. One of these is called enca.
>
> Googling for “emacs enca” yields a post by Dmitriyi Paduchikh in
> gnu.emacs.sources, dated 2007.
>
> https://lists.gnu.org/archive/html/gnu-emacs-sources/2007-06/msg00037.html

Using Google is always good advice, and I will gratefully follow it
once more with this broader background in mind.

Actually, I didn't know Enca at all until now. A language-based approach
to recognizing encodings is an interesting idea.

Unfortunately, Enca cannot be used in my particular case because--I
didn't mention this before, sorry--the text files to handle are mostly
in English and German.
For the former, encoding is not an issue, and for the latter, German
is not supported by Enca.

Enca 1.14, for example, supports only

  Belarussian
  Bulgarian
  Czech
  Estonian
  Croatian
  Hungarian
  Lithuanian
  Latvian
  Polish
  Russian
  Slovak
  Slovene
  Ukrainian
  Chinese

But for people who use any of these languages, this might be a
promising option.

Apart from that--and this might be helpful in my case as well--the idea
of using an external program to detect the encoding is very appealing,
and maybe it is possible to adapt the Lisp snippets contained in your
link to other programs. E.g.

  file -bi ...

can identify file encodings, although it reports cp850 only
non-specifically as "unknown-8bit".

So thank you very much for your suggestions.

Juergen
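
The idea of delegating detection to an external program could be sketched roughly as follows. This is only an illustration in Python, not the Lisp code from the linked post: it assumes a GNU/Linux `file` utility whose `-bi` options print a line such as `text/plain; charset=iso-8859-1`, and the helper names and the cp850 fallback for "unknown-8bit" are hypothetical choices made for this example.

```python
import subprocess

# Assumption for this sketch: `file` reports cp850 text only as
# "unknown-8bit", so the caller decides what that label should mean.
FALLBACK_FOR_UNKNOWN_8BIT = "cp850"


def parse_charset(file_output):
    """Extract the charset from `file -bi` style output.

    Returns None when the output carries no charset field.
    """
    for field in file_output.split(";"):
        key, sep, value = field.strip().partition("=")
        if sep and key.strip().lower() == "charset":
            return value.strip().lower()
    return None


def resolve_charset(charset, fallback=FALLBACK_FOR_UNKNOWN_8BIT):
    """Map the unspecific 'unknown-8bit' label to the assumed fallback."""
    if charset == "unknown-8bit":
        return fallback
    return charset


def detect_encoding(path):
    """Run `file -bi PATH` and return the resolved charset name."""
    result = subprocess.run(
        ["file", "-bi", path], capture_output=True, text=True, check=True
    )
    return resolve_charset(parse_charset(result.stdout))
```

Calling `detect_encoding("notes.txt")` on a hypothetical file would return whatever label `file` prints, with "unknown-8bit" replaced by the fallback. The names returned are those used by `file`, so mapping them to Emacs coding system names would still be a separate step.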