From: Yuri Khan
Newsgroups: gmane.emacs.help
Subject: Re: Automatic recognition of some specific coding systems
Date: Fri, 27 Feb 2015 08:50:26 +0700
To: Jürgen Hartmann
Cc: "help-gnu-emacs@gnu.org"

On Tue, Feb 24, 2015 at 9:31 PM, Jürgen Hartmann wrote:

> Most of the text files that I have to work with are encoded with one
> of the coding systems
>
>    utf-8-unix
>    latin-9-unix
>    cp850-dos
> […]

Now that Eli has suggested one direction for your search, I’ll go in
and suggest another.

The general problem you’re solving is that of encoding detection.
There exist ready-made solutions for that, e.g. by computing byte
frequencies and matching them against known character frequencies in
your language. One of these is called enca.
Googling for “emacs enca” yields a post by Dmitriyi Paduchikh in
gnu.emacs.sources, dated 2007.

https://lists.gnu.org/archive/html/gnu-emacs-sources/2007-06/msg00037.html
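
To make the frequency-matching idea concrete, here is a much simplified sketch in Python. It is not how enca works internally. It only tries strict UTF-8 first and then scores the two remaining candidates from the question by how many decoded non-ASCII characters belong to a set of letters expected in the text. The names guess_encoding, guess_eol, CANDIDATES and EXPECTED are made up for this illustration, and the expected-letter set assumes German text.

```python
# Hypothetical sketch, not enca's algorithm: pick among the three coding
# systems named in the question by checking UTF-8 validity, then scoring
# the 8-bit candidates against letters we expect to see.

# Python codec names for latin-9 and cp850 (assumed candidate list).
CANDIDATES = ["iso8859-15", "cp850"]

# Assumption: the files contain German text, so these are the likely
# non-ASCII characters. Adjust for your own language.
EXPECTED = set("äöüÄÖÜß€")


def guess_encoding(data: bytes) -> str:
    """Return 'utf-8', 'iso8859-15' or 'cp850' for the given bytes."""
    # Non-ASCII bytes rarely form valid UTF-8 by accident, so a strict
    # decode is a strong signal. Pure ASCII also lands here.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    def score(encoding: str) -> int:
        # Count decoded non-ASCII characters that are expected letters.
        text = data.decode(encoding, errors="replace")
        return sum(1 for ch in text if ord(ch) > 127 and ch in EXPECTED)

    # On a tie, max() keeps the first candidate, i.e. iso8859-15.
    return max(CANDIDATES, key=score)


def guess_eol(data: bytes) -> str:
    """Return 'dos' if any CRLF line ending is present, else 'unix'."""
    return "dos" if b"\r\n" in data else "unix"


if __name__ == "__main__":
    sample = "Grüße aus Köln\r\n".encode("cp850")
    print(guess_encoding(sample), guess_eol(sample))
```

A real detector such as enca uses proper per-language statistics rather than a handful of letters, so treat this only as an illustration of the principle.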