From: "Jürgen Hartmann" <juergen_hartmann_@hotmail.com>
To: "help-gnu-emacs@gnu.org" <help-gnu-emacs@gnu.org>
Subject: RE: Automatic recognition of some specific coding systems
Date: Fri, 27 Feb 2015 13:12:46 +0100
Message-ID: <DUB124-W4108005C2C652E82C66322A8150@phx.gbl>
In-Reply-To: <CAP_d_8Vn9Nrdi+uXtxUQikQK5Sno+qX3O0SWFHzA8wKgX9Bqtw@mail.gmail.com>

Thank you, Yuri Khan, for widening the perspective:

> The general problem you’re solving is that of encoding detection.
> There exist ready-made solutions for that, e.g. by computing byte
> frequencies and matching them against known character frequencies in
> your language. One of these is called enca.
>
> Googling for “emacs enca” yields a post by Dmitriyi Paduchikh in
> gnu.emacs.sources, dated 2007.
>
> https://lists.gnu.org/archive/html/gnu-emacs-sources/2007-06/msg00037.html

Using Google is always good advice, which I will gratefully follow
once more with respect to this broader background.

Actually, I had not heard of Enca until now. A language-based approach
to recognizing encodings is an interesting idea.

Unfortunately, Enca cannot be used in my particular case, because--I
didn't mention this before, sorry--the text files to handle are mostly
in English and German. For the former, encoding is not an issue,
and for the latter, Enca does not support German.

Enca 1.14, for example, supports only

   Belarusian
   Bulgarian
   Czech
   Estonian
   Croatian
   Hungarian
   Lithuanian
   Latvian
   Polish
   Russian
   Slovak
   Slovene
   Ukrainian
   Chinese

But for people who use any of these languages, it might be a
promising option.

Apart from that--and this might be helpful in my case as well--the idea
of using external software to detect the encoding is very appealing, and
maybe the Lisp snippets contained in your link can be adapted
to other programs. E.g.

   file -bi ...

can identify file encodings, although it reports cp850
only non-specifically as "unknown-8bit".
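
A minimal sketch of that idea, assuming the external program is file(1)
called as "file -bi", whose output looks like
"text/plain; charset=iso-8859-1". The function names are hypothetical,
and treating "unknown-8bit" as cp850 is only an assumption that fits
files like mine; it is not something file(1) itself asserts.

```shell
#!/bin/sh
# Extract the charset from one line of `file -bi` output,
# e.g. "text/plain; charset=iso-8859-1" -> "iso-8859-1".
charset_of() {
    printf '%s\n' "$1" | sed -n 's/.*charset=\([^;[:space:]]*\).*/\1/p'
}

# Map the reported charset to an Emacs coding system name.
# ASSUMPTION: "unknown-8bit" is taken to mean cp850.
coding_system_for() {
    cs=$(charset_of "$1")
    case "$cs" in
        unknown-8bit)              echo cp850 ;;
        us-ascii|utf-8|iso-8859-1) echo "$cs" ;;
        *)                         echo undecided ;;  # let Emacs guess
    esac
}

# Usage (not run here): coding_system_for "$(file -bi some-file.txt)"
```

The resulting name could then be passed on to Emacs, for instance as
the coding system to use when visiting the file.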

So thank you very much for your suggestions.

Juergen

 		 	   		  
