* Re: Coding system prefer
From: Peter Dyballa @ 2009-03-04 9:11 UTC
To: Maze; +Cc: help-gnu-emacs
On 03.03.2009 at 21:55, Maze wrote:
> Can't believe that Emacs doesn't auto-detect these encodings...
How do you detect the encodings?
Can you write the algorithm? I mean, not in Emacs Lisp, just in English?
--
Greetings
Pete
One cannot live by television, video games, top ten CDs, and dumb
movies alone.
– Amiri Baraka, 1999
* Re: Coding system prefer
From: Fedor Khod'kov @ 2009-03-04 9:53 UTC
To: help-gnu-emacs
Tue, Mar 03, 2009 at 12:55:32PM -0800, Maze wrote:
> On 3 Mar, 19:18, Teemu Likonen <tliko...@iki.fi> wrote:
>
> > If the files can be identified by file name, then the variable
> > auto-coding-alist may be useful.
>
> No. All files are *.txt :(
> Can't believe that Emacs doesn't auto-detect these encodings...
> Maybe I can use some external package for Emacs to solve this
> problem?
Perhaps this can help:
auto-enca.el --- automatically detect coding system with Enca
http://www.archivum.info/gnu.emacs.sources/2007-06/msg00035.html
It uses an external utility called "enca" to detect the charset. This
utility can be found here:
http://packages.debian.org/source/lenny/enca
--
* Re: Coding system prefer
From: Sergio @ 2009-03-04 11:01 UTC
To: help-gnu-emacs
On 4 Mar, 15:11, Peter Dyballa <Peter_Dyba...@Web.DE> wrote:
> On 03.03.2009 at 21:55, Maze wrote:
>> Can't believe that Emacs doesn't auto-detect these encodings...
> How do you detect the encodings?
> Can you write the algorithm? I mean, not in Emacs Lisp, just in English?
The FAR file manager (http://en.wikipedia.org/wiki/FAR_Manager) does it
quite reliably using statistics about the character frequency distribution.
The tables themselves are quite small (about 1 Kbyte); the 8-bit
encodings are language-dependent, while the Unicode encodings are
autodetected in a more general way.
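Peter asked for the algorithm in plain English: decode the bytes with each candidate 8-bit code page and keep the decoding that looks most like the expected language. A minimal Python sketch of that frequency-table idea might look like the following. It is not FAR's implementation; the candidate code pages (Python codec names `cp1251`, `koi8_r`, `cp866`, `iso8859_5`), the approximate Russian letter frequencies, and the helper names are assumptions for illustration only.

```python
# Approximate frequencies (%) of lowercase Russian letters in ordinary prose.
# Illustrative values, not FAR's tables; a real detector would build a table
# from a corpus for each supported language.
RUSSIAN_FREQ = {
    "о": 10.97, "е": 8.45, "а": 8.01, "и": 7.35, "н": 6.70, "т": 6.26,
    "с": 5.47, "р": 4.73, "в": 4.54, "л": 4.40, "к": 3.49, "м": 3.21,
    "д": 2.98, "п": 2.81, "у": 2.62, "я": 2.01, "ы": 1.90, "ь": 1.74,
    "г": 1.70, "з": 1.65, "б": 1.59, "ч": 1.44, "й": 1.21, "х": 0.97,
    "ж": 0.94, "ш": 0.73, "ю": 0.64, "ц": 0.48, "щ": 0.36, "э": 0.32,
    "ф": 0.26, "ъ": 0.04, "ё": 0.04,
}

# Assumed candidate 8-bit Cyrillic code pages (Python codec names).
CANDIDATES = ("cp1251", "koi8_r", "cp866", "iso8859_5")


def frequency_score(text: str) -> float:
    """Sum the expected frequency of each character in the decoded text.

    ASCII, uppercase letters, box-drawing characters and other symbols are
    not in the table and add nothing, so a wrong decoding, which maps the
    high bytes onto rare letters or non-letters, scores lower."""
    return sum(RUSSIAN_FREQ.get(ch, 0.0) for ch in text)


def guess_cyrillic_encoding(data: bytes) -> str:
    """Return the candidate code page whose decoding looks most like Russian."""
    # errors="replace" because some code pages (e.g. cp1251) leave a few
    # byte values undefined; a replaced character simply scores 0.
    return max(
        CANDIDATES,
        key=lambda enc: frequency_score(data.decode(enc, errors="replace")),
    )


if __name__ == "__main__":
    sample = "Это простой пример русского текста для проверки кодировки."
    for enc in CANDIDATES:
        print(enc, guess_cyrillic_encoding(sample.encode(enc)))
```

For the sample sentence each of the four code pages is recovered correctly, because in every wrong decoding most high bytes land on uppercase letters, symbols, or rare letters. Pure-ASCII input scores 0 for every candidate, so it should be filtered out before this step.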
Here are the tables I have for Cyrillic:
,----
| c:/Program Files/FAR/Addons/Tables/Cyrillic:
| total used in directory 12 available 29790096
| drwxrwxrwx 1 spokrovs Domain Users 0 09-22 19:24 .
| drwxrwxrwx 1 spokrovs Domain Users 0 09-22 19:24 ..
| drwxrwxrwx 1 spokrovs Domain Users 0 09-22 19:24 E-Mail Double Conversion
| -rw-rw-rw- 1 spokrovs Domain Users 1079 2005-07-04 DKOI8 (Mainframe).reg
| -rw-rw-rw- 1 spokrovs Domain Users 1063 2005-07-01 DM (Amiga).reg
| -rw-rw-rw- 1 spokrovs Domain Users 723 2005-07-04 Descript.ion
| -rw-rw-rw- 1 spokrovs Domain Users 1111 2006-02-13 Dist.Rus.reg
| -rw-rw-rw- 1 spokrovs Domain Users 1153 2006-02-13 Dist.Ukr.reg
| -rw-rw-rw- 1 spokrovs Domain Users 1108 2006-03-23 ISO-8859-5.reg
| -rw-rw-rw- 1 spokrovs Domain Users 1084 2006-03-23 KOI8-R.reg
| -rw-rw-rw- 1 spokrovs Domain Users 1016 2006-03-23 KOI8-U.reg
| -rw-rw-rw- 1 spokrovs Domain Users 1186 2006-03-23 Macintosh Standard.reg
| -rw-rw-rw- 1 spokrovs Domain Users 1104 2005-07-01 RUSCII (GOST Ukrainian).reg
| -rw-rw-rw- 1 spokrovs Domain Users 1073 2006-03-23 Windows-1251.reg
`----
I think there is a similar package in Emacs, although its emphasis is
on language recognition rather than on the encoding.
--
Sergei
* Re: Coding system prefer
From: Miles Bader @ 2009-03-05 4:19 UTC
To: help-gnu-emacs
Sergio <sergio.pokrovskij@gmail.com> writes:
> The FAR file manager, http://en.wikipedia.org/wiki/FAR_Manager does it
> quite reliably using statistics about the character frequency
> distribution.
Does that work for anything except text files containing prose?
-Miles
--
`Suppose Korea goes to the World Cup final against Japan and wins,' Moon said.
`All the past could be forgiven.' [NYT]
* Re: Coding system prefer
From: Sergio @ 2009-03-05 8:06 UTC
To: help-gnu-emacs
On Mar 5, 10:19 am, Miles Bader <mi...@gnu.org> wrote:
> Sergio <sergio.pokrovs...@gmail.com> writes:
>> The FAR file manager, http://en.wikipedia.org/wiki/FAR_Manager does it
>> quite reliably using statistics about the character frequency
>> distribution.
> Does that work for anything except text files containing prose?
Yes, it does.
Of course, it does not work for a binary file, but it works fine for a
text file in a formal language, such as a C program with Russian strings
or a text with HTML markup.
I never explored the internals, but I guess that one can normally
ignore the ASCII part; only codes greater than 127 really matter. Among
these, UTF-8 or another Unicode encoding is easy to detect (at least
for the alphabetic planes; I never needed the CJK part). Then there are
the 8-bit code pages, in which the upper half (codes 128 to 255) is
characteristic of the encoding.
And the noise (markup or formal-language statements) is usually in
ASCII.
I never needed EBCDIC or any other encoding which is not a superset of
ASCII.
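The triage Sergio guesses at (ignore the ASCII part, recognize UTF-8 structurally, and only then fall back to statistics for the 8-bit code pages) can be sketched as a first pass. This is a minimal Python illustration under the same assumption he states, that the encoding is a superset of ASCII; the function names are hypothetical, and UTF-16/UTF-32 and EBCDIC are deliberately not handled.

```python
def high_bytes(data: bytes) -> bytes:
    """Keep only bytes above 127. ASCII markup, C syntax and the like carry
    no information about which superset-of-ASCII encoding is in use."""
    return bytes(b for b in data if b > 127)


def classify_bytes(data: bytes) -> str:
    """First-pass triage for an encoding that is a superset of ASCII.

    Returns "ascii" if no byte is above 127, "utf-8" if the data is
    well-formed UTF-8, and "8-bit" otherwise, meaning a statistical guess
    among single-byte code pages is still needed."""
    if not high_bytes(data):
        return "ascii"
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return "8-bit"
    return "utf-8"


if __name__ == "__main__":
    line = 'printf("Привет, мир\\n");'
    print(classify_bytes(b"int main(void) { return 0; }"))  # ascii
    print(classify_bytes(line.encode("utf-8")))              # utf-8
    print(classify_bytes(line.encode("koi8_r")))             # 8-bit
```

Cyrillic text in an 8-bit code page almost never forms valid UTF-8 by accident, because letters there are rarely followed by bytes in the continuation range 0x80 to 0xBF; a very short string can still slip through, so a real detector would weigh this check rather than trust it blindly.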
--
Sergei
* Re: Coding system prefer
From: Eli Zaretskii @ 2009-03-04 18:48 UTC
To: help-gnu-emacs
> From: Maze <mazebox@gmail.com>
> Date: Tue, 3 Mar 2009 12:55:32 -0800 (PST)
>
> Can't believe that Emacs doesn't auto-detect these encodings...
Apparently, Emacs developers don't see this as a serious problem.
How about if you donate a solution to it?