* Coding system prefer
@ 2009-03-03 11:46 Maze
2009-03-03 14:58 ` Jason Rumney
2009-03-03 21:44 ` Peter Dyballa
0 siblings, 2 replies; 13+ messages in thread
From: Maze @ 2009-03-03 11:46 UTC (permalink / raw)
To: help-gnu-emacs
Hello!
I need to work with files in cp1251 and cp866, and I want Emacs to
autodetect the coding system. To open files correctly I am trying to use
coding-system-priority-list.
I run: prefer-coding-system cp866-dos
Then I look at C-h C:
Priority order for recognizing coding systems when reading files:
1. cp866
2. utf-8 (alias: mule-utf-8)
3. iso-2022-7bit
4. iso-2022-7bit-lock (alias: iso-2022-int-1)
etc...
Opening files in cp866 works correctly.
Next, I run prefer-coding-system cp1251-dos
Result:
Priority order for recognizing coding systems when reading files:
1. windows-1251 (alias: cp1251 windows-1251)
2. utf-8 (alias: mule-utf-8)
3. iso-2022-7bit
4. iso-2022-7bit-lock (alias: iso-2022-int-1)
etc...
Opening files in cp1251 now works correctly, but opening files in cp866
does not. :(
Then I tried (set-coding-system-priority 'cp866 'cp1251), with the same
effect.
How can I use both encodings together?
* Re: Coding system prefer
2009-03-03 11:46 Coding system prefer Maze
@ 2009-03-03 14:58 ` Jason Rumney
2009-03-03 16:18 ` Teemu Likonen
2009-03-03 21:44 ` Peter Dyballa
1 sibling, 1 reply; 13+ messages in thread
From: Jason Rumney @ 2009-03-03 14:58 UTC (permalink / raw)
To: help-gnu-emacs
On Mar 3, 7:46 pm, Maze <maze...@gmail.com> wrote:
> I need to work with files in cp1251 and cp866
...
> How can i use both coding together?
You can't. Both these encodings are indistinguishable from binary
data, so all files will be auto-detected as whichever is first in the
priority list.
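To illustrate the point above, here is a minimal Python sketch (not from the thread; the sample word and the helper name defined_high_bytes are assumptions chosen for illustration). It shows that text encoded in one of the two code pages also decodes without error in the other, so a validity check alone cannot tell them apart.

```python
# Illustrative sample only: the Russian word for "hello".
sample = "Привет"

raw_1251 = sample.encode("cp1251")
raw_866 = sample.encode("cp866")

# Each byte string also decodes without error under the *other* code
# page; the result is simply different (wrong) characters.
wrong_from_1251 = raw_1251.decode("cp866")
wrong_from_866 = raw_866.decode("cp1251")

print(repr(wrong_from_1251))
print(repr(wrong_from_866))


def defined_high_bytes(encoding):
    """Count how many byte values 0x80..0xFF the codec accepts."""
    count = 0
    for b in range(0x80, 0x100):
        try:
            bytes([b]).decode(encoding)
            count += 1
        except UnicodeDecodeError:
            pass
    return count


print(defined_high_bytes("cp866"), defined_high_bytes("cp1251"))
```

Because almost every high byte is a legal character in both code pages, the position in the priority list, rather than the file content, ends up deciding the result.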
* Re: Coding system prefer
2009-03-03 14:58 ` Jason Rumney
@ 2009-03-03 16:18 ` Teemu Likonen
2009-03-03 20:55 ` Maze
0 siblings, 1 reply; 13+ messages in thread
From: Teemu Likonen @ 2009-03-03 16:18 UTC (permalink / raw)
To: help-gnu-emacs
On 2009-03-03 06:58 (-0800), Jason Rumney wrote:
> On Mar 3, 7:46 pm, Maze <maze...@gmail.com> wrote:
>> I need to work with files in cp1251 and cp866
> ...
>> How can i use both coding together?
>
> You can't. Both these encodings are indistinguishable from binary
> data, so all files will be auto-detected as whichever is first in the
> priority list.
If the files can be detected by filename then the variable
auto-coding-alist may be useful. As the name says, it's an association
list consisting of items "(REGEXP . CODING-SYSTEM)". There is also the
variable auto-coding-functions, which is a list of functions for
determining coding systems. For more info, see:
C-h v auto-coding-alist RET
C-h v auto-coding-functions RET
* Re: Coding system prefer
2009-03-03 16:18 ` Teemu Likonen
@ 2009-03-03 20:55 ` Maze
2009-03-04 9:11 ` Peter Dyballa
` (4 more replies)
0 siblings, 5 replies; 13+ messages in thread
From: Maze @ 2009-03-03 20:55 UTC (permalink / raw)
To: help-gnu-emacs
On 3 Mar, 19:18, Teemu Likonen <tliko...@iki.fi> wrote:
> If the files can be detected by filename then variable auto-coding-alist
> may be useful.
No. All the files are *.txt :(
I can't believe that Emacs doesn't auto-detect these encodings...
Maybe I can use some external package for Emacs to solve this
problem?
* Re: Coding system prefer
2009-03-03 11:46 Coding system prefer Maze
2009-03-03 14:58 ` Jason Rumney
@ 2009-03-03 21:44 ` Peter Dyballa
1 sibling, 0 replies; 13+ messages in thread
From: Peter Dyballa @ 2009-03-03 21:44 UTC (permalink / raw)
To: Maze; +Cc: help-gnu-emacs
On 03.03.2009 at 12:46, Maze wrote:
> How can i use both coding together?
By using file local variables à la
<comment> -*- mode: LaTeX; coding: utf-8; -*-
or, when you're lucky, associating particular encodings with file
name extensions.
--
Greetings
Pete
Every instructor assumes that you have nothing else to do except
study for that instructor's course.
– Fourth Law of Applied Terror
* Re: Coding system prefer
2009-03-03 20:55 ` Maze
@ 2009-03-04 9:11 ` Peter Dyballa
2009-03-04 9:53 ` Fedor Khod'kov
` (3 subsequent siblings)
4 siblings, 0 replies; 13+ messages in thread
From: Peter Dyballa @ 2009-03-04 9:11 UTC (permalink / raw)
To: Maze; +Cc: help-gnu-emacs
On 03.03.2009 at 21:55, Maze wrote:
> Can't believe, that Emacs don't auto-detect these encodings...
How do you detect the encodings?
Can you write the algorithm? I mean, not in Emacs Lisp, just in English?
--
Greetings
Pete
One cannot live by television, video games, top ten CDs, and dumb
movies alone.
– Amiri Baraka, 1999
* Re: Coding system prefer
2009-03-03 20:55 ` Maze
2009-03-04 9:11 ` Peter Dyballa
@ 2009-03-04 9:53 ` Fedor Khod'kov
[not found] ` <mailman.2324.1236157912.31690.help-gnu-emacs@gnu.org>
` (2 subsequent siblings)
4 siblings, 0 replies; 13+ messages in thread
From: Fedor Khod'kov @ 2009-03-04 9:53 UTC (permalink / raw)
To: help-gnu-emacs
Tue, Mar 03, 2009 at 12:55:32PM -0800, Maze wrote:
> On 3 Mar, 19:18, Teemu Likonen <tliko...@iki.fi> wrote:
>
> > If the files can be detected by filename then variable auto-coding-alist
> > may be useful.
>
> No. All files are *.txt :(
> Can't believe, that Emacs don't auto-detect these encodings...
> May be I can use any external package for Emacs to resolve this
> problem?
Probably this can help:
auto-enca.el --- automatically detect coding system with Enca
http://www.archivum.info/gnu.emacs.sources/2007-06/msg00035.html
It uses external utility called "enca" to detect charset. This
utility can be found here:
http://packages.debian.org/source/lenny/enca
--
* Re: Coding system prefer
[not found] ` <mailman.2324.1236157912.31690.help-gnu-emacs@gnu.org>
@ 2009-03-04 11:01 ` Sergio
2009-03-05 4:19 ` Miles Bader
0 siblings, 1 reply; 13+ messages in thread
From: Sergio @ 2009-03-04 11:01 UTC (permalink / raw)
To: help-gnu-emacs
On 4 Mar, 15:11, Peter Dyballa <Peter_Dyba...@Web.DE> wrote:
> On 03.03.2009 at 21:55, Maze wrote:
>> Can't believe, that Emacs don't auto-detect these encodings...
> How do you detect the encodings?
> Can you write the algorithm? I mean, not in Emacs Lisp, just in English?
The FAR file manager, http://en.wikipedia.org/wiki/FAR_Manager does it
quite reliably using statistics about the character frequency
distribution. The tables themselves are quite small (about 1 Kbyte);
the 8-bit encodings are language-dependent, the Unicode encodings are
autodetected in a more general way.
Here are the tables I have for Cyrillic:
,----
| c:/Program Files/FAR/Addons/Tables/Cyrillic:
| total used in directory 12 available 29790096
| drwxrwxrwx 1 spokrovs Domain Users 0 09-22 19:24 .
| drwxrwxrwx 1 spokrovs Domain Users 0 09-22 19:24 ..
| drwxrwxrwx 1 spokrovs Domain Users 0 09-22 19:24 E-Mail Double Conversion
| -rw-rw-rw- 1 spokrovs Domain Users 1079 2005-07-04 DKOI8 (Mainframe).reg
| -rw-rw-rw- 1 spokrovs Domain Users 1063 2005-07-01 DM (Amiga).reg
| -rw-rw-rw- 1 spokrovs Domain Users 723 2005-07-04 Descript.ion
| -rw-rw-rw- 1 spokrovs Domain Users 1111 2006-02-13 Dist.Rus.reg
| -rw-rw-rw- 1 spokrovs Domain Users 1153 2006-02-13 Dist.Ukr.reg
| -rw-rw-rw- 1 spokrovs Domain Users 1108 2006-03-23 ISO-8859-5.reg
| -rw-rw-rw- 1 spokrovs Domain Users 1084 2006-03-23 KOI8-R.reg
| -rw-rw-rw- 1 spokrovs Domain Users 1016 2006-03-23 KOI8-U.reg
| -rw-rw-rw- 1 spokrovs Domain Users 1186 2006-03-23 Macintosh Standard.reg
| -rw-rw-rw- 1 spokrovs Domain Users 1104 2005-07-01 RUSCII (GOST Ukrainian).reg
| -rw-rw-rw- 1 spokrovs Domain Users 1073 2006-03-23 Windows-1251.reg
`----
I think there is a similar package in Emacs, although its emphasis is
on language recognition rather than on the encoding.
--
Sergei
* Re: Coding system prefer
2009-03-03 20:55 ` Maze
` (3 preceding siblings ...)
2009-03-04 11:51 ` Fedor Khod'kov
@ 2009-03-04 18:48 ` Eli Zaretskii
4 siblings, 0 replies; 13+ messages in thread
From: Eli Zaretskii @ 2009-03-04 18:48 UTC (permalink / raw)
To: help-gnu-emacs
> From: Maze <mazebox@gmail.com>
> Date: Tue, 3 Mar 2009 12:55:32 -0800 (PST)
>
> Can't believe, that Emacs don't auto-detect these encodings...
Apparently, Emacs developers don't see this as a serious problem.
How about if you donate a solution to it?
* Re: Coding system prefer
2009-03-04 11:01 ` Sergio
@ 2009-03-05 4:19 ` Miles Bader
2009-03-05 8:06 ` Sergio
0 siblings, 1 reply; 13+ messages in thread
From: Miles Bader @ 2009-03-05 4:19 UTC (permalink / raw)
To: help-gnu-emacs
Sergio <sergio.pokrovskij@gmail.com> writes:
> The FAR file manager, http://en.wikipedia.org/wiki/FAR_Manager does it
> quite reliably using statistics about the character frequency
> distribution.
Does that work for anything except text files containing prose?
-Miles
--
`Suppose Korea goes to the World Cup final against Japan and wins,' Moon said.
`All the past could be forgiven.' [NYT]
* Re: Coding system prefer
2009-03-05 4:19 ` Miles Bader
@ 2009-03-05 8:06 ` Sergio
2009-03-20 9:06 ` Maze
0 siblings, 1 reply; 13+ messages in thread
From: Sergio @ 2009-03-05 8:06 UTC (permalink / raw)
To: help-gnu-emacs
On Mar 5, 10:19 am, Miles Bader <mi...@gnu.org> wrote:
> Sergio <sergio.pokrovs...@gmail.com> writes:
>> The FAR file manager, http://en.wikipedia.org/wiki/FAR_Manager does it
>> quite reliably using statistics about the character frequency
>> distribution.
> Does that work for anything except text files containing prose?
Yes, it does.
Of course it does not work for a binary file, but it works fine for a
text file in a formal language, like a C program with Russian strings or
a text with HTML markup.
I never explored the internals, but I guess that normally one can
ignore the ASCII part; only codes greater than 127 really matter. Among
these, one can easily detect UTF-8 or another Unicode encoding (at least
for the alphabetic planes; I never needed the CJK part). That leaves the
8-bit encodings, for which the distribution of codes in the upper half
is characteristic.
And usually the noise part (like markup or formal-language statements)
is in ASCII.
I never needed EBCDIC or any other encoding which is not a superset of
ASCII.
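The triage guessed at above might look roughly like this in Python. It is only a sketch of the idea, not FAR's implementation; the function name classify and the three labels it returns are hypothetical.

```python
def classify(data: bytes) -> str:
    """Coarse first pass: decide whether frequency analysis of the
    upper half (codes > 127) is needed at all."""
    # Pure ASCII: nothing above 127, so every ASCII superset agrees.
    if not any(b > 127 for b in data):
        return "ascii"
    # Strict UTF-8 decoding rejects almost all legacy 8-bit Cyrillic text,
    # so a successful decode is strong evidence for UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Otherwise the high bytes belong to some 8-bit code page, which has
    # to be chosen by statistics on those bytes.
    return "8-bit"


# A C statement with a Russian string literal: the ASCII "noise" does not
# affect the outcome, only the bytes inside the string do.
c_line = b'printf("' + "Привет".encode("cp866") + b'");'
print(classify(b"int main(void) { return 0; }"))
print(classify("Привет".encode("utf-8")))
print(classify(c_line))
```

Only inputs that land in the last branch need the per-language frequency tables.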
--
Sergei
* Re: Coding system prefer
2009-03-05 8:06 ` Sergio
@ 2009-03-20 9:06 ` Maze
0 siblings, 0 replies; 13+ messages in thread
From: Maze @ 2009-03-20 9:06 UTC (permalink / raw)
To: help-gnu-emacs
Thanks to All!