unofficial mirror of help-gnu-emacs@gnu.org
* Coding system prefer
@ 2009-03-03 11:46 Maze
  2009-03-03 14:58 ` Jason Rumney
  2009-03-03 21:44 ` Peter Dyballa
  0 siblings, 2 replies; 13+ messages in thread
From: Maze @ 2009-03-03 11:46 UTC (permalink / raw)
  To: help-gnu-emacs

Hello!
I need to work with files in cp1251 and cp866, and I want Emacs to
autodetect the coding system. To get the files to open correctly I am
trying to use coding-system-priority-list.

First I run: prefer-coding-system cp866-dos
Then C-h C shows:
Priority order for recognizing coding systems when reading files:
  1. cp866
  2. utf-8 (alias: mule-utf-8)
  3. iso-2022-7bit
  4. iso-2022-7bit-lock (alias: iso-2022-int-1)
etc...
Files in cp866 now open correctly.

Next, I run prefer-coding-system cp1251-dos

Result:
Priority order for recognizing coding systems when reading files:
  1. windows-1251 (alias: cp1251 windows-1251)
  2. utf-8 (alias: mule-utf-8)
  3. iso-2022-7bit
  4. iso-2022-7bit-lock (alias: iso-2022-int-1)
etc...

Files in cp1251 now open correctly, but files in cp866 do not. :(

Then I tried (set-coding-system-priority 'cp866 'cp1251), with the same
result.

How can I use both coding systems together?



* Re: Coding system prefer
  2009-03-03 11:46 Coding system prefer Maze
@ 2009-03-03 14:58 ` Jason Rumney
  2009-03-03 16:18   ` Teemu Likonen
  2009-03-03 21:44 ` Peter Dyballa
  1 sibling, 1 reply; 13+ messages in thread
From: Jason Rumney @ 2009-03-03 14:58 UTC (permalink / raw)
  To: help-gnu-emacs

On Mar 3, 7:46 pm, Maze <maze...@gmail.com> wrote:

> I need to work with files in cp1251 and cp866
...
> How can i use both coding together?

You can't. Both these encodings are indistinguishable from binary
data, so all files will be auto-detected as whichever is first in the
priority list.



* Re: Coding system prefer
  2009-03-03 14:58 ` Jason Rumney
@ 2009-03-03 16:18   ` Teemu Likonen
  2009-03-03 20:55     ` Maze
  0 siblings, 1 reply; 13+ messages in thread
From: Teemu Likonen @ 2009-03-03 16:18 UTC (permalink / raw)
  To: help-gnu-emacs

On 2009-03-03 06:58 (-0800), Jason Rumney wrote:

> On Mar 3, 7:46 pm, Maze <maze...@gmail.com> wrote:
>> I need to work with files in cp1251 and cp866
> ...
>> How can i use both coding together?
>
> You can't. Both these encodings are indistinguishable from binary
> data, so all files will be auto-detected as whichever is first in the
> priority list.

If the files can be told apart by filename, the variable auto-coding-alist
may be useful. As the name says, it is an association list of items of
the form "(REGEXP . CODING-SYSTEM)". There is also the variable
auto-coding-functions, which is a list of functions for determining
coding systems. For more info, see:

    C-h v auto-coding-alist RET
    C-h v auto-coding-functions RET
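
As a rough illustration of both variables, here is a minimal sketch. The
directory names "dos" and "win", the "#dos" marker line, and the function
name my-coding-from-marker are all made up for the example; nothing in
this thread says the files are organised that way. Functions in
auto-coding-functions receive one argument, the number of characters to
examine from point, and return a coding system or nil.

```elisp
;; Hypothetical layout: DOS-era files live under a directory named "dos"
;; and Windows-era files under "win".  Adjust the regexps to taste.
(add-to-list 'auto-coding-alist '("/dos/[^/]*\\.txt\\'" . cp866))
(add-to-list 'auto-coding-alist '("/win/[^/]*\\.txt\\'" . windows-1251))

;; Hypothetical convention: a file containing a line that reads exactly
;; "#dos" near its start is cp866.
(defun my-coding-from-marker (size)
  "Return `cp866' if a \"#dos\" line occurs within SIZE characters of point.
Return nil otherwise, so that other detection methods still get a chance."
  (save-excursion
    (save-match-data
      ;; Allow an optional CR so the marker also matches in CRLF files.
      (when (re-search-forward "^#dos\r?$"
                               (min (point-max) (+ (point) size))
                               t)
        'cp866))))

(add-to-list 'auto-coding-functions #'my-coding-from-marker)
```

Neither mechanism looks at the Cyrillic bytes themselves; they only help
when the file name or some ASCII marker gives the encoding away.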



* Re: Coding system prefer
  2009-03-03 16:18   ` Teemu Likonen
@ 2009-03-03 20:55     ` Maze
  2009-03-04  9:11       ` Peter Dyballa
                         ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Maze @ 2009-03-03 20:55 UTC (permalink / raw)
  To: help-gnu-emacs

On 3 Mar, 19:18, Teemu Likonen <tliko...@iki.fi> wrote:

> If the files can be detected by filename then variable auto-coding-alist
> may be useful.

No. All the files are *.txt :(
I can't believe that Emacs doesn't auto-detect these encodings...
Maybe there is an external package for Emacs that solves this
problem?



* Re: Coding system prefer
  2009-03-03 11:46 Coding system prefer Maze
  2009-03-03 14:58 ` Jason Rumney
@ 2009-03-03 21:44 ` Peter Dyballa
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Dyballa @ 2009-03-03 21:44 UTC (permalink / raw)
  To: Maze; +Cc: help-gnu-emacs


On 03.03.2009 at 12:46, Maze wrote:

> How can i use both coding together?

By using file local variables à la

	<comment> -*- mode: LaTeX; coding: utf-8; -*-

or, when you're lucky, associating particular encodings with file  
name extensions.

--
Greetings

   Pete

Every instructor assumes that you have nothing else to do except  
study for that instructor's course.
				– Fourth Law of Applied Terror






* Re: Coding system prefer
  2009-03-03 20:55     ` Maze
@ 2009-03-04  9:11       ` Peter Dyballa
  2009-03-04  9:53       ` Fedor Khod'kov
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Peter Dyballa @ 2009-03-04  9:11 UTC (permalink / raw)
  To: Maze; +Cc: help-gnu-emacs


On 03.03.2009 at 21:55, Maze wrote:

> Can't  believe, that Emacs don't auto-detect these encodings...

How do you detect the encodings?

Can you write the algorithm? I mean, not in Emacs Lisp, just in English?

--
Greetings

   Pete

One cannot live by television, video games, top ten CDs, and dumb  
movies alone.
				– Amiri Baraka, 1999








* Re: Coding system prefer
  2009-03-03 20:55     ` Maze
  2009-03-04  9:11       ` Peter Dyballa
@ 2009-03-04  9:53       ` Fedor Khod'kov
       [not found]       ` <mailman.2324.1236157912.31690.help-gnu-emacs@gnu.org>
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Fedor Khod'kov @ 2009-03-04  9:53 UTC (permalink / raw)
  To: help-gnu-emacs

Tue, Mar 03, 2009 at 12:55:32PM -0800, Maze wrote:
> On 3 Mar, 19:18, Teemu Likonen <tliko...@iki.fi> wrote:
> 
> > If the files can be detected by filename then variable auto-coding-alist
> > may be useful.
> 
> No. All files are *.txt :(
> Can't  believe, that Emacs don't auto-detect these encodings...
> May be I can use any external package for Emacs to resolve this
> problem?

This may help:
auto-enca.el --- automatically detect coding system with Enca
http://www.archivum.info/gnu.emacs.sources/2007-06/msg00035.html

It uses an external utility called "enca" to detect the charset.  This
utility can be found here:
http://packages.debian.org/source/lenny/enca
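
For orientation only, a minimal sketch of calling enca from Emacs Lisp.
This is not the code of auto-enca.el, which I have not reproduced here.
It assumes enca is installed, and that "-L russian" (language hint) and
"-m" (print the MIME charset name) behave as described in enca's manual
page; please check "man enca" on your system. The function names are
invented for the example.

```elisp
(require 'subr-x)                       ; for `string-trim' on older Emacs

(defun my-enca-name-to-coding-system (name)
  "Map NAME, a charset name printed by enca, to an Emacs coding system.
Return nil if Emacs has no coding system spelled that way."
  (let ((sym (intern (downcase (string-trim name)))))
    (and (coding-system-p sym) sym)))

(defun my-enca-guess-coding-system (file)
  "Ask the external program enca for the charset of FILE.
Return a coding system symbol, or nil if enca is missing or fails."
  (when (executable-find "enca")
    (with-temp-buffer
      ;; Assumed flags: -L gives the language hint, -m asks for the
      ;; MIME charset name on standard output.
      (when (eq 0 (call-process "enca" nil t nil
                                "-L" "russian" "-m"
                                (expand-file-name file)))
        (my-enca-name-to-coding-system (buffer-string))))))
```

If enca reports a name that Emacs does not know under that spelling, the
first function returns nil, and a small translation table would be
needed.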
--





* Re: Coding system prefer
       [not found]       ` <mailman.2324.1236157912.31690.help-gnu-emacs@gnu.org>
@ 2009-03-04 11:01         ` Sergio
  2009-03-05  4:19           ` Miles Bader
  0 siblings, 1 reply; 13+ messages in thread
From: Sergio @ 2009-03-04 11:01 UTC (permalink / raw)
  To: help-gnu-emacs

On 4 Mar, 15:11, Peter Dyballa <Peter_Dyba...@Web.DE> wrote:
> On 03.03.2009 at 21:55, Maze wrote:

>> Can't  believe, that Emacs don't auto-detect these encodings...

> How do you detect the encodings?
> Can you write the algorithm? I mean, not in Emacs Lisp, just in English?

The FAR file manager (http://en.wikipedia.org/wiki/FAR_Manager) does it
quite reliably using statistics about the character frequency
distribution. The tables themselves are quite small (about 1 Kbyte); the
8-bit encodings are language-dependent, the Unicode encodings are
autodetected in a more general way.
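
To make the idea concrete, here is a much cruder sketch than FAR's
tables, whose format and contents I am not reproducing. It decodes the
bytes under each candidate coding system and counts how many characters
fall among ten commonly cited high-frequency Russian letters; that letter
list and the function names are assumptions chosen for the example.

```elisp
;; Ten commonly cited high-frequency Russian letters (o e a i n t s r v l),
;; written as \u escapes so the snippet does not depend on file encoding.
(defconst my-common-russian-letters
  "\u043e\u0435\u0430\u0438\u043d\u0442\u0441\u0440\u0432\u043b")

(defun my-score-as-russian (bytes coding)
  "Decode unibyte string BYTES as CODING and count common Russian letters."
  (let ((text (downcase (decode-coding-string bytes coding)))
        (letters (append my-common-russian-letters nil))
        (score 0))
    (dotimes (i (length text))
      (when (memq (aref text i) letters)
        (setq score (1+ score))))
    score))

(defun my-guess-coding (bytes candidates)
  "Return the member of CANDIDATES under which BYTES scores highest.
Ties go to the earlier candidate; return nil if every score is zero."
  (let ((best nil)
        (best-score 0))
    (dolist (coding candidates best)
      (let ((score (my-score-as-russian bytes coding)))
        (when (> score best-score)
          (setq best coding
                best-score score))))))
```

It would be called as (my-guess-coding BYTES '(cp866 windows-1251)),
where BYTES is the raw, undecoded text. Real tables weight every letter
rather than counting a handful, so treat this as a toy.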

Here are the tables I have for Cyrillic:

,----
| c:/Program Files/FAR/Addons/Tables/Cyrillic:
| total used in directory 12 available 29790096
| drwxrwxrwx  1 spokrovs Domain Users    0 09-22 19:24 .
| drwxrwxrwx  1 spokrovs Domain Users    0 09-22 19:24 ..
| drwxrwxrwx  1 spokrovs Domain Users    0 09-22 19:24 E-Mail Double Conversion
| -rw-rw-rw-  1 spokrovs Domain Users 1079 2005-07-04  DKOI8 (Mainframe).reg
| -rw-rw-rw-  1 spokrovs Domain Users 1063 2005-07-01  DM (Amiga).reg
| -rw-rw-rw-  1 spokrovs Domain Users  723 2005-07-04  Descript.ion
| -rw-rw-rw-  1 spokrovs Domain Users 1111 2006-02-13  Dist.Rus.reg
| -rw-rw-rw-  1 spokrovs Domain Users 1153 2006-02-13  Dist.Ukr.reg
| -rw-rw-rw-  1 spokrovs Domain Users 1108 2006-03-23  ISO-8859-5.reg
| -rw-rw-rw-  1 spokrovs Domain Users 1084 2006-03-23  KOI8-R.reg
| -rw-rw-rw-  1 spokrovs Domain Users 1016 2006-03-23  KOI8-U.reg
| -rw-rw-rw-  1 spokrovs Domain Users 1186 2006-03-23  Macintosh Standard.reg
| -rw-rw-rw-  1 spokrovs Domain Users 1104 2005-07-01  RUSCII (GOST Ukrainian).reg
| -rw-rw-rw-  1 spokrovs Domain Users 1073 2006-03-23  Windows-1251.reg
`----

I think there is a similar package in Emacs, although its emphasis is
on language recognition rather than on the encoding.

--
Sergei




* Re: Coding system prefer
  2009-03-03 20:55     ` Maze
                         ` (3 preceding siblings ...)
  2009-03-04 11:51       ` Fedor Khod'kov
@ 2009-03-04 18:48       ` Eli Zaretskii
  4 siblings, 0 replies; 13+ messages in thread
From: Eli Zaretskii @ 2009-03-04 18:48 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Maze <mazebox@gmail.com>
> Date: Tue, 3 Mar 2009 12:55:32 -0800 (PST)
> 
> Can't  believe, that Emacs don't auto-detect these encodings...

Apparently, Emacs developers don't see this as a serious problem.

How about if you donate a solution to it?





* Re: Coding system prefer
  2009-03-04 11:01         ` Sergio
@ 2009-03-05  4:19           ` Miles Bader
  2009-03-05  8:06             ` Sergio
  0 siblings, 1 reply; 13+ messages in thread
From: Miles Bader @ 2009-03-05  4:19 UTC (permalink / raw)
  To: help-gnu-emacs

Sergio <sergio.pokrovskij@gmail.com> writes:
> The FAR file manager, http://en.wikipedia.org/wiki/FAR_Manager does it
> quite reliably using statistics about the character frequency
> distribution.

Does that work for anything except text files containing prose?

-Miles

-- 
`Suppose Korea goes to the World Cup final against Japan and wins,' Moon said.
`All the past could be forgiven.'   [NYT]



* Re: Coding system prefer
  2009-03-05  4:19           ` Miles Bader
@ 2009-03-05  8:06             ` Sergio
  2009-03-20  9:06               ` Maze
  0 siblings, 1 reply; 13+ messages in thread
From: Sergio @ 2009-03-05  8:06 UTC (permalink / raw)
  To: help-gnu-emacs

On Mar 5, 10:19 am, Miles Bader <mi...@gnu.org> wrote:
> Sergio <sergio.pokrovs...@gmail.com> writes:
>> The FAR file manager, http://en.wikipedia.org/wiki/FAR_Manager does it
>> quite reliably using statistics about the character frequency
>> distribution.

> Does that work for anything except text files containing prose?

Yes, it does.

Of course it does not work for a binary file, but it works fine for a
text file in a formal language, such as a C program with Russian strings
or a text with HTML markup.

I never explored the internals, but I guess that normally one can
ignore the ASCII part; only codes greater than 127 really matter.  Of
these, one can easily detect UTF-8 or another Unicode encoding (at least
for the alphabetic planes; I never needed the CJK part).  And then there
are the 8-bit codes, in which the upper half is characteristic.

And usually the noise part (like markup or formal language statements)
is in ASCII.
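
A minimal sketch of that ordering, as a guess at the general shape and
not a description of FAR's internals: look only at bytes above 127, try
UTF-8 first, and leave everything else to an 8-bit guesser. The function
names and the result symbol legacy-8-bit are invented for the example.

```elisp
(defun my-valid-utf-8-p (bytes)
  "Return non-nil if unibyte string BYTES decodes cleanly as UTF-8."
  (let ((text (decode-coding-string bytes 'utf-8))
        (ok t))
    (dotimes (i (length text))
      ;; Emacs keeps undecodable bytes as raw-byte characters, whose
      ;; codes start at #x3FFF80.
      (when (>= (aref text i) #x3FFF80)
        (setq ok nil)))
    ok))

(defun my-classify-bytes (bytes)
  "Classify unibyte string BYTES as `ascii', `utf-8' or `legacy-8-bit'."
  (let ((has-high nil))
    ;; The ASCII part carries no information; only bytes > 127 matter.
    (dotimes (i (length bytes))
      (when (> (aref bytes i) 127)
        (setq has-high t)))
    (cond ((not has-high) 'ascii)
          ((my-valid-utf-8-p bytes) 'utf-8)
          (t 'legacy-8-bit))))
```

Only the legacy-8-bit case needs language-specific statistics; the other
two are decided by the byte structure alone.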

I never needed EBCDIC or any other encoding which is not a superset of
ASCII.

--
Sergei




* Re: Coding system prefer
  2009-03-05  8:06             ` Sergio
@ 2009-03-20  9:06               ` Maze
  0 siblings, 0 replies; 13+ messages in thread
From: Maze @ 2009-03-20  9:06 UTC (permalink / raw)
  To: help-gnu-emacs

Thanks to All!


