unofficial mirror of help-gnu-emacs@gnu.org
* recognizing coding systems
@ 2004-11-03  0:56 Alexandru Cardaniuc
From: Alexandru Cardaniuc @ 2004-11-03  0:56 UTC (permalink / raw)


Hi All!

The Emacs manual says:

"   The priority list of coding systems depends on the selected language
environment (*note Language Environments::).  For example, if you use
French, you probably want Emacs to prefer Latin-1 to Latin-2; if you use
Czech, you probably want Latin-2 to be preferred.  This is one of the
reasons to specify a language environment.
   However, you can alter the priority list in detail with the command
`M-x prefer-coding-system'.  This command reads the name of a coding
system from the minibuffer, and adds it to the front of the priority
list, so that it is preferred to all others.  If you use this command
several times, each use adds one element to the front of the priority
list."

I added these lines to my .emacs file:

(prefer-coding-system 'koi8-r)
(prefer-coding-system 'cp866)
(prefer-coding-system 'cp1251)

After I run the command describe-coding-system, I get this:

-------------------------------------
Coding system for saving this buffer:
  - -- undecided-unix
Default coding system (for new files):
  D -- cp1251 (alias: windows-1251 microsoft-1251 microsoft-cp1251 windows-cp1251 win-1251 win-cp1251)
Coding system for keyboard input:
  nil
Coding system for terminal output:
  1 -- iso-latin-1 (alias: iso-8859-1 latin-1)
Defaults for subprocess I/O:
  decoding: D -- cp1251 (alias: windows-1251 microsoft-1251 microsoft-cp1251 windows-cp1251 win-1251 win-cp1251)
  encoding: D -- cp1251 (alias: windows-1251 microsoft-1251 microsoft-cp1251 windows-cp1251 win-1251 win-cp1251)

Priority order for recognizing coding systems when reading files:
  1. cp1251 (alias: windows-1251 microsoft-1251 microsoft-cp1251 windows-cp1251 win-1251 win-cp1251)
  2. iso-latin-1 (alias: iso-8859-1 latin-1)
  3. iso-2022-jp (alias: junet)
  4. iso-2022-7bit 
  5. iso-2022-7bit-lock (alias: iso-2022-int-1)
  6. iso-2022-8bit-ss2 
  7. emacs-mule 
  8. raw-text 
  9. japanese-shift-jis (alias: shift_jis sjis)
  10. chinese-big5 (alias: big5 cn-big5)
  11. no-conversion (alias: binary)
  12. mule-utf-8 (alias: utf-8)

  Other coding systems cannot be distinguished automatically
  from these, and therefore cannot be recognized automatically
  with the present coding system priorities.
-----------------------------------------

Only the last prefer-coding-system command appears on the priority
list for recognizing coding systems. Am I doing something wrong?

Other coding-system-related commands in my .emacs file are:
--------------------------------------------------------
(codepage-setup 866)
(codepage-setup 1251)

(define-coding-system-alias 'windows-1251 'cp1251)
(define-coding-system-alias 'microsoft-1251 'cp1251)
(define-coding-system-alias 'microsoft-cp1251 'cp1251)
(define-coding-system-alias 'windows-cp1251 'cp1251)
(define-coding-system-alias 'win-1251 'cp1251)
(define-coding-system-alias 'win-cp1251 'cp1251)
(define-coding-system-alias 'koi8-u 'cyrillic-koi8)
(define-coding-system-alias 'KOI8-R 'cyrillic-koi8)
(define-coding-system-alias 'koi8 'cyrillic-koi8)

;; selecting language environment
(set-language-environment 'Cyrillic-KOI8)
---------------------------------------------------------

I am using Emacs 21.3 for Windows.

-- 
Sincerely yours,
Alexandru Cardaniuc


* Re: recognizing coding systems
       [not found] <mailman.308.1099443916.8225.help-gnu-emacs@gnu.org>
@ 2004-11-06  8:45 ` Oliver Scholz
From: Oliver Scholz @ 2004-11-06  8:45 UTC (permalink / raw)


Alexandru Cardaniuc <kodea@mail.ru> writes:

[...]
> The emacs manual says:
>
[...]
>    However, you can alter the priority list in detail with the command
> `M-x prefer-coding-system'.  This command reads the name of a coding
> system from the minibuffer, and adds it to the front of the priority
> list, so that it is preferred to all others.  If you use this command
> several times, each use adds one element to the front of the priority
> list."
>
> I added these lines to my .emacs file:
>
> (prefer-coding-system 'koi8-r)
> (prefer-coding-system 'cp866)
> (prefer-coding-system 'cp1251)
>
> after I run the command describe-coding-system I get this:
>
> -------------------------------------
[...]
> Priority order for recognizing coding systems when reading files:
>   1. cp1251 (alias: windows-1251 microsoft-1251 microsoft-cp1251 windows-cp1251 win-1251 win-cp1251)
>   2. iso-latin-1 (alias: iso-8859-1 latin-1)
>   3. iso-2022-jp (alias: junet)
>   4. iso-2022-7bit 
>   5. iso-2022-7bit-lock (alias: iso-2022-int-1)
>   6. iso-2022-8bit-ss2 
>   7. emacs-mule 
>   8. raw-text 
>   9. japanese-shift-jis (alias: shift_jis sjis)
>   10. chinese-big5 (alias: big5 cn-big5)
>   11. no-conversion (alias: binary)
>   12. mule-utf-8 (alias: utf-8)
>
>   Other coding systems cannot be distinguished automatically
>   from these, and therefore cannot be recognized automatically
>   with the present coding system priorities.
> -----------------------------------------
>
> Only the last prefer-coding-system command appears on the priority
> list for recognizing coding systems. Am I doing something wrong?
[...]

Well, you encountered a rather special case.

Short answer: the three encodings you pass to `prefer-coding-system'
are special in that, for these but not for others, a later call such as
`(prefer-coding-system 'cp1251)' /overrides/ an earlier one such as
`(prefer-coding-system 'cp866)'.  Usually Emacs behaves as the manual
says; here you encountered an exception.

Long answer: The priority of coding systems is determined by a
variable `coding-category-list'[1].  Each coding system belongs to a
coding category.  For example, the coding system `iso-latin-1' belongs
to the category named `coding-category-iso-8-1'. You can determine the
category of a coding system by calling the function
`coding-system-category':

(coding-system-category 'iso-latin-1) ==> coding-category-iso-8-1
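
The same call can be mapped over several coding systems at once.  A
minimal sketch, assuming Emacs 21.x and assuming that `codepage-setup'
has already defined cp866 and cp1251 (as in the .emacs quoted in the
question); the category names it reports may differ in other Emacs
versions:

```elisp
;; Sketch for Emacs 21.x.  Assumes (codepage-setup 866) and
;; (codepage-setup 1251) have already run, so that cp866 and cp1251
;; exist as coding systems.
;; Returns an alist of (CODING-SYSTEM . CATEGORY) pairs.
(mapcar (lambda (cs)
          (cons cs (coding-system-category cs)))
        '(koi8-r cp866 cp1251))
```

Evaluate it in the *scratch* buffer with C-j and compare the category
reported for each coding system.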

Look at the value of `coding-category-list' (C-h v).  After
`(prefer-coding-system 'cp1251)' the first element of this list is
`coding-category-ccl'; this is the category which is tried first when
Emacs decodes text.  Each category symbol in turn is bound, as a
variable, to the name of a single coding system.  This binding is also
set by `prefer-coding-system'.

When Emacs decides which coding system to use, it tries each category
in turn; for each category it looks up the coding-system bound to it
and checks whether it may be used to decode the characters in
question[2].  If you do `C-h v coding-category-ccl', you'll see that it
is bound to `cp1251'.
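
Both lookups can also be done from Lisp instead of C-h v.  A minimal
sketch, assuming Emacs 21.x and a session in which the .emacs from the
question has been loaded; it only reads the two variables named above:

```elisp
;; Sketch for Emacs 21.x; evaluate each form in *scratch* with C-j.

;; The head of the list is the category tried first when decoding.
(car coding-category-list)

;; Each category symbol is itself a variable; its value is the one
;; coding system currently standing in for that category.
(symbol-value 'coding-category-ccl)
```

With the setup described in the question, the first form should give
`coding-category-ccl' and the second `cp1251', if the explanation above
is right.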

Now the problem which surprised you: /each/ of the three encodings you
pass to `prefer-coding-system' belongs to the category
`coding-category-ccl'.  Since a category is bound to only one coding
system at a time, each call to `prefer-coding-system' /overrides/ the
former.  Coding systems which belong to different categories do not
interfere with each other in this way.
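
The override can be illustrated with a toy model.  This is only a
sketch of the idea, not the Emacs implementation: `toy-slots' and
`toy-prefer' are hypothetical names made up for this example, and the
category symbols are shortened.  The model assumes nothing beyond "one
slot per category, most recently preferred category first":

```elisp
;; Toy model only: `toy-slots' and `toy-prefer' are hypothetical
;; names, not part of Emacs.
(defvar toy-slots nil
  "Alist of (CATEGORY . CODING-SYSTEM), highest priority first.")

(defun toy-prefer (category coding-system)
  "Move CATEGORY to the front of `toy-slots', bound to CODING-SYSTEM.
Any coding system previously stored for CATEGORY is replaced."
  (setq toy-slots
        (cons (cons category coding-system)
              (assq-delete-all category toy-slots))))

(setq toy-slots nil)
(toy-prefer 'ccl 'koi8-r)
(toy-prefer 'ccl 'cp866)
(toy-prefer 'ccl 'cp1251)
;; Same category three times: only the last call survives.
toy-slots   ; ==> ((ccl . cp1251))

(toy-prefer 'iso-8-1 'iso-latin-1)
;; A different category adds an entry instead of overriding one.
toy-slots   ; ==> ((iso-8-1 . iso-latin-1) (ccl . cp1251))
```

In the model, as in the priority list shown in the question, koi8-r and
cp866 leave no trace once cp1251 has taken over the shared slot.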

---

BTW, even if that were different, I wonder whether you'd see any
effect.  I'd guess (though I don't /know/) that cp1251 contains a
fairly large number of valid characters, so most koi8-r text would
also pass as valid cp1251 and be detected as such.

Or are there any documents encoded in koi8-r containing characters
which are not valid in cp1251?

    Oliver


Footnotes: 
[1] More precisely: for /decoding/ it is determined by a C data
structure which is initialized based on `coding-category-list' by the
function `set-coding-priority-internal'.

[2] This happens in the C function detect_coding_mask, called from
detect_coding, called from Finsert_file_contents. For
coding-category-ccl, for instance, detect_coding_mask calls
detect_coding_ccl.



-- 
16 Brumaire an 213 de la Révolution
Liberté, Egalité, Fraternité!

