From: Eli Zaretskii <eliz@gnu.org>
To: help-gnu-emacs@gnu.org
Subject: Re: Automatic recognition of some specific coding systems
Date: Thu, 26 Feb 2015 18:36:04 +0200
Message-ID: <83ioeo6363.fsf@gnu.org>
In-Reply-To: <DUB124-W1891E783BC3ED13641049EA8170@phx.gbl>

> From: Jürgen Hartmann <juergen_hartmann_@hotmail.com>
> Date: Thu, 26 Feb 2015 00:23:50 +0100
> 
> > Try this:
> > 
> >   (set-coding-system-priority 'utf-8 'cp850)
>  
> After doing this, the coding systems
> 
>    utf-8
>    cp850
> 
> get correctly recognized, but
> 
>    latin-9-unix
> 
> gets wrongly recognized as cp850-unix encoded.
> 
> If I modify the lisp expression to
> 
>    (set-coding-system-priority 'utf-8 'latin-9)
> 
> it is utf-8 and latin-9 that are properly recognized while the test
> file
> 
>    cp850-dos
> 
> gets detected as iso-latin-9-dos encoded.

I feared that might be the result.

> If I pass all three coding systems to set-coding-system-priority,
> 
>    (set-coding-system-priority 'utf-8 'latin-9 'cp850)   or
>    (set-coding-system-priority 'utf-8 'cp850 'latin-9)
> 
> it turns out that the function set-coding-system-priority ignores the third
> coding system in these cases, because it belongs to the same coding
> category as the coding system named in the second place. The source
> code src/coding.c comments this in the lines 9972 and 9973 like this:
> 
>     /* Ignore this coding system because a coding system of the
>        same category already had a higher priority.  */

Yes, I know.  That's why I only mentioned 2 of them.
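
A minimal sketch of how to see this from Lisp, for anyone following
along.  It assumes coding-system-category is available in your Emacs;
no return values are shown because they were not checked here.  If
the explanation above applies, the first form should yield the same
category symbol twice.

```elisp
;; Both coding systems should report the same category symbol,
;; which is why only the first of them affects the priority order.
(list (coding-system-category 'cp850)
      (coding-system-category 'latin-9))

;; Set the priority with one coding system per category...
(set-coding-system-priority 'utf-8 'cp850)

;; ...and inspect the resulting order of preference.
(coding-system-priority-list)
```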

It looks like what you want is beyond the current capabilities of
Emacs's auto-detection of encoding.  See below for some alternatives.

Having said that...

> By the way, could you verify, that this is possible with Emacs 22.3
> with the customization described in my previous post?

...no, it doesn't work for me.  The latin-9 file is decoded using my
locale's encoding (which isn't latin-9), and the cp850 file is still
decoded as raw-text.

So I think some other factors are at work on your system.  Your
locale's encoding is certainly one of them, but I think there must
be something else, either in your customizations or elsewhere.

In general, even if Emacs 22.3 was capable of doing the job, I think it
was by sheer luck, and it is fragile anyway, since the same
customizations don't work for me (and AFAIU, aren't supposed to work).
So I would suggest exploring alternative ways of doing this reliably
in Emacs 24.  Some possibilities you may wish to explore:

  . Put a 'coding: cp850' cookie in the cp850 files

  . If the names of the cp850 files all match some common pattern, you
    can use modify-coding-system-alist to tell Emacs to decode them as
    cp850

  . Similarly, if the cp850 files' contents match some common regexp,
    you can customize auto-coding-regexp-alist to force their decoding
    as cp850
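
The three alternatives above could look roughly like this in an init
file.  This is only a sketch: the file-name pattern and the content
regexp are hypothetical placeholders, and you would replace them with
whatever actually distinguishes your cp850 files.

```elisp
;; 1. Coding cookie: no Lisp needed.  Put a line like the following
;;    at the top of each cp850 file (inside a comment, if the file
;;    format has comments):
;;
;;      -*- coding: cp850 -*-

;; 2. Decide by file name.  The pattern "\\.dos\\.txt\\'" is a
;;    hypothetical example; use one that matches your cp850 files.
(modify-coding-system-alist 'file "\\.dos\\.txt\\'" 'cp850)

;; 3. Decide by file contents.  Each element has the form
;;    (REGEXP . CODING-SYSTEM), and REGEXP is matched against the
;;    beginning of the file.  "\\`@echo off" is a hypothetical
;;    marker; use text that your cp850 files really start with.
(add-to-list 'auto-coding-regexp-alist '("\\`@echo off" . cp850))
```

Only one of the three should be needed.  Which one fits depends on
whether the files can be edited (cookie), share a naming scheme
(file-name pattern), or share recognizable leading text (content
regexp).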

Of course, you can always turn the tables and do the above for
latin-9, while keeping cp850 in the set-coding-system-priority call.
It all depends on which of the two lends itself better to one of
these methods.

I believe that if one of these alternatives can do the job for you,
the result will be much more reliable.



