From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: Automatic recognition of some specific coding systems
Date: Thu, 26 Feb 2015 18:36:04 +0200
Message-ID: <83ioeo6363.fsf@gnu.org>
To: help-gnu-emacs@gnu.org

> From: Jürgen Hartmann
> Date: Thu, 26 Feb 2015 00:23:50 +0100
>
> > Try this:
> >
> >   (set-coding-system-priority 'utf-8 'cp850)
>
> After doing this, the coding systems
>
>    utf-8
>    cp850
>
> get correctly recognized, but
>
>    latin-9-unix
>
> gets wrongly recognized as cp850-unix encoded.
>
> If I modify the lisp expression to
>
>    (set-coding-system-priority 'utf-8 'latin-9)
>
> it is utf-8 and latin-9 that are properly recognized while the test
> file
>
>    cp850-dos
>
> gets detected as iso-latin-9-dos encoded.

I feared that might be the result.

> If I pass all three coding systems to set-coding-system-priority,
>
>    (set-coding-system-priority 'utf-8 'latin-9 'cp850)   or
>    (set-coding-system-priority 'utf-8 'cp850 'latin-9)
>
> it turns out that the function set-coding-system-priority ignores the third
> coding system in these cases, because it belongs to the same coding
> category as the coding system named in the second place.
> The source code src/coding.c comments this in the lines 9972 and 9973
> like this:
>
>     /* Ignore this coding system because a coding system of the
>        same category already had a higher priority.  */

Yes, I know.  That's why I only mentioned 2 of them.

It looks like what you want is beyond the current capabilities of
Emacs's auto-detection of encoding.  See below for some alternatives.

Having said that...

> By the way, could you verify, that this is possible with Emacs 22.3
> with the customization described in my previous post?

...no, it doesn't work for me.  The latin-9 file is decoded using my
locale's encoding (which isn't latin-9), and the cp850 file is still
decoded as raw-text.  So I think some other factor(s) is/are at work
on your system.  Your locale's encoding is certainly one of them, but
I think there should be something else, either in your customizations
or somewhere else.

In general, even if Emacs 22.3 was capable of doing the job, I think
it was by sheer luck, and is anyway fragile, since the same
customizations don't work for me (and AFAIU, aren't supposed to work).
So I would suggest exploring alternative ways of doing this reliably
in Emacs 24.

Some possibilities you may wish to explore:

 . Put a 'coding: cp850' cookie in the cp850 files

 . If the names of the cp850 files all match some common pattern, you
   can use modify-coding-system-alist to tell Emacs to decode them by
   cp850

 . Similarly, if the cp850 files' contents match some common regexp,
   you can customize auto-coding-regexp-alist to force their decoding
   by cp850

Of course, you can always turn the tables and do the above for
latin-9, while keeping cp850 in the set-coding-system-priority call.
It all depends on which of these 2 encodings lends itself better to
one of these methods.

I believe that if one of these alternatives can do the job for you,
the result will be much more reliable.
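
For illustration, the three alternatives listed above could look
roughly like this in an init file.  This is only a sketch under
assumptions: the file-name pattern "\\.oem\\'" and the content regexp
"\\`REM DOS-EXPORT" are hypothetical placeholders, not something taken
from your files, so substitute whatever actually distinguishes your
cp850 files.

```elisp
;; 1. Coding cookie: put a line like the following at the top of each
;;    cp850 file (inside a comment, if the file format has comments):
;;
;;      -*- coding: cp850 -*-

;; 2. By file name: decode files whose names match a regexp as cp850.
;;    "\\.oem\\'" is a hypothetical pattern, used here as an example.
(modify-coding-system-alist 'file "\\.oem\\'" 'cp850)

;; 3. By contents: decode files whose first bytes match a regexp as
;;    cp850.  "\\`REM DOS-EXPORT" is a hypothetical marker string.
(add-to-list 'auto-coding-regexp-alist
             '("\\`REM DOS-EXPORT" . cp850))

;; Leave the other 8-bit encoding to auto-detection, as before.
(set-coding-system-priority 'utf-8 'latin-9)
```

Only one of methods 1 to 3 should be needed.  To do it the other way
around, swap cp850 and latin-9 throughout.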