From: Kevin Rodgers
Newsgroups: gmane.emacs.devel
Subject: Re: [angeli@iwi.uni-sb.de: Coding problem with Euro sign]
Date: Thu, 15 Dec 2005 15:02:48 -0700

Ralf Angeli wrote:
> * Kevin Rodgers (2005-12-15) writes:
>
>>Ralf Angeli wrote:
>>
>>>* Kevin Rodgers (2005-12-14) writes:
>>>>And the OP should try visiting the file with the cp1252 coding system.
>>>
>>>Well, the question now is if it is possible for Emacs to figure out
>>>the coding system on itself with the example at hand.
>>
>>You could try something like this:
>>
>>(setq auto-coding-regexp-alist
>>      (cons '("[\040-\177][\200-\237]" . cp1252)
>>            auto-coding-regexp-alist))
>>
>>I don't think that's a general purpose solution since (1)
>>auto-coding-regexp-alist actually has precedence over `-*-coding:-*-'
>>file variables and (2) other encodings probably use those o200 - o237
>>bytes (certainly other Microsoft Windows code pages do).
>
> This doesn't seem to work here.  I still see the byte codes of the
> 8-bit characters when opening the file after evaluating the above
> form.

OK, now I've actually tried that here in Emacs 21.4 running on
Unix/Solaris under X.
First it complained that cp1252 is an invalid coding system, so I
found the "MS-DOS and MULE" Info node referenced from the "Coding
Systems" node and tried `M-x codepage-setup'.  It wouldn't take 1252,
but a quick search in that node pointed to 850 instead.  So I tweaked
the auto-coding-regexp-alist entry to use cp850 and revisited the
file.

Now instead of displaying the u umlaut and A circumflex characters as
such in my default font's character set (iso8859-1) and the euro as
"\200", Emacs displays the u umlaut as superscript 3, A circumflex as
"\302", and the euro as C cedilla.  I assume those display problems
are because I haven't configured an Emacs fontset for the cp850 coding
system.  But the auto-coding-regexp-alist entry worked as intended,
and you're on Windows so your fontset should be properly configured
for that.

One other detail: that entry only sets the coding system if a byte in
the \200-\237 range (such as the euro) is immediately preceded by an
ASCII character.  Is that the case in your file?  What does
`C-h C RET' say after visiting the file?  I assume you're running with
multibyte characters enabled.

> And a customization is actually not what I am interested in; I'd like
> Emacs to figure this out by itself, out of the box.

How is Emacs supposed to infer the coding system from the contents of
that file?  If you can come up with a suitable customization, perhaps
it will be incorporated into Emacs as the default behavior.

> I am not sure how common something like the case at hand is but it is
> certainly not academic.  And if one is working with different
> operating systems or interchanging files with people working on
> different operating systems the failure to detect the correct coding
> could lead to people regarding Emacs as a truly inferior piece of
> software.  I can already hear them: "What?  It displays the Euro sign
> as \200?  Even Notepad gets this right!"  On these grounds it may
> become a bit hard to convince people that Emacs is the one true
> editor.
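As a side check on the display results described above, here is a
minimal Python sketch that decodes the three bytes in question under
each candidate coding system.  The sample bytes are an assumption (the
file itself is not shown in this thread): they are the values that
Windows-1252 assigns to the euro, u umlaut, and A circumflex.  The
name `decode_table' is made up for illustration.

```python
# Assumed sample bytes: the Windows-1252 encodings of the three
# characters discussed in this thread (the real file is not shown).
SAMPLE = {
    "euro": b"\x80",
    "u umlaut": b"\xfc",
    "A circumflex": b"\xc2",
}

# Python codec names for the coding systems mentioned in the thread.
CODECS = ("cp1252", "cp850", "latin-1")


def decode_table(sample=SAMPLE, codecs=CODECS):
    """Return {label: {codec: decoded character}} for each sample byte."""
    return {
        label: {codec: raw.decode(codec) for codec in codecs}
        for label, raw in sample.items()
    }


# Print the code point each codec assigns to each byte.
for label, row in decode_table().items():
    cells = ", ".join(f"{codec}: U+{ord(ch):04X}" for codec, ch in row.items())
    print(f"{label:13} {cells}")
```

Under cp850 the byte 0x80 decodes to C cedilla (U+00C7) and 0xFC to
superscript three (U+00B3), which matches what Emacs displayed, so
that result may reflect the cp850 mapping itself rather than a missing
fontset.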
Can Notepad display files in anything besides CP850/Windows-1252 and
probably UTF-8 with a BOM?  E.g. can it distinguish ISO 8859-1 from
ISO 8859-2 from ISO 8859-15?

> Anyway, I tested a bit and under Windows (surprise) every application
> I tried (e.g. Notepad and OpenOffice) managed to display the file
> correctly.  On GNU/Linux no application got it right.  I checked with
> less, more, vim, nano, pico, and OpenOffice.  Either "garbage" was
> displayed or (in case of OpenOffice) a dialog asking the user to
> specify the encoding.  So it's not like Emacs isn't in good company.
> Nevertheless it would be nice if Emacs got it right.  Unfortunately I
> lack the knowledge for judging if this is possible at all without
> having to use all sorts of unreliable heuristics which are costly to
> implement.

Yes, Windows applications simply assume you're using a proprietary
Microsoft character set, while GNU/Linux apps prioritize support for
standard character encodings.  Maybe all you need is

(prefer-coding-system 'cp850)

-- 
Kevin Rodgers
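To make the heuristic discussed in this message concrete, here is a
rough Python sketch of the same idea as the auto-coding-regexp-alist
entry: treat an ASCII byte followed by a byte in the 0x80-0x9F range
as a hint for cp1252, and otherwise fall back to a preferred coding
system.  This is not how Emacs implements detection; the function
name `guess_coding', its `preferred' parameter, and the order of the
checks are all assumptions made for illustration.

```python
import re

# Byte pattern mirroring the quoted regexp "[\040-\177][\200-\237]":
# one ASCII byte followed by one byte in the 0x80-0x9F range.
C1_AFTER_ASCII = re.compile(rb"[\x20-\x7f][\x80-\x9f]")


def guess_coding(data: bytes, preferred: str = "iso8859_1") -> str:
    """Hypothetical helper: guess a Python codec name for raw bytes."""
    # Pure ASCII needs no guessing.
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    # Valid UTF-8 is unlikely to occur by accident in 8-bit text.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # ISO 8859-x reserves 0x80-0x9F for control codes, so printable
    # text using those bytes hints at a Windows code page.  Other
    # Windows code pages use them too, so this remains a guess.
    if C1_AFTER_ASCII.search(data):
        return "cp1252"
    # Nothing in the bytes separates ISO 8859-1 from -2 or -15, so
    # fall back to the caller's preference.
    return preferred


print(guess_coding(b"Preis: 5 \x80"))  # ASCII space followed by 0x80
print(guess_coding(b"f\xfcr"))         # high byte outside 0x80-0x9F
```

The fallback branch is the case raised above: the byte 0xA4, for
example, is valid in both ISO 8859-1 (currency sign) and ISO 8859-15
(euro), so the content alone cannot say which one was meant.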