From: Kevin Rodgers <ihs_4664@yahoo.com>
Subject: Re: [angeli@iwi.uni-sb.de: Coding problem with Euro sign]
Date: Thu, 15 Dec 2005 15:02:48 -0700
Message-ID: <dnsp6c$mg2$1@sea.gmane.org>
In-Reply-To: <dns53n$dk4$1@sea.gmane.org>

Ralf Angeli wrote:
 > * Kevin Rodgers (2005-12-15) writes:
 >
 >>Ralf Angeli wrote:
 >>
 >>>* Kevin Rodgers (2005-12-14) writes:
 >>>>And the OP should try visiting the file with the cp1252 coding system.
 >>>
 >>>Well, the question now is if it is possible for Emacs to figure out
 >>>the coding system by itself with the example at hand.
 >>
 >>You could try something like this:
 >>
 >>(setq auto-coding-regexp-alist
 >>       (cons '("[\040-\177][\200-\237]" . cp1252)
 >>             auto-coding-regexp-alist))
 >>
 >>I don't think that's a general purpose solution since (1)
 >>auto-coding-regexp-alist actually has precedence over `-*-coding:-*-'
 >>file variables and (2) other encodings probably use those o200 - o237
 >>bytes (certainly other Microsoft Windows code pages do).
 >
 > This doesn't seem to work here.  I still see the byte codes of the
 > 8-bit characters when opening the file after evaluating the above
 > form.

OK, now I've actually tried that here in Emacs 21.4 running on
Unix/Solaris under X.  First it complained that cp1252 is an invalid
coding system, so I found the "MS-DOS and MULE" Info node referenced
from the "Coding Systems" node and tried `M-x codepage-setup'.  It
wouldn't take 1252, but a quick search in that node revealed that the
right number is 850.

So I tweaked the auto-coding-regexp-alist entry to use cp850 and
revisited the file.  Now instead of displaying the u umlaut and A
circumflex characters as such in my default font's character set
(iso8859-1) and the euro as "\200", Emacs displays the u umlaut as
superscript 3, A circumflex as "\302", and the euro as C cedilla.

I assume those display problems are because I haven't configured an
Emacs fontset for the cp850 coding system.  But the
auto-coding-regexp-alist entry worked as intended, and you're on
Windows so your fontset should be properly configured for that.
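
One way to tell a font problem from a decoding difference is to decode
the raw bytes by hand under both coding systems.  This is only a sketch:
it assumes an Emacs in which both `cp850' and `windows-1252' are defined
(hence the `coding-system-p' test), and the two bytes are my own test
data standing in for the euro (octal 200) and the u umlaut (octal 374).

```elisp
;; Hypothetical test data: the bytes 0x80 and 0xFC as a unibyte string.
(let ((bytes "\200\374"))
  (dolist (cs '(cp850 windows-1252))
    (if (coding-system-p cs)
        ;; Show what the same two bytes become under each coding system.
        (message "%s: %s" cs (decode-coding-string bytes cs))
      (message "%s is not defined in this Emacs" cs))))
```

According to the published code page tables, 0x80 is C cedilla and 0xFC
is superscript 3 in code page 850, while in Windows-1252 they are the
euro sign and u umlaut.  If the output agrees with that, the odd display
comes from the choice of coding system rather than from the fontset.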

One other detail: that entry only sets the coding system if the euro
is immediately preceded by an ASCII character.  Is that the case in
your file?  What does `C-h C RET' say after visiting the file?
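
If you'd rather check from Lisp, something like the following should do.
It is a sketch under assumptions: `revert-buffer-with-coding-system' and
the `windows-1252' coding system have to exist in your Emacs (neither is
guaranteed in older releases), so both are tested before use.

```elisp
;; Evaluate with M-: in the buffer visiting the file.
;; Report the coding system Emacs picked when it read the file.
(message "Visited with: %S" buffer-file-coding-system)

;; Re-read the file from disk, forcing Windows-1252 this time.
(when (and (fboundp 'revert-buffer-with-coding-system)
           (coding-system-p 'windows-1252))
  (revert-buffer-with-coding-system 'windows-1252))
```

Expect Emacs to ask for confirmation before it reverts the buffer.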

I assume you're running with multibyte characters enabled.

 > And a customization is actually not what I am interested in; I'd like
 > Emacs to figure this out by itself, out of the box.

How is Emacs supposed to infer the coding system from the contents of
that file?  If you can come up with a suitable customization, perhaps
it will be incorporated into Emacs as the default behavior.

 > I am not sure how common something like the case at hand is but it is
 > certainly not academic.  And if one is working with different
 > operating systems or interchanging files with people working on
 > different operating systems the failure to detect the correct coding
 > could lead to people regarding Emacs as a truly inferior piece of
 > software.  I can already hear them: "What?  It displays the Euro sign
 > as \200?  Even Notepad gets this right!"  On these grounds it may
 > become a bit hard to convince people that Emacs is the one true
 > editor.

Can Notepad display files in anything besides CP850/Windows-1252 and
probably UTF-8 w/BOM?  E.g. can it distinguish ISO 8859-1 from ISO
8859-2 from ISO 8859-15?

 > Anyway, I tested a bit and under Windows (surprise) every application
 > I tried (e.g. Notepad and OpenOffice) managed to display the file
 > correctly.  On GNU/Linux no application got it right.  I checked with
 > less, more, vim, nano, pico, and OpenOffice.  Either "garbage" was
 > displayed or (in case of OpenOffice) a dialog asking the user to
 > specify the encoding.  So it's not like Emacs isn't in good company.
 > Nevertheless it would be nice if Emacs got it right.  Unfortunately I
 > lack the knowledge for judging if this is possible at all without
 > having to use all sorts of unreliable heuristics which are costly to
 > implement.

Yes, Windows applications simply assume you're using a proprietary
Microsoft character set, whereas GNU/Linux applications prioritize
support for standard character encodings.  Maybe all you need is
(prefer-coding-system 'cp850)
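
A guarded version for an init file might look like this.  Note that I've
swapped in `windows-1252' for illustration, since that is the encoding
the original report was about; whether that name is defined depends on
the Emacs version, hence the test.

```elisp
;; Give Windows-1252 top priority in coding-system detection,
;; but only if this Emacs actually defines that coding system.
(if (coding-system-p 'windows-1252)
    (prefer-coding-system 'windows-1252)
  (message "windows-1252 is not available here"))
```

Keep in mind that this changes the preference globally, so files in
other 8-bit encodings such as ISO 8859-1 may then be decoded as
Windows-1252 too.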

-- 
Kevin Rodgers
