From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kevin Rodgers
Newsgroups: gmane.emacs.devel
Subject: Re: [angeli@iwi.uni-sb.de: Coding problem with Euro sign]
Date: Wed, 14 Dec 2005 11:56:43 -0700
Original-To: emacs-devel@gnu.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
User-Agent: Mozilla Thunderbird 0.9 (X11/20041105)
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:47722

Richard M. Stallman wrote:
> Would someone please DTRT and ack?
>
> ------- Start of forwarded message -------
> From: Ralf Angeli
> To: emacs-pretest-bug@gnu.org
> Date: Tue, 13 Dec 2005 13:12:02 +0100
...
> Subject: Coding problem with Euro sign
> Sender: emacs-pretest-bug-bounces+rms=gnu.org@gnu.org
...
>
> - --=-=-=
>
> Attached you can find a file with two 8-bit characters I extracted
> from a file produced by Visual Studio under Windows. The characters
> should be u umlaut and the Euro sign. Emacs does not seem to be able
> to find the right coding system for it and displays it with
> raw-text-dos. I could not get the file displayed correctly by loading
> it with iso-latin-1, iso-latin-9, or cp1251. And I am not sure if
> this is a problem of Emacs or if Visual Studio simply produced
> garbage.
>
>
> - --=-=-=
> Content-Type: text/plain; charset=utf-8
> Content-Disposition: attachment; filename=test.txt
> Content-Transfer-Encoding: quoted-printable
>
> =FC u umlaut
> =C2=80 euro
>
> - --=-=-=

I think the OP is confused: u umlaut is 0xFC in ISO 8859-1 (Latin 1),
ISO 8859-15 (Latin 9), and Unicode (U+00FC). The euro is 0xA4 in
ISO 8859-15 but U+20AC in Unicode (and not defined in ISO 8859-1). But
in UTF-8, which the quoted-printable attachment claims to be, they are
0xC3 0xBC and 0xE2 0x82 0xAC respectively.

The attachment above uses a single-byte encoding for u umlaut. But the
encoding used for the euro is either an unknown 2-byte encoding or the
wrong single-byte encoding (0xC2 is A circumflex in ISO 8859-15)
followed by 0x80 (undefined in ISO 8859-*). That could explain why
Emacs does not recognize it as iso-latin-1 or iso-latin-9.

As far as Microsoft Windows code pages go, 1251 is Cyrillic, so the OP
must have meant 1252. In that character set, the euro is indeed 0x80
(and 0xC2 is still A circumflex). So the attachment should have been
labelled windows-1252 instead of utf-8, and its contents would be more
accurately written as:

    =FC u umlaut
    =C2 A circumflex
    =80 euro

And the OP should try visiting the file with the cp1252 coding system.

-- 
Kevin Rodgers
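
The byte values discussed above can be checked with a short Python
sketch. It assumes Python 3.8 or later and its standard codecs (whose
names differ from the Emacs coding-system names); the helper
`encoded_hex` and the variable names are illustrative only and do not
come from the thread.

```python
# Sketch: compare how u umlaut and the euro sign are encoded in the
# character sets mentioned in the message. Names here are hypothetical.

U_UMLAUT = "\u00fc"  # u umlaut, U+00FC
EURO = "\u20ac"      # euro sign, U+20AC


def encoded_hex(char, encoding):
    """Return char's bytes in `encoding` as uppercase hex, or None if
    the character cannot be represented in that encoding."""
    try:
        return char.encode(encoding).hex(" ").upper()
    except UnicodeEncodeError:
        return None


ENCODINGS = ["iso-8859-1", "iso-8859-15", "cp1252", "utf-8"]

# Map each encoding to (u umlaut bytes, euro bytes).
table = {
    enc: (encoded_hex(U_UMLAUT, enc), encoded_hex(EURO, enc))
    for enc in ENCODINGS
}

for enc, (u_bytes, euro_bytes) in table.items():
    print(f"{enc:12} u umlaut: {u_bytes}   euro: {euro_bytes}")

# The three bytes of the attachment as quoted: =FC and =C2=80.
attachment = bytes([0xFC, 0xC2, 0x80])

# Read as windows-1252, they give u umlaut, A circumflex, euro.
decoded_1252 = attachment.decode("cp1252")

# Read as UTF-8, decoding fails, because 0xFC cannot start a sequence.
try:
    attachment.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
```

Here `table` holds the per-encoding byte values, `decoded_1252` the
cp1252 reading of the attachment, and `valid_utf8` whether the same
bytes form valid UTF-8.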