From: Lennart Borgman
Newsgroups: gmane.emacs.devel
Subject: Re: Problem with national characters in XHTML
Date: Wed, 28 Sep 2005 16:05:03 +0200
Message-ID: <433AA30F.8050203@student.lu.se>
References: <14e4cba14e7621.14e762114e4cba@net.lu.se>
Original-To: Kenichi Handa
Cc: emacs-devel@gnu.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------020309030801090309030402"

This is a multi-part message in MIME format.

--------------020309030801090309030402
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Kenichi Handa wrote:

>In article <14e4cba14e7621.14e762114e4cba@net.lu.se>, LENNART BORGMAN writes:
>
>>I have run into a problem with swedish national characters in an XHTML document. The header of the document is like this:
>>
>> > "http://www.w3.org/TR/REC-html40/loose.dtd">
>>
>>The swedish character ä looks like \344 in CVS Emacs (2005-09-23). It looks ok in Internet Explorer, but not in Firefox. Looking at the file with Notepad also shows the swedish characters as expected.
>>
>>I would be glad for some hints and pointers! I am using nxml-mode if that matters here.
>
>Could you please send me the whole file?

I have attached two test files in XHTML, one using utf-8 in the header and the other iso-8859-1. Those files tell what is displayed in IE and Firefox and how the Swedish character ä was entered (though I guess some info might be missing for the experts here).

I find this a bit confusing still. What character is entered by Emacs when I type ä on my Swedish keyboard? When I look at the character ä in Emacs with (following-char), it returns 2276 in both test files. Is that what I would expect in the iso-8859-1 test file?
(It starts with )

--------------020309030801090309030402
Content-Type: text/html; name="test-xhtml-iso-8859-1.html"
Content-Disposition: inline; filename="test-xhtml-iso-8859-1.html"

Testing National Characters in Emacs, IE and Firefox

Testing National Characters in Emacs, IE 6.0 SP 1 and Firefox 1.0.7

Using GNU Emacs 22.0.50.1 (i386-mingw-nt5.0.2195) of 2005-09-28

The header in this file contains <?xml version="1.0" encoding="iso-8859-1"?>

Character and context | Internet Explorer | Firefox
This is the Swedish character ä entered in a new iso-8859-1 file. | Correct | Correct
This is Swedish ä entered in a new utf-8 file.

Compare this with using UTF-8

--------------020309030801090309030402
Content-Type: text/html; name="test-xhtml-utf-8.html"
Content-Disposition: inline; filename="test-xhtml-utf-8.html"

Testing National Characters in Emacs, IE and Firefox

Testing National Characters in Emacs, IE and Firefox

Using GNU Emacs 22.0.50.1 (i386-mingw-nt5.0.2195) of 2005-09-28

The header in this file contains <?xml version="1.0" encoding="utf-8"?>

Character and context | Internet Explorer | Firefox
This is Swedish ä entered in a new utf-8 file. | Wrong | Correct
This is Swedish ä entered after opening the file again. | Wrong | Correct
This is the Swedish character ä entered in a new iso-8859-1 file.

Compare this with using ISO-8859-1

Testing Emacs display

If <?xml version="1.0" encoding="utf-8"?> is changed to use iso-8859-1, Emacs still displays the entered characters as if they were correct.
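To illustrate why a mismatch between the declared encoding and the bytes actually stored in the file matters to a reader such as a browser, here is a minimal Python sketch. It is not part of the original test files, and all variable names are hypothetical. It only shows what happens when the bytes for ä are decoded under the other encoding; it does not model how Emacs, IE or Firefox behave internally.

```python
# Illustrative sketch only; names are hypothetical, not from the test files.
text = "\u00e4"  # the Swedish character ä

utf8_bytes = text.encode("utf-8")          # two bytes: 0xC3 0xA4
latin1_bytes = text.encode("iso-8859-1")   # one byte: 0xE4

# UTF-8 bytes read as ISO-8859-1 give two characters instead of one.
misread = utf8_bytes.decode("iso-8859-1")

# The single ISO-8859-1 byte is not a valid UTF-8 sequence on its own.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(repr(misread), latin1_is_valid_utf8)
```

So a file saved as UTF-8 but declared as ISO-8859-1 can still be decoded (with the wrong result), while a file saved as ISO-8859-1 but declared as UTF-8 contains a byte sequence that a strict UTF-8 decoder rejects.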

Conclusion

Emacs and Firefox seem to handle this correctly. However, due to bugs in Internet Explorer, only ISO-8859-1 currently works in both browsers.
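The numbers mentioned in this thread (\344 and 2276) can be related to the byte values of ä with a few lines of Python. This is a sketch added for illustration, with hypothetical names. The reading of 2276 as "the ISO-8859-1 code plus a constant offset" is an assumption about how this Emacs version numbers characters internally; the code only checks the arithmetic.

```python
# Illustrative sketch; all names here are hypothetical.
A_UMLAUT = "\u00e4"  # the Swedish character ä

latin1_byte = A_UMLAUT.encode("iso-8859-1")   # single byte 0xE4 (decimal 228)
utf8_bytes = A_UMLAUT.encode("utf-8")         # two bytes 0xC3 0xA4

# The \344 seen in Emacs matches the octal value of the ISO-8859-1 byte.
octal_escape = "\\%03o" % latin1_byte[0]

# Assumption: 2276 is the ISO-8859-1 code plus a constant offset.
offset = 2276 - latin1_byte[0]

print(octal_escape, utf8_bytes.hex(), hex(offset))
# prints: \344 c3a4 0x800
```

If that assumption holds, the same value 2276 in both test files would mean that Emacs holds the same character in the buffer either way, and that the two files differ only in the bytes written when saving.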

--------------020309030801090309030402--