From mboxrd@z Thu Jan 1 00:00:00 1970
From: Deniz Dogan
Newsgroups: gmane.emacs.help
Subject: Re: Encoding/decoding problems
Date: Thu, 28 Jul 2011 11:26:53 +0200
Message-ID: <4E312B5D.6090600@dogan.se>
References: <4E311B3C.7040301@dogan.se>
To: help-gnu-emacs@gnu.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 2011-07-28 11:01, Eli Zaretskii wrote:
>> Date: Thu, 28 Jul 2011 10:18:04 +0200
>> From: Deniz Dogan
>>
>> I'm fetching an XML document that uses iso-8859-1 coding with
>> `url-retrieve' and then I parse it using `xml-parse-region'.
>>
>> After that, I get the parts of the document that I want and insert them
>> into a buffer. However, the Swedish characters å, ä and ö are displayed
>> as \345, \344 and \326 respectively.
>>
>> I've tried messing around with `encode-coding-region' and
>> `decode-coding-region' but I'm really not sure what to do here.
>
> I suggest to start with describing a reproducible recipe for this
> problem. Not sure if this forum is appropriate, perhaps emacs-devel
> is a better place (as it sounds like you are describing a bug).
>

Here is the code to reproduce it:

    (require 'url)  ; provides `url-retrieve-synchronously'
    (require 'xml)  ; provides `xml-parse-region'

    (defun fetch-and-show ()
      (interactive)
      (let* ((old-buffer (current-buffer))
             (url "http://dogan.se/sites/default/files/example.xml")
             (buffer (url-retrieve-synchronously url)))
        (with-current-buffer buffer
          (let ((doc (car (xml-parse-region (point-min) (point-max)))))
            (with-current-buffer old-buffer
              (insert (nth 2 (nth 2 (nth 3 doc)))))))))

The XML file is encoded in iso-8859-1 and contains a number of Swedish
characters. The buffer I'm testing this in is *scratch*, which uses
utf-8-unix. The function should insert "hallå" but inserts "hall\345"
instead.

I have no idea whether I should use `encode-coding-region' or
`decode-coding-region', or something else entirely. I doubt it's a bug,
to be honest; it's probably my lack of understanding that's causing
this.

Deniz