From: Peter Dyballa
Newsgroups: gmane.emacs.help
Subject: Re: recoding a buffer coding system
Date: Sat, 15 Aug 2009 17:15:01 +0200
Message-ID: <6F6305D2-A661-4DA3-A5BB-02EA10068BB5@Web.DE>
References: <87zla29j5p.fsf@uchicago.edu> <833a7tk2hb.fsf@gnu.org> <87hbw9m9mb.fsf@uchicago.edu>
In-Reply-To: <87hbw9m9mb.fsf@uchicago.edu>
To: Santiago Mejia
Cc: help-gnu-emacs@gnu.org
List-Id: Users list for the GNU Emacs text editor

On 15.08.2009 at 16:31, Santiago Mejia wrote:

> In the buffer *http www:wordreference.com:80* I see the character that
> firefox displays as "ü" (u with umlaut) as \303\274.

LATIN SMALL LETTER U WITH DIAERESIS is U+00FC. In UTF-8 it is saved as
C3 BC (hex), or \303 \274 in octal. So you get a correct byte
representation.

> When I try to copy
> and paste it here in this e-mail, however, it appears as: "Ã¼"

Because LATIN CAPITAL LETTER A WITH TILDE is U+00C3 and VULGAR
FRACTION ONE QUARTER is U+00BC, and these two bytes are presented as
if they belonged to some ISO Latin encoding.

> As I said, however, if I merely save and reopen the file, the
> characters get shown properly.

Yes, GNU Emacs now interprets the two bytes as one Unicode character.
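
The byte arithmetic above can be checked with a minimal sketch. It is written in Python rather than Emacs Lisp and is not part of the original exchange; the variable names are purely illustrative. It only shows how the same two bytes read under UTF-8 and under ISO-8859-1, and says nothing about what Emacs does internally.

```python
# Illustration only: the two UTF-8 bytes of U+00FC under different readings.
char = "\u00fc"  # LATIN SMALL LETTER U WITH DIAERESIS

# Encode the single character as UTF-8; this yields two bytes.
utf8_bytes = char.encode("utf-8")

# The same bytes written in hex and as octal escapes.
hex_form = " ".join(f"{b:02X}" for b in utf8_bytes)
octal_form = "".join(f"\\{b:03o}" for b in utf8_bytes)

# Reading each byte as its own ISO-8859-1 character gives two characters.
misread = utf8_bytes.decode("iso-8859-1")
misread_code_points = [f"U+{ord(c):04X}" for c in misread]

# Decoding the bytes as UTF-8 restores the single original character.
restored = utf8_bytes.decode("utf-8")

print(hex_form)             # C3 BC
print(octal_form)           # \303\274
print(misread_code_points)  # ['U+00C3', 'U+00BC']
print(restored == char)     # True
```

No information is lost at any step: the bytes are identical throughout, and only the interpretation applied to them changes.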
>
> In case this is useful, in the buffer *http www:wordreference.com:80*
> the variable 'buffer-file-coding-system' is mule-utf-8
>

In the end? When you re-open a second time?

The problem probably is that url-retrieve-synchronously fetches a
byte stream which is fed into a 7-bit (?) encoding buffer, so Unicode
encoded characters end up as two (or more) bytes which are displayed in
octal because their character codes are inappropriate for this encoding.

Working in GNU Emacs 23.1.50 and 22.3, I see no octal codes; I only
see the bytes from the UTF-8 encoded umlauts etc., according to the HTML
property "charset=utf-8". The buffer actually has no encoding at all,
and so you're lucky that its contents are saved as UTF-8! Therefore
no information is lost, and obviously GNU Emacs uses the proper
encoding when it opens the *file* now.

Maybe using

	(modify-coding-system-alist 'process "" 'utf-8)

makes GNU Emacs handle the buffer, associated with no file and with
no process, more like it should... I haven't found the proper setting!

--
Greetings

  Pete

Time is an illusion. Lunchtime, doubly so.