From: "Buchs, Kevin"
Newsgroups: gmane.emacs.help
Subject: Re: those funny non-ASCII characters
Date: Fri, 25 May 2012 08:40:25 -0500

Thanks, Xah and Eli, for contributing to my further understanding. I went to the website where I got the content I copied and pasted, and I can see from the HTML that it declares charset=UTF-8, so I understand that is Unicode 8-bit. Using C-u C-x =, I see that the particular character I pasted has a code point of 0x2013 (U+2013). I didn't see, however, what the UTF-8 encoding of that code point was. Should I be able to read that somewhere in the buffer of information I get with C-u C-x = ?
I was poking around the www.unicode.org website, trying to understand how this U+2013 code point is encoded into UTF-8, but I haven't determined that yet.

A fresh buffer in Emacs for me on my Win-7 box has a coding system of iso-latin-1-dos. The coding system used to open and save files is the same. So, help me piece together what happens as I paste the UTF-8 text into a buffer. First, the paste buffer must define that it is in UTF-8. Emacs reads this information and inserts it into the byte string that defines the buffer. Now, how does Emacs record that it was a UTF-8 encoded character? Does it translate it into a different internal encoding instead of just recording the 8 bits transferred? Is this encoding used as a superset of all possible encoding systems that Emacs supports?

Now, Xah, you suggest I embrace Unicode. What does that mean? Would it involve marking all my Lisp library files and my org-mode files with the file variable -*- coding: utf-8 -*- ? Or is there another way to go Unicode automatically?

I assume that if my Lisp library files are encoded in UTF-8, then I can paste that character from the web page into my call to replace-string in order to substitute the longer dash, Unicode U+2013, with an ASCII hyphen or double hyphen. But how does that really work? If the Lisp file is encoded in UTF-8, then how can I put an ASCII character in the replacement string?

I would appreciate it if someone could help me open this new door in my brain a bit further.

Kevin Buchs | Senior Engineer | SPPDG | 507-538-5459 | buchs.kevin@mayo.edu
Mayo Clinic | 200 First Street SW | Rochester, MN 55905 | http://www.mayo.edu/sppdg

-----Original Message-----
With cursor on that character, type "C-u C-x =", and Emacs will show everything it knows about that character, including its canonical name.
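
For reference, the two mechanics asked about in the message can be sketched outside Emacs. The example below is a minimal illustration in Python rather than Emacs Lisp, and the helper names `utf8_three_byte` and `ascii_dashes` are hypothetical, introduced only for this sketch. It works out the UTF-8 bytes of U+2013 by hand using the three-byte form that covers U+0800 through U+FFFF, and then shows that replacing the dash with an ASCII hyphen leaves text that is valid in both ASCII and UTF-8, since every ASCII character keeps its single-byte value in UTF-8.

```python
# Sketch only: Python stands in for Emacs here, and the helper names
# below are illustrative assumptions, not part of any Emacs API.

EN_DASH = "\u2013"  # the character that C-u C-x = reports as U+2013


def utf8_three_byte(code_point: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF by hand (three-byte UTF-8 form)."""
    if not 0x0800 <= code_point <= 0xFFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("this sketch only handles non-surrogate U+0800..U+FFFF")
    # Bit layout of the three-byte form: 1110xxxx 10xxxxxx 10xxxxxx
    byte1 = 0xE0 | (code_point >> 12)          # top 4 bits of the code point
    byte2 = 0x80 | ((code_point >> 6) & 0x3F)  # middle 6 bits
    byte3 = 0x80 | (code_point & 0x3F)         # low 6 bits
    return bytes([byte1, byte2, byte3])


def ascii_dashes(text: str, replacement: str = "-") -> str:
    """Replace every U+2013 in text with an ASCII replacement string."""
    return text.replace(EN_DASH, replacement)


# Hand-computed bytes agree with the built-in encoder.
manual = utf8_three_byte(ord(EN_DASH))
builtin = EN_DASH.encode("utf-8")
print(" ".join(f"{b:02X}" for b in manual))  # E2 80 93
print(manual == builtin)                     # True

# After the replacement the text is pure ASCII, so its UTF-8 bytes
# and its ASCII bytes are identical.
fixed = ascii_dashes("pages 10\u201320")
print(fixed)                                           # pages 10-20
print(fixed.encode("utf-8") == fixed.encode("ascii"))  # True
```

Passing `"--"` as the second argument to `ascii_dashes` gives the double-hyphen variant mentioned in the message.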