From: Yuri Khan
Newsgroups: gmane.emacs.help
Subject: Re: Special Characters
Date: Wed, 12 Aug 2015 10:19:23 +0600
References: <3aa57c28-0bd8-45ab-bcf9-b68029b02889@googlegroups.com>
To: Ian Baylis
Cc: "help-gnu-emacs@gnu.org"

On Tue, Aug 11, 2015 at 11:55 PM, Ian Baylis wrote:
> Thanks for the reply. Is there a list that contains all the octal
> representations of characters like \342\200\231?

If you’re interested, you might want to read a description of the
UTF-8 encoding, then browse the Unicode charts.

http://tools.ietf.org/html/rfc3629
http://www.unicode.org/charts/

However, I must ask: Why do you want to know? Are you going to
hand-decode files that come your way? Why not delegate that work to
computers?

There are many different character encodings. When people or software
do not agree on which one they use, misdecoding occurs.
With some experience, one can make an accurate guess at which encoding
was used originally, although this becomes less necessary as we
migrate to UTF-8.

PS: please don’t top-post.

> On Aug 11, 2015 1:22 PM, "Yuri Khan" wrote:
>>
>> On Tue, Aug 11, 2015 at 7:37 AM, Ian Baylis wrote:
>> > I have a file that has special characters in it. When I open the file
>> > in Emacs an ' is represented like:
>> > \200\231 or just \231
>>
>> The Unicode apostrophe ’ (U+2019 Right single quotation mark) is
>> encoded in UTF-8 as a sequence of three bytes, whose octal
>> representation is \342\200\231 (or hexadecimal E2 80 99).
>>
>> If your Emacs incorrectly picks e.g. the ISO-8859-1 (aka Latin-1)
>> encoding for this file, you will see the letter â (U+00E2 Latin small
>> letter a with circumflex), followed by two codes \200 and \231,
>> because those do not correspond to printable characters in Latin-1.
>>
>> In order to view the file as intended, you need to re-open that file
>> using the correct encoding (UTF-8). Eli has given you the command:
>>
>> C-x RET c utf-8 RET C-x C-f FILE-NAME RET
>>
>> Alternatively, if you already have a buffer visiting the file, you can
>> revert it using the correct encoding:
>>
>> C-x RET r utf-8 RET (you might need to confirm the revert).
>>
>> You then need to evaluate how often you use files in encodings other
>> than UTF-8. If rarely, you might want to set UTF-8 as your default
>> encoding.
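
To illustrate the "delegate that work to computers" point, here is a
minimal Python sketch that reproduces the byte sequence discussed in
this thread and shows what a Latin-1 misreading yields. The helper name
utf8_octal is made up for this example and is not part of Emacs or
Python; the script only mimics the backslash-octal notation Emacs uses
for raw bytes.

```python
def utf8_octal(ch: str) -> str:
    """Return the UTF-8 bytes of ch as backslash-octal escapes.

    Hypothetical helper for illustration; it imitates the way Emacs
    displays bytes it could not decode.
    """
    return "".join(f"\\{b:03o}" for b in ch.encode("utf-8"))


apostrophe = "\u2019"  # U+2019 Right single quotation mark
raw = apostrophe.encode("utf-8")

# The three bytes in hexadecimal and in octal.
print(" ".join(f"{b:02X}" for b in raw))  # E2 80 99
print(utf8_octal(apostrophe))             # \342\200\231

# Decoding the same bytes as Latin-1 gives three separate characters:
# U+00E2 (a with circumflex) plus two non-printable C1 control codes.
misread = raw.decode("latin-1")
print([f"U+{ord(c):04X}" for c in misread])  # ['U+00E2', 'U+0080', 'U+0099']

# Decoding with the correct encoding recovers the apostrophe.
print(raw.decode("utf-8") == apostrophe)  # True
```

Calling utf8_octal on any other character answers the "is there a
list" question on demand, without a lookup table.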