unofficial mirror of help-gnu-emacs@gnu.org
From: Eli Zaretskii <eliz@gnu.org>
To: help-gnu-emacs@gnu.org
Subject: Re: those funny non-ASCII characters
Date: Fri, 25 May 2012 17:04:00 +0300
Message-ID: <83r4u8we5r.fsf@gnu.org>
In-Reply-To: <F326B9A37B353A449FC7A09B36835F820104816E@MACE.sppdg.ad>

> Date: Fri, 25 May 2012 08:40:25 -0500
> From: "Buchs, Kevin" <buchs.kevin@mayo.edu>
> 
> Thanks, Xah and Eli, for contributing to my further understanding. I
> went to a specific website where I got the content I copied and pasted
> and I can see from the HTML that it has a charset=UTF-8, so I understand
> that is Unicode 8-bit. Using the C-u C-x =, I see that the particular
> character I pasted has a code point of 0x2013 (U+2013). I didn't see,
> however, what the UTF-8 encoding of that code point was. Should I be
> able to read that somewhere on the buffer of information I get with C-u
> C-x = ?

Yes, this part of "C-u C-x ="'s display:

            file code: #xE2 #x80 #x93 (encoded by coding system utf-8-dos)

shows you how it would be encoded in UTF-8.  If you see something like
"not encodable by ...", then you need to set the buffer's encoding
using "C-x RET f".  Under "file code", Emacs shows how the character
would be encoded if the buffer is saved to a disk file or sent to
another program or as an email message.
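
To see where the three bytes #xE2 #x80 #x93 come from, here is a
minimal sketch in Python (not part of Emacs) of the UTF-8 rule for
code points in the range U+0800..U+FFFF, which are split into 4 + 6 + 6
bits and packed as 1110xxxx 10xxxxxx 10xxxxxx.  The function name
utf8_encode_3byte is made up for this illustration; the result is
checked against Python's built-in encoder.

```python
def utf8_encode_3byte(code_point: int) -> bytes:
    """Encode one code point in U+0800..U+FFFF as three UTF-8 bytes.

    Hypothetical helper for illustration only; real code should use
    str.encode("utf-8").
    """
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only handles U+0800..U+FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are not valid in UTF-8")
    byte1 = 0xE0 | (code_point >> 12)           # 1110xxxx: top 4 bits
    byte2 = 0x80 | ((code_point >> 6) & 0x3F)   # 10xxxxxx: middle 6 bits
    byte3 = 0x80 | (code_point & 0x3F)          # 10xxxxxx: low 6 bits
    return bytes([byte1, byte2, byte3])


encoded = utf8_encode_3byte(0x2013)
print(" ".join(f"#x{b:02X}" for b in encoded))  # prints "#xE2 #x80 #x93"
print(encoded == "\u2013".encode("utf-8"))      # prints "True"
```

The "-dos" suffix in utf-8-dos only concerns end-of-line conversion; it
does not change these bytes.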

> I was poking around the www.unicode.org website, trying to
> understand how this U+2013 code point is encoded into UTF-8, but I
> haven't determined that yet.

See above: Emacs shows this under the right circumstances.

> So, help me piece together what happens as I paste the UTF-8 text into a
> buffer. First, the paste buffer must define that it is in UTF-8.

On Windows, Emacs always uses UTF-16 to pass text via the clipboard,
because doing so lets Emacs copy and paste any character from any
character set on Earth.
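
As an illustration of why UTF-16 can carry any character, here is a
short Python sketch (again, not Emacs code; the helper name
utf16_code_units is an assumption made for this example).  Characters
up to U+FFFF take one 16-bit unit; anything above is split into a
surrogate pair.

```python
from typing import List


def utf16_code_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units for one Unicode code point.

    Hypothetical helper for illustration; str.encode("utf-16-le")
    does the real work in Python.
    """
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x10000:
        return [code_point]                 # one 16-bit unit
    offset = code_point - 0x10000           # 20 bits remain
    high = 0xD800 + (offset >> 10)          # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)         # low 10 bits
    return [high, low]


print([hex(u) for u in utf16_code_units(0x2013)])   # ['0x2013']
print([hex(u) for u in utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']
```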

> Emacs reads this information and inserts it into the byte string
> that defines the buffer. Now, how does emacs record that it was a
> UTF-8 encoded character?

It doesn't.  What it records is the encoding to be used for the
current buffer if it is saved to disk or sent to some program.  That
encoding is a property of the buffer, not of the characters.
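
A toy model of that distinction, sketched in Python under the
assumption that a buffer is just a sequence of characters plus one
encoding setting.  The ToyBuffer class and its attribute names are
invented for this example and do not mirror how Emacs is implemented.

```python
class ToyBuffer:
    """Toy model: characters carry no encoding; the buffer does."""

    def __init__(self, text: str, coding_system: str) -> None:
        self.text = text                    # characters, no encoding attached
        self.coding_system = coding_system  # used only when writing out

    def save(self) -> bytes:
        """Encode the whole buffer with the buffer-wide coding system."""
        return self.text.encode(self.coding_system)


buf = ToyBuffer("a\u2013b", "utf-8")
print(buf.save().hex())          # prints "61e2809362"

# Changing the buffer's setting changes the bytes, not the text.
buf.coding_system = "utf-16-le"
print(buf.save().hex())          # prints "610013206200"

# Rough analogue of "not encodable by ...": Latin-1 has no U+2013.
buf.coding_system = "latin-1"
try:
    buf.save()
except UnicodeEncodeError:
    print("not encodable by latin-1")
```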

> Does it translate it into a different internal encoding

Yes, it does.

> Is this encoding used
> as a superset of all possible encoding systems that emacs supports?

Yes.  See the section "Text Representations" in the ELisp manual that
comes with Emacs; you will find the details there.


