From: Xah Lee <xahlee@gmail.com>
To: help-gnu-emacs@gnu.org
Subject: Re: those funny non-ASCII characters
Date: Thu, 24 May 2012 17:56:59 -0700 (PDT)
Message-ID: <a27b3fd4-7179-4e47-bc60-883cc1b1a59c@nl1g2000pbc.googlegroups.com>
In-Reply-To: mailman.1638.1337903381.855.help-gnu-emacs@gnu.org

On May 24, 4:49 pm, "Buchs, Kevin" <buchs.ke...@mayo.edu> wrote:
> I often paste content from web pages into an emacs org-mode buffer and I
> get the odd quote characters or dashes that are not ASCII. I created a
> lisp function to remove the unicode ones that are just 8 bits. Lately I
> am seeing that there are characters that are not being caught. They show
> up in emacs as the expected character. When I kill/yank them into lisp
> code, they are not being found. When I save the buffer, I am asked for
> coding and chose raw text. When the file is opened again, these
> characters are showing up as some sort of special symbol (dashed circle
> with flag off the top) followed by doubles/triples of \2xx. For example,
> the dash character I just stored was this sequence: circle-flag \200
> \231. Using Gnu/Linux od to dump them I get hex strings such as: 340 245
> 206 340 244 206 210 200 and for the dash mentioned above 342 200 231.
>
> I am very naive in regard to coding, so please excuse my ignorance. I
> would guess these are 16-bit (Unicode16) characters. Can someone
> enlighten me as to how I can determine what these characters are (after
> pasted into a buffer) and how I can code a function to replace them with
> ASCII equivalents? The only thing I could think of was hexl mode, but
> that didn't turn out well. Thanks.


It is better to embrace Unicode than to fight it.

Which encoding you get when you paste is rather complex. I guess it
depends on the source you copy from, since each web page can use a
different charset and encoding, and I am not sure whether your OS does
some translation in the pasteboard.

Maybe this will help.

〈Emacs File/Character Encoding/Decoding FAQ〉
http://xahlee.org/emacs/emacs_encoding_decoding_faq.html

〈Xah's Unicode Tutorial〉
http://xahlee.org/Periodic_dosage_dir/unicode.html

To replace non-ASCII characters, you can use the regex

[[:nonascii:]]+

〈Char Classes - GNU Emacs Lisp Reference Manual〉
http://xahlee.org/emacs_manual/elisp/Char-Classes.html

〈Emacs Lisp: Convert Unicode String to ASCII (Zap Gremlins)〉
http://xahlee.org/emacs/emacs_zap_gremlins.html
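
As a rough illustration of how the [[:nonascii:]] class above might be
used from Lisp, here is a minimal sketch. The names my-ascii-map and
my-replace-nonascii are hypothetical, the mapping table is only a small
sample to extend, and turning every leftover non-ASCII run into "?" is
an assumption about what you want, not the only sensible choice.

```elisp
;; Sketch only: `my-ascii-map' and `my-replace-nonascii' are
;; hypothetical names, not part of Emacs.
(defvar my-ascii-map
  '(("[\u2018\u2019]" . "'")    ; curly single quotes
    ("[\u201C\u201D]" . "\"")   ; curly double quotes
    ("[\u2013\u2014]" . "-")    ; en dash, em dash
    ("\u2026" . "..."))         ; horizontal ellipsis
  "Alist of (REGEXP . ASCII-REPLACEMENT) pairs; extend as needed.")

(defun my-replace-nonascii (start end)
  "Replace common non-ASCII punctuation between START and END with ASCII.
Any remaining run of non-ASCII characters is replaced with \"?\"."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      ;; Apply the specific mappings first, then the catch-all regex.
      (dolist (pair (append my-ascii-map
                            '(("[[:nonascii:]]+" . "?"))))
        (goto-char (point-min))
        (while (re-search-forward (car pair) nil t)
          ;; FIXEDCASE and LITERAL are both t: insert the text as-is.
          (replace-match (cdr pair) t t))))))
```

Select a region and run M-x my-replace-nonascii. To find out what a
particular character is before deciding on a replacement, put point on
it and type C-u C-x = (or M-x describe-char), which shows its code
point and name.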

 Xah

