From: Karl Fogel <kfogel@red-bean.com>
To: Lennart Borgman <lennart.borgman@gmail.com>
Cc: Emacs-Devel devel <emacs-devel@gnu.org>
Subject: Re: I'm is really I'm
Date: Tue, 06 Jul 2010 22:34:29 -0400
Message-ID: <87zky410lm.fsf@red-bean.com>
In-Reply-To: <AANLkTin_HpNXlzaliFKwvXbxOwpEezSrg-3cY_12nZW_@mail.gmail.com> (Lennart Borgman's message of "Wed, 7 Jul 2010 03:21:15 +0200")

Lennart Borgman <lennart.borgman@gmail.com> writes:
>Obviously this character is normally ' (char 39).
>
>Do we have any tool for replacing such characters in Emacs? Or is
>there a better way?

I get this problem all the time when pasting from web pages, PDFs, and
other sources of formatted text.

So I've been trying to write either a "filtered paste" or just a
function to clean up a region after pasting it.  But I'm rusty on
character representations in Emacs these days, and am having trouble
coming up with a way to represent (in Elisp source code) the characters
that most often need replacing.
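
On the representation question, a minimal sketch, assuming the pasted
text was decoded as UTF-8 so that each curly quote is a single character
in the buffer rather than three bytes. The `my-` names are hypothetical
and exist only for this illustration. Elisp's `\uNNNN' escape works in
both character literals and strings, so nothing has to be assembled byte
by byte:

```elisp
;; Hypothetical constants for illustration; not part of Emacs.
(defconst my-left-double-quote  ?\u201C)   ; LEFT DOUBLE QUOTATION MARK, as a character
(defconst my-right-double-quote ?\u201D)   ; RIGHT DOUBLE QUOTATION MARK, as a character
(defconst my-apostrophe         "\u2019")  ; RIGHT SINGLE QUOTATION MARK, as a string

;; The string holds one character, not three.
(length my-apostrophe)

;; Its code point is #x2019.
(string-to-char my-apostrophe)

;; Encoding it as UTF-8 yields the bytes 226 128 153, i.e. the
;; numbers that show up when the character is viewed byte by byte.
(append (encode-coding-string my-apostrophe 'utf-8) nil)
```

If the buffer instead shows three separate characters per quote, the
text was probably decoded with the wrong coding system, and this sketch
does not cover that case.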

Anyone who wants to play Captain Obvious on the code below, go for it.
It would be nice to give Emacs a standard solution to this common
problem.

  (defun clean-region (start end)
    "Clean up a region of text that comes from a non-plaintext source.
  Formatted sources, such as web pages and PDF documents, often contain
  characters that could be reasonably represented in plain ASCII but are
  not.  For example the characters referenced by &rdquo; and &ldquo; in
  HTML are not the same as ASCII 34 (double quote).  It is sometimes
  desirable to simply convert the formatted text to ASCII."
    (interactive "*r")
    ;; TODO: this is not working yet.  Maybe make chars, not strings,
    ;; and this might work?  Not sure.
    (let ((open-double-quote  (make-string 3 0))
          (close-double-quote (make-string 3 0))
          (funderscore        ? )
          (apostrophe         (make-string 3 0)))
      ;; I don't know any other way to make these strings besides
      ;; just setting each character by hand... but even that doesn't
      ;; seem to result in a working `replace-string' in the end.
      (aset open-double-quote 0 ?â)
      (aset open-double-quote 1 128)
      (aset open-double-quote 2 156)
      (aset close-double-quote 0 ?â)
      (aset close-double-quote 1 128)
      (aset close-double-quote 2 157)
      (aset apostrophe 0 ?â)
      (aset apostrophe 1 128)
      (aset apostrophe 2 153)
      (save-excursion
        (goto-char start)
        (replace-string apostrophe "'"  nil start end))))
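
For comparison, a sketch of one way the cleanup could be written, under
the same assumption that the buffer holds properly decoded characters.
It is not a tested or standard solution. The names
`my-clean-region-replacements', `my-clean-region' and `my-yank-clean'
are made up for this example, and the replacement table is only a
starting point. It uses `search-forward' plus `replace-match' in a
narrowed region, the usual pattern for replacing text from Lisp code:

```elisp
(defvar my-clean-region-replacements
  '(("\u201C" . "\"")   ; left double quotation mark
    ("\u201D" . "\"")   ; right double quotation mark
    ("\u2018" . "'")    ; left single quotation mark
    ("\u2019" . "'")    ; right single quotation mark / apostrophe
    ("\u00A0" . " "))   ; no-break space
  "Alist of (FROM . TO) strings.  Hypothetical table for this sketch.")

(defun my-clean-region (start end)
  "Replace typographic characters between START and END with ASCII."
  (interactive "*r")
  (save-excursion
    (save-restriction
      ;; Narrowing keeps every search inside the region, even if a
      ;; replacement changes the region's length.
      (narrow-to-region start end)
      (dolist (pair my-clean-region-replacements)
        (goto-char (point-min))
        (while (search-forward (car pair) nil t)
          ;; FIXEDCASE and LITERAL are both t: insert TO exactly as given.
          (replace-match (cdr pair) t t))))))

(defun my-yank-clean ()
  "Yank, then clean the text just inserted (a \"filtered paste\")."
  (interactive "*")
  (let ((beg (point)))
    (yank)
    (my-clean-region beg (point))))
```

Keeping the replacements in an alist means new characters (dashes,
ellipses, and so on) can be added without touching the function.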



