From: harven <harven@free.fr>
To: help-gnu-emacs@gnu.org
Subject: Re: how to scan file for non-ascii chars (eg cut-n-paste from ms-word)
Date: Tue, 18 Jan 2011 22:32:23 +0100
Message-ID: <87wrm1gaoo.fsf@ergodik.univ-brest.fr>
In-Reply-To: ih4r54$fnv$1@reader1.panix.com

dkcombs@panix.com (David Combs) writes:

> FURTHER, and more importantly, how do I *search* for
> one of these funny things, a left-double-quote, say?
> It's so *easy* to just hit C-s "!

You can go to the next non-ascii character using
C-M-s [^[:ascii:]]
Repeating C-s after that steps through the remaining non-ascii characters;
RET ends the search.
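
If you would rather bind this to a key than type the regexp each time, here
is a minimal sketch of the same search as a command. The name
my-next-non-ascii is made up for this example; it is not part of Emacs.

(defun my-next-non-ascii ()
  "Move point just after the next non-ASCII character, if there is one."
  (interactive)
  ;; Same regexp as in the incremental search above.
  (if (re-search-forward "[^[:ascii:]]" nil t)
      (message "Found: %s" (match-string 0))
    (message "No non-ASCII character after point")))

Calling it repeatedly with M-x my-next-non-ascii visits each such character
in turn.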

> You mean do a query-replace on each non-ascii char?  How do I 
> even know which ones are even *in* some buffer of text?

You can use the following command to list all characters in the buffer
together with their frequencies, most frequent first. The non-ascii ones,
being rare in most texts, should appear near the end.

(defun frequency ()
  "Compute the frequency of each character in the buffer.
The result appears in another buffer called *frequency*."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (let ((freq (make-hash-table :test 'equal)))
      ;; "." matches any character except newline, so newlines are not counted.
      (while (re-search-forward "." nil t)
        (puthash (match-string 0)
                 (1+ (gethash (match-string 0) freq 0))
                 freq))
      (pop-to-buffer "*frequency*")
      (erase-buffer)
      ;; One line per character: the character, two spaces, its count.
      (maphash
       (lambda (key value)
         (insert key "  " (number-to-string value) "\n"))
       freq))
    ;; Sort on the last field (the count), then reverse: most frequent first.
    (sort-numeric-fields -1 (point-min) (point-max))
    (reverse-region (point-min) (point-max))
    (other-window 1)))
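
If you only care about which non-ascii characters occur, and not about the
counts, a shorter sketch along the same lines is possible. The name
my-list-non-ascii is hypothetical; the function just collects each distinct
match of the regexp used for searching.

(defun my-list-non-ascii ()
  "Return the distinct non-ASCII characters of the current buffer.
The characters are listed in order of first appearance."
  (save-excursion
    (goto-char (point-min))
    (let (chars)
      (while (re-search-forward "[^[:ascii:]]" nil t)
        ;; Point is just after the match, so `char-before' is the match.
        (let ((c (char-before)))
          (unless (memq c chars)
            (push c chars))))
      (nreverse chars))))

For instance, M-: (mapconcat #'char-to-string (my-list-non-ascii) " ") RET
shows them in the echo area.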

>
> What'd be nice is something that went through the whole
> buffer *once*, doing the "right thing" with each
> non-ascii char.
>
> Do I make any sense?  Or do I not really understand?

Yes it makes sense.

Have a look at iso-cvt.el. This package provides commands to handle ISO 8859-1
characters. It contains a function called iso-translate-conventions, which
translates characters according to a translation table. I am not aware of a
table giving an ascii translation for all Unicode characters, so you will
have to make your own, along the lines of

(defvar my-iso-trans-tab
  '(("à" "a")
    ("é" "e")
    ("ß" "s")
    ("ñ" "~n"))
  "Translation table for translating some characters to ascii.
This table is not exhaustive.")

Then, assuming the definition of iso-translate-conventions from iso-cvt.el
has been loaded, use the following command to translate the selected region.

(defun my-iso-all2ascii (from to &optional buffer)
 "Translate to ascii characters.
Translate the region between FROM and TO using the table
`my-iso-trans-tab'.
Optional arg BUFFER is ignored (for use in `format-alist')."
 (interactive "*r")
 (iso-translate-conventions from to my-iso-trans-tab))
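
For the ms-word case from the subject line, the same approach can handle the
whole buffer in one go. What follows is only a sketch: the names
my-word-trans-tab and my-word2ascii-buffer are made up for this example, the
table covers just a few typical characters, and I am assuming that iso-cvt.el
can be loaded with require.

(require 'iso-cvt)   ; assumption: provides `iso-translate-conventions'

(defvar my-word-trans-tab
  '(("“" "\"")    ; left double quotation mark
    ("”" "\"")    ; right double quotation mark
    ("‘" "'")     ; left single quotation mark
    ("’" "'")     ; right single quotation mark
    ("–" "-")     ; en dash
    ("…" "..."))  ; horizontal ellipsis
  "Hypothetical table mapping some word-processor characters to ascii.
This table is not exhaustive.")

(defun my-word2ascii-buffer ()
  "Translate the whole buffer using `my-word-trans-tab'."
  (interactive "*")
  (iso-translate-conventions (point-min) (point-max) my-word-trans-tab))

After M-x my-word2ascii-buffer, the frequency command above is a convenient
way to see which non-ascii characters are still left and need a table entry.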

Hope that helps



