From: harven <harven@free.fr>
To: help-gnu-emacs@gnu.org
Subject: Re: how to scan file for non-ascii chars (eg cut-n-paste from ms-word)
Date: Tue, 18 Jan 2011 22:32:23 +0100
Message-ID: <87wrm1gaoo.fsf@ergodik.univ-brest.fr>
In-Reply-To: ih4r54$fnv$1@reader1.panix.com
dkcombs@panix.com (David Combs) writes:
> FURTHER, and more importantly, how do I *search* for
> one of these funny things, a left-double-quote, say?
> It's so *easy* to just hit C-s "!
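You can go to the next non-ASCII character with an incremental regexp search:
  C-M-s [^[:ascii:]]
While the search is active, each further C-s moves on to the next non-ASCII
character; RET ends the search and leaves point there. After RET,
C-M-s C-M-s resumes the same regexp search.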
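If you do this often, a small command can do the jump and also tell you which
character it landed on. This is only a sketch; the name my-next-non-ascii is
hypothetical and not part of Emacs. Like isearch, it leaves point just after
the character found, so repeated calls step through the buffer.

```elisp
(defun my-next-non-ascii ()
  "Move past the next non-ASCII character and describe it.
Return the character, or nil if there is none after point.
Hypothetical helper, not part of Emacs."
  (interactive)
  (if (re-search-forward "[^[:ascii:]]" nil t)
      (let ((c (char-before)))
        ;; Echo the character, its code point and its Unicode name.
        (message "%c  U+%04X  %s" c c
                 (or (get-char-code-property c 'name) "?"))
        c)
    (message "No more non-ASCII characters")
    nil))
```

Bound to a key, this gives a one-keystroke "next funny character" command.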
> You mean do a query-replace on each non-ascii char? How do I
> even know which ones are even *in* some buffer of text?
The following command lists every character in the buffer together with its
frequency, most frequent first. Non-ASCII characters are usually rare, so they
should show up near the end of the list.
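(defun frequency ()
  "Compute the frequency of each character in the current buffer.
The result appears in another buffer called *frequency*, most
frequent characters first."
  (interactive)
  (let ((freq (make-hash-table :test 'equal)))
    ;; Count every character except newlines ("." does not match them).
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward "." nil t)
        (let ((char (match-string-no-properties 0)))
          (puthash char (1+ (gethash char freq 0)) freq))))
    ;; Write one "CHAR COUNT" line per character into *frequency*.
    (with-current-buffer (get-buffer-create "*frequency*")
      (erase-buffer)
      (maphash (lambda (key value)
                 (insert key " " (number-to-string value) "\n"))
               freq)
      ;; Sort on the last field (the count), then reverse the lines
      ;; so that the most frequent characters come first.
      (sort-numeric-fields -1 (point-min) (point-max))
      (reverse-region (point-min) (point-max))
      (goto-char (point-min)))
    ;; Show the result without leaving the current window.
    (display-buffer "*frequency*")))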
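If you only care about the non-ASCII characters, you can count just those and
show each one with its code point and Unicode name, which makes invisible
culprits such as NO-BREAK SPACE easy to identify. A minimal sketch; both
function names and the *non-ascii* buffer name are hypothetical.

```elisp
(defun my-non-ascii-counts ()
  "Return an alist (CHAR . COUNT) of the non-ASCII characters in the buffer.
The most frequent characters come first.  Hypothetical helper."
  (let ((counts nil))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward "[^[:ascii:]]" nil t)
        (let* ((c (char-before))
               (cell (assq c counts)))
          ;; Characters are integers, so `assq' can look them up.
          (if cell
              (setcdr cell (1+ (cdr cell)))
            (push (cons c 1) counts)))))
    (sort counts (lambda (a b) (> (cdr a) (cdr b))))))

(defun my-list-non-ascii ()
  "Show the buffer's non-ASCII characters with code points and names."
  (interactive)
  (let ((counts (my-non-ascii-counts)))
    (with-current-buffer (get-buffer-create "*non-ascii*")
      (erase-buffer)
      (dolist (entry counts)
        (insert (format "%c  U+%04X  %6d  %s\n"
                        (car entry) (car entry) (cdr entry)
                        (or (get-char-code-property (car entry) 'name) ""))))
      (goto-char (point-min)))
    (display-buffer "*non-ascii*")))
```

To inspect one character in place, C-u C-x = (describe-char) shows its code
point, name and other properties.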
>
> What'd be nice is something that went through the whole
> buffer *once*, doing the "right thing" with each
> non-ascii char.
>
> Do I make any sense? Or do I not really understand?
Yes, it makes sense.
Have a look at iso-cvt.el. This package provides commands for handling
ISO 8859-1 (Latin-1) characters. In it you will find a function called
iso-translate-conventions, which translates the characters in a region
according to a translation table. I am not aware of a table giving an ASCII
translation for all Unicode characters, so you will have to make your own,
along the lines of
(defvar my-iso-trans-tab
  '(("à" "a")
    ("é" "e")
    ("ß" "ss")
    ("ñ" "~n"))
  "Translation table for translating some characters to ASCII.
Each entry is a list (FROM TO).  This table is not exhaustive.")
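For text pasted from MS Word, the characters that usually matter are the
typographic punctuation marks. A table in the same (FROM TO) format might look
like this; the variable name and the chosen ASCII replacements are my own
assumptions, so adjust them to taste.

```elisp
(defvar my-word-punct-trans-tab
  '(("\u201C" "\"")   ; LEFT DOUBLE QUOTATION MARK
    ("\u201D" "\"")   ; RIGHT DOUBLE QUOTATION MARK
    ("\u2018" "'")    ; LEFT SINGLE QUOTATION MARK
    ("\u2019" "'")    ; RIGHT SINGLE QUOTATION MARK
    ("\u2013" "-")    ; EN DASH
    ("\u2014" "--")   ; EM DASH
    ("\u2026" "...")  ; HORIZONTAL ELLIPSIS
    ("\u00A0" " "))   ; NO-BREAK SPACE
  "Hypothetical ASCII replacements for common MS Word punctuation.
Same (FROM TO) format as `my-iso-trans-tab'.")
```

Since the format is the same, both tables can be used together via
(append my-iso-trans-tab my-word-punct-trans-tab).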
Then, after loading iso-cvt.el (for example with (require 'iso-cvt)), use the
following command to translate the selected region.
(defun my-iso-all2ascii (from to &optional buffer)
  "Translate non-ASCII characters to ASCII.
Translate the region between FROM and TO using the table
`my-iso-trans-tab'.
Optional arg BUFFER is ignored (for use in `format-alist')."
  (interactive "*r")
  (iso-translate-conventions from to my-iso-trans-tab))
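If you would rather not depend on iso-cvt.el, the same single pass over the
region can be written directly: for each table entry, search for the FROM
string and replace it literally with the TO string. This is a sketch of that
general idea, not a copy of the iso-cvt code; my-translate-region is a
hypothetical name.

```elisp
(defun my-translate-region (from to table)
  "Between FROM and TO, replace each (car ENTRY) by (cadr ENTRY).
TABLE is a list of (FROM-STRING TO-STRING) entries, like
`my-iso-trans-tab'.  Hypothetical stand-in for `iso-translate-conventions'."
  (save-excursion
    (save-restriction
      ;; Narrowing keeps the region bounds right as replacements
      ;; change its length.
      (narrow-to-region from to)
      (let ((case-fold-search nil))
        (dolist (entry table)
          (goto-char (point-min))
          (while (search-forward (car entry) nil t)
            ;; FIXEDCASE and LITERAL: insert TO-STRING exactly as given.
            (replace-match (cadr entry) t t)))))))
```

Called as (my-translate-region (region-beginning) (region-end) my-iso-trans-tab),
it should behave like my-iso-all2ascii above.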
Hope that helps.