From: harven
Newsgroups: gmane.emacs.help
Subject: Re: Finding and mapping all UTF-8 characters
Date: Sat, 05 Dec 2009 21:29:47 +0100
To: help-gnu-emacs@gnu.org
deech writes:

> Hi all,
> I recently cut-and-pasted large chunks of text into an HTML document.
> When I tried to save the document I was warned that it was ISO-Latin
> but there were UTF-8 characters in the text.

The warning actually contains a list of these characters, and you can
click on them to see where they are located in the buffer.

> Is there a way to (1) search for the UTF-8 encoded characters in a
> document and (2) map them to a sensible ASCII character?
>
> Thanks ...
> -deech

Instead of converting to Latin-1, it is probably better to save the
file in a coding system that can represent all of these characters,
such as UTF-8. Just do

  M-x set-buffer-file-coding-system RET utf-8 RET

and save the buffer again.

On the other hand, if you were surprised by the Unicode characters,
there are probably only a few of them. In that case, have a look at
the iso-cvt.el package, which lets you set up a conversion table. The
command iso-sgml2iso is pretty close to what you want.

Now, if you want to search a buffer for all characters belonging to
some category, you can use a regexp:

  \ca  matches any ASCII character (newlines excluded); almost the
       same as [[:ascii:]], which also matches newlines.
  \Ca  matches any non-ASCII character (newlines included).
  \cl  matches any Latin character (newlines excluded).
  \Cl  matches any non-Latin character (newlines included).

So the following command copies all non-Latin characters (newlines
included) into the *scratch* buffer:

  M-x replace-regexp RET \Cl RET \,(princ \& (get-buffer "*scratch*")) RET

Since princ returns its argument, each match is replaced by itself, so
the text of the current buffer stays the same. Note that replace-regexp
only acts from point to the end of the buffer (or on the active
region), so start at the beginning of the buffer.
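For part (2), here is a minimal sketch, assuming Emacs 23 or later
(for Unicode character codes and the ?\uXXXX syntax). The names
my-ascii-replacements and my-asciify-buffer are hypothetical, and the
table only covers a few common characters picked for illustration, so
extend it to taste. It walks the buffer with the [^[:ascii:]] regexp,
replaces every character found in the table, and returns the ones it
could not map.

```elisp
;; Hypothetical names and table: an illustration, not an existing package.
;; Assumes Emacs 23 or later (Unicode character codes, ?\uXXXX syntax).
(defvar my-ascii-replacements
  '((?\u00A0 . " ")    ; NO-BREAK SPACE
    (?\u2013 . "-")    ; EN DASH
    (?\u2014 . "--")   ; EM DASH
    (?\u2018 . "'")    ; LEFT SINGLE QUOTATION MARK
    (?\u2019 . "'")    ; RIGHT SINGLE QUOTATION MARK
    (?\u201C . "\"")   ; LEFT DOUBLE QUOTATION MARK
    (?\u201D . "\"")   ; RIGHT DOUBLE QUOTATION MARK
    (?\u2026 . "..."))  ; HORIZONTAL ELLIPSIS
  "Alist mapping non-ASCII characters to ASCII replacement strings.")

(defun my-asciify-buffer ()
  "Replace the characters listed in `my-ascii-replacements' in the buffer.
Return the distinct non-ASCII characters that have no replacement,
in order of first appearance."
  (interactive)
  (let ((unmapped '()))
    (save-excursion
      (goto-char (point-min))
      ;; [^[:ascii:]] matches any single character outside codes 0-127.
      (while (re-search-forward "[^[:ascii:]]" nil t)
        (let* ((ch (char-before))
               (rep (cdr (assq ch my-ascii-replacements))))
          (if rep
              ;; FIXEDCASE and LITERAL are both t: insert REP verbatim.
              (replace-match rep t t)
            (unless (memq ch unmapped)
              (push ch unmapped))))))
    ;; Reverse once so the list follows buffer order.
    (setq unmapped (nreverse unmapped))
    (when (called-interactively-p 'interactive)
      (message "Characters left unmapped: %s"
               (if unmapped (apply #'string unmapped) "none")))
    unmapped))
```

M-x my-asciify-buffer replaces what it can and echoes the leftovers in
the echo area; those can then be located with C-M-s \Ca and handled by
hand, or added to the table.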