From: Peter Dyballa
Newsgroups: gmane.emacs.help
Subject: Re: Finding and mapping all UTF-8 characters
Date: Sat, 5 Dec 2009 19:40:56 +0100
To: deech
Cc: help-gnu-emacs@gnu.org

On 05.12.2009 at 17:03, deech wrote:

> Is there a way to (1) search for the UTF-8 encoded characters in a
> document

Yes. In GNU Emacs 23 I've seen hyperlinks in the *Warnings* buffer to
the characters that do not fit into the specified encoding. You could
also search for the usual prefixes of UTF-{7,8,16} encoded characters.

> and (2) map them to a sensible ASCII character?

How can you map 100,000 or 200,000 characters to a very limited set of
100? This mapping would be a candidate for the most successful
compression algorithm...

Besides, it's not sane to save a file in encoding A when the file's
header says its contents are in encoding B.

--
Greetings

  Pete

If you don't find it in the index, look very carefully through the
entire catalogue.
        – Sears, Roebuck, and Co., Consumer's Guide, 1897
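
To illustrate both points in the reply above, here is a minimal sketch in Python (not Emacs Lisp, and not something from the original thread). The function names find_non_ascii and to_ascii_lossy are hypothetical. The first function looks for the UTF-8 lead-byte prefixes (110xxxxx, 1110xxxx, 11110xxx) in raw bytes; the second shows why a mapping to ASCII is necessarily many-to-one and lossy.

```python
import unicodedata


def find_non_ascii(data: bytes):
    """Return (byte_offset, char) pairs for every non-ASCII sequence.

    char is None when the bytes at that offset are not valid UTF-8
    (for example a stray Windows-1252 byte such as 0x96).
    """
    hits = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:              # plain ASCII byte
            i += 1
            continue
        # UTF-8 lead bytes announce the sequence length by their prefix.
        if b >> 5 == 0b110:
            length = 2
        elif b >> 4 == 0b1110:
            length = 3
        elif b >> 3 == 0b11110:
            length = 4
        else:                     # continuation byte without a lead byte
            hits.append((i, None))
            i += 1
            continue
        try:
            ch = data[i:i + length].decode("utf-8")
        except UnicodeDecodeError:
            ch, length = None, 1  # malformed or truncated sequence
        hits.append((i, ch))
        i += length
    return hits


def to_ascii_lossy(text: str) -> str:
    """Map text to ASCII: strip accents, replace everything else by '?'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.encode("ascii", "replace").decode("ascii")
```

Under this sketch, "é" and "è" both collapse to "e", and a character with no decomposition (such as an en dash) becomes "?", so the original text cannot be recovered from the ASCII result. That is the compression objection in concrete form.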