From: pjb@informatimago.com (Pascal J. Bourguignon)
Newsgroups: gmane.emacs.help
Subject: Re: Finding and mapping all UTF-8 characters
Date: Sat, 05 Dec 2009 17:38:08 +0100
Organization: Informatimago
Message-ID: <87r5r9idfz.fsf@galatea.local>
To: help-gnu-emacs@gnu.org

deech writes:

> Hi all,
> I recently cut-and-pasted large chunks of text into an HTML document.
> When I tried to save the document I was warned that it was ISO-Latin
> but there were UTF-8 characters in the text.

I doubt it warned that.

ISO-Latin is not a character encoding, it is a family of character
encodings. An HTML document is not encoded by a family of encodings,
but by one single encoding.

UTF-8 is a character encoding. A character is not a character
encoding.

So a sentence saying that "a document is ISO-Latin but there are UTF-8
characters in the text" is totally meaningless.

> Is there a way to (1) search for the UTF-8 encoded characters in a
> document and

No, it is not possible, because characters in a document are not
encoded, they are characters, that's all.

> (2) map them to a sensible ASCII character?

How do you sensibly map ∈, ㎲, 纺 or ⇣ to the characters in the ASCII
character set?

But even if you chose a mapping (you could for example map the
characters to their names: ELEMENT_OF, SQUARE_MU_S, U7EBA, and
DOWNWARDS_DASHED_ARROW), why would you want to do such a thing?

HTML is perfectly able to use encodings that can encode Unicode
characters, and all the current browsers are able to deal with HTML
documents encoding Unicode characters, so why would you want to
massacre your document?

(There's a valid reason to want to do that, but if you don't know
it, then you don't have it.)

-- 
__Pascal Bourguignon__
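
For illustration only, the name-based mapping mentioned in the message (ELEMENT_OF, SQUARE_MU_S, U7EBA, DOWNWARDS_DASHED_ARROW) can be sketched as below. This is a minimal sketch, not something from the original thread: it uses Python's standard `unicodedata` module rather than Emacs, the function names `non_ascii_positions`, `ascii_name` and `asciify` are hypothetical, and the fallback to a `U` plus code point label for ideographs is an assumption chosen to match the `U7EBA` example.

```python
import unicodedata


def non_ascii_positions(text):
    """Return (index, character) pairs for every character outside ASCII."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


def ascii_name(ch):
    """Map one character to an ASCII-only label built from its Unicode name.

    ASCII characters are returned unchanged. Characters without a
    descriptive name (unnamed code points, CJK unified ideographs) fall
    back to 'U' followed by the hexadecimal code point (an assumption
    made to mirror the U7EBA example in the message).
    """
    if ord(ch) < 128:
        return ch
    name = unicodedata.name(ch, "")
    if not name or name.startswith("CJK UNIFIED IDEOGRAPH-"):
        return "U{:04X}".format(ord(ch))
    return name.replace(" ", "_")


def asciify(text):
    """Replace every non-ASCII character in text with its ASCII label."""
    return "".join(ascii_name(ch) for ch in text)


if __name__ == "__main__":
    sample = "x \u2208 S"
    print(non_ascii_positions(sample))  # [(2, '\u2208')] i.e. ELEMENT OF at index 2
    print(asciify(sample))              # x ELEMENT_OF S
```

As the message points out, such a mapping is lossy and rarely what you want for an HTML document; saving the file in an encoding that covers the characters (for example UTF-8) keeps the text intact.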