From: ken
Newsgroups: gmane.emacs.help
Subject: replacing characters and whacky trans-buffer conversion
Date: Tue, 06 Mar 2007 10:15:00 -0500
Message-ID: <45ED8574.3040201@speakeasy.net>
To: GNU Emacs List

An email comes in with this (emdash) character in it:

–

It looks like an em-dash until the text containing it is pasted into an emacs buffer; then it appears as a series of "garbage characters". (Copy and paste the emdash into an emacs buffer yourself, and perhaps you'll see what I mean.) To me, and possibly to you, this emdash appears in emacs as nine (9) "garbage" characters.

Because I want to programmatically replace these 9 garbage characters with something latin1-friendly, I copy-and-paste these nine characters into an *.el file containing a line like this:

(replace-string "–" "--" nil (point-min) (point-max))

The sought string (i.e., the first argument above) isn't found, however, because, for some whacky reason, the emdash pasted into the *.el file is different-- by one character-- from exactly the same emdash pasted into the other emacs buffer (the one I'm saving the email in).
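
One way to avoid depending on how pasted characters survive in the *.el file is to write the sought string with an escape sequence instead of raw characters. The following is only a sketch under stated assumptions, not a confirmed fix for the behaviour described here. It assumes the dash has been decoded in the target buffer as a single character, U+2013 (EN DASH); that code point is inferred from the bytes the archived text appears to show (E2 80 93) and may not match what a given emacs buffer actually holds. The helper name `my-replace-literal' is hypothetical. It uses the search-forward / replace-match loop that the replace-string documentation itself recommends for Lisp programs.

```elisp
;; Hypothetical helper (not from the original post): replace every
;; literal occurrence of FROM with TO in the current buffer.
(defun my-replace-literal (from to)
  "Replace each literal occurrence of FROM with TO in the current buffer.
Return the number of replacements made."
  (let ((count 0))
    (save-excursion
      (goto-char (point-min))
      ;; `search-forward' returns nil (instead of signalling an error)
      ;; when nothing more is found, because the third argument is t.
      (while (search-forward from nil t)
        ;; FIXEDCASE = t, LITERAL = t: insert TO exactly as written.
        (replace-match to t t)
        (setq count (1+ count))))
    count))

;; Assumption: the buffer contains a properly decoded U+2013.
;; The escape keeps the source file pure ASCII, so the file's own
;; coding system cannot alter the search string.
;; (my-replace-literal "\u2013" "--")
```

If the buffer really holds nine separate undecoded characters rather than one decoded dash, the FROM argument would have to spell out those characters instead, and this sketch does not attempt that.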

In the emacs buffer containing the email, the fourth garbage character (as shown by C-u C-x =) is:

  character: β (05542, 2914, 0xb62)
  charset: greek-iso8859-7 (Right-Hand Part of Latin/Greek Alphabet (ISO/IEC 8859-7): ISO-IR-126)
  code point: 98
  syntax: word
  category: g:Greek
  buffer code: 0x86 0xE2
  file code: not encodable by coding system undecided-unix
  font: -ETL-Fixed-Medium-R-Normal--16-160-72-72-C-80-ISO8859-7

In the *.el buffer, the fourth garbage character (which should be exactly the same character) is:

  character: â (0342, 226, 0xe2)
  charset: eight-bit-graphic (8-bit graphic char (0xA0..0xFF))
  code point: 226
  syntax: whitespace
  category:
  buffer code: 0xE2
  file code: 0xE2 (encoded by coding system raw-text-unix)
  font: -ETL-Fixed-Medium-R-Normal--16-160-72-72-C-80-ISO8859-1

I tried entering "C-q 5542 RETURN" into the *.el file, but emacs immediately makes it into the second (â, or 0342) character. Doing the same into the other emacs buffer (containing my copy of the email) *does* enter the good (β, or 05542) character.

All I really want is for the above replace-string function to work as expected. But emacs consistently converts that fourth character in the emdash string into a different character, subsequently causing the search to fail.

So how do I get the correct "garbage" characters into the first argument of the replace-string function-- i.e., into the *.el file?

tnx,
ken

--
"Genius might be described as a supreme capacity for getting its possessors into trouble of all kinds."
-- Samuel Butler