From: ken
Newsgroups: gmane.emacs.help
Subject: Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
Date: Sat, 17 Feb 2007 07:06:11 -0500
Message-ID: <45D6EFB3.3010202@speakeasy.net>
References: <1171628373.417583.61410@k78g2000cwa.googlegroups.com> <87zm7e8e7j.fsf@wivenhoe.staff8.ul.ie>
To: help-gnu-emacs@gnu.org

On 02/16/2007 08:47 PM somebody named Stefan Monnier wrote:
>> (while (re-search-forward "[€-Ÿ]" nil t)
>>   (let ((mschar (buffer-substring-no-properties
>>                  (match-beginning 0) (match-end 0))))
>>     (cond
>>      ((string= mschar "‘") (replace-match "`" ))
>>      ((string= mschar "’") (replace-match "'" ))
>>      ((string= mschar "“") (replace-match "``"))
>>      ((string= mschar "”") (replace-match "''"))
>>      ((string= mschar "–") (replace-match "--")))))
>
> Better work on chars rather than strings of one-char.  Also better not use
> those special chars that are sometimes displayed as \200 and use the
> \200 escape sequence instead:
>
> (require 'cl)
> (defun my-fun-foo ()
>   (interactive)
>   (goto-char (point-min))
>   (while (re-search-forward "[\200-\237]" nil t)
>     (case (char-before)
>       (?\221 (replace-match "`" ))
>       (?\222 (replace-match "'" ))
>       (?\223 (replace-match "``"))
>       (?\224 (replace-match "''"))
>       (?\226 (replace-match "--")))))
>
>
> -- Stefan
>
>
> PS: Guaranteed 100% untested.

Stefan,

Technically you're correct.  It probably costs a lot less at execution
time to compare a char than a string consisting of one byte.
However, I try to make life easier for the programmer (me and, in an
open-source world, everyone else) by making the code as simple as
possible.  The code should also accomplish what the user wants it to.
These considerations far outweigh any pity I might have for the CPU.

Moreover, MS files often contain "characters" such as "—", their
extraordinary rendition of an em-dash.  If elisp is to
search-and-replace this (multi-byte) "character", it must use (or else
develop) a function which understands strings.  True, the elisp code
could take the more efficient approach when searching for a single-byte
character, but for the sake of uniformity and to make the code easier to
modify, the less efficient approach is preferable.

Besides, coding efforts to increase efficiency are typically secondary
to those which result in code that works.  And we don't have that yet.
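
To make the uniformity argument concrete, here is a minimal sketch of a string-based, table-driven version, which treats one-character and multi-character sequences the same way.  It is an illustration only and has not been run against real MS files.  The names my-ms-replacements and my-replace-ms-strings are hypothetical, the "---" replacement for the em-dash is an assumption (the thread does not give one), and the sketch assumes the buffer holds the sequences as the decoded characters shown in this message.  The keys are written with \u escapes so the source does not depend on how the file itself is encoded.

```elisp
;; Hypothetical names: neither the table nor the function comes from the thread.
(defvar my-ms-replacements
  '(("\u00e2\u20ac\u02dc" . "`")    ; displayed as ‘
    ("\u00e2\u20ac\u2122" . "'")    ; displayed as ’
    ("\u00e2\u20ac\u0153" . "``")   ; displayed as “
    ("\u00e2\u20ac\u009d" . "''")   ; "â€" followed by the control char U+009D
    ("\u00e2\u20ac\u201c" . "--")   ; displayed as –
    ("\u00e2\u20ac\u201d" . "---")) ; displayed as —  (replacement is an assumption)
  "Alist mapping garbled MS sequences (strings) to plain-ASCII replacements.")

(defun my-replace-ms-strings (&optional table)
  "Replace every key of TABLE found in the current buffer with its value.
TABLE defaults to `my-ms-replacements'.  Return the number of replacements."
  (interactive)
  (let* ((table (or table my-ms-replacements))
         ;; One regexp matching any key, whatever its length.
         (regexp (regexp-opt (mapcar #'car table)))
         ;; Match case-sensitively so every match is an exact key of TABLE.
         (case-fold-search nil)
         (count 0))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward regexp nil t)
        ;; FIXEDCASE and LITERAL are both t: insert the value verbatim.
        (replace-match (cdr (assoc (match-string 0) table)) t t)
        (setq count (1+ count))))
    count))
```

With this in place, M-x my-replace-ms-strings runs over the whole buffer, and handling another sequence means adding one pair to the table rather than another branch to the code.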