all messages for Emacs-related lists mirrored at yhetil.org
* How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
@ 2007-02-16 12:19 Endless Story
  2007-02-16 12:44 ` Brendan Halpin
  2007-02-16 16:22 ` Stefan Monnier
  0 siblings, 2 replies; 22+ messages in thread
From: Endless Story @ 2007-02-16 12:19 UTC (permalink / raw)
  To: help-gnu-emacs

My apologies as I know this topic has been covered before, but despite
much Googling I haven't found a simple, direct solution to the
problem.

I have just started seeing lots of nasty stuff like \222 instead of
apostrophes in working on text files in Emacs on XP, then trying to
reformat these files for LaTeX. My guess is I picked this garbage up
when I copy-and-pasted some material out of Word, though I'm not
certain. Emacs on XP displays these characters fine when I am just
working with a .txt or .org file - which is why I didn't pick up on
the problem earlier.

I am looking for a simple function or .el package that will remove the
garbage and replace it with latin1 encoding or whatever is acceptable
to LaTeX. I have done lots of Googling but haven't come up with
anything that doesn't refer to gnus (which I don't use) or which
doesn't get me lost in talk of this encoding system or that. A search-
and-replace function will do fine - and I will even write it myself if
someone gives me a tip on how to get started (again, I'm not a lisp
expert).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-16 12:19 How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe? Endless Story
@ 2007-02-16 12:44 ` Brendan Halpin
  2007-02-16 16:02   ` ken
                     ` (4 more replies)
  2007-02-16 16:22 ` Stefan Monnier
  1 sibling, 5 replies; 22+ messages in thread
From: Brendan Halpin @ 2007-02-16 12:44 UTC (permalink / raw)
  To: help-gnu-emacs

"Endless Story" <usable.thought@gmail.com> writes:

> I have just started seeing lots of nasty stuff like \222 instead of
> apostrophes in working on text files in Emacs on XP, then trying to
> reformat these files for LaTeX. 

Something like this might work: 

  (while (re-search-forward "[€-Ÿ]" nil t)
    (let ((mschar (buffer-substring-no-properties 
                   (match-beginning 0) (match-end 0))))
      (cond 
       ((string= mschar "‘") (replace-match "`" )) 
       ((string= mschar "’") (replace-match "'" )) 
       ((string= mschar "“") (replace-match "``")) 
       ((string= mschar "”") (replace-match "''")) 
       ((string= mschar "–") (replace-match "--")))))

Obviously, the list of matches can be extended.

Brendan
-- 
Brendan Halpin,  Department of Sociology,  University of Limerick,  Ireland
Tel: w +353-61-213147 f +353-61-202569 h +353-61-338562; Room F2-025 x 3147
mailto:brendan.halpin@ul.ie  http://www.ul.ie/sociology/brendan.halpin.html


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-16 12:44 ` Brendan Halpin
@ 2007-02-16 16:02   ` ken
  2007-02-16 16:14     ` Sebastian P. Luque
       [not found]   ` <mailman.4605.1171641752.2155.help-gnu-emacs@gnu.org>
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: ken @ 2007-02-16 16:02 UTC (permalink / raw)
  To: help-gnu-emacs


On 02/16/2007 07:44 AM somebody named Brendan Halpin wrote:
> "Endless Story" <usable.thought@gmail.com> writes:
> 
>> I have just started seeing lots of nasty stuff like \222 instead of
>> apostrophes in working on text files in Emacs on XP, then trying to
>> reformat these files for LaTeX. 
> 
> Something like this might work: 
> 
>   (while (re-search-forward "[€-Ÿ]" nil t)
>     (let ((mschar (buffer-substring-no-properties 
>                    (match-beginning 0) (match-end 0))))
>       (cond 
>        ((string= mschar "‘") (replace-match "`" )) 
>        ((string= mschar "’") (replace-match "'" )) 
>        ((string= mschar "“") (replace-match "``")) 
>        ((string= mschar "”") (replace-match "''")) 
>        ((string= mschar "–") (replace-match "--")))))
> 
> Obviously, the list of matches can be extended.
> 
> Brendan

Thanks much for this.  To make better use, I added a bit of code to the
above:

(defun replace-garbage-chars ()
  "Replace goofy MS and other garbage characters with latin1 equivalents."
  (interactive)
  (save-excursion                       ;save the current point
    (goto-char (point-min))             ;go to begin of buffer
    (while (re-search-forward "[€-Ÿ]" nil t)
      (let ((mschar (buffer-substring-no-properties
                     (match-beginning 0) (match-end 0))))
        (cond
         ((string= mschar "‘") (replace-match "`" ))
         ((string= mschar "Â’") (replace-match "'" ))
         ((string= mschar "“") (replace-match "``"))
         ((string= mschar "”") (replace-match "''"))
         ((string= mschar "—") (replace-match "--'"))
         ((string= mschar "–") (replace-match "--")))))))

This allows, or should allow, the binding of the function to a key
chord.  But something is not right.  Can anyone tell where the problem
is?  Running this function I get no errors, but the garbage/mschars are
not replaced.


tia,
ken


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-16 16:02   ` ken
@ 2007-02-16 16:14     ` Sebastian P. Luque
  2007-02-17 11:04       ` ken
  0 siblings, 1 reply; 22+ messages in thread
From: Sebastian P. Luque @ 2007-02-16 16:14 UTC (permalink / raw)
  To: help-gnu-emacs

On Fri, 16 Feb 2007 11:02:23 -0500,
ken <gebser@speakeasy.net> wrote:

[...]

> This allows, or should allow, the binding of the function to a key
> chord.  But something is not right.  Can anyone tell where the problem
> is?  Running this function I get no errors, but the garbage/mschars are
> not replaced.

I can't tell what is wrong with that code, but I've solved those issues
with this (I lost track where I stole this from):


(standard-display-ascii ?\200 (vector (decode-char 'ucs #x253c)))
(standard-display-ascii ?\201 (vector (decode-char 'ucs #x251c)))
(standard-display-ascii ?\202 (vector (decode-char 'ucs #x252c)))
(standard-display-ascii ?\203 (vector (decode-char 'ucs #x250c)))
(standard-display-ascii ?\204 (vector (decode-char 'ucs #x2524)))
(standard-display-ascii ?\205 (vector (decode-char 'ucs #x2502)))
(standard-display-ascii ?\206 (vector (decode-char 'ucs #x2510)))
(standard-display-ascii ?\210 (vector (decode-char 'ucs #x2534)))
(standard-display-ascii ?\211 (vector (decode-char 'ucs #x2514)))
(standard-display-ascii ?\212 (vector (decode-char 'ucs #x2500)))
(standard-display-ascii ?\214 (vector (decode-char 'ucs #x2518)))
(standard-display-ascii ?\220 [? ])
(standard-display-ascii ?\221 [?\` ])
(standard-display-ascii ?\222 [?\'])
(standard-display-ascii ?\223 [?\"])
(standard-display-ascii ?\224 [?\"])
(standard-display-ascii ?\225 "* ")
(standard-display-ascii ?\226 "--")
(standard-display-ascii ?\227 " -- ")



-- 
Seb


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-16 12:19 How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe? Endless Story
  2007-02-16 12:44 ` Brendan Halpin
@ 2007-02-16 16:22 ` Stefan Monnier
  2007-02-16 17:00   ` Endless Story
  1 sibling, 1 reply; 22+ messages in thread
From: Stefan Monnier @ 2007-02-16 16:22 UTC (permalink / raw)
  To: help-gnu-emacs

> My apologies as I know this topic has been covered before, but despite
> much Googling I haven't found a simple, direct solution to the
> problem.

> I have just started seeing lots of nasty stuff like \222 instead of
> apostrophes in working on text files in Emacs on XP, then trying to
> reformat these files for LaTeX. My guess is I picked this garbage up
> when I copy-and-pasted some material out of Word, though I'm not
> certain. Emacs on XP displays these characters fine when I am just
> working with a .txt or .org file - which is why I didn't pick up on
> the problem earlier.

> I am looking for a simple function or .el package that will remove the
> garbage and replace it with latin1 encoding or whatever is acceptable
> to LaTeX. I have done lots of Googling but haven't come up with
> anything that doesn't refer to gnus (which I don't use) or which
> doesn't get me lost in talk of this encoding system or that. A search-
> and-replace function will do fine - and I will even write it myself if
> someone gives me a tip on how to get started (again, I'm not a lisp
> expert).

An alternative solution to the one you request might be to open the file
using the `windows-1252' coding-system rather than `latin-1'.

	C-x RET c windows-1252 RET C-x C-f <thefile> RET

or if you have a recent Emacs, you can just revert the buffer with
a different coding system:

	C-x RET r windows-1252 RET
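
The same idea can be expressed in Lisp.  What follows is only a minimal
sketch, not a tested recipe: the names my-find-file-as-cp1252 and
my-decode-cp1252-bytes are made up for illustration, and it assumes an
Emacs that knows the `windows-1252' coding system.  Once the file is
decoded this way, byte \222 is no longer a stray 8-bit code but a real
right single quotation mark (U+2019) that can be searched for and
replaced like any other character.

```elisp
;; Hypothetical helpers; the names are illustrative, not part of Emacs.
(defun my-find-file-as-cp1252 (file)
  "Visit FILE, decoding its contents as windows-1252."
  (interactive "fFile: ")
  ;; Binding `coding-system-for-read' overrides coding-system detection
  ;; for the duration of this call only.
  (let ((coding-system-for-read 'windows-1252))
    (find-file file)))

(defun my-decode-cp1252-bytes (bytes)
  "Return the unibyte string BYTES decoded as windows-1252."
  (decode-coding-string bytes 'windows-1252))
```

Decoding does not remove the characters; it only makes them proper
characters, so a replacement step is still needed before LaTeX sees
the text.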


-- Stefan


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
       [not found]   ` <mailman.4605.1171641752.2155.help-gnu-emacs@gnu.org>
@ 2007-02-16 16:59     ` Brendan Halpin
  2007-02-16 22:22       ` ken
  0 siblings, 1 reply; 22+ messages in thread
From: Brendan Halpin @ 2007-02-16 16:59 UTC (permalink / raw)
  To: help-gnu-emacs

ken <gebser@speakeasy.net> writes:

> But something is not right.  Can anyone tell where the problem
> is?  Running this function I get no errors, but the garbage/mschars are
> not replaced.

The problem may be that in passing through usenet and from one
machine to another, the "garbage" characters got garbled -- in your
version I see things that look like "Â\200" which indicates
problems with multibyte characters. 

You could try deleting them and entering them again: to enter \200,
for instance, do C-q 2 0 0 RET.

Brendan
-- 
Brendan Halpin,  Department of Sociology,  University of Limerick,  Ireland
Tel: w +353-61-213147 f +353-61-202569 h +353-61-338562; Room F2-025 x 3147
mailto:brendan.halpin@ul.ie  http://www.ul.ie/sociology/brendan.halpin.html


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-16 16:22 ` Stefan Monnier
@ 2007-02-16 17:00   ` Endless Story
  2007-02-16 22:00     ` Radamanthe
  0 siblings, 1 reply; 22+ messages in thread
From: Endless Story @ 2007-02-16 17:00 UTC (permalink / raw)
  To: help-gnu-emacs

On Feb 16, 11:22 am, Stefan Monnier <monn...@iro.umontreal.ca> wrote:
> An alternative solution to the one you request might be to open the file
> using the `windows-1252' coding-system rather than `latin-1'.
>
>         C-x RET c windows-1252 RET C-x C-f <thefile> RET
>
> or if you have a recent Emacs, you can just revert the buffer with
> a different coding system:
>
>         C-x RET r windows-1252 RET

This does nothing to address the situation I am talking about, where
these bad characters need to be removed altogether so they will not
foul up text files handed to LaTeX. Also Sebastian's suggestion of
(standard-display-ascii ?\200 (vector (decode-char 'ucs #x253c))),
etc., does nothing either. As far as I can see both of these
approaches merely permit the "correct" display of \222 etc. inside
Emacs. That's not my problem!


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-16 17:00   ` Endless Story
@ 2007-02-16 22:00     ` Radamanthe
  0 siblings, 0 replies; 22+ messages in thread
From: Radamanthe @ 2007-02-16 22:00 UTC (permalink / raw)
  To: help-gnu-emacs

Endless Story wrote:
> On Feb 16, 11:22 am, Stefan Monnier <monn...@iro.umontreal.ca> wrote:
>> An alternative solution to the one you request might be to open the file
>> using the `windows-1252' coding-system rather than `latin-1'.
>>
>>         C-x RET c windows-1252 RET C-x C-f <thefile> RET
>>
>> or if you have a recent Emacs, you can just revert the buffer with
>> a different coding system:
>>
>>         C-x RET r windows-1252 RET
> 
> This does nothing to address the situation I am talking about, where
> these bad characters need to be removed altogether so they will not
> foul up text files handed to LaTeX. Also Sebastian's suggestion of
> (standard-display-ascii ?\200 (vector (decode-char 'ucs #x253c))),
> etc., does nothing either. As far as I can see both of these
> approaches merely permit the "correct" display of \222 etc. inside
> Emacs. That's not my problem!
> 

Emacs displays \222 in place of the character because it did not find
that character in the font (btw, \222 = 0x92 = 146, which is not a
"visual" character), so it displays the character as its octal code.
If you move your cursor over it, you'll see that the whole \222 is a
single character.
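
The equivalence of the three notations can be checked in the
*scratch* buffer.  This is just an illustrative sketch; the helper
name my-octal-escape is hypothetical and not something Emacs provides.

```elisp
;; Hypothetical helper: format a character code the way Emacs shows an
;; undisplayable 8-bit character, as a backslash plus three octal digits.
(defun my-octal-escape (code)
  "Return CODE formatted as a backslash followed by three octal digits."
  (format "\\%03o" code))

;; Three spellings of the same number: octal escape, hex, decimal.
(= ?\222 #x92 146)

;; The printed form of character code 146.
(my-octal-escape #x92)
```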

Tell us: what should Emacs do to represent a character it doesn't have
a glyph for? :)


-- 
R.N.


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-16 16:59     ` Brendan Halpin
@ 2007-02-16 22:22       ` ken
  0 siblings, 0 replies; 22+ messages in thread
From: ken @ 2007-02-16 22:22 UTC (permalink / raw)
  To: GNU Emacs List


On 02/16/2007 11:59 AM somebody named Brendan Halpin wrote:
> ken <gebser@speakeasy.net> writes:
> 
> The problem may be that in passing through usenet and from one
> machine to another, the "garbage" characters got garbled -- in your
> version I see things that look like "Â\200" which indicates
> problems with multibyte characters. 
> 
> You could try deleting them and entering them again: to enter \200,
> for instance, do C-q 2 0 0 RET.

Cutting and pasting from an email, I get this (which in the email is an
em-dash):

—

Hmmm.  Funny.  When I cut-n-paste the string of chars from emacs back
into this Tbird compose window, it looks like an em-dash again.  Here's
what C-x = says it is (one character at a time):

ESC % G â \200 \224 ESC % @

How do I get emacs to understand that, i.e., what should I search for
in order to replace it?

tnx++,
ken


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-16 12:44 ` Brendan Halpin
  2007-02-16 16:02   ` ken
       [not found]   ` <mailman.4605.1171641752.2155.help-gnu-emacs@gnu.org>
@ 2007-02-17  1:47   ` Stefan Monnier
  2007-02-17 12:06     ` ken
       [not found]     ` <mailman.4653.1171713988.2155.help-gnu-emacs@gnu.org>
  2007-02-19 14:17   ` ken
       [not found]   ` <mailman.4721.1171894691.2155.help-gnu-emacs@gnu.org>
  4 siblings, 2 replies; 22+ messages in thread
From: Stefan Monnier @ 2007-02-17  1:47 UTC (permalink / raw)
  To: help-gnu-emacs

>   (while (re-search-forward "[€-Ÿ]" nil t)
>     (let ((mschar (buffer-substring-no-properties 
>                    (match-beginning 0) (match-end 0))))
>       (cond 
>        ((string= mschar "‘") (replace-match "`" )) 
>        ((string= mschar "’") (replace-match "'" )) 
>        ((string= mschar "“") (replace-match "``")) 
>        ((string= mschar "”") (replace-match "''")) 
>        ((string= mschar "–") (replace-match "--")))))

Better work on chars rather than strings of one-char.  Also better not use
those special chars that are sometimes displayed as \200 and use the \
2 0 0 escape sequence instead:

   (require 'cl)
   (defun my-fun-foo ()
     (interactive)
     (goto-char (point-min))
     (while (re-search-forward "[\200-\237]" nil t)
       (case (char-before)
        (?\221 (replace-match "`" ))
        (?\222 (replace-match "'" ))
        (?\223 (replace-match "``"))
        (?\224 (replace-match "''"))
        (?\226 (replace-match "--")))))


-- Stefan


PS: Guaranteed 100% untested.
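
For a buffer that has already been decoded (for instance by reading
the file as windows-1252), a variation on the same loop can match the
Unicode characters themselves instead of the raw 8-bit codes.  The
following is a sketch under that assumption, not the code posted
above: the names my-smart-quote-replacements and
my-replace-smart-quotes are hypothetical, and mapping the em dash to
"---" is my own choice based on the usual LaTeX convention.

```elisp
;; Hypothetical table: character code -> ASCII text suitable for LaTeX.
(defvar my-smart-quote-replacements
  '((#x2018 . "`")     ; left single quotation mark  (cp1252 \221)
    (#x2019 . "'")     ; right single quotation mark (cp1252 \222)
    (#x201C . "``")    ; left double quotation mark  (cp1252 \223)
    (#x201D . "''")    ; right double quotation mark (cp1252 \224)
    (#x2013 . "--")    ; en dash                     (cp1252 \226)
    (#x2014 . "---"))  ; em dash                     (cp1252 \227)
  "Alist mapping typographic characters to ASCII replacements.")

(defun my-replace-smart-quotes ()
  "Replace typographic quotes and dashes in the buffer with ASCII text."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "[\u2018\u2019\u201C\u201D\u2013\u2014]" nil t)
      ;; After a successful search, point is just past the match, so
      ;; `char-before' is the character that matched.
      (let ((new (cdr (assq (char-before) my-smart-quote-replacements))))
        (when new
          ;; FIXEDCASE and LITERAL are both t: insert NEW verbatim.
          (replace-match new t t))))))
```

Keeping the mapping in one alist means a new character only needs a
new entry plus its code in the regexp.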


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-16 16:14     ` Sebastian P. Luque
@ 2007-02-17 11:04       ` ken
  0 siblings, 0 replies; 22+ messages in thread
From: ken @ 2007-02-17 11:04 UTC (permalink / raw)
  To: help-gnu-emacs


On 02/16/2007 11:14 AM somebody named Sebastian P. Luque wrote:
> On Fri, 16 Feb 2007 11:02:23 -0500,
> ken <gebser@speakeasy.net> wrote:
> 
> [...]
> 
>> This allows, or should allow, the binding of the function to a key
>> chord.  But something is not right.  Can anyone tell where the problem
>> is?  Running this function I get no errors, but the garbage/mschars are
>> not replaced.
> 
> I can't tell what is wrong with that code, but I've solved those issues
> with this (I lost track where I stole this from):
> 
> 
> (standard-display-ascii ?\200 (vector (decode-char 'ucs #x253c)))
> (standard-display-ascii ?\201 (vector (decode-char 'ucs #x251c)))
> (standard-display-ascii ?\202 (vector (decode-char 'ucs #x252c)))
> (standard-display-ascii ?\203 (vector (decode-char 'ucs #x250c)))
> (standard-display-ascii ?\204 (vector (decode-char 'ucs #x2524)))
> (standard-display-ascii ?\205 (vector (decode-char 'ucs #x2502)))
> (standard-display-ascii ?\206 (vector (decode-char 'ucs #x2510)))
> (standard-display-ascii ?\210 (vector (decode-char 'ucs #x2534)))
> (standard-display-ascii ?\211 (vector (decode-char 'ucs #x2514)))
> (standard-display-ascii ?\212 (vector (decode-char 'ucs #x2500)))
> (standard-display-ascii ?\214 (vector (decode-char 'ucs #x2518)))
> (standard-display-ascii ?\220 [? ])
> (standard-display-ascii ?\221 [?\` ])
> (standard-display-ascii ?\222 [?\'])
> (standard-display-ascii ?\223 [?\"])
> (standard-display-ascii ?\224 [?\"])
> (standard-display-ascii ?\225 "* ")
> (standard-display-ascii ?\226 "--")
> (standard-display-ascii ?\227 " -- ")

Thanks, Seb.  As I understand this code (and, frankly, I don't totally),
it will handle single-character substrings only.  MS and other files
frequently contain multi-byte characters which appear in emacs as a
string of garbage characters.  (See example in my previous post.)
While of course it's best to write code which doesn't exceed the
capabilities of the programming language, it's pretty much a mark of a
mature programming language that it has the capability to
programmatically search-and-replace multi-byte strings in a file, even
should the sought- and replacement-strings be of different lengths.  In
fact, I made just this sort of function a homework assignment in my C
Programming 101 class many years ago.  So it's rather mystifying to me,
especially given the prevalence of the need for such functionality, that
such a trivial bit of programming is such a challenge in elisp.  Maybe
I'll take another run at it later.

Best,
ken


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-17  1:47   ` Stefan Monnier
@ 2007-02-17 12:06     ` ken
       [not found]     ` <mailman.4653.1171713988.2155.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 22+ messages in thread
From: ken @ 2007-02-17 12:06 UTC (permalink / raw)
  To: help-gnu-emacs


On 02/16/2007 08:47 PM somebody named Stefan Monnier wrote:
>>   (while (re-search-forward "[€-Ÿ]" nil t)
>>     (let ((mschar (buffer-substring-no-properties 
>>                    (match-beginning 0) (match-end 0))))
>>       (cond 
>>        ((string= mschar "‘") (replace-match "`" )) 
>>        ((string= mschar "’") (replace-match "'" )) 
>>        ((string= mschar "“") (replace-match "``")) 
>>        ((string= mschar "”") (replace-match "''")) 
>>        ((string= mschar "–") (replace-match "--")))))
> 
> Better work on chars rather than strings of one-char.  Also better not use
> those special chars that are sometimes displayed as \200 and use the \
> 2 0 0 escape sequence instead:
> 
>    (require 'cl)
>    (defun my-fun-foo ()
>      (interactive)
>      (goto-char (point-min))
>      (while (re-search-forward "[\200-\237]" nil t)
>        (case (char-before)
>         (?\221 (replace-match "`" ))
>         (?\222 (replace-match "'" ))
>         (?\223 (replace-match "``"))
>         (?\224 (replace-match "''"))
>         (?\226 (replace-match "--")))))
> 
> 
> -- Stefan
> 
> 
> PS: Guaranteed 100% untested.

Stefan,

Technically you're correct.  It probably takes a lot less executable
code to specify a char than a string consisting of one byte.  However, I try to
make life easier for the programmer (me and, in an opensource world,
everyone else) by making the code as simple as possible.  The code
written should also accomplish what the user wants it to.  These
considerations more than overwhelm any pity I might have for the CPU.

Moreover, MS files often contain "characters" such as "—", their
extraordinary rendition of an em-dash.  If elisp is to
search-and-replace this (multi-byte) "character", it must use (else
develop) a function which understands strings.

True, the elisp code could use the more efficient code when searching
for a single-byte character, but for the sake of uniformity and to make
modification of the code easier, the less efficient code is preferable.
 Moreover, coding efforts to increase efficiency are typically secondary
to those which result in code that works.  And we don't have that yet.


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
       [not found]     ` <mailman.4653.1171713988.2155.help-gnu-emacs@gnu.org>
@ 2007-02-17 17:29       ` Stefan Monnier
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Monnier @ 2007-02-17 17:29 UTC (permalink / raw)
  To: help-gnu-emacs

> Moreover, MS files often contain "characters" such as "—", their
> extraordinary rendition of an em-dash.

Luckily I have no idea what you're talking about.

> True, the elisp code could use the more efficient code when searching
> for a single-byte character, but for the sake of uniformity and to make
> modification of the code easier, the less efficient code is preferable.
>  Moreover, coding efforts to increase efficiency are typically secondary
> to those which result in code that works.  And we don't have that yet.

There's of course no perfect answer.  My main change was to go from the
single special char \200 to the \ 2 0 0 escape sequence, which should make
things more reliable.

The change from strings to char, was just a particular decision which made
sense in this particular context, but of course, it's not always better.


        Stefan


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-16 12:44 ` Brendan Halpin
                     ` (2 preceding siblings ...)
  2007-02-17  1:47   ` Stefan Monnier
@ 2007-02-19 14:17   ` ken
  2007-02-19 16:28     ` Shanks N
       [not found]   ` <mailman.4721.1171894691.2155.help-gnu-emacs@gnu.org>
  4 siblings, 1 reply; 22+ messages in thread
From: ken @ 2007-02-19 14:17 UTC (permalink / raw)
  To: help-gnu-emacs


> "Endless Story" <usable.thought@gmail.com> writes:
> 
>> I have just started seeing lots of nasty stuff like \222 instead of
>> apostrophes in working on text files in Emacs on XP, then trying to
>> reformat these files for LaTeX. 
> 
> ....

The code below is not textbook elisp, but it works and is easy to
understand, so it is also easy to modify and to extend with other
"characters".  For example, if you have a typical set of chars which
signals the beginning of a paragraph (like "\n\n"), you could insert
another replace-string line to convert that to the appropriate LaTeX
(or HTML or whatever) coding for "paragraph".  Such a set of
replacements might be better organized into a separate (but similar)
function, however.  Open Source == Your Choice.

(defun replace-garbage-chars ()
  "Replace goofy MS and other garbage characters with latin1 equivalents."
  (interactive)
  (save-excursion                       ;save the current point
    (replace-string "—" "--" nil (point-min) (point-max)) ; multi-byte
    (replace-string "‘" "`" nil (point-min) (point-max))
    (replace-string "’" "'" nil (point-min) (point-max))
    (replace-string "“" "``" nil (point-min) (point-max))
    (replace-string "”" "''" nil (point-min) (point-max))
    (replace-string "–" "--" nil (point-min) (point-max))))

Note that chars/strings within the first set of double-quotes in each
pair of replace-string args appear in emacs as, e.g., "\221".  To enter
these escaped numbers, e.g. "\221", do C-q 2 2 1 RETURN.

Also, multi-byte strings such as the first should be toward
the top of the list so that single-byte replacements don't
cut them up, making subsequent searches for them impossible.

To discover the code for a new (garbage) char to be replaced,
put the point over it and do "C-x="; the first code returned in
the minibuffer tells you the escaped number you want to replace.
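
The same lookup can be done from Lisp.  This is a small sketch only;
the command name my-char-code-at-point is hypothetical, and it simply
reports what `char-after' returns in decimal, octal and hex.

```elisp
;; Hypothetical command: report the code of the character at point.
(defun my-char-code-at-point ()
  "Show and return the code of the character at point.
Return nil when point is at the end of the buffer."
  (interactive)
  (let ((c (char-after)))
    (if c
        (message "%c: decimal %d, octal %o, hex %x" c c c c)
      (message "No character at point"))
    c))
```

The octal figure is the one that corresponds to the \NNN form Emacs
displays for an 8-bit character.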

With this function in a file in a directory on the Emacs load path and this in
my ~/.emacs:

(global-set-key "\C-cr" 'replace-garbage-chars)

doing C-cr in an emacs buffer performs the replacements without moving
the point... exactly what I was looking for.


Enjoy,
ken


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-19 14:17   ` ken
@ 2007-02-19 16:28     ` Shanks N
  2007-02-19 18:48       ` ken
  0 siblings, 1 reply; 22+ messages in thread
From: Shanks N @ 2007-02-19 16:28 UTC (permalink / raw)
  To: help-gnu-emacs

ken <gebser@speakeasy.net> writes:

>> "Endless Story" <usable.thought@gmail.com> writes:
>> 
>>> I have just started seeing lots of nasty stuff like \222 instead of
>>> apostrophes in working on text files in Emacs on XP, then trying to
>>> reformat these files for LaTeX. 
>> 
>> ....
>

[...]

> (defun replace-garbage-chars ()
> "Replace goofy MS and other garbage characters with latin1 equivalents."
> (interactive)
> (save-excursion				;save the current point
>   (replace-string "—" "--" nil (point-min) (point-max)) ; multi-byte
>   (replace-string "‘" "`" nil (point-min) (point-max))
>   (replace-string "’" "'" nil (point-min) (point-max))
>   (replace-string "“" "``" nil (point-min) (point-max))
>   (replace-string "”" "''" nil (point-min) (point-max))
>   (replace-string "–" "--" nil (point-min) (point-max))
> ))
>
> Note that chars/strings within the first set of double-quotes in each
> pair of replace-string args appear in emacs as, e.g., "\221".  To enter
> these escaped numbers, e.g. "\221", do C-q 2 2 1 RETURN.
>
> Also, multi-byte strings such as the first should be toward
> the top of the list so that single-byte replacements don't
> cut them up, making subsequent searches for them impossible.
>
> To discover the code for a new (garbage) char to be replaced,
> put the point over it and do "C-x="; the first code returned in
> the minibuffer tells you the escaped number you want to replace.
>
> With this function in a file in directory in the emacs path and this in
> my ~/.emacs:
>
> (global-set-key "\C-cr" 'replace-garbage-chars)
>
> doing C-cr in an emacs buffer performs the replacements without moving
> the point... exactly what I was looking for.
>
>
> Enjoy,
> ken

Can this or one of the other solutions on this thread be posted on the
emacswiki site, if you have the time?  It'd help others.

regards,
Shanks

-- 


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-19 16:28     ` Shanks N
@ 2007-02-19 18:48       ` ken
  0 siblings, 0 replies; 22+ messages in thread
From: ken @ 2007-02-19 18:48 UTC (permalink / raw)
  To: help-gnu-emacs


On 02/19/2007 11:28 AM somebody named Shanks N wrote:
> ken <gebser@speakeasy.net> writes:
> 
>>> "Endless Story" <usable.thought@gmail.com> writes:
>>>
>>>> I have just started seeing lots of nasty stuff like \222 instead of
>>>> apostrophes in working on text files in Emacs on XP, then trying to
>>>> reformat these files for LaTeX. 
>>> ....
> 
> [...]
> 
>> (defun replace-garbage-chars ()
>> "Replace goofy MS and other garbage characters with latin1 equivalents."
>> ....
> 
> Can this or one of the other solutions on this thread be posted on the
> emacswiki site, if you have the time?  It'd help others.
> 
> regards,
> Shanks
> 

Good idea, Shanks!  Thanks.  Did that:
<http://www.emacswiki.org/cgi-bin/emacs-en/ReplaceGarbageChars>

ken


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
       [not found]   ` <mailman.4721.1171894691.2155.help-gnu-emacs@gnu.org>
@ 2007-02-20  2:00     ` Stefan Monnier
  2007-02-20 10:45       ` ken
  2007-02-21  9:29     ` Endless Story
  1 sibling, 1 reply; 22+ messages in thread
From: Stefan Monnier @ 2007-02-20  2:00 UTC (permalink / raw)
  To: help-gnu-emacs

>   (replace-string "—" "--" nil (point-min) (point-max)) ; multi-byte
[...]
> Also, multi-byte strings such as the first should be toward
> the top of the list so that single-byte replacements don't
> cut them up, making subsequent searches for them impossible.

Huh?  There is no overlap between the single-char string "—" and the single
char strings like "\221".  You can reorder them all you like.

I'm not sure what you mean by "multi-byte", but it sounds like there might
be some confusion here.


        Stefan


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-20  2:00     ` Stefan Monnier
@ 2007-02-20 10:45       ` ken
  2007-02-20 10:56         ` Juanma Barranquero
  0 siblings, 1 reply; 22+ messages in thread
From: ken @ 2007-02-20 10:45 UTC (permalink / raw)
  To: help-gnu-emacs

On 02/19/2007 09:00 PM somebody named Stefan Monnier wrote:
>>   (replace-string "—" "--" nil (point-min) (point-max)) ; multi-byte
> [...]
>> Also, multi-byte strings such as the first should be toward
>> the top of the list so that single-byte replacements don't
>> cut them up, making subsequent searches for them impossible.
> 
> Huh?  There is no overlap between the single-char string "—" and the single
> char strings like "\221".  You can reorder them all you like.
> 
> I'm not sure what you mean by "multi-byte", but it sounds like there might
> be some confusion here.
> 
> 
>         Stefan

When I copy-n-paste "—" into an emacs buffer, one of the (several) bytes
displayed is represented by '\224'.  I would assume that others will
find the same.  So then of course if I replace that byte, a subsequent
search for "—"  will fail... and leave behind the other garbage characters.

Have you tried to copy-n-paste "—" into an emacs buffer?  What do you
get?


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-20 10:45       ` ken
@ 2007-02-20 10:56         ` Juanma Barranquero
  2007-02-20 12:46           ` ken
  0 siblings, 1 reply; 22+ messages in thread
From: Juanma Barranquero @ 2007-02-20 10:56 UTC (permalink / raw)
  To: ken; +Cc: help-gnu-emacs


On 2/20/07, ken <gebser@speakeasy.net> wrote:

> When I copy-n-paste "—" into an emacs buffer, one of the (several) bytes
> displayed is represented by '\224'.

In a unibyte buffer, you mean? Does it appear correctly if you do

  M-: (set-buffer-multibyte t) <RET>

?

> Have you tried to copy-n-paste "—" into an emacs buffer?
> What do you get?

I get an em dash (U+2014). I'm starting Emacs in multibyte mode (the default).
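
One way to see where a sequence like "â \200 \224" can come from is to
look at the UTF-8 encoding of the em dash.  This is only an
illustrative sketch (the helper name my-utf8-bytes is hypothetical),
and it assumes the pasted text arrived as UTF-8 bytes that were never
decoded.

```elisp
;; Hypothetical helper: list the bytes of a string's UTF-8 encoding.
(defun my-utf8-bytes (string)
  "Return the list of byte values of STRING encoded as UTF-8."
  (append (encode-coding-string string 'utf-8) nil))

;; U+2014 encodes to three bytes: #xE2 #x80 #x94.  Shown one byte at a
;; time in a latin-1 context these are â, \200 and \224.
(my-utf8-bytes "\u2014")

;; Decoding the same three bytes as UTF-8 gives back one character.
(decode-coding-string (unibyte-string #xE2 #x80 #x94) 'utf-8)
```

Under that assumption the three "garbage" bytes are one character seen
undecoded, which is why replacing \224 on its own would break the
sequence.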

             Juanma


* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-20 10:56         ` Juanma Barranquero
@ 2007-02-20 12:46           ` ken
  2007-02-20 13:07             ` Juanma Barranquero
  0 siblings, 1 reply; 22+ messages in thread
From: ken @ 2007-02-20 12:46 UTC (permalink / raw)
  To: help-gnu-emacs

On 02/20/2007 05:56 AM somebody named Juanma Barranquero wrote:
> On 2/20/07, ken <gebser@speakeasy.net> wrote:
> 
>> When I copy-n-paste "—" into an emacs buffer, one of the (several) bytes
>> displayed is represented by '\224'.
> 
> In a unibyte buffer, you mean? Does it appear correctly if you do
> 
>  M-: (set-buffer-multibyte t) <RET>
> 
> ?
> 
>> Have you tried to copy-n-paste "—" into an emacs buffer?
>> What do you get?
> 
> I get an em dash (U+2014). I'm starting Emacs in multibyte mode (the
> default).
> 
>             Juanma

The OP (original poster) on this thread was having to deal with MS
characters appearing in a buffer with latin1 encoding.  I was having the
same issue.  So the solution I was offering was intended for others (and
myself) who wish to retain their files in latin1 encoding.

To be sure, it's possible to save a buffer to file in another encoding.
Indeed, I'm often prompted for this when saving a buffer to file.
Since I prefer to retain files in a format which will be readable by the
most people possible, I prefer to use latin1 encoding when- and wherever
feasible.  More importantly, the original questioner was asking for a
way to convert MS characters for use in latin1.
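
For readers who want to stay in latin1, a replacement pass along these lines might do. This is only a sketch, not the code posted earlier in the thread: the function name and the replacement table are assumptions, the stand-ins follow LaTeX conventions (`` and '' for double quotes, -- and --- for dashes), and it assumes a multibyte buffer in which the stray Windows-1252 bytes show up either as the characters U+0091 to U+0097 or as their proper Unicode equivalents.

```elisp
(defun my-cp1252-punctuation-to-ascii ()
  "Replace common Windows-1252 punctuation in the buffer with ASCII.
Hypothetical helper; extend the table to taste."
  (interactive)
  (let ((table '((#x91 . "`")    (#x92 . "'")     ; single quotes as raw C1 codes
                 (#x93 . "``")   (#x94 . "''")    ; double quotes as raw C1 codes
                 (#x96 . "--")   (#x97 . "---")   ; en and em dash as raw C1 codes
                 (#x2018 . "`")  (#x2019 . "'")   ; the same characters when the
                 (#x201C . "``") (#x201D . "''")  ; text was decoded properly
                 (#x2013 . "--") (#x2014 . "---"))))
    (save-excursion
      (dolist (entry table)
        (goto-char (point-min))
        ;; Each key is a single character, so the order of entries
        ;; does not affect the result.
        (while (search-forward (string (car entry)) nil t)
          (replace-match (cdr entry) t t))))))
```

After running it with M-x my-cp1252-punctuation-to-ascii, the buffer should contain only characters that latin1 can encode, provided no other non-latin1 characters are present.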

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
  2007-02-20 12:46           ` ken
@ 2007-02-20 13:07             ` Juanma Barranquero
  0 siblings, 0 replies; 22+ messages in thread
From: Juanma Barranquero @ 2007-02-20 13:07 UTC (permalink / raw)
  To: ken; +Cc: help-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 672 bytes --]

On 2/20/07, ken <gebser@speakeasy.net> wrote:

> So the solution I was offering was intended for others (and
> myself) who wish to retain their files in latin1 encoding.

I don't know what solution you were offering. I was specifically
answering this question:

 "Have you tried to copy-n-paste "—" into an emacs buffer?  What do you get?"

And the answer is: in a multibyte buffer (latin-1, utf-8, etc), em dash.

> Since I prefer to retain files in a format which will be readable by the
> most people possible, I prefer to use latin1 encoding when- and wherever
> feasible.

Then you obviously cannot save U+2014, which is not a Latin-1 character.

             Juanma
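
One way to check this claim from inside Emacs, sketched with the built-in `unencodable-char-position'. When given a string as its fifth argument, it returns the index of the first character the coding system cannot encode, or nil if everything fits. The sample strings are illustrative only.

```elisp
;; U+2014 EM DASH cannot be encoded in Latin-1: returns a non-nil index.
(unencodable-char-position 0 1 'iso-latin-1 nil "\u2014")

;; The same character is fine in UTF-8: returns nil.
(unencodable-char-position 0 1 'utf-8 nil "\u2014")

;; U+00E9 (e with acute accent) is part of Latin-1: returns nil.
(unencodable-char-position 0 1 'iso-latin-1 nil "\u00e9")
```

Called without the string argument, the same function scans a region of the current buffer, which is a quick way to find what is blocking a save in latin1.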

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
       [not found]   ` <mailman.4721.1171894691.2155.help-gnu-emacs@gnu.org>
  2007-02-20  2:00     ` Stefan Monnier
@ 2007-02-21  9:29     ` Endless Story
  1 sibling, 0 replies; 22+ messages in thread
From: Endless Story @ 2007-02-21  9:29 UTC (permalink / raw)
  To: help-gnu-emacs

On Feb 19, 9:17 am, ken <geb...@speakeasy.net> wrote:
> The below is not textbook elisp, but it works, is easily understandable,
> and so too is easy to modify and add other "characters" to.  For
> example, if you have a typical set of chars which signal the beginning
> of paragraph (like "\n\n"), you could insert another replace-string line
> to convert that to the appropriate LaTeX (or HTML or whatever) coding
> for "paragraph".  Such a set of replacements might be better organized
> into a separate (but similar) function however.  Open Source == Your Choice.

Thanks for this, Ken.

I have already started using Stefan's version, and it's working well
for me - haven't had any problems so far with extra garbage characters
not being found & fixed. I can't follow the differences in the code
closely enough to know whose version is "better," but from my point of
view it's great that both of you came up with recipes. And thanks for
posting yours on the Wiki, too.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2007-02-21  9:29 UTC | newest]

Thread overview: 22+ messages
2007-02-16 12:19 How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe? Endless Story
2007-02-16 12:44 ` Brendan Halpin
2007-02-16 16:02   ` ken
2007-02-16 16:14     ` Sebastian P. Luque
2007-02-17 11:04       ` ken
     [not found]   ` <mailman.4605.1171641752.2155.help-gnu-emacs@gnu.org>
2007-02-16 16:59     ` Brendan Halpin
2007-02-16 22:22       ` ken
2007-02-17  1:47   ` Stefan Monnier
2007-02-17 12:06     ` ken
     [not found]     ` <mailman.4653.1171713988.2155.help-gnu-emacs@gnu.org>
2007-02-17 17:29       ` Stefan Monnier
2007-02-19 14:17   ` ken
2007-02-19 16:28     ` Shanks N
2007-02-19 18:48       ` ken
     [not found]   ` <mailman.4721.1171894691.2155.help-gnu-emacs@gnu.org>
2007-02-20  2:00     ` Stefan Monnier
2007-02-20 10:45       ` ken
2007-02-20 10:56         ` Juanma Barranquero
2007-02-20 12:46           ` ken
2007-02-20 13:07             ` Juanma Barranquero
2007-02-21  9:29     ` Endless Story
2007-02-16 16:22 ` Stefan Monnier
2007-02-16 17:00   ` Endless Story
2007-02-16 22:00     ` Radamanthe
