all messages for Emacs-related lists mirrored at yhetil.org
* fixing M$ character codes
@ 2004-07-03 16:00 nospam55
       [not found] ` <Jym.wzfz884ct0.fsf@econet.org>
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: nospam55 @ 2004-07-03 16:00 UTC (permalink / raw)


Hi! Help me and mankind!

I often have to manually correct downloaded text that originated in M$ software, i.e. I have to bother
with the eternal replacements

    \222 into '
    \213 into <
    etc etc

        , where the codes are taken from the well-known

               (defvar gnus-article-dumbquotes-map
                 '(("\202" ",")
                   ("\203" "f")
                   ("\204" ",,")
                   ("\205" "...")
                   ("\213" "<")
                   ("\214" "OE")
                   ("\221" "`")
                   ("\222" "'")
                   ("\223" "``")
                   ("\224" "\"")
                   ("\225" "*")
                   ("\226" "-")
                   ("\227" "--")
                   ("\231" "(TM)")
                   ("\233" ">")
                   ("\234" "oe")
                   ("\264" "'"))
                 "Table for MS-to-Latin1 translation.")

Now, the big question, whose answer I didn't find on the internet:

   is there a way to have Emacs fix this mess?

My dream is to have something like 

    M-x  fix-evil-empire-nonsense  RET

that automatically performs the replacements above on the selected region for us
poor humans.

Any package? Any defun ?

   Thank you :)



PS micro$oft-ware makes many people write

    \222 for ' apostrophe
    \202 for , comma
    \213 for < "less than"
    \233 for > "greater than"
    \224 for " double quote
    \225 for asterisk
    etc etc


without realizing it :(
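
A minimal sketch of what such a command could look like, using part of the gnus-article-dumbquotes-map table quoted above. The names my-dumbquote-alist and fix-evil-empire-nonsense are hypothetical (nothing in Emacs or Gnus defines them). It also assumes that the stray Windows-1252 bytes appear in the buffer as the characters U+0082 to U+009B, which is what happens when the text has been decoded as Latin-1; other decodings would need a different table.

```elisp
;; Sketch only: both names below are hypothetical, not part of Emacs or Gnus.
;; Assumption: the M$ bytes show up as characters U+0082..U+009B in the buffer.
(defvar my-dumbquote-alist
  '((#x82 . ",")      ; \202
    (#x85 . "...")    ; \205
    (#x8B . "<")      ; \213
    (#x91 . "`")      ; \221
    (#x92 . "'")      ; \222
    (#x93 . "``")     ; \223
    (#x94 . "\"")     ; \224
    (#x95 . "*")      ; \225
    (#x96 . "-")      ; \226
    (#x97 . "--")     ; \227
    (#x99 . "(TM)")   ; \231
    (#x9B . ">"))     ; \233
  "Alist of (CHARACTER . ASCII-REPLACEMENT), after `gnus-article-dumbquotes-map'.")

(defun fix-evil-empire-nonsense (beg end)
  "Replace M$ characters between BEG and END with ASCII equivalents."
  (interactive "r")
  (save-excursion
    (save-restriction
      ;; Restrict all searches to the selected region.
      (narrow-to-region beg end)
      (let ((case-fold-search nil))
        (dolist (pair my-dumbquote-alist)
          (goto-char (point-min))
          (while (search-forward (string (car pair)) nil t)
            ;; FIXEDCASE and LITERAL are both t: insert the text verbatim.
            (replace-match (cdr pair) t t)))))))
```

With a region selected, M-x fix-evil-empire-nonsense RET runs it without asking any questions.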


* Re: fixing M$ character codes
       [not found] ` <Jym.wzfz884ct0.fsf@econet.org>
@ 2004-07-04 13:19   ` nospam55
  0 siblings, 0 replies; 10+ messages in thread
From: nospam55 @ 2004-07-04 13:19 UTC (permalink / raw)



Thanks for the funcs, Jym; maybe I made some mistake using them,
but they misbehaved ...


* Re: fixing M$ character codes
  2004-07-03 16:00 fixing M$ character codes nospam55
       [not found] ` <Jym.wzfz884ct0.fsf@econet.org>
@ 2004-07-04 14:08 ` Jym Dyer
  2004-07-05 10:00   ` Haines Brown
  2004-07-04 21:06 ` Jesper Harder
  2004-07-05 14:22 ` nospam55
  3 siblings, 1 reply; 10+ messages in thread
From: Jym Dyer @ 2004-07-04 14:08 UTC (permalink / raw)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]

=v= I think ideally the code would parse headers to figure out
whether the brain damaged quotes are supposed to be ISO-Latin,
Windows-1252, UTF-8, or whatever.  But for now I just use a
sledgehammer and convert any and all needlessly-8bit characters
to their 7bit equivalents.

=v= The code I use is below.  I suppose someday I ought to make
them more comprehensive, but for now I just add what I need
along the way.  (Warning:  this converts all known quotes and
dashes to ASCII equivalents, but also converts centered dots to
asterisks, which isn't exactly an equivalent.)
    <_Jym_>


(defun jym.de8 ()
  "Turn 8bit characters into 7bit equivalents."
  (interactive)
  (mapcar
   (function (lambda (old_and_new)
    (save-excursion (apply 'query-replace old_and_new))))
   '(("­" "-")
     ("¹" "'")
     ("²" "''")
     ("³" "``")
     ("·" "*")
     ("…" "...")
     ("‹" "--")
     ("Œ" "`")
     ("‘" "`")
     ("“" "``")			; = 0x93
     ("”" "''")			; = 0x94
     ("•" "*")
     ("–" "-")			; = 0x96
     ("—" "--")			; = 0x97
     ("˜" "`")
     ("™" "'")
     ("œ" "``")
     ("" "''")
     ("â€" "") )))
  ;mapcar;
;defun jym.de8;

(defun jym.de8qp ()
  "Turn quoted printable 8bit into 7bit equivalents."
  (interactive)
  (mapcar
   (function (lambda (old_and_new)
    (save-excursion (apply 'query-replace old_and_new))))
   '(("=\n" "")
     ;("=E2=80=94" "--")
     ("=E2=80=99" "'")			; UTF-8
     ("=E2=80=9C" "``")			; UTF-8
     ("=E2=80=9D" "''")			; UTF-8
     ("=0D\n" "\n")			; = \r\n
     ("=20\n" "\n")
     ("=2E" ".")
     ("=3F" "?")
     ("=46" "F")
     ("=5B" "[")
     ("=5D" "]")
     ("=8B" "--")
     ("=8C" "`")
     ("=91" "`")
     ("=92" "'")
     ("=93" "``")			; = 0223
     ("=94" "''")
     ("=96" "-")			; = 0226
     ("=97" "--")			; = 0227
     ("=A0" " ")
     ("=A5" "'")
     ("=AD" "--")
     ("=AE" "\"")
     ("=B2" "``")
     ("=B3" "''")
     ("=B9" "'")) ))
  ;mapcar;
;defun jym.de8qp;


* Re: fixing M$ character codes
  2004-07-03 16:00 fixing M$ character codes nospam55
       [not found] ` <Jym.wzfz884ct0.fsf@econet.org>
  2004-07-04 14:08 ` Jym Dyer
@ 2004-07-04 21:06 ` Jesper Harder
  2004-07-05 14:22 ` nospam55
  3 siblings, 0 replies; 10+ messages in thread
From: Jesper Harder @ 2004-07-04 21:06 UTC (permalink / raw)


nospam55 <nospa@no.yahoo.no> writes:

> I often have to manually correct downloaded text that originated in
> M$ software, i.e. I have to bother with the eternal replacements,
> where the codes are taken from the well-known
>
>                (defvar gnus-article-dumbquotes-map
>
> is there a way to have Emacs fix this mess?
>
> My dream is to have something like 
>
>     M-x  fix-evil-empire-nonsense  RET

Sure, the command is called `M-x article-treat-dumbquotes'.  It works
in all buffers, not just the Gnus article buffer which it is intended
for.

You'll probably need to autoload it if you're using an infidel Usenet
client not approved by Gnus Towers and the Church of Emacs.
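
For readers who do not run Gnus, the autoload mentioned above might look like the sketch below. It assumes a stock Emacs in which gnus-art.el (the Gnus file that defines article-treat-dumbquotes) is on the load path; the docstring is only a placeholder that is shown until the real definition is loaded.

```elisp
;; Assumption: gnus-art.el ships with this Emacs and is on `load-path'.
;; The final t marks the function as an interactive command, so that
;; M-x article-treat-dumbquotes works before Gnus has been loaded.
(autoload 'article-treat-dumbquotes "gnus-art"
  "Translate M$ dumbquotes in the current buffer (placeholder docstring)." t)
```

This goes in the init file; Gnus itself is only loaded the first time the command is called.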

-- 
Jesper Harder                                <http://purl.org/harder/>


* Re: fixing M$ character codes
  2004-07-04 14:08 ` Jym Dyer
@ 2004-07-05 10:00   ` Haines Brown
  2004-07-05 10:19     ` Thomas Gehrlein
  2004-07-07 15:12     ` Jym Dyer
  0 siblings, 2 replies; 10+ messages in thread
From: Haines Brown @ 2004-07-05 10:00 UTC (permalink / raw)


Jym Dyer <jym@econet.org> writes:

> =v= I think ideally the code would parse headers to figure out
> whether the brain damaged quotes are supposed to be ISO-Latin,
> Windows-1252, UTF-8, or whatever.  But for now I just use a
> sledgehammer and convert any and all needlessly-8bit characters
> to their 7bit equivalents.
> 
> =v= The code I use is below.  I suppose someday I ought to make
> them more comprehensive, but for now I just add what I need
> along the way.  (Warning:  this converts all known quotes and
> dashes to ASCII equivalents, but also converts centered dots to
> asterisks, which isn't exactly an equivalent.)
>     <_Jym_>
> 
> 
> (defun jym.de8 ()
>   "Turn 8bit characters into 7bit equivalents."
>   (interactive)
>   (mapcar
>    (function (lambda (old_and_new)
>     (save-excursion (apply 'query-replace old_and_new))))
>    '(("­" "-")
>      ("¹" "'")
> ...

Jym,

As one who often has to process documents with 8-bit characters, your
lisp code was certainly welcome. But it does not seem to do anything.

I'm running emacs 21.2.1 and pasted the code you supplied into
~/.emacs, and reloaded emacs. If I open a test file that is filled
with these 8-bit characters, it is displayed in emacs without any
change. 

What am I doing wrong? 

-- 
      Haines Brown


* Re: fixing M$ character codes
  2004-07-05 10:00   ` Haines Brown
@ 2004-07-05 10:19     ` Thomas Gehrlein
  2004-07-07 15:12     ` Jym Dyer
  1 sibling, 0 replies; 10+ messages in thread
From: Thomas Gehrlein @ 2004-07-05 10:19 UTC (permalink / raw)


Haines Brown <brownh@teufel.hartford-hwp.com> writes:

> As one who often has to process documents with 8-bit characters, your
> lisp code was certainly welcome. But it does not seem to do anything.
>
> I'm running emacs 21.2.1 and pasted the code you supplied into
> ~/.emacs, and reloaded emacs. If I open a test file that is filled
> with these 8-bit characters, it is displayed in emacs without any
> change. 
>
> What am I doing wrong?

Did you try calling the function interactively?

M-x jym.de8 RET

Thomas


* Re: fixing M$ character codes
  2004-07-03 16:00 fixing M$ character codes nospam55
                   ` (2 preceding siblings ...)
  2004-07-04 21:06 ` Jesper Harder
@ 2004-07-05 14:22 ` nospam55
  3 siblings, 0 replies; 10+ messages in thread
From: nospam55 @ 2004-07-05 14:22 UTC (permalink / raw)





Thomas Gehrlein <thomas.gehrlein@t-online.de> wrote :
 
> Did you try calling the function interactively?
> 
> M-x jym.de8 RET


Yes, I think this is the correct way to use Jym's funcs,
 after selecting the text to convert
as the region; I think the problem is that
some funny 8-bit chars in the jym.de8 func body got lost on the
journey from Jym's HD to our HDs :(


For example, I got this piece

     ...
     ("" "`")
     ("" "``")                        ; = 0x93
     ("" "''")                        ; = 0x94
     ("" "*")
     ("" "-")                        ; = 0x96
     ("" "--")                        ; = 0x97
     ("" "`")
     ("" "'")
     ...

, with those strange empty strings "" .


Could you quote those chars for us, Jym? For example in octal,
writing \044 for the dollar sign, etc.?
We will type it all back in on our PCs.

   nospam55 :)
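
A possible way to produce such octal quoting automatically, so the characters survive the trip through mail, is sketched below. The function name my-octal-escape-string is hypothetical; it leaves ASCII untouched and writes every other character as a backslash followed by its octal code.

```elisp
;; Sketch only: `my-octal-escape-string' is a hypothetical helper name.
(defun my-octal-escape-string (str)
  "Return STR with each non-ASCII character written as a \\NNN octal escape."
  (mapconcat (lambda (ch)
               (if (< ch 128)
                   (string ch)              ; plain ASCII passes through
                 (format "\\%03o" ch)))     ; e.g. character #x93 gives \223
             str ""))
```

Applied to the strings in a replacement table before posting, this gives text that can be typed back in by hand.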


* Re: fixing M$ character codes
  2004-07-05 10:00   ` Haines Brown
  2004-07-05 10:19     ` Thomas Gehrlein
@ 2004-07-07 15:12     ` Jym Dyer
  2004-07-07 20:38       ` Haines Brown
  1 sibling, 1 reply; 10+ messages in thread
From: Jym Dyer @ 2004-07-07 15:12 UTC (permalink / raw)


> As one who often has to process documents with 8-bit
> characters, your lisp code was certainly welcome.
> But it does not seem to do anything.

=v= I type "Meta-X jym.de8" and it goes through a bunch of
query-replaces.  (Or "Meta-X jym.de8qp" for quoted-printable
buffers.)  Like I said, it's pretty much a sledgehammer and
someday I'll clean it up.  But it does the job.
    <_Jym_>


* Re: fixing M$ character codes
  2004-07-07 15:12     ` Jym Dyer
@ 2004-07-07 20:38       ` Haines Brown
  2004-07-15 16:59         ` Jym Dyer
  0 siblings, 1 reply; 10+ messages in thread
From: Haines Brown @ 2004-07-07 20:38 UTC (permalink / raw)


Jym Dyer <jym@econet.org> writes:

> > As one who often has to process documents with 8-bit
> > characters, your lisp code was certainly welcome.
> > But it does not seem to do anything.
> 
> =v= I type "Meta-X jym.de8" and it goes through a bunch of
> query-replaces.  (Or "Meta-X jym.de8qp" for quoted-printable
> buffers.)  Like I said, it's pretty much a sledgehammer and
> someday I'll clean it up.  But it does the job.
>     <_Jym_>

I think I've got myself straightened out, thanks.

One thing that confused me was that I was not sure how to define
"quoted printable." 
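
For reference, quoted-printable is the MIME transfer encoding in which a byte is written as "=" followed by two hex digits, and a "=" at the very end of a line marks a soft line break. That is why the jym.de8qp table contains entries such as "=3F" for "?" and "=\n" for nothing. A small illustration using the qp library that ships with Emacs (the input string is made up for the example):

```elisp
(require 'qp)  ; quoted-printable support bundled with Emacs

;; "=5B", "=5D" and "=3F" are the hex codes of "[", "]" and "?".
;; The "=" directly before the newline is a soft line break, so the
;; line break is removed and "so" and "ft" are joined.
(quoted-printable-decode-string "=5Bso=\nft=5D=3F")
```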

I'd also like to avoid being queried for each kind of replacement, as
happens with the ! command. 

-- 
      Haines Brown


* Re: fixing M$ character codes
  2004-07-07 20:38       ` Haines Brown
@ 2004-07-15 16:59         ` Jym Dyer
  0 siblings, 0 replies; 10+ messages in thread
From: Jym Dyer @ 2004-07-15 16:59 UTC (permalink / raw)


> I'd also like to avoid being queried for each kind
> of replacement, as happens with the ! command.

=v= Edit the code to use replace-string instead of
query-replace.  I use query-replace because, as I said,
it's a sledgehammer approach and I'm just culling the
replacements along the way, so I'm erring on the side
of caution.
    <_Jym_>
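
A sketch of a version that never asks is below. Instead of replace-string it uses a search-forward / replace-match loop, which is what the replace-string docstring itself recommends for Lisp code. The name my-replace-pairs is hypothetical, and the pairs in the usage line are a few of the quoted-printable entries from earlier in the thread. Like the original functions, it works from point to the end of the buffer.

```elisp
;; Sketch only: `my-replace-pairs' is a hypothetical helper name.
(defun my-replace-pairs (pairs)
  "Replace each (OLD NEW) of PAIRS from point to end of buffer, without asking."
  (let ((case-fold-search nil))
    (dolist (pair pairs)
      ;; Restart every search from the original point.
      (save-excursion
        (while (search-forward (car pair) nil t)
          ;; FIXEDCASE and LITERAL are both t: insert NEW verbatim.
          (replace-match (cadr pair) t t))))))

;; Usage, with point at the start of the text to clean up:
;; (my-replace-pairs '(("=92" "'") ("=93" "``") ("=94" "''")))
```

The price is exactly the one described above: every match is replaced, so a wrong entry in the table does damage silently.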

