From: Thomas Persson <thomas@spacecentre.se>
Subject: Re: How to convert .doc to plain text ascii in emacs.
Date: Sun, 02 May 2004 21:26:45 +0200
Message-ID: <87n04qr5oq.fsf@spacecentre.se>
In-Reply-To: Pine.LNX.4.44.0405021009480.7405-100000@localhost.localdomain

gebser@speakeasy.net writes:

> Thanks very much.  Your elisp works great.  There's one glitch (which I
> realize is from antiword):
>
> The three characters "\342\200\231" should be replaced by the single 
> apostrophe character (').

The fact that antiword and my code leave you with a buffer containing
numerical codes instead of the characters themselves is your first
problem. This doesn't happen for me at all. It's either a problem with
antiword or a problem with how Emacs displays characters. Try running
antiword from the command line to figure out which.
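
For what it's worth, the three codes look like the UTF-8 bytes of a
single character, U+2019 (the right single quotation mark), which would
point towards a decoding issue rather than wrong output. A small sketch
to check this from within Emacs; the helper name
my-utf-8-octets-to-string is made up for illustration:

```elisp
;; Hypothetical helper (not part of antiword-buffer): decode a unibyte
;; string of UTF-8 octets into the characters it encodes.
(defun my-utf-8-octets-to-string (octets)
  "Decode OCTETS, a unibyte string of UTF-8 bytes, into characters."
  (decode-coding-string octets 'utf-8))

;; The three octal escapes from the buffer, decoded as one unit.
(my-utf-8-octets-to-string "\342\200\231")
```

If evaluating the last form gives back a one-character string, the
bytes themselves are fine and only the decoding step is missing.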

> To do this by hand, I did M-x replace-regexp Return C-q 342 Return
> C-q 200 Return C-q 231 Return Return ' Return
>
> but this does not find the intended string.  The problem seems to be 
> that C-q 342 is immediately (in the minibuffer) converted into an 'a' 
> with a grave symbol over it.  Putting the point on the backslash (\) 
> preceding the 342 in the antiword-converted buffer and doing "C-u C-x =" 
> indeed shows this a-with-grave character to be (0342, 226, 0xe2).
>
> To create a simple test case, do the following:
>
> Open an empty *scratch* buffer.  Enter into it: C-q 342 Return C-q 200
> Return C-q 231 Return.  The first character that appears is the 
> a-with-grave; the second and third characters appear properly as 
> \200\231.  
>
> It is, I think, the failure of C-q 342 to be represented as \342 which 
> is the problem.  What is the solution?

Your difficulty replacing the numerical character codes with the
characters themselves is, however, definitely an Emacs-related
problem. As far as I can tell, it would work to add the
replace-regexp business to the end of the antiword-buffer function,
like this:


(defun antiword-buffer ()
  "Takes the current buffer as input to the external program antiword.

If the current buffer is an MS Word document, its contents are replaced
with the output from antiword and the extension `.doc' is replaced
with `.txt' in the buffer-file-name."
  (let ((txt-buffer-file-name (concat (substring (buffer-file-name) 0 -4)
				      ".txt")))
    (shell-command-on-region (point-min) (point-max)
			     "cat | antiword -" nil t nil)
    (undo-start)
    (if (equal (buffer-string) "- is not a Word Document.\n")
	(or (undo-more 1)
	    (message "%s - is not a Word Document." (current-buffer)))
      (set-visited-file-name txt-buffer-file-name)
      (not-modified)
      (replace-regexp "\342\200\231" "'"))))

;; The following expression makes sure that antiword-buffer is run when a
;; file with the .doc extension is opened.
(setq auto-mode-alist
      (append '(("\\.doc\\'" . antiword-buffer))
	      auto-mode-alist))
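
Since replace-regexp is documented as being meant for interactive use,
the replacement step could also be written as an explicit search loop.
This is only a sketch, and it assumes the buffer has already been
decoded so that the character is a real U+2019 rather than three raw
bytes; the name my-replace-right-quote is made up for illustration:

```elisp
;; Hypothetical helper: replace each right single quotation mark
;; (U+2019) in the current buffer with a plain ASCII apostrophe.
(defun my-replace-right-quote ()
  "Replace every U+2019 in the current buffer with an apostrophe."
  (save-excursion
    ;; Start from the top so matches before point are not skipped.
    (goto-char (point-min))
    (while (search-forward (string #x2019) nil t)
      ;; FIXEDCASE and LITERAL are both t: insert "'" exactly as given.
      (replace-match "'" t t))))
```

If the buffer still shows the raw \342\200\231 codes, calling
decode-coding-region on the whole buffer with utf-8 first might be
worth a try, though I have not tested that against antiword's output.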


If that doesn't work, then perhaps "wvWare" or "undoc.el", as previous
posters have suggested, might be better solutions for you.
