From: Tim X <timx@nospam.dev.null>
To: help-gnu-emacs@gnu.org
Subject: Re: Viewing PDFs as text?
Date: Tue, 11 Mar 2008 19:07:30 +1100
Message-ID: <87skyxela5.fsf@lion.rapttech.com.au>
In-Reply-To: 7599507d-11a4-4e93-974f-5d7aa924b152@e6g2000prf.googlegroups.com
Alan <lngndvs@gmail.com> writes:
> Less (via lesspipe, I think) is able to read pdf files as text, using
> pdftotext. It would be much easier for me to browse pdfs using
> emacs.
>
> I would like to do this from dired.
>
> There is a doc-view.el facility, that seems too complicated for me.
> Did I misunderstand that, or is it possible in some other way to view
> pdfs from dired, as text?
>
doc-view will do what you want. However, if you just want a text version
of the file, you may find the following useful. This is my txutils.el
package. It converts various file types (doc, pdf, ps etc.) to text, or
sometimes HTML, and then displays the output in an Emacs buffer. I use
advice to attach this to view-file (v in dired). As an added benefit, if
you do a view-file on an HTML file, it will be passed to browse-url
(w3m in my setup).
Note that this works perfectly for me. Your mileage may differ and you
may find bugs. I'm certainly happy to receive bug reports, but cannot
guarantee a timely update/fix.
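If PDF is the only format you care about, the same idea can be sketched in a few lines. This is separate from txutils: the function name and buffer name are made up for illustration, and it assumes pdftotext can be found on your exec-path.

```elisp
;; Hypothetical helper, not part of txutils: show a PDF's text in a buffer.
(defun my-view-pdf-as-text (file)
  "Display the text of PDF FILE in a read-only buffer.
Assumes the pdftotext program can be found on `exec-path'."
  (interactive "fPDF file: ")
  (let ((buf (get-buffer-create "*pdf-text*")))
    (with-current-buffer buf
      (let ((inhibit-read-only t))
        (erase-buffer)
        ;; "-" as the output name makes pdftotext write to stdout,
        ;; which `call-process' inserts into the current buffer.
        (unless (eq 0 (call-process "pdftotext" nil t nil
                                    (expand-file-name file) "-"))
          (error "pdftotext failed on %s" file)))
      (goto-char (point-min))
      (view-mode 1))
    (pop-to-buffer buf)))
```

From dired you could call it on the file at point, for instance with (my-view-pdf-as-text (dired-get-file-for-visit)) bound to a key of your choice.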
Tim
;; Filename: /home/tcross/projects/emacs-convert/txutils.el
;; Creation Date: Wednesday, 20 September 2006 10:13 PM EST
;; Last Modified: Monday, 27 November 2006 06:38 PM EST
;; Version: 2.0
;; Author: Tim Cross
;; Description: Convert files from doc, ps, pdf, ppt to a format
;; which can be viewed within emacs (i.e. text or html)
;;;
;;; This file is not part of GNU Emacs, but the same permissions apply.
;;; This file is released under the Free software foundation GPL.
;;;
;;; GNU Emacs is free software; you can redistribute it and/or modify
;;; it under the terms of the GNU General Public License as published by
;;; the Free Software Foundation; either version 2, or (at your option)
;;; any later version.
;;;
;;; GNU Emacs is distributed in the hope that it will be useful,
;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;;; GNU General Public License for more details.
;;;
;;; You should have received a copy of the GNU General Public License
;;; along with GNU Emacs; see the file COPYING. If not, write to
;;; the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
;;;
;;; Commentary
;;; ==========
;;;
;;; The very simple idea behind this basic utility is to make files in
;;; .doc, .pdf, .ps and .ppt format easier to access, without having to
;;; leave emacs or manually convert the file before viewing its contents
;;; in emacs.
;;;
;;; There are packages which will enable calls to external viewers
;;; for files of specific formats, such as xpdf for pdf etc. However,
;;; I wanted to have everything within emacs as this makes integration,
;;; cutting/pasting etc a lot easier, plus as a blind user, most
;;; external utilities are of little use because they don't also include
;;; speech support.
;;;
;;; The objective here is to have things set up so that when browsing
;;; a directory with dired, you can just hit 'v' for any file you want to
;;; view and you will be presented with a text or html version without
;;; needing to do any manual conversion - or even caring about what
;;; would need to be done.
;;;
;;; You need the following packages (or at least utilities which will
;;; do the same thing). Most of these are fairly standard with many Linux
;;; distros these days.
;;; The wv utilities which contain wvText for converting MS Word docs
;;; The xpdf utilities which include pdftotext for converting PDF to text
;;; The Ghostscript package which contains pstotext for converting PS to text
;;; The ppthtml utility for converting MS Power Point files to html
;;; A configured and working browse-url setup. I use w3m as my browser
;;;
;;; Customizing
;;; ===========
;;;
;;; The easiest way to customize the conversion utility settings is to
;;; use M-x customize-group <RET> txutils <RET>
;;;
;;; In the txutils customize group, you will find just one setting, which
;;; is an alist of values indexed by a regular expression that matches against
;;; file extensions - a crude way of determining the source filetype
;;; (e.g. *.doc, *.pdf, *.ps, *.ppt, *.xls, *.html etc). See the
;;; docstring of txutils-convert-alist for more details.
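;;;
;;; As a sketch of what an extra entry could look like, here is one for
;;; spreadsheets. It ASSUMES an xlhtml utility installed at /usr/bin/xlhtml
;;; that writes HTML to stdout; adjust the path and flags to whatever
;;; converter you actually have. The layout mirrors the PowerPoint entry
;;; in txutils-convert-alist below:
;;;
;;;   ("\\.\\(?:XLS\\|xls\\)$" xls "/usr/bin/xlhtml" nil nil nil t t)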
;;;
;;; Installation
;;; ============
;;;
;;; Pretty straightforward. Place this file somewhere in your load path
;;; and put a (require 'txutils) in your .emacs. You may want to byte
;;; compile this file.
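;;;
;;; A minimal sketch of the .emacs lines, assuming this file was saved in
;;; ~/elisp (a made-up directory, use your own):
;;;
;;;   (add-to-list 'load-path "~/elisp")
;;;   (require 'txutils)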
;;;
;;; Thanks
;;; ======
;;;
;;; A number of people provided suggestions on how to improve both my elisp
;;; and the program itself. In particular, thanks go to
;;; Lukas Loehrer
;;; Vinicius Jose Latorre
;;; Andreas Roehler
;;;
;;; plus a few others who gave general suggestions, feedback and
;;; encouragement. Thanks to everyone for taking the time and putting
;;; in the effort, it's greatly appreciated.
;;;
;;; Reporting Bugs
;;; ==============
;;;
;;; This is the first bit of elisp I've allowed out into the world and
;;; while I am really learning to love both elisp and cl lisp, I'm still
;;; very much a novice. Therefore, there ARE bugs and probably some pretty
;;; poor style within this stuff. Feedback, bug reports and suggestions
;;; always welcome. Send e-mail to tcross@une.edu.au
;;;
(require 'custom)
(require 'browse-url)
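;; make-temp-file is part of APEL (poe.el) prior to Emacs 22.
;; Plain `when' is used here because `static-when' is itself defined
;; by APEL and so is not available before APEL has been loaded.
(when (= emacs-major-version 21)
  (require 'poe))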
(defgroup txutils nil
  "Customize group for txutils."
  :prefix "txutils-"
  :group 'external)
(defcustom txutils-convert-alist
  '( ;; MS Word
    ("\\.\\(?:DOC\\|doc\\)$" doc "/usr/bin/wvText" nil nil nil nil nil)
    ;; PDF
    ("\\.\\(?:PDF\\|pdf\\)$" pdf "/usr/bin/pdftotext" nil nil nil nil nil)
    ;; PostScript
    ("\\.\\(?:PS\\|ps\\)$" ps "/usr/bin/pstotext" "-output" t nil nil nil)
    ;; MS PowerPoint
    ("\\.\\(?:PPT\\|ppt\\)$" ppt "/usr/bin/ppthtml" nil nil nil t t))
  "*Association for program conversion.
Each element has the following form:
   (REGEXP SYMBOL CONVERTER SWITCHES INVERT REDIRECT-INPUT REDIRECT-OUTPUT HTML-OUTPUT)
Where:
REGEXP          is a regexp to match file type to convert.
SYMBOL          is a symbol to designate the file type.
CONVERTER       is a program to convert the file type to text or HTML.
SWITCHES        is a string which gives command line switches for the
                conversion program. Nil means there are no switches needed.
INVERT          indicates if input and output program option is to be
                inverted or not. Non-nil means to invert, that is, output
                option first then input option. Nil means do not invert,
                that is, input option first then output option.
REDIRECT-INPUT  indicates to use < to direct input from the input
                file. This is useful for utilities which accept input
                from stdin rather than a file.
REDIRECT-OUTPUT indicates to use > to direct output to the output
                file. This is useful for utilities that only send output to
                stdout.
HTML-OUTPUT     indicates the conversion program creates HTML output
                rather than plain text."
  :type '(repeat
          (list :tag "Conversion"
                (regexp :tag "File Type Regexp")
                (symbol :tag "File Type Symbol")
                (string :tag "Converter")
                (choice :menu-tag "Output Option"
                        :tag "Output Option"
                        (const :tag "None" nil)
                        string)
                (boolean :tag "Invert I/O Option")
                (boolean :tag "Redirect Standard Input")
                (boolean :tag "Redirect Standard Output")
                (boolean :tag "HTML Output")))
  :group 'txutils)
(defun txutils-run-command (cmd &optional output-buffer)
  "Execute shell command with arguments, putting output in buffer."
  (= 0 (shell-command cmd (if output-buffer
                              output-buffer
                            "*txutils-output*")
                      (if output-buffer
                          "*txutils-output*"))))
(defun txutils-quote-expand-file-name (file-name)
  "Expand file name and quote special chars if required."
  (shell-quote-argument (expand-file-name file-name)))
(defun txutils-file-alist (file-name)
  "Return alist associated with file of this type."
  (let ((al txutils-convert-alist))
    (while (and al
                (not (string-match (caar al) file-name)))
      (setq al (cdr al)))
    (if al
        (cdar al)
      nil)))
(defun txutils-make-temp-name (orig-name type-alist)
  "Create a temp file name from original file name."
  (make-temp-file (file-name-sans-extension
                   (file-name-nondirectory orig-name))
                  nil
                  ;; TYPE-ALIST is the matched entry without its REGEXP,
                  ;; so HTML-OUTPUT is element 6.
                  (if (nth 6 type-alist)
                      ".html"
                    ".txt")))
(defun txutils-build-cmd (input-file output-file type-alist)
  "Create the command string from conversion alist."
  (let ((f1 (if (nth 3 type-alist)
                output-file
              input-file))
        (f2 (if (nth 3 type-alist)
                input-file
              output-file)))
    (concat
     (nth 1 type-alist)
     (if (nth 2 type-alist)             ; Add cmd line switches
         (concat " " (nth 2 type-alist)))
     (if (nth 4 type-alist)             ; redirect input (which may be output
         (concat " < " f1)              ; if arguments are inverted!)
       (concat " " f1))
     (if (nth 5 type-alist)             ; redirect output (see above comment)
         (concat " > " f2)
       (concat " " f2)))))
(defun txutils-do-file-conversion (file-name)
  "Based on file extension, convert file to text. Return name of text file"
  (interactive "fFile to convert: ")
  (let ((f-alist (txutils-file-alist file-name))
        output-file)
    (when f-alist
      (message "Performing file conversion for %s." file-name)
      (setq output-file (txutils-make-temp-name file-name f-alist))
      (message "Command: %s" (txutils-build-cmd file-name output-file f-alist))
      (if (txutils-run-command
           (txutils-build-cmd (txutils-quote-expand-file-name file-name)
                              (txutils-quote-expand-file-name
                               output-file) f-alist))
          output-file
        file-name))))
(defadvice view-file (around txutils pre act comp)
  "Perform file conversion or call web browser to view contents of file."
  (let ((file-arg (ad-get-arg 0)))
    (if (txutils-file-alist file-arg)
        (ad-set-arg 0 (txutils-do-file-conversion file-arg)))
    (if (string-match "\\.\\(?:HTML?\\|html?\\)$" (ad-get-arg 0))
        (browse-url-of-file (ad-get-arg 0))
      ad-do-it)))
(provide 'txutils)
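To make the INVERT and REDIRECT-OUTPUT flags a bit more concrete, here is a small standalone sketch of how a command line ends up being assembled for three of the default entries. It does not call into txutils: sketch-build-cmd and the file names are made up for illustration, program paths are shortened, input redirection is left out for brevity, and no shell quoting is done (real callers should quote file names, as txutils-quote-expand-file-name does).

```elisp
;; Standalone sketch; `sketch-build-cmd' is a hypothetical helper,
;; not part of txutils.
(defun sketch-build-cmd (converter switches invert redirect-output input output)
  "Return a shell command string running CONVERTER on INPUT to make OUTPUT.
SWITCHES is a string or nil.  Non-nil INVERT puts OUTPUT before INPUT.
Non-nil REDIRECT-OUTPUT puts > in front of the last file argument."
  (let ((f1 (if invert output input))
        (f2 (if invert input output)))
    (concat converter
            (if switches (concat " " switches) "")
            " " f1
            (if redirect-output " > " " ")
            f2)))

;; Plain "program input output" order, as for the PDF entry.
(sketch-build-cmd "pdftotext" nil nil nil "in.pdf" "out.txt")
;; => "pdftotext in.pdf out.txt"

;; INVERT: the output file comes first, right after the switch.
(sketch-build-cmd "pstotext" "-output" t nil "in.ps" "out.txt")
;; => "pstotext -output out.txt in.ps"

;; REDIRECT-OUTPUT: the converter writes to stdout, so the shell redirects it.
(sketch-build-cmd "ppthtml" nil nil t "in.ppt" "out.html")
;; => "ppthtml in.ppt > out.html"
```

Evaluating the three calls in *scratch* is a quick way to check a new entry before adding it to txutils-convert-alist.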
--
tcross (at) rapttech dot com dot au