* bug#8519: 24.0.50; doc-view: allow pdftotext -layout instead of -raw
@ 2011-04-18 9:23 Trent W. Buck
2011-06-30 21:57 ` Juri Linkov
2019-09-29 12:26 ` Lars Ingebrigtsen
0 siblings, 2 replies; 3+ messages in thread
From: Trent W. Buck @ 2011-04-18 9:23 UTC
To: 8519; +Cc: rfrancoise
doc-view supports using pdftotext on ttys.
Unfortunately it is hard-coded to pass -raw.
I would prefer to pass -layout.
Please modify doc-view to allow me to support something like
(setq doc-view-pdftotext-program-args '("-layout" "-nopgbrk"))
FYI, my pdftotext manpage says -raw is discouraged:
-layout
    Maintain (as best as possible) the original physical
    layout of the text. The default is to `undo' physical
    layout (columns, hyphenation, etc.) and output the text
    in reading order.
-raw
    Keep the text in content stream order. This is a hack
    which often "undoes" column formatting, etc. Use of raw
    mode is no longer recommended.
In GNU Emacs 24.0.50.1 (x86_64-pc-linux-gnu)
of 2010-12-14 on elegiac, modified by Debian
(emacs-snapshot package, version 1:20101212-2)
configured using `configure '--build' 'x86_64-linux-gnu' '--host' 'x86_64-linux-gnu' '--prefix=/usr' '--sharedstatedir=/var/lib' '--libexecdir=/usr/lib' '--localstatedir=/var' '--infodir=/usr/share/info' '--mandir=/usr/share/man' '--with-pop=yes' '--enable-locallisppath=/etc/emacs-snapshot:/etc/emacs:/usr/local/share/emacs/24.0.50/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/24.0.50/site-lisp:/usr/share/emacs/site-lisp' '--without-compress-info' '--with-x=no' '--without-dbus' '--without-sound' 'build_alias=x86_64-linux-gnu' 'host_alias=x86_64-linux-gnu' 'CFLAGS=-DDEBIAN -DSITELOAD_PURESIZE_EXTRA=5000 -g -O2' 'LDFLAGS=-g -Wl,--as-needed' 'CPPFLAGS=''
Important settings:
value of $LC_ALL: nil
value of $LC_COLLATE: C
value of $LC_CTYPE: nil
value of $LC_MESSAGES: nil
value of $LC_MONETARY: nil
value of $LC_NUMERIC: nil
value of $LC_TIME: nil
value of $LANG: en_AU.utf8
value of $XMODIFIERS: nil
locale-coding-system: utf-8-unix
default enable-multibyte-characters: t
Major mode: Man
Minor modes in effect:
diff-auto-refine-mode: t
shell-dirtrack-mode: t
rcirc-track-minor-mode: t
xterm-mouse-mode: t
ido-everywhere: t
savehist-mode: t
icomplete-mode: t
show-paren-mode: t
delete-selection-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
column-number-mode: t
line-number-mode: t
transient-mark-mode: t
Recent input:
O A ESC O A ESC O A ESC O A ESC O A ESC O A ESC O A
ESC O A ESC O A ESC O A ESC O A ESC O A ESC O A ESC
O A ESC O A ESC O A ESC O A ESC O A ESC O A ESC O A
ESC O A ESC O A ESC O A ESC O A ESC O A ESC O A ESC
O A ESC O A ESC O A ESC O A ESC O A ESC O A ESC O A
ESC O A ESC O B ESC O B ESC C-b ESC O A ESC O A ESC
O A C-e RET RET ( e v a l - a f t e r - l o a d SPC
" p DEL d o v - DEL DEL c - v i e w " RET TAB ' ( C-y
C-x C-x C-g ESC s ESC O B TAB ESC O A C-e ESC C-k ESC
O B ESC O B ESC O B ESC b ESC O B ESC b ESC b ESC d
l a o u t DEL DEL DEL y o u t ESC O B ESC O A ESC O
A ESC O A ESC O A ESC C-x C-x C-s ESC a ESC a C-x ESC
O D C-x C-k C-x C-k RET C-x C-k RET y C-x 1 C-v C-v
C-v C-v C-v ESC x m a n RET p d f t o t e x t RET C-x
0 C-s r a w ESC O C ESC O C ESC O B C-v ESC x r e p
o r t SPC e m a c s RET b u g RET
Recent messages:
Copying /scpc:soy:/cyber/tmp/split-handshake.pdf to /tmp/tramp.24520Pw.pdf...done
Tramp: Inserting local temp file `/tmp/tramp.24520Pw.pdf'...done
Wrote /tmp/docview1000/split-handshake.pdf
No PNG support is available, or some conversion utility for pdf files is missing.
Unable to render file. View extracted text instead? (y or n) y
Invoking man pdftotext in the background
Please wait: formatting the pdftotext man page...
pdftotext man page formatted
Mark saved where search started
call-interactively: End of buffer [2 times]
Load-path shadows:
/home/twb/.emacs.d/lisp/magit/magit-svn hides /usr/share/emacs/24.0.50/site-lisp/magit/magit-svn
/home/twb/.emacs.d/lisp/magit/magit-key-mode hides /usr/share/emacs/24.0.50/site-lisp/magit/magit-key-mode
/home/twb/.emacs.d/lisp/magit/magit hides /usr/share/emacs/24.0.50/site-lisp/magit/magit
/home/twb/.emacs.d/lisp/magit/magit-topgit hides /usr/share/emacs/24.0.50/site-lisp/magit/magit-topgit
/usr/share/emacs/24.0.50/site-lisp/puppet-el/puppet-mode hides /usr/share/emacs/site-lisp/puppet-mode
/usr/share/emacs/24.0.50/site-lisp/debian-startup hides /usr/share/emacs/site-lisp/debian-startup
Features:
(shadow mail-extr emacsbug eldoc paredit find-func apropos cus-edit
cus-start cus-load ibuf-ext ibuffer sort tramp-cmds noutline outline
w3m-cookie thingatpt w3m-search mule-util w3m-form w3m-symbol
w3m-bookmark w3m-session w3m browse-url doc-view image-mode timezone
w3m-hist w3m-fb bookmark-w3m w3m-ems w3m-ccl ccl w3m-favicon w3m-image
w3m-proc w3m-util cc-mode cc-fonts cc-menus cc-cmds cc-styles cc-align
cc-engine cc-vars cc-defs woman tabify man assoc conf-mode vc-rcs
newcomment rect sh-script executable grep whitespace log-edit pcvs-util
add-log gnus-cite gnus-art mm-uu mml2015 epg-config mm-view smime dig
mailcap nnir gnus-sum macroexp nnoo gnus-group gnus-undo nnmail
mail-source gnus-start gnus-spec gnus-int message sendmail rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045
ietf-drums mailabbrev gmm-utils mailheader gnus-win gnus-range gnus
gnus-ems nnheader mail-utils mm-util mail-prsvr wid-edit rst compile
tool-bar etags windmove diff-mode vc help-mode easymenu view tramp-sh
shell comint tramp-cache tramp tramp-compat auth-source netrc gnus-util
password-cache format-spec advice help-fns advice-preload tramp-loaddefs
ffap vc-dispatcher vc-darcs cl xml vc-git image wdired multi-isearch
dired-aux dired regexp-opt disp-table rcirc time-date ring server
jka-compr edmacro kmacro xt-mouse ido savehist icomplete paren delsel
saveplace debian-el debian-el-loaddefs w3m-load emacs-goodies-el
emacs-goodies-custom emacs-goodies-loaddefs easy-mmode dpkg-dev-el
dpkg-dev-el-loaddefs ediff-hook vc-hooks lisp-float-type lisp-mode
register page menu-bar rfn-eshadow timer select mouse jit-lock font-lock
syntax facemenu font-core frame cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev loaddefs button
minibuffer faces cus-face files text-properties overlay md5 base64
format env code-pages mule custom widget hashtable-print-readable
backquote make-network-process multi-tty emacs)
* bug#8519: 24.0.50; doc-view: allow pdftotext -layout instead of -raw
From: Juri Linkov @ 2011-06-30 21:57 UTC
To: 8519
> doc-view supports using pdftotext on ttys.
> Unfortunately it is hard-coded to pass -raw.
> I would prefer to pass -layout.
>
> Please modify doc-view to allow me to support something like
>
> (setq doc-view-pdftotext-program-args '("-layout" "-nopgbrk"))
I came across the same need and found this bug report.
I think doc-view should also support other free software
that processes PDF files:
1. pdftk
pdftk is able to extract the PDF metadata (title, author, bookmarks, etc.),
e.g.
pdftk file1.pdf dump_data output file1.txt
So for a large PDF document, doc-view could present the
table of contents, let the user navigate to a selected page,
and then convert only the displayed pages instead of all pages,
which is terribly slow for a 1000-page document.
pdftk can also prepare the PDF text for editing in Emacs.
From `man pdftk':
-compress useful when you want to edit PDF code
in a text editor like vim or emacs.
Uncompress PDF page streams for editing the PDF
in a text editor (e.g., vim, emacs):
pdftk doc.pdf output doc.unc.pdf uncompress
This feature could be used after typing `C-c C-c'.
Since pdftk is dependent on Java, doc-view should not require it
and should be able to detect the installed PDF processing programs
(with e.g. `(executable-find "pdftk")') and select one of them
according to the user's priority list.
2. A better program is `qpdf'. It has none of the problems mentioned above.
So doc-view should also detect its availability with
`(executable-find "qpdf")' and provide the same option for its
command line arguments (and use all features relevant to doc-view).
3. Using the PDF rendering library `poppler', it's possible
to implement in Emacs a PDF viewer like `apvlv' for Vim.
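
The detection step suggested in item 1 could look roughly like the
sketch below. It is only an illustration: the variable
`my-doc-view-pdf-tool-priority' and the function
`my-doc-view-find-pdf-tool' are hypothetical names that do not exist in
doc-view.el. The only real API used is `executable-find'.

```elisp
;; Hypothetical sketch; none of the `my-' names exist in doc-view.el.
(defvar my-doc-view-pdf-tool-priority '("qpdf" "pdftk")
  "Hypothetical priority list of PDF programs, most preferred first.")

(defun my-doc-view-find-pdf-tool (&optional candidates)
  "Return the full path of the first installed program in CANDIDATES.
CANDIDATES is a list of program names and defaults to
`my-doc-view-pdf-tool-priority'.  Return nil if none is found."
  (let ((found nil))
    (dolist (name (or candidates my-doc-view-pdf-tool-priority))
      ;; Keep the first hit; later entries have lower priority.
      (unless found
        (setq found (executable-find name))))
    found))
```

Calling `(my-doc-view-find-pdf-tool)' would then return the path of
qpdf if it is installed, otherwise that of pdftk, otherwise nil, so
neither program has to be a hard requirement.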
* bug#8519: 24.0.50; doc-view: allow pdftotext -layout instead of -raw
From: Lars Ingebrigtsen @ 2019-09-29 12:26 UTC
To: Trent W. Buck; +Cc: rfrancoise, 8519
trentbuck@gmail.com (Trent W. Buck) writes:
> doc-view supports using pdftotext on ttys.
> Unfortunately it is hard-coded to pass -raw.
> I would prefer to pass -layout.
>
> Please modify doc-view to allow me to support something like
>
> (setq doc-view-pdftotext-program-args '("-layout" "-nopgbrk"))
Makes sense; I've now added this to Emacs 27.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
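
For readers who want to use the new option, an init-file fragment
along the following lines should work. It is a sketch that assumes
Emacs 27 or later and that the option was added under the name
proposed in the original report, `doc-view-pdftotext-program-args';
check `C-h v' in your Emacs to confirm.

```elisp
;; Assumption: Emacs 27 or later, with the option named as in the
;; original report.  The flags are the ones requested there.
(with-eval-after-load 'doc-view
  (setq doc-view-pdftotext-program-args '("-layout" "-nopgbrk")))
```

Wrapping the `setq' in `with-eval-after-load' is optional here; it
only delays the assignment until doc-view is actually loaded.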