unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#8519: 24.0.50; doc-view: allow pdftotext -layout instead of -raw
@ 2011-04-18  9:23 Trent W. Buck
  2011-06-30 21:57 ` Juri Linkov
  2019-09-29 12:26 ` Lars Ingebrigtsen
  0 siblings, 2 replies; 3+ messages in thread
From: Trent W. Buck @ 2011-04-18  9:23 UTC
  To: 8519; +Cc: rfrancoise

doc-view supports using pdftotext on ttys.
Unfortunately it is hard-coded to pass -raw.
I would prefer to pass -layout.

Please modify doc-view to allow me to support something like

    (setq doc-view-pdftotext-program-args '("-layout" "-nopgbrk"))

FYI, my pdftotext manpage says -raw is discouraged:

       -layout

              Maintain (as best as possible) the original physical
              layout of the text.  The default is to "undo" physical
              layout (columns, hyphenation, etc.)  and output the text
              in reading order.

       -raw   Keep the text in content stream order.  This is a hack
              which often "undoes" column formatting, etc.  Use of raw
              mode is no longer recommended.


In GNU Emacs 24.0.50.1 (x86_64-pc-linux-gnu)
 of 2010-12-14 on elegiac, modified by Debian
 (emacs-snapshot package, version 1:20101212-2)
configured using `configure  '--build' 'x86_64-linux-gnu' '--host' 'x86_64-linux-gnu' '--prefix=/usr' '--sharedstatedir=/var/lib' '--libexecdir=/usr/lib' '--localstatedir=/var' '--infodir=/usr/share/info' '--mandir=/usr/share/man' '--with-pop=yes' '--enable-locallisppath=/etc/emacs-snapshot:/etc/emacs:/usr/local/share/emacs/24.0.50/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/24.0.50/site-lisp:/usr/share/emacs/site-lisp' '--without-compress-info' '--with-x=no' '--without-dbus' '--without-sound' 'build_alias=x86_64-linux-gnu' 'host_alias=x86_64-linux-gnu' 'CFLAGS=-DDEBIAN -DSITELOAD_PURESIZE_EXTRA=5000 -g -O2' 'LDFLAGS=-g -Wl,--as-needed' 'CPPFLAGS=''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: C
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_AU.utf8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Man

Minor modes in effect:
  diff-auto-refine-mode: t
  shell-dirtrack-mode: t
  rcirc-track-minor-mode: t
  xterm-mouse-mode: t
  ido-everywhere: t
  savehist-mode: t
  icomplete-mode: t
  show-paren-mode: t
  delete-selection-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
O A ESC O A ESC O A ESC O A ESC O A ESC O A ESC O A 
ESC O A ESC O A ESC O A ESC O A ESC O A ESC O A ESC 
O A ESC O A ESC O A ESC O A ESC O A ESC O A ESC O A 
ESC O A ESC O A ESC O A ESC O A ESC O A ESC O A ESC 
O A ESC O A ESC O A ESC O A ESC O A ESC O A ESC O A 
ESC O A ESC O B ESC O B ESC C-b ESC O A ESC O A ESC 
O A C-e RET RET ( e v a l - a f t e r - l o a d SPC 
" p DEL d o v - DEL DEL c - v i e w " RET TAB ' ( C-y 
C-x C-x C-g ESC s ESC O B TAB ESC O A C-e ESC C-k ESC 
O B ESC O B ESC O B ESC b ESC O B ESC b ESC b ESC d 
l a o u t DEL DEL DEL y o u t ESC O B ESC O A ESC O 
A ESC O A ESC O A ESC C-x C-x C-s ESC a ESC a C-x ESC 
O D C-x C-k C-x C-k RET C-x C-k RET y C-x 1 C-v C-v 
C-v C-v C-v ESC x m a n RET p d f t o t e x t RET C-x 
0 C-s r a w ESC O C ESC O C ESC O B C-v ESC x r e p 
o r t SPC e m a c s RET b u g RET

Recent messages:
Copying /scpc:soy:/cyber/tmp/split-handshake.pdf to /tmp/tramp.24520Pw.pdf...done
Tramp: Inserting local temp file `/tmp/tramp.24520Pw.pdf'...done
Wrote /tmp/docview1000/split-handshake.pdf
No PNG support is available, or some conversion utility for pdf files is missing.
Unable to render file.  View extracted text instead? (y or n)  y
Invoking man pdftotext in the background
Please wait: formatting the pdftotext man page...
pdftotext man page formatted
Mark saved where search started
call-interactively: End of buffer [2 times]

Load-path shadows:
/home/twb/.emacs.d/lisp/magit/magit-svn hides /usr/share/emacs/24.0.50/site-lisp/magit/magit-svn
/home/twb/.emacs.d/lisp/magit/magit-key-mode hides /usr/share/emacs/24.0.50/site-lisp/magit/magit-key-mode
/home/twb/.emacs.d/lisp/magit/magit hides /usr/share/emacs/24.0.50/site-lisp/magit/magit
/home/twb/.emacs.d/lisp/magit/magit-topgit hides /usr/share/emacs/24.0.50/site-lisp/magit/magit-topgit
/usr/share/emacs/24.0.50/site-lisp/puppet-el/puppet-mode hides /usr/share/emacs/site-lisp/puppet-mode
/usr/share/emacs/24.0.50/site-lisp/debian-startup hides /usr/share/emacs/site-lisp/debian-startup

Features:
(shadow mail-extr emacsbug eldoc paredit find-func apropos cus-edit
cus-start cus-load ibuf-ext ibuffer sort tramp-cmds noutline outline
w3m-cookie thingatpt w3m-search mule-util w3m-form w3m-symbol
w3m-bookmark w3m-session w3m browse-url doc-view image-mode timezone
w3m-hist w3m-fb bookmark-w3m w3m-ems w3m-ccl ccl w3m-favicon w3m-image
w3m-proc w3m-util cc-mode cc-fonts cc-menus cc-cmds cc-styles cc-align
cc-engine cc-vars cc-defs woman tabify man assoc conf-mode vc-rcs
newcomment rect sh-script executable grep whitespace log-edit pcvs-util
add-log gnus-cite gnus-art mm-uu mml2015 epg-config mm-view smime dig
mailcap nnir gnus-sum macroexp nnoo gnus-group gnus-undo nnmail
mail-source gnus-start gnus-spec gnus-int message sendmail rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045
ietf-drums mailabbrev gmm-utils mailheader gnus-win gnus-range gnus
gnus-ems nnheader mail-utils mm-util mail-prsvr wid-edit rst compile
tool-bar etags windmove diff-mode vc help-mode easymenu view tramp-sh
shell comint tramp-cache tramp tramp-compat auth-source netrc gnus-util
password-cache format-spec advice help-fns advice-preload tramp-loaddefs
ffap vc-dispatcher vc-darcs cl xml vc-git image wdired multi-isearch
dired-aux dired regexp-opt disp-table rcirc time-date ring server
jka-compr edmacro kmacro xt-mouse ido savehist icomplete paren delsel
saveplace debian-el debian-el-loaddefs w3m-load emacs-goodies-el
emacs-goodies-custom emacs-goodies-loaddefs easy-mmode dpkg-dev-el
dpkg-dev-el-loaddefs ediff-hook vc-hooks lisp-float-type lisp-mode
register page menu-bar rfn-eshadow timer select mouse jit-lock font-lock
syntax facemenu font-core frame cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev loaddefs button
minibuffer faces cus-face files text-properties overlay md5 base64
format env code-pages mule custom widget hashtable-print-readable
backquote make-network-process multi-tty emacs)






* bug#8519: 24.0.50; doc-view: allow pdftotext -layout instead of -raw
From: Juri Linkov @ 2011-06-30 21:57 UTC
  To: 8519

> doc-view supports using pdftotext on ttys.
> Unfortunately it is hard-coded to pass -raw.
> I would prefer to pass -layout.
>
> Please modify doc-view to allow me to support something like
>
>     (setq doc-view-pdftotext-program-args '("-layout" "-nopgbrk"))

I came across the same need and found this bug report.

I think doc-view should also support other free software
that processes PDF files:

1. pdftk

pdftk is able to extract the PDF metadata (title, author, bookmarks, etc.),
e.g.

    pdftk file1.pdf dump_data output file1.txt

So for a large PDF document, doc-view could present the
Table of Contents, letting the user navigate to a selected page,
and then convert only the displayed pages instead of all pages,
which is terribly slow for a 1000-page document.

pdftk can also prepare the PDF text for editing in Emacs.
From `man pdftk':

  -compress useful when you want to edit PDF code
            in a text editor like vim or emacs.

   Uncompress PDF page streams for editing the PDF
   in a text editor (e.g., vim, emacs):

       pdftk doc.pdf output doc.unc.pdf uncompress

This feature could be used after typing `C-c C-c'.

Since pdftk is dependent on Java, doc-view should not require it
and should be able to detect the installed PDF processing programs
(with e.g. `(executable-find "pdftk")') and select one of them
according to the user's priority list.

2. A better program is `qpdf'.  It has none of the problems mentioned above.
So doc-view should also detect its availability with
`(executable-find "qpdf")' and provide the same option for its
command line arguments (and use all features relevant to doc-view).

3. Using the PDF rendering library `poppler', it's possible
to implement in Emacs a PDF viewer like `apvlv' for Vim.
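
The detection by priority list suggested in items 1 and 2 could be
sketched roughly as follows.  This is only an illustration:
`my-pdf-tool-priority' and `my-find-pdf-tool' are hypothetical names,
not part of doc-view, and the program order shown is just an assumed
example of a user's preference.

```elisp
(require 'seq)

;; Hypothetical option, not part of doc-view: helper programs to
;; try, most preferred first.
(defvar my-pdf-tool-priority '("qpdf" "pdftk")
  "PDF processing programs to look for, in order of preference.")

(defun my-find-pdf-tool ()
  "Return the full path of the first installed program, or nil.
Programs are tried in the order given by `my-pdf-tool-priority'."
  (seq-some #'executable-find my-pdf-tool-priority))
```

`(my-find-pdf-tool)' returns nil when none of the listed programs is
found on `exec-path', so a caller could then fall back to the
converters doc-view already uses.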






* bug#8519: 24.0.50; doc-view: allow pdftotext -layout instead of -raw
From: Lars Ingebrigtsen @ 2019-09-29 12:26 UTC
  To: Trent W. Buck; +Cc: rfrancoise, 8519

trentbuck@gmail.com (Trent W. Buck) writes:

> doc-view supports using pdftotext on ttys.
> Unfortunately it is hard-coded to pass -raw.
> I would prefer to pass -layout.
>
> Please modify doc-view to allow me to support something like
>
>     (setq doc-view-pdftotext-program-args '("-layout" "-nopgbrk"))

Makes sense; I've now added this to Emacs 27.
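
A minimal init-file sketch, assuming the option was added under the
name proposed in the report and that the installed pdftotext accepts
both flags:

```elisp
;; Assumption: the option is named as in the original request.
;; The version guard simply skips the setting on Emacs releases
;; older than 27, where the option is not expected to exist.
(when (>= emacs-major-version 27)
  (setq doc-view-pdftotext-program-args '("-layout" "-nopgbrk")))
```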

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





