unofficial mirror of emacs-devel@gnu.org 
From: dalanicolai <dalanicolai@gmail.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Emacs Devel <emacs-devel@gnu.org>
Subject: Re: [PATCH] add epub support to doc-view
Date: Fri, 14 Jan 2022 21:02:08 +0100
Message-ID: <CACJP=3mmKOy4zVmZmDBRMSft7d3PTUYaTkucjMwTVH2TKWg8uw@mail.gmail.com>
In-Reply-To: <CACJP=3nDuQzZ5thPjcMYy7R3n-Zq8LV8oJHUbwczq0rcEdQQzQ@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 829 bytes --]

And here is another, updated patch, which adds an extra option to
configure the EPUB font size.
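
As a rough illustration of how the new option might be used from an init
file (a sketch only, assuming the attached patch is applied and MuPDF's
`mutool` is installed; the value 14 is an arbitrary example):

```elisp
;; The patch's `doc-view-mode-p' check compares this variable against
;; the literal string "mutool", so the bare program name is used here.
(setq doc-view-pdfdraw-program "mutool")

;; Set the option through Customize so that its :set function runs and
;; already-open EPUB buffers are re-rendered.  14 is only an example.
(customize-set-variable 'doc-view-epub-font-size 14)
```

A plain `setq` would also change the value, but it bypasses the :set
function, so open EPUB buffers would not be redrawn.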

On Fri, 14 Jan 2022 at 17:15, dalanicolai <dalanicolai@gmail.com> wrote:

> So here is a second version of the patch.  It also adds a few more
> extensions to the list.  I had forgotten about them, as I neither use
> them nor have experimented with them, but I figured this is a good
> opportunity to add support for those extensions as well.
>
> A small comment for those who are interested: CBZ files (and CBR
> files, which mupdf does not support) seem to be just zipped/rarred
> collections of image files (I guess usually png/jpg).  So supporting
> those extensions would not really require the `mutool` command if
> Emacs just uncompressed the collections itself.
>
> I guess there is not much more to comment on beyond the comments
> within the patch/files.
>
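
The remark about CBZ files above could be sketched roughly as follows.
This is not part of the patch: `my/cbz-extract-images` is a hypothetical
helper, and it assumes an external `unzip` executable is on PATH.

```elisp
(defun my/cbz-extract-images (cbz-file target-dir)
  "Unpack CBZ-FILE into TARGET-DIR and return the sorted image files.
Hypothetical helper, not part of the patch; assumes an `unzip'
executable is available on PATH."
  (make-directory target-dir t)
  (let ((status (call-process "unzip" nil nil nil
                              "-o" "-q"
                              (expand-file-name cbz-file)
                              "-d" (expand-file-name target-dir))))
    (unless (eq status 0)
      (error "unzip failed with status %s" status)))
  ;; Archives may keep the pages in a subdirectory, so search recursively.
  (sort (directory-files-recursively
         target-dir "\\.\\(?:png\\|jpe?g\\)\\'")
        #'string<))
```

The returned list could then be displayed page by page without going
through `mutool` at all, which is the point made in the quoted message.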

[-- Attachment #2: 0001-Add-support-for-EPUB-CBZ-FB2-and-O-XPS-extension-to-.patch --]
[-- Type: text/x-patch, Size: 17588 bytes --]

From 53913d3b2c667fa8fda1df671212a3c64a3b21a2 Mon Sep 17 00:00:00 2001
From: Daniel Nicolai <dalanicolai@gmail.com>
Date: Tue, 11 Jan 2022 20:37:36 +0100
Subject: [PATCH] Add support for EPUB, CBZ, FB2 and (O)XPS extensions to
 doc-view

* doc/emacs/misc.texi (Document View): Add requirements for the new
extensions (i.e. mutool).

* lisp/doc-view.el (doc-view): Additionally update the preliminary
comment.
(doc-view-custom-set-epub-font-size): Redraw the image after setting.
(doc-view-unoconv-program): Put the code all on one line.
(doc-view-doc-type): Update docstring.
(doc-view-kill-proc): Fix comment indentation.
(doc-view-mode-p): Add check for the new extensions and an alternative
check for PDF.
(doc-view-pdf/ps->png): Associate the new extensions with the PNG
converter.
(doc-view-convert-current-doc): Handle the new extensions like PDFs.
(doc-view-set-doc-type): Set the correct doc-type for the new
extensions.

* lisp/files.el (auto-mode-alist): Associate the new extension types
with doc-view.
---
 doc/emacs/misc.texi |  23 ++++---
 lisp/doc-view.el    | 153 ++++++++++++++++++++++++++++----------------
 lisp/files.el       |   2 +-
 3 files changed, 114 insertions(+), 64 deletions(-)

diff --git a/doc/emacs/misc.texi b/doc/emacs/misc.texi
index df1e5ef238..365c079e89 100644
--- a/doc/emacs/misc.texi
+++ b/doc/emacs/misc.texi
@@ -455,20 +455,27 @@ Document View
 @cindex PostScript file
 @cindex OpenDocument file
 @cindex Microsoft Office file
+@cindex EPUB file
+@cindex CBZ file
+@cindex FB2 file
+@cindex XPS file
+@cindex OXPS file
 @cindex DocView mode
 @cindex mode, DocView
 @cindex document viewer (DocView)
 @findex doc-view-mode
 
   DocView mode is a major mode for viewing DVI, PostScript (PS), PDF,
-OpenDocument, and Microsoft Office documents.  It provides features
-such as slicing, zooming, and searching inside documents.  It works by
-converting the document to a set of images using the @command{gs}
-(GhostScript) or @command{mudraw}/@command{pdfdraw} (MuPDF) commands
-and other external tools @footnote{For PostScript files, GhostScript
-is a hard requirement.  For DVI files, @code{dvipdf} or @code{dvipdfm}
-is needed.  For OpenDocument and Microsoft Office documents, the
-@code{unoconv} tool is needed.}, and displaying those images.
+OpenDocument, Microsoft Office, EPUB, CBZ, FB2, XPS and OXPS
+documents.  It provides features such as slicing, zooming, and
+searching inside documents.  It works by converting the document to a
+set of images using the @command{gs} (GhostScript) or
+@command{pdfdraw}/@command{mutool draw} (MuPDF) commands and other
+external tools @footnote{PostScript files require GhostScript, DVI
+files require @code{dvipdf} or @code{dvipdfm}, OpenDocument and
+Microsoft Office documents require the @code{unoconv} tool, and EPUB,
+CBZ, FB2, XPS and OXPS files require @code{mutool} to be available.},
+and displaying those images.
 
 @findex doc-view-toggle-display
 @findex doc-view-minor-mode
diff --git a/lisp/doc-view.el b/lisp/doc-view.el
index 5b462b24f5..57144ece1c 100644
--- a/lisp/doc-view.el
+++ b/lisp/doc-view.el
@@ -3,7 +3,7 @@
 ;; Copyright (C) 2007-2022 Free Software Foundation, Inc.
 ;;
 ;; Author: Tassilo Horn <tsdh@gnu.org>
-;; Keywords: files, pdf, ps, dvi
+;; Keywords: files, pdf, ps, dvi, djvu, epub, cbz, fb2, xps, openxps
 
 ;; This file is part of GNU Emacs.
 
@@ -25,17 +25,19 @@
 ;; Viewing PS/PDF/DVI files requires Ghostscript, `dvipdf' (comes with
 ;; Ghostscript) or `dvipdfm' (comes with teTeX or TeXLive) and
 ;; `pdftotext', which comes with xpdf (https://www.foolabs.com/xpdf/)
-;; or poppler (https://poppler.freedesktop.org/).
-;; Djvu documents require `ddjvu' (from DjVuLibre).
-;; ODF files require `soffice' (from LibreOffice).
+;; or poppler (https://poppler.freedesktop.org/).  EPUB, CBZ, FB2, XPS
+;; and OXPS documents require `mutool' which comes with mupdf
+;; (https://mupdf.com/index.html).  Djvu documents require `ddjvu'
+;; (from DjVuLibre).  ODF files require `soffice' (from LibreOffice).
 
 ;;; Commentary:
 
 ;; DocView is a document viewer for Emacs.  It converts a number of
-;; document formats (including PDF, PS, DVI, Djvu and ODF files) to a
-;; set of PNG files, one PNG for each page, and displays the PNG
-;; images inside an Emacs buffer.  This buffer uses `doc-view-mode'
-;; which provides convenient key bindings for browsing the document.
+;; document formats (including PDF, PS, DVI, Djvu, ODF, EPUB, CBZ,
+;; FB2, XPS and OXPS files) to a set of PNG (or TIFF for djvu) files,
+;; one image for each page, and displays the images inside an Emacs
+;; buffer.  This buffer uses `doc-view-mode' which provides convenient
+;; key bindings for browsing the document.
 ;;
 ;; To use it simply open a document file with
 ;;
@@ -147,7 +149,10 @@
 ;;;; Customization Options
 
 (defgroup doc-view nil
-  "In-buffer viewer for PDF, PostScript, DVI, and DJVU files."
+  "In-buffer document viewer.
+The viewer handles PDF, PostScript, DVI, DJVU, ODF, EPUB, CBZ,
+FB2, XPS and OXPS files, if the appropriate converter programs
+are available (see Info node `(emacs)Document View')."
   :link '(function-link doc-view)
   :version "22.2"
   :group 'applications
@@ -221,6 +226,20 @@ doc-view-resolution
 Higher values result in larger images."
   :type 'number)
 
+(defun doc-view-custom-set-epub-font-size (option-name new-value)
+  (set-default option-name new-value)
+  (dolist (x (buffer-list))
+    (with-current-buffer x
+      (when (eq (bound-and-true-p doc-view-doc-type) 'epub)
+        (delete-directory doc-view--current-cache-dir t)
+        (doc-view-initiate-display)
+        (doc-view-goto-page (doc-view-current-page))))))
+
+(defcustom doc-view-epub-font-size nil
+  "Font size in points for EPUB layout."
+  :type '(choice (const nil) integer)
+  :set #'doc-view-custom-set-epub-font-size)
+
 (defcustom doc-view-scale-internally t
   "Whether we should try to rescale images ourselves.
 If nil, the document is re-rendered every time the scaling factor is modified.
@@ -256,9 +275,7 @@ doc-view-dvipdf-program
 `doc-view-dvipdf-program' will be preferred."
   :type 'file)
 
-(define-obsolete-variable-alias 'doc-view-unoconv-program
-                                'doc-view-odf->pdf-converter-program
-                                "24.4")
+(define-obsolete-variable-alias 'doc-view-unoconv-program 'doc-view-odf->pdf-converter-program "24.4")
 
 (defcustom doc-view-odf->pdf-converter-program
   (cond
@@ -382,7 +399,8 @@ doc-view--buffer-file-name
 
 (defvar doc-view-doc-type nil
   "The type of document in the current buffer.
-Can be `dvi', `pdf', `ps', `djvu' or `odf'.")
+Can be `dvi', `pdf', `ps', `djvu', `odf', `epub', `cbz', `fb2',
+`xps' or `oxps'.")
 
 (defvar doc-view-single-page-converter-function nil
   "Function to call to convert a single page of the document to a bitmap file.
@@ -464,17 +482,17 @@ doc-view--revert-buffer
       ;; It's normal for this operation to result in a very large undo entry.
       (setq-local undo-outer-limit (* 2 (buffer-size))))
   (cl-labels ((revert ()
-                      (let ((revert-buffer-preserve-modes t))
-                        (apply orig-fun args)
-                        ;; Update the cached version of the pdf file,
-                        ;; too.  This is the one that's used when
-                        ;; rendering (bug#26996).
-                        (unless (equal buffer-file-name
-                                       doc-view--buffer-file-name)
-                          ;; FIXME: Lars says he needed to recreate
-                          ;; the dir, we should figure out why.
-                          (doc-view-make-safe-dir doc-view-cache-directory)
-                          (write-region nil nil doc-view--buffer-file-name)))))
+                (let ((revert-buffer-preserve-modes t))
+                  (apply orig-fun args)
+                  ;; Update the cached version of the pdf file,
+                  ;; too.  This is the one that's used when
+                  ;; rendering (bug#26996).
+                  (unless (equal buffer-file-name
+                                 doc-view--buffer-file-name)
+                    ;; FIXME: Lars says he needed to recreate
+                    ;; the dir, we should figure out why.
+                    (doc-view-make-safe-dir doc-view-cache-directory)
+                    (write-region nil nil doc-view--buffer-file-name)))))
     (if (and (eq 'pdf doc-view-doc-type)
              (executable-find "pdfinfo"))
         ;; We don't want to revert if the PDF file is corrupted which
@@ -738,7 +756,7 @@ doc-view-kill-proc
   (interactive)
   (while (consp doc-view--current-converter-processes)
     (ignore-errors ;; Some entries might not be processes, and maybe
-		   ;; some are dead already?
+                    ; some are dead already?
       (kill-process (pop doc-view--current-converter-processes))))
   (when doc-view--current-timer
     (cancel-timer doc-view--current-timer)
@@ -799,8 +817,8 @@ doc-view--current-cache-dir
 ;;;###autoload
 (defun doc-view-mode-p (type)
   "Return non-nil if document type TYPE is available for `doc-view'.
-Document types are symbols like `dvi', `ps', `pdf', or `odf' (any
-OpenDocument format)."
+Document types are symbols like `dvi', `ps', `pdf', `epub',
+`cbz', `fb2', `xps', `oxps', or `odf' (any OpenDocument format)."
   (and (display-graphic-p)
        (image-type-available-p 'png)
        (cond
@@ -811,16 +829,22 @@ doc-view-mode-p
 		  (and doc-view-dvipdfm-program
 		       (executable-find doc-view-dvipdfm-program)))))
 	((memq type '(postscript ps eps pdf))
-	 (or (and doc-view-ghostscript-program
+         (or (and doc-view-ghostscript-program
 	          (executable-find doc-view-ghostscript-program))
-             (and doc-view-pdfdraw-program
-                  (executable-find doc-view-pdfdraw-program))))
+             ;; for pdf also check for `doc-view-pdfdraw-program'
+             (when (eq type 'pdf)
+               (and doc-view-pdfdraw-program
+                    (executable-find doc-view-pdfdraw-program)))))
 	((eq type 'odf)
 	 (and doc-view-odf->pdf-converter-program
 	      (executable-find doc-view-odf->pdf-converter-program)
 	      (doc-view-mode-p 'pdf)))
 	((eq type 'djvu)
 	 (executable-find "ddjvu"))
+        ((memq type '(epub cbz fb2 xps oxps))
+         ;; first check if `doc-view-pdfdraw-program' is set to mutool
+         (and (string= doc-view-pdfdraw-program "mutool")
+              (executable-find "mutool")))
 	(t ;; unknown image type
 	 nil))))
 
@@ -1053,7 +1077,7 @@ doc-view-start-process
   ;; some file-name-handler-managed dir, for example).
   (let* ((default-directory (or (unhandled-file-name-directory
                                  default-directory)
-			      (expand-file-name "~/")))
+			        (expand-file-name "~/")))
          (proc (apply #'start-process name doc-view-conversion-buffer
                       program args)))
     (push proc doc-view--current-converter-processes)
@@ -1139,14 +1163,17 @@ doc-view-pdf-password-protected-pdfdraw-p
     (search-forward "error: cannot authenticate password" nil t)))
 
 (defun doc-view-pdf->png-converter-mupdf (pdf png page callback)
-  (let ((pdf-passwd (if (doc-view-pdf-password-protected-pdfdraw-p pdf)
-                        (read-passwd "Enter password for PDF file: "))))
+  (let* ((pdf-passwd (if (doc-view-pdf-password-protected-pdfdraw-p pdf)
+                         (read-passwd "Enter password for PDF file: ")))
+         (options `(,(concat "-o" png)
+                    ,(format "-r%d" (round doc-view-resolution))
+                    ,@(if pdf-passwd `("-p" ,pdf-passwd)))))
+    (when (and (eq doc-view-doc-type 'epub) doc-view-epub-font-size)
+      (setq options (append options (list (format "-S%s" doc-view-epub-font-size)))))
     (doc-view-start-process
      "pdf->png" doc-view-pdfdraw-program
      `(,@(doc-view-pdfdraw-program-subcommand)
-       ,(concat "-o" png)
-       ,(format "-r%d" (round doc-view-resolution))
-       ,@(if pdf-passwd `("-p" ,pdf-passwd))
+       ,@options
        ,pdf
        ,@(if page `(,(format "%d" page))))
      callback)))
@@ -1189,7 +1216,7 @@ doc-view-pdf/ps->png
   "Convert PDF-PS to PNG asynchronously."
   (funcall
    (pcase doc-view-doc-type
-     ('pdf doc-view-pdf->png-converter-function)
+     ((or 'pdf 'epub 'cbz 'fb2 'xps 'oxps) doc-view-pdf->png-converter-function)
      ('djvu #'doc-view-djvu->tiff-converter-ddjvu)
      (_ #'doc-view-ps->png-converter-ghostscript))
    pdf-ps png nil
@@ -1227,20 +1254,20 @@ doc-view-document->bitmap
     (let ((rest (cdr pages)))
       (funcall doc-view-single-page-converter-function
 	       pdf (format png (car pages)) (car pages)
-       (lambda ()
-         (if rest
-             (doc-view-document->bitmap pdf png rest)
-           ;; Yippie, the important pages are done, update the display.
-           (clear-image-cache)
-           ;; For the windows that have a message (like "Welcome to
-           ;; DocView") display property, clearing the image cache is
-           ;; not sufficient.
-           (dolist (win (get-buffer-window-list (current-buffer) nil 'visible))
-             (with-selected-window win
-	       (when (stringp (overlay-get (doc-view-current-overlay) 'display))
-		 (doc-view-goto-page (doc-view-current-page)))))
-           ;; Convert the rest of the pages.
-           (doc-view-pdf/ps->png pdf png)))))))
+               (lambda ()
+                 (if rest
+                     (doc-view-document->bitmap pdf png rest)
+                   ;; Yippie, the important pages are done, update the display.
+                   (clear-image-cache)
+                   ;; For the windows that have a message (like "Welcome to
+                   ;; DocView") display property, clearing the image cache is
+                   ;; not sufficient.
+                   (dolist (win (get-buffer-window-list (current-buffer) nil 'visible))
+                     (with-selected-window win
+	               (when (stringp (overlay-get (doc-view-current-overlay) 'display))
+		         (doc-view-goto-page (doc-view-current-page)))))
+                   ;; Convert the rest of the pages.
+                   (doc-view-pdf/ps->png pdf png)))))))
 
 (defun doc-view-pdf->txt (pdf txt callback)
   "Convert PDF to TXT asynchronously and call CALLBACK when finished."
@@ -1337,7 +1364,9 @@ doc-view-convert-current-doc
 		    ;; Rename to doc.pdf
 		    (rename-file opdf pdf)
 		    (doc-view-pdf/ps->png pdf png-file)))))
-      ((or 'pdf 'djvu)
+      ;; The doc-view-mode-p check ensures that epub, cbz, fb2 and
+      ;; (o)xps are handled with mutool
+      ((or 'pdf 'djvu 'epub 'cbz 'fb2 'xps 'oxps)
        (let ((pages (doc-view-active-pages)))
          ;; Convert doc to bitmap images starting with the active pages.
          (doc-view-document->bitmap doc-view--buffer-file-name png-file pages)))
@@ -1432,7 +1461,7 @@ doc-view-paper-sizes
 (defun doc-view-guess-paper-size (iw ih)
   "Guess the paper size according to the aspect ratio."
   (cl-labels ((div (x y)
-		   (round (/ (* 100.0 x) y))))
+		(round (/ (* 100.0 x) y))))
     (let ((ar (div iw ih))
 	  (al (mapcar (lambda (l)
 			(list (div (nth 1 l) (nth 2 l)) (car l)))
@@ -1869,6 +1898,8 @@ doc-view-set-doc-type
                    ("dvi" dvi)
                    ;; PDF
                    ("pdf" pdf) ("epdf" pdf)
+                   ;; EPUB
+                   ("epub" epub)
                    ;; PostScript
                    ("ps" ps) ("eps" ps)
                    ;; DjVu
@@ -1880,7 +1911,13 @@ doc-view-set-doc-type
                    ;; Microsoft Office formats (also handled by the odf
                    ;; conversion chain).
                    ("doc" odf) ("docx" odf) ("xls" odf) ("xlsx" odf)
-                   ("ppt" odf) ("pps" odf) ("pptx" odf) ("rtf" odf))
+                   ("ppt" odf) ("pps" odf) ("pptx" odf) ("rtf" odf)
+                   ;; CBZ
+                   ("cbz" cbz)
+                   ;; FB2
+                   ("fb2" fb2)
+                   ;; (Open)XPS
+                   ("xps" xps) ("oxps" oxps))
 		 t))))
 	(content-types
 	 (save-excursion
@@ -1889,7 +1926,13 @@ doc-view-set-doc-type
 	    ((looking-at "%!") '(ps))
 	    ((looking-at "%PDF") '(pdf))
 	    ((looking-at "\367\002") '(dvi))
-	    ((looking-at "AT&TFORM") '(djvu))))))
+	    ((looking-at "AT&TFORM") '(djvu))
+            ;; The following pattern actually recognizes zip archives
+            ;; in general, so the same association is also used for
+            ;; cbz files.  This is fine, as cbz files should be
+            ;; handled like epub anyway.
+            ((looking-at "PK") '(epub))
+            ))))
     (setq-local
      doc-view-doc-type
      (car (or (nreverse (seq-intersection name-types content-types #'eq))
diff --git a/lisp/files.el b/lisp/files.el
index a11786fca2..f2c656bfde 100644
--- a/lisp/files.el
+++ b/lisp/files.el
@@ -2925,7 +2925,7 @@ auto-mode-alist
      ("\\.\\(diffs?\\|patch\\|rej\\)\\'" . diff-mode)
      ("\\.\\(dif\\|pat\\)\\'" . diff-mode) ; for MS-DOS
      ("\\.[eE]?[pP][sS]\\'" . ps-mode)
-     ("\\.\\(?:PDF\\|DVI\\|OD[FGPST]\\|DOCX\\|XLSX?\\|PPTX?\\|pdf\\|djvu\\|dvi\\|od[fgpst]\\|docx\\|xlsx?\\|pptx?\\)\\'" . doc-view-mode-maybe)
+     ("\\.\\(?:PDF\\|EPUB\\|CBZ\\|FB2\\|O?XPS\\|DVI\\|OD[FGPST]\\|DOCX\\|XLSX?\\|PPTX?\\|pdf\\|epub\\|cbz\\|fb2\\|o?xps\\|djvu\\|dvi\\|od[fgpst]\\|docx\\|xlsx?\\|pptx?\\)\\'" . doc-view-mode-maybe)
      ("configure\\.\\(ac\\|in\\)\\'" . autoconf-mode)
      ("\\.s\\(v\\|iv\\|ieve\\)\\'" . sieve-mode)
      ("BROWSE\\'" . ebrowse-tree-mode)
-- 
2.33.1
