all messages for Emacs-related lists mirrored at yhetil.org
From: Visuwesh <visuweshm@gmail.com>
To: Tassilo Horn <tsdh@gnu.org>
Cc: Eli Zaretskii <eliz@gnu.org>, 73638@debbugs.gnu.org
Subject: bug#73638: 31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs
Date: Mon, 07 Oct 2024 15:25:38 +0530
Message-ID: <87zfng44px.fsf@gmail.com>
In-Reply-To: <87bjzw5knb.fsf@gmail.com> (Visuwesh's message of "Mon, 07 Oct 2024 14:56:16 +0530")

[-- Attachment #1: Type: text/plain, Size: 1395 bytes --]

[Monday October 07, 2024] Visuwesh wrote:

>>> [...]
>>> -        (nreverse outline)))))
>>> +          (search-forward "BEGIN")
>>
>> If the script fails for some reason, there will be no BEGIN and we let a
>> search-failed error bubble up.  So I'd put it in the condition-case and
>> handle it like the end-of-file error.  Or simply provide the NOERROR
>> search-forward arg.
>
> Ahh, the intention of the condition-case below was to handle this case.
> Thanks for catching my mistake, it is a common error of mine to forget
> the NOERROR argument.
>
>>> +          (condition-case nil
>>> +              (read (current-buffer))
>>> +            (end-of-file nil)))))))
>>
>> Maybe it would also be a good idea to use a :stderr buffer with
>> make-process and put its contents into the imenu-unavailable-error.
>> That way, chances are better we get the reason for failure delivered in
>> bug reports.
>
> I do not think it is worth the trouble, since only syntax errors are
> likely to show up on stderr, and those would be very unlikely.  If the
> PDF file does not have an outline, our script prints nothing, so the
> end-of-file error should catch that case.

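Actually, I think this wasn't quite correct.  Even when the script
prints nothing, a stray > (the REPL prompt character) would be left in
the buffer, so read would return the symbol > instead of signaling
end-of-file.  I corrected that in the attached patch.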
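
To illustrate the difference, here is a small standalone sketch.  The
demo-* function names and the sample strings are made up for this
example and are not part of doc-view.el; the strings only imitate what
the process buffer might contain.

```elisp
;; Hypothetical helpers, for illustration only.

(defun demo-read-naive (output)
  "Read the first Lisp object from OUTPUT without looking for a marker."
  (with-temp-buffer
    (insert output)
    (goto-char (point-min))
    (condition-case nil
        (read (current-buffer))
      (end-of-file nil))))

(defun demo-read-after-marker (output)
  "Read the first Lisp object that follows \"BEGIN\" in OUTPUT.
Return nil if the marker is missing or nothing readable follows it."
  (with-temp-buffer
    (insert output)
    (goto-char (point-min))
    ;; A non-nil NOERROR argument makes `search-forward' return nil
    ;; instead of signaling `search-failed'.
    (when (search-forward "BEGIN" nil t)
      (condition-case nil
          (read (current-buffer))
        (end-of-file nil)))))

;; Only a prompt in the buffer: the naive version reads the symbol `>'.
(demo-read-naive "> ")

;; With the marker check, the same input gives nil.
(demo-read-after-marker "> ")

;; Prompt characters before the marker are skipped.
(demo-read-after-marker
 "> BEGIN(\n((level . 1)\n(title . \"Intro\")\n(page . 3))\n)\n> ")
```

The last call should give a one-element list holding an alist with the
level, title and page keys, which is the shape the script in the patch
prints.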


[-- Attachment #2: 0001-Make-imenu-index-generation-for-PDFs-more-reliable.patch --]
[-- Type: text/x-diff, Size: 4414 bytes --]

From a5055e18889460b429ccacf2970c7ccaf5f423c7 Mon Sep 17 00:00:00 2001
From: Visuwesh <visuweshm@gmail.com>
Date: Sun, 6 Oct 2024 18:02:06 +0530
Subject: [PATCH] Make imenu index generation for PDFs more reliable

Do away with parsing the output of "mutool show FILE outline"
since the URI reported in its output may not include the page
number of the heading, and instead may contain "nameddest"
elements which cannot be resolved using "mutool".  Instead, use
a MuPDF JS script to generate the PDF outline, which allows
such URIs to be resolved.

* lisp/doc-view.el (doc-view--outline-rx): Remove as no longer
needed.
(doc-view--outline): Reflect that outline can be generated for
non-PDF files too.
(doc-view--mutool-pdf-outline-script): Add new variable to hold
the JS script used to generate the outline.
(doc-view--pdf-outline): Use the script.  (bug#73638)
---
 lisp/doc-view.el | 48 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/lisp/doc-view.el b/lisp/doc-view.el
index 446beeafd9f..fcfdff18a40 100644
--- a/lisp/doc-view.el
+++ b/lisp/doc-view.el
@@ -1969,14 +1969,26 @@ doc-view-search-previous-match
 	(doc-view-goto-page (caar (last doc-view--current-search-matches)))))))
 
 ;;;; Imenu support
-(defconst doc-view--outline-rx
-  "[^\t]+\\(\t+\\)\"\\(.+\\)\"\t#\\(?:page=\\)?\\([0-9]+\\)")
-
 (defvar-local doc-view--outline nil
-  "Cached PDF outline, so that it is only computed once per document.
+  "Cached document outline, so that it is only computed once per document.
 It can be the symbol `unavailable' to indicate that outline is
 unavailable for the document.")
 
+(defvar doc-view--mutool-pdf-outline-script
+  "var document = new Document.openDocument(\"%s\", \"application/pdf\");
+var outline = document.loadOutline();
+if(!outline) quit();
+function pp(outl, level){print(\"((level . \" + level + \")\");\
+print(\"(title . \" + repr(outl.title) + \")\");\
+print(\"(page . \" + document.resolveLink(outl.uri) + \"))\");\
+if(outl.down){for(var i=0; i<outl.down.length; i++){pp(outl.down[i], level+1);}}};
+function run(){print(\"BEGIN(\");\
+for(var i=0; i<outline.length; i++){pp(outline[i], 1);}print(\")\");};
+run()"
+  "JS script to extract the PDF's outline using mutool.
+The script has to be minified to pass it to the REPL.  The \"BEGIN\"
+marker is here to skip past the prompt characters.")
+
 (defun doc-view--pdf-outline (&optional file-name)
   "Return a list describing the outline of FILE-NAME.
 Return a list describing the current file if FILE-NAME is nil.
@@ -1986,21 +1998,25 @@ doc-view--pdf-outline
 structure is extracted by `doc-view--imenu-subtree'."
   (let ((fn (or file-name (buffer-file-name))))
     (when fn
-      (let ((outline nil)
-            (fn (expand-file-name fn)))
-        (with-temp-buffer
-          (unless (eql 0 (call-process doc-view-pdfdraw-program nil
-                                       (current-buffer) nil "show" fn "outline"))
+      (with-temp-buffer
+        (let ((proc (make-process
+                     :name "doc-view-pdf-outline"
+                     :command (list "mutool" "run")
+                     :buffer (current-buffer))))
+          (process-send-string proc (format doc-view--mutool-pdf-outline-script
+                                            (expand-file-name fn)))
+          ;; Need to send this twice for some reason...
+          (process-send-eof)
+          (process-send-eof)
+          (while (accept-process-output proc))
+          (unless (eq (process-status proc) 'exit)
             (setq doc-view--outline 'unavailable)
             (imenu-unavailable-error "Unable to create imenu index using `mutool'"))
           (goto-char (point-min))
-          (while (re-search-forward doc-view--outline-rx nil t)
-            (push `((level . ,(length (match-string 1)))
-                    (title . ,(replace-regexp-in-string "\\\\[rt]" " "
-                                                        (match-string 2)))
-                    (page . ,(string-to-number (match-string 3))))
-                  outline)))
-        (nreverse outline)))))
+          (when (search-forward "BEGIN" nil t)
+            (condition-case nil
+                (read (current-buffer))
+              (end-of-file nil))))))))
 
 (defun doc-view--djvu-outline (&optional file-name)
   "Return a list describing the outline of FILE-NAME.
-- 
2.45.2


[-- Attachment #3: Type: text/plain, Size: 189 bytes --]


>
>> Otherwise, it all looks good to me. :-)
>
> If you are okay with leaving out the stderr case, I will send a patch
> with a non-nil NOERROR argument to the quoted search-forward form.
