From: Visuwesh <visuweshm@gmail.com>
To: 73638@debbugs.gnu.org
Cc: "Tassilo Horn" <tsdh@gnu.org>
Subject: bug#73638: 31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs
Date: Sat, 05 Oct 2024 16:36:23 +0530
Message-ID: <87ploebyhc.fsf@gmail.com>
This is a follow-up to bug#73530, where we discussed how to obtain the
outlines for LaTeX PDFs.
Currently, if mutool reports the outline as
% mutool show test.pdf outline
| "Text" #nameddest=section.1
| "Annotations" #nameddest=section.2
| "Links" #nameddest=section.3
| "Attachments" #nameddest=section.4
+ "Outline" #nameddest=section.5
+ "subsection" #nameddest=subsection.5.1
| "subsubsection" #nameddest=subsubsection.5.1.1
then nothing can be done: the entries point to named destinations
rather than page numbers. Looking at the source code of mutool, it
looks like the "#..." part is simply a URI. AFAICT, there's no way to
resolve the URI to a page number using mutool's command-line interface.
However, one can write a JS script instead. Use the "attached"
"outline.js" script and run mutool as follows with a LaTeX PDF:
% mutool run outline.js test.pdf
(
((level . 1)
(title . "Text")
(page . 0))
((level . 1)
(title . "Annotations")
(page . 1))
((level . 1)
(title . "Links")
(page . 2))
((level . 1)
(title . "Attachments")
(page . 3))
((level . 1)
(title . "Outline")
(page . 4))
((level . 2)
(title . "subsection")
(page . 4))
((level . 3)
(title . "subsubsection")
(page . 4))
)
This can be `read' directly from Emacs, skipping the parsing entirely.
JS evaluation takes about the same amount of time as `mutool show PDF outline':
% time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
0m00.32s real 0m00.29s user 0m00.02s system
% time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
0m00.31s real 0m00.29s user 0m00.02s system
% time mutool show atkins_physical_chemistry.pdf outline >/dev/null
0m00.33s real 0m00.29s user 0m00.04s system
% time mutool show atkins_physical_chemistry.pdf outline >/dev/null
0m00.30s real 0m00.25s user 0m00.04s system
[ where atkins_physical_chemistry.pdf is the same 90+MB file I was
testing in the previous bug report. ]
I don't know JS at all, so the script can probably be improved. The docs
for the JS interface are at
https://mupdf.readthedocs.io/en/latest/mutool-run-js-api.html
If this approach is acceptable, we can simply run the JS script instead.
WDYT?
[ I couldn't attach the JS script because Gmail blocked the
message. ]
outline.js:
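// Print the PDF outline as a Lisp list of alists, one per entry.
var document = new Document.openDocument(scriptArgs[0], "application/pdf")
var outline = document.loadOutline()
// Nothing to print when the document has no outline.
if (!outline) quit()
print("(")
// Print one entry, then recurse into its children one level deeper.
function pp(outl, level) {
  print("((level . " + level + ")")
  print("(title . " + repr(outl.title) + ")")
  // resolveLink turns the entry's URI into a page number.
  print("(page . " + document.resolveLink(outl.uri) + "))")
  if (outl.down) {
    for (var i = 0; i < outl.down.length; i++) {
      pp(outl.down[i], level + 1)
    }
  }
}
for (var i = 0; i < outline.length; i++) {
  pp(outline[i], 1)
}
print(")")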