unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#73638: 31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs
From: Visuwesh @ 2024-10-05 11:06 UTC
  To: 73638; +Cc: Tassilo Horn

This is a follow-up to bug#73530, where a discussion was held on how to
obtain the outlines for LaTeX PDFs.

Currently, if mutool reports the outline as

    % mutool show test.pdf outline
    |	"Text"	#nameddest=section.1
    |	"Annotations"	#nameddest=section.2
    |	"Links"	#nameddest=section.3
    |	"Attachments"	#nameddest=section.4
    +	"Outline"	#nameddest=section.5
    +		"subsection"	#nameddest=subsection.5.1
    |			"subsubsection"	#nameddest=subsubsection.5.1.1

then doc-view cannot build an imenu index, because the entries point to
named destinations rather than page numbers.  Looking at the source code
of mutool, the "#..." part appears to be simply a URI.  AFAICT, there's
no way to resolve that URI to a page number using mutool's command-line
subcommands.  However, one can write a JS script instead.  Use the
outline.js script included at the end of this message and run mutool as
follows with a LaTeX PDF:

    % mutool run outline.js test.pdf
    (
    ((level . 1)
    (title . "Text")
    (page . 0))
    ((level . 1)
    (title . "Annotations")
    (page . 1))
    ((level . 1)
    (title . "Links")
    (page . 2))
    ((level . 1)
    (title . "Attachments")
    (page . 3))
    ((level . 1)
    (title . "Outline")
    (page . 4))
    ((level . 2)
    (title . "subsection")
    (page . 4))
    ((level . 3)
    (title . "subsubsection")
    (page . 4))
    )

This output can be `read' directly by Emacs, skipping the parsing
entirely.  JS evaluation takes about the same time as
`mutool show PDF outline':

    % time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
        0m00.32s real     0m00.29s user     0m00.02s system
    % time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
        0m00.31s real     0m00.29s user     0m00.02s system
    % time mutool show atkins_physical_chemistry.pdf outline >/dev/null
        0m00.33s real     0m00.29s user     0m00.04s system
    % time mutool show atkins_physical_chemistry.pdf outline >/dev/null
        0m00.30s real     0m00.25s user     0m00.04s system

[ where atkins_physical_chemistry.pdf is the same 90+MB file I was
  testing in the previous bug report.  ]
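
To make the contrast with the plain `mutool show PDF outline' output
concrete, here is a rough sketch of what parsing one of those lines
yields.  The line format (a one-character marker, one tab per nesting
level, the quoted title, a tab, then the URI) is only inferred from the
sample above, and parseOutlineLine is a hypothetical helper, not part of
mutool or doc-view.  It runs under Node.js rather than `mutool run'.

```javascript
// Sketch only: parse one line of `mutool show PDF outline` output.
// Assumption (inferred from the sample in this message, not from the
// mutool docs): marker char, one tab per level, "title", tab, URI.
function parseOutlineLine(line) {
    var m = /^[|+-](\t+)"(.*)"\t(\S+)$/.exec(line);
    if (!m) return null;
    return {
        level: m[1].length,   // number of tabs before the title
        title: m[2],
        uri: m[3],
        // A named destination carries a name, not a page number.
        named: m[3].indexOf("#nameddest=") === 0
    };
}

var entry = parseOutlineLine('+\t\t"subsection"\t#nameddest=subsection.5.1');
console.log(JSON.stringify(entry));
```

Even after parsing, the entry only has the name "subsection.5.1" to go
on, which is why the page lookup has to happen inside mutool.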

I don't know JS at all so the script can probably be improved.  The docs
for the JS interface are at

    https://mupdf.readthedocs.io/en/latest/mutool-run-js-api.html

If this approach is acceptable, doc-view can simply run the JS script
instead of `mutool show PDF outline'.  WDYT?

[ I couldn't attach the JS script because Gmail blocked the message, so
  it is included inline below.  ]

outline.js:

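// Open the PDF named on the command line and load its outline.
var document = new Document.openDocument(scriptArgs[0], "application/pdf")
var outline = document.loadOutline()
// No outline: exit without printing anything.
if(!outline) quit()

print("(")

// Print one outline entry as an alist, then recurse into its children.
function pp(outl, level){
    print("((level . " + level + ")")
    print("(title . " + repr(outl.title) + ")")
    // resolveLink turns the "#nameddest=..." URI into a page number.
    print("(page . " + document.resolveLink(outl.uri) + "))")
    if(outl.down){
        for(var i=0; i<outl.down.length; i++){
            pp(outl.down[i], level+1)
        }
    }
}

for(var i=0; i<outline.length; i++){
    pp(outline[i], 1)
}

print(")")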
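
The recursive walk can also be tried without mutool.  The following is
only a sketch under stated assumptions: outlineToSexp, the `resolve'
callback, and the sample data are hypothetical stand-ins for the
document object, and JSON.stringify stands in for mutool's repr(), so
escaping of unusual titles may differ.  It runs under Node.js.

```javascript
// Sketch only: the same recursive walk as outline.js, with the page
// lookup passed in so it can run outside `mutool run`.
// `resolve` and the sample data below are hypothetical stand-ins.
function outlineToSexp(outline, resolve) {
    var lines = ["("];
    function walk(item, level) {
        lines.push("((level . " + level + ")");
        // JSON.stringify stands in for mutool's repr() here.
        lines.push("(title . " + JSON.stringify(item.title) + ")");
        lines.push("(page . " + resolve(item.uri) + "))");
        (item.down || []).forEach(function (child) {
            walk(child, level + 1);
        });
    }
    (outline || []).forEach(function (item) { walk(item, 1); });
    lines.push(")");
    return lines.join("\n");
}

var sample = [
    { title: "Outline", uri: "#nameddest=section.5", down: [
        { title: "subsection", uri: "#nameddest=subsection.5.1" }
    ] }
];
// Fake lookup table in place of document.resolveLink.
var pages = { "#nameddest=section.5": 4, "#nameddest=subsection.5.1": 4 };
var sexp = outlineToSexp(sample, function (uri) { return pages[uri]; });
console.log(sexp);
```

Unlike outline.js, this sketch emits an empty list when there is no
outline instead of exiting silently, which may or may not be what the
Emacs side wants.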






