From: Visuwesh <visuweshm@gmail.com>
To: 73638@debbugs.gnu.org
Cc: "Tassilo Horn" <tsdh@gnu.org>
Subject: bug#73638: 31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs
Date: Sat, 05 Oct 2024 16:36:23 +0530
Message-ID: <87ploebyhc.fsf@gmail.com>

This is a follow-up to bug#73530, where we discussed how to obtain the
outlines of LaTeX PDFs.

Currently, if mutool reports the outline as

    % mutool show test.pdf outline
    |	"Text"	#nameddest=section.1
    |	"Annotations"	#nameddest=section.2
    |	"Links"	#nameddest=section.3
    |	"Attachments"	#nameddest=section.4
    +	"Outline"	#nameddest=section.5
    +		"subsection"	#nameddest=subsection.5.1
    |			"subsubsection"	#nameddest=subsubsection.5.1.1

then doc-view cannot build an imenu index from it, since the entries
carry named destinations rather than page numbers.  Looking at the
source code of mutool, it looks like the "#..." part is simply a URI.
AFAICT, there's no way to resolve the URI to a page number from the
mutool command line alone.  However, one can write a JS script instead.
Use the outline.js script included at the end of this message and run
mutool as follows with a LaTeX PDF:

    % mutool run outline.js test.pdf
    (
    ((level . 1)
    (title . "Text")
    (page . 0))
    ((level . 1)
    (title . "Annotations")
    (page . 1))
    ((level . 1)
    (title . "Links")
    (page . 2))
    ((level . 1)
    (title . "Attachments")
    (page . 3))
    ((level . 1)
    (title . "Outline")
    (page . 4))
    ((level . 2)
    (title . "subsection")
    (page . 4))
    ((level . 3)
    (title . "subsubsection")
    (page . 4))
    )

Emacs can `read' this output directly, so no parsing is needed.
Evaluating the JS takes about the same time as `mutool show PDF outline':

    % time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
        0m00.32s real     0m00.29s user     0m00.02s system
    % time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
        0m00.31s real     0m00.29s user     0m00.02s system
    % time mutool show atkins_physical_chemistry.pdf outline >/dev/null
        0m00.33s real     0m00.29s user     0m00.04s system
    % time mutool show atkins_physical_chemistry.pdf outline >/dev/null
        0m00.30s real     0m00.25s user     0m00.04s system

[ where atkins_physical_chemistry.pdf is the same 90+MB file I was
  testing in the previous bug report.  ]

I don't know JS at all, so the script can probably be improved.  The docs
for the JS interface are at

    https://mupdf.readthedocs.io/en/latest/mutool-run-js-api.html

If this approach is acceptable, doc-view can simply run the JS script
instead of `mutool show PDF outline'.  WDYT?

[ I couldn't attach the JS script because Gmail blocked the message,
  so it is included inline below.  ]

outline.js:

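// Open the PDF named on the command line and load its outline.
var document = new Document.openDocument(scriptArgs[0], "application/pdf")
var outline = document.loadOutline()
// Nothing to print if the PDF has no outline.
if(!outline) quit()

print("(")

// Print one alist per outline entry, then recurse into its children.
function pp(outl, level){
    print("((level . " + level + ")")
    print("(title . " + repr(outl.title) + ")")
    // resolveLink turns the "#nameddest=..." URI into a page number.
    print("(page . " + document.resolveLink(outl.uri) + "))")
    if(outl.down){
        for(var i=0; i<outl.down.length; i++){
            pp(outl.down[i], level+1)
        }
    }
}

for(var i=0; i<outline.length; i++){
    pp(outline[i], 1)
}

print(")")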
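
For anyone who wants to check the shape of the output without mutool or
a PDF at hand, here is a minimal sketch of the same depth-first walk
that should run under Node.js.  Everything prefixed with "mock" is a
hypothetical stand-in and not part of the mutool API: mockOutline
replaces document.loadOutline(), mockResolveLink replaces
document.resolveLink(), and JSON.stringify is only assumed to be close
enough to mutool's repr() for plain ASCII titles.

```javascript
// Hypothetical outline, shaped like the entries outline.js walks:
// each entry has a title, a URI, and optionally a `down` array.
var mockOutline = [
    { title: "Text", uri: "#nameddest=section.1" },
    { title: "Outline", uri: "#nameddest=section.5", down: [
        { title: "subsection", uri: "#nameddest=subsection.5.1" }
    ] }
];

// Hypothetical lookup table: named destination -> page number.
var mockPages = {
    "#nameddest=section.1": 0,
    "#nameddest=section.5": 4,
    "#nameddest=subsection.5.1": 4
};

// Stand-in for document.resolveLink().
function mockResolveLink(uri) {
    return mockPages[uri];
}

// Collect one alist per outline entry, parents before children.
function formatOutline(entries, level, lines) {
    for (var i = 0; i < entries.length; i++) {
        var entry = entries[i];
        lines.push("((level . " + level + ")");
        lines.push("(title . " + JSON.stringify(entry.title) + ")");
        lines.push("(page . " + mockResolveLink(entry.uri) + "))");
        if (entry.down) {
            formatOutline(entry.down, level + 1, lines);
        }
    }
    return lines;
}

// Wrap the entries in one outer list, as outline.js does.
var outlineLines = ["("].concat(formatOutline(mockOutline, 1, []), [")"]);
console.log(outlineLines.join("\n"));
```

Collecting the lines in an array instead of printing them as we go is
only a convenience for testing; the traversal order is meant to match
the script above.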