From: Visuwesh
Newsgroups: gmane.emacs.bugs
Subject: bug#73530: [PATCH] Add imenu index function for Djvu files in doc-view
Date: Thu, 03 Oct 2024 16:40:29 +0530
Message-ID: <87plohcuhm.fsf@gmail.com>
Cc: Eli Zaretskii, "Jose A. Ortega Ruiz", 73530@debbugs.gnu.org
To: Tassilo Horn
In-Reply-To: <87bk01obpf.fsf@gnu.org> (Tassilo Horn's message of "Thu, 03 Oct 2024 10:03:08 +0200")

[Thursday, October 03, 2024] Tassilo Horn wrote:

> Tassilo Horn writes:
>
> Hi again,
>
>>> The PDF generated by LaTeX can have a wildly different outline than
>>> matched by doc-view's regexp:
>>>
>>> % mutool show test.pdf outline
>>> | "Text" #nameddest=section.1
>>> | "Annotations" #nameddest=section.2
>>>
>>> Compare it with:
>>>
>>> % mutool show atkins_physical_chemistry.pdf outline
>>> | "Cover" #page=1&view=Fit
>>> | "PREFACE" #page=7&view=Fit
>>> | "USING THE BOOK" #page=8&view=Fit
>>
>> Ok, I see. All my LaTeX PDFs have #nameddest=section.x values instead
>> of #page=X values, so that's the reason they don't work. It would be
>> good if we could mention that it won't work because there are no page
>> references in the outline in the error message.
>
> Would you mind doing that in a new version of the patch?
>
> And I wonder if mutool could spit out page references in addition to
> the nameddest references. Do you know if there's a technical limitation
> or if it's just not implemented?

Unfortunately, I have no idea. I actually don't use doc-view for PDF
files, only for docx and DjVu.

> Sadly, their communication platform seems to be Discord where I didn't
> want to register an account to ask. They seem to use the ghostscript
> bugzilla, so one could create a ticket there... If nobody else here in
> the discussion already has an account there, I wouldn't mind creating
> one myself and asking.

If I cannot find anything in the man page, I can ask in their Discord.
I do have an account lying around, so I can at least put it to good use.

Can this be done in another patch later on? I am not sure if I will get
the time soon to follow up on this part of the problem. It would also
be cleaner to open another bug report to track this.

>>>> For DjVu, my sample size is 1, and that's a presentation, so at least
>>>> here I'm not sure if there should be an index available...
>>>
>>> I will send the link to the DjVu file that I wrote the feature for
>>> off-list. I will send a link to a PDF file too.
>>
>> Thanks, will try with those two files.
>
> I did so now and it is blazingly fast for those 80+ MB PDF/DjVu files
> even on my almost 10-year-old laptop, so I'd say your simpler approach
> is the right choice.
>
>>> On this note, should we use doc-view-pdfdraw-program in place of
>>> mutool in doc-view--pdf-outline?
>>
>> Yes, but only if the older names pdfdraw and mudraw already had the
>> "show outline" feature.
>
> I revert the "but only if" part. If mupdf is old and comes with, e.g.,
> the pdfdraw executable, chances are almost zero that mutool is
> installed, too. And if it is, then we should prefer it anywhere. So I
> think the way to go is to (executable-find "mutool") in
> doc-view-pdfdraw-program first so that it takes precedence and use
> doc-view-pdfdraw-program in doc-view--pdf-outline.

OK, I will post a patch if you're okay with opening another bug report
for the nameddest PDF thingy.

>>>> Well, I actually have no strong opinion here. Technically, I like
>>>> your approach better because of its simplicity. I would like to test
>>>> with some larger documents to see how long index building takes,
>>>> though.
>>>
>>> I tried the function with a large PDF file:
>>
>> Will try with the large two you've linked later.
>
> As said above, it's more than fast enough, so let's take your approach.

Great, thank you for taking the time to test the patch.
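
For illustration only, the difference between the two outlines quoted above can be sketched in Python. This is a minimal sketch and not doc-view's code: the regular expressions, the function names (`parse_outline`, `outline_problem`), and the message wording are all assumptions made for this example. The sample strings are the outline lines quoted in the thread, and the real `mutool show FILE outline` output may differ in whitespace.

```python
import re

# Hypothetical patterns for this sketch; they are not doc-view's regexp.
# An outline line is assumed to look like: | "Title" #target
OUTLINE_LINE = re.compile(r'^\s*[|+-]?\s*"(?P<title>.*)"\s+#(?P<target>\S+)\s*$')
PAGE_TARGET = re.compile(r'^page=(?P<page>\d+)')

# Outline lines as quoted in the thread.
LATEX_OUTLINE = (
    '| "Text" #nameddest=section.1\n'
    '| "Annotations" #nameddest=section.2\n'
)
BOOK_OUTLINE = (
    '| "Cover" #page=1&view=Fit\n'
    '| "PREFACE" #page=7&view=Fit\n'
    '| "USING THE BOOK" #page=8&view=Fit\n'
)


def parse_outline(text):
    """Return (entries_with_page, titles_without_page) for outline text."""
    with_page, without_page = [], []
    for line in text.splitlines():
        match = OUTLINE_LINE.match(line)
        if match is None:
            continue  # not an outline entry in the assumed format
        page = PAGE_TARGET.match(match.group("target"))
        if page:
            with_page.append((match.group("title"), int(page.group("page"))))
        else:
            # e.g. a #nameddest=section.1 target, which carries no page number
            without_page.append(match.group("title"))
    return with_page, without_page


def outline_problem(text):
    """Return None if an index can be built, else a (hypothetical) message."""
    with_page, without_page = parse_outline(text)
    if with_page:
        return None
    if without_page:
        return "Outline has no page references (only named destinations)"
    return "No outline found"
```

With these samples, `outline_problem(BOOK_OUTLINE)` returns `None`, while `outline_problem(LATEX_OUTLINE)` returns the message about missing page references, which is the kind of hint requested for the error message.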