From: Eduardo Ochs
Newsgroups: gmane.emacs.help
Subject: Re: Academic workflow with old PDFs
Date: Sat, 20 Aug 2022 18:32:09 -0300
To: Alessandro Bertulli
Cc: help-gnu-emacs
References: <87fshti553.fsf@dataswamp.org> <87czcuzkzr.fsf@gmail.com>
In-Reply-To: <87czcuzkzr.fsf@gmail.com>

On Sat, 20 Aug 2022 at 18:03, Alessandro Bertulli wrote:
>
> Here my bad, I should have asked a narrower question. Looking back, I'd
> say you're right, my first question was if it was possible (in the
> community's opinion) to study old, scanned, poorly indexed PDFs with
> pdf-tools, and if not, what other tools do you use. I should have been
> more focused, I apologize.

Hi Alessandro,

my favorite tool for indexing PDFs - and that I use even for PDFs that
only contain photos of whiteboards, and that are totally unOCRizable -
is the module of eev that is explained in this tutorial,

  http://angg.twu.net/eev-intros/find-pdf-like-intro.html

and in the video whose index is here:

  http://angg.twu.net/.emacs.videos.html#eev2020

Look for the lines in the index that look like these ones,

  (find-eev2020video "4:52" "`find-pdf-page' calls an external program")
  (find-eev2020video "5:26" "`find-pdf-text' converts the PDF to text and")
  (find-eev2020video "10:45" "`code-pdf-page' creates a short hyperlink function for a PDF")
  (find-eev2020video "11:38" "let's try...")
  (find-eev2020video "11:55" "`find-fongspivatext'")
  (find-eev2020video "12:25" "This block is a kind of an index for that book")
  (find-eev2020video "12:54" "This block is a kind of an index for that video")

and click on the links with the timemarks...

If that looks like something that you would like to try then send me
an e-mail and let's see if we can arrange to chat by IRC or by some
other means!

  Cheers,
    Eduardo Ochs
    http://angg.twu.net/#eev