unofficial mirror of help-gnu-emacs@gnu.org
* Academic workflow with old PDFs
@ 2022-08-17 21:36 Alessandro Bertulli
  2022-08-18  2:26 ` Stefan Monnier via Users list for the GNU Emacs text editor
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Alessandro Bertulli @ 2022-08-17 21:36 UTC
  To: help-gnu-emacs

Hi all!

I reckon this message may start a flame, but that's not my intention;
I'm looking to hear your advice (especially, but not only, if you work
in/with academia)

I'm currently writing my MS's thesis. Searching for the state of the art
of my assigned technology, I am struggling to read and reason about some
old papers from ACM and IEEE (pre-2000, scanned, with no index). I am
currently switching back and forth between Sioyek and Evince to read my
pdfs, while taking notes in Org mode.

I wonder whether I should switch to using pdf-tools (potentially with
the integration of org-noter). So, my point is: can pdf-tools, in your
opinion, work with old PDF files, or is it just a limitation of the file
type? If you know Sioyek, how do you integrate it with Emacs? Is it
worth doing so? Or is Sioyek clearly better/worse than Emacs *for an
academic workflow*? Would you suggest something like Logseq?

P.S.: Sioyek aims to reconstruct hyperlinks to references and
equations in the text even for old papers. That's awesome and very
useful, but unfortunately it seems to depend on the quality of the
file, as sometimes it doesn't work. Hence my need for a replacement.

Thanks!

Bertulli




* Re: Academic workflow with old PDFs
  2022-08-17 21:36 Alessandro Bertulli
@ 2022-08-18  2:26 ` Stefan Monnier via Users list for the GNU Emacs text editor
  2022-08-18 11:23 ` Jean Louis
  2022-08-18 20:45 ` Emanuel Berg
  2 siblings, 0 replies; 11+ messages in thread
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2022-08-18  2:26 UTC
  To: help-gnu-emacs

> P.S.: Sioyek aims to reconstruct hyperlinks to references and
> equations in the text even for old papers.  That's awesome and very
> useful, but unfortunately it seems to depend on the quality of the
> file, as sometimes it doesn't work. Hence my need for a replacement.

FWIW, I mostly use `doc-view-mode` to read PDFs.  In many ways it's very
limited, but I like the fact that I can crop to the particular part of
the page(s) I'm interested in and that's preserved as I move between
pages (some other readers have an auto-crop feature which is even
better), and more importantly I can do `C-x 5 2` to see several pages at
the same time (I very often keep a frame/window displaying the
bibliography, but other times I use that to display a figure while
I read the corresponding description from another page, or to display
several figures next to each other, ...).

I tried Sioyek and it's nice, but not sufficiently nicer to make me
change :-)

Have you tried `pdf-tools`?  AFAICT it has all the advantages of
`doc-view-mode` but without its many limitations.  It's truly nice.
I don't use it often enough because all too often my main Emacs is in
a state of hacking that breaks my externally installed packages, so I'm
still used to using `doc-view-mode` as my main driver, but the more time
passes the more I find it too limited.
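
A minimal init-file sketch for trying `pdf-tools` this way, assuming the
package is already installed (e.g. from MELPA). It is not taken from the
message above: `pdf-view-auto-slice-minor-mode` is pdf-tools' auto-crop
mode, roughly the automatic counterpart of the manual cropping described
for `doc-view-mode`.

```elisp
;; Sketch, assuming pdf-tools is installed.  Its epdfinfo server may
;; need to be built once (e.g. via M-x pdf-tools-install).
(pdf-loader-install)   ; use pdf-view-mode for PDFs, loading lazily

;; Automatically crop page margins in every PDF buffer.
(add-hook 'pdf-view-mode-hook #'pdf-view-auto-slice-minor-mode)
```

Cropping only changes what is displayed. On scanned papers with no text
layer, pdf-tools can show and crop the pages, but search and text
selection have nothing to work with until the PDF has been OCRed.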


        Stefan





* Re: Academic workflow with old PDFs
  2022-08-17 21:36 Alessandro Bertulli
  2022-08-18  2:26 ` Stefan Monnier via Users list for the GNU Emacs text editor
@ 2022-08-18 11:23 ` Jean Louis
  2022-08-18 20:45 ` Emanuel Berg
  2 siblings, 0 replies; 11+ messages in thread
From: Jean Louis @ 2022-08-18 11:23 UTC
  To: Alessandro Bertulli; +Cc: help-gnu-emacs

* Alessandro Bertulli <alessandro.bertulli96@gmail.com> [2022-08-18 01:27]:
> I'm currently writing my MS's thesis. Searching for the state of the art
> of my assigned technology, I am struggling to read and reason about some
> old papers from ACM and IEEE (pre-2000, scanned, with no index). I am
> currently switching back and forth between Sioyek and Evince to read my
> pdfs, while taking notes in Org mode.

Scanned page images can be processed with an OCR program to extract
the text; that text can then be attached to the corresponding pages as
a text layer and the PDF repacked.
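
One way to do this from Emacs, sketched under the assumption that the
external `ocrmypdf` tool (which adds an OCR text layer to a PDF) is
installed. The helper name `my-ocr-pdf` and the `-ocr.pdf` output naming
are hypothetical, not part of any package.

```elisp
(defun my-ocr-pdf (file)
  "Hypothetical helper: write an OCRed copy of FILE next to it.
Runs the external `ocrmypdf' program; `--skip-text' leaves pages
that already have a text layer untouched."
  (interactive "fPDF to OCR: ")
  (let* ((in  (expand-file-name file))
         (out (concat (file-name-sans-extension in) "-ocr.pdf")))
    ;; Run asynchronously in a *compilation* buffer so progress is visible.
    (compile (mapconcat #'shell-quote-argument
                        (list "ocrmypdf" "--skip-text" in out)
                        " "))))
```

The resulting `-ocr.pdf` copy can then be opened in `doc-view-mode` or
pdf-tools; how well search works afterwards depends on the OCR quality
of the original scan.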

-- 
Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

In support of Richard M. Stallman
https://stallmansupport.org/




* Re: Academic workflow with old PDFs
@ 2022-08-18 11:31 Alessandro Bertulli
  0 siblings, 0 replies; 11+ messages in thread
From: Alessandro Bertulli @ 2022-08-18 11:31 UTC
  To: monnier; +Cc: help-gnu-emacs

> more importantly I can do `C-x 5 2` to see several pages at the same
> time (I very often keep a frame/window displaying the bibliography,
> but other times I use that to display a figure while I read the
> corresponding description from another page, or to display several
> figures next to each other, ...)

This is a very good idea, thanks! I find that using my desktop
environment's window switching (Alt+Tab on GNOME) is sometimes quicker
than switching Emacs buffers, so that may be a good tip.

> I tried Sioyek and it's nice, but not sufficiently nicer to make me
> change :-)

I'm quite conflicted about it :-)
On one hand, it is specifically designed for academics, and it works
decently well; on the other, it has Vim-style keybindings (which can be
overridden, though), and as I was saying it isn't always able to
reconstruct the hyperlinks if the PDF is old. But again, I'm starting
to suspect this is a limitation intrinsic to the quality of the PDF,
and that there is no magic tool that can perform OCR that accurately.
However, if anyone has comments or suggestions about this, they're
welcome.

> Have you tried `pdf-tools`?

In fact, I have, but just barely. Until now, I have stuck with Sioyek
since I needed to read almost exclusively old papers. But I'm willing
to explore it further.

Alessandro




* Re: Academic workflow with old PDFs
  2022-08-17 21:36 Alessandro Bertulli
  2022-08-18  2:26 ` Stefan Monnier via Users list for the GNU Emacs text editor
  2022-08-18 11:23 ` Jean Louis
@ 2022-08-18 20:45 ` Emanuel Berg
  2 siblings, 0 replies; 11+ messages in thread
From: Emanuel Berg @ 2022-08-18 20:45 UTC
  To: help-gnu-emacs

Alessandro Bertulli wrote:

> I reckon this message may start a flame, but that's not my
> intention; I'm looking to hear your advice (especially, but
> not only, if you work in/with academia)

Guys, no one uses the word "academia" any more.

Maybe at the Department of Philosophy, it's called academia.
So that's right, all the more reason.

It is called higher education, university, research, science
and maybe other words as well depending on context, but not
that one.

> I'm currently writing my MS's thesis

Guys, there are three levels:

  Bachelor
  Master
  Ph.D.

> Or is Sioyek clearly better/worse than Emacs *for an
> academic workflow*?

Guys ...

-- 
underground experts united
https://dataswamp.org/~incal





* Re: Academic workflow with old PDFs
@ 2022-08-18 21:08 Alessandro Bertulli
  2022-08-18 22:00 ` Emanuel Berg
  2022-08-19  4:25 ` tomas
  0 siblings, 2 replies; 11+ messages in thread
From: Alessandro Bertulli @ 2022-08-18 21:08 UTC
  To: incal; +Cc: help-gnu-emacs

Sorry if that bothered you :-)

> Guys, no one uses the word "academia" any more.

> It is called higher education, university, research, science
> and maybe other words as well depending on context, but not
> that one.

Dunno, in Italy it's still used sometimes, I just assumed it was canon.

> Guys, there are three levels:
> 
>   Bachelor
>   Master
>   Ph.D.

True, point is that "Master" has a different meaning in Italy, so I
always specify MS as "Master of Science", to disambiguate.

> Guys ...

Here I suspect you are referring to the use of "academia" again

Alessandro




* Re: Academic workflow with old PDFs
  2022-08-18 21:08 Academic workflow with old PDFs Alessandro Bertulli
@ 2022-08-18 22:00 ` Emanuel Berg
  2022-08-19  4:28   ` Derailing a thread [was: Academic workflow with old PDFs] tomas
  2022-08-20 21:02   ` Academic workflow with old PDFs Alessandro Bertulli
  2022-08-19  4:25 ` tomas
  1 sibling, 2 replies; 11+ messages in thread
From: Emanuel Berg @ 2022-08-18 22:00 UTC
  To: help-gnu-emacs

Alessandro Bertulli wrote:

>> It is called higher education, university, research,
>> science and maybe other words as well depending on context,
>> but not that one.
>
> Dunno

nno.

> in Italy it's still used sometimes, I just assumed it
> was canon.

Italy has a history of being a bit "behind"; this has been good
as often as it has been bad. In this case, however, "academic"
brings to mind a stinking professor of English literature who
cannot do laundry, which obviously has nothing to do with the
theoretical superstructure of very practical things like
technology and engineering. Even in everyday language you may
have heard phrases like "the debate has been largely
academic", meaning without substance and of no practical
relevance. (Not that there is anything wrong with
English literature.)

>> Guys, there are three levels:
>> 
>>   Bachelor
>>   Master
>>   Ph.D.
>
> True, point is that "Master" has a different meaning in
> Italy, so I always specify MS as "Master of Science",
> to disambiguate.

Well, the international language - English;
the language of science, very international indeed - English;
the language of computers - English (US English in terms of speeling);
the language of your post and my reply - English;
"Master" - an English word ...

>> Guys ...
>
> Here I suspect you are referring to the use of "academia"
> again

You would have to specify what tasks in general, and what
features in particular, you are looking for to carry out those
tasks; there is no "academic workflow". But you did say it in
subsequent messages, and to some extent in the first post as
well, so yeah, we can strike that from the proceedings ...

-- 
underground experts united
https://dataswamp.org/~incal





* Re: Academic workflow with old PDFs
  2022-08-18 21:08 Academic workflow with old PDFs Alessandro Bertulli
  2022-08-18 22:00 ` Emanuel Berg
@ 2022-08-19  4:25 ` tomas
  1 sibling, 0 replies; 11+ messages in thread
From: tomas @ 2022-08-19  4:25 UTC
  To: help-gnu-emacs


On Thu, Aug 18, 2022 at 11:08:33PM +0200, Alessandro Bertulli wrote:
> Sorry if that bothered you :-)
> 
> > Guys, no one uses the word "academia" any more.
> 
> > It is called higher education, university, research, science
> > and maybe other words as well depending on context, but not
> > that one.
> 
> Dunno, in Italy it's still used sometimes, I just assumed it was canon.

I use it all the time.

Cheers
-- 
t



* Derailing a thread [was: Academic workflow with old PDFs]
  2022-08-18 22:00 ` Emanuel Berg
@ 2022-08-19  4:28   ` tomas
  2022-08-20 21:02   ` Academic workflow with old PDFs Alessandro Bertulli
  1 sibling, 0 replies; 11+ messages in thread
From: tomas @ 2022-08-19  4:28 UTC
  To: help-gnu-emacs


On Fri, Aug 19, 2022 at 12:00:24AM +0200, Emanuel Berg wrote:
> Alessandro Bertulli wrote:
> 
> >> It is called higher education, university, research,
> >> science and maybe other words as well depending on context,
> >> but not that one.
> >
> > Dunno
> 
> nno.
> 
> > in Italy it's still used sometimes, I just assumed it
> > was canon.
> 
> Italy has a history of being a bit "behind" [...]

Whatever. The thread used to be interesting. Now it's about
Emanuel's perception of a word's usage. I'm out.

-- 
t



* Re: Academic workflow with old PDFs
  2022-08-18 22:00 ` Emanuel Berg
  2022-08-19  4:28   ` Derailing a thread [was: Academic workflow with old PDFs] tomas
@ 2022-08-20 21:02   ` Alessandro Bertulli
  2022-08-20 21:32     ` Eduardo Ochs
  1 sibling, 1 reply; 11+ messages in thread
From: Alessandro Bertulli @ 2022-08-20 21:02 UTC
  To: help-gnu-emacs

> Italy has a history of being a bit "behind"

Well, first of all, thank you :-)

> [...] In this case, however, "academic"
> brings to mind a stinking professor of English literature who
> cannot do laundry, which obviously has nothing to do with the
> theoretical superstructure of very practical things like
> technology and engineering. Even in everyday language you may
> have heard phrases like "the debate has been largely
> academic", meaning without substance and of no practical
> relevance. (Not that there is anything wrong with
> English literature.)

Surely you know the English language better than I do, so I won't
argue here.

> Well, the international language - English;
> the language of science, very international indeed - English;
> the language of computers - English (US English in terms of speeling);
> the language of your post and my reply - English;
> "Master" - an English word ...

You're right, my point is that "Master of Science" is English enough:
https://en.wikipedia.org/wiki/Master_of_Science

> You would have to specify what tasks in general, and what
> features in particular, you are looking for to carry out those
> tasks; there is no "academic workflow". But you did say it in
> subsequent messages, and to some extent in the first post as
> well, so yeah, we can strike that from the proceedings ...

My bad here, I should have asked a narrower question. Looking back, I'd
say you're right: my original question was whether it is possible (in
the community's opinion) to study old, scanned, poorly indexed PDFs with
pdf-tools, and if not, what other tools you use. I should have been
more focused; I apologize.

Alessandro




* Re: Academic workflow with old PDFs
  2022-08-20 21:02   ` Academic workflow with old PDFs Alessandro Bertulli
@ 2022-08-20 21:32     ` Eduardo Ochs
  0 siblings, 0 replies; 11+ messages in thread
From: Eduardo Ochs @ 2022-08-20 21:32 UTC
  To: Alessandro Bertulli; +Cc: help-gnu-emacs

On Sat, 20 Aug 2022 at 18:03, Alessandro Bertulli
<alessandro.bertulli96@gmail.com> wrote:
>
> My bad here, I should have asked a narrower question. Looking back, I'd
> say you're right: my original question was whether it is possible (in
> the community's opinion) to study old, scanned, poorly indexed PDFs with
> pdf-tools, and if not, what other tools you use. I should have been
> more focused; I apologize.

Hi Alessandro,

my favorite tool for indexing PDFs - and that I use even for PDFs that
only contain photos of whiteboards, and that are totally unOCRizable -
is the module of eev that is explained in this tutorial,

  http://angg.twu.net/eev-intros/find-pdf-like-intro.html

and in the video whose index is here:

  http://angg.twu.net/.emacs.videos.html#eev2020

Look for the lines in the index that look like these ones,

  (find-eev2020video "4:52" "`find-pdf-page' calls an external program")
  (find-eev2020video "5:26" "`find-pdf-text' converts the PDF to text and")
  (find-eev2020video "10:45" "`code-pdf-page' creates a short hyperlink function for a PDF")
  (find-eev2020video "11:38" "let's try...")
  (find-eev2020video "11:55" "`find-fongspivatext'")
  (find-eev2020video "12:25" "This block is a kind of an index for that book")
  (find-eev2020video "12:54" "This block is a kind of an index for that video")

and click on the links with the timemarks...

If that looks like something that you would like to try then send me
an e-mail and let's see if we can arrange to chat by IRC or by some
other means!

  Cheers,
    Eduardo Ochs
    http://angg.twu.net/#eev



