unofficial mirror of help-gnu-emacs@gnu.org
* Academic workflow with old PDFs
@ 2022-08-17 21:36 Alessandro Bertulli
  2022-08-18  2:26 ` Stefan Monnier via Users list for the GNU Emacs text editor
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Alessandro Bertulli @ 2022-08-17 21:36 UTC (permalink / raw)
  To: help-gnu-emacs

Hi all!

I reckon this message may start a flame but that's not my intention; I'm
looking to hear your advice (especially, but not limited to, if you work
in/with academia).

I'm currently writing my MS's thesis. Searching for the state of the art
of my assigned technology, I am struggling to read and reason about some
old papers from ACM and IEEE (pre-2000, scanned, with no index). I am
currently switching back and forth between Sioyek and Evince to read my
pdfs, while taking notes in Org mode.

I wonder whether I should switch to using pdf-tools (potentially with the
integration of org-noter). So, my point is: can pdf-tools, in your
opinion, work with old PDF files, or is the problem just a limitation of
the file type? If you know Sioyek, how do you integrate it with Emacs? Is
it worth doing so? Or is Sioyek clearly better/worse than Emacs *for an
academic workflow*? Would you suggest something like Logseq?

P.S.: Sioyek aims to reconstruct hyperlinks to references and
equations in text even for old papers. That's awesome and very useful,
but unfortunately it seems to depend on the quality of the file, as
sometimes it doesn't work. Hence my need for a replacement.

Thanks!

Bertulli




* Re: Academic workflow with old PDFs
  2022-08-17 21:36 Academic workflow with old PDFs Alessandro Bertulli
@ 2022-08-18  2:26 ` Stefan Monnier via Users list for the GNU Emacs text editor
  2022-08-18 11:23 ` Jean Louis
  2022-08-18 20:45 ` Emanuel Berg
  2 siblings, 0 replies; 19+ messages in thread
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2022-08-18  2:26 UTC (permalink / raw)
  To: help-gnu-emacs

> P.S.: Sioyek aims to reconstruct hyperlinks to references and
> equations in text even for old papers.  That's awesome and very useful,
> but unfortunately it seems to depend on the quality of the file, as
> sometimes it doesn't work. Hence my need for a replacement.

FWIW, I mostly use `doc-view-mode` to read PDFs.  In many ways it's very
limited, but I like the fact that I can crop to the particular part of
the page(s) I'm interested in and that's preserved as I move between
pages (some other readers have an auto-crop feature which is even
better), and more importantly I can do `C-x 5 2` to see several pages at
the same time (I very often keep a frame/window displaying the
bibliography, but other times I use that to display a figure while
I read the corresponding description from another page, or to display
several figures next to each other, ...).

I tried Sioyek and it's nice, but not sufficiently nicer to make me
change :-)

Have you tried `pdf-tools`?  AFAICT it has all the advantages of
`doc-view-mode` but without its many limitations.  It's truly nice.
I don't use it often enough because all too often my main Emacs is in
a state of hacking that breaks my externally installed packages, so I'm
still used to using `doc-view-mode` as my main driver, but the more time
passes the more I find it too limited.


        Stefan





* Re: Academic workflow with old PDFs
  2022-08-17 21:36 Academic workflow with old PDFs Alessandro Bertulli
  2022-08-18  2:26 ` Stefan Monnier via Users list for the GNU Emacs text editor
@ 2022-08-18 11:23 ` Jean Louis
  2022-08-18 13:48   ` [OFFTOPIC] " Stefan Monnier via Users list for the GNU Emacs text editor
  2022-08-18 20:45 ` Emanuel Berg
  2 siblings, 1 reply; 19+ messages in thread
From: Jean Louis @ 2022-08-18 11:23 UTC (permalink / raw)
  To: Alessandro Bertulli; +Cc: help-gnu-emacs

* Alessandro Bertulli <alessandro.bertulli96@gmail.com> [2022-08-18 01:27]:
> I'm currently writing my MS's thesis. Searching for the state of the art
> of my assigned technology, I am struggling to read and reason about some
> old papers from ACM and IEEE (pre-2000, scanned, with no index). I am
> currently switching back and forth between Sioyek and Evince to read my
> pdfs, while taking notes in Org mode.

You can run an OCR program over the scanned page images to recover
the text, then attach that text to the pages and repack the PDF.
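
The OCR-and-repack idea above could be sketched roughly as follows. This is
only an illustration, assuming the OCRmyPDF command-line tool is installed;
nobody in this thread named a specific OCR program, and the function names
`build_ocr_command` and `add_text_layer` are made up for the example.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src, dst, skip_pages_with_text=True):
    """Return the argv list for an OCRmyPDF run (nothing is executed here)."""
    cmd = ["ocrmypdf"]
    if skip_pages_with_text:
        # --skip-text leaves pages that already carry a text layer untouched.
        cmd.append("--skip-text")
    cmd += [str(src), str(dst)]
    return cmd


def add_text_layer(src, dst):
    """Run OCRmyPDF on src and write the searchable PDF to dst.

    Assumption: the `ocrmypdf` executable is on PATH. Raises if the tool
    is missing or the run fails.
    """
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found on PATH")
    subprocess.run(build_ocr_command(src, dst), check=True)
    return Path(dst)
```

Calling `add_text_layer("scan.pdf", "scan-ocr.pdf")` (hypothetical file names)
would leave the scanned images as they are and add a searchable text layer,
which is what makes search and copy work in a PDF reader.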

-- 
Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

In support of Richard M. Stallman
https://stallmansupport.org/




* Re: [OFFTOPIC] Academic workflow with old PDFs
  2022-08-18 11:23 ` Jean Louis
@ 2022-08-18 13:48   ` Stefan Monnier via Users list for the GNU Emacs text editor
  2022-08-18 14:38     ` Emanuel Berg
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2022-08-18 13:48 UTC (permalink / raw)
  To: help-gnu-emacs

Jean Louis [2022-08-18 14:23:05] wrote:
> * Alessandro Bertulli <alessandro.bertulli96@gmail.com> [2022-08-18 01:27]:
>> I'm currently writing my MS's thesis. Searching for the state of the art
>> of my assigned technology, I am struggling to read and reason about some
>> old papers from ACM and IEEE (pre-2000, scanned, with no index). I am
>> currently switching back and forth between Sioyek and Evince to read my
>> pdfs, while taking notes in Org mode.
>
> You can run an OCR program over the scanned page images to recover
> the text, then attach that text to the pages and repack the PDF.

Some of the old articles in ACM are already processed this way for
you, actually.


        Stefan





* Re: [OFFTOPIC] Academic workflow with old PDFs
  2022-08-18 13:48   ` [OFFTOPIC] " Stefan Monnier via Users list for the GNU Emacs text editor
@ 2022-08-18 14:38     ` Emanuel Berg
  2022-08-18 14:41       ` Stefan Monnier via Users list for the GNU Emacs text editor
  0 siblings, 1 reply; 19+ messages in thread
From: Emanuel Berg @ 2022-08-18 14:38 UTC (permalink / raw)
  To: help-gnu-emacs

Stefan Monnier via Users list for the GNU Emacs text editor wrote:

>>> I'm currently writing my MS's thesis. Searching for the
>>> state of the art of my assigned technology, I am
>>> struggling to read and reason about some old papers from
>>> ACM and IEEE (pre-2000, scanned, with no index). I am
>>> currently switching back and forth between Sioyek and
>>> Evince to read my pdfs, while taking notes in Org mode.
>>
>> You can run an OCR program over the scanned page images to
>> recover the text, then attach that text to the pages and
>> repack the PDF.
>
> Some of the old articles in ACM are already processed this
> way for you, actually.

Are there some articles that are really good?

-- 
underground experts united
https://dataswamp.org/~incal





* Re: [OFFTOPIC] Academic workflow with old PDFs
  2022-08-18 14:38     ` Emanuel Berg
@ 2022-08-18 14:41       ` Stefan Monnier via Users list for the GNU Emacs text editor
  2022-08-18 20:45         ` Emanuel Berg
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2022-08-18 14:41 UTC (permalink / raw)
  To: help-gnu-emacs

> Are there some articles that are really good?

Only those that I wrote, of course,


        Stefan





* Re: [OFFTOPIC] Academic workflow with old PDFs
@ 2022-08-18 20:22 Alessandro Bertulli
  2022-08-18 22:14 ` Stefan Monnier
  2022-08-19  4:24 ` tomas
  0 siblings, 2 replies; 19+ messages in thread
From: Alessandro Bertulli @ 2022-08-18 20:22 UTC (permalink / raw)
  To: monnier; +Cc: help-gnu-emacs

> Some of the old articles in ACM are already processed this way for
> you, actually.

I dunno, the ones I read were actually simple scans (as far as I can
tell). The "search in text" function of Evince/Sioyek worked, tho. Are
you referring to that?

Alessandro




* Re: Academic workflow with old PDFs
  2022-08-17 21:36 Academic workflow with old PDFs Alessandro Bertulli
  2022-08-18  2:26 ` Stefan Monnier via Users list for the GNU Emacs text editor
  2022-08-18 11:23 ` Jean Louis
@ 2022-08-18 20:45 ` Emanuel Berg
  2022-08-18 22:24   ` [OFFTOPIC] " Stefan Monnier via Users list for the GNU Emacs text editor
  2 siblings, 1 reply; 19+ messages in thread
From: Emanuel Berg @ 2022-08-18 20:45 UTC (permalink / raw)
  To: help-gnu-emacs

Alessandro Bertulli wrote:

> I reckon this message may start a flame but that's not my
> intention; I'm looking to hear your advice (especially, but
> not limited to, if you work in/with academia)

Guys, no one uses the word "academia" any more.

Maybe at the Department of Philosophy, it's called academia.
So that's right, all the more reason.

It is called higher education, university, research, science
and maybe other words as well depending on context, but not
that one.

> I'm currently writing my MS's thesis

Guys, there are three levels:

  Bachelor
  Master
  Ph.D.

> Or is Sioyek clearly better/worse than Emacs *for an
> academic workflow*?

Guys ...

-- 
underground experts united
https://dataswamp.org/~incal





* Re: [OFFTOPIC] Academic workflow with old PDFs
  2022-08-18 14:41       ` Stefan Monnier via Users list for the GNU Emacs text editor
@ 2022-08-18 20:45         ` Emanuel Berg
  0 siblings, 0 replies; 19+ messages in thread
From: Emanuel Berg @ 2022-08-18 20:45 UTC (permalink / raw)
  To: help-gnu-emacs

Stefan Monnier via Users list for the GNU Emacs text editor wrote:

>> Are there some articles that are really good?
>
> Only those that I wrote, of course,

Examples please :D

-- 
underground experts united
https://dataswamp.org/~incal





* Re: [OFFTOPIC] Academic workflow with old PDFs
  2022-08-18 20:22 Alessandro Bertulli
@ 2022-08-18 22:14 ` Stefan Monnier
  2022-08-19  4:24 ` tomas
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Monnier @ 2022-08-18 22:14 UTC (permalink / raw)
  To: Alessandro Bertulli; +Cc: help-gnu-emacs

>> Some of the old articles in ACM are already processed this way for
>> you, actually.
> I dunno, the ones I read were actually simple scans (as far as I can
> tell). The "search in text" function of Evince/Sioyek worked, tho. Are
> you referring to that?

Yes.


        Stefan





* Re: [OFFTOPIC] Academic workflow with old PDFs
  2022-08-18 20:45 ` Emanuel Berg
@ 2022-08-18 22:24   ` Stefan Monnier via Users list for the GNU Emacs text editor
  2022-08-18 23:22     ` Emanuel Berg
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2022-08-18 22:24 UTC (permalink / raw)
  To: help-gnu-emacs

Emanuel Berg [2022-08-18 22:45:14] wrote:
> Guys, no one uses the word "academia" any more.

In academia, we do.


        Stefan





* Re: [OFFTOPIC] Academic workflow with old PDFs
  2022-08-18 22:24   ` [OFFTOPIC] " Stefan Monnier via Users list for the GNU Emacs text editor
@ 2022-08-18 23:22     ` Emanuel Berg
  2022-08-18 23:34       ` Emanuel Berg
  2022-08-19  9:47       ` Marcin Borkowski
  0 siblings, 2 replies; 19+ messages in thread
From: Emanuel Berg @ 2022-08-18 23:22 UTC (permalink / raw)
  To: help-gnu-emacs

Stefan Monnier via Users list for the GNU Emacs text editor wrote:

>> Guys, no one uses the word "academia" any more.
>
> In academia, we do.

Examples? :)

-- 
underground experts united
https://dataswamp.org/~incal





* Re: [OFFTOPIC] Academic workflow with old PDFs
  2022-08-18 23:22     ` Emanuel Berg
@ 2022-08-18 23:34       ` Emanuel Berg
  2022-08-20 20:37         ` Alessandro Bertulli
  2022-08-19  9:47       ` Marcin Borkowski
  1 sibling, 1 reply; 19+ messages in thread
From: Emanuel Berg @ 2022-08-18 23:34 UTC (permalink / raw)
  To: help-gnu-emacs

>>> Guys, no one uses the word "academia" any more.
>>
>> In academia, we do.
>
> Examples?

  I reckon this message may start a flame but that's not my
  intention; I'm looking to hear your advice (especially, but
  not limited to, if you work in/with academia)

I meant a _good_ example ... and the list isn't "academia",
wherever that's supposed to be.

This - "if you work in/with academia" - should be "if you are
a researcher/scientist" if the OP is from the
technology/engineering/science world.

In "academia" there are "intellectuals" and "scholars" (AAAAH!
it gets worse!)

Stefan, you gonna be a Shakespearian scholar now? LOL Actually
I can't even envision you as one, and I mean that as
a compliment, of course ...

[Or a Civil War buff? Bonus fact/question: have more books
 been written on the American Civil War than on WW2? On the
 American civil war, "[t]here are over 60 000 books"
 <https://en.wikipedia.org/wiki/Bibliography_of_the_American_Civil_War>
 so ... how many on WW2? Ask the academics at the History
 Department - but don't assume they can count or maintain
 a Bibtex file...]

-- 
underground experts united
https://dataswamp.org/~incal





* Re: [OFFTOPIC] Academic workflow with old PDFs
  2022-08-18 20:22 Alessandro Bertulli
  2022-08-18 22:14 ` Stefan Monnier
@ 2022-08-19  4:24 ` tomas
  2022-08-20 20:57   ` Alessandro Bertulli
  1 sibling, 1 reply; 19+ messages in thread
From: tomas @ 2022-08-19  4:24 UTC (permalink / raw)
  To: help-gnu-emacs


On Thu, Aug 18, 2022 at 10:22:09PM +0200, Alessandro Bertulli wrote:
> > Some of the old articles in ACM are already processed this way for
> > you, actually.
> 
> I dunno, the ones I read were actually simple scans (as far as I can
> tell). The "search in text" function of Evince/Sioyek worked, tho. Are
> you referring to that?

If that works (and assuming Evince hasn't acquired OCR powers stealthily),
the pre-scanned text must be somewhere in the document, yes.

Does selecting a region of text and copying that elsewhere work, too?
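
One way to answer that question without clicking through every page is to
extract the text layer and see which pages come back empty. The following is
a rough sketch, assuming the `pdftotext` tool from Poppler is installed; the
helper names and the 20-character threshold are arbitrary choices for
illustration, not anything from this thread.

```python
import subprocess


def split_pages(pdftotext_output):
    """Split pdftotext output into pages (each page ends with a form feed)."""
    pages = pdftotext_output.split("\f")
    # Drop the empty string left after the final form feed.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


def pages_without_text(pages, min_chars=20):
    """Return 1-based numbers of pages with fewer than min_chars non-blank characters."""
    return [
        number
        for number, text in enumerate(pages, start=1)
        if len("".join(text.split())) < min_chars
    ]


def extract_text(pdf_path):
    """Call pdftotext and return its output ('-' sends the text to stdout)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Running `pages_without_text(split_pages(extract_text("paper.pdf")))` on a
hypothetical `paper.pdf` lists the pages that are probably bare scans. A page
that does return text may still contain poor OCR, so this only detects a
missing text layer, not a bad one.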

Cheers
-- 
t



* Re: [OFFTOPIC] Academic workflow with old PDFs
  2022-08-18 23:22     ` Emanuel Berg
  2022-08-18 23:34       ` Emanuel Berg
@ 2022-08-19  9:47       ` Marcin Borkowski
  2022-08-19 13:51         ` Emanuel Berg
  1 sibling, 1 reply; 19+ messages in thread
From: Marcin Borkowski @ 2022-08-19  9:47 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: help-gnu-emacs


On 2022-08-19, at 01:22, Emanuel Berg <incal@dataswamp.org> wrote:

> Stefan Monnier via Users list for the GNU Emacs text editor wrote:
>
>>> Guys, no one uses the word "academia" any more.
>>
>> In academia, we do.
>
> Examples? :)

https://academia.stackexchange.com/

Good enough?

-- 
Marcin Borkowski
http://mbork.pl




* Re: [OFFTOPIC] Academic workflow with old PDFs
  2022-08-19  9:47       ` Marcin Borkowski
@ 2022-08-19 13:51         ` Emanuel Berg
  0 siblings, 0 replies; 19+ messages in thread
From: Emanuel Berg @ 2022-08-19 13:51 UTC (permalink / raw)
  To: help-gnu-emacs

Marcin Borkowski wrote:

>>>> Guys, no one uses the word "academia" any more.
>>>
>>> In academia, we do.
>>
>> Examples? :)
>
> https://academia.stackexchange.com/
>
> Good enough?

-1

-- 
underground experts united
https://dataswamp.org/~incal





* Re: [OFFTOPIC] Academic workflow with old PDFs
  2022-08-18 23:34       ` Emanuel Berg
@ 2022-08-20 20:37         ` Alessandro Bertulli
  0 siblings, 0 replies; 19+ messages in thread
From: Alessandro Bertulli @ 2022-08-20 20:37 UTC (permalink / raw)
  To: incal; +Cc: help-gnu-emacs

> I meant a _good_ example ... and the list isn't "academia",
> wherever that's supposed to be.

Again, I'm sorry this bothers you so much. Anyway, that was the reason I
specified "especially, but not limited to".

> This - "if you work in/with academia" - should be "if you are
> a researcher/scientist" if the OP is from the
> technology/engineering/science world.

You're right, I am, but actually in my country we never made that
distinction.

> In "academia" there are "intellectuals" and "scholars" (AAAAH!
> it gets worse!)

Here, on the other hand, you're completely right. Those two words are
not used even here, unless you're (IMHO) being stuck-up :-)

Anyway, as I said, I didn't mean to turn this thread into a linguistic
flame, so I don't want to further clutter the mailing list, if you are
ok with it.

Alessandro




* Re: [OFFTOPIC] Academic workflow with old PDFs
  2022-08-19  4:24 ` tomas
@ 2022-08-20 20:57   ` Alessandro Bertulli
  2022-08-20 21:14     ` Alessandro Bertulli
  0 siblings, 1 reply; 19+ messages in thread
From: Alessandro Bertulli @ 2022-08-20 20:57 UTC (permalink / raw)
  To: tomas; +Cc: help-gnu-emacs

> Does selecting a region of text and copying that elsewhere work, too?

Yes, but not everywhere. Again, thank you very much, I suppose PDF
readers cannot do miracles. If the point is the quality of the paper,
that's fine, it means I can stop searching for a magical, non-existent
PDF software.

> If that works (and assuming Evince hasn't acquired OCR powers stealthily),
> the pre-scanned text must be somewhere in the document, yes.

You're right.

Alessandro




* Re: [OFFTOPIC] Academic workflow with old PDFs
  2022-08-20 20:57   ` Alessandro Bertulli
@ 2022-08-20 21:14     ` Alessandro Bertulli
  0 siblings, 0 replies; 19+ messages in thread
From: Alessandro Bertulli @ 2022-08-20 21:14 UTC (permalink / raw)
  To: help-gnu-emacs

Following last message, I'd like to thank Jean and Stefan:

> You can run an OCR program over the scanned page images to recover
> the text, then attach that text to the pages and repack the PDF.
> 
> -- 
> Jean

> > The "search in text" function of Evince/Sioyek worked, tho. Are
> > you referring to that?
> 
> Yes.
> 
> 
>         Stefan

As I was saying, it seems like I'll struggle with the quality of the
PDFs. No big deal, having them at least OCRed by the publisher is a good
thing. Thanks again!

Alessandro


