* Academic workflow with old PDFs
@ 2022-08-17 21:36 Alessandro Bertulli
2022-08-18 2:26 ` Stefan Monnier via Users list for the GNU Emacs text editor
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Alessandro Bertulli @ 2022-08-17 21:36 UTC
To: help-gnu-emacs
Hi all!
I reckon this message may start a flame, but that's not my intention;
I'm looking to hear your advice (especially, but not limited to, if you
work in/with academia).
I'm currently writing my MS thesis. While researching the state of the art
of my assigned technology, I am struggling to read and reason about some
old papers from ACM and IEEE (pre-2000, scanned, with no index). I am
currently switching back and forth between Sioyek and Evince to read my
PDFs, while taking notes in Org mode.
I wonder whether I should switch to using pdf-tools (potentially with
org-noter integration). So, my question is: in your opinion, can
pdf-tools work with old PDF files, or is this just a limitation of the
file type? If you know Sioyek, how do you integrate it with Emacs? Is it
worth doing so? Or is Sioyek clearly better/worse than Emacs *for an
academic workflow*? Would you suggest something like Logseq?
P.S.: Sioyek aims to reconstruct hyperlinks to references and
equations in the text even for old papers. That's awesome and very useful,
but unfortunately it seems to depend on the quality of the file, as
sometimes it doesn't work. Hence my need for a replacement.
Thanks!
Bertulli
* Re: Academic workflow with old PDFs
2022-08-17 21:36 Academic workflow with old PDFs Alessandro Bertulli
@ 2022-08-18 2:26 ` Stefan Monnier via Users list for the GNU Emacs text editor
2022-08-18 11:23 ` Jean Louis
2022-08-18 20:45 ` Emanuel Berg
2 siblings, 0 replies; 20+ messages in thread
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2022-08-18 2:26 UTC
To: help-gnu-emacs
> P.S. note: Sioyek aims to reconstruct hyperlinks to references and
> equations in text even for old papers. That's awesome and very useful,
> but unfortunately it seems to depend on the quality of the file, as
> sometimes it doesn't work. Here my need for a replacement.
FWIW, I mostly use `doc-view-mode` to read PDFs. In many ways it's very
limited, but I like the fact that I can crop to the particular part of
the page(s) I'm interested in and that's preserved as I move between
pages (some other readers have an auto-crop feature which is even
better), and more importantly I can do `C-x 5 2` to see several pages at
the same time (I very often keep a frame/window displaying the
bibliography, but other times I use that to display a figure while
I read the corresponding description from another page, or to display
several figures next to each other, ...).
I tried Sioyek and it's nice, but not sufficiently nicer to make me
change :-)
Have you tried `pdf-tools`? AFAICT it has all the advantages of
`doc-view-mode` but without its many limitations. It's truly nice.
I don't use it often enough because all too often my main Emacs is in
a state of hacking that breaks my externally installed packages, so I'm
still used to using `doc-view-mode` as my main driver, but the more time
passes the more I find it too limited.
Stefan
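A minimal init-file sketch for trying pdf-tools while keeping the built-in doc-view-mode as a fallback. It assumes (this is not stated in the thread) that the pdf-tools package has already been installed from a package archive such as MELPA; `pdf-tools-install` is the package's own setup command.

```elisp
;; Sketch, assuming pdf-tools is installed from a package archive.
;; `require' with NOERROR non-nil returns nil instead of signaling an
;; error when the library cannot be found, so without pdf-tools PDFs
;; simply keep opening in `doc-view-mode'.
(when (require 'pdf-tools nil t)
  ;; Enable pdf-tools for current and future PDF buffers; this may
  ;; offer to build the epdfinfo server if it is missing.
  (pdf-tools-install))
```

With pdf-tools active, PDFs open in `pdf-view-mode`. Without it, doc-view-mode's slice commands (`s m` selects a crop region with the mouse, `s r` resets it) provide the cropping that persists as you move between pages.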
* Re: Academic workflow with old PDFs
2022-08-17 21:36 Academic workflow with old PDFs Alessandro Bertulli
2022-08-18 2:26 ` Stefan Monnier via Users list for the GNU Emacs text editor
@ 2022-08-18 11:23 ` Jean Louis
2022-08-18 13:48 ` [OFFTOPIC] " Stefan Monnier via Users list for the GNU Emacs text editor
2022-08-18 20:45 ` Emanuel Berg
2 siblings, 1 reply; 20+ messages in thread
From: Jean Louis @ 2022-08-18 11:23 UTC
To: Alessandro Bertulli; +Cc: help-gnu-emacs
* Alessandro Bertulli <alessandro.bertulli96@gmail.com> [2022-08-18 01:27]:
> I'm currently writing my MS's thesis. Searching for the state of the art
> of my assigned technology, I am struggling to read and reason about some
> old papers from ACM and IEEE (pre-2000, scanned, with no index). I am
> currently switching back and forth between Sioyek and Evince to read my
> pdfs, while taking notes in Org mode.
You can run the scanned page images through an OCR program to extract
the text, then attach that text to the corresponding pages and pack
the PDF again.
--
Jean
Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns
In support of Richard M. Stallman
https://stallmansupport.org/
* Re: [OFFTOPIC] Academic workflow with old PDFs
2022-08-18 11:23 ` Jean Louis
@ 2022-08-18 13:48 ` Stefan Monnier via Users list for the GNU Emacs text editor
2022-08-18 14:38 ` Emanuel Berg
0 siblings, 1 reply; 20+ messages in thread
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2022-08-18 13:48 UTC
To: help-gnu-emacs
Jean Louis [2022-08-18 14:23:05] wrote:
> * Alessandro Bertulli <alessandro.bertulli96@gmail.com> [2022-08-18 01:27]:
>> I'm currently writing my MS's thesis. Searching for the state of the art
>> of my assigned technology, I am struggling to read and reason about some
>> old papers from ACM and IEEE (pre-2000, scanned, with no index). I am
>> currently switching back and forth between Sioyek and Evince to read my
>> pdfs, while taking notes in Org mode.
>
> Images with text you may process with OCR program to get some
> meanings, and then text may be connected to pages as well and PDF
> packed again.
Some of the old articles in ACM are already processed this way for
you, actually.
Stefan
* Re: [OFFTOPIC] Academic workflow with old PDFs
2022-08-18 13:48 ` [OFFTOPIC] " Stefan Monnier via Users list for the GNU Emacs text editor
@ 2022-08-18 14:38 ` Emanuel Berg
2022-08-18 14:41 ` Stefan Monnier via Users list for the GNU Emacs text editor
0 siblings, 1 reply; 20+ messages in thread
From: Emanuel Berg @ 2022-08-18 14:38 UTC
To: help-gnu-emacs
Stefan Monnier via Users list for the GNU Emacs text editor wrote:
>>> I'm currently writing my MS's thesis. Searching for the
>>> state of the art of my assigned technology, I am
>>> struggling to read and reason about some old papers from
>>> ACM and IEEE (pre-2000, scanned, with no index). I am
>>> currently switching back and forth between Sioyek and
>>> Evince to read my pdfs, while taking notes in Org mode.
>>
>> Images with text you may process with OCR program to get
>> some meanings, and then text may be connected to pages as
>> well and PDF packed again.
>
> Some of the old articles in ACM are already processed this
> way for you, actually.
Are there some articles that are really good?
--
underground experts united
https://dataswamp.org/~incal
* Re: Academic workflow with old PDFs
2022-08-17 21:36 Academic workflow with old PDFs Alessandro Bertulli
2022-08-18 2:26 ` Stefan Monnier via Users list for the GNU Emacs text editor
2022-08-18 11:23 ` Jean Louis
@ 2022-08-18 20:45 ` Emanuel Berg
2022-08-18 22:24 ` [OFFTOPIC] " Stefan Monnier via Users list for the GNU Emacs text editor
2 siblings, 1 reply; 20+ messages in thread
From: Emanuel Berg @ 2022-08-18 20:45 UTC
To: help-gnu-emacs
Alessandro Bertulli wrote:
> I reckon this message may start a flame but that's not my
> intention, I'm looking to hear your advice (especially, but
> not limited, if you work in/with academia)
Guys, no one uses the word "academia" any more.
Maybe at the Department of Philosophy, it's called academia.
So that's right, all the more reason.
It is called higher education, university, research, science
and maybe other words as well depending on context, but not
that one.
> I'm currently writing my MS's thesis
Guys, there are three levels:
Bachelor
Master
Ph.D.
> Or Sioyek is clearly better/worse than Emacs *for an
> academic workflow*?
Guys ...
--
underground experts united
https://dataswamp.org/~incal
* Re: [OFFTOPIC] Academic workflow with old PDFs
2022-08-18 20:45 ` Emanuel Berg
@ 2022-08-18 22:24 ` Stefan Monnier via Users list for the GNU Emacs text editor
2022-08-18 23:22 ` Emanuel Berg
0 siblings, 1 reply; 20+ messages in thread
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2022-08-18 22:24 UTC
To: help-gnu-emacs
Emanuel Berg [2022-08-18 22:45:14] wrote:
> Guys, no one uses the word "academia" any more.
In academia, we do.
Stefan
* Re: [OFFTOPIC] Academic workflow with old PDFs
2022-08-18 22:24 ` [OFFTOPIC] " Stefan Monnier via Users list for the GNU Emacs text editor
@ 2022-08-18 23:22 ` Emanuel Berg
2022-08-18 23:34 ` Emanuel Berg
2022-08-19 9:47 ` Marcin Borkowski
0 siblings, 2 replies; 20+ messages in thread
From: Emanuel Berg @ 2022-08-18 23:22 UTC
To: help-gnu-emacs
Stefan Monnier via Users list for the GNU Emacs text editor wrote:
>> Guys, no one uses the word "academia" any more.
>
> In academia, we do.
Examples? :)
--
underground experts united
https://dataswamp.org/~incal
* Re: [OFFTOPIC] Academic workflow with old PDFs
2022-08-18 23:22 ` Emanuel Berg
@ 2022-08-18 23:34 ` Emanuel Berg
2022-08-20 20:37 ` Alessandro Bertulli
2022-08-19 9:47 ` Marcin Borkowski
1 sibling, 1 reply; 20+ messages in thread
From: Emanuel Berg @ 2022-08-18 23:34 UTC
To: help-gnu-emacs
>>> Guys, no one uses the word "academia" any more.
>>
>> In academia, we do.
>
> Examples?
I reckon this message may start a flame but that's not my
intention, I'm looking to hear your advice (especially, but
not limited, if you work in/with academia)
I meant a _good_ example ... and this list isn't "academia",
wherever that's supposed to be.
This - "if you work in/with academia" - should be "if you are
a researcher/scientist" if the OP is from the
technology/engineering/science world.
In "academia" there are "intellectuals" and "scholars" (AAAAH!
it gets worse!)
Stefan, you gonna be a Shakespearian scholar now? LOL Actually
I can't even envision you as one, and I mean that as
a compliment, of course ...
[Or a Civil War buff? Bonus fact/question: have more books
been written on the American Civil War than on WW2? On the
American civil war, "[t]here are over 60 000 books"
<https://en.wikipedia.org/wiki/Bibliography_of_the_American_Civil_War>
so ... how many on WW2? Ask the academics at the History
Department - but don't assume they can count or maintain
a BibTeX file...]
--
underground experts united
https://dataswamp.org/~incal
* Re: [OFFTOPIC] Academic workflow with old PDFs
2022-08-18 23:34 ` Emanuel Berg
@ 2022-08-20 20:37 ` Alessandro Bertulli
0 siblings, 0 replies; 20+ messages in thread
From: Alessandro Bertulli @ 2022-08-20 20:37 UTC
To: incal; +Cc: help-gnu-emacs
> I meant a _good_ example ... and list isn't "academia",
> wherever that's suppose to be.
Again, I'm sorry this bothers you so much. Anyway, that was the reason I
specified "especially, but not limited to".
> This - "if you work in/with academia" - should be "if you are
> a researcher/scientist" if the OP is from the
> technology/engineering/science world.
You're right, I am, but in my country we never made that
distinction.
> In "academia" there are "intellectual's" and "scholars" (AAAAH!
> it gets worse!)
Here, on the other hand, you're completely right. Those two words are
not used here either, unless you're (IMHO) being stuck-up :-)
Anyway, as I said, I didn't mean to turn this thread into a linguistic
flame, so I don't want to clutter the mailing list further, if that's
OK with you.
Alessandro
* Re: [OFFTOPIC] Academic workflow with old PDFs
2022-08-18 23:22 ` Emanuel Berg
2022-08-18 23:34 ` Emanuel Berg
@ 2022-08-19 9:47 ` Marcin Borkowski
2022-08-19 13:51 ` Emanuel Berg
1 sibling, 1 reply; 20+ messages in thread
From: Marcin Borkowski @ 2022-08-19 9:47 UTC
To: Emanuel Berg; +Cc: help-gnu-emacs
On 2022-08-19, at 01:22, Emanuel Berg <incal@dataswamp.org> wrote:
> Stefan Monnier via Users list for the GNU Emacs text editor wrote:
>
>>> Guys, no one uses the word "academia" any more.
>>
>> In academia, we do.
>
> Examples? :)
https://academia.stackexchange.com/
Good enough?
--
Marcin Borkowski
http://mbork.pl
* Re: Academic workflow with old PDFs
@ 2022-08-18 11:31 Alessandro Bertulli
0 siblings, 0 replies; 20+ messages in thread
From: Alessandro Bertulli @ 2022-08-18 11:31 UTC
To: monnier; +Cc: help-gnu-emacs
> more importantly I can do `C-x 5 2` to see several pages at the same
> time (I very often keep a frame/window displaying the bibliography,
> but other times I use that to display a figure while I read the
> corresponding description from another page, or to display several
> figures next to each other, ...)
This is a very good idea, thanks! I find that sometimes using my desktop
environment's window switching (Alt+Tab on GNOME) is quicker than
switching Emacs buffers, so that may be a good tip.
> I tried Sioyek and it's nice, but not sufficiently nicer to make me
> change :-)
I'm quite conflicted about it :-)
On one hand, it is specifically designed for academics, and it works
decently well; on the other, it has Vim-style keybindings (which can be
overridden, though), and as I was saying it isn't always able to
reconstruct the hyperlinks if the PDF is old. But again, I'm starting to
suspect this is a limitation intrinsic to the quality of the PDF, and
that there isn't a magical tool that can perform such accurate OCR.
However, if anyone has comments or suggestions about this, they're
welcome.
> Have you tried `pdf-tools`?
In fact, I have, but just barely. Until now, I stuck with Sioyek since I
needed to read almost exclusively old papers. But I'm willing to explore
it further.
Alessandro
* Re: Academic workflow with old PDFs
@ 2022-08-18 21:08 Alessandro Bertulli
2022-08-18 22:00 ` Emanuel Berg
2022-08-19 4:25 ` tomas
0 siblings, 2 replies; 20+ messages in thread
From: Alessandro Bertulli @ 2022-08-18 21:08 UTC
To: incal; +Cc: help-gnu-emacs
Sorry if that bothered you :-)
> Guys, no one uses the word "academia" any more.
> It is called higher education, university, research, science
> and maybe other words as well depending on context, but not
> that one.
Dunno, in Italy it's still used sometimes, I just assumed it was canon.
> Guys, there are there levels:
>
> Bachelor
> Master
> Ph.D.
True; the point is that "Master" has a different meaning in Italy, so I
always write MS, for "Master of Science", to disambiguate.
> Guys ...
Here I suspect you are referring to the use of "academia" again.
Alessandro
* Re: Academic workflow with old PDFs
2022-08-18 21:08 Alessandro Bertulli
@ 2022-08-18 22:00 ` Emanuel Berg
2022-08-20 21:02 ` Alessandro Bertulli
2022-08-19 4:25 ` tomas
1 sibling, 1 reply; 20+ messages in thread
From: Emanuel Berg @ 2022-08-18 22:00 UTC
To: help-gnu-emacs
Alessandro Bertulli wrote:
>> It is called higher education, university, research,
>> science and maybe other words as well depending on context,
>> but not that one.
>
> Dunno
nno.
> in Italy it's still used sometimes, I just assumed it
> was canon.
Italy has a history of being a bit "behind"; this has as often
been good as it has been bad. However, in this case "academic"
brings the thoughts to a stinking professor of English
literature who cannot do laundry, and this obviously has nothing
to do with the theoretical superstructure of very practical
things like technology and engineering. You may also have heard
phrases like "the debate has been largely academic", meaning
without substance and not of practical relevance. (Not that
there is anything wrong with English literature.)
>> Guys, there are there levels:
>>
>> Bachelor
>> Master
>> Ph.D.
>
> True, point is that "Master" has a different meaning in
> Italy, so I always specify MS as "Master of Science",
> to disambiguate.
Well, the international language - English;
the language of science, very international indeed - English;
the language of computers - English (US English in terms of speeling);
the language of your post and my reply - English;
"Master" - an English word ...
>> Guys ...
>
> Here I suspect you are referring to the use of "academia"
> again
You should specify what tasks in general, and what features
in particular, you look for to carry out those tasks; there is
no "academic workflow". But you did say it in subsequent
messages, and to some extent in the first post as well, so
yeah, it is enough that we strike that from the proceedings ...
--
underground experts united
https://dataswamp.org/~incal
* Re: Academic workflow with old PDFs
2022-08-18 22:00 ` Emanuel Berg
@ 2022-08-20 21:02 ` Alessandro Bertulli
2022-08-20 21:32 ` Eduardo Ochs
0 siblings, 1 reply; 20+ messages in thread
From: Alessandro Bertulli @ 2022-08-20 21:02 UTC
To: help-gnu-emacs
> Italy has a history of being a bit "behind"
Well, first of all, thank you :-)
> [...] however in this case "academic"
> brings the thoughts to a stinking professor of English
> literature who cannot do laundry, this obviously has nothing
> to do with the theoretic superstructure of very practical
> things like technology and engineering. Even in language you
> may have heard phrases like "the debate has been largely
> academic" meaning without substance and not of practical
> relevance. (Not that there is anything wrong with
> English literature.)
Surely you know the English language better than I do, so I won't
argue here.
> Well, the international language - English;
> the language of science, very international indeed - English;
> the language of computers - English (US English in terms of speeling);
> the language of your post and my reply - English;
> "Master" - an English word ...
You're right, my point is that "Master of Science" is English enough:
https://en.wikipedia.org/wiki/Master_of_Science
> If you woold specify what tasks in general and what features
> in particular you look for to carry out those tasks, there is
> no "academic workflow". But, you said it in subsequent
> messages and to some some extent in the first post as well so
> yeah, it is enough we cross that from the proceedings ...
My bad here, I should have asked a narrower question. Looking back, I'd
say you're right: my first question was whether it was possible (in the
community's opinion) to study old, scanned, poorly indexed PDFs with
pdf-tools, and if not, what other tools you use. I should have been
more focused, I apologize.
Alessandro
* Re: Academic workflow with old PDFs
2022-08-20 21:02 ` Alessandro Bertulli
@ 2022-08-20 21:32 ` Eduardo Ochs
0 siblings, 0 replies; 20+ messages in thread
From: Eduardo Ochs @ 2022-08-20 21:32 UTC
To: Alessandro Bertulli; +Cc: help-gnu-emacs
On Sat, 20 Aug 2022 at 18:03, Alessandro Bertulli
<alessandro.bertulli96@gmail.com> wrote:
>
> Here my bad, I should have asked a narrower question. Looking back, I'd
> say you're right, my first question was if it was possible (in the
> community's opinion) to study old, scanned, poorly indexed PDFs with
> pdf-tools, and if not, what other tools do you use. I should have been
> more focused, I apologize.
Hi Alessandro,
my favorite tool for indexing PDFs - and that I use even for PDFs that
only contain photos of whiteboards, and that are totally unOCRizable -
is the module of eev that is explained in this tutorial,
http://angg.twu.net/eev-intros/find-pdf-like-intro.html
and in the video whose index is here:
http://angg.twu.net/.emacs.videos.html#eev2020
Look for the lines in the index that look like these ones,
(find-eev2020video "4:52" "`find-pdf-page' calls an external program")
(find-eev2020video "5:26" "`find-pdf-text' converts the PDF to text and")
(find-eev2020video "10:45" "`code-pdf-page' creates a short hyperlink function for a PDF")
(find-eev2020video "11:38" "let's try...")
(find-eev2020video "11:55" "`find-fongspivatext'")
(find-eev2020video "12:25" "This block is a kind of an index for that book")
(find-eev2020video "12:54" "This block is a kind of an index for that video")
and click on the links with the timemarks...
If that looks like something that you would like to try then send me
an e-mail and let's see if we can arrange to chat by IRC or by some
other means!
Cheers,
Eduardo Ochs
http://angg.twu.net/#eev
* Re: Academic workflow with old PDFs
2022-08-18 21:08 Alessandro Bertulli
2022-08-18 22:00 ` Emanuel Berg
@ 2022-08-19 4:25 ` tomas
1 sibling, 0 replies; 20+ messages in thread
From: tomas @ 2022-08-19 4:25 UTC
To: help-gnu-emacs
On Thu, Aug 18, 2022 at 11:08:33PM +0200, Alessandro Bertulli wrote:
> Sorry if that bothered you :-)
>
> > Guys, no one uses the word "academia" any more.
>
> > It is called higher education, university, research, science
> > and maybe other words as well depending on context, but not
> > that one.
>
> Dunno, in Italy it's still used sometimes, I just assumed it was canon.
I use it all the time.
Cheers
--
t