On Wed, 11-11-2015, at 21:33, Matt Price <moptop99@gmail.com> wrote:
>>
>>
>> instead of the text. Bummer! I wonder if RepliGO gives you a lot more
>> than the rest, or if I am doing something silly.
>>
> I think that there is no standard way of storing the highlight contents. I
> chose Repligo over EZPDF because it gives you access to the text of the
> highlights!
I'll try to see if I can get RepliGO (I had it a few years ago).
>
> Okular, I think, stores your annotations in its own database, rather than
> in the pdf. You can (I think!) attach the annotations to the pdf from
> inside Okular. At least, that's what I remember from when I was looking
> around.
Actually, Okular stores the annotations in the PDF itself if you do "Save
As". (It still keeps an internal db, but I never use it anymore). It is
easy to check by doing that and then opening the file with another reader
on another machine (e.g., on an Android device).
>
> Repligo stores the highlighted text in the "subject" field of the
> annotation. It's possible that the content of the annotation is stored in
> some other field, like "content". Maybe you can try:
>
> M-: (pdf-annot-get-annots) and look at the output in the *Messages*
> buffer. Can you see any evidence of the text? Can you share what you
> learned?
Nope, no evidence of the text. I get things such as
(((buffer . #<buffer Frank_2015_Commentary.pdf>) (page . 13) (edges
0.113553 0.31717 0.868657 0.361746) (type . highlight) (id . annot-13-0)
(flags . 4) (color . "#ffff00") (contents . "") (modified 22081 45188)
(label . "TF201") (subject . "Highlight") (opacity . 1.0) ...)
so we get the location of the highlight (and its properties), but not the
textual contents. And this is the case whether I make the annotation with
EzPDF or Okular or, for that matter, with pdf-tools itself.
So it seems RepliGO is actually giving you a lot more by default :-)
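
To make it concrete what is and is not in there, here is a rough sketch that
pulls the fields out of one of these alists. The sample data is abridged from
the output above; the names `my/sample-annot` and `my/annot-summary` are made
up for illustration and are not part of pdf-tools.

```elisp
;; Hypothetical sample, abridged from the annotation printed above.
(defvar my/sample-annot
  '((page . 13)
    (edges 0.113553 0.31717 0.868657 0.361746)
    (type . highlight)
    (id . annot-13-0)
    (contents . "")
    (subject . "Highlight"))
  "One highlight annotation, as an alist.")

;; Hypothetical helper: collect the fields that describe the highlight.
(defun my/annot-summary (annot)
  "Return a plist with the page, edges, contents and subject of ANNOT."
  (list :page (assoc-default 'page annot)
        ;; `edges' is stored as (edges X1 Y1 X2 Y2), so this is a list.
        :edges (assoc-default 'edges annot)
        :contents (assoc-default 'contents annot)
        :subject (assoc-default 'subject annot)))

(my/annot-summary my/sample-annot)
```

The page and edges come back fine, but `contents` is the empty string and
`subject` is just the generic label, so the highlighted text has to come from
somewhere else.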
In your code, one could replace
(text (assoc-default 'subject annot))
with
(text (pdf-info-gettext page (assoc-default 'edges annot)))
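
As a minimal sketch, the `pdf-info-gettext` call above could be wrapped in a
small helper. The name `my/annot-highlight-text` and its optional GETTEXT-FN
argument are my own invention for illustration. By default it calls
`pdf-info-gettext` with the page and edges, exactly as in the line above,
which assumes pdf-tools is loaded and the PDF buffer is current.

```elisp
;; Hypothetical helper, not part of pdf-tools.
(defun my/annot-highlight-text (annot &optional gettext-fn)
  "Return the text under the highlight described by the alist ANNOT.
GETTEXT-FN is called with the page number and the edges list.  It
defaults to `pdf-info-gettext', which needs pdf-tools to be loaded
and the PDF buffer to be current."
  (let ((page (assoc-default 'page annot))
        (edges (assoc-default 'edges annot)))
    (funcall (or gettext-fn #'pdf-info-gettext) page edges)))
```

Passing a different GETTEXT-FN is only there so that the helper can be tried
on a hand-written alist without opening a PDF.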
>
> Politza and I are discussing this here:
> https://github.com/politza/pdf-tools/issues/137
>
> that might be a good place to continue the conversation.
>
Will do. In the meantime, I think this is a limitation coming from
poppler. Other people have mentioned similar things (e.g.,
http://coda.caseykuhlman.com/entries/2014/pdf-extract.html), and other
tools that depend on poppler (such as Leela:
https://github.com/TrilbyWhite/Leela) will not give us the text itself
either.
>>
> Until I found pdf-tools, I had planned to write a node wrapper for pdf.js
> and grab the annotations that way. But I don't really know how to do that,
> so this turned out to be easier :-)
>
> Anyway, I've updated the post, and it's now possible to create links to
> individual annotations, though you will have to use my updated version of
> org-pdfview, until/unless Markus accepts my patch.
I just updated packages, and things are working perfectly: I am jumping to
the page and location.
Thanks,
R.