On Thu, Nov 12, 2015 at 7:23 AM, Ramon Diaz-Uriarte <rdiaz02@gmail.com> wrote:



On Wed, 11-11-2015, at 21:33, Matt Price <moptop99@gmail.com> wrote:
>>
>>
>>   instead of the text. Bummer! I wonder if RepliGO gives you a lot more
>>   than the rest, or if I am doing something silly.
>>
> I think that there is no standard way of storing the highlight contents. I
> chose Repligo over EZPDF because it gives you access to the text of the
> highlights!


I'll see if I can get RepliGO again (I had it a few years ago).

>
> Okular, I think, stores your annotations in its own database, rather than
> in the pdf. You can (I think!) attach the annotations to the pdf from
> inside Okular.  At least, that's what I remember from when I was looking
> around.

Actually, Okular stores the annotations in the PDF itself if you do "Save
As". (It still keeps an internal db, but I never use it anymore.) It is
easy to check by doing that and then opening the file with another reader
on another machine (e.g., on an Android device).


my bad, thx. 

>
> Repligo stores the highlighted text in the "subject" field of the
> annotation. It's possible that the content of the annotation is stored in
> some other field, like "content".  Maybe you can try:
>
> M-: (pdf-annot-get-annots) and look at the output in the *Messages*
> buffer.  Can you see any evidence of the text? Can you share what you
> learned?

Nope, no evidence of the text. I get things such as

(((buffer . #<buffer Frank_2015_Commentary.pdf>) (page . 13) (edges
0.113553 0.31717 0.868657 0.361746) (type . highlight) (id . annot-13-0)
(flags . 4) (color . "#ffff00") (contents . "") (modified 22081 45188)
(label . "TF201") (subject . "Highlight") (opacity . 1.0) ...)


so we get the location of the highlight (and its properties), but not the
textual contents. And this is the case whether I make the annotation with
EzPDF or Okular or, for that matter, with pdf-tools itself.

So it seems RepliGO is actually giving you a lot more by default :-)
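
In case it helps anyone compare readers, here is a small sketch for
checking which fields hold text across a whole file. The name
`my/annot-text-fields' is made up for illustration and is not part of
pdf-tools; it only assumes a list of annotation alists shaped like the
output above.

```elisp
;;; Sketch only: `my/annot-text-fields' is a hypothetical helper,
;;; not part of pdf-tools.
(defun my/annot-text-fields (annots)
  "For each annotation alist in ANNOTS, return (ID SUBJECT CONTENTS).
Makes it easy to see whether any field holds the highlighted text."
  (mapcar (lambda (annot)
            (list (assoc-default 'id annot)
                  (assoc-default 'subject annot)
                  (assoc-default 'contents annot)))
          annots))
```

Pass it whatever list of annotations the call above returns. For the
annotation shown, the corresponding entry would carry only "Highlight"
and an empty string, i.e. no text from the document.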


Try replacing
(text (assoc-default 'subject annot))

with

(text (pdf-info-gettext page (assoc-default 'edges annot)))

in the lambda function in pdf-annot-markups-as-org-text.  This will fail on cropped pdfs if you have added highlights using the most recent pdf-tools, which stores negative values in the 'edges field, but I've found it works otherwise.  I'd love to hear if it works for you too. (I know you're following the relevant bug report on the pdf-tools github repo).
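
For concreteness, here is a rough sketch of that change as a standalone helper. The names `my/annot-edges-usable-p' and `my/annot-text' are made up for illustration; the only pdf-tools call is `pdf-info-gettext', used the same way as in the replacement above, and it is passed in as an argument so the fallback logic can be tried without a PDF open. It assumes the annotation is an alist like the one you pasted, with `edges' holding four relative coordinates, and that the PDF buffer is current. When any coordinate falls outside 0..1 (the cropped-PDF case), it falls back to the `subject' field.

```elisp
;;; Sketch only: helper names are hypothetical, not part of pdf-tools.
(require 'cl-lib)

(defun my/annot-edges-usable-p (edges)
  "Return non-nil if EDGES is a list of four numbers, each within 0..1."
  (and (listp edges)
       (= (length edges) 4)
       (cl-every (lambda (e) (and (numberp e) (<= 0 e 1))) edges)))

(defun my/annot-text (annot &optional gettext-fn)
  "Return the highlighted text for ANNOT, an annotation alist.
GETTEXT-FN is called with the page and the edges; it defaults to
`pdf-info-gettext'.  When the edges look unusable (e.g. negative
values on a cropped PDF), fall back to the `subject' field."
  (let ((page (assoc-default 'page annot))
        (edges (assoc-default 'edges annot)))
    (if (my/annot-edges-usable-p edges)
        (funcall (or gettext-fn #'pdf-info-gettext) page edges)
      (assoc-default 'subject annot))))
```

Inside the lambda, the binding would then read (text (my/annot-text annot)). The fallback only gives something useful for readers like Repligo that put the text in `subject'; for the others you would just get "Highlight".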
 

>
> Politza and I are discussing this here:
> https://github.com/politza/pdf-tools/issues/137
>
> that might be a good place to continue the conversation.
>

Will do. In the meantime, I think this is a limitation coming from
poppler. Other people have mentioned similar things (e.g.,
http://coda.caseykuhlman.com/entries/2014/pdf-extract.html), and other
tools that depend on poppler (such as Leela:
https://github.com/TrilbyWhite/Leela) will not give us the text itself
either.



>>
> Until I found pdf-tools, I had planned to write a node wrapper for pdf.js
> and grab the annotations that way.  But I don't really know how to do that,
> so this turned out to be easier :-)
>
> Anyway, I've updated the post, and it's now possible to create links to
> individual annotations, though you will have to use my updated version of
> org-pdfview, until/unless Markus accepts my patch.


I just updated packages, and things are working perfectly: I am jumping to
the page and location.



Thanks,


R.