unofficial mirror of help-gnu-emacs@gnu.org
* emacs metadata editor for (mostly) scientific pdf's
@ 2013-01-15  7:55 Martin Rubey
  2013-01-15 21:14 ` Daimrod
  0 siblings, 1 reply; 6+ messages in thread
From: Martin Rubey @ 2013-01-15  7:55 UTC
  To: help-gnu-emacs

Hi there!

I wonder whether anybody has tried or would like to try to adapt dired
to manage PDF metadata.

Namely, I have a collection of files, organized in a simple directory
structure by topic (such as "algebra", "combinatorics", ...), which mostly
works for me.

I have a few tools (pdfmeat http://code.google.com/p/pdfmeat/, pdftk)
that I may want to use on the files.

The main problem is: instead of mode, owner, size, date I would rather
like to see (and possibly edit) some fields from the file's metadata
(e.g. author, title) in addition to the filename.

There is no way I could write this, but I'd be happy to fiddle around a
little...

Best,

Martin


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: emacs metadata editor for (mostly) scientific pdf's
  2013-01-15  7:55 emacs metadata editor for (mostly) scientific pdf's Martin Rubey
@ 2013-01-15 21:14 ` Daimrod
  2013-01-16  8:22   ` Martin Rubey
  0 siblings, 1 reply; 6+ messages in thread
From: Daimrod @ 2013-01-15 21:14 UTC
  To: help-gnu-emacs

Martin Rubey <axiomize@yahoo.de> writes:

> Hi there!
>
> I wonder whether anybody has tried or would like to try to adapt dired
> to manage pdf's metadata.
>
> Namely, I have a collection of files, organized in a simple directory
> structure by topic (as "algebra", "combinatorics", ...), which mostly
> works for me.
>
> I have a few tools (pdfmeat http://code.google.com/p/pdfmeat/, pdftk)
> that I may want to use on the files.
>
> The main problem is: instead of mode, owner, size, date I would rather
> like to see (and possibly edit) some fields from the file's metadata
> (eg. author, title) in addition to the filename.
>
> There is no way I could write this, but I'd be happy to fiddle around a
> little...

Do you know any tools usable from the command line to extract this
information?

Though I don't know how easy it is to customize the attributes shown by
dired, I think it wouldn't be difficult to add a shortcut to display
some information about specific files in another buffer or via `message'.
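As a rough illustration of what such a shortcut, or a customized listing, might show in place of dired's mode, owner, size and date columns, here is a minimal Python sketch that formats one line from metadata that has already been extracted. It assumes a dictionary with `Author` and `Title` keys; the function names and column widths are hypothetical, and an actual dired integration would be written in Emacs Lisp.

```python
def truncate(text: str, width: int) -> str:
    """Shorten text to at most `width` characters, marking a cut with '~'."""
    if len(text) <= width:
        return text
    return text[: max(width - 1, 0)] + "~"


def listing_line(filename: str, meta: dict,
                 author_width: int = 20, title_width: int = 40) -> str:
    """Build a dired-like line: author and title columns, then the filename.

    `meta` is assumed to map info-field names ("Author", "Title") to
    strings; a missing field is shown as "?".
    """
    author = truncate(meta.get("Author", "?"), author_width)
    title = truncate(meta.get("Title", "?"), title_width)
    # Left-align each column to its fixed width, dired-style.
    return f"  {author:<{author_width}}  {title:<{title_width}}  {filename}"


if __name__ == "__main__":
    example = {"Author": "Hu, J.; Yang, Y.",
               "Title": "Some irreducible representations of Brauer's "
                        "centralizer algebras"}
    print(listing_line("myfile.pdf", example))
```

Fixed-width columns keep the filename aligned at the end of each line, which mirrors how dired places the name after its attribute columns.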

> Best,
>
> Martin

-- 
Daimrod/Greg



* Re: emacs metadata editor for (mostly) scientific pdf's
  2013-01-15 21:14 ` Daimrod
@ 2013-01-16  8:22   ` Martin Rubey
  2013-01-16 10:38     ` Thien-Thi Nguyen
       [not found]     ` <mailman.17600.1358332613.855.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 6+ messages in thread
From: Martin Rubey @ 2013-01-16  8:22 UTC
  To: help-gnu-emacs

Daimrod <daimrod@gmail.com> writes:

> Martin Rubey <axiomize@yahoo.de> writes:
>
>> Hi there!
>>
>> I wonder whether anybody has tried or would like to try to adapt dired
>> to manage pdf's metadata.
>>
>> Namely, I have a collection of files, organized in a simple directory
>> structure by topic (as "algebra", "combinatorics", ...), which mostly
>> works for me.
>>
>> I have a few tools (pdfmeat http://code.google.com/p/pdfmeat/, pdftk)
>> that I may want to use on the files.
>>
>> The main problem is: instead of mode, owner, size, date I would rather
>> like to see (and possibly edit) some fields from the file's metadata
>> (eg. author, title) in addition to the filename.
>>
>> There is no way I could write this, but I'd be happy to fiddle around a
>> little...
>
> Do you know any tools usable from the command line to extract this
> information?

As I wrote above: pdfmeat from http://code.google.com/p/pdfmeat/

    pdfmeat.py --alone --inject myfile.pdf

extracts some text from myfile.pdf, searches Google Scholar to find a
match, and injects it into the metadata section of myfile.pdf.  Of
course, it sometimes makes mistakes.

    pdftk myfile.pdf dump_data

lists only its info fields, while

    pdfinfo -meta myfile.pdf

lists the info fields (Title, Subject, ..., PDF version) and the XMP
stream (Metadata).  An example is below; pdfmeat was used to inject both
the info fields and the XMP stream.
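Both outputs are easy to read programmatically: `pdftk ... dump_data` prints each info field as an `InfoKey:` line followed by an `InfoValue:` line, and `pdfinfo` prints `Key: value` lines. A minimal Python sketch of turning either into a dictionary follows; it assumes the command output has already been captured as text, and the function names are hypothetical.

```python
def parse_pdftk_dump(text: str) -> dict:
    """Collect InfoKey/InfoValue pairs from `pdftk FILE dump_data` output."""
    info = {}
    key = None
    for line in text.splitlines():
        if line.startswith("InfoKey:"):
            key = line[len("InfoKey:"):].strip()
        elif line.startswith("InfoValue:") and key is not None:
            info[key] = line[len("InfoValue:"):].strip()
            key = None  # each value belongs to exactly one preceding key
        # Other lines (InfoBegin, PdfID0, NumberOfPages, ...) are ignored.
    return info


def parse_pdfinfo(text: str) -> dict:
    """Collect 'Key: value' lines from `pdfinfo [-meta] FILE` output.

    Parsing stops at the 'Metadata:' line, after which `pdfinfo -meta`
    prints the raw XMP stream instead of key/value pairs.
    """
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so times like 16:28:29 survive.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line
        if key == "Metadata":
            break
        fields[key.strip()] = value.strip()
    return fields
```

For example, `parse_pdfinfo(subprocess.run(["pdfinfo", "myfile.pdf"], capture_output=True, text=True).stdout)` would return a dictionary with keys such as `Title`, `Author` and `Page size`.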

> Though I don't know how easy it is to customize the attributes show by
> dired, I think it wouldn't be difficult to add a shortcut to display
> some information about specific files in another buffer or via
> `message'.

Well, the main point is being able to edit at least Author, Title, Year
easily.  (Because pdfmeat makes mistakes.)  I guess the reason that
pdfmeat also writes into the XMP stream is that the info fields are
somewhat restricted.  Therefore, it might be best to be able to connect
with bibtex-mode...
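For the editing side, pdftk's `update_info` operation reads a data file in the same InfoBegin/InfoKey/InfoValue syntax that `dump_data` prints. The Python sketch below only generates that text from a dictionary of corrected fields. The function names are hypothetical. Writing non-ASCII characters as XML numeric entities is an assumption based on my reading of the pdftk manual.

```python
def encode_value(value: str) -> str:
    """Flatten a value to one line and write non-ASCII characters as
    XML numeric entities (assumed to be what update_info expects)."""
    flat = " ".join(value.split())
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in flat)


def update_info_text(fields: dict) -> str:
    """Render fields in the InfoBegin/InfoKey/InfoValue syntax of dump_data."""
    lines = []
    for key, value in fields.items():
        lines += ["InfoBegin", f"InfoKey: {key}", f"InfoValue: {encode_value(value)}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    corrected = {"Author": "Hu, J.; Yang, Y.",
                 "Title": "Some irreducible representations of Brauer's "
                          "centralizer algebras"}
    print(update_info_text(corrected), end="")
```

Saved to a file such as `fields.txt`, the output could then be applied with `pdftk myfile.pdf update_info fields.txt output fixed.pdf`. pdftk writes a new file rather than modifying the input in place.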

Martin

pdfinfo -meta Hu\,Yang\;\ 2004\;\ Some\ irreducible\ representations\ of\ Brauer\'s\ centralizer\ algebras.pdf 
Title:          Some irreducible representations of Brauer's centralizer algebras
Subject:        Glasgow Mathematical Journal, 2004
Keywords:       article: hu2004some
Author:         Hu, J.; Yang, Y.
Creator:        PDFMeat's bibtex2pdfmeta
Producer:       PDFMeat's bibtex2pdfmeta
CreationDate:   Wed Sep 15 16:28:29 2004
ModDate:        Wed Nov 10 06:48:47 2010
Tagged:         no
Pages:          15
Encrypted:      no
Page size:      493 x 700 pts
File size:      175296 bytes
Optimized:      no
PDF version:    1.3
Metadata:
<?xml version='1.0' encoding='UTF-8'?>
<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:bibtex="http://jabref.sourceforge.net/bibteXMP/"
    bibtex:bibtexkey="hu2004some"
    bibtex:owner=""
    bibtex:timestamp="2013-01-11 10:44:35">
<bibtex:title>Some irreducible representations of Brauer's centralizer algebras</bibtex:title>
<bibtex:author>Hu, J. and Yang, Y.</bibtex:author>
<bibtex:journal>Glasgow Mathematical Journal</bibtex:journal>
<bibtex:volume>46</bibtex:volume>
<bibtex:number>03</bibtex:number>
<bibtex:pages>499--513</bibtex:pages>
<bibtex:year>2004</bibtex:year>
<bibtex:publisher>Cambridge Univ Press</bibtex:publisher>
<bibtex:file>file:///home/rubey/Books+Papers/algebra/Hu Yang Some Irreducible Representations of Brauer's Centralizer Algebras.pdf:pdf</bibtex:file>
<bibtex:md5sum>f296ecff7b3e2b6b78ca6eb57f1458eb</bibtex:md5sum>
<bibtex:url>http://journals.cambridge.org/abstract_S001708950400196X</bibtex:url>
<bibtex:citations>4</bibtex:citations>
<bibtex:citedbyid>13306009431956969271</bibtex:citedbyid>
<bibtex:abstract>Let m, n ∈ ℕ, V be a 2m-dimensional complex vector
space. The irreducible representations of the Brauer's centralizer
algebra Bn (-2m) appearing in V (x)n are in 1-1 correspondence to the
set of pairs (f, λ), where f ∈ ℤ with 0 &lt;= f &lt;= [n/2], and λ ⊢ n -
2f satisfying λ1 &lt;= m. In this paper, we first show that each of
these representations has a basis consists of eigenvectors for the
subalgebra of Bn (-2m) generated by all the Jucys-Murphy operators, and
we determine the corresponding eigenvalues. Then we identify these
representations with the irreducible representations constructed from a
cellular basis of Bn (-2m). Finally, an explicit description of the
action of each generator of Bn (-2m) on such a basis is also given,
which generalizes earlier work of [15] for Brauer's centralizer algebra
Bn (m). 2000 Mathematics Subject
Classification. 16G99.</bibtex:abstract>
<bibtex:mailhosts>mathematik.uni-stuttgart.de; yahoo.com.cn</bibtex:mailhosts>
<bibtex:pdfmeat>timestamp: 2013-01-11 10:44:35; queries: 1; inode: 2505172</bibtex:pdfmeat>

        <bibtex:author>
            <rdf:Seq><rdf:li>J. Hu</rdf:li><rdf:li>Y. Yang</rdf:li></rdf:Seq>
        </bibtex:author>
    <bibtex:entrytype>article</bibtex:entrytype>
</rdf:Description>
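The bibteXMP stream above is ordinary XML, so its `bibtex:` fields can be read with Python's standard `xml.etree.ElementTree` and rendered as an entry for bibtex-mode. The following is a minimal sketch under several assumptions. The function names and the selection of fields in `KEEP` are hypothetical. The first `rdf:Description` is taken as the record. Where a field occurs twice (as `bibtex:author` does above, once as text and once as an `rdf:Seq`), the first occurrence wins. Braces inside values are not escaped.

```python
import xml.etree.ElementTree as ET

BIB = "http://jabref.sourceforge.net/bibteXMP/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Fields copied into the BibTeX entry (an arbitrary selection for this sketch).
KEEP = ("author", "title", "journal", "volume", "number", "pages", "year", "publisher")


def xmp_fields(xmp: str) -> tuple:
    """Return (entry type, citation key, fields) from a bibteXMP description."""
    root = ET.fromstring(xmp.encode("utf-8"))
    if root.tag == f"{{{RDF}}}Description":
        desc = root
    else:
        desc = root.find(f".//{{{RDF}}}Description")
    if desc is None:
        raise ValueError("no rdf:Description element found")
    key = desc.get(f"{{{BIB}}}bibtexkey", "")
    fields = {}
    for child in desc:
        if not child.tag.startswith(f"{{{BIB}}}"):
            continue
        name = child.tag[len(BIB) + 2:]  # strip "{namespace}"
        items = child.findall(f"{{{RDF}}}Seq/{{{RDF}}}li")
        text = " and ".join(li.text or "" for li in items) if items else (child.text or "")
        # Collapse line wrapping; keep the first occurrence of a field.
        fields.setdefault(name, " ".join(text.split()))
    entry_type = fields.pop("entrytype", "misc")
    return entry_type, key, fields


def to_bibtex(entry_type: str, key: str, fields: dict) -> str:
    """Render the selected fields as a BibTeX entry."""
    body = ",\n".join(f"  {name} = {{{fields[name]}}}" for name in KEEP if name in fields)
    return f"@{entry_type}{{{key},\n{body}\n}}"
```

Applied to the stream above, this would give an `@article{hu2004some, ...}` entry that bibtex-mode can edit. Writing corrections back into the PDF would still need a separate tool.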



* Re: emacs metadata editor for (mostly) scientific pdf's
  2013-01-16  8:22   ` Martin Rubey
@ 2013-01-16 10:38     ` Thien-Thi Nguyen
       [not found]     ` <mailman.17600.1358332613.855.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 6+ messages in thread
From: Thien-Thi Nguyen @ 2013-01-16 10:38 UTC
  To: Martin Rubey; +Cc: help-gnu-emacs

() Martin Rubey <axiomize@yahoo.de>
() Wed, 16 Jan 2013 09:22:42 +0100

   Well, the main point is being able to edit at least Author, Title,
   Year easily.  (Because pdfmeat makes mistakes)

Do these papers have online authoritative metadata (akin to PUBMED for
the biological sciences)?  It strikes me as backwards to have to mine
metadata out of the PDF itself.

-- 
Thien-Thi Nguyen ..................................... GPG key: 4C807502
.                  NB: ttn at glug dot org is not me                   .
.                 (and has not been since 2007 or so)                  .
.                        ACCEPT NO SUBSTITUTES                         .
........... please send technical questions to mailing lists ...........


* Re: emacs metadata editor for (mostly) scientific pdf's
       [not found]     ` <mailman.17600.1358332613.855.help-gnu-emacs@gnu.org>
@ 2013-01-16 10:46       ` Martin Rubey
  2013-01-16 11:10         ` Rainer M Krug
  0 siblings, 1 reply; 6+ messages in thread
From: Martin Rubey @ 2013-01-16 10:46 UTC
  To: help-gnu-emacs

Thien-Thi Nguyen <ttn@gnuvola.org> writes:

> () Martin Rubey <axiomize@yahoo.de>
> () Wed, 16 Jan 2013 09:22:42 +0100
>
>    Well, the main point is being able to edit at least Author, Title,
>    Year easily.  (Because pdfmeat makes mistakes)
>
> Do these papers have online authoritative metadata (akin to PUBMED for
> the biological sciences)?  It strikes me as backwards to have to mine
> metadata out of the PDF itself.

Well, sometimes...  I often get articles from the arXiv, which do not
have metadata, but their content coincides with the "published" version,
so I can then use the metadata from MathSciNet even if my library does
not have access to the published version.

However, I just found Zotero, a Firefox plugin, together with ZotFile,
and they do what I need.  Although I love Emacs, it's probably better to
stick to something which is well-maintained.

Best,

Martin



* Re: emacs metadata editor for (mostly) scientific pdf's
  2013-01-16 10:46       ` Martin Rubey
@ 2013-01-16 11:10         ` Rainer M Krug
  0 siblings, 0 replies; 6+ messages in thread
From: Rainer M Krug @ 2013-01-16 11:10 UTC
  To: help-gnu-emacs

On 16/01/13 11:46, Martin Rubey wrote:
> Thien-Thi Nguyen <ttn@gnuvola.org> writes:
> 
>> () Martin Rubey <axiomize@yahoo.de> () Wed, 16 Jan 2013 09:22:42 +0100
>> 
>> Well, the main point is being able to edit at least Author, Title, Year easily.  (Because
>> pdfmeat makes mistakes)
>> 
>> Do these papers have online authoritative metadata (akin to PUBMED for the biological
>> sciences)?  It strikes me as backwards to have to mine metadata out of the PDF itself.
> 
> Well, sometimes... I often get articles from the arxiv, which do not have metadata, but the
> contents coincides with the "published" version, so I can then use the metadata from mathscinet
> even if my library does not have access.
> 
> However, I just found zotero + zotfile, which is a firefox plugin and does what I need.
> Although I love emacs, it's probably better to stick to something which is well-maintained.

In this case, you can also look at Mendeley (which extracts data from PDFs) and JabRef (which writes data to PDFs).

Still, I think it would be nice to edit the data in the PDF from Emacs.

Cheers,

Rainer

> 
> Best,
> 
> Martin
> 




