* emacs metadata editor for (mostly) scientific pdf's
From: Martin Rubey @ 2013-01-15 7:55 UTC
To: help-gnu-emacs

Hi there!

I wonder whether anybody has tried or would like to try to adapt dired
to manage pdf's metadata.

Namely, I have a collection of files, organized in a simple directory
structure by topic (as "algebra", "combinatorics", ...), which mostly
works for me.

I have a few tools (pdfmeat http://code.google.com/p/pdfmeat/, pdftk)
that I may want to use on the files.

The main problem is: instead of mode, owner, size, date I would rather
like to see (and possibly edit) some fields from the file's metadata
(e.g. author, title) in addition to the filename.

There is no way I could write this, but I'd be happy to fiddle around a
little...

Best,

Martin
* Re: emacs metadata editor for (mostly) scientific pdf's
From: Daimrod @ 2013-01-15 21:14 UTC
To: help-gnu-emacs

Martin Rubey <axiomize@yahoo.de> writes:

> Hi there!
>
> I wonder whether anybody has tried or would like to try to adapt dired
> to manage pdf's metadata.
>
> Namely, I have a collection of files, organized in a simple directory
> structure by topic (as "algebra", "combinatorics", ...), which mostly
> works for me.
>
> I have a few tools (pdfmeat http://code.google.com/p/pdfmeat/, pdftk)
> that I may want to use on the files.
>
> The main problem is: instead of mode, owner, size, date I would rather
> like to see (and possibly edit) some fields from the file's metadata
> (e.g. author, title) in addition to the filename.
>
> There is no way I could write this, but I'd be happy to fiddle around a
> little...

Do you know any tools usable from the command line to extract this
information?

Though I don't know how easy it is to customize the attributes shown by
dired, I think it wouldn't be difficult to add a shortcut to display
some information about specific files in another buffer or via
`message'.

> Best,
>
> Martin

--
Daimrod/Greg
* Re: emacs metadata editor for (mostly) scientific pdf's
From: Martin Rubey @ 2013-01-16 8:22 UTC
To: help-gnu-emacs

Daimrod <daimrod@gmail.com> writes:

> Martin Rubey <axiomize@yahoo.de> writes:
>
>> Hi there!
>>
>> I wonder whether anybody has tried or would like to try to adapt dired
>> to manage pdf's metadata.
>>
>> Namely, I have a collection of files, organized in a simple directory
>> structure by topic (as "algebra", "combinatorics", ...), which mostly
>> works for me.
>>
>> I have a few tools (pdfmeat http://code.google.com/p/pdfmeat/, pdftk)
>> that I may want to use on the files.
>>
>> The main problem is: instead of mode, owner, size, date I would rather
>> like to see (and possibly edit) some fields from the file's metadata
>> (e.g. author, title) in addition to the filename.
>>
>> There is no way I could write this, but I'd be happy to fiddle around a
>> little...
>
> Do you know any tools usable from the command line to extract this
> information?

As I wrote above: pdfmeat from http://code.google.com/p/pdfmeat/

    pdfmeat.py --alone --inject myfile.pdf

extracts some text from myfile.pdf, searches Google Scholar to find a
match, and injects the result into the metadata section of myfile.pdf.
Of course it makes mistakes sometimes.

    pdftk myfile.pdf dump_data

lists just its info fields, while

    pdfinfo -meta myfile.pdf

lists the info fields (Title, Subject, ... PDF version) and the XMP
stream (Metadata).  Example below, where pdfmeat was used to inject the
info fields and the XMP stream.

> Though I don't know how easy it is to customize the attributes shown by
> dired, I think it wouldn't be difficult to add a shortcut to display
> some information about specific files in another buffer or via
> `message'.

Well, the main point is being able to edit at least Author, Title, Year
easily.  (Because pdfmeat makes mistakes.)

I guess the reason that pdfmeat really writes into the XMP stream is
that the info fields are somewhat restricted.  Therefore, it might be
best to be able to connect with bibtex-mode...

Martin

pdfinfo -meta Hu\,Yang\;\ 2004\;\ Some\ irreducible\ representations\ of\ Brauer\'s\ centralizer\ algebras.pdf
Title:          Some irreducible representations of Brauer's centralizer algebras
Subject:        Glasgow Mathematical Journal, 2004
Keywords:       article: hu2004some
Author:         Hu, J.; Yang, Y.
Creator:        PDFMeat's bibtex2pdfmeta
Producer:       PDFMeat's bibtex2pdfmeta
CreationDate:   Wed Sep 15 16:28:29 2004
ModDate:        Wed Nov 10 06:48:47 2010
Tagged:         no
Pages:          15
Encrypted:      no
Page size:      493 x 700 pts
File size:      175296 bytes
Optimized:      no
PDF version:    1.3
Metadata:
<?xml version='1.0' encoding='UTF-8'?>
<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bibtex="http://jabref.sourceforge.net/bibteXMP/" bibtex:bibtexkey="hu2004some" bibtex:owner="" bibtex:timestamp="2013-01-11 10:44:35">
<bibtex:title>Some irreducible representations of Brauer's centralizer algebras</bibtex:title>
<bibtex:author>Hu, J. and Yang, Y.</bibtex:author>
<bibtex:journal>Glasgow Mathematical Journal</bibtex:journal>
<bibtex:volume>46</bibtex:volume>
<bibtex:number>03</bibtex:number>
<bibtex:pages>499--513</bibtex:pages>
<bibtex:year>2004</bibtex:year>
<bibtex:publisher>Cambridge Univ Press</bibtex:publisher>
<bibtex:file>file:///home/rubey/Books+Papers/algebra/Hu Yang Some Irreducible Representations of Brauer's Centralizer Algebras.pdf:pdf</bibtex:file>
<bibtex:md5sum>f296ecff7b3e2b6b78ca6eb57f1458eb</bibtex:md5sum>
<bibtex:url>http://journals.cambridge.org/abstract_S001708950400196X</bibtex:url>
<bibtex:citations>4</bibtex:citations>
<bibtex:citedbyid>13306009431956969271</bibtex:citedbyid>
<bibtex:abstract>Let m, n ∈ ℕ, V be a 2m-dimensional complex vector space.
The irreducible representations of the Brauer's centralizer algebra
Bn (-2m) appearing in V (x)n are in 1-1 correspondence to the set of
pairs ( f, λ), where f ∈ ℤ with 0 <= f <= [n/2], and λ ⊢ n - 2f
satisfying λ1 <= m. In this paper, we first show that each of these
representations has a basis consists of eigenvectors for the subalgebra
of Bn (-2m) generated by all the Jucys-Murphy operators, and we
determine the corresponding eigenvalues. Then we identify these
representations with the irreducible representations constructed from a
cellular basis of Bn (-2m). Finally, an explicit description of the
action of each generator of Bn (-2m) on such a basis is also given,
which generalizes earlier work of [15] for Brauer's centralizer algebra
Bn (m). 2000 Mathematics Subject Classification. 16G99.</bibtex:abstract>
<bibtex:mailhosts>mathematik.uni-stuttgart.de; yahoo.com.cn</bibtex:mailhosts>
<bibtex:pdfmeat>timestamp: 2013-01-11 10:44:35; queries: 1; inode: 2505172</bibtex:pdfmeat>
<bibtex:author>
<rdf:Seq><rdf:li>J. Hu</rdf:li><rdf:li>Y. Yang</rdf:li></rdf:Seq>
</bibtex:author>
<bibtex:entrytype>article</bibtex:entrytype>
</rdf:Description>
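
Editor's note: the `pdfinfo -meta` output above has two parts, the "Key: value" info fields and the XMP stream after the `Metadata:` line. The Python sketch below shows one way those two parts might be separated and the bibtex fields pulled out of the XMP, for example as a first step towards showing author and title next to a filename. The function names (`parse_pdfinfo`, `bibtex_fields`, `read_pdf_metadata`) are hypothetical, and the code assumes the output layout and the bibteXMP namespace shown in the example; other PDFs may wrap the XMP differently.

```python
import subprocess
import xml.etree.ElementTree as ET

# Namespace taken from the example output above (an assumption for other files).
BIBTEX_NS = "http://jabref.sourceforge.net/bibteXMP/"


def parse_pdfinfo(text):
    """Split `pdfinfo -meta` output into (info_fields, xmp_string)."""
    info = {}
    xmp_lines = []
    in_xmp = False
    for line in text.splitlines():
        if in_xmp:
            # Everything after the "Metadata:" line belongs to the XMP stream.
            xmp_lines.append(line)
        elif line.startswith("Metadata:"):
            in_xmp = True
            rest = line[len("Metadata:"):].strip()
            if rest:
                xmp_lines.append(rest)
        elif ":" in line:
            # Split on the first colon only, so dates and sizes stay intact.
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip()
    return info, "\n".join(xmp_lines)


def bibtex_fields(xmp):
    """Collect the text of all bibtex:* elements, keeping the first of each."""
    if not xmp.strip():
        return {}
    root = ET.fromstring(xmp.strip().encode("utf-8"))
    prefix = "{" + BIBTEX_NS + "}"
    fields = {}
    for elem in root.iter():
        if elem.tag.startswith(prefix) and elem.text and elem.text.strip():
            fields.setdefault(elem.tag[len(prefix):], elem.text.strip())
    return fields


def read_pdf_metadata(path):
    """Run pdfinfo on one file; requires pdfinfo to be installed."""
    out = subprocess.run(["pdfinfo", "-meta", path],
                         capture_output=True, text=True, check=True).stdout
    info, xmp = parse_pdfinfo(out)
    return info, bibtex_fields(xmp)
```

Keeping the parsing separate from the `pdfinfo` call means the same functions can be tried on saved output, and an Emacs command could call such a script per file and show the result via `message' or in another buffer.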
* Re: emacs metadata editor for (mostly) scientific pdf's
From: Thien-Thi Nguyen @ 2013-01-16 10:38 UTC
To: Martin Rubey; +Cc: help-gnu-emacs

() Martin Rubey <axiomize@yahoo.de>
() Wed, 16 Jan 2013 09:22:42 +0100

   Well, the main point is being able to edit at least Author, Title,
   Year easily.  (Because pdfmeat makes mistakes)

Do these papers have online authoritative metadata (akin to PUBMED for
the biological sciences)?  It strikes me as backwards to have to mine
metadata out of the PDF itself.

--
Thien-Thi Nguyen ..................................... GPG key: 4C807502
.                  NB: ttn at glug dot org is not me                   .
.                 (and has not been since 2007 or so)                  .
.                        ACCEPT NO SUBSTITUTES                         .
........... please send technical questions to mailing lists ...........
* Re: emacs metadata editor for (mostly) scientific pdf's
From: Martin Rubey @ 2013-01-16 10:46 UTC
To: help-gnu-emacs

Thien-Thi Nguyen <ttn@gnuvola.org> writes:

> () Martin Rubey <axiomize@yahoo.de>
> () Wed, 16 Jan 2013 09:22:42 +0100
>
>    Well, the main point is being able to edit at least Author, Title,
>    Year easily.  (Because pdfmeat makes mistakes)
>
> Do these papers have online authoritative metadata (akin to PUBMED for
> the biological sciences)?  It strikes me as backwards to have to mine
> metadata out of the PDF itself.

Well, sometimes...  I often get articles from the arXiv, which do not
have metadata, but the content coincides with the "published" version,
so I can then use the metadata from MathSciNet even if my library does
not have access.

However, I just found Zotero + ZotFile, which is a Firefox plugin and
does what I need.  Although I love Emacs, it's probably better to stick
to something which is well-maintained.

Best,

Martin
* Re: emacs metadata editor for (mostly) scientific pdf's
From: Rainer M Krug @ 2013-01-16 11:10 UTC
To: help-gnu-emacs

On 16/01/13 11:46, Martin Rubey wrote:
> Thien-Thi Nguyen <ttn@gnuvola.org> writes:
>
>> () Martin Rubey <axiomize@yahoo.de>
>> () Wed, 16 Jan 2013 09:22:42 +0100
>>
>> Well, the main point is being able to edit at least Author, Title,
>> Year easily.  (Because pdfmeat makes mistakes)
>>
>> Do these papers have online authoritative metadata (akin to PUBMED
>> for the biological sciences)?  It strikes me as backwards to have to
>> mine metadata out of the PDF itself.
>
> Well, sometimes...  I often get articles from the arXiv, which do not
> have metadata, but the content coincides with the "published" version,
> so I can then use the metadata from MathSciNet even if my library does
> not have access.
>
> However, I just found Zotero + ZotFile, which is a Firefox plugin and
> does what I need.  Although I love Emacs, it's probably better to
> stick to something which is well-maintained.

In this case, you can also look at Mendeley (extracts data from the PDF)
and JabRef (writes data to the PDF).

Still, I think it would be nice to be able to edit the data in the PDF
from Emacs.

Cheers,

Rainer

>
> Best,
>
> Martin
>
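
Editor's note: for the editing side discussed in this thread (correcting Author, Title, Year after pdfmeat has guessed wrong), pdftk also offers an `update_info` operation that reads the same kind of text `dump_data` prints. The Python sketch below is only an illustration under stated assumptions: the helper names (`info_to_pdftk`, `write_info`) are hypothetical, the `InfoBegin` / `InfoKey` / `InfoValue` layout is the one recent pdftk versions print (older versions may omit `InfoBegin`), and non-ASCII values are not handled here. It touches only the info fields, not the XMP stream.

```python
import os
import subprocess
import tempfile


def info_to_pdftk(fields):
    """Render a dict of info fields in the layout assumed for update_info."""
    lines = []
    for key, value in fields.items():
        lines += ["InfoBegin", f"InfoKey: {key}", f"InfoValue: {value}"]
    return "\n".join(lines) + "\n"


def write_info(src, dst, fields):
    """Write `fields` into a copy of `src`; requires pdftk to be installed."""
    # pdftk reads the new field values from a data file, so write one first.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(info_to_pdftk(fields))
    try:
        subprocess.run(
            ["pdftk", src, "update_info", tmp.name, "output", dst],
            check=True,
        )
    finally:
        os.unlink(tmp.name)
```

A call such as `write_info("in.pdf", "out.pdf", {"Author": "Hu, J.; Yang, Y."})` would then produce a corrected copy rather than modifying the original in place, which seems the safer default for a paper collection.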