emacs-orgmode@gnu.org archives
* More questions about CSL and org-mode
@ 2015-12-06 21:25 John Kitchin
  2015-12-06 23:24 ` Richard Lawrence
From: John Kitchin @ 2015-12-06 21:25 UTC
  To: Org Mode

Hi all,

This is mostly for the people working on citations in org-mode.

I have been reading about CSL more this weekend. IIRC, one of the
reasons to develop the new citation syntax was to get the ability to
have pre/post text in citations more conveniently than what is currently
possible.

I have not seen any possibility for this with CSL, however. Is my
understanding correct? Is this a problem, or something partially handled
by org-export and partially by a citeproc?

IIUC, the current aim is to get a citeproc that will do the following on
export:
1. replace in-text citation syntax with org-formatted replacements
2. Insert an org-formatted bibliography somewhere in the document
3. proceed with org-to-something export, with built-in
exporters.

The current contenders for a citeproc are Zotero and Pandoc.

Has anyone looked at citeproc-py (https://pypi.python.org/pypi/citeproc-py/)
or citeproc-ruby (https://github.com/inukshuk/citeproc-ruby)?

The Ruby one looks pretty advanced.

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu


* Re: More questions about CSL and org-mode
  2015-12-06 21:25 More questions about CSL and org-mode John Kitchin
@ 2015-12-06 23:24 ` Richard Lawrence
  2015-12-06 23:45   ` Richard Lawrence
  2015-12-07 16:18   ` John Kitchin
From: Richard Lawrence @ 2015-12-06 23:24 UTC
  To: John Kitchin, Org Mode

Hi John,

John Kitchin <jkitchin@andrew.cmu.edu> writes:

> Hi all,
>
> This is mostly for the people working on citations in org-mode.
>
> I have been reading about CSL more this weekend. IIRC, one of the
> reasons to develop the new citation syntax was to get the ability to
> have pre/post text in citations more conveniently than what is currently
> possible.

Yes, that is my understanding, too.

> I have not seen any possibility for this with CSL, however. Is my
> understanding correct? Is this a problem, or something partially handled
> by org-export and partially by a citeproc?

The CSL processors I've looked at support prefix and suffix text for
individual references within a citation.  See, for example, the
citeproc-js documentation:

http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#citation-data-object

prefix, suffix, and some other fields are supported.  pandoc-citeproc
supports the same set of fields.
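
For illustration, here is a minimal sketch of what such a citation data object might look like when built from Python. The field names (citationItems, prefix, suffix, locator, label, properties, noteIndex) follow the citeproc-js documentation linked above; the item ids and the affix text are made up, and the affixes() helper is hypothetical.

```python
import json

# Item ids and affix text are invented for illustration only.
citation = {
    "citationItems": [
        # Each reference in the citation carries its own prefix/suffix.
        {"id": "kitchin2015", "prefix": "see", "locator": "12", "label": "page"},
        {"id": "lawrence2014", "suffix": "for a different view"},
    ],
    "properties": {"noteIndex": 0},
}


def affixes(citation):
    """Return (id, prefix, suffix) for each reference in the citation."""
    return [
        (item["id"], item.get("prefix", ""), item.get("suffix", ""))
        for item in citation["citationItems"]
    ]


# Serialized form that could be handed to a CSL processor.
print(json.dumps(citation, indent=2))
```

Note that there is no slot on the citation as a whole for a prefix or suffix; the affixes live on the individual items.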

However, my understanding is that neither citeproc-js nor
pandoc-citeproc supports a BibLaTeX-style "common" prefix/suffix that
belongs to the citation as a whole, rather than the individual
references within it, as is available in the multi-cite commands.  We
currently have support for such common prefixes/suffixes in Org syntax. 

My solution to this in my org-citeproc wrapper for pandoc-citeproc is to
prepend the common prefix to the prefix for the first reference in a
citation, and append the common suffix to the last reference.  This is
not a great solution, because it is not really defined what kind of
punctuation (if any) should separate the common prefix from the first
item's prefix, and so on.  But I figured that was not an important issue
to address until we actually have people making use of common prefix and
suffix syntax who are not exporting to LaTeX...
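
The workaround described above can be sketched roughly as follows. This is not the actual org-citeproc code, just a hedged illustration: the function name fold_common_affixes and the sep parameter are assumptions, and a single space is used as the separator only because, as noted, the right punctuation is not really defined.

```python
import copy


def fold_common_affixes(items, common_prefix="", common_suffix="", sep=" "):
    """Fold a citation-level prefix/suffix into the first/last reference.

    `sep` is an assumption: what should separate the common prefix from
    the first item's own prefix is left undefined in the discussion.
    """
    items = copy.deepcopy(items)  # leave the caller's data untouched
    if not items:
        return items
    if common_prefix:
        own = items[0].get("prefix", "")
        items[0]["prefix"] = common_prefix + (sep + own if own else "")
    if common_suffix:
        own = items[-1].get("suffix", "")
        items[-1]["suffix"] = (own + sep if own else "") + common_suffix
    return items


refs = [{"id": "a", "prefix": "esp."}, {"id": "b"}]
print(fold_common_affixes(refs, "see", "and references therein"))
```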

> IIUC, the current aim is to get a citeproc that will do the following on
> export:
> 1. replace in-text citation syntax with org-formatted replacements
> 2. Insert an org-formatted bibliography somewhere in the document
> 3. proceed with org-to-something export, with built-in
> exporters.

That's basically my understanding too.  There is one snag with the
"org-formatted replacement" plan, though, which I saw in a Zotero dev
discussion yesterday.  CSL processing might result in multiple levels of
formatting, e.g. nested italics like

<em>Something with an internal <em>Title</em></em>

and that won't translate very well back to Org syntax in general:

/Something with an internal /Title//

The suggestion was to just use HTML output, and then parse the HTML to
get a data structure that could be directly rendered into HTML, LaTeX,
etc., which support nested italics just fine.  I think we could do this,
though maybe there's a better solution.  That is, we can take HTML from
the citation processor and go directly to org-element objects, without
producing and re-parsing citations in Org format.
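
To make the idea concrete, here is a small sketch, using only the Python standard library, of parsing processor HTML into a nested structure and rendering it to a backend that handles nesting. The tree shape (nested lists) and the names TreeBuilder, parse and to_latex are assumptions for illustration; a real implementation would build org-element objects instead and would also need to escape special LaTeX characters.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Build a nested-list tree: [tag, child, ...], where a child is str or list."""

    def __init__(self):
        super().__init__()
        self.root = ["root"]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = [tag]
        self.stack[-1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Assumes well-formed input: every end tag closes the innermost node.
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def to_latex(node):
    """Render the tree; nested emphasis stays nested (no escaping is done)."""
    if isinstance(node, str):
        return node
    tag, children = node[0], node[1:]
    body = "".join(to_latex(child) for child in children)
    return r"\emph{" + body + "}" if tag in ("em", "i") else body


tree = parse("<em>Something with an internal <em>Title</em></em>")
print(to_latex(tree))
```

Because the nesting is kept in the data structure, no step ever has to express it in Org's flat /italic/ markup.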

> The current contenders for a citeproc are Zotero and Pandoc.
>
> Has anyone looked at https://pypi.python.org/pypi/citeproc-py/
> or https://github.com/inukshuk/citeproc-ruby
>
> The ruby one looks pretty advanced.

I haven't looked at them closely.  My impression was that the Python
version was quite incomplete; and unfortunately, I don't know Ruby, so I
would be the wrong person to evaluate it (or write code for it).

Best,
Richard


* Re: More questions about CSL and org-mode
  2015-12-06 23:24 ` Richard Lawrence
@ 2015-12-06 23:45   ` Richard Lawrence
  2015-12-07 11:56     ` John Kitchin
  2015-12-07 16:18   ` John Kitchin
From: Richard Lawrence @ 2015-12-06 23:45 UTC
  To: John Kitchin, Org Mode

Richard Lawrence <richard.lawrence@berkeley.edu> writes:

>> IIUC, the current aim is to get a citeproc that will do the following on
>> export:
>> 1. replace in-text citation syntax with org-formatted replacements
>> 2. Insert an org-formatted bibliography somewhere in the document
>> 3. proceed with org-to-something export, with built-in
>> exporters.
>
> That's basically my understanding too.  There is one snag with the
> "org-formatted replacement" plan, though, which I saw in a Zotero dev
> discussion yesterday.  

Here's the reference for that discussion, by the way:

https://groups.google.com/d/msg/zotero-dev/Bz_IenruxX4/24QWuyEIp_IJ

Best,
Richard

P.S.  John, thanks for your continued research on this.  I see that our
procrastination habits are on the same schedule. :)


* Re: More questions about CSL and org-mode
  2015-12-06 23:45   ` Richard Lawrence
@ 2015-12-07 11:56     ` John Kitchin
  2015-12-07 19:55       ` Richard Lawrence
From: John Kitchin @ 2015-12-07 11:56 UTC
  To: Richard Lawrence; +Cc: Org Mode

Thanks.

It's an interesting jam. You want to have multiple outputs as a
possibility, but there isn't a robust markup that readily works across
all backends.

What about this? For now, consider a bibliography database with
org-formatting in the entries, e.g. subscripts, superscripts, etc.
(but not like putting italics on titles or anything related to
bibliography formatting). So I can have a title like "The role of H_{2}O
in /d/-orbital splitting of \alpha particles" in an entry. I assume it
would also be ok to have utf-8 characters in it. Equations are still
problematic, as we use LaTeX syntax for those.

On export, the in-text citations are replaced with unique text blobs,
e.g. UUIDs, and the document is exported. The only important features of
these blobs are that they do not get changed on export and that they are
unique, because we replace them later.

The strings in the bibliography entry are "exported" to convert the
org-markup to the output format. The in-text citations, expanded
bibliography and style are sent to the citation processor, which outputs
replacements and a formatted bibliography in the desired output format.

Finally, you replace each uuid with the appropriate replacement, and
insert the bibliography where it belongs. That should be the final
document.
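
A rough sketch of this placeholder round trip, under several stated assumptions: the [cite:@key] pattern is a toy stand-in for the real citation syntax, wrapping the text in <p> tags stands in for the Org exporter, and the render callback stands in for the citation processor. The function names are hypothetical.

```python
import re
import uuid

# Toy citation syntax, assumed for illustration only.
CITE_RE = re.compile(r"\[cite:@([^\]]+)\]")


def protect_citations(text):
    """Swap each citation for a unique token the exporter will not alter."""
    placeholders = {}

    def swap(match):
        token = "CITE" + uuid.uuid4().hex  # alphanumeric, so nothing to escape
        placeholders[token] = match.group(1)
        return token

    return CITE_RE.sub(swap, text), placeholders


def restore_citations(exported, placeholders, render):
    """Replace each token with the processor's output for that key."""
    for token, key in placeholders.items():
        exported = exported.replace(token, render(key))
    return exported


protected, tokens = protect_citations("Water splits [cite:@kitchin2015].")
exported = "<p>" + protected + "</p>"  # stand-in for the real export step
print(restore_citations(exported, tokens, lambda key: "(Kitchin 2015)"))
```

The sketch only covers inline replacements in a single text output; it does nothing about bibliography placement or about styles that need to add structure to the document.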

If you did this with a bibtex file, it would probably break its use in
LaTeX without some clever transformation of the bibtex file to a new
file that was LaTeX formatted, and an on the fly change to the org
buffer to use this new file. But, since the point of this is for
non-LaTeX export, I guess this is ok.

I bet you could even expand the bibtex format to include journal
abbreviations, and directly use the fields that CSL uses (although I
strongly dislike "container-title" for the journal name!)

The downside is the processor now needs to output different formats, but
presumably there are a few standard ones that are a one-time investment
like html.



* Re: More questions about CSL and org-mode
  2015-12-06 23:24 ` Richard Lawrence
  2015-12-06 23:45   ` Richard Lawrence
@ 2015-12-07 16:18   ` John Kitchin
From: John Kitchin @ 2015-12-07 16:18 UTC
  To: Richard Lawrence; +Cc: Org Mode


Richard Lawrence writes:

> Hi John,
>
> John Kitchin <jkitchin@andrew.cmu.edu> writes:
>
>> Hi all,
>>
>> This is mostly for the people working on citations in org-mode.
>>
>> I have been reading about CSL more this weekend. IIRC, one of the
>> reasons to develop the new citation syntax was to get the ability to
>> have pre/post text in citations more conveniently than what is currently
>> possible.
>
> Yes, that is my understanding, too.
>
>> I have not seen any possibility for this with CSL, however. Is my
>> understanding correct? Is this a problem, or something partially handled
>> by org-export and partially by a citeproc?
>
> The CSL processors I've looked at support prefix and suffix text for
> individual references within a citation.  See, for example, the
> citeproc-js documentation:
>
> http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#citation-data-object
>
> prefix, suffix, and some other fields are supported.  pandoc-citeproc
> supports the same set of fields.

Interesting. I guess these are not standard for all processors? It also
looks like it would be hard to get an inline citation that is normally
formatted as [1] to appear as "Reference 1", e.g. as with citenum in
LaTeX. It is possible to have (Kitchin 2007) and (2007), but not a
citation that gives only the author, Kitchin, as a citeauthor command
does in LaTeX. I am not raising any objections here, just getting a
sense for what is feasible.

>
> However, my understanding is that neither citeproc-js nor
> pandoc-citeproc support a BibLaTeX-style "common" prefix/suffix that
> belongs to the citation as a whole, rather than the individual
> references within it, as is available in the multi-cite commands.  We
> currently have support for such common prefixes/suffixes in Org syntax.
>
> My solution to this in my org-citeproc wrapper for pandoc-citeproc is to
> prepend the common prefix to the prefix for the first reference in a
> citation, and append the common suffix to the last reference.  This is
> not a great solution, because it is not really defined what kind of
> punctuation (if any) should separate the common prefix from the first
> item's prefix, and so on.  But I figured that was not an important issue
> to address until we actually have people making use of common prefix and
> suffix syntax who are not exporting to LaTeX...

Agreed.

>
>> IIUC, the current aim is to get a citeproc that will do the following on
>> export:
>> 1. replace in-text citation syntax with org-formatted replacements
>> 2. Insert an org-formatted bibliography somewhere in the document
>> 3. proceed with org-to-something export, with built-in
>> exporters.
>
> That's basically my understanding too.  There is one snag with the
> "org-formatted replacement" plan, though, which I saw in a Zotero dev
> discussion yesterday.  CSL processing might result in multiple levels of
> formatting, e.g. nested italics like
>
> <em>Something with an internal <em>Title</em></em>
>
> and that won't translate very well back to Org syntax in general:
>
> /Something with an internal /Title//
>
> The suggestion was to just use HTML output, and then parse the HTML to
> get a data structure that could be directly rendered into HTML, LaTeX,
> etc., which support nested italics just fine.  I think we could do this,
> though maybe there's a better solution.  That is, we can take HTML from
> the citation processor and go directly to org-element objects, without
> producing and re-parsing citations in Org format.


* Re: More questions about CSL and org-mode
  2015-12-07 11:56     ` John Kitchin
@ 2015-12-07 19:55       ` Richard Lawrence
  2015-12-08 11:41         ` John Kitchin
From: Richard Lawrence @ 2015-12-07 19:55 UTC
  To: John Kitchin; +Cc: Org Mode

Hi John,

John Kitchin <jkitchin@andrew.cmu.edu> writes:

> Thanks.
>
> It's an interesting jam. You want to have multiple outputs as a
> possibility, but there isn't a robust markup that readily works across
> all backends.

Yes, indeed.  

> On export, the in-text citations are replaced with unique text blobs,
> e.g. UUIDs, and the document is exported. The only important features of
> these blobs are that they do not get changed on export and that they are
> unique, because we replace them later.
>
> The strings in the bibliography entry are "exported" to convert the
> org-markup to the output format. The in-text citations, expanded
> bibliography and style are sent to the citation processor, which outputs
> replacements and a formatted bibliography in the desired output format.
>
> Finally, you replace each uuid with the appropriate replacement, and
> insert the bibliography where it belongs. That should be the final
> document.

IIUC, the problem with this approach is that it will not work well when
the citation style is note-based rather than inline.  The main
motivation for going "back to Org" is that note-based styles require the
document structure to change as a result of citation processing: new
footnotes have to be inserted, and existing ones have to be renumbered.
That is relatively hard to do if the rest of the document is already in
the target format (except with LaTeX).  By doing citation processing
early in the export process and converting the results to Org, we can
rely on Org's footnote processing to handle this later in the export
process.

As far as I can see, if it weren't for note-based styles, this approach
would work fine.  (Indeed, it is pretty much what the existing org-cite
code does, except that the mapping between citations and their
replacements is done with Lisp data structures rather than via string
replacement in the output buffer.  I stopped work on that right about
the time I realized the existing approach wouldn't work very well with
note-based styles.)

But given the problem about nested formatting, going back to Org at the
level of text replacements doesn't work.  In other words: both of the
simple-minded approaches (process citations directly to text in the
target format, or process them to Org text, then let Org convert them to
the target format) face problems.

I think probably what we'll have to do to accommodate both note-based
styles and the possibility of nested formatting is to get the results of
citation processing in some unambiguous format like HTML or JSON, then
parse it, and then use the result to directly modify the parse tree for
the Org document before continuing the export process.  I can't see an
easier way...can anyone else?
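
As a toy illustration of why working on the parse tree helps with note-based styles, the sketch below replaces citation nodes with footnote nodes and only then numbers all footnotes in document order, so an existing footnote is renumbered automatically. The nested-list tree and the names cite_to_footnote and render are assumptions; the real Org parse tree and footnote handling are considerably richer.

```python
def cite_to_footnote(node, notes):
    """Return a copy of the tree with citation nodes turned into footnotes.

    `notes` maps citation keys to note text from the citation processor.
    """
    if isinstance(node, str):
        return node
    if node[0] == "citation":
        return ["footnote", notes[node[1]]]
    return [node[0]] + [cite_to_footnote(child, notes) for child in node[1:]]


def render(node, notes_out):
    """Render to plain text, numbering footnotes in document order."""
    if isinstance(node, str):
        return node
    if node[0] == "footnote":
        notes_out.append(node[1])
        return "[%d]" % len(notes_out)
    return "".join(render(child, notes_out) for child in node[1:])


doc = ["doc", ["paragraph",
               "Claim one", ["citation", "kitchin2015"],
               ". Claim two", ["footnote", "An existing note."], "."]]

converted = cite_to_footnote(doc, {"kitchin2015": "Kitchin, 2015."})
collected = []
print(render(converted, collected))
print(collected)
```

Here the note that was the first footnote in the source ends up as the second one, because the citation before it became a footnote of its own. Doing the same after export would mean editing footnote markup in the target format.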

Best,
Richard


* Re: More questions about CSL and org-mode
  2015-12-07 19:55       ` Richard Lawrence
@ 2015-12-08 11:41         ` John Kitchin
From: John Kitchin @ 2015-12-08 11:41 UTC
  To: Richard Lawrence; +Cc: Org Mode


>> On export, the in-text citations are replaced with unique text blobs,
>> e.g. UUIDs, and the document is exported. The only important features of
>> these blobs are that they do not get changed on export and that they are
>> unique, because we replace them later.
>>
>> The strings in the bibliography entry are "exported" to convert the
>> org-markup to the output format. The in-text citations, expanded
>> bibliography and style are sent to the citation processor, which outputs
>> replacements and a formatted bibliography in the desired output format.
>>
>> Finally, you replace each uuid with the appropriate replacement, and
>> insert the bibliography where it belongs. That should be the final
>> document.
>
> IIUC, the problem with this approach is that it will not work well when
> the citation style is note-based rather than inline.  The main
> motivation for going "back to Org" is that note-based styles require the
> document structure to change as a result of citation processing: new
> footnotes have to be inserted, and existing ones have to be renumbered.
> That is relatively hard to do if the rest of the document is already in
> the target format (except with LaTeX).  By doing citation processing
> early in the export process and converting the results to Org, we can
> rely on Org's footnote processing to handle this later in the export
> process.

I guess I don't understand what note-based citations look like, or why
you would have to renumber footnotes in this process. Does the order
change for some reason? Even if it does, it sounds like this might just
require another pass of calculations to figure out how to replace
things.

Any chance you could send me a document with note-based citations?

One place where text-based replacement doesn't work, I guess, is output
formats that aren't plain text. ODT might be an example, since the
export creates multiple XML files in a zip file?

> As far as I can see, if it weren't for note-based styles, this approach
> would work fine.  (Indeed, it is pretty much what the existing org-cite
> code does, except that the mapping between citations and their
> replacements is done with Lisp data structures rather than via string
> replacement in the output buffer.  I stopped work on that right about
> the time I realized the existing approach wouldn't work very well with
> note-based styles.)
>
> But given the problem about nested formatting, going back to Org at the
> level of text replacements doesn't work.  In other words: both of the
> simple-minded approaches (process citations directly to text in the
> target format, or process them to Org text, then let Org convert them to
> the target format) face problems.
>
> I think probably what we'll have to do to accommodate both note-based
> styles and the possibility of nested formatting is to get the results of
> citation processing in some unambiguous format like HTML or JSON, then
> parse it, and then use the result to directly modify the parse tree for
> the Org document before continuing the export process.  I can't see an
> easier way...can anyone else?

Like getting an xml citation, and then using xslt to translate it to the
format you want? Or something equivalent? Your translation would still
have to be clever to avoid nested syntax, which I guess requires some
recursive parsing of the output.

Modifying the parse tree is more elegant than the replacement text idea.
I have to learn how to do this one day ;)

>
> Best,
> Richard

