From: John Kitchin
Subject: Re: Citation syntax: a revised proposal
Date: Sun, 15 Feb 2015 15:49:44 -0500
To: Richard Lawrence
Cc: emacs-orgmode@gnu.org
In-reply-to: <87k2zjnc0e.fsf@berkeley.edu>

I guess this looks workable. The syntax is generally more verbose than I
am accustomed to, and less explicit in my (LaTeX-centric) opinion. But
the majority of our citations would be the simplest form, which is maybe
even shorter.

It looks like the citation insertion could be practically automated,
e.g. a command with a convenient key-binding looks up references in some
database, selects them, and then inserts them as properly formatted
citations at point. I think the usual suspects (reftex, helm-bibtex, and
probably ebib) could be taught to output most of this syntax for
whatever type, and they could give human-readable hints about the
intended format, e.g. intext, parenthetical, noauthor, etc. Or you could
have dedicated commands with key completion to do that. With so many
options, this should not be an issue.

Presumably each &/@key will be clickable like a link, and the function
it runs would get the key (and maybe additional info about the cite)? If
not, that would be a show-stopper for me. Not because of any syntax
reason, but a functional one. Right now every link-based citation is a
one-click gateway to every scientific search engine I know, the pdf, the
bibtex entry, functions that copy citations, summarize citations, email
citations, etc. for the key that was clicked on. That is too useful to
give up for any syntax. But if each key were clickable, running a
user-definable or default function (which might even do nothing), then
no problem. Probably the user should have to define this function, since
a key is no good without information from somewhere (e.g. a variable, or
in the file) about where the database is.

I gather we are split between bibtex, org-bibtex, and zotero as
backends, and maybe there are others (RIS, Mendeley, ...). Maybe one day
there will be good support for all of them. I have tried them all, and
for 15+ years, I keep coming back to bibtex ;)

I am a little concerned about what the LaTeX export will eventually look
like. Out of the box, I suppose a handful of types will be pretty well
supported, something like: (cite, citet, citep, citeauthor, citeyear,
citenum), which I suspect would cover many people's needs. There is no
question in my mind that some people will want to extend this, as that
leaves too few of the LaTeX citation commands supported out of the box,
especially for biblatex users (who use it because of limitations in
bibtex ;). My sense is that the syntax may then be too verbose and too
difficult to write exporters for, and they would go back to links. That
is probably a small number of people, and maybe I am wrong about it. I
am anyway supportive enough to see it tried out.
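
To make the export concern a bit more concrete, here is a minimal
sketch (in Python, purely for illustration) of how the basic
distinctions in the proposal might map onto a handful of natbib
commands. The function names and the mapping itself are assumptions made
for this example, not something the proposal specifies, and it only
handles a single work per citation. Whether every one of these commands
accepts the optional note arguments should be checked against the natbib
documentation.

```python
def natbib_command(parenthetical: bool, suppress_author: bool = False) -> str:
    """Pick a natbib command name for one citation (assumed mapping)."""
    if suppress_author:
        # "-@key": leave the author out, keep the year
        return "citeyearpar" if parenthetical else "citeyear"
    return "citep" if parenthetical else "citet"


def latex_cite(key: str, parenthetical: bool, suppress_author: bool = False,
               prefix: str = "", suffix: str = "") -> str:
    """Render a single-work citation as a natbib call (hypothetical helper)."""
    command = natbib_command(parenthetical, suppress_author)
    if prefix:
        # natbib reads [prenote][postnote]; a prenote needs both brackets
        notes = f"[{prefix}][{suffix}]"
    elif suffix:
        notes = f"[{suffix}]"
    else:
        notes = ""
    return f"\\{command}{notes}{{{key}}}"


# [cite: see (@Doe99) p. 44] reduced to one work:
print(latex_cite("Doe99", parenthetical=True, prefix="see", suffix="p. 44"))
```

Anything beyond these four commands would have to come through the
extension syntax, which is exactly where I expect the exporters to get
complicated.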

My final comment is that I suggest two additional things to go with this
syntax:

[bibliographystyle: some-kind-of-information-probably-unsrt/alpha/chicago]

This would tell some backend how to style the bibliography entries. This
does not need to be clickable (I don't know what a click would do
anyway, at most select the style? edit the style?).

[bibliography: @some-kind-of-source-probably-a-file; @maybe-more-than-one]

This is where the keys are stored. And, it would also indicate where the
bibliography should actually be placed. This should also be clickable,
with a default action to just open the file that was clicked on.

I prefer those to file attributes, e.g.

#+BIBLIOGRAPHY: @some-kind-of-source-probably-a-file; @maybe-more-than-one

I don't think that can be used to specify where a bibliography should be
placed, and it doesn't make sense to me to use two things to specify the
same information.

Each of these would need customizable export. I am pretty sure those are
backend independent (even though I took the names straight from LaTeX
;). With Endnote/Word for example you have to choose a bibliography and
style.

So, overall, I am on the positive side of zero.

Richard Lawrence writes:

> Hi everyone,
>
> Since discussion seems to have petered out on the previous thread (see:
> http://thread.gmane.org/gmane.emacs.orgmode/94524), I took some time to
> go back over the discussion and write up a concrete proposal for
> citation syntax.
>
> This proposal represents my attempt to formulate a syntax that is easy
> to read, easy to parse, and covers all the use-cases that people
> mentioned as being important. It is surely not perfect, but I learned a
> lot from the previous thread, and I hope something like this will serve
> the community's needs.
>
> The proposal is below, both inline (for easy quoting) and attached (for
> easy reading).
> To keep it relatively short, I have mostly not explained
> my reasoning for the choices I made, but I am happy to do so here if
> anyone has questions.
>
> I welcome feedback, comments, criticisms, and objections on any point.
> However, since we've already had a long discussion about this, I
> respectfully request that we try to keep this thread focused. To that
> end, I suggest:
>
> 1) If you have criticisms or objections, please try to indicate
>    whether you think they are `substantive' (e.g., you see a problem
>    that would prevent you from using this syntax, or prevent Org from
>    implementing it) or not (e.g., you would prefer a slightly
>    different but equivalent way of expressing something).
>
> 2) If you wish to express an opinion about the proposal without
>    offering further comments, let us know by just replying with +1
>    (meaning you'd like to see this syntax, or something reasonably
>    similar to it, be adopted), 0, or -1 (meaning you'd prefer not to
>    see this syntax or anything similar to it adopted).
>
> I guess this is my Valentine to the Org community. :) Thanks for reading!
>
> Best,
> Richard
>
> #+TITLE: Citation syntax, a revised proposal
> #+DATE: <2015-02-14 Sat>
> #+AUTHOR: Richard Lawrence
> #+EMAIL: richard.lawrence@berkeley.edu
> #+LANGUAGE: en
> #+SELECT_TAGS: export
> #+EXCLUDE_TAGS: noexport
>
> * Citation syntax
> ** Requirements
> A citation is a textual reference to one or more individual works,
> together with other information about those works, grouped together in
> a single place.
>
> Within a citation, each reference to an individual work needs to be
> capable of containing:
> 1) a database key that references the cited work
> 2) prefix / pre-note
> 3) suffix / post-note
>
> Whole citations also need:
> 4) [@4] a way of specifying whether the citation is in-text or
>    parenthetical
> 5) a way of representing a common prefix and suffix, if the citation
>    is a multi-cite
> 6) a way of specifying whether the citation should produce a
>    complete bibliography entry in-place
> 7) an extensible way of specifying formatting properties to export
>    filters and/or specific export backends
>
> ** Citation definitions
> *** Citation keys; bibliography references vs. complete entries
> A citation key consists of a unique label preceded by a flag, which is
> optionally preceded by a hyphen.
>
> The flag is either `@' or `&'. `@' indicates that the citation should
> produce a normal reference to the bibliography entry for the cited
> work (in whatever style the document uses), located elsewhere.
>
> The `&' flag indicates that the citation should produce a complete
> bibliography entry for the cited work in the place where the citation
> appears.
>
> The optional hyphen (`-') indicates that the author's name should be
> suppressed from the rendered citation. (Note that this is only useful
> in author-X citation styles; it should have no effect in numeric
> styles.)
>
> *** Basic citations: Parenthetical vs. in-text
> There are two basic types of citation: /parenthetical/ and /in-text/.
> Each of these may contain references to one or more individual works.
>
> The difference between parenthetical and in-text citations is
> expressed using parentheses around the /first/ citation key. A
> parenthetical citation has such parentheses around the first citation
> key; an in-text citation lacks them. (Parentheses around non-initial
> keys are permitted for visual consistency and to keep the grammar
> simple, but have no meaning.)
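
The key rules quoted above are mechanical enough to try out. What
follows is only a rough sketch in Python (not Elisp, and not any
existing Org code) of recognizing keys inside the body of a
`[cite: ...]' list and applying the first-key rule. The names KEY_RE,
Key, find_keys and is_parenthetical are made up for the example. The
label character set is taken from the Grammar section further down, and
reading "internal punctuation" as "a label may not end in punctuation"
is my assumption.

```python
import re
from typing import NamedTuple

# Characters allowed inside a label, per the proposal's Grammar section.
_INTERNAL = r"A-Za-z0-9:.#$%&+?<>~/\-"

KEY_RE = re.compile(
    r"(?<!\w)"                  # assumption: a key never starts mid-word
    r"(?P<open>\()?"            # optional opening parenthesis
    r"(?P<hyphen>-)?"           # optional author-suppression hyphen
    r"(?P<flag>[@&])"           # '@' = normal reference, '&' = full entry
    r"(?P<label>[A-Za-z_](?:[" + _INTERNAL + r"]*[A-Za-z0-9])?)"
    r"(?(open)\))"              # closing parenthesis required if one was opened
)


class Key(NamedTuple):
    label: str
    is_full: bool            # key used the '&' flag
    suppress_author: bool    # key was preceded by '-'
    is_parenthesized: bool   # key was written as (key)


def find_keys(text: str) -> list[Key]:
    """Return every citation key found in text, in order of appearance."""
    return [
        Key(m["label"], m["flag"] == "&",
            m["hyphen"] is not None, m["open"] is not None)
        for m in KEY_RE.finditer(text)
    ]


def is_parenthetical(citation_body: str) -> bool:
    """A citation list is parenthetical iff its first key is parenthesized."""
    keys = find_keys(citation_body)
    return bool(keys) and keys[0].is_parenthesized


print(find_keys("see (@Doe99) p. 44; @Smith2000 has a review"))
print(is_parenthetical("@Doe99 p. 44"))
```

This deliberately ignores the special-case forms ([@Doe99] and a bare
@Doe99 outside `cite:' brackets), where the square brackets rather than
parentheses carry the distinction.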
>
> A citation thus consists in general of a bracketed list, beginning
> with `cite:', of one or more individual references, each of which:
> - may contain a prefix,
> - must contain a citation key, which may or may not be surrounded by `(...)'
> - and may contain a suffix
> Individual references are separated by semi-colons.
>
> There are also two special cases to make simple-but-common uses very
> easy to type and read:
> 1) a parenthetical citation for a single work with no prefix and
>    suffix may be written by just surrounding the key with brackets,
>    like: [@Doe99].
> 2) an in-text citation for a single work with no prefix and suffix
>    may be written as a /bare/ key, without brackets, like: @Doe99.
> (Thus, in both of the `simple' cases, one less level of bracketing is
> required.)
>
> Prefix and suffix text are regular Org text, which are allowed to
> contain various kinds of Org markup (see the grammar below for a
> complete list).
>
> *** Multi-cite citations
> Multi-cite citations are distinguished from basic parenthetical and
> in-text citations by the presence of an optional common prefix or
> common suffix (which may not contain keys). If present, the common
> prefix must occur before the first individual reference, and the
> common suffix must occur after the last individual reference. The
> common prefix and suffix are separated from the individual references
> by semi-colons.
>
> *** Examples of main citation syntax
> Basic parenthetical citation:
> #+BEGIN_QUOTE
> The nineteenth century was very interesting. [cite: (@Doe99)]
> #+END_QUOTE
>
> Basic parenthetical citation using special-case syntax:
> #+BEGIN_QUOTE
> The nineteenth century was very interesting. [@Doe99]
> #+END_QUOTE
>
> Parenthetical citation with multiple works and prefix and suffix:
> #+BEGIN_QUOTE
> The nineteenth century was in fact lovely [cite: see (@Doe99) p. 44;
> @Smith2000 has a review].
> #+END_QUOTE
>
> Basic in-text citation with a suffix:
> #+BEGIN_QUOTE
> As [cite: @Doe99 p. 44] says, the nineteenth century was very interesting.
> #+END_QUOTE
>
> In-text citation using special-case syntax:
> #+BEGIN_QUOTE
> @Doe2000 explains that the twentieth century was even more interesting.
> #+END_QUOTE
>
> In-text citation with author suppressed:
> #+BEGIN_QUOTE
> As Doe explained in his -@Doe2003, the twentieth century was somewhat
> less interesting than previously thought.
> #+END_QUOTE
>
> Parenthetical citation with full-entry key:
> #+BEGIN_QUOTE
> A complete bibliography entry follows in parentheses. [cite: (&Doe99)]
> A complete bibliography entry follows in parentheses. [&Doe99]
> #+END_QUOTE
>
> In-text citation with full-entry key:
> #+BEGIN_QUOTE
> A complete bibliography entry follows: [cite: &Doe99].
> A complete bibliography entry follows: &Doe99.
> #+END_QUOTE
>
> Full-entry in-text citation, in a footnote:
> #+BEGIN_QUOTE
> Doe exhibits unusual scholarship.[fn:: &Doe99.]
> #+END_QUOTE
>
> In-text citation, with a complete bibliography entry minus the author
> in a footnote, plus a suffix:
> #+BEGIN_QUOTE
> @Doe99 exhibits unusual scholarship.[fn:1]
>
> [fn:1] [cite: -&Doe99 Cf. especially section 4.]
> #+END_QUOTE
>
> In-text multi-cite:
> #+BEGIN_QUOTE
> Speculation abounds about what the twenty-first century will
> bring. [cite: For an overview of this topic, see; @Smith1998;
> @Jones1999; @Miller2001; and references therein.]
> #+END_QUOTE
>
> Parenthetical multi-cite:
> #+BEGIN_QUOTE
> Speculation abounds about what the twenty-first century will
> bring. [cite: For an overview of this topic, see; (@Smith1998);
> @Jones1999; @Miller2001; and references therein.]
> #+END_QUOTE
>
> *** Syntax for extensions
> Additional information can be supplied in a citation that may affect
> how export filters or particular backends format it.
>
> This additional information may be supplied following the brackets of
> a citation between the following delimiters: `%%( ... )'.
>
> (Note: I am proposing that this expression go /after/ the main
> citation brackets both because it visually separates this extra
> information from the main citation, and in order to avoid imposing any
> further syntactic restriction on suffixes.)
>
> At least for now, any information supplied this way is /strictly the
> user's responsibility/ to interpret (e.g., using an export filter).
> This means that citations that have information like this are not
> portable and might not be exported correctly:
> - in other users' setups
> - by particular backends
> - by future versions of Org
>
> I will not deal with the details of how this additional information
> should be syntactically represented, since this has not really been
> discussed. But I suggest that, to deal with the complexities of
> additional information in full generality, something like a complete
> Lisp list is required. Thus, I suggest that this additional
> information simply be represented as a Lisp list. (Besides
> generality, this has the benefit of making the syntax easy to parse:
> the parser can just call Elisp's read function with a marker after the
> `%%'.)
>
> I provide these examples merely to illustrate the possibilities here:
> #+BEGIN_QUOTE
> @vonNeumann1930 %%(:type genitive :capitalize t) model can only handle
> a limited range of observed cases.
>
> @McCarthy1950 %%('s) clever use of Lisp syntax was also used to
> express the Saxon genitive.
>
> For more, see Ref. @Doe99 %%(:type refnum :follow-to "some.pdf").
>
> Even more complicated examples occur after Doe's famous article from
> [cite: @Doe99] %%(:type date-only).
>
> And in [cite: @Doe2000] %%(:attr_latex (:format-string
> "\citeyear{%KEY}") :attr_html (:only-fields (month year))), Doe
> finally realized that arbitrary complexity was a powerful but
> double-edged sword.
>
> @_aParticularlyUGLYkey:is-this-one %%(:overlay "Nice Display")
> #+END_QUOTE
>
> ** Grammar
> This section formally documents the syntax of citations discussed
> above.
>
> To represent the syntax of citations, we need a category of /citation/
> objects, which require the following properties (the names here are not
> important and could be changed):
> - is-parenthetical (boolean; nil means is in-text)
> - common-prefix (text)
> - common-suffix (text)
> - references (list)
> - extra-info (list)
>
> Each reference in the list of references should be a plist with the
> following properties:
> - prefix (text)
> - suffix (text)
> - key (string)
> - is-parenthesized (boolean; t means key was parenthesized; only
>   significant for the first reference in a citation)
> - suppress-author (boolean; t means author name should not be output)
> - is-full (boolean; t means a full bibliography entry should be
>   output in-place)
>
> The category of citations has the following grammar:
> - A CITATION is a PARENTHETICAL-CITATION or an IN-TEXT-CITATION.
> - A PARENTHETICAL-CITATION is either a SIMPLE-PARENTHETICAL or a
>   CITATION-LIST whose first INDIVIDUAL-REFERENCE is a
>   PARENTHESIZED-KEY.
> - An IN-TEXT-CITATION is either a SIMPLE-IN-TEXT, or a
>   CITATION-LIST whose first INDIVIDUAL-REFERENCE is a BARE-KEY.
> - A SIMPLE-PARENTHETICAL is a KEY immediately surrounded by square
>   brackets, optionally followed by an EXTRA-INFO clause.
> - A SIMPLE-IN-TEXT is a BARE-KEY, optionally followed by an
>   EXTRA-INFO clause.
> - A CITATION-LIST has the format
>     [cite: PREFIX; INDIVIDUAL-REFERENCE; ... INDIVIDUAL-REFERENCE; SUFFIX] EXTRA-INFO
>   where the initial PREFIX, final SUFFIX, and EXTRA-INFO clause are
>   optional. At least one INDIVIDUAL-REFERENCE must be present.
> - An INDIVIDUAL-REFERENCE has the format:
>     PREFIX KEY-MAYBE-PARENS SUFFIX
>   The KEY-MAYBE-PARENS is obligatory, and the prefix and suffix
>   are optional.
> - A KEY-MAYBE-PARENS is either a BARE-KEY or PARENTHESIZED-KEY
> - A BARE-KEY is a KEY with immediately-preceding whitespace
> - A PARENTHESIZED-KEY is a KEY immediately surrounded by `(' and `)'.
> - A KEY optionally begins with `-', and obligatorily contains `@' or
>   `&' followed by a string of characters which begins with a letter
>   or `_', and may contain alphanumeric characters and the following
>   internal punctuation characters:
>     :.#$%&-+?<>~/
> - A PREFIX or SUFFIX is arbitrary text (except `;', `]', and
>   KEY-MAYBE-PARENS) which may contain only the following Org
>   objects:
>   - bold
>   - code
>   - entity
>   - italic
>   - latex-fragment
>   - line-break
>   - strike-through
>   - subscript
>   - superscript
>   - underline
>   (Note that this list could be extended somewhat if necessary.)
> - An EXTRA-INFO clause consists of data not specified by this
>   grammar, in between `%%(' and `)'
>
> ** Outstanding issues
> It seems to me that there are potential problems with the above
> proposal in a number of areas, but I cannot tell how serious they are,
> or what changes (if any) should be made to solve them. I don't
> pretend that this is an exhaustive list:
> 1) *Nesting.* I have favored LaTeX compatibility for in-text
>    citations with multiple references; but this means there is no
>    way to `nest' citations. Thus, there is no way to express (in
>    the main syntax) what Pandoc expresses as:
>      @Doe99 [p. 34; see also @DoeRoe2000]
>    which renders like:
>      Doe (1999, p. 34; see also Doe and Roe 2000)
>    Instead, since a citation is in-text or parenthetical as a whole,
>    the equivalent in the above syntax
>      [cite: @Doe99 p. 34; see also @DoeRoe2000]
>    should render like:
>      Doe (1999, p. 34), see also Doe and Roe (2000).
>    I am not certain if Pandoc-like output is important in this case.
>    The few people who commented on this said that it was not.
> 2) *Limitations on prefixes and suffixes.* There may be legitimate
>    uses of `@', `;', `]', etc.
>    inside prefix or suffix text that the
>    above syntax does not allow. Examples might include:
>    - use of semi-colons as part of the prefix/suffix text
>    - footnotes, links, or timestamps inside a prefix/suffix
>    I am not certain how important these cases are. If they are
>    important, some of them might be able to be worked around with
>    entities.
> 3) *Edge cases.* The above syntax may make it possible to express
>    things that don't make sense, or would be too difficult to
>    export. The only one I can think of is that it is possible to
>    mix `@'-style and `&'-style keys in the same citation. I am not
>    sure if this should be forbidden; it may sometimes make sense.
>    It may also be possible to express things that external tools,
>    such as citeproc-js, don't know how to process. I do not have a
>    good sense of what, if anything, falls into that category, and
>    what should be done about it.
> 4) *Citation commands.* Rather than introduce an explicit
>    representation for different citation commands/types, I have used
>    different parts of the syntax to express the common distinctions
>    that people mentioned. I suggest that, for now, anything beyond
>    these basic distinctions be left to the user-extension syntax.
>    However, if it becomes clear in the future that there is a need
>    to add a representation for a command to the main syntax, there
>    is a natural place to do so: immediately after the `cite:' tag
>    (as Nicolas suggested).
>
> Also, I have not said anything in this proposal to address how other
> document metadata should be represented, which has not been discussed
> much on the list. I think this should be discussed separately.
>

-- 
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu