#+TITLE: Citation syntax, a revised proposal
#+DATE: <2015-02-14 Sat>
#+AUTHOR: Richard Lawrence
#+EMAIL: address@hidden
#+LANGUAGE: en
#+SELECT_TAGS: export
#+EXCLUDE_TAGS: noexport

* Citation syntax
** Requirements

A citation is a textual reference to one or more individual works,
together with other information about those works, grouped together
in a single place.

Within a citation, each reference to an individual work needs to be
capable of containing:

1) a database key that references the cited work
2) prefix / pre-note
3) suffix / post-note

Whole citations also need:

4) [@4] a way of specifying whether the citation is in-text or
   parenthetical
5) a way of representing a common prefix and suffix, if the citation
   is a multi-cite
6) a way of specifying whether the citation should produce a complete
   bibliography entry in-place
7) an extensible way of specifying formatting properties to export
   filters and/or specific export backends

** Citation definitions
*** Citation keys; bibliography references vs. complete entries

A citation key consists of a unique label preceded by a flag, which
is optionally preceded by a hyphen.

The flag is either `@' or `&'. `@' indicates that the citation should
produce a normal reference to the bibliography entry for the cited
work (in whatever style the document uses), located elsewhere. The
`&' flag indicates that the citation should produce a complete
bibliography entry for the cited work in the place where the citation
appears.

The optional hyphen (`-') indicates that the author's name should be
suppressed from the rendered citation. (Note that this is only useful
in author-X citation styles; it should have no effect in numeric
styles.)

*** Basic citations: Parenthetical vs. in-text

There are two basic types of citation: /parenthetical/ and /in-text/.
Each of these may contain references to one or more individual works.
The difference between parenthetical and in-text citations is
expressed using parentheses around the /first/ citation key. A
parenthetical citation has such parentheses around the first citation
key; an in-text citation lacks them. (Parentheses around non-initial
keys are permitted for visual consistency and to keep the grammar
simple, but have no meaning.)

A citation thus consists in general of a bracketed list, beginning
with `cite:', of one or more individual references, each of which:

- may contain a prefix,
- must contain a citation key, which may or may not be surrounded by
  `(...)'
- and may contain a suffix

Individual references are separated by semi-colons.

There are also two special cases to make simple-but-common uses very
easy to type and read:

1) a parenthetical citation for a single work with no prefix and
   suffix may be written by just surrounding the key with brackets,
   like: [@Doe99]
2) an in-text citation for a single work with no prefix and suffix
   may be written as a /bare/ key, without brackets, like: @Doe99.

(Thus, in both of the `simple' cases, one fewer level of bracketing
is required.)

Prefix and suffix text is regular Org text, which may contain various
kinds of Org markup (see the grammar below for a complete list).

*** Multi-cite citations

Multi-cite citations are distinguished from basic parenthetical and
in-text citations by the presence of a common prefix or common suffix
(neither of which may contain keys). Both are optional. If present,
the common prefix must occur before the first individual reference,
and the common suffix must occur after the last individual reference.
The common prefix and suffix are separated from the individual
references by semi-colons.

*** Examples of main citation syntax

Basic parenthetical citation:
#+BEGIN_QUOTE
The nineteenth century was very interesting. [cite: (@Doe99)]
#+END_QUOTE

Basic parenthetical citation using special-case syntax:
#+BEGIN_QUOTE
The nineteenth century was very interesting.
[@Doe99]
#+END_QUOTE

Parenthetical citation with multiple works and prefix and suffix:
#+BEGIN_QUOTE
The nineteenth century was in fact lovely [cite: see (@Doe99) p. 44;
@Smith2000 has a review].
#+END_QUOTE

Basic in-text citation with a suffix:
#+BEGIN_QUOTE
As [cite: @Doe99 p. 44] says, the nineteenth century was very
interesting.
#+END_QUOTE

In-text citation using special-case syntax:
#+BEGIN_QUOTE
@Doe2000 explains that the twentieth century was even more
interesting.
#+END_QUOTE

In-text citation with author suppressed:
#+BEGIN_QUOTE
As Doe explained in his address@hidden, the twentieth century was
somewhat less interesting than previously thought.
#+END_QUOTE

Parenthetical citation with full-entry key:
#+BEGIN_QUOTE
A complete bibliography entry follows in parentheses. [cite: (&Doe99)]

A complete bibliography entry follows in parentheses. [&Doe99]
#+END_QUOTE

In-text citation with full-entry key:
#+BEGIN_QUOTE
A complete bibliography entry follows: [cite: &Doe99].

A complete bibliography entry follows: &Doe99.
#+END_QUOTE

Full-entry in-text citation, in a footnote:
#+BEGIN_QUOTE
Doe exhibits unusual scholarship.[fn:: &Doe99.]
#+END_QUOTE

In-text citation, with a complete bibliography entry minus the author
in a footnote, plus a suffix:
#+BEGIN_QUOTE
@Doe99 exhibits unusual scholarship.[fn:1]

[fn:1] [cite: -&Doe99 Cf. especially section 4.]
#+END_QUOTE

In-text multi-cite:
#+BEGIN_QUOTE
Speculation abounds about what the twenty-first century will bring.
[cite: For an overview of this topic, see; @Smith1998; @Jones1999;
@Miller2001; and references therein.]
#+END_QUOTE

Parenthetical multi-cite:
#+BEGIN_QUOTE
Speculation abounds about what the twenty-first century will bring.
[cite: For an overview of this topic, see; (@Smith1998); @Jones1999;
@Miller2001; and references therein.]
#+END_QUOTE

*** Syntax for extensions

Additional information can be supplied in a citation that may affect
how export filters or particular backends format it.
This additional information may be supplied following the brackets of
a citation between the following delimiters: `%%( ... )'.

(Note: I am proposing that this expression go /after/ the main
citation brackets both because it visually separates this extra
information from the main citation, and in order to avoid imposing
any further syntactic restriction on suffixes.)

At least for now, any information supplied this way is /strictly the
user's responsibility/ to interpret (e.g., using an export filter).
This means that citations that have information like this are not
portable and might not be exported correctly:

- in other users' setups
- by particular backends
- by future versions of Org

I will not deal with the details of how this additional information
should be syntactically represented, since this has not really been
discussed. But I suggest that, to deal with the complexities of
additional information in full generality, something like a complete
Lisp list is required. Thus, I suggest that this additional
information simply be represented as a Lisp list. (Besides
generality, this has the benefit of making the syntax easy to parse:
the parser can just call Elisp's read function with a marker after
the `%%'.)

I provide these examples merely to illustrate the possibilities here:
#+BEGIN_QUOTE
@vonNeumann1930 %%(:type genitive :capitalize t) model can only
handle a limited range of observed cases.
@McCarthy1950 %%('s) clever use of Lisp syntax was also used to
express the Saxon genitive.
For more, see Ref. @Doe99 %%(:type refnum :follow-to "some.pdf").
Even more complicated examples occur after Doe's famous article from
[cite: @Doe99] %%(:type date-only).
And in [cite: @Doe2000] %%(:attr_latex (:format-string "\citeyear{%KEY}") :attr_html (:only-fields (month year))),
Doe finally realized that arbitrary complexity was a powerful but
double-edged sword.
@_aParticularlyUGLYkey:is-this-one %%(:overlay "Nice Display")
#+END_QUOTE

** Grammar

This section formally documents the syntax of citations discussed
above.

To represent the syntax of citations, we need a category of
/citation/ objects, which require the following properties (the names
here are not important and could be changed):

- is-parenthetical (boolean; nil means is in-text)
- common-prefix (text)
- common-suffix (text)
- references (list)
- extra-info (list)

Each reference in the list of references should be a plist with the
following properties:

- prefix (text)
- suffix (text)
- key (string)
- is-parenthesized (boolean; t means key was parenthesized; only
  significant for the first reference in a citation)
- suppress-author (boolean; t means author name should not be output)
- is-full (boolean; t means a full bibliography entry should be
  output in-place)

The category of citations has the following grammar:

- A CITATION is a PARENTHETICAL-CITATION or an IN-TEXT-CITATION.
- A PARENTHETICAL-CITATION is either a SIMPLE-PARENTHETICAL or a
  CITATION-LIST whose first INDIVIDUAL-REFERENCE contains a
  PARENTHESIZED-KEY.
- An IN-TEXT-CITATION is either a SIMPLE-IN-TEXT or a CITATION-LIST
  whose first INDIVIDUAL-REFERENCE contains a BARE-KEY.
- A SIMPLE-PARENTHETICAL is a KEY immediately surrounded by square
  brackets, optionally followed by an EXTRA-INFO clause.
- A SIMPLE-IN-TEXT is a BARE-KEY, optionally followed by an
  EXTRA-INFO clause.
- A CITATION-LIST has the format

    [cite: PREFIX; INDIVIDUAL-REFERENCE; ... INDIVIDUAL-REFERENCE; SUFFIX] EXTRA-INFO

  where the initial PREFIX, final SUFFIX, and EXTRA-INFO clause are
  optional. At least one INDIVIDUAL-REFERENCE must be present.
- An INDIVIDUAL-REFERENCE has the format:

    PREFIX KEY-MAYBE-PARENS SUFFIX

  The KEY-MAYBE-PARENS is obligatory, and the prefix and suffix are
  optional.
- A KEY-MAYBE-PARENS is either a BARE-KEY or a PARENTHESIZED-KEY.
- A BARE-KEY is a KEY with immediately-preceding whitespace.
- A PARENTHESIZED-KEY is a KEY immediately surrounded by `(' and `)'.
- A KEY optionally begins with `-', and obligatorily contains `@' or
  `&' followed by a string of characters which begins with a letter
  or `_', and may contain alphanumeric characters and the following
  internal punctuation characters:

    :.#$%&-+?<>~/

- A PREFIX or SUFFIX is arbitrary text (except `;', `]', and
  KEY-MAYBE-PARENS) which may contain only the following Org objects:
  - bold
  - code
  - entity
  - italic
  - latex-fragment
  - line-break
  - strike-through
  - subscript
  - superscript
  - underline
  (Note that this list could be extended somewhat if necessary.)
- An EXTRA-INFO clause consists of data not specified by this
  grammar, in between `%%(' and `)'.

** Outstanding issues

It seems to me that there are potential problems with the above
proposal in a number of areas, but I cannot tell how serious they
are, or what changes (if any) should be made to solve them. I don't
pretend that this is an exhaustive list:

1) *Nesting.* I have favored LaTeX compatibility for in-text
   citations with multiple references; but this means there is no way
   to `nest' citations. Thus, there is no way to express (in the main
   syntax) what Pandoc expresses as:

     @Doe99 [p. 34; see also @DoeRoe2000]

   which renders like:

     Doe (1999, p. 34; see also Doe and Roe 2000)

   Instead, since a citation is in-text or parenthetical as a whole,
   the equivalent in the above syntax

     [cite: @Doe99 p. 34; see also @DoeRoe2000]

   should render like:

     Doe (1999, p. 34), see also Doe and Roe (2000).

   I am not certain if Pandoc-like output is important in this case.
   The few people who commented on this said that it was not.

2) *Limitations on prefixes and suffixes.* There may be legitimate
   uses of `@', `;', `]', etc. inside prefix or suffix text that the
   above syntax does not allow.
   Examples might include:
   - use of semi-colons as part of the prefix/suffix text
   - footnotes, links, or timestamps inside a prefix/suffix

   I am not certain how important these cases are. If they are
   important, some of them might be able to be worked around with
   entities.

3) *Edge cases.* The above syntax may make it possible to express
   things that don't make sense, or would be too difficult to export.
   The only one I can think of is that it is possible to mix
   `@'-style and `&'-style keys in the same citation. I am not sure
   if this should be forbidden; it may sometimes make sense.

   It may also be possible to express things that external tools,
   such as citeproc-js, don't know how to process. I do not have a
   good sense of what, if anything, falls into that category, and
   what should be done about it.

4) *Citation commands.* Rather than introduce an explicit
   representation for different citation commands/types, I have used
   different parts of the syntax to express the common distinctions
   that people mentioned. I suggest that, for now, anything beyond
   these basic distinctions be left to the user-extension syntax.

   However, if it becomes clear in the future that there is a need to
   add a representation for a command to the main syntax, there is a
   natural place to do so: immediately after the `cite:' tag (as
   Nicolas suggested).

Also, I have not said anything in this proposal to address how other
document metadata should be represented, which has not been discussed
much on the list. I think this should be discussed separately.
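
As a rough illustration of the KEY rule in the grammar, and of the
suppress-author and is-full properties it feeds, here is a minimal
sketch in Python. It is not part of the proposal and not a reference
implementation. The names =KEY_RE=, =CitationKey= and =parse_key= are
hypothetical, and the sketch makes two assumptions the grammar does
not state: letters are ASCII only, and `internal' punctuation means a
label may not end in a punctuation character.

```python
import re
from typing import NamedTuple, Optional

# Punctuation the KEY rule allows inside a label.
_INTERNAL_PUNCT = r":.#$%&\-+?<>~/"

# Assumptions (not stated in the grammar): ASCII letters only, and a
# label may not end in one of the punctuation characters.
KEY_RE = re.compile(
    r"(?P<suppress>-)?"   # optional hyphen: suppress the author's name
    r"(?P<flag>[@&])"     # '@' = normal reference, '&' = full entry
    r"(?P<label>[A-Za-z_](?:[A-Za-z0-9_" + _INTERNAL_PUNCT + r"]*[A-Za-z0-9_])?)"
)


class CitationKey(NamedTuple):
    label: str             # the unique label, without flag or hyphen
    is_full: bool          # True when the flag is '&'
    suppress_author: bool  # True when the key starts with '-'


def parse_key(text: str) -> Optional[CitationKey]:
    """Return the parsed key if TEXT is exactly one KEY, else None."""
    m = KEY_RE.fullmatch(text)
    if m is None:
        return None
    return CitationKey(
        label=m.group("label"),
        is_full=m.group("flag") == "&",
        suppress_author=m.group("suppress") is not None,
    )


print(parse_key("-&Doe99"))
```

When scanning running text rather than an isolated token,
=KEY_RE.match= can be used instead of =fullmatch=; under the
assumptions above it stops before trailing sentence punctuation, so a
key written as `@Doe99.' at the end of a sentence yields the label
`Doe99'.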