emacs-orgmode

Re: [O] Citations, continued


From: Richard Lawrence
Subject: Re: [O] Citations, continued
Date: Fri, 06 Feb 2015 14:41:19 -0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Hi Nicolas and all,

Nicolas Goaziou <address@hidden> writes:

> Richard Lawrence <address@hidden> writes:
>
> Thanks for this reverse engineering.
>
>> Specifically I think we need the following categories, all of which
>> would be objects:
>>   - key
>>   - prefix / pre-text
>>   - suffix / post-text
>>   - locator
>>   - individual citation
>>   - bracketed citation
>>   - unbracketed citation
>>
>> These should have a grammar like the following, based on my
>> (reverse-engineered) understanding of the Pandoc syntax for citations:
>>
>>   - A bracketed citation is a list of one or more individual citations, 
>>     separated by ';' if there are two or more, and surrounded by '[' ']'
>>   - An individual citation is formatted like: PREFIX KEY LOCATOR SUFFIX
>>     The key is obligatory, and the prefix, locator and suffix
>>     are optional.
>>   - A key optionally begins with '-', and obligatorily contains '@'
>>     followed by a string of characters which begins with a letter or '_',
>>     and may contain alphanumeric characters and the following internal
>>     punctuation characters:
>>        :.#$%&-+?<>~/
>>   - A prefix or suffix is a text object (that may contain markup like
>>     emphasis or macros)
>>   - An unbracketed citation consists of a key, optionally followed by a
>>     locator which is enclosed in '[' ']'
>
> I don't think all should be objects. For example, prefix and suffix can
> be properties in a `full-citation' object (like :tag in items).

Yes, sorry, this was dumb of me...for some reason, I was thinking
"everything in Org syntax has to be an object or an element, and these
aren't elements, so they're objects".  But obviously, some of these
categories are merely internal or merely represent properties of
objects.

> IIUC, we need three objects (I'm not wedded to the names):
>
>   - short-citation (aka unbracketed citation), with :cite-key
>     and :locator properties, both being strings, and :suppress-author as
>     a boolean;
>
>   - full-citation (aka individual citation), with, in addition to the
>     properties above, :prefix and :suffix, both being parsed strings.

> Since full citations can only exist in a bracketed citation, there is no
> reason to create a third object type for the latter. It acts as a mere
> container only useful for lexer.

I think this is not quite right: in my original terminology, `individual
citation' is just an intermediate category.  A bracketed/full citation
contains at least one, but may contain many, `individual' citations,
like:

[See @Doe99, p. 3; also @Doe2000, p. 989.]

This is a bracketed/full citation containing two individual citations,
each with its own prefix and suffix.
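
To make the structure concrete, here is a rough Python sketch of that
reading of the grammar.  The function name and the dictionary fields are
made up for illustration, and treating punctuation as part of a key only
when an alphanumeric character follows it is my interpretation of
"internal punctuation", not something taken from a specification.

```python
import re

# Key: optional '-', then '@', then a letter or '_', then alphanumerics and
# internal punctuation.  Assumption: punctuation counts as "internal" only
# when an alphanumeric character (or '_') follows it.
KEY = re.compile(
    r"(-?)@([A-Za-z_](?:[A-Za-z0-9_]|[:.#$%&\-+?<>~/](?=[A-Za-z0-9_]))*)"
)

def parse_bracketed(text):
    """Split a '[...]' citation into individual citations.

    Returns a list of dicts, or None if the text is not a bracketed
    citation under this simplified grammar.
    """
    if not (text.startswith("[") and text.endswith("]")):
        return None
    citations = []
    for part in text[1:-1].split(";"):
        m = KEY.search(part)
        if m is None:
            return None  # the key is obligatory in every individual citation
        citations.append({
            "prefix": part[:m.start()].strip(),
            "key": m.group(2),
            "suppress_author": m.group(1) == "-",
            "suffix": part[m.end():].strip(),
        })
    return citations

print(parse_bracketed("[See @Doe99, p. 3; also @Doe2000, p. 989.]"))
```

Note that the sketch treats prefix and suffix as plain strings; it says
nothing about how markup inside them would be parsed.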

>> I am not sure about the syntax of locators.  In particular, I do not
>> know if they should allow internal markup, I do not know if they have an
>> internal syntax, and I do not know if a comma is required to separate
>> them from a key in a bracketed citation.
>
> This needs to be decided indeed. Is there any reason to allow markup
> there?

I had a look at the Pandoc parser; see:
https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Readers/Markdown.hs
(Citation stuff starts at line 1843.)

My Haskell is about as good as my German (which is to say: not very),
but I think I learned a few interesting things.

First, from what I can tell, there is actually no separate category of
`locators', despite the documentation.  There's just the suffix, which
is anything between the key and `;' or `]'.  (The bareloc function seems
to just look for a regular suffix.  But I could be missing something.)
Thus, maybe we can drop the locator category.

Also, it appears that you can write things like

@Smith99 [p. 33; see also @Doe2014] says something interesting.

That is, an in-text citation with a suffix may also contain further
citations in the brackets, after the suffix for the in-text citation.
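
A rough Python sketch of that behaviour, as I understand it from the
description above (this is not a port of the Pandoc code; the function
name and result fields are hypothetical):

```python
import re

# Same key shape as in the grammar quoted earlier; the lookahead that keeps
# trailing punctuation out of the key is an assumption.
KEY = r"-?@[A-Za-z_](?:[A-Za-z0-9_]|[:.#$%&\-+?<>~/](?=[A-Za-z0-9_]))*"

# An in-text citation: a key, optionally followed by whitespace and '[...]'.
IN_TEXT = re.compile(r"(?P<key>" + KEY + r")(?:\s+\[(?P<body>[^\]]*)\])?")

def parse_in_text(text):
    """Parse an in-text citation at the start of `text`.

    The first ';'-separated chunk in the brackets is taken as the suffix
    of the in-text key; any remaining chunks are further citations.
    """
    m = IN_TEXT.match(text)
    if m is None:
        return None
    body = m.group("body")
    if body is None:
        return {"key": m.group("key"), "suffix": "", "others": []}
    first, *rest = body.split(";")
    return {
        "key": m.group("key"),
        "suffix": first.strip(),
        "others": [chunk.strip() for chunk in rest],
    }

print(parse_in_text("@Smith99 [p. 33; see also @Doe2014] says something interesting."))
```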
  
> My only concern is speed. A bracketed citation can induce a lot of
> backtracking since it can be triggered each time a square bracket is
> opened, which is not too uncommon, I think. Basically, at each "[", we
> need to find corresponding "]", and if there is, any key between the
> two. That's some overhead.

Good point.  I hadn't thought about this at all.

> Also, syntax is ambiguous. For example, in
>
>   [[http://orgmode.org][some @key]]
>
> it is not clear if @key should be treated as a short-citation in a link
> description, or included in a full citation with
> "[http://orgmode.org][some " as its prefix. I mean, the answer is clear
> for you and me, but not necessarily at lexer's level. For example,
> Eric's parser chose the former, which is good, but also disallows square
> brackets in prefix, which rules out some objects from this location
> (mainly links and footnotes).

Yes, good point.  Also, inline export snippets (@@latex: ...@@)
could prove problematic.
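
The ambiguity is easy to reproduce with a toy matcher.  The two patterns
below are hypothetical, written only for illustration, and they simplify
the key to `@` plus word characters; they are not the rules of Eric's
parser or of Pandoc.

```python
import re

# Naive rule: '[' ... key ... ']' with anything allowed in between.
NAIVE = re.compile(r"\[(.*?@\w+.*?)\]")

# Stricter rule: no square brackets allowed inside the citation.
STRICT = re.compile(r"\[([^\[\]]*@\w+[^\[\]]*)\]")

def first_body(pattern, text):
    """Return the text between the brackets of the first match, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None

link = "[[http://orgmode.org][some @key]]"

# The naive rule swallows the link target into the prefix.
print(first_body(NAIVE, link))

# The strict rule no longer swallows the link target, but it still matches
# "[some @key]" inside the link description, so links would have to be
# recognized before citations.
print(first_body(STRICT, link))

# The strict rule also rejects a prefix that contains a link.
print(first_body(STRICT, "[see [[http://orgmode.org][Org]] and @key]"))
```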

I do think it's important to allow some markup in the prefix and suffix,
because there are obvious uses where you might want emphasis, etc.

My initial thought is that a prefix or suffix should only allow:
  - Entities and LaTeX fragments
  - Line breaks?
  - Macros
  - Text markup

I'd also be happy without macros and line breaks, personally.

> That's why I suggested the [cite: ...] part in the first place, which
> you dismissed quickly. It reduces backtracking a lot and can solve
> easily some confusing situations.
>
> Of course I understand the need for compatibility with existing Pandoc
> syntax, but I wouldn't want us to shoot ourselves in the foot. Even if
> we don't use "cite:" markup, I think we should carefully specify current
> syntax to avoid loopholes.

Another interesting thing I learned from the Pandoc source is that,
should we want to adopt "[cite: ...]" syntax, I think it would be pretty
trivial for Pandoc to support it.  (Worst case, they can copy-and-paste
the Markdown citation parser and then add "cite:" in a couple of
places.)  So if this is necessary on the Org side for performance or
ambiguity reasons, I am not against it.
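
As a rough illustration of why the prefix helps, the Python sketch below
compares how many positions a lexer would have to examine with and
without it.  The `[cite: ...]` form used here is only one guess at what
the syntax might look like, and the function names are made up.

```python
import re

# Hypothetical "[cite: ...]" form: the literal prefix anchors the match, so
# ordinary brackets (links, footnotes, timestamps) are never candidates.
CITE = re.compile(r"\[cite:(?P<body>[^\[\]]*)\]")

def cite_bodies(text):
    """Return the body of every '[cite: ...]' construct in `text`."""
    return [m.group("body").strip() for m in CITE.finditer(text)]

def candidate_positions(text, opener):
    """Count the positions at which a lexer would have to try a match."""
    return text.count(opener)

text = "See [[http://orgmode.org][some @key]] and [cite:see @Doe99, p. 3]."

print(candidate_positions(text, "["))       # every opening bracket
print(candidate_positions(text, "[cite:"))  # only prefixed brackets
print(cite_bodies(text))
```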

One question, though, is how this should work with in-text citations.
Should I have to write:

@Smith99 [cite:p. 33]

or

@Smith99 [cite:p. 33; see also @Doe2014]

?

Best,
Richard



