emacs-orgmode
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH][oc-csl] Improve reference parsing


From: Ihor Radchenko
Subject: Re: [PATCH][oc-csl] Improve reference parsing
Date: Thu, 19 Jan 2023 09:56:50 +0000

András Simonyi <andras.simonyi@gmail.com> writes:

> As for the question of other elements, I proposed the custom
> backend-based approach because CSL has its own rich-text markup (which
> is actually not simply a subset of Org's, for example, it contains
> small-caps, which is not in Org), and, consequently, Citeproc-el has
> its own internal rich-text representations (ASTs), on which it
> performs the operations that are prescribed by the various CSL styles.
> When the rich text citation/bibliography is finalized, it can be
> "serialized" or "formatted" (analogously to Org's exporting a parse
> tree) using one of the Citeproc formatters, e.g. into LaTeX, HTML or
> Org. As the prefix, suffix and the locator also need to be operated on
> by the processor (concatenated to other rich text elements etc.,),
> they also have to be parsed into CIteproc el's internal rich-text
> representations. Since this is a given, the only question is in what
> format should they be passed, and the simple HTML-like standard which
> is already supported by Citeproc-el (see
> https://www.zotero.org/support/kb/rich_text_bibliography) seems to be
> the simplest solution.

So, do I understand correctly that italics, bold, subscript,
superscript, small-caps, and nocase must be passed to the CSL processor
in a format understood by CSL? Everything else could just be left in Org
and later exported according to actual export settings?

> Ihor Radchenko <yantar92@posteo.net> wrote:
>> Could you please explain in more details why CSL require special
>> export of the prefix/suffix? What will happen if we simply pass the Org
>> markup verbatim?
>
> Since Citeproc-el assumes that all formatting in the prefix/suffix is
> in the HTML-like markup mentioned above, any Org markup would be
> treated as plain text which should be preserved as is, and not
> interpreted as formatting, so, for example, when an Org document with
> underlined text in a citation prefix were exported to LaTeX then the
> Citeproc LaTeX formatter would escape the underscore characters ("\_")
> to preserve them in the output and the citation would be inserted in
> this form into the resulting LaTeX document.

What if we pass Org constructs as verbatim html? That way, LaTeX
formatter should not alter the text.

>> I am asking because org-cite-csl-render-citation uses
>> org-cite-parse-objects so, unless citeproc does something terrible with
>> the original Org syntax, we can re-parse the output string and export
>> appropriately according to the current export backend.
>
> See above, unfortunately, this wouldn't work, at least not in a
> general and safe way.

May we:
1. Convert the Org markup supported by CSL into CSL-understood HTML
format
2. Convert all other Org markup into verbatim
3. Convert back non-verbatim markup altered by CSL into Org
4. Perform exporting Org->current export backend as usual.

(In the worst case scenario, we might replace non-convertable Org markup
constructs into dummy text and later replace the dummies back into
original Org markup)

WDYT?

Also, small-caps and nocase are currently not supported by Org. Maybe it
would make sense to document how to pass these constructs to CSL
properly.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]