org parser and priorities of inline elements
From: Max Nikulin
Subject: org parser and priorities of inline elements
Date: Sat, 27 Nov 2021 19:16:08 +0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0
On 21/11/2021 16:28, Ihor Radchenko wrote:
> Also, is there any reason why we are not simply using punctuation
> character class instead of listing punctuation chars explicitly (and
> only for English)? What about "_你叫什么名字_?"
It seems the punctuation character class is too broad. E.g.
¿ INVERTED QUESTION MARK
normally appears before words, while "?" usually comes after them. I do
not see anything in the output of
(category-set-mnemonics (char-category-set ?¿))
that would help to discriminate such cases.
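The same limitation shows up outside Emacs. As a minimal sketch, using Unicode general categories from Python's unicodedata module (an assumption made for illustration only; Emacs character categories are a different mechanism), the leading and trailing marks fall into the same class:

```python
import unicodedata

# Print the Unicode general category of a few punctuation characters.
# "Po" stands for "Punctuation, other".
for ch in "?¿!¡。":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")


def same_general_category(a: str, b: str) -> bool:
    """Return True when both characters share a Unicode general category."""
    return unicodedata.category(a) == unicodedata.category(b)


# "¿" usually precedes a word and "?" follows one, yet both are "Po".
print(same_general_category("?", "¿"))
```

So a rule based only on "is punctuation" cannot tell which side of a word a character is expected on.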
An example that confuses fontification but not the parser:
: false [[http://te.st/dir?b-=&a=-][verbatim]] fontification
This is a simplified example; the original report:
Chris Hunt. Bug: Tildes in URL impact visible link text
Sun, 27 Dec 2020 11:44:07 -0500.
https://list.orgmode.org/CAH+Wm4-_XHUZKFTf=ZtbfnCPvQWkbEoeGs8EpYm+8SPmu8LHFg@mail.gmail.com/
Nicolas Goaziou. Thu, 18 Nov 2021 13:35:19 +0100.
https://list.orgmode.org/87y25l8wvs.fsf@nicolasgoaziou.fr
> Ihor Radchenko writes:
>> My intuition says that the current parser behaviour is not correct. It
>> would make more sense to prioritise link over italics. However, it would
>> require a major change in the parser - instead of a single pass, the
>> parser may parse different types of objects sequentially. The emphasis
>> objects should come last avoiding the markers to have different parents.
> I disagree. Priority should be given to the first object being started.
> This is, IMO, the only sane way to handle syntax.
TeX, which changes the category of characters inside the argument of
verbatim commands, is not the only origin of such an expectation. In
markdown, links and code have higher priority than emphasis as well:
echo 'A _b `c_ d` e_ f' | pandoc -f markdown -t html -
<p>A <em>b <code>c_ d</code> e</em> f</p>
Org:
A _b =c_ d= e_ f
Export result (more concise and easier to read than the output of
`org-element-parse-secondary-string'):
<p>
A <span class="underline">b =c</span> d= e_ f
</p>
Link in markdown:
echo 'A _b c <https://orgmode.org/index.htm_?k=v> d e_ f' \
| pandoc -f markdown -t html -
<p>A <em>b c <a href="https://orgmode.org/index.htm_?k=v"
class="uri">https://orgmode.org/index.htm_?k=v</a> d e</em> f</p>
Org:
<p>
A <span class="underline">b /c <<a
href="https://orgmode.org/index.htm">https://orgmode.org/index.htm</a></span>?k=v>
d/ e_ f
</p>
I cannot estimate the effort necessary to implement priorities of objects
(verbatim - link - emphasis) in the org-element parser since I have not
looked into its code. Comparing the following snippets, I would naively
expect some kind of backtracking:
- A /b *c +d e+ f* g/ h
- A /b *c +d f* e+ h
I admit that I may be wrong and that the "first wins" approach handles
the buffer of incompletely parsed entities in a different way.
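To make the difference concrete, here is a toy sketch in Python. It is not the org-element algorithm and not pandoc's either. The marker rules (an opener must be followed by a non-space, a closer must be preceded by one), the tag names and the function names `first_wins` and `verbatim_first` are all assumptions made for illustration, and no HTML escaping is done.

```python
import re

# Toy rules, NOT Org syntax: "=" delimits verbatim, the rest is emphasis.
TAGS = {"_": "u", "/": "i", "*": "b", "+": "del"}
VERBATIM = re.compile(r"=(\S(?:.*?\S)?)=")


def find_close(text, marker, start):
    """Index of the nearest `marker` at or after `start` that is preceded
    by a non-space character, or -1 when there is none."""
    for i in range(start, len(text)):
        if text[i] == marker and not text[i - 1].isspace():
            return i
    return -1


def parse(text, markers):
    """Single left-to-right pass: the first object that starts wins."""
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        can_open = i + 1 < len(text) and not text[i + 1].isspace()
        if ch in markers and can_open:
            j = find_close(text, ch, i + 2)
            if j != -1:
                body = text[i + 1:j]
                if ch == "=":
                    out.append(f"<code>{body}</code>")  # contents stay raw
                else:
                    tag = TAGS[ch]
                    out.append(f"<{tag}>{parse(body, markers)}</{tag}>")
                i = j + 1
                continue
        out.append(ch)
        i += 1
    return "".join(out)


def first_wins(text):
    return parse(text, "=_/*+")


def verbatim_first(text):
    """Two passes: hide verbatim spans behind private-use placeholder
    characters, parse emphasis, then put the spans back."""
    spans = []

    def stash(match):
        spans.append(match.group(1))
        return chr(0xE000 + len(spans) - 1)

    html = parse(VERBATIM.sub(stash, text), "_/*+")
    for n, body in enumerate(spans):
        html = html.replace(chr(0xE000 + n), f"<code>{body}</code>")
    return html


SAMPLE = "A _b =c_ d= e_ f"
print(first_wins(SAMPLE))      # A <u>b =c</u> d= e_ f
print(verbatim_first(SAMPLE))  # A <u>b <code>c_ d</code> e</u> f
```

With these toy rules the single pass gives the same shape as the Org export above, while the two-pass variant gives the same shape as the pandoc output. The two-pass variant needs no backtracking here, only because verbatim spans are located before any emphasis marker is considered.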
P.S. In reStructuredText simple nesting is not allowed; maybe it is
possible to use replacements instead.