emacs-orgmode

Re: [PATCH] Add new entity \-- serving as markup separator/escape symbol


From: Max Nikulin
Subject: Re: [PATCH] Add new entity \-- serving as markup separator/escape symbol
Date: Fri, 29 Jul 2022 09:50:58 +0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0

On 29/07/2022 08:43, Ihor Radchenko wrote:
> Max Nikulin writes:
>
>> The good point in your patch is that \- still works as shy hyphen
>> (that, by the way, may be used in some cases instead of zero width
>> space: *intra*\-word). On the other hand I have managed to find a case
>> when your approach is not ideal:
>>
>> *\--scratch\--*
>>
>> <p>
>> <b>&#x00ad;-scratch</b></p>
>
> Well. I think that it is impossible to use the same escape construct to
> both force emphasis and escape it.

Let's articulate the problem as follows: some characters ("*", "/", etc.), besides being used literally, are overloaded with 2 additional roles, namely starting an emphasis group and terminating an emphasis group. So, in addition to the lightweight markup heuristics, it is necessary to provide a way to disambiguate which of the 3 roles is associated with a particular character.

"Activate" and "deactivate" characters or entities for emphasis markers are alternative, and perhaps not so clear, terms that I have used before.

The advantage of zero width space is that "[:space:]" is part of the PREMATCH and POSTMATCH (outer) regexps in `org-emphasis-regexp-components', while "[:space:]" is forbidden at the inner borders of an emphasized span of text. The latter is mostly meaningful; however, I am unsure if a bold space has the same width as a regular one, and a space in a fixed width font is certainly distinct.
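
The outer/inner distinction can be illustrated with a rough sketch. This is a simplified Python model, not Org's actual regexp: it handles only "*", ignores multi-line bodies, and the names PRE, POST, BORDER and bold_spans are invented for the illustration. It assumes, as the paragraph above does, that zero width space counts as space-like; Python's \s does not match U+200B, so it is added to the classes explicitly.

```python
import re

ZWSP = "\u200b"  # U+200B zero width space

# Simplified stand-ins for the outer (pre/post) and inner (border) classes.
# Assumption: ZWSP is treated as space-like, so it is listed explicitly.
PRE = r"\s" + ZWSP + r"\-('\"{"
POST = r"\s" + ZWSP + r"\-.,:!?;'\")}\["
BORDER = r"\s" + ZWSP

EMPHASIS = re.compile(
    rf"(?:^|(?<=[{PRE}]))"                           # outer: line start or PRE char
    rf"\*([^{BORDER}]|[^{BORDER}].*?[^{BORDER}])\*"  # inner borders must not be space-like
    rf"(?=$|[{POST}])",                              # outer: line end or POST char
    re.MULTILINE,
)


def bold_spans(text):
    """Return the bodies of *bold* spans recognized by this simplified model."""
    return [m.group(1) for m in EMPHASIS.finditer(text)]


print(bold_spans("a *b* c"))
print(bold_spans("intra*word*"))
print(bold_spans("intra" + ZWSP + "*word*" + ZWSP + "more"))
print(bold_spans("* word*"))
```

In this model the zero width space is accepted outside the markers, which is why it can separate emphasis from adjacent word characters, but it is rejected immediately inside them.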

The problem with the "\--" entity is that it is not handled properly at the start of an emphasis region. It neither disables emphasis nor is parsed as a complete entity; instead it becomes a combination of the "\-" shy hyphen and a literal "-".

I am unsure if it can be solved consistently. Possible ways:
- In addition to the space-like (with respect to the current regexp) entity, add another one that acts as a part of a word but, like "\--", is stripped from the output. Likely it should be accompanied by more changes in the parser and regexps.
- Provide some new explicit syntax for a literal character, the start of an emphasis group, and the end of an emphasis group.

Concerning the zero width space workaround, I may be wrong, but Nicolas might consider using U+200B zero width space as the escape character for itself: a single one is filtered out during export, a double zero width space becomes a single character. (I do not like this kind of "white space" programming language.) Another question is whether U+2060 word joiner (or some other character) should be added, either as an alternative to zero width space or to allow = verbatim = fixed width text surrounded by fixed width spaces.
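
The self-escaping idea can be sketched as an export filter. This is only an illustration in Python of the rule stated above, not existing Org behavior; the function name strip_zwsp is hypothetical, and the handling of runs longer than two (pairs are consumed left to right, a leftover single one is dropped) is an assumption.

```python
import re

ZWSP = "\u200b"  # U+200B zero width space


def strip_zwsp(text):
    """Hypothetical export filter: a lone ZWSP is dropped, a doubled one is kept once.

    Assumption: a run of n zero width spaces is read as pairs from the left,
    so it yields n // 2 literal characters.
    """
    return re.sub(
        ZWSP + "+",
        lambda m: ZWSP * (len(m.group(0)) // 2),
        text,
    )


print(repr(strip_zwsp("intra" + ZWSP + "*word*")))
print(repr(strip_zwsp("keep" + ZWSP * 2 + "one")))
```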



