emacs-orgmode

Re: Trailing whitespace after export snippets without a transcoder


From: Ihor Radchenko
Subject: Re: Trailing whitespace after export snippets without a transcoder
Date: Mon, 22 Apr 2024 19:01:07 +0000

Max Nikulin <manikulin@gmail.com> writes:

>> I do not think that we need to handle this Org mode-wide (it will be
>> difficult and will likely cause breaking changes).
>
> I have not figured out why it may become a breaking change and which
> backends need blank lines inside a paragraph. I would make stripping empty
> lines the default behavior with some option to disable this feature.

For example, consider an HTML exporter that aligns tags nicely and keeps
blank lines between markup blocks for readability.  If we removed such
blank lines unconditionally, we would break that exporter's layout.

>> See the attached tentative fix.
>
> Since zero width spaces are part of Org syntax, they need special treatment.

They are not part of Org syntax, and we currently do not handle them
specially.  They still work as an escape character simply because Org
syntax defines markup boundaries using a closed set of whitespace
characters: (rx (any " \t")).  So, for all practical purposes, any
whitespace character other than space or tab is equivalent to a
zero-width space.

> ---- 8< ----
> #+macro: empty (eval "")
>
> Some *bold*​@@comment: *@@ text.
> @@comment: line@@
> More /italic/​{{{empty}}} text.
> {{{empty}}}
> Last line.
> ---- >8 ----
>
> LaTeX export:
> ---- 8< ----
> Some \textbf{bold}​text.
> More \emph{italic}​ text.
>
> Last line.
> ---- >8 ----
>
> Notice visible space character disappeared after "bold".

I changed the condition that detects trailing spaces from
(rx whitespace eos) to (rx (any " \t") eos), so that a trailing
zero-width space no longer counts as one.
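To make the difference between the two checks concrete, here is a small
sketch (the my/ names are hypothetical and not part of Org).
(rx whitespace eos) matches any character with whitespace syntax in the
current syntax table, so whether it matches a trailing U+200B depends on
that table; the report above suggests that it did.  (rx (any " \t") eos)
only accepts a literal space or tab.

```elisp
(require 'rx)

(defconst my/old-trailing-blank-re (rx whitespace eos)
  "Old check: any character with whitespace syntax at the end of a string.")

(defconst my/new-trailing-blank-re (rx (any " \t") eos)
  "New check: only a literal space or tab at the end of a string.")

(defun my/ends-with-org-blank-p (s)
  "Return t when S ends with a space or a tab, nil otherwise.
A trailing zero-width space (U+200B) does not count as a blank."
  (and (string-match-p my/new-trailing-blank-re s) t))

;; Usage:
;; (my/ends-with-org-blank-p "bold ")       ; => t
;; (my/ends-with-org-blank-p "bold\t")      ; => t
;; (my/ends-with-org-blank-p "bold\u200b")  ; => nil
;; (string-match-p my/old-trailing-blank-re "bold\u200b")
;;   ; result depends on the syntax table in effect
```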

See the attached updated version of the patch set.

> ... I am leaving up 
> to you to decide if empty line appeared due to a macro is a bug or a 
> feature. If I remember it correctly, your opinion is that a macro 
> expanding to multiple paragraphs is a valid one.

Yes. I do believe that we should keep macros as dumb as possible, so
that people can use them in the most flexible ways, including breaking
paragraphs, if so desired.

A more annoying case is

First line
@@comment:foo@@
last line.

vs.

First line
@@comment:foo
@@last line.

Here we run into a peculiarity of Org syntax: trailing tabs and spaces
are included as part of the object, but newlines are not.

But I do not see any good way to address this problem without rewriting
half of Org mode.
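One way to see this peculiarity is to inspect the parsed export snippets
with org-element-parse-buffer.  A sketch, assuming a recent Org (the
my/export-snippet-info helper is hypothetical): trailing spaces after
the closing @@ are recorded in :post-blank, whereas a newline placed
before the closing @@ becomes part of the snippet's :value.

```elisp
(require 'org)
(require 'org-element)

(defun my/export-snippet-info (text)
  "Parse TEXT as Org and return (VALUE . POST-BLANK) for each export snippet."
  (with-temp-buffer
    (org-mode)
    (insert text)
    (org-element-map (org-element-parse-buffer) 'export-snippet
      (lambda (snippet)
        (cons (org-element-property :value snippet)
              (org-element-property :post-blank snippet))))))

;; Spaces after the snippet count as post-blank; the newline does not:
;; (my/export-snippet-info "First line\n@@comment:foo@@  \nlast line.")
;; A newline before the closing @@ ends up inside the value instead:
;; (my/export-snippet-info "First line\n@@comment:foo\n@@last line.")
```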

From 229a563dc38e1fdfd63be2dfebb1a9e9023e44b2 Mon Sep 17 00:00:00 2001
Message-ID: <229a563dc38e1fdfd63be2dfebb1a9e9023e44b2.1713812419.git.yantar92@posteo.net>
From: Ihor Radchenko <yantar92@posteo.net>
Date: Sun, 21 Apr 2024 15:37:18 +0300
Subject: [PATCH v2 1/2] org-export-data: Handle trailing spaces when
 transcoder returns nil

* lisp/ox.el (org-export--keep-spaces): New helper function containing
the logic, extracted from `org-export--prune-tree', that decides whether
to keep spaces in place of a removed object.  The logic is modified to
also keep spaces when the previous plain-text object ends with
whitespace other than " " or "\t", for example, a trailing zero-width
space.  We do want to keep spaces in such a scenario.
(org-export-data): When transcoder returns nil, handle
trailing spaces after an object the same way `org-export--prune-tree'
does.  Remove special handling of export snippets that unconditionally
keep their trailing spaces.
(org-export--prune-tree): Use the helper function.

Link: https://orgmode.org/list/87h6fwmgkm.fsf@localhost
---
 lisp/ox.el | 67 ++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 42 insertions(+), 25 deletions(-)
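For readers who want to try the decision logic outside the export
machinery, here is a simplified sketch of what org-export--keep-spaces
decides when the preceding object is a plain-text string.  The name
my/keep-spaces-sketch is hypothetical; unlike the real helper, it
ignores non-string predecessors (whose :post-blank the helper checks)
and element-class DATA.

```elisp
(require 'rx)

(defun my/keep-spaces-sketch (post-blank previous)
  "Return a string of spaces to keep in place of a removed object, or nil.
POST-BLANK is the removed object's post-blank count, possibly nil.
PREVIOUS is the preceding plain-text string, or nil when there is none.
Hypothetical simplification: only plain-text predecessors are handled."
  (unless (or (not post-blank)
              (zerop post-blank)
              (not previous)
              ;; Previous text already ends with an Org blank (space/tab).
              (string-match-p (rx (any " \t") eos) previous))
    (make-string post-blank ?\s)))

;; Usage:
;; (my/keep-spaces-sketch 1 "Foo.")        ; => " "  ("Foo.[10%] Bar")
;; (my/keep-spaces-sketch 1 "A ")          ; => nil  (space already there)
;; (my/keep-spaces-sketch 1 "bold\u200b")  ; => " "  (U+200B is not a blank)
;; (my/keep-spaces-sketch 0 "Foo.")        ; => nil
```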

diff --git a/lisp/ox.el b/lisp/ox.el
index fc746950d..6f6689188 100644
--- a/lisp/ox.el
+++ b/lisp/ox.el
@@ -1880,6 +1880,38 @@ (defun org-export-transcoder (blob info)
       (let ((transcoder (cdr (assq type (plist-get info :translate-alist)))))
        (and (functionp transcoder) transcoder)))))
 
+(defun org-export--keep-spaces (data info)
+  "Return non-nil when post-blank spaces after removing DATA should be kept.
+INFO is the info channel.
+
+This function returns nil when the previous exported element already
+has trailing spaces, or when DATA does not have a non-nil, non-zero
+`:post-blank' property.
+
+When the return value is non-nil, it is a string containing the trailing
+spaces."
+  ;; When DATA is an object, interpret this as if DATA should be
+  ;; ignored (see `org-export--prune-tree').  Keep spaces in place of
+  ;; removed element, if necessary.  Example: "Foo.[10%] Bar" would
+  ;; become "Foo.Bar" if we do not keep spaces.  Another example: "A
+  ;; space@@ascii:*@@ character."  should become "A space character"
+  ;; in non-ASCII export.
+  (let ((post-blank (org-element-post-blank data)))
+    (unless (or (not post-blank)
+                (zerop post-blank)
+                (eq 'element (org-element-class data)))
+      (let ((previous (org-export-get-previous-element data info)))
+       (unless (or (not previous)
+                   (pcase (org-element-type previous)
+                     (`plain-text
+                      (string-match-p
+                       (rx (any " \t") eos) previous))
+                     (_ (org-element-post-blank previous))))
+          ;; When previous element does not have
+          ;; trailing spaces, keep the trailing
+          ;; spaces from DATA.
+         (make-string post-blank ?\s))))))
+
 ;;;###autoload
 (defun org-export-data (data info)
   "Convert DATA into current backend format.
@@ -1930,15 +1962,11 @@ (defun org-export-data (data info)
                           (eq (plist-get info :with-archived-trees) 'headline)
                           (org-element-property :archivedp data)))
                  (let ((transcoder (org-export-transcoder data info)))
-                   (or (and (functionp transcoder)
-                             (if (eq type 'link)
-                                (broken-link-handler
-                                 (funcall transcoder data nil info))
-                               (funcall transcoder data nil info)))
-                       ;; Export snippets never return a nil value so
-                       ;; that white spaces following them are never
-                       ;; ignored.
-                       (and (eq type 'export-snippet) ""))))
+                   (and (functionp transcoder)
+                         (if (eq type 'link)
+                            (broken-link-handler
+                             (funcall transcoder data nil info))
+                           (funcall transcoder data nil info)))))
                 ;; Element/Object with contents.
                 (t
                  (let ((transcoder (org-export-transcoder data info)))
@@ -1979,8 +2007,8 @@ (defun org-export-data (data info)
          (puthash
           data
           (cond
-           ((not results) "")
-           ((memq type '(nil org-data plain-text raw)) results)
+           ((not results) (or (org-export--keep-spaces data info) ""))
+            ((memq type '(nil org-data plain-text raw)) results)
            ;; Append the same white space between elements or objects
            ;; as in the original buffer, and call appropriate filters.
            (t
@@ -2641,24 +2669,13 @@ (defun org-export--prune-tree (data info)
                (let ((type (org-element-type data)))
                  (if (org-export--skip-p data info selected excluded)
                      (if (memq type '(table-cell table-row)) (push data ignore)
-                       (let ((post-blank (org-element-post-blank data)))
-                         (if (or (not post-blank) (zerop post-blank)
-                                 (eq 'element (org-element-class data)))
-                             (org-element-extract data)
+                        (if-let ((keep-spaces (org-export--keep-spaces data info)))
                            ;; Keep spaces in place of removed
                            ;; element, if necessary.
                            ;; Example: "Foo.[10%] Bar" would become
                            ;; "Foo.Bar" if we do not keep spaces.
-                           (let ((previous (org-export-get-previous-element data info)))
-                             (if (or (not previous)
-                                     (pcase (org-element-type previous)
-                                       (`plain-text
-                                        (string-match-p
-                                         (rx  whitespace eos) previous))
-                                       (_ (org-element-post-blank previous))))
-                                 ;; Previous object ends with whitespace already.
-                                 (org-element-extract data)
-                               (org-element-set data (make-string post-blank ?\s)))))))
+                            (org-element-set data keep-spaces)
+                         (org-element-extract data)))
                    (if (and (eq type 'headline)
                             (eq (plist-get info :with-archived-trees)
                                 'headline)
-- 
2.44.0

From 3fa3ed068fcfc58470430a5c4bae3a5ffd1ca3ed Mon Sep 17 00:00:00 2001
Message-ID: <3fa3ed068fcfc58470430a5c4bae3a5ffd1ca3ed.1713812419.git.yantar92@posteo.net>
In-Reply-To: <229a563dc38e1fdfd63be2dfebb1a9e9023e44b2.1713812419.git.yantar92@posteo.net>
References: <229a563dc38e1fdfd63be2dfebb1a9e9023e44b2.1713812419.git.yantar92@posteo.net>
From: Ihor Radchenko <yantar92@posteo.net>
Date: Sun, 21 Apr 2024 15:54:48 +0300
Subject: [PATCH v2 2/2] ox-md, ox-ascii, ox-texinfo: Strip blank lines from
 paragraphs

* lisp/org-macs.el (org-remove-blank-lines): New helper function to
strip blank lines from a string.
* lisp/ox-ascii.el (org-ascii-paragraph):
* lisp/ox-latex.el (org-latex-paragraph):
* lisp/ox-md.el (org-md-paragraph):
* lisp/ox-texinfo.el (org-texinfo-paragraph): Strip blank lines from
paragraphs; these exporters use blank lines as paragraph
separators.

Reported-by: Max Nikulin <manikulin@gmail.com>
Link: https://orgmode.org/list/v00le7$frp$1@ciao.gmane.io
---
 lisp/org-macs.el   | 4 ++++
 lisp/ox-ascii.el   | 6 ++++++
 lisp/ox-latex.el   | 4 +---
 lisp/ox-md.el      | 6 ++++++
 lisp/ox-texinfo.el | 7 ++++++-
 5 files changed, 23 insertions(+), 4 deletions(-)
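The new org-remove-blank-lines helper collapses every run of blank lines
(empty, or containing only whitespace) into a single newline.  Below is
a local copy under a hypothetical name, so that its effect can be tried
without applying the patch.

```elisp
(require 'rx)

(defun my/remove-blank-lines (s)
  "Collapse runs of blank (empty or whitespace-only) lines in S.
Local copy of the `org-remove-blank-lines' helper added by this patch."
  (replace-regexp-in-string (rx "\n" (1+ (0+ space) "\n")) "\n" s))

;; Usage:
;; (my/remove-blank-lines "Some text.\n\nMore text.\n   \nLast line.")
;;   => "Some text.\nMore text.\nLast line."
;; (my/remove-blank-lines "First line\nlast line.")
;;   => "First line\nlast line."   ; single newlines are kept
```

Because single newlines survive, line wrapping inside the paragraph is
unaffected; only the lines that the ASCII, Markdown, LaTeX, and Texinfo
backends would read as paragraph separators are removed.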

diff --git a/lisp/org-macs.el b/lisp/org-macs.el
index 1254ddb54..93803bfe9 100644
--- a/lisp/org-macs.el
+++ b/lisp/org-macs.el
@@ -1244,6 +1244,10 @@ (defun org-remove-tabs (s &optional width)
             t t s)))
   s)
 
+(defun org-remove-blank-lines (s)
+  "Remove blank lines in S."
+  (replace-regexp-in-string (rx "\n" (1+ (0+ space) "\n")) "\n" s))
+
 (defun org-wrap (string &optional width lines)
   "Wrap string to either a number of lines, or a width in characters.
 If WIDTH is non-nil, the string is wrapped to that width, however many lines
diff --git a/lisp/ox-ascii.el b/lisp/ox-ascii.el
index db4356ec6..e767f66cf 100644
--- a/lisp/ox-ascii.el
+++ b/lisp/ox-ascii.el
@@ -1651,6 +1651,12 @@ (defun org-ascii-paragraph (paragraph contents info)
   "Transcode a PARAGRAPH element from Org to ASCII.
 CONTENTS is the contents of the paragraph, as a string.  INFO is
 the plist used as a communication channel."
+  ;; Ensure that we do not create multiple paragraphs, when a single
+  ;; paragraph is expected.
+  ;; Multiple newlines may appear in CONTENTS, for example, when
+  ;; certain objects are stripped from export, leaving single newlines
+  ;; before and after.
+  (setq contents (org-remove-blank-lines contents))
   (org-ascii--justify-element
    (let ((indented-line-width (plist-get info :ascii-indented-line-width)))
      (if (not (wholenump indented-line-width)) contents
diff --git a/lisp/ox-latex.el b/lisp/ox-latex.el
index 5c19e1fe7..2267a604e 100644
--- a/lisp/ox-latex.el
+++ b/lisp/ox-latex.el
@@ -3039,9 +3039,7 @@ (defun org-latex-paragraph (_paragraph contents _info)
   ;; Multiple newlines may appear in CONTENTS, for example, when
   ;; certain objects are stripped from export, leaving single newlines
   ;; before and after.
-  (replace-regexp-in-string
-   (rx "\n" (1+ (0+ space) "\n")) "\n"
-   contents))
+  (org-remove-blank-lines contents))
 
 
 ;;;; Plain List
diff --git a/lisp/ox-md.el b/lisp/ox-md.el
index fa2beeb95..28f0a4cf6 100644
--- a/lisp/ox-md.el
+++ b/lisp/ox-md.el
@@ -628,6 +628,12 @@ (defun org-md-paragraph (paragraph contents _info)
   "Transcode PARAGRAPH element into Markdown format.
 CONTENTS is the paragraph contents.  INFO is a plist used as
 a communication channel."
+  ;; Ensure that we do not create multiple paragraphs, when a single
+  ;; paragraph is expected.
+  ;; Multiple newlines may appear in CONTENTS, for example, when
+  ;; certain objects are stripped from export, leaving single newlines
+  ;; before and after.
+  (setq contents (org-remove-blank-lines contents))
   (let ((first-object (car (org-element-contents paragraph))))
     ;; If paragraph starts with a #, protect it.
     (if (and (stringp first-object) (string-prefix-p "#" first-object))
diff --git a/lisp/ox-texinfo.el b/lisp/ox-texinfo.el
index 4aef9c41c..fc9ec9209 100644
--- a/lisp/ox-texinfo.el
+++ b/lisp/ox-texinfo.el
@@ -1517,7 +1517,12 @@ (defun org-texinfo-paragraph (_paragraph contents _info)
   "Transcode a PARAGRAPH element from Org to Texinfo.
 CONTENTS is the contents of the paragraph, as a string.  INFO is
 the plist used as a communication channel."
-  contents)
+  ;; Ensure that we do not create multiple paragraphs, when a single
+  ;; paragraph is expected.
+  ;; Multiple newlines may appear in CONTENTS, for example, when
+  ;; certain objects are stripped from export, leaving single newlines
+  ;; before and after.
+  (org-remove-blank-lines contents))
 
 ;;;; Plain List
 
-- 
2.44.0

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
