bug#48211: 28.0.50; eww strips whitespace between <mark> elements
From: Stefan Kangas
Subject: bug#48211: 28.0.50; eww strips whitespace between <mark> elements
Date: Mon, 3 May 2021 19:35:35 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
"Basil L. Contovounesios" <contovob@tcd.ie> writes:
> I think this is because libxml-parse-html-region specifies
> HTML_PARSE_NOBLANKS:
>
> Return CDATA sections (like <style>foo</style>) as text nodes.
> 3c2317e891 2010-12-06 17:59:52 +0100
> https://git.sv.gnu.org/cgit/emacs.git/commit/?id=3c2317e89100833812a7194c0d9d39ae0f52cb33
Hmm, okay. For now, I'm seeing this issue with basically any tag that
libxml2 does not already know about, e.g. "<summary>" or "<bdi>".
This is what I came up with before reading Basil's reply:
  (with-temp-buffer
    (insert "<p><tt>foo</tt> <tt>bar</tt></p>")
    (libxml-parse-html-region (point-min) (point-max)))
  => (html nil (body nil (p nil (tt nil "foo") " " (tt nil "bar"))))

  (with-temp-buffer
    (insert "<p><mark>foo</mark> <mark>bar</mark></p>")
    (libxml-parse-html-region (point-min) (point-max)))
  => (html nil (body nil (p nil (mark nil "foo") (mark nil "bar"))))
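The same comparison can be repeated for several tags at once. What follows is only a sketch: `my/space-kept-p' is a hypothetical helper name, not part of Emacs, and the result for each tag depends on the libxml2 version Emacs was built against, so no expected output is given.

```elisp
(require 'dom)

(defun my/space-kept-p (tag)
  "Return t if the space between two TAG elements survives parsing.
TAG is a string such as \"mark\".  Hypothetical helper for
illustration only; it is not part of Emacs."
  (with-temp-buffer
    (insert (format "<p><%s>foo</%s> <%s>bar</%s></p>" tag tag tag tag))
    (let* ((dom (libxml-parse-html-region (point-min) (point-max)))
           (p (car (dom-by-tag dom 'p))))
      ;; When the whitespace is preserved, the children of <p> are
      ;; (TAG nil "foo"), " ", and (TAG nil "bar").
      (and (member " " (dom-children p)) t))))

;; Build an alist of (TAG . KEPT) for a few tags mentioned in this thread.
(mapcar (lambda (tag) (cons tag (my/space-kept-p tag)))
        '("tt" "mark" "summary" "bdi"))
```

A tag paired with nil in the result has lost its whitespace text node.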
I guess this is a bug in libxml2, so I reported it here:
https://gitlab.gnome.org/GNOME/libxml2/-/issues/247
FWIW, the below diff works around this bug for me.
diff --git a/lisp/net/shr.el b/lisp/net/shr.el
index cbdeb65ba8..3eb3a5bc49 100644
--- a/lisp/net/shr.el
+++ b/lisp/net/shr.el
@@ -1485,6 +1485,12 @@ shr-tag-tt
   ;; The `tt' tag is deprecated in favor of `code'.
   (shr-tag-code dom))
 
+(defun shr-tag-mark (dom)
+  (shr-generic dom)
+  ;; Hack to work around bug in libxml2 (Bug#48211):
+  ;; https://gitlab.gnome.org/GNOME/libxml2/-/issues/247
+  (insert " "))
+
 (defun shr-tag-ins (cont)
   (let* ((start (point))
          (color "green")
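For anyone who wants to try the same hack without patching shr.el, it could probably be registered through `shr-external-rendering-functions' instead. This is a sketch under that assumption: `my/shr-tag-mark' is a hypothetical name, and like the diff above it always inserts a space after a <mark> element, including where the document had none.

```elisp
(require 'shr)

(defun my/shr-tag-mark (dom)
  "Render a <mark> element DOM, then insert a trailing space.
Hypothetical local workaround for Bug#48211; not part of Emacs."
  (shr-generic dom)
  ;; Same hack as in the diff: compensate for the whitespace text
  ;; node that the parser dropped between adjacent elements.
  (insert " "))

;; Tell shr to use the function above for `mark' elements.
(add-to-list 'shr-external-rendering-functions
             '(mark . my/shr-tag-mark))
```

Removing the entry from the alist again undoes the change once the underlying libxml2 issue is fixed.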