unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#68508: [PATCH] ; (dom-print): Use HTML entities for reserved characters.
From: Eshel Yaron via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-01-16 13:24 UTC
  To: 68508

[-- Attachment #1: Type: text/plain, Size: 843 bytes --]

Tags: patch

This makes `dom-print` encode HTML reserved characters that occur in
string elements of the DOM, to ensure the validity of the result.

For example, put the following in `foo.html`:

--8<---------------cut here---------------start------------->8---
<html><body>
Add ‘<samp class="samp">&lt;div class="default"&gt; &lt;/div&gt;</samp>’ tags around the fontified body.
</body></html>
--8<---------------cut here---------------end--------------->8---
(Fragment from https://www.gnu.org/software/emacs/manual/html_mono/htmlfontify.html)

Open that file in Emacs, type `M-: (require 'dom)`, and then evaluate
`(dom-print (libxml-parse-html-region))` in the HTML buffer.  Without
this patch, the result is invalid HTML: `libxml-parse-html-region`
correctly decodes the HTML entities, but `dom-print` does not encode
them again.
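To see what the patched branch of `dom-print` inserts for a string
child, the helper it calls can be evaluated directly, for instance in
`*scratch*`.  This is only an illustration of
`url-insert-entities-in-string` from url-util.el (which ships with
Emacs), not part of the patch.  The first expected value is the same
text that the test in the v2 patch expects inside `<samp>`.  Note that
the literal double quotes of the original source come back as
`&quot;`, so the printed HTML is equivalent to the input rather than
byte-for-byte identical.

```elisp
(require 'url-util)  ; defines `url-insert-entities-in-string'

;; The four reserved characters & < > " become entity references;
;; everything else, including the single quote, is left untouched.
(url-insert-entities-in-string "<div class=\"default\"> </div>")
;; => "&lt;div class=&quot;default&quot;&gt; &lt;/div&gt;"

(url-insert-entities-in-string "it's plain text")
;; => "it's plain text"
```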




[-- Attachment #2: 0001-dom-print-Use-HTML-entities-for-reserved-characters.patch --]
[-- Type: text/patch, Size: 704 bytes --]

From 259c0138623c352acc7bcd79a1fda42ec606a0cf Mon Sep 17 00:00:00 2001
From: Eshel Yaron <me@eshelyaron.com>
Date: Fri, 5 Jan 2024 16:40:44 +0100
Subject: [PATCH] ; (dom-print): Use HTML entities for reserved characters.

---
 lisp/dom.el | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lisp/dom.el b/lisp/dom.el
index f7043ba8252..b329379fdc3 100644
--- a/lisp/dom.el
+++ b/lisp/dom.el
@@ -288,7 +288,7 @@ dom-print
 	(insert ">")
         (dolist (child children)
 	  (if (stringp child)
-	      (insert child)
+	      (insert (url-insert-entities-in-string child))
 	    (setq non-text t)
 	    (when pretty
               (insert "\n" (make-string (+ column 2) ?\s)))
-- 
2.42.0


* bug#68508: [PATCH] ; (dom-print): Use HTML entities for reserved characters.
From: Eli Zaretskii @ 2024-01-16 13:47 UTC
  To: Eshel Yaron; +Cc: 68508

> Date: Tue, 16 Jan 2024 14:24:40 +0100
> From:  Eshel Yaron via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> This makes `dom-print` encode HTML reserved characters that occur in
> string elements of the DOM, to ensure the validity of the result.
> 
> For example, put the following in `foo.html`:
> 
> --8<---------------cut here---------------start------------->8---
> <html><body>
> Add ‘<samp class="samp">&lt;div class="default"&gt; &lt;/div&gt;</samp>’ tags around the fontified body.
> </body></html>
> --8<---------------cut here---------------end--------------->8---
> (Fragment from https://www.gnu.org/software/emacs/manual/html_mono/htmlfontify.html)
> 
> Open that file in Emacs, type `M-: (require 'dom)`, and then evaluate
> `(dom-print (libxml-parse-html-region))` in the HTML buffer.  Without
> this patch, the result is invalid HTML: `libxml-parse-html-region`
> correctly decodes the HTML entities, but `dom-print` does not encode
> them again.

Thanks, but could you please also add tests for this?





* bug#68508: [PATCH] ; (dom-print): Use HTML entities for reserved characters.
From: Eshel Yaron via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-01-16 16:29 UTC
  To: Eli Zaretskii; +Cc: 68508

[-- Attachment #1: Type: text/plain, Size: 1208 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

> Thanks, but could you please also add tests for this?

Sure, I've added a test to dom-tests.el in the updated patch below.


[-- Attachment #2: v2-0001-Use-HTML-entities-for-reserved-characters-in-dom-.patch --]
[-- Type: text/x-patch, Size: 1732 bytes --]

From 8d60074053ee1ebc04fc3fda417d53ddc5a4fac9 Mon Sep 17 00:00:00 2001
From: Eshel Yaron <me@eshelyaron.com>
Date: Fri, 5 Jan 2024 16:40:44 +0100
Subject: [PATCH v2] ; Use HTML entities for reserved characters in 'dom-print'

* lisp/dom.el (dom-print): Encode HTML reserved characters in strings.
* test/lisp/dom-tests.el (dom-tests-print): New test.  (Bug#68508)
---
 lisp/dom.el            |  2 +-
 test/lisp/dom-tests.el | 10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/lisp/dom.el b/lisp/dom.el
index f7043ba8252..b329379fdc3 100644
--- a/lisp/dom.el
+++ b/lisp/dom.el
@@ -288,7 +288,7 @@ dom-print
 	(insert ">")
         (dolist (child children)
 	  (if (stringp child)
-	      (insert child)
+	      (insert (url-insert-entities-in-string child))
 	    (setq non-text t)
 	    (when pretty
               (insert "\n" (make-string (+ column 2) ?\s)))
diff --git a/test/lisp/dom-tests.el b/test/lisp/dom-tests.el
index 8cbfb9ad9df..a4e913541bf 100644
--- a/test/lisp/dom-tests.el
+++ b/test/lisp/dom-tests.el
@@ -209,6 +209,16 @@ dom-tests-pp
       (dom-pp node t)
       (should (equal (buffer-string) "(\"foo\" nil)")))))
 
+(ert-deftest dom-tests-print ()
+  "Test that `dom-print' correctly encodes HTML reserved characters."
+  (with-temp-buffer
+    (dom-print '(samp ((class . "samp")) "<div class=\"default\"> </div>"))
+    (should (equal
+             (buffer-string)
+             (concat "<samp class=\"samp\">"
+                     "&lt;div class=&quot;default&quot;&gt; &lt;/div&gt;"
+                     "</samp>")))))
+
 (ert-deftest dom-test-search ()
   (let ((dom '(a nil (b nil (c nil)))))
     (should (equal (dom-search dom (lambda (d) (eq (dom-tag d) 'a)))
-- 
2.42.0
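The v2 test exercises `<`, `>` and `"`, but not `&`.  Below is a sketch
of further cases written in the same ERT style.  It checks the helper
rather than `dom-print` itself, so it also runs on an Emacs without the
patch.  The test name `dom-entities-sketch-test` is hypothetical and is
not part of test/lisp/dom-tests.el.  The last assertion shows a
consequence of the design: a DOM string holds decoded text, so text that
merely looks like an entity is escaped again instead of being passed
through.

```elisp
(require 'ert)
(require 'url-util)

;; Hypothetical test; not part of test/lisp/dom-tests.el.
(ert-deftest dom-entities-sketch-test ()
  "Check the encoding that the patched `dom-print' applies to text."
  ;; Ordinary text is returned unchanged, so the patch should not
  ;; alter the output for text nodes without reserved characters.
  (should (equal (url-insert-entities-in-string "plain text")
                 "plain text"))
  ;; The ampersand, which the v2 test does not exercise, is escaped.
  (should (equal (url-insert-entities-in-string "Tom & Jerry")
                 "Tom &amp; Jerry"))
  ;; DOM strings hold decoded text, so an entity-like sequence is
  ;; escaped again rather than passed through.
  (should (equal (url-insert-entities-in-string "&lt;")
                 "&amp;lt;")))
```

It can be run interactively with `M-x ert RET dom-entities-sketch-test RET`.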


* bug#68508: [PATCH] ; (dom-print): Use HTML entities for reserved characters.
From: Eli Zaretskii @ 2024-01-20  9:42 UTC
  To: Eshel Yaron; +Cc: 68508-done

> From: Eshel Yaron <me@eshelyaron.com>
> Cc: 68508@debbugs.gnu.org
> Date: Tue, 16 Jan 2024 17:29:12 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Thanks, but could you please also add tests for this?
> 
> Sure, I've added a test to dom-tests.el in the updated patch below.

Thanks, installed on master, and closing the bug.





Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
