* bug#71685: [PATCH] fix shr rendering in tables without tbody
@ 2024-06-20 19:15 JD Smith
2024-07-06 7:36 ` Eli Zaretskii
0 siblings, 1 reply; 4+ messages in thread
From: JD Smith @ 2024-06-20 19:15 UTC (permalink / raw)
To: 71685
[-- Attachment #1: Type: text/plain, Size: 1080 bytes --]
It is very common for HTML tables to include a header (<thead>) and/or footer (<tfoot>) without using <tbody>. Modern browsers simply supply an implicit <tbody>..</tbody> around all the unparented rows in a table. `shr' does not handle this common case correctly. Below is an example with <thead> but not <tbody>. It prints the header, but then subsumes it again inside the derived body, printing the header again in a single cell.
The relevant function which should handle this is `shr--fix-tbody'. The included patch to this function simply avoids including `thead` and `tfoot` children in the implicit body.
(let ((shr-table-vertical-line ?|)
(shr-table-horizontal-line ?-))
(shr-insert-document
(with-temp-buffer
(insert "<table>
<thead><tr><th>A</th><th>B</th></tr></thead>
<tr><td>1</td><td>2</td></tr>
<tr><td>3</td><td>4</td></tr>
</table>")
(libxml-parse-html-region))))
---------
| --- -- |
||A |B | |
| --- -- |
||AB | |
| --- -- |
||1 |2 | |
| --- -- |
||3 |4 | |
| --- -- |
---------
[-- Attachment #2: shr_fix_tbody.patch --]
[-- Type: application/octet-stream, Size: 595 bytes --]
--- shr.el 2024-06-20 15:03:52
+++ shr_fix_tbody.el 2024-06-20 15:00:49
@@ -2053,8 +2053,9 @@
(defun shr--fix-tbody (tbody)
(nconc (list 'tbody (dom-attributes tbody))
(cl-loop for child in (dom-children tbody)
- collect (if (or (stringp child)
- (not (eq (dom-tag child) 'tr)))
+ for tag = (and (not (stringp child)) (dom-tag child))
+ unless (or (eq tag 'thead) (eq tag 'tfoot))
+ collect (if (not (eq tag 'tr))
(list 'tr nil (list 'td nil child))
child))))
^ permalink raw reply [flat|nested] 4+ messages in thread
* bug#71685: [PATCH] fix shr rendering in tables without tbody
2024-06-20 19:15 bug#71685: [PATCH] fix shr rendering in tables without tbody JD Smith
@ 2024-07-06 7:36 ` Eli Zaretskii
2024-07-06 18:13 ` JD Smith
0 siblings, 1 reply; 4+ messages in thread
From: Eli Zaretskii @ 2024-07-06 7:36 UTC (permalink / raw)
To: JD Smith; +Cc: 71685
> From: JD Smith <jdtsmith@gmail.com>
> Date: Thu, 20 Jun 2024 15:15:32 -0400
>
> It is very common for HTML tables to include a header (<thead>) and/or footer (<tfoot>) without using <tbody>. Modern browsers simply supply an implicit <tbody>..</tbody> around all the unparented rows in a table. `shr' does not handle this common case correctly. Below is an example with <thead> but not <tbody>. It prints the header, but then subsumes it again inside the derived body, printing the header again in a single cell.
>
> The relevant function which should handle this is `shr--fix-tbody'. The included patch to this function simply avoids including `thead` and `tfoot` children in the implicit body.
Thanks. I don't see any experts chiming in, so if you are confident
in the patch, and it doesn't fail the existing tests, please install
on the emacs-30 branch, and thanks. Bonus points for adding a test
for this case.
^ permalink raw reply [flat|nested] 4+ messages in thread
* bug#71685: [PATCH] fix shr rendering in tables without tbody
2024-07-06 7:36 ` Eli Zaretskii
@ 2024-07-06 18:13 ` JD Smith
2024-07-06 19:11 ` Stefan Kangas
0 siblings, 1 reply; 4+ messages in thread
From: JD Smith @ 2024-07-06 18:13 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 71685
[-- Attachment #1: Type: text/plain, Size: 1172 bytes --]
> On Jul 6, 2024, at 3:36 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: JD Smith <jdtsmith@gmail.com>
>> Date: Thu, 20 Jun 2024 15:15:32 -0400
>>
>> It is very common for HTML tables to include a header (<thead>) and/or footer (<tfoot>) without using <tbody>. Modern browsers simply supply an implicit <tbody>..</tbody> around all the unparented rows in a table. `shr' does not handle this common case correctly. Below is an example with <thead> but not <tbody>. It prints the header, but then subsumes it again inside the derived body, printing the header again in a single cell.
>>
>> The relevant function which should handle this is `shr--fix-tbody'. The included patch to this function simply avoids including `thead` and `tfoot` children in the implicit body.
>
> Thanks. I don't see any experts chiming in, so if you are confident
> in the patch, and it doesn't fail the existing tests, please install
> on the emacs-30 branch, and thanks. Bonus points for adding a test
> for this case.
Thanks. I'm afraid I don't have write access on savannah. I've added a test and formatted the patch, below. All shr tests succeed.
[-- Attachment #2: 0001-Fix-formatting-of-tables-with-thead-tfoot-but-no-tbo.patch --]
[-- Type: application/octet-stream, Size: 2349 bytes --]
From 623ecf07dc1b215cbc98f5804d58b571a649e9ba Mon Sep 17 00:00:00 2001
From: JD Smith <93749+jdtsmith@users.noreply.github.com>
Date: Sat, 6 Jul 2024 09:22:33 -0400
Subject: [PATCH] Fix formatting of tables with thead/tfoot but no tbody
Correctly handle formatting of tables containing thead and/or tfoot, but
without any tbody, to prevent including thead/tfoot content twice within
the table's derived body (Bug#71685).
* lisp/net.shr.el (shr--fix-tbody): Omit thead/tfoot from implicit body
* test/lisp/net/shr-resources/table.html:
* test/lisp/net/shr-resources/table.txt:
Added table rendering test.
---
lisp/net/shr.el | 5 +++--
test/lisp/net/shr-resources/table.html | 7 +++++++
test/lisp/net/shr-resources/table.txt | 5 +++++
3 files changed, 15 insertions(+), 2 deletions(-)
create mode 100644 test/lisp/net/shr-resources/table.html
create mode 100644 test/lisp/net/shr-resources/table.txt
diff --git a/lisp/net/shr.el b/lisp/net/shr.el
index 3dadcb9a09b..fb72ea6aa67 100644
--- a/lisp/net/shr.el
+++ b/lisp/net/shr.el
@@ -2261,8 +2261,9 @@ shr-table-body
(defun shr--fix-tbody (tbody)
(nconc (list 'tbody (dom-attributes tbody))
(cl-loop for child in (dom-children tbody)
- collect (if (or (stringp child)
- (not (eq (dom-tag child) 'tr)))
+ for tag = (and (not (stringp child)) (dom-tag child))
+ unless (or (eq tag 'thead) (eq tag 'tfoot))
+ collect (if (not (eq tag 'tr))
(list 'tr nil (list 'td nil child))
child))))
diff --git a/test/lisp/net/shr-resources/table.html b/test/lisp/net/shr-resources/table.html
new file mode 100644
index 00000000000..c5e8875ac91
--- /dev/null
+++ b/test/lisp/net/shr-resources/table.html
@@ -0,0 +1,7 @@
+<table>
+<thead><tr><th>A</th><th>B</th></tr></thead>
+<tr><td>1</td><td>2</td></tr>
+<tr><td>3</td><td>4</td></tr>
+5678
+<tfoot><tr><th>A</th><th>B</th></tr></tfoot>
+</table>
diff --git a/test/lisp/net/shr-resources/table.txt b/test/lisp/net/shr-resources/table.txt
new file mode 100644
index 00000000000..70939effb63
--- /dev/null
+++ b/test/lisp/net/shr-resources/table.txt
@@ -0,0 +1,5 @@
+ A B
+ 1 2
+ 3 4
+ 5678
+ A B
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* bug#71685: [PATCH] fix shr rendering in tables without tbody
2024-07-06 18:13 ` JD Smith
@ 2024-07-06 19:11 ` Stefan Kangas
0 siblings, 0 replies; 4+ messages in thread
From: Stefan Kangas @ 2024-07-06 19:11 UTC (permalink / raw)
To: JD Smith, Eli Zaretskii; +Cc: 71685-done
Version: 30.1
JD Smith <jdtsmith@gmail.com> writes:
> I've added a test and formatted the patch, below. All shr tests
> succeed.
Thanks, installed on emacs-30 as commit 9625e4af994.
I'm therefore closing this bug report.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-07-06 19:11 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-20 19:15 bug#71685: [PATCH] fix shr rendering in tables without tbody JD Smith
2024-07-06 7:36 ` Eli Zaretskii
2024-07-06 18:13 ` JD Smith
2024-07-06 19:11 ` Stefan Kangas
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.