* bug#71685: [PATCH] fix shr rendering in tables without tbody @ 2024-06-20 19:15 JD Smith 2024-07-06 7:36 ` Eli Zaretskii 0 siblings, 1 reply; 4+ messages in thread From: JD Smith @ 2024-06-20 19:15 UTC (permalink / raw) To: 71685 [-- Attachment #1: Type: text/plain, Size: 1080 bytes --] It is very common for HTML tables to include a header (<thead>) and/or footer (<tfoot>) without using <tbody>. Modern browsers simply supply an implicit <tbody>..</tbody> around all the unparented rows in a table. `shr' does not handle this common case correctly. Below is an example with <thead> but not <tbody>. It prints the header, but then subsumes it again inside the derived body, printing the header again in a single cell. The relevant function which should handle this is `shr--fix-tbody'. The included patch to this function simply avoids including `thead` and `tfoot` children in the implicit body. (let ((shr-table-vertical-line ?|) (shr-table-horizontal-line ?-)) (shr-insert-document (with-temp-buffer (insert "<table> <thead><tr><th>A</th><th>B</th></tr></thead> <tr><td>1</td><td>2</td></tr> <tr><td>3</td><td>4</td></tr> </table>") (libxml-parse-html-region)))) --------- | --- -- | ||A |B | | | --- -- | ||AB | | | --- -- | ||1 |2 | | | --- -- | ||3 |4 | | | --- -- | --------- [-- Attachment #2: shr_fix_tbody.patch --] [-- Type: application/octet-stream, Size: 595 bytes --] --- shr.el 2024-06-20 15:03:52 +++ shr_fix_tbody.el 2024-06-20 15:00:49 @@ -2053,8 +2053,9 @@ (defun shr--fix-tbody (tbody) (nconc (list 'tbody (dom-attributes tbody)) (cl-loop for child in (dom-children tbody) - collect (if (or (stringp child) - (not (eq (dom-tag child) 'tr))) + for tag = (and (not (stringp child)) (dom-tag child)) + unless (or (eq tag 'thead) (eq tag 'tfoot)) + collect (if (not (eq tag 'tr)) (list 'tr nil (list 'td nil child)) child)))) ^ permalink raw reply [flat|nested] 4+ messages in thread
* bug#71685: [PATCH] fix shr rendering in tables without tbody 2024-06-20 19:15 bug#71685: [PATCH] fix shr rendering in tables without tbody JD Smith @ 2024-07-06 7:36 ` Eli Zaretskii 2024-07-06 18:13 ` JD Smith 0 siblings, 1 reply; 4+ messages in thread From: Eli Zaretskii @ 2024-07-06 7:36 UTC (permalink / raw) To: JD Smith; +Cc: 71685 > From: JD Smith <jdtsmith@gmail.com> > Date: Thu, 20 Jun 2024 15:15:32 -0400 > > It is very common for HTML tables to include a header (<thead>) and/or footer (<tfoot>) without using <tbody>. Modern browsers simply supply an implicit <tbody>..</tbody> around all the unparented rows in a table. `shr' does not handle this common case correctly. Below is an example with <thead> but not <tbody>. It prints the header, but then subsumes it again inside the derived body, printing the header again in a single cell. > > The relevant function which should handle this is `shr--fix-tbody'. The included patch to this function simply avoids including `thead` and `tfoot` children in the implicit body. Thanks. I don't see any experts chiming in, so if you are confident in the patch, and it doesn't fail the existing tests, please install on the emacs-30 branch, and thanks. Bonus points for adding a test for this case. ^ permalink raw reply [flat|nested] 4+ messages in thread
* bug#71685: [PATCH] fix shr rendering in tables without tbody 2024-07-06 7:36 ` Eli Zaretskii @ 2024-07-06 18:13 ` JD Smith 2024-07-06 19:11 ` Stefan Kangas 0 siblings, 1 reply; 4+ messages in thread From: JD Smith @ 2024-07-06 18:13 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 71685 [-- Attachment #1: Type: text/plain, Size: 1172 bytes --] > On Jul 6, 2024, at 3:36 AM, Eli Zaretskii <eliz@gnu.org> wrote: > >> From: JD Smith <jdtsmith@gmail.com> >> Date: Thu, 20 Jun 2024 15:15:32 -0400 >> >> It is very common for HTML tables to include a header (<thead>) and/or footer (<tfoot>) without using <tbody>. Modern browsers simply supply an implicit <tbody>..</tbody> around all the unparented rows in a table. `shr' does not handle this common case correctly. Below is an example with <thead> but not <tbody>. It prints the header, but then subsumes it again inside the derived body, printing the header again in a single cell. >> >> The relevant function which should handle this is `shr--fix-tbody'. The included patch to this function simply avoids including `thead` and `tfoot` children in the implicit body. > > Thanks. I don't see any experts chiming in, so if you are confident > in the patch, and it doesn't fail the existing tests, please install > on the emacs-30 branch, and thanks. Bonus points for adding a test > for this case. Thanks. I'm afraid I don't have write access on savannah. I've added a test and formatted the patch, below. All shr tests succeed. [-- Attachment #2: 0001-Fix-formatting-of-tables-with-thead-tfoot-but-no-tbo.patch --] [-- Type: application/octet-stream, Size: 2349 bytes --] From 623ecf07dc1b215cbc98f5804d58b571a649e9ba Mon Sep 17 00:00:00 2001 From: JD Smith <93749+jdtsmith@users.noreply.github.com> Date: Sat, 6 Jul 2024 09:22:33 -0400 Subject: [PATCH] Fix formatting of tables with thead/tfoot but no tbody Correctly handle formatting of tables containing thead and/or tfoot, but without any tbody, to prevent including thead/tfoot content twice within the table's derived body (Bug#71685). * lisp/net.shr.el (shr--fix-tbody): Omit thead/tfoot from implicit body * test/lisp/net/shr-resources/table.html: * test/lisp/net/shr-resources/table.txt: Added table rendering test. --- lisp/net/shr.el | 5 +++-- test/lisp/net/shr-resources/table.html | 7 +++++++ test/lisp/net/shr-resources/table.txt | 5 +++++ 3 files changed, 15 insertions(+), 2 deletions(-) create mode 100644 test/lisp/net/shr-resources/table.html create mode 100644 test/lisp/net/shr-resources/table.txt diff --git a/lisp/net/shr.el b/lisp/net/shr.el index 3dadcb9a09b..fb72ea6aa67 100644 --- a/lisp/net/shr.el +++ b/lisp/net/shr.el @@ -2261,8 +2261,9 @@ shr-table-body (defun shr--fix-tbody (tbody) (nconc (list 'tbody (dom-attributes tbody)) (cl-loop for child in (dom-children tbody) - collect (if (or (stringp child) - (not (eq (dom-tag child) 'tr))) + for tag = (and (not (stringp child)) (dom-tag child)) + unless (or (eq tag 'thead) (eq tag 'tfoot)) + collect (if (not (eq tag 'tr)) (list 'tr nil (list 'td nil child)) child)))) diff --git a/test/lisp/net/shr-resources/table.html b/test/lisp/net/shr-resources/table.html new file mode 100644 index 00000000000..c5e8875ac91 --- /dev/null +++ b/test/lisp/net/shr-resources/table.html @@ -0,0 +1,7 @@ +<table> +<thead><tr><th>A</th><th>B</th></tr></thead> +<tr><td>1</td><td>2</td></tr> +<tr><td>3</td><td>4</td></tr> +5678 +<tfoot><tr><th>A</th><th>B</th></tr></tfoot> +</table> diff --git a/test/lisp/net/shr-resources/table.txt b/test/lisp/net/shr-resources/table.txt new file mode 100644 index 00000000000..70939effb63 --- /dev/null +++ b/test/lisp/net/shr-resources/table.txt @@ -0,0 +1,5 @@ + A B + 1 2 + 3 4 + 5678 + A B -- 2.43.0 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* bug#71685: [PATCH] fix shr rendering in tables without tbody 2024-07-06 18:13 ` JD Smith @ 2024-07-06 19:11 ` Stefan Kangas 0 siblings, 0 replies; 4+ messages in thread From: Stefan Kangas @ 2024-07-06 19:11 UTC (permalink / raw) To: JD Smith, Eli Zaretskii; +Cc: 71685-done Version: 30.1 JD Smith <jdtsmith@gmail.com> writes: > I've added a test and formatted the patch, below. All shr tests > succeed. Thanks, installed on emacs-30 as commit 9625e4af994. I'm therefore closing this bug report. ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-07-06 19:11 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-06-20 19:15 bug#71685: [PATCH] fix shr rendering in tables without tbody JD Smith 2024-07-06 7:36 ` Eli Zaretskii 2024-07-06 18:13 ` JD Smith 2024-07-06 19:11 ` Stefan Kangas
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).