* guile-lib <p> handling
@ 2020-03-16 13:40 Michal Herko
2020-03-16 17:15 ` tomas
2020-03-16 17:16 ` sirgazil
From: Michal Herko @ 2020-03-16 13:40 UTC
To: guile-user
Dear maintainer of guile-lib,
I believe the special handling of <p> elements in the (htmlprag) module
is a bug.
For example:
(use-modules (htmlprag))
(html->shtml "<html><body><div><p>text</p></div></body></html>")
; expected result (*TOP* (html (body (div (p "text")))))
; actual (*TOP* (html (body (div) (p "text"))))
Note that the <p> element is parsed outside the <div> element.
I attach a simple patch that removes the special case for <p> elements.
[-- Attachment #2: p.diff --]
diff --git a/src/htmlprag.scm b/src/htmlprag.scm
index 3bd352b..df99612 100644
--- a/src/htmlprag.scm
+++ b/src/htmlprag.scm
@@ -1099,7 +1099,6 @@
(meta . (head))
(noframes . (frameset))
(option . (select))
- (p . (body td th))
(param . (applet))
(tbody . (table))
(td . (tr))
* Re: guile-lib <p> handling
From: tomas @ 2020-03-16 17:15 UTC
To: guile-user
On Mon, Mar 16, 2020 at 02:40:45PM +0100, Michal Herko wrote:
> Dear maintainer of guile-lib,
> I believe the special handling of <p> elements in the (htmlprag) module
> is a bug.
> For example:
>
> (use-modules (htmlprag))
> (html->shtml "<html><body><div><p>text</p></div></body></html>")
> ; expected result (*TOP* (html (body (div (p "text")))))
> ; actual (*TOP* (html (body (div) (p "text"))))
>
> Note that the <p> element is parsed outside the <div> element.
> I attach a simple patch that removes the special case for <p> elements.
>
> diff --git a/src/htmlprag.scm b/src/htmlprag.scm
> index 3bd352b..df99612 100644
> --- a/src/htmlprag.scm
> +++ b/src/htmlprag.scm
> @@ -1099,7 +1099,6 @@
> (meta . (head))
> (noframes . (frameset))
> (option . (select))
> - (p . (body td th))
> (param . (applet))
> (tbody . (table))
> (td . (tr))
Where did you get htmlprag from? I'd guess it's from the
Debian package guile-library.
Upstream no longer seems to be maintained [1]. The
Debian package page [2] lists a maintainer you might want
to contact.
That said, you are modifying the parser's "parent constraints";
I would go the other direction and add <div> to the set of
<p>'s possible parents:
> (option . (select))
> - (p . (body td th))
> + (p . (body td th div))
> (param . (applet))
HTML has changed a lot since htmlprag saw its heyday.
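To illustrate what such a parent constraint does, here is a minimal
sketch in Guile Scheme. It is a toy model, not htmlprag's actual code:
the procedure name open-element and the stack representation are made
up for illustration, and only the (p . (body td th)) entry comes from
the patch above. When an element with a constraint is opened, the model
closes open elements until it reaches an allowed parent, which is
consistent with <p> landing next to <div> rather than inside it.

```scheme
;; Toy model of "parent constraints"; NOT htmlprag's implementation.
;; The stack lists the currently open elements, innermost first.

;; Hypothetical constraint table: element -> list of allowed parents.
(define parent-constraints
  '((p . (body td th))))

;; Open ELEM on STACK. If ELEM has a constraint, implicitly close
;; open elements until the innermost one is an allowed parent.
(define (open-element stack elem constraints)
  (let ((allowed (assq-ref constraints elem)))
    (if (not allowed)
        (cons elem stack)                      ; unconstrained: nest normally
        (let loop ((s stack))
          (cond ((null? s) (cons elem stack))  ; no allowed ancestor: nest normally
                ((memq (car s) allowed) (cons elem s))
                (else (loop (cdr s))))))))     ; close (car s), look further out

;; <p> inside <div>: <div> is not an allowed parent, so it is closed
;; and <p> becomes a child of <body>.
(open-element '(div body html) 'p parent-constraints)
;; => (p body html)

;; With div added to the allowed parents, <p> stays inside <div>.
(open-element '(div body html) 'p '((p . (body td th div))))
;; => (p div body html)
```

Removing the entry altogether, as the original patch does, makes <p>
unconstrained in this model, so it also nests inside <div>; the
difference is that <p> then nests inside any open element.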
Cheers
[1] https://planet.racket-lang.org/package-source/neil/htmlprag.plt/1/7/planet-docs/htmlprag/index.html
[2] https://packages.debian.org/buster/guile-library
-- tomás
* Re: guile-lib <p> handling
From: sirgazil @ 2020-03-16 17:16 UTC
To: Michal Herko; +Cc: guile-user
---- On Mon, 16 Mar 2020 08:40:45 -0500 Michal Herko <michal.herko@disroot.org> wrote ----
> Dear maintainer of guile-lib,
> I believe the special handling of <p> elements in the (htmlprag) module
> is a bug.
> For example:
>
> (use-modules (htmlprag))
> (html->shtml "<html><body><div><p>text</p></div></body></html>")
> ; expected result (*TOP* (html (body (div (p "text")))))
> ; actual (*TOP* (html (body (div) (p "text"))))
>
> Note that the <p> element is parsed outside the <div> element.
> I attach a simple patch that removes the special case for <p> elements.
>
>
Sounds familiar. I think I stopped using htmlprag a few months ago because of this; I believe this older thread covers the same problem:
https://lists.gnu.org/archive/html/guile-user/2019-09/msg00009.html
I'm currently using SXML whenever possible.
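For input that happens to be well-formed XML, one way to get SXML
without htmlprag is Guile's bundled (sxml simple) module. This is only
a minimal sketch and assumes well-formed markup: unlike htmlprag,
xml->sxml is a strict XML parser and will not accept tag soup.

```scheme
(use-modules (sxml simple))

;; Parse the example from this thread with Guile's XML parser.
(define doc
  (xml->sxml "<html><body><div><p>text</p></div></body></html>"))

(write doc)
(newline)
;; prints (*TOP* (html (body (div (p "text")))))
```

Here the <p> element stays nested inside <div>, since no parent
constraints are applied.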